CVRG EDDI Server Configuration

From CVRG Wiki

Installing DSpace/EDDI

The CVRG EDDI tool was developed as a direct extension of DSpace. As such, the installation process laid out in the DSpace 5.x Documentation is an excellent resource for a new deployment of EDDI. To avoid duplication, these instructions link to sections of the official documentation and follow each link with special instructions specific to the EDDI deployment.

System Requirements

Prerequisite software (DSpace documentation): https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace#InstallingDSpace-PrerequisiteSoftware

All software required by DSpace is listed below with the version currently in use by the EDDI development team, along with links to CVRG installation documentation where available. It is recommended to read through the documented DSpace installation instructions (link below) before proceeding with these installs. The PostgreSQL install for DSpace in particular has special requirements.

Software Type | Vendor/Name | Minimum Required Version | Installation Instructions
Application Server | Tomcat | 7.0.30 | https://tomcat.apache.org/tomcat-7.0-doc/index.html
Build Mechanism | Ant | 1.9.4 | http://wiki.cvrgrid.org/index.php/CVRG_Waveform_Server_Configuration#Install_Apache_Ant
Build Mechanism | Maven | 3.3.1 | http://wiki.cvrgrid.org/index.php/CVRG_Waveform_Server_Configuration#Install_Apache_Maven
Development Language | Java (Oracle, not OpenJDK) | 7.0 (1.7.0_75) | http://tecadmin.net/steps-to-install-java-on-centos-5-6-or-rhel-5-6/
Relational Database | PostgreSQL | 9.4 | http://wiki.cvrgrid.org/index.php/CVRG_Waveform_Server_Configuration#Installing_and_Configuring_the_Database (This section can be followed for the PostgreSQL install and configuration. The Database account and setup section can be used for DSpace with only minor modifications, i.e. different DB name, user, and passwords.)
Server Operating System | CentOS | 6.5 | n/a
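
Before moving on, it can help to confirm which of the prerequisites above are already installed. The following is a minimal sketch, assuming the tools expose their usual executable names (java, mvn, ant, psql, git) on the PATH; the function names check_tool and check_prerequisites are illustrative and not part of DSpace or EDDI. Tomcat is left out because it is normally not invoked from the PATH.

```shell
# Report whether a single command is available on the PATH.
# $1 = command name; prints a status line and returns 0 if found, 1 if not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
        return 0
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the command-line prerequisites from the table above.
# The executable names are assumptions; adjust them to match your install.
check_prerequisites() {
    missing=0
    for tool in java mvn ant psql git; do
        check_tool "$tool" || missing=$((missing + 1))
    done
    echo "$missing prerequisite(s) missing"
    [ "$missing" -eq 0 ]
}

# Usage: check_prerequisites
# Versions still need to be compared by hand against the table, e.g.:
#   java -version; mvn -version; ant -version; psql --version; git --version
```

The sketch only reports presence; it does not verify that the installed versions meet the minimums listed in the table.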



The following are additional requirements for the EDDI Development Environment:

Software Type | Vendor/Name | Minimum Required Version | Installation Instructions
Application Server | Solr | 4.10.2 | See EDDI Notes below (http://wiki.cvrgrid.org/index.php/CVRG_EDDI_Server_Configuration#EDDI_Notes:) for Solr configuration
Deployment Tool | Jenkins | latest version | http://wiki.cvrgrid.org/index.php/CVRG_Waveform_Server_Configuration#Jenkins_Install
Deployment Tool | Git | 1.7.1 | http://wiki.cvrgrid.org/index.php/CVRG_Waveform_Server_Configuration#Install_Git


Installation Instructions:

Official Install Documentation on DSpace Wiki: https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace

As mentioned above, the DSpace source installation is well documented in its wiki. The special instructions that follow annotate the minor changes made for the EDDI deployment. The section numbers below correspond to the downloadable DSpace Manual, available as a PDF from the main documentation page: https://wiki.duraspace.org/display/DSDOC5x/DSpace+5.x+Documentation

EDDI Notes:

  • Before you begin, open port 8585 for Jenkins (e.g., sudo /usr/sbin/lokkit --port=8585:tcp). Running Tomcat instances can interfere with port openings.
  • Configuration of all prerequisite software excluding PostgreSQL and Solr should be done now. Postgres and Solr can be completed when you reach them in the DSpace instructions.
  • (section 2.4.1) The source release will be needed for EDDI development.
  • (section 2.4.2) EDDI uses the following directory paths in its server environment for the 4 main DSpace directories:
    • Installation Directory: /opt/dspace
    • Source Directory: /opt/dspace-source/[dspace-<version>-src-release]
    • Web Deployment Directory: /opt/dspace-apache/
    • Solr Install Directory: /opt/solr-[version]
  • **NOTE** Ownership of any directory that Jenkins needs to write files to ([Source Directory] and [Installation Directory]) should be changed to the 'Jenkins' user, which Linux creates during the Jenkins install. That user controls the processes Jenkins triggers during the Maven compilation of code and the Ant deployment.
  • Under "Deploy Web Applications" (section 2.4.3.9): EDDI is deployed using "technique A" with a single Context Path being defined.
    • In [Web Deployment Directory]/conf/Catalina/localhost, create a file named 'ROOT.xml' and place the following code inside (modifying the docBase to point to your installation directory):
      <?xml version='1.0'?>
      <Context
         docBase="[Installation Directory]/webapps/jspui"
         debug="0"
         reloadable="true"
         cachingAllowed="false"
         allowLinking="true"/>
      This will direct Tomcat to use the most recently built JSPUI webapp as the root web application at server startup.
  • SET UP SOLR after the "Administrator Account" step (section 2.4.3.10): The JSPUI web application relies on Solr to render many elements on the page, so configuring Solr before initial startup is recommended.
    • Download Solr and unzip it in the [Solr Install Directory]
    • At the command line, execute:
      cp -r [Source Directory]/solr/search [Source Directory]/solr/statistics [Solr Install Directory]/example/solr/
    • disable ICUFoldingFilterFactory: In order to add the 'search' core to Solr, it is necessary to disable a filter class within the schema.xml file.
      FIXME this is not a standard configuration and requires troubleshooting
      • At the command line, execute:
        sudo vi [Solr Install Directory]/example/solr/search/conf/schema.xml
      • Comment out the filter class declarations for "solr.ICUFoldingFilterFactory" (lines 241 & 254 in my document) so they look like the sample below.
        <!--<filter class="solr.ICUFoldingFilterFactory"/>-->
      • write & quit
    • START SOLR
      • From [Solr Install Directory]/example, execute:
        sudo java -jar start.jar
    • With a web browser, navigate to localhost:8983/solr/#/. This address assumes your deployment is on a local VM. For server deployments, substitute your server's web address and port number for localhost:8983; you will also need to make port configuration changes to access Solr from your browser.
    • Finish SOLR configuration in web interface
      • Click on the 'core admin' button in the left menu.
      • Under the admin interface, click on 'add core'
      • Fill in all fields. The default 'collection1' core can be used as a reference for the correct instanceDir and dataDir values, since the statistics and search cores should be in the same location.
      • Click 'add core' to complete. Repeat process for all remaining cores.
  • Initial Startup (Section 2.4.3.11)
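
The file-level Solr preparation in the notes above (copying the two cores and commenting out the ICU filter) could be scripted roughly as follows. This is a sketch under stated assumptions, not the EDDI team's own procedure: the function names are illustrative, and the sed expression assumes each filter declaration sits on its own line and is written exactly as in the sample above. Starting Solr and adding the cores in the web interface remain manual steps.

```shell
# Copy the DSpace 'search' and 'statistics' cores into the Solr example home.
# $1 = [Source Directory], $2 = [Solr Install Directory]
copy_dspace_cores() {
    cp -r "$1/solr/search" "$1/solr/statistics" "$2/example/solr/"
}

# Comment out every ICUFoldingFilterFactory declaration in a schema.xml.
# $1 = path to schema.xml. A copy of the file as it was before this run is
# kept next to it with a .bak suffix.
# Assumption: the tag appears at the start of a line (after indentation) and
# is spelled exactly <filter class="solr.ICUFoldingFilterFactory"/>.
# Lines that are already commented out do not match and are left alone.
disable_icu_folding() {
    cp "$1" "$1.bak" &&
    sed 's|^\([[:space:]]*\)\(<filter class="solr.ICUFoldingFilterFactory"/>\)|\1<!--\2-->|' \
        "$1.bak" > "$1"
}

# Usage (paths follow the directory layout given in the notes):
#   copy_dspace_cores /opt/dspace-source/[dspace-<version>-src-release] /opt/solr-[version]
#   disable_icu_folding /opt/solr-[version]/example/solr/search/conf/schema.xml
```

Because the pattern only matches uncommented declarations, running disable_icu_folding a second time leaves the file unchanged.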

EDDI Jenkins Configuration:

For EDDI continuous integration you will need 2 separate builds:

  • dspace-maven - The Maven build, which pulls changes from GitHub and compiles all code to DSpace's source target directory.
  • dspace-ant - An Ant build triggered on success of the Maven build above. This deploys the newly built target to [Installation Directory].
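
For orientation, the two jobs correspond roughly to the commands below when run by hand. This is a sketch assuming the example source path used elsewhere on this page and that the DSpace installer's build.xml sits in the dspace-installer directory; the run and build_and_deploy helpers are illustrative names. It prints the commands by default and only executes them when RUN=1 is set. It does not pull from GitHub, which the Jenkins job does for you.

```shell
# Assumed source location, taken from the examples on this page; override as needed.
SOURCE_DIR="${SOURCE_DIR:-/opt/dspace-source/dspace-5.3-src-release}"

# Print a command instead of running it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

build_and_deploy() {
    # Step 1 (dspace-maven): compile into the source tree's target directory.
    run mvn -f "$SOURCE_DIR/dspace/pom.xml" clean package -U &&
    # Step 2 (dspace-ant): deploy the built installer, only if Maven succeeded.
    run ant -f "$SOURCE_DIR/dspace/target/dspace-installer/build.xml" update
}

# Usage:
#   build_and_deploy          # dry run, prints both commands
#   RUN=1 build_and_deploy    # actually build and deploy
```

Chaining the two steps with && mirrors the Jenkins setup, where the Ant job runs only after a successful Maven build.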

Maven Build:

  1. Go to Jenkins main page and click New Item.
  2. Choose a name for the Job (e.g. 'dspace-maven'), select the Maven Project option and click OK.
  3. Check the Discard Old Builds checkbox and choose the Log Rotation strategy to keep builds for up to 30 days.
  4. Set the GitHub project URL (e.g. <undisclosed_location>/PROJECT_NAME/).
  5. At the Source Code Management area, choose Git.
  6. Set the Repository URL (e.g. <undisclosed_location>/PROJECT_NAME.git) and the Branches to build (e.g. */master). Check the list above to set the correct branch for each project.
  7. At the Build Triggers area:
    1. Uncheck All (trigger builds manually for DSpace)
  8. At the Build area:
    1. For the ROOT POM: pom.xml
    2. For Goals & Options: clean package -U
    3. Click Advanced
    4. Check Use custom workspace and fill in the path to the dspace directory inside your DSpace source folder, which is used for Maven builds (e.g. /opt/dspace-source/dspace-5.3-src-release/dspace).
    5. Default settings can be used for validation level, settings file, and global settings file.
  9. If you need to notify someone about the integration of this job, check the E-mail Notification checkbox in the Build Settings area and enter the address that should be notified.
  10. Finally, click the Save button to apply the settings; the project is then ready to build.
    Note: To check whether any errors occurred during a build, see the Console Output for the selected build number.

Ant Build:

  1. Go to Jenkins main page and click New Item.
  2. Choose a name for the Job (e.g. 'dspace-ant'), select the Freestyle Project option and click OK.
  3. Uncheck the Discard Old Builds checkbox
  4. Under "Advanced Project Options":
    1. Check "Use Custom Workspace" and fill in path to the target install directory within the DSpace source code (e.g. /opt/dspace-source/dspace-5.3-src-release/dspace/target/dspace-installer)
  5. At the Source Code Management area, choose None
  6. At the Build Triggers area:
    1. Check "Build after other projects are built" and set 'dspace-maven' as the "project to watch". Check "Trigger only if build is stable"
  7. At the Build area:
    1. Under "Ant Version" select ant1.9.5 (or the correct version for your install)
    2. Set the "target" to 'update'
  8. Finally, click the Save button to apply the settings; the project is then ready to build.
    Note: To check whether any errors occurred during a build, see the Console Output for the selected build number.

Build on Jenkins:

Now it's time to compile all JARs, Web Services, Portlets and Extensions. To build the dspace-maven project from the Jenkins homepage, click the 'schedule a build' icon on the left for 'dspace-maven'. This triggers the Maven build, which on success triggers the Ant build.
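
As an alternative to clicking the icon, Jenkins can usually be asked to queue a build over HTTP by sending a POST request to the job's /build address. The sketch below assumes Jenkins is reachable on port 8585 as configured earlier and that remote access is permitted for your account; the JENKINS_USER and JENKINS_TOKEN variables and both function names are hypothetical, and the authentication details depend on your Jenkins version and security settings.

```shell
# Build the address used to queue a job.
# $1 = Jenkins base URL, $2 = job name
jenkins_build_url() {
    echo "${1%/}/job/$2/build"
}

# Queue a build of the given job.
# JENKINS_USER and JENKINS_TOKEN are assumed to hold a Jenkins user name and
# API token; they are illustrative names, not something Jenkins defines.
trigger_build() {
    curl --fail -X POST --user "$JENKINS_USER:$JENKINS_TOKEN" \
        "$(jenkins_build_url "$1" "$2")"
}

# Usage:
#   trigger_build http://localhost:8585 dspace-maven
```

Only dspace-maven needs to be queued this way, since dspace-ant is triggered by its success.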

OPTIONAL Additional Builds:

One advantage of assigning ownership of the [Source Directory] and [Installation Directory] to the Jenkins user is that Jenkins can then also be used to run other build processes and to schedule cron jobs for internal DSpace maintenance tasks. Below are some general guidelines for setting these projects up and examples of practical applications.

solr-stats-maint: Set up a script to run DSpace solr statistics maintenance...

  1. Choose a name for the Job (e.g. 'solr-stats-maint'), select the Freestyle Project option and click OK.
  2. Uncheck the Discard Old Builds checkbox if checked.
  3. At the Source Code Management area, choose None
  4. At the Build Triggers area
    1. Check 'build periodically' and fill in a schedule using Jenkins' cron-style syntax (e.g. "H H * * *" would schedule the process to run once a day)
  5. At the Build area:
    1. Choose 'execute shell'
    2. In the command field type the process you would like to run (e.g. "/opt/dspace/bin/dspace stats-util -o" would execute dspace's internal solr statistics maintenance script)
  6. Finally, click the Save button to apply the settings; the project is then ready to build.
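
If you want more than a bare command in the 'execute shell' field, a small wrapper along the following lines could be used instead. This is a sketch: the DSPACE_BIN default comes from the example path above, and run_stats_maintenance is an illustrative name. It adds a timestamp to the console output and passes any failure back to Jenkins so the build is marked as failed.

```shell
# Path to the DSpace launcher; the default follows the example on this page.
DSPACE_BIN="${DSPACE_BIN:-/opt/dspace/bin/dspace}"

run_stats_maintenance() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') starting stats-util -o"
    if "$DSPACE_BIN" stats-util -o; then
        echo "stats-util finished"
    else
        # Keep the launcher's exit code so Jenkins reports the failure.
        status=$?
        echo "stats-util failed with exit code $status" >&2
        return "$status"
    fi
}

# Usage (as the last line of the Jenkins command field):
#   run_stats_maintenance
```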

dspace-ant-cleanup: Schedule an ant build which will clean all old archived builds from the [installation directory]

  1. To set up this project, click 'New Item' on the Jenkins homepage and copy the existing 'dspace-ant' project above.
  2. Under 'Build Triggers', uncheck everything (only trigger builds manually).
  3. Under 'Build', change the Target to 'clean_backups'

dspace-root-maven: Schedule a Maven build for DSpace from the ROOT [dspace source] directory. This is only necessary if an update contains NEW customizations to Java source code. Setup for this project is nearly identical to the standard dspace-maven build above. The only differences are that the project is only ever triggered manually, it does not pull updated code from GitHub, and the path is different.

statistics general report: Setup is similar to solr-stats-maint example above, but substitutes statistics maintenance scripts for solr.

Additional Configuration:

After the DSpace/EDDI instance is initially configured, there are a few more steps to configuring DSpace.

Metadata Schema Update

In order for the database to recognize the new schema items added upon rebuilding with the custom EDDI configurations, a metadata registry update script must be run via the command line as described in the following wiki entry:

Adding New Communities and Collections

While logged in as administrator (the account you created when installing DSpace):

  1. In the user menu to the right (the 'logged in as [username]' tab), click on the 'Administer' option.
  2. Under 'Content' in the top menu, select 'Communities and Collections'.
  3. On the 'Communities and Collections' page, choose 'Create Top-Level Community' from the Admin tools.
  4. Add, at a minimum, a name for your community (e.g. "ECG") and click create at the bottom.
  5. Now click 'Create Collection' in the admin tools bar for the new Community you've created.
  6. Follow the on-screen prompts to describe the Collections and assign privileges as desired.
  7. Get handles of new collections
    1. Visit the homepage for the new collection you created and inspect the URL. The handle you will want to input is the sequence directly after 'handle' (e.g. "123456789/3").
    2. The input-forms.xml file [dspace-source]/dspace/config/input-forms.xml will need to be modified with the handles for new collections (e.g. datasets and publications for various collections) to ensure custom publication and dataset form fields for submissions map correctly.
    3. Dataset handles will also have to be added to the value-pair list for "dataset_defaults" in the input-forms.xml file. This configuration allows newly added items in Dataset Collections to appear as an option for selection in the 'Referenced Datasets' field of new publication entries.
    4. A Rebuild/Restart of DSpace will be required after these config changes.
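
When several collections are added at once, the handle can be pulled out of each collection URL with a few lines of shell rather than by eye. This is a small sketch; the function name handle_from_url and the example host are hypothetical, and it assumes the URL contains a '/handle/' segment as described in the steps above.

```shell
# Print the handle portion of a DSpace collection URL.
# $1 = collection URL; returns 1 if no '/handle/' segment is present.
handle_from_url() {
    case "$1" in
        */handle/*)
            h="${1#*/handle/}"    # drop everything up to and including '/handle/'
            h="${h%%[?#]*}"       # drop any query string or fragment
            echo "${h%/}"         # drop a trailing slash, if present
            ;;
        *)
            echo "no handle found in: $1" >&2
            return 1
            ;;
    esac
}

# Usage:
#   handle_from_url "http://localhost:8080/handle/123456789/3"
```

The printed value is what goes into input-forms.xml.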

Adding new Items to the Discovery Index

FIXME: this section requires more complete documentation

  1. get value (schema item)
  2. create "search filter" bean (in __________.xml file) [link]
  3. add to search filters
  4. add references to Messages.properties file
  5. rebuild/restart dspace
  6. index-discovery -b [link]

Adding a new Schema item to collection (e.g. below = dc.identifier.pubmed)

FIXME: this section requires more complete documentation

NOTE: dc.identifier.pubmed exists in the EDDI schema and is shown below only for illustrative purposes

  1. add dc.identifier.pubmed to ecg-datatype.xml
  2. rebuild/restart dspace
    1. After the rebuild, update the metadata schema within DSpace's database instance and refresh the discovery index by running the metadata schema update and index-discovery -b scripts:
    2. Run Metadata Schema Update as described above (link: http://wiki.cvrgrid.org/index.php/CVRG_EDDI_Server_Configuration#Metadata_Schema_Update )
    3. Update discovery index at the command line with the following:
      [dspace-installation]/bin/dspace index-discovery -b
  3. add pmid in input-forms.xml to value pairs...
  4. try adding dc.identifier.pubmed in bte.xml transform engine >> step (2) (for things you want imported from pubmed)
  5. add pmid to messages.properties file in jspui
  6. add pmid to webui.itemdisplay.default in dspace.cfg file (outputs when pub displays)
  7. rebuild/restart dspace

Importing modified metadata exports

DSpace has the ability to export metadata. The exported metadata files can then be edited to perform operations on entries in bulk. This technique is useful for adding new fields to old entries and for other modifications beyond what the admin GUI tools allow.

  • [dspace-installation]/bin/dspace metadata-import -f [path to csv file]

NOTE: Use the above with extreme care and only after backing up the database, as it is possible to bulk delete or damage multiple entries.
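
One way to make the backup hard to forget is to wrap the import so that it refuses to run unless a database dump succeeds first. The following is a sketch under stated assumptions: the database name and user ('dspace') and the backup directory are guesses to be replaced with your own values, backup_then_import is an illustrative name, and pg_dump may need additional connection or password options on your system.

```shell
DSPACE_BIN="${DSPACE_BIN:-/opt/dspace/bin/dspace}"
DB_NAME="${DB_NAME:-dspace}"        # assumed database name
DB_USER="${DB_USER:-dspace}"        # assumed database user
BACKUP_DIR="${BACKUP_DIR:-/tmp}"    # assumed backup location

# Dump the database, then import the edited CSV. $1 = path to the CSV file.
backup_then_import() {
    csv="$1"
    [ -f "$csv" ] || { echo "CSV not found: $csv" >&2; return 1; }
    dump="$BACKUP_DIR/$DB_NAME-$(date +%Y%m%d-%H%M%S).sql"
    # Stop before touching any items if the backup fails.
    pg_dump -U "$DB_USER" -f "$dump" "$DB_NAME" || {
        echo "backup failed; import skipped" >&2
        return 1
    }
    echo "database backed up to $dump"
    "$DSPACE_BIN" metadata-import -f "$csv"
}

# Usage:
#   backup_then_import /path/to/edited-export.csv
```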

Bookmarks/Additional Reference

The following links document several customizations used while developing EDDI. They may be useful for reference by those developing parallel systems or extending our tools further.
