...
Introduction
The Data Repository is a software component that manages an RDF database and makes it available to other applications through a REST API, and gives end users specific views of the data. It adds role-based access control of varying granularity, transactional editing, custom treatment of ontologies and minimal/fast inference, and various administrative functions on top of the RDF database.
This page explains how it works on a host computer system, and how to install and maintain it. This page serves as an application administrator's manual for the development cycle.
Components and Layout
The data repository is installed in two intentionally separate places on the host operating system:
...
Another advantage to the separate location is that it gives the system administrator more flexibility to assign that directory to a location with appropriate capacity, reliability, and performance.
Command-Line Tools
The installed repository includes a set of command-line tools you will use for many of the administrative tasks. They are found in the etc/ subdirectory of the repository home directory. All of them respond to these two options:
--version
- display what released version the tool came from
--help
- display a synopsis of command args and switches
For example:
No Format |
---|
bash ${REPO_HOME}/etc/upgrade.sh --version
upgrade.sh from release 1.1-MS4.00 SCM revision 5422
|
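Because every tool accepts --version, one quick way to confirm that all installed tools come from the same release is to loop over them. The following is only a sketch: it assumes the tools are the *.sh files directly under etc/, and the function name tool_versions is made up for illustration.

```shell
# Print the --version line of every *.sh tool in a directory.
# Assumption: the tools are the *.sh files directly under etc/.
tool_versions() {
    tools_dir="$1"
    for tool in "${tools_dir}"/*.sh; do
        # Skip the unexpanded pattern when the directory has no scripts.
        [ -f "${tool}" ] || continue
        bash "${tool}" --version
    done
}

# Typical use, with REPO_HOME pointing at the repository home directory:
# tool_versions "${REPO_HOME}/etc"
```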
Installation
Platform Requirements
- This application requires Sun's JRE version 1.6, e.g. "Java HotSpot(TM) SE Runtime Environment".
- The repository is a pure Java webapp and ought to run on any Java Servlet container conforming to the 2.5 version of the specification. It has only been thoroughly tested on Apache Tomcat 6.0 and Apache Tomcat 7.0, however.
- The supporting utility scripts and tools require a Unix environment such as MacOS or Linux. MS Windows is NOT supported.
- Aside from the Java Servlet environment, the webapp requires a separate "home" directory, located outside of the servlet container hierarchy, to which the container's JVM has read/write access.
...
Install and Configure Repository
Step 1. Get Repository Distribution
The repository is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:
No Format |
---|
org.eagle-i:eagle-i-repository-dist
|
Step 2. Establish the Repository Home Directory
...
We will call this directory REPO_HOME and it will appear in commands and scripts below as ${REPO_HOME}.
Create the repository home directory in your file system. It is useful to have a base eagle-i directory to place data and configuration used by other eagle-i applications. For example,
No Format |
---|
mkdir /opt/eaglei
mkdir /opt/eaglei/repo
|
If necessary (i.e., if you created it under your own user-id), change ownership of the directory to the user-id under which Tomcat is running. If you followed the example above, use the -R option to change the ownership of both directories. For example, if the user-id under which Tomcat executes is tomcat,
No Format |
---|
chown -R tomcat /opt/eaglei
|
Initialize it as a variable in your shell environment. In this example (Bourne/bash shell) the repository home directory is /opt/eaglei/repo:
No Format |
---|
REPO_HOME=/opt/eaglei/repo
|
Step 3. Populate the Repository Home Directory from the Distribution
Unpack the distribution Zip archive in a directory under /tmp:
No Format |
---|
cd /tmp
unzip repository-dist.zip
|
Move the contents of the unzipped directory to your repository home directory. In this example the distribution is version 1.1-MS1.00-SNAPSHOT:
No Format |
---|
mv /tmp/repository-1.1-MS1.00-SNAPSHOT/* ${REPO_HOME}/.
|
List the contents of the home directory:
No Format |
---|
cd ${REPO_HOME}
ls
|
It should contain the subdirectories etc/, lib/, and webapps/.
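The listing check can also be scripted, which is handy when the install is repeated on several hosts. This is a minimal sketch, not part of the distribution; check_repo_home is a hypothetical helper name.

```shell
# Hypothetical helper: verify that a repository home directory contains
# the three subdirectories the distribution is expected to provide.
check_repo_home() {
    repo_home="$1"
    missing=0
    for sub in etc lib webapps; do
        if [ ! -d "${repo_home}/${sub}" ]; then
            echo "missing: ${repo_home}/${sub}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

# Typical use:
# check_repo_home "${REPO_HOME}" && echo "repository home looks complete"
```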
Step 4. Locate the Servlet Container (Apache Tomcat)
...
We will call this directory CATALINA_HOME
and it will appear in commands and scripts below as
No Format |
---|
${CATALINA_HOME}
|
Initialize it as a variable in your shell environment. In this example (in Bourne/bash shell) Tomcat's home directory is /opt/tomcat:
No Format |
---|
CATALINA_HOME=/opt/tomcat
|
...
Ensure that your Tomcat server is run with the following options on its JVM. The simplest way to accomplish this is to have the environment variable JAVA_OPTS include those options, but each platform, distro, package etc. of Tomcat has its own mechanism for setting this variable. For example, on Fedora 14, it should be in the file /etc/tomcat6/tomcat6.conf. If you can't find your distribution's configuration file, you may create a file setenv.sh in Tomcat's bin directory to add the environment variable:
No Format |
---|
...(ONLY DO THIS if you can't find your distribution's config file)
cd ${CATALINA_HOME}/bin
touch setenv.sh
|
Edit the configuration file (tomcat6.conf, setenv.sh or whatever your distribution uses) and add the following line:
No Format |
---|
JAVA_OPTS="-XX:PermSize=64M -XX:MaxPermSize=256M -Xmx1024m"
|
Add the following two system properties to the file conf/catalina.properties under the CATALINA_HOME directory -- the same directory where you'll find server.xml. The value for both of these properties is the absolute path of the repository home directory. In this example, it is /opt/eaglei/repo:
No Format |
---|
# example
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home= /opt/eaglei/repo
|
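If you script the installation, it helps to add these two properties idempotently so that re-running the script does not duplicate them. The sketch below assumes simple "key = value" lines; add_repo_props is a hypothetical function name, not a tool shipped with the repository.

```shell
# Append the two system properties to a catalina.properties file unless
# a line for that key is already present.
# Assumption: keys start at the beginning of a line ("key = value").
add_repo_props() {
    props_file="$1"
    repo_home="$2"
    for key in org.eaglei.repository.home derby.system.home; do
        if ! grep -q "^${key}[ =]" "${props_file}" 2>/dev/null; then
            printf '%s = %s\n' "${key}" "${repo_home}" >> "${props_file}"
        fi
    done
}

# Typical use:
# add_repo_props "${CATALINA_HOME}/conf/catalina.properties" "${REPO_HOME}"
```

An existing line with a different value is left untouched, so check the file by hand if the repository home directory has moved.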
...
Look in your Tomcat installation's main lib directory. If there are no files named derby.jar or derby-version.jar, you must install the Derby jars from the "scripts" distribution, e.g.
No Format |
---|
cp ${REPO_HOME}/lib/derby-* ${CATALINA_HOME}/lib/
|
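The "look first, copy only if absent" rule can be expressed as a small guard around that cp command. This is a sketch under the assumption that an existing Derby shows up as derby.jar or derby-*.jar in Tomcat's lib directory; both function names are hypothetical.

```shell
# Succeed if the given lib directory already holds a Derby jar.
has_derby() {
    for jar in "$1"/derby.jar "$1"/derby-*.jar; do
        # An unmatched pattern stays literal and fails this test.
        if [ -e "${jar}" ]; then
            return 0
        fi
    done
    return 1
}

# Copy the bundled Derby jars only when Tomcat has none of its own.
install_derby_if_missing() {
    repo_home="$1"
    catalina_home="$2"
    if has_derby "${catalina_home}/lib"; then
        echo "Derby already present in ${catalina_home}/lib; not copying"
    else
        cp "${repo_home}"/lib/derby-* "${catalina_home}/lib/"
    fi
}

# Typical use:
# install_derby_if_missing "${REPO_HOME}" "${CATALINA_HOME}"
```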
...
Bourne/bash shell version:
No Format |
---|
....(ONLY DO THIS when ALREADY running Apache Derby!)
export DERBY_HOME=my-derby-installation-toplevel
|
C Shell/csh version:
No Format |
---|
....(ONLY DO THIS when ALREADY running Apache Derby!)
setenv DERBY_HOME my-derby-installation-toplevel
|
NOTE: You must use the same version of Derby to create this initial user database as the version installed in Tomcat, so if Tomcat is already running a version of Derby, set DERBY_HOME to use that.
Step 8. Install the Repository
Follow this step-by-step procedure. Before you start, make sure the Tomcat server is not running.
Navigate to Tomcat's webapps directory. If a directory named ROOT exists, move it aside. The eagle-i repository must be the ROOT application.
No Format |
---|
cd ${CATALINA_HOME}/webapps
mv ROOT ROOT.original
|
Copy the repository webapp to the Tomcat webapps directory:
No Format |
---|
cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
|
Create your initial administrative user login. Think of a USERNAME and PASSWORD and substitute them for the upper case words in this command:
No Format |
---|
bash ${REPO_HOME}/etc/prepare-install.sh USERNAME PASSWORD ${REPO_HOME}
|
- Start up Tomcat.
Run the finish-install script, which loads the data model ontology among other things. Note that you can also give it additional options to specify a personal name and email address for the initial admin user.
No Format |
---|
bash ${REPO_HOME}/etc/finish-install.sh USERNAME PASSWORD https://localhost:8443
|
...or, with username metadata included:
No Format |
---|
bash ${REPO_HOME}/etc/finish-install.sh \
  -f firstname \
  -l lastname \
  -m admin@ei.edu \
  USERNAME PASSWORD https://localhost:8443
|
Run the upgrade.sh script, which performs additional configurations.
No Format |
---|
bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
|
- Copy the file default.configuration.properties located in ${REPO_HOME} into a file named configuration.properties and edit the latter to reflect your installation. See the #Configuration section below for details on the property definitions and expected values. Restart Tomcat to pick up these configuration changes. Confirm that the eagle-i repository is running by visiting the admin page (login with USERNAME and PASSWORD):
No Format |
---|
https://localhost:8443/repository/admin
|
This is the procedure to upgrade an existing repository instance to a new release of the software. All existing configurations, data, and user accounts are preserved. However, if the upgrade includes ontology changes there will also be an extra procedure to transform the existing data to reconcile it with ontology changes.
Before Upgrading
Get the Repository Distribution
The repository release is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:
No Format |
---|
org.eagle-i:eagle-i-repository-dist
|
...
It would be a wise precaution to make a backup of the current repository state so you can roll back to it in case of fatal problems with the upgrade. Follow the Backup Procedure in the #Procedures section to get a snapshot of the current repository contents.
Step By Step Upgrade Procedure
Note that the directory macros ${CATALINA_HOME} and ${REPO_HOME} are used in the examples here; see the Install Procedure above for a description of what they mean.
Unpack the distribution Zip archive in a directory, e.g. under /tmp:
No Format |
---|
cd /tmp
unzip repository-dist.zip
|
- Shut down your Tomcat java servlet container.
Delete the old repo webapp subdirectory and WAR file, since there should not be any local modifications there. For example:
No Format |
---|
rm -rf ${CATALINA_HOME}/webapps/ROOT*
|
Save the current release files in case you have to roll back:
No Format |
---|
cd ${REPO_HOME}
mv etc etc.old
mv lib lib.old
mv webapps webapps.old
|
Copy the distribution into place (in this example the distribution is version 1.7-MS1.01) -- note there are 2 steps:
No Format |
---|
cp -f -rp /tmp/repository-1.7-MS1.01/* ${REPO_HOME}
cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
|
- Start up your tomcat java servlet container.
Run the upgrade script, substituting your admin's username and password:
No Format |
---|
bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
|
Watch the output of upgrade.sh very carefully! Pay particular attention to the final status and any messages beginning "WARN"; they indicate problems you MUST resolve.
- Confirm that it worked: visit the repo admin page, check for the new version, and then follow the link to Show Data Model Ontology versions to confirm that the "loaded" and "available" versions of the ontology are the same.
When running the upgrade script, there may be messages about out-of-date NG_Internal and NG_Query graphs. Most likely, these are nothing to worry about -- check the release notes. These graphs are only initialized from static files when the repository is created, and afterward they accumulate statements, so reloading a new copy of the original data is not practical. Some releases may include instructions for making changes in these graphs when upgrading from previous versions.
Download the data migration toolkit that corresponds to your repository version (in this example, version 1.7-MS1.02) and run the data migration script, substituting your admin's username and password:
No Format |
---|
wget -O ${REPO_HOME}/etc/eagle-i-datatools-datamanagement.jar \
http://infra.search.eagle-i.net:8081/nexus/content/repositories/\
releases/org/eagle-i/eagle-i-datatools-datamanagement/1.7-MS1.02/\
eagle-i-datatools-datamanagement-1.7-MS1.02.jar
bash ${REPO_HOME}/etc/data-migration.sh -u USERNAME -p PASSWORD -r https://localhost:8443
|
Watch the output of data-migration.sh very carefully! Pay particular attention to the final status and any messages beginning "WARN"; they indicate problems you MUST resolve. In addition to the output on screen, the data-migration script will place a data migration report in the logs directory directly under /etc.
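Since both upgrade.sh and data-migration.sh flag problems with messages beginning "WARN", it can help to save their output and scan it rather than rely on watching the terminal. The following is a sketch only: report_warnings is a hypothetical helper, and it assumes the warnings start at the beginning of a line.

```shell
# Print every line that begins with "WARN" (with its line number) and
# fail if at least one is found.
# Assumption: warnings start at the beginning of a line; adjust the
# pattern if your output carries a timestamp or other prefix.
report_warnings() {
    log_file="$1"
    if grep -n '^WARN' "${log_file}"; then
        echo "warnings found in ${log_file}; resolve them before continuing" >&2
        return 1
    fi
    return 0
}

# Typical use: keep a copy of the output, then scan it.
# bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443 2>&1 | tee upgrade.log
# report_warnings upgrade.log
```

This does not replace reading the final status line, which the scripts print separately.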
Configuration
URIs for Creating New Roles, Transitions, and Workspaces
...
For Workspace (aka Named Graph) URIs, you have to assign them in the process of creating a new Named Graph. Follow the rules below to create a reasonable URI.
Rules of Creating Your Own URIs
...
- http://dartmouse.edu/repo/Role_LabRat
- http://dartmouse.edu/repo/WFT_13_2
- http://eagle-i.org/ont/repo/1.0/DARTMOUSE_ROLE_PI
- http://eagle-i.org/ont/repo/1.0/DARTMOUSE_WFT_TRASH
Exception: The URI of a named graph representing an ontology is usually the same as the URI of the ontology itself, i.e. the subject of its owl:versionInfo statement. If you should happen to add a new ontology named graph to the repository, use that URI for its name. However, this should be a very rare occurrence; usually new ontological information is simply added to the existing eagle-i data model ontology graph.
Managing Access Controls on Contact & "Hidden" Properties
...
Once you have set up a single repository to your liking, you can export and re-import the grants to other repositories. See the Procedure: Exporting and Importing Property Access Controls section below.
Configuration Reference
This section lists everything that can be configured, so you can get familiar with it before installing anything.
System Properties
The repository requires these system properties to be defined in the JVM environment running your servlet container:
...
If you are using the Apache Tomcat version 6 container (which is recommended), you can add these system properties to the file conf/catalina.properties. Add lines like these (note that the path /opt/eaglei/repo is just shown as an example):
No Format |
---|
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home= /opt/eaglei/repo
|
...
configuration.properties
- Java properties file with repository and log4j configuration props. This is optional; it must be created by the administrator.
logs/
- Default subdirectory for log files, see configuration. Created automatically by default.
sesame/
- Default Sesame RDF database files - DO NOT TOUCH. Created automatically by default.
etc/
- Contains scripts and tools for the repo administrator.
db/
- Default subdirectory for Derby RDBMS files - DO NOT TOUCH. Created automatically by default.
The Repository Configuration Properties File
...
eaglei.repository.namespace
- The namespace URI prefix for Eagle-I resource instances created in the repository.
- Every administrator should set this to a reasonable value for his/her site, because the default is NOT desirable.
- The value must be a fully qualified, resolvable, HTTP URL.
- For example, http://foo.bar.edu/i/
- Use the http scheme, NOT https, since the container will redirect to https if necessary, but it is not possible to redirect back if it becomes preferable to use http later.
- The system-generated default is the hostname followed by /i/ -- but this is often wrong, since Java's determination of hostnames in a servlet container environment is not reliable.
eaglei.repository.title
- The decorative title for UI pages; should be set for cosmetic reasons.
- Set this to the name of your site, e.g. "Miskatonic University School of Medicine".
eaglei.repository.logo
- URL of the logo image for your site; may be either a relative URL (to refer to an image embedded in the webapp) or an absolute URL to use an image hosted elsewhere. It should be about 50 pixels high and a suitable width given the proportions.
eaglei.repository.index.url
- Set this to the URL to which you want the site's "root" (top-level index) page redirected. Although the repository is installed as the root webapp to have control over resolving Semantic Web URIs, it does not need the root page, so this allows you to configure your site as you like.
eaglei.repository.admin.backgroundColor
- Lets you change the background color for admin web UI pages, to give admins an obvious cue when they are operating on e.g. the production vs. test repos. Value is a CSS color expression, e.g. crayon name like "bisque" or hex #CCFFCC. (Added in Release 1.2MS2 or 3)
eaglei.repository.instance.xslt
- Path to XSL stylesheet used to transform the HTML output of the instance dissemination service. A value for this key is required to produce XHTML in the dissemination service; without it, the service returns the internal XML document describing the instance.
- If it is a relative path then it must be located relative to the root of the web application; if absolute then it is in the filesystem at large.
- The advantage of keeping your stylesheets external to the webapp is that you can change them easily, and don't have to modify the webapp from its default installation.
- An example is provided at repository/styles/example.xsl which creates very simple HTML, as a demonstration of how to write an XSL stylesheet.
eaglei.repository.instance.css
- URI of the CSS stylesheet resource to be used to style instance dissemination pages. It must be an absolute path or absolute URL. The default is:
No Format |
---|
eaglei.repository.instance.css = /repository/styles/i.css
|
eaglei.repository.tbox.graphs
- A comma-separated list of graph URIs making up the "TBox".
You should never have to set this! It is configurable "just in case", and for testing/experimenting. For more information, see the section on inferencing in the API Manual.
By default, the TBox consists of:
- The repository's internal ontology, http://eagle-i.org/ont/repo/1.0/
- The eagle-i data model ontology, http://purl.obolibrary.org/obo/ero.owl
eaglei.repository.datamodel.source
- The full name of a resource within the webapp which is itself a property file describing the RDF data model ontology. You should not need to set this; the default is adequate for the eagle-i application. Default is eaglei-datamodel.properties, which is a built-in resource file.
For a description of the contents of this properties file, see the separate document Guide to Data Model Configuration Properties.
eaglei.repository.sesame.dir
- Directory where Sesame RDF database files are created.
- Defaults to the sesame subdirectory of the home dir.
eaglei.repository.log.dir
- Directory where log files are created.
- Defaults to the logs subdirectory of the home dir.
- You can also configure log4j explicitly by adding log4j properties to this file.
eaglei.repository.sesame.indexes
- Index configuration for the Sesame triple store. Must be a comma-separated list of index specifiers; see the Sesame NativeStore configuration documentation for details. Use this to change the internal indexes Sesame maintains to process queries. It takes effect on the next servlet container (Tomcat) restart.
- WARNING: If you have a configured value and wish to go back to the default, do NOT just delete this configuration property. If you do, Sesame will simply keep the existing indexes. You must change it to the original default value, which is documented in the default configuration file.
eaglei.repository.slow.query
- Value in seconds of time after which a SPARQL query should be considered "slow" and logged as such. Only affects the SPARQL Protocol endpoint service. Default is 0, which never logs. Use this to check for performance problems, since it logs the full text of the query and time of occurrence in the regular log at INFO level.
eaglei.repository.sparqlprotocol.max.time
- Time limit, in seconds, of the maximum time allowed for a query invoked by the SPARQL Protocol endpoint. Note that this does not affect any internally-generated SPARQL queries.
- Any user can override this setting to impose a shorter timeout by giving a value for the nonstandard time argument.
- Only the Administrator can override with a longer timeout.
- The built-in default is 600 seconds (10 min) if nothing is configured.
- If a SPARQL Protocol request cannot be completed within the timeout, it returns an HTTP 413 status (result too large - it was the standard response code that comes closest to the concept).
eaglei.repository.anonymous.user
- This is a hack, only intended for testing the Anonymous role. Its value is a username, e.g. "nobody". If configured, when the designated user logs in, their session is downgraded to the Anonymous role; this allows explicit testing of Anonymous (vs. Authenticated) access even when the webapp configuration does not allow unauthenticated access. ONLY TESTERS SHOULD EVER NEED TO SET THIS.
Configuring Contact Hiding: The following properties control the contact hiding extension, which restricts the display of "contact location" properties of instances and instead offers an anonymous email option. Red properties are required only if you enable contact hiding:
eaglei.repository.hideContacts
- true|false, enables the contact hiding function. When it is false, none of the other properties are used.
eaglei.repository.postmaster
- Email address of repository administrator(s). User-generated messages about resources without a contact email address get sent here, as well as diagnostic messages. We recommend using an email list or alias so it can be changed or directed to multiple people.
eaglei.repository.mail.host
- Hostname of SMTP server for outgoing mail, defaults to localhost.
eaglei.repository.mail.port
- TCP port number of SMTP server for outgoing mail, only necessary if using a non-default port for your chosen type of service.
eaglei.repository.mail.ssl
- Use SSL for connection to SMTP server for outgoing mail, value is true or false.
eaglei.repository.mail.username
- Username with which to authenticate to SMTP server for outgoing mail, default is unauthenticated.
eaglei.repository.mail.password
- Password with which to authenticate to SMTP server for outgoing mail, default is none.
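Of the properties in this list, eaglei.repository.namespace is the one whose default is most often wrong, so a pre-flight check before the first start can save trouble. The sketch below assumes the property is written as a simple "key = value" line; check_namespace is a hypothetical helper, and it only tests the scheme rule stated earlier (http, not https), not whether the URL actually resolves.

```shell
# Read eaglei.repository.namespace from a properties file and check it
# against the scheme rule: an absolute http URL, not https.
# Assumption: simple "key = value" lines; the last definition wins.
check_namespace() {
    props_file="$1"
    ns=$(sed -n 's/^eaglei\.repository\.namespace[[:space:]]*=[[:space:]]*//p' "${props_file}" | tail -n 1)
    case "${ns}" in
        "")        echo "namespace is not set; the default is often wrong" >&2; return 1 ;;
        https://*) echo "use the http scheme, not https: ${ns}" >&2; return 1 ;;
        http://?*) return 0 ;;
        *)         echo "not an absolute http URL: ${ns}" >&2; return 1 ;;
    esac
}

# Typical use:
# check_namespace "${REPO_HOME}/configuration.properties"
```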
...
Note that the properties file may also contain Log4J configuration properties. For example you can turn on debugging log output by adding this line:
No Format |
---|
log4j.logger.org.eaglei.repository=DEBUG, repository
|
...
The default log4j configuration sets up an appender named repository with buffered I/O for efficiency. Note that this means log messages will not appear in the log file immediately, but only after the logging volume fills a buffer. This is useless for interactive debugging through the logs. If you are doing interactive debugging and want to see more log detail, along with immediate results, you should add the properties:
No Format |
---|
log4j.logger.org.eaglei.repository=DEBUG, repository
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
|
Also note that the default configuration turns off additivity in the repo root Logger; this means its log events do not propagate up to e.g. the root logger. If you wish to turn it back on, add this to your configuration:
No Format |
---|
log4j.additivity.org.eaglei.repository=true
|
Here are all of the default log4j configuration properties:
No Format |
---|
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.appender.repository=org.apache.log4j.RollingFileAppender
log4j.appender.repository.File=${eaglei.repository.log.dir}/repository.log
log4j.appender.repository.ImmediateFlush=false
log4j.appender.repository.BufferedIO=true
log4j.appender.repository.Append=true
log4j.appender.repository.Encoding=UTF-8
log4j.appender.repository.layout=org.apache.log4j.PatternLayout
log4j.appender.repository.layout.ConversionPattern=%d{ISO8601} %p %c - %m%n
|
IMPORTANT NOTE: If you add logger configurations to tweak the level of a subset of the repo log hierarchy, you must add an additivity configuration to prevent log4j from applying the ancestor logger as well, which would result in double log entries. For example, this fragment shows a default log level of INFO but adds DEBUG logging of RepositoryServlet to get elapsed time messages:
No Format |
---|
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
|
Monitoring and Troubleshooting
Version Information
It's often helpful to know exactly what version of the repository you're dealing with, especially in a hectic development and/or testing environment when many versions are available. The release version appears in these places:
Dissemination HTML pages: the head element contains a meta tag with the name eaglei.version, e.g.
No Format |
---|
<meta name="eaglei.version" content="1.1-MS5.00-SNAPSHOT" />
|
- The repository admin home page /repository/admin lists application version info in a human-readable format.
- The page /repository/version gives a complete breakdown of component versions, including repo source and the version of the OpenRDF Sesame database. It is XHTML, and it includes meta tags to be easy to scrape or transform.
Log Files
Since the repository is mainly accessed by the REST service API it provides to other applications, you should get used to monitoring it by watching the log file. This is a text file (UTF-8 encoding) maintained by the log4j library under the control of the repository's configuration properties. See the description of the log.dir property above to learn the directory where logfiles are created; they are automatically rotated when the logfile grows too large.
...
To troubleshoot problems with the logging system itself (e.g. log4j config that isn't working as expected), look for where your Java Servlet container writes the standard output stream. For Tomcat 6, this is typically the catalina.out file in some log directory.
Performance Monitoring
As of release 1.1MS5 the repo can log the elapsed time (in milliseconds) for each service request. You must enable DEBUG level logging for the RepositoryServlet, as in this configuration example.
No Format |
---|
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
|
As of release 1.2MS3 the repo will also show the time spent on internal SPARQL queries, which can be useful when tuning Sesame indexes. Add these log4j configuration lines to see just the query log messages:
No Format |
---|
log4j.logger.org.eaglei.repository.util.SPARQL = DEBUG, repository
log4j.additivity.org.eaglei.repository.util.SPARQL = false
|
Then, you'll see log entries like this which you can correlate to requests from your application:
No Format |
---|
...service invocation examples:
2011-01-27 14:28:06,483 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet -
============== Ending Request /repository/update (2,159 mSec elapsed)
2011-01-27 14:27:58,023 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet -
============== Ending Request /repository/workflow/push (1,763 mSec elapsed)
... (internal query example:)
2011-04-15 14:13:28,383 T=http-8443-1 DEBUG org.eaglei.repository.util.SPARQL -
SPARQL Query executed by
org.eaglei.repository.model.User:findAll at line 227 in elapsed time (mSec) 15
|
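Once these entries are in the log, a short filter can pull out the requests that took longer than a chosen threshold. This is a sketch that assumes the "Ending Request ... (N mSec elapsed)" wording shown in the sample above; slow_requests is a hypothetical helper name.

```shell
# List requests whose elapsed time is at least threshold_ms, printing
# "<milliseconds> <request path>" for each.
# Assumption: lines look like the sample above, e.g.
#   ============== Ending Request /repository/update (2,159 mSec elapsed)
slow_requests() {
    threshold_ms="$1"
    log_file="$2"
    sed -n 's/.*Ending Request \([^ ]*\) (\([0-9,]*\) mSec elapsed).*/\2 \1/p' "${log_file}" |
        awk -v min="${threshold_ms}" '{
            gsub(/,/, "", $1)            # drop thousands separators
            if ($1 + 0 >= min + 0) print $1, $2
        }'
}

# Typical use (the log file name comes from the default log4j settings):
# slow_requests 2000 "${REPO_HOME}/logs/repository.log"
```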
...
See the eaglei.repository.slow.query configuration property for more details. Note that this only applies to queries made through the SPARQL Protocol endpoint, not the SPARQL queries generated internally by the repo code.
Tuning
The performance of Sesame's NativeStore implementation is extremely sensitive to its index configuration. There is a major benefit to configuring indexes that help resolve triple patterns used by the most frequent and/or voluminous SPARQL queries. A knowledgeable repository administrator should adjust the setting of the eaglei.repository.sesame.indexes property to get the NativeStore to build the most necessary indexes. See the description of that property for more details.
Administrator Tools
make-snapshot.sh Script
The make-snapshot script creates a complete backup copy of a data repository, in a designated directory. It has to be given a directory because the backup consists of multiple files. It is packaged with the repository distribution, under the etc/ directory.
...
NO MESSAGE is printed upon success, which lets it run under cron.
Usage
Synopsis:
No Format |
---|
make-snapshot.sh username password repo-URL directory
|
...
- username - username with which to authenticate to the repo
- password - password with which to authenticate to the repo
- repo-URL - prefix of repository URL, e.g. "https://localhost/"
- directory - directory in which to write the dump, will be created if necessary
Restoring Dumps made by make-snapshot
Given a dump created in e.g. ${DUMPDIR}, to restore this dump on a newly-created, empty, repository, use these commands: (where ${REPOSITORY} is URL prefix of the repo)
No Format |
---|
curl -D - -s -S -u ADMIN:PASSWORD -F type=user -F format=application/x-trig \
-F content=@${DUMPDIR}/users.trig -F duplicate=replace \
-F transform=no ${REPOSITORY}/repository/import
|
No Format |
---|
curl -s -S -D - -u ADMIN:PASSWORD -F action=replace -F all= \
-F "content=@${DUMPDIR}/resources.trig;type=application/x-trig" \
${REPOSITORY}/repository/graph
|
...
in a differently-named directory each day, rotating through a week:
No Format |
---|
make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "daily_cron_`date +%u`"
|
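For a cron job it is convenient to wrap that command so the target directory is computed in one place. The following is a sketch only: nightly_snapshot and the MAKE_SNAPSHOT override variable are hypothetical, and the wrapper assumes make-snapshot.sh lives under ${REPO_HOME}/etc as described above.

```shell
# Hypothetical cron wrapper: write tonight's snapshot into a directory
# named after the weekday number (1-7), so a week of backups rotates.
# MAKE_SNAPSHOT may be set to override the script location.
nightly_snapshot() {
    user="$1"
    pass="$2"
    url="$3"
    base_dir="$4"
    target="${base_dir}/daily_cron_$(date +%u)"
    bash "${MAKE_SNAPSHOT:-${REPO_HOME}/etc/make-snapshot.sh}" \
        "${user}" "${pass}" "${url}" "${target}"
}

# Typical use from a script run by cron:
# nightly_snapshot ADMIN PASSWORD https://localhost:8443 /var/backups/eaglei
```

The backup path in the usage line is an example; because make-snapshot.sh prints nothing on success, cron will only send mail when something goes wrong.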
...
Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. It does this by substituting the target's default prefix into all URIs that used to resolve at the source repository.
This Is Inherently Not A Good Idea
...
Given all of these limitations, move-everything can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work, and only use it when the results are temporary and will be discarded.
Restoring from Backups
There is one other legitimate use of move-everything: restoring a backup copy made with make-snapshot. In this case you don't really have to transform the URIs, and the whole intent is to re-create the original state of the repo so the side effects are all desired.
Using the Script
The resource copying script is installed under etc/ in the repository home directory. Its name is move-everything.sh. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires bash, perl 5, and the curl executable.
The synopsis for copying from repository to repository:
No Format |
---|
Usage: move-everything.sh [--version|--version] [ -f | --force ]
[-exclude-users user,user,..|-exclude-users user,user,..] [-nousers]
from-username from-password from-repo-URL
to-username to-password to-repo-URL
|
The synopsis for copying from file to repository:
No Format |
---|
Usage: move-everything.sh [--version|--version] [ -f | --force ]
[-exclude-users user,user,..|-exclude-users user,user,..] [-nousers]
--from-snapshot directory --from-prefix from-prefix
to-username to-password to-repo-URL
|
...
The --from-snapshot and --from-prefix options must be specified together. They select the input data from a directory of serialized files, in the same format as produced by the make-snapshot script. The value of --from-snapshot is the path to the directory containing the RDF serialization files. The value of --from-prefix is the exact and complete URI prefix (including the trailing '/') of the repo that generated the dump in the directory. This is necessary because the script does not have access to that repository to query it for its prefix.
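A missing trailing '/' on the prefix is an easy mistake to make, so a wrapper script may want to validate the two values before calling move-everything.sh. This is a minimal sketch; check_snapshot_args is a hypothetical helper and is not part of the distribution.

```shell
# Validate the values intended for --from-snapshot and --from-prefix:
# the snapshot must be an existing directory and the prefix must end
# with '/'.
check_snapshot_args() {
    snapshot_dir="$1"
    from_prefix="$2"
    if [ ! -d "${snapshot_dir}" ]; then
        echo "snapshot directory not found: ${snapshot_dir}" >&2
        return 1
    fi
    case "${from_prefix}" in
        */) return 0 ;;
        *)  echo "prefix must include the trailing '/': ${from_prefix}" >&2
            return 1 ;;
    esac
}

# Typical use:
# check_snapshot_args harvard.monday http://harvard.eagle-i.net/i/ || exit 1
```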
Fixed Arguments
The fixed command arguments are either one or two triplets of repository access information, i.e. the username, password, and URL of each repo.
...
Here is an example that copies from the production Harvard repo to a local one:
No Format |
---|
move-everything.sh bigbird PASSWORD https://harvard.eagle-i.net \
bigbird PASSWORD https://localhost:8443
|
Here is an example that copies a snapshot of the production Harvard repo to a local one:
No Format |
---|
make-snapshot bigbird PASSWORD https://harvard.eagle-i.net \
harvard.monday
move-everything.sh -f \
--from-snapshot harvard.monday \
--from-prefix http://harvard.eagle-i.net/i/ \
bigbird PASSWORD https://localhost:8443
|
...
We strongly recommend you avoid using the Superuser (administrator) login on the source repository, to prevent accidentally obliterating it by getting the argument order wrong. Use an account that has read access to every graph (e.g. the Admin-Read-Only role). This restricts you to using the --nousers version of the command, but in most cases that is adequate. See the Procedures section for recommendations on how to maintain copies of repositories this way.
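As an extra safeguard against swapped arguments, you could check the destination URL before running the script. The following is only a sketch: the function safe_destination and the PROTECTED_HOSTS list are hypothetical, are not part of move-everything.sh, and would have to be adapted to your own sites.

```shell
# Hypothetical guard, not part of move-everything.sh.
# PROTECTED_HOSTS is an assumed, site-specific list of hostnames that
# must never be used as a copy destination.
PROTECTED_HOSTS="harvard.eagle-i.net"

# Succeeds if the host part of the URL in $1 is not in PROTECTED_HOSTS.
safe_destination() {
    # Strip the scheme, then everything from the first ':' or '/' on.
    host=${1#*://}
    host=${host%%[:/]*}
    for p in $PROTECTED_HOSTS; do
        if [ "$host" = "$p" ]; then
            echo "refusing to overwrite protected repository: $host" >&2
            return 1
        fi
    done
    return 0
}
```

A wrapper could then run `safe_destination "$TO_URL" && move-everything.sh ...` so that the copy never starts when the destination is a protected host.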
move-resources.sh - Copying Only Resource Instances
...
Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. The hostname portion of the URI matches the new repository server, and even the local name is allocated by the destination repository -- so there is no predictable way to relate new URIs to the old ones.
This Is Inherently Not A Good Idea
...
Given all of these limitations, the resource-mover script can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work.
Using the Script
The resource copying script is installed under etc/ in the repository home directory. Its name is move-resources. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires perl 5 and the curl executable.
Run it with -h to get the synopsis:
No Format |
---|
Usage: move-resources [-verbose] [-replace]
[--type published|workspace]{ --file source-file --prefix uri-prefix | --source source-repo-url
--user login:password --graph src-graph-URI }
dest-repo-url dest-login:dest-password dest-graph-URI
(options may be abbreviated to first letter, e.g. -f)
|
...
Here is an example command. It copies from the Published graph on qa.harvard to an "Experimental" graph on the local repo (on https://localhost:8443):
No Format |
---|
move-resources -s https://qa.harvard.eagle-i.net:8443 -u bert:ernie \
-g http://eagle-i.org/ont/repo/1.0/NG_Published https://localhost:8443 \
root:password http://eagle-i.org/ont/repo/1.0/NG_Experimental
Moved 4694 data statements and 322 metadata statements.
|
Procedures
Procedure: Upgrading Packaged Tomcat
...
- Shut down tomcat. This is major surgery, and tomcats don't like to be vivisected no matter how much more satisfying you may find it.
- Disable Java Security -- alternately, you could try to configure all the authorization grants to give the repository webapp access to the filesystem and property resources it needs, but I found it much easier to just disable java security. DO NOT RUN THE TOMCAT PROCESS AS ROOT if you do this, but you should not be running it as root in any case. That's just insane.
Edit the file /etc/init.d/tomcat6 and change the following variable to look like this:
No Format |
---|
TOMCAT6_SECURITY=no
|
- Install Derby jars: ONLY IF DERBY IS NOT ALREADY INSTALLED IN THE COMMON AREA OF YOUR TOMCAT. If another webapp is already using Derby, they should share that version.
- Find the Derby jars in the lib/ subdirectory under where you installed the create-user.sh script. Copy them to the Tomcat common library directory:
No Format |
---|
cp ${REPO-ZIP-DIR}/lib/derby* /usr/share/tomcat6/lib/
|
- Install the webapp: First, get rid of any existing root webapp (both the old ROOT.war and the expanded ROOT directory), then copy in the webapp (ROOT.war file from your installation kit) and be sure it is readable by the tomcat6 user:
No Format |
---|
rm -rf /var/lib/tomcat6/webapps/ROOT*
cp ROOT.war /var/lib/tomcat6/webapps/ROOT.war
|
- Install cached webapp context: This is VERY IMPORTANT, and the Tomcat docs do not even mention it, but without it your server will be mysteriously broken. The file /etc/tomcat6/Catalina/localhost/ROOT.xml must be a copy of your app's context.xml. Redo these commands after installing every new ROOT.war:
No Format |
---|
mkdir -p /etc/tomcat6/Catalina/localhost
unzip -p /var/lib/tomcat6/webapps/ROOT.war META-INF/context.xml > /etc/tomcat6/Catalina/localhost/ROOT.xml
|
- Add System Properties: Be sure you have added system properties to the file /etc/tomcat6/catalina.properties, e.g.
No Format |
---|
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo
|
...of course, the value of these properties will be your Repository Home Directory path.
- Start up Tomcat:
No Format |
---|
sudo /etc/init.d/tomcat6 start
|
- Troubleshooting: If there are problems, check the following places for logs (because packaged apps make everything so much easier):
- /var/log/daemon.log - really dire tomcat problems and stdout/stderr go to syslog
- /var/log/tomcat6/* - normal catalina logging
- ${REPOSITORY_HOME}/logs/repository.log - default repo log file in release 1.1; under 1.0 the filename was default.log.
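Before starting Tomcat, it can save a troubleshooting round to confirm that the system properties really are in place. The sketch below is not part of the distribution. The function names prop_value and check_repo_props are hypothetical, and it assumes, as in the example above, that both properties point at the same Repository Home Directory.

```shell
# Hypothetical helpers, not part of the distribution.

# Print the value of property $1 in properties file $2 (last one wins).
# Dots in the name act as regex wildcards, which is good enough here.
prop_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" |
        sed 's/[[:space:]]*$//' | tail -n 1
}

# Succeed only if both properties are set to the same existing directory.
# Usage: check_repo_props /etc/tomcat6/catalina.properties
check_repo_props() {
    repo_home=$(prop_value org.eaglei.repository.home "$1")
    derby_home=$(prop_value derby.system.home "$1")
    [ -n "$repo_home" ] &&
        [ "$repo_home" = "$derby_home" ] &&
        [ -d "$repo_home" ]
}
```

A non-zero exit status means one of the properties is missing, the two values differ, or the directory does not exist.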
...
- have been tested under Ubuntu Linux 9.10 (Karmic Koala), Fedora 12 and 14, and CentOS 6.03
- assume you are running Tomcat on port 8080. To redirect the HTTPS (HTTP on SSL) port, also run the 3 additional iptables commands (assuming port 443) below.
- require root privileges
- assume the Bourne shell (/bin/sh)
To check what rules are running:
No Format |
---|
iptables -t nat -n -L
|
Discover your machine's primary IP address and set the ADDR shell variable. (Note that this assumes eth0 is your primary network interface; use ifconfig -a to see them all.)
No Format |
---|
ADDR=`ifconfig eth0 | perl -ne 'print "$1\n" if m/\sinet addr\:(\d+\.\d+\.\d+\.\d+)\s/;'`
|
Run these iptables commands to redirect all port 80 requests to port 8080.
No Format |
---|
iptables -t nat -A OUTPUT -d localhost -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
|
(If using SSL) Run these iptables commands to redirect all port 443 requests to port 8443.
No Format |
---|
iptables -t nat -A OUTPUT -d localhost -p tcp --dport 443 -j REDIRECT --to-ports 8443
iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
|
- Check that your new rules are running (use the command above)
- Additional configuration
- Ubuntu
Save the rules in the canonical place to be reloaded on boot:
No Format |
---|
iptables-save > /etc/iptables.rules
|
Create a script to be run by the network startup infrastructure that will reload the iptables rules whenever the network is brought up:
No Format |
---|
cat << EOF > /etc/network/if-pre-up.d/iptablesload
#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0
EOF
|
- Fedora
- Save the rules to be reloaded on boot:
The cleaner/preferable method, but apparently not working:
No Format |
---|
/sbin/iptables-save
|
- Hacky, but works: manually edit /etc/sysconfig/iptables
Update the startup settings so iptables will run upon reboot:
No Format |
---|
chkconfig --level 35 iptables on
|
- Test by accessing your server both locally and remotely by the port-80 URL. Then reboot the machine and try it again to be sure the iptables commands are run correctly on boot.
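The port 80 and port 443 rule sets above differ only in the port numbers, so it can help to generate them and review the result before running anything as root. This is a sketch only: redirect_rules is a hypothetical helper, not part of the distribution, and it just prints the commands rather than executing them.

```shell
# Hypothetical helper, not part of the distribution.
# Prints (does not run) the three NAT redirect rules for one port pair.
# Usage: redirect_rules ADDR FROM_PORT TO_PORT
redirect_rules() {
    addr=$1
    from=$2
    to=$3
    echo "iptables -t nat -A OUTPUT -d localhost -p tcp --dport $from -j REDIRECT --to-ports $to"
    echo "iptables -t nat -A OUTPUT -d $addr -p tcp --dport $from -j REDIRECT --to-ports $to"
    echo "iptables -t nat -A PREROUTING -d $addr -p tcp --dport $from -j REDIRECT --to-ports $to"
}
```

Run `redirect_rules "$ADDR" 80 8080` to inspect the commands. Once they look right, pipe the same output to `sh` from a root shell to apply them.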
Procedure: Dump and Restore the RDF Resource Data
...
This is a complex manual procedure with many options -- for a simpler semi-automated backup snapshot procedure, see the section on using the make-snapshot script.
Make Backup Dump (obsolete - see make-snapshot)
Typical command to make a backup in TriG format to a file, e.g. all-dump.trig, from a server running locally on port 8443. In practice, you'll probably need to change the output filename, the username:password login credentials, and the hostname in the target URL if not running locally.
No Format |
---|
curl -G -X GET -s -S -u username:password -o all-dump.trig -d all \
--write-out 'status=%{http_code}, %{time_total}sec\n' \
-d format=application/x-trig https://localhost:8443/repository/graph
|
Be sure the output shows a successful status code (namely 200), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.
No Format |
---|
status=200, 13.283sec
|
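If you run the dump from a script, you can test the status line instead of reading it by eye. A minimal sketch, assuming the --write-out format used above. The helper expect_status is hypothetical and not part of the distribution.

```shell
# Hypothetical helper, not part of the distribution.
# Reads curl's --write-out line (e.g. "status=200, 13.283sec") on stdin
# and succeeds only if the HTTP status equals the expected code in $1.
expect_status() {
    line=$(cat)
    code=${line#status=}   # drop the leading "status="
    code=${code%%,*}       # drop ", <time>sec" if present
    if [ "$code" = "$1" ]; then
        return 0
    fi
    echo "unexpected HTTP status: $line" >&2
    return 1
}
```

Append `| expect_status 200` to the curl command. Since the dump itself goes to the -o file, only the status line reaches the pipe.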
Restore Repository from Backup
...
Warning |
---|
This replaces the entire contents of the repository!
|
No Format |
---|
curl -s -S -u username:password -F action=replace -F all= \
--write-out 'status=%{http_code}, %{time_total}sec\n' \
-F 'content=@all-dump.trig;type=application/x-trig' https://localhost:8443/repository/graph
|
Be sure the output shows a successful status code (namely 201, since it created graphs), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.
No Format |
---|
status=201, 13.283sec
|
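Because the restore wipes out whatever is in the repository, it is worth checking the dump file before sending it. The sketch below only confirms that the file is readable and not empty. It does not validate the TriG content, and the name dump_is_usable is hypothetical.

```shell
# Hypothetical pre-flight check, not part of the distribution.
# Succeeds if the dump file in $1 is readable and not empty.
dump_is_usable() {
    if [ -r "$1" ] && [ -s "$1" ]; then
        return 0
    fi
    echo "dump file missing or empty: $1" >&2
    return 1
}
```

For example, `dump_is_usable all-dump.trig && curl ...` keeps the replace request from running against a dump that a failed backup left empty.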
Procedure: Saving and Restoring User Accounts
As of the MS6 release, you can use the new Export/Import service to create user accounts automatically (e.g. on a newly-created repository). This is NOT the same thing as true backup and restore; rather, it is intended more for setting up a test environment. The export and import services are very complex and powerful. This only gives one small example of what they can do. For all the details, see their entry in the API Manual.
Step 0. Create Prototype Accounts and Export Them
...
Now run a command like this to export the accounts into the file all-users.trig:
No Format |
---|
curl -s -S -u username:password -G -d type=user -d format=application/x-trig \
--write-out 'status=%{http_code}\n' \
-o all-users.trig https://hostname:8443/repository/export
|
Note that you have to change the hostname and possibly the login. If there are accounts you do not want in the export, add an exclude argument to filter them out, with a space-separated list, e.g.
No Format |
---|
.... -d 'exclude=frankenstein moreau lizardo' ....
|
...
You can start with a newly-created repository which needs to have user accounts added. It only has the initial administrator login, e.g. bigbird. Use the import service to add users from the file you created in step 0. The following command adds all of the accounts except bigbird (since it already exists), and aborts without changing anything if there are already duplicates of any of the users on the destination repo. It will print "status=200" on success.
No Format |
---|
curl -s -S -u username:password -F type=user -F format=application/x-trig \
-F transform=yes --write-out 'status=%{http_code}\n' \
-F exclude=bigbird \
-F content=@all-users.trig https://hostname:8443/repository/import
|
Note that the transform=yes argument means import will translate the instance URIs of the new users to newly-created URIs in the repository's default namespace. This is usually what you want. If you are positively restoring users already in the correct namespace and you want to preserve the old URIs, substitute transform=no.
Step 2. Testing Users
The easiest way to test the existence and details of a user is with the /whoami service. It does not show roles, however; you'll have to go to the repository administrative UI for that (or take it on faith). For example, after restoring users including curator, this is how you'd check that curator exists:
No Format |
---|
curl -s -S -u curator:password -G -d format=text/plain https://hostname:8443/repository/whoami
|
It's probably only necessary to test one user like this, and to make sure the output includes a URI, as a check that the whole import succeeded.
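The "output includes a URI" check can be scripted as well. This sketch makes no assumption about the layout of the /whoami response beyond what is said above. It simply looks for anything shaped like an http or https URI, and has_uri is a hypothetical name.

```shell
# Hypothetical helper, not part of the distribution.
# Reads the /whoami response on stdin and succeeds if it contains
# something that looks like an http(s) URI.
has_uri() {
    grep -E -q 'https?://[^[:space:]]+'
}
```

Pipe the curl command above into it, e.g. `curl ... /repository/whoami | has_uri && echo "import looks OK"`.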
Procedure: Exporting and Importing Property Access Controls
...
To export property grants, plug those URIs into the following command (you need to replace the placeholder words shown in capitals):
No Format |
---|
curl -G -k -u ADMIN:PASSWORD -d type=grant -d "include=HIDE,CONTACT" \
-d format=application/x-trig https://localhost:8443/repository/export
|
This writes a record of grants to the standard output. Since the URIs are the same on other repositories running the same data model, you should be able to import them with the following command (the example reads the grants from standard input):
No Format |
---|
curl -k -u ADMIN:PASSWORD -F type=grant \
-F duplicate=abort -F transform=no -F content=@- \
-F format=application/x-trig https://localhost:8443/repository/import
|
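Since the export writes to standard output and the import reads from standard input, the two commands can be joined with a pipe. The wrapper below is only a sketch: copy_grants is a hypothetical name, and it assumes the same login works on both repositories. Note that the pipeline's exit status reflects only the import side, so a failed export can go unnoticed.

```shell
# Hypothetical wrapper, not part of the distribution.
# Pipes the grant export of one repository into the import of another.
# Usage: copy_grants SRC_URL DEST_URL LOGIN INCLUDE
#   LOGIN   - "username:password", assumed valid on both repositories
#   INCLUDE - the include list, in the same form as in the export command
copy_grants() {
    curl -s -S -G -k -u "$3" -d type=grant -d "include=$4" \
        -d format=application/x-trig "$1/repository/export" |
    curl -s -S -k -u "$3" -F type=grant \
        -F duplicate=abort -F transform=no -F content=@- \
        -F format=application/x-trig "$2/repository/import"
}
```

For example: `copy_grants https://source-host:8443 https://localhost:8443 ADMIN:PASSWORD "HIDE,CONTACT"`, with the placeholders replaced as in the commands above.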