The Data Repository is a software component that manages an RDF database and makes it available to other applications through a REST API, and gives end users specific views of the data. It adds role-based access control of varying granularity, transactional editing, custom treatment of ontologies and minimal/fast inference, and various administrative functions on top of the RDF database.
This page explains how it works on a host computer system, and how to install and maintain it. This page serves as an application administrator's manual for the development cycle.
The data repository is installed in two intentionally separate places on the host operating system:
Why do we want the home directory separated from the webapp? Mainly, because the webapp is prone to getting replaced completely when a new version is deployed; typically, the servlet container unpacks a new WAR file and replaces all of the old webapp. Any data files stored there would be lost. Since the configuration and data have to persist through many incarnations of the webapp, it is much safer to keep them in a separate place in the filesystem, outside of the entire servlet container hierarchy. Also, this way the webapp needs no modification after it is installed, simplifying the procedure for the system administrator.
Another advantage to the separate location is that it gives the system administrator more flexibility to assign that directory to a location with appropriate capacity, reliability, and performance.
The installed repository includes a set of command-line tools you will use for many of the administrative tasks. They are found in the etc/ subdirectory of the repository home directory. All of them respond to these two options:
--version - display which released version the tool came from
--help - display a synopsis of command args and switches

For example:
bash ${REPO_HOME}/etc/upgrade.sh --version
upgrade.sh from release 1.1-MS4.00 SCM revision 5422
bash
perl
curl
awk (surely anything that calls itself unix must have awk)
tr (seriously, is tr missing? if you are running Gentoo, install an operating system)

Note that only one instance of a Repository webapp may be run on a given home directory. This means that only one JVM and Servlet Container may access that home directory and RDF dataset at any one time. This is a restriction imposed by the Sesame triplestore.
It is not possible to "scale" performance of the repository by sharing the online RDF database among multiple machines or processes. It is possible to make periodic read-only snapshots of a database and serve them from separate machines, so long as you do not allow them to be changed.
The repository is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:
org.eagle-i:eagle-i-repository-dist
You need to determine the repository's home directory. It may be anywhere on the system so long as it satisfies these criteria:
We will call this directory REPO_HOME and it will appear in commands and scripts below as ${REPO_HOME}.
Create the repository home directory in your file system. It is useful to have a base eagle-i directory to place data and configuration used by other eagle-i applications. For example,
mkdir /opt/eaglei
mkdir /opt/eaglei/repo
If necessary (i.e. if you created it using your own user-id), change ownership of the directory to the user-id under which Tomcat is running. If you followed the example above, change the ownership of the two directories using the -R option. For example, if the user-id under which Tomcat executes is tomcat,
chown -R tomcat /opt/eaglei
Initialize it as a variable in your shell environment. In this example (Bourne/bash shell) the repository home directory is /opt/eaglei/repo:

REPO_HOME=/opt/eaglei/repo
Unpack the distribution Zip archive in a directory under /tmp:
cd /tmp
unzip repository-dist.zip
Move the contents of the unzipped directory to your repository home directory. In this example the distribution is version 1.1-MS1.00-SNAPSHOT:
mv /tmp/repository-1.1-MS1.00-SNAPSHOT/* ${REPO_HOME}/.
List the contents of the home directory:
cd ${REPO_HOME}
ls
It should contain the subdirectories etc/, lib/, and webapps/.
Determine the Java Servlet Container's home directory (e.g. Tomcat), which is usually dictated by your host OS. For example, it may be the 'tomcat' user's home directory, ~tomcat.
We will call this directory CATALINA_HOME and it will appear in commands and scripts below as ${CATALINA_HOME}.
Initialize it as a variable in your shell environment. In this example (in Bourne/bash shell) Tomcat's home directory is /opt/tomcat:

CATALINA_HOME=/opt/tomcat
Ensure that your Tomcat server is run with the following options on its JVM. The simplest way to accomplish this is to have the environment variable JAVA_OPTS include those options, but each platform, distro, package etc. of Tomcat has its own mechanism for setting this variable. For example, on Fedora 14, it should be in the file /etc/tomcat6/tomcat6.conf. If you can't find your distribution's configuration file, you may create a file setenv.sh in tomcat's bin directory to add the environment variable:

...(ONLY DO THIS if you can't find your distribution's config file)
cd ${CATALINA_HOME}/bin
touch setenv.sh
Edit the configuration file (tomcat6.conf, setenv.sh, or whatever your distribution uses) and add the following line:

JAVA_OPTS="-XX:PermSize=64M -XX:MaxPermSize=256M -Xmx1024m"
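If you took the setenv.sh route, the whole file can be as small as this sketch (the export is needed so Tomcat's startup scripts see the variable; the memory settings simply mirror the line above):

```shell
# Sketch of a minimal ${CATALINA_HOME}/bin/setenv.sh.
# ONLY create this if your distribution provides no config file of its own.
JAVA_OPTS="-XX:PermSize=64M -XX:MaxPermSize=256M -Xmx1024m"
export JAVA_OPTS
```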
Add the following two system properties to file conf/catalina.properties under the CATALINA_HOME directory -- the same directory where you'll find server.xml. The value for both of these properties is the absolute path of the repository home directory. In this example, it is /opt/eaglei/repo:

# example
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo
Look in your Tomcat installation's main lib directory. If there are no files named derby.jar or derby-version.jar, you must install the Derby jars from the "scripts" distribution, e.g.

cp ${REPO_HOME}/lib/derby-* ${CATALINA_HOME}/lib/
Are you already running applications which use a certain Apache Derby in your servlet container? If so, set the environment variable DERBY_HOME as documented by Apache; if not, leave it unset and the script will use its own version of Derby (the jars in its lib/ subdirectory):

Bourne/bash shell version:

....(ONLY DO THIS when ALREADY running Apache Derby!)
export DERBY_HOME=my-derby-installation-toplevel

C Shell/csh version:

....(ONLY DO THIS when ALREADY running Apache Derby!)
setenv DERBY_HOME my-derby-installation-toplevel
NOTE: You must use the same version of Derby to create this initial user database as the version installed in Tomcat, so if Tomcat is already running a version of Derby, set DERBY_HOME to use that.
Follow this step-by-step procedure. Before you start, make sure the Tomcat server is not running.
Navigate to Tomcat's webapps directory. If there exists a directory named ROOT, move it aside. The eagle-i repository must be the ROOT application.
cd ${CATALINA_HOME}/webapps
mv ROOT ROOT.original
Copy the repository webapp to the Tomcat webapps directory:
cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
Create your initial administrative user login. Think of a USERNAME and PASSWORD and substitute them for the upper case words in this command:
bash ${REPO_HOME}/etc/prepare-install.sh USERNAME PASSWORD ${REPO_HOME}
Run the finish-install script, which loads the data model ontology among other things. Note that you can also give it additional options to specify a personal name and email box for the initial admin user.
bash ${REPO_HOME}/etc/finish-install.sh USERNAME PASSWORD https://localhost:8443
...or, with username metadata included:
bash ${REPO_HOME}/etc/finish-install.sh \
  -f firstname \
  -l lastname \
  -m admin@ei.edu \
  USERNAME PASSWORD https://localhost:8443
Run the upgrade.sh script, which performs additional configuration.
bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
Restart Tomcat to pick up these configuration changes. Confirm that the eagle-i repository is running by visiting the admin page (login with USERNAME and PASSWORD):
https://localhost:8443/repository/admin
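The same check can be scripted with curl; this is a sketch, where -k is needed only if your server still uses a self-signed certificate:

```
curl -k -s -o /dev/null -w "%{http_code}\n" \
  -u USERNAME:PASSWORD https://localhost:8443/repository/admin
```

An output of 200 indicates the webapp is up and your admin credentials work.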
This is the procedure to upgrade an existing repository instance to a new release of the software. All existing configurations, data, and user accounts are preserved. However, if the upgrade includes ontology changes there will also be an extra procedure to transform the existing data to reconcile it with ontology changes.
The repository release is distributed as a single Zip file. It contains a file README which identifies the software release it was built from. It is the artifact produced by the Maven project:
org.eagle-i:eagle-i-repository-dist
It would be a wise precaution to make a backup of the current repository state so you can roll back to it in case of fatal problems with the upgrade. Follow the Backup Procedure in the #Procedures section to get a snapshot of the current repository contents.
Note that the directory macros ${CATALINA_HOME} and ${REPO_HOME} are used in the examples here; see the Install Procedure above for a description of what they mean.
Unpack the distribution Zip archive in a directory, e.g. under /tmp:

cd /tmp
unzip repository-dist.zip
Delete the old repo webapp subdirectory and WAR file, since there should not be any local modifications there. For example:
rm -rf ${CATALINA_HOME}/webapps/ROOT*
Save the current release files in case you have to roll back:
cd ${REPO_HOME}
mv etc etc.old
mv lib lib.old
mv webapps webapps.old
Copy the distribution into place (in this example the distribution is version 1.7-MS1.01) -- note there are 2 steps:
cp -f -rp /tmp/repository-1.7-MS1.01/* ${REPO_HOME}
cp ${REPO_HOME}/webapps/ROOT.war ${CATALINA_HOME}/webapps/.
Run the upgrade script, substituting your admin's username and password:
bash ${REPO_HOME}/etc/upgrade.sh USERNAME PASSWORD https://localhost:8443
Watch the output of upgrade.sh very carefully! Pay particular attention to the final status and any messages beginning "WARN"; they will indicate problems you MUST resolve.
In particular, you may see warnings about the NG_Internal and NG_Query graphs. Most likely, these are nothing to worry about -- check the release notes. These graphs are only initialized from static files when the repository was created, and afterward they accumulate statements, so reloading a new copy of the original data is not practical. Some releases may include instructions for making changes in these graphs when upgrading from previous versions.

Download the data migration toolkit that corresponds to your repository version (in this example, version 1.7-MS1.02) and run the data migration script, substituting your admin's username and password:
wget -O ${REPO_HOME}/etc/eagle-i-datatools-datamanagement.jar \
  http://infra.search.eagle-i.net:8081/nexus/content/repositories/\
releases/org/eagle-i/eagle-i-datatools-datamanagement/1.7-MS1.02/\
eagle-i-datatools-datamanagement-1.7-MS1.02.jar
bash ${REPO_HOME}/etc/data-migration.sh -u USERNAME -p PASSWORD -r https://localhost:8443
Watch the output of data-migration.sh very carefully! Pay particular attention to the final status and any messages beginning "WARN"; they will indicate problems you MUST resolve. In addition to the output on screen, the data-migration script will place a data migration report in the logs directory directly under /etc.
When you create a new Role or Workflow Transition, you have the option of assigning your own URI to the new resource. When should you make up a URI, and when should you just let the system create one?
The answer is: if you expect to be exporting and sharing this resource -- which is to be expected for most Roles and Transitions, since there will typically be many commonly-administered repositories sharing the same configuration of Roles and workflow -- make up your own URIs following the guidelines here. This ensures that when, e.g., a User is copied from one repository to another, her Roles are all available on the destination repository with the same access grants. Likewise, Workflow Transitions should be given the same uniform URI on all repository sites to ensure that a change on the master site is propagated correctly. Since you ensure the Transition's URI is globally unique, you can import it on all the slave repos with the URI preserved, replacing the local copy, since the local URI will be the same as the master's URI.
For Workspace (aka Named Graph) URIs, you have to assign them in the process of creating a new Named Graph. Follow the rules below to create a reasonable URI.
Note that these URIs do not need to be resolvable. They are purely symbolic names for instances buried within the repository, which are virtually guaranteed never to appear in the outside world. So don't worry about whether the URI is actually resolved; most of the existing URIs for these types of things are not resolvable anyway.
Examples of good URIs:
http://dartmouse.edu/repo/Role_LabRat
http://dartmouse.edu/repo/WFT_13_2
http://eagle-i.org/ont/repo/1.0/DARTMOUSE_ROLE_PI
http://eagle-i.org/ont/repo/1.0/DARTMOUSE_WFT_TRASH
Exception: The URI of a named graph representing an ontology is usually the same as the URI of the ontology itself, i.e. the subject of its owl:versionInfo statement. If you should happen to add a new ontology named graph to the repository, use that URI for its name. However this should be a very rare occurrence; usually new ontological information is simply added to the existing eagle-i data model ontology graph.
The repository has a mechanism for restricting access to some of the properties of resource instances, deemed "hidden" and "contact" properties - these are two distinct sets of properties, configured independently but by an identical mechanism. See the Resource Property Hiding and Access Control sections under Concepts in the Repository Design Specification / API Manual for more details about how this works.
To configure access control, bring up the Admin GUI home page, and click on the link Manage Property Access Controls under Administrator Tasks. This page lets you edit the Access Control List (ACL) of both contact and hiding property sets. Granting READ permission allows a user or role to see those properties in Dissemination and harvest reports.
It is best to grant these permissions only to Roles - there should be no need to grant property read access at the granularity of users.
Note that if you grant READ to the Anonymous pseudo-role, that is the same as turning off all protection, since unauthenticated users will be able to see the hidden/contact properties.
Once you have set up a single repository to your liking, you can export and re-import the grants to other repositories. See the Procedure: Exporting and Importing Property Access Controls section below.
This section lists everything that can be configured, so you can get familiar with it before installing anything.
The repository requires these system properties to be defined in the JVM environment running your servlet container:
org.eaglei.repository.home - absolute path of the repository home directory (see below)
derby.system.home - directory containing Derby databases. We recommend you set it to the same path as repository home.
If you are using the Apache Tomcat version 6 container (which is recommended), you can add these system properties to file conf/catalina.properties - add lines like these: (note that the path /opt/eaglei/repo is just shown as an example)

org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo
The repository has a notion of a home directory, the root of a hierarchy of other runtime files.
We recommend that you place this home directory outside of the servlet container hierarchy.

The path of the home directory is determined as follows: if the system property org.eaglei.repository.home is set, its value must be the absolute path of the home directory.

These files and subdirectories are found under the repository home:
configuration.properties - java properties file with repository and log4j configuration props. This is optional; it must be created by the administrator.
logs/ - Default subdirectory for log files, see configuration. Created automatically by default.
sesame/ - Default Sesame RDF database files - DO NOT TOUCH. Created automatically by default.
etc/ - Contains scripts and tools for the repo administrator.
db/ - Default subdirectory for Derby RDBMS files - DO NOT TOUCH. Created automatically by default.
The configuration file is read by Apache Commons Configuration, which recognizes interpolated property and system property values. See its documentation for more information about features in the configuration file.
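For instance, interpolation lets one setting reuse another value. This hypothetical fragment builds the log directory from the home-directory system property (the ${sys:...} prefix is Commons Configuration's syntax for system properties; the fragment is illustrative only, not required):

```
# hypothetical fragment, shown only to illustrate interpolation
eaglei.repository.log.dir = ${sys:org.eaglei.repository.home}/logs
```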
You can set the following properties in the configuration.properties file. Most of the repository's "configuration" comes from administrative metadata in its RDF database and from the ontologies loaded into it, so the configuration settings here are very minimal and mostly serve to bootstrap the RDF repository.
The properties in red are required; those in orange are important and can be considered required for a production system, although they can be elided for a test or development system at the cost of some ugliness in the UI.
Any properties not present (or commented-out) in the configuration properties file will revert to the default values documented here. In most cases this is just fine. The property is only provided so that the application's behavior can be customized and adjusted to suit the requirements of a particular installation site. For example, your site may have a convention of writing all log files to a filesystem separate from the applications.
eaglei.repository.namespace - The namespace URI prefix for Eagle-I resource instances created in the repository. Example: http://foo.bar.edu/i/
eaglei.repository.title - the decorative title for UI pages, should be set for cosmetic reasons.
eaglei.repository.logo - URL of the logo image for your site, may be either a relative URL (to refer to an image embedded in the webapp) or an absolute URL to use an image hosted elsewhere. It should be about 50 pixels high and a suitable width given the proportions.
eaglei.repository.index.url - Set this to the URL to which you want the site's "root" (top-level index) page redirected. Although the repository is installed as the root webapp to have control over resolving Semantic Web URIs, it does not need the root page, so this allows you to configure your site as you like.
eaglei.repository.admin.backgroundColor - Lets you change the background color for admin web UI pages, to give admins an obvious cue when they are operating on e.g. the production vs. test repos. Value is a CSS color expression, e.g. a crayon name like "bisque" or hex #CCFFCC. (Added in Release 1.2MS2 or 3)
eaglei.repository.instance.xslt - path to the XSL stylesheet used to transform the HTML output of the instance dissemination service. A value for this key is required to produce XHTML in the dissemination service; without it, the service returns the internal XML document describing the instance. An example is repository/styles/example.xsl, which creates very simple HTML, as a demonstration of how to write an XSL stylesheet.
eaglei.repository.instance.css - URI of the CSS stylesheet resource to be used to style instance dissemination pages. It must be an absolute path or absolute URL. The default is:

eaglei.repository.instance.css = /repository/styles/i.css
eaglei.repository.tbox.graphs - a comma-separated list of graph URIs making up the "TBox". Example: http://purl.obolibrary.org/obo/ero.owl
eaglei.repository.datamodel.source - the full name of a resource within the webapp which is itself a property file describing the RDF data model ontology. You should not need to set this; the default is adequate for the eagle-i application. Default is eaglei-datamodel.properties, which is a built-in resource file.
eaglei.repository.sesame.dir - directory where Sesame RDF database files are created. Default is the sesame subdirectory of the home dir.
eaglei.repository.log.dir - Directory where log files are created. Default is the logs subdirectory of the home dir. You can also configure log4j explicitly by adding log4j properties to this file.
properties to this file.eaglei.repository.sesame.indexes
- index configuration =
for Sesame triple store. Must be a comma-separated list of index specifiers=
, see Sesame Nativ=
eStore configuration documentation for details. Use this to change=
the internal indexes Sesame maintains to process queries. It takes effect =
on next servlet container (tomcat) restart.
WARNING
If you have a configured value and wish to go back to the default,
eaglei.repository.slow.query
- Value in seconds of time af=
ter which a SPARQL query should be considered "slow" and logged as such. On=
ly affects the SPARQL Protocol endpoint service. Default is 0, which never =
logs. Use this to check for performance problems, since it logs the full te=
xt of the query and time of occurance in the regular log at INFO level.eaglei.repository.sparqlprotocol.max.time
- Time limit, in=
seconds, of the maximum time allowed for a query invoked by the SPARQL Pro=
tocol endpoint. Note that this does not affect any internally-gene=
rated SPARQL queries.=20
eaglei.repository.anonymous.user - This is a hack, only intended for testing the Anonymous role. Its value is a username, e.g. "nobody". If configured, when the designated user logs in, their session is downgraded to the Anonymous role; this allows explicit testing of Anonymous (vs. Authenticated) access even when the webapp configuration does not allow unauthenticated access. ONLY TESTERS SHOULD EVER NEED TO SET THIS.
eaglei.repository.hideContacts - true|false, enables the contact hiding function. When it is false, none of the other properties are used.
eaglei.repository.postmaster - email address of repository administrator(s). User-generated messages about resources without a contact email address get sent here, as well as diagnostic messages. We recommend using an email list or alias so it can be changed or directed to multiple people.
eaglei.repository.mail.host - hostname of SMTP server for outgoing mail, defaults to localhost.
eaglei.repository.mail.port - TCP port number of SMTP server for outgoing mail, only necessary if using a non-default port for your chosen type of service.
eaglei.repository.mail.ssl - Use SSL for the connection to the SMTP server for outgoing mail, value is true or false.
eaglei.repository.mail.username - Username with which to authenticate to the SMTP server for outgoing mail, default is unauthenticated.
eaglei.repository.mail.password - password with which to authenticate to the SMTP server for outgoing mail, default is none.

The following optional properties are valid after the 2.0 release.
eaglei.repository.searchBar.javascript.url - Location of the source of the JavaScript for the search bar. The default value is sufficient unless custom search bar code needs to be loaded.
eaglei.repository.centralSearch.url - Location of the destination of the actual searches from the search bar. The default value is sufficient unless search is to be performed by a specific application.

Note that the properties file may also contain Log4J configuration properties. For example you can turn on debugging log output by adding this line:
log4j.logger.org.eaglei.repository=DEBUG, repository
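Putting several of the settings above together, a minimal configuration.properties for a production site might look like this sketch (every value is a placeholder for your site; spoc and posc are the Sesame NativeStore default indexes, with opsc added as an example of tuning):

```
# hypothetical example values -- substitute your site's own
eaglei.repository.namespace = http://repo.example.edu/i/
eaglei.repository.title = Example University Repository
eaglei.repository.postmaster = repo-admins@example.edu
eaglei.repository.sesame.indexes = spoc,posc,opsc
```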
The repository uses Apache log4j for its logging. Any properties starting with log4j. in the repository configuration properties are simply passed through to configure log4j. The Loggers (aka Categories) are all descendants of the repository root Logger, org.eaglei.repository, so you should configure the log level and appenders for that Logger.
Any log4j configuration properties in your repository configuration override the defaults shown below.
The default log4j configuration sets up an appender named repository with buffered I/O for efficiency. Note that this means log messages will not appear in the log file immediately, but only after the logging volume fills a buffer. This is useless for interactive debugging through the logs. If you are doing interactive debugging and want to see more log detail, along with immediate results, you should add the properties:
log4j.logger.org.eaglei.repository=DEBUG, repository
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
Also note that the default configuration turns off additivity in the repo root Logger; this means its log events do not propagate up to e.g. the root logger. If you wish to turn it back on, add this to your configuration:
log4j.additivity.org.eaglei.repository=true
Here are all of the default log4j configuration properties:
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.appender.repository=org.apache.log4j.RollingFileAppender
log4j.appender.repository.File=${eaglei.repository.log.dir}/repository.log
log4j.appender.repository.ImmediateFlush=false
log4j.appender.repository.BufferedIO=true
log4j.appender.repository.Append=true
log4j.appender.repository.Encoding=UTF-8
log4j.appender.repository.layout=org.apache.log4j.PatternLayout
log4j.appender.repository.layout.ConversionPattern=%d{ISO8601} %p %c - %m%n
IMPORTANT NOTE: If you add logger configurations to tweak the level of a subset of the repo log hierarchy, you must also add an additivity configuration to prevent log4j from applying the ancestor logger as well, which would result in double log entries. For example, this fragment shows a default log level of INFO but adds DEBUG logging of RepositoryServlet to get elapsed time messages:
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
It's often helpful to know exactly what version of the repository you're dealing with, especially in a hectic development and/or testing environment when many versions are available. The release version appears in these places:
In Dissemination HTML pages, the head element contains a meta tag with the name eaglei.version, e.g.
<meta name="eaglei.version" content="1.1-MS5.00-SNAPSHOT" />
/repository/admin lists application version info in a human-readable format.
/repository/version gives a complete breakdown of component versions, including repo source and the version of the OpenRDF Sesame database. It is XHTML, and it includes meta tags to be easy to scrape or transform.
Since the repository is mainly accessed by the REST service API it provides to other applications, you should get used to monitoring it by watching the log file. This is a text file (UTF-8 encoding) maintained by the log4j library under the control of the repository's configuration properties. See the description of the log.dir property above to learn the directory where logfiles are created; they are automatically rotated when the logfile grows too large.
The default repository logfile is in the logs/ subdirectory of the repository home directory, and it is named repository.log. See the Configuration Properties section, above, for instructions on changing the destination directory for logfiles.
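A routine way to monitor the log is to filter it for warnings. This sketch demonstrates the filter against a tiny stand-in file; in practice you would point LOG at ${REPO_HOME}/logs/repository.log (the sample log lines are fabricated for illustration):

```shell
# Build a small stand-in log file so the filter can be demonstrated
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2011-01-27 14:28:06,483 T=http-8443-1 INFO org.eaglei.repository - startup complete
2011-01-27 14:29:11,002 T=http-8443-1 WARN org.eaglei.repository - something to resolve
EOF
# keep only WARN and ERROR entries
grep -E ' (WARN|ERROR) ' "$LOG"
```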
To troubleshoot problems with the logging system itself (e.g. log4j config that isn't working as expected), look for where your Java Servlet container writes the standard output stream. For Tomcat 6, this is typically the catalina.out file in some log directory.
As of release 1.1MS5 the repo can log the elapsed time (in milliseconds) for each service request. You must enable DEBUG level logging for the RepositoryServlet, as in this configuration example.
log4j.logger.org.eaglei.repository=INFO, repository
log4j.additivity.org.eaglei.repository=false
log4j.logger.org.eaglei.repository.servlet.RepositoryServlet=DEBUG, repository
log4j.additivity.org.eaglei.repository.servlet.RepositoryServlet=false
log4j.appender.repository.BufferedIO=false
log4j.appender.repository.ImmediateFlush=true
As of release 1.2MS3 the repo will also show the time spent on internal SPARQL queries, which can be useful when tuning Sesame indexes. Add these log4j configuration lines to see just the query log messages:
log4j.logger.org.eaglei.repository.util.SPARQL = DEBUG, repository
log4j.additivity.org.eaglei.repository.util.SPARQL = false
Then, you'll see log entries like this which you can correlate to requests from your application:
...service invocation examples:
2011-01-27 14:28:06,483 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet -
============== Ending Request /repository/update (2,159 mSec elapsed)
2011-01-27 14:27:58,023 T=http-8443-1 DEBUG org.eaglei.repository.servlet.RepositoryServlet -
============== Ending Request /repository/workflow/push (1,763 mSec elapsed)
... (internal query example:)
2011-04-15 14:13:28,383 T=http-8443-1 DEBUG org.eaglei.repository.util.SPARQL -
SPARQL Query executed by org.eaglei.repository.model.User:findAll at line 227 in elapsed time (mSec) 15
You can also get the SPARQL Protocol endpoint to make log entries at the INFO level for "slow" queries, i.e. ones that take longer than a certain threshold.
See the eaglei.repository.slow.query configuration property for more details. Note that this only applies to queries made through the SPARQL Protocol endpoint, not the SPARQL queries generated internally by the repo code.
The performance of Sesame's NativeStore implementation is extremely sensitive to its index configuration. There is a major benefit to configuring indexes that help resolve triple patterns used by the most frequent and/or voluminous SPARQL queries. A knowledgeable repository administrator should adjust the setting of the eaglei.repository.sesame.indexes property to get the NativeStore to build the most necessary indexes. See the doc on that configuration for more details.
The make-snapshot script creates a complete backup copy of a data repository, in a designated directory. It has to be given a directory because the backup consists of multiple files. It is packaged with the repository distribution, under the etc/ directory.
Upon success, the directory will contain two files:
resources.trig -- RDF resource data in TriG format, read by /graph
users.trig -- user accounts, must be read by the /import service

Upon failure, it prints an explanatory message and returns non-0 status.
NO MESSAGE is printed upon success, which lets it run under cron.
Synopsis:
make-snapshot.sh username password repo-URL directory
Where:
Given a dump created in e.g. ${DUMPDIR}, to restore this dump on a newly-created, empty, repository, use these commands: (where ${REPOSITORY} is the URL prefix of the repo)
curl -D - -s -S -u ADMIN:PASSWORD -F type=user -F format=application/x-trig \
  -F content=@${DUMPDIR}/users.trig -F duplicate=replace \
  -F transform=no ${REPOSITORY}/repository/import
curl -s -S -D - -u ADMIN:PASSWORD -F action=replace -F all= \
  -F "content=@${DUMPDIR}/resources.trig;type=application/x-trig" \
  ${REPOSITORY}/repository/graph
For example, your crontab might invoke this command to write a daily snapshot in a differently-named directory each day, rotating through a week:
make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "daily_cron_`date +%u`"
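A sketch of the corresponding crontab entry. The installation path /opt/eaglei/repo matches the home-directory example elsewhere in this manual, but the backup directory and schedule here are assumptions; also note that % must be escaped in a crontab line:

```
# Hypothetical: daily snapshot at 02:30, rotating through seven directories
30 2 * * * /opt/eaglei/repo/etc/make-snapshot.sh ADMIN PASSWORD https://localhost:8443 "/opt/eaglei/backups/daily_cron_`date +\%u`"
```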
The move-everything.sh script replicates all of a repository's contents - including resources, users and metadata - from one repository to another, or from a static file dump to a live repository. It transforms all resource (and user) URIs to match the URI prefix of the destination repository.
WARNING
This command obliterates all contents of the target repository.
Why do you need this script instead of just export and import requests? Because when moving from one repo to another, the URIs of resources and users have to be transformed.
Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. It does this by substituting the target's default prefix into all URIs that used to resolve at the source repository.
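A minimal sketch of the prefix substitution being described. The URIs here are hypothetical, and the real script rewrites every URI inside the serialized RDF dump, not a single string:

```shell
#!/bin/sh
# Substitute the destination repository's URI prefix for the source's,
# as move-everything does for every resource and user URI.
FROM='http://harvard.eagle-i.net/i/'   # source prefix (hypothetical)
TO='http://localhost:8443/i/'          # destination prefix (hypothetical)

uri="<${FROM}00001234>"
moved=$(printf '%s\n' "$uri" | sed "s|$FROM|$TO|")
echo "$moved"
```

The local part of the URI is preserved; only the resolvable prefix changes, which is what lets the copied resources resolve at the destination.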
Before you start copying resources around, be sure you understand
However, move-everything has some advantages over move-resources:
Given all of these limitations, move-everything can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work, and only use it when the results are temporary and will be discarded.
There is one other legitimate use of move-everything: restoring a backup copy made with make-snapshot. In this case you don't really have to transform the URIs, and the whole intent is to re-create the original state of the repo, so the side effects are all desired.
The resource copying script is installed under etc/ in the repository home directory. Its name is move-everything.sh. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires bash.
The synopsis for copying from repository to repository:

Usage: move-everything.sh [--version] [-f|--force] [--exclude-users user,user,..] [--nousers] from-username from-password from-repo-URL to-username to-password to-repo-URL
The synopsis for copying from file to repository:

Usage: move-everything.sh [--version] [-f|--force] [--exclude-users user,user,..] [--nousers] --from-snapshot directory --from-prefix from-prefix to-username to-password to-repo-URL
The --force option: Normally the script starts up with a dialog explaining how dangerous it is and how the destination repo will be completely obliterated, and asks if you want to continue. Adding this option (abbreviated -f) bypasses the question and runs every time, without asking. It is necessary when embedding the script in another script.
If you specify an --exclude-users option, its value is a list of one or more usernames (separated by commas and/or spaces) to be left out of the source export. This is handy when you do not want to import the administrator accounts from the source system, for example. Note that the excluded users' RDF metadata will still get copied.
If you specify the --nousers option (it takes no value), explicit copying of user accounts is turned off entirely. Note that users' RDF metadata will still get copied, because it is in one of the named graphs which gets moved and transformed. There will just be no login accounts. Also note this allows you to run move-everything without an Administrator login at the source repo, since all you need is read access to all the graphs - and that does not necessarily require Administrator access.
The --from-snapshot and --from-prefix options must be specified together. They select the input data from a directory of serialized files, in the same format as produced by the make-snapshot script. The value of --from-snapshot is the path to the directory containing the RDF serialization files. The value of --from-prefix is the exact and complete URI prefix (including the trailing '/') of the repo that generated the dump in the directory. This is necessary because the script does not have access to that repository to query it for its prefix.
The fixed command arguments are either one or two triplets of repository access information, i.e. the username, password, and URL of each repo. If you selected file input with --from-snapshot, then you must only specify the destination repository args. Otherwise, you specify first the source or from repository, and then the target or destination repo. Each set of args consists of:
- username
- password
- repository URL, e.g. https://localhost:8443
Here is an example that copies from the production Harvard repo to a loc= al one:
move-everything.sh bigbird PASSWORD https://harvard.eagle-i.net \
  bigbird PASSWORD https://localhost:8443
Here is an example that copies a snapshot of the production Harvard repo to a local one:
make-snapshot bigbird PASSWORD https://harvard.eagle-i.net harvard.monday
move-everything.sh -f \
  --from-snapshot harvard.monday \
  --from-prefix http://harvard.eagle-i.net/i/ \
  bigbird PASSWORD https://localhost:8443
We strongly recommend you avoid using the Superuser (administrator) login on the source repository, to prevent accidentally obliterating it by getting the argument order wrong. Use an account that has read access to every graph (e.g. the Admin-Read-Only role). This restricts you to using the --nousers version of the command, but in most cases that is adequate. See the Procedures section for recommendations on how to maintain copies of repositories this way.
The goal of this procedure is to copy all of the resource instances from a source repository to a destination repository.
Since resource URIs have to be resolvable, this effectively creates new resources in the destination repository with URIs that resolve there. The hostname portion of the URI matches the new repository server, and even the local name is allocated by the destination repository -- so there is no predictable way to relate new URIs to the old ones.
Before you start copying resources around, be sure you understand
Given all of these limitations, the resource-mover script can still be an effective way of populating a repository for testing and demonstrations. Just stay aware of what doesn't work.
The resource copying script is installed under etc/ in the repository home directory. Its name is move-resources. It only runs on a Unix-based operating system such as Linux or MacOS X. It requires perl 5 and the curl executable.
Run it with -h to get the synopsis:
Usage: move-resources [-verbose] [-replace] [--type published|workspace]
  { --file source-file --prefix uri-prefix |
    --source source-repo-url --user login:password --graph src-graph-URI }
  dest-repo-url dest-login:dest-password dest-graph-URI
(options may be abbreviated to first letter, e.g. -f)
By default it adds data to the destination graph; --replace changes that to replacing the entire graph. You can change the type of the destination graph with the --type arg, e.g. set it to either workspace or published. By default the type is left alone.
You must choose a source by specifying either the file arguments (-f and -p) or the repository arguments (-s, -u, -g). You must always specify the destination repository, login, and graph, so they are plain args, not options.
Here is an example command; it copies from the Published graph on qa.harvard to an "Experimental" graph on the local repo (on https://localhost:8443):
move-resources -s https://qa.harvard.eagle-i.net:8443 -u bert:ernie \
  -g http://eagle-i.org/ont/repo/1.0/NG_Published https://localhost:8443 \
  root:password http://eagle-i.org/ont/repo/1.0/NG_Experimental
Moved 4694 data statements and 322 metadata statements.
IMPORTANT
If you are using the Tomcat server from e.g. a Linux distro's package system, you must be aware of the following serious pitfall that can affect the repository when you upgrade Tomcat through the package system:
Some if not all packaged Tomcat servers include a sample webapp installed as the ROOT webapp, so that the default server address can respond with a page congratulating you on installing Tomcat.
Meanwhile, the Repository replaces this ROOT webapp with its own (for good and compelling reasons detailed in the design documents). Thus, we destructively modify the installed state of Tomcat.
Some Tomcat package upgrade procedures (notably Fedora Core 12) have been observed to simply replace files in the expanded ROOT webapp without checking whether it was the original default ROOT webapp installed from the package. While we consider this a serious bug in the distribution package, it is unlikely to be fixed, so you must learn to expect and recover from it.
So, after upgrading a packaged Tomcat:
Remove the ROOT webapp directory (with Tomcat still shut down) and the old ROOT.war, then reinstall the repository's ROOT.war from your installation kit.
Finally, delete the entire ${CATALINA_HOME}/work directory. Tomcat rebuilds it on startup anyway, but it can contain stale caches that do not get updated. Now you can start up Tomcat as usual.
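The recovery steps above can be sketched as a small script. This is a dry-run sketch under assumptions: the paths match a typical packaged tomcat6 layout, and the run/recover_root_webapp helpers are hypothetical; set DRY_RUN=no only after verifying the paths on your system:

```shell
#!/bin/sh
# Post-upgrade recovery sketch: remove the distro's restored ROOT webapp,
# reinstall the repository's ROOT.war, and clear Tomcat's work directory.
CATALINA_HOME=${CATALINA_HOME:-/usr/share/tomcat6}   # assumption
WEBAPPS=${WEBAPPS:-/var/lib/tomcat6/webapps}         # assumption
DRY_RUN=${DRY_RUN:-yes}   # echo commands instead of running them

run() {
  if [ "$DRY_RUN" = yes ]; then echo "would run: $*"; else "$@"; fi
}

recover_root_webapp() {
  # 1. With Tomcat shut down, remove the expanded ROOT webapp and old war
  run rm -rf "$WEBAPPS/ROOT" "$WEBAPPS/ROOT.war"
  # 2. Reinstall the repository's ROOT.war from the installation kit
  run cp ROOT.war "$WEBAPPS/ROOT.war"
  # 3. Delete the work directory; Tomcat rebuilds it on startup
  run rm -rf "$CATALINA_HOME/work"
}

recover_root_webapp
```

Running it first with the default DRY_RUN=yes lets you confirm exactly which files would be touched before doing it for real.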
See also: The Procedure to redirect Port 80 so your URLs are simplified.
This section describes the differences in the install procedure when using the packaged tomcat6 server on Ubuntu Linux 9.10 (karmic koala). It was based on experience with the tomcat6 package, version 6.0.20-2ubuntu2.1.
NOTE: This procedure only lists the steps specific to Ubuntu's tomcat package. You need to review the previous section and follow that procedure, referring to this one for the steps related to tomcat.
Edit the file /etc/init.d/tomcat6 and change the following variable to look like this:
TOMCAT6_SECURITY=no
Install the Derby libraries: the derby* jar files are in the lib/ subdirectory under where you installed the create-user.sh script. Copy them to the Tomcat common library directory:
cp ${REPO-ZIP-DIR}/lib/derby* /usr/share/tomcat6/lib/
Install the webapp: First, get rid of any existing root webapp, then copy in the webapp (the ROOT.war file from your installation kit) and be sure it is readable by the tomcat6 user:
rm /var/lib/tomcat6/webapps/ROOT*
cp ROOT.war /var/lib/tomcat6/webapps/ROOT.war
Install the cached webapp context: This is VERY IMPORTANT, and the Tomcat docs do not even mention it, but without it your server will be mysteriously broken. The file /etc/tomcat6/Catalina/localhost/ROOT.xml must be a copy of your app's context.xml. Redo this command after installing every new ROOT.war:
mkdir -p /etc/tomcat6/Catalina/localhost
unzip -p /var/lib/tomcat6/webapps/ROOT.war META-INF/context.xml > /etc/tomcat6/Catalina/localhost/ROOT.xml
Add System Properties: Be sure you have added system properties to the file /etc/tomcat6/catalina.properties, e.g.
org.eaglei.repository.home = /opt/eaglei/repo
derby.system.home = /opt/eaglei/repo
...of course, the value of these properties will be your Repository Home Directory path.
Start up Tomcat:
sudo /etc/init.d/tomcat6 start
- /var/log/daemon.log - really dire tomcat problems; stdout/stderr go to syslog
- /var/log/tomcat6/* - normal catalina logging
- ${REPOSITORY_HOME}/logs/repository.log - default repo log file in release 1.1; under 1.0 the filename was default.log.
We want the repository (and other Web tools) to have a simple URL, without the ugly port number after the hostname, e.g. like this: http://dev.harvard.eagle-i.net/ and NOT http://dev.harvard.eagle-i.net:8080/... (because really, the first one is already enough to remember). This procedure uses IP port redirection to let your Tomcat server appear to be running on the canonical HTTP port, which is 80. It is the simplest and safest method to accomplish this under Linux.
The sanest alternative, running an Apache httpd server as an AJP forwarder, is much more effort and adds another point of failure. We will not even discuss running Tomcat as root so it has access to port 80, since that is simply unacceptable.
These procedures use the iptables command and must be run as root.
To check what rules are currently running:
iptables -t nat -n -L
Discover your machine's primary IP address and set the ADDR shell variable (note that this assumes eth0 is your primary network interface -- use ifconfig -a to see them all):
ADDR=`ifconfig eth0 | perl -ne 'print "$1\n" if m/\sinet addr\:(\d+\.\d+\.\d+\.\d+)\s/;'`
Run these iptables commands to redirect all port 80 requests to port 8080.
iptables -t nat -A OUTPUT -d localhost -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 80 -j REDIRECT --to-ports 8080
(If using SSL) Run these iptables commands to redirect all port 443 requests to port 8443.
iptables -t nat -A OUTPUT -d localhost -p tcp --dport 443 -j REDIRECT --to-ports 8443
iptables -t nat -A OUTPUT -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
iptables -t nat -A PREROUTING -d $ADDR -p tcp --dport 443 -j REDIRECT --to-ports 8443
Save the rules in the canonical place to be reloaded on boot:
iptables-save > /etc/iptables.rules
Create a script to be run by the network startup infrastructure that will reload the iptables whenever the network is configured on:
cat << EOF > /etc/network/if-pre-up.d/iptablesload
#!/bin/sh
iptables-restore < /etc/iptables.rules
exit 0
EOF
The cleaner/preferable method, but apparently not working:
/sbin/iptables-save > /etc/sysconfig/iptables
Update the startup settings so iptables will run upon reboot:
chkconfig --level 35 iptables on
The recommended way to dump out the RDF resource data content of the repository is to export it as serialized RDF. If you are exporting the entire contents of the repository, it is essential to preserve the mapping of statements to named graphs, so you must use one of the formats that encodes RDF as quads (statement plus graph-name/context).
The reason for this is that the repository server employs an RDF database, Sesame, to manage the RDF statements. It uses Sesame's "native" store, which records statements in opaque data files on the host OS's filesystem -- but much like relational database systems, Sesame's files are never in a consistent state while it is running, so it would have to be shut down (by shutting down the repository Web service) to make a "cold" snapshot backup. It is much easier to simply export the live data. Another advantage of exports as backups is that the data can easily be imported into a later version of Sesame or even a different database.
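To make the quad form concrete, here is a sketch that writes a tiny hypothetical TriG file (the URIs are stand-ins, not real repository data) and counts its named graphs -- a quick sanity check you might also run on a real dump:

```shell
#!/bin/sh
# In TriG, each named graph serializes as '<graph-uri> { ...triples... }',
# so every statement carries its graph context (a quad, not a bare triple).
cat > sample-dump.trig <<'EOF'
<http://example.org/NG_Published> {
  <http://example.org/i/0001> <http://example.org/ont/label> "widget" .
}
<http://example.org/NG_Internal> {
  <http://example.org/i/0002> <http://example.org/ont/label> "gadget" .
}
EOF

# Count top-level graph blocks: lines that open with '<uri> {'
graphs=$(grep -c '^<[^>]*> {' sample-dump.trig)
echo "named graphs: $graphs"
```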
This is a complex manual procedure with many options -- for a simpler semi-automated backup snapshot procedure, see the section on using the make-snapshot script.
Typical command to make a backup in TriG format, to a file, e.g. all-dump.trig, from a server running locally. In practice, you'll probably need to change the site-specific parts, such as the username:password login credentials, and the hostname in the target URL if not running locally.
curl -G -X GET -s -S -u username:password -o all-dump.trig -d all \
  --write-out 'status=%{http_code}, %{time_total}sec\n' \
  -d format=application/x-trig https://localhost:8443/repository/graph
Be sure the output shows a successful status code (namely 200), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.
status=200, 13.283sec
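Since curl exits 0 even on HTTP errors, a backup script should check the status= line itself. A minimal sketch under assumptions (check_status is a hypothetical helper; the sample lines mimic the --write-out output above):

```shell
#!/bin/sh
# Fail loudly unless the HTTP status written by curl's --write-out is 200.
check_status() {
  case "$1" in
    status=200*) echo ok ;;
    *) echo "backup failed: $1" >&2; return 1 ;;
  esac
}

check_status 'status=200, 13.283sec'
check_status 'status=500, 0.031sec' || echo "would abort here"
```

In a real cron job you would capture curl's --write-out output into a variable and pass it to a check like this, exiting non-zero so the failure is noticed.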
NOTE: This form of the procedure is a bit obsolete, since the new move-everything.sh script can also restore the state of a repository from its own backup -- effectively moving data to itself. See that command for details.
The procedure is still worth mentioning since it demonstrates the nature of the backup's contents:
Typical restore command.
WARNING
this replaces the entire contents of the target repository.
curl -s -S -u username:password -F action=replace -F all= \
  --write-out 'status=%{http_code}, %{time_total}sec\n' \
  -F 'content=@all-dump.trig;type=application/x-trig' https://localhost:8443/repository/graph
Be sure the output shows a successful status code (namely 201, since it created graphs), as shown here, since curl will return a successful status even if the HTTP service did not succeed; curl only reports on the success of the network request-and-response transaction.
status=201, 13.283sec
As of the MS6 release, you can use the new Export/Import service to create user accounts automatically (e.g. on a newly-created repository). This is NOT the same thing as true backup and restore; rather, it is intended more for setting up a test environment. The export and import services are very complex and powerful. This only gives one small example of what they can do. For all the details, see their entry in the API Manual.
Only do this once. Once you create a user file you like, you can use it over and over, on any different sites and tiers you like.
Create the user accounts you want on some repository instance. You will export them to create a document describing the user accounts you want. There can be extra accounts; you can filter them out of the export. So, get all the accounts you want in order, with roles, passwords, and personal names set up.
Now run a command like this to export the accounts into the file all-users.trig:
curl -s -S -u username:password -G -d type=user -d format=application/x-trig \
  --write-out 'status=%{http_code}\n' \
  -o all-users.trig https://hostname:8443/repository/export
Note that you have to change the hostname and possibly the login. If there are accounts you do not want in the export, add an exclude argument listing them, e.g.:
.... -d 'exclude=frankenstein moreau lizardo' ....
You can start with a newly-created repository which needs to have user accounts added. It only has the initial administrator login, e.g. bigbird. Use the import service to add users from the file you created in step 0. The following command adds all of the accounts except bigbird (since it already exists), and aborts without changing anything if there are already duplicates of any of the users on the destination repo. It will print "status=200" on success.
curl -s -S -u username:password -F type=user -F format=application/x-trig \
  -F transform=yes --write-out 'status=%{http_code}\n' \
  -F exclude=bigbird \
  -F content=@all-users.trig https://hostname:8443/repository/import
Note that the transform=yes argument means import will translate the instance URIs of the new users to newly-created URIs in the repository's default namespace. This is usually what you want. If you are positively restoring users already in the correct namespace and you want to preserve the old URIs, substitute transform=no.
The easiest way to test the existence and details of a user is with the /whoami service. It does not show roles, however; you'll have to go to the repository administrative UI for that (or take it on faith). For example, after restoring users including curator, this is how you'd check that curator exists:
curl -s -S -u curator:password -G -d format=text/plain https://hostname:8443/repository/whoami
It's probably only necessary to test one user like this, and to make sure the output includes a URI, as a check that the whole import succeeded.
This is only relevant to release 1.5MS1 and later, when resource properties have access controls.
To determine the URIs of the access controls, bring up the admin UI pages and log in as an Administrator. There will be a link to the Properties access control page named Manage Property Access Controls. If you go to that page, it will display two sets of properties for which there is an access control list:
Go to each of these in turn and observe the URI of the subject, e.g. http://eagle-i.org/ont/app/1.0/PropertyGroup_AdminData. This is the URI to include in your export request. Now do the same for contact properties and record that URI too.
To export property grants, plug those URIs into the following command (you need to replace the placeholder words, such as ADMIN:PASSWORD and the HIDE,CONTACT URIs):
curl -G -k -u ADMIN:PASSWORD -d type=grant -d "include=HIDE,CONTACT" \
  -d format=application/x-trig https://localhost:8443/repository/export
This writes a record of grants to the standard output. Since the URIs are the same between other repositories running the same data model, you should be able to import them with this command (the example reads from standard input):
curl -k -u ADMIN:PASSWORD -F type=grant \
  -F duplicate=abort -F transform=no -F content=@- \
  -F format=application/x-trig https://localhost:8443/repository/import