h1. Contents
{toc}
This page describes the way a _data model ontology_ is  configured into the repository. This ontology is like (but not really  equivalent to) a schema for RDF data, it predicts what you are likely to  find in the data and describes how you should react to it. For  example, the ontology describes which properties of a resource instance  should not be exposed to the public, and this configuration tells the  repository where to look in the ontology for those rules.

h3. Introduction

The  repository is designed to be independent of any data model ontology. All of its interactions with the ontology are externally configured,  through a properties file. This should allow other projects to use the  eagle-i repository as a standalone component, incorporating their own  data model.

The data model configuration is contained in a properties file _separate_ from the main repository configuration. Since the data model configuration is:
# uniform among all repository instances for an application
# unlikely to be changed
# contains very complex and specialized information

It makes more sense to keep it separate from an instance's configuration, reserving that for instance-specific variables.

h3. Locating the Configuration

The  data model configuration file is expected to be packaged as a resource  in the repository's web application (webapp). Since it is so crucial to  the operation of the repository, this ensures it is always available  and is not subject to an administrator's mistakes. 

To allow  alternate data model configurations to be selected, there is a  configuration property in the main repository configuration which names  the resource path of the data model configuration: *eaglei.repository.datamodel.source*. Its default value is the path "eaglei-datamodel.properties",  which describes the eagle-i data model.  You do not need this property  in the system configuration if this default is acceptable.

h4. Format

The data model configuration file is read by [Apache Commons Configuration|http://www.google.com/url?q=http%3A%2F%2Fcommons.apache.org%2Fconfiguration%2Findex.html&sa=D&sntz=1&usg=AFrqEzdYKqJPqb5zeZ8iHrUZ9afjBCSBJg],  which recognizes interpolated property and system property values.  See  the Apache  documentation for more information about features in the  configuration file.

You _can_ just treat it simply as a Java  properties file, but be aware that the comma (,) character has special  significance in property values.  It is used to specify a list of  values, and if the value is *{_}not{_}* expected to be a list, everything after the first comma is ignored.

h4. About Versions

Although your data model ontology may contain any RDF statements at all, it is *{_}required{_}* to have at least one statement describing its _version_,  so the loaded version can be compared with the one available in the  webapp when considering an update and generating reports.  (See the */model* service in the REST API for details.)
The version statement consists of an arbitrary URI as subject, the predicate [http://www.w3.org/2002/07/owl#versionInfo], and a literal string value that is expected to start with a Maven-style  version number (a sequence of integers separated by dot (.). For  example,{{[http://purl.org/eagle-i/app-ext/]}} {{[http://www.w3.org/2002/07/owl#versionInfo]}} {{"0.7.6 8274  2011-03-28 15:06:29"}}
As you will see in the next section, the version is also key to describing the location of the data model as well.

h4. Loading the Data Model

See  the description of the property {datamodel.versionInfo.source} for the gory details of how the repository actually loads the data model. Although it seems somewhat convoluted, it lets you avoid specifying redundant information (e.g. the JAR file containing the ontology) since  it can be derived from what was already given by making some safe  assumptions.

h3. Properties in the Data Model Configuration


h4. Ontology Metadata

These properties describe the data model ontology itself, and how it is managed by the repository.

* *datamodel.graph.uri*
The  value is the URI naming the named graph were the data model ontology  statements are to be stored. The repository adds statements to this  graph when it loads the data model.
* *datamodel.graph.label*
The  human-readable label to be set on the data model's named graph. This  is purely decorative and will be seen in the Admin GUI and *listGraphs* service.
* {{{*}datamodel.versionInfo.source{*}}}
This property names the file in the data model containing the relevant {{owl:versionInfo statement}}. It has two distinct purposes:

* First, indicate the resource file containing the statement describing the actual released version of this data model ontology.
* Second,  as a fortunate side-effect, identify the container or archive that can  be expected to contain the data model ontology files.
The current  algorithm for loading the data model ontology is to locate this  container (which is assumed to be a jar archive file) and _load every file in it a name ending in_ ".owl".  This is a crude assumption that works for ontologies maintained in Protoge.
* {{{*}datamodel.versionInfo.uri{*}}}
Identifies the subject URI of the statement expected to contain the version of  the data model ontology. The predicate is, as has been noted, {{owl:versionInfo}} (see section on Versions above).

h3. Data Model Description

These  properties direct where the repository looks in the ontology to control  some of its actions, e.g. to decide what properties of a resource  instance are to be hidden from public view.


h4. Label Properties

When the repository prepares a human-readable description of a resource instance, it attempts to display the text _label_ for all URIs in the subject, predicate, and object of the instance's  graph. Also, when computing the graph to return for an RDF  dissemination request, it includes the labels of all of these objects.

Since some ontologies use other properties for labels instead of or in addition to the conventional {{rdfs:label}}, the repository can be configured with an ordered list of label properties. Its influence is as follows:

# In a human-readable dissemination of the resource instance, the _first_ label predicate with a value is shown in place of the URI. If none have values, then the URIs local name is displayed.
# In a dissemination of serialized RDF, values for _any and all of the label predicates_ are included, in no particular order.

The configuration property is:

* *datamodel.label.predicate*
Value is a comma-and-or-space separated list of URIs to be used as label predicates. {code} example:
datamodel.label.predicate = [http://eagle-i.org/ont/app/1.0/preferredLabel], \ [http://purl.obolibrary.org/obo/IAO_0000111], \ [http://www.w3.org/2000/01/rdf-schema#label] {code}

h4. Embedded Instances

* *datamodel.embedded.predicate*
This defines the _predicate_ of a statement _in the data model ontology_ whose subject URI is identified as  the name of an embedded instance class.
* *datamodel.embedded.object*
This defines the _object_ of a statement _in the data model ontology_ whose subject URI is identified as the name of an embedded instance class.

Any class URIs in the ontology which are the subject of statements with the given predicate and object are considered _embedded object classes_.   This is significant in the implementation of the repository and will  not be covered here.  For example, here is the configuration, and a  statement in the ontology describing an affected class:

{code}
datamodel.embedded.predicate = http://eagle-i.org/ont/app/1.0/inClassGroupdatamodel.embedded.object = http://eagle-i.org/ont/app/1.0/ClassGroup_embedded_class
{code}

{code}
# some embedded instance classes
@prefix ero: <http://eagle-i.org/ont/app/1.0/> .
<http://purl.obolibrary.org/obo/OBI_1110023> ero:inClassGroup ero:ClassGroup_embedded_class .
<http://purl.obolibrary.org/obo/ERO_0000404> ero:inClassGroup ero:ClassGroup_embedded_class . ...
{code}

h4. Hidden Properties

These properties tell the repository how to identify properties that should be hidden from public view. {color:#ff0000}The exact rules about how they are to be exposed are not designed yet{color}.

In  practice, the predicate and object configuration lines work exactly the  same as for embedded instances above, only they name the URIs of _properties_ that are to be hidden.

* *datamodel.hideProperty.predicate*
This defines the _predicate_ of a statement _in the data model ontology_ whose subject URI is identified as the name of a _hidden property_.
* *datamodel.hideProperty.object*
This defines the _object_ of a statement _in the data model ontology_ _in the data model ontology_ whose subject URI is identified as the name of a _hidden property_.

h4. Contact Properties

These configuration keys define how the repository detects _contact properties_,  i.e. properties that contain information about real-world contacts such  as email addresses, phone numbers, lat/long, etc. There are some  situations where this information must be kept confidential, so this is  how it is identified. {color:#ff0000}The exact rules about how they are to be exposed are not designed yet{color}.

The predicate and object configuration lines work exactly the same as for hidden properties above.

* *datamodel.contactProperty.predicate*
This defines the _predicate_ of a statement _in the data model ontology_ whose subject URI is identified as&nbsp; the name of a _contact property_.
* *datamodel.contactProperty.object*
This defines the _object_ of a statement _in the data model ontology_ _in the data model ontology_ whose subject URI is identified as&nbsp; the name of a _contact property_.

h4. Contact Email Address

These configuration properties describe how the repository locates a _contact email address_ for a given resource instance. This address is where email generated by  the "contact the owner" form (see API manual for details).

* *datamodel.contact.email.query*
The  text of the SPARQL query to find the email addresses.&nbsp; It is expected  to be a SELECT query that binds the variables mentioned in the bindings  list below.&nbsp; You can use a backslash at the end of the line to escape  newlines in the query.
* *datamodel.contact.email.bindings*
An ordered list of the query variables that can be bound by the email query. Each variable is identified only by its name, _without_ the leading "?". They are separated by comma and/or space. The _first_ one of these with a value is used as the email address.&nbsp; If there are multiple results to the query, any additional values of *{_}that{_}* variable (and no others) are added as destination addresses.

Here is an example:

{code}
datamodel.contact.email.bindings = email1,email2,email3, email4
datamodel.contact.email.query = \
SELECT ?email1 ?email2 ?email3 ?email4 WHERE { \
OPTIONAL { \    
?subject <http://purl.obolibrary.org/obo/ERO_0000021> ?person . \    
?person <http://purl.obolibrary.org/obo/ERO_0000041> ?email1 . } \OPTIONAL { \    
?subject ?any_predicate ?organization . \    
?any_predicate <http://eagle-i.org/ont/app/1.0/inPropertyGroup>  \      <http://eagle-i.org/ont/app/1.0/PropertyGroup_RelatedResourceProvider> . \    
OPTIONAL { \      
?organization <http://purl.obolibrary.org/obo/ERO_0000021>
?person . \       ?person <http://purl.obolibrary.org/obo/ERO_0000041> ?email2 . \      
filter(!bound(?email1)) } \     OPTIONAL { \      
?organization <http://purl.obolibrary.org/obo/ERO_0000041> ?email3 . \      
filter(!bound(?email1) && !bound(?email2)) } \    
OPTIONAL { \      
?organization <http://purl.obolibrary.org/obo/ERO_0000022>
?pi . \       ?pi <http://purl.obolibrary.org/obo/ERO_0000041> ?email4 . \      
filter(!bound(?email1) && !bound(?email2) && !bound(?email3)) \     }}}
{code}