...

  1. Username---Present in RDF as well; this is the common key. Since it is the permanent description of a user, it is immutable.
  2. Password---Only in RDBMS.
  3. Membership in Superuser role.  No other roles.

...

Note that there is at most one value of any of these properties for each subject. That means the "modified" properties are overwritten whenever a subject is modified, and the record of the previous modification is lost. This is a simplification that may be remedied at some point in the future if we add versioning of data to the repository.

Named Graphs:

  • dcterms:modified---literal date of last modification, encoded as xsd:dateTime
  • dcterms:contributor---the URI of the Agent (authenticated user) who last modified it
  • dcterms:source---description of file or URI last loaded into this NG, if that is how it was created.  This record is used to compare it against the source later to decide whether an update is necessary. It is a node (possibly blank node) with the following properties:
    • dcterms:identifier---the resolvable URI of the resource loaded, most likely a URL in either the file or http scheme.
    • dcterms:modified---last-modification date of the resource, for later comparison when deciding whether to decache the repository copy of an external file, a literal xsd:dateTime.
      Resource Instance:
      • dcterms:created---literal date when resource was created, encoded as xsd:dateTime (http://www.w3.org/TR/xmlschema-2/#dateTime)
        • Set automatically by the Repository.
      • dcterms:creator---the URI of the Agent (should be an authorized user) who created the instance.
        • The true meaning is the user who authored the data in the initial version of this instance.
        • Usually, this is the same as the user directly responsible for creating the instance.
        • However, when a different user uploads, for example, a spreadsheet created by other RNAVs, the Repository user is a mediator. The actual value of dcterms:creator comes from the uploaded data.
      • (optional) dcterms:mediator---ONLY when dcterms:creator does not refer to the authenticated user who created the data, this is the URI of the Agent (authenticated user) who created this instance in the Repository.
        • Set automatically by the Repository.
      • dcterms:modified---literal date when resource was last modified, encoded as xsd:dateTime
        • Set automatically by the Repository.
      • dcterms:contributor---the URI of the Agent (authenticated user) who last modified this instance.
        • Set automatically by the Repository.
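
The provenance properties listed above can be read back with an ordinary SPARQL query. The following is a minimal sketch, not the repository's own code: the instance URI is a made-up example, and which view or named graph actually holds these statements is not stated here, so the query does not name a graph.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Property local names taken from the Resource Instance list above.
PROVENANCE_PROPERTIES = ("created", "creator", "mediator", "modified", "contributor")


def provenance_query(instance_uri: str) -> str:
    """Build a SPARQL SELECT for the provenance properties of one instance.

    Every property is wrapped in OPTIONAL because any of them may be absent
    (dcterms:mediator in particular is only set in some cases).
    """
    optionals = "\n".join(
        f"  OPTIONAL {{ <{instance_uri}> dcterms:{name} ?{name} }}"
        for name in PROVENANCE_PROPERTIES
    )
    variables = " ".join(f"?{name}" for name in PROVENANCE_PROPERTIES)
    return (
        f"PREFIX dcterms: <{DCTERMS}>\n"
        f"SELECT {variables} WHERE {{\n{optionals}\n}}"
    )


# Hypothetical instance URI, for illustration only.
query = provenance_query("http://example.org/i/0001")
print(query)
```

Because each property has at most one value per subject, a query like this should return at most one row per instance.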

    ...

    Some repository features are implemented as extensions to the Sesame RDF database (aka triplestore). This means they are available both internally to the repository implementation and externally whenever an API to Sesame (e.g., its SPARQL query engine) is exposed.

    1. Output formats

    Additional output formats for both RDF serialization and SPARQL tuple query results allow output in:

    • HTML, media type text/html
    • plain text, media type text/plain (for SPARQL)
    • N-Quads, RDF only, for comparing serialized RDF including context (graph).

    ...

    The repository adds a custom function to Sesame's query engine: repo:upperCaseStr. It returns the toUpperCase() version of the string value of an RDF value. Use it to sort values regardless of whether (a) the case of characters differs or (b) they are datatyped literals or untyped literals (or other terms).

    To invoke it, you must have the repository's URI namespace defined as a prefix. For example:

     PREFIX repo: <http://eagle-i.org/ont/repo/1.0/>
     ..query text...
     ORDER BY repo:upperCaseStr(?label)
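
To see why ordering on the upper-cased string matters, here is a small local analogue in Python. It is only an illustration of the sorting behavior: Python's str.upper() stands in for Java's toUpperCase(), and the two may differ for some Unicode characters. The labels are invented.

```python
labels = ["beta", "Alpha", "alpha2", "Gamma"]

# A plain sort compares code points, so all capitalized labels come first.
plain = sorted(labels)

# Sorting on the upper-cased string value mimics
# ORDER BY repo:upperCaseStr(?label): case no longer affects the order.
folded = sorted(labels, key=lambda value: str(value).upper())

print(plain)   # ['Alpha', 'Gamma', 'alpha2', 'beta']
print(folded)  # ['Alpha', 'alpha2', 'beta', 'Gamma']
```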

    ...

    1. Workflow State---A node on the state map.   Every resource instance has exactly one current state.   A newly created resource is initialized to a fixed "New" state.
    2. Transition---Description of an arc on the map, or a transition from an initial state to a final state.
    3. Claim---Assertion on a resource instance that a specific user (the claimant) has taken possession of it, in order to prepare it for the next workflow transition.
      • Claiming typically causes some side-effects such as changing access controls.
      • A claim may be resolved by taking a transition to another state, or by releasing it.
    4. Pool---The set of resource instances available for claiming to a specific user. A pool is always computed by a query; it is not materialized.

    ...

    The repository must be simple (ideally "foolproof", although that only breeds more destructive fools) to install and manage, since it is intended to be deployed at dozens or hundreds of sites, managed by administrators with varying experience levels. All the while, it must still maintain adequate security and data integrity.

    ...

    1. Install the webapp by dropping the WAR file into place, and adding configuration:
      1. Add one system property to the Web server's JVM to indicate the configuration properties.
      2. Create configuration properties file with at least the minimum required properties.
    2. Create authentication DB and add the initial Administrator user.
    3. Start the servlet container---the repository webapp automatically initializes the following named graphs if they are found to be empty:
      1. The internal Repository ontology (read-only)
      2. Internal Repository administrative metadata (e.g. Roles, Named Graphs, WF Transitions, etc.)
      3. SPARQL Query macros.
    4. Create an RDF metadata wrapper for the initial administrator user. This is done automatically by the post-install procedure, finish-install.sh. See the Admin Guide for Data Repository.
    5. Load the eagle-i data model ontology.  Although this is not necessary for any of the Repository's internal functions, the eagle-i apps will expect it to be there.
    6. Browse to the Admin UI and log in as the Administrator user. You can create user logins and assign roles. Apart from any customizations of workflow, workspaces, and such, the repository is then open for business.

    ...

    The configuration properties are loaded from the file configuration.properties in the repository home directory. It is read by Apache Commons Configuration, which allows system properties and other variables to be interpolated into the values. See the Apache Web site for complete documentation.

    See the Configuration section in the Administrator Guide for a complete list of configuration properties.

    ...

    In a request, you can usually specify input format in two ways:

    1. Explicitly through the format arg, which takes precedence.
    2. As the HTTP Content-Type header on the request body or arg entity (this only takes precedence in a request like /update which has multiple text entities with a possible content-type)

    You can ask for an output format in two ways as well:

    1. Explicitly through the format arg, which takes precedence.
    2. As the HTTP Accept header in the request headers, which may be a list of formats and qualifiers; the repo implements full HTTP 1.1 content negotiation.

    Note that the tabular (tuple) and boolean query result formats are output-only. There are no requests that take them as input formats.
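
The precedence rules for output formats can be sketched as follows. This is a simplified illustration, not the repository's negotiation code: it ranks the Accept header by q-value only, ignores wildcards such as */*, does not validate the format arg, and the supported and default media types shown are assumptions for the example.

```python
def parse_accept(header: str) -> list[str]:
    """Return media types from an Accept header, best first (by q-value)."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Negated quality sorts best first; position keeps header order for ties.
            ranked.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def choose_output_format(format_arg, accept_header, supported, default):
    """Pick an output format: explicit format arg first, then Accept, then default."""
    if format_arg:
        return format_arg
    for media_type in parse_accept(accept_header or ""):
        if media_type in supported:
            return media_type
    return default


# Assumed values for the example only.
SUPPORTED = {"application/sparql-results+xml", "text/html", "text/plain"}
DEFAULT = "application/sparql-results+xml"

chosen = choose_output_format(None, "text/html;q=0.5, text/plain;q=0.9", SUPPORTED, DEFAULT)
print(chosen)  # text/plain
```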

    ...

        
    1 The Context-NTriples format is not an official RDF serialization; it was added for this repository as a convenient way to export quads for testing. Note that it was formerly named NQuads, but there is already a different unofficial format known to the RDF community that is called "NQuads".
    2 This format only supports output; it cannot be read by the repository.
    3 HTML is for interactive viewing only; it cannot be parsed.

    ...

    Args:
    format---Same as for SPARQL result format, same default (SPARQL XML)
    create=(true|false)---When true, invokes alternate function of this service to create RDF metadata for current user (see explanation above).
    firstname=text---ONLY when create=true, the first name value of the created User instance (optional).
    lastname=text---ONLY when create=true, the last name value of the created User instance (optional).
    mbox=text---ONLY when create=true, the mbox value of the created User instance (optional).

    GET Result:
    Response document is a SPARQL tuple result, format determined by the same protocols as for /sparql. It contains the following columns:

    1. uri---resolvable URI of the :Agent instance for this person if any (see below).
    2. username---short user name as given to the login process
    3. firstname---first (given) name, and middle names if any
    4. lastname---user's last (family) name
    5. mbox---user's email address

    Note that the last 3 fields may be empty if that data is not available.
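
A client has to cope with columns that have no value in the result. The sketch below parses a SPARQL XML tuple result (the default format) with the Python standard library and reports unbound or empty columns as None. The sample document is hand-written for illustration, and its values are invented.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_tuple_result(xml_text: str) -> list[dict]:
    """Turn a SPARQL XML tuple result into a list of {column: value} dicts.

    Columns with no binding in a row are reported as None.
    """
    root = ET.fromstring(xml_text)
    columns = [v.get("name") for v in root.findall("sr:head/sr:variable", SPARQL_NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        row = dict.fromkeys(columns)
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding has exactly one child: <uri>, <literal> or <bnode>.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return rows


# Hand-written sample response; the values are made up.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="uri"/><variable name="username"/>
    <variable name="firstname"/><variable name="lastname"/><variable name="mbox"/>
  </head>
  <results>
    <result>
      <binding name="uri"><uri>http://example.org/i/agent-1</uri></binding>
      <binding name="username"><literal>jdoe</literal></binding>
    </result>
  </results>
</sparql>"""

user = parse_tuple_result(SAMPLE)[0]
print(user)
```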

    ...

    This call creates one or more globally unique, resolvable URIs for new resource instances. It does not add any data to the repository; the instances will not exist until a user inserts some statements about them. Each URI consists of the default namespace from the configuration properties, followed immediately by the unique identifier. Note that ETL tools may request thousands of URIs at once, so the mechanism to produce unique IDs must be able to handle that.

    URL: /repository/new (POST only)

    Args:
    count---number of URIs to return; optional, default is 1.
    format---same as for SPARQL result format, same default (SPARQL XML)

    Result:
    The requested number of new URIs are returned, packaged as a SPARQL query result for a field named "new". Its encoding is determined by the format parameter or, if none specified, by the Accept header of the HTTP request. Default is SPARQL/XML.

    Access: Requires an authenticated user.
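
As a rough sketch of a client call, the following builds (but does not send) a POST request for a batch of new URIs. It assumes that the args are accepted as a form-encoded request body and leaves out authentication entirely; the server address is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request


def new_uri_request(base_url: str, count: int = 1, result_format: str = "") -> Request:
    """Build (but do not send) the POST request that asks for `count` new URIs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    params = {"count": str(count)}
    if result_format:
        # Optional; when omitted, the Accept header or the default applies.
        params["format"] = result_format
    return Request(
        base_url.rstrip("/") + "/repository/new",
        data=urlencode(params).encode("ascii"),
        method="POST",
    )


# Placeholder server address; an ETL tool might ask for many URIs at once.
request = new_uri_request("https://localhost:8443", count=500)
print(request.get_method(), request.full_url, request.data)
```

The response would then be read as a tuple result with a single column named "new", one row per URI.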

    Disseminate Instance (/i, /repository/resource)

    The disseminate service returns the RDF content of an instance; it is how the URI is resolved to implement the Linked Open Data paradigm of the Semantic Web. Note that there are actually three valid ways to construct a request for any given data instance:

    1. /i/instance-ID---assumes that URI prefix matches Web server's DNS address, in other words, the configured default URI namespace.
    2. /i?uri=instance-URI---retrieves any instance URI whether the prefix matches the default namespace or not. This allows one repository to resolve multiple domains.
    3. /repository/resource?uri=instance-URI---Just like the /i form, only with authentication required. This is the recommended URL for programs accessing resource contents through the REST API, since /i might not require or make use of authentication credentials.
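
The request forms described above can be generated mechanically. This sketch assumes the default namespace ends right where the instance ID begins (for example, in "/i/"); the server address and instance URIs are placeholders.

```python
from urllib.parse import quote


def dissemination_urls(server: str, default_namespace: str, instance_uri: str) -> list[str]:
    """Return the request URLs that can resolve `instance_uri` on `server`."""
    server = server.rstrip("/")
    encoded = quote(instance_uri, safe="")
    urls = []
    if instance_uri.startswith(default_namespace):
        # The short form only works for URIs in the default namespace.
        instance_id = instance_uri[len(default_namespace):]
        urls.append(f"{server}/i/{instance_id}")
    # The ?uri= forms work for any instance URI.
    urls.append(f"{server}/i?uri={encoded}")
    urls.append(f"{server}/repository/resource?uri={encoded}")
    return urls


SERVER = "https://localhost:8443"
NAMESPACE = "https://localhost:8443/i/"

local = dissemination_urls(SERVER, NAMESPACE, "https://localhost:8443/i/abc123")
foreign = dissemination_urls(SERVER, NAMESPACE, "http://other.example.org/i/xyz")
print(local)
print(foreign)
```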

    URL: /i/instance-ID (GET or POST method)
            /i
           /repository/resource

    Args:
    uri=uri---optional, only if a URI is not specified as the tail of the request URI; an alternate way to explicitly specify the complete URI of the resource to disseminate. Allows any URI to be accessed, instead of assuming that the URI's namespace matches the hostname, context, and servlet path ("/i") to which the repository's webserver responds.
    format=mimetype---optionally override the dissemination format that would be chosen by HTTP content negotiation. Note that choosing text/html results in a special human-readable result.
    view=view---optionally choose a different view dataset (see Views in concepts section) from which to select the graph for dissemination. Mutually exclusive with workspace.
    workspace=uri---URI of the workspace named graph to take the place of the default graph. Relevant metadata and ontology graphs are included automatically. Mutually exclusive with view.
    noinferred (boolean)---Exclude all inferred statements from the generated results. This really only applies to rdf:type statements; if the noinferred query argument is present (it need not have any value) then inferred types are left out of the results.
    forceRDF (boolean)---forces a result of serialized RDF data. When the negotiated result format is text/html, the usual choice is to generate the human-readable view. This forces an HTML rendering of the RDF statements, which can be handy for troubleshooting, especially when combined with noinferred. NOTE: This is the only way to see the Embedded Instance statements in an interactive HTML view, for example, in a web browser, so it is especially handy for generating a clean view for debugging EIs. Default false.
    forceXML (boolean)---when an HTML format would be generated, output the intermediate XML document instead of transforming it to XHTML. This is mainly useful for obtaining examples of the intermediate XML for developing new XSLT stylesheets and testing/debugging. Default is false.
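
A client-side helper can enforce the constraints in this argument list before a request is sent. The sketch below is illustrative only: the function name is hypothetical, and spelling a true boolean as "=true" is an assumption (the text only says that noinferred needs no value).

```python
from urllib.parse import urlencode


def dissemination_args(uri=None, fmt=None, view=None, workspace=None,
                       noinferred=False, force_rdf=False, force_xml=False) -> str:
    """Build the query string for a dissemination request."""
    if view and workspace:
        raise ValueError("view and workspace are mutually exclusive")
    pairs = [(name, value) for name, value in
             (("uri", uri), ("format", fmt), ("view", view), ("workspace", workspace))
             if value]
    flags = []
    if noinferred:
        flags.append("noinferred")      # presence alone is enough
    if force_rdf:
        flags.append("forceRDF=true")   # assumed spelling of a true value
    if force_xml:
        flags.append("forceXML=true")   # assumed spelling of a true value
    return "&".join(part for part in [urlencode(pairs), *flags] if part)


# A "clean" HTML rendering of the raw statements, useful when debugging EIs.
debug = dissemination_args(uri="http://example.org/i/1", fmt="text/html",
                           noinferred=True, force_rdf=True)
print(debug)
```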

    Result:
    Returns a serialization or, optionally, a human-readable HTML rendering of the graph of RDF statements describing the indicated resource instance. Note the deliberate choice of words: This graph includes not only the statements of which the URI is the subject, but also:

    1. Statements describing Embedded Instances in the resource instance, for the same meaning of "describing" (but note that EIs are not recursive; an EI may not have its own EIs).
    2. Statements about the "Label" properties of all predicates and object-property values of the instance and its EIs. (The exact set of properties used for "label" is configurable.) See the SWAG page for more about label properties.
    3. Provenance and some administrative metadata about the instance.

    Note that depending on the authenticated roles of the requesting user and the configured access controls, some properties may be excluded from this result. For example, in some cases, unauthenticated users will not see certain properties which may contain confidential information.

    About HTML dissemination:
    When the negotiated format is text/html, and unless either of the forceRDF or forceXML args was given, the dissemination process creates an intermediate XML document and transforms it into XHTML with the configured XSLT stylesheet. (See the description of eaglei.repository.instance.xslt in the Repository Administrator Guide.)

    If no XSLT stylesheet is configured, the intermediate XML document is delivered instead, with a content type of application/xml. Note that this means, to obtain correct XHTML output, you MUST configure an XSLT stylesheet.

    The content of the intermediate XML format is described in a W3C XML Schema document that may be downloaded from a running repository at, for example, https://localhost:8443/repository/schemas/instance.xsd

    We provide an example transformation stylesheet that produces very simple HTML, intended to be the basis of custom stylesheets. It is available for download at, for example, https://localhost:8443/repository/styles/example.xsl

    We manage the transformation within the repository, instead of adding an xml-stylesheet processing instruction to the XML, for compelling reasons:

    1. Client-side transformation is a relatively new concept and we cannot trust it to be implemented reliably and consistently.
    2. Some user agents, such as Web crawlers, depend on receiving true HTML content and so they should not be given XML. This might be determined by content negotiation but we do not expect that to be reliably implemented either.
    3. Some aspects of the transformation depend on parameters in the stylesheet (see below) that are supplied at run-time from repository configuration values.

    The transformation stylesheet is supplied with these parameters when it is invoked. They should be declared with <xsl:param name="..."/> directives in the XSL. Be sure your stylesheet can cope with parameters that are not set, by supplying default values.

    • __repo_version---string containing Maven version spec of the running repository code. This is always set.
    • __repo_css---configured value of eaglei.repository.instance.css, may not be set.
    • __repo_logo---configured value of eaglei.repository.logo, may not be set.
    • __repo_title---configured value of eaglei.repository.title, may not be set.

    Property Filtering

    The set of properties returned in the HTML view is based on the same result as RDF disseminations, which is automatically filtered as necessary for the requesting user's access level.
    Access:

    Requires read permission on all named graphs in the query's chosen view. Note that this is the ONLY service available to unauthenticated users, so it must be able to gather a useful result from named graphs readable by the Anonymous role. If you do access this service with credentials, you will be able to see instances and properties that would be invisible to an unauthenticated user, for example, instances in private workspaces that are still in unpublished workflow states.

    ...

    Anyone with READ access on the URI gets to see the properties. To expose them to the world you'd give access to the Anonymous role. Normally only Curator, RNAV, and Lab User roles would be granted access to hidden and contact properties since they have to see and manipulate them through the data tools.

    Update a Single Instance (/update)

    This service actually implements three different kinds of request:

    1. Create a new resource instance (including embedded instances) - must specify its home graph.
    2. Obtain the edit token with which to modify an existing resource instance.
    3. Modify ("update") the properties of a resource instance, both delete and add in one transaction. This includes adding, deleting, and modifying any embedded instances.

    Update operation does all its work in the instance's home named graph. For an existing instance being modified, it is computed as the named graph in which the asserted rdf:type statement(s) are found.

    When creating new resources: Since the create operation hasn't got an instance from which to derive its home graph, its home graph must be specified as the workspace arg.

    Workflow Implications of Creating New Resources: Since the /update action that creates a new resource instance is effectively performing a transition from the New workflow state, the current user must have permission to make such a transition to the destination workspace; if there are multiple transitions, one is chosen arbitrarily.

    On the Acquiring and Use of Edit Tokens: The edit token is intended to "protect" the read-only copy of the instance that you (presumably) download as a basis for edits. The correct sequence of operations when modifying an instance is:

    1. Obtain the edit token.
    2. Get RDF for the resource instance.
    3. Submit an /update request to modify the resource instance, with token (1).

    This ensures that no matter how much time passes between (2) and (3), if, for example, a user dawdles over an interactive edit or forgets and leaves it overnight, the edit token is already in place to indicate his/her intention to make a change. It does not prevent another user from coming along and grabbing the token to make a change, but it will indicate that there is an edit in progress, and it will prevent a stale copy from being checked in.
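
The token-first sequence can be sketched with an injected transport, so the ordering is visible without a running repository. Everything about the transport is hypothetical: `send(method, path, params)` stands for whatever HTTP client is in use, how the insert and delete graphs are encoded on the wire is left to it, and the fake responses below are invented.

```python
def edit_instance(send, instance_uri, make_changes):
    """Run the token-first edit sequence through an injected transport.

    `send(method, path, params)` is a hypothetical callable that performs one
    HTTP request and returns the decoded response.
    `make_changes(rdf)` returns the (delete, insert) graphs for the edit.
    """
    # 1. Obtain the edit token before reading, so the read copy is covered.
    token_row = send("POST", "/repository/update",
                     {"uri": instance_uri, "action": "gettoken"})
    # 2. Fetch the current RDF that the edit will be based on.
    rdf = send("GET", "/repository/resource", {"uri": instance_uri})
    # 3. Submit the update together with the token from step 1.
    delete, insert = make_changes(rdf)
    return send("POST", "/repository/update",
                {"uri": instance_uri, "action": "update",
                 "token": token_row["token"], "delete": delete, "insert": insert})


# A fake transport that records the calls instead of touching the network.
calls = []


def fake_send(method, path, params):
    calls.append((method, path, params.get("action")))
    if params.get("action") == "gettoken":
        return {"token": "http://example.org/token/1", "new": True}
    return "<rdf/>"


edit_instance(fake_send, "http://example.org/i/1", lambda rdf: ("", "<new statements>"))
print(calls)
```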

    Comparison with SPARQL/UPDATE: In case you are wondering why we chose to implement this complex special-purpose service instead of a general protocol like SPARQL/UPDATE - there were some compelling reasons:

    1. Too difficult to impose the necessary fine-grained access controls on SPARQL/UPDATE. This service is defined to be a transaction that only affects one resource instance (and its embedded instances).
    2. Major benefit of S/U protocol is accepting a list of deletes and a list of inserts, which we do here.
    3. SPARQL/UPDATE is still not a recognized standard, and implementations are poor.

    When action=create, there must be no existing statements in the repo with the given URI as a subject. The request must include an insert arg containing one or more statements whose predicate is rdf:type. (All of the subjects must match the request URI). It is an error to specify a delete arg.
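
A client can check some of these rules locally before sending a create request. The sketch below is an assumption-laden illustration: it reads the parenthetical as applying to every inserted statement, ignores embedded instances, and cannot check that the URI is unused, since that requires the repository. Statements are plain (subject, predicate, object) string tuples.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def check_create_request(uri, insert, delete=None):
    """Return a list of problems with an action=create request (empty if none).

    `insert` and `delete` are iterables of (subject, predicate, object) strings.
    """
    problems = []
    insert = list(insert or [])
    if delete:
        problems.append("delete must not be specified when action=create")
    if not any(p == RDF_TYPE for _, p, _ in insert):
        problems.append("insert needs at least one rdf:type statement")
    if any(s != uri for s, _, _ in insert):
        problems.append("every subject must be the request URI")
    return problems


URI = "http://example.org/i/1"  # hypothetical new instance URI
good = check_create_request(URI, [(URI, RDF_TYPE, "http://example.org/ont/Thing")])
bad = check_create_request(
    URI,
    [("http://example.org/i/other", "http://www.w3.org/2000/01/rdf-schema#label", "x")],
    delete=[(URI, RDF_TYPE, "http://example.org/ont/Thing")],
)
print(good, bad)
```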

    ...

    Content-Type header on the value of the content entity in a POST request. This takes precedence because it allows for different content-types in insert and delete args.
    format query argument value.

    URL: /repository/update[/instance-ID] (POST only)

    Args:
    uri - optional way to explicitly specify the complete URI, instead of assuming that the URI's namespace matches the hostname, context, and servlet path ("/i") of this webserver.
    format - the default expected format for insert and delete graphs. If the args specify a content-type header, that overrides this value. Only recognizes triples even if the format supports quads.
    action=(update|create|gettoken) - Update to modify an existing instance, create adds a new one. See below for details about gettoken.
    token=uri - When action is update or create, this must be supplied. The value is the URI returned by the last gettoken request.
    workspace=uri - Choose workspace named graph where new instance is created. Only necessary when action=create. Optional, default is the default workspace. DO NOT specify a workspace when action=update.
    delete - graph of statements to remove from the instance; subject must be the instance URI. Deletes are done before inserts. Graph may include wildcard URIs in predicate and/or object to match all values in that part of a statement.
    insert - graph of statements to add to instance; subject must be the instance URI.
    bypassSanity - (boolean, default false, deprecated) - NOTE: It is best if you pretend this option does not exist. When true, it skips some of the sanity tests on the resulting instance graph, mostly the ones checking the integrity of Embedded Instances. Requires Administrator privilege. This was added to make the data migration from broken old EI data possible; it should rarely if ever be needed now.

    The delete wildcard URI is http://eagle-i.org/ont/repo/1.0/MatchAnything
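
As an illustration of how the wildcard might appear in a delete graph, this sketch writes single statements in N-Triples syntax. Whether N-Triples is among the accepted input formats is an assumption here, the instance URI is a placeholder, and only URI objects are handled.

```python
MATCH_ANYTHING = "http://eagle-i.org/ont/repo/1.0/MatchAnything"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def delete_pattern(instance_uri, predicate=None, obj=None):
    """Return one N-Triples statement for a delete graph.

    A predicate or object left as None is replaced by the wildcard URI.
    `obj`, when given, must be a URI; literals are not handled in this sketch.
    """
    predicate = predicate or MATCH_ANYTHING
    obj = obj or MATCH_ANYTHING
    return f"<{instance_uri}> <{predicate}> <{obj}> ."


URI = "http://example.org/i/1"  # hypothetical instance URI

# Remove every value of rdfs:label, whatever it currently is.
all_labels = delete_pattern(URI, RDFS_LABEL)

# Remove every statement about the instance.
everything = delete_pattern(URI)

print(all_labels)
print(everything)
```

Since deletes are done before inserts, pairing a wildcard delete with an insert of the new value replaces a property without the client having to know the old value.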

    Result:
    HTTP status indicates success or failure. Any modifications are transactional; on success the entire change was made, and upon failure nothing gets changed.

    ...

    token - URI of the edit token. It has no meaning other than its use in an update transaction. This is the last edit token created (and not yet "used up" by an update) on this instance; or else a new one that was created if there wasn't one available.
    created - literal timestamp at which the token was created. It may be useful to display the date to the user if there was an existing token of recent vintage.
    creator - URI of the user who created the token
    new - boolean literal that is true if this gettoken operation created a new token. When false, that means the token already existed, which indicates there MAY already be another user's update in progress which might conflict with yours (see the created and creator values).
    creatorLabel - rdfs:label of the creator if available.
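
A client might use these columns to warn the user about a possible concurrent edit. The sketch below assumes the row has already been decoded into a dict with `new` as a Python bool; treating the user's own earlier token as harmless is a choice of this example, not something the service prescribes. All values are invented.

```python
def token_conflict_warning(row, current_user_uri):
    """Return a warning string if the token suggests another edit in progress, else None.

    `row` maps the column names listed above to their decoded values.
    """
    if row["new"] or row["creator"] == current_user_uri:
        return None
    who = row.get("creatorLabel") or row["creator"]
    return f"{who} may already be editing this instance (token created {row['created']})"


ME = "http://example.org/i/agent-1"  # hypothetical current user

fresh = {"token": "http://example.org/token/1", "created": "2011-05-01T10:00:00Z",
         "creator": ME, "new": True, "creatorLabel": "J. Doe"}
stale = {"token": "http://example.org/token/2", "created": "2011-04-30T17:45:00Z",
         "creator": "http://example.org/i/agent-2", "new": False, "creatorLabel": "A. Smith"}

print(token_conflict_warning(fresh, ME))
print(token_conflict_warning(stale, ME))
```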

    Access:
    Requires ADD access to either the instance itself or its home named graph if the insert argument was given, and REMOVE access on the instance or the graph if the delete argument was given. When action=create, requires READ access on an appropriate Workflow Transition out of the New state.

    ...