tech-docs

Technical documentation for ArchivesSpace

View the Project on GitHub archivesspace/tech-docs

The ArchivesSpace backend

The backend is responsible for implementing the ArchivesSpace API, and supports the sort of access patterns shown in the previous section. We’ve seen that the backend must support CRUD operations against a number of different record types, and those records as expressed as JSON documents produced from instances of JSONModel classes.

The following sections describe how the backend fits together.

main.rb – load and initialize the system

The main.rb program is responsible for starting the ArchivesSpace system: loading all controllers and models, creating users/groups/permissions as needed, and preparing the system to handle requests.

When the system starts up, the main.rb program performs the following actions:

In addition to handling the system startup, main.rb also provides the following facilities:

rest.rb – Request and response handling for REST endpoints

The rest.rb module provides the mechanism used to define the API’s REST endpoints. Each endpoint definition includes:

Each controller in the system consists of one or more of these endpoint definitions. By using the endpoint syntax provided by rest.rb, the controllers can declare the interface they provide, and are freed of having to perform the sort of boilerplate associated with request handling–check parameter types, coerce values from strings into other types, and so on.

The main.rb and rest.rb components work together to insulate the controllers from much of the complexity of request handling. By the time a request reaches the body of an endpoint:

Controllers

As touched upon in the previous section, controllers implement the functionality of the ArchivesSpace API by registering one or more endpoints. Each endpoint accepts a HTTP request for a given URI, carries out the request and returns a JSON response (if successful) or throws an exception (if something goes wrong).

Each controller lives in its own file, and these can be found in the backend/app/controllers directory. Since most of the request handling logic is captured by the rest.rb module, controllers generally don’t do much more than coordinate the classes from the model layer and send a response back to the client.

crud_helpers.rb – capturing common CRUD controller actions

Even though controllers are quite thin, there’s still a lot of overlap in their behaviour. Each record type in the system supports the same set of CRUD operations, and from the controller’s point of view there’s not much difference between an update request for an accession and an update request for a digital object (for example).

The crud_helpers.rb module pulls this commonality into a set of helper methods that are invoked by each controller, providing methods for the standard operations of the system.

Models

The backend’s model layer is where the action is. The model layer’s role is to bridge the gap between the high-level JSONModel objects (complete with their properties, nested records, references to other records, etc.) and the underlying relational database (via the Sequel database toolkit). As such, the model layer is mainly concerned with mapping JSONModel instances to database tables in a way that preserves everything and allows them to be queried efficiently.

Each record type has a corresponding model class, but the individual model definitions are often quite sparse. This is because the different record types differ in the following ways:

The first of these–the set of allowable properties–is already captured by the JSONModel schema definitions, so the model layer doesn’t have to enforce these restrictions. Each model can simply take the values supplied by the JSONModel object it is passed and assume that everything that needs to be there is there, and that validation has already happened.

The remaining two aspects are enforced by the model layer, but generally don’t pertain to just a single record type. For example, an accession may be linked to zero or more subjects, but so can several other record types, so it doesn’t make sense for the Accession model to contain the logic for handling subjects.

In practice we tend to see very little functionality that belongs exclusively to a single record type, and as a result there’s not much to put in each corresponding model. Instead, models are generally constructed by combining a number of mix-ins (Ruby modules) to satisfy the requirements of the given record type. Features à la carte!

ASModel and other mix-ins

At a minimum, every model includes the ASModel mix-in, which provides base versions of the following methods:

These methods comprise the primary interface of the model layer: virtually every mix-in in the model layer overrides one or all of these to add behaviour in a modular way.

For example, the ‘notes’ mix-in adds support for multiple notes to be added to a record type–by mixing this module into a model class, that class will automatically accept a JSONModel property called ‘notes’ that will be stored and retrieved to and from the database as needed. This works by overriding the three methods as follows:

All of the mix-ins follow this pattern: call ‘super’ to delegate the call to the next mix-in in the chain (eventually reaching ASModel), then manipulate the result to implement the desired behaviour.

Nested records

Some record types, like accessions, digital objects, and subjects, are top-level records, in the sense that they are created independently of any other record and are addressable via their own URI. However, there are a number of records that can’t exist in isolation, and only exist in the context of another record. When one record can contain instances of another record, we call them nested records.

To give an example, the date record type is nested within an accession record (among others). When the model layer is asked to save a JSONModel instance containing nested records, it must pluck out those records, save them in the appropriate database table, and ensure that linkages are created within the database to allow them to be retrieved later.

This happens often enough that it would be tedious to write code for each model to handle its nested records, so the ASModel mix-in provides a declaration to handle this automatically. For example, the accession model uses a definition like:

 base.def_nested_record(:the_property => :dates,
                        :contains_records_of_type => :date,
                        :corresponding_to_association  => :date)

When creating an accession, this declaration instructs the Accession model to create a database record for each date listed in the “dates” property of the incoming record. Each of these date records will be automatically linked to the created accession.

Relationships

A relationship is a link between two top-level records, where the link is a separate, dynamically generated, model with zero or more properties of its own.

For example, the Event model can be related to several different types of records:

 define_relationship(:name => :event_link,
                     :json_property => 'linked_records',
                     :contains_references_to_types => proc {[Accession, Resource, ArchivalObject]})

This declaration generates a custom class that models the relationship between events and the other record types. The corresponding JSON schema declaration for the linked_records property looks like this:

  "linked_records" => {
    "type" => "array",
    "ifmissing" => "error",
    "minItems" => 1,
    "items" => {
      "type" => "object",
      "subtype" => "ref",
      "properties" => {
        "role" => {
          "type" => "string",
          "dynamic_enum" => "linked_event_archival_record_roles",
          "ifmissing" => "error",
        },
        "ref" => {
          "type" => [{"type" => "JSONModel(:accession) uri"},
                     {"type" => "JSONModel(:resource) uri"},
                     {"type" => "JSONModel(:archival_object) uri"},
                     ...],
          "ifmissing" => "error"
        },
      ...

That is, the property includes URI references to other records, plus an additional “role” property to indicate the nature of the relationship. The corresponding JSON might then be:

linked_records: [{ref: '/repositories/123/accessions/456', role: 'authorizer'}, ...]

The define_relationship definition automatically makes use of the appropriate join tables in the database to store this relationship and retrieve it later as needed.

Agents and agent_manager.rb

Agents present a bit of a representational challenge. There are four types of agents (person, family, corporate entity, software), and at a high-level they are structured in the same way: each type can contain one or more name records, zero or more contact records, and a number of properties. Records that link to agents (via a relationship, for example) can link to any of the four types so, in some sense, each agent type implements a common Agent interface.

However, the agent types differ in their details. Agents contain name records, but the types of those name records correspond to the type of the agent: a person agent contains a person name record, for example. So, in spite of their similarities, the different agents need to be modelled as separate record types.

The agent_manager module captures the high-level similarities between agents. Each agent model includes the agent manager mix-in:

 include AgentManager::Mixin

and then defines itself declaratively by the provided class method:

 register_agent_type(:jsonmodel => :agent_person,
                     :name_type => :name_person,
                     :name_model => NamePerson)

This definition sets up the properties of that agent. It creates:

Validations

As records are added to and updated within the ArchivesSpace system, they are validated against a number of rules to make sure they are well-formed and don’t conflict with other records. There are two types of record validation:

Record-level validations can be performed in isolation, while system-level records require comparing the record to others in the database.

System-level validations need to be implemented in the database itself (as integrity constraints), but record-level validations are often too complex to be expressed this way. As a result, validations in ArchivesSpace can appear in one or both of the following layers:

As a general rule, record-level validations are handled by the JSONModel validations (either through the JSON schema or custom validations), while system-level validations are handled by the model and the database schema.

Optimistic concurrency control

Updating a record using the ArchivesSpace API is a two part process:

 # Perform a `GET` against the desired record to fetch its JSON
 # representation:

   GET /repositories/5/accessions/2

 # Manipulate the JSON representation as required, and then `POST`
 # it back to replace the original:

   POST /repositories/5/accessions/2

If two people do this simultaneously, there’s a risk that one person would silently overwrite the changes made by the other. To prevent this, every record is marked with a version number that it carries in the lock_version property. When the system receives the updated copy of a record, it checks that the version it carries is still current; if the version number doesn’t match the one stored in the database, the update request is rejected and the user must re-fetch the latest version before applying their update.

The ArchivesSpace permissions model

The ArchivesSpace backend enforces access control, defining which users are allowed to create, read, update, suppress and delete the records in the system. The major actors in the permissions model are:

To summarize, a user can perform an action within a repository if they are a member of a group that has been assigned permission to perform that action.

Conceptual trickery

Since they’re repository-scoped, groups govern access to repositories. However, there are several record types that exist at the top-level of the system (such as the repositories themselves, subjects and agents), and the permissions model must be able to accommodate these.

To get around this, we invent a concept: the “global” repository conceptually contains the whole ArchivesSpace universe. As with other repositories, the global repository contains groups, and users can be made members of these groups to grant them permissions across the entire system. One example of this is the “admin” user, which is granted all permissions by the “administrators” group of the global repository; another is the “search indexer” user, which can read (but not update or delete) any record in the system.