Technical documentation for ArchivesSpace
View the Project on GitHub archivesspace/tech-docs
The backend is responsible for implementing the ArchivesSpace API, and supports the sort of access patterns shown in the previous section. We’ve seen that the backend must support CRUD operations against a number of different record types, and those records as expressed as JSON documents produced from instances of JSONModel classes.
The following sections describe how the backend fits together.
The main.rb
program is responsible for starting the ArchivesSpace
system: loading all controllers and models, creating
users/groups/permissions as needed, and preparing the system to handle
requests.
When the system starts up, the main.rb
program performs the
following actions:
In addition to handling the system startup, main.rb
also provides
the following facilities:
X-ArchivesSpace-Session
request header.The rest.rb
module provides the mechanism used to define the API’s
REST endpoints. Each endpoint definition includes:
Each controller in the system consists of one or more of these
endpoint definitions. By using the endpoint syntax provided by
rest.rb
, the controllers can declare the interface they provide, and
are freed of having to perform the sort of boilerplate associated
with request handling–check parameter types, coerce values from
strings into other types, and so on.
The main.rb
and rest.rb
components work together to insulate the
controllers from much of the complexity of request handling. By the
time a request reaches the body of an endpoint:
As touched upon in the previous section, controllers implement the functionality of the ArchivesSpace API by registering one or more endpoints. Each endpoint accepts a HTTP request for a given URI, carries out the request and returns a JSON response (if successful) or throws an exception (if something goes wrong).
Each controller lives in its own file, and these can be found in the
backend/app/controllers
directory. Since most of the request
handling logic is captured by the rest.rb
module, controllers
generally don’t do much more than coordinate the classes from the
model layer and send a response back to the client.
Even though controllers are quite thin, there’s still a lot of overlap in their behaviour. Each record type in the system supports the same set of CRUD operations, and from the controller’s point of view there’s not much difference between an update request for an accession and an update request for a digital object (for example).
The crud_helpers.rb
module pulls this commonality into a set of
helper methods that are invoked by each controller, providing methods
for the standard operations of the system.
The backend’s model layer is where the action is. The model layer’s role is to bridge the gap between the high-level JSONModel objects (complete with their properties, nested records, references to other records, etc.) and the underlying relational database (via the Sequel database toolkit). As such, the model layer is mainly concerned with mapping JSONModel instances to database tables in a way that preserves everything and allows them to be queried efficiently.
Each record type has a corresponding model class, but the individual model definitions are often quite sparse. This is because the different record types differ in the following ways:
The first of these–the set of allowable properties–is already captured by the JSONModel schema definitions, so the model layer doesn’t have to enforce these restrictions. Each model can simply take the values supplied by the JSONModel object it is passed and assume that everything that needs to be there is there, and that validation has already happened.
The remaining two aspects are enforced by the model layer, but
generally don’t pertain to just a single record type. For example, an
accession may be linked to zero or more subjects, but so can several
other record types, so it doesn’t make sense for the Accession
model
to contain the logic for handling subjects.
In practice we tend to see very little functionality that belongs exclusively to a single record type, and as a result there’s not much to put in each corresponding model. Instead, models are generally constructed by combining a number of mix-ins (Ruby modules) to satisfy the requirements of the given record type. Features à la carte!
At a minimum, every model includes the ASModel
mix-in, which provides
base versions of the following methods:
Model.create_from_json
– Take a JSONModel instance and create a
model instance (a subclass of Sequel::Model) from it. Returns the
instance.model.update_from_json
– Update the target model instance with
the values from a given JSONModel instance.Model.sequel_to_json
– Return a JSONModel instance of the appropriate
type whose values are taken from the target model instance.
Model classes are declared to correspond to a particular JSONModel
instance when created, so this method can automatically return a
JSONModel instance of the appropriate type.These methods comprise the primary interface of the model layer: virtually every mix-in in the model layer overrides one or all of these to add behaviour in a modular way.
For example, the ‘notes’ mix-in adds support for multiple notes to be added to a record type–by mixing this module into a model class, that class will automatically accept a JSONModel property called ‘notes’ that will be stored and retrieved to and from the database as needed. This works by overriding the three methods as follows:
Model.create_from_json
– Call ‘super’ to delegate the creation to
the next mix-in in the chain. When it returns the newly created
object, extract the notes from the JSONModel instance and attach
them to the model instance (saving them in the database).model.update_from_json
– Call ‘super’ to save the other updates
to the database, then replace any existing notes entries for the
record with the ones provided by the JSONModel.Model.sequel_to_json
– Call ‘super’ to have the next mix-in in
the chain create a JSONModel instance, then pull the stored notes
from the database and poke them into it.All of the mix-ins follow this pattern: call ‘super’ to delegate the call to the next mix-in in the chain (eventually reaching ASModel), then manipulate the result to implement the desired behaviour.
Some record types, like accessions, digital objects, and subjects, are top-level records, in the sense that they are created independently of any other record and are addressable via their own URI. However, there are a number of records that can’t exist in isolation, and only exist in the context of another record. When one record can contain instances of another record, we call them nested records.
To give an example, the date
record type is nested within an
accession
record (among others). When the model layer is asked to
save a JSONModel instance containing nested records, it must pluck out
those records, save them in the appropriate database table, and ensure
that linkages are created within the database to allow them to be
retrieved later.
This happens often enough that it would be tedious to write code for
each model to handle its nested records, so the ASModel mix-in
provides a declaration to handle this automatically. For example, the
accession
model uses a definition like:
base.def_nested_record(:the_property => :dates,
:contains_records_of_type => :date,
:corresponding_to_association => :date)
When creating an accession, this declaration instructs the Accession
model to create a database record for each date listed in the “dates”
property of the incoming record. Each of these date records will be
automatically linked to the created accession.
A relationship is a link between two top-level records, where the link is a separate, dynamically generated, model with zero or more properties of its own.
For example, the Event
model can be related to several different
types of records:
define_relationship(:name => :event_link,
:json_property => 'linked_records',
:contains_references_to_types => proc {[Accession, Resource, ArchivalObject]})
This declaration generates a custom class that models the relationship
between events and the other record types. The corresponding JSON
schema declaration for the linked_records
property looks like this:
"linked_records" => {
"type" => "array",
"ifmissing" => "error",
"minItems" => 1,
"items" => {
"type" => "object",
"subtype" => "ref",
"properties" => {
"role" => {
"type" => "string",
"dynamic_enum" => "linked_event_archival_record_roles",
"ifmissing" => "error",
},
"ref" => {
"type" => [{"type" => "JSONModel(:accession) uri"},
{"type" => "JSONModel(:resource) uri"},
{"type" => "JSONModel(:archival_object) uri"},
...],
"ifmissing" => "error"
},
...
That is, the property includes URI references to other records, plus an additional “role” property to indicate the nature of the relationship. The corresponding JSON might then be:
linked_records: [{ref: '/repositories/123/accessions/456', role: 'authorizer'}, ...]
The define_relationship
definition automatically makes use of the
appropriate join tables in the database to store this relationship and
retrieve it later as needed.
agent_manager.rb
Agents present a bit of a representational challenge. There are four
types of agents (person, family, corporate entity, software), and at a
high-level they are structured in the same way: each type can contain
one or more name records, zero or more contact records, and a number
of properties. Records that link to agents (via a relationship, for
example) can link to any of the four types so, in some sense, each
agent type implements a common Agent
interface.
However, the agent types differ in their details. Agents contain name records, but the types of those name records correspond to the type of the agent: a person agent contains a person name record, for example. So, in spite of their similarities, the different agents need to be modelled as separate record types.
The agent_manager
module captures the high-level similarities
between agents. Each agent model includes the agent manager mix-in:
include AgentManager::Mixin
and then defines itself declaratively by the provided class method:
register_agent_type(:jsonmodel => :agent_person,
:name_type => :name_person,
:name_model => NamePerson)
This definition sets up the properties of that agent. It creates:
As records are added to and updated within the ArchivesSpace system, they are validated against a number of rules to make sure they are well-formed and don’t conflict with other records. There are two types of record validation:
Record-level validations can be performed in isolation, while system-level records require comparing the record to others in the database.
System-level validations need to be implemented in the database itself (as integrity constraints), but record-level validations are often too complex to be expressed this way. As a result, validations in ArchivesSpace can appear in one or both of the following layers:
common/validations.rb
file, allowing validation
logic to be expressed using arbitrary Ruby code.As a general rule, record-level validations are handled by the JSONModel validations (either through the JSON schema or custom validations), while system-level validations are handled by the model and the database schema.
Updating a record using the ArchivesSpace API is a two part process:
# Perform a `GET` against the desired record to fetch its JSON
# representation:
GET /repositories/5/accessions/2
# Manipulate the JSON representation as required, and then `POST`
# it back to replace the original:
POST /repositories/5/accessions/2
If two people do this simultaneously, there’s a risk that one person
would silently overwrite the changes made by the other. To prevent
this, every record is marked with a version number that it carries in
the lock_version
property. When the system receives the updated
copy of a record, it checks that the version it carries is still
current; if the version number doesn’t match the one stored in the
database, the update request is rejected and the user must re-fetch
the latest version before applying their update.
The ArchivesSpace backend enforces access control, defining which users are allowed to create, read, update, suppress and delete the records in the system. The major actors in the permissions model are:
update_accession_record
permission is allowed to
update accessions for a repository.To summarize, a user can perform an action within a repository if they are a member of a group that has been assigned permission to perform that action.
Since they’re repository-scoped, groups govern access to repositories. However, there are several record types that exist at the top-level of the system (such as the repositories themselves, subjects and agents), and the permissions model must be able to accommodate these.
To get around this, we invent a concept: the “global” repository conceptually contains the whole ArchivesSpace universe. As with other repositories, the global repository contains groups, and users can be made members of these groups to grant them permissions across the entire system. One example of this is the “admin” user, which is granted all permissions by the “administrators” group of the global repository; another is the “search indexer” user, which can read (but not update or delete) any record in the system.