This document describes two aspects of running ArchivesSpace in a clustered environment: for load-balancing purposes, and for supporting multiple tenants (isolated installations of the system in a common deployment environment).
The configuration described in this document is one possible approach, but it is not intended to be prescriptive: the application layer of ArchivesSpace is stateless, so any mechanism you prefer for load balancing across web applications should work just as well as the one described here.
Unless otherwise stated, it is assumed that you have root access on your machines, and all commands are to be run as root (or with sudo).
This document assumes an architecture with the following components:

- A load balancer machine running the Nginx web server, which accepts requests from the outside world and distributes them across the application servers
- Two (or more) application servers, each running the ArchivesSpace application
- A MySQL database server
- A shared filesystem (such as an NFS volume) mounted at /aspace on each machine

The files/ directory in this repository (in the same directory as this README.md) contains what will become the contents of the /aspace directory, shared by all servers. It has the following layout:
/aspace
├── archivesspace
│   ├── config
│   │   ├── config.rb
│   │   └── tenant.rb
│   ├── software
│   └── tenants
│       └── _template
│           └── archivesspace
│               ├── config
│               │   ├── config.rb
│               │   └── instance_hostname.rb.example
│               └── init_tenant.sh
└── nginx
    └── conf
        ├── common
        │   └── server.conf
        └── tenants
            └── _template.conf.example
The highlights:

- /aspace/archivesspace/config/config.rb – A global configuration file for all ArchivesSpace instances. Any configuration options added to this file will be applied to all tenants on all machines.
- /aspace/archivesspace/software/ – This directory will hold the master copies of the archivesspace.zip distribution. Each tenant will reference one of the versions of the ArchivesSpace software in this directory.
- /aspace/archivesspace/tenants/ – Each tenant will have a sub-directory under here, based on the _template directory provided. This holds the configuration files for each tenant.
- /aspace/archivesspace/tenants/[tenant name]/config/config.rb – The global configuration file for [tenant name]. This contains tenant-specific options that should apply to all of the tenant’s ArchivesSpace instances (such as their database connection settings).
- /aspace/archivesspace/tenants/[tenant name]/config/instance_[hostname].rb – The configuration file for a tenant’s ArchivesSpace instance running on a particular machine. This allows configuration options to be set on a per-machine basis (for example, setting different ports for different application servers).
- /aspace/nginx/conf/common/server.conf – Global Nginx configuration settings (applying to all tenants).
- /aspace/nginx/conf/tenants/[tenant name].conf – A tenant-specific Nginx configuration file. Used to set the URLs of each tenant’s ArchivesSpace instances.

We’ll assume you already have the following ready to go:
- Three servers (which we’ll call loadbalancer, apps1 and apps2).
- A shared filesystem mounted at /aspace on each machine. All machines should have full read/write access to this area.
- A local directory called /aspace.local on each machine, which will store instance-specific files (such as log files and Solr indexes). Ideally this is just a directory on local disk.

Start by copying the directory structure from files/ into your /aspace volume. This will contain all of the configuration files shared between servers:
mkdir /var/tmp/aspace/
cd /var/tmp/aspace/
unzip -x /path/to/archivesspace.zip
cp -av archivesspace/clustering/files/* /aspace/
You can do this on any machine that has access to the shared /aspace/ volume.
On your application servers (apps1 and apps2) you will need to install the supplied init script:
cp -a /aspace/aspace-cluster.init /etc/init.d/aspace-cluster
chkconfig --add aspace-cluster
This will start all configured instances when the system boots up, and can also be used to start/stop individual instances.
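For example, once a tenant has been defined (we set one up called exampletenant later in this document), typical invocations on an application server look like the sketch below. The start-tenant action appears later in this document; start is the standard init-script action that brings up every configured instance on the machine.

# Start every configured instance on this machine (this is what runs at boot):
/etc/init.d/aspace-cluster start

# Start just one tenant's instance:
/etc/init.d/aspace-cluster start-tenant exampletenant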
You will need to install Nginx on your loadbalancer machine, which you can do by following the directions at http://nginx.org/en/download.html. Using the pre-built packages for your platform is fine. At the time of writing, the process for CentOS is simply:
wget http://nginx.org/packages/centos/6/noarch/RPMS/nginx-release-centos-6-0.el6.ngx.noarch.rpm
rpm -i nginx-release-centos-6-0.el6.ngx.noarch.rpm
yum install nginx
Nginx will place its configuration files under /etc/nginx/. For now, the only change we need to make is to configure Nginx to load our tenants’ configuration files. To do this, edit /etc/nginx/conf.d/default.conf and add the line:
include /aspace/nginx/conf/tenants/*.conf;
Note: the location of Nginx’s main config file might vary between systems. Another likely candidate is /etc/nginx/nginx.conf.
Rather than having every tenant maintain their own copy of the ArchivesSpace software, we put a shared copy under /aspace/archivesspace/software/ and have each tenant instance refer to that copy. To set this up, run the following commands on any one of the servers:
cd /aspace/archivesspace/software/
unzip -x /path/to/downloaded/archivesspace-x.y.z.zip
mv archivesspace archivesspace-x.y.z
ln -s archivesspace-x.y.z stable
Note that we unpack the distribution into a directory containing its version number, and then assign that version the symbolic name “stable”. This gives us a convenient way of referring to particular versions of the software, and we’ll use this later on when setting up our tenant.
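When a new release comes along, the same pattern gives a low-risk upgrade path: unpack the new version alongside the old one and repoint the symlink (individual tenants will still need any release-specific steps, such as database migrations). A sketch, using a made-up version number:

cd /aspace/archivesspace/software/
unzip -x /path/to/archivesspace-2.1.0.zip
mv archivesspace archivesspace-2.1.0
# -sfn replaces the existing "stable" symlink in place:
ln -sfn archivesspace-2.1.0 stable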
We’ll be using MySQL, which means we must make the MySQL connector library available. To do this, place it in the lib/ directory of the ArchivesSpace package:
cd /aspace/archivesspace/software/stable/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.24/mysql-connector-java-5.1.24.jar
With our server setup out of the way, we’re ready to define our first tenant. As shown in the overview of files above, each tenant has their own directory under /aspace/archivesspace/tenants/ that holds all of their configuration files. In defining our new tenant, we will create a dedicated Unix account for them, create their MySQL database, and set up their configuration files under /aspace/archivesspace/tenants/.
Our newly defined tenant won’t initially have any ArchivesSpace instances, but we’ll set those up afterwards.
To complete the remainder of this process, there are a few bits of information you will need. In particular, you will need to know:

- The name of your new tenant. We’ll use exampletenant.
- The hostnames the tenant’s two web interfaces will be served from. We’ll use staff.example.com for the staff interface, and public.example.com for the public interface.

Although not strictly required, for security and ease of system monitoring it’s a good idea to have each tenant instance running under a dedicated Unix account.
We will call our new tenant exampletenant, so let’s create a user and group for them now. You will need to run these commands on both application servers (apps1 and apps2):
groupadd --gid 2000 exampletenant
useradd --uid 2000 --gid 2000 exampletenant
Note that we specify a UID and GID explicitly to ensure they match across machines.
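A quick way to confirm the accounts really do line up is to compare the id output on both machines:

id exampletenant
# Expect the same values on apps1 and apps2, along the lines of:
# uid=2000(exampletenant) gid=2000(exampletenant) groups=2000(exampletenant)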
ArchivesSpace assumes that each tenant will have their own MySQL database. You can create this from the MySQL shell:
create database exampletenant default character set utf8;
grant all on exampletenant.* to 'example'@'%' identified by 'example123';
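Note that the grant ... identified by form shown above implicitly creates the account on MySQL 5.x, which is what this guide assumes. On MySQL 8 and later the user has to be created explicitly first; the equivalent would be:

create user 'example'@'%' identified by 'example123';
grant all on exampletenant.* to 'example'@'%';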
In this example, we have a MySQL database called exampletenant, and we grant full access to the user example with password example123. Assuming our database server is db.example.com, this corresponds to the database URL:
jdbc:mysql://db.example.com:3306/exampletenant?user=example&password=example123&useUnicode=true&characterEncoding=UTF-8
We’ll make use of this URL in the following section.
Each tenant has their own set of files under the /aspace/archivesspace/tenants/ directory. We’ll define our new tenant (called exampletenant) by copying the template set of configurations and running the init_tenant.sh script to set them up. We can do this on either apps1 or apps2 – it only needs to be done once:
cd /aspace/archivesspace/tenants
cp -a _template exampletenant
Note that we’ve named the tenant exampletenant to match the Unix account it will run as. Later on, the startup script will use this fact to run each instance as the correct user.
For now, we’ll just edit the configuration file for this tenant, under exampletenant/archivesspace/config/config.rb. When you open this file you’ll see two placeholders that need filling in: one for your database URL, which in our case is just:
jdbc:mysql://db.example.com:3306/exampletenant?user=example&password=example123&useUnicode=true&characterEncoding=UTF-8
and the other for this tenant’s search, staff and public user secrets, which should be random, hard-to-guess passwords.
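To make that concrete, here is a sketch of what the filled-in exampletenant/archivesspace/config/config.rb might end up containing, assuming the standard ArchivesSpace AppConfig option names (the placeholders in the template file show exactly which lines to edit, and the secret values below are stand-ins, not values to reuse):

# Tenant-wide database connection:
AppConfig[:db_url] = "jdbc:mysql://db.example.com:3306/exampletenant?user=example&password=example123&useUnicode=true&characterEncoding=UTF-8"

# Randomly generated, hard-to-guess secrets for this tenant:
AppConfig[:search_user_secret] = "<long random string>"
AppConfig[:public_user_secret] = "<long random string>"
AppConfig[:staff_user_secret]  = "<long random string>"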
To add our tenant instances, we just need to initialize them on each of our servers. On apps1 and apps2, we run:
cd /aspace/archivesspace/tenants/exampletenant/archivesspace
./init_tenant.sh stable
If you list the directory now, you will see that the init_tenant.sh script has created a number of symlinks. Most of these refer back to the stable version of the ArchivesSpace software we unpacked previously, and some contain references to the data and logs directories stored under /aspace.local.
Each server has its own configuration file that tells the ArchivesSpace application which ports to listen on. To set this up, make two copies of the example configuration by running the following command on apps1 then apps2:
cd /aspace/archivesspace/tenants/exampletenant/archivesspace
cp config/instance_hostname.rb.example config/instance_`hostname`.rb
Then edit each file to set the URLs that the instance will use. Here’s our config/instance_apps1.example.com.rb:
{
:backend_url => "http://apps1.example.com:8089",
:frontend_url => "http://apps1.example.com:8080",
:solr_url => "http://apps1.example.com:8090",
:indexer_url => "http://apps1.example.com:8091",
:public_url => "http://apps1.example.com:8081",
}
Note that the filename is important here: it must be:
instance_[server hostname].rb
These URLs will determine which ports the application listens on when it starts up, and are also used by the ArchivesSpace indexing system to track updates across the cluster.
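The file on apps2 follows the same pattern, with its own hostname in both the filename and the URLs. For example, config/instance_apps2.example.com.rb might be:

{
:backend_url => "http://apps2.example.com:8089",
:frontend_url => "http://apps2.example.com:8080",
:solr_url => "http://apps2.example.com:8090",
:indexer_url => "http://apps2.example.com:8091",
:public_url => "http://apps2.example.com:8081",
}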
As a one-off, we need to populate this tenant’s database with the default set of tables. You can do this by running the setup-database.sh script on either apps1 or apps2:
cd /aspace/archivesspace/tenants/exampletenant/archivesspace
scripts/setup-database.sh
With the two instances configured, you can now use the init script to start them up on each server:
/etc/init.d/aspace-cluster start-tenant exampletenant
and you can monitor each instance’s log file under /aspace.local/tenants/exampletenant/logs/. Once they’re started, you should be able to connect to each instance with your web browser at the configured URLs.
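For example, to follow a tenant’s logs on one of the app servers while it starts up (the exact file names under the logs directory depend on the instance):

tail -f /aspace.local/tenants/exampletenant/logs/*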
Our final step is configuring Nginx to accept requests for our staff and public interfaces and forward them to the appropriate application instance. Working on the loadbalancer machine, we create a new configuration file for our tenant:
cd /aspace/nginx/conf/tenants
cp -a _template.conf.example exampletenant.conf
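Before editing, it may help to see roughly where we’re heading. The following is an illustrative sketch only – the copied template is the authoritative starting point and its structure may differ – but the finished file for our tenant will amount to something like two upstream pools pointing at the instances, and one server block per interface:

# Staff and public application instances for this tenant:
upstream exampletenant_staff {
    server apps1.example.com:8080;
    server apps2.example.com:8080;
}

upstream exampletenant_public {
    server apps1.example.com:8081;
    server apps2.example.com:8081;
}

# One virtual host per interface, proxying to the pool above:
server {
    listen 80;
    server_name staff.example.com;
    location / {
        proxy_pass http://exampletenant_staff;
    }
}

server {
    listen 80;
    server_name public.example.com;
    location / {
        proxy_pass http://exampletenant_public;
    }
}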
Now open /aspace/nginx/conf/tenants/exampletenant.conf in an editor. You will need to:

- Replace <tenantname> with exampletenant where it appears.
- Update the server URLs to match the hostnames and ports you configured each instance with.
- Set the server_name entry for each interface. In our case these are public.example.com for the public interface, and staff.example.com for the staff interface.

Once you’ve saved your configuration, you can test it with:
/usr/sbin/nginx -t
If Nginx reports that all is well, reload the configurations with:
/usr/sbin/nginx -s reload
And, finally, browse to http://public.example.com/ to verify that Nginx is now accepting requests and forwarding them to your app servers.
We’re done!