Technical documentation for ArchivesSpace
View the Project on GitHub archivesspace/tech-docs
The ArchivesSpace system uses Solr for its full-text search. As records are added/updated/deleted by the backend, the corresponding changes are made to the Solr index to keep them (roughly) synchronized.
Keeping the backend and Solr in sync is the job of the “indexer”, a separate process that runs in the background and watches for record updates. The indexer operates in two modes simultaneously:
The two modes of operation overlap somewhat, but they serve different purposes. The periodic mode ensures that records are never missed due to transient failures, and will bring the indexes up to date even if the indexer hasn’t run for quite some time–even creating them from scratch if necessary. This mode is also used for indexing updates made by bulk import processes and other updates that don’t need to be reflected in the indexes immediately.
The real-time indexer mode attempts to apply updates to the index much
more quickly. Rather than polling, it performs a GET
request
against the /update-feed
endpoint of the backend. This endpoint
returns any records that were updated since the last time it was asked
and, most importantly, it leaves the request hanging if no records
have changed.
By calling this endpoint in a loop, the real-time indexer spends most
of its time sitting around waiting for something to happen. The
moment a record is updated, the already-pending request to the
/update-feed
endpoint yields the updated record, which is sent to
Solr and indexed immediately. This avoids the delays associated with
polling and keeps indexing latency low where it matters. For example,
newly created records should appear in the browse list by the time a
user views it.