1.2. Stack Components

This section describes the components of the LOCKSS stack, what each one does, and why you might run it as part of your stack. When Configuring LOCKSS, the configuration tool lets you select which components to run.

1.2.1. Mandatory Stack Components

1.2.1.1. LOCKSS Repository Service

The LOCKSS Repository Service is the mandatory component of the LOCKSS stack responsible for storing data objects as artifacts.

By default, it runs a REST API on port 24611 [1].

1.2.1.2. LOCKSS Configuration Service

The LOCKSS Configuration Service is the mandatory component of the LOCKSS stack that acts as its central nervous system, keeping state data and other configuration information synchronized between stack components.

By default, it runs a REST API on port 24612 and a Web user interface on port 24602 [1].

1.2.1.3. LOCKSS Poller Service

The LOCKSS Poller Service is the mandatory component of the LOCKSS stack responsible for operating LCAP, the LOCKSS audit and repair protocol that allows the nodes in a LOCKSS network to preserve data together.

By default, it runs LCAP on port 9729, a REST API on port 24603, and a Web user interface on port 24613 [1].

1.2.1.4. PostgreSQL

LOCKSS requires a PostgreSQL database to store underlying data, including configuration data, artifact indexing data, extracted metadata, and more.

By default, it uses an embedded PostgreSQL database [2] by running a PostgreSQL container as part of the LOCKSS stack, on port 24620 [1].

Alternatively, it can be configured to use an external PostgreSQL database [3] maintained outside the LOCKSS stack.
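
As a rough aid for checking a running stack, the default ports of the mandatory components listed above can be collected in one place and probed over TCP. The following is a minimal sketch, not part of LOCKSS: the dictionary keys and the function names port_is_open and check_ports are made up for this example, and the port numbers are simply transcribed from this section. The PostgreSQL entry applies only when the embedded database is used.

```python
import socket

# Default ports transcribed from this section (mandatory components only).
# The keys are illustrative labels, not official LOCKSS identifiers.
MANDATORY_DEFAULT_PORTS = {
    "repository-rest": 24611,
    "configuration-rest": 24612,
    "configuration-ui": 24602,
    "poller-lcap": 9729,
    "poller-rest": 24603,
    "poller-ui": 24613,
    "postgresql-embedded": 24620,  # only relevant with the embedded database
}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports):
    """Probe every named port on host and return a name -> bool mapping."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling check_ports("localhost", MANDATORY_DEFAULT_PORTS) on the LOCKSS host returns a mapping that shows which of the default ports accept connections. An open port only shows that something is listening there, not that the service behind it is healthy.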

1.2.2. Optional Stack Components

1.2.2.1. LOCKSS Crawler Service

The LOCKSS Crawler Service is an optional component of the LOCKSS stack that can harvest content from the Web using a registered Web crawler.

Out of the box, the LOCKSS Crawler Service ships with the Classic LOCKSS Crawler, a mature, highly extensible Web crawler built into LOCKSS.

It also includes an API framework for registering external crawlers, and comes with one such external crawler, based on Wget.

LOCKSS plugins are used to control and customize Web harvesting behavior. You may run the LOCKSS Crawler Service as part of your LOCKSS stack if your application of LOCKSS involves Web crawling activities.

By default, the LOCKSS Crawler Service runs a REST API on port 24614 and a Web user interface on port 24604 [1].

1.2.2.2. LOCKSS Metadata Service

The LOCKSS Metadata Service is an optional component of the LOCKSS stack that manages metadata extraction jobs and provides access to extracted metadata via DOI resolution, OpenURL queries, and other means.

LOCKSS plugins describe how to extract metadata or other meaning from preserved content, especially content harvested from the Web by the LOCKSS Crawler Service. You may run the LOCKSS Metadata Service as part of your LOCKSS stack if your application of LOCKSS involves metadata extraction and retrieval activities.

By default, the LOCKSS Metadata Service runs a REST API on port 24615 and a Web user interface on port 24605 [1].

1.2.2.3. LOCKSS SOAP Compatibility Service

The LOCKSS SOAP Compatibility Service is an optional component of the LOCKSS stack that implements a subset of the limited LOCKSS 1.x SOAP APIs, to help with the transition to the comprehensive LOCKSS 2.x REST APIs.

You may run it as part of your LOCKSS stack if your application of LOCKSS involves legacy use of the LOCKSS 1.x SOAP APIs, such as scripting.

By default, it runs a SOAP API on port 24616 [1].
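
When deciding which optional components to enable, it can help to see which additional ports a given selection would occupy. The sketch below assumes hypothetical component labels ("crawler", "metadata", "soap") and helper names; only the port numbers come from this section, and the LOCKSS configuration tool itself may name things differently.

```python
# Default ports for the optional components, transcribed from this section.
# The labels "crawler", "metadata" and "soap" are hypothetical example keys.
OPTIONAL_COMPONENT_PORTS = {
    "crawler": {"rest": 24614, "ui": 24604},
    "metadata": {"rest": 24615, "ui": 24605},
    "soap": {"soap": 24616},
}


def ports_for_selection(selected):
    """Map each selected optional component to its default ports."""
    unknown = sorted(set(selected) - set(OPTIONAL_COMPONENT_PORTS))
    if unknown:
        raise ValueError("unknown optional component(s): " + ", ".join(unknown))
    # Copy the inner dicts so callers cannot modify the defaults table.
    return {name: dict(OPTIONAL_COMPONENT_PORTS[name]) for name in selected}


def flat_port_list(selection):
    """Return a sorted list of every port used by the given selection."""
    return sorted(p for ports in selection.values() for p in ports.values())
```

For example, flat_port_list(ports_for_selection(["crawler", "soap"])) lists the three ports that enabling the Crawler and SOAP Compatibility services would add to the mandatory set.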

1.2.3. Web Replay Engines

The LOCKSS stack can also run a number of Web replay engines: ServeContent, Pywb, and OpenWayback.

1.2.3.1. ServeContent

ServeContent is an OpenURL resolver and sophisticated Web replay engine built into LOCKSS, whose behavior can be customized via LOCKSS plugins.

Currently, ServeContent is embedded in the LOCKSS Poller Service rather than deployed as a separate container in the LOCKSS stack.

By default, it runs a Web application on port 24640 [1].

1.2.3.2. Pywb

Pywb (https://github.com/webrecorder/pywb), pronounced pie-W-B, is an open source Web replay engine, available as an optional component of the LOCKSS stack.

By default, it runs a Web application on port 24641 [1].

1.2.3.3. OpenWayback

OpenWayback (https://github.com/iipc/openwayback) is an open source Web replay engine, available as an optional component of the LOCKSS stack.

By default, it runs a Web application on port 8080 [1].
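
The default ports of the three replay engines can be turned into base URLs for bookmarking or for a quick availability check. This is only a sketch: the function name replay_base_url is made up for this example, and the use of plain http and the root path are assumptions, since the actual entry paths depend on each engine and on how your host is exposed.

```python
# Default Web application ports of the replay engines, from this section.
REPLAY_ENGINE_PORTS = {
    "ServeContent": 24640,
    "Pywb": 24641,
    "OpenWayback": 8080,
}


def replay_base_url(engine, host="localhost", scheme="http"):
    """Build a base URL for a replay engine on its default port.

    The scheme and the trailing root path are assumptions made for this
    example; adjust them to match your deployment.
    """
    if engine not in REPLAY_ENGINE_PORTS:
        raise ValueError("unknown replay engine: " + engine)
    return f"{scheme}://{host}:{REPLAY_ENGINE_PORTS[engine]}/"
```

With these defaults, replay_base_url("Pywb") yields a URL on port 24641 of the local machine; pass host="..." to point at another LOCKSS host.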

1.2.4. Deprecated Stack Components

As of LOCKSS 2.0-beta2, the LOCKSS stack no longer requires a Solr database (embedded or external). Additionally, the LOCKSS Metadata Extraction Service has been merged into the LOCKSS Metadata Service.


Footnotes