1.1. LOCKSS Primer

This section offers a high-level introduction to LOCKSS, particularly LOCKSS 2.x.

1.1.1. What Is LOCKSS?

LOCKSS is an open source distributed digital preservation system developed by the LOCKSS program at Stanford University Libraries.

The LOCKSS Content Audit Protocol, or LCAP (pronounced "L-cap"), is the sophisticated cryptographic audit and repair protocol at the heart of LOCKSS. It enables peer-to-peer networks, called LOCKSS networks, to preserve data together.
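
The general idea of a hash-based audit and repair among peers can be sketched in a few lines of Python. This is only a simplified illustration, not the actual LCAP message flow: the function names, the use of SHA-256, the strict-majority rule, and the in-memory "peers" are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def vote(nonce: bytes, content: bytes) -> str:
    """Hash a poll-specific nonce together with a peer's local copy."""
    return hashlib.sha256(nonce + content).hexdigest()


def audit_and_repair(nonce: bytes, copies: dict[str, bytes]):
    """Compare the peers' votes and repair copies that lose the vote.

    Hypothetical helper: returns (repaired_copies, names_of_repaired_peers).
    """
    votes = {peer: vote(nonce, data) for peer, data in copies.items()}
    winning_hash, count = Counter(votes.values()).most_common(1)[0]
    # Assumption: a strict majority is needed to decide which copy is intact.
    if count <= len(copies) // 2:
        raise RuntimeError("no majority: cannot tell which copy is intact")
    good_peer = next(p for p, v in votes.items() if v == winning_hash)
    damaged = [p for p, v in votes.items() if v != winning_hash]
    repaired = dict(copies)
    for peer in damaged:
        # Repair: replace the losing copy with a copy from an agreeing peer.
        repaired[peer] = copies[good_peer]
    return repaired, damaged


copies = {
    "peer_a": b"article v1",
    "peer_b": b"article v1",
    "peer_c": b"artic1e v1",  # simulated damage
}
repaired, damaged = audit_and_repair(b"poll-nonce", copies)
print(damaged)
```

Including a fresh nonce in each hash means a peer has to hash the content it actually holds at poll time, rather than replaying a previously computed digest.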

The LOCKSS system also includes components, tools and features to support diverse digital preservation activities.

1.1.2. What Is New in LOCKSS 2.x?

LOCKSS 2.x stems from the LAAWS (LOCKSS Architected As Web Services) initiative, an ambitious modernization project funded in part by a grant from the Andrew W. Mellon Foundation. The project included rewriting LOCKSS 1.x from a monolithic daemon into a suite of containerized components with REST APIs.

Although LOCKSS 2.x is structurally different from LOCKSS 1.x, the LCAP audit and repair protocol remains the same (so a LOCKSS network can have a mixture of LOCKSS 1.x and 2.x nodes), and the plugin system is backward-compatible (so LOCKSS network operators can enjoy operational continuity during the transitional period).

LOCKSS 2.x offers an array of improvements compared to LOCKSS 1.x:

  • Containerized architecture. The functionality of LOCKSS 2.x is split into containerized components orchestrated into a stack by Kubernetes. Advantages over LOCKSS 1.x include:

    • Right-sized functionality. You can pick and choose which functional components you operate as part of your LOCKSS stack, skipping functionality not needed for your particular application. For example, many applications do not use metadata extraction or Web replay, and some make little or no use of the Web crawling infrastructure, so you could leave these components out of your configured stack if applicable.

    • Diverse Linux operating systems. LOCKSS 2.x runs on the K3s Kubernetes distribution, which can be installed on a variety of Linux operating systems to fit your organizational IT infrastructure. Previously, LOCKSS 1.x was restricted to RHEL-compatible Linux operating systems.

  • Storage performance and scalability. LOCKSS 2.x has a revamped storage backend. Advantages over LOCKSS 1.x include:

    • WARC-based storage format. LOCKSS 2.x aggregates preserved content into WARC files [1], resulting in far fewer on-disk files, support for compression, and interoperability of Web-harvested preserved content with other WARC-compatible Web archiving tools, especially Web replay engines.

    • Database-based state management. LOCKSS 2.x stores the state of preserved content in a PostgreSQL database, offering efficiency and scalability. Previously, LOCKSS 1.x stored state in flat files.

    • Better storage utilization. The LOCKSS 2.x storage repository can be configured to span multiple content storage areas, and unlike in LOCKSS 1.x, each archival unit (AU, a preserved collection of objects) can utilize them all. Previously, an AU was assigned to a single content storage area and grew only there, which could lead to lopsided storage utilization and required rebalancing AUs across content storage areas.
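
The storage utilization point above can be illustrated with a toy simulation. It is a sketch under stated assumptions, not a description of how LOCKSS allocates storage: the function names, the round-robin assignment of AUs, the "least-used area first" policy, the chunk size, and the AU sizes are all hypothetical.

```python
def place_pinned(au_sizes, n_areas):
    """Simplified 1.x-style placement: each AU lives entirely in one area.

    Assumption: AUs are assigned to areas round-robin.
    """
    usage = [0] * n_areas
    for i, size in enumerate(au_sizes):
        usage[i % n_areas] += size
    return usage


def place_spanning(au_sizes, n_areas, chunk=10):
    """Simplified 2.x-style placement: an AU's content may land in any area.

    Assumption: each chunk of content goes to the currently least-used area.
    """
    usage = [0] * n_areas
    for size in au_sizes:
        remaining = size
        while remaining > 0:
            part = min(chunk, remaining)
            idx = usage.index(min(usage))  # pick the least-used area
            usage[idx] += part
            remaining -= part
    return usage


au_sizes = [90, 10, 10, 10]  # hypothetical AU sizes, arbitrary units
print(place_pinned(au_sizes, 2))    # [100, 20]
print(place_spanning(au_sizes, 2))  # [60, 60]
```

With one large AU pinned to a single area, that area fills up while the other stays mostly empty. When every AU can draw on all areas, the same content spreads evenly without any rebalancing step.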


Footnotes