Glossary

archival unit
AU

In the LOCKSS system, an archival unit, or AU, is an arbitrary collection of content that is meaningfully preserved as a unit, such as one volume of an academic journal, an e-book and all its related assets, or a given collection of digitized materials.

Many LOCKSS functions operate at the AU level, such as LCAP polling/voting, Web crawls, metadata extraction, and more. Each AU handled by the system is referenced by its archival unit identifier or AUID.

The shape and definition of an AU is determined by the LOCKSS plugin that handles it.

archival unit identifier
AUID

The unique string identifier of a given archival unit handled by the LOCKSS system is called an archival unit identifier, or AUID.

The definition, and therefore AUID, of an AU is determined by the LOCKSS plugin that handles it.
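As an illustration only, here is a minimal Python sketch of how a plugin identifier and an AU's definitional parameters could be combined into one canonical string. It assumes the convention seen in LOCKSS 1.x-style AUIDs: dots in the plugin identifier replaced by `|`, followed by `&`-separated `name~value` pairs sorted by name, with URL-encoded values. The exact encoding rules of the real software may differ, and the plugin identifier below is hypothetical.

```python
from urllib.parse import quote_plus


def build_auid(plugin_id: str, params: dict[str, str]) -> str:
    """Approximate a LOCKSS 1.x-style AUID (assumed convention, see above).

    plugin_id: fully qualified plugin identifier (hypothetical in examples)
    params:    the AU's definitional parameters, name -> value
    """
    # Dots in the plugin identifier become pipes.
    plugin_key = plugin_id.replace(".", "|")
    pairs = []
    # Sorting by parameter name makes the result canonical: the same AU
    # definition always yields the same string.
    for name in sorted(params):
        # URL-encode the value, then also escape "." and "~" ("~" separates a
        # name from its value, and "." is escaped in this assumed convention).
        value = quote_plus(params[name], safe="").replace(".", "%2E").replace("~", "%7E")
        pairs.append(f"{name}~{value}")
    return "&".join([plugin_key] + pairs)
```

For example, `build_auid("org.example.ExamplePlugin", {"volume": "1", "base_url": "http://example.com/"})` returns `org|example|ExamplePlugin&base_url~http%3A%2F%2Fexample%2Ecom%2F&volume~1`. Sorting the parameters is what lets two independently configured nodes arrive at the same identifier for the same AU.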

configure-lockss

configure-lockss is a script in the LOCKSS Installer that configures the LOCKSS stack on your host system.

container

A software container, or simply a container, is a lightweight, narrow-purpose bundle of software code, its dependencies, and its environment, forming a virtual unit of software that can be run on a variety of host environments by a container orchestration system.

The LOCKSS system uses Docker containers.

container orchestration system

A container orchestration system is a software automation tool that configures, deploys and manages containers on a host system, providing them with a runtime environment and resources such as storage and networking, and assembling them like building blocks into cohesive software applications called stacks [1].

The LOCKSS system uses Kubernetes as its container orchestration system.

content storage

The LOCKSS stack's content storage is the storage space devoted to the content being preserved by the system, consisting of one or more content storage areas.

Content storage is one of three kinds of storage needed by the LOCKSS stack, together with system storage and operating storage.

Adding, removing, or reordering the content storage areas requires a reindexing step.

content storage area

A content storage area is one individual directory within content storage, which can consist of one or more such areas.

CoreDNS

CoreDNS is an open source DNS server, used by K3s.

Curl

Curl, also styled cURL, is an open source software package for downloading and uploading files.

Docker

Docker is a software development platform for containers.

The LOCKSS stack uses Docker containers downloaded from Docker Hub.

Docker (the software) is developed by Docker (the company).

Docker Hub

Docker Hub is a container distribution service, from which containers involved in the LOCKSS stack are downloaded.

Docker Hub is operated by Docker.

firewall

A firewall is a computer security system that monitors, controls, and restricts network traffic.

firewalld

Firewalld, pronounced firewall-D, is an open source firewall management tool for Linux.

Firewalld interacts with the Linux kernel's netfilter framework via nftables.

Git

Git is an open source distributed version control system.

Software development related to LOCKSS is done in Git at GitHub.

GitHub

GitHub is a software development platform.

Software development related to LOCKSS is done in Git at GitHub.

GitHub (the platform) is operated by GitHub (the company), a subsidiary of Microsoft.

install-lockss

install-lockss is a script in the LOCKSS Installer that installs the infrastructure necessary to run the LOCKSS stack on your host system, notably K3s.

iptables

iptables, pronounced I-P-tables, is an open source software package that implements firewall rules using the Linux kernel's netfilter framework.

Although still widely used, iptables has been succeeded by nftables.

Iptables is developed by the Netfilter project.

K3s

K3s, pronounced K-three-S, is a lightweight, open source Kubernetes distribution, on which the LOCKSS system runs.

Important prerequisites for running LOCKSS apply to the K3s data directory.

K3s is developed by Rancher, a subsidiary of SUSE.

K3s Configuration Checker

The K3s Configuration Checker, also known as k3s check-config, is a script that assesses the health of K3s after it is installed by the K3s Installer. This script is provided by Rancher as part of K3s.

K3s data directory

K3s downloads containers and stores configuration and other data in a directory known as the K3s data directory.

The K3s data directory is the most sizeable part of the system storage needs of the LOCKSS system.

Important prerequisites for running LOCKSS apply to the K3s data directory, as outlined in Section 1.3.2.1 (System Storage Prerequisites): the K3s data directory must be on local storage, and it cannot be backed by a legacy XFS file system formatted with ftype=0.
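These two prerequisites can be checked roughly as follows. This is a minimal Python sketch, not part of the LOCKSS Installer: it reads text in /proc/mounts format to find the file system holding a given path, flags a few network file system types (an illustrative, non-exhaustive list), and looks for ftype=0 in the output of the xfs_info command from xfsprogs. The function names are hypothetical.

```python
import re
from typing import Optional

# Illustrative, non-exhaustive set of non-local file system types (assumption).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "fuse.sshfs", "glusterfs", "ceph"}


def mount_for_path(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fs_type) of the mount containing an absolute path.

    mounts_text uses /proc/mounts format: device mount_point fs_type options ...
    The longest mount point that contains the path wins.
    """
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        if contains and len(mount_point) > len(best[0]):
            best = (mount_point, fs_type)
    return best


def xfs_ftype(xfs_info_output: str) -> Optional[int]:
    """Extract the ftype value from xfs_info output, or None if absent."""
    match = re.search(r"ftype=(\d+)", xfs_info_output)
    return int(match.group(1)) if match else None


def check_data_dir(path: str, mounts_text: str,
                   xfs_info_output: Optional[str] = None) -> list[str]:
    """Return a list of problems found (an empty list means no problems found)."""
    problems = []
    _, fs_type = mount_for_path(path, mounts_text)
    if fs_type in NETWORK_FS_TYPES:
        problems.append(f"{path} is on non-local file system type {fs_type!r}")
    if fs_type == "xfs" and xfs_info_output is not None and xfs_ftype(xfs_info_output) == 0:
        problems.append(f"{path} is on legacy XFS with ftype=0")
    return problems
```

On a real host, one would pass the contents of /proc/mounts and, for XFS, the output of xfs_info run against the relevant mount point; /var/lib/rancher/k3s is K3s's default data directory.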

K3s Installer

The K3s Installer is a script that installs and configures K3s onto a Linux host. This script, provided by Rancher as part of K3s, is invoked by the LOCKSS Installer as part of install-lockss.

Kubernetes

Kubernetes, pronounced coo-burn-NET-ease, also styled K8s, is an open source container orchestration system.

The LOCKSS system uses the K3s Kubernetes distribution.

Kubernetes development is governed by the Cloud Native Computing Foundation, a project of the Linux Foundation.

LOCKSS

LOCKSS can refer to:

  • LOCKSS system: LOCKSS is an open source distributed digital preservation system.

  • LOCKSS Program: LOCKSS is a program of Stanford University Libraries that develops the LOCKSS software and provides digital preservation services.

  • The acronym Lots Of Copies Keep Stuff Safe: "Lots of Copies Keep Stuff Safe" is a digital preservation methodology embodied by the LOCKSS software.

LOCKSS Configuration Service

The LOCKSS Configuration Service is the mandatory component of the LOCKSS stack acting as its central nervous system, keeping state data and other configuration information synchronized between Stack Components.

LOCKSS Content Audit Protocol
LCAP

The LOCKSS Content Audit Protocol, or LCAP (pronounced L-cap), is the sophisticated cryptographic audit and repair protocol at the heart of LOCKSS, which enables peer-to-peer networks called LOCKSS networks to preserve data together.

LCAP is named after El Capitán. Before being called LOCKSS Content Audit Protocol, LCAP was called Library Content Audit Protocol.
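The following Python sketch is a deliberately simplified toy, not LCAP itself; real LCAP involves considerably more machinery, none of which is modeled here. It only illustrates the general idea behind nonce-salted hash comparison in an audit: peers hash their copies together with a fresh random nonce, so agreement shows that each peer actually holds matching content at poll time. All names are hypothetical.

```python
import hashlib
import secrets


def vote_hash(nonce: bytes, content: bytes) -> str:
    # Salting with a fresh nonce means a peer cannot reply with a hash computed
    # in advance: it must still hold the content when the poll is called.
    return hashlib.sha256(nonce + content).hexdigest()


def toy_poll(poller_content: bytes, voter_contents: dict[str, bytes]) -> dict:
    """Toy audit round comparing the poller's copy with each voter's copy.

    Deliberately simplified illustration; this is NOT the LCAP protocol.
    """
    nonce = secrets.token_bytes(16)  # fresh random nonce for this poll
    mine = vote_hash(nonce, poller_content)
    # In a real network each voter would hash its own copy remotely; here the
    # voters' copies are simply passed in.
    votes = {peer: vote_hash(nonce, content) for peer, content in voter_contents.items()}
    agree = sorted(peer for peer, h in votes.items() if h == mine)
    disagree = sorted(peer for peer, h in votes.items() if h != mine)
    return {
        "agree": agree,
        "disagree": disagree,
        # Toy rule: if more voters disagree than agree, the poller treats its
        # own copy as suspect, making it a candidate for repair from a peer.
        "poller_needs_repair": len(disagree) > len(agree),
    }
```

With an intact copy and one corrupted peer, the corrupted peer shows up in `disagree`; with a corrupted poller and intact peers, `poller_needs_repair` becomes true.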

LOCKSS Crawler Service

The LOCKSS Crawler Service is an optional component of the LOCKSS stack that can harvest content from the Web using a registered Web crawler.

LOCKSS Downloader

The LOCKSS Downloader is a convenience script to perform a one-time download of a software project from GitHub, using Curl or Wget rather than Git.

By default, the LOCKSS Downloader downloads the LOCKSS Installer.
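For comparison, the same kind of one-time download can be sketched in Python using only the standard library. This is not the LOCKSS Downloader itself, which uses Curl or Wget; it assumes GitHub's branch archive URLs of the form https://github.com/OWNER/REPO/archive/refs/heads/BRANCH.tar.gz, and the owner, repository, and default branch name are placeholders.

```python
import io
import tarfile
import urllib.request


def archive_url(owner: str, repo: str, branch: str = "main") -> str:
    """URL of a tarball snapshot of a branch on GitHub (no Git history)."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.tar.gz"


def download_snapshot(owner: str, repo: str, dest: str, branch: str = "main") -> None:
    """Download and unpack a one-time snapshot of a repository into dest."""
    with urllib.request.urlopen(archive_url(owner, repo, branch)) as resp:
        data = resp.read()
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
```

As with the Curl or Wget approach, the result is a plain directory tree rather than a Git clone, so there is nothing to update in place later; a newer version means a new download.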

LOCKSS Installer

The LOCKSS Installer is a collection of scripts to install, configure and run the LOCKSS stack on a host system.

LOCKSS Installer Directory

The LOCKSS Installer Directory is simply the directory under which the LOCKSS Installer is stored.

LOCKSS Metadata Service

The LOCKSS Metadata Service is an optional component of the LOCKSS stack that manages metadata extraction jobs and provides access to extracted metadata via DOI resolution, OpenURL queries, and other means.

LOCKSS network

A LOCKSS network is a peer-to-peer network of nodes running the LOCKSS system's LCAP audit and repair protocol to preserve data together.

LOCKSS plugin

A LOCKSS plugin is a bundle of descriptors, rules and code that adapts the general LOCKSS software to a particular digital preservation target.

LOCKSS plugins offer numerous features that affect audit and repair (content canonicalization, poll result weighting...), Web crawling (crawl initiation, crawl rules, crawl rate limiting, HTTP response handling, URL normalization, custom link extraction...), metadata extraction, Web replay (link rewriting, HTML <meta> tag rewriting...), and more.

LOCKSS Poller Service

The LOCKSS Poller Service is the mandatory component of the LOCKSS stack responsible for operating LCAP, the LOCKSS audit and repair protocol that allows the nodes in a LOCKSS network to preserve data together.

LOCKSS Program

LOCKSS is a program of Stanford University Libraries that develops the LOCKSS system and provides digital preservation services.

LOCKSS Repository Service

The LOCKSS Repository Service is the mandatory component of the LOCKSS stack responsible for storing data objects as artifacts.

LOCKSS SOAP Compatibility Service

The LOCKSS SOAP Compatibility Service is an optional component of the LOCKSS stack that implements a subset of the limited LOCKSS 1.x SOAP APIs, to help with the transition to the comprehensive LOCKSS 2.x REST APIs.

LOCKSS stack

The LOCKSS stack is the stack of containers providing the functionality of the LOCKSS system as a whole.

LOCKSS system

LOCKSS is an open source distributed digital preservation system developed by the LOCKSS Program at Stanford University Libraries.

log storage area

The LOCKSS stack's log storage area is devoted to log files generated by the Stack Components. The log storage area is one of the three areas of operating storage, together with the state data storage area and the temporary storage area.

netfilter

netfilter is a packet filtering framework in the Linux kernel, relied on by many firewall software applications.

nftables

nftables, pronounced N-F-tables, is an open source software package that implements firewall rules using the Linux kernel's netfilter framework.

Nftables is the successor to iptables, although the latter is still widely used.

Nftables is developed by the Netfilter project.

OpenWayback

OpenWayback (https://github.com/iipc/openwayback) is an open source Web replay engine, available as an optional component of the LOCKSS stack.

OpenWayback development is shepherded by the International Internet Preservation Consortium.

operating storage

The LOCKSS stack's operating storage is the storage space devoted to its internal operating needs, such as database data, state files, log files, temporary files, etc.

Operating storage is one of three kinds of storage needed by the LOCKSS stack, together with system storage and content storage.

Operating storage consists of the state data storage area, the log storage area, and the temporary storage area. All three can be set to the same actual directory.

PostgreSQL

PostgreSQL is an open source relational database management system.

LOCKSS requires a PostgreSQL database. By default, it uses an embedded PostgreSQL database, running a PostgreSQL container as part of the LOCKSS stack; alternatively, it can be configured to use an external PostgreSQL database maintained outside the LOCKSS stack.

Pywb

Pywb (https://github.com/webrecorder/pywb), pronounced pie-W-B, is an open source Web replay engine, available as an optional component of the LOCKSS stack.

Pywb is developed by Webrecorder.

ServeContent

ServeContent is an OpenURL resolver and sophisticated Web replay engine built into LOCKSS, whose behavior can be customized via LOCKSS plugins.
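As an illustration of the kind of request an OpenURL resolver handles, the following Python sketch builds an OpenURL 1.0 (ANSI/NISO Z39.88-2004) key/encoded-value query for a journal article. The resolver base URL is hypothetical, and the set of keys a given ServeContent deployment accepts may differ.

```python
from urllib.parse import urlencode


def openurl_for_article(base_url: str, issn: str, volume: str,
                        issue: str, spage: str) -> str:
    """Build an OpenURL 1.0 KEV query identifying a journal article."""
    params = {
        "url_ver": "Z39.88-2004",   # OpenURL version of the transport
        "ctx_ver": "Z39.88-2004",   # ContextObject version
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # referent is a journal item
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,         # starting page of the article
    }
    # urlencode percent-encodes values, e.g. ":" as %3A and "/" as %2F.
    return base_url + "?" + urlencode(params)
```

For example, `openurl_for_article("https://lockss.example.org/openurl", "1234-5678", "12", "3", "45")` produces a URL whose query includes `rft.issn=1234-5678&rft.volume=12`, which a resolver can map to the preserved article.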

stack

A container stack, or simply a stack [1], is a cohesive software application made up of a suite of containers managed by a container orchestration system.

In particular, the suite of containers providing the functionality of the LOCKSS system as a whole is referred to as the LOCKSS stack.

state data storage area

The LOCKSS stack's state data storage area is devoted to database files, state files, and similar persistent data needed as part of normal operation. The state data storage area is one of the three areas of operating storage, together with the log storage area and the temporary storage area.

system storage

The LOCKSS stack's system storage is the storage space needed for installed software, downloaded containers, data generated by K3s, etc.

System storage is one of three kinds of storage needed by the LOCKSS stack, together with operating storage and content storage.

The most sizeable part of the system storage needs of the LOCKSS system is the K3s data directory, which has its own particular requirements.

temporary storage area

The LOCKSS stack's temporary storage area is devoted to temporary files, staging areas for unpacking compressed files or off-heap data processing, etc. The temporary storage area is one of the three areas of operating storage, together with the state data storage area and the log storage area.

ufw

Ufw, also styled UFW, short for Uncomplicated Firewall, is an open source firewall management tool for Linux.

Ufw interacts with the Linux kernel's netfilter framework via iptables.

Ufw is developed by Canonical.

Web replay engine

A Web replay engine, or Web playback engine, is a software application that allows a user to interact with a Web archive in a Web browser, typically with the ability to view a URL as it was at different points in time. The best-known example of a Web replay engine is the Internet Archive's Wayback Machine.

LOCKSS offers up to three Web replay engines: ServeContent, Pywb, and OpenWayback.
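Wayback-style replay engines commonly address a capture with a URL of the form PREFIX/TIMESTAMP/URL, where TIMESTAMP has 14 digits (YYYYMMDDhhmmss), and serve the capture closest in time to the one requested. The following minimal Python sketch illustrates that selection; it is not code from ServeContent, Pywb, or OpenWayback, and the replay prefix is hypothetical.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%d%H%M%S"  # 14-digit timestamps as used in Wayback-style URLs


def closest_capture(requested: str, captures: list[str]) -> str:
    """Return the capture timestamp nearest in time to the requested one."""
    target = datetime.strptime(requested, TS_FORMAT)
    # On a tie, min() keeps the first capture in list order.
    return min(captures, key=lambda ts: abs(datetime.strptime(ts, TS_FORMAT) - target))


def replay_url(prefix: str, timestamp: str, url: str) -> str:
    """Compose a Wayback-style replay URL for a given capture."""
    return f"{prefix}/{timestamp}/{url}"
```

For instance, with captures from January 2019, June 2020, and March 2023, a request for January 2020 resolves to the June 2020 capture, which is about 166 days away versus about a year for the 2019 one.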

Wget

Wget, pronounced W-get, is an open source software package for downloading files.

Wget is developed by the GNU project.


Footnotes