8.1. Link Rewriter

Note

This page is under construction.

Plugin Key

mediatype_link_rewriter_factory, where mediatype is a media type like text/html

Plugin Value Type

String

Plugin Value

The value is the fully qualified name of a Java class implementing the org.lockss.rewriter.LinkRewriterFactory interface.

Sample

<entry>
  <string>text/html_link_rewriter_factory</string>
  <string>edu.example.plugin.publisherx.PublisherXHtmlLinkRewriterFactory</string>
</entry>

Description

When content is replayed through the LOCKSS system's ServeContent Web replay engine, links have to be rewritten so that they point to other ServeContent URLs where applicable. ServeContent contains logic to handle typical cases in HTML and CSS, but some specific use cases may require additional or custom link rewriting. To accomplish this, the plugin defines link rewriters for the affected media types.

For example, a Web site could mark up journal article figures with image tags like this: <img src="fig1_small.jpg" data-target="fig1_large.jpg" />, with JavaScript code in the page that displays an image viewer widget showing the large version of the image when the small version is clicked. ServeContent's internal logic knows to rewrite the src attribute of <img> tags, but it does not know to process the non-standard data-target attribute, so without help the image viewer widget would not load the preserved copy of the large image. Depending on the situation, this might require a custom link rewriter for HTML alone, or for both HTML and JavaScript.
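The data-target case above can be illustrated with a small standalone Java sketch. It does not use the LOCKSS API: the class name DataTargetRewriteSketch, the method rewriteDataTarget, and the /ServeContent?url= prefix in the usage example are all hypothetical, chosen only to show the idea of passing each data-target value through a link transform. A real link rewriter would be created by the plugin's link rewriter factory and would normally work on parsed HTML rather than a regular expression.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Standalone illustration only (hypothetical names, no LOCKSS classes):
 * rewrites the non-standard data-target attribute of img tags.
 */
final class DataTargetRewriteSketch {

  // Group 1: the img tag up to and including data-target="
  // Group 2: the attribute value (the link to rewrite)
  // Group 3: the closing double quote
  private static final Pattern IMG_DATA_TARGET = Pattern.compile(
      "(<img\\b[^>]*?\\bdata-target=\")([^\"]*)(\")",
      Pattern.CASE_INSENSITIVE);

  /** Applies linkTransform to every data-target value found in img tags. */
  static String rewriteDataTarget(String html, UnaryOperator<String> linkTransform) {
    Matcher m = IMG_DATA_TARGET.matcher(html);
    StringBuffer out = new StringBuffer();
    while (m.find()) {
      String rewritten = linkTransform.apply(m.group(2));
      // quoteReplacement keeps '$' and '\' in the rewritten URL literal
      m.appendReplacement(out,
          Matcher.quoteReplacement(m.group(1) + rewritten + m.group(3)));
    }
    m.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    String html = "<img src=\"fig1_small.jpg\" data-target=\"fig1_large.jpg\" />";
    // Assumption: a made-up replay prefix standing in for the real transform
    UnaryOperator<String> toReplayUrl = url -> "/ServeContent?url=" + url;
    System.out.println(rewriteDataTarget(html, toReplayUrl));
  }
}
```

The sketch leaves the src attribute untouched, mirroring the division of labor described above: the built-in logic handles standard attributes, and the custom rewriter only adds handling for the attribute it would otherwise miss. It also skips details such as resolving relative URLs and URL-encoding, which an actual rewriter would need to address.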


© Copyright 2000-2025, LOCKSS Program.
