3.16. Crawl Filter

Note

This page is under construction.

Plugin Key

mediatype_crawl_filter_factory, where mediatype is a media type such as text/html (giving, for example, the key text/html_crawl_filter_factory).

Plugin Value Type

String

Plugin Value Format

The value is the fully qualified name of a Java class implementing the org.lockss.plugin.FilterFactory interface.

Sample

<entry>
  <string>text/html_crawl_filter_factory</string>
  <string>edu.example.plugin.publisherx.PublisherXHtmlCrawlFilterFactory</string>
</entry>

Description

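If files of a given media type must be pre-processed (filtered) before the crawler extracts URLs from them with a Link Extractor, use this plugin feature to name the class that contains the custom filtering code. The crawler then sees only the filtered content, so links in the removed portions are not extracted.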
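
The kind of pre-processing described above can be illustrated with a minimal, self-contained sketch. It does not use the LOCKSS library: StreamFilterFactory is a hypothetical stand-in interface defined only for this example, and the class name, the "related" CSS class, and the regular-expression approach are all assumptions made for illustration. A real crawl filter would implement org.lockss.plugin.FilterFactory and would more likely rely on an HTML parser than on a regular expression.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical stand-in for the general shape of a filter factory.
// It is NOT the real org.lockss.plugin.FilterFactory interface.
interface StreamFilterFactory {
  InputStream createFilteredInputStream(InputStream in, String encoding);
}

// Illustrative crawl filter: drops <div class="related">...</div> blocks so
// that a link extractor run on the result never sees the links inside them.
class RelatedLinksCrawlFilterSketch implements StreamFilterFactory {

  // Assumption: the block contains no nested <div> elements.
  private static final Pattern RELATED_BLOCK =
      Pattern.compile("<div\\s+class=\"related\">.*?</div>",
                      Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

  @Override
  public InputStream createFilteredInputStream(InputStream in, String encoding) {
    try {
      Charset cs = Charset.forName(encoding);
      // Read the whole document, remove the unwanted blocks, re-encode it.
      String html = new String(in.readAllBytes(), cs);
      String filtered = RELATED_BLOCK.matcher(html).replaceAll("");
      return new ByteArrayInputStream(filtered.getBytes(cs));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Convenience wrapper for trying the filter on an in-memory string.
  static String filter(String html) {
    Charset cs = StandardCharsets.UTF_8;
    InputStream in = new ByteArrayInputStream(html.getBytes(cs));
    try (InputStream out =
             new RelatedLinksCrawlFilterSketch().createFilteredInputStream(in, cs.name())) {
      return new String(out.readAllBytes(), cs);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Calling RelatedLinksCrawlFilterSketch.filter(...) on a page that links to an issue in its body and to unrelated articles inside a "related" block returns the page with only the issue link left, which is the content a link extractor would then examine.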

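Crawl filters are related to hash filters: both pre-process content of a given media type, but a crawl filter is applied before link extraction, whereas a hash filter is applied before content is hashed.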