3.15. Link Extractor
Note
This page is under construction.
- Plugin Key
mediatype_link_extractor_factory, where mediatype is a media type like text/html
- Plugin Value Type
- Plugin Value Format
The value is the fully qualified name of a Java class implementing the org.lockss.plugin.LinkExtractorFactory interface.
- Sample
<entry>
  <string>text/html_link_extractor_factory</string>
  <string>edu.example.plugin.publisherx.PublisherXHtmlLinkExtractorFactory</string>
</entry>
- Description
The LOCKSS software comes with built-in code to extract URLs from HTML and CSS files encountered during the crawl of an AU. Each URL extracted this way is first passed through the URL Normalizer, and the Crawl Rules then determine whether it should in turn be included in the AU. If URLs need to be extracted from other file types, or if the extraction behavior for built-in types like HTML and CSS needs to be extended or customized, this plugin feature can be used to point the plugin at custom link extraction code.
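The general shape of custom link extraction code can be sketched as follows. This is a minimal, self-contained illustration and not the LOCKSS API: SketchLinkExtractor, SketchLinkExtractorFactory and PlainTextLinkExtractorFactory are hypothetical stand-ins, and the real org.lockss.plugin.LinkExtractorFactory interface and its method signatures should be taken from the LOCKSS Javadoc. The sketch assumes a plain-text file type that lists one URL reference per line, resolves each reference against the URL of the file being parsed, and hands the result to a callback, which is the point at which a real crawl would apply the URL Normalizer and the Crawl Rules.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a link extractor; NOT the LOCKSS interface.
interface SketchLinkExtractor {
    // Reports every URL found in content, resolved against srcUrl.
    void extractUrls(String content, String srcUrl, Consumer<String> callback);
}

// Hypothetical stand-in for a link extractor factory; NOT the LOCKSS interface.
interface SketchLinkExtractorFactory {
    SketchLinkExtractor createLinkExtractor(String mimeType);
}

// Assumed file format: one URL reference per line; blank lines and
// lines starting with '#' are ignored.
class PlainTextLinkExtractorFactory implements SketchLinkExtractorFactory {
    @Override
    public SketchLinkExtractor createLinkExtractor(String mimeType) {
        return (content, srcUrl, callback) -> {
            URI base = URI.create(srcUrl);
            for (String line : content.split("\\R")) {
                String ref = line.trim();
                if (ref.isEmpty() || ref.startsWith("#")) {
                    continue;
                }
                try {
                    // Relative references are resolved against the source URL;
                    // absolute URLs are passed through unchanged.
                    callback.accept(base.resolve(ref).toString());
                } catch (IllegalArgumentException e) {
                    // Skip references that are not syntactically valid URIs.
                }
            }
        };
    }
}
```

In this sketch the factory ignores the media type argument because it only ever produces one kind of extractor. A factory registered for several media types could instead use that argument to choose between extractors.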