LOCKSS Crawler Service REST API

REST API of the LOCKSS Crawler Service
More information: https://www.lockss.org/
Contact Info: lockss-support@lockss.org
Version: 2.0.0
BasePath:/
BSD-3-Clause
https://opensource.org/licenses/BSD-3-Clause

Access

  1. HTTP Basic Authentication

Methods

[ Jump to Models ]

Table of Contents

Crawlers

Crawls

Jobs

Status

Ws

Crawlers

Up
get /crawlers/{crawlerId}
Return information about a crawler. (getCrawlerConfig)
Get information related to an installed crawler.

Path parameters

crawlerId (required)
Path Parameter — Identifier for the crawler

Return type

crawlerConfig

Example data

Content-Type: application/json
{
  "crawlerId" : "classic",
  "attributes" : {
    "key" : "attributes"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

Crawler Configuration Found crawlerConfig

401

Access Denied.

404

No Such Crawler

500

Internal Server Error
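The service uses HTTP Basic Authentication (see Access above). As a minimal client-side sketch (the base URL and credentials below are placeholders, not values defined by this API), a request for a crawler's configuration could be assembled like this:

```python
import base64
import urllib.request

def build_crawler_request(base_url, crawler_id, user, password):
    """Build an authenticated GET request for /crawlers/{crawlerId}."""
    url = f"{base_url}/crawlers/{crawler_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")  # HTTP Basic Authentication
    req.add_header("Accept", "application/json")       # select the JSON representation
    return req

# Placeholder host and credentials; substitute your deployment's values.
req = build_crawler_request("https://crawler.example.org", "classic", "user", "pass")
```

Sending the request (for example with urllib.request.urlopen) yields the crawlerConfig JSON shown above on success, or 401/404/500 as listed below.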

Up
get /crawlers
Get the list of supported crawlers. (getCrawlers)
Return the list of supported crawlers.

Return type

crawlerStatuses

Example data

Content-Type: application/json
{
  "crawlerMap" : {
    "key" : {
      "numJobsSuccessful" : 1,
      "numJobsPending" : 5,
      "numJobsFailed" : 6,
      "isEnabled" : true,
      "isAutoCrawlEnabled" : true,
      "numJobsActive" : 0,
      "errMessage" : "errMessage"
    }
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The status of the supported crawlers. crawlerStatuses

404

No Such Crawler
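The crawlerStatuses response is a map from crawler identifier to status object. A sketch of consuming that shape client-side, using the example payload above verbatim:

```python
import json

# The example crawlerStatuses payload shown above.
payload = json.loads("""
{
  "crawlerMap" : {
    "key" : {
      "numJobsSuccessful" : 1,
      "numJobsPending" : 5,
      "numJobsFailed" : 6,
      "isEnabled" : true,
      "isAutoCrawlEnabled" : true,
      "numJobsActive" : 0,
      "errMessage" : "errMessage"
    }
  }
}
""")

def summarize(statuses):
    """Return {crawlerId: total job count} for enabled crawlers."""
    out = {}
    for crawler_id, s in statuses["crawlerMap"].items():
        if s.get("isEnabled"):
            out[crawler_id] = (s.get("numJobsActive", 0) + s.get("numJobsPending", 0)
                               + s.get("numJobsSuccessful", 0) + s.get("numJobsFailed", 0))
    return out
```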

Crawls

Up
delete /crawls/{jobId}
Remove or stop a crawl (deleteCrawlById)
Delete a crawl given the crawl identifier, stopping any current processing, if necessary.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Return type

crawlStatus

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "auName" : "auName",
  "sources" : [ "sources", "sources" ],
  "bytesFetched" : 2,
  "fetchedItems" : {
    "itemsLink" : "itemsLink",
    "count" : 7
  },
  "type" : "type",
  "refetchDepth" : 1,
  "isActive" : true,
  "mimeTypes" : [ {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  }, {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  } ],
  "isError" : true,
  "startTime" : 5,
  "crawlerId" : "classic",
  "priority" : 0,
  "startUrls" : [ "startUrls", "startUrls" ],
  "jobId" : "jobId",
  "proxy" : "proxy",
  "depth" : 6,
  "isWaiting" : true,
  "endTime" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The deleted crawl crawlStatus

401

Unauthorized

403

Forbidden

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}
Get the crawl status of this job (getCrawlById)
Get the status of the crawl represented by this job identifier.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Return type

crawlStatus

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "auName" : "auName",
  "sources" : [ "sources", "sources" ],
  "bytesFetched" : 2,
  "fetchedItems" : {
    "itemsLink" : "itemsLink",
    "count" : 7
  },
  "type" : "type",
  "refetchDepth" : 1,
  "isActive" : true,
  "mimeTypes" : [ {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  }, {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  } ],
  "isError" : true,
  "startTime" : 5,
  "crawlerId" : "classic",
  "priority" : 0,
  "startUrls" : [ "startUrls", "startUrls" ],
  "jobId" : "jobId",
  "proxy" : "proxy",
  "depth" : 6,
  "isWaiting" : true,
  "endTime" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The crawl status of the requested crawl crawlStatus

401

Unauthorized

404

Not Found

500

Internal Server Error
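A crawlStatus can be summarized client-side. The sketch below works over a trimmed copy of the example payload above; note that startTime and endTime are int64 timestamps whose units this document does not specify.

```python
import json

# A trimmed copy of the example crawlStatus payload shown above.
status = json.loads("""
{
  "jobId" : "jobId",
  "startTime" : 5,
  "endTime" : 5,
  "bytesFetched" : 2,
  "fetchedItems" : { "itemsLink" : "itemsLink", "count" : 7 },
  "mimeTypes" : [
    { "count" : 9, "mimeType" : "mimeType", "counterLink" : "counterLink" },
    { "count" : 9, "mimeType" : "mimeType", "counterLink" : "counterLink" }
  ],
  "jobStatus" : { "msg" : "msg", "statusCode" : "STATUS_UNKNOWN" }
}
""")

elapsed = status["endTime"] - status["startTime"]          # int64 timestamps; units unspecified here
mime_total = sum(m["count"] for m in status["mimeTypes"])  # items counted per MIME type
code = status["jobStatus"]["statusCode"]                   # one of the jobStatus enum values
```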

Up
get /crawls/{jobId}/mimeType/{type}
A pageable list of urls of a given MIME type. (getCrawlByMimeType)
Get a list of urls of a given MIME type.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.
type (required)
Path Parameter — The MIME type of the urls to list.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error
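The {type} path segment is itself a MIME type and typically contains a slash (for example text/html), so it should be percent-encoded before being placed in the path; whether the service also tolerates a literal slash is not stated here. A sketch of URL construction (the helper and host are illustrative):

```python
from urllib.parse import quote, urlencode

def mime_url(base_url, job_id, mime_type, limit=None, continuation_token=None):
    """Build the URL for GET /crawls/{jobId}/mimeType/{type}."""
    path = (f"{base_url}/crawls/{quote(job_id, safe='')}"
            f"/mimeType/{quote(mime_type, safe='')}")  # encode '/' in the MIME type
    params = {}
    if limit is not None:
        params["limit"] = limit
    if continuation_token is not None:
        params["continuationToken"] = continuation_token
    return path + ("?" + urlencode(params) if params else "")
```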

Up
get /crawls/{jobId}/errors
A pageable list of urls with errors. (getCrawlErrors)
Get a list of urls with errors.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested urls with errors. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error
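Each entry in the returned urlPager carries an optional urlError with a severity of Warning, Error, or Fatal (see the urlError model below). A sketch of filtering a page by severity, using a trimmed copy of the example payload above:

```python
import json

# A trimmed copy of the example urlPager payload shown above.
pager = json.loads("""
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : { "severity" : "Warning", "message" : "message" },
    "url" : "url"
  } ],
  "pageInfo" : { "totalCount" : 3, "resultsPerPage" : 2,
                 "continuationToken" : "continuationToken",
                 "curLink" : "curLink", "nextLink" : "nextLink" }
}
""")

SEVERITY_ORDER = {"Warning": 0, "Error": 1, "Fatal": 2}

def urls_at_least(pager, severity):
    """Return URLs whose error severity is at or above the given level."""
    floor = SEVERITY_ORDER[severity]
    return [u["url"] for u in pager["urls"]
            if u.get("error") and SEVERITY_ORDER[u["error"]["severity"]] >= floor]
```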

Up
get /crawls/{jobId}/excluded
A pageable list of excluded urls. (getCrawlExcluded)
Get a list of excluded urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested excluded urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/fetched
A pageable list of fetched urls. (getCrawlFetched)
Get a list of fetched urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested fetched urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/notModified
A pageable list of not-modified urls. (getCrawlNotModified)
Get a list of not-modified urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested not-modified urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/parsed
A pageable list of parsed urls. (getCrawlParsed)
Get a list of parsed urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested parsed urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/pending
A pageable list of pending urls. (getCrawlPending)
Get a list of pending urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested pending urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls
Get the list of crawls. (getCrawls)
Get a list of crawls one page at a time, as defined by the limit and continuation token.

Query parameters

limit (optional)
Query Parameter — The number of crawls per page. default: 50
continuationToken (optional)
Query Parameter — The continuation token of the next page of crawl status data to be returned.

Return type

crawlPager

Example data

Content-Type: application/json
{
  "crawls" : [ {
    "auId" : "auId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "auName" : "auName",
    "sources" : [ "sources", "sources" ],
    "bytesFetched" : 2,
    "fetchedItems" : {
      "itemsLink" : "itemsLink",
      "count" : 7
    },
    "type" : "type",
    "refetchDepth" : 1,
    "isActive" : true,
    "mimeTypes" : [ {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    }, {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    } ],
    "isError" : true,
    "startTime" : 5,
    "crawlerId" : "classic",
    "priority" : 0,
    "startUrls" : [ "startUrls", "startUrls" ],
    "jobId" : "jobId",
    "proxy" : "proxy",
    "depth" : 6,
    "isWaiting" : true,
    "endTime" : 5
  }, {
    "auId" : "auId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "auName" : "auName",
    "sources" : [ "sources", "sources" ],
    "bytesFetched" : 2,
    "fetchedItems" : {
      "itemsLink" : "itemsLink",
      "count" : 7
    },
    "type" : "type",
    "refetchDepth" : 1,
    "isActive" : true,
    "mimeTypes" : [ {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    }, {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    } ],
    "isError" : true,
    "startTime" : 5,
    "crawlerId" : "classic",
    "priority" : 0,
    "startUrls" : [ "startUrls", "startUrls" ],
    "jobId" : "jobId",
    "proxy" : "proxy",
    "depth" : 6,
    "isWaiting" : true,
    "endTime" : 5
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested crawls crawlPager

400

Bad Request

401

Unauthorized

500

Internal Server Error
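Pages of this (and the other paged endpoints) are linked by the continuation token: the token from one page's pageInfo is passed as the continuationToken query parameter of the next request. A transport-agnostic sketch, where fetch_page stands in for an actual HTTP call and the stopping rule (no token returned on the last page) is an assumption this document does not spell out:

```python
def iterate_crawls(fetch_page, limit=50):
    """Yield crawl records across pages.

    fetch_page(limit, token) is a caller-supplied function that performs
    GET /crawls?limit=...&continuationToken=... and returns the crawlPager dict.
    """
    token = None
    while True:
        page = fetch_page(limit, token)
        yield from page["crawls"]
        token = page["pageInfo"].get("continuationToken")
        if not token:  # assumed last-page convention
            break

# A fake two-page "service" for illustration.
pages = {
    None: {"crawls": [{"jobId": "a"}], "pageInfo": {"continuationToken": "t1"}},
    "t1": {"crawls": [{"jobId": "b"}], "pageInfo": {}},
}
job_ids = [c["jobId"] for c in iterate_crawls(lambda limit, tok: pages[tok])]
```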

Jobs

Up
delete /jobs
Delete all of the currently queued and active jobs (deleteJobs)
Halt and delete all of the currently queued and active crawl jobs

Responses

200

All crawl jobs have been stopped and deleted.

400

Bad Request

401

Unauthorized

500

Internal Server Error

Up
get /jobs
Get the list of crawl jobs. (getJobs)
Get a list of crawl jobs one page at a time, as defined by the continuation token and limit.

Query parameters

limit (optional)
Query Parameter — The number of jobs per page. default: 50
continuationToken (optional)
Query Parameter — The continuation token of the next page of jobs to be returned.

Return type

jobPager

Example data

Content-Type: application/json
{
  "jobs" : [ {
    "result" : "result",
    "jobId" : "jobId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "crawlDesc" : {
      "forceCrawl" : false,
      "auId" : "auId",
      "crawlKind" : "newContent",
      "crawlerId" : "classic",
      "crawlList" : [ "crawlList", "crawlList" ],
      "extraCrawlerData" : {
        "key" : { }
      },
      "refetchDepth" : 0,
      "priority" : 6,
      "crawlDepth" : 1
    },
    "endDate" : 2,
    "requestDate" : 5,
    "startDate" : 5
  }, {
    "result" : "result",
    "jobId" : "jobId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "crawlDesc" : {
      "forceCrawl" : false,
      "auId" : "auId",
      "crawlKind" : "newContent",
      "crawlerId" : "classic",
      "crawlList" : [ "crawlList", "crawlList" ],
      "extraCrawlerData" : {
        "key" : { }
      },
      "refetchDepth" : 0,
      "priority" : 6,
      "crawlDepth" : 1
    },
    "endDate" : 2,
    "requestDate" : 5,
    "startDate" : 5
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested crawl jobs jobPager

400

Bad Request

401

Unauthorized

500

Internal Server Error

Up
post /jobs
Request a crawl as defined by the descriptor (queueJob)
Enqueue a new crawl job as defined by the crawl descriptor and return it.

Consumes

This API call consumes the following media types via the Content-Type request header:

application/json

Request body

body crawlDesc (required)
Body Parameter — The crawl request descriptor.

Return type

crawlJob

Example data

Content-Type: application/json
{
  "result" : "result",
  "jobId" : "jobId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "crawlDesc" : {
    "forceCrawl" : false,
    "auId" : "auId",
    "crawlKind" : "newContent",
    "crawlerId" : "classic",
    "crawlList" : [ "crawlList", "crawlList" ],
    "extraCrawlerData" : {
      "key" : { }
    },
    "refetchDepth" : 0,
    "priority" : 6,
    "crawlDepth" : 1
  },
  "endDate" : 2,
  "requestDate" : 5,
  "startDate" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

202

The crawl request has been queued for operation. crawlJob

400

Bad Request

401

Unauthorized

403

Forbidden

404

Not Found

500

Internal Server Error
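The request body is a crawlDesc (see the model below); auId and crawlKind are its required fields, with crawlKind restricted to 'newContent' or 'repair'. A sketch of building the JSON body (the helper and values are illustrative):

```python
import json

def make_crawl_desc(au_id, crawl_kind="newContent", **optional):
    """Build a crawlDesc body; extra keyword args (priority, crawlDepth,
    forceCrawl, crawlerId, ...) pass through unchanged."""
    if crawl_kind not in ("newContent", "repair"):
        raise ValueError("crawlKind must be 'newContent' or 'repair'")
    desc = {"auId": au_id, "crawlKind": crawl_kind}
    desc.update(optional)
    return desc

# "exampleAuId" is a placeholder archival unit identifier.
body = json.dumps(make_crawl_desc("exampleAuId", priority=1, forceCrawl=False))
```

A successful POST returns 202 with a crawlJob whose jobId can then be used with the /crawls/{jobId} endpoints above.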

Status

Up
get /status
Get the status of the service (getStatus)
Get the status of the service

Return type

apiStatus

Example data

Content-Type: application/json
{
  "startupStatus" : "NONE",
  "reason" : "reason",
  "readyTime" : 0,
  "apiVersion" : "apiVersion",
  "ready" : true,
  "componentName" : "componentName",
  "componentVersion" : "componentVersion",
  "serviceName" : "serviceName",
  "lockssVersion" : "lockssVersion"
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The status of the service apiStatus

401

Unauthorized

500

Internal Server Error
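The ready flag (with reason when not ready) makes this endpoint suitable for health probes. A sketch over the example payload above:

```python
import json

# The example apiStatus payload shown above.
api_status = json.loads("""
{
  "startupStatus" : "NONE",
  "reason" : "reason",
  "readyTime" : 0,
  "apiVersion" : "apiVersion",
  "ready" : true,
  "componentName" : "componentName",
  "componentVersion" : "componentVersion",
  "serviceName" : "serviceName",
  "lockssVersion" : "lockssVersion"
}
""")

def describe(status):
    """Render a one-line readiness summary from an apiStatus object."""
    if status["ready"]:
        return f"{status['serviceName']} ready (API {status['apiVersion']})"
    return f"{status['serviceName']} not ready: {status.get('reason', 'unknown')}"
```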

Ws

Up
get /ws/crawls
Query for a list of crawls matching conditions in the query string (getWsCrawls)
Query for crawls that meet a set of specified conditions.

Query parameters

crawlQuery (required)
Query Parameter — The query that specifies the crawls to be returned

Return type

crawlWsResult

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "auName" : "auName",
  "pagesFetchedCount" : 5,
  "sources" : [ "sources", "sources" ],
  "linkDepth" : 1,
  "pagesFetched" : [ "pagesFetched", "pagesFetched" ],
  "pagesNotModifiedCount" : 2,
  "pagesParsed" : [ "pagesParsed", "pagesParsed" ],
  "refetchDepth" : 1,
  "duration" : 1,
  "mimeTypes" : [ "mimeTypes", "mimeTypes" ],
  "pagesWithErrorsCount" : 4,
  "pagesWithErrors" : [ {
    "severity" : "severity",
    "message" : "message",
    "url" : "url"
  }, {
    "severity" : "severity",
    "message" : "message",
    "url" : "url"
  } ],
  "startTime" : 6,
  "crawlType" : "crawlType",
  "pagesParsedCount" : 2,
  "bytesFetchedCount" : 5,
  "crawlKey" : "crawlKey",
  "crawlStatus" : "crawlStatus",
  "priority" : 0,
  "pagesPendingCount" : 7,
  "pagesPending" : [ "pagesPending", "pagesPending" ],
  "pagesExcludedCount" : 9,
  "offSiteUrlsExcludedCount" : 3,
  "pagesNotModified" : [ "pagesNotModified", "pagesNotModified" ],
  "mimeTypeCount" : 7,
  "pagesExcluded" : [ "pagesExcluded", "pagesExcluded" ],
  "startingUrls" : [ "startingUrls", "startingUrls" ]
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

Information about the requested crawls crawlWsResult

400

Bad Request

401

Unauthorized

500

Internal Server Error
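The crawlQuery value travels in the query string and must be URL-encoded; this document does not define the query language itself, so the expression below is purely illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def ws_crawls_url(base_url, crawl_query):
    """Build the GET /ws/crawls URL with an encoded crawlQuery parameter."""
    return f"{base_url}/ws/crawls?" + urlencode({"crawlQuery": crawl_query})

# Illustrative host and query expression; the actual query syntax is service-defined.
url = ws_crawls_url("https://crawler.example.org", 'auId = "exampleAuId"')
```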

Models

[ Jump to Methods ]

Table of Contents

  1. apiStatus
  2. counter
  3. crawlDesc
  4. crawlJob
  5. crawlPager
  6. crawlStatus
  7. crawlWsResult
  8. crawlWsResult_pagesWithErrors
  9. crawlerConfig
  10. crawlerStatus
  11. crawlerStatuses
  12. jobPager
  13. jobStatus
  14. mimeCounter
  15. pageInfo
  16. urlError
  17. urlInfo
  18. urlPager

apiStatus Up

The status information of the service
apiVersion
String The version of the API
componentName (optional)
String The name of the component
componentVersion (optional)
String The version of the component software
lockssVersion (optional)
String The version of the LOCKSS system
ready
Boolean The indication of whether the service is available
serviceName (optional)
String The name of the service
readyTime (optional)
Long The time the service last became ready. format: int64
reason (optional)
String The reason the service isn't ready.
startupStatus (optional)
String Enum indicating progress of plugin/AU processing at startup.
Enum:
NONE
PLUGINS_CRAWLING
PLUGINS_COLLECTED
PLUGINS_LOADING
PLUGINS_LOADED
AUS_STARTING
AUS_STARTED

counter Up

A counter for urls.
count
Integer The number of elements. format: int32
itemsLink
String A link to the list of count items or to a pager with count items.

crawlDesc Up

A descriptor for a crawl.
auId
String The identifier of the archival unit to be crawled.
crawlKind
String The kind of crawl being performed, either 'newContent' or 'repair'.
Enum:
newContent
repair
crawlerId (optional)
String The crawler to be used for this crawl.
forceCrawl (optional)
Boolean An indication of whether the crawl is to be forced, suppressing conditions that might otherwise prevent the crawl from happening.
refetchDepth (optional)
Integer The refetch depth to use for a deep crawl. format: int32
priority (optional)
Integer The priority for the crawl. format: int32
crawlList (optional)
array[String] The list of URLs to crawl.
crawlDepth (optional)
Integer The depth to which the links should be followed. 0 means do not follow links. format: int32
extraCrawlerData (optional)
map[String, Object] A map of additional properties for a crawl on a given crawler.

crawlJob Up

The job resulting from a request to perform a crawl.
crawlDesc
requestDate
Long The timestamp when the crawl was requested. format: int64
jobId
String Identifier of the crawl job.
jobStatus
startDate (optional)
Long The timestamp when the crawl began. format: int64
endDate (optional)
Long The timestamp when the crawl ended. format: int64
result (optional)
String A URI which can be used to retrieve the crawl data.

crawlPager Up

A display page of crawl status
crawls
array[crawlStatus] The crawls displayed in the page
pageInfo

crawlStatus Up

The status of a single crawl.
jobId
String The id for the crawl.
auId
String The id for the au.
auName
String The name for the au.
type
String The type of crawl.
startUrls
array[String] The array of start urls.
priority
Integer The priority for this crawl. format: int32
crawlerId
String The id of the crawler used for this crawl.
sources (optional)
array[String] The sources to use for the crawl.
depth (optional)
Integer The depth of the crawl. format: int32
refetchDepth (optional)
Integer The refetch depth of the crawl. format: int32
proxy (optional)
String The proxy used for crawling.
startTime
Long The timestamp for the start of crawl. format: int64
endTime
Long The timestamp for the end of the crawl. format: int64
jobStatus
isWaiting (optional)
Boolean True if the crawl is waiting to start.
isActive (optional)
Boolean True if the crawl is active.
isError (optional)
Boolean True if the crawl has errored.
bytesFetched (optional)
Long The number of bytes fetched. format: int64
fetchedItems (optional)
excludedItems (optional)
notModifiedItems (optional)
parsedItems (optional)
pendingItems (optional)
errors (optional)
mimeTypes (optional)
array[mimeCounter] The list of urls by mimeType.

crawlWsResult Up

auId
auName
priority (optional)
Integer format: int32
crawlKey (optional)
crawlType (optional)
startTime (optional)
Integer format: int32
duration (optional)
Integer format: int32
crawlStatus (optional)
bytesFetchedCount (optional)
Integer format: int32
pagesFetchedCount (optional)
Integer format: int32
pagesFetched (optional)
pagesParsedCount (optional)
Integer format: int32
pagesParsed (optional)
pagesPendingCount (optional)
Integer format: int32
pagesPending (optional)
pagesExcludedCount (optional)
Integer format: int32
pagesExcluded (optional)
offSiteUrlsExcludedCount (optional)
Integer format: int32
pagesNotModifiedCount (optional)
Integer format: int32
pagesNotModified (optional)
pagesWithErrorsCount (optional)
Integer format: int32
pagesWithErrors (optional)
mimeTypeCount (optional)
Integer format: int32
mimeTypes (optional)
sources (optional)
startingUrls (optional)
refetchDepth (optional)
Integer format: int32
linkDepth (optional)
Integer format: int32

crawlWsResult_pagesWithErrors Up

url (optional)
severity (optional)
message (optional)

crawlerConfig Up

Configuration information about a specific crawler.
crawlerId
String The identifier for this crawler
example: classic
attributes
map[String, String] Key-value pairs providing crawler-specific attributes and configuration information.

crawlerStatus Up

Status about a specific crawler.
isEnabled
Boolean True if the crawler is enabled.
isAutoCrawlEnabled (optional)
Boolean True if the crawler automatically crawls AUs when needed.
numJobsActive (optional)
Integer The number of jobs running. format: int32
numJobsFailed (optional)
Integer The number of jobs that failed. format: int32
numJobsSuccessful (optional)
Integer The number of jobs that succeeded. format: int32
numJobsPending (optional)
Integer The number of pending jobs. format: int32
errMessage (optional)

crawlerStatuses Up

The statuses of the supported crawlers.
crawlerMap (optional)
map[String, crawlerStatus] A map of crawler status objects.

jobPager Up

A display page of jobs
jobs
array[crawlJob] The jobs displayed in the page
pageInfo

jobStatus Up

A status which includes a code and a message.
statusCode
String The code for this status.
Enum:
STATUS_UNKNOWN
STATUS_QUEUED
STATUS_ACTIVE
STATUS_SUCCESSFUL
STATUS_ERROR
STATUS_ABORTED
STATUS_WINDOW_CLOSED
STATUS_FETCH_ERROR
STATUS_NO_PUB_PERMISSION
STATUS_PLUGIN_ERROR
STATUS_REPO_ERR
STATUS_RUNNING_AT_CRASH
STATUS_EXTRACTOR_ERROR
STATUS_CRAWL_TEST_SUCCESSFUL
STATUS_CRAWL_TEST_FAIL
STATUS_INELIGIBLE
STATUS_INACTIVE_REQUEST
STATUS_INTERRUPTED
msg (optional)
String A text message explaining this status.

mimeCounter Up

A counter for mimeTypes seen during a crawl.
mimeType
String The mime type to count.
count (optional)
Integer The number of elements of the mime type. format: int32
counterLink (optional)
String A link to the list of count elements or to a pager with count elements.

pageInfo Up

The information related to pagination of content
totalCount
Integer The total number of elements to be paginated format: int32
resultsPerPage
Integer The number of results per page. format: int32
continuationToken
String The continuation token.
curLink
String The link to the current page.
nextLink (optional)
String The link to the next page.

urlError Up

Information related to an error for a url.
message
String The error message
severity
String The severity of the error.
Enum:
Warning
Error
Fatal

urlInfo Up

Information related to a url.
url
String The url string
error (optional)
referrers (optional)
array[String] An optional list of referrers.

urlPager Up

A pager for urls.
pageInfo
urls
array[urlInfo] A list of urls with related info.