LOCKSS Crawler Service REST API

REST API of the LOCKSS Crawler Service
More information: https://www.lockss.org/
Contact Info: lockss-support@lockss.org
Version: 2.0.0
BasePath:/
BSD-3-Clause
https://opensource.org/licenses/BSD-3-Clause

Access

  1. HTTP Basic Authentication

Methods

[ Jump to Models ]

Table of Contents

Crawlers

Crawls

Jobs

Status

Ws

Crawlers

Up
get /crawlers/{crawlerId}
Return information about a crawler. (getCrawlerConfig)
Get information related to an installed crawler.

Path parameters

crawlerId (required)
Path Parameter — Identifier for the crawler

Return type

crawlerConfig

Example data

Content-Type: application/json
{
  "crawlerId" : "classic",
  "attributes" : {
    "key" : "attributes"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

Crawler Configuration Found crawlerConfig

401

Access Denied.

404

No Such Crawler

500

Internal Server Error
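The service uses HTTP Basic Authentication (see Access above). As a minimal client-side sketch (the base URL and credentials below are placeholders, not values defined by this API), a request for a crawler's configuration could be assembled like this:

```python
import base64
import urllib.request

def build_crawler_request(base_url, crawler_id, user, password):
    """Build an authenticated GET request for /crawlers/{crawlerId}."""
    url = f"{base_url}/crawlers/{crawler_id}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")  # HTTP Basic Authentication
    req.add_header("Accept", "application/json")       # select the JSON representation
    return req

# Placeholder host and credentials; substitute your deployment's values.
req = build_crawler_request("https://crawler.example.org", "classic", "user", "pass")
```

Sending the request (for example with urllib.request.urlopen) yields the crawlerConfig JSON shown above on success, or 401/404/500 as listed below.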

Up
get /crawlers
Get the list of supported crawlers. (getCrawlers)
Return the list of supported crawlers.

Return type

crawlerStatuses

Example data

Content-Type: application/json
{
  "crawlerMap" : {
    "key" : {
      "numJobsSuccessful" : 1,
      "numJobsPending" : 5,
      "numJobsFailed" : 6,
      "isEnabled" : true,
      "isAutoCrawlEnabled" : true,
      "numJobsActive" : 0,
      "errMessage" : "errMessage"
    }
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The status of the supported crawlers. crawlerStatuses

404

No Such Crawler
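The crawlerStatuses response is a map from crawler identifier to status object. A sketch of consuming that shape client-side, using the example payload above verbatim:

```python
import json

# The example crawlerStatuses payload shown above.
payload = json.loads("""
{
  "crawlerMap" : {
    "key" : {
      "numJobsSuccessful" : 1,
      "numJobsPending" : 5,
      "numJobsFailed" : 6,
      "isEnabled" : true,
      "isAutoCrawlEnabled" : true,
      "numJobsActive" : 0,
      "errMessage" : "errMessage"
    }
  }
}
""")

def summarize(statuses):
    """Return {crawlerId: total job count} for enabled crawlers."""
    out = {}
    for crawler_id, s in statuses["crawlerMap"].items():
        if s.get("isEnabled"):
            out[crawler_id] = (s.get("numJobsActive", 0) + s.get("numJobsPending", 0)
                               + s.get("numJobsSuccessful", 0) + s.get("numJobsFailed", 0))
    return out
```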

Crawls

Up
delete /crawls/{jobId}
Remove or stop a crawl (deleteCrawlById)
Delete a crawl given the crawl identifier, stopping any current processing, if necessary.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Return type

crawlStatus

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "auName" : "auName",
  "sources" : [ "sources", "sources" ],
  "bytesFetched" : 2,
  "fetchedItems" : {
    "itemsLink" : "itemsLink",
    "count" : 7
  },
  "type" : "type",
  "refetchDepth" : 1,
  "isActive" : true,
  "mimeTypes" : [ {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  }, {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  } ],
  "isError" : true,
  "startTime" : 5,
  "crawlerId" : "classic",
  "priority" : 0,
  "startUrls" : [ "startUrls", "startUrls" ],
  "jobId" : "jobId",
  "proxy" : "proxy",
  "depth" : 6,
  "isWaiting" : true,
  "endTime" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The deleted crawl crawlStatus

401

Unauthorized

403

Forbidden

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}
Get the crawl status of this job (getCrawlById)
Get the status of the crawl represented by this job identifier.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Return type

crawlStatus

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "auName" : "auName",
  "sources" : [ "sources", "sources" ],
  "bytesFetched" : 2,
  "fetchedItems" : {
    "itemsLink" : "itemsLink",
    "count" : 7
  },
  "type" : "type",
  "refetchDepth" : 1,
  "isActive" : true,
  "mimeTypes" : [ {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  }, {
    "count" : 9,
    "mimeType" : "mimeType",
    "counterLink" : "counterLink"
  } ],
  "isError" : true,
  "startTime" : 5,
  "crawlerId" : "classic",
  "priority" : 0,
  "startUrls" : [ "startUrls", "startUrls" ],
  "jobId" : "jobId",
  "proxy" : "proxy",
  "depth" : 6,
  "isWaiting" : true,
  "endTime" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The crawl status of the requested crawl crawlStatus

401

Unauthorized

404

Not Found

500

Internal Server Error
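A crawlStatus can be summarized client-side. The sketch below works over a trimmed copy of the example payload above; note that startTime and endTime are int64 timestamps whose units this document does not specify.

```python
import json

# A trimmed copy of the example crawlStatus payload shown above.
status = json.loads("""
{
  "jobId" : "jobId",
  "startTime" : 5,
  "endTime" : 5,
  "bytesFetched" : 2,
  "fetchedItems" : { "itemsLink" : "itemsLink", "count" : 7 },
  "mimeTypes" : [
    { "count" : 9, "mimeType" : "mimeType", "counterLink" : "counterLink" },
    { "count" : 9, "mimeType" : "mimeType", "counterLink" : "counterLink" }
  ],
  "jobStatus" : { "msg" : "msg", "statusCode" : "STATUS_UNKNOWN" }
}
""")

elapsed = status["endTime"] - status["startTime"]          # int64 timestamps; units unspecified here
mime_total = sum(m["count"] for m in status["mimeTypes"])  # items counted per MIME type
code = status["jobStatus"]["statusCode"]                   # one of the jobStatus enum values
```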

Up
get /crawls/{jobId}/mimeType/{type}
A pageable list of urls of a given MIME type. (getCrawlByMimeType)
Get a list of urls of a given MIME type.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.
type (required)
Path Parameter — The MIME type of the urls to list.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error
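The {type} path segment is itself a MIME type and typically contains a slash (for example text/html), so it should be percent-encoded before being placed in the path; whether the service also tolerates a literal slash is not stated here. A sketch of URL construction (the helper and host are illustrative):

```python
from urllib.parse import quote, urlencode

def mime_url(base_url, job_id, mime_type, limit=None, continuation_token=None):
    """Build the URL for GET /crawls/{jobId}/mimeType/{type}."""
    path = (f"{base_url}/crawls/{quote(job_id, safe='')}"
            f"/mimeType/{quote(mime_type, safe='')}")  # encode '/' in the MIME type
    params = {}
    if limit is not None:
        params["limit"] = limit
    if continuation_token is not None:
        params["continuationToken"] = continuation_token
    return path + ("?" + urlencode(params) if params else "")
```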

Up
get /crawls/{jobId}/errors
A pageable list of urls with errors. (getCrawlErrors)
Get a list of urls with errors.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested urls with errors. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error
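Each entry in the returned urlPager carries an optional urlError with a severity of Warning, Error, or Fatal (see the urlError model below). A sketch of filtering a page by severity, using a trimmed copy of the example payload above:

```python
import json

# A trimmed copy of the example urlPager payload shown above.
pager = json.loads("""
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : { "severity" : "Warning", "message" : "message" },
    "url" : "url"
  } ],
  "pageInfo" : { "totalCount" : 3, "resultsPerPage" : 2,
                 "continuationToken" : "continuationToken",
                 "curLink" : "curLink", "nextLink" : "nextLink" }
}
""")

SEVERITY_ORDER = {"Warning": 0, "Error": 1, "Fatal": 2}

def urls_at_least(pager, severity):
    """Return URLs whose error severity is at or above the given level."""
    floor = SEVERITY_ORDER[severity]
    return [u["url"] for u in pager["urls"]
            if u.get("error") and SEVERITY_ORDER[u["error"]["severity"]] >= floor]
```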

Up
get /crawls/{jobId}/excluded
A pageable list of excluded urls. (getCrawlExcluded)
Get a list of excluded urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested excluded urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/fetched
A pageable list of fetched urls. (getCrawlFetched)
Get a list of fetched urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested fetched urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/notModified
A pageable list of not-modified urls. (getCrawlNotModified)
Get a list of not-modified urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested not-modified urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/parsed
A pageable list of parsed urls. (getCrawlParsed)
Get a list of parsed urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested parsed urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls/{jobId}/pending
A pageable list of pending urls. (getCrawlPending)
Get a list of pending urls.

Path parameters

jobId (required)
Path Parameter — The identifier of the crawl.

Query parameters

limit (optional)
Query Parameter — The number of urls per page.
continuationToken (optional)
Query Parameter — The continuation token of the next page of urls to be returned.

Return type

urlPager

Example data

Content-Type: application/json
{
  "urls" : [ {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  }, {
    "referrers" : [ "referrers", "referrers" ],
    "error" : {
      "severity" : "Warning",
      "message" : "message"
    },
    "url" : "url"
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested pending urls. urlPager

400

Bad Request

401

Unauthorized

404

Not Found

500

Internal Server Error

Up
get /crawls
Get the list of crawls. (getCrawls)
Get a list of crawls one page at a time, as defined by the limit and continuation token.

Query parameters

limit (optional)
Query Parameter — The number of crawls per page. default: 50
continuationToken (optional)
Query Parameter — The continuation token of the next page of crawl status data to be returned.

Return type

crawlPager

Example data

Content-Type: application/json
{
  "crawls" : [ {
    "auId" : "auId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "auName" : "auName",
    "sources" : [ "sources", "sources" ],
    "bytesFetched" : 2,
    "fetchedItems" : {
      "itemsLink" : "itemsLink",
      "count" : 7
    },
    "type" : "type",
    "refetchDepth" : 1,
    "isActive" : true,
    "mimeTypes" : [ {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    }, {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    } ],
    "isError" : true,
    "startTime" : 5,
    "crawlerId" : "classic",
    "priority" : 0,
    "startUrls" : [ "startUrls", "startUrls" ],
    "jobId" : "jobId",
    "proxy" : "proxy",
    "depth" : 6,
    "isWaiting" : true,
    "endTime" : 5
  }, {
    "auId" : "auId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "auName" : "auName",
    "sources" : [ "sources", "sources" ],
    "bytesFetched" : 2,
    "fetchedItems" : {
      "itemsLink" : "itemsLink",
      "count" : 7
    },
    "type" : "type",
    "refetchDepth" : 1,
    "isActive" : true,
    "mimeTypes" : [ {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    }, {
      "count" : 9,
      "mimeType" : "mimeType",
      "counterLink" : "counterLink"
    } ],
    "isError" : true,
    "startTime" : 5,
    "crawlerId" : "classic",
    "priority" : 0,
    "startUrls" : [ "startUrls", "startUrls" ],
    "jobId" : "jobId",
    "proxy" : "proxy",
    "depth" : 6,
    "isWaiting" : true,
    "endTime" : 5
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested crawls crawlPager

400

Bad Request

401

Unauthorized

500

Internal Server Error
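Pages of this (and the other paged endpoints) are linked by the continuation token: the token from one page's pageInfo is passed as the continuationToken query parameter of the next request. A transport-agnostic sketch, where fetch_page stands in for an actual HTTP call and the stopping rule (no token returned on the last page) is an assumption this document does not spell out:

```python
def iterate_crawls(fetch_page, limit=50):
    """Yield crawl records across pages.

    fetch_page(limit, token) is a caller-supplied function that performs
    GET /crawls?limit=...&continuationToken=... and returns the crawlPager dict.
    """
    token = None
    while True:
        page = fetch_page(limit, token)
        yield from page["crawls"]
        token = page["pageInfo"].get("continuationToken")
        if not token:  # assumed last-page convention
            break

# A fake two-page "service" for illustration.
pages = {
    None: {"crawls": [{"jobId": "a"}], "pageInfo": {"continuationToken": "t1"}},
    "t1": {"crawls": [{"jobId": "b"}], "pageInfo": {}},
}
job_ids = [c["jobId"] for c in iterate_crawls(lambda limit, tok: pages[tok])]
```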

Jobs

Up
delete /jobs
Delete all of the currently queued and active jobs (deleteJobs)
Halt and delete all of the currently queued and active crawl jobs

Responses

200

All crawl jobs have been stopped and deleted.

400

Bad Request

401

Unauthorized

500

Internal Server Error

Up
get /jobs
Get the list of crawl jobs. (getJobs)
Get a list of crawl jobs one page at a time, as defined by the continuation token and limit.

Query parameters

limit (optional)
Query Parameter — The number of jobs per page. default: 50
continuationToken (optional)
Query Parameter — The continuation token of the next page of jobs to be returned.

Return type

jobPager

Example data

Content-Type: application/json
{
  "jobs" : [ {
    "result" : "result",
    "jobId" : "jobId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "crawlDesc" : {
      "forceCrawl" : false,
      "auId" : "auId",
      "crawlKind" : "newContent",
      "crawlerId" : "classic",
      "crawlList" : [ "crawlList", "crawlList" ],
      "extraCrawlerData" : {
        "key" : { }
      },
      "refetchDepth" : 0,
      "priority" : 6,
      "crawlDepth" : 1
    },
    "endDate" : 2,
    "requestDate" : 5,
    "startDate" : 5
  }, {
    "result" : "result",
    "jobId" : "jobId",
    "jobStatus" : {
      "msg" : "msg",
      "statusCode" : "STATUS_UNKNOWN"
    },
    "crawlDesc" : {
      "forceCrawl" : false,
      "auId" : "auId",
      "crawlKind" : "newContent",
      "crawlerId" : "classic",
      "crawlList" : [ "crawlList", "crawlList" ],
      "extraCrawlerData" : {
        "key" : { }
      },
      "refetchDepth" : 0,
      "priority" : 6,
      "crawlDepth" : 1
    },
    "endDate" : 2,
    "requestDate" : 5,
    "startDate" : 5
  } ],
  "pageInfo" : {
    "curLink" : "curLink",
    "resultsPerPage" : 2,
    "totalCount" : 3,
    "continuationToken" : "continuationToken",
    "nextLink" : "nextLink"
  }
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The requested crawl jobs jobPager

400

Bad Request

401

Unauthorized

500

Internal Server Error

Up
post /jobs
Request a crawl as defined by the descriptor (queueJob)
Enqueue a new crawl job as defined by the crawl descriptor and return it.

Consumes

This API call consumes the following media types via the Content-Type request header:

application/json

Request body

body crawlDesc (required)
Body Parameter — The crawl request descriptor.

Return type

crawlJob

Example data

Content-Type: application/json
{
  "result" : "result",
  "jobId" : "jobId",
  "jobStatus" : {
    "msg" : "msg",
    "statusCode" : "STATUS_UNKNOWN"
  },
  "crawlDesc" : {
    "forceCrawl" : false,
    "auId" : "auId",
    "crawlKind" : "newContent",
    "crawlerId" : "classic",
    "crawlList" : [ "crawlList", "crawlList" ],
    "extraCrawlerData" : {
      "key" : { }
    },
    "refetchDepth" : 0,
    "priority" : 6,
    "crawlDepth" : 1
  },
  "endDate" : 2,
  "requestDate" : 5,
  "startDate" : 5
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

202

The crawl request has been queued for operation. crawlJob

400

Bad Request

401

Unauthorized

403

Forbidden

404

Not Found

500

Internal Server Error
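The request body is a crawlDesc (see the model below); auId and crawlKind are its required fields, with crawlKind restricted to 'newContent' or 'repair'. A sketch of building the JSON body (the helper and values are illustrative):

```python
import json

def make_crawl_desc(au_id, crawl_kind="newContent", **optional):
    """Build a crawlDesc body; extra keyword args (priority, crawlDepth,
    forceCrawl, crawlerId, ...) pass through unchanged."""
    if crawl_kind not in ("newContent", "repair"):
        raise ValueError("crawlKind must be 'newContent' or 'repair'")
    desc = {"auId": au_id, "crawlKind": crawl_kind}
    desc.update(optional)
    return desc

# "exampleAuId" is a placeholder archival unit identifier.
body = json.dumps(make_crawl_desc("exampleAuId", priority=1, forceCrawl=False))
```

A successful POST returns 202 with a crawlJob whose jobId can then be used with the /crawls/{jobId} endpoints above.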

Status

Up
get /status
Get the status of the service (getStatus)
Get the status of the service

Return type

apiStatus

Example data

Content-Type: application/json
{
  "startupStatus" : "NONE",
  "reason" : "reason",
  "readyTime" : 0,
  "apiVersion" : "apiVersion",
  "ready" : true,
  "componentName" : "componentName",
  "componentVersion" : "componentVersion",
  "serviceName" : "serviceName",
  "lockssVersion" : "lockssVersion"
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

The status of the service apiStatus

401

Unauthorized

500

Internal Server Error
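The ready flag (with reason when not ready) makes this endpoint suitable for health probes. A sketch over the example payload above:

```python
import json

# The example apiStatus payload shown above.
api_status = json.loads("""
{
  "startupStatus" : "NONE",
  "reason" : "reason",
  "readyTime" : 0,
  "apiVersion" : "apiVersion",
  "ready" : true,
  "componentName" : "componentName",
  "componentVersion" : "componentVersion",
  "serviceName" : "serviceName",
  "lockssVersion" : "lockssVersion"
}
""")

def describe(status):
    """Render a one-line readiness summary from an apiStatus object."""
    if status["ready"]:
        return f"{status['serviceName']} ready (API {status['apiVersion']})"
    return f"{status['serviceName']} not ready: {status.get('reason', 'unknown')}"
```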

Ws

Up
get /ws/crawls
Query for a list of crawls matching conditions in the query string (getWsCrawls)
Query for crawls that meet a set of specified conditions.

Query parameters

crawlQuery (required)
Query Parameter — The query that specifies the crawls to be returned

Return type

crawlWsResult

Example data

Content-Type: application/json
{
  "auId" : "auId",
  "auName" : "auName",
  "pagesFetchedCount" : 5,
  "sources" : [ "sources", "sources" ],
  "linkDepth" : 1,
  "pagesFetched" : [ "pagesFetched", "pagesFetched" ],
  "pagesNotModifiedCount" : 2,
  "pagesParsed" : [ "pagesParsed", "pagesParsed" ],
  "refetchDepth" : 1,
  "duration" : 1,
  "mimeTypes" : [ "mimeTypes", "mimeTypes" ],
  "pagesWithErrorsCount" : 4,
  "pagesWithErrors" : [ {
    "severity" : "severity",
    "message" : "message",
    "url" : "url"
  }, {
    "severity" : "severity",
    "message" : "message",
    "url" : "url"
  } ],
  "startTime" : 6,
  "crawlType" : "crawlType",
  "pagesParsedCount" : 2,
  "bytesFetchedCount" : 5,
  "crawlKey" : "crawlKey",
  "crawlStatus" : "crawlStatus",
  "priority" : 0,
  "pagesPendingCount" : 7,
  "pagesPending" : [ "pagesPending", "pagesPending" ],
  "pagesExcludedCount" : 9,
  "offSiteUrlsExcludedCount" : 3,
  "pagesNotModified" : [ "pagesNotModified", "pagesNotModified" ],
  "mimeTypeCount" : 7,
  "pagesExcluded" : [ "pagesExcluded", "pagesExcluded" ],
  "startingUrls" : [ "startingUrls", "startingUrls" ]
}

Produces

This API call produces the following media types according to the Accept request header; the media type will be conveyed by the Content-Type response header.

application/json

Responses

200

Information about the requested crawls crawlWsResult

400

Bad Request

401

Unauthorized

500

Internal Server Error
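The crawlQuery value travels in the query string and must be URL-encoded; this document does not define the query language itself, so the expression below is purely illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def ws_crawls_url(base_url, crawl_query):
    """Build the GET /ws/crawls URL with an encoded crawlQuery parameter."""
    return f"{base_url}/ws/crawls?" + urlencode({"crawlQuery": crawl_query})

# Illustrative host and query expression; the actual query syntax is service-defined.
url = ws_crawls_url("https://crawler.example.org", 'auId = "exampleAuId"')
```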

Models

[ Jump to Methods ]

Table of Contents

  1. apiStatus
  2. counter
  3. crawlDesc
  4. crawlJob
  5. crawlPager
  6. crawlStatus
  7. crawlWsResult
  8. crawlWsResult_pagesWithErrors
  9. crawlerConfig
  10. crawlerStatus
  11. crawlerStatuses
  12. jobPager
  13. jobStatus
  14. mimeCounter
  15. pageInfo
  16. urlError
  17. urlInfo
  18. urlPager

apiStatus Up

The status information of the service
apiVersion
String The version of the API
componentName (optional)
String The name of the component
componentVersion (optional)
String The version of the component software
lockssVersion (optional)
String The version of the LOCKSS system
ready
Boolean The indication of whether the service is available
serviceName (optional)
String The name of the service
readyTime (optional)
Long The time the service last became ready. format: int64
reason (optional)
String The reason the service isn't ready.
startupStatus (optional)
String Enum indicating progress of plugin/AU processing at startup.
Enum:
NONE
PLUGINS_CRAWLING
PLUGINS_COLLECTED
PLUGINS_LOADING
PLUGINS_LOADED
AUS_STARTING
AUS_STARTED

counter Up

A counter for urls.
count
Integer The number of elements. format: int32
itemsLink
String A link to the list of count items or to a pager with count items.

crawlDesc Up

A descriptor for a crawl.
auId
String The identifier of the archival unit to be crawled.
crawlKind
String The kind of crawl being performed, either 'newContent' or 'repair'.
Enum:
newContent
repair
crawlerId (optional)
String The crawler to be used for this crawl.
forceCrawl (optional)
Boolean An indication of whether the crawl is to be forced, suppressing conditions that might otherwise prevent the crawl from happening.
refetchDepth (optional)
Integer The refetch depth to use for a deep crawl. format: int32
priority (optional)
Integer The priority for the crawl. format: int32
crawlList (optional)
array[String] The list of URLs to crawl.
crawlDepth (optional)
Integer The depth to which the links should be followed. 0 means do not follow links. format: int32
extraCrawlerData (optional)
map[String, Object] A map of additional properties for a crawl on a given crawler.

crawlJob Up

The job resulting from a request to perform a crawl.
crawlDesc
requestDate
Long The timestamp when the crawl was requested. format: int64
jobId
String Identifier of the crawl job.
jobStatus
startDate (optional)
Long The timestamp when the crawl began. format: int64
endDate (optional)
Long The timestamp when the crawl ended. format: int64
result (optional)
String A URI which can be used to retrieve the crawl data.

crawlPager Up

A display page of crawl status
crawls
array[crawlStatus] The crawls displayed in the page
pageInfo

crawlStatus Up

The status of a single crawl.
jobId
String The id for the crawl.
auId
String The id for the au.
auName
String The name for the au.
type
String The type of crawl.
startUrls
array[String] The array of start urls.
priority
Integer The priority for this crawl. format: int32
crawlerId
String The id of the crawler used for this crawl.
sources (optional)
array[String] The sources to use for the crawl.
depth (optional)
Integer The depth of the crawl. format: int32
refetchDepth (optional)
Integer The refetch depth of the crawl. format: int32
proxy (optional)
String The proxy used for crawling.
startTime
Long The timestamp for the start of crawl. format: int64
endTime
Long The timestamp for the end of the crawl. format: int64
jobStatus
isWaiting (optional)
Boolean True if the crawl is waiting to start.
isActive (optional)
Boolean True if the crawl is active.
isError (optional)
Boolean True if the crawl has errored.
bytesFetched (optional)
Long The number of bytes fetched. format: int64
fetchedItems (optional)
excludedItems (optional)
notModifiedItems (optional)
parsedItems (optional)
pendingItems (optional)
errors (optional)
mimeTypes (optional)
array[mimeCounter] The list of urls by mimeType.

crawlWsResult Up

auId
auName
priority (optional)
Integer format: int32
crawlKey (optional)
crawlType (optional)
startTime (optional)
Integer format: int32
duration (optional)
Integer format: int32
crawlStatus (optional)
bytesFetchedCount (optional)
Integer format: int32
pagesFetchedCount (optional)
Integer format: int32
pagesFetched (optional)
pagesParsedCount (optional)
Integer format: int32
pagesParsed (optional)
pagesPendingCount (optional)
Integer format: int32
pagesPending (optional)
pagesExcludedCount (optional)
Integer format: int32
pagesExcluded (optional)
offSiteUrlsExcludedCount (optional)
Integer format: int32
pagesNotModifiedCount (optional)
Integer format: int32
pagesNotModified (optional)
pagesWithErrorsCount (optional)
Integer format: int32
pagesWithErrors (optional)
mimeTypeCount (optional)
Integer format: int32
mimeTypes (optional)
sources (optional)
startingUrls (optional)
refetchDepth (optional)
Integer format: int32
linkDepth (optional)
Integer format: int32

crawlWsResult_pagesWithErrors Up

url (optional)
severity (optional)
message (optional)

crawlerConfig Up

Configuration information about a specific crawler.
crawlerId
String The identifier for this crawler
example: classic
attributes
map[String, String] Key-value pairs providing crawler-specific attributes and configuration information.

crawlerStatus Up

Status about a specific crawler.
isEnabled
Boolean True if the crawler is enabled.
isAutoCrawlEnabled (optional)
Boolean True if the crawler automatically crawls AUs when needed.
numJobsActive (optional)
Integer The number of jobs running. format: int32
numJobsFailed (optional)
Integer The number of jobs that failed. format: int32
numJobsSuccessful (optional)
Integer The number of jobs that succeeded. format: int32
numJobsPending (optional)
Integer The number of pending jobs. format: int32
errMessage (optional)

crawlerStatuses Up

The statuses of the supported crawlers.
crawlerMap (optional)
map[String, crawlerStatus] A map of crawler status objects.

jobPager Up

A display page of jobs
jobs
array[crawlJob] The jobs displayed in the page
pageInfo

jobStatus Up

A status which includes a code and a message.
statusCode
String The code for this status.
Enum:
STATUS_UNKNOWN
STATUS_QUEUED
STATUS_ACTIVE
STATUS_SUCCESSFUL
STATUS_ERROR
STATUS_ABORTED
STATUS_WINDOW_CLOSED
STATUS_FETCH_ERROR
STATUS_NO_PUB_PERMISSION
STATUS_PLUGIN_ERROR
STATUS_REPO_ERR
STATUS_RUNNING_AT_CRASH
STATUS_EXTRACTOR_ERROR
STATUS_CRAWL_TEST_SUCCESSFUL
STATUS_CRAWL_TEST_FAIL
STATUS_INELIGIBLE
STATUS_INACTIVE_REQUEST
STATUS_INTERRUPTED
msg (optional)
String A text message explaining this status.

mimeCounter Up

A counter for mimeTypes seen during a crawl.
mimeType
String The mime type to count.
count (optional)
Integer The number of elements of the mime type. format: int32
counterLink (optional)
String A link to the list of count elements or to a pager with count elements.

pageInfo Up

The information related to pagination of content
totalCount
Integer The total number of elements to be paginated format: int32
resultsPerPage
Integer The number of results per page. format: int32
continuationToken
String The continuation token.
curLink
String The link to the current page.
nextLink (optional)
String The link to the next page.

urlError Up

Information related to an error for a url.
message
String The error message
severity
String The severity of the error.
Enum:
Warning
Error
Fatal

urlInfo Up

Information related to a url.
url
String The url string
error (optional)
referrers (optional)
array[String] An optional list of referrers.

urlPager Up

A pager for urls.
pageInfo
urls
array[urlInfo] A list of urls with related info.