Version: Next

RequestQueue (external)

Represents a queue of URLs to crawl, which is used for deep crawling of websites where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.

Each URL is represented using an instance of the Request class. The queue can only contain unique URLs. More precisely, it can only contain Request instances with distinct uniqueKey properties. By default, uniqueKey is generated from the URL, but it can also be overridden. To add a single URL multiple times to the queue, corresponding Request objects will need to have different uniqueKey properties.
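The deduplication rule can be illustrated with a toy in-memory model. This is a minimal sketch, not the library's implementation: `TinyQueue` and its `byKey` map are hypothetical, the model uses the raw URL as the default uniqueKey (a simplification of "generated from the URL"), and the returned `wasAlreadyPresent` flag is only an illustration of how a caller might learn that a request was a duplicate.

```javascript
// Toy in-memory model of uniqueKey-based deduplication.
// NOT the library's implementation; `TinyQueue` is a hypothetical stand-in.
class TinyQueue {
    constructor() {
        this.byKey = new Map(); // uniqueKey -> stored request
    }

    // Simplification: the default uniqueKey is the URL itself.
    addRequest({ url, uniqueKey = url }) {
        if (this.byKey.has(uniqueKey)) {
            return { uniqueKey, wasAlreadyPresent: true };
        }
        this.byKey.set(uniqueKey, { url, uniqueKey });
        return { uniqueKey, wasAlreadyPresent: false };
    }
}

const queue = new TinyQueue();

// Same URL twice: the second call is treated as a duplicate.
const first = queue.addRequest({ url: 'http://example.com/aaa' });
const duplicate = queue.addRequest({ url: 'http://example.com/aaa' });

// Same URL with an explicit, different uniqueKey: stored as a separate entry.
const forced = queue.addRequest({
    url: 'http://example.com/aaa',
    uniqueKey: 'aaa-second-visit',
});
```

In this model the queue ends up holding two entries for the same URL, which is the situation the paragraph above describes for requests with different uniqueKey properties.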

Do not instantiate this class directly; use the RequestQueue.open function instead.

RequestQueue is used by BasicCrawler, CheerioCrawler, PuppeteerCrawler and PlaywrightCrawler as a source of URLs to crawl. Unlike RequestList, RequestQueue supports dynamic adding and removing of requests. On the other hand, the queue is not optimized for operations that add or remove a large number of URLs in a batch.

Example usage:

// Open the default request queue associated with the crawler run
const queue = await RequestQueue.open();

// Open a named request queue
const queueWithName = await RequestQueue.open('some-name');

// Enqueue a few requests
await queue.addRequest({ url: 'http://example.com/aaa' });
await queue.addRequest({ url: 'http://example.com/bbb' });
await queue.addRequest({ url: 'http://example.com/foo/bar' }, { forefront: true });
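The effect of the forefront option on crawl order can be sketched with a small stand-in queue. This is a hedged illustration, assuming that `forefront: true` places a request at the head of the queue while the default appends it to the tail; `TinyOrderedQueue` is hypothetical and synchronous, unlike the real asynchronous methods.

```javascript
// Toy model of how `forefront` changes the fetch order.
// `TinyOrderedQueue` is a hypothetical stand-in, not the library's class.
class TinyOrderedQueue {
    constructor() {
        this.pending = [];
    }

    addRequest(request, { forefront = false } = {}) {
        if (forefront) {
            this.pending.unshift(request); // head of the queue: fetched next
        } else {
            this.pending.push(request); // tail of the queue: fetched last
        }
    }

    fetchNextRequest() {
        return this.pending.shift() ?? null;
    }
}

const ordered = new TinyOrderedQueue();
ordered.addRequest({ url: 'http://example.com/aaa' });
ordered.addRequest({ url: 'http://example.com/bbb' });
ordered.addRequest({ url: 'http://example.com/foo/bar' }, { forefront: true });

// Drain the queue and record the order in which URLs come out.
const fetchOrder = [];
let next;
while ((next = ordered.fetchNextRequest()) !== null) {
    fetchOrder.push(next.url);
}
// fetchOrder is [.../foo/bar, .../aaa, .../bbb]
```

Under this assumption, always enqueueing newly discovered links at the tail gives a breadth-first crawl, while enqueueing them with forefront gives a depth-first one.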

Hierarchy

  • RequestProvider
    • RequestQueue

Index

Constructors

constructor (external)

  • Parameters

    • options (external): RequestProviderOptions
    • config (external, optional): Configuration

    Returns RequestQueue

Properties

assumedHandledCount (external, inherited)

assumedHandledCount: number

assumedTotalCount (external, inherited)

assumedTotalCount: number

client (external, inherited)

client: RequestQueueClient

clientKey (external, inherited)

clientKey: string

config (external, readonly, inherited)

config: Configuration

id (external, inherited)

id: string

internalTimeoutMillis (external, inherited)

internalTimeoutMillis: number

log (external, inherited)

log: Log

name (external, optional, inherited)

name?: string

requestLockSecs (external, inherited)

requestLockSecs: number

timeoutSecs (external, inherited)

timeoutSecs: number

Methods

addRequest (external)

  • addRequest(requestLike, options): Promise<RequestQueueOperationInfo>
  • @inheritDoc

    Parameters

    Returns Promise<RequestQueueOperationInfo>

addRequests (external)

  • addRequests(requestsLike, options): Promise<BatchAddRequestsResult>
  • @inheritDoc

    Parameters

    Returns Promise<BatchAddRequestsResult>

addRequestsBatched (external, inherited)

  • addRequestsBatched(requests, options): Promise<AddRequestsBatchedResult>
  • Adds requests to the queue in batches. By default, it resolves after the initial batch is added and continues adding the rest in the background. You can configure the batch size via the batchSize option and the sleep time between batches via the waitBetweenBatchesMillis option. If you want to wait for all batches to be added to the queue, await the waitForAllRequestsToBeAdded promise from the response object.


    Parameters

    • requests (external): (string | Source)[]

      The requests to add

    • options (external, optional): AddRequestsBatchedOptions

      Options for the request queue

    Returns Promise<AddRequestsBatchedResult>
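The "resolve after the first batch, keep adding in the background" behavior described above can be modeled in a few lines. This is a sketch under stated assumptions, not the library's code: `addInBatches`, `sink`, and `demo` are hypothetical names, the default values are chosen only for the demo, and only the option names batchSize and waitBetweenBatchesMillis and the waitForAllRequestsToBeAdded promise are taken from the description.

```javascript
// Toy model of batched adding. `addInBatches` and `sink` are hypothetical;
// `sink` is a plain array standing in for the queue's storage.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function addInBatches(sink, requests, { batchSize = 2, waitBetweenBatchesMillis = 10 } = {}) {
    // Split the input into consecutive chunks of at most `batchSize` items.
    const batches = [];
    for (let i = 0; i < requests.length; i += batchSize) {
        batches.push(requests.slice(i, i + batchSize));
    }

    // The first batch is added before the returned promise resolves.
    const addedRequests = batches.length > 0 ? [...batches[0]] : [];
    sink.push(...addedRequests);

    // The remaining batches are added later without blocking the caller.
    const waitForAllRequestsToBeAdded = (async () => {
        const rest = [];
        for (const batch of batches.slice(1)) {
            await sleep(waitBetweenBatchesMillis);
            sink.push(...batch);
            rest.push(...batch);
        }
        return rest;
    })();

    return { addedRequests, waitForAllRequestsToBeAdded };
}

async function demo() {
    const sink = [];
    const urls = ['a', 'b', 'c', 'd', 'e'].map((p) => `http://example.com/${p}`);

    const result = await addInBatches(sink, urls);
    const countAfterFirstBatch = sink.length; // only the first batch so far

    await result.waitForAllRequestsToBeAdded; // now every batch is in
    return { countAfterFirstBatch, countAfterAll: sink.length };
}
```

The design trade-off this models is that a caller can start processing as soon as the first batch is in, and only awaits the second promise when it truly needs every request to be present.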

drop (external, inherited)

  • drop(): Promise<void>
  • Removes the queue either from the Apify Cloud storage or from the local database, depending on the mode of operation.


    Returns Promise<void>

fetchNextRequest (external)

  • fetchNextRequest(): Promise<null | Request<T>>
  • @inheritDoc

    Returns Promise<null | Request<T>>

getInfo (external, inherited)

  • getInfo(): Promise<undefined | RequestQueueInfo>
  • Returns an object containing general information about the request queue.

    The function returns the same object as the Apify API Client's getQueue function, which in turn calls the Get request queue API endpoint.

    Example:

    {
        id: "WkzbQMuFYuamGv3YF",
        name: "my-queue",
        userId: "wRsJZtadYvn4mBZmm",
        createdAt: new Date("2015-12-12T07:34:14.202Z"),
        modifiedAt: new Date("2015-12-13T08:36:13.202Z"),
        accessedAt: new Date("2015-12-14T08:36:13.202Z"),
        totalRequestCount: 25,
        handledRequestCount: 5,
        pendingRequestCount: 20,
    }

    Returns Promise<undefined | RequestQueueInfo>

getRequest (external, inherited)

  • getRequest(id): Promise<null | Request<T>>
  • Gets the request from the queue specified by ID.


    Parameters

    • id (external): string

      ID of the request.

    Returns Promise<null | Request<T>>

    Returns the request object, or null if it was not found.

getTotalCount (external, inherited)

  • getTotalCount(): number
  • Returns an offline approximation of the total number of requests in the queue (i.e. pending + handled).

    Survives restarts and actor migrations.


    Returns number

handledCount (external, inherited)

  • handledCount(): Promise<number>
  • Returns the number of handled requests.

    This function is just a convenient shortcut for:

    const { handledRequestCount } = await queue.getInfo();

    Returns Promise<number>

isEmpty (external, inherited)

  • isEmpty(): Promise<boolean>
  • Resolves to true if the next call to RequestQueue.fetchNextRequest would return null, otherwise it resolves to false. Note that even if the queue is empty, there might be some pending requests currently being processed. If you need to ensure that there is no activity in the queue, use RequestQueue.isFinished.


    Returns Promise<boolean>
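The difference between an empty queue and a finished one can be shown with a small model. This is a hedged sketch, not the library's implementation: `TinyTrackedQueue` and its `inProgress` set are hypothetical, and the methods here are synchronous for brevity, whereas the real ones return promises.

```javascript
// Toy model of "empty" versus "finished".
// `TinyTrackedQueue` is a hypothetical stand-in, not the library's class.
class TinyTrackedQueue {
    constructor(urls) {
        this.pending = [...urls]; // not yet fetched
        this.inProgress = new Set(); // fetched but not yet marked as handled
    }

    fetchNextRequest() {
        const url = this.pending.shift() ?? null;
        if (url !== null) this.inProgress.add(url);
        return url;
    }

    markRequestHandled(url) {
        this.inProgress.delete(url);
    }

    // True when the next fetchNextRequest() call would return null.
    isEmpty() {
        return this.pending.length === 0;
    }

    // True only when nothing is pending AND nothing is being processed.
    isFinished() {
        return this.pending.length === 0 && this.inProgress.size === 0;
    }
}

const tracked = new TinyTrackedQueue(['http://example.com/aaa']);
const inFlight = tracked.fetchNextRequest();

// The only request has been fetched but is still being processed.
const emptyWhileProcessing = tracked.isEmpty(); // true
const finishedWhileProcessing = tracked.isFinished(); // false

tracked.markRequestHandled(inFlight);
const finishedAfterHandling = tracked.isFinished(); // true
```

A crawler loop that stopped as soon as isEmpty resolved to true could therefore exit while a request is still in flight, which is why the description points to isFinished for that check.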

isFinished (external)

  • isFinished(): Promise<boolean>
  • @inheritDoc

    Returns Promise<boolean>

markRequestHandled (external)

  • markRequestHandled(request): Promise<null | RequestQueueOperationInfo>
  • @inheritDoc

    Parameters

    • request (external): Request<Dictionary>

    Returns Promise<null | RequestQueueOperationInfo>

reclaimRequest (external)

  • reclaimRequest(...args): Promise<null | RequestQueueOperationInfo>
  • @inheritDoc

    Parameters

    Returns Promise<null | RequestQueueOperationInfo>

open (static, external)

  • @inheritDoc

    Parameters

    • ...args (external, rest): [queueIdOrName?: null | string, options?: StorageManagerOptions]

    Returns Promise<RequestQueue>