Version: 0.2

RequestQueue

Represents a queue of URLs to crawl.

Can be used for deep crawling of websites, where you start with several URLs and then recursively
follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.

Each URL is represented using an instance of the Request class.
The queue can only contain unique URLs. More precisely, it can only contain request dictionaries
with distinct uniqueKey properties. By default, uniqueKey is generated from the URL, but it can also be overridden.
To add a single URL to the queue multiple times, the corresponding request dictionaries
need to have different uniqueKey properties.

Do not instantiate this class directly; use the Actor.open_request_queue() function instead.

RequestQueue stores its data either on local disk or in the Apify cloud,
depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variable is set.

If the APIFY_LOCAL_STORAGE_DIR environment variable is set, the data is stored in
the local directory in the following files:

    {APIFY_LOCAL_STORAGE_DIR}/request_queues/{QUEUE_ID}/{REQUEST_ID}.json

Note that {QUEUE_ID} is the name or ID of the request queue. The default request queue has the ID "default",
unless you override it by setting the APIFY_DEFAULT_REQUEST_QUEUE_ID environment variable.
{REQUEST_ID} is the ID of the request.

If the APIFY_TOKEN environment variable is set but APIFY_LOCAL_STORAGE_DIR is not, the data is stored in the
Apify Request Queue cloud storage.
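The local file layout described above can be expressed as a small path-building function. This is a minimal sketch: `local_request_path` is a hypothetical helper, not part of the SDK, and it only reproduces the documented file pattern and the documented default queue ID.

```python
import os
from pathlib import Path
from typing import Optional


def local_request_path(
    request_id: str,
    queue_id: Optional[str] = None,
    storage_dir: Optional[str] = None,
) -> Path:
    """Return {APIFY_LOCAL_STORAGE_DIR}/request_queues/{QUEUE_ID}/{REQUEST_ID}.json.

    Hypothetical helper for illustration; the SDK does not expose it.
    """
    if storage_dir is None:
        # Raises KeyError if the variable is unset, i.e. local storage is not in use.
        storage_dir = os.environ["APIFY_LOCAL_STORAGE_DIR"]
    if queue_id is None:
        # The default queue ID is "default" unless APIFY_DEFAULT_REQUEST_QUEUE_ID overrides it.
        queue_id = os.environ.get("APIFY_DEFAULT_REQUEST_QUEUE_ID", "default")
    return Path(storage_dir) / "request_queues" / queue_id / f"{request_id}.json"


if __name__ == "__main__":
    # On POSIX this prints storage/request_queues/default/abc123.json
    print(local_request_path("abc123", queue_id="default", storage_dir="storage"))
```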

Index

Constructors

__init__

  • __init__(id, name, client, config): None
  • Create a RequestQueue instance.

    Do not use the constructor directly; use the Actor.open_request_queue() function instead.

    Args:
      id (str): ID of the request queue.
      name (str, optional): Name of the request queue.
      client (ApifyClientAsync or MemoryStorageClient): The storage client to use.
      config (Configuration): The configuration to use.


    Parameters

    • id: str
    • name: Optional[str]
    • client: Union[ApifyClientAsync, MemoryStorageClient]
    • config: Configuration

    Returns None

Methods

add_request

  • async add_request(request, *, forefront): Dict
  • Add a request to the queue.

    Args:
      request (dict): The request to add to the queue.
      forefront (bool, optional): Whether to add the request to the head or the end of the queue.

    Returns:
      dict: Information about the queue operation, with keys requestId, uniqueKey, wasAlreadyPresent, and wasAlreadyHandled.


    Parameters

    • request: Dict
    • keyword-only forefront: bool = False

    Returns Dict
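The uniqueKey deduplication and the forefront flag can be illustrated with an in-memory model. This is a hypothetical sketch, not the SDK implementation: `ToyRequestQueue` and `pending_urls` are invented for illustration, and falling back to the raw URL as uniqueKey is an assumption (the SDK derives uniqueKey from the URL, possibly with extra normalization).

```python
import asyncio
import uuid
from collections import deque
from typing import Deque, Dict, List


class ToyRequestQueue:
    """In-memory model of the documented add_request contract.

    Hypothetical: this is not the SDK implementation, only an illustration.
    """

    def __init__(self) -> None:
        self._by_unique_key: Dict[str, Dict] = {}  # uniqueKey -> stored request
        self._pending: Deque[Dict] = deque()  # requests waiting to be fetched
        self._handled_ids: set = set()  # IDs of handled requests

    async def add_request(self, request: Dict, *, forefront: bool = False) -> Dict:
        # Assumption: use the raw URL when no uniqueKey is given.
        unique_key = request.get("uniqueKey") or request["url"]
        existing = self._by_unique_key.get(unique_key)
        if existing is not None:
            # Duplicate uniqueKey: nothing new is enqueued.
            return {
                "requestId": existing["id"],
                "uniqueKey": unique_key,
                "wasAlreadyPresent": True,
                "wasAlreadyHandled": existing["id"] in self._handled_ids,
            }
        stored = {**request, "id": uuid.uuid4().hex, "uniqueKey": unique_key}
        self._by_unique_key[unique_key] = stored
        if forefront:
            self._pending.appendleft(stored)  # head of the queue
        else:
            self._pending.append(stored)  # end of the queue
        return {
            "requestId": stored["id"],
            "uniqueKey": unique_key,
            "wasAlreadyPresent": False,
            "wasAlreadyHandled": False,
        }

    def pending_urls(self) -> List[str]:
        return [r["url"] for r in self._pending]


async def main() -> None:
    queue = ToyRequestQueue()
    first = await queue.add_request({"url": "https://example.com/a"})
    dup = await queue.add_request({"url": "https://example.com/a"})
    # Same URL, different uniqueKey: enqueued a second time.
    again = await queue.add_request({"url": "https://example.com/a", "uniqueKey": "a-again"})
    await queue.add_request({"url": "https://example.com/urgent"}, forefront=True)
    print(first["wasAlreadyPresent"], dup["wasAlreadyPresent"], again["wasAlreadyPresent"])
    # False True False
    print(queue.pending_urls())
    # ['https://example.com/urgent', 'https://example.com/a', 'https://example.com/a']


if __name__ == "__main__":
    asyncio.run(main())
```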

get_request

  • async get_request(request_id): Optional[Dict]
  • Retrieve a request from the queue.

    Args:
      request_id (str): ID of the request to retrieve.

    Returns:
      dict, optional: The retrieved request, or None if it does not exist.


    Parameters

    • request_id: str

    Returns Optional[Dict]

fetch_next_request

  • async fetch_next_request(): Optional[Dict]
  • Return the next request in the queue to be processed.

    Once you have successfully finished processing the request, call
    RequestQueue.mark_request_as_handled to mark the request as handled in the queue.
    If there was an error while processing the request, call RequestQueue.reclaim_request instead,
    so that the queue can hand the request out again, possibly to another consumer, in a later call to fetch_next_request.

    Note that a None return value does not mean that queue processing has finished; it means there are currently no pending requests.
    To check whether all requests in the queue have been handled, use RequestQueue.is_finished instead.

    Returns:
      dict, optional: The request, or None if there are currently no pending requests.


    Returns Optional[Dict]
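The fetch / mark-as-handled / reclaim cycle described above can be sketched as a consumer loop. This is a minimal sketch under stated assumptions: `ToyRequestQueue` is a hypothetical in-memory stand-in (not the SDK), its return values are simplified, and `consume` is an invented helper. It shows that a None from fetch_next_request is not a stop signal; the loop ends only when is_finished reports True.

```python
import asyncio
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Optional, Set


class ToyRequestQueue:
    """Hypothetical in-memory model of the fetch / handle / reclaim contract (not the SDK)."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._pending: Deque[Dict] = deque(
            {"id": str(i), "url": url} for i, url in enumerate(urls)
        )
        self._in_progress: Dict[str, Dict] = {}  # fetched, not yet handled or reclaimed
        self._handled: Set[str] = set()

    async def fetch_next_request(self) -> Optional[Dict]:
        if not self._pending:
            return None  # nothing pending now; requests may still be in progress
        request = self._pending.popleft()
        self._in_progress[request["id"]] = request
        return request

    async def mark_request_as_handled(self, request: Dict) -> Optional[Dict]:
        if self._in_progress.pop(request["id"], None) is None:
            return None  # not in progress, as documented
        self._handled.add(request["id"])
        return {"requestId": request["id"]}  # simplified result

    async def reclaim_request(self, request: Dict, forefront: bool = False) -> Optional[Dict]:
        if self._in_progress.pop(request["id"], None) is None:
            return None
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)
        return {"requestId": request["id"]}  # simplified result

    async def is_finished(self) -> bool:
        return not self._pending and not self._in_progress


async def consume(queue: ToyRequestQueue, handler: Callable[[Dict], None]) -> None:
    """Process requests until the queue reports it is finished (hypothetical helper)."""
    while not await queue.is_finished():
        request = await queue.fetch_next_request()
        if request is None:
            # None only means "nothing pending right now"; wait for in-progress work.
            await asyncio.sleep(0.01)
            continue
        try:
            handler(request)
        except Exception:
            # Failure: put the request back so it can be fetched again later.
            await queue.reclaim_request(request)
        else:
            await queue.mark_request_as_handled(request)


if __name__ == "__main__":
    attempts: Dict[str, int] = {}

    def flaky_handler(request: Dict) -> None:
        url = request["url"]
        attempts[url] = attempts.get(url, 0) + 1
        if url.endswith("/b") and attempts[url] == 1:
            raise RuntimeError("simulated transient failure")

    queue = ToyRequestQueue(["https://example.com/a", "https://example.com/b"])
    asyncio.run(consume(queue, flaky_handler))
    print(attempts)  # {'https://example.com/a': 1, 'https://example.com/b': 2}
```

Because this loop reclaims on every failure, a handler that always fails would cycle forever; a real consumer would likely cap the number of retries per request.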

mark_request_as_handled

  • async mark_request_as_handled(request): Optional[Dict]
  • Mark a request as handled after successful processing.

    Handled requests will never again be returned by the RequestQueue.fetch_next_request method.

    Args:
      request (dict): The request to mark as handled.

    Returns:
      dict, optional: Information about the queue operation, with keys requestId, uniqueKey, wasAlreadyPresent, and wasAlreadyHandled.
      None if the given request was not in progress.


    Parameters

    • request: Dict

    Returns Optional[Dict]

reclaim_request

  • async reclaim_request(request, forefront): Optional[Dict]
  • Reclaim a failed request back to the queue.

    The request will be returned for processing again later
    by another call to RequestQueue.fetch_next_request.

    Args:
      request (dict): The request to return to the queue.
      forefront (bool, optional): Whether to add the request to the head or the end of the queue.

    Returns:
      dict, optional: Information about the queue operation, with keys requestId, uniqueKey, wasAlreadyPresent, and wasAlreadyHandled.
      None if the given request was not in progress.


    Parameters

    • request: Dict
    • forefront: bool = False

    Returns Optional[Dict]

is_empty

  • async is_empty(): bool
  • Check whether the queue is empty.

    Returns:
      bool: True if the next call to RequestQueue.fetch_next_request would return None, otherwise False.


    Returns bool

is_finished

  • async is_finished(): bool
  • Check whether the queue is finished.

    Due to the nature of the distributed storage used by the queue,
    the method might occasionally return a false negative (False even though all requests have been handled),
    but it will never return a false positive.

    Returns:
      bool: True if all requests have already been handled and none are left, otherwise False.


    Returns bool
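The difference between is_empty and is_finished comes down to in-progress requests: a request that has been fetched but not yet marked as handled leaves the queue empty but not finished. Below is a minimal in-memory sketch of that distinction; the class is hypothetical, not the SDK implementation, and it does not model the occasional false negatives of the distributed storage.

```python
import asyncio
from collections import deque
from typing import Deque, Dict, Iterable, Optional


class ToyRequestQueue:
    """Hypothetical model of why is_empty() and is_finished() can differ (not the SDK)."""

    def __init__(self, urls: Iterable[str]) -> None:
        self._pending: Deque[Dict] = deque({"url": url} for url in urls)
        self._in_progress = 0  # fetched but not yet marked as handled

    async def fetch_next_request(self) -> Optional[Dict]:
        if not self._pending:
            return None
        self._in_progress += 1
        return self._pending.popleft()

    async def mark_request_as_handled(self, request: Dict) -> None:
        self._in_progress -= 1

    async def is_empty(self) -> bool:
        # True when the next fetch_next_request() call would return None.
        return not self._pending

    async def is_finished(self) -> bool:
        # True only when nothing is pending AND nothing is still being processed.
        return not self._pending and self._in_progress == 0


async def main() -> None:
    queue = ToyRequestQueue(["https://example.com"])
    request = await queue.fetch_next_request()
    print(await queue.is_empty(), await queue.is_finished())  # True False
    await queue.mark_request_as_handled(request)
    print(await queue.is_empty(), await queue.is_finished())  # True True


if __name__ == "__main__":
    asyncio.run(main())
```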

drop

  • async drop(): None
  • Remove the request queue, either from the Apify cloud storage or from the local directory.


    Returns None

get_info

  • async get_info(): Optional[Dict]
  • Get an object containing general information about the request queue.

    Returns:
      dict, optional: The object returned by calling the GET request queue API endpoint.


    Returns Optional[Dict]

open

  • async open(*, id, name, force_cloud, config): 'RequestQueue'
  • Open a request queue.

    A request queue represents a queue of URLs to crawl, stored either on the local filesystem or in the Apify cloud.
    The queue is used for deep crawling of websites, where you start with several URLs and then
    recursively follow links to other pages. The data structure supports both breadth-first
    and depth-first crawling orders.

    Args:
      id (str, optional): ID of the request queue to be opened.
        If neither id nor name is provided, the method returns the default request queue associated with the actor run.
        If the request queue with the given ID does not exist, an error is raised.
      name (str, optional): Name of the request queue to be opened.
        If neither id nor name is provided, the method returns the default request queue associated with the actor run.
        If the request queue with the given name does not exist, it is created.
      force_cloud (bool, optional): If set to True, opens the request queue on the Apify Platform even when running the actor locally.
        Defaults to False.
      config (Configuration, optional): A Configuration instance; the global configuration is used if omitted.

    Returns:
      RequestQueue: An instance of the RequestQueue class for the given ID or name.


    Parameters

    • keyword-only id: Optional[str] = None
    • keyword-only name: Optional[str] = None
    • keyword-only force_cloud: bool = False
    • keyword-only config: Optional[Configuration] = None

    Returns 'RequestQueue'
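The id/name resolution rules of open can be modelled with a small registry. This is a hypothetical, synchronous sketch: `ToyStorage` is not an SDK class, the real open is an async method that returns a RequestQueue, and actor code should go through Actor.open_request_queue() as noted in the class description. It encodes only the three documented cases; preferring id when both id and name are passed is an assumption the documentation does not cover.

```python
import uuid
from typing import Dict, Optional


class ToyStorage:
    """Hypothetical registry modelling the documented id/name rules of open()."""

    def __init__(self) -> None:
        # The default queue has the ID "default" (see the class description).
        self._by_id: Dict[str, Dict] = {"default": {"id": "default", "name": None}}
        self._by_name: Dict[str, Dict] = {}

    def open(self, *, id: Optional[str] = None, name: Optional[str] = None) -> Dict:
        if id is None and name is None:
            return self._by_id["default"]  # default queue of the run
        if id is not None:
            # Assumption: id wins if both id and name are given.
            if id not in self._by_id:
                raise LookupError(f"Request queue with ID {id!r} does not exist")
            return self._by_id[id]
        if name not in self._by_name:
            # Unknown name: create a new queue.
            queue = {"id": uuid.uuid4().hex, "name": name}
            self._by_id[queue["id"]] = queue
            self._by_name[name] = queue
        return self._by_name[name]


if __name__ == "__main__":
    storage = ToyStorage()
    print(storage.open()["id"])  # default
    first = storage.open(name="product-pages")
    print(storage.open(name="product-pages") is first)  # True
    try:
        storage.open(id="no-such-id")
    except LookupError as error:
        print(error)  # Request queue with ID 'no-such-id' does not exist
```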