RequestQueue
apify.storages.RequestQueue
Methods
add_request
Adds a request to the RequestQueue while managing deduplication and positioning within the queue.
The deduplication of requests relies on the uniqueKey field within the request dictionary. If uniqueKey exists, it remains unchanged; if it does not, it is generated based on the request’s url, method, and payload fields. The generation of uniqueKey can be influenced by the keep_url_fragment and use_extended_unique_key flags, which dictate whether to include the URL fragment and the request’s method and payload, respectively, in its computation.
The request can be added to the forefront (beginning) or the back of the queue based on the forefront parameter. Information about the request’s addition to the queue, including whether it was already present or handled, is returned in an output dictionary.
Parameters
request: dict
forefront: bool = False (keyword-only)
keep_url_fragment: bool = False (keyword-only)
use_extended_unique_key: bool = False (keyword-only)
Returns dict
requestId (str)
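The key-generation rules described for add_request can be illustrated with a minimal sketch. The helper name compute_unique_key, the exact normalization, and the extended key format are assumptions made for illustration only; the SDK's actual algorithm may differ.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def compute_unique_key(
    request: dict,
    *,
    keep_url_fragment: bool = False,
    use_extended_unique_key: bool = False,
) -> str:
    """Hypothetical helper: derive a uniqueKey for a request dictionary."""
    # An existing uniqueKey is returned unchanged.
    if request.get("uniqueKey"):
        return request["uniqueKey"]

    parts = urlsplit(request["url"])
    if not keep_url_fragment:
        # Drop the "#fragment" so URLs differing only in fragment share a key.
        parts = parts._replace(fragment="")
    key = urlunsplit(parts)

    if use_extended_unique_key:
        # Assumed format (not the SDK's): METHOD(payload-digest):url
        method = request.get("method", "GET").upper()
        payload = request.get("payload") or b""
        if isinstance(payload, str):
            payload = payload.encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()[:8]
        key = f"{method}({digest}):{key}"
    return key
```

Under these assumptions, a GET and a POST to the same URL collide by default and only receive distinct keys when use_extended_unique_key is set.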
drop
Remove the request queue either from the Apify cloud storage or from the local directory.
Returns None
fetch_next_request
Return the next request in the queue to be processed.
Once you successfully finish processing the request, you need to call RequestQueue.mark_request_as_handled to mark the request as handled in the queue. If there was an error in processing the request, call RequestQueue.reclaim_request instead, so that the queue will give the request to some other consumer in another call to the fetch_next_request method.
Note that a None return value does not mean the queue processing has finished; it means there are currently no pending requests. To check whether all requests in the queue were finished, use RequestQueue.is_finished instead.
Returns dict | None
The request or None if there are no more pending requests.
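The fetch, handle, then mark-or-reclaim cycle can be sketched as a small consumer loop. This is a sketch under stated assumptions: process_queue, its max_retries parameter, and the InMemoryQueue stand-in are hypothetical names introduced here so the example runs without the SDK, and the queue methods are assumed to be coroutines.

```python
import asyncio
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in exposing only the four methods used below."""

    def __init__(self, requests):
        self._pending = deque(requests)
        self._in_progress = {}
        self.handled = []

    async def fetch_next_request(self):
        if not self._pending:
            return None
        request = self._pending.popleft()
        self._in_progress[request["uniqueKey"]] = request
        return request

    async def mark_request_as_handled(self, request):
        if self._in_progress.pop(request["uniqueKey"], None) is None:
            return None  # the request was not in progress
        self.handled.append(request["uniqueKey"])
        return {"uniqueKey": request["uniqueKey"]}  # simplified result

    async def reclaim_request(self, request, forefront=False):
        if self._in_progress.pop(request["uniqueKey"], None) is None:
            return None  # the request was not in progress
        if forefront:
            self._pending.appendleft(request)
        else:
            self._pending.append(request)
        return {"uniqueKey": request["uniqueKey"]}  # simplified result

    async def is_finished(self):
        return not self._pending and not self._in_progress


async def process_queue(queue, handle, max_retries=2):
    """Drain the queue, reclaiming failed requests up to max_retries times."""
    attempts = {}
    while not await queue.is_finished():
        request = await queue.fetch_next_request()
        if request is None:
            # Nothing pending right now, but the queue is not finished yet.
            await asyncio.sleep(0.01)
            continue
        try:
            await handle(request)
        except Exception:
            key = request["uniqueKey"]
            attempts[key] = attempts.get(key, 0) + 1
            if attempts[key] <= max_retries:
                await queue.reclaim_request(request)
                continue
            # Retries exhausted: fall through and mark as handled.
        await queue.mark_request_as_handled(request)
```

The loop is driven by is_finished rather than by a None result from fetch_next_request, which matches the note above that None only signals that nothing is pending at the moment.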
get_info
Get an object containing general information about the request queue.
Returns dict | None
Object returned by calling the GET request queue API endpoint.
get_request
Retrieve a request from the queue.
Parameters
request_id: str
ID of the request to retrieve.
Returns dict | None
The retrieved request, or None if it does not exist.
is_empty
Check whether the queue is empty.
Returns bool
True if the next call to RequestQueue.fetch_next_request would return None, otherwise False.
is_finished
Check whether the queue is finished.
Due to the nature of distributed storage used by the queue, the function might occasionally return a false negative, but it will never return a false positive.
Returns bool
True if all requests were already handled and there are no more left. False otherwise.
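The difference between is_empty and is_finished is easiest to see with a request that has been fetched but not yet handled. The QueueState class below is a hypothetical in-memory model written for this illustration, not part of the SDK.

```python
from dataclasses import dataclass, field


@dataclass
class QueueState:
    """Hypothetical model: keys waiting to be fetched and keys being processed."""

    pending: list = field(default_factory=list)
    in_progress: set = field(default_factory=set)

    def fetch(self):
        # Move the next pending key to the in-progress set.
        if not self.pending:
            return None
        key = self.pending.pop(0)
        self.in_progress.add(key)
        return key

    def mark_handled(self, key):
        self.in_progress.discard(key)

    def is_empty(self) -> bool:
        # True when the next fetch would return None.
        return not self.pending

    def is_finished(self) -> bool:
        # True only when nothing is pending and nothing is in progress.
        return not self.pending and not self.in_progress
```

In this model, after the only request has been fetched, is_empty() is True while is_finished() stays False until mark_handled is called.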
mark_request_as_handled
Mark a request as handled after successful processing.
Handled requests will never again be returned by the RequestQueue.fetch_next_request method.
Parameters
request: dict
The request to mark as handled.
Returns dict | None
Information about the queue operation with keys requestId, uniqueKey, wasAlreadyPresent, wasAlreadyHandled. None if the given request was not in progress.
open
Open a request queue.
A request queue represents a queue of URLs to crawl, stored either on the local filesystem or in the Apify cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.
Parameters
id: str | None = None (keyword-only)
ID of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the actor run. If the request queue with the given ID does not exist, it raises an error.
name: str | None = None (keyword-only)
Name of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the actor run. If the request queue with the given name does not exist, it is created.
force_cloud: bool = False (keyword-only)
If set to True, it will open a request queue on the Apify Platform even when running the actor locally. Defaults to False.
config: Configuration | None = None (keyword-only)
A Configuration instance; the global configuration is used if omitted.
Returns RequestQueue
An instance of the RequestQueue class for the given ID or name.
reclaim_request
Reclaim a failed request back to the queue.
The request will be returned for processing again later by another call to RequestQueue.fetch_next_request.
Parameters
request: dict
The request to return to the queue.
forefront: bool = False
Whether to add the request to the head (True) or the end (False) of the queue.
Returns dict | None
Information about the queue operation with keys requestId, uniqueKey, wasAlreadyPresent, wasAlreadyHandled. None if the given request was not in progress.
Represents a queue of URLs to crawl.
Can be used for deep crawling of websites where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.
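How the position of newly added requests determines the crawl order can be shown on a toy link graph. The crawl_order function and the links mapping are hypothetical; a deque stands in for the queue, and a set plays the role of uniqueKey deduplication.

```python
from collections import deque


def crawl_order(links: dict, start: str, *, forefront: bool) -> list:
    """Hypothetical crawl over an in-memory link graph; returns the visit order."""
    queue = deque([start])
    seen = {start}  # stands in for uniqueKey deduplication
    order = []
    while queue:
        url = queue.popleft()  # analogous to fetching the next request
        order.append(url)
        new_links = []
        for link in links.get(url, []):
            if link not in seen:
                seen.add(link)
                new_links.append(link)
        if forefront:
            # Adding to the head yields depth-first order; reversed() keeps
            # the first discovered link at the very front.
            queue.extendleft(reversed(new_links))
        else:
            # Adding to the back yields breadth-first order.
            queue.extend(new_links)
    return order


links = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}
breadth_first = crawl_order(links, "/", forefront=False)
depth_first = crawl_order(links, "/", forefront=True)
```

With this graph, breadth_first visits "/", "/a", "/b" before any deeper page, while depth_first finishes "/a/1" and "/a/2" before reaching "/b".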
Each URL is represented using an instance of the Request class. The queue can only contain unique URLs. More precisely, it can only contain request dictionaries with distinct uniqueKey properties. By default, uniqueKey is generated from the URL, but it can also be overridden. To add a single URL multiple times to the queue, the corresponding request dictionaries need to have different uniqueKey properties.
Do not instantiate this class directly; use the Actor.open_request_queue() function instead.
RequestQueue stores its data either on local disk or in the Apify cloud, depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variables are set.
If the APIFY_LOCAL_STORAGE_DIR environment variable is set, the data is stored in the local directory in the following files:
Note that {QUEUE_ID} is the name or ID of the request queue. The default request queue has ID: default, unless you override it by setting the APIFY_DEFAULT_REQUEST_QUEUE_ID environment variable. The {REQUEST_ID} is the id of the request.
If the APIFY_TOKEN environment variable is set but APIFY_LOCAL_STORAGE_DIR is not, the data is stored in the Apify Request Queue cloud storage.