RequestQueue
Methods
add_request
Add a request to the queue.
Parameters
request: Dict
The request to add to the queue
forefront: bool
If True, the request is added to the head of the queue; otherwise it is added to the end.
Returns Dict
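The effect of forefront on crawl order can be illustrated with a small sketch. It uses a plain deque as a toy stand-in for the queue (not the SDK class), and the link graph and the crawl_order helper are hypothetical. Adding newly discovered links to the end gives a breadth-first order, while adding them to the head gives a depth-first order.

```python
from collections import deque


def crawl_order(links, start, forefront):
    """Return the visit order over a small link graph.

    `links` maps a URL to the URLs found on that page. The deque is a toy
    stand-in for the request queue, not the SDK class.
    """
    queue = deque([start])
    seen = {start}  # mirrors the rule that the queue holds unique requests only
    order = []
    while queue:
        url = queue.popleft()  # requests are always consumed from the head
        order.append(url)
        for child in links.get(url, []):
            if child in seen:
                continue
            seen.add(child)
            if forefront:
                queue.appendleft(child)  # head of the queue: depth-first
            else:
                queue.append(child)  # end of the queue: breadth-first
    return order


# Hypothetical site structure used only for illustration.
links = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/b/1"]}
bfs = crawl_order(links, "/", forefront=False)
dfs = crawl_order(links, "/", forefront=True)
print(bfs)
print(dfs)
```

With forefront set to False every page at one depth is visited before any page at the next depth; with True the crawler follows one branch to its end before returning to its siblings.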
drop
Remove the request queue either from the Apify cloud storage or from the local directory.
Returns None
fetch_next_request
Return the next request in the queue to be processed.
Once you successfully finish processing the request, you need to call RequestQueue.mark_request_as_handled to mark the request as handled in the queue. If there was an error in processing the request, call RequestQueue.reclaim_request instead, so that the queue will give the request to some other consumer in another call to the fetch_next_request method.
Note that the None return value does not mean the queue processing finished; it means there are currently no pending requests. To check whether all requests in the queue were finished, use RequestQueue.is_finished instead.
Returns Dict
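The fetch, handle, and reclaim cycle described above can be sketched as a consumer loop. The InMemoryQueue class below is a toy stand-in that only mimics the documented method names, and drain and flaky are hypothetical helpers. In the SDK these methods may be coroutines that need await; the stand-in is synchronous to keep the sketch short.

```python
class InMemoryQueue:
    """Toy stand-in that mimics the documented method names (not the SDK class)."""

    def __init__(self):
        self._pending = []  # requests waiting to be fetched
        self._in_progress = {}  # fetched but not yet handled or reclaimed

    def add_request(self, request, forefront=False):
        if forefront:
            self._pending.insert(0, request)
        else:
            self._pending.append(request)
        return request

    def fetch_next_request(self):
        if not self._pending:
            return None  # nothing pending right now; not the same as finished
        request = self._pending.pop(0)
        self._in_progress[request["uniqueKey"]] = request
        return request

    def mark_request_as_handled(self, request):
        self._in_progress.pop(request["uniqueKey"], None)
        return request

    def reclaim_request(self, request, forefront=False):
        self._in_progress.pop(request["uniqueKey"], None)
        return self.add_request(request, forefront)

    def is_finished(self):
        return not self._pending and not self._in_progress


def drain(queue, process):
    """Fetch requests until the queue reports that it is finished."""
    while not queue.is_finished():
        request = queue.fetch_next_request()
        if request is None:
            continue  # a real consumer would wait here before polling again
        try:
            process(request)
        except Exception:
            queue.reclaim_request(request)  # hand it back for another attempt
        else:
            queue.mark_request_as_handled(request)


attempts = {}


def flaky(request):
    """Hypothetical handler that fails the first time it sees request 'b'."""
    key = request["uniqueKey"]
    attempts[key] = attempts.get(key, 0) + 1
    if key == "b" and attempts[key] == 1:
        raise RuntimeError("temporary failure")


queue = InMemoryQueue()
for key in ("a", "b"):
    queue.add_request({"url": f"https://example.com/{key}", "uniqueKey": key})
drain(queue, flaky)
print(attempts)
```

The loop condition uses is_finished rather than a None result from fetch_next_request, since None only signals that nothing is pending at that moment.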
get_info
Get an object containing general information about the request queue.
Returns Dict
get_request
Retrieve a request from the queue.
Parameters
request_id: str
ID of the request to retrieve.
Returns Dict
is_empty
Check whether the queue is empty.
Returns bool
is_finished
Check whether the queue is finished.
Due to the nature of distributed storage used by the queue, the function might occasionally return a false negative, but it will never return a false positive.
Returns bool
mark_request_as_handled
Mark a request as handled after successful processing.
Handled requests will never again be returned by the RequestQueue.fetch_next_request method.
Parameters
request: Dict
The request to mark as handled.
Returns Dict
open
Open a request queue.
Request queue represents a queue of URLs to crawl, which is stored either on local filesystem or in the Apify cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.
Parameters
optional id: str
ID of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the actor run. If the request queue with the given ID does not exist, it raises an error.
optional name: str
Name of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the actor run. If the request queue with the given name does not exist, it is created.
force_cloud: bool
If set to True, it will open a request queue on the Apify Platform even when running the actor locally. Defaults to False.
optional config: Configuration
A Configuration instance, uses global configuration if omitted.
Returns RequestQueue
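The lookup rules for id and name can be summarized in a short sketch. It models storage as a plain dictionary and is not how the SDK is implemented: resolve_queue, QueueNotFoundError, and the generated ID scheme are hypothetical, and giving id precedence when both arguments are passed is an arbitrary choice, since the text above does not cover that case.

```python
class QueueNotFoundError(Exception):
    """Hypothetical error type; the reference only says that an error is raised."""


def resolve_queue(store, queue_id=None, name=None, default_id="default"):
    """Pick a queue record from `store`, a dict mapping queue ID to a record."""
    if queue_id is not None:
        # Lookup by ID never creates anything: a missing ID is an error.
        if queue_id not in store:
            raise QueueNotFoundError(queue_id)
        return store[queue_id]
    if name is not None:
        # Lookup by name creates the queue when it does not exist yet.
        for record in store.values():
            if record["name"] == name:
                return record
        new_id = f"queue-{len(store) + 1}"  # made-up ID scheme for the sketch
        store[new_id] = {"id": new_id, "name": name}
        return store[new_id]
    # Neither id nor name: fall back to the default queue of the run.
    return store.setdefault(default_id, {"id": default_id, "name": None})


store = {}
default = resolve_queue(store)
named = resolve_queue(store, name="my-crawl")
again = resolve_queue(store, name="my-crawl")
print(default["id"], named["name"], again is named)
```

The asymmetry is the point to remember: opening by name is safe to repeat because it creates the queue on first use, while opening by ID only works for a queue that already exists.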
reclaim_request
Reclaim a failed request back to the queue.
The request will be returned for processing again later by another call to RequestQueue.fetch_next_request.
Parameters
request: Dict
The request to return to the queue.
forefront: bool
If True, the request is returned to the head of the queue; otherwise it is returned to the end.
Returns Dict
Represents a queue of URLs to crawl.
Can be used for deep crawling of websites where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.
Each URL is represented using an instance of the Request class. The queue can only contain unique URLs. More precisely, it can only contain request dictionaries with distinct uniqueKey properties. By default, uniqueKey is generated from the URL, but it can also be overridden. To add a single URL multiple times to the queue, the corresponding request dictionaries need to have different uniqueKey properties.
Do not instantiate this class directly, use the Actor.open_request_queue() function instead.
RequestQueue stores its data either on local disk or in the Apify cloud, depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variables are set.
If the APIFY_LOCAL_STORAGE_DIR environment variable is set, the data is stored in the local directory in the following files:
Note that {QUEUE_ID} is the name or ID of the request queue. The default request queue has the ID default, unless you override it by setting the APIFY_DEFAULT_REQUEST_QUEUE_ID environment variable. The {REQUEST_ID} is the id of the request.
If the APIFY_TOKEN environment variable is set but APIFY_LOCAL_STORAGE_DIR is not, the data is stored in the Apify Request Queue cloud storage.
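The uniqueness rule for uniqueKey can be illustrated with a minimal sketch. It uses a list and a set as a toy model of the queue, and add_unique is a hypothetical helper. Using the raw URL as the default key is an assumption made for the sketch; the SDK may derive the key from the URL differently.

```python
def add_unique(queue, seen_keys, request):
    """Append `request` unless its uniqueKey is already known.

    Returns True when the request was added and False when it was a duplicate.
    """
    # Assumption: the default key is the raw URL.
    key = request.get("uniqueKey", request["url"])
    if key in seen_keys:
        return False
    seen_keys.add(key)
    queue.append({**request, "uniqueKey": key})
    return True


queue, seen = [], set()
first = add_unique(queue, seen, {"url": "https://example.com"})
duplicate = add_unique(queue, seen, {"url": "https://example.com"})
# An explicit uniqueKey lets the same URL be enqueued a second time.
forced = add_unique(
    queue,
    seen,
    {"url": "https://example.com", "uniqueKey": "https://example.com#retry"},
)
print(first, duplicate, forced, len(queue))
```

The second call is dropped because its key matches the first, while the third is accepted because its explicit uniqueKey differs even though the URL is the same.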