Actor
Constructors
__init__
Create an Actor instance.
Note that you don't have to do this; all the functionality is accessible using the default instance (e.g. Actor.open_dataset()).
Parameters
configuration: Configuration | None = None
The Actor configuration to be used. If not passed, a new Configuration instance will be created.
keyword-only configure_logging: bool = True
Whether to set up the default logging configuration.
Returns None
Methods
__call__
Make a new Actor instance with a non-default configuration.
Parameters
configuration: Configuration | None = None
keyword-only configure_logging: bool = True
Returns Self
__repr__
Returns str
abort
Abort the given Actor run on the Apify platform using the current user account.
The user account is determined by the APIFY_TOKEN environment variable.
Parameters
run_id: str
The ID of the Actor run to be aborted.
keyword-only token: str | None = None
The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).
keyword-only status_message: str | None = None
Status message of the Actor to be set on the platform.
keyword-only gracefully: bool | None = None
If True, the Actor run will abort gracefully. The platform sends the aborting and persistState events to the run and force-stops it after 30 seconds. This is helpful when you plan to resurrect the run later.
Returns ActorRun
Info about the aborted Actor run.
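A minimal sketch of a graceful abort, assuming `actor` is the Actor object from the apify package. It is passed in as a parameter so the snippet does not depend on the SDK being installed. The helper name `abort_gracefully` and the status message are illustrative, not part of the SDK.

```python
async def abort_gracefully(actor, run_id: str):
    """Ask the platform to abort a run gracefully and return the run info."""
    # gracefully=True gives the run time to react to the aborting and
    # persistState events before it is force-stopped.
    return await actor.abort(
        run_id,
        status_message="Aborted by supervisor",  # illustrative message
        gracefully=True,
    )
```

Inside a running Actor this would typically be awaited as `await abort_gracefully(Actor, run_id)`.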
add_webhook
Create an ad-hoc webhook for the current Actor run.
This webhook lets you receive a notification when the Actor run finishes or fails.
Note that webhooks are only supported for Actors running on the Apify platform. When running the Actor locally, the function will print a warning and have no effect.
For more information about Apify Actor webhooks, please see the documentation.
Parameters
webhook: Webhook
The webhook to be added.
keyword-only ignore_ssl_errors: bool | None = None
Whether the webhook should ignore SSL errors returned by request_url.
keyword-only do_not_retry: bool | None = None
If True, the webhook does not retry sending the payload to request_url upon failure.
keyword-only idempotency_key: str | None = None
A unique identifier of a webhook. You can use it to ensure that you won't create the same webhook multiple times.
Returns None
The created webhook.
apify_client
The ApifyClientAsync instance the Actor instance uses.
Returns ApifyClientAsync
call
Start an Actor on the Apify platform and wait for it to finish before returning.
It waits indefinitely, unless the wait argument is provided.
Parameters
actor_id: str
The ID of the Actor to be run.
run_input: Any = None
The input to pass to the Actor run.
keyword-only token: str | None = None
The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).
keyword-only content_type: str | None = None
The content type of the input.
keyword-only build: str | None = None
Specifies the Actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the Actor (typically latest).
keyword-only memory_mbytes: int | None = None
Memory limit for the run, in megabytes. By default, the run uses a memory limit specified in the default run configuration for the Actor.
keyword-only timeout: timedelta | None = None
Optional timeout for the run, passed as a timedelta. By default, the run uses the timeout specified in the default run configuration for the Actor.
keyword-only webhooks: list[Webhook] | None = None
Optional webhooks (https://docs.apify.com/webhooks) associated with the Actor run, which can be used to receive a notification, e.g. when the Actor finishes or fails. If you already have a webhook set up for the Actor, you do not have to add it again here.
keyword-only wait: timedelta | None = None
The maximum time the server waits for the run to finish, passed as a timedelta. If not provided, waits indefinitely.
Returns ActorRun | None
Info about the started Actor run.
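The following sketch shows how the timedelta-typed timeout and wait arguments might be passed. It assumes `actor` is the Actor object from the apify package (injected as a parameter so the snippet runs without the SDK). The helper name `call_with_limits` and the concrete limits are illustrative.

```python
from datetime import timedelta


async def call_with_limits(actor, actor_id: str, run_input: dict):
    """Run another Actor, capping both the run time and the wait time."""
    run = await actor.call(
        actor_id,
        run_input,
        memory_mbytes=1024,            # example value
        timeout=timedelta(minutes=5),  # limit for the run itself
        wait=timedelta(seconds=90),    # how long to wait for it to finish
    )
    # The documented return type is ActorRun | None, so guard against None.
    if run is None:
        raise RuntimeError(f"Actor {actor_id} did not return run info")
    return run
```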
call_task
Start an Actor task on the Apify platform and wait for it to finish before returning.
It waits indefinitely, unless the wait argument is provided.
Note that an Actor task is a saved input configuration and options for an Actor. If you want to run an Actor directly rather than an Actor task, please use Actor.call instead.
Parameters
task_id: str
The ID of the Actor task to be run.
task_input: dict | None = None
Overrides the input to pass to the Actor run.
keyword-only build: str | None = None
Specifies the Actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the Actor (typically latest).
keyword-only memory_mbytes: int | None = None
Memory limit for the run, in megabytes. By default, the run uses a memory limit specified in the default run configuration for the Actor.
keyword-only timeout: timedelta | None = None
Optional timeout for the run, passed as a timedelta. By default, the run uses the timeout specified in the default run configuration for the Actor.
keyword-only webhooks: list[Webhook] | None = None
Optional webhooks (https://docs.apify.com/webhooks) associated with the Actor run, which can be used to receive a notification, e.g. when the Actor finishes or fails. If you already have a webhook set up for the Actor, you do not have to add it again here.
keyword-only wait: timedelta | None = None
The maximum time the server waits for the run to finish, passed as a timedelta. If not provided, waits indefinitely.
keyword-only token: str | None = None
The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).
Returns ActorRun | None
Info about the started Actor run.
config
The Configuration instance the Actor instance uses.
Returns Configuration
configuration
The Configuration instance the Actor instance uses.
Returns Configuration
create_proxy_configuration
Create a ProxyConfiguration object with the passed proxy configuration.
Configures connection to a proxy server with the provided options. Proxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists.
For more details and code examples, see the ProxyConfiguration class.
Parameters
keyword-only actor_proxy_input: dict | None = None
Proxy configuration field from the Actor input, if the input has such a field. If you pass this argument, all the other arguments will be inferred from it.
keyword-only password: str | None = None
Password for the Apify Proxy. If not provided, will use os.environ['APIFY_PROXY_PASSWORD'], if available.
keyword-only groups: list[str] | None = None
Proxy groups which the Apify Proxy should use, if provided.
keyword-only country_code: str | None = None
Country which the Apify Proxy should use, if provided.
keyword-only proxy_urls: list[str] | None = None
Custom proxy server URLs which should be rotated through.
keyword-only new_url_function: _NewUrlFunction | None = None
Function which returns a custom proxy URL to be used.
Returns ProxyConfiguration | None
ProxyConfiguration object with the passed configuration, or None, if no proxy should be used based on the configuration.
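A hedged sketch of handling the ProxyConfiguration | None return value. It assumes `actor` is the Actor object from the apify package, that the Actor input stores its proxy settings under a `proxyConfiguration` field (a hypothetical field name), and that the returned object exposes an awaitable `new_url()` method, which is not described in this section.

```python
async def pick_proxy_url(actor, actor_input: dict):
    """Return a proxy URL built from the Actor input, or None if no proxy applies."""
    proxy_configuration = await actor.create_proxy_configuration(
        # Hypothetical input field name; adjust to your Actor's input schema.
        actor_proxy_input=actor_input.get("proxyConfiguration"),
    )
    # None means that, based on the configuration, no proxy should be used.
    if proxy_configuration is None:
        return None
    # Assumption: ProxyConfiguration provides an async new_url() method.
    return await proxy_configuration.new_url()
```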
event_manager
The EventManager instance the Actor instance uses.
Returns EventManager
exit
Exit the Actor instance.
This stops the Actor instance. It cancels all the intervals for regularly sending PERSIST_STATE events, sends a final PERSIST_STATE event, waits for all the event listeners to finish, and stops the event manager.
Parameters
keyword-only exit_code: int = 0
The exit code with which the Actor should exit (defaults to 0).
keyword-only event_listeners_timeout: timedelta | None = EVENT_LISTENERS_TIMEOUT
How long the Actor should wait for Actor event listeners to finish before exiting.
keyword-only status_message: str | None = None
The final status message that the Actor should display.
keyword-only cleanup_timeout: timedelta = timedelta(seconds=30)
How long we should wait for event listeners.
Returns None
fail
Fail the Actor instance.
This performs all the same steps as Actor.exit(), but it additionally sets the exit code to 1 (by default).
Parameters
keyword-only exit_code: int = 1
The exit code with which the Actor should fail (defaults to 1).
keyword-only exception: BaseException | None = None
The exception with which the Actor failed.
keyword-only status_message: str | None = None
The final status message that the Actor should display.
Returns None
get_env
Return a dictionary with information parsed from all the APIFY_XXX environment variables.
For a list of all the environment variables, see the Actor documentation. If some variables are not defined or are invalid, the corresponding value in the resulting dictionary will be None.
Returns dict
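To illustrate the "missing or invalid becomes None" behavior described above, here is a standalone sketch using only the standard library. It is not the SDK's implementation: the function name `parse_apify_env`, the selected variables, their parsers, and the lowercased key naming are all assumptions made for the example.

```python
import os
from typing import Any, Callable, Mapping

# Hypothetical subset of variables and their parsers, for illustration only.
_PARSERS: dict[str, Callable[[str], Any]] = {
    "APIFY_TOKEN": str,
    "APIFY_API_BASE_URL": str,
    "APIFY_MEMORY_MBYTES": int,
}


def parse_apify_env(environ: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Parse selected APIFY_* variables; missing or invalid values become None."""
    result: dict[str, Any] = {}
    for name, parser in _PARSERS.items():
        key = name.removeprefix("APIFY_").lower()  # assumed key naming
        raw = environ.get(name)
        try:
            result[key] = parser(raw) if raw is not None else None
        except ValueError:
            # An unparsable value is reported as None rather than raising.
            result[key] = None
    return result
```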
get_input
Get the Actor input value from the default key-value store associated with the current Actor run.
Returns Any
get_value
Get a value from the default key-value store associated with the current Actor run.
Parameters
key: str
The key of the record to retrieve.
default_value: Any = None
Default value returned in case the record does not exist.
Returns Any
init
Initialize the Actor instance.
This initializes the Actor instance. It configures the right storage client based on whether the Actor is running locally or on the Apify platform, it initializes the event manager for processing Actor events, and starts an interval for regularly sending PERSIST_STATE events, so that the Actor can regularly persist its state in response to these events.
This method should be called immediately before performing any additional Actor actions, and it should be called only once.
Returns None
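The init, exit, and fail methods can be combined into a simple lifecycle wrapper. The sketch below assumes `actor` is the Actor object from the apify package and `work` is any coroutine function you supply. The name `run_actor` and the status messages are illustrative.

```python
async def run_actor(actor, work):
    """Initialize the Actor, run `work`, then exit or fail accordingly."""
    await actor.init()  # call once, before any other Actor action
    try:
        await work()
    except Exception as exc:
        # Report the failure; fail() uses exit code 1 by default.
        await actor.fail(exception=exc, status_message="Run failed")
    else:
        await actor.exit(status_message="Run finished")
```

Keeping the exit call in the else branch means it is only reached when `work` completed without raising.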
is_at_home
Return True when the Actor is running on the Apify platform, and False otherwise (e.g. local run).
Returns bool
log
The logging.Logger instance the Actor uses.
Returns logging.Logger
metamorph
Transform this Actor run to an Actor run of a different Actor.
The platform stops the current Actor container and starts a new container with the new Actor instead. All the default storages are preserved, and the new input is stored under the INPUT-METAMORPH-1 key in the same default key-value store.
Parameters
target_actor_id: str
ID of the target Actor that the run should be transformed into.
run_input: Any = None
The input to pass to the new run.
keyword-only target_actor_build: str | None = None
The build of the target Actor. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the target Actor (typically the latest build).
keyword-only content_type: str | None = None
The content type of the input.
keyword-only custom_after_sleep: timedelta | None = None
How long to sleep for after the metamorph, to wait for the container to be stopped.
Returns None
new_client
Return a new instance of the Apify API client.
The ApifyClientAsync class is provided by the apify-client package, and it is automatically configured using the APIFY_API_BASE_URL and APIFY_TOKEN environment variables.
You can override the token via the available options. That's useful if you want to use the client as a different Apify user than the SDK internals are using.
Parameters
keyword-only token: str | None = None
The Apify API token.
keyword-only api_url: str | None = None
The URL of the Apify API server to connect to. Defaults to https://api.apify.com.
keyword-only max_retries: int | None = None
How many times to retry a failed request at most.
keyword-only min_delay_between_retries: timedelta | None = None
How long the client waits between retrying requests (increases exponentially from this value).
keyword-only timeout: timedelta | None = None
The socket timeout of the HTTP requests sent to the Apify API.
Returns ApifyClientAsync
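To make the interaction between max_retries and min_delay_between_retries concrete, the sketch below computes one possible retry schedule. It is only an illustration: the doubling factor and the absence of jitter are assumptions, and the actual client may compute its delays differently. The function name `retry_delays` is hypothetical.

```python
from datetime import timedelta


def retry_delays(
    min_delay: timedelta,
    max_retries: int,
    factor: float = 2.0,  # assumed growth factor
) -> list[timedelta]:
    """Return the wait before each retry, growing exponentially from min_delay."""
    return [min_delay * (factor ** attempt) for attempt in range(max_retries)]
```

With a minimum delay of 500 ms and four retries, this schedule yields waits of 0.5, 1, 2, and 4 seconds.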
off
Remove a listener, or all listeners, from an Actor event.
Parameters
event_name: Event
The Actor event for which to remove listeners.
listener: Callable | None = None
The listener which is supposed to be removed. If not passed, all listeners of this event are removed.
Returns None
on
Add an event listener to the Actor's event manager.
The following events can be emitted:
Event.SYSTEM_INFO: Emitted every minute; the event data contains information about the Actor's resource usage.
Event.MIGRATING: Emitted when the Actor on the Apify platform is about to be migrated to another worker server. Use this event to persist the Actor's state and gracefully stop in-progress tasks, preventing disruption.
Event.PERSIST_STATE: Emitted regularly (default: 60 seconds) to notify the Actor to persist its state, preventing work repetition after a restart. This event is emitted together with the MIGRATING event, where the isMigrating flag in the event data is True; otherwise, the flag is False. This event is for convenience; the same effect can be achieved by setting an interval and listening for the MIGRATING event.
Event.ABORTING: Emitted when a user aborts an Actor run on the Apify platform, allowing the Actor time to clean up its state if the abort is graceful.
Parameters
event_name: Event
The Actor event to listen for.
listener: Callable
The function to be called when the event is emitted (can be async).
Returns Callable
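A small sketch of pairing on and off to persist state on every PERSIST_STATE event. It assumes `actor` is the Actor object from the apify package and `persist_state_event` is the Event.PERSIST_STATE member, both passed in so the snippet runs without the SDK. The helper name and the "STATE" key are illustrative.

```python
def register_state_persistence(actor, persist_state_event, state: dict):
    """Save `state` on every PERSIST_STATE event; return a function that unregisters."""

    async def save_state(event_data=None):
        # Listeners may be async; the event data is not needed here.
        await actor.set_value("STATE", state)  # "STATE" is an example key

    actor.on(persist_state_event, save_state)

    def unregister():
        # Passing the listener removes only this one; omitting it removes all.
        actor.off(persist_state_event, save_state)

    return unregister
```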
open_dataset
Open a dataset.
Datasets are used to store structured data where each object stored has the same attributes, such as online store products or real estate offers. The actual data is stored either on the local filesystem or in the Apify cloud.
Parameters
keyword-only id: str | None = None
ID of the dataset to be opened. If neither id nor name is provided, the method returns the default dataset associated with the Actor run.
keyword-only name: str | None = None
Name of the dataset to be opened. If neither id nor name is provided, the method returns the default dataset associated with the Actor run.
keyword-only force_cloud: bool = False
If set to True, then the Apify cloud storage is always used. This way it is possible to combine local and cloud storage.
Returns Dataset
An instance of the Dataset class for the given ID or name.
open_key_value_store
Open a key-value store.
Key-value stores are used to store records or files, along with their MIME content type. The records are stored and retrieved using a unique key. The actual data is stored either on a local filesystem or in the Apify cloud.
Parameters
keyword-only id: str | None = None
ID of the key-value store to be opened. If neither id nor name is provided, the method returns the default key-value store associated with the Actor run.
keyword-only name: str | None = None
Name of the key-value store to be opened. If neither id nor name is provided, the method returns the default key-value store associated with the Actor run.
keyword-only force_cloud: bool = False
If set to True, then the Apify cloud storage is always used. This way it is possible to combine local and cloud storage.
Returns KeyValueStore
An instance of the KeyValueStore class for the given ID or name.
open_request_queue
Open a request queue.
A request queue represents a queue of URLs to crawl, which is stored either on a local filesystem or in the Apify cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.
Parameters
keyword-only id: str | None = None
ID of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the Actor run.
keyword-only name: str | None = None
Name of the request queue to be opened. If neither id nor name is provided, the method returns the default request queue associated with the Actor run.
keyword-only force_cloud: bool = False
If set to True, then the Apify cloud storage is always used. This way it is possible to combine local and cloud storage.
Returns RequestQueue
An instance of the RequestQueue class for the given ID or name.
push_data
Store an object or a list of objects to the default dataset of the current Actor run.
Parameters
data: dict | list[dict]
The data to push to the default dataset.
Returns None
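Since push_data accepts either a single dict or a list of dicts, large result sets can be pushed in chunks. The sketch below assumes `actor` is the Actor object from the apify package. The helper name `push_in_batches` and the batch size of 100 are illustrative choices, not SDK requirements.

```python
async def push_in_batches(actor, items: list[dict], batch_size: int = 100) -> int:
    """Push items to the default dataset in fixed-size batches; return the batch count."""
    batches = 0
    for start in range(0, len(items), batch_size):
        # Each call stores one list of objects in the default dataset.
        await actor.push_data(items[start:start + batch_size])
        batches += 1
    return batches
```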
reboot
Internally reboot this Actor.
The system stops the current container and starts a new one, with the same run ID and default storages.
Parameters
keyword-only event_listeners_timeout: timedelta | None = EVENT_LISTENERS_TIMEOUT
How long the Actor should wait for Actor event listeners to finish before exiting.
keyword-only custom_after_sleep: timedelta | None = None
How long to sleep for after the reboot, to wait for the container to be stopped.
Returns None
set_status_message
Set the status message for the current Actor run.
Parameters
status_message: str
The status message to set to the run.
keyword-only is_terminal: bool | None = None
Set this flag to True if this is the final status message of the Actor run.
Returns ActorRun | None
The updated Actor run object.
set_value
Set or delete a value in the default key-value store associated with the current Actor run.
Parameters
key: str
The key of the record to set.
value: Any
The value of the record to set, or None if the record should be deleted.
keyword-only content_type: str | None = None
The content type which should be set to the value.
Returns None
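The get_value and set_value methods can be combined to keep small pieces of state in the default key-value store. This sketch assumes `actor` is the Actor object from the apify package. The helper names and the "run-counter" key are hypothetical.

```python
async def increment_counter(actor, key: str = "run-counter") -> int:
    """Read a counter (defaulting to 0), increment it, and store it back."""
    count = await actor.get_value(key, 0)  # second argument is default_value
    count += 1
    await actor.set_value(key, count)
    return count


async def clear_record(actor, key: str) -> None:
    """Delete a record by setting its value to None."""
    await actor.set_value(key, None)
```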
start
Run an Actor on the Apify platform.
Unlike Actor.call, this method just starts the run without waiting for it to finish.
Parameters
actor_id: str
The ID of the Actor to be run.
run_input: Any = None
The input to pass to the Actor run.
keyword-only token: str | None = None
The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).
keyword-only content_type: str | None = None
The content type of the input.
keyword-only build: str | None = None
Specifies the Actor build to run. It can be either a build tag or build number. By default, the run uses the build specified in the default run configuration for the Actor (typically latest).
keyword-only memory_mbytes: int | None = None
Memory limit for the run, in megabytes. By default, the run uses a memory limit specified in the default run configuration for the Actor.
keyword-only timeout: timedelta | None = None
Optional timeout for the run, passed as a timedelta. By default, the run uses the timeout specified in the default run configuration for the Actor.
keyword-only wait_for_finish: int | None = None
The maximum number of seconds the server waits for the run to finish. By default, it is 0; the maximum value is 300.
keyword-only webhooks: list[Webhook] | None = None
Optional ad-hoc webhooks (https://docs.apify.com/webhooks/ad-hoc-webhooks) associated with the Actor run, which can be used to receive a notification, e.g. when the Actor finishes or fails. If you already have a webhook set up for the Actor or task, you do not have to add it again here.
Returns ActorRun
Info about the started Actor run
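A short sketch of starting a run without blocking, with a client-side check of the documented 0 to 300 second range for wait_for_finish. It assumes `actor` is the Actor object from the apify package. The helper name `start_run` and the up-front validation are illustrative additions, not SDK behavior.

```python
async def start_run(actor, actor_id: str, run_input: dict, wait_for_finish: int = 0):
    """Start another Actor and return the run info without waiting for completion."""
    # The documented maximum for wait_for_finish is 300 seconds.
    if not 0 <= wait_for_finish <= 300:
        raise ValueError("wait_for_finish must be between 0 and 300 seconds")
    return await actor.start(
        actor_id,
        run_input,
        wait_for_finish=wait_for_finish,
    )
```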
The class of Actor. Only make a new instance if you're absolutely sure you need to.