Version: 0.2

Actor

{"content": ["The main class of the SDK, through which all the actor operations should be done."]}

Index

Constructors

__init__

  • __init__(config): None
  • {"content": ["Create an Actor instance.\n\nNote that you don't have to do this, all the methods on this class function as classmethods too,\nand that is their preferred usage.\n\nArgs:\n config (Configuration, optional): The actor configuration to be used. If not passed, a new Configuration instance will be created."]}


    Parameters

    • config: Optional[Configuration] = None

    Returns None

Methods

apify_client

  • apify_client(self_or_cls): ApifyClientAsync
  • {"content": ["The ApifyClientAsync instance the Actor instance uses."]}


    Parameters

    • self_or_cls: Any

    Returns ApifyClientAsync

config

  • config(self_or_cls): Configuration
  • {"content": ["The Configuration instance the Actor instance uses."]}


    Parameters

    • self_or_cls: Any

    Returns Configuration

event_manager

  • event_manager(self_or_cls): EventManager
  • {"content": ["The EventManager instance the Actor instance uses."]}


    Parameters

    • self_or_cls: Any

    Returns EventManager

log

  • log(_self_or_cls): logging.Logger
  • {"content": ["The logging.Logger instance the Actor uses."]}


    Parameters

    • _self_or_cls: Any

    Returns logging.Logger

init

  • async init(): None
  • {"content": ["Initialize the actor instance.\n\nThis configures the right storage client based on whether the actor is running locally or on the Apify platform,\ninitializes the event manager for processing actor events,\nand starts an interval for regularly sending PERSIST_STATE events,\nso that the actor can regularly persist its state in response to these events.\n\nThis method should be called before performing any other actor actions,\nand it should be called only once."]}


    Returns None

exit

  • async exit(*, exit_code, event_listeners_timeout_secs): None
  • {"content": ["Exit the actor instance.\n\nThis stops the Actor instance.\nIt cancels all the intervals for regularly sending PERSIST_STATE events,\nsends a final PERSIST_STATE event,\nwaits for all the event listeners to finish,\nand stops the event manager.\n\nArgs:\n exit_code (int, optional): The exit code with which the actor should exit (defaults to 0).\n event_listeners_timeout_secs (float, optional): How long the actor should wait for actor event listeners to finish before exiting."]}


    Parameters

    • keyword-only exit_code: int = 0
    • keyword-only event_listeners_timeout_secs: Optional[float] = EVENT_LISTENERS_TIMEOUT_SECS

    Returns None

fail

  • async fail(*, exit_code, exception): None
  • {"content": ["Fail the actor instance.\n\nThis performs all the same steps as Actor.exit(),\nbut it additionally sets the exit code to 1 (by default).\n\nArgs:\n exit_code (int, optional): The exit code with which the actor should fail (defaults to 1).\n exception (BaseException, optional): The exception with which the actor failed."]}


    Parameters

    • keyword-only exit_code: int = 1
    • keyword-only exception: Optional[BaseException] = None

    Returns None
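A minimal sketch of the manual lifecycle described for init, exit and fail, assuming the SDK is importable as `apify`. The `do_work` and `run_actor` names are hypothetical helpers for illustration and are not part of the SDK.

```python
import asyncio


async def do_work() -> None:
    """Hypothetical placeholder for the actor's real work."""
    await asyncio.sleep(0)


async def run_actor() -> None:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    # Call init() once, before any other actor operation.
    await Actor.init()
    try:
        await do_work()
    except Exception as exc:
        # Same shutdown steps as exit(), but with a non-zero exit code.
        await Actor.fail(exit_code=1, exception=exc)
    else:
        await Actor.exit(exit_code=0)
```

To try it, run `asyncio.run(run_actor())` from your script's entry point.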

main

  • async main(main_actor_function): Optional[MainReturnType]
  • {"content": ["Initialize the actor, run the passed function and finish the actor cleanly.\n\nThe Actor.main() function is optional and is provided merely for your convenience.\nIt is mainly useful when you're running your code as an actor on the Apify platform.\n\nThe Actor.main() function performs the following actions:\n\n- When running on the Apify platform (i.e. the APIFY_IS_AT_HOME environment variable is set),\n it sets up a connection to listen for platform events,\n for example to get a notification about an imminent migration to another server.\n- It invokes the user function passed as the main_actor_function parameter.\n- If the user function is an async function, it awaits it.\n- If the user function raises an exception or some other error is encountered,\n it prints the error details to the console so that they are stored in the log,\n and finishes the actor cleanly.\n- Finally, it exits the Python process, with a zero exit code on success and a non-zero one on errors.\n\nArgs:\n main_actor_function (Callable): The user function which should be run in the actor."]}


    Parameters

    • main_actor_function: Callable[[], MainReturnType]

    Returns Optional[MainReturnType]
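As a rough illustration of the steps listed for Actor.main(), the sketch below passes an async user function to it. The `crawl` and `run` names are hypothetical, and the SDK is assumed to be importable as `apify`.

```python
import asyncio


async def crawl() -> None:
    """Hypothetical user function handed to Actor.main()."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    actor_input = await Actor.get_input() or {}
    await Actor.push_data({"received_input": actor_input})


def run() -> None:
    from apify import Actor

    # Actor.main() initializes the actor, awaits crawl() and finishes cleanly.
    asyncio.run(Actor.main(crawl))
```

Compared with calling init and exit by hand, this leaves error reporting and the final exit code to the SDK.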

new_client

  • new_client(*, token, api_url, max_retries, min_delay_between_retries_millis, timeout_secs): ApifyClientAsync
  • {"content": ["Return a new instance of the Apify API client.\n\nThe ApifyClientAsync class is provided by the apify-client package,\nand it is automatically configured using the APIFY_API_BASE_URL and APIFY_TOKEN environment variables.\n\nYou can override the token via the available options.\nThat's useful if you want to use the client as a different Apify user than the SDK internals are using.\n\nArgs:\n token (str, optional): The Apify API token\n api_url (str, optional): The URL of the Apify API server to connect to. Defaults to https://api.apify.com\n max_retries (int, optional): How many times to retry a failed request at most\n min_delay_between_retries_millis (int, optional): How long will the client wait between retrying requests\n (increases exponentially from this value)\n timeout_secs (int, optional): The socket timeout of the HTTP requests sent to the Apify API"]}


    Parameters

    • keyword-only token: Optional[str] = None
    • keyword-only api_url: Optional[str] = None
    • keyword-only max_retries: Optional[int] = None
    • keyword-only min_delay_between_retries_millis: Optional[int] = None
    • keyword-only timeout_secs: Optional[int] = None

    Returns ApifyClientAsync

open_dataset

  • async open_dataset(*, id, name, force_cloud): Dataset
  • {"content": ["Open a dataset.\n\nDatasets are used to store structured data where each object stored has the same attributes,\nsuch as online store products or real estate offers.\nThe actual data is stored either on the local filesystem or in the Apify cloud.\n\nArgs:\n id (str, optional): ID of the dataset to be opened.\n If neither id nor name are provided, the method returns the default dataset associated with the actor run.\n name (str, optional): Name of the dataset to be opened.\n If neither id nor name are provided, the method returns the default dataset associated with the actor run.\n force_cloud (bool, optional): If set to True then the Apify cloud storage is always used.\n This way it is possible to combine local and cloud storage.\n\nReturns:\n Dataset: An instance of the Dataset class for the given ID or name."]}


    Parameters

    • keyword-only id: Optional[str] = None
    • keyword-only name: Optional[str] = None
    • keyword-only force_cloud: bool = False

    Returns Dataset

open_key_value_store

  • async open_key_value_store(*, id, name, force_cloud): KeyValueStore
  • {"content": ["Open a key-value store.\n\nKey-value stores are used to store records or files, along with their MIME content type.\nThe records are stored and retrieved using a unique key.\nThe actual data is stored either on a local filesystem or in the Apify cloud.\n\nArgs:\n id (str, optional): ID of the key-value store to be opened.\n If neither id nor name are provided, the method returns the default key-value store associated with the actor run.\n name (str, optional): Name of the key-value store to be opened.\n If neither id nor name are provided, the method returns the default key-value store associated with the actor run.\n force_cloud (bool, optional): If set to True then the Apify cloud storage is always used.\n This way it is possible to combine local and cloud storage.\n\nReturns:\n KeyValueStore: An instance of the KeyValueStore class for the given ID or name."]}


    Parameters

    • keyword-only id: Optional[str] = None
    • keyword-only name: Optional[str] = None
    • keyword-only force_cloud: bool = False

    Returns KeyValueStore

open_request_queue

  • async open_request_queue(*, id, name, force_cloud): RequestQueue
  • {"content": ["Open a request queue.\n\nA request queue represents a queue of URLs to crawl, stored either on the local filesystem or in the Apify cloud.\nThe queue is used for deep crawling of websites, where you start with several URLs and then\nrecursively follow links to other pages. The data structure supports both breadth-first\nand depth-first crawling orders.\n\nArgs:\n id (str, optional): ID of the request queue to be opened.\n If neither id nor name are provided, the method returns the default request queue associated with the actor run.\n name (str, optional): Name of the request queue to be opened.\n If neither id nor name are provided, the method returns the default request queue associated with the actor run.\n force_cloud (bool, optional): If set to True then the Apify cloud storage is always used.\n This way it is possible to combine local and cloud storage.\n\nReturns:\n RequestQueue: An instance of the RequestQueue class for the given ID or name."]}


    Parameters

    • keyword-only id: Optional[str] = None
    • keyword-only name: Optional[str] = None
    • keyword-only force_cloud: bool = False

    Returns RequestQueue

push_data

  • async push_data(data): None
  • {"content": ["Store an object or a list of objects to the default dataset of the current actor run.\n\nArgs:\n data (object or list of objects, optional): The data to push to the default dataset."]}


    Parameters

    • data: Any

    Returns None
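Since datasets are meant for objects that share the same attributes, one possible pattern is to normalize records before pushing them. This is only a sketch: `FIELDS`, `normalize` and `store_products` are hypothetical names, and the SDK is assumed to be importable as `apify`.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical attribute set shared by every stored record.
FIELDS = ("title", "price")


def normalize(item: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the shared attributes, filling missing ones with None."""
    return {field: item.get(field) for field in FIELDS}


async def store_products(items: Iterable[Dict[str, Any]]) -> None:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    records: List[Dict[str, Any]] = [normalize(item) for item in items]
    # push_data() accepts a single object or a list of objects.
    await Actor.push_data(records)
```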

get_input

  • async get_input(): Any
  • {"content": ["Get the actor input value from the default key-value store associated with the current actor run."]}


    Returns Any

get_value

  • async get_value(key): Any
  • {"content": ["Get a value from the default key-value store associated with the current actor run.\n\nArgs:\n key (str): The key of the record to retrieve."]}


    Parameters

    • key: str

    Returns Any

set_value

  • async set_value(key, value, *, content_type): None
  • {"content": ["Set or delete a value in the default key-value store associated with the current actor run.\n\nArgs:\n key (str): The key of the record to set.\n value (any): The value of the record to set, or None if the record should be deleted.\n content_type (str, optional): The content type to store with the value."]}


    Parameters

    • key: str
    • value: Any
    • keyword-only content_type: Optional[str] = None

    Returns None
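The following sketch shows how get_value and set_value might be combined to keep a small piece of state, and how passing None deletes the record. `STATE_KEY` and the helper names are hypothetical, and it is an assumption here that get_value yields None for a record that does not exist yet.

```python
from typing import Any, Dict, Optional

STATE_KEY = "CRAWL_STATE"  # hypothetical record key


def next_state(state: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the state with its run counter incremented."""
    runs = (state or {}).get("runs", 0)
    return {"runs": runs + 1}


async def bump_run_counter() -> None:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    # Assumption: get_value() yields None when the record does not exist yet.
    state = await Actor.get_value(STATE_KEY)
    await Actor.set_value(STATE_KEY, next_state(state))


async def reset_run_counter() -> None:
    from apify import Actor

    # Passing None as the value deletes the record.
    await Actor.set_value(STATE_KEY, None)
```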

on

  • on(event_name, listener): Callable
  • {"content": ["Add an event listener to the actor's event manager.\n\nThe following events can be emitted:\n - ActorEventTypes.SYSTEM_INFO:\n Emitted every minute, the event data contains info about the resource usage of the actor.\n - ActorEventTypes.MIGRATING:\n Emitted when the actor running on the Apify platform is going to be migrated to another worker server soon.\n You can use it to persist the state of the actor and abort the run, to speed up the migration.\n - ActorEventTypes.PERSIST_STATE:\n Emitted in regular intervals (by default 60 seconds) to notify the actor that it should persist its state,\n in order to avoid repeating all work when the actor restarts.\n This event is automatically emitted together with the migrating event,\n in which case the isMigrating flag in the event data is set to True, otherwise the flag is False.\n Note that this event is provided merely for your convenience,\n you can achieve the same effect using an interval and listening for the migrating event.\n - ActorEventTypes.ABORTING:\n When a user aborts an actor run on the Apify platform,\n they can choose to abort it gracefully, to allow the actor some time before getting terminated.\n This graceful abort emits the aborting event, which you can use to clean up the actor state.\n\nArgs:\n event_name (ActorEventTypes): The actor event to listen for.\n listener (Callable): The function which is to be called when the event is emitted (can be async)."]}


    Parameters

    • event_name: ActorEventTypes
    • listener: Callable

    Returns Callable

off

  • off(event_name, listener): None
  • {"content": ["Remove a listener, or all listeners, from an actor event.\n\nArgs:\n event_name (ActorEventTypes): The actor event for which to remove listeners.\n listener (Callable, optional): The listener which is supposed to be removed. If not passed, all listeners of this event are removed."]}


    Parameters

    • event_name: ActorEventTypes
    • listener: Optional[Callable] = None

    Returns None
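A hedged sketch of registering and removing a PERSIST_STATE listener with on and off. The import path `apify.consts` for ActorEventTypes and the dict-like shape of the event data are assumptions; `state`, `is_migrating`, `persist_state`, `register` and `unregister` are hypothetical names.

```python
from typing import Any, Dict, Optional

state = {"processed": 0}  # hypothetical in-memory state worth persisting


def is_migrating(event_data: Optional[Dict[str, Any]]) -> bool:
    """Read the isMigrating flag; assumes the event data is dict-like."""
    return bool((event_data or {}).get("isMigrating", False))


async def persist_state(event_data: Optional[Dict[str, Any]] = None) -> None:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    await Actor.set_value("STATE", state)
    if is_migrating(event_data):
        await Actor.set_status_message("State saved before migration")


def register() -> None:
    from apify import Actor
    from apify.consts import ActorEventTypes  # assumed import path

    Actor.on(ActorEventTypes.PERSIST_STATE, persist_state)


def unregister() -> None:
    from apify import Actor
    from apify.consts import ActorEventTypes  # assumed import path

    # Passing the listener removes only it; omitting it removes all listeners.
    Actor.off(ActorEventTypes.PERSIST_STATE, persist_state)
```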

is_at_home

  • is_at_home(): bool
  • {"content": ["Return True when the actor is running on the Apify platform, and False otherwise (for example when running locally)."]}


    Returns bool

get_env

  • get_env(): Dict
  • {"content": ["Return a dictionary with information parsed from all the APIFY_XXX environment variables.\n\nFor a list of all the environment variables,\nsee the Actor documentation.\nIf some variables are not defined or are invalid, the corresponding value in the resulting dictionary will be None."]}


    Returns Dict

start

  • async start(actor_id, run_input, *, token, content_type, build, memory_mbytes, timeout_secs, wait_for_finish, webhooks): Dict
  • {"content": ["Run an actor on the Apify platform.\n\nUnlike Actor.call, this method just starts the run without waiting for finish.\n\nArgs:\n actor_id (str): The ID of the actor to be run.\n run_input (Any, optional): The input to pass to the actor run.\n token (str, optional): The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).\n content_type (str, optional): The content type of the input.\n build (str, optional): Specifies the actor build to run. It can be either a build tag or build number.\n By default, the run uses the build specified in the default run configuration for the actor (typically latest).\n memory_mbytes (int, optional): Memory limit for the run, in megabytes.\n By default, the run uses a memory limit specified in the default run configuration for the actor.\n timeout_secs (int, optional): Optional timeout for the run, in seconds.\n By default, the run uses timeout specified in the default run configuration for the actor.\n wait_for_finish (int, optional): The maximum number of seconds the server waits for the run to finish.\n By default, it is 0, the maximum value is 300.\n webhooks (list of dict, optional): Optional ad-hoc webhooks (https://docs.apify.com/webhooks/ad-hoc-webhooks)\n associated with the actor run which can be used to receive a notification,\n e.g. when the actor finished or failed.\n If you already have a webhook set up for the actor or task, you do not have to add it again here.\n Each webhook is represented by a dictionary containing these items:\n * `event_types`: list of `WebhookEventType` values which trigger the webhook\n * `request_url`: URL to which to send the webhook HTTP request\n * `payload_template` (optional): Optional template for the request payload\n\nReturns:\n dict: Info about the started actor run"]}


    Parameters

    • actor_id: str
    • run_input: Optional[Any] = None
    • keyword-only token: Optional[str] = None
    • keyword-only content_type: Optional[str] = None
    • keyword-only build: Optional[str] = None
    • keyword-only memory_mbytes: Optional[int] = None
    • keyword-only timeout_secs: Optional[int] = None
    • keyword-only wait_for_finish: Optional[int] = None
    • keyword-only webhooks: Optional[List[Dict]] = None

    Returns Dict
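To make the webhook dictionary layout described for start more concrete, here is a small sketch. `make_webhook` and `start_with_webhook` are hypothetical helpers, the URL is a placeholder, and the event types are passed in by the caller so that no import path for WebhookEventType has to be assumed.

```python
from typing import Any, Dict, List, Optional


def make_webhook(
    event_types: List[Any],
    request_url: str,
    payload_template: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one ad-hoc webhook dict using the item names listed for start()."""
    webhook: Dict[str, Any] = {"event_types": event_types, "request_url": request_url}
    if payload_template is not None:
        webhook["payload_template"] = payload_template
    return webhook


async def start_with_webhook(actor_id: str, run_input: Any, event_types: List[Any]) -> Dict:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    webhook = make_webhook(event_types, "https://example.com/hook")  # placeholder URL
    # start() returns once the run has been started; it does not wait for it to finish.
    return await Actor.start(actor_id, run_input, memory_mbytes=256, webhooks=[webhook])
```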

abort

  • async abort(run_id, *, token, gracefully): Dict
  • {"content": ["Abort the given actor run on the Apify platform using the current user account (determined by the APIFY_TOKEN environment variable).\n\nArgs:\n run_id (str): The ID of the actor run to be aborted.\n token (str, optional): The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).\n gracefully (bool, optional): If True, the actor run will abort gracefully.\n It will send `aborting` and `persistState` events into the run and force-stop the run after 30 seconds.\n It is helpful in cases where you plan to resurrect the run later.\n\nReturns:\n dict: Info about the aborted actor run"]}


    Parameters

    • run_id: str
    • keyword-only token: Optional[str] = None
    • keyword-only gracefully: Optional[bool] = None

    Returns Dict

call

  • async call(actor_id, run_input, *, token, content_type, build, memory_mbytes, timeout_secs, webhooks, wait_secs): Optional[Dict]
  • {"content": ["Start an actor on the Apify Platform and wait for it to finish before returning.\n\nIt waits indefinitely, unless the wait_secs argument is provided.\n\nArgs:\n actor_id (str): The ID of the actor to be run.\n run_input (Any, optional): The input to pass to the actor run.\n token (str, optional): The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).\n content_type (str, optional): The content type of the input.\n build (str, optional): Specifies the actor build to run. It can be either a build tag or build number.\n By default, the run uses the build specified in the default run configuration for the actor (typically latest).\n memory_mbytes (int, optional): Memory limit for the run, in megabytes.\n By default, the run uses a memory limit specified in the default run configuration for the actor.\n timeout_secs (int, optional): Optional timeout for the run, in seconds.\n By default, the run uses timeout specified in the default run configuration for the actor.\n webhooks (list, optional): Optional webhooks (https://docs.apify.com/webhooks) associated with the actor run,\n which can be used to receive a notification, e.g. when the actor finished or failed.\n If you already have a webhook set up for the actor, you do not have to add it again here.\n wait_secs (int, optional): The maximum number of seconds the server waits for the run to finish. If not provided, waits indefinitely.\n\nReturns:\n dict: Info about the started actor run"]}


    Parameters

    • actor_id: str
    • run_input: Optional[Any] = None
    • keyword-only token: Optional[str] = None
    • keyword-only content_type: Optional[str] = None
    • keyword-only build: Optional[str] = None
    • keyword-only memory_mbytes: Optional[int] = None
    • keyword-only timeout_secs: Optional[int] = None
    • keyword-only webhooks: Optional[List[Dict]] = None
    • keyword-only wait_secs: Optional[int] = None

    Returns Optional[Dict]
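Because call returns Optional[Dict], callers may want to guard against a missing result. The sketch below bounds the wait with wait_secs and checks the outcome; `run_succeeded` and `call_and_check` are hypothetical helpers, and the presence of a "status" item with the value "SUCCEEDED" in the run info is an assumption based on Apify run objects.

```python
from typing import Any, Dict, Optional


def run_succeeded(run: Optional[Dict[str, Any]]) -> bool:
    """Assumes the run info dict carries a 'status' item."""
    return run is not None and run.get("status") == "SUCCEEDED"


async def call_and_check(actor_id: str, run_input: Any) -> bool:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    # Wait at most 120 seconds; without wait_secs the call waits indefinitely.
    run = await Actor.call(actor_id, run_input, wait_secs=120)
    return run_succeeded(run)
```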

call_task

  • async call_task(task_id, task_input, *, build, memory_mbytes, timeout_secs, webhooks, wait_secs, token): Optional[Dict]
  • {"content": ["Start an actor task on the Apify Platform and wait for it to finish before returning.\n\nIt waits indefinitely, unless the wait_secs argument is provided.\n\nNote that an actor task is a saved input configuration and options for an actor.\nIf you want to run an actor directly rather than an actor task, please use Actor.call instead.\n\nArgs:\n task_id (str): The ID of the actor task to be run.\n task_input (Any, optional): Overrides the input to pass to the actor run.\n token (str, optional): The Apify API token to use for this request (defaults to the APIFY_TOKEN environment variable).\n build (str, optional): Specifies the actor build to run. It can be either a build tag or build number.\n By default, the run uses the build specified in the default run configuration for the actor (typically latest).\n memory_mbytes (int, optional): Memory limit for the run, in megabytes.\n By default, the run uses a memory limit specified in the default run configuration for the actor.\n timeout_secs (int, optional): Optional timeout for the run, in seconds.\n By default, the run uses timeout specified in the default run configuration for the actor.\n webhooks (list, optional): Optional webhooks (https://docs.apify.com/webhooks) associated with the actor run,\n which can be used to receive a notification, e.g. when the actor finished or failed.\n If you already have a webhook set up for the actor, you do not have to add it again here.\n wait_secs (int, optional): The maximum number of seconds the server waits for the run to finish. If not provided, waits indefinitely.\n\nReturns:\n dict: Info about the started actor run"]}


    Parameters

    • task_id: str
    • task_input: Optional[Dict[str, Any]] = None
    • keyword-only build: Optional[str] = None
    • keyword-only memory_mbytes: Optional[int] = None
    • keyword-only timeout_secs: Optional[int] = None
    • keyword-only webhooks: Optional[List[Dict]] = None
    • keyword-only wait_secs: Optional[int] = None
    • keyword-only token: Optional[str] = None

    Returns Optional[Dict]

metamorph

  • async metamorph(target_actor_id, run_input, *, target_actor_build, content_type, custom_after_sleep_millis): None
  • {"content": ["Transform this actor run to an actor run of a different actor.\n\nThe platform stops the current actor container and starts a new container with the new actor instead.\nAll the default storages are preserved,\nand the new input is stored under the INPUT-METAMORPH-1 key in the same default key-value store.\n\nArgs:\n target_actor_id (str): ID of the target actor that the run should be transformed into\n run_input (Any, optional): The input to pass to the new run.\n target_actor_build (str, optional): The build of the target actor. It can be either a build tag or build number.\n By default, the run uses the build specified in the default run configuration for the target actor (typically the latest build).\n content_type (str, optional): The content type of the input.\n custom_after_sleep_millis (int, optional): How long to sleep for after the metamorph, to wait for the container to be stopped."]}


    Parameters

    • target_actor_id: str
    • run_input: Optional[Any] = None
    • keyword-only target_actor_build: Optional[str] = None
    • keyword-only content_type: Optional[str] = None
    • keyword-only custom_after_sleep_millis: Optional[int] = None

    Returns None

reboot

  • async reboot(*, event_listeners_timeout_secs): None
  • {"content": ["Internally reboot this actor.\n\nThe system stops the current container and starts a new one, with the same run ID and default storages.\n\nArgs:\n event_listeners_timeout_secs (int, optional): How long the actor should wait for actor event listeners to finish before exiting."]}


    Parameters

    • keyword-only event_listeners_timeout_secs: Optional[int] = EVENT_LISTENERS_TIMEOUT_SECS

    Returns None

add_webhook

  • async add_webhook(*, event_types, request_url, payload_template, ignore_ssl_errors, do_not_retry, idempotency_key): Dict
  • {"content": ["Create an ad-hoc webhook for the current actor run.\n\nThis webhook lets you receive a notification when the actor run finished or failed.\n\nNote that webhooks are only supported for actors running on the Apify platform.\nWhen running the actor locally, the function will print a warning and have no effect.\n\nFor more information about Apify actor webhooks, please see the documentation.\n\nArgs:\n event_types (list of WebhookEventType): List of event types that should trigger the webhook. At least one is required.\n request_url (str): URL that will be invoked once the webhook is triggered.\n payload_template (str, optional): Specification of the payload that will be sent to request_url\n ignore_ssl_errors (bool, optional): Whether the webhook should ignore SSL errors returned by request_url\n do_not_retry (bool, optional): Whether the webhook should not retry sending the payload to request_url upon\n failure.\n idempotency_key (str, optional): A unique identifier of a webhook. You can use it to ensure that you won't\n create the same webhook multiple times.\n\nReturns:\n dict: The created webhook"]}


    Parameters

    • keyword-only event_types: List[WebhookEventType]
    • keyword-only request_url: str
    • keyword-only payload_template: Optional[str] = None
    • keyword-only ignore_ssl_errors: Optional[bool] = None
    • keyword-only do_not_retry: Optional[bool] = None
    • keyword-only idempotency_key: Optional[str] = None

    Returns Dict
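One way the idempotency_key argument could be used is to derive a stable key from values that identify the webhook, so that repeated calls do not create duplicates. This is a sketch under assumptions: `idempotency_key`, `notify_on_events` and the choice of SHA-256 are illustrative and not prescribed by the SDK, and the caller supplies the run ID and the list of WebhookEventType values.

```python
import hashlib
from typing import Any, List


def idempotency_key(run_id: str, request_url: str) -> str:
    """Derive a stable key so that retries do not create duplicate webhooks."""
    return hashlib.sha256(f"{run_id}:{request_url}".encode()).hexdigest()


async def notify_on_events(run_id: str, request_url: str, event_types: List[Any]) -> None:
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    if not Actor.is_at_home():
        return  # webhooks are only supported on the Apify platform
    await Actor.add_webhook(
        event_types=event_types,
        request_url=request_url,
        idempotency_key=idempotency_key(run_id, request_url),
    )
```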

set_status_message

  • async set_status_message(status_message): Optional[Dict]
  • {"content": ["Set the status message for the current actor run.\n\nArgs:\n status_message (str): The status message to set to the run.\n\nReturns:\n dict: The updated actor run object"]}


    Parameters

    • status_message: str

    Returns Optional[Dict]

create_proxy_configuration

  • async create_proxy_configuration(*, password, groups, country_code, proxy_urls, new_url_function, actor_proxy_input): Optional[ProxyConfiguration]
  • {"content": ["Create a ProxyConfiguration object with the passed proxy configuration.\n\nConfigures connection to a proxy server with the provided options.\nProxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists.\n\nFor more details and code examples, see the ProxyConfiguration class.\n\nArgs:\n password (str, optional): Password for the Apify Proxy. If not provided, will use os.environ['APIFY_PROXY_PASSWORD'], if available.\n groups (list of str, optional): Proxy groups which the Apify Proxy should use, if provided.\n country_code (str, optional): Country which the Apify Proxy should use, if provided.\n proxy_urls (list of str, optional): Custom proxy server URLs which should be rotated through.\n new_url_function (Callable, optional): Function which returns a custom proxy URL to be used.\n actor_proxy_input (dict, optional): Proxy configuration field from the actor input, if the actor has such an input field.\n\nReturns:\n ProxyConfiguration, optional: ProxyConfiguration object with the passed configuration,\n or None, if no proxy should be used based on the configuration."]}


    Parameters

    • keyword-only password: Optional[str] = None
    • keyword-only groups: Optional[List[str]] = None
    • keyword-only country_code: Optional[str] = None
    • keyword-only proxy_urls: Optional[List[str]] = None
    • keyword-only new_url_function: Optional[Union[Callable[[Optional[str]], str], Callable[[Optional[str]], Awaitable[str]]]] = None
    • keyword-only actor_proxy_input: Optional[Dict] = None

    Returns Optional[ProxyConfiguration]
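The new_url_function parameter takes a callable that receives an optional string and returns a proxy URL. The sketch below assumes that string is a session identifier and maps each one to a fixed proxy; `PROXY_URLS`, `pick_proxy` and `make_proxy_configuration` are hypothetical, and the proxy addresses are placeholders.

```python
import zlib
from typing import Optional

# Placeholder proxy servers, for illustration only.
PROXY_URLS = [
    "http://proxy-1.example.com:8000",
    "http://proxy-2.example.com:8000",
]


def pick_proxy(session_id: Optional[str] = None) -> str:
    """Map a session ID to a fixed proxy; fall back to the first one."""
    if session_id is None:
        return PROXY_URLS[0]
    # crc32 is stable across processes, unlike the built-in hash().
    return PROXY_URLS[zlib.crc32(session_id.encode()) % len(PROXY_URLS)]


async def make_proxy_configuration():
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from apify import Actor

    # May return None when no proxy should be used.
    return await Actor.create_proxy_configuration(new_url_function=pick_proxy)
```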