
Apify API client for Python

apify-client is the official library to access the Apify REST API from your Python applications. It provides useful features like automatic retries and convenience functions that improve the experience of using the Apify API. All requests and responses (including errors) are encoded in JSON format with UTF-8 encoding.

Pre-requisites

apify-client requires Python version 3.8 or higher. Python is available for download on the official website. Check your current Python version by running:

python -V

Installation

You can install the client from its PyPI listing. To do that, run:

pip install apify-client

Authentication and initialization

To use the client, you need an API token. You can find your token under the Integrations tab in Apify Console. Copy the token and initialize the client by passing it (MY-APIFY-TOKEN in the examples) as a parameter to the ApifyClient constructor.

# import Apify client
from apify_client import ApifyClient

# Client initialization with the API token
apify_client = ApifyClient('MY-APIFY-TOKEN')
Secure access

The API token is used to authorize your requests to the Apify API. You can be charged for the usage of the underlying services, so do not share your API token with untrusted parties or expose it on the client side of your applications.
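One way to keep the token out of your source code is to read it from an environment variable. The sketch below assumes a variable named APIFY_TOKEN (the name is an assumption; use whatever your deployment defines), and load_apify_token() is a hypothetical helper, not part of apify-client.

```python
import os


def load_apify_token(var_name: str = 'APIFY_TOKEN') -> str:
    """Read the API token from an environment variable (hypothetical helper).

    Keeping the token out of source code avoids leaking it via version control.
    """
    token = os.environ.get(var_name)
    if not token:
        # Fail fast with a clear message instead of sending unauthenticated requests
        raise RuntimeError(f'Set the {var_name} environment variable to your Apify API token')
    return token
```

You would then initialize the client with ApifyClient(load_apify_token()) instead of a hard-coded string.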

Quick start

One of the most common use cases is starting Actors (serverless programs running in the Apify cloud) and getting results from their datasets (storage) after they finish the job (usually scraping, automation processes or data processing).

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an Actor and wait for it to finish
actor_call = apify_client.actor('username/actor-name').call()

# Get the dataset of the Actor run
dataset_client = apify_client.dataset(actor_call['defaultDatasetId'])

# List items from the Actor's dataset
dataset_items = dataset_client.list_items().items

Running Actors

To start an Actor, use the ActorClient (client.actor()) and pass the Actor's ID (e.g. john-doe/my-cool-actor) to define which Actor you want to run. The Actor's ID is a combination of the Actor owner's username and the Actor's name. You can run both your own Actors and Actors from Apify Store.

Passing input to the Actor

To define the Actor's input, pass it as the run_input argument of the call() method. The input can be any JSON object that the Actor expects (it must respect the Actor's input schema). The input is used to pass configuration to the Actor, such as URLs to scrape, search terms, or any other data.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Define the input for the Actor
actor_input = {
    'some': 'input',
}

# Start an Actor and wait for it to finish
actor_call = apify_client.actor('username/actor-name').call(run_input=actor_input)
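Because the input is sent to the API as JSON, every value in it must be JSON-serializable (a Python set or datetime, for instance, is not). A minimal sketch of checking that before starting the run; validate_actor_input() is a hypothetical helper, and the field names startUrls and maxItems are made up for illustration, since real field names come from the input schema of the Actor you call.

```python
import json
from typing import Any, Dict


def validate_actor_input(actor_input: Dict[str, Any]) -> Dict[str, Any]:
    """Raise early if the input cannot be encoded as JSON (hypothetical helper)."""
    try:
        json.dumps(actor_input)
    except TypeError as err:
        raise ValueError(f'Actor input is not JSON-serializable: {err}') from err
    return actor_input


# Hypothetical input; the real keys depend on the Actor's input schema
actor_input = validate_actor_input({
    'startUrls': [{'url': 'https://example.com'}],
    'maxItems': 100,
})
```

The validated dictionary is then passed unchanged as run_input to call().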

Getting results from the dataset

To get the results from the dataset, use the DatasetClient (client.dataset()) and its list_items() method. Pass the dataset ID to define which dataset you want to access. You can get the dataset ID from the Actor run dictionary, where it is stored under the defaultDatasetId key.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Get dataset
dataset_client = apify_client.dataset('dataset-id')

# List items from the Actor's dataset
dataset_items = dataset_client.list_items().items
Dataset access

Running an Actor might take time, depending on the Actor's complexity and the amount of data it processes. If you only want to get data and need an immediate response, access the existing dataset of an already finished Actor run instead.
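If you already hold a run dictionary (for example, the one returned by call()), you can check how the run ended before reading its dataset. A sketch using a hypothetical helper get_default_dataset_id(); it assumes the run dictionary carries the status and defaultDatasetId keys, as Apify run objects do.

```python
from typing import Any, Mapping, Optional


def get_default_dataset_id(run: Optional[Mapping[str, Any]], require_success: bool = True) -> str:
    """Return a run's default dataset ID (hypothetical helper).

    Assumes the run dictionary has 'status' and 'defaultDatasetId' keys.
    """
    if run is None:
        raise ValueError('No run object was returned')
    if require_success and run.get('status') != 'SUCCEEDED':
        # A failed or aborted run may have left an incomplete dataset behind
        raise ValueError(f"Run ended with status {run.get('status')!r}")
    return run['defaultDatasetId']
```

Pass require_success=False when partial results from a failed run are still useful to you.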

Usage concepts

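The ApifyClient interface follows a generic pattern that applies to all of its components. By calling individual methods of ApifyClient, you create specific clients that target individual API resources. There are two types of those clients: collection clients, which manage a collection of resources and do not require a parameter (for example, apify_client.actors()), and resource clients, which manage a single resource identified by its ID (for example, apify_client.actor('username/actor-name')).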

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Collection clients do not require a parameter
actor_collection_client = apify_client.actors()

# Create an Actor with the name: my-actor
my_actor = actor_collection_client.create(name='my-actor')

# List all of your Actors
actor_list = actor_collection_client.list().items
Resource identification

The resource ID can be either the ID of the resource itself or a combination of the owner's username and the resource name, written as username/resource-name.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Resource clients accept an ID of the resource
actor_client = apify_client.actor('username/actor-name')

# Fetch the 'username/actor-name' object from the API
my_actor = actor_client.get()

# Start the run of 'username/actor-name' and return the Run object
my_actor_run = actor_client.start()

Nested clients

Some clients return other clients. This simplifies working with nested collections, such as the runs of a given Actor.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

actor_client = apify_client.actor('username/actor-name')
runs_client = actor_client.runs()

# List the last 10 runs of the Actor
actor_runs = runs_client.list(limit=10, desc=True).items

# Select the last run of the Actor that finished with a SUCCEEDED status
last_succeeded_run_client = actor_client.last_run(status='SUCCEEDED')

# Get dataset
actor_run_dataset_client = last_succeeded_run_client.dataset()

# Fetch items from the run's dataset
dataset_items = actor_run_dataset_client.list_items().items

Direct access to the run's dataset and other storages from the run client also works with the client returned by the last_run() method, as shown above.

Features

Based on the endpoint, the client automatically extracts the relevant data and returns it in the expected format. Date strings are automatically converted to datetime.datetime objects. When a request fails, the client raises an ApifyApiError, which wraps the plain JSON error returned by the API and enriches it with additional context for easier debugging.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

try:
    # Try to list items from a non-existing dataset
    dataset_client = apify_client.dataset('not-existing-dataset-id')
    dataset_items = dataset_client.list_items().items
except Exception as err:
    # For failed API requests, this is an instance of ApifyApiError
    print(type(err).__name__, err)
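A sketch of pulling that extra context out of a caught exception. It assumes ApifyApiError exposes status_code, type and message attributes (an assumption about the error class that this page does not confirm); describe_api_error() and is_not_found() are hypothetical helpers, and their getattr() defaults keep them safe for any other exception type.

```python
from typing import Any, Dict


def describe_api_error(err: Exception) -> Dict[str, Any]:
    """Collect debugging context from an exception (hypothetical helper).

    Attribute names are assumed; missing ones fall back to None or str(err).
    """
    return {
        'error_class': type(err).__name__,
        'status_code': getattr(err, 'status_code', None),
        'type': getattr(err, 'type', None),
        'message': getattr(err, 'message', str(err)),
    }


def is_not_found(err: Exception) -> bool:
    """True if the error looks like an HTTP 404 response from the API."""
    return getattr(err, 'status_code', None) == 404
```

In an except block, you could log describe_api_error() of the caught exception and use is_not_found() to treat a missing dataset differently from other failures.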

Retries with exponential backoff

Network communication sometimes fails. The client automatically retries requests that failed due to a network error, an internal error of the Apify API (HTTP 500+), or a rate limit error (HTTP 429). By default, it retries up to 8 times. The first retry is attempted after ~500 ms, the second after ~1000 ms, and so on. You can configure these parameters using the max_retries and min_delay_between_retries_millis options of the ApifyClient constructor.

from apify_client import ApifyClient

apify_client = ApifyClient(
    token='MY-APIFY-TOKEN',
    max_retries=8,
    min_delay_between_retries_millis=500,  # 0.5s
    timeout_secs=360,  # 6 mins
)
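To illustrate the technique, here is a minimal, generic sketch of retrying with exponential backoff. It is not the client's internal implementation (which may, for example, add random jitter to each delay); retry_with_backoff(), backoff_delays_ms() and is_retryable_status() are hypothetical names. The retryable status codes mirror the ones listed above: 429 and 500+.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar('T')


def is_retryable_status(status_code: int) -> bool:
    """Rate limiting (429) and server errors (500+) are worth retrying."""
    return status_code == 429 or status_code >= 500


def backoff_delays_ms(max_retries: int = 8, min_delay_ms: int = 500) -> List[int]:
    """Nominal delay before each retry: the minimum delay, doubled every attempt."""
    return [min_delay_ms * 2 ** attempt for attempt in range(max_retries)]


def retry_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 8,
    min_delay_ms: int = 500,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying retryable errors with exponentially growing delays."""
    delays = backoff_delays_ms(max_retries, min_delay_ms)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as err:
            # Re-raise non-retryable errors, or the last error once retries run out
            if attempt == max_retries or not is_retryable(err):
                raise
            sleep(delays[attempt] / 1000)  # time.sleep() takes seconds
    # Only reached if max_retries is negative
    raise RuntimeError('max_retries must not be negative')
```

With the defaults, the nominal delays are 0.5 s, 1 s, 2 s and so on, doubling up to 64 s before the eighth retry. Injecting sleep keeps the helper easy to test without real waiting.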

Support for asynchronous usage

The package offers an asynchronous version of the client, ApifyClientAsync, which allows you to work with the Apify API in an asynchronous way, using the standard async/await syntax offered by Python.

For example, to run an Actor and asynchronously stream its log while it's running, you can use this snippet:

import asyncio

from apify_client import ApifyClientAsync

apify_client_async = ApifyClientAsync('MY-APIFY-TOKEN')

async def main():
    run = await apify_client_async.actor('username/actor-name').start()

    async with apify_client_async.run(run['id']).log().stream() as async_log_stream:
        if async_log_stream:
            async for line in async_log_stream.aiter_lines():
                print(line)

asyncio.run(main())
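The asynchronous client is most useful when several requests run concurrently. A minimal sketch using asyncio.gather(); fetch_datasets_concurrently() is a hypothetical helper, and it assumes that on ApifyClientAsync, dataset(id).list_items() is a coroutine returning a page object with an items attribute, mirroring the synchronous list_items().items.

```python
import asyncio
from typing import Any, Dict, List


async def fetch_datasets_concurrently(apify_client_async: Any, dataset_ids: List[str]) -> Dict[str, list]:
    """Fetch the first page of several datasets at the same time (hypothetical helper)."""
    pages = await asyncio.gather(
        *(apify_client_async.dataset(dataset_id).list_items() for dataset_id in dataset_ids)
    )
    # gather() returns results in argument order, so zip() pairs each ID with its page
    return {dataset_id: page.items for dataset_id, page in zip(dataset_ids, pages)}
```

Inside an async function, you could call it as await fetch_datasets_concurrently(apify_client_async, ['dataset-id-1', 'dataset-id-2']); the requests overlap instead of running one after another.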

Logging

The library logs some useful debug information to the apify_client logger when sending requests to the Apify API. To have them printed out to the standard output, you need to add a handler to the logger:

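import logging
import sys

apify_client_logger = logging.getLogger('apify_client')
apify_client_logger.setLevel(logging.DEBUG)
# StreamHandler writes to standard error by default, so pass sys.stdout explicitly
apify_client_logger.addHandler(logging.StreamHandler(sys.stdout))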

The log records have useful properties added with the extra argument, like attempt, status_code, url, client_method and resource_id. To print those out, you'll need to use a custom log formatter. To learn more about log formatters and how to use them, please refer to the official Python documentation on logging.
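A minimal sketch of such a formatter, using the extra field names listed above. ApifyClientExtraFormatter is a hypothetical class name and the output layout is an arbitrary choice; fields missing from a record are skipped, so the formatter also works for records that don't carry them.

```python
import logging
import sys


class ApifyClientExtraFormatter(logging.Formatter):
    """Append the client's `extra` fields to each log line when they are present."""

    EXTRA_FIELDS = ('attempt', 'status_code', 'url', 'client_method', 'resource_id')

    def format(self, record: logging.LogRecord) -> str:
        base = super().format(record)
        # Only include attributes that this particular record actually has
        details = ', '.join(
            f'{name}={getattr(record, name)}'
            for name in self.EXTRA_FIELDS
            if hasattr(record, name)
        )
        return f'{base} [{details}]' if details else base


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(ApifyClientExtraFormatter('%(levelname)s %(name)s: %(message)s'))
apify_client_logger = logging.getLogger('apify_client')
apify_client_logger.setLevel(logging.DEBUG)
apify_client_logger.addHandler(handler)
```

Use this handler instead of the plain StreamHandler from the previous snippet; attaching both prints every record twice.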

Convenience functions and options

Some actions can't be performed by the API itself, such as waiting indefinitely for an Actor run to finish (a single request would run into network timeouts). The client provides the convenient call() and wait_for_finish() methods that do this for you.

Key-value store records can be retrieved as objects, buffers, or streams via the respective methods; dataset items can be fetched as individual objects or serialized data, or iterated asynchronously.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Start an Actor and wait for it to finish
finished_actor_run = apify_client.actor('username/actor-name').call()

# Start an Actor and wait at most 60 s (1 minute) for it to finish
actor_run = apify_client.actor('username/actor-name').start(wait_for_finish=60)
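With start(wait_for_finish=60), the returned run may still be in progress when the 60 seconds run out. A small sketch for checking that, using a hypothetical helper is_run_finished(); the terminal statuses come from the Apify run lifecycle (SUCCEEDED, FAILED, TIMED-OUT, ABORTED), while READY, RUNNING, TIMING-OUT and ABORTING mean the run is not done yet.

```python
from typing import Any, Mapping

# Statuses after which an Actor run no longer changes
TERMINAL_STATUSES = frozenset({'SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED'})


def is_run_finished(run: Mapping[str, Any]) -> bool:
    """True if the run dictionary reports a terminal status (hypothetical helper)."""
    return run.get('status') in TERMINAL_STATUSES
```

If is_run_finished(actor_run) returns False, you can keep waiting with apify_client.run(actor_run['id']).wait_for_finish().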

Pagination

Most methods named list or list_something return a ListPage object with the properties items, total, offset, count and limit. There are some exceptions, such as list_keys or list_head, which paginate differently. The results you're looking for are always stored under items, and you can use the limit parameter to get only a subset of results. Other properties may be available depending on the method.

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

# Resource clients accept an ID of the resource
dataset_client = apify_client.dataset('dataset-id')

# Number of items per page
limit = 1000
# Initial offset
offset = 0
# List to store all items
all_items = []

while True:
    response = dataset_client.list_items(limit=limit, offset=offset)
    items = response.items
    total = response.total

    print(f'Fetched {len(items)} items')

    # Merge new items with other already loaded items
    all_items.extend(items)

    # If there are no more items to fetch, stop loading
    if offset + limit >= total:
        break

    offset += limit

print(f'Overall fetched {len(all_items)} items')

Streaming resources

Some resources (dataset items, key-value store records and logs) support streaming the resource from the Apify API in parts, without having to download the whole (potentially huge) resource to memory before processing it.

The methods to stream these resources are DatasetClient.stream_items(), KeyValueStoreClient.stream_record(), and LogClient.stream().

Instead of the parsed resource, they return a raw, context-managed httpx.Response object, which has to be consumed using the with keyword, and automatically gets closed once you exit the with block, preventing memory leaks and unclosed connections.

For example, to consume an Actor run log in a streaming fashion, you can use this snippet:

from apify_client import ApifyClient

apify_client = ApifyClient('MY-APIFY-TOKEN')

with apify_client.run('MY-RUN-ID').log().stream() as log_stream:
    if log_stream:
        for line in log_stream.iter_lines():
            print(line)
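The same line-by-line approach works for dataset items delivered in JSON Lines format, one item per line. A minimal sketch of a parser that consumes lines lazily, so only one item is held in memory at a time; parse_jsonl_lines() is a hypothetical helper, not part of apify-client.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def parse_jsonl_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed item per non-empty JSON Lines row (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, such as a trailing newline
            yield json.loads(line)
```

Assuming stream_items() accepts an item_format argument that selects JSON Lines output (check the DatasetClient reference before relying on it), you could open dataset_client.stream_items(item_format='jsonl') in a with block and iterate over parse_jsonl_lines(response.iter_lines()).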