Apify API
UPDATE 2025-01-14: We have rolled out this new Apify API Documentation. In case of any issues, please report here. The old API Documentation is still available here.
The Apify API (version 2) provides programmatic access to the Apify platform. The API is organized around RESTful HTTP endpoints.
You can download the complete OpenAPI schema of Apify API in the YAML or JSON formats. The source code is also available on GitHub.
All requests and responses (including errors) are encoded in JSON format with UTF-8 encoding, with a few exceptions that are explicitly described in the reference.
To access the API using Node.js, we recommend the apify-client NPM package.
To access the API using Python, we recommend the apify-client PyPI package.
The clients' functions correspond to the API endpoints and have the same
parameters. This simplifies development of apps that depend on the Apify
platform.
Note: All requests with JSON payloads need to specify the Content-Type: application/json
HTTP header!
All API endpoints support the method
query parameter that can override the
HTTP method.
For example, if you want to call a POST endpoint using a GET request, simply
add the query parameter method=POST
to the URL and send the GET request.
This feature is especially useful if you want to call Apify API endpoints
from services that can only send GET requests.
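As a rough illustration of the method override, the sketch below appends the method query parameter to a URL using only Python's standard library. The helper name with_method_override is hypothetical and the URL is only an example; nothing is sent over the network.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_method_override(url: str, method: str) -> str:
    """Return `url` with a `method` query parameter set to `method`."""
    parts = urlsplit(url)
    # Keep existing query parameters, but drop any previous `method` value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "method"
    ]
    query.append(("method", method))
    return urlunsplit(parts._replace(query=urlencode(query)))


# A plain GET request to this URL would be handled as a POST.
override_url = with_method_override(
    "https://api.apify.com/v2/acts/janedoe~my-actor/runs", "POST"
)
print(override_url)
```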
Authentication
You can find your API token on the Integrations page in the Apify Console.
To use your token in a request, either:
- Add the token to your request's Authorization header as Bearer <token>, e.g. Authorization: Bearer xxxxxxx. (Recommended)
- Add it as the token parameter to your request URL. (Less secure)
Using your token in the request header is more secure than using it as a URL parameter because URLs are often stored in browser history and server logs. This creates a chance for someone unauthorized to access your API token.
Do not share your API token or password with untrusted parties.
For more information, see our integrations documentation.
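A minimal sketch of the recommended header-based approach, using Python's standard urllib. The helper name authorized_request and the /acts path are assumptions chosen for illustration, and the token is a placeholder.

```python
import urllib.request

API_BASE = "https://api.apify.com/v2"


def authorized_request(path: str, token: str) -> urllib.request.Request:
    """Build a request that carries the API token in the Authorization header."""
    request = urllib.request.Request(f"{API_BASE}{path}")
    # The token travels in a header, so it never appears in the URL itself.
    request.add_header("Authorization", f"Bearer {token}")
    return request


auth_example = authorized_request("/acts", "xxxxxxx")
print(auth_example.full_url)
print(auth_example.get_header("Authorization"))
# urllib.request.urlopen(auth_example) would actually send the request.
```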
Basic usage
To run an Actor, send a POST request to the Run Actor endpoint using either the Actor ID code (e.g. vKg4IjxZbEYTYeW8T) or its name (e.g. janedoe~my-actor):
https://api.apify.com/v2/acts/[actor_id]/runs
If the Actor is not runnable anonymously, you will receive a 401 or 403
response code.
This means you need to add your secret API
token to the request's
Authorization
header (recommended) or as a
URL query parameter ?token=[your_token]
(less secure).
Optionally, you can include the query parameters described in the Run Actor section to customize your run.
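The request described above could be assembled roughly as follows. This is a sketch under stated assumptions: the helper name build_run_actor_request is hypothetical, and treating the JSON body as the Actor's input is an assumption not spelled out in this section. The request is only built, not sent.

```python
import json
import urllib.request


def build_run_actor_request(
    actor: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build a POST request for the Run Actor endpoint."""
    return urllib.request.Request(
        f"https://api.apify.com/v2/acts/{actor}/runs",
        # Assumption: the JSON payload is what the Actor receives as input.
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            # JSON payloads must declare their content type.
            "Content-Type": "application/json",
        },
    )


run_request = build_run_actor_request("janedoe~my-actor", "xxxxxxx", {"foo": "bar"})
print(run_request.get_method(), run_request.full_url)
# urllib.request.urlopen(run_request) would actually start the run.
```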
If you're using Node.js, the best way to run an Actor is using the
Apify.call()
method from the Apify
SDK. It
runs the Actor using the account you are currently logged into (determined
by the secret API token).
The result is an Actor run
object and its output (if
any).
A typical workflow is as follows:
- Run an Actor or task using the Run Actor or Run task API endpoints.
- Monitor the Actor run by periodically polling its progress using the Get run API endpoint.
- Fetch the results from the Get items API endpoint using the defaultDatasetId, which you receive in the Run request response. Additional data may be stored in a key-value store. You can fetch them from the Get record API endpoint using the defaultKeyValueStoreId and the store's key.
Note: Instead of periodic polling, you can also run your Actor or task synchronously. In that case, the request waits up to 300 seconds (5 minutes) for the run to finish and returns its output. If the run takes longer, the request times out and returns an error.
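The run, poll, and fetch workflow can be sketched as below. Several details are assumptions rather than facts stated in this section: the endpoint paths /actor-runs/{id} and /datasets/{id}/items, the run's id and status fields, and the set of status values treated as finished. The call_api parameter is a hypothetical callable that performs the HTTP request and returns the parsed JSON, which keeps the sketch independent of any particular HTTP library.

```python
import time
from typing import Any, Callable

# Assumption: these status values mean the run has finished.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def run_and_collect(
    call_api: Callable[[str, str], Any],
    actor: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Start an Actor run, poll until it finishes, then return dataset items."""
    # Step 1: start the run.
    run = call_api("POST", f"/acts/{actor}/runs")["data"]
    # Step 2: poll the run until it reaches a terminal status.
    while run["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        run = call_api("GET", f"/actor-runs/{run['id']}")["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Run finished with status {run['status']}")
    # Step 3: fetch results. Get items is one of the endpoints that does
    # not wrap its response in a `data` property.
    return call_api("GET", f"/datasets/{run['defaultDatasetId']}/items")
```

Passing sleep as a parameter is only a convenience for trying the function without real delays.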
Response structure
Most API endpoints return a JSON object with the data
property:
{
  "data": {
    ...
  }
}
However, there are a few explicitly described exceptions, such as
Dataset Get items or
Key-value store Get record
API endpoints, which return data in other formats.
In case of an error, the response has the HTTP status code in the range of
4xx or 5xx and the data
property is replaced with error
. For example:
{
  "error": {
    "type": "record-not-found",
    "message": "Store was not found."
  }
}
See Errors for more details.
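One possible way to handle this envelope in client code is sketched below. The names ApiError and unwrap are hypothetical, and the sketch covers only endpoints that use the data/error structure, not the exceptions that return other formats.

```python
import json
from typing import Any


class ApiError(Exception):
    """Carries the `type` and `message` from an error response."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.type = error_type
        self.message = message


def unwrap(status: int, body: str) -> Any:
    """Return the `data` property, or raise ApiError for 4xx/5xx responses."""
    payload = json.loads(body)
    if 400 <= status < 600:
        error = payload["error"]
        raise ApiError(status, error["type"], error["message"])
    return payload["data"]


print(unwrap(200, '{"data": {"id": "abc"}}'))
```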
Pagination
All API endpoints that return a list of records (e.g. Get list of Actors) enforce pagination in order to limit the size of their responses.
Most of these API endpoints are paginated using the offset
and limit
query parameters.
The only exception is Get list of
keys,
which is paginated using the exclusiveStartKey
query parameter.
IMPORTANT: Each API endpoint that supports pagination enforces a certain maximum value for the limit parameter, in order to reduce the load on Apify servers. The maximum limit could change in the future, so you should never rely on a specific value; instead, check the limit value returned in the responses of these API endpoints.
Using offset
Most API endpoints that return a list of records enable pagination using the following query parameters:
Parameter | Description |
---|---|
limit | Limits the response to contain a specific maximum number of items, e.g. limit=20. |
offset | Skips a number of items from the beginning of the list, e.g. offset=100. |
desc | By default, items are sorted in the order in which they were created or added to the list.
This feature is useful when fetching all the items, because it ensures that items
created after the client started the pagination will not be skipped.
If you specify the |
The response of these API endpoints is always a JSON object with the following structure:
{
  "data": {
    "total": 2560,
    "offset": 250,
    "limit": 1000,
    "count": 1000,
    "desc": false,
    "items": [
      { 1st object },
      { 2nd object },
      ...
      { 1000th object }
    ]
  }
}
The following table describes the meaning of the response properties:
Property | Description |
---|---|
total | The total number of items available in the list. |
offset | The number of items that were skipped at the start.
This is equal to the offset query parameter if it was provided, otherwise it is 0. |
limit | The maximum number of items that can be returned in the HTTP response.
It equals the limit query parameter if it was provided, or
the maximum limit enforced for the particular API endpoint, whichever is smaller. |
count | The actual number of items returned in the HTTP response. |
desc | true if data were requested in descending order and false otherwise. |
items | An array of requested items. |
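A sketch of how a client might walk through all pages using these response properties. The fetch_page parameter is a hypothetical callable that takes offset and limit and returns the parsed JSON response. The loop advances by the returned count rather than by the requested limit, since the endpoint may enforce a smaller maximum.

```python
from typing import Callable, Iterator


def iterate_items(
    fetch_page: Callable[[int, int], dict], limit: int = 1000
) -> Iterator[dict]:
    """Yield every item from an offset-paginated list endpoint."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)["data"]
        yield from page["items"]
        # Trust what the server reports, not what was requested.
        offset = page["offset"] + page["count"]
        if page["count"] == 0 or offset >= page["total"]:
            break
```

Stopping on an empty page as well as on reaching total guards against looping forever if the list shrinks while it is being read.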
Using key
The records in the key-value store are not ordered based on numerical indexes, but rather by their keys in the UTF-8 binary order. Therefore the Get list of keys API endpoint only supports pagination using the following query parameters:
Parameter | Description |
---|---|
limit | Limits the response to contain a specific maximum number of items, e.g. limit=20. |
exclusiveStartKey | Skips all records with keys up to the given key including the given key, in the UTF-8 binary order. |
The response of the API endpoint is always a JSON object with the following structure:
{
  "data": {
    "limit": 1000,
    "isTruncated": true,
    "exclusiveStartKey": "my-key",
    "nextExclusiveStartKey": "some-other-key",
    "items": [
      { 1st object },
      { 2nd object },
      ...
      { 1000th object }
    ]
  }
}
The following table describes the meaning of the response properties:
Property | Description |
---|---|
limit | The maximum number of items that can be returned in the HTTP response.
It equals the limit query parameter if it was provided, or
the maximum limit enforced for the particular endpoint, whichever is smaller. |
isTruncated | true if there are more items left to be queried. Otherwise false. |
exclusiveStartKey | The last key that was skipped at the start. Is null for the first page. |
nextExclusiveStartKey | The value for the exclusiveStartKey parameter to query the next page of items. |
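Key-based pagination can be sketched in the same spirit. Here fetch_page is a hypothetical callable that takes the exclusiveStartKey value (None for the first page) and the limit, and returns the parsed JSON response. The shape of the individual items is not described in this section, so the sketch simply passes them through.

```python
from typing import Callable, Iterator, Optional


def iterate_keys(
    fetch_page: Callable[[Optional[str], int], dict], limit: int = 1000
) -> Iterator[dict]:
    """Yield every item from the key-paginated Get list of keys endpoint."""
    start_key: Optional[str] = None
    while True:
        page = fetch_page(start_key, limit)["data"]
        yield from page["items"]
        if not page["isTruncated"]:
            break
        # The response says where the next page starts.
        start_key = page["nextExclusiveStartKey"]
```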
Errors
The Apify API uses common HTTP status codes: 2xx
range for success, 4xx
range for errors caused by the caller
(invalid requests) and 5xx
range for server errors (these are rare).
Each error response contains a JSON object defining the error
property,
which is an object with
the type
and message
properties that contain the error code and a
human-readable error description, respectively.
For example:
{
  "error": {
    "type": "record-not-found",
    "message": "Store was not found."
  }
}
Here is the table of the most common errors that can occur for many API endpoints:
status | type | message |
---|---|---|
400 | invalid-request | POST data must be a JSON object |
400 | invalid-value | Invalid value provided: Comments required |
400 | invalid-record-key | Record key contains invalid character |
401 | token-not-provided | Authentication token was not provided |
404 | record-not-found | Store was not found |
429 | rate-limit-exceeded | You have exceeded the rate limit of 30 requests per second |
405 | method-not-allowed | This API endpoint can only be accessed using the following HTTP methods: OPTIONS, POST |
Rate limiting
All API endpoints limit the rate of requests in order to prevent overloading of Apify servers by misbehaving clients.
There are two kinds of rate limits - a global rate limit and a per-resource rate limit.
Global rate limit
The global rate limit is set to 250 000 requests per minute. For authenticated requests, it is counted per user, and for unauthenticated requests, it is counted per IP address.
Per-resource rate limit
The default per-resource rate limit is 30 requests per second per resource, which in this context means a single Actor, a single Actor run, a single dataset, single key-value store etc.
The default rate limit is applied to every API endpoint except a few select ones, which have higher rate limits.
Each API endpoint returns its rate limit in the X-RateLimit-Limit header.
These endpoints have a rate limit of 100 requests per second per resource:
These endpoints have a rate limit of 200 requests per second per resource:
- Run Actor
- Run Actor task asynchronously
- Run Actor task synchronously
- Metamorph Actor run
- Push items to dataset
- CRUD (add, get, update, delete) operations on requests in request queues
Rate limit exceeded errors
If the client is sending too many requests, the API endpoints respond with the HTTP status code 429 Too Many Requests
and the following body:
{
  "error": {
    "type": "rate-limit-exceeded",
    "message": "You have exceeded the rate limit of ... requests per second"
  }
}
Retrying rate-limited requests with exponential backoff
If the client receives the rate limit error, it should wait a certain period of time and then retry the request. If the error happens again, the client should double the wait period and retry the request, and so on. This algorithm is known as exponential backoff and it can be described using the following pseudo-code:
1. Define a variable DELAY=500
2. Send the HTTP request to the API endpoint
3. If the response has a status code not equal to 429, then you are done. Otherwise:
   - Wait for a period of time chosen randomly from the interval DELAY to 2*DELAY milliseconds
   - Double the future wait period by setting DELAY = 2*DELAY
   - Continue with step 2
If all requests sent by the client implement the above steps, the client will automatically use the maximum available bandwidth for its requests.
Note that the Apify API clients for JavaScript and for Python use the exponential backoff algorithm transparently, so that you do not need to worry about it.
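For clients that call the API directly, the backoff steps could look roughly like this in Python. The send parameter is a hypothetical callable that performs the request and returns the HTTP status code. The max_attempts cap is an addition not present in the pseudo-code, which retries indefinitely.

```python
import random
import time
from typing import Callable


def send_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
    pick_delay: Callable[[float, float], float] = random.uniform,
) -> int:
    """Retry `send` while it returns 429, doubling the wait each time."""
    delay_ms = 500
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # Give up without a pointless final wait.
        # Wait a random time between DELAY and 2*DELAY milliseconds.
        sleep(pick_delay(delay_ms, 2 * delay_ms) / 1000)
        delay_ms *= 2
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")
```

The random component spreads out retries from clients that were rate limited at the same moment, so they do not all retry in lockstep.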
Referring to resources
There are three main ways to refer to a resource you're accessing via API.
- the resource ID (e.g. iKkPcIgVvwmztduf8)
- username~resourcename - when using this access method, you will need to use your API token, and access will only work if you have the correct permissions.
- ~resourcename - for this, you need to use an API token, and the resourcename refers to a resource in the API token owner's account.
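The two name-based forms differ only in whether a username precedes the tilde, which a small helper can make explicit. The function name resource_ref is hypothetical and the names used are placeholders.

```python
from typing import Optional


def resource_ref(resource_name: str, username: Optional[str] = None) -> str:
    """Build a name-based resource reference.

    With a username the result is `username~resourcename`; without one it
    is `~resourcename`, which refers to the API token owner's account.
    """
    return f"{username or ''}~{resource_name}"


print(resource_ref("my-actor", "janedoe"))
print(resource_ref("my-actor"))
```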
Authentication
HTTP: Bearer Auth
API authentication token.
Security Scheme Type: | http |
---|---|
HTTP Authorization Scheme: | bearer |
HTTP: Bearer Auth
API authentication token. It is only required for private Actors. Builds of public Actors can be queried without any token.
Security Scheme Type: | http |
---|---|
HTTP Authorization Scheme: | bearer |
HTTP: Bearer Auth
API authentication token. It is required only when using the username~store-name format for storeId.
Security Scheme Type: | http |
---|---|
HTTP Authorization Scheme: | bearer |
HTTP: Bearer Auth
API authentication token. It is required only when using the username~queue-name format for queueId.
Security Scheme Type: | http |
---|---|
HTTP Authorization Scheme: | bearer |
API Key: apiKey
API authentication token.
Security Scheme Type: | apiKey |
---|---|
Header parameter name: | token |
API Key: apiKeyActorBuilds
API authentication token. It is only required for private Actors. Builds of public Actors can be queried without any token.
Security Scheme Type: | apiKey |
---|---|
Header parameter name: | token |
API Key: apiKeyStoreId
API authentication token. It is required only when using the username~store-name format for storeId.
Security Scheme Type: | apiKey |
---|---|
Header parameter name: | token |
API Key: apiKeyQueueId
API authentication token. It is required only when using the username~queue-name format for queueId.
Security Scheme Type: | apiKey |
---|---|
Header parameter name: | token |