Storage

Store anything from images and key-value pairs to structured output data. Learn how to access and manage your stored data from the Apify platform or via API.


The Apify platform includes three types of storage that you can use both in your Actors and outside the Apify platform, via the Apify API, the JavaScript SDK, the Python SDK, the JavaScript API client, and the Python API client.

This page contains a brief introduction to the three types of Apify storage.

  • Dataset - storage for data objects such as scraping output.
  • Key-value store - storage for arbitrary data records such as files, images, and strings.
  • Request queue - a queue of URLs for your Actors to visit.

You will then find basic usage information that applies to all types of storage: how to manage your storages in Apify Console, and the basics of setting up the JavaScript SDK and Crawlee, the Python SDK, the JavaScript API client, and the Python API client. You will also find general information on using storage with the Apify API.

Dataset

Dataset storage allows you to store a series of data objects such as results from web scraping, crawling or data processing jobs. You can export your datasets in JSON, CSV, XML, RSS, Excel or HTML formats.
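
As a rough illustration of the export options, the sketch below builds a download URL for a dataset's items in a chosen format. It assumes the Apify API's `datasets/{datasetId}/items` endpoint and its `format` query parameter; the helper name `dataset_export_url` and the mapping of "Excel" to `xlsx` are assumptions made for this example, so check the API documentation before relying on them.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Formats named in the text, mapped to the values assumed for the `format` parameter.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "XML": "xml",
    "RSS": "rss",
    "Excel": "xlsx",
    "HTML": "html",
}


def dataset_export_url(dataset_id: str, fmt: str = "JSON") -> str:
    """Return a URL for downloading a dataset's items (hypothetical helper)."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt}")
    query = urlencode({"format": EXPORT_FORMATS[fmt]})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

For example, `dataset_export_url("WkzbQMuFYuamGv3YF", "CSV")` yields a URL ending in `/items?format=csv`.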

The easiest way to access your datasets is via Apify Console, which provides a user-friendly interface for viewing or downloading the data and editing your datasets' properties.

To manage your datasets, you can use the JavaScript SDK, Python SDK, JavaScript API client, Python API client, or the Apify API.

See the dataset documentation for details.

Key-value store

The key-value store is ideal for saving data records such as files, screenshots of web pages, and PDFs or for persisting your Actor's state. The records are accessible under a unique name and can be written and read quickly.
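
Because each record lives under its own name, a record can be addressed directly by store ID and key. The sketch below assumes the Apify API's `key-value-stores/{storeId}/records/{recordKey}` endpoint; `record_url` is a hypothetical helper, and the key `OUTPUT` used in the usage note is only an example.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, key: str) -> str:
    """Return the URL of a single key-value store record (hypothetical helper)."""
    # Percent-encode the key so characters such as spaces or slashes
    # stay inside a single path segment.
    return f"{API_BASE}/key-value-stores/{store_id}/records/{quote(key, safe='')}"
```

Calling `record_url("WkzbQMuFYuamGv3YF", "OUTPUT")` gives a URL ending in `/records/OUTPUT`.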

The easiest way to access your key-value stores is via Apify Console, which provides a user-friendly interface for viewing or downloading the data and editing your key-value stores' properties.

To manage your key-value stores, you can use the JavaScript SDK, Python SDK, JavaScript API client, Python API client, or the Apify API.

See the key-value store documentation for details.

Request queue

Request queues allow you to dynamically maintain a queue of URLs of web pages. You can use this when recursively crawling websites: you start from initial URLs and add new links as they are found while skipping duplicates.
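
The crawl pattern described above can be sketched with a small in-memory queue. This is not how Apify implements request queues; it only illustrates the idea of starting from initial URLs, adding newly found links, and skipping duplicates. The names `InMemoryRequestQueue`, `crawl`, and `get_links` are made up for this example.

```python
from collections import deque


class InMemoryRequestQueue:
    """A toy queue of URLs that ignores URLs it has already seen."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url: str) -> bool:
        """Enqueue a URL; return False if it was added before."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def fetch_next(self):
        """Return the next URL to visit, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


def crawl(start_urls, get_links):
    """Visit pages breadth-first; `get_links(url)` returns the links found on a page."""
    queue = InMemoryRequestQueue()
    for url in start_urls:
        queue.add(url)
    visited = []
    while (url := queue.fetch_next()) is not None:
        visited.append(url)
        for link in get_links(url):
            queue.add(link)  # duplicates are skipped here
    return visited
```

In a real crawler, `get_links` would download the page and extract its links; here it can be any function that maps a URL to a list of URLs.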

The easiest way to access your request queues is via Apify Console, which provides a user-friendly interface for viewing your request queues and editing your queues' properties.

To manage your request queues, you can use the JavaScript SDK, Python SDK, JavaScript API client, Python API client, or the Apify API.

See the request queue documentation for details.

Basic usage

There are six ways to access your storage:

Apify Console

To access your storages from Apify Console, go to the Storage section in the left-side menu. From there, you can click through the tabs to view your key-value stores, datasets, request queues and related API endpoints. To view a storage, click its ID.

Only named storages are displayed by default. Select the Include unnamed store checkbox to display all of your storages.

You can edit your stores' names by clicking their caption (ID or name) on their detail page.

Under the Settings tab of their detail page, you can grant access rights to other Apify users.

You can quickly share your storages' contents and details by sharing the URLs you find under the API tab in a store's detail page.

These URLs link to API endpoints, the places where your data is stored. Endpoints that only read stored information do not require an authentication token: the calls are authenticated by the storage's hard-to-guess ID, so these URLs can be shared freely. Operations such as update or delete, however, require the authentication token.

Never share a URL containing your authentication token, as this will compromise your account's security.
If the data you want to share requires a token, first download the data, then share it as a file.

JavaScript SDK and Crawlee

The Apify JavaScript SDK is a JavaScript/Node.js library providing tools to build your own Actors. Crawlee is a JavaScript/Node.js library that allows you to build your own web scraping and automation solutions (it was formerly a part of the JavaScript SDK). Both libraries require Node.js 16 or later.

See Crawlee documentation for setup instructions and to learn how to build your own crawlers and run them on the Apify platform.

Python SDK

The Apify Python SDK is a Python library providing tools to build your own Actors. We do not currently have an alternative to Crawlee for Python, but we plan on developing it in the future.

JavaScript API client

Apify's JavaScript API client (apify-client) allows you to access your storages from any Node.js application, whether it is running on the Apify platform or elsewhere.

See the client's documentation for help with setup.

Python API client

Apify's Python API client (apify-client) allows you to access your storages from any Python application, whether it is running on the Apify platform or elsewhere.

See the client's documentation for help with setup.

Apify API

The Apify API allows you to access your storages programmatically using HTTP requests and easily share your crawling results.

In most cases, when accessing your storages via API, you will need to provide a store ID, which you can do in the following formats:

  • WkzbQMuFYuamGv3YF - the store's alphanumerical ID if the store is unnamed.
  • ~store-name - the store's name prefixed with a tilde (~) character if the store is named (e.g. ~ecommerce-scraping-results).
  • username~store-name - username and the store's name separated by a tilde (~) character if the store is named and belongs to a different account (e.g. janedoe~ecommerce-scraping-results). Note that in this case, the store's owner needs to grant you access first.

For read (GET) requests, it is enough to use a store's alphanumerical ID, since the ID is hard to guess and effectively serves as an authentication key.

With other request types, and when using the username~store-name format, you will need to provide your secret API token in your request's Authorization header or as a query parameter. You can find your token on the Integrations page of your Apify account.
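
The ID formats and the token rule above can be sketched as follows. The helpers `store_ref` and `dataset_request` are hypothetical, the `Bearer` scheme for the Authorization header is an assumption to verify against the API documentation, and the request is only constructed here, not sent.

```python
from typing import Optional
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def store_ref(store_id: Optional[str] = None,
              name: Optional[str] = None,
              username: Optional[str] = None) -> str:
    """Build the store identifier in one of the three formats listed above."""
    if store_id:
        return store_id  # plain alphanumerical ID
    if not name:
        raise ValueError("Provide either store_id or name")
    # username~store-name for another account's store, ~store-name otherwise
    return f"{username}~{name}" if username else f"~{name}"


def dataset_request(ref: str, token: Optional[str] = None) -> Request:
    """Prepare a GET request for a dataset, adding the token only if given."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return Request(f"{API_BASE}/datasets/{ref}", headers=headers, method="GET")
```

A read by ID needs no token, for example `dataset_request(store_ref(store_id="WkzbQMuFYuamGv3YF"))`, while a store referenced as `janedoe~ecommerce-scraping-results` would be requested with a token. Passing the token in a header rather than as a query parameter keeps it out of any URL you might share.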

See the API documentation for details and a breakdown of each storage API endpoint.

Rate limiting

All API endpoints limit the rate of requests to protect Apify servers from overloading. The default rate limit is 30 requests per second per storage object. A few endpoints are exceptions and are limited to 200 requests per second per storage object.

If a client sends too many requests, the API endpoints respond with the HTTP status code 429 Too Many Requests and the following body:

{
    "error": {
        "type": "rate-limit-exceeded",
        "message": "You have exceeded the rate limit of ... requests per second"
    }
}

See the API documentation for details and to learn what to do if you exceed the rate limit.
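
One common way to react to a 429 response is to retry with exponential backoff. The sketch below is a generic illustration of that technique, not Apify's prescribed client behavior: `call_with_backoff` is a hypothetical helper, `send` stands for any function of yours that performs the request and returns a status code and body, and the retry count and delays are arbitrary defaults.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=0.5, sleep=time.sleep):
    """Call `send()` until it returns a status other than 429 or retries run out.

    `send` is any zero-argument callable returning (status_code, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait base_delay * 2^attempt, plus some jitter so that
            # parallel clients do not all retry at the same moment.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    # Still rate limited after the last retry: hand the 429 back to the caller.
    return status, body
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.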

Data retention

Unnamed storages expire after 7 days unless otherwise specified. Named storages are retained indefinitely.
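
The retention rule can be expressed as a small function. This is only a sketch of the rule as stated here: `expiry_time` is a hypothetical helper, and which timestamp the platform counts the 7 days from is not specified in this text, so the caller supplies `reference_time` as an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def expiry_time(reference_time: datetime,
                name: Optional[str] = None,
                retention_days: int = 7) -> Optional[datetime]:
    """Return when a storage would expire, or None if it is named."""
    if name:
        return None  # named storages are retained indefinitely
    return reference_time + timedelta(days=retention_days)
```

For instance, `expiry_time(datetime(2024, 1, 1, tzinfo=timezone.utc))` returns 8 January 2024 (UTC), while passing any `name` returns `None`.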

Preserving your storages

To preserve your storages indefinitely, give them a name. You can do this in Apify Console or using our API. First, you'll need your store's ID. You can find it in the details of the run that created it. In Apify Console, head over to your run's details and select the Dataset, Key-value store, or Request queue tab as appropriate. Check that store's details, and you will find its ID among them.

Finding your store's ID

Then, head over to the Storage menu, select the appropriate tab, and tick the Include unnamed [storages] box. Find and open your storage using the ID you just found, select the Settings tab, and enter its new name in the field. Your storage will now be preserved indefinitely.

To name your storage via API, get its ID from the run that generated it using the Get run endpoint. You can then give it a new name using the Update [storage] endpoint. For example, Update dataset.
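
As a minimal sketch of the second step, the code below prepares an Update dataset call that assigns a name. It assumes the endpoint accepts a PUT request to `datasets/{datasetId}` with a JSON body containing `name`, and that the token is sent as a `Bearer` Authorization header; `rename_dataset_request` and the values in the usage note are placeholders. The request is only built here, not sent.

```python
import json
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"


def rename_dataset_request(dataset_id: str, new_name: str, token: str) -> Request:
    """Prepare a request that gives a dataset a name (hypothetical helper)."""
    body = json.dumps({"name": new_name}).encode("utf-8")
    return Request(
        f"{API_BASE}/datasets/{dataset_id}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # updates require the API token
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

To send it, you could pass the result to `urllib.request.urlopen`, for example `urlopen(rename_dataset_request("WkzbQMuFYuamGv3YF", "web-scrape-results", token))`, where `token` is your own API token.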

The JavaScript SDK, Crawlee, the Python SDK, and the JavaScript and Python API clients have their own ways of naming storages. Check their docs for details.

Named and unnamed storages

The default storages for an Actor run are created without a name (with only an ID). This allows them to expire after 7 days (on the free plan, longer on paid plans) and not take up your storage space. If you want to preserve a storage, simply give it a name, and it will be retained indefinitely.

Storages' names can be up to 63 characters long.

Named and unnamed storages are the same in all regards except their retention period. The only other practical difference is that names make it easier to verify you are using the correct store.

For example, the storage names janedoe~my-storage-1 and janedoe~web-scrape-results are easier to tell apart than the alphanumerical IDs cAbcYOfuXemTPwnIB and CAbcsuZbp7JHzkw1B.

Sharing

You can invite other Apify users to view or modify your storages with the access rights system. See the full list of permissions.

Sharing storages between runs

Any storage can be accessed from any Actor or task run as long as you know its name or ID. You can access and manage storages from other runs using the same methods or endpoints as with storages from your current run.

Datasets and key-value stores can be used concurrently by multiple Actors. This means that multiple Actors or tasks running at the same time can write data to a single dataset or key-value store. The same applies for reading data – multiple runs can read data from datasets and key-value stores concurrently.

Request queues, on the other hand, only allow multiple runs to add new data. A request queue can only be processed by one Actor or task run at any one time.

When multiple runs try to write data to a storage at the same time, it isn't possible to control the order in which the data will be written. It will be written whenever the request is processed.
In key-value stores and request queues, the same applies to deleting records: if a request to delete a record is made shortly before a request to read that same record, the read request will fail.

Deleting storages

Named storages are only removed when you request it. You can delete storages in the following ways.