Usage

Learn how to effectively use Apify's storage options. Understand key aspects of data retention, rate limiting, and secure sharing.


Dataset

Dataset storage allows you to store a series of data objects, such as results from web scraping, crawling, or data processing jobs. You can export your datasets in JSON, CSV, XML, RSS, Excel, or HTML formats.

Key-value store

The key-value store is ideal for saving data records such as files, screenshots of web pages, and PDFs or for persisting your Actor's state. The records are accessible under a unique name and can be written and read quickly.

Request queue

Request queues allow you to dynamically maintain a queue of URLs of web pages. You can use this when recursively crawling websites: you start from initial URLs and add new links as they are found while skipping duplicates.

Basic usage

There are several ways to access your storage:

Apify Console

To access your storages via Apify Console, navigate to the Storage section in the left-side menu. From there, you can click through the tabs to view your key-value stores, datasets, and request queues, and you can click on the API button in the top right corner to view related API endpoints. To view a storage, click its ID.

Use the Include unnamed storages checkbox to display or hide unnamed storages. By default, Apify Console displays them.

You can edit your store's name by clicking on the Actions menu and selecting Rename.

Additionally, you can quickly share the contents and details of your storage by selecting Share under the Actions menu and providing an email, username, or user ID.

Storage API

These URLs link to API endpoints—the places where your data is stored. Endpoints that allow you to read stored information do not require an authentication token. Calls are authenticated using a hard-to-guess ID, allowing for secure sharing. However, operations such as update or delete require the authentication token.

Never share a URL containing your authentication token, to avoid compromising your account's security.
If the data you want to share requires a token, first download the data, then share it as a file.

JavaScript SDK

The Apify JavaScript SDK is a JavaScript/Node.js library that provides tools for building your own Actors. Requires Node.js 16 or later.

Python SDK

The Apify Python SDK is a Python library providing tools to build your own Actors. Requires Python 3.8 or above.

JavaScript API client

The Apify JavaScript API client (apify-client) allows you to access your datasets from any Node.js application, whether it is running on the Apify platform or externally.

Go to the client's documentation for help with setup.

Python API client

The Apify Python API client (apify-client) allows you to access your datasets from any Python application, whether it is running on the Apify platform or externally. Requires Python 3.8 or above.

Go to the client's documentation for help with setup.

Apify API

The Apify API allows you to access your storages programmatically using HTTP requests and easily share your crawling results.

In most cases, when accessing your storages via API, you will need to provide a store ID, which you can do in the following formats:

  • WkzbQMuFYuamGv3YF - the store's alphanumerical ID if the store is unnamed.
  • ~store-name - the store's name prefixed with tilde (~) character if the store is named (e.g. ~ecommerce-scraping-results)
  • username~store-name - username and the store's name separated by a tilde (~) character if the store is named and belongs to a different account (e.g. janedoe~ecommerce-scraping-results). Note that in this case, the store's owner needs to grant you access first.

For read (GET) requests, it is enough to use a store's alphanumerical ID, since the ID is hard to guess and effectively serves as an authentication key.

With other request types and when using the username~store-name, however, you will need to provide your secret API token in your request's Authorization header or as a query parameter. You can find your token on the Integrations page of your Apify account.
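The identifier formats and the token rule above can be sketched in Python as follows. This is a minimal illustration, not an official client: the base URL, the `/datasets/{id}/items` path, and the `Bearer` scheme in the Authorization header are assumptions here and should be checked against the API documentation. The helper names `store_identifier` and `build_dataset_request` are hypothetical. The request is only built, never sent.

```python
from typing import Optional
from urllib.request import Request

# Assumed base URL of the Apify API; verify against the API documentation.
API_BASE = "https://api.apify.com/v2"


def store_identifier(
    store_id: Optional[str] = None,
    name: Optional[str] = None,
    username: Optional[str] = None,
) -> str:
    """Return a store identifier in one of the three formats listed above."""
    if name is None:
        if store_id is None:
            raise ValueError("provide either store_id or name")
        return store_id              # unnamed store: plain alphanumerical ID
    if username is None:
        return f"~{name}"            # named store in your own account
    return f"{username}~{name}"      # named store owned by another account


def build_dataset_request(identifier: str, token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for a dataset's items."""
    request = Request(f"{API_BASE}/datasets/{identifier}/items", method="GET")
    if token is not None:
        # Assumed header format: the token is passed as a Bearer credential.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the token in a header rather than as a query parameter keeps it out of the URL, which matters if the URL is ever logged or shared.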

For further details and a breakdown of each storage API endpoint, refer to the API documentation.

Rate limiting

All API endpoints limit their rate of requests to protect Apify servers from overloading. The default rate limit for storage objects is 30 requests per second. However, some endpoints are exceptions and are limited to 200 requests per second per storage object.

If a client exceeds this limit, the API endpoint responds with the HTTP status code 429 Too Many Requests and the following body:

{
    "error": {
        "type": "rate-limit-exceeded",
        "message": "You have exceeded the rate limit of ... requests per second"
    }
}

Go to the API documentation for details and to learn what to do if you exceed the rate limit.
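One common way for a client to react to a 429 response is to retry with exponential backoff. The sketch below illustrates that idea under stated assumptions: the retry strategy, the default delays, and the helper name `call_with_backoff` are illustrative choices, not something this page prescribes. The `send` argument stands for any function that performs one request and returns a status code with the parsed response body.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status code, parsed JSON body)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry with growing delays while it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the delay on each retry and add random jitter so that
            # parallel clients do not all retry at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    # Every attempt was rate limited; hand the last 429 back to the caller.
    return status, body
```

The `sleep` parameter is injectable so the function can be exercised without actually waiting.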

Data retention

Named datasets are retained indefinitely. Unnamed datasets expire after 7 days unless otherwise specified.

Preserving your storages

To ensure indefinite retention of your storages, assign them a name. This can be done via Apify Console or through our API. First, you'll need your store's ID. You can find it in the details of the run that created it. In Apify Console, head over to your run's details and select the Dataset, Key-value store, or Request queue tab as appropriate. Check that store's details, and you will find its ID among them.

Find and open your storage by clicking the ID, click on the Actions menu, choose Rename, and enter its new name in the field. Your storage will now be preserved indefinitely.

To name your storage via API, get its ID from the run that generated it using the Get run endpoint. You can then give it a new name using the Update \[storage\] endpoint. For example, Update dataset.
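As a rough sketch of the API route, the snippet below builds the request that would rename a dataset. The URL path, the PUT method, the `{"name": ...}` payload, and the `Bearer` header format are assumptions to verify against the Update dataset endpoint's documentation. The helper name `build_rename_dataset_request` is hypothetical, and the request is constructed but not sent.

```python
import json
from urllib.request import Request

# Assumed base URL of the Apify API; verify against the API documentation.
API_BASE = "https://api.apify.com/v2"
MAX_NAME_LENGTH = 63  # storage names can be up to 63 characters long


def build_rename_dataset_request(dataset_id: str, new_name: str, token: str) -> Request:
    """Build (but do not send) a request that assigns a name to a dataset."""
    if not new_name or len(new_name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be 1 to {MAX_NAME_LENGTH} characters long")
    payload = json.dumps({"name": new_name}).encode("utf-8")
    return Request(
        f"{API_BASE}/datasets/{dataset_id}",
        data=payload,
        method="PUT",
        headers={
            # Updates require the API token (assumed Bearer format).
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To actually send it, the returned object could be passed to `urllib.request.urlopen`, with the dataset ID taken from the run that created the storage.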

Our SDKs and clients each have their own naming conventions for storages. For more information, check out their documentation:

SDKs:

Clients:

Named and unnamed storages

The default storages for an Actor run are unnamed, identified only by an ID. This allows them to expire after 7 days (or longer on paid plans), conserving your storage space. If you want to preserve a storage, assign it a name, and it will be retained indefinitely.

Storages' names can be up to 63 characters long.

Named and unnamed storages are identical in all aspects except for their retention period. The key advantage of named storages is that they make it easier to identify and verify the correct store.

For example, storage names janedoe~my-storage-1 and janedoe~web-scrape-results are easier to tell apart than the alphanumerical IDs cAbcYOfuXemTPwnIB and CAbcsuZbp7JHzkw1B.

Sharing

You can grant access rights to other Apify users to view or modify your storages. Check the full list of permissions.

Sharing storages between runs

Storage can be accessed from any Actor or task run, provided you have its name or ID. You can access and manage storages from other runs using the same methods or endpoints as with storages from your current run.

Datasets and key-value stores support concurrent use by multiple Actors. Thus, several Actors or tasks can simultaneously write data to a single dataset or key-value store. Similarly, multiple runs can read data from datasets and key-value stores at the same time.

Request queues, on the other hand, only allow multiple runs to add new data. A request queue can only be processed by one Actor or task run at any one time.

When multiple runs try to write data to a storage simultaneously, the order of data writing cannot be controlled. Data is written as each request is processed.
A similar principle applies to key-value stores and request queues: when a delete request for a record precedes a read request for the same record, the read request will fail.

Deleting storages

Named storages are only removed upon your request.
You can delete storages in the following ways: