Storage
Store anything from images and key-value pairs to structured output data. Learn how to access and manage your stored data from the Apify platform or via API.
The Apify platform includes three types of storage you can use both in your actors and outside the Apify platform via the API, the Apify SDK, and Apify's JavaScript and Python API clients.
This page contains a brief introduction to the three types of Apify Storage.
- Dataset - storage for data objects such as scraping output.
- Key-value store - storage for arbitrary data records such as files, images, and strings.
- Request queue - a queue of URLs for your actors to visit.
You will then find basic usage information relating to all three types of storage. For example, how to manage your storages in Apify Console, the basics of setting up the Apify SDK, the JavaScript API client and the Python API client, and general information for using storages with the Apify API.
Dataset
Dataset storage allows you to store a series of data objects such as results from web scraping, crawling or data processing jobs. You can export your datasets in JSON, CSV, XML, RSS, Excel or HTML formats.
The easiest way to access your datasets is via Apify Console, which provides a user-friendly interface for viewing or downloading the data and editing your datasets' properties.
To manage your datasets, you can use the Apify SDK, JavaScript API client, Python API client, or the Apify API.
See the dataset documentation for details.
Key-value store
The key-value store is ideal for saving data records such as files, screenshots of web pages, and PDFs or for persisting your actors' state. The records are accessible under a unique name and can be written and read quickly.
The easiest way to access your key-value stores is via Apify Console, which provides a user-friendly interface for viewing or downloading the data and editing your key-value stores' properties.
To manage your key-value stores, you can use the Apify SDK, JavaScript API client, Python API client, or the Apify API.
See the key-value store documentation for details.
Request queue
Request queues allow you to dynamically maintain a queue of URLs of web pages. You can use this when recursively crawling websites: you start from initial URLs and add new links as they are found while skipping duplicates.
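The recursive crawling pattern described above can be illustrated with a minimal in-memory sketch. This is not how Apify's request queue is implemented; the class `InMemoryRequestQueue`, the function `crawl`, and the `extract_links` callback are hypothetical names used only to show the idea of adding newly found links while skipping duplicates.

```python
from collections import deque


class InMemoryRequestQueue:
    """Toy queue that remembers every URL it has seen (illustration only)."""

    def __init__(self):
        self._pending = deque()
        self._seen = set()

    def add(self, url):
        """Enqueue the URL unless it was added before; return True if enqueued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def next(self):
        """Return the next URL to visit, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


def crawl(start_urls, extract_links):
    """Visit pages breadth-first; extract_links(url) returns the links found on a page."""
    queue = InMemoryRequestQueue()
    for url in start_urls:
        queue.add(url)
    visited = []
    while (url := queue.next()) is not None:
        visited.append(url)
        for link in extract_links(url):
            queue.add(link)  # duplicates are skipped inside add()
    return visited
```

In a real actor, the request queue storage plays the role of `InMemoryRequestQueue`, so the list of pending and handled URLs lives in the storage rather than in process memory.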
The easiest way to access your request queues is via Apify Console, which provides a user-friendly interface for viewing your request queues and editing your queues' properties.
To manage your request queues, you can use the Apify SDK, JavaScript API client, Python API client, or the Apify API.
See the request queue documentation for details.
Basic usage
There are five ways to access your storage:
- Apify Console - provides an easy-to-use interface [details].
- Apify SDK - Request/Result storage - when building your own Apify actor [details].
- JavaScript API client - to access your storages from any Node.js application [details].
- Python API client - to access your storages from any Python application [details].
- Apify API - for accessing your storages programmatically [details].
Apify Console
To access your storages from Apify Console, go to the Storage section in the left-side menu. From there, you can click through the tabs to view your key-value stores, datasets, request queues and related API endpoints. To view a storage, click its ID.
Only named storages are displayed by default. Select the Include unnamed store checkbox to display all of your storages.
You can edit your stores' names by clicking their caption (ID or name) on their detail page.
Under the Settings tab of their detail page, you can grant access rights to other Apify users.
You can quickly share your storages' contents and details by sharing the URLs you find under the API tab in a store's detail page.
These URLs provide links to API endpoints, the places where your data are stored. Endpoints that allow you to read stored information do not require an authentication token. The calls are authenticated using a hard-to-guess ID, so they can be shared freely. Operations such as update or delete, however, will need the authentication token.
Never share a URL containing your authentication token, as this will compromise your account's security.
If the data you want to share requires a token, first download the data, then share it as a file.
Apify SDK and Crawlee
The Apify SDK is a JavaScript/Node.js library providing tools to build your own actors. Crawlee is a JavaScript/Node.js library for building your own web scraping and automation solutions; it was formerly part of the Apify SDK. Both libraries require Node.js 16 or later.
See Crawlee documentation for setup instructions and to learn how to build your own crawlers and run them on the Apify platform.
JavaScript API client
Apify's JavaScript API client (apify-client) allows you to access your storages from any Node.js application, whether it is running on the Apify platform or elsewhere.
See the client's documentation for help with setup.
Python API client
Apify's Python API client (apify-client) allows you to access your storages from any Python application, whether it is running on the Apify platform or elsewhere.
See the client's documentation for help with setup.
Apify API
The Apify API allows you to access your storages programmatically using HTTP requests and easily share your crawling results.
In most cases, when accessing your storages via API, you will need to provide a store ID, which you can do in the following formats:
- WkzbQMuFYuamGv3YF - the store's alphanumerical ID if the store is unnamed.
- ~store-name - the store's name prefixed with the tilde (~) character if the store is named (e.g. ~ecommerce-scraping-results).
- username~store-name - username and the store's name separated by a tilde (~) character if the store is named and belongs to a different account (e.g. janedoe~ecommerce-scraping-results). Note that in this case, the store's owner needs to grant you access first.
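As a small illustration of the three formats, the helper below builds the identifier string from its parts. The function name `store_identifier` and its parameters are hypothetical and are not part of any Apify library.

```python
from typing import Optional


def store_identifier(store_id: Optional[str] = None,
                     name: Optional[str] = None,
                     username: Optional[str] = None) -> str:
    """Return a store identifier in one of the three formats listed above."""
    if name is not None:
        # Named store: "~store-name", or "username~store-name"
        # when the store belongs to a different account.
        return f"{username or ''}~{name}"
    if store_id is not None:
        # Unnamed store: the plain alphanumerical ID.
        return store_id
    raise ValueError("Provide either a store ID or a store name")
```

For example, `store_identifier(name="ecommerce-scraping-results", username="janedoe")` yields `janedoe~ecommerce-scraping-results`.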
For read (GET) requests, it is enough to use a store's alphanumerical ID, since the ID is hard to guess and effectively serves as an authentication key.
With other request types and when using the username~store-name format, however, you will need to provide your secret API token in your request's Authorization header or as a query parameter. You can find your token on the Integrations page of your Apify account.
See the API documentation for details and a breakdown of each storage API endpoint.
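The sketch below shows one way to prepare such a request with Python's standard library, attaching the token in the Authorization header only when one is supplied. The base URL, the `/datasets/{id}` path, and the `Bearer` scheme are assumptions not stated on this page, and `build_dataset_request` is a hypothetical helper; check the API documentation for the exact endpoint you need. The request object is only built here, not sent.

```python
import urllib.request
from typing import Optional
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"  # assumed base URL


def build_dataset_request(store_id: str,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a GET request for a dataset's details."""
    # Keep the tilde unescaped so "username~store-name" identifiers stay readable.
    url = f"{API_BASE}/datasets/{quote(store_id, safe='~')}"
    headers = {}
    if token is not None:
        # Assumed header format; never put the token into a URL you plan to share.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the token in a header rather than in the query string keeps it out of the URL, which fits the advice above about never sharing a URL that contains your token.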
Rate limiting
All API endpoints limit their rate of requests to protect Apify servers from overloading. The default rate limit is 30 requests per second per storage object, with a few exceptions, which are limited to 200 requests per second per storage object:
- Push items to dataset.
- CRUD (add, get, update, delete) operations of request queue requests.
If a client sends too many requests, the API endpoints respond with the HTTP status code 429 Too Many Requests and the following body:
    {
        "error": {
            "type": "rate-limit-exceeded",
            "message": "You have exceeded the rate limit of ... requests per second"
        }
    }
See the API documentation for details and to learn what to do if you exceed the rate limit.
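One common way for a client to react to a 429 response is to retry with exponential backoff. The sketch below is a generic illustration, not Apify's prescribed algorithm: `call_with_backoff`, `is_rate_limited`, and the `send` callback (any function returning a status code and a response body) are hypothetical names, and the retry count and delays are arbitrary defaults.

```python
import json
import random
import time
from typing import Callable, Tuple


def is_rate_limited(status: int, body: str) -> bool:
    """True when the response looks like the rate-limit error shown above."""
    if status != 429:
        return False
    try:
        return json.loads(body)["error"]["type"] == "rate-limit-exceeded"
    except (ValueError, KeyError, TypeError):
        # A 429 with an unexpected body is still treated as rate limiting.
        return True


def call_with_backoff(send: Callable[[], Tuple[int, str]],
                      max_retries: int = 5,
                      base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it is rate limited."""
    attempt = 0
    while True:
        status, body = send()
        if not is_rate_limited(status, body) or attempt >= max_retries:
            return status, body
        # The delay doubles on every attempt; random jitter keeps
        # concurrent clients from retrying at the same instant.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay))
        attempt += 1
```

The `sleep` parameter is injectable so the behavior can be checked without actually waiting, and after `max_retries` retries the last response is returned to the caller as-is.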
Data retention
Unnamed storages expire after 7 days unless otherwise specified. Named storages are retained indefinitely.
Preserving your storages
To preserve your storages indefinitely, give them a name. You can do this in Apify Console or using our API. First, you'll need your store's ID. You can find it in the details of the run that created it. In Apify Console, head over to your run's details and select the Dataset, Key-value store, or Request queue tab as appropriate. Check that store's details, and you will find its ID among them.
Then, head over to the Storage menu, select the appropriate tab, and tick the Include unnamed [storages] box. Find and open your storage using the ID you just found, select the Settings tab, and enter its new name in the field. Your storage will now be preserved indefinitely.
To name your storage via API, get its ID from the run that generated it using the Get run endpoint. You can then give it a new name using the Update [storage] endpoint. For example, Update dataset.
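As a rough sketch of the second step, the snippet below prepares an Update dataset request that sets a new name. The base URL, the PUT method, the `{"name": ...}` payload shape, and the `Bearer` header are assumptions not stated on this page, and `build_rename_dataset_request` is a hypothetical helper; consult the API documentation for the authoritative request format. The request is only constructed, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed base URL
MAX_NAME_LENGTH = 63  # storage names can be up to 63 characters long


def build_rename_dataset_request(dataset_id: str, new_name: str,
                                 token: str) -> urllib.request.Request:
    """Build (but do not send) a request that gives a dataset a name."""
    if not 0 < len(new_name) <= MAX_NAME_LENGTH:
        raise ValueError(f"Name must be 1-{MAX_NAME_LENGTH} characters long")
    payload = json.dumps({"name": new_name}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}",
        data=payload,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Update operations need the token (assumed header format).
            "Authorization": f"Bearer {token}",
        },
    )
```

To actually perform the update, the returned object could be passed to `urllib.request.urlopen`.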
The Apify SDK, Crawlee, and the JavaScript and Python clients have their own ways of naming storages. Check their docs for details.
Named and unnamed storages
The default storages for an actor run are created without a name (with only an ID). This allows them to expire after 7 days (on the free plan, longer on paid plans) and not take up your storage space. If you want to preserve a storage, simply give it a name, and it will be retained indefinitely.
Storages' names can be up to 63 characters long.
Named and unnamed storages are the same in all regards except their retention period. In practice, named storages also make it easier to verify you are using the correct store.
For example, the storage names janedoe~my-storage-1 and janedoe~web-scrape-results are easier to tell apart than the alphanumerical IDs cAbcYOfuXemTPwnIB and CAbcsuZbp7JHzkw1B.
Sharing
You can invite other Apify users to view or modify your storages with the access rights system. See the full list of permissions.
Sharing storages between runs
Any storage can be accessed from any Actor or task run as long as you know its name or ID. You can access and manage storages from other runs using the same methods or endpoints as with storages from your current run.
Datasets and key-value stores can be used concurrently by multiple actors. This means that multiple actors or tasks running at the same time can write data to a single dataset or key-value store. The same applies for reading data – multiple runs can read data from datasets and key-value stores concurrently.
Request queues, on the other hand, only allow multiple runs to add new data. A request queue can only be processed by one actor or task run at any one time.
When multiple runs try to write data to a storage at the same time, it isn't possible to control the order in which the data will be written. It will be written whenever the request is processed.
In key-value stores and request queues, the same applies for deleting records: if a request to delete a record is made shortly before a request to read that same record, the second request will fail.
Deleting storages
Named storages are only removed when you request it. You can delete storages in the following ways.
- Apify Console - using the Actions button in the store's detail page.
- Apify SDK - using the .drop() method of the Dataset, Key-value store, or Request queue class.
- JavaScript API client - using the .delete() method in the dataset, key-value store, or request queue clients.
- Python API client - using the .delete() method in the dataset, key-value store, or request queue clients.
- API - using the Delete [store] endpoint, where [store] is the type of storage you want to delete.