
Storage

Store anything from images and key-value pairs to structured output data. Learn how to access and manage your stored data from the Apify platform or via API.

The Apify platform includes three types of storage that you can use both in your actors and outside the Apify platform via the Apify API, the Apify SDK, and Apify's JavaScript API client.

This page contains a brief introduction to the three types of Apify storage:

  • Dataset - storage for data objects such as scraping output.
  • Key-value store - storage for arbitrary data records such as files, images, and strings.
  • Request queue - a queue of URLs for your actors to visit.

It then covers basic usage information that applies to all three types of storage, such as how to manage your storages in the Apify app, the basics of setting up the Apify SDK and JavaScript API client, and general information on using storages with the Apify API.

Dataset

Dataset storage allows you to store a series of data objects such as results from web scraping, crawling or data processing jobs. You can export your datasets in JSON, CSV, XML, RSS, Excel or HTML formats.
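As a rough illustration of exporting, the format is typically selected with a `format` query parameter on the dataset items endpoint of the Apify API. The sketch below assumes the v2 endpoint `https://api.apify.com/v2/datasets/{datasetId}/items` and the format values `json`, `csv`, `xml`, `rss`, `xlsx` (Excel) and `html`; `datasetExportUrl` is a hypothetical helper, and the API documentation remains the authoritative source for the supported values.

```javascript
// Base URL of the Apify API (assumed v2; check the API documentation).
const API_BASE = 'https://api.apify.com/v2';

// Export formats listed above, mapped to the assumed `format` query values.
const EXPORT_FORMATS = ['json', 'csv', 'xml', 'rss', 'xlsx', 'html'];

// Hypothetical helper: build a download URL for a dataset's items in a given format.
function datasetExportUrl(datasetId, format = 'json') {
  if (!EXPORT_FORMATS.includes(format)) {
    throw new Error(`Unsupported export format: ${format}`);
  }
  const url = new URL(`${API_BASE}/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', format);
  return url.toString();
}

console.log(datasetExportUrl('WkzbQMuFYuamGv3YF', 'csv'));
// https://api.apify.com/v2/datasets/WkzbQMuFYuamGv3YF/items?format=csv
```

Because reading a dataset by its ID needs no token, a URL like this can be opened in a browser or shared as-is.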


The easiest way to access your datasets is via the Apify app, which provides a user-friendly interface for viewing or downloading the data and editing your datasets' properties.

To add data to your datasets (and for more management options), you can use the Apify SDK, Apify's JavaScript API client or the Apify API.
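When adding data through the Apify API directly, the items are usually sent as a JSON array in the body of a write request. The sketch below only prepares the request (it sends nothing) and assumes the v2 endpoint `POST https://api.apify.com/v2/datasets/{datasetId}/items` with the API token passed as the `token` query parameter; `buildPushItemsRequest` and the sample item fields are hypothetical.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Hypothetical helper: prepare a request that appends items to a dataset.
// Writing requires the secret API token, passed here as a query parameter.
function buildPushItemsRequest(datasetId, items, token) {
  if (!Array.isArray(items) || items.length === 0) {
    throw new Error('items must be a non-empty array');
  }
  const url = new URL(`${API_BASE}/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('token', token);
  return {
    url: url.toString(),
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(items),
  };
}

const req = buildPushItemsRequest('WkzbQMuFYuamGv3YF', [{ title: 'Example', price: 10 }], 'MY_TOKEN');
console.log(req.method, req.url);
```

The returned object can be handed to whichever HTTP client you use; keeping request construction separate makes it easy to check that the token never ends up in a URL you share.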

For more information, see the dataset documentation page.

Key-value store

The key-value store is ideal for saving data records such as files, screenshots of web pages, and PDFs, or for persisting your actors' state. Records are accessible under a unique name (key) and can be written and read quickly.
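A common way to persist state is to serialize it into a single record under a fixed key and read it back when a run restarts. The sketch below uses a JavaScript `Map` as a stand-in for a key-value store so that it runs anywhere; the key `STATE` and the helper names are hypothetical, and in a real actor the `Map` calls would be replaced by the Apify SDK or API.

```javascript
// In-memory stand-in for a key-value store: keys map to serialized records.
const store = new Map();

// Hypothetical key under which the actor's state is persisted.
const STATE_KEY = 'STATE';

// Load previously saved state, or fall back to a default on the first run.
function loadState(defaultState) {
  const raw = store.get(STATE_KEY);
  return raw === undefined ? { ...defaultState } : JSON.parse(raw);
}

// Persist state as a JSON string so it can be restored later.
function saveState(state) {
  store.set(STATE_KEY, JSON.stringify(state));
}

// First run: nothing is saved yet, so the default is used.
const state = loadState({ processedPages: 0 });
state.processedPages += 5;
saveState(state);

// A later run (for example, after a restart) resumes from the saved value.
const restored = loadState({ processedPages: 0 });
console.log(restored.processedPages); // 5
```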


The easiest way to access your key-value stores is via the Apify app, which provides a user-friendly interface for viewing or downloading the data and editing your key-value stores' properties.

To manage the data in your key-value stores (and for more access options), you can use the Apify SDK, Apify's JavaScript API client or the Apify API.

For more information, see the key-value store documentation page.

Request queue

Request queues allow you to dynamically maintain a queue of web page URLs. You can use them to crawl websites recursively: you start from initial URLs and add new links as they are found, skipping duplicates.
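The crawling pattern described above can be sketched with a plain in-memory queue and a set of URLs already seen. This is a conceptual illustration only, not the Apify request queue API: `links` is a made-up site map standing in for the links a crawler would extract from each page.

```javascript
// Hypothetical site map: each page URL lists the links found on that page.
const links = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/b', 'https://example.com/'],
  'https://example.com/b': ['https://example.com/c'],
  'https://example.com/c': [],
};

// Visit pages breadth-first, enqueuing newly found links and skipping duplicates.
function crawl(startUrls) {
  const seen = new Set(startUrls);
  const queue = [...seen];
  const visited = [];

  while (queue.length > 0) {
    const url = queue.shift();
    visited.push(url);
    for (const next of links[url] || []) {
      // A URL that was already enqueued once is never added again.
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return visited;
}

console.log(crawl(['https://example.com/']));
```

Even though page `a` links back to the start page and to `b`, each URL is visited exactly once.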


The easiest way to access your request queues is via the Apify app, which provides a user-friendly interface for viewing your request queues and editing your queues' properties.

To manage your request queues, you can use the Apify SDK, Apify's JavaScript API client or the Apify API.

For more information, see the request queue documentation page.

Basic usage

There are four ways to access your storage:

Apify app

To access your storages from the Apify app, go to the Storage section in the left-side menu. From there, you can click through the tabs to view your key-value stores, datasets, request queues and related API endpoints. To view a storage, click its ID.


Only named storages are displayed by default. Select the Include unnamed store checkbox to display all of your storages.

You can edit your stores' names under the Settings tab of their detail page. There, you can also grant access rights to other Apify users.

You can quickly share your storages' contents and details by sharing the URLs you find under the API tab in a store's detail page.


These URLs link to API endpoints: the places where your data are stored. Endpoints that allow you to read stored information do not require an authentication token. These calls are authenticated using a hard-to-guess ID, so their URLs can be shared freely. Operations such as update or delete, however, require the authentication token.

Never share a URL containing your authentication token, as this will compromise your account's security.
If the data you want to share requires a token, first download the data, then share it as a file.
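Before sharing an API URL, you can check that it does not carry a token. The sketch below assumes the token appears as a `token` query parameter, as described for the Apify API on this page; `isSafeToShare` is a hypothetical helper.

```javascript
// Return true if the URL has no `token` query parameter and can be shared.
function isSafeToShare(apiUrl) {
  const url = new URL(apiUrl);
  return !url.searchParams.has('token');
}

console.log(isSafeToShare('https://api.apify.com/v2/datasets/WkzbQMuFYuamGv3YF/items?format=json')); // true
console.log(isSafeToShare('https://api.apify.com/v2/datasets/WkzbQMuFYuamGv3YF/items?token=secret')); // false
```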

Apify SDK

The Apify SDK is a JavaScript/Node.js library which allows you to build your own web scraping and automation solutions. It requires Node.js 10.17 or later, with the exception of Node.js 11.

For setup instructions and to learn how to build your own actors, visit the SDK documentation.

JavaScript API client

Apify's JavaScript API client (apify-client) allows you to access your storages from any Node.js application, whether it is running on the Apify platform or elsewhere.

To use apify-client in your application, you will first need to have Node.js version 10 or higher installed.

You can then install the apify-client package from NPM using the command below in your terminal.

npm install apify-client

Once installed, require the apify-client package in your app and create a new instance of it using your user ID and secret API token (you can find these on the Integrations page of your Apify account).

// Import the `apify-client` package
const ApifyClient = require('apify-client');

// Create a new instance of the client
// and configure it to use your credentials
const apifyClient = new ApifyClient({
    userId: 'RWnGtczasdwP63Mak',
    token: 'f5J7XsdaKDyRywwuGGo9',
});
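Hard-coding credentials as in the example above makes them easy to leak, for instance by committing them to a repository. A common alternative is to read them from environment variables. The sketch below assumes the variable names `APIFY_USER_ID` and `APIFY_TOKEN` (use whatever names fit your setup) and only prepares the options object you would pass to the `ApifyClient` constructor; `loadApifyCredentials` is a hypothetical helper.

```javascript
// Read Apify credentials from environment variables instead of hard-coding them.
// The variable names are assumptions; adjust them to your own setup.
function loadApifyCredentials(env = process.env) {
  const userId = env.APIFY_USER_ID;
  const token = env.APIFY_TOKEN;
  if (!token) {
    throw new Error('APIFY_TOKEN is not set');
  }
  return { userId, token };
}

// Example with an explicit environment object; in practice you would call
// loadApifyCredentials() and pass the result to the client constructor.
const options = loadApifyCredentials({ APIFY_USER_ID: 'myUserId', APIFY_TOKEN: 'myToken' });
console.log(Object.keys(options));
```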

Apify API

The Apify API allows you to access your storages programmatically using HTTP requests and easily share your crawling results.

In most cases, when accessing your storages via API, you will need to provide a store ID, which you can do in the following formats:

  • WkzbQMuFYuamGv3YF - the store's alphanumeric ID if the store is unnamed
  • username~store-name - your username and the store's name separated by a tilde (~) character (e.g. janedoe~ecommerce-scraping-results) if the store is named
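The two identifier formats above can be told apart by the tilde. The hypothetical helper below parses an identifier into its parts; it assumes that a tilde appears only in the `username~store-name` form.

```javascript
// Parse a store identifier in either of the two formats described above.
function parseStoreId(storeId) {
  const tilde = storeId.indexOf('~');
  if (tilde === -1) {
    // No tilde: an alphanumeric ID of an (unnamed) store.
    return { kind: 'id', id: storeId };
  }
  const username = storeId.slice(0, tilde);
  const name = storeId.slice(tilde + 1);
  if (!username || !name) {
    throw new Error(`Malformed store identifier: ${storeId}`);
  }
  return { kind: 'named', username, name };
}

console.log(parseStoreId('WkzbQMuFYuamGv3YF'));
console.log(parseStoreId('janedoe~ecommerce-scraping-results'));
```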

For read (GET) requests, it is enough to use a store's alphanumeric ID, since the ID is hard to guess and effectively serves as an authentication key.

With other request types, and whenever you use the username~store-name format, you will need to provide your secret API token as a query parameter. You can find your token on the Integrations page of your Apify account.
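The token rule above boils down to a small decision: a token is needed for any non-GET request and for any request that identifies the store as `username~store-name`. The sketch below applies that rule when building a key-value store record URL; the path `https://api.apify.com/v2/key-value-stores/{storeId}/records/{key}` is an assumption based on the v2 API, and `needsToken` and `buildRecordUrl` are hypothetical helpers.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Decide whether a request needs the secret API token, following the rules above.
function needsToken(method, storeId) {
  return method.toUpperCase() !== 'GET' || storeId.includes('~');
}

// Build the URL of a single key-value store record, adding the token only when required.
function buildRecordUrl(storeId, key, method, token) {
  const url = new URL(
    `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`,
  );
  if (needsToken(method, storeId)) {
    if (!token) throw new Error('This request requires an API token');
    url.searchParams.set('token', token);
  }
  return url.toString();
}

console.log(buildRecordUrl('WkzbQMuFYuamGv3YF', 'OUTPUT', 'GET'));
console.log(buildRecordUrl('janedoe~my-storage-1', 'OUTPUT', 'GET', 'MY_TOKEN'));
```

Adding the token only when it is actually required keeps shareable read URLs free of credentials.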

For more information and a detailed breakdown of each storage API endpoint, see the API documentation.

Rate limiting

All API endpoints limit their rate of requests to protect Apify servers from overloading. The default rate limit is 30 requests per second per storage object, with a few exceptions, which are limited to 200 requests per second per storage object.

If a client sends too many requests, the API endpoints respond with the HTTP status code 429 Too Many Requests and the following body:

{
    "error": {
        "type": "rate-limit-exceeded",
        "message": "You have exceeded the rate limit of ... requests per second"
    }
}

See the API documentation for more details and to learn what to do if you exceed the rate limit.
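One common way to handle a 429 Too Many Requests response is to retry with an exponentially growing delay. The sketch below is a generic pattern, not Apify's own retry logic: `sendRequest` is any function returning a promise that resolves to an object with a `status` field, and the retry count and base delay are arbitrary example values.

```javascript
// Wait for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `sendRequest` and retry with exponential backoff while the API answers 429.
// If the retries run out, the last 429 response is returned to the caller.
async function withRateLimitRetry(sendRequest, { maxRetries = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await sendRequest();
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // With the defaults, delays grow as 500 ms, 1 s, 2 s, and so on.
    await sleep(baseDelayMs * 2 ** attempt);
  }
}

// Example with a fake request that is rate-limited twice before succeeding.
let calls = 0;
const fakeRequest = async () => {
  calls += 1;
  return { status: calls <= 2 ? 429 : 200 };
};

withRateLimitRetry(fakeRequest, { baseDelayMs: 10 }).then((response) => {
  console.log(response.status, calls); // 200 3
});
```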

Data retention

Unnamed storages expire after 7 days unless otherwise specified.

Named storages are retained indefinitely.

You can edit your storages' names in the Apify app or using the access methods above.

Named and unnamed storages

All storages are created without a name (with only an ID). This allows them to expire after 7 days and not take up your storage space. If you want to preserve a storage, simply give it a name and it will be retained indefinitely.

Storages' names can be up to 63 characters long.

Apart from their retention period, named and unnamed storages are the same in all regards. A name simply makes it easier to verify that you are using the correct store.

For example, the storage names janedoe~my-storage-1 and janedoe~web-scrape-results are easier to tell apart than the alphanumeric IDs cAbcYOfuXemTPwnIB and CAbcsuZbp7JHzkw1B.
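The retention rules and the 63-character name limit described in this section can be combined into a single check. The sketch below assumes a storage is described by a hypothetical object with `name` (null when unnamed) and `createdAt` fields and uses the default 7-day period stated above; it checks only the name length, not any other naming rules the platform may enforce.

```javascript
const MAX_NAME_LENGTH = 63;
// Default retention for unnamed storages, as stated above.
const UNNAMED_RETENTION_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

// Return when a storage is expected to expire, or null if it is kept indefinitely.
// `storage` is a hypothetical object: { name: string | null, createdAt: Date }.
function expiresAt(storage) {
  if (storage.name) {
    if (storage.name.length > MAX_NAME_LENGTH) {
      throw new Error(`Storage names can be at most ${MAX_NAME_LENGTH} characters long`);
    }
    return null; // named storages are retained indefinitely
  }
  return new Date(storage.createdAt.getTime() + UNNAMED_RETENTION_DAYS * DAY_MS);
}

const created = new Date('2021-03-01T00:00:00Z');
console.log(expiresAt({ name: null, createdAt: created }).toISOString()); // 2021-03-08T00:00:00.000Z
console.log(expiresAt({ name: 'ecommerce-scraping-results', createdAt: created })); // null
```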

Sharing

You can invite other Apify users to view or modify your storages using the access rights system. See the access rights documentation for the full list of permissions.

Sharing storages between runs

Any storage can be accessed from any actor or task run as long as you know its name or ID. You can access and manage storages from other runs using the same methods or endpoints as with storages from your current run.

Datasets and key-value stores can be used concurrently by multiple actors. This means that multiple actors or tasks running at the same time can write data to a single dataset or key-value store. The same applies to reading data: multiple runs can read data from datasets and key-value stores concurrently.

Request queues, on the other hand, only allow multiple runs to add new data. A request queue can only be processed by one actor or task run at any one time.

When multiple runs try to write data to a storage at the same time, it isn't possible to control the order in which the data will be written; each write happens whenever its request is processed.
In key-value stores and request queues, the same applies to deleting records: if a request to delete a record is made shortly before a request to read that same record, the read request will fail.
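Because another run may delete a record shortly before your read is processed, readers should be prepared for the record to be missing. The sketch below treats a 404 response as "no record" instead of an error; `fetchRecord` is a hypothetical function returning a promise of `{ status, body }`, and the assumption that a missing record is reported as 404 should be checked against the API documentation.

```javascript
// Read a record, returning null if it no longer exists (for example, because
// another run deleted it shortly before this read was processed).
async function getRecordOrNull(fetchRecord, key) {
  const response = await fetchRecord(key);
  if (response.status === 404) {
    return null;
  }
  if (response.status !== 200) {
    throw new Error(`Unexpected status ${response.status} for record "${key}"`);
  }
  return response.body;
}

// Fake record source: "OUTPUT" exists, any other key has been deleted.
const fakeFetchRecord = async (key) =>
  key === 'OUTPUT' ? { status: 200, body: { ok: true } } : { status: 404 };

getRecordOrNull(fakeFetchRecord, 'OUTPUT').then((value) => console.log(value)); // { ok: true }
getRecordOrNull(fakeFetchRecord, 'DELETED').then((value) => console.log(value)); // null
```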

Deleting storages

Named storages are only removed when you request it. You can delete storages in the following ways.