Storage clients
Storage clients are the components that read and write your storages: datasets, key-value stores, and request queues. The Apify SDK selects an appropriate client automatically based on where the Actor runs. For most Actors you never need to think about them. This page explains the available clients and how to customize them when you do.
How the Actor selects a storage client
By default, the Actor uses a SmartApifyStorageClient, a hybrid client that delegates to one of two underlying clients depending on the environment:
- When running on the Apify platform (detected automatically), or when you pass force_cloud=True, it uses the cloud client, ApifyStorageClient, which persists data through the Apify API.
- When running locally, it uses the local client, FileSystemStorageClient, which emulates platform storages on your filesystem under the storage folder.
As a result, the same Actor code can run unchanged both locally and on the platform.
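The routing rule described above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the SDK's implementation: HybridClientSketch, select, and on_platform are hypothetical names, and the clients are represented by their class names as strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridClientSketch:
    """Hypothetical stand-in for a hybrid client; not the SDK's own code."""

    cloud_client: str = 'ApifyStorageClient'
    local_client: str = 'FileSystemStorageClient'

    def select(self, *, on_platform: bool, force_cloud: bool = False) -> str:
        # The cloud client handles the call on the platform or when it is
        # explicitly forced; otherwise the local client handles it.
        if on_platform or force_cloud:
            return self.cloud_client
        return self.local_client


if __name__ == '__main__':
    hybrid = HybridClientSketch()
    print(hybrid.select(on_platform=True))
    print(hybrid.select(on_platform=False))
    print(hybrid.select(on_platform=False, force_cloud=True))
```

Swapping the local_client value in this model corresponds to replacing the local sub-client, which is covered in the customization section of this page.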
Available storage clients
The apify.storage_clients module provides the following clients:
- SmartApifyStorageClient - the default hybrid client. It wraps a cloud_storage_client and a local_storage_client and routes each call to the right one.
- ApifyStorageClient - talks to the Apify API. Used as the cloud client.
- FileSystemStorageClient - persists data to the local filesystem. Used as the default local client.
- MemoryStorageClient - keeps everything in memory and persists nothing. Useful for tests and short-lived runs.
All of these clients implement Crawlee's StorageClient interface, so any of them can be used as a sub-client of SmartApifyStorageClient. For details, see customizing the storage client.
Crawlee additionally ships two storage clients backed by a self-hosted database: RedisStorageClient and SqlStorageClient. The Apify SDK doesn't re-export them, because each requires an extra dependency that the SDK doesn't install. To use one of them:
- Install the matching Crawlee extra (crawlee[redis], crawlee[sql-postgres], or crawlee[sql-sqlite]).
- Import the client from crawlee.storage_clients and pass it as a sub-client of SmartApifyStorageClient.
For details, see the Crawlee storage clients guide.
Single vs. shared request queue
ApifyStorageClient supports two ways of accessing the Apify request queue, selected via its request_queue_access argument:
- 'single' (default) - optimized for a single consumer. It makes fewer API calls, so it is cheaper and faster, but it doesn't support multiple clients consuming the same queue concurrently. This is the right choice for the majority of Actors.
- 'shared' - supports multiple consumers working on the same queue at the same time, at the cost of more API calls.
To opt into the shared client, set it as the cloud client of the SmartApifyStorageClient in the service locator before entering the Actor context:
import asyncio

from crawlee import service_locator

from apify import Actor
from apify.storage_clients import ApifyStorageClient, SmartApifyStorageClient


async def main() -> None:
    # Use the shared Apify request queue client, which supports multiple
    # consumers working on the same queue at the cost of more API calls.
    cloud_storage_client = ApifyStorageClient(request_queue_access='shared')

    service_locator.set_storage_client(
        SmartApifyStorageClient(cloud_storage_client=cloud_storage_client),
    )

    async with Actor:
        request_queue = await Actor.open_request_queue()
        await request_queue.add_request('https://crawlee.dev')


if __name__ == '__main__':
    asyncio.run(main())
Using cloud storage while running locally
When developing locally, storages are read from and written to the local filesystem by default. To work with a storage on the Apify platform instead (for example, to read the output of a remote Actor run), pass force_cloud=True to Actor.open_dataset, Actor.open_key_value_store, or Actor.open_request_queue. This requires an Apify token, which you provide through the APIFY_TOKEN environment variable.
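As a rough sketch of this workflow, the example below checks for the token first and then opens a named dataset on the platform from a local run. The helper require_apify_token and the dataset name 'my-remote-dataset' are hypothetical placeholders introduced for illustration; the sketch also assumes the dataset's iterate_items method is available to read the stored items.

```python
import os


def require_apify_token() -> str:
    """Return the token from APIFY_TOKEN, or fail early with a clear message."""
    # Hypothetical helper, not part of the Apify SDK.
    token = os.environ.get('APIFY_TOKEN', '').strip()
    if not token:
        raise RuntimeError(
            'Set the APIFY_TOKEN environment variable before using force_cloud=True.'
        )
    return token


async def main() -> None:
    require_apify_token()

    # Imported here so that the token check above runs before the SDK is loaded.
    from apify import Actor

    async with Actor:
        # 'my-remote-dataset' is a placeholder name, not a real storage.
        dataset = await Actor.open_dataset(name='my-remote-dataset', force_cloud=True)

        # Read the items stored on the platform and log each one.
        async for item in dataset.iterate_items():
            Actor.log.info(f'Remote item: {item}')
```

To try it, run the coroutine with asyncio.run(main()) in an environment where APIFY_TOKEN is set. Failing early on a missing token gives a clearer message than an authentication error raised later by the API call.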
Customizing the storage client
You can replace either of the underlying clients, for example to keep all local data in memory instead of on disk. To do this, set a SmartApifyStorageClient with your chosen sub-clients in the service locator before entering the Actor context (or awaiting Actor.init):
import asyncio

from crawlee import service_locator

from apify import Actor
from apify.storage_clients import MemoryStorageClient, SmartApifyStorageClient


async def main() -> None:
    # Keep all local data in memory instead of writing it to the filesystem
    # when running outside the Apify platform.
    local_storage_client = MemoryStorageClient()

    service_locator.set_storage_client(
        SmartApifyStorageClient(local_storage_client=local_storage_client),
    )

    async with Actor:
        store = await Actor.open_key_value_store()
        await store.set_value('example', {'hello': 'world'})


if __name__ == '__main__':
    asyncio.run(main())
The Actor's storage client must be a SmartApifyStorageClient. Setting a bare ApifyStorageClient or MemoryStorageClient directly in the service locator raises an error. Wrap it in a SmartApifyStorageClient as shown above.
Conclusion
This page has explained how the Actor selects a storage client, the clients available in the apify.storage_clients module, the difference between the single and shared request-queue clients, and how to customize the client through the service locator.
For a deeper look at how storage clients work and how to write your own, see the Crawlee storage clients guide.