State persistence
Persist a long-running Actor's state so that an unexpected restart does not wipe out its progress. See code examples that show how to keep a run from starting over when its server shuts down.
Long-running Actor jobs may need to migrate from one server to another. Unless you save your job's progress, it will be lost during the migration. The Actor will restart from scratch on the new server, which can be costly.
To avoid this, long-running Actors should save (persist) their state periodically and listen for migration events. When started, these Actors should check for persisted state, so they can continue where they left off.
For short-running Actors, the chance of a restart and the cost of repeated runs are low, so restarts can be ignored.
What is a migration?
A migration is when a process running on one server has to stop and move to another server. All in-progress processes on the current server are stopped. Unless you have saved your state, the Actor run will restart from scratch on the new server. For example, if a request in your request queue has not been marked as crawled before the migration, it will be crawled again.
When a migration event occurs, you only have a few seconds to save your work.
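The effect of saving progress can be illustrated with a small sketch that does not use the Apify SDK. Everything in it is an assumption made for illustration: the KV_STORE dict stands in for a remote key-value store, crawl is a hypothetical function, and the stop_after parameter simulates the server going away partway through the run.

```python
# Stand-in for a remote key-value store. A plain dict is an assumption made
# so that the sketch runs without the Apify SDK.
KV_STORE = {}


def crawl(urls, stop_after=None):
    """Process URLs, checkpointing after each one.

    Returns the list of URLs processed during this call only.
    """
    # On start, check for state persisted by an earlier (interrupted) run.
    state = KV_STORE.get('my-crawling-state') or {'handled': []}
    handled = list(state['handled'])
    processed_now = []

    for url in urls:
        if url in handled:
            continue  # already crawled before the interruption
        if stop_after is not None and len(processed_now) >= stop_after:
            break  # simulate the server shutting down mid-run
        processed_now.append(url)  # the real work would happen here
        handled.append(url)
        # Persist progress so a restarted run can skip this URL.
        KV_STORE['my-crawling-state'] = {'handled': list(handled)}

    return processed_now


urls = ['https://example.com/a', 'https://example.com/b', 'https://example.com/c']
first_run = crawl(urls, stop_after=2)  # interrupted after two URLs
second_run = crawl(urls)               # the restarted run picks up the rest
```

Because the handled list is written to the store after every URL, the second call only processes the URL that the first call never reached. If the checkpoint line were removed, the second call would crawl all three URLs again.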
Why do migrations happen?
- To optimize server workloads.
- When a server crashes (unlikely).
- When we release new features and fix bugs.
How often do migrations occur?
There is no specified interval at which migrations happen. They are caused by the above events, so they can happen at any time.
Why is state lost during migration?
Unless instructed to save its output or state to storage, an Actor keeps them in the server's memory. When the run switches servers, it loses access to the previous server's memory. Data saved only on the previous server's disk would be lost in the same way.
How to persist state
The Apify SDK (SDK for JavaScript, SDK for Python) persists its state automatically, using the migrating and persistState events. The persistState event notifies SDK components to persist their state at regular intervals in case a migration happens. The migrating event is emitted just before a migration.
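How the two events complement each other can be sketched without the SDK. The block below is a minimal illustration, not the SDK's implementation: TinyEvents, save_state, periodic_persist, the dict used as a store, and the 0.01-second interval are all hypothetical stand-ins. Only the event names migrating and persistState come from the text above.

```python
import asyncio


class TinyEvents:
    """Hypothetical minimal event emitter standing in for the SDK's event system."""

    def __init__(self):
        self._handlers = {}

    def on(self, name, handler):
        self._handlers.setdefault(name, []).append(handler)

    async def emit(self, name):
        for handler in self._handlers.get(name, []):
            await handler()


async def main():
    events = TinyEvents()
    store = {}                 # stand-in for a key-value store
    state = {'pages_done': 0}  # in-memory progress that would be lost on migration

    async def save_state():
        # Copy the state so later in-memory changes do not alter the snapshot.
        store['my-crawling-state'] = dict(state)

    # The same handler serves both the periodic and the last-moment save.
    events.on('persistState', save_state)
    events.on('migrating', save_state)

    async def periodic_persist(interval):
        # Emit persistState at regular intervals until cancelled.
        while True:
            await asyncio.sleep(interval)
            await events.emit('persistState')

    ticker = asyncio.create_task(periodic_persist(0.01))

    for _ in range(5):
        state['pages_done'] += 1  # simulated unit of work
        await asyncio.sleep(0.01)

    # Simulated migration notice: one final save just before the move.
    await events.emit('migrating')

    ticker.cancel()
    try:
        await ticker
    except asyncio.CancelledError:
        pass
    return store


store = asyncio.run(main())
```

The periodic saves limit how much work is lost if the process dies without warning, while the save on migrating captures the most recent progress when a warning does arrive.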
Code examples
To persist state manually, you can use the Actor.on method in the Apify SDK.
JavaScript:

import { Actor } from 'apify';

await Actor.init();
// ...
// Save the current state when the platform announces a migration.
Actor.on('migrating', async () => {
    await Actor.setValue('my-crawling-state', {
        foo: 'bar',
    });
});
// ...
await Actor.exit();

Python:

from apify import Actor


async def actor_migrate():
    # Save the current state when the platform announces a migration.
    await Actor.set_value('my-crawling-state', {'foo': 'bar'})


async def main():
    async with Actor:
        # ...
        Actor.on('migrating', actor_migrate)
        # ...
To check for state saved in a previous run, use:
JavaScript:

import { Actor } from 'apify';

await Actor.init();
// ...
// Fall back to an empty object when no state was saved.
const previousCrawlingState = await Actor.getValue('my-crawling-state') || {};
// ...
await Actor.exit();

Python:

from apify import Actor


async def main():
    async with Actor:
        # ...
        # Fall back to an empty dict when no state was saved.
        previous_crawling_state = await Actor.get_value('my-crawling-state') or {}
        # ...
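Once the stored value has been read, it usually needs to be combined with sensible defaults, because the first run finds nothing in the store and an older run may have saved only some of the fields. The following is a minimal sketch of that step in plain Python. DEFAULT_STATE, its field names, and restore_state are hypothetical, and the sketch assumes that a missing key is returned as None.

```python
import copy

# Hypothetical default state for a fresh run.
DEFAULT_STATE = {'next_page': 1, 'items_saved': 0}


def restore_state(stored):
    """Build a usable state dict from whatever the store returned."""
    # Start from a copy so the defaults themselves are never mutated.
    state = copy.deepcopy(DEFAULT_STATE)
    if isinstance(stored, dict):
        # Values persisted by the previous run override the defaults.
        state.update(stored)
    return state


fresh = restore_state(None)                # first run: nothing persisted yet
resumed = restore_state({'next_page': 7})  # restarted run: partial state found
```

In an Actor, the argument would be the value read from the key-value store under the my-crawling-state key.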
To improve your Actor's performance, you can also cache repeated page data.