
Running

Documentation of Apify actors - serverless computing jobs that enable execution of long-running web scraping and automation tasks in the cloud.

An Apify actor can be invoked in a number of ways. One option is to start the actor from the Developer console in the app:

Apify developer console

The actor's owner can specify its default settings in the actor's Settings tab. If the caller does not specify a particular setting in either the Input or the Options tab, the default value is used.

The following table describes the default actor settings:

Setting Description
Build Tag or number of the build to run (e.g. latest or 1.2.34).
Timeout Timeout for the actor run in seconds. Zero value means there is no timeout.
Memory Amount of memory allocated for the actor run, in megabytes.
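
The fallback to default settings can be sketched as a small merge function. This is only an illustration: the property names build, timeoutSecs and memoryMbytes and the default values are assumptions chosen to mirror the table, not the platform's internal representation.

```javascript
// Hypothetical default settings, as an actor owner might configure them.
const DEFAULT_SETTINGS = { build: 'latest', timeoutSecs: 0, memoryMbytes: 1024 };

// Returns the effective settings for a run: values given by the caller win,
// anything left undefined falls back to the actor's default.
function resolveRunSettings(defaults, callerOptions = {}) {
    const resolved = { ...defaults };
    for (const [key, value] of Object.entries(callerOptions)) {
        if (value !== undefined) resolved[key] = value;
    }
    return resolved;
}

// The caller overrides only the memory; build and timeout keep their defaults.
console.log(resolveRunSettings(DEFAULT_SETTINGS, { memoryMbytes: 2048 }));
```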

The actor can also be invoked using the Apify API by sending an HTTP POST request to the Run actor API endpoint, such as:

https://api.apify.com/v2/acts/apify~hello-world/runs?token=<YOUR_API_TOKEN>

The actor's input and its content type can be passed as a payload of the POST request and additional options can be specified using URL query parameters. For more details, see the Run actor section in the API reference.
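
As a rough sketch, such a request could be assembled as follows. The helper buildRunRequest is hypothetical, the JSON content type is just one possible choice, and the memory query parameter is an assumption; check the API reference for the parameters the endpoint actually accepts.

```javascript
// Builds the URL and fetch() options for starting an actor run.
// `options` become URL query parameters; `input` becomes the POST payload.
function buildRunRequest(actorId, token, input, options = {}) {
    // The endpoint URL uses "~" instead of "/" between the user name and the actor name.
    const id = actorId.replace('/', '~');
    const url = new URL(`https://api.apify.com/v2/acts/${id}/runs`);
    url.searchParams.set('token', token);
    for (const [name, value] of Object.entries(options)) {
        url.searchParams.set(name, String(value));
    }
    return {
        url: url.toString(),
        init: {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify(input),
        },
    };
}

// 'MY_API_TOKEN' is a placeholder; `memory` is an assumed query parameter name.
const { url, init } = buildRunRequest('apify/hello-world', 'MY_API_TOKEN', { message: 'Hello!' }, { memory: 1024 });
console.log(url, init.method);
// To actually send the request (Node.js 18+): const response = await fetch(url, init);
```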

Actors can also be invoked programmatically from other actors using the call() function provided by the apify NPM package. For example:

const Apify = require('apify');

Apify.main(async () => {
    const run = await Apify.call('apify/hello-world', {
        message: 'Hello!',
    });
    console.dir(run.output);
});

The newly started actor runs under the same user account as the initial actor and therefore all resources consumed are charged to the same user account. This allows more complex actors to be built using simpler actors built and owned by other users.

Internally, the call() function takes the user's API token from the APIFY_TOKEN environment variable, then it invokes the Run actor API endpoint, waits for the actor to finish and reads its output using the Get record API endpoint.
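
The sequence described above (start the run, wait for it to finish, read the output) can be sketched as follows. This is not the actual implementation of call(): the client object and its startRun, getRun and getRecord methods are hypothetical stand-ins for the API endpoints, and polling, the defaultKeyValueStoreId property and the OUTPUT record key are assumptions made for the illustration.

```javascript
// Statuses after which a run no longer changes (see the Lifecycle section).
const TERMINAL_STATUSES = ['SUCCEEDED', 'FAILED', 'TIMED-OUT', 'ABORTED'];

// `client` is a hypothetical object wrapping the relevant API endpoints.
async function callActor(client, actorId, input, pollIntervalMillis = 1000) {
    // The API token is taken from the environment, as described above.
    const token = process.env.APIFY_TOKEN;
    let run = await client.startRun(actorId, input, token);
    // Poll until the run reaches a terminal status.
    while (!TERMINAL_STATUSES.includes(run.status)) {
        await new Promise((resolve) => setTimeout(resolve, pollIntervalMillis));
        run = await client.getRun(run.id, token);
    }
    // Read the output record (assumed key: 'OUTPUT') from the run's key-value store.
    const output = await client.getRecord(run.defaultKeyValueStoreId, 'OUTPUT', token);
    return { ...run, output };
}
```

Passing the client in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try out with a fake client.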

Resource limits

Actors run inside a Docker container whose resources are limited. When invoking the actor, the caller has to specify the amount of memory allocated for the actor. Additionally, each user has a certain total limit of memory for running actors. The sum of memory allocated for all running actors and builds needs to fit into this limit, otherwise the user cannot start a new actor. For more details, see Limits.
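
The memory check described above amounts to a simple sum. The following sketch assumes a hypothetical list of active jobs and a made-up user limit of 8192 MB; the real limit depends on the user's account.

```javascript
// Returns true if a new run requesting `requestedMbytes` fits into the user's limit.
// `activeJobs` lists the memory (in MB) of every run and build currently in progress.
function canStartRun(activeJobs, requestedMbytes, userLimitMbytes) {
    const usedMbytes = activeJobs.reduce((sum, job) => sum + job.memoryMbytes, 0);
    return usedMbytes + requestedMbytes <= userLimitMbytes;
}

const activeJobs = [{ memoryMbytes: 4096 }, { memoryMbytes: 2048 }];
console.log(canStartRun(activeJobs, 1024, 8192)); // true: 6144 + 1024 <= 8192
console.log(canStartRun(activeJobs, 4096, 8192)); // false: 6144 + 4096 > 8192
```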

The share of CPU is computed automatically from the memory as follows: for each 4096 MB of memory, the actor gets 1 full CPU core. For other amounts of memory the number of CPU cores is computed fractionally. For example, an actor with 1024 MB of memory will have a 1/4 share of a CPU core.

The actor has hard disk space limited by twice the amount of memory. For example, an actor with 1024 MB of memory will have 2048 MB of disk available.
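
The two rules above can be written out as a short calculation. The function name computeResources is only illustrative.

```javascript
// One full CPU core is granted for each 4096 MB of memory.
const MBYTES_PER_CPU_CORE = 4096;

// Derives the CPU share and the disk space from the memory allocated to a run.
function computeResources(memoryMbytes) {
    return {
        cpuCores: memoryMbytes / MBYTES_PER_CPU_CORE,
        diskMbytes: memoryMbytes * 2,
    };
}

console.log(computeResources(1024)); // { cpuCores: 0.25, diskMbytes: 2048 }
console.log(computeResources(8192)); // { cpuCores: 2, diskMbytes: 16384 }
```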

Lifecycle

Each run starts with the initial status READY and goes through one or more transitional statuses to one of the terminal statuses.

Status Type Description
READY initial Started but not allocated to any worker yet
RUNNING transitional Executing on a worker
SUCCEEDED terminal Finished successfully
FAILED terminal Run failed
TIMING-OUT transitional Timing out now
TIMED-OUT terminal Timed out
ABORTING transitional Being aborted by user
ABORTED terminal Aborted by user
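
When working with run statuses in code, the table above can be mirrored in a small lookup. The helper isTerminal is a hypothetical convenience, not part of the apify package.

```javascript
// Status types copied from the table above.
const RUN_STATUS_TYPES = new Map([
    ['READY', 'initial'],
    ['RUNNING', 'transitional'],
    ['SUCCEEDED', 'terminal'],
    ['FAILED', 'terminal'],
    ['TIMING-OUT', 'transitional'],
    ['TIMED-OUT', 'terminal'],
    ['ABORTING', 'transitional'],
    ['ABORTED', 'terminal'],
]);

// Returns true if a run with the given status will not change its status any more.
function isTerminal(status) {
    if (!RUN_STATUS_TYPES.has(status)) throw new Error(`Unknown run status: ${status}`);
    return RUN_STATUS_TYPES.get(status) === 'terminal';
}

console.log(isTerminal('RUNNING')); // false
console.log(isTerminal('TIMED-OUT')); // true
```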

Resurrection of finished run

Any actor run in a terminal state, i.e. a run with status SUCCEEDED, FAILED, ABORTED or TIMED-OUT, can be resurrected back to the RUNNING state. This is helpful in many cases, for example when the timeout for the actor run was too low or in case of an unexpected error.

The whole process of resurrection looks as follows:

  • The run status will be updated to RUNNING and its container will be restarted with the same storages (the same behaviour as when the run gets migrated to a new server).
  • The existing run log will be discarded. If you need to back it up, download it before you resurrect the run.
  • The updated duration will include the time when the actor was not running. This does not affect compute units consumption.
  • Timeout will be counted from the point when this actor run was resurrected.

Resurrection can be performed in the Apify app using the resurrect button or via the API using the resurrect run API endpoint.
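
The eligibility and timeout rules above can be illustrated with a minimal sketch. It uses the terminal statuses from the lifecycle table; the helper names and the shape of the run object are assumptions made for the example.

```javascript
// Only runs in a terminal status can be resurrected.
const RESURRECTABLE_STATUSES = ['SUCCEEDED', 'FAILED', 'ABORTED', 'TIMED-OUT'];

function canResurrect(run) {
    return RESURRECTABLE_STATUSES.includes(run.status);
}

// The timeout is counted from the moment of resurrection, not from the original start.
// Returns null when the timeout is zero, i.e. when there is no timeout.
function timeoutDeadline(resurrectedAt, timeoutSecs) {
    if (timeoutSecs === 0) return null;
    return new Date(resurrectedAt.getTime() + timeoutSecs * 1000);
}

// A run that timed out earlier is resurrected at 11:00 UTC with a 300-second timeout.
const resurrectedAt = new Date('2024-01-01T11:00:00Z');
console.log(canResurrect({ status: 'TIMED-OUT' })); // true
console.log(timeoutDeadline(resurrectedAt, 300).toISOString()); // 2024-01-01T11:05:00.000Z
```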

Container web server

Each actor run is assigned a unique hard-to-guess URL (e.g. http://kmdo7wpzlshygi.runs.apify.net), which enables HTTP access to an optional web server running inside the actor run's Docker container. The URL is available in the following places:

  • In the web application, on the actor run details page as the Container URL field.
  • In the API as the containerUrl property of the Run object.
  • In the actor run's container as the APIFY_CONTAINER_URL environment variable.

The web server running inside the container must listen on the port defined by the APIFY_CONTAINER_PORT environment variable (typically 4321). If you want to use another port, simply define the APIFY_CONTAINER_PORT environment variable with the desired port number in your actor version configuration - see Custom environment variable for details.

The following example demonstrates how to start a simple web server in your actor:

const Apify = require('apify');
const express = require('express');

const app = express();
// The port the platform expects the web server to listen on.
const port = process.env.APIFY_CONTAINER_PORT;

// Respond to requests for the root path.
app.get('/', (req, res) => {
    res.send('Hello World!');
});

app.listen(port, () => {
    console.log(`Web server is listening and can be accessed at ${process.env.APIFY_CONTAINER_URL}!`);
});

Apify.main(async () => {
    // Let the actor run for an hour.
    await Apify.utils.sleep(60 * 60 * 1000);
});

Data retention

An actor run is deleted along with its default storages (key-value store, dataset, request queue) after a data retention period, whose length depends on the user's subscription plan.