Skip to main content
Version: 1.3

Apify

The following section describes all functions and properties provided by the apify package, except individual classes and namespaces that have their separate, detailed, documentation pages accessible from the left sidebar. To learn how Apify SDK works, we suggest following the Getting Started tutorial.

Important:

The following functions: addWebhook, call, callTask and newClient invoke features of the Apify platform and require your scripts to be authenticated. See the authentication guide for instructions.

Apify Class

As opposed to those helper functions, there is an alternative approach using Apify class (a named export). It has mostly the same API, but the methods on Apify instance will use the configuration provided in the constructor. Environment variables will have precedence over this configuration.

const { Apify } = require('apify'); // use named export to get the class

const sdk = new Apify({ token: '123' });
console.log(sdk.config.get('token')); // '123'

// the token will be passed to the `call` method automatically
const run = await sdk.call('apify/hello-world', { myInput: 123 });
console.log(`Received message: ${run.output.body.message}`);

Another example shows how the default dataset name can be changed:

const { Apify } = require('apify'); // use named export to get the class

const sdk = new Apify({ defaultDatasetId: 'custom-name' });
await sdk.pushData({ myValue: 123 });

is equivalent to:

const Apify = require('apify'); // use default export to get the helper functions

const dataset = await Apify.openDataset('custom-name');
await dataset.pushData({ myValue: 123 });

See Configuration for details about what can be configured and what are the default values.


Apify.addWebhook(options)

Creates an ad-hoc webhook for the current actor run, which lets you receive a notification when the actor run finished or failed. For more information about Apify actor webhooks, please see the documentation.

Note that webhooks are only supported for actors running on the Apify platform. In local environment, the function will print a warning and have no effect.

Parameters:

  • options: object
    • eventTypes: EventTypes - Array of event types, which you can set for actor run, see the actor run events in the Apify doc.
    • requestUrl: string - URL which will be requested using HTTP POST request, when actor run will reach the set event type.
    • [payloadTemplate]: string - Payload template is a JSON-like string that describes the structure of the webhook POST request payload. It uses JSON syntax, extended with a double curly braces syntax for injecting variables {{variable}}. Those variables are resolved at the time of the webhook's dispatch, and a list of available variables with their descriptions is available in the Apify webhook documentation. If payloadTemplate is omitted, the default payload template is used (view docs).
    • [idempotencyKey]: string - Idempotency key enables you to ensure that a webhook will not be added multiple times in case of an actor restart or other situation that would cause the addWebhook() function to be called again. We suggest using the actor run ID as the idempotency key. You can get the run ID by calling Apify.getEnv() function.

Returns:

Promise<(WebhookRun|undefined)> - The return value is the Webhook object. For more information, see the Get webhook API endpoint.


Apify.call(actId, [input], [options])

Runs an actor on the Apify platform using the current user account (determined by the APIFY_TOKEN environment variable), waits for the actor to finish and fetches its output.

By passing the waitSecs option you can reduce the maximum amount of time to wait for the run to finish. If the value is less than or equal to zero, the function returns immediately after the run is started.

The result of the function is an ActorRun object that contains details about the actor run and its output (if any). If the actor run fails, the function throws the ApifyCallError exception.

If you want to run an actor task rather than an actor, please use the Apify.callTask() function instead.

For more information about actors, read the documentation.

Example usage:

const run = await Apify.call('apify/hello-world', { myInput: 123 });
console.log(`Received message: ${run.output.body.message}`);

Internally, the call() function invokes the Run actor and several other API endpoints to obtain the output.

Throws:

  • ApifyCallError If the run did not succeed, e.g. if it failed or timed out.

Parameters:

  • actId: string - Allowed formats are username/actor-name, userId/actor-name or actor ID.
  • [input]: Object<string, *> - Input for the actor. If it is an object, it will be stringified to JSON and its content type set to application/json; charset=utf-8. Otherwise the options.contentType parameter must be provided.
  • [options]: object - Object with the settings below:
    • [contentType]: string - Content type for the input. If not specified, input is expected to be an object that will be stringified to JSON and content type set to application/json; charset=utf-8. If options.contentType is specified, then input must be a String or Buffer.
    • [token]: string - User API token that is used to run the actor. By default, it is taken from the APIFY_TOKEN environment variable.
    • [memoryMbytes]: number - Memory in megabytes which will be allocated for the new actor run. If not provided, the run uses memory of the default actor run configuration.
    • [timeoutSecs]: number - Timeout for the actor run in seconds. Zero value means there is no timeout. If not provided, the run uses timeout of the default actor run configuration.
    • [build]: string - Tag or number of the actor build to run (e.g. beta or 1.2.345). If not provided, the run uses build tag or number from the default actor run configuration (typically latest).
    • [waitSecs]: number - Maximum time to wait for the actor run to finish, in seconds. If the limit is reached, the returned promise is resolved to a run object that will have status READY or RUNNING and it will not contain the actor run output. If waitSecs is null or undefined, the function waits for the actor to finish (default behavior).
    • [fetchOutput]: boolean = true - If false then the function does not fetch output of the actor.
    • [disableBodyParser]: boolean = false - If true then the function will not attempt to parse the actor's output and will return it in a raw Buffer.
    • [webhooks]: Array<AdhocWebhook> - Specifies optional webhooks associated with the actor run, which can be used to receive a notification e.g. when the actor finished or failed, see ad hook webhooks documentation for detailed description.

Returns:

Promise<ActorRun>


Apify.callTask(taskId, [input], [options])

Runs an actor task on the Apify platform using the current user account (determined by the APIFY_TOKEN environment variable), waits for the task to finish and fetches its output.

By passing the waitSecs option you can reduce the maximum amount of time to wait for the run to finish. If the value is less than or equal to zero, the function returns immediately after the run is started.

The result of the function is an ActorRun object that contains details about the actor run and its output (if any). If the actor run failed, the function fails with ApifyCallError exception.

Note that an actor task is a saved input configuration and options for an actor. If you want to run an actor directly rather than an actor task, please use the Apify.call() function instead.

For more information about actor tasks, read the documentation.

Example usage:

const run = await Apify.callTask('bob/some-task');
console.log(`Received message: ${run.output.body.message}`);

Internally, the callTask() function calls the Run task and several other API endpoints to obtain the output.

Throws:

  • ApifyCallError If the run did not succeed, e.g. if it failed or timed out.

Parameters:

  • taskId: string - Allowed formats are username/task-name, userId/task-name or task ID.
  • [input]: Object<string, *> - Input overrides for the actor task. If it is an object, it will be stringified to JSON and its content type set to application/json; charset=utf-8. Provided input will be merged with actor task input.
  • [options]: object - Object with the settings below:
    • [token]: string - User API token that is used to run the actor. By default, it is taken from the APIFY_TOKEN environment variable.
    • [memoryMbytes]: number - Memory in megabytes which will be allocated for the new actor task run. If not provided, the run uses memory of the default actor run configuration.
    • [timeoutSecs]: number - Timeout for the actor task run in seconds. Zero value means there is no timeout. If not provided, the run uses timeout of the default actor run configuration.
    • [build]: string - Tag or number of the actor build to run (e.g. beta or 1.2.345). If not provided, the run uses build tag or number from the default actor run configuration (typically latest).
    • [waitSecs]: number - Maximum time to wait for the actor task run to finish, in seconds. If the limit is reached, the returned promise is resolved to a run object that will have status READY or RUNNING and it will not contain the actor run output. If waitSecs is null or undefined, the function waits for the actor task to finish (default behavior).
    • [webhooks]: Array<AdhocWebhook> - Specifies optional webhooks associated with the actor run, which can be used to receive a notification e.g. when the actor finished or failed, see ad hook webhooks documentation for detailed description.

Returns:

Promise<ActorRun>


Apify.createProxyConfiguration([proxyConfigurationOptions])

Creates a proxy configuration and returns a promise resolving to an instance of the ProxyConfiguration class that is already initialized.

Configures connection to a proxy server with the provided options. Proxy servers are used to prevent target websites from blocking your crawlers based on IP address rate limits or blacklists. Setting proxy configuration in your crawlers automatically configures them to use the selected proxies for all connections.

For more details and code examples, see the ProxyConfiguration class.


// Returns initialized proxy configuration class
const proxyConfiguration = await Apify.createProxyConfiguration({
groups: ['GROUP1', 'GROUP2'] // List of Apify proxy groups
countryCode: 'US'
});

const crawler = new Apify.CheerioCrawler({
// ...
proxyConfiguration,
handlePageFunction: ({ proxyInfo }) => {
const usedProxyUrl = proxyInfo.url; // Getting the proxy URL
}
})

For compatibility with existing Actor Input UI (Input Schema), this function returns undefined when the following object is passed as proxyConfigurationOptions.

{ useApifyProxy: false }

Parameters:

Returns:

Promise<(ProxyConfiguration|undefined)>


Apify.events

Gets an instance of a Node.js' EventEmitter class that emits various events from the SDK or the Apify platform. The event emitter is initialized by calling the Apify.main() function.

Example usage:

Apify.events.on('cpuInfo', data => {
if (data.isCpuOverloaded) console.log('Oh no, the CPU is overloaded!');
});

The following events are emitted:

  • cpuInfo: { "isCpuOverloaded": Boolean } The event is emitted approximately every second and it indicates whether the actor is using the maximum of available CPU resources. If that's the case, the actor should not add more workload. For example, this event is used by the AutoscaledPool class.
  • migrating: void Emitted when the actor running on the Apify platform is going to be migrated to another worker server soon. You can use it to persist the state of the actor and gracefully stop your in-progress tasks, so that they are not interrupted by the migration. For example, this is used by the RequestList class.
  • persistState: { "isMigrating": Boolean } Emitted in regular intervals (by default 60 seconds) to notify all components of Apify SDK that it is time to persist their state, in order to avoid repeating all work when the actor restarts. This event is automatically emitted together with the migrating event, in which case the isMigrating flag is set to true. Otherwise the flag is false. Note that the persistState event is provided merely for user convenience, you can achieve the same effect using setInterval() and listening for the migrating event.

Apify.getEnv()

Returns a new ApifyEnv object which contains information parsed from all the APIFY_XXX environment variables.

For the list of the APIFY_XXX environment variables, see Actor documentation. If some of the variables are not defined or are invalid, the corresponding value in the resulting object will be null.

Returns:

ApifyEnv


Apify.getInput()

Gets the actor input value from the default KeyValueStore associated with the current actor run.

This is just a convenient shortcut for keyValueStore.getValue('INPUT'). For example, calling the following code:

const input = await Apify.getInput();

is equivalent to:

const store = await Apify.openKeyValueStore();
await store.getValue('INPUT');

Note that the getInput() function does not cache the value read from the key-value store. If you need to use the input multiple times in your actor, it is far more efficient to read it once and store it locally.

For more information, see Apify.openKeyValueStore() and KeyValueStore.getValue().

Returns:

Promise<(Object<string, *>|string|Buffer|null)> - Returns a promise that resolves to an object, string or Buffer, depending on the MIME content type of the record, or null if the record is missing.


Apify.getMemoryInfo()

Returns memory statistics of the process and the system, see MemoryInfo.

If the process runs inside of Docker, the getMemoryInfo gets container memory limits, otherwise it gets system memory limits.

Beware that the function is quite inefficient because it spawns a new process. Therefore you shouldn't call it too often, like more than once per second.

Returns:

Promise<MemoryInfo>


Apify.getValue(key)

Gets a value from the default KeyValueStore associated with the current actor run.

This is just a convenient shortcut for KeyValueStore.getValue(). For example, calling the following code:

const value = await Apify.getValue('my-key');

is equivalent to:

const store = await Apify.openKeyValueStore();
const value = await store.getValue('my-key');

To store the value to the default key-value store, you can use the Apify.setValue() function.

For more information, see Apify.openKeyValueStore() and KeyValueStore.getValue().

Parameters:

  • key: string - Unique record key.

Returns:

Promise<(Object<string, *>|string|Buffer|null)> - Returns a promise that resolves to an object, string or Buffer, depending on the MIME content type of the record, or null if the record is missing.


Apify.isAtHome()

Returns true when code is running on Apify platform and false otherwise (for example locally).

Returns:

boolean


Apify.launchPlaywright([launchContext])

Launches headless browsers using Playwright pre-configured to work within the Apify platform. The function has the same return value as browserType.launch(). See Playwright documentation for more details.

The launchPlaywright() function alters the following Playwright options:

  • Passes the setting from the APIFY_HEADLESS environment variable to the headless option, unless it was already defined by the caller or APIFY_XVFB environment variable is set to 1. Note that Apify Actor cloud platform automatically sets APIFY_HEADLESS=1 to all running actors.
  • Takes the proxyUrl option, validates it and adds it to launchOptions in a proper format. The proxy URL must define a port number and have one of the following schemes: http://, https://, socks4:// or socks5://. If the proxy is HTTP (i.e. has the http:// scheme) and contains username or password, the launchPlaywright functions sets up an anonymous proxy HTTP to make the proxy work with headless Chrome. For more information, read the blog post about proxy-chain library.

To use this function, you need to have the Playwright NPM package installed in your project. When running on the Apify Platform, you can achieve that simply by using the apify/actor-node-playwright-* base Docker image for your actor - see Apify Actor documentation for details.

Parameters:

Returns:

Promise<*> - Promise that resolves to Playwright's Browser instance.


Apify.launchPuppeteer([launchContext])

Launches headless Chrome using Puppeteer pre-configured to work within the Apify platform. The function has the same argument and the return value as puppeteer.launch(). See Puppeteer documentation for more details.

The launchPuppeteer() function alters the following Puppeteer options:

  • Passes the setting from the APIFY_HEADLESS environment variable to the headless option, unless it was already defined by the caller or APIFY_XVFB environment variable is set to 1. Note that Apify Actor cloud platform automatically sets APIFY_HEADLESS=1 to all running actors.
  • Takes the proxyUrl option, validates it and adds it to args as --proxy-server=XXX. The proxy URL must define a port number and have one of the following schemes: http://, https://, socks4:// or socks5://. If the proxy is HTTP (i.e. has the http:// scheme) and contains username or password, the launchPuppeteer functions sets up an anonymous proxy HTTP to make the proxy work with headless Chrome. For more information, read the blog post about proxy-chain library.

To use this function, you need to have the puppeteer NPM package installed in your project. When running on the Apify cloud, you can achieve that simply by using the apify/actor-node-chrome base Docker image for your actor - see Apify Actor documentation for details.

For an example of usage, see the Synchronous run Example or the Puppeteer proxy Example

Parameters:

  • [launchContext]: PuppeteerLaunchContext - All PuppeteerLauncher parameters are passed via an launchContext object. If you want to pass custom puppeteer.launch(options) options you can use the PuppeteerLaunchContext.launchOptions property.

Returns:

Promise<*> - Promise that resolves to Puppeteer's Browser instance.


Apify.main(userFunc)

Runs the main user function that performs the job of the actor and terminates the process when the user function finishes.

The Apify.main() function is optional and is provided merely for your convenience. It is mainly useful when you're running your code as an actor on the Apify platform. However, if you want to use Apify SDK tools directly inside your existing projects, e.g. running in an Express server, on Google Cloud functions or AWS Lambda, it's better to avoid it since the function terminates the main process when it finishes!

The Apify.main() function performs the following actions:

  • When running on the Apify platform (i.e. APIFY_IS_AT_HOME environment variable is set), it sets up a connection to listen for platform events. For example, to get a notification about an imminent migration to another server. See Apify.events for details.
  • It checks that either APIFY_TOKEN or APIFY_LOCAL_STORAGE_DIR environment variable is defined. If not, the functions sets APIFY_LOCAL_STORAGE_DIR to ./apify_storage inside the current working directory. This is to simplify running code examples.
  • It invokes the user function passed as the userFunc parameter.
  • If the user function returned a promise, waits for it to resolve.
  • If the user function throws an exception or some other error is encountered, prints error details to console so that they are stored to the log.
  • Exits the Node.js process, with zero exit code on success and non-zero on errors.

The user function can be synchronous:

Apify.main(() => {
// My synchronous function that returns immediately
console.log('Hello world from actor!');
});

If the user function returns a promise, it is considered asynchronous:

const { requestAsBrowser } = require('some-request-library');

Apify.main(() => {
// My asynchronous function that returns a promise
return request('http://www.example.com').then(html => {
console.log(html);
});
});

To simplify your code, you can take advantage of the async/await keywords:

const request = require('some-request-library');

Apify.main(async () => {
// My asynchronous function
const html = await request('http://www.example.com');
console.log(html);
});

Parameters:

  • userFunc: UserFunc - User function to be executed. If it returns a promise, the promise will be awaited. The user function is called with no arguments.

Apify.metamorph(targetActorId, [input], [options])

Transforms this actor run to an actor run of a given actor. The system stops the current container and starts the new container instead. All the default storages are preserved and the new input is stored under the INPUT-METAMORPH-1 key in the same default key-value store.

Parameters:

  • targetActorId: string - Either username/actor-name or actor ID of an actor to which we want to metamorph.
  • [input]: Object<string, *> - Input for the actor. If it is an object, it will be stringified to JSON and its content type set to application/json; charset=utf-8. Otherwise the options.contentType parameter must be provided.
  • [options]: object - Object with the settings below:
    • [contentType]: string - Content type for the input. If not specified, input is expected to be an object that will be stringified to JSON and content type set to application/json; charset=utf-8. If options.contentType is specified, then input must be a String or Buffer.
    • [build]: string - Tag or number of the target actor build to metamorph into (e.g. beta or 1.2.345). If not provided, the run uses build tag or number from the default actor run configuration (typically latest).

Returns:

Promise<void>


Apify.newClient([options])

Returns a new instance of the Apify API client. The ApifyClient class is provided by the apify-client NPM package, and it is automatically configured using the APIFY_API_BASE_URL, and APIFY_TOKEN environment variables. You can override the token via the available options. That's useful if you want to use the client as a different Apify user than the SDK internals are using.

Parameters:

  • [options]: object
    • [token]: string
    • [maxRetries]: string
    • [minDelayBetweenRetriesMillis]: string

Returns:

ApifyClient


Apify.openDataset([datasetIdOrName], [options])

Opens a dataset and returns a promise resolving to an instance of the Dataset class.

Datasets are used to store structured data where each object stored has the same attributes, such as online store products or real estate offers. The actual data is stored either on the local filesystem or in the cloud.

For more details and code examples, see the Dataset class.

Parameters:

  • [datasetIdOrName]: string - ID or name of the dataset to be opened. If null or undefined, the function returns the default dataset associated with the actor run.
  • [options]: Object
    • [forceCloud]: boolean = false - If set to true then the function uses cloud storage usage even if the APIFY_LOCAL_STORAGE_DIR environment variable is set. This way it is possible to combine local and cloud storage.

Returns:

Promise<Dataset>


Apify.openKeyValueStore([storeIdOrName], [options])

Opens a key-value store and returns a promise resolving to an instance of the KeyValueStore class.

Key-value stores are used to store records or files, along with their MIME content type. The records are stored and retrieved using a unique key. The actual data is stored either on a local filesystem or in the Apify cloud.

For more details and code examples, see the KeyValueStore class.

Parameters:

  • [storeIdOrName]: string - ID or name of the key-value store to be opened. If null or undefined, the function returns the default key-value store associated with the actor run.
  • [options]: object
    • [forceCloud]: boolean = false - If set to true then the function uses cloud storage usage even if the APIFY_LOCAL_STORAGE_DIR environment variable is set. This way it is possible to combine local and cloud storage.

Returns:

Promise<KeyValueStore>


Apify.openRequestList(listName, sources, [options])

Opens a request list and returns a promise resolving to an instance of the RequestList class that is already initialized.

RequestList represents a list of URLs to crawl, which is always stored in memory. To enable picking up where left off after a process restart, the request list sources are persisted to the key-value store at initialization of the list. Then, while crawling, a small state object is regularly persisted to keep track of the crawling status.

For more details and code examples, see the RequestList class.

Example usage:

const sources = ['https://www.example.com', 'https://www.google.com', 'https://www.bing.com'];

const requestList = await Apify.openRequestList('my-name', sources);

Parameters:

  • listName: string | null - Name of the request list to be opened. Setting a name enables the RequestList's state to be persisted in the key-value store. This is useful in case of a restart or migration. Since RequestList is only stored in memory, a restart or migration wipes it clean. Setting a name will enable the RequestList's state to survive those situations and continue where it left off.

    The name will be used as a prefix in key-value store, producing keys such as NAME-REQUEST_LIST_STATE and NAME-REQUEST_LIST_SOURCES.

    If null, the list will not be persisted and will only be stored in memory. Process restart will then cause the list to be crawled again from the beginning. We suggest always using a name.

  • sources: Array<(RequestOptions|Request|string)> - An array of sources of URLs for the RequestList. It can be either an array of strings, plain objects that define at least the url property, or an array of Request instances.

    IMPORTANT: The sources array will be consumed (left empty) after RequestList initializes. This is a measure to prevent memory leaks in situations when millions of sources are added.

Additionally, the requestsFromUrl property may be used instead of url, which will instruct RequestList to download the source URLs from a given remote location. The URLs will be parsed from the received response. In this case you can limit the URLs using regex parameter containing regular expression pattern for URLs to be included.

For details, see the RequestListOptions.sources

Returns:

Promise<RequestList>


Apify.openRequestQueue([queueIdOrName], [options])

Opens a request queue and returns a promise resolving to an instance of the RequestQueue class.

RequestQueue represents a queue of URLs to crawl, which is stored either on local filesystem or in the cloud. The queue is used for deep crawling of websites, where you start with several URLs and then recursively follow links to other pages. The data structure supports both breadth-first and depth-first crawling orders.

For more details and code examples, see the RequestQueue class.

Parameters:

  • [queueIdOrName]: string - ID or name of the request queue to be opened. If null or undefined, the function returns the default request queue associated with the actor run.
  • [options]: object
    • [forceCloud]: boolean = false - If set to true then the function uses cloud storage usage even if the APIFY_LOCAL_STORAGE_DIR environment variable is set. This way it is possible to combine local and cloud storage.

Returns:

Promise<RequestQueue>


Apify.openSessionPool(sessionPoolOptions)

Opens a SessionPool and returns a promise resolving to an instance of the SessionPool class that is already initialized.

For more details and code examples, see the SessionPool class.

Parameters:

Returns:

Promise<SessionPool>


Apify.pushData(item)

Stores an object or an array of objects to the default Dataset of the current actor run.

This is just a convenient shortcut for Dataset.pushData(). For example, calling the following code:

await Apify.pushData({ myValue: 123 });

is equivalent to:

const dataset = await Apify.openDataset();
await dataset.pushData({ myValue: 123 });

For more information, see Apify.openDataset() and Dataset.pushData()

IMPORTANT: Make sure to use the await keyword when calling pushData(), otherwise the actor process might finish before the data are stored!

Parameters:

  • item: object - Object or array of objects containing data to be stored in the default dataset. The objects must be serializable to JSON and the JSON representation of each object must be smaller than 9MB.

Returns:

Promise<void>


Apify.setValue(key, value, [options])

Stores or deletes a value in the default KeyValueStore associated with the current actor run.

This is just a convenient shortcut for KeyValueStore.setValue(). For example, calling the following code:

await Apify.setValue('OUTPUT', { foo: 'bar' });

is equivalent to:

const store = await Apify.openKeyValueStore();
await store.setValue('OUTPUT', { foo: 'bar' });

To get a value from the default key-value store, you can use the Apify.getValue() function.

For more information, see Apify.openKeyValueStore() and KeyValueStore.getValue().

Parameters:

  • key: string - Unique record key.
  • value: * - Record data, which can be one of the following values:
    • If null, the record in the key-value store is deleted.
    • If no options.contentType is specified, value can be any JavaScript object and it will be stringified to JSON.
    • If options.contentType is set, value is taken as is and it must be a String or Buffer. For any other value an error will be thrown.
  • [options]: object
    • [contentType]: string - Specifies a custom MIME content type of the record.

Returns:

Promise<void>