
Source code

Documentation of Apify actors - storing your actor's source code.

The Source type setting determines the location of the source code for the actor. It can have one of the following values: Single JavaScript file, Multiple source files, Git repository, Zip file or GitHub Gist.

Single JavaScript file

The source code of the actor can be hosted directly on Apify. All the code needs to be in a single file and written in JavaScript / Node.js. The version of Node.js is determined by the Base image setting - see base Docker images for the description of possible options.

The hosted source is especially useful for simple actors. The source code can require arbitrary NPM packages. For example:

const _ = require('underscore');
const request = require('request');

During the build process, the source code is scanned for occurrences of the require() function and the corresponding NPM dependencies are automatically added to the package.json file by running:

npm install underscore request --save --only=prod --no-optional
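
To make the scanning step more concrete, here is a minimal sketch of how require() calls could be collected from a source file. It is an illustration only, not the scanner the Apify build actually uses, and the findNpmDependencies function name is hypothetical. It also shows why Node.js built-in modules and relative paths should not end up in package.json.

```javascript
// Illustrative sketch only; the real Apify build scanner may work differently.
const builtins = new Set(require('module').builtinModules);

// Returns the NPM package names referenced by require() calls in `source`.
function findNpmDependencies(source) {
    const pattern = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
    const packages = new Set();
    for (const match of source.matchAll(pattern)) {
        const name = match[1];
        // Skip local files and Node.js built-in modules such as "fs".
        if (name.startsWith('.') || name.startsWith('/')) continue;
        if (name.startsWith('node:') || builtins.has(name)) continue;
        // Keep only the package part: "@scope/pkg/sub" -> "@scope/pkg", "pkg/sub" -> "pkg".
        const parts = name.split('/');
        packages.add(name.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0]);
    }
    return [...packages];
}

const source = `
const _ = require('underscore');
const request = require('request');
const fs = require('fs');
`;
console.log(findNpmDependencies(source)); // [ 'underscore', 'request' ]
```

A regular expression like this only finds require() calls with a literal string argument, so dynamically computed module names would be missed.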

Note that certain NPM packages need additional tools for their installation, such as a C compiler or Python interpreter. If these tools are not available in the base Docker image, the build will fail. If that happens, try changing the base image to Node.js 10 + Puppeteer on Debian, because it contains many more tools than the other images. Alternatively, you can switch to the multifile editor and create your own Docker image configuration.

Multiple source files

If the actor's source code requires the use of multiple files/directories, then it can be hosted on the Apify platform using this option. This is particularly useful when you need to add INPUT_SCHEMA.json or README.md to your actor, or if you want to create your actor in a language other than JavaScript.

The only file required by the Multiple source files option is the Dockerfile; all other files depend on your Dockerfile settings. By default, Apify's custom Node.js Dockerfile is used, which requires a main.js file containing your source code and a package.json file containing the package configuration for NPM.

Unlike with the Single JavaScript file option, package.json is not automatically generated when you use multiple source files, so you need to configure it yourself.

See Custom Dockerfile and base Docker images for more information about creating your own Dockerfile and using Apify's prepared base images.

Git repository

If the actor's source code is hosted externally in a Git repository, it can consist of multiple files and directories and use its own Dockerfile to control the build process (see Custom Dockerfile for details). The actor's description in the store is fetched from the README.md file. The location of the repository is specified by the Git URL setting, which can be an https, git or ssh URL.

To help you get started quickly, you can use the apify/quick-start actor which contains all the boilerplate necessary when creating a new actor hosted on Git. The source code is available on GitHub.

To specify a Git branch or tag to check out, add a URL fragment to the URL. For example, to check out the develop branch, specify a URL such as https://github.com/jancurn/act-analyse-pages.git#develop

Optionally, the second part of the fragment in the Git URL (separated by a colon) specifies the context directory for the Docker build. For example, https://github.com/jancurn/act-analyse-pages.git#develop:some/dir will check out the develop branch and set some/dir as a context directory for the Docker build.
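
The fragment format described above can be sketched as a small parser. This is only an illustration of the URL convention, not the platform's own code; the parseGitUrl function and the names of the returned fields are hypothetical.

```javascript
// Illustrative sketch of the "<repo-url>#<branch-or-tag>:<context-dir>" convention.
function parseGitUrl(gitUrl) {
    const hashIndex = gitUrl.indexOf('#');
    if (hashIndex === -1) {
        // No fragment: no explicit branch/tag and no context directory.
        return { repoUrl: gitUrl, ref: null, contextDir: null };
    }
    const repoUrl = gitUrl.slice(0, hashIndex);
    const fragment = gitUrl.slice(hashIndex + 1);
    // The part before the first colon is the branch or tag,
    // the optional part after it is the Docker build context directory.
    const colonIndex = fragment.indexOf(':');
    const ref = colonIndex === -1 ? fragment : fragment.slice(0, colonIndex);
    const contextDir = colonIndex === -1 ? '' : fragment.slice(colonIndex + 1);
    return { repoUrl, ref: ref || null, contextDir: contextDir || null };
}

console.log(parseGitUrl('https://github.com/jancurn/act-analyse-pages.git#develop:some/dir'));
// { repoUrl: 'https://github.com/jancurn/act-analyse-pages.git', ref: 'develop', contextDir: 'some/dir' }
```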

Note that you can easily set up an integration where the actor is automatically rebuilt on every commit to the Git repository. For more details, see GitHub integration.

Private repositories

If your source code is hosted in a private Git repository, you need to configure a deployment key. The deployment key is different for each actor and can be used only once at the Git hosting of your choice (GitHub, Bitbucket, GitLab, etc.).

To obtain the key, click the deployment key link under the Git URL text input and follow the instructions there.

Zip file

The source code for the actor can also be located in a Zip archive hosted on an external URL. This option enables integration with arbitrary source code or continuous integration systems. As with the Git repository option, the source code can consist of multiple files and directories and can contain a custom Dockerfile, and the actor description is taken from README.md.

GitHub Gist

Sometimes having a full Git repository or a hosted Zip file might be overly complicated for your small project, but you still want to have the source code in multiple files. In this case, you can simply put your source code into a GitHub Gist. For example:

https://gist.github.com/jancurn/2dbe83fea77c439b1119fb3f118513e7

Then set the Source type to GitHub Gist and paste the Gist URL as follows:

GitHub Gist settings

Note that the example actor is available in the Apify Store as apify/example-github-gist.

As with the Git repository option, the source code can consist of multiple files and directories and can contain a custom Dockerfile, and the actor description is taken from README.md.

Custom Dockerfile

Internally, Apify uses Docker to build and run actors. To control the build of the actor, you can create a custom Dockerfile in the root of the Git repository or Zip directory. Note that this option is not available for the Single JavaScript file option. If the Dockerfile is missing, the system uses the following default:

FROM apify/actor-node-basic

# Copy all files and directories from the directory
# to the Docker image
COPY . ./

# Install NPM packages, skip optional and development
# dependencies to keep the image small,
# avoid logging too much, and log the dependency tree
RUN npm install --quiet --only=prod --no-optional \
 && npm list

For more information about Dockerfile syntax and commands, see the Dockerfile reference.

Note that apify/actor-node-basic is a base Docker image provided by Apify. Other base images with additional features are also available. You can use arbitrary Docker images as the base for your actors, although using the Apify images has some performance advantages. See base Docker images for details.

By default, all Apify base Docker images start your Node.js application the same way as npm start does, i.e. by running the command specified in the package.json file under the scripts - start key. The default package.json file is similar to the following:

{
  "description": "Anonymous actor on the Apify platform",
  "version": "0.0.1",
  "license": "UNLICENSED",
  "main": "main.js",
  "scripts": {
    "start": "node main.js"
  },
  "dependencies": {
    "apify": ">=0.8.15",
    "apify-client": ">=0.3.0"
  },
  "repository": {}
}

This means that by default the system expects the source code to be in the main.js file. If you want to override this behavior, use a custom package.json and/or Dockerfile.

GitHub integration

If the source code of an actor is hosted in a Git repository, you can set up an integration so that the actor is automatically rebuilt on every push to the repository. To do that, you only need to set up a webhook in your Git source control system that invokes the Build actor API endpoint on every push to the Git repository.

For example, for repositories on GitHub it can be done using the following steps. First, go to the actor detail page, open the API tab and copy the Build actor API endpoint URL. It should look something like this:

https://api.apify.com/v2/acts/apify~hello-world/builds?token=<API_TOKEN>&version=0.1

Then go to your GitHub repository, click Settings, select the Webhooks tab and click Add webhook. Paste the API URL into the Payload URL field as follows:

GitHub integration

And that's it! Now your actor should automatically rebuild on every push to the GitHub repository.
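
If you prefer to assemble the webhook URL in a script rather than copy it from the API tab, the sketch below rebuilds the URL shown in the example above. The buildActorBuildUrl helper is hypothetical, and the token value is a placeholder; always check the result against the URL shown on the actor's API tab.

```javascript
// Illustrative sketch: composes a Build actor endpoint URL in the same
// shape as the example above. The helper name is hypothetical.
function buildActorBuildUrl(actorId, token, version) {
    const url = new URL(`https://api.apify.com/v2/acts/${actorId}/builds`);
    url.searchParams.set('token', token);
    url.searchParams.set('version', version);
    return url.toString();
}

console.log(buildActorBuildUrl('apify~hello-world', 'MY_API_TOKEN', '0.1'));
// https://api.apify.com/v2/acts/apify~hello-world/builds?token=MY_API_TOKEN&version=0.1
```

Because the URL contains your API token, treat it as a secret and do not commit it to the repository.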

Custom environment variables

The actor owner can specify custom environment variables that are set for the actor's process during the run. Sensitive environment variables such as passwords or API tokens can be protected by setting the Secret option. With this option enabled, the value of the environment variable is encrypted and is not visible in the app or APIs. The value is also redacted from actor logs to avoid accidental leakage of sensitive data.

Note that the custom environment variables are fixed during the build of the actor and cannot be changed later. See the Builds section for details.

To access environment variables in Node.js, use the process.env object, for example:

console.log(process.env.SMTP_HOST);
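
Since a missing variable simply yields undefined, it can help to read configuration through a small helper that falls back to a default or fails early. The following is a minimal sketch; the getEnv helper and the SMTP_PORT variable are hypothetical and used only for illustration.

```javascript
// Illustrative helper: returns the variable's value, a default if one is
// given, or throws when the variable is required but missing.
function getEnv(name, defaultValue) {
    const value = process.env[name];
    if (value !== undefined && value !== '') return value;
    if (defaultValue !== undefined) return defaultValue;
    throw new Error(`Environment variable ${name} is not set`);
}

// Environment variable values are always strings, so convert numbers explicitly.
const smtpHost = getEnv('SMTP_HOST', 'localhost');
const smtpPort = Number(getEnv('SMTP_PORT', '25')); // SMTP_PORT is a hypothetical variable
console.log(`Connecting to ${smtpHost}:${smtpPort}`);
```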

The actor runtime sets additional environment variables for the actor process during the run. See Environment variables for details.

Versioning

In order to enable active development, the actor can have multiple versions of the source code and associated settings, such as the Base image and Environment. Each version is denoted by a version number of the form MAJOR.MINOR; the version numbers should adhere to the Semantic Versioning logic.

For example, the actor can have a production version 1.1, a beta version 1.2 that contains new features but is still backwards compatible, and a development version 2.0 that contains breaking changes.
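
As an illustration of the MAJOR.MINOR convention, the sketch below parses version numbers, sorts them, and checks whether two versions share a major number. All function names are hypothetical, and the compatibility check only reflects the Semantic Versioning convention described above, not any check enforced by the platform.

```javascript
// Parses a "MAJOR.MINOR" string into numbers, e.g. "1.2" -> { major: 1, minor: 2 }.
function parseVersion(version) {
    const match = /^(\d+)\.(\d+)$/.exec(version);
    if (!match) throw new Error(`Invalid version number: ${version}`);
    return { major: Number(match[1]), minor: Number(match[2]) };
}

// Comparator for Array.prototype.sort; compares numerically, so "1.10" > "1.9".
function compareVersions(a, b) {
    const va = parseVersion(a);
    const vb = parseVersion(b);
    return va.major - vb.major || va.minor - vb.minor;
}

// By convention, a newer version with the same major number should not break users.
function isBackwardsCompatible(from, to) {
    const vFrom = parseVersion(from);
    const vTo = parseVersion(to);
    return vFrom.major === vTo.major && vTo.minor >= vFrom.minor;
}

console.log(['2.0', '1.1', '1.2'].sort(compareVersions)); // [ '1.1', '1.2', '2.0' ]
console.log(isBackwardsCompatible('1.1', '1.2')); // true
console.log(isBackwardsCompatible('1.2', '2.0')); // false
```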

The versions of the actors are built and run separately. For details, see Build and Running.

Local development

It is possible to develop actors locally on your computer and then only deploy them to the Apify cloud when they are ready. This is especially useful if you're using Git integration. See Git repository for more details. The boilerplate for creating an actor in a Git repository is available on GitHub.

To test the input and output of your actors on your local machine, you can define the APIFY_DEV_KEY_VALUE_STORE_DIR environment variable, which causes the apify NPM package to emulate the key-value store locally using files in a directory. For more details, please see the apify package documentation.

Unfortunately, not all features of the Apify platform can be emulated locally, so you might still need to let the apify NPM package use your API token to interact with the Apify platform. The simplest way to achieve that is by setting the APIFY_TOKEN environment variable on your local development machine.

Input schema

Actor source files may contain an input schema that defines the input the actor accepts and the UI components used for entering that input on the Apify platform. With an input schema, you can give the users of your actor an easy-to-use UI and also ensure that the input of your actor is valid.

For more information on this topic, see the input schema documentation on a separate page.

Metamorph

The metamorph operation transforms an actor run into a run of another actor with a new input. This feature is useful if you want to use another actor to finish the work of your current actor, instead of internally starting a new actor run and waiting for it to finish. With metamorph, you can easily create new actors on top of existing ones and give your users a nicer input structure and user interface for the final actor. For the users of your actors, the metamorph operation is completely transparent: they just see that your actor got the work done.

Internally, the system stops the Docker container corresponding to the actor run and starts a new container using a different Docker image. All the default storages are preserved and the new input is stored under the INPUT-METAMORPH-1 key in the same default key-value store.

To make your actor compatible with the metamorph operation, use Apify.getInput() instead of Apify.getValue('INPUT'). In a metamorphed run, this method fetches the input using the correct key, INPUT-METAMORPH-1.

For example, imagine you have an actor that accepts a hotel URL on input and then internally uses the apify/web-scraper actor to scrape all the hotel reviews. The metamorphing code would look as follows:

const Apify = require('apify');

Apify.main(async () => {
    // Get input of your actor.
    const { hotelUrl } = await Apify.getInput();

    // Create input for apify/web-scraper
    const newInput = {
        startUrls: [{ url: hotelUrl }],
        pageFunction: () => {
            // Here you pass the page function that
            // scrapes all the reviews ...
        },
        // ... and here would be all the additional
        // input parameters.
    };

    // Transform the actor run to apify/web-scraper
    // with the new input.
    await Apify.metamorph('apify/web-scraper', newInput);

    // The line here will never be reached, because the
    // actor run will be interrupted.
});