How to build Actors
At Apify, we try to make building web scraping and automation straightforward. You can customize our universal scrapers with a bit of JavaScript for quick tweaks, use our code templates for rapid project setup in JavaScript, TypeScript, or Python, or build from scratch with our JavaScript and Python SDKs or with the Crawlee libraries for Node.js and Python for ultimate flexibility and control. This guide offers a quick overview of our tools to help you find the right fit for your needs.
Three ways to build Actors
- Our universal scrapers — customize our boilerplate tools to your needs with a bit of JavaScript and setup.
- Our code templates for web scraping projects — for a quick project setup to save you development time (includes JavaScript, TypeScript, and Python templates).
- Open-source libraries and SDKs:
  - JavaScript SDK & Python SDK — for creating your own solution from scratch on the Apify platform using our free development kits. Involves more coding but offers infinite flexibility.
  - Crawlee and Crawlee for Python — for creating your own solutions from scratch using our free web automation libraries. Involves even more coding but offers infinite flexibility. There's also no need to host these on the platform.
Universal scrapers & what they're for
Universal scrapers were built to provide an intuitive UI and configuration options that help you start extracting data as quickly as possible. Usually, you just provide a simple JavaScript function, set one or two parameters, and you're good to go.
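For instance, a pageFunction for Web Scraper can be as small as the sketch below. This is an illustrative sketch, not a copy-paste recipe: the exact context properties available depend on your input settings, and jQuery is only present when injection is enabled in the input.

```javascript
// A minimal pageFunction sketch for Web Scraper. It runs inside the browser
// on every page the scraper opens. Assumes "Inject jQuery" is enabled in the
// Actor input, which exposes jQuery on the context.
async function pageFunction(context) {
    const { request, log, jQuery: $ } = context;

    const title = $('title').text().trim();
    log.info(`Scraping ${request.url}`);

    // The returned object is stored as one record in the run's dataset.
    return {
        url: request.url,
        title,
    };
}
```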
Since scraping and automation come in various forms, we decided to build not just one, but six scrapers. This way, you can always pick the right tool for the job. Let's take a look at each particular tool and its advantages and disadvantages.
Scraper | Technology | Advantages | Disadvantages | Best for |
---|---|---|---|---|
🌐 Web Scraper | Headless Chrome Browser | Simple, handles fully JavaScript-rendered pages | Executes only client-side JavaScript | Websites with heavy client-side JavaScript |
👐 Puppeteer Scraper | Headless Chrome Browser | Powerful Puppeteer functions, executes both server-side and client-side JavaScript | More complex | Advanced scraping with client/server-side JS |
🎭 Playwright Scraper | Cross-browser support with Playwright library | Cross-browser support, executes both server-side and client-side JavaScript | More complex | Cross-browser scraping with advanced features |
🍩 Cheerio Scraper | HTTP requests + Cheerio parser (jQuery-like for servers) | Simple, fast, cost-effective | Pages may not be fully rendered (lacks JavaScript rendering), executes only server-side JavaScript | High-speed, cost-effective scraping |
⚠️ JSDOM Scraper | JSDOM library (browser-like DOM API) | Handles client-side JavaScript, faster than full-browser solutions, ideal for light scripting | Not for heavy dynamic JavaScript, executes server-side code only, depends on pre-installed NPM modules | Speedy scraping with light client-side JS |
🍲 BeautifulSoup Scraper | Python-based, HTTP requests + BeautifulSoup parser | Python-based, supports recursive crawling and URL lists | No full-featured web browser, not suitable for dynamic JavaScript-rendered pages | Python users needing simple, recursive crawling |
How do I choose the right universal web scraper to start with?
🎯 Decision points:
- Use 🌐 Web Scraper if you need simplicity with full browser capabilities and client-side JavaScript rendering.
- Use 🍩 Cheerio Scraper for fast, cost-effective scraping of static pages with simple server-side JavaScript execution.
- Use 🎭 Playwright Scraper when cross-browser compatibility is crucial.
- Use 👐 Puppeteer Scraper for advanced, powerful scraping where you need both client-side and server-side JavaScript handling.
- Use ⚠️ JSDOM Scraper for lightweight, speedy scraping with minimal client-side JavaScript requirements.
- Use 🍲 BeautifulSoup Scraper for Python-based scraping, especially with recursive crawling and processing URL lists.
To make it easier, here's a short questionnaire that guides you on selecting the best scraper based on your specific use case:
Questionnaire
- Is the website content rendered with a lot of client-side JavaScript?
  - Yes:
    - Do you need full browser capabilities?
      - Yes: use Web Scraper or Playwright Scraper
      - No, but I still want advanced features: use Puppeteer Scraper
  - No:
    - Do you prioritize speed and cost-effectiveness?
      - Yes: use Cheerio Scraper
      - No: use JSDOM Scraper
- Do you need cross-browser support for scraping?
  - Yes: use Playwright Scraper
  - No: continue to the next step.
- Is your preferred scripting language Python?
  - Yes: use BeautifulSoup Scraper
  - No: continue to the next step.
- Are you dealing with static pages or lightweight client-side JavaScript?
  - Static pages: use Cheerio Scraper or BeautifulSoup Scraper
  - Light client-side JavaScript:
    - Do you want a balance between speed and client-side JavaScript handling?
      - Yes: use JSDOM Scraper
      - No: use Web Scraper or Puppeteer Scraper
- Do you need to support recursive crawling or process lists of URLs?
  - Yes, and I prefer Python: use BeautifulSoup Scraper
  - Yes, and I prefer JavaScript: use Web Scraper or Cheerio Scraper
  - No: choose based on other criteria above.
This should help you navigate through the options and choose the right scraper based on the website’s complexity, your scripting language preference, and your need for speed or advanced features.
📚 Resources:
- How to use Web Scraper to scrape any website
- How to use Beautiful Soup to scrape the web
- Learn about our $1/month Creator plan that encourages devs to build Actors based on universal scrapers
Web scraping code templates
Similar to our universal scrapers, our code templates provide a quick start for developing web scrapers, automation scripts, and testing tools. Built on popular libraries like BeautifulSoup for Python or Playwright for JavaScript, they save time on setup, allowing you to focus on customization. Though they require more coding than universal scrapers, they're ideal for developers who want a flexible foundation with room to tailor the solution to their needs.
Code template | Supported libraries | Purpose | Pros | Cons |
---|---|---|---|---|
🐍 Python | Requests, BeautifulSoup, Scrapy, Selenium, Playwright | Creating scrapers, automation, testing tools | Simplifies setup, supports major Python libraries | Requires more manual coding than universal scrapers, may be restrictive for complex tasks |
☕️ JavaScript | Playwright, Selenium, Cheerio, Cypress, LangChain | Creating scrapers, automation, testing tools | Eases development with pre-set configurations, flexibility with JavaScript and TypeScript | Requires more manual coding than universal scrapers, may be restrictive for tasks needing full control |
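To give you an idea of what a template hands you, here's a rough, simplified sketch of the shape of a JavaScript (Playwright + Crawlee) starter. The actual generated project includes input handling, configuration files, and more comments.

```javascript
// Simplified sketch of a JavaScript Playwright template's entry point.
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

const crawler = new PlaywrightCrawler({
    async requestHandler({ page, request, pushData }) {
        // Extract the page title using a real headless browser.
        const title = await page.title();
        await pushData({ url: request.url, title });
    },
});

await crawler.run(['https://apify.com']);
await Actor.exit();
```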
📚 Resources:
- How to build a scraper using a web scraper template.
Toolkits and libraries
Apify JavaScript and Python SDKs
The Apify SDKs are designed for developers who want to interact directly with the Apify platform. They let you perform tasks like saving data in Apify datasets, running Apify Actors, and accessing the key-value store. Ideal for those familiar with Node.js or Python, the SDKs provide the tools needed to develop software specifically for the Apify platform, offering complete freedom and flexibility within the JavaScript and Python ecosystems.
- Best for: interacting with the Apify platform (e.g., saving data or running Actors)
- Pros: full control over platform-specific operations, integrates seamlessly with Apify services
- Cons: requires writing boilerplate code, higher complexity with more room for errors
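To give you a feel for the SDK side, here's a minimal sketch of platform operations using the JavaScript SDK; the Python SDK mirrors these calls.

```javascript
// Minimal sketch of platform operations with the Apify SDK for JavaScript.
import { Actor } from 'apify';

await Actor.init();

// Read the JSON input the Actor was started with.
const input = await Actor.getInput();

// Append a record to the run's default dataset.
// (The `query` field is just an example input shape.)
await Actor.pushData({ query: input?.query, processedAt: new Date().toISOString() });

// Persist an arbitrary value in the key-value store.
await Actor.setValue('STATE', { done: true });

await Actor.exit();
```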
Crawlee
Crawlee (for both Node.js and Python) is a powerful web scraping library that focuses on tasks like extracting data from web pages, automating browser interactions, and managing complex scraping workflows. Unlike the Apify SDK, Crawlee does not require the Apify platform and can be used independently for web scraping tasks. It handles complex operations like concurrency management, auto-scaling, and request queuing, allowing you to concentrate on the actual scraping tasks.
- Best for: web scraping and automation (e.g., scraping paragraphs, automating clicks)
- Pros: full flexibility in web scraping tasks, does not require the Apify platform, leverages the Node.js or Python ecosystem
- Cons: requires more setup and coding, higher chance of mistakes with complex operations
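For comparison, a minimal Crawlee crawler in Node.js looks like the sketch below; Crawlee for Python offers an equivalent API.

```javascript
// Minimal Crawlee sketch: crawl a site and store page titles.
import { CheerioCrawler, Dataset } from 'crawlee';

const crawler = new CheerioCrawler({
    maxRequestsPerCrawl: 50, // safety cap while experimenting
    async requestHandler({ request, $, enqueueLinks }) {
        await Dataset.pushData({
            url: request.url,
            title: $('title').text().trim(),
        });
        // Crawlee manages the request queue, concurrency, and retries;
        // you just tell it which links to follow.
        await enqueueLinks({ strategy: 'same-domain' });
    },
});

await crawler.run(['https://crawlee.dev']);
```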
Combining Apify SDK and Crawlee
While these tools are distinct, they can be combined. For example, you can use Crawlee to scrape data from a page and then use the Apify SDK to save that data in an Apify dataset. This integration allows developers to make use of the strengths of both tools while working within the Apify ecosystem.
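In practice, the combination is only a few lines. A minimal sketch:

```javascript
// Sketch: Crawlee does the crawling, the Apify SDK persists results
// in an Apify dataset and manages the Actor lifecycle.
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    async requestHandler({ request, $ }) {
        // Saved via the SDK, so the data lands in an Apify dataset.
        await Actor.pushData({
            url: request.url,
            firstParagraph: $('p').first().text(),
        });
    },
});

await crawler.run(['https://example.com']);
await Actor.exit();
```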
📚 Resources:
- Introduction to Crawlee
- Crawlee blog
- Webinar on scraping with Crawlee 101: how to create scrapers in JavaScript and TypeScript
- Step-by-step video guide: building an Amazon Scraper in Node.js with Crawlee
- Webinar on how to use Crawlee Python
- Introduction to Apify's Python SDK
Code templates vs. universal scrapers vs. libraries
Basically, the choice here depends on how much flexibility you need and how much coding you're willing to do. More flexibility → more coding.
Universal scrapers are simple to set up but less flexible and configurable. Our libraries, on the other hand, enable you to develop a standard Node.js or Python application, so be prepared to write a little more code. The reward is almost infinite flexibility.
Code templates are a middle ground between scrapers and libraries. But since they're built on libraries, they still sit closer to the more-coding end of the spectrum: they only give you starter code to begin with. Take this into account when choosing how to build your scraper, and if in doubt, just ask us and we'll help you out.
Switching sides: how to transfer an existing solution from another platform
You can also take advantage of the Apify platform's features without having to modify your existing scraping or automation solutions.
Integrating Scrapy spiders
The Apify platform fully supports Scrapy spiders. By deploying your existing Scrapy code to Apify, you can take advantage of features like scheduling, monitoring, scaling, and API access, all without needing to modify your original spider. This process is made easy with the Apify CLI, which allows you to convert your Scrapy spider into an Apify Actor with just a few commands. Once deployed, your spider can run in the cloud, offering a reliable and scalable solution for your web scraping needs.
Additionally, you can monetize your spiders by publishing them as Actors on Apify Store, potentially earning passive income from your work while benefiting from the platform’s extensive features.
ScrapingBee, ScrapingAnt, ScraperAPI
To make the transition from these platforms easier, we've created SuperScraper API, an open-source REST API for scraping websites: you pass it a URL and receive the rendered HTML content in return. It functions as a cost-effective alternative to scraping services like ScrapingBee, ScrapingAnt, and ScraperAPI. It supports dynamic content rendering with a headless browser, can use various proxies to avoid blocking, and offers features such as capturing screenshots of web pages. Its scalable nature makes it ideal for large-scale scraping tasks.
To use SuperScraper API, you can deploy it with an Apify API token and access it via HTTP requests. The API supports multiple parameters for fine-tuning your scraping tasks, including options for rendering JavaScript, waiting for specific elements, and handling cookies and proxies. It also allows for custom data extraction rules and JavaScript execution on the scraped pages. Pricing is based on actual usage, which can be cheaper or more expensive than competitors, depending on the configuration.
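As an illustration, calling a deployed SuperScraper instance over HTTP might look like the sketch below. The base URL, the way the token is passed, and the query parameters are assumptions for the sake of the example; check the SuperScraper README for the exact interface of your deployment.

```javascript
// Hypothetical sketch of a SuperScraper API call. SUPERSCRAPER_URL and the
// exact query parameters depend on your deployment; treat them as assumptions.
const baseUrl = process.env.SUPERSCRAPER_URL; // e.g. the URL of your deployed instance
const token = process.env.APIFY_TOKEN;        // your Apify API token

const params = new URLSearchParams({
    url: 'https://example.com', // the page you want scraped
});

// Token passed as a bearer header here; your deployment may expect
// it as a query parameter instead.
const response = await fetch(`${baseUrl}/?${params}`, {
    headers: { Authorization: `Bearer ${token}` },
});

// The service responds with the rendered HTML of the target page.
const html = await response.text();
console.log(html.slice(0, 200));
```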
📚 Resources:
- How to integrate Scrapy projects
- Scrapy monitoring: how to manage your Scrapy spider on Apify
- Run ScrapingBee, ScraperAPI, and ScrapingAnt on Apify — SuperScraper API Tutorial
General resources
- Creating your Actor: Actor sources
- Use it, build it or buy it? Choosing the right solution on Apify
- How to programmatically retrieve data with the Apify API
- Improved way to build your scrapers from a Git repo
- Webinar on how to build and monetize Actors on Apify Store
- 6 things you should know before buying or building a web scraper
- For a comprehensive guide on creating your first Actor, visit the Apify Academy.