Crawl multiple URLs
This example crawls a specified list of URLs and logs the URL and page title for each one. It is implemented three ways, using CheerioCrawler, PuppeteerCrawler, and PlaywrightCrawler.
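When the URLs follow a pattern, as the page-1 to page-3 URLs in these examples do, the list can be generated instead of written out by hand. Below is a minimal sketch. `buildPageUrls` is a hypothetical helper (not part of Crawlee or the Apify SDK), and the consecutive `/page-N` numbering is an assumption taken from the example URLs.

```javascript
// Hypothetical helper (not a Crawlee API): builds
// `${baseUrl}/page-1` ... `${baseUrl}/page-${count}`.
// Assumes pages are numbered consecutively starting at 1.
function buildPageUrls(baseUrl, count) {
    // Strip trailing slashes so we don't produce ".../ /page-1" style double slashes
    const base = baseUrl.replace(/\/+$/, '');
    return Array.from({ length: count }, (_, i) => `${base}/page-${i + 1}`);
}

const urls = buildPageUrls('http://www.example.com', 3);
console.log(urls);
```

The resulting array can be passed to `crawler.run(urls)` in place of the literal list in any of the examples.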
Using CheerioCrawler:
import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

const crawler = new CheerioCrawler({
    // Function called for each URL
    async requestHandler({ request, $ }) {
        const title = $('title').text();
        console.log(`URL: ${request.url}\nTITLE: ${title}`);
    },
});

// Run the crawler
await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
    'http://www.example.com/page-3',
]);

await Actor.exit();
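In the Cheerio handler, `$('title').text()` reads the text of the matched `<title>` element from the parsed HTML. To see what that value looks like without installing Cheerio, the sketch below approximates it with a regular expression. This is a deliberately naive, hypothetical stand-in (`extractTitle` is not a Crawlee or Cheerio function): it reads only the first `<title>` and does not decode HTML entities, so real crawlers should keep using Cheerio.

```javascript
// Hypothetical, dependency-free approximation of Cheerio's $('title').text().
// Naive: reads only the first <title> and does not decode entities such as &amp;.
function extractTitle(html) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    // Return an empty string when there is no <title> element
    return match ? match[1] : '';
}

const html = '<html><head><title>Example Domain</title></head><body></body></html>';
console.log(`URL: http://www.example.com/page-1\nTITLE: ${extractTitle(html)}`);
```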
Using PuppeteerCrawler:

Tip: To run this example on the Apify Platform, select the apify/actor-node-puppeteer-chrome image for your Dockerfile.
import { Actor } from 'apify';
import { PuppeteerCrawler } from 'crawlee';

await Actor.init();

const crawler = new PuppeteerCrawler({
    // Function called for each URL
    async requestHandler({ request, page }) {
        const title = await page.title();
        console.log(`URL: ${request.url}\nTITLE: ${title}`);
    },
});

// Run the crawler
await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
    'http://www.example.com/page-3',
]);

await Actor.exit();
Using PlaywrightCrawler:

Tip: To run this example on the Apify Platform, select the apify/actor-node-playwright-chrome image for your Dockerfile.
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';

await Actor.init();

const crawler = new PlaywrightCrawler({
    // Function called for each URL
    async requestHandler({ request, page }) {
        const title = await page.title();
        console.log(`URL: ${request.url}\nTITLE: ${title}`);
    },
});

// Run the crawler
await crawler.run([
    'http://www.example.com/page-1',
    'http://www.example.com/page-2',
    'http://www.example.com/page-3',
]);

await Actor.exit();