Skip to main content
Version: 1.3

Crawl a website with relative links

If a website uses relative links, CheerioCrawler and Apify.enqueueLinks() may have trouble following them. This is why it is important to set the baseUrl property within Apify.enqueueLinks() to request.loadedUrl:

const Apify = require('apify');

Apify.main(async () => {
// Create a RequestQueue
const requestQueue = await Apify.openRequestQueue();
// Define the starting URL
await requestQueue.addRequest({ url: 'https://apify.com/' });
// Function called for each URL
const handlePageFunction = async ({ request, $ }) => {
console.log(request.url);
// Add all links from page to RequestQueue
await Apify.utils.enqueueLinks({
$,
requestQueue,
baseUrl: request.loadedUrl, // <-------------- important to set the base url here
});
};
// Create a CheerioCrawler
const crawler = new Apify.CheerioCrawler({
requestQueue,
handlePageFunction,
maxRequestsPerCrawl: 10, // Limitation for only 10 requests (do not use if you want to crawl all links)
});
// Run the crawler
await crawler.run();
});