
Crawl a list of URLs from a Google Sheets document

Learn to crawl and scrape data from URLs specified in a spreadsheet with Apify scrapers. Scrape a pre-determined list of web pages with Apify actors.

Actors such as Web Scraper (apify/web-scraper), Cheerio Scraper (apify/cheerio-scraper), and Puppeteer Scraper (apify/puppeteer-scraper) make it easy to crawl web pages and extract data from them.

These actors start with a pre-defined list of URLs (start URLs) and can then optionally follow links recursively to discover new pages.

Add Start URLs in Apify console

Let's say you have the start URLs you want to crawl entered in a Google Sheets spreadsheet, such as this one.

Start URLs in a spreadsheet

You don't have to add the URLs to the actor manually, or export them to a file and then upload it to the scraper.

Simply append /gviz/tq?tqx=out:csv to the base part of the Google Sheets URL, right after the long document identifier.

https://docs.google.com/spreadsheets/d/1GA5sSQhQjB_REes8I5IKg31S-TuRcznWOPjcpNqtxmU/gviz/tq?tqx=out:csv
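If you build the URL programmatically, the transformation can be sketched with a small helper. Note that `toCsvExportUrl` is a hypothetical function written for this example, not part of the Apify SDK:

```javascript
// Derive the CSV export URL from a regular Google Sheets link by keeping
// only the base part that ends with the long document identifier.
// toCsvExportUrl is a hypothetical helper, not an Apify SDK function.
function toCsvExportUrl(sheetUrl) {
  const match = sheetUrl.match(/^(https:\/\/docs\.google\.com\/spreadsheets\/d\/[^/]+)/);
  if (!match) throw new Error('Not a Google Sheets document URL');
  return `${match[1]}/gviz/tq?tqx=out:csv`;
}

console.log(toCsvExportUrl(
  'https://docs.google.com/spreadsheets/d/1GA5sSQhQjB_REes8I5IKg31S-TuRcznWOPjcpNqtxmU/edit#gid=0'
));
// https://docs.google.com/spreadsheets/d/1GA5sSQhQjB_REes8I5IKg31S-TuRcznWOPjcpNqtxmU/gviz/tq?tqx=out:csv
```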

This gives you a URL that automatically exports the spreadsheet to CSV. Then, just click the Link remote text file button in the actor's input and paste the URL.

Link a remote text file

IMPORTANT: Make sure the document can be viewed by anyone with the link, otherwise the actor will not be able to access it.

Make the link viewable to anyone

And that's it. Now, whenever the actor starts, it will download the spreadsheet's content with the up-to-date list of URLs.

Note that the spreadsheet should have a simple structure, so the actor can easily find the URLs in it, and it should contain only one sheet.
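To illustrate what "simple structure" means, here is a sketch of how a one-column CSV export (a header row followed by one URL per row) maps onto the `{ url }` objects that the scrapers' startUrls input expects. The CSV content is hardcoded sample data; in practice the actor downloads it from the export URL for you:

```javascript
// Sample of a simple one-column CSV export: a header row, then one URL per row.
const csv = [
  '"URL"',
  '"https://example.com/page-1"',
  '"https://example.com/page-2"',
].join('\n');

// Parse the rows into the startUrls shape used by Apify scrapers.
const startUrls = csv
  .split('\n')
  .slice(1)                                      // skip the header row
  .map((line) => line.replace(/^"|"$/g, '').trim()) // strip surrounding quotes
  .filter((url) => url.startsWith('http'))       // keep only valid-looking URLs
  .map((url) => ({ url }));

console.log(startUrls);
// [ { url: 'https://example.com/page-1' }, { url: 'https://example.com/page-2' } ]
```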