Getting Started
Without the right tools, crawling and scraping the web can be difficult. At the very least, you need an HTTP client to make the necessary requests, but that only gets you raw HTML, and sometimes not even that. Then you have to parse that HTML to extract the data you're interested in. Once extracted, the data must be stored in a machine-readable format that is easy to access for further processing, because it is the processed data that holds value.
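To make the manual approach concrete, here is a bare-bones sketch using only Node.js built-ins; the URL and the title-grabbing regular expression are just illustrations, and real pages quickly outgrow this approach.

// Fetch raw HTML with nothing but Node's built-in HTTP client.
const https = require('https');

https.get('https://example.com', (res) => {
    let html = '';
    res.on('data', (chunk) => { html += chunk; });
    res.on('end', () => {
        // Extraction means picking data out of raw HTML by hand,
        // here with a fragile regular expression.
        const title = (html.match(/<title>(.*?)<\/title>/) || [])[1];
        // Storing and further processing are still entirely up to you.
        console.log(title);
    });
});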
Apify SDK covers the process end-to-end: from crawling the web for links and scraping the raw data, to storing it in various machine-readable formats, ready for processing. With this guide in hand, you should have your own data extraction solutions up and running in a few hours.
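For a taste of where the guide is headed, this is roughly what a complete crawler looks like with the SDK. Don't worry about the individual pieces yet; each one is introduced step by step below, and the start URL is just a placeholder.

const Apify = require('apify');

Apify.main(async () => {
    // A named list of URLs to crawl; the name lets its state be persisted.
    const requestList = await Apify.openRequestList('start-urls', ['https://example.com']);

    const crawler = new Apify.CheerioCrawler({
        requestList,
        // Called once per page, with the HTML already parsed by Cheerio into $.
        handlePageFunction: async ({ request, $ }) => {
            // Save structured results to the default dataset.
            await Apify.pushData({ url: request.url, title: $('title').text() });
        },
    });

    await crawler.run();
});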
Intro
The goal of this getting started guide is to provide a step-by-step introduction to all the features of the Apify SDK. It will walk you through creating the simplest of crawlers, one that only prints text to the console, all the way up to complex systems that crawl pages, interact with them as if a real user were sitting in front of a real browser, and output structured data.
Since Apify SDK is usable both locally on any computer and on the Apify platform, you will be able to use the source code in both environments interchangeably. Nevertheless, some initial setup is still required, so choose your preferred starting environment and let's get into it.
Setting up locally
To run Apify SDK on your own computer, you first need to meet the following prerequisites:
- Have Node.js version 10.17 or higher, with the exception of Node.js 11, installed.
  - Visit the Node.js website to download it, or use nvm.
- Have NPM installed.
  - NPM comes bundled with Node.js, so you should already have it. If not, reinstall Node.js.
If you're not certain, confirm the prerequisites by running:
node -v
npm -v
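Both commands print the version of the installed tool. The exact numbers will differ on your machine, but the output looks something like this:

v10.17.0
6.11.3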