How to choose the right scraper for the job
Learn basic web scraping concepts to help you analyze a website and choose the best scraper for your particular use case.
There are two main ways you can proceed with building your crawler:
- Using plain HTTP requests.
- Using an automated browser.
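As a rough sketch of the request-based approach, the snippet below pulls a heading out of raw HTML using only Python's standard-library parser. The HTML string is hardcoded purely for illustration; in a real crawler it would be the body returned by a plain HTTP GET:

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text of every <h1> element in the document."""
    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.titles.append(data.strip())

# In a real crawler this string would come from an HTTP response;
# it is hardcoded here so the sketch stays self-contained.
html = "<html><body><h1>Example Product</h1><p>$19.99</p></body></html>"

parser = TitleExtractor()
parser.feed(html)
print(parser.titles)  # the data a request-based scraper can see directly
```

If the data you need shows up in this raw HTML, plain HTTP requests are enough; if the body arrives empty and gets filled in by scripts, you are looking at the browser-based route.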
We will briefly go through the pros and cons of both, and cover the basic steps for determining which one you should go with.
If it were only a question of performance, you'd of course use request-based scraping every time; however, it's unfortunately not that simple.
Dynamic pages & blocking
Some websites do not load any data without a browser, because they need to execute scripts to show it (these are known as dynamic pages). Another problem is blocking: if a website collects a browser fingerprint, it can easily distinguish a real user from a bot (crawler) and block access.
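One quick way to spot a dynamic page is to fetch the raw HTML and check whether the body carries any real text, or is just a script-loading shell. The heuristic below is a crude sketch (the threshold and regexes are my own assumptions, not a robust detector):

```python
import re

def looks_dynamic(raw_html: str) -> bool:
    """Heuristic: a page whose body is mostly <script> tags with
    almost no visible text probably renders its data client-side."""
    # Count script tags before stripping them out.
    script_count = len(re.findall(r"<script\b", raw_html, re.IGNORECASE))
    # Drop scripts/styles, then all remaining tags, to estimate visible text.
    text = re.sub(r"<(script|style)\b.*?</\1>", "", raw_html,
                  flags=re.IGNORECASE | re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    visible_chars = len(text.strip())
    # Arbitrary cutoff: scripts present but under 50 visible characters.
    return script_count >= 1 and visible_chars < 50

# A typical single-page-app shell vs. a server-rendered page:
shell = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'
static = "<html><body><h1>News</h1><p>" + "Server-rendered text. " * 5 + "</p></body></html>"
print(looks_dynamic(shell), looks_dynamic(static))  # → True False
```

A simpler manual check: open the page with JavaScript disabled in your browser, or compare the "view source" HTML against what the DevTools element inspector shows after rendering.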
Making the choice
It also depends, of course, on whether you need to fill in some data (like a username and password) or select a location (such as entering a zip code manually). Tasks where interacting with the page is absolutely necessary cannot be done using plain HTTP scraping and require a headless browser. In some cases, you might also decide to use a browser-based solution to better blend in with the rest of the "regular" traffic coming from real users.
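The considerations above can be condensed into a small rule-of-thumb function. The flag names are hypothetical, and real projects will weigh more factors than these three, but the priority order mirrors the discussion:

```python
def choose_scraper(needs_interaction: bool,
                   page_is_dynamic: bool,
                   fingerprint_blocking: bool) -> str:
    """Pick a scraping approach from the three signals discussed above.
    Returns 'browser' when a headless browser is required or safer,
    and 'http' when plain HTTP requests will do."""
    if needs_interaction:
        # Filling in forms or selecting a location requires a real page.
        return "browser"
    if page_is_dynamic or fingerprint_blocking:
        # Data rendered by scripts, or fingerprint checks to blend past.
        return "browser"
    # No obstacles: plain HTTP requests are far faster and cheaper.
    return "http"

print(choose_scraper(False, False, False))  # → http
print(choose_scraper(False, True, False))   # → browser
```

Note the asymmetry: a browser can always fall back to doing what plain requests do, so when in doubt it is the safe (if slower) choice.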