Learn why a bot might be presented with a captcha, how to avoid captchas in the first place, and how to solve them programmatically.
In general, a website will present a user (or scraper) with a captcha for one of two main reasons:
- The website always requires a captcha check before granting access to the desired content.
- One of the website's anti-bot measures (or its WAF) has flagged the user as suspicious.
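Before deciding which of the two cases applies, a scraper first has to notice that it received a captcha page at all. The following is a minimal sketch of such a check, not a definitive detector. The marker list and the status codes are assumptions chosen for illustration (they match the class names commonly used by reCAPTCHA, hCaptcha, and Cloudflare Turnstile widgets), and `classify_response` is a hypothetical helper name.

```python
# Substrings that commonly appear in the HTML of captcha widgets.
# Assumed, non-exhaustive list; extend it for the sites you scrape.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-turnstile")

# Status codes that anti-bot systems often use when refusing a request (assumption).
BLOCK_STATUS_CODES = {403, 429}


def classify_response(status_code: int, body: str) -> str:
    """Return 'captcha', 'blocked', or 'ok' for a fetched page."""
    lowered = body.lower()
    # A captcha widget in the markup is the strongest signal, so check it first.
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        return "captcha"
    # No widget, but the server refused the request outright.
    if status_code in BLOCK_STATUS_CODES:
        return "blocked"
    return "ok"
```

If a page returns `captcha` on the very first request from a clean session, the site probably always requires the check. If it only appears after a number of requests, the scraper has more likely been flagged.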
Dealing with captchas
When you hit a captcha, your first thought should not be how to solve it programmatically. Instead, consider why you received the captcha in the first place: your bot didn't appear enough like a real user to avoid the challenge.
Have you exhausted all of the options for making your scraper appear more human-like? Are you:
- Using proxies?
- Making the request with the proper headers and cookies?
- Generating and using a custom browser fingerprint?
- Trying different general scraping methods (HTTP scraping, browser scraping)? If you are using browser scraping, have you tried using a different browser?
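For plain HTTP scraping, the first two items on this checklist can be sketched with Python's standard library alone. This is an illustrative setup under stated assumptions, not a recommended production configuration: the proxy URL and header values are placeholders, and `make_opener` and `build_request` are hypothetical helper names. Browser fingerprinting is outside the scope of this sketch because it only applies to browser scraping.

```python
import urllib.request
from http.cookiejar import CookieJar

# Placeholder proxy URL for illustration only; substitute your own provider.
PROXY_URL = "http://user:pass@proxy.example.com:8000"

# Example headers resembling those a desktop Chrome browser sends (assumed values).
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def make_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes traffic through a proxy and keeps cookies."""
    cookie_jar = CookieJar()  # cookies set by the site are stored and re-sent
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}),
        urllib.request.HTTPCookieProcessor(cookie_jar),
    )


def build_request(url: str) -> urllib.request.Request:
    """Attach browser-like headers to a request for the given URL."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)
```

A request would then be sent with `make_opener(PROXY_URL).open(build_request(url))`. Reusing the same opener across requests keeps the session cookies consistent, which is part of looking like a single returning visitor.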
If you've tried everything you can to avoid being presented with the captcha and are still facing this roadblock, there are ways to solve captchas programmatically.
There are many different types of captchas, but one of the most popular is Google's reCAPTCHA.
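As a small illustration of what working with reCAPTCHA involves, the sketch below locates the widget's site key in a page. It assumes the page uses the standard reCAPTCHA v2 markup, where an element with the class `g-recaptcha` carries a `data-sitekey` attribute. Pages that render the widget through JavaScript will not match. Solving approaches generally need this key together with the page URL, though the exact inputs depend on the method you choose. `SiteKeyParser` and `find_recaptcha_site_key` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class SiteKeyParser(HTMLParser):
    """Collects the data-sitekey of the first reCAPTCHA widget in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.site_key: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.site_key is not None:
            return  # keep only the first match
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        # Assumes the standard widget markup: class="g-recaptcha" data-sitekey="..."
        if "g-recaptcha" in classes and attr_map.get("data-sitekey"):
            self.site_key = attr_map["data-sitekey"]


def find_recaptcha_site_key(html: str) -> Optional[str]:
    """Return the reCAPTCHA site key found in the HTML, or None if absent."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_key
```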
In this course, you've learned about some of the most common (and some of the most advanced) anti-scraping techniques. Keep in mind that as the web (and technology in general) evolves, this section of the anti-scraping course will evolve as well. In the next section, we'll discuss how to mitigate the anti-scraping techniques you learned about in this section.