Geolocation

Learn about the techniques websites use to determine where requests are coming from, and a bit about how to avoid being blocked based on geolocation.

Geolocation is yet another way websites can detect and block access or show limited data. Apart from the Geolocation API (which requires user permission before it returns location data), there are two main ways that websites geolocate a user (or bot) visiting them.

Cookies & headers

Some websites rely on location-specific or language-specific headers and cookies to geolocate a user. Examples of such headers are Accept-Language and CloudFront-Viewer-Country (a custom HTTP header that Amazon CloudFront adds to the requests it forwards to the origin server).

On targets that use only cookies and headers to identify where a request is coming from, it is pretty straightforward to make requests that appear to come from somewhere else.
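As a minimal sketch of the idea above, the helper below builds headers that hint at a chosen locale. The function name buildLocaleHeaders and the cookie name "country" are hypothetical; a real target uses its own cookie names, which you would need to find by inspecting its traffic.

```typescript
// Sketch: headers that make a request look like it comes from a given locale.
// The cookie name "country" is a hypothetical example, not a standard.
interface LocaleHint {
    language: string; // BCP 47 language tag, e.g. "de-DE"
    countryCode: string; // ISO 3166-1 alpha-2 code, e.g. "DE"
}

function buildLocaleHeaders({ language, countryCode }: LocaleHint): Record<string, string> {
    const baseLanguage = language.split('-')[0];
    // Prefer the full tag and fall back to the base language with a lower weight
    const acceptLanguage = baseLanguage === language
        ? language
        : `${language},${baseLanguage};q=0.9`;
    return {
        'Accept-Language': acceptLanguage,
        Cookie: `country=${encodeURIComponent(countryCode)}`,
    };
}

const localeHeaders = buildLocaleHeaders({ language: 'de-DE', countryCode: 'DE' });
// With Node.js 18+, these could be passed to the built-in fetch:
// const response = await fetch('https://example.com', { headers: localeHeaders });
```

Headers and cookies alone only help on targets that do not also check the IP address.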

IP address

The oldest (and still most common) way of geolocating is based on the IP address used to make the request. Some country-specific sites block all access from other countries (some Chinese, Indian, Israeli, and Japanese websites do this).

Proxies can be used in a scraper to bypass restrictions and to make requests from a different location. Oftentimes, proxies need to be used in combination with location-specific cookies/headers.
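How a proxy's exit country is chosen depends entirely on the provider. The sketch below assumes a hypothetical provider that selects the country through a "country-XX" suffix in the proxy username; the host, port, and credentials are placeholders, and buildProxyUrl is an illustrative name.

```typescript
// Sketch: build a proxy URL for a given exit country.
// Assumption: the (hypothetical) provider reads the country from the username suffix.
interface ProxyConfig {
    host: string;
    port: number;
    username: string;
    password: string;
    countryCode: string; // ISO 3166-1 alpha-2 code, e.g. "DE"
}

function buildProxyUrl({ host, port, username, password, countryCode }: ProxyConfig): string {
    // Percent-encode the credentials so special characters do not break the URL
    const user = encodeURIComponent(`${username}-country-${countryCode.toUpperCase()}`);
    const pass = encodeURIComponent(password);
    return `http://${user}:${pass}@${host}:${port}`;
}

const proxyUrl = buildProxyUrl({
    host: 'proxy.example.com',
    port: 8000,
    username: 'user',
    password: 'p@ss',
    countryCode: 'de',
});
```

The resulting URL can be handed to whichever HTTP client or browser launcher the scraper uses, alongside matching location-specific headers or cookies.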

Override/emulate geolocation when using a browser-based scraper

When using Puppeteer, you can emulate the geolocation with the page.setGeolocation() function.
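A minimal sketch of how this might look is shown below. The wrapper emulateGeolocation and the GeoPage and GeoCoords types are illustrative names, not part of Puppeteer; GeoPage only describes the one Page method the sketch relies on. The range check is an extra safeguard added here.

```typescript
// Coordinates in decimal degrees, in the shape page.setGeolocation() accepts
interface GeoCoords {
    latitude: number;
    longitude: number;
    accuracy?: number;
}

// Minimal structural type: only the Puppeteer Page method used by this sketch
interface GeoPage {
    setGeolocation(options: GeoCoords): Promise<void>;
}

async function emulateGeolocation(page: GeoPage, coords: GeoCoords): Promise<void> {
    // Fail early on values that cannot be real coordinates
    if (coords.latitude < -90 || coords.latitude > 90) {
        throw new RangeError(`Invalid latitude: ${coords.latitude}`);
    }
    if (coords.longitude < -180 || coords.longitude > 180) {
        throw new RangeError(`Invalid longitude: ${coords.longitude}`);
    }
    await page.setGeolocation(coords);
}

// Usage with Puppeteer (assumes the puppeteer package is installed and imported):
// const browser = await puppeteer.launch();
// const page = await browser.newPage();
// await browser.defaultBrowserContext().overridePermissions('https://example.com', ['geolocation']);
// await emulateGeolocation(page, { latitude: 35.6762, longitude: 139.6503 }); // Tokyo
// await page.goto('https://example.com');
```

The overridePermissions() call in the usage comment grants the geolocation permission for the target origin, so that the page can actually read the emulated position.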

In Playwright, geolocation can be emulated by using browserContext.setGeolocation().
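A rough sketch of the Playwright variant follows. The helper emulateContextGeolocation and the GeoPoint and GeoContext types are illustrative names; GeoContext lists only the two BrowserContext methods the sketch calls.

```typescript
// Coordinates in decimal degrees, in the shape browserContext.setGeolocation() accepts
interface GeoPoint {
    latitude: number;
    longitude: number;
    accuracy?: number;
}

// Minimal structural type: only the Playwright BrowserContext methods used here
interface GeoContext {
    grantPermissions(permissions: string[]): Promise<void>;
    setGeolocation(geolocation: GeoPoint | null): Promise<void>;
}

async function emulateContextGeolocation(context: GeoContext, point: GeoPoint): Promise<void> {
    // Grant the permission so pages in this context can read the emulated position
    await context.grantPermissions(['geolocation']);
    await context.setGeolocation(point);
}

// Usage with Playwright (assumes the playwright package is installed and imported):
// const browser = await chromium.launch();
// const context = await browser.newContext();
// await emulateContextGeolocation(context, { latitude: 52.52, longitude: 13.405 }); // Berlin
// const page = await context.newPage();
```

Because the setting lives on the context, every page opened from that context shares the same emulated location.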

Overriding browser geolocation should be used in tandem with a proper proxy corresponding to the emulated geolocation. You would still likely get blocked if you, for example, used a German proxy with the overridden location set to Japan.
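One way to keep the proxy and the emulated location from drifting apart is to derive both from a single country setting. The sketch below does this with a small table of hypothetical sample profiles and returns an options object in the shape accepted by Playwright's browser.newContext(). The names LocationProfile, PROFILES, and buildContextOptions, as well as the proxy address, are assumptions made for illustration.

```typescript
interface LocationProfile {
    locale: string; // BCP 47 language tag
    timezoneId: string; // IANA time zone name
    geolocation: { latitude: number; longitude: number };
}

// Hypothetical sample profiles keyed by ISO 3166-1 alpha-2 country code
const PROFILES: Record<string, LocationProfile> = {
    DE: { locale: 'de-DE', timezoneId: 'Europe/Berlin', geolocation: { latitude: 52.52, longitude: 13.405 } },
    JP: { locale: 'ja-JP', timezoneId: 'Asia/Tokyo', geolocation: { latitude: 35.6762, longitude: 139.6503 } },
};

function buildContextOptions(proxyCountry: string, proxyServer: string) {
    const profile = PROFILES[proxyCountry.toUpperCase()];
    if (!profile) {
        // Refuse to guess: a missing profile would mean a mismatched location
        throw new Error(`No location profile for proxy country "${proxyCountry}"`);
    }
    return {
        proxy: { server: proxyServer },
        geolocation: profile.geolocation,
        permissions: ['geolocation'],
        locale: profile.locale,
        timezoneId: profile.timezoneId,
    };
}

const contextOptions = buildContextOptions('jp', 'http://proxy.example.com:8000');
// With Playwright: const context = await browser.newContext(contextOptions);
```

Since the geolocation, locale, and time zone are all looked up from the proxy's country, a German proxy can never end up paired with a Japanese location by accident.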
