Proxy
Learn to anonymously access websites in scraping/automation jobs. Improve data outputs and efficiency of bots, and access websites from various geographies.
Apify Proxy allows you to change your IP address when web scraping to reduce the chance of being blocked because of your geographical location.
You can use proxies in your actors or any other application that supports HTTP proxies. Apify Proxy monitors the health of your IP pool and intelligently rotates addresses to prevent IP address-based blocking.
You can view your proxy settings and password on the Proxy page in the Apify Console.
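As an illustration of the "any other application that supports HTTP proxies" point above, here is a minimal sketch that routes Python's standard-library HTTP client through a proxy. The helper name `make_proxy_opener` and the placeholder URL are assumptions made for this example. Take the real hostname, port, username, and password from the Proxy page in the Apify Console.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return a urllib opener that sends HTTP and HTTPS traffic via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    # Placeholder URL (an assumption): substitute the values from your Proxy page.
    opener = make_proxy_opener("http://<USERNAME>:<PASSWORD>@<PROXY_HOST>:<PORT>")
    # opener.open("https://example.com") would then go through the proxy.
    print(type(opener).__name__)
```

Building the opener does not make any network request; only `opener.open(...)` does.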
Our proxies
Datacenter proxy – the fastest and cheapest option, it uses datacenters to change your IP address. Note that there is a chance of being blocked because of the activity of other users. [Code examples]
Residential proxy – IP addresses located in homes and offices around the world. These IPs are the least likely to be blocked. [How to connect]
Google SERP proxy – download and extract data from Google Search Engine Result Pages (SERPs). You can select country and language to get localized results. [Code examples]
For pricing information, visit apify.com/proxy.
Using your own proxies
In addition to our proxies, you can use your own in both Apify Console and the SDK.
Custom proxies in console
To use your own proxies with Apify Console, in your actor's Input and options tab, scroll down and open the Proxy and browser configuration section. Enter your proxy URLs, and you're good to go.
Custom proxies in SDK
In the Apify SDK, use the proxyConfiguration.newUrl(sessionId) command to add your custom proxy URLs to the proxy configuration. See the SDK docs for more details.
IP address rotation
Web scrapers can rotate the IP addresses they use to access websites. They assign each request a different IP address, which makes the requests appear to come from different users. This lowers the chance of being blocked and improves data throughput.
Depending on whether you use a browser or HTTP requests for your scraping jobs, IP address rotation works differently.
- Browser – a different IP address is used for each browser.
- HTTP request – a different IP address is used for each request.
You can use sessions to manage how you rotate and persist IP addresses.
Click here to learn more about IP address rotation and our findings on how blocking works.
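The two rotation modes above can be sketched with a small pool of session IDs. This is a hypothetical helper, not part of Apify Proxy or the SDK: `SessionRotator` and `new_session_id` are names invented for this example, and the ID format is an assumption.

```python
import itertools
import uuid


def new_session_id(prefix="session"):
    """Generate a random session ID (the format is an assumption)."""
    return f"{prefix}_{uuid.uuid4().hex[:8]}"


class SessionRotator:
    """Cycle through a fixed pool of session IDs in round-robin order."""

    def __init__(self, size):
        self.session_ids = [new_session_id() for _ in range(size)]
        self._cycle = itertools.cycle(self.session_ids)

    def next_session(self):
        """Return the next session ID in the pool."""
        return next(self._cycle)


if __name__ == "__main__":
    rotator = SessionRotator(size=3)
    # HTTP request mode: take a new session ID for every request.
    for _ in range(4):
        print("request uses", rotator.next_session())
    # Browser mode: take one session ID at launch and reuse it for that browser.
    browser_session = rotator.next_session()
    print("browser uses", browser_session)
```

A fixed pool keeps the number of distinct IP addresses bounded, while still spreading requests across them.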
Sessions
Sessions allow you to use the same IP address for multiple connections.
To set a new session, pass the session parameter in your username field when connecting to a proxy. This will serve as the session's ID and an IP address will be assigned to it. To use that IP address in other requests, pass that same session ID in the username field.
The created session will store information such as cookies and can be used to generate browser fingerprints. You can also assign custom user data such as authorization tokens and specific headers.
Sessions are available for datacenter and residential proxies.
This parameter is optional. By default, each proxied request is assigned a randomly picked least used IP address.
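A minimal sketch of passing a session ID in the username field follows. The hostname, port, the `session-<id>` username syntax, and the `auto` default username are assumptions not stated in this document, so verify them against the Proxy page in the Apify Console. `build_proxy_url` is a helper name invented for this example.

```python
from urllib.parse import quote

# Assumed connection details; confirm them on the Proxy page in Apify Console.
PROXY_HOST = "proxy.apify.com"
PROXY_PORT = 8000


def build_proxy_url(password, session_id=None):
    """Return an HTTP proxy URL, optionally pinned to a session ID."""
    # Assumed syntax: "session-<id>" pins an IP; "auto" leaves rotation on.
    username = f"session-{session_id}" if session_id else "auto"
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{PROXY_HOST}:{PROXY_PORT}"


if __name__ == "__main__":
    # Reusing the same session ID should reuse the same IP address.
    print(build_proxy_url("<YOUR_PROXY_PASSWORD>", session_id="my_session_1"))
```

Percent-encoding the credentials keeps the URL valid even if the password contains characters such as `@` or `:`.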
Session persistence
You can persist your sessions (use the same IP address) by setting the session parameter in the username field. This assigns a single IP address to a session ID after you make the first request.
Session IDs represent IP addresses. Therefore, you can manage the IP addresses you use by managing sessions. In cases where you need to keep the same session (e.g. when you need to log in to a website), it is best to keep the same proxy. By assigning an IP address to a session ID, you can use that IP for every request you make.
For datacenter proxies, a session persists for 26 hours (more info). For residential proxies, it persists for 1 minute (more info). Using a session resets its expiry timer.
Google SERP proxies do not support sessions.
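The expiry rule above (a fixed lifetime that restarts on every use) can be mirrored on the client side to guess whether a session still holds its IP address. This is a local bookkeeping sketch only: `SessionTracker` is a hypothetical class, and the proxy itself remains the source of truth.

```python
import time

# Session lifetimes from the text above, in seconds.
SESSION_TTL = {"datacenter": 26 * 60 * 60, "residential": 60}


class SessionTracker:
    """Track when each session was last used and whether it likely expired."""

    def __init__(self, proxy_type, clock=time.monotonic):
        self.ttl = SESSION_TTL[proxy_type]
        self.clock = clock
        self.last_used = {}

    def touch(self, session_id):
        """Record a use of the session, which restarts its expiry timer."""
        self.last_used[session_id] = self.clock()

    def is_alive(self, session_id):
        """Return True if the session was used within its lifetime."""
        used = self.last_used.get(session_id)
        return used is not None and self.clock() - used < self.ttl


if __name__ == "__main__":
    tracker = SessionTracker("residential")
    tracker.touch("my_session_1")
    print(tracker.is_alive("my_session_1"))
```

The injectable `clock` argument exists only so the timing logic can be checked without waiting.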
Dead proxies
Our health check performs an HTTP and HTTPS request with each proxy server every few hours. If a server fails both requests 3 times in a row, it's marked as dead and all user sessions with this server are discarded.
Banned proxies are not considered dead, since they become usable after a while.
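The dead-proxy rule can be sketched as a counter of consecutive failed checks. This is an illustration of the rule as described above, not the actual health checker. It assumes that a check counts as failed only when both the HTTP and the HTTPS request fail, and that any other result resets the counter. `ProxyHealth` is a hypothetical name.

```python
DEAD_AFTER = 3  # consecutive failed checks, as stated above


class ProxyHealth:
    """Count consecutive health checks in which both requests failed."""

    def __init__(self):
        self.failed_checks = 0

    def record_check(self, http_ok, https_ok):
        """Record one health check and return True if the server is now dead."""
        if not http_ok and not https_ok:
            self.failed_checks += 1
        else:
            # Assumption: a partially successful check resets the streak.
            self.failed_checks = 0
        return self.failed_checks >= DEAD_AFTER


if __name__ == "__main__":
    health = ProxyHealth()
    for result in [(False, False), (False, False), (False, False)]:
        dead = health.record_check(*result)
    print("dead:", dead)
```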
A different approach to 502 Bad Gateway
There are times when the 502 status code is not comprehensive enough. Therefore, our server returns codes in the 590-599 range instead to provide more insight.
- 590 Non Successful: upstream responded with non-200 status code.
- 591 RESERVED: this status code is reserved for further use.
- 592 Status Code Out Of Range: upstream responded with status code different than 100-999.
- 593 Not Found: DNS lookup failed - EAI_NODATA or EAI_NONAME.
- 594 Connection Refused: upstream refused connection.
- 595 Connection Reset: connection reset due to loss of connection or timeout.
- 596 Broken Pipe: trying to write on a closed socket.
- 597 Auth Failed: incorrect upstream credentials.
- 598 RESERVED: this status code is reserved for further use.
- 599 Upstream Error: generic upstream error.
590 and 592 indicate an issue on the upstream side.
593 indicates an incorrect proxy-chain configuration.
594, 595 and 596 may occur due to connection loss.
597 indicates incorrect upstream credentials.
599 is a generic error, where the above is not applicable.
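For logging or retry decisions, the codes above can be kept in a lookup table. The sketch below only transcribes the names and explanations from this section. The function name `describe_proxy_error` and the wording of the hints are choices made for this example.

```python
# Code -> (name, likely cause), transcribed from the list above.
PROXY_ERRORS = {
    590: ("Non Successful", "issue on the upstream side"),
    591: ("RESERVED", "reserved for further use"),
    592: ("Status Code Out Of Range", "issue on the upstream side"),
    593: ("Not Found", "incorrect proxy-chain configuration"),
    594: ("Connection Refused", "possible connection loss"),
    595: ("Connection Reset", "possible connection loss"),
    596: ("Broken Pipe", "possible connection loss"),
    597: ("Auth Failed", "incorrect upstream credentials"),
    598: ("RESERVED", "reserved for further use"),
    599: ("Upstream Error", "generic error"),
}


def describe_proxy_error(status):
    """Return a readable description for a 590-599 code, or None otherwise."""
    entry = PROXY_ERRORS.get(status)
    if entry is None:
        return None
    name, hint = entry
    return f"{status} {name}: {hint}"


if __name__ == "__main__":
    print(describe_proxy_error(597))
```

Codes outside the 590-599 range return `None`, so ordinary HTTP errors can be handled by the caller's usual logic.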