Skip to main content

Proxy management

IP address blocking is one of the oldest and most effective ways of preventing access to a website. It is therefore paramount for a good web scraping library to provide easy to use but powerful tools which can work around IP blocking. The most powerful weapon in your anti IP blocking arsenal is a proxy server.

With the Apify SDK, you can use your own proxy servers, proxy servers acquired from third-party providers, or you can rely on Apify Proxy for your scraping needs.

Quick start

If you want to use Apify Proxy locally, make sure that you run your Actors via the Apify CLI and that you are logged in with your Apify account in the CLI.

Using Apify Proxy

proxy_configuration = await Actor.create_proxy_configuration()
proxy_url = await proxy_configuration.new_url()

Using your own proxies

proxy_configuration = await Actor.create_proxy_configuration(
proxy_urls=[
'http://proxy-1.com',
'http://proxy-2.com',
],
)
proxy_url = await proxy_configuration.new_url()

Proxy Configuration

All your proxy needs are managed by the ProxyConfiguration class. You create an instance using the Actor.create_proxy_configuration() method. Then you generate proxy URLs using the ProxyConfiguration.new_url() method.

Apify Proxy vs. your own proxies

The ProxyConfiguration class covers both Apify Proxy and custom proxy URLs, so that you can easily switch between proxy providers. However, some features of the class are available only to Apify Proxy users, mainly because Apify Proxy is what one would call a super-proxy. It's not a single proxy server, but an API endpoint that allows connection through millions of different IP addresses. So the class essentially has two modes: Apify Proxy or Your proxy.

The difference is easy to remember. Using the proxy_url or new_url_function arguments enables use of your custom proxy URLs, whereas all the other options are there to configure Apify Proxy. Visit the Apify Proxy docs for more info on how these parameters work.

IP Rotation and session management

proxyConfiguration.new_url() allows you to pass a session_id parameter. It will then be used to create a session_id-proxy_url pair, and subsequent new_url() calls with the same session_id will always return the same proxy_url. This is extremely useful in scraping, because you want to create the impression of a real user.

When no session_id is provided, your custom proxy URLs are rotated round-robin, whereas Apify Proxy manages their rotation using black magic to get the best performance.

proxy_configuration = await Actor.create_proxy_configuration(
proxy_urls=[
'http://proxy-1.com',
'http://proxy-2.com',
],
)
proxy_url = await proxy_configuration.new_url() # http://proxy-1.com
proxy_url = await proxy_configuration.new_url() # http://proxy-2.com
proxy_url = await proxy_configuration.new_url() # http://proxy-1.com
proxy_url = await proxy_configuration.new_url() # http://proxy-2.com
proxy_url = await proxy_configuration.new_url(session_id='a') # http://proxy-1.com
proxy_url = await proxy_configuration.new_url(session_id='b') # http://proxy-2.com
proxy_url = await proxy_configuration.new_url(session_id='b') # http://proxy-2.com
proxy_url = await proxy_configuration.new_url(session_id='a') # http://proxy-1.com

Apify Proxy Configuration

With Apify Proxy, you can select specific proxy groups to use, or countries to connect from. This allows you to get better proxy performance after some initial research.

proxy_configuration = await Actor.create_proxy_configuration(
groups=['RESIDENTIAL'],
country_code='US',
)
proxy_url = await proxy_configuration.new_url()

Now your connections using proxy_url will use only Residential proxies from the US. Note that you must first get access to a proxy group before you are able to use it. You can find your available proxy groups in the proxy dashboard.

If you don't specify any proxy groups, automatic proxy selection will be used.

Your own proxy configuration

There are two options how to make ProxyConfiguration work with your own proxies.

Either you can pass it a list of your own proxy servers:

proxy_configuration = await Actor.create_proxy_configuration(
proxy_urls=[
'http://proxy-1.com',
'http://proxy-2.com',
],
)
proxy_url = await proxy_configuration.new_url()

Or you can pass it a method (accepting one optional argument, the session ID), to generate proxy URLs automatically:

def custom_new_url_function(session_id: Optional[str] = None) -> str:
if session_id is not None:
return f'http://my-custom-proxy-supporting-sessions.com?session-id={session_id}
return 'http://my-custom-proxy-not-supporting-sessions.com'

proxy_configuration = await Actor.create_proxy_configuration(
new_url_function = custom_new_url_function,
)

proxy_url_with_session = await proxy_configuration.new_url('a')
proxy_url_without_Session = await proxy_configuration.new_url()

Configuring proxy based on Actor input

To make selecting the proxies that the Actor uses easier, you can use an input field with the editor proxy in your input schema. This input will then be filled with a dictionary containing the proxy settings you or the users of your Actor selected for the Actor run.

You can then use that input to create the proxy configuration:

actor_input = await Actor.get_input() or {}
proxy_settings = actor_input.get('proxySettings')
proxy_configuration = await Actor.create_proxy_configuration(actor_proxy_input=proxy_settings)
proxy_url = await proxy_configuration.new_url()

Using the generated proxy URLs

Requests

To use the generated proxy URLs with the requests library, use the proxies argument:

proxy_configuration = await Actor.create_proxy_configuration()
proxy_url = await proxy_configuration.new_url()
proxies = {
'http': proxy_url,
'https': proxy_url,
}

response = requests.get('http://example.com', proxies=proxies)
# --- OR ---
with requests.Session() as session:
session.proxies.update(proxies)
response = session.get('http://example.com')

HTTPX

To use the generated proxy URLs with the httpx library, use the proxies argument:

proxy_configuration = await Actor.create_proxy_configuration()
proxy_url = await proxy_configuration.new_url()

response = httpx.get('http://example.com', proxy=proxy_url)
# --- OR ---
async with httpx.AsyncClient(proxy=proxy_url) as httpx_client:
response = await httpx_client.get('http://example.com')