Using Requests and HTTPX
To use either of the libraries mentioned below in your Actors, you can start from the Start with Python Actor template.
Requests
The requests
library is one of the most popular Python libraries for making HTTP requests.
To use it in your Actors, no special configuration is needed.
Just put requests
in your requirements.txt
file,
reinstall dependencies if you're running the Actor locally,
and you're good to go.
import requests
from apify import Actor
async def main():
async with Actor:
response = requests.get('http://example.com')
print(response.text)
Using proxies with requests
To use Apify Proxy with requests
,
you can just generate a proxy URL through Actor.create_proxy_configuration()
,
and pass it to requests
using the proxies
argument:
import requests
from apify import Actor
async def main():
async with Actor:
proxy_configuration = await Actor.create_proxy_configuration()
proxy_url = await proxy_configuration.new_url()
proxies = {
'http': proxy_url,
'https': proxy_url,
}
response = requests.get('http://example.com', proxies=proxies)
print(response.text)
To learn more about using proxies in your Actor with requests
, check the documentation for proxy management.
HTTPX
Another very popular Python library for performing HTTP requests is HTTPX
.
Its main advantage over requests
is the ability to perform asynchronous HTTP requests,
making it ideal for large-scale, parallel web scraping.
To use it in your Actors, no special configuration is needed.
Just put httpx
in your requirements.txt
file,
reinstall dependencies if you're running the Actor locally,
and you're good to go.
import asyncio
import httpx
from apify import Actor
async def main():
async with Actor:
async with httpx.AsyncClient() as httpx_client:
# This will perform all the requests in parallel
http_requests = []
for i in range(10):
http_requests.append(httpx_client.get(f'http://example.com/{i}'))
responses = await asyncio.gather(*http_requests)
print(responses)
Using proxies with HTTPX
To use Apify Proxy with httpx
,
you can just generate a proxy URL through Actor.create_proxy_configuration()
,
and pass it to httpx
using the proxies
argument:
import httpx
from apify import Actor
async def main():
async with Actor:
proxy_configuration = await Actor.create_proxy_configuration()
proxy_url = await proxy_configuration.new_url()
async with httpx.AsyncClient(proxies=proxy_url) as httpx_client:
response = httpx_client.get(f'http://example.com'),
print(response)
To learn more about using proxies in your Actor with httpx
, check the documentation for proxy management.