apify-sdk-python
Index
Main Classes
Helper Classes
Methods
- apply_apify_settings
- budget_ow
- compute_short_hash
- compute_unique_key
- crypto_random_object_id
- decrypt_input_secrets
- force_remove
- force_rename
- get_basic_auth_header
- get_running_event_loop_id
- guess_file_extension
- is_url
- normalize_url
- open_queue_with_custom_client
- to_apify_request
- to_scrapy_request
- unique_key_to_request_id
Properties
Constants
Methods
apply_apify_settings
Parameters
settings: Settings | None = None (keyword-only)
proxy_config: dict | None = None (keyword-only)
Returns Settings
budget_ow
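Budget version of ow (the JavaScript argument-validation library): a minimal runtime type check for a value or a dictionary of values.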
Parameters
value: dict | str | float | bool
predicate: dict[str, tuple[type, bool]] | tuple[type, bool]
value_name: str | None = None
Returns None
compute_short_hash
Computes a hexadecimal SHA-256 hash of the provided data and returns a substring (prefix) of it.
Parameters
data: bytes
length: int = 8 (keyword-only)
Returns str
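The description above can be sketched with the standard library alone. The name compute_short_hash_sketch is hypothetical; it only mirrors the documented behavior (hex SHA-256, then a prefix) and is not the SDK source.

```python
import hashlib


def compute_short_hash_sketch(data: bytes, *, length: int = 8) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of `data`."""
    return hashlib.sha256(data).hexdigest()[:length]
```

Because the result is a prefix of a 64-character hex digest, a length above 64 simply returns the full digest.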
compute_unique_key
Computes a unique key for caching & deduplication of requests.
This function computes a unique key by normalizing the provided URL and method. If ‘use_extended_unique_key’ is True and a payload is provided, the payload is hashed and included in the key. Otherwise, the unique key is just the normalized URL.
Parameters
url: str
method: str = 'GET'
payload: bytes | None = None
keep_url_fragment: bool = False (keyword-only)
use_extended_unique_key: bool = False (keyword-only)
Returns str
crypto_random_object_id
Python reimplementation of cryptoRandomObjectId from @apify/utilities.
Parameters
length: int = 17
Returns str
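One plausible way to produce such an ID, sketched with the standard secrets module. The alphabet (ASCII letters and digits) is an assumption, and crypto_random_object_id_sketch is a hypothetical name rather than the SDK function.

```python
import secrets
import string

# Assumed alphabet: upper- and lowercase ASCII letters plus digits.
ALPHABET = string.ascii_letters + string.digits


def crypto_random_object_id_sketch(length: int = 17) -> str:
    """Return a random alphanumeric ID drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```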
decrypt_input_secrets
Decrypt input secrets.
Parameters
private_key: rsa.RSAPrivateKey
input: Any
Returns Any
force_remove
Remove a file without raising an error if it does not exist, like rm(filename, { force: true }) in JS.
Parameters
filename: str
Returns None
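The "force" semantics can be sketched as follows: a missing file is treated as success. This is an illustration under that assumption, using the hypothetical name force_remove_sketch.

```python
import contextlib
import os


def force_remove_sketch(filename: str) -> None:
    """Delete a file, silently ignoring the case where it does not exist."""
    with contextlib.suppress(FileNotFoundError):
        os.remove(filename)
```

Calling it twice on the same path is safe; the second call is a no-op.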
force_rename
Rename a directory. Checks that the source directory exists and removes the destination directory if it already exists.
Parameters
src_dir: str
dst_dir: str
Returns None
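A minimal sketch of the behavior described above, assuming that a missing source directory makes the call a no-op. force_rename_sketch is a hypothetical name.

```python
import os
import shutil


def force_rename_sketch(src_dir: str, dst_dir: str) -> None:
    """Rename src_dir to dst_dir, replacing dst_dir if it already exists."""
    if os.path.exists(src_dir):
        if os.path.exists(dst_dir):
            # Remove the old destination so the rename cannot fail on a non-empty directory.
            shutil.rmtree(dst_dir, ignore_errors=True)
        os.rename(src_dir, dst_dir)
```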
get_basic_auth_header
Generate a basic authentication header for the given username and password.
Parameters
username: str
password: str
auth_encoding: str = 'latin-1'
Returns bytes
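For reference, an HTTP Basic header value is the word Basic followed by the Base64 encoding of username:password. The sketch below follows that standard construction with the documented signature; get_basic_auth_header_sketch is a hypothetical name and the exact SDK implementation may differ.

```python
import base64


def get_basic_auth_header_sketch(username: str, password: str, auth_encoding: str = "latin-1") -> bytes:
    """Build the value of an HTTP Basic Authorization header as bytes."""
    credentials = f"{username}:{password}".encode(auth_encoding)
    return b"Basic " + base64.b64encode(credentials)
```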
get_running_event_loop_id
Get the ID of the currently running event loop.
This is useful mainly for debugging.
Returns int
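One way to obtain such an ID is to take Python's id() of the running loop object; whether the SDK does exactly this is an assumption. The names below are hypothetical.

```python
import asyncio


def get_running_event_loop_id_sketch() -> int:
    """Return an integer identifying the currently running event loop."""
    # asyncio.get_running_loop() raises RuntimeError outside of a running loop.
    return id(asyncio.get_running_loop())


async def main() -> int:
    return get_running_event_loop_id_sketch()
```

Running asyncio.run(main()) returns the ID; comparing IDs from two places in the code shows whether both run on the same loop.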
guess_file_extension
Guess the file extension based on content type.
Parameters
content_type: str
Returns str | None
is_url
Check if the given string is a valid URL.
Parameters
url: str
Returns bool
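What counts as "valid" is not specified above. The sketch below assumes a string is a URL when it parses with both a scheme and a network location; is_url_sketch is a hypothetical name.

```python
from urllib.parse import urlparse


def is_url_sketch(url: str) -> bool:
    """Return True if the string has both a scheme and a host part."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        return False
    return bool(parsed.scheme and parsed.netloc)
```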
normalize_url
Normalizes a URL.
This function cleans and standardizes a URL by removing leading and trailing whitespace, converting the scheme and netloc to lower case, stripping unwanted tracking parameters (specifically those beginning with ‘utm_’), sorting the remaining query parameters alphabetically, and optionally retaining the URL fragment. The goal is to ensure that URLs that are functionally identical but differ in trivial ways (such as parameter order or casing) are treated as the same.
Parameters
url: str
keep_url_fragment: bool = False (keyword-only)
Returns str
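The normalization steps listed above can be sketched with urllib.parse. This is an illustration of those steps, not the SDK source; normalize_url_sketch is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def normalize_url_sketch(url: str, *, keep_url_fragment: bool = False) -> str:
    # 1. Trim surrounding whitespace and split the URL into components.
    parsed = urlparse(url.strip())

    # 2. Drop tracking parameters whose names start with "utm_".
    params = [
        (key, value)
        for key, value in parse_qsl(parsed.query, keep_blank_values=True)
        if not key.startswith("utm_")
    ]

    # 3. Sort the remaining parameters alphabetically and re-encode them.
    query = urlencode(sorted(params))

    # 4. Keep the fragment only when asked to.
    fragment = parsed.fragment if keep_url_fragment else ""

    # 5. Lowercase scheme and netloc, then reassemble.
    return urlunparse(
        (parsed.scheme.lower(), parsed.netloc.lower(), parsed.path, parsed.params, query, fragment)
    )
```

For example, " HTTP://Example.COM/path?b=2&utm_source=x&a=1#frag " becomes http://example.com/path?a=1&b=2.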
open_queue_with_custom_client
Open a Request Queue with custom Apify Client.
TODO: add support for custom client to Actor.open_request_queue(), so that we don’t have to do this hacky workaround
Returns RequestQueue
to_apify_request
Convert a Scrapy request to an Apify request.
Parameters
scrapy_request: Request
spider: Spider
Returns dict | None
to_scrapy_request
Convert an Apify request to a Scrapy request.
Parameters
apify_request: dict
spider: Spider
Returns Request
unique_key_to_request_id
Generate request ID based on unique key in a deterministic way.
Parameters
unique_key: str
Returns str
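"Deterministic" here means the same unique key always yields the same ID. One way to get that property is to hash the key, as in the sketch below. The specific recipe (SHA-256, Base64, strip non-alphanumeric characters, truncate to 15) and the request_id_length parameter are assumptions for illustration; unique_key_to_request_id_sketch is a hypothetical name.

```python
import base64
import hashlib
import re


def unique_key_to_request_id_sketch(unique_key: str, *, request_id_length: int = 15) -> str:
    """Derive a short, stable, alphanumeric ID from a unique key."""
    digest = hashlib.sha256(unique_key.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("utf-8")
    # Remove the Base64 characters that are not letters or digits.
    alphanumeric = re.sub(r"[+/=]", "", encoded)
    return alphanumeric[:request_id_length]
```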
Properties
ResourceClientType
apply_apify_settings: Integrates Apify configuration into a Scrapy project's settings.
Note: The function directly modifies the passed settings object and also returns it.