
apify-sdk-python

Index

Methods

apply_apify_settings

  • apply_apify_settings(*, settings, proxy_config): Settings
  • Integrates the Apify configuration into the settings of a Scrapy project.

    Note: The function directly modifies the passed settings object and also returns it.


    Parameters

    • settings: Settings | None = None (keyword-only)
    • proxy_config: dict | None = None (keyword-only)

    Returns Settings

budget_ow

  • budget_ow(value, predicate, value_name): None
  • A minimal ("budget") version of ow, the JavaScript argument-validation library.


    Parameters

    • value: dict | str | float | bool
    • predicate: dict[str, tuple[type, bool]] | tuple[type, bool]
    • value_name: str | None = None

    Returns None
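The signature suggests two modes: validating the keys of a dict against a dict of predicates, or validating a single named value against one predicate. The sketch below assumes each predicate tuple is (expected_type, required) and that a failed check raises ValueError. Both points are inferred from the type hints and are not confirmed by this page.

```python
from __future__ import annotations


def budget_ow(
    value: dict | str | float | bool,
    predicate: dict[str, tuple[type, bool]] | tuple[type, bool],
    value_name: str | None = None,
) -> None:
    """Raise ValueError if value does not match predicate (sketch)."""

    def check(field_value: object, expected_type: type, required: bool, name: str) -> None:
        # A missing (None) value is only an error when the field is required.
        if field_value is None:
            if required:
                raise ValueError(f'"{name}" is required')
            return
        if not isinstance(field_value, expected_type):
            raise ValueError(
                f'"{name}" must be of type {expected_type.__name__}, '
                f'got {type(field_value).__name__}'
            )

    if isinstance(value, dict) and isinstance(predicate, dict):
        # Object mode: validate each declared key of the dict.
        for key, (expected_type, required) in predicate.items():
            check(value.get(key), expected_type, required, key)
    elif isinstance(predicate, tuple) and value_name is not None:
        # Primitive mode: validate a single named value.
        expected_type, required = predicate
        check(value, expected_type, required, value_name)
    else:
        raise ValueError('Unsupported combination of value and predicate')
```

For example, budget_ow({'url': 'https://example.com'}, {'url': (str, True), 'retries': (int, False)}) passes, because the optional retries key may be absent.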

compute_short_hash

  • compute_short_hash(data, *, length): str
  • Computes a hexadecimal SHA-256 hash of the provided data and returns a substring (prefix) of it.


    Parameters

    • data: bytes
    • length: int = 8 (keyword-only)

    Returns str
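The behavior described above can be sketched with the standard hashlib module alone; this is an illustration, and the SDK's own code may differ in details.

```python
import hashlib


def compute_short_hash(data: bytes, *, length: int = 8) -> str:
    """Return the first `length` hex characters of the SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()[:length]
```

Because the full hex digest is 64 characters long, any length up to 64 yields a prefix of exactly that size.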

compute_unique_key

  • compute_unique_key(url, method, payload, *, keep_url_fragment, use_extended_unique_key): str
  • Computes a unique key for caching & deduplication of requests.

    This function computes a unique key by normalizing the provided URL and method. If ‘use_extended_unique_key’ is True and a payload is provided, the payload is hashed and included in the key. Otherwise, the unique key is just the normalized URL.


    Parameters

    • url: str
    • method: str = 'GET'
    • payload: bytes | None = None
    • keep_url_fragment: bool = False (keyword-only)
    • use_extended_unique_key: bool = False (keyword-only)

    Returns str
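A minimal sketch of the two modes, assuming the extended key has the form METHOD(payload_hash):normalized_url. That format is an assumption, not something this page specifies. The helpers _short_hash and _simple_normalize_url are hypothetical stand-ins for compute_short_hash and normalize_url. In particular, the stand-in normalizer skips the query-parameter cleanup that normalize_url performs.

```python
from __future__ import annotations

import hashlib
from urllib.parse import urlsplit, urlunsplit


def _short_hash(data: bytes, length: int = 8) -> str:
    # Hypothetical helper: prefix of the hex SHA-256 digest.
    return hashlib.sha256(data).hexdigest()[:length]


def _simple_normalize_url(url: str, keep_url_fragment: bool) -> str:
    # Hypothetical, simplified stand-in for normalize_url (no query cleanup).
    parts = urlsplit(url.strip())
    fragment = parts.fragment if keep_url_fragment else ''
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, fragment))


def compute_unique_key(
    url: str,
    method: str = 'GET',
    payload: bytes | None = None,
    *,
    keep_url_fragment: bool = False,
    use_extended_unique_key: bool = False,
) -> str:
    normalized_url = _simple_normalize_url(url, keep_url_fragment)
    if not use_extended_unique_key:
        # Default mode: the key is just the normalized URL.
        return normalized_url
    # Extended mode: include the method and a short payload hash (assumed format).
    payload_hash = _short_hash(payload) if payload else ''
    return f'{method.upper()}({payload_hash}):{normalized_url}'
```

The extended key lets two POST requests to the same URL with different bodies be treated as distinct. The default key would collapse them into one.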

crypto_random_object_id

  • crypto_random_object_id(length): str
  • Python reimplementation of cryptoRandomObjectId from @apify/utilities.


    Parameters

    • length: int = 17

    Returns str
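A sketch of a cryptographically random identifier using Python's secrets module. The alphabet (ASCII letters and digits) is an assumption about what the JavaScript original uses.

```python
import secrets
import string

# Assumed alphabet: a-z, A-Z and 0-9.
_ALPHABET = string.ascii_letters + string.digits


def crypto_random_object_id(length: int = 17) -> str:
    """Return a random alphanumeric ID drawn from a CSPRNG (sketch)."""
    return ''.join(secrets.choice(_ALPHABET) for _ in range(length))
```

The secrets module is used instead of random because it draws from the operating system's secure random source, which matters when such IDs must be unguessable.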

decrypt_input_secrets

  • decrypt_input_secrets(private_key, input): Any
  • Decrypt input secrets.


    Parameters

    • private_key: rsa.RSAPrivateKey
    • input: Any

    Returns Any

force_remove

  • async force_remove(filename): None
  • Remove a file, ignoring the error if it does not exist; mirrors Node.js rm(filename, { force: true }).


    Parameters

    • filename: str

    Returns None
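A standard-library sketch of the same idea. It runs the blocking os.remove in a worker thread via asyncio.to_thread (Python 3.9+) and swallows FileNotFoundError. The SDK's actual implementation may rely on a different async file library.

```python
import asyncio
import contextlib
import os


async def force_remove(filename: str) -> None:
    """Delete filename; do nothing if it is already gone (sketch)."""
    # Run the blocking os.remove in a worker thread so the event loop stays free.
    with contextlib.suppress(FileNotFoundError):
        await asyncio.to_thread(os.remove, filename)
```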

force_rename

  • async force_rename(src_dir, dst_dir): None
  • Rename a directory. Checks that the source directory exists and removes the destination directory if it already exists.


    Parameters

    • src_dir: str
    • dst_dir: str

    Returns None
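A standard-library sketch of the described checks. It assumes that a missing source directory is silently ignored, because this page does not say whether it raises. The blocking filesystem calls run in a worker thread via asyncio.to_thread.

```python
import asyncio
import os
import shutil


async def force_rename(src_dir: str, dst_dir: str) -> None:
    """Move src_dir to dst_dir, replacing any existing dst_dir (sketch)."""

    def _rename() -> None:
        # Assumption: a missing source directory is silently ignored.
        if not os.path.exists(src_dir):
            return
        # Remove the destination first so os.rename cannot fail on a non-empty directory.
        if os.path.exists(dst_dir):
            shutil.rmtree(dst_dir)
        os.rename(src_dir, dst_dir)

    # Run the blocking filesystem calls in a worker thread.
    await asyncio.to_thread(_rename)
```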

get_basic_auth_header

  • get_basic_auth_header(username, password, auth_encoding): bytes
  • Generate a basic authentication header for the given username and password.


    Parameters

    • username: str
    • password: str
    • auth_encoding: str = 'latin-1'

    Returns bytes
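HTTP Basic authentication (RFC 7617) base64-encodes "username:password" and prefixes it with "Basic ". The sketch below assumes the returned bytes include that prefix, ready to use as the Authorization header value.

```python
import base64


def get_basic_auth_header(username: str, password: str, auth_encoding: str = 'latin-1') -> bytes:
    """Return a Basic Authorization header value (sketch)."""
    # Encode "username:password" with the chosen charset, then base64-encode it.
    user_pass = f'{username}:{password}'.encode(auth_encoding)
    return b'Basic ' + base64.b64encode(user_pass)
```

For example, get_basic_auth_header('user', 'pass') returns b'Basic dXNlcjpwYXNz'. With the default latin-1 encoding, credentials containing characters outside Latin-1 raise UnicodeEncodeError. Passing auth_encoding='utf-8' avoids this.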

get_running_event_loop_id

  • get_running_event_loop_id(): int
  • Get the ID of the currently running event loop.

    This is mainly useful for debugging.


    Returns int
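One plausible sketch, assuming the ID is simply Python's id() of the running loop object. This is an assumption and is not confirmed by this page.

```python
import asyncio


def get_running_event_loop_id() -> int:
    """Return an identifier for the currently running event loop (sketch)."""
    # asyncio.get_running_loop() raises RuntimeError when no loop is running.
    return id(asyncio.get_running_loop())
```

Comparing this value in log lines from different tasks shows whether they ran on the same loop.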

guess_file_extension

  • guess_file_extension(content_type): str | None
  • Guess the file extension based on content type.


    Parameters

    • content_type: str

    Returns str | None
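A sketch built on the standard mimetypes module. It assumes parameters such as "; charset=utf-8" are ignored and the extension is returned without its leading dot. Both behaviors are assumptions.

```python
from __future__ import annotations

import mimetypes


def guess_file_extension(content_type: str) -> str | None:
    """Map a Content-Type value to a file extension without the leading dot (sketch)."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type.split(';')[0].strip().lower()
    extension = mimetypes.guess_extension(mime_type)
    # guess_extension returns e.g. ".json", or None for unknown types.
    return extension[1:] if extension else None
```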

is_url

  • is_url(url): bool
  • Check if the given string is a valid URL.


    Parameters

    • url: str

    Returns bool
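What counts as "valid" is not specified here. The sketch below uses one heuristic built on urllib.parse: it requires a scheme and a host, and the host must contain a dot, be localhost, or be an IP address. These rules are assumptions.

```python
import ipaddress
from urllib.parse import urlparse


def is_url(url: str) -> bool:
    """Heuristic URL check (sketch; the exact rules are assumptions)."""
    try:
        parsed = urlparse(url.strip())
        host = parsed.hostname  # lowercased, without port or IPv6 brackets
    except ValueError:
        return False
    if not parsed.scheme or not host:
        return False
    if host == 'localhost' or '.' in host:
        return True
    # Accept bare IP addresses such as ::1.
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True
```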

normalize_url

  • normalize_url(url, *, keep_url_fragment): str
  • Normalizes a URL.

    This function cleans and standardizes a URL by removing leading and trailing whitespace, converting the scheme and netloc to lowercase, stripping tracking parameters (those whose names begin with ‘utm_’), sorting the remaining query parameters alphabetically, and optionally retaining the URL fragment. The goal is to treat URLs that are functionally identical but differ in trivial ways (such as parameter order or casing) as the same.


    Parameters

    • url: str
    • keep_url_fragment: bool = False (keyword-only)

    Returns str
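The normalization steps listed for normalize_url map directly onto urllib.parse, as this sketch shows. One caveat applies: urlencode re-encodes query values, so percent-encoding may change. The SDK may handle this differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, *, keep_url_fragment: bool = False) -> str:
    """Normalize a URL following the steps described for normalize_url (sketch)."""
    parts = urlsplit(url.strip())  # strip leading/trailing whitespace
    # Drop utm_* tracking parameters and sort the remaining ones alphabetically.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith('utm_')
    )
    fragment = parts.fragment if keep_url_fragment else ''
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(query),
        fragment,
    ))
```

For example, ' HTTPS://Example.COM/path?b=2&utm_source=x&a=1#frag ' becomes 'https://example.com/path?a=1&b=2'. The path keeps its original casing, because paths are case-sensitive on most servers.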

open_queue_with_custom_client

  • Open a Request Queue with custom Apify Client.

    TODO: add support for custom client to Actor.open_request_queue(), so that we don’t have to do this hacky workaround


    Returns RequestQueue

to_apify_request

  • to_apify_request(scrapy_request, spider): dict | None
  • Convert a Scrapy request to an Apify request.


    Parameters

    • scrapy_request: Request
    • spider: Spider

    Returns dict | None

to_scrapy_request

  • to_scrapy_request(apify_request, spider): Request
  • Convert an Apify request to a Scrapy request.


    Parameters

    • apify_request: dict
    • spider: Spider

    Returns Request

unique_key_to_request_id

  • unique_key_to_request_id(unique_key): str
  • Generate request ID based on unique key in a deterministic way.


    Parameters

    • unique_key: str

    Returns str
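A deterministic ID can be derived by hashing the unique key, base64-encoding the digest, dropping the characters +, / and =, and truncating. The sketch below follows that approach. The 15-character length (REQUEST_ID_LENGTH) and the exact steps are assumptions, not confirmed by this page.

```python
import base64
import hashlib
import re

REQUEST_ID_LENGTH = 15  # assumed ID length


def unique_key_to_request_id(unique_key: str) -> str:
    """Derive a short, deterministic request ID from a unique key (sketch)."""
    digest = hashlib.sha256(unique_key.encode('utf-8')).digest()
    encoded = base64.b64encode(digest).decode('ascii')
    # Remove characters that are awkward in IDs, keeping only [A-Za-z0-9].
    request_id = re.sub(r'[+/=]', '', encoded)
    return request_id[:REQUEST_ID_LENGTH]
```

Because the ID depends only on the unique key, enqueueing the same request twice yields the same ID, which enables deduplication.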

Properties

ResourceClientType

ResourceClientType:
