Version: 1.7

Dataset

The Dataset class represents a store for structured data where each object stored has the same attributes.

You can imagine it as a table, where each object is a row and its attributes are columns.
Dataset is an append-only storage: you can only add new records to it, but you cannot modify or remove existing records.
Typically it is used to store crawling results.

Do not instantiate this class directly; use the Actor.open_dataset() function instead.

Dataset stores its data either on local disk or in the Apify cloud,
depending on whether the APIFY_LOCAL_STORAGE_DIR or APIFY_TOKEN environment variables are set.

If the APIFY_LOCAL_STORAGE_DIR environment variable is set, the data is stored in
the local directory in the following files:

    {APIFY_LOCAL_STORAGE_DIR}/datasets/{DATASET_ID}/{INDEX}.json

Note that {DATASET_ID} is the name or ID of the dataset. The default dataset has ID: default,
unless you override it by setting the APIFY_DEFAULT_DATASET_ID environment variable.
Each dataset item is stored as a separate JSON file, where {INDEX} is a zero-based index of the item in the dataset.

If the APIFY_TOKEN environment variable is set but APIFY_LOCAL_STORAGE_DIR is not, the data is stored in the
Apify Dataset cloud storage.
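The storage rules above can be summarized in a few lines of plain Python. This is only an illustrative sketch, not the SDK's implementation: the function name `resolve_dataset_location` and its return values are hypothetical, and the case where neither variable is set is left undecided because the description does not cover it.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional


def resolve_dataset_location(
    env: Mapping[str, str], dataset_id: Optional[str] = None
) -> Optional[str]:
    """Return where a dataset would be stored under the rules described above.

    Hypothetical helper: returns a local directory path, the string "cloud",
    or None when neither environment variable is set.
    """
    # The default dataset ID is "default" unless APIFY_DEFAULT_DATASET_ID overrides it.
    resolved_id = dataset_id or env.get("APIFY_DEFAULT_DATASET_ID") or "default"

    local_dir = env.get("APIFY_LOCAL_STORAGE_DIR")
    if local_dir:
        # Local storage applies whenever APIFY_LOCAL_STORAGE_DIR is set.
        return str(PurePosixPath(local_dir) / "datasets" / resolved_id)
    if env.get("APIFY_TOKEN"):
        # Token set but no local directory: data goes to the Apify cloud.
        return "cloud"
    # Neither variable is set: the description above does not say what happens.
    return None


if __name__ == "__main__":
    print(resolve_dataset_location({"APIFY_LOCAL_STORAGE_DIR": "/tmp/storage"}))
    print(resolve_dataset_location({"APIFY_TOKEN": "example-token"}))
```

The helper takes the environment as a mapping rather than reading `os.environ` directly, so the rules can be checked without touching the real process environment.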

Index

Methods

drop

  • async drop(): None
  • Remove the dataset either from the Apify cloud storage or from the local directory.


    Returns None

export_to

  • async export_to(key, *, to_key_value_store_id, to_key_value_store_name, content_type): None
  • Save the entirety of the dataset's contents into one file within a key-value store.


    Parameters

    • key: str
    • to_key_value_store_id: str | None = None (keyword-only)
    • to_key_value_store_name: str | None = None (keyword-only)
    • content_type: str | None = None (keyword-only)

    Returns None

export_to_csv

  • async export_to_csv(key, *, from_dataset_id, from_dataset_name, to_key_value_store_id, to_key_value_store_name): None
  • Save the entirety of the dataset's contents into one CSV file within a key-value store.


    Parameters

    • key: str
    • from_dataset_id: str | None = None (keyword-only)
    • from_dataset_name: str | None = None (keyword-only)
    • to_key_value_store_id: str | None = None (keyword-only)
    • to_key_value_store_name: str | None = None (keyword-only)

    Returns None

export_to_json

  • async export_to_json(key, *, from_dataset_id, from_dataset_name, to_key_value_store_id, to_key_value_store_name): None
  • Save the entirety of the dataset's contents into one JSON file within a key-value store.


    Parameters

    • key: str
    • from_dataset_id: str | None = None (keyword-only)
    • from_dataset_name: str | None = None (keyword-only)
    • to_key_value_store_id: str | None = None (keyword-only)
    • to_key_value_store_name: str | None = None (keyword-only)

    Returns None

get_data

  • async get_data(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden, flatten, view): ListPage
  • Get items from the dataset.


    Parameters

    • offset: int | None = None (keyword-only)
    • limit: int | None = None (keyword-only)
    • clean: bool | None = None (keyword-only)
    • desc: bool | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool | None = None (keyword-only)
    • skip_hidden: bool | None = None (keyword-only)
    • flatten: list[str] | None = None (keyword-only)
    • view: str | None = None (keyword-only)

    Returns ListPage
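Because get_data() accepts offset and limit, a dataset can be read one page at a time. The sketch below shows one possible paging loop. It assumes that the returned ListPage exposes the fetched records as an `items` attribute, which is not stated on this page. `fetch_all_items`, `FakePage` and `FakeDataset` are hypothetical names; the fake classes are in-memory stand-ins so the sketch runs without the Apify SDK.

```python
import asyncio
from typing import Any, Dict, List, Optional


async def fetch_all_items(dataset: Any, page_size: int = 100) -> List[Dict]:
    """Collect every item by calling get_data() page by page."""
    items: List[Dict] = []
    offset = 0
    while True:
        page = await dataset.get_data(offset=offset, limit=page_size)
        # Assumption: ListPage exposes the fetched records as `.items`.
        batch = list(page.items)
        if not batch:
            break  # An empty page means there is nothing left to read.
        items.extend(batch)
        offset += len(batch)
    return items


class FakePage:
    """Hypothetical stand-in for ListPage, holding only the items."""

    def __init__(self, items: List[Dict]) -> None:
        self.items = items


class FakeDataset:
    """Hypothetical in-memory stand-in that mimics get_data(offset, limit)."""

    def __init__(self, records: List[Dict]) -> None:
        self._records = list(records)

    async def get_data(
        self, *, offset: Optional[int] = None, limit: Optional[int] = None
    ) -> FakePage:
        start = offset or 0
        end = None if limit is None else start + limit
        return FakePage(self._records[start:end])


if __name__ == "__main__":
    fake = FakeDataset([{"n": i} for i in range(7)])
    print(asyncio.run(fetch_all_items(fake, page_size=3)))
```

With a real dataset, the object returned by Actor.open_dataset() would take the place of `FakeDataset`, provided the `items` assumption holds.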

get_info

  • async get_info(): dict | None
  • Get an object containing general information about the dataset.


    Returns dict | None

iterate_items

  • iterate_items(*, offset, limit, clean, desc, fields, omit, unwind, skip_empty, skip_hidden): AsyncIterator[dict]
  • Iterate over the items in the dataset.


    Parameters

    • offset: int = 0 (keyword-only)
    • limit: int | None = None (keyword-only)
    • clean: bool | None = None (keyword-only)
    • desc: bool | None = None (keyword-only)
    • fields: list[str] | None = None (keyword-only)
    • omit: list[str] | None = None (keyword-only)
    • unwind: str | None = None (keyword-only)
    • skip_empty: bool | None = None (keyword-only)
    • skip_hidden: bool | None = None (keyword-only)

    Returns AsyncIterator[dict]
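Since iterate_items() returns an AsyncIterator, it is consumed with `async for` rather than awaited. The following sketch searches a dataset for the first item with a given field value. `first_matching` and `FakeDataset` are hypothetical names; the fake class is an in-memory stand-in that accepts only offset and limit, so the sketch runs without the Apify SDK.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, List, Optional


async def first_matching(
    dataset: Any, key: str, value: Any, limit: Optional[int] = None
) -> Optional[Dict]:
    """Return the first item whose `key` equals `value`, or None if none does."""
    # iterate_items() is not awaited: it hands back an async iterator directly.
    async for item in dataset.iterate_items(limit=limit):
        if item.get(key) == value:
            return item
    return None


class FakeDataset:
    """Hypothetical in-memory stand-in that mimics iterate_items(offset, limit)."""

    def __init__(self, records: List[Dict]) -> None:
        self._records = list(records)

    async def iterate_items(
        self, *, offset: int = 0, limit: Optional[int] = None
    ) -> AsyncIterator[Dict]:
        end = None if limit is None else offset + limit
        for record in self._records[offset:end]:
            yield record


if __name__ == "__main__":
    fake = FakeDataset([{"sku": "a", "price": 10}, {"sku": "b", "price": 20}])
    print(asyncio.run(first_matching(fake, "sku", "b")))
```

Iterating this way avoids building the full list of items in memory when only one record is needed.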

open

  • async open(*, id, name, force_cloud, config): Dataset
  • Open a dataset.

    Datasets are used to store structured data where each object stored has the same attributes,
    such as online store products or real estate offers.
    The actual data is stored either on the local filesystem or in the Apify cloud.


    Parameters

    • id: str | None = None (keyword-only)
    • name: str | None = None (keyword-only)
    • force_cloud: bool = False (keyword-only)
    • config: Configuration | None = None (keyword-only)

    Returns Dataset

push_data

  • async push_data(data): None
  • Store an object or an array of objects to the dataset.

    The size of the data is limited by the receiving API, so push_data() only
    accepts objects whose JSON representation is smaller than 9MB. When an array is passed,
    none of the included objects may be larger than 9MB, but the array itself may be of any size.


    Parameters

    • data: JSONSerializable

    Returns None
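The per-object size limit can be checked before calling push_data(). The sketch below is one possible pre-flight check, not the SDK's own validation. It assumes that "9MB" means 9 * 1024 * 1024 bytes and that the size is measured on the UTF-8 encoded output of `json.dumps`. Both are assumptions, and the helper names are hypothetical.

```python
import json
from typing import Any, List, Sequence

# Assumption: "9MB" is interpreted as 9 * 1024 * 1024 bytes.
MAX_ITEM_BYTES = 9 * 1024 * 1024


def json_size_bytes(item: Any) -> int:
    """Size of the item's JSON representation, in UTF-8 encoded bytes."""
    return len(json.dumps(item, ensure_ascii=False).encode("utf-8"))


def find_oversized(items: Sequence[Any], max_bytes: int = MAX_ITEM_BYTES) -> List[int]:
    """Return the indexes of items that are not smaller than `max_bytes`."""
    # Only individual items are checked; the total size of the array is not limited.
    return [i for i, item in enumerate(items) if json_size_bytes(item) >= max_bytes]


if __name__ == "__main__":
    batch = [{"title": "small item"}, {"blob": "x" * MAX_ITEM_BYTES}]
    print(find_oversized(batch))
```

Items reported by `find_oversized` could be split or trimmed before the remaining ones are passed to push_data().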