DatasetClient
apify_client.clients.DatasetClient
Methods
delete
Delete the dataset.
https://docs.apify.com/api/v2#/reference/datasets/dataset/delete-dataset
Returns None
download_items
Get the items in the dataset as raw bytes.
Deprecated: this function is a deprecated alias of get_items_as_bytes. It will be removed in a future version.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items
Parameters
item_format: str = 'json' (keyword-only)
Format of the results. Possible values are json, jsonl, csv, html, xlsx, xml and rss. The default value is json.
offset: int | None = None (keyword-only)
Number of items that should be skipped at the start. The default value is 0.
limit: int | None = None (keyword-only)
Maximum number of items to return. By default there is no limit.
desc: bool | None = None (keyword-only)
By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.
clean: bool | None = None (keyword-only)
If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for the skip_hidden=True and skip_empty=True parameters. Note that because some items might be skipped from the output, the result might contain fewer items than the limit value.
bom: bool | None = None (keyword-only)
All text responses are encoded in UTF-8. By default, csv files are prefixed with the UTF-8 Byte Order Mark (BOM), while json, jsonl, xml, html and rss files are not. To override this default behavior, specify bom=True to include the BOM or bom=False to skip it.
delimiter: str | None = None (keyword-only)
A delimiter character for CSV files. The default delimiter is a simple comma (,).
fields: list[str] | None = None (keyword-only)
A list of fields which should be picked from the items; only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.
omit: list[str] | None = None (keyword-only)
A list of fields which should be omitted from the items.
unwind: str | list[str] | None = None (keyword-only)
A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, then every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.
skip_empty: bool | None = None (keyword-only)
If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.
skip_header_row: bool | None = None (keyword-only)
If True, then the header row in the csv format is skipped.
skip_hidden: bool | None = None (keyword-only)
If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.
xml_root: str | None = None (keyword-only)
Overrides the default root element name of the xml output. By default, the root element is items.
xml_row: str | None = None (keyword-only)
Overrides the default element name that wraps each page or page function result object in the xml output. By default, the element name is item.
flatten: list[str] | None = None (keyword-only)
A list of fields that should be flattened.
Returns bytes
The dataset items as raw bytes.
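The filtering parameters described above (fields, omit, skip_hidden, skip_empty and clean) can be pictured with a small local sketch. It runs on plain Python dicts and does not call the API. The helper names are hypothetical, and two behaviors are assumptions rather than documented facts: that hidden fields are stripped before empty items are dropped, and that a field missing from an item is simply left out when picking.

```python
def strip_hidden(item: dict) -> dict:
    """Drop hidden fields, i.e. keys starting with the # character."""
    return {key: value for key, value in item.items() if not key.startswith('#')}


def pick_fields(item: dict, fields: list[str]) -> dict:
    """Keep only the listed fields, ordered as in `fields`.

    Assumption: a field that is missing from the item is left out.
    """
    return {name: item[name] for name in fields if name in item}


def omit_fields(item: dict, omit: list[str]) -> dict:
    """Remove the listed fields and keep everything else."""
    return {key: value for key, value in item.items() if key not in omit}


def clean_items(items: list[dict]) -> list[dict]:
    """Approximate clean=True, i.e. skip_hidden=True plus skip_empty=True.

    Assumption: hidden fields are stripped first, then empty items are dropped.
    """
    stripped = [strip_hidden(item) for item in items]
    return [item for item in stripped if item]
```

Because clean_items can drop items, its result may be shorter than its input, which mirrors the note that the result might contain fewer items than the limit value.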
get
Retrieve the dataset.
https://docs.apify.com/api/v2#/reference/datasets/dataset/get-dataset
Returns dict | None
The retrieved dataset, or None if it does not exist.
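Since get returns None for a dataset that does not exist, callers generally need a guard before reading the result. The sketch below assumes dataset_client is any object exposing the get() method described above. The helper describe_dataset is hypothetical, and the name and itemCount keys are assumptions about the shape of the returned dict, so the code falls back gracefully when they are absent.

```python
from typing import Any, Optional


def describe_dataset(dataset_client: Any) -> Optional[str]:
    """Return a one-line summary of the dataset, or None if it does not exist."""
    dataset = dataset_client.get()
    if dataset is None:
        # get() returns None when the dataset does not exist.
        return None
    # 'name' and 'itemCount' are assumed keys; use fallbacks if they are missing.
    name = dataset.get('name') or '(unnamed)'
    item_count = dataset.get('itemCount', 'unknown')
    return f'{name}: {item_count} items'
```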
get_items_as_bytes
Get the items in the dataset as raw bytes.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items
Parameters
item_format: str = 'json' (keyword-only)
Format of the results. Possible values are json, jsonl, csv, html, xlsx, xml and rss. The default value is json.
offset: int | None = None (keyword-only)
Number of items that should be skipped at the start. The default value is 0.
limit: int | None = None (keyword-only)
Maximum number of items to return. By default there is no limit.
desc: bool | None = None (keyword-only)
By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.
clean: bool | None = None (keyword-only)
If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for the skip_hidden=True and skip_empty=True parameters. Note that because some items might be skipped from the output, the result might contain fewer items than the limit value.
bom: bool | None = None (keyword-only)
All text responses are encoded in UTF-8. By default, csv files are prefixed with the UTF-8 Byte Order Mark (BOM), while json, jsonl, xml, html and rss files are not. To override this default behavior, specify bom=True to include the BOM or bom=False to skip it.
delimiter: str | None = None (keyword-only)
A delimiter character for CSV files. The default delimiter is a simple comma (,).
fields: list[str] | None = None (keyword-only)
A list of fields which should be picked from the items; only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.
omit: list[str] | None = None (keyword-only)
A list of fields which should be omitted from the items.
unwind: str | list[str] | None = None (keyword-only)
A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, then every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.
skip_empty: bool | None = None (keyword-only)
If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.
skip_header_row: bool | None = None (keyword-only)
If True, then the header row in the csv format is skipped.
skip_hidden: bool | None = None (keyword-only)
If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.
xml_root: str | None = None (keyword-only)
Overrides the default root element name of the xml output. By default, the root element is items.
xml_row: str | None = None (keyword-only)
Overrides the default element name that wraps each page or page function result object in the xml output. By default, the element name is item.
flatten: list[str] | None = None (keyword-only)
A list of fields that should be flattened.
Returns bytes
The dataset items as raw bytes.
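Because the method returns raw bytes and csv output is prefixed with a UTF-8 BOM by default, decoding needs a little care. The sketch below shows one way to turn such bytes into row dicts using only the standard library. The helper parse_csv_bytes is hypothetical and works on any bytes value, so it can be tried without calling the API.

```python
import csv
import io


def parse_csv_bytes(raw: bytes, delimiter: str = ',') -> list[dict]:
    """Decode CSV bytes (with or without a UTF-8 BOM) into a list of row dicts."""
    # 'utf-8-sig' strips a leading BOM if present and otherwise acts like 'utf-8'.
    text = raw.decode('utf-8-sig')
    # newline='' lets the csv module handle line endings itself.
    reader = csv.DictReader(io.StringIO(text, newline=''), delimiter=delimiter)
    return [dict(row) for row in reader]
```

A possible use, assuming dataset_client is a DatasetClient instance, would be parse_csv_bytes(dataset_client.get_items_as_bytes(item_format='csv')). The helper relies on the header row, so it would not fit output requested with skip_header_row=True.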
iterate_items
Iterate over the items in the dataset.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items
Parameters
offset: int = 0 (keyword-only)
Number of items that should be skipped at the start. The default value is 0.
limit: int | None = None (keyword-only)
Maximum number of items to return. By default there is no limit.
clean: bool | None = None (keyword-only)
If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for the skip_hidden=True and skip_empty=True parameters. Note that because some items might be skipped from the output, the result might contain fewer items than the limit value.
desc: bool | None = None (keyword-only)
By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.
fields: list[str] | None = None (keyword-only)
A list of fields which should be picked from the items; only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.
omit: list[str] | None = None (keyword-only)
A list of fields which should be omitted from the items.
unwind: str | list[str] | None = None (keyword-only)
A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, then every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.
skip_empty: bool | None = None (keyword-only)
If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.
skip_hidden: bool | None = None (keyword-only)
If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.
Returns Iterator[dict]
Yields one item from the dataset at a time.
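The unwind rules described above are easier to follow with a concrete case. The following is a local sketch for a single field on plain dicts, not the service's implementation. The function unwind_item is hypothetical, and several details are assumptions: the unwound key is removed from the parent, keys from the unwound value win on conflict, array elements that are not objects stay under the original field name, and an empty array produces no records.

```python
def unwind_item(item: dict, field: str) -> list[dict]:
    """Unwind one field of one item into zero or more records."""
    value = item.get(field)
    # Assumption: the unwound key itself is removed from the parent object.
    parent = {key: val for key, val in item.items() if key != field}

    if isinstance(value, list):
        records = []
        for element in value:
            if isinstance(element, dict):
                # Each element becomes its own record, merged with the parent.
                records.append({**parent, **element})
            else:
                # Assumption: non-object elements are kept under the field name.
                records.append({**parent, field: element})
        return records

    if isinstance(value, dict):
        # An object is merged directly into the parent.
        return [{**parent, **value}]

    # Missing field, or a value that is neither an array nor an object:
    # the item is preserved as it is.
    return [item]
```

For example, an item {'shop': 's', 'offers': [{'price': 1}, {'price': 2}]} unwound on offers yields two records under these assumptions, each carrying the shop value and one price.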
list_items
List the items of the dataset.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items
Parameters
offset: int | None = None (keyword-only)
Number of items that should be skipped at the start. The default value is 0.
limit: int | None = None (keyword-only)
Maximum number of items to return. By default there is no limit.
clean: bool | None = None (keyword-only)
If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for the skip_hidden=True and skip_empty=True parameters. Note that because some items might be skipped from the output, the result might contain fewer items than the limit value.
desc: bool | None = None (keyword-only)
By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.
fields: list[str] | None = None (keyword-only)
A list of fields which should be picked from the items; only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.
omit: list[str] | None = None (keyword-only)
A list of fields which should be omitted from the items.
unwind: str | list[str] | None = None (keyword-only)
A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, then every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.
skip_empty: bool | None = None (keyword-only)
If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.
skip_hidden: bool | None = None (keyword-only)
If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.
flatten: list[str] | None = None (keyword-only)
A list of fields that should be flattened.
view: str | None = None (keyword-only)
Name of the dataset view to be used.
Returns ListPage
A page of the list of dataset items according to the specified filters.
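Since list_items returns one page at a time, reading a large dataset typically means repeating the call with a growing offset. The sketch below shows one way to do that. It assumes dataset_client is any object exposing list_items as described above and that the returned page exposes its records through an items attribute. The helper iter_pages and its page_size parameter are hypothetical.

```python
from typing import Any, Iterator


def iter_pages(dataset_client: Any, page_size: int = 1000) -> Iterator[list]:
    """Yield lists of items, one list per list_items call."""
    offset = 0
    while True:
        page = dataset_client.list_items(offset=offset, limit=page_size)
        if not page.items:
            # An empty page means there is nothing left to read.
            break
        yield page.items
        # No clean or skip_empty filter is passed here, so the number of
        # returned items matches the number of stored items consumed.
        offset += len(page.items)
```

The loop deliberately avoids the clean and skip_empty filters: with them, a page can hold fewer items than were actually consumed, and advancing the offset by the page length would no longer be correct.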
push_items
Push items to the dataset.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/put-items
Parameters
items: JSONSerializable
Returns None
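When many items need to be stored, it can help to push them in smaller batches so that each request payload stays modest. The sketch below assumes dataset_client is any object exposing push_items as described above. The helpers chunked and push_in_batches, and the default batch size of 500, are illustrative choices rather than library features or documented limits.

```python
from typing import Any, Iterator, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError('size must be positive')
    for start in range(0, len(items), size):
        yield items[start:start + size]


def push_in_batches(dataset_client: Any, items: Sequence[Any], batch_size: int = 500) -> int:
    """Push items batch by batch and return the number of push_items calls made."""
    batches = 0
    for batch in chunked(items, batch_size):
        # Each call sends one JSON-serializable list of items.
        dataset_client.push_items(list(batch))
        batches += 1
    return batches
```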
stream_items
Retrieve the items in the dataset as a stream.
https://docs.apify.com/api/v2#/reference/datasets/item-collection/get-items
Parameters
item_format: str = 'json' (keyword-only)
Format of the results. Possible values are json, jsonl, csv, html, xlsx, xml and rss. The default value is json.
offset: int | None = None (keyword-only)
Number of items that should be skipped at the start. The default value is 0.
limit: int | None = None (keyword-only)
Maximum number of items to return. By default there is no limit.
desc: bool | None = None (keyword-only)
By default, results are returned in the same order as they were stored. To reverse the order, set this parameter to True.
clean: bool | None = None (keyword-only)
If True, returns only non-empty items and skips hidden fields (i.e. fields starting with the # character). The clean parameter is just a shortcut for the skip_hidden=True and skip_empty=True parameters. Note that because some items might be skipped from the output, the result might contain fewer items than the limit value.
bom: bool | None = None (keyword-only)
All text responses are encoded in UTF-8. By default, csv files are prefixed with the UTF-8 Byte Order Mark (BOM), while json, jsonl, xml, html and rss files are not. To override this default behavior, specify bom=True to include the BOM or bom=False to skip it.
delimiter: str | None = None (keyword-only)
A delimiter character for CSV files. The default delimiter is a simple comma (,).
fields: list[str] | None = None (keyword-only)
A list of fields which should be picked from the items; only these fields will remain in the resulting record objects. Note that the fields in the output items are sorted the same way as they are specified in the fields parameter. You can use this feature to effectively fix the output format.
omit: list[str] | None = None (keyword-only)
A list of fields which should be omitted from the items.
unwind: str | list[str] | None = None (keyword-only)
A list of fields which should be unwound, in the order in which they should be processed. Each field should be either an array or an object. If the field is an array, then every element of the array becomes a separate record and is merged with the parent object. If the unwound field is an object, then it is merged with the parent object. If the unwound field is missing, or its value is neither an array nor an object and therefore cannot be merged with a parent object, then the item is preserved as it is. Note that the unwound items ignore the desc parameter.
skip_empty: bool | None = None (keyword-only)
If True, then empty items are skipped from the output. Note that if used, the results might contain fewer items than the limit value.
skip_header_row: bool | None = None (keyword-only)
If True, then the header row in the csv format is skipped.
skip_hidden: bool | None = None (keyword-only)
If True, then hidden fields are skipped from the output, i.e. fields starting with the # character.
xml_root: str | None = None (keyword-only)
Overrides the default root element name of the xml output. By default, the root element is items.
xml_row: str | None = None (keyword-only)
Overrides the default element name that wraps each page or page function result object in the xml output. By default, the element name is item.
Returns Iterator[httpx.Response]
The dataset items as a context-managed streaming Response.
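Because the result is a context-managed streaming response, it is meant to be used in a with block and read incrementally. The sketch below reads jsonl output line by line and parses each line as one item. It assumes dataset_client is any object exposing stream_items as described above and that the response offers iter_lines(), as httpx.Response does. The helper iter_jsonl_items is hypothetical.

```python
import json
from typing import Any, Iterator


def iter_jsonl_items(dataset_client: Any, **kwargs: Any) -> Iterator[dict]:
    """Yield parsed items one by one from a jsonl stream."""
    # Extra keyword arguments (e.g. limit or offset) are passed through unchanged.
    with dataset_client.stream_items(item_format='jsonl', **kwargs) as response:
        for line in response.iter_lines():
            # Skip blank lines, such as a trailing newline at the end of the stream.
            if line.strip():
                yield json.loads(line)
```

Reading this way keeps only one line in memory at a time, which is the main reason to prefer streaming over get_items_as_bytes for very large datasets.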
update
Update the dataset with specified fields.
https://docs.apify.com/api/v2#/reference/datasets/dataset/update-dataset
Parameters
name: str | None = None (keyword-only)
The new name for the dataset.
Returns dict
The updated dataset.
Sub-client for manipulating a single dataset.