ragoon.datasets

Functions

dataset_loader(name[, streaming, split])

Helper that loads a single dataset; used as the per-dataset unit of work when loading several datasets in parallel.

load_datasets(req[, streaming])

Downloads datasets specified in a list and creates a list of loaded datasets.

ragoon.datasets.dataset_loader(name: str, streaming: bool | None = True, split: str | List[str] | None = None) → Dataset

Helper that loads a single dataset; used as the per-dataset unit of work when loading several datasets in parallel.

Parameters:
  • name (str) – Name of the dataset to be loaded.

  • streaming (bool, optional) – Whether to stream the dataset rather than download it in full. Defaults to True.

  • split (Optional[Union[str, List[str]]], optional) – Which split of the data to load. If None, will return a dict with all splits (typically datasets.Split.TRAIN and datasets.Split.TEST). If given, will return a single Dataset. Splits can be combined and specified like in tensorflow-datasets.

Returns:

dataset – Loaded dataset object.

Return type:

datasets.Dataset

Raises:

Exception – If an error occurs during dataset loading.
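The split behavior described above (None returns every split, a split name returns just that one) can be sketched in plain Python; the helper and the toy split dict below are illustrative stand-ins, not part of the ragoon API:

```python
from typing import Optional


def select_split(all_splits: dict, split: Optional[str] = None):
    # Mirrors the documented contract of the `split` parameter:
    # None -> a dict containing every split; a name -> that split only.
    if split is None:
        return all_splits
    return all_splits[split]


# Toy stand-in for the dict of splits a loaded dataset would expose.
splits = {"train": ["row-1", "row-2"], "test": ["row-3"]}

print(select_split(splits))           # full dict of splits
print(select_split(splits, "train"))  # ['row-1', 'row-2']
```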

ragoon.datasets.load_datasets(req: list, streaming: bool | None = False) → list

Downloads datasets specified in a list and creates a list of loaded datasets.

Parameters:
  • req (list) – A list containing the names of datasets to be downloaded.

  • streaming (bool, optional) – Whether to stream the datasets rather than download them in full. Defaults to False.

Returns:

datasets_list – A list of loaded datasets, one per name requested in req.

Return type:

list

Raises:

Exception – If an error occurs during dataset loading or processing.

Examples

>>> req = [
...     "louisbrulenaudet/code-artisanat",
...     "louisbrulenaudet/code-action-sociale-familles",
...     # ...
... ]
>>> datasets_list = load_datasets(
...     req=req,
...     streaming=True
... )
>>> dataset = datasets.concatenate_datasets(
...     datasets_list
... )