ragoon.datasets#
Functions

- dataset_loader – Helper function to load a single dataset in parallel.
- load_datasets – Downloads datasets specified in a list and creates a list of loaded datasets.
- ragoon.datasets.dataset_loader(name: str, streaming: bool | None = True, split: str | List[str] | None = None) → Dataset[source]#
Helper function to load a single dataset in parallel.
- Parameters:
name (str) – Name of the dataset to be loaded.
streaming (bool, optional) – Determines if datasets are streamed. Default is True.
split (Optional[Union[str, List[str]]], optional) – Which split of the data to load. If None, will return a dict with all splits (typically datasets.Split.TRAIN and datasets.Split.TEST). If given, will return a single Dataset. Splits can be combined and specified like in tensorflow-datasets.
- Returns:
dataset – Loaded dataset object.
- Return type:
datasets.Dataset
- Raises:
Exception – If an error occurs during dataset loading.
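The split behavior described above (a dict of all splits when split is None, a single Dataset otherwise, with loading errors re-raised) can be sketched with an in-memory stand-in. Everything here except the parameter contract is hypothetical: _HUB, FakeDataset, and "demo/corpus" are illustrative stand-ins, not part of ragoon or the Hugging Face Hub.

```python
from typing import Dict, List, Optional, Union

# Stand-in "Dataset": a list of records. The real function returns
# a datasets.Dataset (or a DatasetDict of splits).
FakeDataset = List[dict]

# Hypothetical in-memory hub, standing in for the remote dataset store.
_HUB: Dict[str, Dict[str, FakeDataset]] = {
    "demo/corpus": {
        "train": [{"text": "a"}, {"text": "b"}],
        "test": [{"text": "c"}],
    },
}


def dataset_loader_sketch(
    name: str,
    streaming: bool = True,
    split: Optional[Union[str, List[str]]] = None,
):
    """Mimic the documented contract: all splits when split is None,
    one dataset per requested split otherwise; raise on failure."""
    try:
        repo = _HUB[name]
    except KeyError as exc:
        # Mirrors the documented Raises clause.
        raise Exception(f"Error loading dataset {name!r}") from exc
    if split is None:
        return repo                      # dict of all splits, like DatasetDict
    if isinstance(split, list):
        return [repo[s] for s in split]  # one dataset per requested split
    return repo[split]                   # a single split


train = dataset_loader_sketch("demo/corpus", split="train")
all_splits = dataset_loader_sketch("demo/corpus")
```

The real implementation would delegate to datasets.load_dataset, which accepts the same split semantics (including combined splits in tensorflow-datasets notation).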
- ragoon.datasets.load_datasets(req: list, streaming: bool | None = False) → list[source]#
Downloads datasets specified in a list and creates a list of loaded datasets.
- Parameters:
req (list) – A list containing the names of datasets to be downloaded.
streaming (bool, optional) – Determines if datasets are streamed. Default is False.
- Returns:
datasets_list – A list containing the loaded datasets, corresponding to the names given in req.
- Return type:
list
- Raises:
Exception – If an error occurs during dataset loading or processing.
Examples
>>> req = [
...     "louisbrulenaudet/code-artisanat",
...     "louisbrulenaudet/code-action-sociale-familles",
...     # ...
... ]
>>> datasets_list = load_datasets(
...     req=req,
...     streaming=True
... )
>>> dataset = datasets.concatenate_datasets(
...     datasets_list
... )
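The docstrings say the requested datasets are loaded in parallel. A minimal sketch of that pattern, assuming a thread-pool fan-out over the request list; fake_loader is a hypothetical stand-in for dataset_loader so the example runs without network access:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_loader(name: str, streaming: bool = False) -> dict:
    # Stand-in for dataset_loader: returns a tagged record
    # instead of downloading anything.
    return {"name": name, "streaming": streaming}


def load_datasets_sketch(req: list, streaming: bool = False) -> list:
    """Load each requested dataset concurrently, preserving input order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fake_loader, name, streaming) for name in req]
        # Collecting results in submission order keeps the output
        # aligned with the names in req, as the Returns section states.
        return [future.result() for future in futures]


names = [
    "louisbrulenaudet/code-artisanat",
    "louisbrulenaudet/code-action-sociale-familles",
]
loaded = load_datasets_sketch(names, streaming=True)
```

Threads suit this workload because dataset downloads are I/O-bound; the order-preserving result collection matters when the list is later concatenated, as in the example above.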