datasets
datatrove==0.2.0
huggingface_hub==0.23.1
pyyaml
ruff
streaming
tqdm