The AfricanVoices corpus is a speech corpus containing datasets of aligned sentences and audio for 11 languages. So far, we have uploaded data for {{ num_languages }} different languages to this website. We obtain the datasets in three ways:
- Create and record
- Align audiobooks from sources like Open.Bible, the ALLFA project, and LLSTI
- Obtain single speaker audio-text pairs from sources like Mozilla CommonVoice
Datasets
| Data_id | Lang code | Language | Source | Speaker | No. of sentences | Hrs | MCD* | Quality | Download |
|---|---|---|---|---|---|---|---|---|---|
| {{ dataset.data_id }} | {{ dataset.lang.lang_code_639_2 }} | {% if not dataset.lang.lang_wikipedia_url %}{{ dataset.lang.lang_name }}{% else %}{{ dataset.lang.lang_name }}{% endif %} | {{ dataset.source }} | {{ dataset.speaker_gender }} | {{ dataset.pass1_utt }} | {{ dataset.duration|floatformat:2 }} | {{ dataset.pass1_mcd|floatformat:2 }} | {% if dataset.pass1_mcd <= 6 %}Good{% elif dataset.pass1_mcd <= 7 %}Okay{% elif dataset.pass1_mcd <= 8 %}Bad{% else %}Something is wrong{% endif %} | {% if not dataset.data_location %}Unavailable{% else %}Available{% endif %} |
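
The Quality column is derived from the MCD score using the thresholds in the template (6, 7, and 8). As a minimal sketch, the same mapping could be written in Python as follows. The function name `quality_label` is hypothetical and not part of the site's code, and reading MCD as mel-cepstral distortion (lower is better) is an assumption.

```python
def quality_label(mcd: float) -> str:
    """Map an MCD score to the label shown in the Quality column.

    Thresholds mirror the template: <= 6 Good, <= 7 Okay, <= 8 Bad.
    Anything above 8 is flagged as a likely problem with the dataset.
    """
    if mcd <= 6:
        return "Good"
    if mcd <= 7:
        return "Okay"
    if mcd <= 8:
        return "Bad"
    return "Something is wrong"


if __name__ == "__main__":
    # Illustrative scores only, not values from the corpus.
    for score in (5.4, 6.0, 6.8, 7.9, 9.3):
        print(f"{score:.2f} -> {quality_label(score)}")
```

Because each threshold is inclusive, a score of exactly 6 is labelled Good and a score of exactly 8 is labelled Bad.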