diarizers-community aims to promote speaker diarization on the Hugging Face hub. It contains:
The available datasets are CallHome (Japanese, Chinese, German, Spanish, English), AMI Corpus (English), VoxConverse (English) and Simsamu (French). We aim to add more datasets in the future to better support speaker diarization on the Hub.
Each model has been fine-tuned on a specific CallHome language subset. The fine-tuned models achieve a lower diarization error rate on multilingual data than pyannote's pre-trained segmentation-3.0 model (see the benchmark for more details on model performance).
Together with diarizers-community, we release:
diarizers, a library for fine-tuning pyannote speaker diarization models using the Hugging Face ecosystem.
A Google Colab notebook, with a step-by-step guide on how to use diarizers.
Benchmark
CallHome test dataset | Model | DER | False alarm | Missed detection | Confusion
---|---|---|---|---|---
Japanese | Pretrained | 25.44 | 2.30 | 17.45 | 5.69
Japanese | Fine-tuned | 18.23 | 6.31 | 6.91 | 5.01
Spanish | Pretrained | 33.44 | 2.59 | 25.19 | 5.66
Spanish | Fine-tuned | 25.72 | 6.87 | 12.73 | 6.12
English | Pretrained | 22.16 | 6.29 | 10.97 | 4.90
English | Fine-tuned | 18.40 | 7.10 | 6.98 | 4.32
German | Pretrained | 21.90 | 3.10 | 14.25 | 4.55
German | Fine-tuned | 16.75 | 5.00 | 7.75 | 4.00
Chinese | Pretrained | 19.73 | 4.81 | 9.82 | 5.11
Chinese | Fine-tuned | 15.95 | 5.04 | 7.24 | 3.68
All values are percentages; DER is the diarization error rate. The results were obtained using the test script from diarizers.
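To make the benchmark easier to read, here is a minimal sketch that checks how the columns relate and how much fine-tuning helps. It relies on the standard definition of DER as the sum of false alarm, missed detection and confusion. The numbers are copied from the table above; the names `RESULTS`, `component_sum` and `relative_der_reduction` are illustrative and are not part of diarizers.

```python
# Benchmark rows copied from the table above.
# Key: (language, model); value: (DER, false alarm, missed detection, confusion), all in %.
RESULTS = {
    ("Japanese", "Pretrained"): (25.44, 2.30, 17.45, 5.69),
    ("Japanese", "Fine-tuned"): (18.23, 6.31, 6.91, 5.01),
    ("Spanish", "Pretrained"): (33.44, 2.59, 25.19, 5.66),
    ("Spanish", "Fine-tuned"): (25.72, 6.87, 12.73, 6.12),
    ("English", "Pretrained"): (22.16, 6.29, 10.97, 4.90),
    ("English", "Fine-tuned"): (18.40, 7.10, 6.98, 4.32),
    ("German", "Pretrained"): (21.90, 3.10, 14.25, 4.55),
    ("German", "Fine-tuned"): (16.75, 5.00, 7.75, 4.00),
    ("Chinese", "Pretrained"): (19.73, 4.81, 9.82, 5.11),
    ("Chinese", "Fine-tuned"): (15.95, 5.04, 7.24, 3.68),
}


def component_sum(row):
    """Sum the three error components; this should match the reported DER."""
    _der, false_alarm, missed, confusion = row
    return round(false_alarm + missed + confusion, 2)


def relative_der_reduction(language):
    """Relative DER reduction (in %) of the fine-tuned model over the pretrained one."""
    pretrained_der = RESULTS[(language, "Pretrained")][0]
    fine_tuned_der = RESULTS[(language, "Fine-tuned")][0]
    return 100.0 * (pretrained_der - fine_tuned_der) / pretrained_der


if __name__ == "__main__":
    for (language, model), row in RESULTS.items():
        print(f"{language:9s} {model:11s} DER={row[0]:.2f} components={component_sum(row):.2f}")
    for language in ("Japanese", "Spanish", "English", "German", "Chinese"):
        print(f"{language}: {relative_der_reduction(language):.1f}% relative DER reduction")
```

The component sums agree with the reported DER to within 0.01, which is rounding. The breakdown also shows where the gain comes from: fine-tuning lowers missed detection substantially in every language, while false alarm rises in every language and confusion improves everywhere except Spanish.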