This repository contains all the models and datasets for training and evaluating Japanese ASR that were generated in the process of developing the kotoba-whisper models.
The following tables compare CER and WER across the different amounts of ReazonSpeech data used to distill openai/whisper-large-v3. Model names follow the pattern
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of reazonspeech}
.
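As a small illustration of the naming pattern, the sketch below builds the model IDs from the size labels that appear in the tables of this README. The helper names `model_id` and `load_asr` are hypothetical, and loading through the `transformers` ASR pipeline is an assumption about how these checkpoints are meant to be used, not something this README states.

```python
# Size labels as they appear in the model names listed in this README.
SIZES = ["tiny", "small", "medium", "large", "all"]


def model_id(size: str) -> str:
    """Build the model ID for a given ReazonSpeech size label (hypothetical helper)."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size}"


def load_asr(size: str):
    """Load a speech recognition pipeline for the given size.

    Assumption: the checkpoints are compatible with the transformers
    automatic-speech-recognition pipeline. Requires `pip install transformers`
    and downloads the model weights on first use.
    """
    from transformers import pipeline  # imported lazily so model_id() works without it

    return pipeline("automatic-speech-recognition", model=model_id(size))


if __name__ == "__main__":
    for size in SIZES:
        print(model_id(size))
```

Calling `load_asr("all")` would, under the assumption above, return a pipeline that can be applied to an audio file path.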
CER
model | CommonVoice 8 (Japanese test set) | JSUT Basic 5000 | ReazonSpeech (held out test set) |
---|---|---|---|
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all | 9.2 | 8.4 | 11.6 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large | 9.4 | 8.5 | 12.2 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium | 10.9 | 11.3 | 14.8 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small | 30.2 | 39 | 40.7 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny | 94.8 | 96.3 | 96.7 |
openai/whisper-large-v3 | 8.5 | 7.1 | 14.9 |
openai/whisper-large-v2 | 9.7 | 8.2 | 28.1 |
openai/whisper-large | 10 | 8.9 | 34.1 |
openai/whisper-medium | 11.5 | 10 | 33.2 |
openai/whisper-base | 28.6 | 24.9 | 70.4 |
openai/whisper-small | 15.1 | 14.2 | 41.5 |
openai/whisper-tiny | 53.7 | 36.5 | 137.9 |
reazon-research/reazonspeech-nemo-v2 | 9.1 | 7.4 | 11.2 |
WER
model | CommonVoice 8 (Japanese test set) | JSUT Basic 5000 | ReazonSpeech (held out test set) |
---|---|---|---|
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all | 58.8 | 63.7 | 55.6 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large | 59.2 | 64.3 | 56.4 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium | 64.6 | 72.1 | 63 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small | 85 | 94.2 | 82.1 |
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny | 100 | 100 | 99 |
openai/whisper-large-v3 | 55.1 | 59.2 | 60.2 |
openai/whisper-large-v2 | 59.3 | 63.2 | 74.1 |
openai/whisper-large | 61.1 | 66.4 | 74.9 |
openai/whisper-medium | 63.4 | 69.5 | 76 |
openai/whisper-base | 87.2 | 93 | 91.8 |
openai/whisper-small | 74.2 | 81.9 | 83 |
openai/whisper-tiny | 93.8 | 97.6 | 94.9 |
reazon-research/reazonspeech-nemo-v2 | 57.5 | 60.6 | 47.5 |
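The tables do not state how the scores were computed. The sketch below assumes the standard definition: the edit distance (substitutions, insertions, and deletions) between the hypothesis and the reference, divided by the reference length and reported as a percentage. It is not the evaluation script used for these results, and any text normalization applied before scoring is omitted. Because insertions count as errors, a score can exceed 100 when a model outputs much more text than the reference contains, which is consistent with values such as 137.9 in the CER table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent, relative to the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: tokens are individual characters."""
    return error_rate(list(reference), list(hypothesis))


def wer(reference_words, hypothesis_words) -> float:
    """Word error rate over pre-segmented word lists.

    Japanese text has no spaces, so a word segmenter is needed first;
    which one was used for the tables is not stated in this README.
    """
    return error_rate(list(reference_words), list(hypothesis_words))


if __name__ == "__main__":
    print(cer("こんにちは", "こんばんは"))  # two substitutions over five characters
    print(cer("はい", "はいはいはい"))      # four insertions over two characters
    print(wer(["今日", "は", "晴れ"], ["今日", "も", "晴れ"]))
```

The dependence of WER on word segmentation may be one reason the WER values are so much higher than the CER values for the same models.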
Note that kotoba-tech/kotoba-whisper-v1.0 is an alias of japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large and kotoba-tech/kotoba-whisper-v2.0 is an alias of japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all.
More detailed results are available in the kotoba-whisper codebase.