Japanese ASR

This repository contains all the models and datasets for training and evaluating Japanese ASR that were produced while developing the kotoba-whisper models. The following tables compare CER and WER for models distilled from openai/whisper-large-v3 using different amounts of ReazonSpeech data. They also include OpenAI Whisper baselines and reazon-research/reazonspeech-nemo-v2. Distilled model names follow the pattern japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size of ReazonSpeech subset}.
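The naming pattern can be turned into repo IDs with a small helper. This is a minimal sketch: `distil_model_id` and `REAZONSPEECH_SIZES` are hypothetical names, and the size list is taken from the rows of the tables below.

```python
# Hypothetical helper: ReazonSpeech subset sizes as they appear in the tables below.
REAZONSPEECH_SIZES = ("tiny", "small", "medium", "large", "all")


def distil_model_id(size: str) -> str:
    """Build the repo ID for the distilled model trained on the given ReazonSpeech subset."""
    if size not in REAZONSPEECH_SIZES:
        raise ValueError(f"unknown ReazonSpeech size {size!r}; expected one of {REAZONSPEECH_SIZES}")
    return f"japanese-asr/distil-whisper-large-v3-ja-reazonspeech-{size}"


if __name__ == "__main__":
    # Print the repo ID for every size in the tables.
    for size in REAZONSPEECH_SIZES:
        print(distil_model_id(size))
```

Assuming these repos are standard Whisper checkpoints on the Hugging Face Hub, a resulting ID could be passed as the `model` argument of `transformers.pipeline("automatic-speech-recognition", ...)`.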

CER (character error rate, %)

model	CommonVoice 8 (Japanese test set)	JSUT Basic 5000	ReazonSpeech (held out test set)
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all	9.2	8.4	11.6
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large	9.4	8.5	12.2
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium	10.9	11.3	14.8
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small	30.2	39	40.7
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny	94.8	96.3	96.7
openai/whisper-large-v3	8.5	7.1	14.9
openai/whisper-large-v2	9.7	8.2	28.1
openai/whisper-large	10	8.9	34.1
openai/whisper-medium	11.5	10	33.2
openai/whisper-base	28.6	24.9	70.4
openai/whisper-small	15.1	14.2	41.5
openai/whisper-tiny	53.7	36.5	137.9
reazon-research/reazonspeech-nemo-v2	9.1	7.4	11.2
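For reference, CER is the character-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference characters. The sketch below uses that standard definition. The text normalization behind these tables is not stated here, so the sketch applies none, and `edit_distance` and `cer` are illustrative names. Because insertions are counted, CER can exceed 100%, which is how a value such as 137.9 can occur.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref token missing from hyp
                curr[j - 1] + 1,         # insertion: extra token in hyp
                prev[j - 1] + (r != h),  # substitution, or a free match
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (no text normalization applied)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (晴 -> 雨) plus one deletion (れ) over 7 reference characters.
    print(round(cer("今日は晴れです", "今日は雨です"), 1))  # 28.6
```

Passing lists of words instead of strings to `edit_distance` gives WER. Japanese text has no spaces, so this first requires a word segmenter. Which segmenter produced the WER table is not specified here.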

WER (word error rate, %)

model	CommonVoice 8 (Japanese test set)	JSUT Basic 5000	ReazonSpeech (held out test set)
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all	58.8	63.7	55.6
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large	59.2	64.3	56.4
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-medium	64.6	72.1	63
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-small	85	94.2	82.1
japanese-asr/distil-whisper-large-v3-ja-reazonspeech-tiny	100	100	99
openai/whisper-large-v3	55.1	59.2	60.2
openai/whisper-large-v2	59.3	63.2	74.1
openai/whisper-large	61.1	66.4	74.9
openai/whisper-medium	63.4	69.5	76
openai/whisper-base	87.2	93	91.8
openai/whisper-small	74.2	81.9	83
openai/whisper-tiny	93.8	97.6	94.9
reazon-research/reazonspeech-nemo-v2	57.5	60.6	47.5

Note that kotoba-tech/kotoba-whisper-v1.0 is an alias of japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large and kotoba-tech/kotoba-whisper-v2.0 is an alias of japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all.
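To look up a kotoba-whisper release in the tables above, its alias can be mapped to the canonical repo name. This minimal sketch encodes only the two aliases stated here. `KOTOBA_ALIASES` and `canonical_model_id` are hypothetical names.

```python
# Alias -> canonical repo, exactly as stated in this README.
KOTOBA_ALIASES = {
    "kotoba-tech/kotoba-whisper-v1.0": "japanese-asr/distil-whisper-large-v3-ja-reazonspeech-large",
    "kotoba-tech/kotoba-whisper-v2.0": "japanese-asr/distil-whisper-large-v3-ja-reazonspeech-all",
}


def canonical_model_id(model_id: str) -> str:
    """Map a kotoba-whisper alias to its japanese-asr repo; other IDs pass through unchanged."""
    return KOTOBA_ALIASES.get(model_id, model_id)


if __name__ == "__main__":
    print(canonical_model_id("kotoba-tech/kotoba-whisper-v2.0"))
```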

More detailed results are available in the kotoba-whisper codebase.