A list of the evaluation metrics used in international papers on ASR (Automatic Speech Recognition).
List:
Paper title | Metric | Related models | Year & link | Image |
---|---|---|---|---|
SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORK AND CONNECTIONIST TEMPORAL CLASSIFIER | False Reject Rate - False Alarm Rate | Baseline Deep-KWS (3x128), DNN(3x128)+CTC, DNN(3x256)+CTC, DNN(4x256)+CTC, DNN(6x512)+CTC | 12 Sep 2017 | |
END-TO-END MODELS WITH AUDITORY ATTENTION IN MULTI-CHANNEL KEYWORD SPOTTING | False Reject Rate - False Alarm Rate | Baseline, Attention, Attention_map, Transfer, Transfer_map, Tran_Multi_map | 3 Nov 2018 | |
HIERARCHICAL NEURAL NETWORK ARCHITECTURE IN KEYWORD SPOTTING | Recall rate – Miswake/hour | HNN1/2/3 (allbn, 1bn), baseline | 6 Nov 2018 | |
EFFICIENT KEYWORD SPOTTING USING DILATED CONVOLUTIONS AND GATING | False Reject Rate - False Alarm Rate per hour | CNN, LSTM, WaveNet | 19 Nov 2018 | |
QUERY-BY-EXAMPLE KEYWORD SPOTTING USING LONG SHORT-TERM MEMORY NETWORKS | False Reject Rate - False Alarm Rate | Phone DNN/LSTM+DTW, LSTM Feat Extractor | 2015 | |
Wav2Letter: an End-to-End ConvNet-based Speech Recognition System | LER - training set size | MFCC, MFCC+AUG, POW, POW+AUG | 13 Sep 2016 | |
MINIMUM WORD ERROR RATE TRAINING FOR ATTENTION-BASED SEQUENCE-TO-SEQUENCE MODELS | WER - Training Epochs | Lambda=0/0.01/0.1, Bi-LAS, +MWER, Uni-LAS, MWER, CD-phone (CE + sMBR) | Dec 05, 2017 | |
EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING | WER, #params~8.5M | Eesen RNN, Hybrid HMM/DNN using LM: lexicon, trigram | 2015 | |
Low latency acoustic modeling using temporal convolution and LSTMs | WER | TDNN-D, LFR-LSTM, LFR-BLSTM, MFR-LSTM, MFR-BLSTM | 2018 | Stacking LSTMs over time-delay neural network (TDNN) |
Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling | | Baseline model | Interspeech 2014 | |
. . .
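
For quick reference, the metrics named in the table can be sketched in a few lines of Python. This is a minimal illustration under common textbook definitions, not the scoring code of any listed paper: WER and LER are taken as the Levenshtein distance between reference and hypothesis (over words and characters respectively) divided by the reference length, and False Reject Rate / False Alarm Rate are computed at a single decision threshold. The function names (`edit_distance`, `wer`, `ler`, `frr_far`) and the input layout are assumptions made for this example. The per-hour variants in the table (False Alarm Rate per hour, Miswake/hour) would instead divide the false-alarm count by the hours of audio evaluated.

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def ler(ref: str, hyp: str) -> float:
    """Letter error rate; assumes a non-empty reference."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def frr_far(labels: Sequence[int], scores: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """False reject rate and false alarm rate at one threshold.

    labels: 1 if the keyword is present, 0 otherwise (hypothetical layout).
    scores: detector confidence; a detection fires when score >= threshold.
    Assumes at least one positive and one negative example.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    false_rejects = sum(1 for y, s in zip(labels, scores)
                        if y == 1 and s < threshold)
    false_alarms = sum(1 for y, s in zip(labels, scores)
                       if y == 0 and s >= threshold)
    return false_rejects / positives, false_alarms / negatives
```

Sweeping `threshold` over a range and plotting the resulting pairs gives the kind of False Reject Rate versus False Alarm Rate curve that the keyword spotting papers above report.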