A list of the evaluation methods used to report results in international papers on ASR (Automatic Speech Recognition).


List:

| Paper | Metric | Models compared | Date | Image (caption) |
|---|---|---|---|---|
| SMALL-FOOTPRINT KEYWORD SPOTTING USING DEEP NEURAL NETWORK AND CONNECTIONIST TEMPORAL CLASSIFIER | False Reject Rate vs. False Alarm Rate | Baseline Deep-KWS (3x128), DNN(3x128)+CTC, DNN(3x256)+CTC, DNN(4x256)+CTC, DNN(6x512)+CTC | 12 Sep 2017 | |
| END-TO-END MODELS WITH AUDITORY ATTENTION IN MULTI-CHANNEL KEYWORD SPOTTING | False Reject Rate vs. False Alarm Rate | Baseline, Attention, Attention_map, Transfer, Transfer_map, Tran_Multi_map | 3 Nov 2018 | |
| HIERARCHICAL NEURAL NETWORK ARCHITECTURE IN KEYWORD SPOTTING | Recall rate vs. miswakes per hour | HNN1/2/3 (allbn, 1bn), baseline | 6 Nov 2018 | |
| EFFICIENT KEYWORD SPOTTING USING DILATED CONVOLUTIONS AND GATING | False Reject Rate vs. false alarms per hour | CNN, LSTM, WaveNet | 19 Nov 2018 | |
| QUERY-BY-EXAMPLE KEYWORD SPOTTING USING LONG SHORT-TERM MEMORY NETWORKS | False Reject Rate vs. False Alarm Rate | Phone DNN/LSTM + DTW, LSTM feature extractor | 2015 | |
| Wav2Letter: an End-to-End ConvNet-based Speech Recognition System | LER vs. training set size | MFCC, MFCC+AUG, POW, POW+AUG | 13 Sep 2016 | |
| Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models | WER vs. training epochs | λ = 0 / 0.01 / 0.1; Bi-LAS, +MWER; Uni-LAS, +MWER; CD-phone (CE + sMBR) | 5 Dec 2017 | |
| EESEN: END-TO-END SPEECH RECOGNITION USING DEEP RNN MODELS AND WFST-BASED DECODING | WER, #params (~8.5M) | Eesen RNN, hybrid HMM/DNN (LM used: lexicon, trigram) | 2015 | |
| Low latency acoustic modeling using temporal convolution and LSTMs | WER | TDNN-D, LFR-LSTM, LFR-BLSTM, MFR-LSTM, MFR-BLSTM | 2018 | Stacking LSTMs over time-delay neural network (TDNN) |
| Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling | | | Interspeech 2014 | Baseline model |
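
The keyword-spotting rows above all report a trade-off curve. The detector's decision threshold is swept, and each threshold yields one point: the False Reject Rate (fraction of keyword utterances missed) against false alarms (often normalized per hour of non-keyword audio). Recall, as reported in the hierarchical NN paper, is 1 - FRR, and "miswakes per hour" are false alarms divided by hours of negative audio. The Python sketch below computes such points under simplifying assumptions. Each positive utterance is summarized by one (maximum) detector score. Every negative score at or above the threshold counts as one false alarm, with no merging of nearby triggers. The function and field names are hypothetical and not taken from any of the papers.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class OperatingPoint:
    threshold: float
    false_reject_rate: float  # fraction of keyword utterances that were missed
    false_alarms_per_hour: float  # detections on keyword-free audio, per hour
    recall: float  # 1 - false_reject_rate


def kws_operating_points(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    negative_hours: float,
    thresholds: Sequence[float],
) -> List[OperatingPoint]:
    """Sweep a detection threshold over keyword-detector scores (hypothetical helper).

    positive_scores: one score per utterance that contains the keyword
                     (assumption: the maximum detector score over the utterance).
    negative_scores: scores of candidate detections on audio without the keyword
                     (assumption: each score >= threshold is one false alarm).
    negative_hours:  total duration of the keyword-free audio, in hours.
    """
    if not positive_scores:
        raise ValueError("need at least one positive (keyword) utterance")
    if negative_hours <= 0:
        raise ValueError("negative_hours must be positive")

    points = []
    for t in thresholds:
        # A keyword utterance is falsely rejected if its score stays below t.
        missed = sum(1 for s in positive_scores if s < t)
        # A negative candidate fires (false alarm) if its score reaches t.
        false_alarms = sum(1 for s in negative_scores if s >= t)
        frr = missed / len(positive_scores)
        points.append(OperatingPoint(t, frr, false_alarms / negative_hours, 1.0 - frr))
    return points
```

For example, with positive scores [0.9, 0.8, 0.4, 0.95], negative scores [0.1, 0.6, 0.3, 0.85, 0.2] over 2 hours, and a threshold of 0.5, one of the four keyword utterances is missed (FRR 0.25, recall 0.75). Two negatives fire, giving 1.0 false alarm per hour. Raising the threshold trades false alarms for false rejects. Plotting all points gives the FRR vs. FAR curves the papers show.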

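WER (reported by the EESEN, MWER, and low-latency acoustic modeling papers in the table) and LER (reported by Wav2Letter) are both normalized edit distances:

(S + D + I) / N

Here S, D, and I are the substitutions, deletions, and insertions in the minimum-cost alignment of the hypothesis to the reference. N is the number of reference tokens: words for WER, letters for LER. Below is a minimal Python sketch. It assumes whitespace tokenization and no text normalization (case and punctuation are compared as-is). For LER, it also assumes spaces count as ordinary letters. The helper names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference tokens and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,  # deletion of ref[i-1]
                curr[j - 1] + 1,  # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / number of reference words (whitespace tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def letter_error_rate(reference: str, hypothesis: str) -> float:
    """LER over characters; assumption: spaces are scored like any other letter."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` finds one substitution (sat to sit) and one deletion (the). That gives 2 / 6, or about 0.33. Because insertions are counted in the numerator but not in N, WER and LER can exceed 1.0.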