Articles related to WER
I am saving useful links and key points here so they are easy to find again later.
- Why Word Error Rate Is Not A Good Metric for Speech Recognizer Training for the Speech Translation Task?
- Word Error Rate Calculation Python Code.
The Word Error Rate (WER) measures the performance of an automatic speech recognition (ASR) system. It compares a reference transcript to a hypothesis and is defined as:
WER = (S + D + I) / N
where
- S is the number of substitutions,
- D is the number of deletions,
- I is the number of insertions and
- N is the total number of words in the reference sentence.
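As a quick check of the definition, here is a minimal sketch that applies the conventional formula, WER = (S + D + I) / N, to error counts that are already known. The function name `word_error_rate` is only illustrative and is not taken from any library.

```python
def word_error_rate(s, d, i, n):
    """Apply WER = (S + D + I) / N to already-counted errors.

    s, d, i: numbers of substitutions, deletions and insertions.
    n: number of words in the reference (must be positive).
    """
    if n <= 0:
        raise ValueError("the reference must contain at least one word")
    return (s + d + i) / n


# Reference "who is there", hypothesis "is there": one deletion ("who").
print(word_error_rate(0, 1, 0, 3))

# Insertions are not bounded by N, so WER can exceed 1:
# reference of 3 words, hypothesis with 4 extra words and no other errors.
print(word_error_rate(0, 0, 4, 3))
```

The second call shows that WER is not a percentage in the strict sense: a hypothesis much longer than the reference gives a value above 1.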
In essence, WER is computed from two sentences, where
- org: the original sentence, i.e. the true label
- app: the sentence predicted by the model.
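To make the roles of org and app concrete, the sketch below aligns the two word lists with a Levenshtein table and then walks back through it to count substitutions, deletions and insertions separately. The helper name `count_errors` is hypothetical, and dividing by `len(org)` assumes the standard definition of WER. When several optimal alignments exist, the split between S, D and I may differ, but their sum does not.

```python
def count_errors(org, app):
    """Return (S, D, I) for turning the word list `org` into `app`."""
    n, m = len(org), len(app)
    # d[i][j] = minimum number of edits between org[:i] and app[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if org[i - 1] == app[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion

    # Walk back from the bottom-right corner and label each edit.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and org[i - 1] == app[j - 1] and d[i][j] == d[i - 1][j - 1]:
            i, j = i - 1, j - 1          # words match, no error
        elif i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


org = "the cat sat".split()
app = "the bat sat down".split()
s, d_, i_ = count_errors(org, app)
print(s, d_, i_)                  # 1 0 1
print((s + d_ + i_) / len(org))   # WER for this pair
```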
The code below can be run online at: http://www.compileonline.com/execute_python_online.php
#!/usr/bin/env python
import numpy


def wer(r, h):
    """
    Calculation of WER with Levenshtein distance.

    Returns the raw number of word edits (S + D + I);
    divide by len(r) to obtain the rate.
    Works only for iterables up to 254 elements (uint8).
    O(nm) time and space complexity.

    Parameters
    ----------
    r : list
        Reference words.
    h : list
        Hypothesis words.

    Returns
    -------
    int

    Examples
    --------
    >>> wer("who is there".split(), "is there".split())
    1
    >>> wer("who is there".split(), "".split())
    3
    >>> wer("".split(), "who is there".split())
    3
    """
    # initialisation: row 0 and column 0 hold the cost of
    # inserting or deleting every word
    d = numpy.zeros((len(r)+1)*(len(h)+1), dtype=numpy.uint8)
    d = d.reshape((len(r)+1, len(h)+1))
    for i in range(len(r)+1):
        for j in range(len(h)+1):
            if i == 0:
                d[0][j] = j
            elif j == 0:
                d[i][0] = i

    # computation
    for i in range(1, len(r)+1):
        for j in range(1, len(h)+1):
            if r[i-1] == h[j-1]:
                d[i][j] = d[i-1][j-1]
            else:
                substitution = d[i-1][j-1] + 1
                insertion = d[i][j-1] + 1
                deletion = d[i-1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)

    # cast to a plain int so the doctests do not depend on
    # how NumPy prints its scalar types
    return int(d[len(r)][len(h)])


if __name__ == "__main__":
    import doctest
    doctest.testmod()
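The function above returns an edit count rather than a rate, and its uint8 table limits the input length. A possible variant, sketched here in pure Python under the assumption that only the final distance is needed, keeps just two rows of the table and divides by the reference length. The names `edit_distance` and `wer_rate` are illustrative only.

```python
def edit_distance(r, h):
    """Word-level Levenshtein distance using two rows of Python ints."""
    prev = list(range(len(h) + 1))  # distances for the empty reference prefix
    for i in range(1, len(r) + 1):
        curr = [i] + [0] * len(h)   # curr[0]: delete the first i reference words
        for j in range(1, len(h) + 1):
            if r[i - 1] == h[j - 1]:
                curr[j] = prev[j - 1]
            else:
                curr[j] = 1 + min(prev[j - 1],  # substitution
                                  curr[j - 1],  # insertion
                                  prev[j])      # deletion
        prev = curr
    return prev[len(h)]


def wer_rate(r, h):
    """Edit distance divided by the number of reference words."""
    if not r:
        raise ValueError("the reference must contain at least one word")
    return edit_distance(r, h) / len(r)


print(edit_distance("who is there".split(), "is there".split()))  # 1
print(wer_rate("who is there".split(), "is there".split()))
```

Memory drops from O(nm) to O(m), and Python integers do not overflow, so sequences longer than 254 words are handled as well.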
- WER (Word Error Rate) code in MATLAB, to understand how it works:
h='Mathworks connection programs';
r='MathWorks Connections Program';
w=WER(h,r);
disp('WER, case sensitive');disp(w(1))
disp('WER, case insensitive');disp(w(2))
w=WER(h,r,1);
disp('CHR, case sensitive');disp(w(1))
disp('CHR, case insensitive');disp(w(2))
------------------------
WER, case sensitive
1
WER, case insensitive
0.66667
CHR, case sensitive
0.17241
CHR, case insensitive
0.068966
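The four numbers above can be cross-checked with a short Python sketch. This is not the MATLAB `WER` function itself: `levenshtein` and `error_rate` are hypothetical helpers, and the sketch assumes the rate is the edit distance divided by the length of the reference `r`. Here `r` and `h` both have 3 words and 29 characters, so the choice of denominator does not change the result.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (lists of words or characters)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j - 1] + cost,  # match or substitution
                            prev[j] + 1,         # deletion
                            curr[j - 1] + 1))    # insertion
        prev = curr
    return prev[-1]


def error_rate(ref, hyp, chars=False, ignore_case=False):
    """Word error rate, or character error rate when chars=True."""
    if ignore_case:
        ref, hyp = ref.lower(), hyp.lower()
    ref_units = list(ref) if chars else ref.split()
    hyp_units = list(hyp) if chars else hyp.split()
    return levenshtein(ref_units, hyp_units) / len(ref_units)


h = "Mathworks connection programs"
r = "MathWorks Connections Program"
print(round(error_rate(r, h), 5))                                 # 1.0
print(round(error_rate(r, h, ignore_case=True), 5))               # 0.66667
print(round(error_rate(r, h, chars=True), 5))                     # 0.17241
print(round(error_rate(r, h, chars=True, ignore_case=True), 6))   # 0.068966
```

Case-sensitive matching counts all three words as wrong (3/3), while ignoring case leaves two wrong words (2/3). At character level the same pair needs 5 edits out of 29 characters, or 2 when case is ignored.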