Các bài liên quan đến WER

Lưu lại các link hay và các mục quan trọng ở đây để sau này tiện tìm lại.

  • Why Word Error Rate Is Not A Good Metric for Speech Recognizer Training for the Speech Translation Task?
  • Word Error Rate Calculation Python Code.

    The Word Error Rate (short: WER) is a way to measure performance of an ASR. It compares a reference to an hypothesis and is defined like this:

    where

    • S is the number of substitutions,
    • D is the number of deletions,
    • I is the number of insertions and
    • N is the number of words in the reference (Tổng số từ trong câu chuẩn)

    Bản chất là:

    where

    • org: câu gốc, câu True label
    • app: câu do model dự đoán.

Run code in here: http://www.compileonline.com/execute_python_online.php

#!/usr/bin/env python

def wer(r, h):
    """
    Calculation of WER with Levenshtein distance.

    Works only for iterables up to 254 elements (uint8).
    O(nm) time ans space complexity.

    Parameters
    ----------
    r : list
    h : list

    Returns
    -------
    int

    Examples
    --------
    >>> wer("who is there".split(), "is there".split())
    1
    >>> wer("who is there".split(), "".split())
    3
    >>> wer("".split(), "who is there".split())
    3
    """
    # initialisation
    import numpy
    d = numpy.zeros((len(r)+1)*(len(h)+1), dtype=numpy.uint8)
    d = d.reshape((len(r)+1, len(h)+1))
    for i in range(len(r)+1):
        for j in range(len(h)+1):
            if i == 0:
                d[0][j] = j
            elif j == 0:
                d[i][0] = i

    # computation
    for i in range(1, len(r)+1):
        for j in range(1, len(h)+1):
            if r[i-1] == h[j-1]:
                d[i][j] = d[i-1][j-1]
            else:
                substitution = d[i-1][j-1] + 1
                insertion    = d[i][j-1] + 1
                deletion     = d[i-1][j] + 1
                d[i][j] = min(substitution, insertion, deletion)

    return d[len(r)][len(h)]

if __name__ == "__main__":
    import doctest
    doctest.testmod()
  h='Mathworks connection programs';
  r='MathWorks Connections Program';
  w=WER(h,r);
  disp('WER, case sensitive');disp(w(1))
  disp('WER, case insensitive');disp(w(2))
  w=WER(h,r,1);
  disp('CHR, case sensitive');disp(w(1))
  disp('CHR, case insensitive');disp(w(2))
  ------------------------
  WER, case sensitive
       1
  WER, case insensitive
        0.66667
  CHR, case sensitive
        0.17241
  CHR, case insensitive
       0.068966

Leave a comment