
Technologies

Core Technologies

Speech Recognition Technology

Fano Labs' (有光科技) proprietary speech recognition technology applies deep learning to massive volumes of speech data and can accurately recognize Mandarin and English, as well as dialects and low-resource languages such as Cantonese and Sichuanese.

Through training on real-world industry data and continuous refinement of our models and algorithms, our speech recognition engine is tuned for variables such as dialect accents, industry-specific terminology, and background noise,

significantly improving its accuracy and stability across different environments.

Technical Features

  • Support for a wide range of dialects and
    low-resource languages
  • Self-learning with continuously
    improving recognition accuracy
  • Reinforced training for accurate
    recognition of industry-specific terminology
  • Customized development and
    flexible deployment

Application Areas

  • Voice chatbots
  • Speech transcription and note-taking
  • Speech analytics systems
  • Voiceprint (speaker) recognition
  • Voice input
  • Voice assistants
  • Smart home
  • Wearable devices

Research Papers

  • Domain Adaptation of End-to-end Speech Recognition in Low-resource Settings

    Lahiru Samarakoon, Brian Mak, and Albert Y.S. Lam. IEEE Workshop on Spoken Language Technology (IEEE SLT 2018), Athens, Greece, Dec. 2018.

    End-to-end automatic speech recognition (ASR) has simplified the traditional ASR system building pipeline by eliminating the need for multiple components and for expert linguistic knowledge to create pronunciation dictionaries. Therefore, end-to-end ASR fits well when building systems for new domains. However, one major drawback of end-to-end ASR is that it requires a larger amount of labeled speech than traditional methods. In this paper, we explore domain adaptation approaches for end-to-end ASR in low-resource settings. We show that joint domain identification and speech recognition by inserting a symbol for the domain at the beginning of the label sequence, factorized hidden layer adaptation, and a domain-specific gating mechanism improve the performance for a low-resource target domain. Furthermore, we also show the robustness of the proposed adaptation methods to an unseen domain: when only 3 hours of untranscribed data are available, we report relative improvements of up to 8.7%.
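The joint domain identification idea above can be sketched in a few lines: a special domain symbol is prepended to each label sequence, so the end-to-end model learns to predict the domain before the transcript. This is a minimal illustration of the label preparation step only; the domain names and token strings below are hypothetical, not the paper's actual inventory.

```python
# Sketch: prepend a domain symbol to a label sequence for joint domain
# identification and speech recognition. The domain names and token
# strings are illustrative assumptions, not taken from the paper.

# Hypothetical vocabulary extension: one special symbol per domain.
DOMAIN_TOKENS = {
    "callcenter": "<dom:callcenter>",
    "meeting": "<dom:meeting>",
}

def prepend_domain(labels, domain):
    """Return the label sequence with the domain symbol inserted at the front."""
    if domain not in DOMAIN_TOKENS:
        raise ValueError(f"unknown domain: {domain}")
    return [DOMAIN_TOKENS[domain]] + list(labels)

# Example: a token-level transcript from a (hypothetical) call-center domain.
labels = ["hello", "how", "can", "i", "help"]
augmented = prepend_domain(labels, "callcenter")
# augmented -> ["<dom:callcenter>", "hello", "how", "can", "i", "help"]
```

At training time the model is then optimized to emit the domain token first, which lets a single end-to-end network share parameters across domains while still conditioning on domain identity.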


  • Subspace Based Sequence Discriminative Training of LSTM Acoustic Models with Feed-Forward Layers

    Lahiru Samarakoon, Brian Mak, and Albert Y.S. Lam. ISCSLP, Taipei, Taiwan, Nov. 2018.

    State-of-the-art automatic speech recognition (ASR) systems use sequence discriminative training for improved performance over the frame-level cross-entropy (CE) criterion. Even though sequence discriminative training improves long short-term memory (LSTM) recurrent neural network (RNN) acoustic models (AMs), it is not clear whether these systems achieve optimal performance, due to overfitting. This paper investigates the effect of state-level minimum Bayes risk (sMBR) training on LSTM AMs and shows that the conventional way of performing sMBR by updating all LSTM parameters is not optimal. We investigate two methods to improve the performance of sequence discriminative training of LSTM AMs. First, more feed-forward (FF) layers are included between the last LSTM layer and the output layer, so those additional FF layers may benefit more from sMBR training. Second, a subspace is estimated as an interpolation of rank-1 matrices when performing sMBR for the LSTM layers of the AM. Our methods are evaluated on the benchmark AMI single distant microphone (SDM) task. We find that the proposed approaches provide a 1.6% absolute improvement over a strong sMBR-trained LSTM baseline.
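The subspace construction mentioned in this abstract, a weighted interpolation of rank-1 matrices, can be sketched as follows. This is a generic illustration of the linear-algebra idea (W = Σᵢ cᵢ · uᵢ vᵢᵀ) with made-up dimensions and weights; the paper's actual use inside sMBR training of LSTM layers is not reproduced here.

```python
# Sketch: build a matrix as an interpolation of rank-1 matrices,
#   W = sum_i c[i] * outer(u[i], v[i]).
# Dimensions, coefficients, and vectors below are illustrative only.

def outer(u, v):
    """Outer product u v^T of two vectors, as a list-of-lists matrix."""
    return [[ui * vj for vj in v] for ui in u]

def interpolate_rank1(coeffs, us, vs):
    """Weighted sum of rank-1 matrices: sum_i coeffs[i] * us[i] vs[i]^T."""
    rows, cols = len(us[0]), len(vs[0])
    W = [[0.0] * cols for _ in range(rows)]
    for c, u, v in zip(coeffs, us, vs):
        r1 = outer(u, v)
        for i in range(rows):
            for j in range(cols):
                W[i][j] += c * r1[i][j]
    return W

# Two rank-1 bases combined with interpolation weights 0.5 and 2.0:
W = interpolate_rank1(
    [0.5, 2.0],
    [[1.0, 0.0], [0.0, 1.0]],   # u vectors
    [[1.0, 1.0], [1.0, -1.0]],  # v vectors
)
# W == [[0.5, 0.5], [2.0, -2.0]]
```

Because each basis matrix has rank 1, only the interpolation coefficients (and possibly the small basis vectors) need to be updated during adaptation, which constrains the parameter update and helps limit overfitting relative to updating every LSTM weight.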