A basis representation of constrained MLLR transforms for robust adaptation

Publication year: 2012
Source: Computer Speech & Language, Volume 26, Issue 1, January 2012, Pages 35-51

Daniel Povey, Kaisheng Yao

 Abstract: Constrained Maximum Likelihood Linear Regression (CMLLR) is a speaker adaptation method for speech recognition that can be realized as a feature-space transformation. In its original form it does not work well when the amount of speech available for adaptation is less than about 5 s, because of the difficulty of robustly estimating the parameters of the transformation matrix. In this paper we describe a basis representation of the CMLLR transformation matrix, in which the variation between speakers is concentrated in the leading coefficients. When adapting to a speaker, we can select a variable number of coefficients to estimate depending on the…

 Highlights: ► We address the estimation of Constrained MLLR (CMLLR) transforms from limited data. ► We represent the CMLLR matrix as a weighted sum of basis matrices. ► These come from a preconditioned PCA procedure that approximates Maximum Likelihood. ► We estimate a larger number of coefficients for longer utterances. ► We demonstrate improvements versus Bayesian approaches.
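
 Illustration: The representation described in the highlights (a CMLLR matrix written as a weighted sum of basis matrices, with more coefficients used for longer utterances) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the identity starting point [I | 0], and the coefficient-count rule (a fixed number of coefficients per frame, capped at D(D+1)) are all hypothetical choices made for illustration. Neither the preconditioned PCA that produces the basis nor the estimation of the coefficients is shown.

```python
# Sketch only: all names and the coefficient-count rule are illustrative
# assumptions, not taken from the paper's text.

def identity_transform(dim):
    """Return the D x (D+1) affine transform [I | 0] that leaves features unchanged."""
    return [[1.0 if r == c else 0.0 for c in range(dim + 1)] for r in range(dim)]

def num_coefficients(num_frames, dim, per_frame=0.2):
    """Assumed rule: use more coefficients for more data, capped at D*(D+1)."""
    return min(int(per_frame * num_frames), dim * (dim + 1))

def build_transform(coeffs, bases, dim):
    """Form W = [I | 0] + sum_n coeffs[n] * bases[n], using only the leading coeffs."""
    w = identity_transform(dim)
    for d_n, b_n in zip(coeffs, bases):  # zip stops at the shorter sequence
        for r in range(dim):
            for c in range(dim + 1):
                w[r][c] += d_n * b_n[r][c]
    return w

def apply_transform(w, x):
    """Apply the affine transform to a feature vector x extended with a trailing 1."""
    ext = list(x) + [1.0]
    return [sum(w_rc * x_c for w_rc, x_c in zip(row, ext)) for row in w]

# Toy 2-dimensional example with two made-up basis matrices.
toy_bases = [
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],  # shifts the first dimension
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # scales the second dimension
]
w_short = build_transform([0.5], toy_bases, dim=2)        # short utterance: 1 coefficient
w_long = build_transform([0.5, 0.25], toy_bases, dim=2)   # longer utterance: 2 coefficients
```

 With zero coefficients the sketch falls back to the identity transform, which is the intuition behind robustness on very short utterances: when little data is available, fewer coefficients are estimated and the transform stays close to "no adaptation".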