References

Man-Wai Mak; Jen-Tzung Chien

Published online by Cambridge University Press: 26 June 2020

Man-Wai Mak: Affiliation:
The Hong Kong Polytechnic University
Jen-Tzung Chien: Affiliation:
National Chiao Tung University, Taiwan

Type: Chapter
Information: Machine Learning for Speaker Recognition, pp. 289–306

DOI: https://doi.org/10.1017/9781108552332

Publisher: Cambridge University Press

Print publication year: 2020

References

[1] Bishop, C. M., Pattern Recognition and Machine Learning. New York: Springer, 2006.

[2] Tan, Z. L. and Mak, M. W., “Bottleneck features from SNR-adaptive denoising deep classifier for speaker identification,” in Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2015.

[3] Tan, Z., Mak, M., Mak, B. K., and Zhu, Y., “Denoised senone i-vectors for robust speaker verification,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 4, pp. 820–830, Apr. 2018.

[4] Maaten, L. v. d. and Hinton, G., “Visualizing data using t-SNE,” Journal of Machine Learning Research, vol. 9, pp. 2579–2605, Nov. 2008.

[5] Moattar, M. H. and Homayounpour, M. M., “A review on speaker diarization systems and approaches,” Speech Communication, vol. 54, no. 10, pp. 1065–1103, 2012.

[6] Davis, S. B. and Mermelstein, P., “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357–366, Aug. 1980.

[7] Reynolds, D. A., Quatieri, T. F., and Dunn, R. B., “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1–3, pp. 19–41, Jan. 2000.

[8] Dempster, A. P., Laird, N. M., and Rubin, D. B., “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[9] Pelecanos, J. and Sridharan, S., “Feature warping for robust speaker verification,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2001, pp. 213–218.

[10] Mak, M. W., Yiu, K. K., and Kung, S. Y., “Probabilistic feature-based transformation for speaker verification over telephone networks,” Neurocomputing: Special Issue on Neural Networks for Speech and Audio Processing, vol. 71, pp. 137–146, 2007.

[11] Teunen, R., Shahshahani, B., and Heck, L., “A model-based transformational approach to robust speaker recognition,” in Proceedings of International Conference on Spoken Language Processing (ICSLP), vol. 2, 2000, pp. 495–498.

[12] Yiu, K. K., Mak, M. W., and Kung, S. Y., “Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning,” Computer Speech and Language, vol. 21, pp. 231–246, 2007.

[13] Auckenthaler, R., Carey, M., and Lloyd-Thomas, H., “Score normalization for text-independent speaker verification systems,” Digital Signal Processing, vol. 10, no. 1–3, pp. 42–54, Jan. 2000.

[14] Campbell, W. M., Sturim, D. E., and Reynolds, D. A., “Support vector machines using GMM supervectors for speaker verification,” IEEE Signal Processing Letters, vol. 13, no. 5, pp. 308–311, May 2006.

[15] Kenny, P., Boulianne, G., Ouellet, P., and Dumouchel, P., “Joint factor analysis versus eigen-channels in speaker recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1435–1447, May 2007.

[16] Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., and Ouellet, P., “Front-end factor analysis for speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 4, pp. 788–798, May 2011.

[17] Prince, S. and Elder, J., “Probabilistic linear discriminant analysis for inferences about identity,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8.

[18] Martin, A., Doddington, G., Kamm, T., Ordowski, M., and Przybocki, M., “The DET curve in assessment of detection task performance,” in Proceedings of European Conference on Speech Communication and Technology (EUROSPEECH), 1997, pp. 1895–1898.

[19] Leeuwen, D. and Brümmer, N., “The distribution of calibrated likelihood-ratios in speaker recognition,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2013, pp. 1619–1623.

[20] Hornik, K., Stinchcombe, M., and White, H., “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, pp. 359–366, 1989.

[21] Kullback, S. and Leibler, R. A., “On information and sufficiency,” Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951.

[22] Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L., “An introduction to variational methods for graphical models,” Machine Learning, vol. 37, no. 2, pp. 183–233, 1999.

[23] Attias, H., “Inferring parameters and structure of latent variable models by variational Bayes,” in Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI), 1999, pp. 21–30.

[24] Neal, R. M., “Probabilistic inference using Markov chain Monte Carlo methods,” Department of Computer Science, University of Toronto, Tech. Rep., 1993.

[25] Liu, J. S., Monte Carlo Strategies in Scientific Computing. New York, NY: Springer, 2008.

[26] Andrieu, C., De Freitas, N., Doucet, A., and Jordan, M. I., “An introduction to MCMC for machine learning,” Machine Learning, vol. 50, no. 1–2, pp. 5–43, 2003.

[27] Geman, S. and Geman, D., “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.

[28] Hastings, W. K., “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, pp. 97–109, 1970.

[29] Teh, Y. W., Jordan, M. I., Beal, M. J., and Blei, D. M., “Hierarchical Dirichlet processes,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1566–1581, 2006.

[30] Watanabe, S. and Chien, J.-T., Bayesian Speech and Language Processing. Cambridge, UK: Cambridge University Press, 2015.

[31] MacKay, D. J., “Bayesian interpolation,” Neural Computation, vol. 4, no. 3, pp. 415–447, 1992.

[32] Kung, S. Y., Mak, M. W., and Lin, S. H., Biometric Authentication: A Machine Learning Approach. Englewood Cliffs, NJ: Prentice Hall, 2005.

[33] Vapnik, V. N., The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.

[34] Boyd, S. P. and Vandenberghe, L., Convex Optimization. New York: Cambridge University Press, 2004.

[35] Mak, M. and Rao, W., “Utterance partitioning with acoustic vector resampling for GMM-SVM speaker verification,” Speech Communication, vol. 53, no. 1, pp. 119–130, Jan. 2011.

[36] Wu, G. and Chang, E. Y., “KBA: Kernel boundary alignment considering imbalanced data distribution,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 786–795, 2005.

[37] Tang, Y., Zhang, Y. Q., Chawla, N. V., and Krasser, S., “SVMs modeling for highly imbalanced classification,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 39, no. 1, pp. 281–288, Feb. 2009.

[38] Mak, M. W. and Rao, W., “Acoustic vector resampling for GMMSVM-based speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2010, pp. 1449–1452.

[39] Rao, W. and Mak, M. W., “Addressing the data-imbalance problem in kernel-based speaker verification via utterance partitioning and speaker comparison,” in Interspeech, 2011, pp. 2717–2720.

[40] Solomonoff, A., Quillen, C., and Campbell, W. M., “Channel compensation for SVM speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2004, pp. 57–62.

[41] Solomonoff, A., Campbell, W. M., and Boardman, I., “Advances in channel compensation for SVM speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2005, pp. 629–632.

[42] Campbell, W. M., Sturim, D. E., Reynolds, D. A., and Solomonoff, A., “SVM based speaker verification using a GMM supervector kernel and NAP variability compensation,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2006, pp. 97–100.

[43] Kokiopoulou, E., Chen, J., and Saad, Y., “Trace optimization and eigenproblems in dimension reduction methods,” Numerical Linear Algebra with Applications, vol. 18, no. 3, pp. 565–602, 2011.

[44] Bromiley, P., “Products and convolutions of Gaussian probability density functions,” Tina-Vision Memo, vol. 3, no. 4, 2003.

[45] Kenny, P., Boulianne, G., and Dumouchel, P., “Eigenvoice modeling with sparse training data,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 3, pp. 345–354, 2005.

[46] Kay, S. M., Fundamentals of Statistical Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1993.

[47] Mak, M. W. and Chien, J. T., “PLDA and mixture of PLDA formulations,” Supplementary Materials for “Mixture of PLDA for Noise Robust I-Vector Speaker Verification,” IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 24, no. 1, pp. 130–142, Jan. 2016. [Online]. Available: http://bioinfo.eie.polyu.edu.hk/mPLDA/SuppMaterials.pdf

[48] Rajan, P., Afanasyev, A., Hautamäki, V., and Kinnunen, T., “From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification,” Digital Signal Processing, vol. 31, pp. 93–101, 2014.

[49] Chen, L., Lee, K. A., Ma, B., Guo, W., Li, H., and Dai, L. R., “Minimum divergence estimation of speaker prior in multi-session PLDA scoring,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 4007–4011.

[50] Cumani, S., Plchot, O., and Laface, P., “On the use of i-vector posterior distributions in probabilistic linear discriminant analysis,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 4, pp. 846–857, 2014.

[51] Burget, L., Plchot, O., Cumani, S., Glembek, O., Matejka, P., and Brümmer, N., “Discriminatively trained probabilistic linear discriminant analysis for speaker verification,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011, pp. 4832–4835.

[52] Vasilakakis, V., Laface, P., and Cumani, S., “Pairwise discriminative speaker verification in the I-vector space,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 6, pp. 1217–1227, 2013.

[53] Rohdin, J., Biswas, S., and Shinoda, K., “Constrained discriminative PLDA training for speaker verification,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2014, pp. 1670–1674.

[54] Li, N. and Mak, M. W., “SNR-invariant PLDA modeling for robust speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015.

[55] Li, N. and Mak, M. W., “SNR-invariant PLDA modeling in nonparametric subspace for robust speaker verification,” IEEE/ACM Trans. on Audio Speech and Language Processing, vol. 23, no. 10, pp. 1648–1659, 2015.

[56] Sadjadi, S. O., Pelecanos, J., and Zhu, W., “Nearest neighbor discriminant analysis for robust speaker recognition,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2014, pp. 1860–1864.

[57] He, L., Chen, X., Xu, C., and Liu, J., “Multi-objective optimization training of PLDA for speaker verification,” in 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6026–6030.

[58] Ghahabi, O. and Hernando, J., “Deep belief networks for i-vector based speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 1700–1704.

[59] Stafylakis, T., Kenny, P., Senoussaoui, M., and Dumouchel, P., “Preliminary investigation of Boltzmann machine classifiers for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2012.

[60] Ghahabi, O. and Hernando, J., “I-vector modeling with deep belief networks for multi-session speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2014, pp. 305–310.

[61] Kenny, P., “Bayesian speaker verification with heavy-tailed priors,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2010.

[62] Brümmer, N., Silnova, A., Burget, L., and Stafylakis, T., “Gaussian meta-embeddings for efficient scoring of a heavy-tailed PLDA model,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2018, pp. 349–356.

[63] Petersen, K. B. and Pedersen, M. S., “The matrix cookbook,” Oct 2008. [Online]. Available: www2.imm.dtu.dk/pubdb/p.php?3274

[64] Penny, W. D., “KL-Divergences of Normal, Gamma, Dirichlet and Wishart densities,” Department of Cognitive Neurology, University College London, Tech. Rep., 2001.

[65] Soch, J. and Allefeld, C., “Kullback-Leibler divergence for the normal-Gamma distribution,” arXiv preprint arXiv:1611.01437, 2016.

[66] Garcia-Romero, D. and Espy-Wilson, C., “Analysis of i-vector length normalization in speaker recognition systems,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2011, pp. 249–252.

[67] Silnova, A., Brümmer, N., Garcia-Romero, D., Snyder, D., and Burget, L., “Fast variational Bayes for heavy-tailed PLDA applied to i-vectors and x-vectors,” arXiv preprint arXiv:1803.09153, 2018.

[68] Shum, S., Dehak, N., Chuangsuwanich, E., Reynolds, D., and Glass, J., “Exploiting intra-conversation variability for speaker diarization,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2011, pp. 945–948.

[69] Khoury, E. and Garland, M., “I-vectors for speech activity detection,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2016, pp. 334–339.

[70] Dehak, N., Torres-Carrasquillo, P. A., Reynolds, D., and Dehak, R., “Language recognition via i-vectors and dimensionality reduction,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2011, pp. 857–860.

[71] Xu, S. S., Mak, M.-W., and Cheung, C.-C., “Patient-specific heartbeat classification based on i-vector adapted deep neural networks,” in Proceedings of IEEE International Conference on Bioinformatics and Biomedicine, 2018.

[72] Kenny, P., “A small footprint i-vector extractor,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2012.

[73] Luttinen, J. and Ilin, A., “Transformations in variational Bayesian factor analysis to speed up learning,” Neurocomputing, vol. 73, no. 7–9, pp. 1093–1102, 2010.

[74] Hatch, A., Kajarekar, S., and Stolcke, A., “Within-class covariance normalization for SVM-based speaker recognition,” in Proceedings of International Conference on Spoken Language Processing (ICSLP), 2006, pp. 1471–1474.

[75] Fukunaga, K., Introduction to Statistical Pattern Recognition. Boston, MA: Academic Press, 1990.

[76] Li, Z., Lin, D., and Tang, X., “Nonparametric discriminant analysis for face recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 755–761, 2009.

[77] Bahmaninezhad, F. and Hansen, J. H., “I-vector/PLDA speaker recognition using support vectors with discriminant analysis,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2017, pp. 5410–5414.

[78] Mak, M. W. and Yu, H. B., “A study of voice activity detection techniques for NIST speaker recognition evaluations,” Computer Speech and Language, vol. 28, no. 1, pp. 295–313, Jan. 2014.

[79] Rao, W. and Mak, M. W., “Boosting the performance of i-vector based speaker verification via utterance partitioning,” IEEE Trans. on Audio, Speech, and Language Processing, vol. 21, no. 5, pp. 1012–1022, May 2013.

[80] Kenny, P., Stafylakis, T., Ouellet, P., Alam, M. J., and Dumouchel, P., “PLDA for speaker verification with utterances of arbitrary duration,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 7649–7653.

[81] Rao, W., Mak, M. W., and Lee, K. A., “Normalization of total variability matrix for i-vector/PLDA speaker verification,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4180–4184.

[82] Lin, W. W. and Mak, M. W., “Fast scoring for PLDA with uncertainty propagation,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2016, pp. 31–38.

[83] Lin, W. W., Mak, M. W., and Chien, J. T., “Fast scoring for PLDA with uncertainty propagation via i-vector grouping,” Computer Speech & Language, vol. 45, pp. 503–515, 2017.

[84] Lei, Y., Scheffer, N., Ferrer, L., and McLaren, M., “A novel scheme for speaker recognition using a phonetically-aware deep neural network,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014.

[85] Ferrer, L., Lei, Y., McLaren, M., and Scheffer, N., “Study of senone-based deep neural network approaches for spoken language recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 1, pp. 105–116, 2016.

[86] Kenny, P., “Joint factor analysis of speaker and session variability: Theory and algorithms,” CRIM, Montreal, Tech. Rep. CRIM-06/08-13, 2005.

[87] Kenny, P., Ouellet, P., Dehak, N., Gupta, V., and Dumouchel, P., “A study of inter-speaker variability in speaker verification,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 5, pp. 980–988, 2008.

[88] Glembek, O., Burget, L., Dehak, N., Brümmer, N., and Kenny, P., “Comparison of scoring methods used in speaker recognition with joint factor analysis,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009, pp. 4057–4060.

[89] Hinton, G. E. and Salakhutdinov, R. R., “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.

[90] Hinton, G. E., A Practical Guide to Training Restricted Boltzmann Machines. Berlin Heidelberg: Springer, 2012, pp. 599–619.

[91] Hopfield, J. J., “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences of the United States of America, vol. 79, no. 8, pp. 2554–2558, 1982.

[92] Hinton, G. E., “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[93] Carreira-Perpinan, M. A. and Hinton, G. E., “On contrastive divergence learning,” in Proceedings of International Workshop on Artificial Intelligence and Statistics (AISTATS), 2005, pp. 33–40.

[94] Hinton, G. E., Osindero, S., and Teh, Y.-W., “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[95] Li, N., Mak, M. W., and Chien, J. T., “DNN-driven mixture of PLDA for robust speaker verification,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, no. 6, pp. 1371–1383, 2017.

[96] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, pp. 2278–2324, 1998.

[97] Yu, D., Hinton, G., Morgan, N., Chien, J.-T., and Sagayama, S., “Introduction to the special section on deep learning for speech and language processing,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 4–6, 2012.

[98] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B., “Deep neural networks for acoustic modeling in speech recognition: Four research groups share their views,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82–97, 2012.

[99] Saon, G. and Chien, J.-T., “Large-vocabulary continuous speech recognition systems: A look at some recent advances,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 18–33, 2012.

[100] Chien, J.-T. and Ku, Y.-C., “Bayesian recurrent neural network for language modeling,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 2, pp. 361–374, 2016.

[101] Zeiler, M. D., Taylor, G. W., and Fergus, R., “Adaptive deconvolutional networks for mid and high level feature learning,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2011, pp. 2018–2025.

[102] Xie, J., Xu, L., and Chen, E., “Image denoising and inpainting with deep neural networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds., 2012, pp. 341–349.

[103] Salakhutdinov, R. and Larochelle, H., “Efficient learning of deep Boltzmann machines,” in Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 693–700.

[104] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P.-A., “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.

[105] Schuster, M. and Paliwal, K. K., “Bidirectional recurrent neural networks,” IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.

[106] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, 1986.

[107] Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning. Cambridge, MA: MIT Press, 2016.

[108] Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H., “Greedy layer-wise training of deep networks,” in Advances in Neural Information Processing Systems 19, Schölkopf, B., Platt, J. C., and Hoffman, T., Eds. Cambridge, MA: MIT Press, 2007, pp. 153–160.

[109] Hinton, G. E. and Salakhutdinov, R. R., “Reducing the dimensionality of data with neural networks,” Science, vol. 313, pp. 504–507, 2006.

[110] Hinton, G. E., Osindero, S., and Teh, Y.-W., “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.

[111] Salakhutdinov, R. and Hinton, G. E., “Deep Boltzmann machines,” in Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), 2009, p. 3.

[112] Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A., “Extracting and composing robust features with denoising autoencoders,” in Proceedings of International Conference on Machine Learning (ICML), 2008, pp. 1096–1103.

[113] Hyvärinen, A., “Estimation of non-normalized statistical models by score matching,” Journal of Machine Learning Research, vol. 6, pp. 695–709, 2005.

[114] Kingma, D. P. and Welling, M., “Auto-encoding variational Bayes,” in Proceedings of International Conference on Learning Representation (ICLR), 2014.

[115] Chien, J.-T. and Kuo, K.-T., “Variational recurrent neural networks for speech separation,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 1193–1197.

[116] Chien, J.-T. and Hsu, C.-W., “Variational manifold learning for speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017, pp. 4935–4939.

[117] Rezende, D. J., Mohamed, S., and Wierstra, D., “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of International Conference on Machine Learning (ICML), 2014, pp. 1278–1286.

[118] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y., “Generative adversarial nets,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 2672–2680.

[119] Chien, J.-T. and Peng, K.-T., “Adversarial manifold learning for speaker recognition,” in Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2017, pp. 599–605.

[120] Chien, J.-T. and Peng, K.-T., “Adversarial learning and augmentation for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2018, pp. 342–348.

[121] Bengio, Y., Laufer, E., Alain, G., and Yosinski, J., “Deep generative stochastic networks trainable by backprop,” in Proceedings of International Conference on Machine Learning (ICML), 2014, pp. 226–234.

[122] Makhzani, A., Shlens, J., Jaitly, N., and Goodfellow, I., “Adversarial autoencoders,” arXiv preprint arXiv:1511.05644, 2015.

[123] Larsen, A. B. L., Sønderby, S. K., and Winther, O., “Autoencoding beyond pixels using a learned similarity metric,” in Proceedings of International Conference on Machine Learning (ICML), 2015, pp. 1558–1566.

[124] Pan, S. J. and Yang, Q., “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2009.Google Scholar

[125] Evgeniou, A. and Pontil, M., “Multi-task feature learning,” Advances in Neural Information Processing Systems (NIPS), vol. 19, p. 41, 2007.Google Scholar

[126] Ando, R. K. and Zhang, T., “A framework for learning predictive structures from multiple tasks and unlabeled data,” Journal of Machine Learning Research, vol. 6, pp. 1817–1853, 2005.Google Scholar

[127] Argyriou, A., Pontil, M., Ying, Y., and Micchelli, C. A., “A spectral regularization framework for multi-task structure learning,” in Advances in Neural Information Processing Systems (NIPS), 2007, pp. 25–32.Google Scholar

[128] Lin, W., Mak, M., and Chien, J., “Multisource i-vectors domain adaptation using maximum mean discrepancy based autoencoders,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 12, pp. 2412–2422, Dec 2018.Google Scholar

[129] Lin, W. W., Mak, M. W., Li, L. X., and Chien, J. T., “Reducing domain mismatch by maximum mean discrepancy based autoencoders,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2018, pp. 162–167.Google Scholar

[130] Sugiyama, M., Nakajima, S., Kashima, H., Buenau, P. V., and Kawanabe, M., “Direct importance estimation with model selection and its application to covariate shift adaptation,” in Advances in Neural Information Processing Systems (NIPS), 2008, pp. 1433–1440.Google Scholar

[131] Bickel, S., Brückner, M., and Scheffer, T., “Discriminative learning under covariate shift,” Journal of Machine Learning Research, vol. 10, pp. 2137–2155, 2009.Google Scholar

[132] Blitzer, J., McDonald, R., and Pereira, F., “Domain adaptation with structural correspondence learning,” in Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), 2006, pp. 120–128.Google Scholar

[133] von Bünau, P., Meinecke, F. C., Király, F. C., and Müller, K.-R., “Finding stationary subspaces in multivariate time series,” Physical Review Letters, vol. 103, no. 21, p. 214101, 2009.Google Scholar

[134] Pan, S. J., Kwok, J. T., and Yang, Q., “Transfer learning via dimensionality reduction,” in Proceedings of AAAI Conference on Artificial Intelligence, vol. 8, 2008, pp. 677–682.Google Scholar

[135] Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J., “A kernel method for the two-sample-problem,” in Advances in Neural Information Processing Systems (NIPS), 2007, pp. 513–520.Google Scholar

[136] Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., and Smola, A. J., “Integrating structured biological data by kernel maximum mean discrepancy,” Bioinformatics, vol. 22, no. 14, pp. e49–e57, 2006.Google Scholar

[137] Ahmed, A., Yu, K., Xu, W., Gong, Y., and Xing, E., “Training hierarchical feed-forward visual recognition models using transfer learning from pseudo-tasks,” in Proceedings of European Conference on Computer Vision (ECCV), 2008, pp. 69–82.Google Scholar

[138] Ji, S., Xu, W., Yang, M., and Yu, K., “3D convolutional neural networks for human action recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 221–231, 2013.Google Scholar

[139] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B., “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” IEEE Signal Processing Magazine, vol. 29, pp. 82–97, 2012.Google Scholar

[140] Dahl, G. E., Yu, D., Deng, L., and Acero, A., “Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30–42, 2012.Google Scholar

[141] Deng, L., “A tutorial survey of architectures, algorithms, and applications for deep learning,” APSIPA Transactions on Signal and Information Processing, vol. 3, p. e2, 2014.Google Scholar

[142] Mohamed, A. R., Dahl, G. E., and Hinton, G., “Acoustic modeling using deep belief networks,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14–22, 2012.Google Scholar

[143] Işik, Y. Z., Erdogan, H., and Sarikaya, R., “S-vector: A discriminative representation derived from i-vector for speaker verification,” in Proceedings of European Signal Processing Conference (EUSIPCO), 2015, pp. 2097–2101.Google Scholar

[144] Novoselov, S., Pekhovsky, T., Kudashev, O., Mendelev, V. S., and Prudnikov, A., “Non-linear PLDA for i-vector speaker verification,” in Proceedings of Annual Conference of the International Speech Communication Association (INTERSPEECH), 2015.Google Scholar

[145] Pekhovsky, T., Novoselov, S., Sholohov, A., and Kudashev, O., “On autoencoders in the i-vector space for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2016, pp. 217–224.

[146] Mahto, S., Yamamoto, H., and Koshinaka, T., “I-vector transformation using a novel discriminative denoising autoencoder for noise-robust speaker recognition,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 3722–3726.

[147] Tian, Y., Cai, M., He, L., and Liu, J., “Investigation of bottleneck features and multilingual deep neural networks for speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015, pp. 1151–1155.

[148] Tan, Z. L., Zhu, Y. K., Mak, M. W., and Mak, B., “Senone i-vectors for robust speaker verification,” in Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP), Tianjin, China, October 2016.

[149] Yaman, S., Pelecanos, J., and Sarikaya, R., “Bottleneck features for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), vol. 12, 2012, pp. 105–108.

[150] Variani, E., Lei, X., McDermott, E., Lopez Moreno, I., and Gonzalez-Dominguez, J., “Deep neural networks for small footprint text-dependent speaker verification,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 4052–4056.

[151] Yamada, T., Wang, L. B., and Kai, A., “Improvement of distant-talking speaker identification using bottleneck features of DNN,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2013, pp. 3661–3664.

[152] Kenny, P., Gupta, V., Stafylakis, T., Ouellet, P., and Alam, J., “Deep neural networks for extracting Baum-Welch statistics for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2014, pp. 293–298.

[153] Garcia-Romero, D. and McCree, A., “Insights into deep neural networks for speaker recognition,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015, pp. 1141–1145.

[154] McLaren, M., Lei, Y., and Ferrer, L., “Advances in deep neural network approaches to speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4814–4818.

[155] Snyder, D., Garcia-Romero, D., Povey, D., and Khudanpur, S., “Deep neural network embeddings for text-independent speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 999–1003.

[156] Snyder, D., Garcia-Romero, D., Sell, G., Povey, D., and Khudanpur, S., “X-vectors: Robust DNN embeddings for speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, pp. 5329–5333.

[157] Tang, Y., Ding, G., Huang, J., He, X., and Zhou, B., “Deep speaker embedding learning with multi-level pooling for text-independent speaker verification,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019, pp. 6116–6120.

[158] Chen, C.-P., Zhang, S.-Y., Yeh, C.-T., Wang, J.-C., Wang, T., and Huang, C.-L., “Speaker characterization using TDNN-LSTM based speaker embedding,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6211–6215.

[159] Zhu, W. and Pelecanos, J., “A Bayesian attention neural network layer for speaker recognition,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6241–6245.

[160] Zhu, Y., Ko, T., Snyder, D., Mak, B., and Povey, D., “Self-attentive speaker embeddings for text-independent speaker verification,” in Proceedings Interspeech, 2018, pp. 3573–3577.

[161] Brümmer, N., Burget, L., Garcia, P., Plchot, O., Rohdin, J., Romero, D., Snyder, D., Stafylakis, T., Swart, A., and Villalba, J., “Meta-embeddings: A probabilistic generalization of embeddings in machine learning,” in JHU HLTCOE 2017 SCALE Workshop, 2017.

[162] Li, N. and Mak, M. W., “SNR-invariant PLDA modeling for robust speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015, pp. 2317–2321.

[163] Li, N., Mak, M. W., Lin, W. W., and Chien, J. T., “Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification,” Computer Speech & Language, vol. 45, pp. 83–103, 2017.

[164] Prince, S. and Elder, J., “Probabilistic linear discriminant analysis for inferences about identity,” in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2007, pp. 1–8.

[165] Prince, S. J., Computer Vision: Models, Learning, and Inference. New York: Cambridge University Press, 2012.

[166] Sizov, A., Lee, K. A., and Kinnunen, T., “Unifying probabilistic linear discriminant analysis variants in biometric authentication,” in Structural, Syntactic, and Statistical Pattern Recognition. Berlin, Heidelberg: Springer, 2014, pp. 464–475.

[167] Hasan, T., Saeidi, R., Hansen, J. H. L., and van Leeuwen, D. A., “Duration mismatch compensation for I-vector based speaker recognition system,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 7663–7667.

[168] Kanagasundaram, A., Dean, D., Sridharan, S., Gonzalez-Dominguez, J., Gonzalez-Rodriguez, J., and Ramos, D., “Improving short utterance i-vector speaker verification using utterance variance modelling and compensation techniques,” Speech Communication, vol. 59, pp. 69–82, 2014.

[169] Norwich, K. H., Information, Sensation, and Perception. San Diego: Academic Press, 1993.

[170] Billingsley, P., Probability and Measure. New York: John Wiley & Sons, 2008.

[171] Mak, M. W., Pang, X. M., and Chien, J. T., “Mixture of PLDA for noise robust i-vector speaker verification,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 1, pp. 132–142, 2016.

[172] Mak, M. W., “SNR-dependent mixture of PLDA for noise robust speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2014, pp. 1855–1859.

[173] Pang, X. M. and Mak, M. W., “Noise robust speaker verification via the fusion of SNR-independent and SNR-dependent PLDA,” International Journal of Speech Technology, vol. 18, no. 4, 2015.

[174] Tipping, M. E. and Bishop, C. M., “Mixtures of probabilistic principal component analyzers,” Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.

[175] Pekhovsky, T. and Sizov, A., “Comparison between supervised and unsupervised learning of probabilistic linear discriminant analysis mixture models for speaker verification,” Pattern Recognition Letters, vol. 34, no. 11, pp. 1307–1313, 2013.

[176] Li, N., Mak, M. W., and Chien, J. T., “Deep neural network driven mixture of PLDA for robust i-vector speaker verification,” in Proceedings of IEEE Workshop on Spoken Language Technology (SLT), San Diego, CA, 2016, pp. 186–191.

[177] Cumani, S. and Laface, P., “Large-scale training of pairwise support vector machines for speaker recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 11, pp. 1590–1600, 2014.

[178] Snyder, D., Ghahremani, P., Povey, D., Garcia-Romero, D., Carmiel, Y., and Khudanpur, S., “Deep neural network-based speaker embeddings for end-to-end speaker verification,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), 2016, pp. 165–170.

[179] Mandasari, M. I., Saeidi, R., McLaren, M., and van Leeuwen, D. A., “Quality measure functions for calibration of speaker recognition systems in various duration conditions,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 11, pp. 2425–2438, Nov. 2013.

[180] Mandasari, M. I., Saeidi, R., and van Leeuwen, D. A., “Quality measures based calibration with duration and noise dependency for speaker recognition,” Speech Communication, vol. 72, pp. 126–137, 2015.

[181] Villalba, J., Ortega, A., Miguel, A., and Lleida, E., “Bayesian networks to model the variability of speaker verification scores in adverse environments,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 12, pp. 2327–2340, 2016.

[182] Nautsch, A., Saeidi, R., Rathgeb, C., and Busch, C., “Robustness of quality-based score calibration of speaker recognition systems with respect to low-SNR and short-duration conditions,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2016, pp. 358–365.

[183] Ferrer, L., Burget, L., Plchot, O., and Scheffer, N., “A unified approach for audio characterization and its application to speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2012, pp. 317–323.

[184] Hong, Q., Li, L., Li, M., Huang, L., Wan, L., and Zhang, J., “Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015.

[185] Shulipa, A., Novoselov, S., and Matveev, Y., “Scores calibration in speaker recognition systems,” in Proceedings of International Conference on Speech and Computer, 2016, pp. 596–603.

[186] Brümmer, N., Swart, A., and van Leeuwen, D., “A comparison of linear and non-linear calibrations for speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2014, pp. 14–18.

[187] Brümmer, N. and Doddington, G., “Likelihood-ratio calibration using prior-weighted proper scoring rules,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2013, pp. 1976–1980.

[188] Brümmer, N. and Garcia-Romero, D., “Generative modelling for unsupervised score calibration,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 1680–1684.

[189] Caruana, R., “Multitask learning: A knowledge-based source of inductive bias,” Machine Learning, vol. 28, pp. 41–75, 1997.

[190] Chen, D. and Mak, B., “Multitask learning of deep neural networks for low-resource speech recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 7, pp. 1172–1183, 2015.

[191] Yao, Q. and Mak, M. W., “SNR-invariant multitask deep neural networks for robust speaker verification,” IEEE Signal Processing Letters, vol. 25, no. 11, pp. 1670–1674, Nov. 2018.

[192] Garcia-Romero, D. and McCree, A., “Supervised domain adaptation for i-vector based speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 4047–4051.

[193] Villalba, J. and Lleida, E., “Bayesian adaptation of PLDA based speaker recognition to domains with scarce development data,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), Singapore, 2012.

[194] Villalba, J. and Lleida, E., “Unsupervised adaptation of PLDA by using variational Bayes methods,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 744–748.

[195] Borgström, B. J., Singer, E., Reynolds, D., and Sadjadi, O., “Improving the effectiveness of speaker verification domain adaptation with inadequate in-domain data,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 1557–1561.

[196] Shum, S., Reynolds, D. A., Garcia-Romero, D., and McCree, A., “Unsupervised clustering approaches for domain adaptation in speaker recognition systems,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2014, pp. 266–272.

[197] Garcia-Romero, D., Zhang, X., McCree, A., and Povey, D., “Improving speaker recognition performance in the domain adaptation challenge using deep neural networks,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT). IEEE, 2014, pp. 378–383.

[198] Wang, Q. Q. and Koshinaka, T., “Unsupervised discriminative training of PLDA for domain adaptation in speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 3727–3731.

[199] Shon, S., Mun, S., Kim, W., and Ko, H., “Autoencoder based domain adaptation for speaker recognition under insufficient channel information,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 1014–1018.

[200] Aronowitz, H., “Inter dataset variability compensation for speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 4002–4006.

[201] Aronowitz, H., “Compensating inter-dataset variability in PLDA hyper-parameters for robust speaker recognition,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2014, pp. 282–286.

[202] Rahman, H., Kanagasundaram, A., Dean, D., and Sridharan, S., “Dataset-invariant covariance normalization for out-domain PLDA speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015, pp. 1017–1021.

[203] Kanagasundaram, A., Dean, D., and Sridharan, S., “Improving out-domain PLDA speaker verification using unsupervised inter-dataset variability compensation approach,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4654–4658.

[204] Glembek, O., Ma, J., Matejka, P., Zhang, B., Plchot, O., Burget, L., and Matsoukas, S., “Domain adaptation via within-class covariance correction in i-vector based speaker recognition systems,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 4032–4036.

[205] Singer, E. and Reynolds, D. A., “Domain mismatch compensation for speaker recognition using a library of whiteners,” IEEE Signal Processing Letters, vol. 22, no. 11, pp. 2000–2003, 2015.

[206] Bahmaninezhad, F. and Hansen, J. H. L., “Compensation for domain mismatch in text-independent speaker recognition,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2018, pp. 1071–1075.

[207] Yu, H., Tan, Z. H., Ma, Z. Y., and Guo, J., “Adversarial network bottleneck features for noise robust speaker verification,” arXiv preprint arXiv:1706.03397, 2017.

[208] Michelsanti, D. and Tan, Z. H., “Conditional generative adversarial networks for speech enhancement and noise-robust speaker verification,” arXiv preprint arXiv:1709.01703, 2017.

[209] Zhang, J. C., Inoue, N., and Shinoda, K., “I-vector transformation using conditional generative adversarial networks for short utterance speaker verification,” in Proceedings Interspeech, 2018, pp. 3613–3617.

[210] Meng, Z., Li, J. Y., Chen, Z., Zhao, Y., Mazalov, V., Gong, Y. F., and Juang, B. H., “Speaker-invariant training via adversarial learning,” arXiv preprint arXiv:1804.00732, 2018.

[211] Wang, Q., Rao, W., Sun, S., Xie, L., Chng, E. S., and Li, H. Z., “Unsupervised domain adaptation via domain adversarial training for speaker recognition,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018, pp. 4889–4893.

[212] Viñals, I., Ortega, A., Villalba, J., Miguel, A., and Lleida, E., “Domain adaptation of PLDA models in broadcast diarization by means of unsupervised speaker clustering,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 2829–2833.

[213] Li, J. Y., Seltzer, M. L., Wang, X., Zhao, R., and Gong, Y. F., “Large-scale domain adaptation via teacher-student learning,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 2386–2390.

[214] Aronowitz, H., “Inter dataset variability modeling for speaker recognition,” in Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017, pp. 5400–5404.

[215] McLaren, M. and Van Leeuwen, D., “Source-normalized LDA for robust speaker recognition using i-vectors from multiple speech sources,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 755–766, 2012.

[216] Rahman, M. H., Himawan, I., Dean, D., Fookes, C., and Sridharan, S., “Domain-invariant i-vector feature extraction for PLDA speaker verification,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2018, pp. 155–161.

[217] Shepstone, S. E., Lee, K. A., Li, H., Tan, Z.-H., and Jensen, S. H., “Total variability modeling using source-specific priors,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 3, pp. 504–517, 2016.

[218] Alam, M. J., Bhattacharya, G., and Kenny, P., “Speaker verification in mismatched conditions with frustratingly easy domain adaptation,” in Proceedings of Speaker and Language Recognition Workshop (Odyssey), 2018, pp. 176–180.

[219] Sun, B., Feng, J., and Saenko, K., “Return of frustratingly easy domain adaptation,” in Proceedings of AAAI Conference on Artificial Intelligence, vol. 6, no. 7, 2016.

[220] Alam, J., Kenny, P., Bhattacharya, G., and Kockmann, M., “Speaker verification under adverse conditions using i-vector adaptation and neural networks,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 3732–3736.

[221] Szegedy, C., Ioffe, S., Vanhoucke, V., and Alemi, A. A., “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proceedings of AAAI Conference on Artificial Intelligence, 2017.

[222] Bhattacharya, G., Alam, J., Kenny, P., and Gupta, V., “Modelling speaker and channel variability using deep neural networks for robust speaker verification,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT), 2016, pp. 192–198.

[223] Domain Adaptation Challenge, Johns Hopkins University, 2013.

[224] Storkey, A., “When training and test sets are different: Characterizing learning transfer,” in Dataset Shift in Machine Learning, Quinonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N., Eds. Cambridge, MA: MIT Press, 2009, pp. 3–28.

[225] Shimodaira, H., “Improving predictive inference under covariate shift by weighting the log-likelihood function,” Journal of Statistical Planning and Inference, vol. 90, no. 2, pp. 227–244, 2000.

[226] Ben-David, S., Lu, T., Luu, T., and Pál, D., “Impossibility theorems for domain adaptation,” in Proceedings International Conference on Artificial Intelligence and Statistics (AISTATS), 2010, pp. 129–136.

[227] Mansour, Y., Mohri, M., and Rostamizadeh, A., “Domain adaptation: Learning bounds and algorithms,” arXiv preprint arXiv:0902.3430, 2009.

[228] Germain, P., Habrard, A., Laviolette, F., and Morvant, E., “A PAC-Bayesian approach for domain adaptation with specialization to linear classifiers,” in Proceedings International Conference on Machine Learning (ICML), 2013, pp. 738–746.

[229] Chen, H.-Y. and Chien, J.-T., “Deep semi-supervised learning for domain adaptation,” in IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2015, pp. 1–6.

[230] Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J., “A kernel method for the two-sample-problem,” in Advances in Neural Information Processing Systems (NIPS), 2007, pp. 513–520.

[231] Li, Y., Swersky, K., and Zemel, R., “Generative moment matching networks,” in Proceedings International Conference on Machine Learning (ICML), 2015, pp. 1718–1727.

[232] Long, M., Cao, Y., Wang, J., and Jordan, M., “Learning transferable features with deep adaptation networks,” in Proceedings International Conference on Machine Learning (ICML), 2015, pp. 97–105.

[233] Smola, A., Gretton, A., Song, L., and Schölkopf, B., “A Hilbert space embedding for distributions,” in International Conference on Algorithmic Learning Theory. Berlin, Heidelberg: Springer, 2007, pp. 13–31.

[234] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P., “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,” Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.

[235] Schroff, F., Kalenichenko, D., and Philbin, J., “FaceNet: A unified embedding for face recognition and clustering,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.

[236] Wen, Y., Zhang, K., Li, Z., and Qiao, Y., “A discriminative feature learning approach for deep face recognition,” in Proceedings of European Conference on Computer Vision (ECCV), 2016, pp. 499–515.

[237] Kingma, D. P. and Welling, M., “Auto-encoding variational Bayes,” in Proceedings of International Conference on Learning Representations (ICLR), 2014.

[238] Kingma, D. P., Mohamed, S., Rezende, D. J., and Welling, M., “Semi-supervised learning with deep generative models,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3581–3589.

[239] Rezende, D. J., Mohamed, S., and Wierstra, D., “Stochastic backpropagation and approximate inference in deep generative models,” in Proceedings of International Conference on Machine Learning (ICML), 2014.

[240] Doersch, C., “Tutorial on variational autoencoders,” arXiv preprint arXiv:1606.05908, 2016.

[241] Wilson, E., “Backpropagation learning for systems with discrete-valued functions,” in Proceedings of the World Congress on Neural Networks, vol. 3, 1994, pp. 332–339.

[242] Glorot, X. and Bengio, Y., “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 9, 2010, pp. 249–256.

[243] Kingma, D. and Ba, J., “Adam: A method for stochastic optimization,” in Proceedings of International Conference on Learning Representations (ICLR), San Diego, CA, 2015.

[244] Rao, W. and Mak, M. W., “Alleviating the small sample-size problem in i-vector based speaker verification,” in Proceedings of International Symposium on Chinese Spoken Language Processing (ISCSLP), 2012, pp. 335–339.

[245] Hinton, G. E. and Roweis, S. T., “Stochastic neighbor embedding,” in Advances in Neural Information Processing Systems (NIPS), Becker, S., Thrun, S., and Obermayer, K., Eds., Baltimore, MD: MIT Press, 2003, pp. 857–864.

[246] Tseng, H.-H., Naqa, I. E., and Chien, J.-T., “Power-law stochastic neighbor embedding,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017, pp. 2347–2351.

[247] Chien, J.-T. and Chen, C.-H., “Deep discriminative manifold learning,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016, pp. 2672–2676.

[248] Chen, K. and Salman, A., “Learning speaker-specific characteristics with a deep neural architecture,” IEEE Transactions on Neural Networks, vol. 22, no. 11, pp. 1744–1756, 2011.

[249] Chien, J.-T. and Hsu, C.-W., “Variational manifold learning for speaker recognition,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2017, pp. 4935–4939.

[250] Odena, A., Olah, C., and Shlens, J., “Conditional image synthesis with auxiliary classifier GANs,” arXiv preprint arXiv:1610.09585, 2016.

[251] Mirza, M. and Osindero, S., “Conditional generative adversarial nets,” arXiv preprint arXiv:1411.1784, 2014.

[252] Cook, J., Sutskever, I., Mnih, A., and Hinton, G. E., “Visualizing similarity data with a mixture of maps,” in Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), 2007, pp. 67–74.

[253] Che, T., Li, Y., Jacob, A. P., Bengio, Y., and Li, W., “Mode regularized generative adversarial networks,” arXiv preprint arXiv:1612.02136, 2016.

[254] Min, M. R., Maaten, L., Yuan, Z., Bonner, A. J., and Zhang, Z., “Deep supervised t-distributed embedding,” in Proceedings of International Conference on Machine Learning (ICML), 2010, pp. 791–798.

[255] Palaz, D., Collobert, R., and Doss, M. M., “Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2013, pp. 1766–1770.

[256] Jaitly, N. and Hinton, G., “Learning a better representation of speech soundwaves using restricted Boltzmann machines,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011, pp. 5884–5887.

[257] Tüske, Z., Golik, P., Schlüter, R., and Ney, H., “Acoustic modeling with deep neural networks using raw time signal for LVCSR,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2014.

[258] Palaz, D., Magimai-Doss, M., and Collobert, R., “End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition,” Speech Communication, 2019.

[259] Hoshen, Y., Weiss, R. J., and Wilson, K. W., “Speech acoustic modeling from raw multi-channel waveforms,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4624–4628.

[260] Palaz, D., Magimai-Doss, M., and Collobert, R., “Analysis of CNN-based speech recognition system using raw speech as input,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015, pp. 11–15.

[261] Sainath, T. N., Weiss, R. J., Senior, A., Wilson, K. W., and Vinyals, O., “Learning the speech front-end with raw waveform CLDNNs,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2015.

[262] Sainath, T. N., Vinyals, O., Senior, A., and Sak, H., “Convolutional, long short-term memory, fully connected deep neural networks,” in Proceedings of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015, pp. 4580–4584.

[263] Zhang, C., Koishida, K., and Hansen, J. H., “Text-independent speaker verification based on triplet convolutional neural network embeddings,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 9, pp. 1633–1644, 2018.

[264] Zhang, C. and Koishida, K., “End-to-end text-independent speaker verification with triplet loss on short utterances,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2017, pp. 1487–1491.

[265] Chung, J. S., Nagrani, A., and Zisserman, A., “VoxCeleb2: Deep speaker recognition,” in Proceedings Interspeech, 2018, pp. 1086–1090.

[266] Bhattacharya, G., Alam, J., and Kenny, P., “Adapting end-to-end neural speaker verification to new languages and recording conditions with adversarial training,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6041–6045.

[267] Yu, Y.-Q., Fan, L., and Li, W.-J., “Ensemble additive margin softmax for speaker verification,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6046–6050.

[268] Wang, S., Yang, Y., Wang, T., Qian, Y., and Yu, K., “Knowledge distillation for small foot-print deep speaker embedding,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6021–6025.

[269] He, K., Zhang, X., Ren, S., and Sun, J., “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.

[270] Kurakin, A., Goodfellow, I., and Bengio, S., “Adversarial machine learning at scale,” arXiv preprint arXiv:1611.01236, 2016.

[271] Makhzani, A., Shlens, J., Jaitly, N., and Goodfellow, I. J., “Adversarial autoencoders,” CoRR, vol. abs/1511.05644, 2015. [Online]. Available: http://arxiv.org/abs/1511.05644

[272] Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., and Lempitsky, V., “Domain-adversarial training of neural networks,” Journal of Machine Learning Research, vol. 17, no. 59, pp. 1–35, 2016.

[273] Tsai, J. C. and Chien, J. T., “Adversarial domain separation and adaptation,” in Proceedings IEEE MLSP, Tokyo, 2017.

[274] Bhattacharya, G., Monteiro, J., Alam, J., and Kenny, P., “Generative adversarial speaker embedding networks for domain robust end-to-end speaker verification,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6226–6230.

[275] Rohdin, J., Stafylakis, T., Silnova, A., Zeinali, H., Burget, L., and Plchot, O., “Speaker verification using end-to-end adversarial language adaptation,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6006–6010.

[276] Fang, X., Zou, L., Li, J., Sun, L., and Ling, Z.-H., “Channel adversarial training for cross-channel text-independent speaker recognition,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019, pp. 6221–6225.

[277] Zhou, J., Jiang, T., Li, L., Hong, Q., Wang, Z., and Xia, B., “Training multi-task adversarial network for extracting noise-robust speaker embedding,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6196–6200.

[278] Meng, Z., Zhao, Y., Li, J., and Gong, Y., “Adversarial speaker verification,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6216–6220.

[279] Nidadavolu, P. S., Villalba, J., and Dehak, N., “Cycle-GANs for domain adaptation of acoustic features for speaker recognition,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6206–6210.

[280] Li, L., Tang, Z., Shi, Y., and Wang, D., “Gaussian-constrained training for speaker verification,” in Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 2019, pp. 6036–6040.

[281] Tu, Y., Mak, M.-W., and Chien, J.-T., “Variational domain adversarial learning for speaker verification,” in Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), 2019, pp. 4315–4319.

[282] Shapiro, S. S. and Wilk, M. B., “An analysis of variance test for normality (complete samples),” Biometrika, vol. 52, no. 3/4, pp. 591–611, 1965.
