
Publications


 

(A): International Refereed Journal Papers

  1. S. V. Bharath Kumar and S. Umesh [2008]: ``Non-Uniform Speaker Normalization Using Affine Transformation,'' To Appear in Journal of the Acoustical Society of America, Vol. 124, No. 3, Sep. 2008

  2. R. Sinha and S. Umesh [2008]: ``A Shift based Approach to Speaker Normalization using Non-Linear Frequency-Scaling Model,'' ISCA Transactions on Speech Communication, Vol. 50, No. 3, pp. 191-202, Mar. 2008

  3. S. Umesh and R. Sinha [2007]: ``A Study of Filter-Bank Smoothing in MFCC Features for Recognition of Children Speech,'' IEEE Transactions on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2418-2430, Nov. 2007

  4. S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in Speech,'' Fluctuations and Noise Letters, Vol. 7, No. 3, pp. 215-224, Sep. 2007

  5. S. Umesh, L. Cohen and D. Nelson [2002]: ``The Speech Scale,'' Acoustics Research Letters Online of the Journal of Acoustical Society of America, Vol. 3, Issue 3, pp.83-88, July 2002.

  6. S. Umesh, L. Cohen and D. Nelson [2002]: ``Frequency Warping and the Mel-scale,'' IEEE Signal Processing Letters, vol. 9, no. 3, pp.104-107, March 2002.

  7. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1999]: ``Scale-Transform in Speech Analysis,'' IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp.40-45, Jan. 1999.

  8. S. Umesh and D. W. Tufts [1996]: ``Estimation of Parameters of Multiple Exponentially Damped Sinusoids using Fast Maximum Likelihood Estimation with Application to NMR Spectroscopy Data,'' IEEE Trans. Signal Processing, vol. 44, no. 9, pp.2245-2259, Sept. 1996.

  9. D. W. Tufts, H. Ge, and S. Umesh [1993]: ``Fast Maximum Likelihood Estimation of Signal Parameters using the Shape of the Compressed Likelihood Function,'' IEEE Journal of Oceanic Engg., Vol. 18, no. 4, pp. 388-400, Oct. 1993.
    (Invited Paper).

(B): Refereed Papers in International Conferences held outside India

  1. D. R. Sanand and S. Umesh [2008]: ``Study of Jacobian Compensation Using Linear Transformation of Conventional MFCC for VTLN'', To Appear in Interspeech-2008, Brisbane, Sep. 2008

  2. D. R. Sanand, V. Balaji, R. Sandhya Rani and S. Umesh [2008]: ``Use of Spectral Center of Gravity for Generating Speaker Invariant Features for Automatic Speech Recognition'', To Appear in Interspeech-2008, Brisbane, Sep. 2008

  3. P. T. Akhil, S. P. Rath, S. Umesh and D. R. Sanand [2008]: ``A Computationally Efficient Approach to Warp Factor Estimation in VTLN Using EM Algorithm and Sufficient Statistics'', To Appear in Interspeech-2008, Brisbane, Sep. 2008

  4. D. R. Sanand, D. Dinesh Kumar and S. Umesh [2007]: ``Linear Transformation Approach to VTLN Using Dynamic Frequency Warping,'' Proc. of International Conference on Spoken Language Processing (Interspeech 2007), Antwerp, Belgium, August 27-31, 2007. [Acceptance ratio: 59% = 748/1268]

  5. S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in speech,'' Proc. of Conference on Noise and Fluctuations in Biological, Biophysical, and Biomedical Systems, Florence, Italy, May 2007

  6. S. Umesh, D. Rama Sanand, G. Praveen [2007]: ``Speaker-Invariant Features for Automatic Speech Recognition,'' Proc. of International Joint Conferences on Artificial Intelligence, (IJCAI-07), pp. 1738-1743, Jan. 2007 [Acceptance ratio: 15.5% = 212/1365]

  7. S. V. Bharath, S. Umesh and R. Sinha [2006]: ``Study of Non-Linear Frequency Warping Functions for Speaker Normalization,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]

  8. J. Lööf and H. Ney and S. Umesh [2006]: ``VTLN Warping Factor Estimation Using Accumulation of Sufficient Statistics,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]

  9. S. Umesh, A. Zolnay and H. Ney [2005]: ``Implementing Frequency-Warping and VTLN Through Linear Transformation of Conventional MFCC,'' Proc. of InterSpeech 2005, (Lisbon, Portugal), Sep.'2005 [Acceptance ratio: 62% = 855/1379]

  10. S. Umesh, L. Cohen and D. Nelson [2005]: ``The Speech Scale and Spectral Transformation,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., July'2005

  11. S. V. Bharath and S. Umesh [2004]: ``Non-uniform speaker normalization using frequency-dependent scaling function,'' Proc. IEEE International Conference on Signal Processing and Communications, (Bangalore), December 2004

  12. S. Tranter, M. J. Gales, R. Sinha, S. Umesh and P. Woodland [2004]: ``The Development of the Cambridge University RT-04 Diarisation System,'' Proc. of 2004 Rich Transcription Workshop (RT-04), (Palisades, NY, USA), November 2004

  13. D. Kim, S. Umesh, M. J. Gales, T. Hain and P. Woodland [2004]: ``Using VTLN for Broadcast News Transcription,'' Proc. of International Conference on Spoken Language Processing , (ICSLP, Jeju Island, S.Korea), October 2004

  14. S. V. Bharath, S. Umesh and R. Sinha [2004]: ``Non-Uniform Speaker Normalization using Affine Transformation,'' Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.121-124, April 2004. Voted the top paper in its review category. [Acceptance ratio: 51.8% = 1262/2434]

  15. S. Umesh, R. Sinha and S. V. Bharath [2004]: ``An Investigation into Front-End Signal Processing for Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.345-348, April 2004 [Acceptance ratio: 51.8% = 1262/2434]

  16. D. Nelson, D. Smith, S. Umesh, L. Cohen [2003]: ``Estimating speaker scale factors from vowels,'' Proc. of SPIE Conference on Wavelets: Applications in Signal and Image Processing, vol. 5207, pp. 794-800, July 2003.

  17. R. Sinha and S. Umesh [2003]: ``A Method for Compensation of Jacobian in Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Hong Kong), April 2003

  18. R. Sinha and S. Umesh [2003]: ``A Study into Front-End Signal Processing for Automatic Speech Recognition,'' Proc. of Workshop on Spoken Language Processing , (TIFR,Mumbai), pp. 87 - 92, January 2003

  19. S. Umesh, L. Cohen and D. Nelson [2002]: ``The speech scale, the Mel scale and the Tube Model for Speech,'' Proc. of SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, vol. 4791, pp. 7 - 23, July 2002.

  20. R. Sinha and S. Umesh [2002]: ``Non-Uniform Scaling Based Speaker-Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Orlando, USA), Vol. I, pp. 589-592, May 2002 [Acceptance ratio: 56.9% = 1007/1770]

  21. S. Umesh, S. V. Bharath, M. K. Vinay, R. Sharma and R. Sinha [2002]: ``A Simple Approach to Non-Uniform Vowel Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP, Orlando, USA),Vol. I, pp. 517-520, May 2002 [Acceptance ratio: 56.9% = 1007/1770]

  22. S. Umesh, D. Nelson and L. Cohen [2001]: ``Further Experimental Results on the Speech-Hearing Connection,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 4478, pp. 361-366, July'2001

  23. S. Umesh, Richard C. Rose, and S. Parthasarathy [2000]: ``Exploiting Frequency-Scaling Invariance Properties of the Scale Transform for Automatic Speech Recognition,'' in Proc. of International Conference on Spoken Language Processing, (ICSLP Beijing, China), pp. 651-654, Oct.'2000

  24. D. Nelson, S. Umesh, and L. Cohen [2000]: ``High Frequency Formant Estimation & Its Application in Frequency-Scaling of Speech,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., vol. 4119, pp. 294-301, July'2000

  25. S. Umesh, L. Cohen and D. Nelson [1999]: ``Scale-Transform Based Features for Application in Speech Recognition,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3813, pp.727-731, July'1999

  26. S. Umesh, L. Cohen, and D. Nelson [1999]: ``Fitting the Mel-Scale,'' Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Phoenix, Arizona, USA), Vol. 1, pp. 217-220, March 1999. [Acceptance ratio: 58.2% = 869/1490]

  27. S. Umesh, L. Cohen and D. Nelson [1998]: ``Warping Functions in Speech,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3458, pp.194-209, July'1998

  28. S. Umesh, L. Cohen, and D. J. Nelson [1998]: ``Improved Scale-Cepstral Analysis in Speech,'' IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Seattle, USA), pp. 637-640, May 1998.

  29. S. Umesh, L. Cohen, and D. Nelson [1997]: ``Improvements in Scale-Cepstral Features for Speech Analysis,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., (San Diego, USA), vol. 3169, pp. 481-494, July 1997.

  30. S. Umesh, A. Rao, G. Cristobal, L. Cohen and J.H. van Deemter [1997]: ``Global and local translation and magnification,'' Proc. of SPIE Conference on Statistical & Stochastic Methods in Image Processing, Vol. 3167, pp.106-117, July'1997.

  31. S. Umesh, L. Cohen, and D. J. Nelson [1997]: ``Frequency-Warping and Speaker-Normalization,'' IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Munich, Germany), pp. 983-986, May 1997.

  32. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Frequency-Warping in Speech,'' in Proc. International Conference on Spoken Language Processing, (ICSLP Philadelphia,USA), pp. 414-417, October 1996.

  33. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Psychoacoustic-Frequency Scales versus Frequency-Warping in Scale cepstrum,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 2825, pp. 530-539, July 1996.

  34. S. Umesh and D. J. Nelson [1996]: ``Computationally Efficient Estimation of Sinusoidal Frequency at low SNR,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Atlanta, USA), pp. 2797-2800, May 1996.

  35. L. Cohen, N. Marinovic, S. Umesh, and D. Nelson [1995]: ``Scale-Invariant Speech Analysis via joint time-frequency-scale processing,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 2569, pp. 522-537, July 1995.

  36. N. Marinovic, L. Cohen, S. Umesh, and D. Nelson [1995]: ``Classification of Digital Modulation Types,'' in Proc. SPIE Conference on Advanced Signal Processing Algorithms, vol. SPIE-2563, (San Diego, USA), pp. 125-143, July 1995.

  37. N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Joint Representations in Time and Frequency Scale for Harmonic Type Signals,'' in Proc. IEEE-SP International Symposium on T-F and T-S Representations, (Philadelphia, PA), pp. 84-87, October 1994.

  38. N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Scale and Harmonic Signal Analysis,'' in Proc. International Society of Optical Engineering Conference on Wavelet Applications in Signal & Image Proc., Vol. 2303, pp. 411-418, August 1994.

  39. E. Wilson, S. Umesh, and D. W. Tufts [1993]: ``Multistage Neural Network Structure for Transient Detection and Feature Extraction,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Minneapolis, USA), pp. 489-492, April 1993.

  40. E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Designing a Neural Network Structure for Transient Detection Using the Subspace Inhibition Filter Algorithm,'' in Proc. IEEE Oceans '92, pp. 120-125 (Newport, USA), Oct. 1992.

  41. E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals Using the Neural Network and Subspace Inhibition Filter Algorithms,'' in Proc. International Joint Conference on Neural Networks, (Baltimore, USA), pp. 283-288, June 1992.

  42. S. Umesh and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals by a Multistage Procedure,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP San Francisco, USA), pp. 553-556, March 1992.

  43. G. F. Boudreaux-Bartels, D. W. Tufts, and S. Umesh [1991]: ``On Improving the Detection of Gabor Components,'' in Proc. of Mini ASSP Conference, (Boston, USA), April 1991.

 

(C): Papers in Conferences held in India

  1. D. Dinesh Kumar, D. R. Sanand and S. Umesh [2008]: ``Linear Transformation Approach to Speaker Normalization on Conventional MFCC,'' Proc. of National Conference on Communications, IIT-Bombay, Feb. 2008

  2. R. Sandhya Rani, D. R. Sanand and S. Umesh [2008]: ``Speaker Normalisation Using Center of Gravity,'' Proc. of National Conference on Communications, IIT-Bombay, Feb. 2008

  3. S. P. Rath, D. R. Sanand and S. Umesh [2008]: ``MAP based Warping factor Estimation in Vocal Tract Length Normalization,'' Proc. of National Conference on Communications, IIT-Bombay, Feb. 2008

  4. Mohd Amir Khan, D. Rama Sanand, S. Umesh [2007]: ``Jacobian Compensation Using Variance Normalization in Automatic Speech Recognition,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007

  5. S. Umesh, R. Sinha, D Rama Sanand [2007]: ``Using Vocal-Tract Length Normalization in Recognition of Children Speech,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007

  6. R. Sinha and S. Umesh [2006]: ``Linear-Transformation Approach to Shift-Based Speaker-Normalisation,'' Proc. of National Conference on Communications, (IIT, Delhi), January 2006

  7. S. Umesh and S. V. Bharath [2006]: ``Study of Non-linear Frequency Warping functions for Speaker Normalisation,'' Proc. of National Conference on Communications, (IIT, Delhi), January 2006

  8. R. Sinha and S. Umesh [2003]: ``Spectral Smoothing for Vocal-Tract Length Normalization,'' Proc. of National Conference on Communications , (IIT,Chennai), pp. 87 - 92, January 2003

  9. S.Umesh, M. Belkhode and Rohit Sinha [1999]: ``Comparison of Front-End Features for Speech Recognition'' Proc. of National Conf. on Communications, (Kharagpur), pp.163-170, Jan. 1999

 

(D): Technical Workshop Presentations

  1. S. Tranter and S. Umesh [2004]: ``Diarisation Research at CUED,'' Meta-Data Evaluation (MDE) Technical Meeting of U.S. ARPA's Effective Affordable Reusable Speech (EARS) Project, (Boston, USA), May 2004

  2. D.Y. Kim, M.J.F. Gales, H.Y.Chan, P.C. Woodland, S. Umesh and T. Hain [2004]: ``Progress in Broadcast News English Transcription,'' Speech-to-Text (STT) Workshop of ARPA's EARS Project, (Montreal, Canada), May 2004

 

(E): Invited Talks

  1. S. Umesh [2007]: ``Introduction to Large Vocabulary Continuous Speech Recognition'' National Conference on Communications, (IIT-Kanpur), Jan.2007

  2. S. Umesh [2006]: ``Statistical Fundamentals for Speech Recognition'' Winter School on Speech & Audio Processing (WISSAP-06), (IISc., Bangalore), Jan. 2006

  3. S. Umesh [2005]: ``Large Vocabulary Continuous Speech Recognition,'' International Conference on Natural Language Processing, (IIT, Kanpur), Dec. 2005

 

(F): Books

  1. Ajit K. Chaturvedi, Srinivasan Umesh, Adrish Banerjee, Kameswari Chebrolu, Joseph John, Ayyangar R. Harish (Editors): Proceedings of the Thirteenth National Conference on Communications, I.I.T. Kanpur, 26-28 January 2007. ISBN Number: 978-81-904444-0-8
