We are happy to announce that the following distinguished scholars have agreed to deliver a keynote lecture (in alphabetical order):

  • Prof. Dafydd Gibbon (Bielefeld University, Germany)
  • Prof. Sonja Kotz (Maastricht University, the Netherlands)
  • Prof. Andrew Rosenberg (IBM Research AI, USA)
  • Prof. Jianhua Tao (Chinese Academy of Sciences, China)

Prof. Dafydd Gibbon

Dafydd Gibbon is emeritus professor of English and Linguistics at Bielefeld University and is currently visiting professor in linguistics and phonetics at Jinan University, Guangzhou, China. His publications on prosody started with Perspectives of Intonation Analysis (1976) and continued with the collection Intonation, Accent and Rhythm: Studies in Discourse Phonology (1984), the collection Rhythm, Melody and Harmony: Studies in Honour of Wiktor Jassem, and numerous articles and conference contributions on aspects of intonation, tone and speech timing. His specific contributions to the study of prosody include the three-way semiotic distinction between structure, form and function in prosody; the application of the rank-interpretation architecture to prosody; finite-state models of tone; the concept of prosody as metalocution; and the computation of time trees from speech annotations. A further area of specialisation has been language documentation for heritage preservation, linguistics and speech technology, as lead editor and co-editor of three handbooks in these fields (1997, 2000, 2012). He has received awards from the Polish Phonetics Association, the Linguistic Association of Nigeria and the Ivory Coast government for contributions to linguistics, phonetics and speech technology, including aspects of the prosody of endangered languages in West Africa.

Title of the talk: "The Future of Prosody: Loose Ends and Open Challenges"

Prof. Sonja Kotz

Sonja A. Kotz is a cognitive, affective, and translational neuroscientist who investigates the role of prediction in multimodal domains (perception, action, communication, music) in healthy and clinical populations using behavioural and modern neuroimaging techniques (E/MEG, s/fMRI).
She holds a Chair in Translational Cognitive Neuroscience at Maastricht University in the Netherlands, is a Research Associate at the Max Planck Institute for Human Cognitive and Brain Sciences in Leipzig, Germany, holds multiple honorary positions and professorships (Manchester and Glasgow, UK; Leipzig, Germany; Georgetown, USA), and is currently the President of the European Society for Cognitive and Affective Neuroscience. She also works for multiple funding agencies in Europe, including the ERC. She has published close to 200 papers in leading journals of cognitive and affective neuroscience, and her current h-index is 57 (Google Scholar).

Title of the talk: "Does prediction play a role in multimodal emotional speech perception?"

Prof. Andrew Rosenberg

Andrew Rosenberg is currently a Research Staff Member at IBM Research AI, where he has worked since 2016. He received his PhD from Columbia University in 2009, then taught and researched at CUNY Queens College as Assistant and, later, Associate Professor until joining IBM. While at CUNY, from 2013 through 2016, he directed the CUNY Graduate Center Computational Linguistics Program. His research is primarily on automated analyses of prosody and their use in downstream spoken language processing tasks, including paralinguistic analysis, named entity recognition, segmentation, summarization and speech synthesis. He has written over 70 journal and conference papers, the vast majority on speech prosody and language production. He is the author and maintainer of AuToBI, an open-source tool for the automatic annotation of speech with ToBI labels. He is an NSF CAREER award winner for a proposal titled "More than Words: Advancing Prosodic Analysis".

Title of the talk: "Speech, Prosody, and Machines: Old and New Technological Challenges for Prosody Research"

Prof. Jianhua Tao

Professor Jianhua Tao is currently the deputy director of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, and a winner of the National Science Fund for Distinguished Young Scholars.
He received the B.S. and M.S. degrees from Nanjing University, Nanjing, China, in 1993 and 1996, respectively, and the Ph.D. degree from Tsinghua University, Beijing, China, in 2001. He is currently a Steering Committee Member of the IEEE Transactions on Affective Computing, Vice-Chairperson of the ISCA Special Interest Group on Chinese Spoken Language Processing (SIG-CSLP), an Executive Committee member of HUMAINE, the China Computer Federation, the Chinese Association for Artificial Intelligence, the Chinese Character Information Society of China and the Acoustical Society of China, and the secretary-general of the Chinese Character Information Society of China's Linguistic Data Development and Management Committee. He has directed and participated in more than 20 national projects, including "863" projects and projects of the National Natural Science Foundation of China, the National Development and Reform Commission and the International Cooperation Program of the Ministry of Science and Technology. He has repeatedly served as an evaluation expert for national projects, such as those of the National Natural Science Foundation of China and "863". He has published more than 150 papers in SCI- or EI-indexed journals and proceedings, holds 15 domestic invention patents and 1 international patent, and has edited 2 books. Prof. Tao has received several awards from major conferences and has twice won the Scientific Technology Advance Award of Beijing City. He also serves as a program committee member or chair of well-known domestic and international conferences, including ICPR, ACII, ICMI, IUS, ISCSLP and NCMMSC, and is a member of the editorial boards of the Journal on Multimodal User Interfaces and the International Journal on Synthetic Emotions.

Title of the talk: "Speech emotion recognition"

Speech emotion recognition supports natural and efficient human-computer interaction, with wide applications in website customization, education and gaming. Typical methods are based on short-time frame-level feature extraction, followed by utterance-level information extraction and classification or regression as required. However, selecting a common, global emotional feature subspace is challenging. We explore the influence of different emotional features (voice quality, spectral and prosodic features) on different types of corpora. A denoising auto-encoder is used to extract high-level discriminative representations. Various machine learning algorithms have been applied to speech emotion recognition, such as Gaussian Mixture Models, Deep Neural Networks and Support Vector Machines. Emotion, however, is a temporally evolving event, so we favor methods that model larger contexts well, such as Hidden Markov Models and Long Short-Term Memory recurrent neural networks (LSTM-RNNs).
In this talk, I present our multi-scale emotional dynamic temporal modeling using deep belief networks and LSTM-RNNs. We also propose temporal pooling to alleviate the problems of redundant information and label noise in dimensional emotion recognition. To resolve the ambiguity of emotion description, we combine dimensional and discrete emotion information to improve recognition performance.
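The frame-level-then-utterance-level pipeline described in the abstract can be sketched as follows. This is a minimal, self-contained illustration with toy features (log energy and zero-crossing rate standing in for the talk's voice quality, spectral and prosodic features), simple statistical pooling as the utterance-level step, and arbitrary linear classifier weights in place of a trained GMM, DNN or SVM; it is not the system presented in the talk.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Short-time frame-level features: log energy and zero-crossing rate.
    (Illustrative stand-ins for the richer emotional features in the talk.)"""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f = signal[start:start + frame_len]
        energy = np.log(np.sum(f ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2.0
        frames.append([energy, zcr])
    return np.array(frames)            # shape: (n_frames, 2)

def temporal_pool(frames):
    """Utterance-level representation via simple statistical pooling
    (mean and standard deviation over time)."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

# Toy "utterance": 1 s of noise at 16 kHz standing in for real speech.
rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)

F = frame_features(utterance)          # frame-level feature matrix
x = temporal_pool(F)                   # 4-dim utterance-level feature vector

# Toy linear classifier over two emotion classes. The weights here are
# random; in practice they would be learned by a trained model.
W = rng.standard_normal((2, x.size))
predicted = int(np.argmax(W @ x))
print(F.shape, x.shape, predicted)
```

Sequence models such as LSTM-RNNs, favored in the abstract, would replace the pooling step by consuming the frame-level matrix directly, so that temporal context informs the utterance-level decision.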



