Special sessions

Three special sessions will be held.

Full papers accepted for the special sessions will be published in the SP9 conference proceedings.

When submitting a paper to a special session, please mark the corresponding special session in the list of topics in EasyChair.




Prosody in Speech and Music

Maciej Karpiński (Institute of Linguistics, Adam Mickiewicz University in Poznań, Poland)

Piotr Podlipniak (Institute of Musicology, Adam Mickiewicz University in Poznań, Poland)

Language and music are immanent components of human culture. Speech prosody and music are closely bound both evolutionarily and in individual development and, despite their differences, are frequently described in similar terms. They share many neural processing paths and cognitive resources. Their mutual interdependencies are complex but, at the same time, they provide an extended, deeper insight into the nature of each. We hope that the special session will attract significant research contributions to the current picture of the relationship between speech prosody and music. We invite authors to submit original empirical studies as well as proposals of models and theoretical approaches related to the following topics, as well as others relevant to the theme of the session and the profile of the conference:

  • Neural underpinnings of speech and music prosody: what is shared and what is separate?
  • Cognitive aspects of speech prosody and music processing
  • Evolutionary relationship between speech prosody and music
  • Mutual influence of speech prosody and music
  • Music and speech prosody in the social and cultural context
  • Emotional expression in music and in speech prosody: perception and instrumental research
  • Prosodic features in speech and in music: pitch changes, timing, and intensity
  • Structural characteristics of speech prosody and music
  • Singing vs. speaking voice: perception, production, and instrumental research
  • Singing techniques: acoustic and articulatory characteristics
  • Prominence in speech and in music
  • Meaning in speech prosody and music
  • Perception and production of rhythmic and melodic violations in speech and in music
  • Deficiencies and disorders in speech prosody and music processing

It is essential that each submission involve both language prosody and music. We encourage inter-, cross- and multidisciplinary studies that go beyond the phonetics and phonology of suprasegmentals.


Selected relevant publications:

  • Ackermann, H., Hage, S. R., & Ziegler, W. (2014). Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective. Behavioral and Brain Sciences, 37(6), 529–546.
  • Borchgrevink, H. M. (1982). Prosody and musical rhythm are controlled by the speech hemisphere. In Music, mind, and brain (pp. 151–157). Springer US.
  • Brandt, A., Gebrian, M., & Slevc, L. R. (2012). Music and early language acquisition. Frontiers in Psychology, 3, 1–17.
  • Carlson, R., Friberg, A., Frydén, L., Granström, B., & Sundberg, J. (1989). Speech and music performance: Parallels and contrasts. Contemporary Music Review, 4(1), 391–404.
  • Goerlich, K. S., Witteman, J., Aleman, A., & Martens, S. (2011). Hearing feelings: Affective categorization of music and speech in alexithymia, an ERP study. PLoS ONE, 6(5), e19501.
  • Laukka, P., & Juslin, P. N. (2007). Similar patterns of age-related differences in emotion recognition from speech and music. Motivation and Emotion, 31(3), 182–191.
  • London, J. (2012). Three things linguists need to know about rhythm and time in music. Empirical Musicology Review, 7, 5–11.
  • Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: Behavioral and electrophysiological approaches. Journal of Cognitive Neuroscience, 18(2), 199–211.
  • Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: Behavioral and electrophysiological evidence. Journal of Cognitive Neuroscience, 19(9), 1453–1463.
  • McMullen, E., & Saffran, J. R. (2004). Music and language: A developmental comparison. Music Perception, 21(3), 289–311.
  • Mithen, S. (2005). The singing Neanderthal. London: Weidenfeld & Nicolson.
  • Patel, A. D. (2005). The relationship of music to the melody of speech and to syntactic processing disorders in aphasia. Annals of the New York Academy of Sciences, 1060(1), 59–70.
  • Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2.
  • Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition, 87(1), B35–B45.
  • Patel, A. D., & Iversen, J. R. (2007). The linguistic benefits of musical abilities. Trends in Cognitive Sciences, 11(9), 369–372.
  • Patel, A. D., Iversen, J. R., & Rosenberg, J. C. (2006). Comparing the rhythm and melody of speech and music: The case of British English and French. The Journal of the Acoustical Society of America, 119(5), 3034–3047.
  • Scherer, K. R. (1991). Emotion expression in speech and music. In Music, language, speech and brain (pp. 146–156). Macmillan Education UK.
  • Sundberg, J., & Rossing, T. D. (1990). The science of singing voice. The Journal of the Acoustical Society of America, 87(1), 462–463.
  • Trainor, L. J., Austin, C. M., & Desjardins, R. N. (2000). Is infant-directed speech prosody a result of the vocal expression of emotion? Psychological Science, 11(3), 188–195.
  • Trehub, S. E., Trainor, L. J., & Unyk, A. M. (1993). Music and speech processing in the first year of life. Advances in Child Development and Behavior, 24, 1–35.
  • Wolfe, J. (2002). Speech and music, acoustics and coding, and what music might be ‘for’. In Proceedings of the 7th International Conference on Music Perception and Cognition (pp. 10–13).
  • Zimmermann, E., Leliveld, L., & Schehka, S. (2013). Toward the evolutionary roots of affective prosody in human acoustic communication: A comparative approach to mammalian voices. In E. Altenmüller, S. Schmidt, & E. Zimmermann (Eds.), Evolution of emotional communication: From sounds in nonhuman mammals to speech and music in man (pp. 116–132). Oxford, New York.




Prosodic Analysis in Digital Humanities 


Burkhard Meyer-Sickendiek, Freie Universität Berlin, Institut für deutsche und niederländische Philologie, Habelschwerdter Allee 45, 14195 Berlin, Phone: +49-30-83857841

Hussein Hussein, Freie Universität Berlin, Institut für deutsche und niederländische Philologie, Habelschwerdter Allee 45, 14195 Berlin, Phone: +49-30-83863664

Digital Humanities (DH), which have gained remarkable momentum over the last 10–15 years, have provided researchers in the humanities with vast amounts of new types of research data: digital editions of texts, films, poems, musical pieces and other semiotic artifacts (accessible via Google Books, zeno.org, large digital editions such as PHI or the Perseus Digital Library, as well as online poetry collections and sound and video recordings). With regard to these new corpora held by broadcasters, archives and libraries, our special session addresses the question of how to apply speech analysis techniques to these large-scale collections of speech-based audio and video. The language in these cultural archives obviously differs from everyday language in that it carries a certain kind of ‘artificial prosody’. Examples include metered language in poetry, dialogues and narrative modes in feature films (as distinct from documentaries), hook lines in pop songs, vocal styles in musical genres such as jazz, pop, blues, soul, country, folk, and rock, and narrative voices in audiobooks. Examining these different kinds of speech prosody in cultural artefacts with digital methods requires developing new forms of (prosodic) modeling. In our special session, we want to discuss and practice these forms of modeling with respect to prosody and speech recognition. A special focus will be on prosodic features such as pitch, loudness, tempo, and rhythm, which convey information about the structure and meaning of a dialogue in a film, a musical style in a pop song, or a line in a poem.
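As a purely illustrative sketch (not part of the session materials), the most basic of these prosodic features, pitch, can be estimated from a single frame of a voiced signal with a simple autocorrelation method; the function name, signal, and parameter values below are our own hypothetical choices:

```python
import math

def estimate_f0(samples, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (F0) of one signal frame by
    locating the autocorrelation peak within a plausible voice range."""
    lag_min = int(sample_rate / fmax)   # shortest pitch period considered
    lag_max = int(sample_rate / fmin)   # longest pitch period considered
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, min(lag_max, len(samples) - 1)):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

# Synthesize one frame of a 120 Hz "voiced" signal and recover its pitch.
rate = 16000
frame = [math.sin(2 * math.pi * 120 * t / rate) for t in range(1024)]
print(round(estimate_f0(frame, rate)))  # → 120
```

Real pipelines would of course add windowing, voicing decisions, and frame-by-frame tracking over a whole recording; the sketch only shows the core idea behind pitch as a measurable prosodic cue.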

The background of our special session is a research project funded under the Volkswagen Foundation program “Mixed Methods in the Humanities?”. Our project “Rhythmicalizer. A digital tool to identify free verse prosody” (FU Berlin) uses a data collection from the website lyrikline.org, the most important website for international poetry, with more than 10,000 poems by 1,000 international poets from over 60 countries. Analyzing this corpus means annotating prosodic units (accentuation and intonation phrases) and transforming this manual annotation into an automatic analysis based on machine learning techniques.
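As a minimal sketch of this annotate-then-automate workflow (not the project's actual pipeline; the feature vectors, labels, and values below are invented for illustration), a nearest-centroid classifier trained on manually labeled material can assign prosodic-unit labels to unseen lines:

```python
# Toy illustration: each annotated item pairs a feature vector
# (hypothetically, e.g. normalized pitch slope and pause duration)
# with a hypothetical prosodic-unit label.
def centroids(annotated):
    """Average the feature vectors of each label into a class centroid."""
    sums = {}
    for features, label in annotated:
        acc, n = sums.setdefault(label, ([0.0] * len(features), 0))
        sums[label] = ([a + f for a, f in zip(acc, features)], n + 1)
    return {lab: [a / n for a in acc] for lab, (acc, n) in sums.items()}

def classify(features, cents):
    """Return the label whose centroid is nearest (squared Euclidean)."""
    return min(cents, key=lambda lab: sum((f - c) ** 2
                                          for f, c in zip(features, cents[lab])))

# Hypothetical manual annotations: (features, prosodic label).
annotated = [
    ([0.9, 0.1], "accentuated"), ([0.8, 0.2], "accentuated"),
    ([0.1, 0.9], "phrase-final"), ([0.2, 0.8], "phrase-final"),
]
model = centroids(annotated)
print(classify([0.85, 0.15], model))  # → accentuated
```

The project itself would use richer acoustic features and stronger learners; the point of the sketch is only the division of labor between manual annotation and automatic labeling.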

In addition to applications in language processing, the methods to be developed in the digital humanities are also of interest for other learning problems involving sequential, highly variable acoustic data. Our special session addresses researchers in computer science as well as in the humanities and linguistics who are interested in this new and challenging field: researchers with a demonstrable understanding of one or more humanities disciplines and the willingness to undertake interdisciplinary scientific work based on digital techniques.


Burkhard Meyer-Sickendiek studied German, History and Philosophy at the University of Bielefeld from 1990 to 1996. In 1999 he received his PhD from the University of Tübingen. In 2008 he completed his habilitation on literary sarcasm in German-Jewish modernism at the Ludwig-Maximilians-Universität (LMU) in Munich. In October 2008, Meyer-Sickendiek became a guest professor at the Cluster of Excellence "Languages of Emotion" at Freie Universität Berlin (FUB); from 2010 to 2015 he held a Heisenberg fellowship of the Deutsche Forschungsgemeinschaft (DFG), working on modern and postmodern poetry. Since January 2016, Meyer-Sickendiek has headed a research group at FUB, in cooperation with the portal lyrikline.org and funded by the Volkswagen Foundation, to develop a digital tool for automatic prosody recognition in (post-)modern online poetry.

Hussein Hussein studied at the Dresden University of Technology (TUD) in Germany, receiving a Master of Science in acoustics and speech communication in 2007. The same year, he began his PhD studies on prosodic analysis and synthesis at TUD, completing them in 2013. Alongside his studies, Mr. Hussein has worked as a research assistant at the Laboratory of Acoustics and Speech Communication at TUD, at the Beuth University of Applied Sciences (BHT) in Berlin, at the Chemnitz University of Technology, and, since August 2017, at Freie Universität Berlin.


List of papers in the field of prosodic analysis in digital humanities:

  • "Large-scale Analysis of Spoken Free-verse Poetry", LT4DH 2016, Osaka, Japan, 2016. Timo Baumann & Burkhard Meyer-Sickendiek (Universität Hamburg and Freie Universität Berlin).
  • "Rhythmicalizer. A digital tool to identify free verse prosody", Linguistics and Literature Studies. Burkhard Meyer-Sickendiek, Hussein Hussein & Timo Baumann (Freie Universität Berlin and Universität Hamburg) (submitted).
  • "'free verse prosody': Eine Herausforderung an die germanistische Lyriktheorie", Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte 4 (2017). Burkhard Meyer-Sickendiek (Freie Universität Berlin) (accepted).
  • "'creating a spontaneous bop prosody': US-Import und literarische Rhythmik im Werk Rolf-Dieter Brinkmanns", Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte 3 (2014), pp. 369–391. Burkhard Meyer-Sickendiek (Freie Universität Berlin).




Prosody in Social Contexts: Interpersonal Stance and Social Meanings as Encoded by Vocal Cues

Xiaoming Jiang
Marc D. Pell
McGill University, School of Communication Sciences and Disorders
Montreal, Quebec, Canada
2001 ave. McGill College, 8th Floor, Montreal (QC) H3G 1A8 Canada

Speech prosody is intrinsically communicative. For decades, research on prosody focused on the communication of linguistic or emotional meanings, but new work is beginning to characterize how interpersonal stance and social relations are represented by paralinguistic cues. Speech prosody is thus becoming a major focus for studying social inference-making and person perception.

A number of challenges face empirical studies of how social factors influence prosody and social perception. For example, how do we successfully elicit and define prosody in different interpersonal contexts, how do we understand the relationship between prosody and pragmatic-cognitive abilities, and how do we link prosodic features and their operation to interactive features of the mind/brain? These challenges face not only experimental researchers but also speech engineers who build artificial intelligence that possesses a human-like ability to understand speech in various applied social contexts.

To provoke thought and provide some answers to these challenging questions, this session will focus on recent empirical discoveries about social communication through vocal and speech signals, highlighting what can be learned by adopting a “social cognitive neuroscientific” approach to studying speech prosody. We will showcase recent advances in this field and facilitate opportunities for collaborative research on speech prosody during social interaction. We aim to present a multiscale picture of how the voice is encoded and decoded in social contexts by combining evidence from acoustic analysis, real-time brain activity (EEG, fMRI) and paradigms from experimental psychology. Computational modeling of the social representation of speech and voice will also be a welcome topic.

We invite submissions and will organize four dedicated oral presentations addressing three broad questions centering on prosody in social contexts: how vocal cues in spoken language encode a speaker’s emotions, “hidden meanings” (white lies, innuendos), interpersonal stance (confidence, politeness), and mental state (believability) for the listener; how the neurocognitive system decodes vocal signals and generates inferences about the true intentions of speakers in different social communicative contexts; and how social-contextual (speaker motivation), individual (speaker tension and social status), and cultural variables (speaker accent) predict social interpretations based on speech prosody.

With unique contributions from diverse disciplines, demonstrations of the promise of state-of-the-art methodologies, and dedicated talks characterizing the prosodic mechanisms and patterns that reveal the nature of interpersonal and social interaction, we believe that this special session will be of great interest to the wide audience of the SP conference and a unique contribution to the theme of SP 2018.


Xiaoming Jiang, PhD, studies the neurocognitive diversity underlying speech and socio-pragmatic communication, taking a unique approach that combines experimental psychology, neuroimaging/neurophysiology and computational modeling. His recent interest lies in how a speaker’s social group (e.g. sex, accent, cultural background) affects the encoding and decoding of speaker confidence and trustworthiness. His work has received wide media attention, including in Forbes and New Scientist. He has been a postdoctoral researcher and research associate in the School of Communication Sciences and Disorders at McGill University and a senior speech scientist at Nuance Communications (Montréal, Canada).

Marc D. Pell, PhD, has a broad interest in how humans communicate their emotions, attitudes and intentions in speech, both in healthy adults and in those with acquired brain disease (e.g., stroke, Parkinson’s disease). Much of his research has examined how a speaker’s tone of voice conveys different meanings in spoken language, and how listeners use these cues as a source of pragmatic information for understanding another person’s emotions and cognitive state. He holds appointments as James McGill Professor and Director of the School of Communication Sciences and Disorders, Faculty of Medicine, McGill University (Montréal, Canada).
