Edited by Anne Lacheret-Dujour, Sylvain Kahane and Paola Pietrandrea
[Studies in Corpus Linguistics 89] 2019
pp. 21–34
In this chapter, we present the principles we adopted for the orthographic and phonological transcriptions in Rhapsodie, as well as the automatic segmentation process. The orthographic transcription follows three main principles: (1) standard spelling is never adapted with respellings such as i-z-ont or pasque; (2) no punctuation is used; (3) phenomena peculiar to speech are duly represented: filled pauses, word repetitions, self-repairs, word fragments, interjections, onomatopoeias, and discourse markers. A phonetic transcription is then obtained with an automatic grapheme-to-phoneme (g2p) conversion tool, followed by manual verification. Lastly, on the basis of the sound recording and the phonetic transcription, we provide a multi-layer alignment (or segmentation) at the phonetic, syllabic, and lexical levels, using an automatic approach based on a speech recognition engine.
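To make the idea of a multi-layer alignment concrete, here is a minimal sketch in Python of three time-aligned tiers (phones, syllables, words) for a single word. It is an illustration only, not the Rhapsodie format or tooling: the `Interval` class, the helper functions `is_contiguous` and `nests_within`, the IPA labels, and all time values are assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One labelled stretch of the recording (hypothetical structure)."""
    label: str
    start: float  # seconds
    end: float    # seconds


def is_contiguous(tier):
    """True if each interval starts exactly where the previous one ends."""
    return all(a.end == b.start for a, b in zip(tier, tier[1:]))


def nests_within(finer, coarser):
    """True if every boundary of the coarser tier is also a boundary of the finer tier."""
    fine_bounds = {i.start for i in finer} | {i.end for i in finer}
    return all(c.start in fine_bounds and c.end in fine_bounds for c in coarser)


# Invented timings for the French word "bonjour"; not taken from the corpus.
phones = [
    Interval("b", 0.00, 0.06),
    Interval("ɔ̃", 0.06, 0.15),
    Interval("ʒ", 0.15, 0.24),
    Interval("u", 0.24, 0.31),
    Interval("ʁ", 0.31, 0.40),
]
syllables = [
    Interval("bɔ̃", 0.00, 0.15),
    Interval("ʒuʁ", 0.15, 0.40),
]
words = [Interval("bonjour", 0.00, 0.40)]

# Each coarser tier should share its boundaries with the tier below it.
consistent = (
    is_contiguous(phones)
    and nests_within(phones, syllables)
    and nests_within(syllables, words)
)
```

In this sketch, the three tiers are consistent when syllable boundaries coincide with phone boundaries and word boundaries coincide with syllable boundaries, which is the property a phonetic, syllabic, and lexical segmentation of the same signal would be expected to satisfy.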