Chapter 4
Corpus and method
Article outline
- 4.1 The DisFrEn dataset
- 4.1.1 Source corpora
- 4.1.2 Comparable corpus design
- 4.1.3 Corpus structure in situational features
- 4.2 Discourse marker annotation
- 4.2.1 Identification of DM tokens
- 4.2.2 Functional taxonomy
- 4.2.3 Three-fold positioning system
- 4.2.4 Other variables
- 4.2.5 Annotation procedure
- 4.2.5.1 Software
- 4.2.5.2 Disambiguation method
- 4.3 Disfluency annotation
- 4.3.1 Simple fluencemes
- 4.3.1.1 Silent pauses
- 4.3.1.2 Filled pauses
- 4.3.1.3 Explicit editing terms
- 4.3.1.4 False-starts
- 4.3.1.5 Truncations
- 4.3.2 Compound fluencemes
- 4.3.2.1 Identical repetitions
- 4.3.2.2 Modified repetitions
- 4.3.2.3 Morphosyntactic substitutions
- 4.3.2.4 Propositional substitutions
- 4.3.3 Related phenomena and diacritics
- 4.3.4 Annotation procedure
- 4.3.4.1 Technical format
- 4.3.4.2 Scope of the disfluency annotation
- 4.3.4.3 Replicability of the disfluency annotation
- 4.3.5 Macro-labels of sequences
- 4.4 Summary
- Notes