“Going standard” on a blank page
A corpus-based approach to the written varieties of the
Italian Western Alps minorities (Occitan, Francoprovençal and Walser)
This chapter investigates non-standard languages, i.e., those which
are dialectal, non-standardised, or standardised only to a very
limited extent, as represented by the local linguistic varieties
spoken in the Italian Western Alps. Although these varieties have
existed almost exclusively as spoken languages throughout their
history, our aim is to discuss the methods and problems involved in
investigating written corpora of these varieties from a corpus
linguistics perspective. This is especially
challenging because corpus linguistics usually employs methods and
standards elaborated for standard(ised) written varieties. Focusing
on the Occitan and Francoprovençal varieties, the chapter shows that the
different historical backgrounds of the two languages also have an
impact on their speakers’ attitude towards standardisation and on
how texts are produced and accordingly made accessible for corpus
linguistics methods.
Article outline
- 1. Introduction
- 2. The Western Alps minority languages: An overview
- 3. The CLiMAlp project
- 4. The “corpus-based” approach: Romance languages
- 4.1 Written OC and FP standards
- 4.2 Populating strategies
- 4.2.1 The dictionaries
- 4.2.2 The corpora
- 4.3 “Machine learning” performances
- 5. Summary and outlook
- Acknowledgements
- Notes
- References