Constructing a corpus-informed list of Arabic formulaic sequences (ArFSs) for language pedagogy and
technology
This study aims to construct a corpus-informed list of Arabic Formulaic Sequences (ArFSs) for use in language
pedagogy (LP) and Natural Language Processing (NLP) applications. A hybrid mixed-methods model was adopted for extracting ArFSs
from a corpus; it combined automatic and manual extraction methods, based on well-established quantitative and qualitative
criteria that are relevant from the perspective of LP and NLP. The pedagogical implications of this list are examined to
facilitate the inclusion of ArFSs in the learning and teaching of Arabic, particularly for non-native speakers. The
computational implications of the list concern the key role of ArFSs as a novel language resource for
improving various Arabic NLP tasks.
Article outline
- 1. Introduction
- 2. Formulaic Sequences in language pedagogy and technology
- 2.1 Corpus-informed pedagogical formulaic sequences
- 2.2 Arabic computational MWEs research
- 3. Methodology: A hybrid model for FSs extraction
- 3.1 Issues of frequency, extent and identification
- 3.2 The corpus source of the language data
- 3.3 The selection criteria
- 3.4 Stages of constructing the FSs list
- 3.4.1 Statistical phase
- 3.4.2 Qualitative phase
- 3.4.3 Linguistic analysis and classification phase
- 4. Results and discussion
- 5. Conclusions
- Acknowledgements
- Note
- References