Edited by Gisle Andersen
[Studies in Corpus Linguistics 49] 2012
► pp. 79–110
Multiword expressions (MWEs) are words that co-occur so often that they are perceived as a linguistic unit. Since MWEs pervade natural language, their identification is pertinent for a range of tasks within lexicography, terminology and language technology. We apply various statistical association measures (AMs) to word sequences from the Norwegian Newspaper Corpus (NNC) in order to rank two-and three-word sequences (bigrams and trigrams) in terms of their tendency to co-occur. The results show that some statistical measures favour relatively frequent MWEs (e.g. i motsetning til ‘as opposed to’), whereas other measures favour relatively low-frequent units, which typically comprise loan words (de facto), technical terms (notaries publicus) and phrasal anglicisms (practical jokes; cf. G. Andersen this volume). On this basis we evaluate the relevance of each of these measures for lexicography, terminology and language technology purposes.
This list is based on CrossRef data as of 17 march 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.