Article published in:
ITL - International Journal of Applied Linguistics: Online-First Articles
Swedish word family resource
Construction, applicability, strengths and first experiments
The article introduces a novel lexical resource for Swedish based on word family principles. The development of
the Swedish Word Family (SweWF) resource is set in the context of linguistic complexity in second language acquisition. The
SweWF is particularly well suited to this context because it contains lexical items attested in second language corpora, namely
a corpus of coursebook texts and a corpus of learner essays. The article focuses on the construction of the resource,
including its user interface, and on its applicability for research; the resource also opens up possibilities for practical
applications in language learning, testing and assessment. We demonstrate the value of the resource through several case
studies.
Keywords: word family resources, linguistic complexity, graded lexical resources, Swedish as a second language, word formation morphology
Article outline
- 1. Introduction
- 2. Related research
- 3. Constructing the Swedish Word Family resource
  - 3.1 Swedish morphology – a short introduction
  - 3.2 Morphological complexity in a CAF context
  - 3.3 CoDeRooMor resource
  - 3.4 The Swedish Word Family user interface
  - 3.5 Limitations and discussion
    - 1. Polysemy within word families
    - 2. Hierarchy within word families
    - 3. Replicability
    - 4. Manual versus automatic approaches to morphological analysis
- 4. Hypotheses and case studies
  - 4.1 Hypothesis 1: Distribution of singleton families over levels
  - 4.2 Hypothesis 2: Morphological complexity within a family
  - 4.3 Hypothesis 3: Extralinguistic insights through word families
- 5. Conclusions and future work
- Notes
- References
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at [email protected].