Swedish word family resource
Construction, applicability, strengths and first experiments
This article introduces a novel lexical resource for Swedish based on word family principles. The development of
the Swedish Word Family (SweWF) resource is set in the context of linguistic complexity in second language acquisition.
SweWF is particularly suited to this context because it contains lexical items attested in second language corpora, namely a
corpus of coursebook texts and a corpus of learner essays. The article focuses on the construction of the
resource, its user interface, and its applicability for research, although the resource also opens up possibilities for practical
applications in language learning, testing and assessment. We demonstrate the value of the resource through several case
studies.
Article outline
- 1. Introduction
- 2. Related research
- 3. Constructing the Swedish Word Family resource
- 3.1 Swedish morphology – a short introduction
- 3.2 Morphological complexity in a CAF context
- 3.3 CoDeRooMor resource
- 3.4 The Swedish Word Family user interface
- 3.5 Limitations and discussion
  - 1. Polysemy within word families
  - 2. Hierarchy within word families
  - 3. Replicability
  - 4. Manual versus automatic approaches to morphological analysis
- 4. Hypotheses and case studies
- 4.1 Hypothesis 1: Distribution of singleton families over levels
- 4.2 Hypothesis 2: Morphological complexity within a family
- 4.3 Hypothesis 3: Extralinguistic insights through word families
- 5. Conclusions and future work
- Notes
- References