This study investigates the phenomenon of defectiveness in Russian case and number noun paradigms from the
perspective of distributional semantics. We made use of word embeddings, high-dimensional vectors trained on large text corpora,
and compared the observed paradigms of nouns that are defective in the genitive plural according to Zaliznjak (1977) with the observed paradigms of non-defective nouns. When the embeddings of about
20,000 inflected forms were projected onto a two-dimensional space, clusters of case and number within case were found, suggesting
global semantic similarity for words with the same inflectional features. Moreover, defective lexemes were characterized by lower
semantic transparency, in that inflected forms of the same lexeme are semantically less similar to each other, and their meanings
are also more idiosyncratic. Furthermore, compared to non-defective lexemes, inflected forms from defective lexemes are further
away from the idealized average case-number meanings, obtained by averaging over the vectors of all inflected forms of the same
case-number combination. As a consequence, the semantics of defective forms are predicted less precisely by a simple model of
conceptualization that assumes that the meaning of a given Russian inflected form is approximated well by the sum of pertinent
embeddings of the lexeme, case, and number within case. We conclude that the relationship between defectiveness and semantics, at
least the kind captured by word embeddings, is stronger than previously anticipated.
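The additive idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors are random, the lexeme labels (`lex_a`, etc.) and the function names are hypothetical, case and number are collapsed into a single case-number cell for brevity, and the lexeme and cell components are estimated by simple averaging, which is an assumption of this sketch.

```python
import numpy as np

def group_means(vectors, labels):
    """Average all rows of `vectors` that share the same label."""
    labels = np.asarray(labels)
    return {g: vectors[labels == g].mean(axis=0) for g in sorted(set(labels.tolist()))}

def additive_prediction(vectors, lexemes, cells):
    """Predict each form as lexeme mean + average shift of its case-number cell."""
    lex_mean = group_means(vectors, lexemes)
    # What remains of each form once its lexeme mean is removed.
    residual = vectors - np.stack([lex_mean[l] for l in lexemes])
    # Idealized case-number meaning: the average residual per cell.
    shift = group_means(residual, cells)
    return np.stack([lex_mean[l] + shift[c] for l, c in zip(lexemes, cells)])

# Toy data (hypothetical): 3 lexemes x 4 case-number cells, 8 dimensions.
rng = np.random.default_rng(0)
lexeme_names = ["lex_a", "lex_b", "lex_c"]
cell_names = ["nom.sg", "gen.sg", "nom.pl", "gen.pl"]
lex_vecs = {l: rng.normal(size=8) for l in lexeme_names}
cell_vecs = {c: rng.normal(size=8) for c in cell_names}
lexemes = [l for l in lexeme_names for _ in cell_names]
cells = [c for _ in lexeme_names for c in cell_names]

# Perfectly additive forms: lexeme vector + cell vector.
vectors_clean = np.stack([lex_vecs[l] + cell_vecs[c] for l, c in zip(lexemes, cells)])

# Make one form (gen.pl of lex_a, row 3) semantically idiosyncratic.
vectors = vectors_clean.copy()
vectors[3] += rng.normal(scale=2.0, size=8)

# Euclidean distance between each observed form and its additive prediction.
errors = np.linalg.norm(vectors - additive_prediction(vectors, lexemes, cells), axis=1)
worst = int(np.argmax(errors))
print(lexemes[worst], cells[worst])
```

In this toy setting the perturbed form is the one the additive model predicts least precisely, which mirrors, in a highly simplified way, the pattern reported for defective lexemes.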
References
Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., and Blevins, J. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity.
Baerman, M. (2008). Historical observations on defectiveness: the first singular non-past. Russian Linguistics, 32(1):81–97.
Baerman, M. (2011). Defectiveness and homophony avoidance. Journal of Linguistics, 47(1):1–29.
Baerman, M., Brown, D., and Corbett, G. G. (2005). The Syntax-Morphology Interface: A Study of Syncretism. Cambridge Studies in Linguistics. Cambridge University Press.
Baerman, M. and Corbett, G. G. (2010). Introduction: Defectiveness: Typology and diachrony. In Baerman, M., Corbett, G. G., and Brown, D., editors, Defective Paradigms: Missing forms and what they tell us, pages 1–18. Cambridge University Press.
Becker, M. and Gouskova, M. (2016). Source-Oriented Generalizations as Grammar Inference in Russian Vowel Deletion. Linguistic Inquiry, 47(3):391–425.
Benko, V. (2014). Compatible sketch grammars for comparable corpora. In Abel, A., Vettori, C., and Ralli, N., editors, Proceedings of the 16th EURALEX International Congress, pages 417–430, Bolzano, Italy. EURAC research.
Bojanowski, P., Grave, E., Joulin, A., and Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.
Boleda, G. (2020). Distributional semantics and linguistic theory. Annual Review of Linguistics, 6:1–22.
Brown, D. and Arkadiev, P. (2018). Syncretism (second edition). Oxford Bibliographies in Linguistics. Oxford University Press, New York.
Brown, D., Corbett, G. G., Fraser, N. M., Hippisley, A., and Timberlake, A. (1996). Russian noun stress and network morphology. Linguistics, 34(1):53–107.
Corbett, G. (2012). Features. Cambridge Textbooks in Linguistics. Cambridge University Press.
Daland, R., Sims, A. D., and Pierrehumbert, J. (2007). Much ado about nothing: A social network model of Russian paradigmatic gaps. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 936–943, Prague, Czech Republic. Association for Computational Linguistics.
Moscoso del Prado Martín, F., Kostić, A., and Baayen, R. H. (2004). Putting the bits together: An information theoretical perspective on morphological processing. Cognition, 94(1):1–18.
Firth, J. R. (1968). Selected papers of J. R. Firth, 1952–59. Indiana University Press.
Gorman, K. and Yang, C. (2019). When nobody wins. In Rainer, F., Gardani, F., Dressler, W. U., and Luschützky, H. C., editors, Competition in Inflection and Word-Formation, pages 169–193. Springer International Publishing, Cham.
Gouskova, M. and Becker, M. (2013). Nonce words show that Russian yer alternations are governed by the grammar. Natural Language & Linguistic Theory, 31(3):735–765.
Grave, E., Bojanowski, P., Gupta, P., Joulin, A., and Mikolov, T. (2018). Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018).
Ilola, E. and Mustajoki, A. (1989). Report on Russian Morphology as it appears in Zaliznyak’s Grammatical Dictionary. Helsinki University Press, Helsinki.
Janda, L. A. and Tyers, F. M. (2021). Less is more: why all paradigms are defective, and why that is a good thing. Corpus Linguistics and Linguistic Theory, 17(1):109–141.
Landauer, T. and Dumais, S. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2):211–240.
Lõo, K., Järvikivi, J., and Baayen, R. H. (2018a). Whole-word frequency and inflectional paradigm size facilitate Estonian case-inflected noun processing. Cognition, 175:20–25.
Lõo, K., Järvikivi, J., Tomaschek, F., Tucker, B. V., and Baayen, R. H. (2018b). Production of Estonian case-inflected nouns shows whole-word frequency and paradigmatic effects. Morphology, 28(1):71–97.
Marelli, M. and Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3):485.
Matthews, P. H. (1997). The concise Oxford dictionary of linguistics. Oxford University Press.
Meyer, P. (1994). Grammatical categories and the methodology of linguistics: Review article on van Helden, W. Andries: 1993, ‘Concept formation between morphology and syntax’. Russian Linguistics, 18:341–377.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.
Nikolaev, A. and Bermel, N. (2022). Explaining uncertainty and defectivity of inflectional paradigms. Cognitive Linguistics. In press.
Sims, A. D. (2015). Inflectional Defectiveness. Cambridge University Press.
Thornton, A. M. (2019). Overabundance in morphology. In Oxford Research Encyclopedia of Linguistics. Oxford University Press.
Van der Maaten, L. and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).
van Helden, W. A. (1993). Case and gender: Concept formation between morphology and syntax (2 volumes), volume 20 of Studies in Slavic and General Linguistics. Rodopi.
Wood, S. (2017). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC, 2nd edition.
Yamada, I., Asai, A., Sakuma, J., Shindo, H., Takeda, H., Takefuji, Y., and Matsumoto, Y. (2020). Wikipedia2Vec: An efficient toolkit for learning and visualizing the embeddings of words and entities from Wikipedia. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 23–30. Association for Computational Linguistics.
Yang, C. (2016). The Price of Linguistic Productivity: How Children Learn to Break the Rules of Language. The MIT Press.
Zaliznjak, A. A. (1977). Grammatičeskij slovar’ russkogo jazyka. Russkij jazyk, Moscow.
Švedova, N. J., editor (1984). Slovar’ russkogo jazyka (S. I. Ožegov). Russkij jazyk, Moscow, 16th edition.