Chapter 5
Evaluating a bracketing protocol for multiword terms
Multiword terms (MWTs) are frequently used to encapsulate and convey meaning in scientific and
technical texts. However, they can also make these texts difficult to understand because the relations between
constituents are not transparent. When MWTs have more than two constituents, a dependency analysis (bracketing) is
often necessary to facilitate their interpretation. Various models have been proposed in NLP to automate bracketing,
but none has been entirely satisfactory. This paper presents a protocol that combines several of these models and
applies it to a set of three-constituent MWTs in order to: (i) rank rules by their disambiguation potential, based on
their likelihood of retrieving results from any corpus and their ability to resolve bracketing; and (ii) ascertain the
influence of corpus size and type on the results obtained.
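To make the notion of bracketing concrete, the following is a minimal Python sketch of one simple frequency-based rule for a three-constituent MWT: compare how often the first two constituents co-occur with how often the last two do, and group the more frequent pair. This is an illustration only, not the protocol evaluated in this chapter. The function names, the example terms, and the corpus counts are all hypothetical, and the counts are invented toy values. The rule returns no decision when the counts are tied or absent, which mirrors the point that a rule may fail to retrieve results from a given corpus.

```python
from typing import Mapping, Optional, Tuple

Bigram = Tuple[str, str]


def bracket_three(term: Tuple[str, str, str],
                  counts: Mapping[Bigram, int]) -> Optional[str]:
    """Return 'left', 'right', or None for a three-constituent term.

    Illustrative adjacency-style rule (an assumption, not the chapter's
    protocol): group whichever adjacent pair is more frequent in the corpus.
    """
    w1, w2, w3 = term
    left_count = counts.get((w1, w2), 0)   # evidence for [[w1 w2] w3]
    right_count = counts.get((w2, w3), 0)  # evidence for [w1 [w2 w3]]
    if left_count > right_count:
        return "left"
    if right_count > left_count:
        return "right"
    return None  # tie or no corpus evidence: the rule cannot decide


def render(term: Tuple[str, str, str], bracketing: Optional[str]) -> str:
    """Show the bracketed structure as a string."""
    w1, w2, w3 = term
    if bracketing == "left":
        return f"[[{w1} {w2}] {w3}]"
    if bracketing == "right":
        return f"[{w1} [{w2} {w3}]]"
    return f"{w1} {w2} {w3} (unresolved)"


# Invented toy counts, for illustration only.
toy_counts = {
    ("wind", "turbine"): 120,
    ("turbine", "blade"): 45,
    ("offshore", "wind"): 60,
    ("wind", "farm"): 200,
}

for mwt in [("wind", "turbine", "blade"),
            ("offshore", "wind", "farm"),
            ("coastal", "sediment", "transport")]:
    print(render(mwt, bracket_three(mwt, toy_counts)))
```

With these toy counts, the first term is grouped to the left, the second to the right, and the third is left unresolved because no pair is attested.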
Article outline
- 1. Introduction
- 2. Bracketing models
- 3. Materials and methods
- 3.1 MWT extraction and manual bracketing
- 3.2 Queries
- 3.3 Bracketing rules
- 4. Results
- 4.1 Rule comparison
- 4.1.1 Quantitative performance of the rules
- 4.1.2 Qualitative performance of the rules
- 4.1.3 Quantitative and qualitative performance of the rules
- 4.2 Comparison of corpora
- 4.3 Comparison of MWT bracketing
- 5. Conclusions
- References
- Appendix