Differences in syntactic annotation affect retrieval
Verb-attached PPs in the history of English
Prepositional phrases (PPs) play an important part in English argument structure constructions but pose considerable challenges for linguistic investigation. PP attachment is notoriously difficult to model computationally. A further, particularly striking methodological challenge in investigating verb-dependent PPs across synchronic and/or diachronic corpora is that such cross-corpus studies may have to rely on material annotated with different tools. This study evaluates the impact that such differences in corpus annotation may have on the retrieval of verb-attached PPs, using data from Early and Late Modern English corpora. Our intrinsic (precision/recall) and extrinsic parser evaluation shows that annotation does play a role, but that the noise introduced is negligible as far as frequency developments are concerned.
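The intrinsic evaluation mentioned above compares the PPs a query retrieves with a manually checked gold standard. The following is a minimal sketch of that comparison, not the authors' procedure: the function name `precision_recall`, the representation of each hit as a `(sentence_id, verb, preposition)` triple, and all data values are assumptions made up for illustration.

```python
def precision_recall(retrieved, gold):
    """Return (precision, recall) for a set of retrieved items against a gold set."""
    retrieved, gold = set(retrieved), set(gold)
    true_positives = len(retrieved & gold)
    # Precision: share of retrieved items that are correct (low value = over-generation).
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of gold items that were retrieved (low value = under-generation).
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard verb-attached PPs: (sentence_id, verb, preposition).
gold = {
    (1, "rely", "on"),
    (2, "look", "at"),
    (3, "send", "to"),
    (4, "talk", "about"),
}

# Hypothetical query output: three correct hits, two spurious ones, one gold item missed.
retrieved = {
    (1, "rely", "on"),
    (2, "look", "at"),
    (3, "send", "to"),
    (5, "see", "with"),
    (6, "read", "of"),
}

p, r = precision_recall(retrieved, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same comparison once per annotation scheme against a shared gold standard would give one precision/recall pair per scheme, which is the kind of contrast an intrinsic evaluation reports.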
Article outline
- 1. Introduction
- 2. PP-retrieval across annotation schemes
- 2.1 PPs as a challenge for parsing
- 2.2 Differences between annotation schemes: Constituency vs. dependency parsing
- 3. Data and methods
- 3.1 Set-up of the study: Datasets
- 3.2 Retrieval of verb-attached PPs from the Penn-parsed vs. dependency-parsed corpora
- 4. Results
- 4.1 Intrinsic evaluation: Precision and recall
- 4.1.1 Recall: Causes of under-generation
- 4.1.2 Precision: Causes of over-generation
- 4.1.3 General issues observed for both recall and precision
- 4.1.4 Inter-annotator agreement in the gold standard
- 4.2 Extrinsic evaluation: Verb-attached PPs in the history of English
- 5. Conclusions
- Notes
- References