Using data-mining to identify and study patterns in lexical
innovation on the web: The NeoCrawler
This paper presents the NeoCrawler, a tailor-made web crawler that identifies
and retrieves neologisms from the Internet and systematically monitors their
use on the web by means of weekly searches. It enables researchers to use the
web as a corpus in order to investigate the dynamics of lexical innovation
systematically and on a large scale. As a web-mining tool, the NeoCrawler opens
up new opportunities for linguists to tackle a number of unresolved and
under-researched issues in the field of lexical innovation. The paper describes
the design and the most important characteristics of its two modules, the
Discoverer and the Observer, with regard to the usage-based study of lexical
innovation and diffusion.
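The abstract does not spell out how neologism candidates are identified; the outline only indicates that the Discoverer involves a string matching procedure, a reference dictionary, and manual evaluation. As a rough illustration of the general idea of dictionary-based candidate detection, and not of the NeoCrawler's actual implementation, the following minimal sketch flags every token that is absent from a reference word list. The function name `candidate_neologisms`, the tokenisation rule, and the sample data are all assumptions made for this example.

```python
import re


def candidate_neologisms(text, reference_words):
    """Return tokens from `text` that do not occur in `reference_words`.

    Hypothetical helper for illustration only: tokens are lower-cased
    alphabetic strings (internal hyphens allowed), and each candidate is
    reported once, in order of first appearance.
    """
    known = {word.lower() for word in reference_words}
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

    seen = set()
    candidates = []
    for token in tokens:
        if token not in known and token not in seen:
            seen.add(token)
            candidates.append(token)
    return candidates


# Toy reference list; a real reference dictionary would be far larger.
reference = ["the", "was", "a", "classic", "move"]
print(candidate_neologisms("The detweet was a classic detweet move", reference))
```

In a sketch like this, the output is only a list of candidates: misspellings, proper names, and foreign words would be flagged as well, which is presumably why a manual evaluation step follows the automatic matching.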
Article outline
- 1. Introduction
- 2. The Discoverer
- 2.1 Source material and pre-processing
- 2.2 String matching procedure
- 2.3 Reference dictionary
- 2.4 Manual evaluation
- 3. The Observer
- 3.1 Architecture of the Observer
- 3.2 The NeoCrawler database
- 3.3 The Observer interface
- 4. Summary and future work
- Notes
- References