Natural Language Processing for Online Applications

Text retrieval, extraction and categorization

Second revised edition

Authors

Peter Jackson | Thomson Corporation

Isabelle Moulinier | Thomson Corporation

Hardbound – Available

ISBN 9789027249920 | EUR 105.00 | USD 158.00

Paperback – Available

ISBN 9789027249937 | EUR 33.00 | USD 49.95

e-Book –

ISBN 9789027292445 | EUR 105.00/33.00* | USD 158.00/49.95*

This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights commonalities in both general principles and practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; detailed coverage of longer-term research and more theoretical treatments should be sought elsewhere. The many pointers at the ends of the chapters let the reader explore the literature. The book also maintains a strong emphasis on evaluation in every chapter, in terms of both methodology and the results of controlled experimentation.

This title replaces:
Natural Language Processing for Online Applications: Text retrieval, extraction and categorization, Peter Jackson and Isabelle Moulinier (2002)

[Natural Language Processing, 5] 2007. x, 232 pp.

Publishing status: Available

https://doi.org/10.1075/nlp.5

Table of Contents

Preface to the 2nd edition | p. ix

Chapter 1. Natural language processing | p. 1

1.1 What is NLP?
1.2 NLP and linguistics
1.3 Linguistic tools
1.4 Plan of the book

Chapter 2. Document retrieval | p. 23

2.1 Information retrieval
2.2 Indexing technology
2.3 Query processing
2.4 Evaluating search engines
2.5 Attempts to enhance search performance
2.6 The future of Web searching

Chapter 3. Information extraction | p. 69

3.1 The message understanding conferences
3.2 Regular expressions
3.3 Finite automata in FASTUS
3.4 Context-free grammars
3.5 Limitations of current technology and future research
3.6 Summary of information extraction

Chapter 4. Text categorization | p. 113

4.1 Overview of categorization tasks
4.2 Handcrafted rule based methods
4.3 Inductive learning for text classification
4.4 Nearest neighbor algorithms
4.5 Combining classifiers
4.6 Evaluation of text categorization systems

Chapter 5. Text mining | p. 163

5.1 What is text mining?
5.2 Resolving reference and coreference
5.3 Automatic summarization
5.4 Testing of automatic summarization programs
5.5 Prospects for text mining and NLP

Index | p. 227

Erratum

For errata please go to http://www.jacksonpeter.com/nlp

Subjects