Natural Language Processing for Online Applications

Text retrieval, extraction and categorization

Second revised edition

Peter Jackson | Thomson Corporation
Isabelle Moulinier | Thomson Corporation
ISBN 9789027249920 | EUR 105.00 | USD 158.00
ISBN 9789027249937 | EUR 33.00 | USD 49.95
ISBN 9789027292445 | EUR 105.00/33.00* | USD 158.00/49.95*
This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights commonalities in terms of both general principles and practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; detailed coverage of longer-term research and more theoretical treatments should be sought elsewhere. There are many pointers at the ends of the chapters that the reader can follow to explore the literature. However, the book does maintain a strong emphasis on evaluation in every chapter, both in terms of methodology and the results of controlled experimentation.
This title replaces:
Natural Language Processing for Online Applications: Text retrieval, extraction and categorization, Peter Jackson and Isabelle Moulinier (2002)
[Natural Language Processing, 5] 2007.  x, 232 pp.
Publishing status: Available

Cited by 73 other publications (based on CrossRef data as of 16 April 2024; the count may not be complete).



Main BIC Subject

UYQL: Natural language & machine translation

Main BISAC Subject

COM042000: COMPUTERS / Natural Language Processing
U.S. Library of Congress Control Number: 2007010559