Lexical segments in text

Berber Sardinha, Tony

doi:10.1075/z.107.11ber

Part of

Patterns of Text: In honour of Michael Hoey
Edited by Mike Scott and Geoff Thompson †
[Not in series 107] 2001
pp. 213–237

Tony Berber Sardinha | Catholic University of São Paulo, Brazil

Editors’ introduction

Berber Sardinha’s paper deals with a problem, namely text segmentation, which connects at several points with those of the other contributors to this volume. Like Scott, Sinclair and Coulthard, Berber Sardinha is interested in understanding the computer’s understanding of text, or rather the computer’s failure to handle the complexities of text satisfactorily. Like the other contributors who have been influenced by Hoey’s work on text patterning, his work is also concerned with the problem of identifying the stages which a text goes through as it moves from one component of a pattern to the next.

The problem is not trivial. Computer methods for processing text have already led to an explosion of text retrieval methods which anyone who uses Internet search engines knows, needs and curses. That is, a fairly simple technology is there to help us find all instances of a desired word or phrase in a database, or in the whole Internet, or on a given computer, and the uses to which this technology can be put are both text retrieval — to find the text one is searching for — and pedagogical: to learn about word collocation and colligation. But as Sinclair’s paper shows, such a technology may be efficient in its own terms but disconnected from the way human users relate to the world and to each other. Thus, a very large number of irrelevant hits are typically found, which usually hinder text retrieval as much as they help it and may also obscure and frustrate collocational inference.

These problems will probably be best tackled by refining the techniques used, and those refinements are very likely to involve questions central to the rest of this volume: the aboutness of individual text segments, and the relations between text segments or elements. Thus, for information retrieval and language learning we certainly need to ask much more than “which texts contain word x or phrase y?” and to move towards “which texts are about z?”, “which segments of which texts are about p and not q?” and “where does the text change from explaining r to evaluating it?”. As we learn to answer questions such as these, we shall probably be that much nearer to truly useful text retrieval.

Berber Sardinha’s paper proposes a detailed and ingenious method for getting at the boundaries within a text, identifying its segments in the sense of changes in aboutness.

As with the other contributors using computer methods, the problems are as yet greater than the solutions encountered. It is therefore important to view the method being proposed in the right light: the purpose here, as in so much else, is to model the world; it is through insights arising from model-making, model application and model-testing that progress is eventually made.

Published online: 27 February 2001

https://doi.org/10.1075/z.107.11ber
