Lexical frames in academic prose and conversation
While lexical bundle research identifies continuous sequences (e.g. the end of the, I don’t know if), researchers have also been interested in discontinuous sequences in which words form a ‘frame’ surrounding a variable slot (e.g. I don’t * to, it is * to). To date, most research has either focused on a few intuitively selected frames or begun with frequent continuous sequences and then analyzed those to identify associated frames. Few previous studies have attempted to identify the full set of discontinuous sequences in a corpus directly. In the present study, we work towards that goal, using a corpus-driven approach to identify the set of recurrent four-word continuous and discontinuous patterns in corpora of conversation and academic writing. This direct computational analysis of the corpora reveals a more complete set of frames than alternative approaches do, documenting highly frequent frames that have not been identified in previous research.
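As a rough illustration of what directly identifying four-word continuous and discontinuous patterns could involve, the following minimal Python sketch slides a four-word window over a token list and counts both the full sequence and the frames obtained by opening one internal slot. It is not the study's actual procedure: the function names, the restriction to slots in the second or third position, the whitespace tokenization, and the frequency threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def four_word_patterns(tokens):
    """Count continuous 4-grams and 4-word frames with one internal slot.

    Assumption: only positions 2 and 3 may be variable (1 * 3 4 and 1 2 * 4),
    matching frames such as "it is * to"; other slot positions are ignored.
    """
    bundles = Counter()            # continuous sequences, e.g. "the end of the"
    frames = Counter()             # discontinuous frames, "*" marks the slot
    fillers = defaultdict(Counter)  # words observed in each frame's slot

    for i in range(len(tokens) - 3):
        w1, w2, w3, w4 = tokens[i:i + 4]
        bundles[(w1, w2, w3, w4)] += 1

        frame_a = (w1, "*", w3, w4)   # slot in second position
        frames[frame_a] += 1
        fillers[frame_a][w2] += 1

        frame_b = (w1, w2, "*", w4)   # slot in third position
        frames[frame_b] += 1
        fillers[frame_b][w3] += 1

    return bundles, frames, fillers


def recurrent(counts, min_freq):
    """Keep patterns at or above a raw frequency threshold.

    The threshold is hypothetical; a real study would normalize by corpus
    size and would likely also require dispersion across texts.
    """
    return {pattern: n for pattern, n in counts.items() if n >= min_freq}


if __name__ == "__main__":
    text = ("it is important to note that it is necessary to note that "
            "it is possible to say")
    bundles, frames, fillers = four_word_patterns(text.split())
    for pattern, n in sorted(recurrent(frames, 2).items()):
        print(" ".join(pattern), n, dict(fillers[pattern]))
```

In this toy input, no continuous sequence beginning "it is" recurs, yet the frame "it is * to" does, which is the kind of pattern that starting from frequent continuous sequences could miss.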