Phrases in English - PIE
British National Corpus - BNC
BNC N-Grams Simple Search
The British National Corpus (BNC) is a carefully selected collection of 4124 contemporary written and spoken English texts, primarily from the United Kingdom. The corpus totals over 100 million words and covers a representative range of domains, genres and registers. The entire corpus has been analyzed and marked up with part-of-speech (PoS) tags. Provenance and other attributes are carefully documented for each text.
"What is the BNC?" provides a succinct overview of the corpus; for an exhaustive description, consult the British National Corpus Users Reference Guide. Chapter 1 of Guy Aston and Lou Burnard's BNC Handbook includes an informative survey of possible uses of corpora in general and of the BNC in particular. Additional useful information and resources (including various frequency lists with more refined PoS tagging) are found on the companion website for Word Frequencies in Written and Spoken English based on the British National Corpus by Geoffrey Leech, Paul Rayson and Andrew Wilson. The introduction includes a very readable discussion of how the corpus was tokenized and tagged.
PIE incorporates a database derived from the second or World Edition of the BNC (2000), but is not affiliated with the BNC Consortium. It aims to provide a simple yet powerful interface, suitable for both experienced researchers and novice users, for studying words and phrases up to eight words long. For investigating words in longer contexts, the full BNC corpus and the Xaira search and analysis software are available on CD-ROM from the BNC Consortium (a single-user license costs only £75). Alternatively, one can look up individual words and phrases online.
To understand and interpret the datasets produced here and to compare them to results of direct queries to BNC, please read how and why the original data were normalized to build the PIE database.
Displaying 1-POS-grams by Types
- 9 CJT the subordinating conjunction that, when introducing a relative clause, as in the day that follows Christmas.
- 15 DTQ wh-determiner, e.g. which, what, whose, used either interrogatively or to introduce a relative clause.
- 22 PNQ wh-pronoun, e.g. who, whoever, whom.
- 24 POS the possessive or genitive marker 's or ', tagged as a distinct word.
- 25 PRF the preposition of.
- 27 VBB the present tense forms of the verb be, except for is or 's: am, are, 'm, 're, be (subjunctive or imperative), ai (as in ain't).
- 28 VBD the past tense forms of the verb be: was, were.
- 29 VBG -ing form of the verb be: being.
- 30 VBI the infinitive form of the verb be: be.
- 31 VBN the past participle form of the verb be: been.
- 32 VBZ the -s form of the verb be: is, 's.
- 33 VDB the finite base form of the verb do: do.
- 34 VDD the past tense form of the verb do: did.
- 35 VDG the -ing form of the verb do: doing.
- 36 VDI the infinitive form of the verb do: do.
- 37 VDN the past participle form of the verb do: done.
- 38 VDZ the -s form of the verb do: does.
- 39 VHB the finite base form of the verb have: have, 've.
- 40 VHD the past tense form of the verb have: had, 'd.
- 41 VHG the -ing form of the verb have: having.
- 42 VHI the infinitive form of the verb have: have.
- 43 VHN the past participle form of the verb have: had.
- 44 VHZ the -s form of the verb have: has, 's.
- 52 EX0 existential there, i.e. the word there as it appears in the there is ... and there are ... constructions.
- 54 TO0 the infinitive marker to.
- 56 XX0 the negative particle not or n't.
- 57 ZZ0 alphabetical symbols, e.g. A, a, B, b, c, d.
- 58 -*- "wildword" matching any PoS tag (non-standard extension for phrase-frame queries and result sets).
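As an illustration of how the "wildword" tag can act in a phrase-frame query, here is a minimal Python sketch. It is not PIE's actual implementation: the function name matches_frame, the sample 3-POS-grams and the position-by-position matching rule are all assumptions made for the example. Only the tag names and the -*- symbol come from the list above.

```python
# Minimal sketch (assumed behaviour, not PIE's real query engine):
# a frame matches a POS-gram when both have the same length and every
# frame slot is either the wildword or identical to the tag in that slot.

WILDWORD = "-*-"  # non-standard "match any PoS tag" symbol from the list above


def matches_frame(frame, tags):
    """Return True if the tag sequence fits the frame, slot by slot."""
    if len(frame) != len(tags):
        return False
    return all(slot == WILDWORD or slot == tag for slot, tag in zip(frame, tags))


# Hypothetical 3-POS-grams, invented for illustration only.
pos_grams = [
    ("EX0", "VBZ", "XX0"),  # e.g. "there is n't"
    ("EX0", "VBB", "XX0"),  # e.g. "there are not"
    ("TO0", "VBI", "XX0"),  # starts with TO0, so it should not match
    ("EX0", "VBZ"),         # too short for a three-slot frame
]

# Frame: existential there, any tag, negative particle.
frame = ("EX0", WILDWORD, "XX0")
hits = [gram for gram in pos_grams if matches_frame(frame, gram)]

for gram in hits:
    print(" ".join(gram))
```

Under these assumptions, the frame EX0 -*- XX0 selects the first two sample POS-grams and rejects the other two, one because its first tag differs and one because its length differs.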