The current version of the corpus holds over 9.3 million words, of which approximately 103,900 belong to the 18th century and 9.2 million to the 19th. The unit is now compiling the bulk of the 18th-century grammars to improve the representativeness of that period and its comparability with the 19th century. The corpus has been compiled in two stages, each corresponding to one version of the corpus, and each version may be used for different research purposes.

Corpus Files

Plain text corpus (.txt): these files hold the texts transcribed as found in the original sources, with the original spelling and word division preserved.


Such pauses have the same effect as a strong emphasis; and are subject to the same rules: especially to the caution, of not repeating them too frequently. For as they excite uncommon attention, and of course raise expectation, if the importance of the matter be not fully answerable to such expectation, they occasion disappointment and disgust.

(Lindley Murray, The English Reader, 1799).


POS-tagged corpus (.pos): these files contain the POS-tagged version of the corpus. Tagging was carried out with CLAWS, which assigns a morpho-syntactic tag to each word in the corpus, punctuation marks included, using the C7 tagset.


Such_DA pauses_NN2 have_VH0 the_AT same_DA effect_NN1 as_CSA a_AT1 strong_JJ emphasis_NN1 ;_; and_CC are_VBR subject_II21 to_II22 the_AT same_DA rules_NN2 :_: especially_RR to_II the_AT caution_NN1 ,_, of_IO not_XX repeating_VVG them_PPHO2 too_RG frequently_RR ._. For_CS as_CSA they_PPHS2 excite_VV0 uncommon_JJ attention_NN1 ,_, and_CC of_RR21 course_RR22 raise_VV0 expectation_NN1 ,_, if_CS the_AT importance_NN1 of_IO the_AT matter_NN1 be_VBI not_XX fully_RR answerable_JJ to_II such_DA expectation_NN1 ,_, they_PPHS2 occasion_VV0 disappointment_NN1 and_CC disgust_NN1 ._.

(Lindley Murray, The English Reader, 1799).
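
The tagged sample above can be read programmatically. The following is a minimal Python sketch, not part of the corpus distribution. It assumes that every token in a .pos file is whitespace-separated and has the form word_TAG, as in the excerpt. The function name parse_tagged and the variable names are hypothetical and chosen for illustration only.

```python
from collections import Counter


def parse_tagged(text):
    """Split 'word_TAG' tokens into (word, tag) pairs.

    Assumes whitespace-separated tokens in the word_TAG format shown in
    the sample; tokens without an underscore are skipped.
    """
    pairs = []
    for token in text.split():
        # Split on the LAST underscore so that punctuation tokens such
        # as '._.' or ';_;' yield ('.', '.') and (';', ';').
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # not a tagged token under the assumed format
        pairs.append((word, tag))
    return pairs


# Shortened fragment of the sample, used here only as illustrative input.
sample = "Such_DA pauses_NN2 have_VH0 the_AT same_DA effect_NN1 ._."

pairs = parse_tagged(sample)

# Frequency of each tag in the fragment.
tag_counts = Counter(tag for _, tag in pairs)

# Words whose tag starts with 'NN' (the noun tags in the sample).
nouns = [word for word, tag in pairs if tag.startswith("NN")]

print(pairs)
print(tag_counts.most_common(3))
print(nouns)
```

Splitting on the last underscore rather than the first keeps the punctuation tokens intact. Multiword units in the sample, such as subject_II21 to_II22 and of_RR21 course_RR22, come out as separate pairs, so any analysis that treats them as a single item would need an extra grouping step.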

How to cite

Calle-Martín, Javier, Juan Lorente-Sánchez, Marta Pacheco-Franco and Jesús Romero-Barranco. 2024. The Málaga Corpus of English Grammars (CEG). Málaga: University of Málaga. Available from https://grammarcorpus.uma.es.