How the Spanish-English dictionary is compiled

For those who are interested, here is some information on how this site's Spanish-English dictionary has been (and is being) compiled.

Word inclusion criteria

This dictionary ultimately aims to be very comprehensive. However, to get the project started and the first versions online quickly, the dictionary currently includes words chosen according to certain criteria:

It is worth noting that the vast majority of words looked up are already in the dictionary. Most "lookup failures" actually occur because the user has misspelt the word, or because the word in question isn't Spanish!


Here is a brief overview of some of the techniques used to compile the dictionary.

Speakers manually annotating wordlists

In this part of the process, native speakers are presented with printed lists of chosen words (generally chosen according to the above criteria) and simply asked to annotate them manually. They are asked to provide example sentences containing the given words, or what they perceive as common expressions containing the words.

This process is useful in picking up certain colloquial expressions that aren't common in the types of text chosen for corpus analysis (see below). It generally works well with genuinely common expressions and compounds. However, it has the disadvantage that where there are no clearly common expressions, informants sometimes feel obliged to "invent" expressions that turn out not to be so common. For this reason, each annotated list is then cross-checked by another native speaker. An expression is usually only included where the cross-checker agrees that it is a relatively common phrase.
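As a rough illustration only, the cross-checking step described above could be modelled as a simple filter: an expression proposed by the first informant is kept only if the second speaker confirms it. The sketch below is not the site's actual tooling (the annotation is done on printed lists). The names `Annotation` and `cross_check` and the sample phrases are hypothetical, chosen purely for this example.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One expression proposed by the first informant for a headword."""
    headword: str
    expression: str


def cross_check(annotations, confirmed_by_checker):
    """Keep only the annotations that the cross-checker confirmed.

    `confirmed_by_checker` is a set of (headword, expression) pairs that
    the second native speaker agreed were relatively common phrases.
    """
    return [
        a for a in annotations
        if (a.headword, a.expression) in confirmed_by_checker
    ]


# Hypothetical sample data, not taken from the real wordlists.
proposed = [
    Annotation("mano", "echar una mano"),
    Annotation("mano", "mano de obra"),
    Annotation("mesa", "mesa de cristal azul"),  # stands in for an "invented" phrase
]
confirmed = {("mano", "echar una mano"), ("mano", "mano de obra")}

accepted = cross_check(proposed, confirmed)
for a in accepted:
    print(a.headword, "->", a.expression)
```

In this toy model the cross-checker's judgement is a plain yes/no per expression. The text says an expression is "usually" only included on agreement, so a real workflow presumably leaves room for editorial exceptions that this sketch does not capture.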

Continue to part 2 of the dictionary methodology


Page written by Neil Coffey. Please note that this page is for information only. The actual methods used in compiling the dictionary are subject to change over time, and certain details are not disclosed.