Controlled Vocabulary

Controlled Vocabulary refers to an established list, organized arrangement, or database of preferred terms and phrases (usually subject or genre/form terms) in which all terms and phrases representing a concept are brought together. A controlled vocabulary is usually listed alphabetically in a subject headings list or thesaurus of indexing terms.

In a controlled vocabulary a preferred term or phrase is designated for use in surrogate records in a retrieval tool (e.g., bibliographic records in the library catalog), the non-preferred terms have references from them to the chosen term or phrase, and relationships among used terms are identified (e.g., broader terms, narrower terms, related terms). There may also be scope notes.

A cataloger or indexer must select terms from a controlled vocabulary when assigning subject headings or descriptors in a bibliographic record to indicate the subject of the work (e.g. a book) in a library catalog, bibliographic database, or an index.

Controlled vocabularies provide a way to organize knowledge for subsequent retrieval. They are used in subject indexing schemes, subject headings, thesauri, taxonomies, and other knowledge organization systems. Controlled vocabulary schemes mandate the use of predefined, authorized terms that have been preselected by the designers of the schemes, in contrast to natural language vocabularies, which have no such restriction.

Note: The process of creating, maintaining, and using a controlled vocabulary is called Vocabulary Control.


Janis L. Young and Daniel N. Joudrey¹ describe Controlled Vocabulary as follows:

  • A standardized subject language used to describe the contents of the resources.
  • They generally include:
    • One term chosen as the preferred term
    • Control of its synonyms
    • Disambiguation among homographs/homonyms 
    • Identification of relationships among the terms
    • Cross-references

For consistency and improved retrieval, libraries and other information institutions attempt to suppress the anarchy of natural language when it comes to describing the aboutness of resources. Subject cataloging is more consistent when the vocabulary that is used is controlled.

The main objective of vocabulary control is to promote the consistent representation and comprehensive searching of subject matter.

This is achieved through the control of synonymous and nearly synonymous terms, by distinguishing among homographs and homonyms, and by linking together terms whose meanings are related in some fashion (identifying broader, narrower, and related terms).

But the use of controlled subject languages is only part of subject cataloging; the other part involves classification.

A controlled vocabulary is a list of authorized terms used to provide consistency and uniqueness among subjects in our descriptions of resources. The terms may be called subject headings, or descriptors, index terms, thesaurus terms, identifiers, or subjects.

Whatever you call them – all terms representing the same concept are brought together under one preferred term to provide collocation. For example, in LCSH we use Young adults even if the resource itself uses the phrases Young people and Young persons.

In LCSH, if a homonym is in popular use, we have to address it in some way, as well. For example, we use Bridges for the structures crossing rivers, but Bridges (Dentistry) for a partial denture. The term Bridges (unadorned) cannot be used for both.

Also, relationships among the terms are identified to create a syndetic structure (a network of relationships); we show that a doll is a type of toy and that the two terms representing those concepts have a hierarchical relationship.
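
The reciprocal links that make up a syndetic structure can be sketched in a few lines of Python. This is only an illustration under assumptions, not how LCSH is actually stored: the `Vocabulary` class and its method names are hypothetical, and the Dolls/Toys pair is taken from the example above.

```python
class Vocabulary:
    """Toy model of a syndetic structure (hypothetical, for illustration only)."""

    def __init__(self):
        self.broader = {}   # term -> set of broader terms (BT)
        self.narrower = {}  # term -> set of narrower terms (NT)
        self.related = {}   # term -> set of related terms (RT)

    def add_hierarchy(self, narrower_term, broader_term):
        # Hierarchical links are reciprocal: recording a BT also records the NT.
        self.broader.setdefault(narrower_term, set()).add(broader_term)
        self.narrower.setdefault(broader_term, set()).add(narrower_term)

    def add_related(self, term_a, term_b):
        # Associative links are symmetric.
        self.related.setdefault(term_a, set()).add(term_b)
        self.related.setdefault(term_b, set()).add(term_a)

    def ancestors(self, term):
        """Return every term reachable by following broader-term links."""
        found, stack = set(), [term]
        while stack:
            for parent in self.broader.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


vocab = Vocabulary()
vocab.add_hierarchy("Dolls", "Toys")
```

With this in place, `vocab.narrower["Toys"]` contains "Dolls" and `vocab.ancestors("Dolls")` contains "Toys", so a search interface could offer either term when the other is looked up.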

Cross references are also created from unauthorized terms. The unauthorized terms point to the chosen (or preferred) term used to represent the concept. For example, we point from Young people to Young adults in LCSH.
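
A cross-reference can be thought of as a lookup from an unauthorized term to the preferred one. The sketch below assumes two small, hypothetical tables built from the LCSH examples above; treating Young persons as a reference to Young adults is an assumption made for illustration, since the text only states the Young people reference explicitly.

```python
# Hypothetical tables; the entries come from the LCSH examples in the text.
AUTHORIZED = {"Young adults", "Bridges", "Bridges (Dentistry)"}

# "Use" references: unauthorized term -> preferred term.
USE_REFERENCES = {
    "Young people": "Young adults",
    "Young persons": "Young adults",  # assumed reference, for illustration
}


def resolve(term):
    """Return the authorized heading for a term, or None if the term is unknown."""
    if term in AUTHORIZED:
        return term
    return USE_REFERENCES.get(term)


preferred = resolve("Young people")
```

The parenthetical qualifier keeps the two senses of the homonym apart: "Bridges" and "Bridges (Dentistry)" are distinct keys, so each resolves only to itself.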


Some examples of Controlled Vocabulary are: 

1. Simple Term Lists (Pick Lists)
2. Thesauri
3. Subject Heading Lists (e.g. LCSH, SLSH)
4. Authority Files (e.g. LCNAF)
5. Taxonomies
6. Alphanumeric Classification Schemes (e.g., LCC, DDC, UDC)
7. Ontologies
8. Folksonomies

Simple Term Lists (Pick Lists)

Controlled vocabularies appear in a variety of forms. One such form is the “simple term list” (sometimes called a pick list). This refers to a limited set of terms arranged as a simple alphabetical list or a list that is arranged in some other logically evident order.

These lists are not concerned with semantic relationships. They’re used to describe properties that tend to have a limited number of possibilities.

Examples might be geographic areas (maybe a list of countries or states or cities); maybe a list of languages; or perhaps a list of formats (which might include terms such as text or sound or image). They may be presented as pull-down menus in the cataloging system, so that they are available for easy use. 

Two examples of simple term lists are (a) An alphabetical list of states, e.g., Alabama, Alaska, Arizona, California, Delaware... (b) A list of terms based on physical order or spatial contiguity, e.g., Mercury, Venus, Earth, Mars ... 

Simple term lists need not be hierarchical. If the list is short and there is some intuitive way of navigating it, it can be useful without further structure.
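
In a cataloging system, a pick list usually amounts to a check that an entered value is one of the allowed terms. The following is a minimal sketch assuming a hypothetical list of formats (the three terms mentioned above) and a hypothetical function name `validate_format`.

```python
# Hypothetical pick list of formats, using the terms mentioned in the text.
FORMATS = ("image", "sound", "text")


def validate_format(value):
    """Accept a value only if it appears in the pick list (case-insensitive)."""
    normalized = value.strip().lower()
    if normalized not in FORMATS:
        raise ValueError(f"{value!r} is not in the pick list: {', '.join(FORMATS)}")
    return normalized


chosen = validate_format(" Text ")
```

No relationships among the terms are recorded; the list only restricts what may be entered, which is all a simple term list is meant to do.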

Subject Heading Lists

A classic and widely used form of Controlled Vocabulary is the Subject Heading List. It is described below:

A Subject Heading List is the printed or published list of subject headings, which may be produced from the subject authority file maintained by an organization or individual.

A subject heading list contains the preferred subject access terms (the controlled vocabulary) that are assigned as added entries in the bibliographic record. Each heading works as an access point, enabling the work to be searched and retrieved by subject from the library catalog database. The controlled vocabulary identifies synonymous terms and selects one preferred term among them to be used as the subject heading. For homonyms, it explicitly identifies the multiple concepts expressed by the word or phrase. In short, vocabulary control helps overcome problems that arise from the natural language used to express a document’s subject.

Without vocabulary control, different indexers, or the same indexer on different occasions, might use different terms for the same concept when indexing documents on the same subject, and searchers might use yet another set of terms for that subject. The resulting mismatch would impair information retrieval. Cross-references are used with headings to direct the user from terms not used as headings to the term that is used, and from broader and related topics to the one chosen to represent a given subject.

Subject heading lists may have provision for the construction of pre-coordinated indexing strings including headings, plus rules for combining the single terms in strings and one or more levels of subheading. Based on these rules a subject heading may also be subdivided by the addition of form subdivisions, geographical subdivisions, chronological subdivisions, and topical subdivisions to add greater specificity.

Two popular subject heading lists are Library of Congress Subject Headings (LCSH) and Sears List of Subject Headings.

Examples based on Library of Congress Subject Headings (LCSH) following principles of assigning subject headings as described in Subject Headings Manual of Library of Congress:

English literature—20th century—History and criticism.
Construction industry—United States.
India—History—Autonomy and independence movements.
Piano music (Jazz)—France—History.
Aging—Egypt—Psychological aspects.

The following is an example of the LCSH heading “Hotels” from the Library of Congress Linked Data Service:


Hotels

Broader Terms
Hospitality industry

Narrower Terms
All-suite hotels
Allergen-free accommodations
Bed and breakfast accommodations
Gay accommodations
Haunted hotels
Historic hotels
Hotel chains
Hotel lobbies
Imaginary hotels
Nonsmoking accommodations
Park lodging facilities
Safari lodges
Single-room occupancy hotels
Tourist camps, hostels, etc

Related Terms
Taverns (Inns)

Earlier Established Forms
Hotels, taverns, etc

LC Classification

Subject headings, like access points based on author names and titles, serve the dual function of location and collocation. Subject heading lists are used by library catalogers to aid them in their choice of appropriate subject headings and to achieve uniformity. Subject headings and thesauri together form one of the two methods used to facilitate subject access to library materials; the other is library classification. Classification organizes knowledge and library materials into a systematic order according to their subject content, while subject headings provide access to documents through vocabulary terms. Multiple subject headings or thesaurus terms can be assigned to the same document, but in classification each document can be placed in only one class.

In a MARC bibliographic record, a subject heading is given in a 6XX field that designates what a work is or what it is about. The field consists of either a single element in an $a subfield or an $a subfield followed by subdivisions in $v, $x, $y, and/or $z subfields.
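
To make the subfield structure concrete, here is a small Python sketch that turns the subfields of a topical subject field into a display string. It is an illustration under assumptions, not a MARC parser: the list-of-tuples input format and the function name `format_subject_field` are hypothetical, only the subfields named above are handled (real 6XX fields can carry others), and subdivisions are joined with a double hyphen, the form commonly used when typing LCSH strings.

```python
# Subdivision subfield codes named in the text, with their usual meanings.
SUBDIVISION_LABELS = {
    "v": "form",
    "x": "general (topical)",
    "y": "chronological",
    "z": "geographic",
}


def format_subject_field(subfields):
    """Render (code, value) pairs from a 6XX field as one display string.

    `subfields` is a hypothetical in-memory form of the field: a list of
    (subfield code, value) tuples in the order they appear in the record.
    """
    if not subfields or subfields[0][0] != "a":
        raise ValueError("this sketch expects the field to start with $a")
    parts = [subfields[0][1]]
    for code, value in subfields[1:]:
        if code not in SUBDIVISION_LABELS:
            raise ValueError(f"unexpected subfield ${code}")
        parts.append(value)
    return "--".join(parts)


field_650 = [
    ("a", "English literature"),
    ("y", "20th century"),
    ("x", "History and criticism"),
]
display = format_subject_field(field_650)
```

Here `display` holds the first example heading listed earlier, with the chronological subdivision in $y and the topical subdivision in $x. A field with only an $a subfield is returned unchanged.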


Janis L. Young and Daniel N. Joudrey¹ describe natural language approaches, and the difficulties they cause, as follows:

Natural Language Approaches


  • Extracting words from documents
  • Uncontrolled keywords / tags

Where we use natural language most frequently

  • General notes
  • Contents notes
  • Summaries

There are many ways that catalogers could describe the subject matter of documents.

They could pull words from the documents themselves, or catalogers could simply pull words out of their heads. We are aware, however, that these approaches lead to inconsistency.

People can use very different words to express the same idea. For example, within the United States alone, sweet carbonated beverages such as Coca-Cola or Pepsi are referred to, depending on the region, as a soda, a pop, a soft drink, a soda pop, a tonic, or even a “coke” (used generically, not referring to a specific brand).

And we also know that the same word (for example, bridge) can be used for two or more different concepts. The words chosen by a creator to represent a concept may vary throughout a text (using first one word and then a synonym, then another variation, and so on).

With this kind of inconsistency, it’s easy to see why extraction or keywords (otherwise known as tags) can be problematic. 

In recent years, user tagging has become popular throughout social media, but it has also been used for some specific library projects. For example, at the Library of Congress, the Prints and Photographs Division has invited the public to supply tags for digital historical images that it has uploaded to Flickr. For a project of this nature, tagging has been helpful in identifying unknown places, objects, and people.

But in other contexts, tagging or general keyword assignment can be chaotic and unhelpful.

If this were the only approach to describing resources, it would violate one of the goals of organizing information: to collocate (or bring together) like resources.

We do, however, use natural language in creating some library metadata.

We use it for all sorts of notes, particularly in contents notes where we transcribe the table of contents into our bibliographic description. We record the author’s words exactly in these cases; for example, we would never replace the phrase “soda pop” in a chapter title with the preferred term “soft drinks”!

We also see natural language used in summaries. If a summary is found in a publisher’s blurb on the back of a book, we will record it as it is written.

In some cases, the natural language terms found in summaries, abstracts, tables of contents, and other parts of the record can be useful in retrieving the resource if the terms used are different from the terms found in the controlled vocabulary.

In some cases, the searcher may choose a term that is widely used among metadata records and as a result get an overwhelming number of search results filled with many, many false drops. This certainly can impede efficient and effective retrieval.

Semantic Difficulties with Keywords
  • No synonym control
  • No homograph/homonym control
  • Function as different parts of speech
  • No relationships among terms
  • Little or no context
  • Puts the burden on the searcher

Lots of semantic difficulties can arise when searching by keyword only. For example:
  • We love synonyms, which are different ways of expressing the same meaning; this affects the way resources are written, the way resources are described for organizing purposes, and also the way we search.
  • Virtually every word in the English language has more than one meaning or sense, and many of those senses even have more than one nuance.
  • Many words can also be used as various parts of speech, such as nouns, verbs, adjectives, and adverbs. Most search systems cannot yet distinguish among different meanings or various parts of speech.
  • There also aren’t relationships among keywords. There is a relationship between the terms toys and dolls, but the individual keywords are not connected.
  • When keywords are used to describe resources, often those keywords lack any context. So we are left asking, How do those individual keywords relate to each other?
  • And finally, searchers must come up with every possible word that is used to describe the concept if they are interested in getting all the pertinent materials. This can often require in-depth knowledge of the field.

Because of these types of difficulties, libraries, archives, museums, and other information institutions tend to favor the use of controlled vocabularies.
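
The difference between keyword and controlled-vocabulary searching can be shown with a toy catalog. Everything in this sketch is made up for illustration: the four records, the function names, and the choice of "Soft drinks" and "Popular music" as the preferred headings of a hypothetical controlled vocabulary.

```python
# Made-up sample records; titles use natural language, while subjects use
# one preferred term from a hypothetical controlled vocabulary.
RECORDS = [
    {"title": "A history of soda pop", "subjects": ["Soft drinks"]},
    {"title": "The pop bottling trade", "subjects": ["Soft drinks"]},
    {"title": "Tonic and its makers", "subjects": ["Soft drinks"]},
    {"title": "Pop music since 1960", "subjects": ["Popular music"]},
]


def keyword_search(word):
    """Match records whose title contains the word (no synonym or homograph control)."""
    word = word.lower()
    return [r["title"] for r in RECORDS if word in r["title"].lower().split()]


def subject_search(heading):
    """Match records that were assigned the given preferred heading."""
    return [r["title"] for r in RECORDS if heading in r["subjects"]]


by_keyword = keyword_search("pop")
by_subject = subject_search("Soft drinks")
```

The keyword search for "pop" misses the title that says "tonic" (no synonym control) and picks up the title about pop music (a false drop caused by a homograph). The subject search returns exactly the three beverage titles, because the cataloger has already brought them together under one heading.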





1. Janis L. Young and Daniel N. Joudrey, Library of Congress, "Library of Congress Subject Headings: Online Training" (accessed March 17, 2020).

