BIBFRAME (Bibliographic Framework)

BIBFRAME (Bibliographic Framework) is a data model for bibliographic description. BIBFRAME was designed to replace the MARC standards, and to use linked data principles to make bibliographic data more useful both within and outside the library community.¹

Initiated by the Library of Congress, BIBFRAME provides a foundation for the future of bibliographic description, both on the web and in the broader networked world, that is grounded in Linked Data techniques. A major focus of the initiative is to determine a transition path for the MARC 21 formats while preserving the robust data exchange that has supported resource sharing and cataloging cost savings in recent decades³.

Overview of the BIBFRAME 2.0 Model

BIBFRAME (Bibliographic Framework) is an initiative to evolve bibliographic description standards to a linked data model, in order to make bibliographic information more useful both within and outside the library community³.

When a resource is cataloged -- a book, for example -- the resulting description includes information elements such as the author, what the book is about, various published forms, and information about copies of the book.

BIBFRAME 2.0 organizes this information into three core levels of abstraction: Work, Instance, and Item.

Work. The highest level of abstraction, a Work, in the BIBFRAME context, reflects the conceptual essence of the cataloged resource: authors, languages, and subjects.
Instance. A Work may have one or more individual, material embodiments, for example, a particular published form. These are Instances of the Work. An Instance reflects information such as its publisher, place and date of publication, and format.
Item. An item is an actual copy (physical or electronic) of an Instance. It reflects information such as its location (physical or virtual), shelf mark, and barcode.
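The three levels described above can be pictured as nested data structures. The following is a minimal sketch only, not the BIBFRAME vocabulary itself: the class fields, the sample values, and the helper count_items are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """An actual copy (physical or electronic) of an Instance."""
    location: str
    barcode: str


@dataclass
class Instance:
    """A material embodiment of a Work, such as one published form."""
    publisher: str
    publication_date: str
    media_format: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Work:
    """The conceptual essence of the cataloged resource."""
    label: str
    authors: List[str]
    languages: List[str]
    subjects: List[str]
    instances: List[Instance] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count every copy held across all Instances of a Work."""
    return sum(len(instance.items) for instance in work.instances)


# Hypothetical sample data: one Work, two Instances, two Items.
work = Work(
    label="War and Peace",
    authors=["Tolstoy, Leo"],
    languages=["Russian"],
    subjects=["Fiction"],
)
print_edition = Instance("Example Press", "2001", "print")
ebook_edition = Instance("Example Press", "2010", "electronic")
print_edition.items.append(Item(location="Main stacks", barcode="0001"))
print_edition.items.append(Item(location="Branch library", barcode="0002"))
work.instances.extend([print_edition, ebook_edition])

print(count_items(work))
```

Author, language, and subject data is recorded once on the Work, while each Instance and Item carries only the details specific to that level.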

BIBFRAME 2.0 further defines additional key concepts that have relationships to the core classes:

Agents: Agents are people, organizations, jurisdictions, etc., associated with a Work or Instance through roles such as author, editor, artist, photographer, composer, illustrator, etc.
Subjects: A Work might be “about” one or more concepts. Such a concept is said to be a “subject” of the Work. Concepts that may be subjects include topics, places, temporal expressions, events, works, instances, items, agents, etc.
Events: Occurrences, the recording of which may be the content of a Work.

The BIBFRAME vocabulary consists of RDF classes and properties. Classes include the three core classes listed above as well as various additional classes, many of which are subclasses of the core classes. Properties describe characteristics of the resource being described as well as relationships among resources. For example: one Work might be a “translation of” another Work; an Instance may be an “instance of” a particular BIBFRAME Work. Other properties describe attributes of Works and Instances. For example: the BIBFRAME property “subject” expresses an important attribute of a Work (what the Work is about), and the property “extent” (e.g. number of pages) expresses an attribute of an Instance.
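As a rough illustration of how such properties relate resources, the sketch below stores a few statements as plain (subject, property, object) tuples. Several things here are assumptions: the namespace URI is the one we understand the Library of Congress to publish for BIBFRAME 2.0, the property names follow the four examples in the paragraph above, every example.org URI is hypothetical, and the extent value is simplified to a plain string even though the published vocabulary may model it as a resource.

```python
# Assumed namespace for the BIBFRAME 2.0 vocabulary.
BF = "http://id.loc.gov/ontologies/bibframe/"

# Hypothetical resource URIs, for illustration only.
WORK_ORIGINAL = "http://example.org/works/war-and-peace-ru"
WORK_TRANSLATION = "http://example.org/works/war-and-peace-en"
INSTANCE = "http://example.org/instances/war-and-peace-en-2001"
SUBJECT = "http://example.org/subjects/napoleonic-wars"

# Each statement is a (subject, property, object) tuple.
statements = [
    (WORK_TRANSLATION, BF + "translationOf", WORK_ORIGINAL),
    (INSTANCE, BF + "instanceOf", WORK_TRANSLATION),
    (WORK_TRANSLATION, BF + "subject", SUBJECT),
    (INSTANCE, BF + "extent", "1296 pages"),  # simplified to a string
]


def objects_of(subject, property_name):
    """Return every object linked from subject by the named property."""
    return [o for s, p, o in statements
            if s == subject and p == BF + property_name]


print(objects_of(INSTANCE, "instanceOf"))
print(objects_of(WORK_TRANSLATION, "translationOf"))
```

The first two statements express relationships between resources; the last two express attributes of a single Work or Instance.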

Contents

History
Design
BIBFRAME (Bibliographic Framework) Frequently Asked Questions (FAQ)
BIBFRAME: Why? What? Who?
BIBFRAME versus Integrated Library System (ILS)
BIBFRAME Editor
BIBFRAME and Linked Data
BIBFRAME Linked Data: A Conceptual Study on the Prevailing Content Standards and Data Model
BIBFRAME Tools & Resources
BIBFRAME Semantic Web and Linked Data Quiz
BIBFRAME Videos

HISTORY

The MARC Standards, which BIBFRAME seeks to replace, were developed by Henriette Avram at the US Library of Congress during the 1960s. By 1971, MARC formats had become the national standard for dissemination of bibliographic data in the United States, and the international standard by 1973. In a provocatively titled 2002 article, library technologist Roy Tennant argued that "MARC Must Die", noting that the standard was old; used only within the library community; and designed to be a display, rather than a storage or retrieval format. A 2008 report from the Library of Congress wrote that MARC is "based on forty-year-old techniques for data management and is out of step with programming styles of today." In 2012, the Library of Congress announced that it had contracted with Zepheira, a data management company, to develop a linked data alternative to MARC. Later that year, the library announced a new model called MARC Resources (MARCR). That November, the library released a more complete draft of the model, renamed BIBFRAME. The Library of Congress released version 2.0 of BIBFRAME in 2016².

DESIGN

BIBFRAME is expressed in RDF and based on three categories of abstraction (work, instance, item), with three additional classes (agent, subject, event) that relate to the core categories. While the work entity in BIBFRAME may be "considered as the union of the disjoint work and expression entities" in IFLA's Functional Requirements for Bibliographic Records (FRBR) entity-relationship model, BIBFRAME's instance entity is analogous to the FRBR manifestation entity. This represents an apparent break with FRBR and the FRBR-based Resource Description and Access (RDA) cataloging code. However, the original BIBFRAME model argues that the new model "can reflect the FRBR relationships in terms of a graph rather than as hierarchical relationships, after applying a reductionist technique." Since both FRBR and BIBFRAME have been expressed in RDF, interoperability between the two models is technically possible².

Specific formats

The BIBFRAME model includes a serial entity for journals, magazines, and other periodicals. Several issues have prevented the model from being used for serials cataloging. BIBFRAME lacks several serials-related data fields available in MARC.

A 2014 report was optimistic about BIBFRAME's suitability for describing audio and video resources, but also expressed concern about the high-level Work entity, which is unsuitable for modeling certain audio resources².

BIBFRAME (BIBLIOGRAPHIC FRAMEWORK) FREQUENTLY ASKED QUESTIONS (FAQ)

Below is a transcription of the Frequently Asked Questions about BIBFRAME as provided by the Library of Congress³.

1. What is the Bibliographic Framework Initiative?

BIBFRAME Initiative is the foundation for the future of bibliographic description that happens on the web and in the networked world. It is designed to integrate with and engage in the wider information community and still serve the very specific needs of libraries. The BIBFRAME Initiative will bring new ways to:

Differentiate clearly between conceptual content and its physical/digital manifestation(s)
Unambiguously identify information entities (e.g., agents)
Leverage and expose relationships between and among entities

In a web-scale world, it is imperative to be able to cite library data in a way that differentiates the conceptual work (a title and author) from the physical details about that work's manifestation (page numbers, whether it has illustrations). It is equally important to produce library data so that it clearly identifies entities involved in the creation of a resource (authors, publishers) and the concepts (subjects) associated with a resource.

Although the BIBFRAME Initiative will instantiate a new way to represent and exchange bibliographic data – that is now provided by the Machine Readable Cataloging (MARC) format – its scope is broader. As an initiative, it is investigating all aspects of bibliographic description, data creation, and data exchange. In addition to replacing the MARC format, this includes accommodating different content models and cataloging rules, exploring new methods of data entry, and evaluating current exchange protocols.

2. What is the BIBFRAME Model and BIBFRAME Vocabulary?

The BIBFRAME Model is a conceptual/practical model that balances the needs of those recording detailed bibliographic description, the needs of those describing other cultural materials, and those who do not require such a detailed level of description. There are three high-level classes, or entities, in the BIBFRAME Model:

BIBFRAME Work
BIBFRAME Instance
BIBFRAME Item

A BIBFRAME Work identifies the conceptual essence of something; a BIBFRAME Instance reflects the material embodiment of a Work. A BIBFRAME Item is an actual copy (physical or electronic) of an Instance.

The BIBFRAME Vocabulary is the key to the description of resources. Just as the MARC format has a defined set of elements and attributes, the BIBFRAME Vocabulary has a defined set of classes and properties. A class identifies a type of BIBFRAME resource (much like a MARC field might bundle a single concept); properties serve as a means to further describe a BIBFRAME resource (much like MARC subfields more specifically identify aspects of the concept).

3. What is BIBFRAME 2.0?

When the BIBFRAME project was initiated in 2012, it was called simply BIBFRAME, and work on the supporting model, vocabulary, tools, analysis, and experimentation took place under that name. Based on that experience, expert advice, community comment, and a Library of Congress pilot, it was concluded that the vocabulary required redevelopment. The BIBFRAME 2.0 vocabulary replaces the original one (now called the BIBFRAME 1.0 vocabulary). Tools and other supporting components will follow.

4. What are the general differences between MARC and BIBFRAME?

As a bibliographic description format, the MARC format focuses on catalog records that are independently understandable. MARC aggregates information about the conceptual work and its physical carrier and uses strings for identifiers such as personal names, corporate name, subjects, etc. that have value outside the record itself.

Instead of bundling everything neatly as a “record” and potentially duplicating information across multiple records, the BIBFRAME Model relies heavily on relationships between resources (Work-to-Work relationships; Work-to-Instance relationships; Work-to-Agent relationships). It manages this by using controlled identifiers for things (people, places, languages, etc). MARC employs some of these ideas already (geographic codes, language codes) but BIBFRAME seeks to make these aspects the norm rather than the exception. In short, the BIBFRAME Model is the library community’s formal entry point for becoming part of a much larger web of data, where the links between things are paramount.

5. Will RDA elements be part of the BIBFRAME vocabulary?

Yes. RDA is an important source of elements in the vocabulary for BIBFRAME, even though it generally aims to be independent of any particular set of cataloging rules. We also expect community extensions to emerge which will accommodate additional elements.

6. How do I use bibliographic data using cataloging rules other than AACR2 or RDA in the context of BIBFRAME?

Work is planned to analyze elements in other cataloging rule sets and reconcile or add them to the BIBFRAME vocabulary as appropriate. This along with community extensions will enable broad use of BIBFRAME.

7. Why a single namespace for the BIBFRAME vocabulary?

There are many benefits of vocabulary reuse, but as with many things, there are costs as well that need to be carefully considered. Designing systems that leverage multiple vocabularies managed by various stakeholders is a tricky issue and one that requires careful consideration. There are many reasons why namespaces/vocabularies "drift" over time (“not found” errors being a worst-case example) and all of these may have an effect on systems. Business acquisitions, economic factors, organizational changes, changing social interests, etc. are just a few reasons for such change. Thinking ahead to infrastructure to support the next 40+ years of libraries, namespace persistence is a key point to consider when dealing with how best to integrate and invest in vocabulary terms outside of one's community.

8. How is the BIBFRAME Vocabulary and Documentation licensed?

Public domain/CC0. BIBFRAME material and components issued by the Library of Congress are in the public domain. If one uses text from any of the documents it is customary to make attribution, of course.

9. When should we move to BIBFRAME?

BIBFRAME is far from an environment that you could move to yet. The model and its components are still in discussion and development -- a work in progress. When it is more mature, vendors and suppliers will need time to adjust services to accommodate it. And then we can expect a mixed environment for some time.

10. Are my MARC records convertible into BIBFRAME?

Yes. For the BIBFRAME 1.0 vocabulary we provided a conversion tool, and for 2.0 we will be providing specifications and downloadable software tools and services. In the future, other community-provided tools and services will help community members to transform and move their data from MARC to BIBFRAME.

11. Can I get MARC records from BIBFRAME resources?

Not at this time. In the future, when descriptions begin to be exchanged in BIBFRAME, there will be utilities that provide transformation to MARC for organizations needing MARC for some or all of their internal systems. Presently, we are focusing on the BIBFRAME model, needed vocabulary, and required exchange mechanisms. Only after those elements have sufficiently stabilized would attention turn to a BIBFRAME-to-MARC transformation.

12. How will users/systems/organizations exchange or transfer BIBFRAME resources?

This is an active area of exploration at this time. The widely used communications protocols will be adapted and new internet-based protocols will be hospitable to bibliographic data.

13. BIBFRAME seems to be concentrating on mapping MARC fields—isn't this a new format instead of repackaging an old one?

The mapping activity is grounded on the premise that the millions of existing MARC records need to be able to be transformed into BIBFRAME resources, but BIBFRAME as a "format" is very different from MARC. This can be seen from the difficulty of the mapping. One factor that brings the data together is the new library cataloging rule set, Resource Description and Access (RDA). MARC has been adapted to carry RDA data, and BIBFRAME is being developed with RDA data as a prominent content type. Both MARC and BIBFRAME accommodate data recorded by other rules but the cataloging rules give them similarity. The repackaging is not of MARC data but of cataloging content data.

14. How can I get involved in BIBFRAME?

Read the documents at the Bibliographic Framework Initiative Web site and join the BIBFRAME listserv. As ideas are aired on the listserv, give feedback, respond to proposals, and make your requirements known. If you have the facilities, write code and experiment.

BIBFRAME: WHY, WHAT, WHO?

An article by Paul Frank, with the assistance of Judith P. Cannan and Kevin Ford⁶. (May 1, 2014)

BIBFRAME is short for Bibliographic Framework. It began as an LC initiative in 2011 to transition from a legacy, MARC-based environment to one that fully integrates with and reaps the benefits of the World Wide Web. BIBFRAME is the foundation for the future of bibliographic description; it will become the primary means of bibliographic data exchange; and it will replace the MARC Format. BIBFRAME’s primary benefit to the community of knowledge seekers is its ability to enhance information exploration through the use of links and World Wide Web technologies, creating a virtual “stack browsing” experience while improving on physical browsing.

By integrating bibliographic data into the linked and networked environment of the World Wide Web, BIBFRAME will enhance information discovery and promote knowledge navigation. It will reduce the costs associated with traditional cataloging because it will lessen the time associated with maintaining authority data.

BIBFRAME defined

BIBFRAME relies on relationships between resources, not on bibliographic description alone. In the BIBFRAME environment, we will not refer to bibliographic “records” in the traditional sense of the word. BIBFRAME relies on controlled identifiers to identify entities, not on controlled strings of data. A controlled identifier is a number or code that uniquely identifies an entity, for example, “n 84125431” identifies “United States. Congress. Senate. Committee on Armed Services. Task Force on Selected Defense Procurement Matters.” The text in quotes is a controlled string of data. Such controlled strings of data are the hallmark of a MARC record.

BIBFRAME: Why is it important?

BIBFRAME opens the world community to the wealth of authoritative bibliographic data, which is essential to the access of knowledge and which has been so carefully curated and managed by libraries for generations. Library bibliographic data is built upon a solid infrastructure of authoritative names and subjects. It is reliable, consistent, and “clean,” thanks to its use of regulated standards. But it is encased in a data format that is not easily understood or easily deployed by non-library professionals. BIBFRAME seeks to lower the access barrier, partly by adopting contemporary data practices but more by fostering an environment that is not just on the World Wide Web but part of the World Wide Web. With BIBFRAME, the library community has an opportunity to make its controlled and well-crafted bibliographic data accessible to a global audience. Wider accessibility of a library’s bibliographic data makes the library’s resources and holdings known and available to “outsiders.” If one of those outsiders, for example, is Google, then exposing library bibliographic data in this way can translate into more relevant search results for users, and more patrons visiting library collections. BIBFRAME is very much a modernization effort.

And, as with most modernization efforts, this effort requires a new way of thinking and doing things. By integrating our bibliographic data into the World Wide Web, sharing it in non-traditional, non-MARC ways, and embracing the use of links to connect resources, libraries will create an atmosphere of knowledge exploration that cannot be achieved using the MARC Format to expose our bibliographic data.

The MARC Format is the machine-readable standard currently used by the library community. Created in the mid-1960s, it has been in continuous use for nearly fifty years and touches every aspect of library technology ranging from traditional nonprofit library catalogs to commercial library vendors. The library community uses the standard to record and share bibliographic data, and the MARC record is the “package” that contains and communicates the data. A MARC record is an aggregation of information about a described resource and its physical carrier. See Appendix A for an illustration of a typical MARC record.

The fact that the MARC Format was created for a defined set of users—the library community—cannot be ignored. Although the MARC Format has served the library community admirably for nearly half a century, technological advancements in the way all data can be created and shared have eclipsed the once revolutionary ability of this format to share its bibliographic data and have left the library community isolated. Information retrieval systems such as Google cannot harvest bibliographic data encoded in MARC and make it accessible in multiple ways because of the limitations of the format. When an information retrieval system cannot interpret MARC bibliographic data, it might present it in its raw form, not coupled with anything to increase its value to the patron, or simply fail to interpret the data. BIBFRAME will present bibliographic data in such a way that information retrieval systems can make semantic sense of it, so that the bibliographic data can be presented to a patron in an enhanced and linked manner, whether the information retrieval system is owned by Google or owned by the Library of Congress.

BIBFRAME: What will the final results be?

Rather than collocating data into a record, the BIBFRAME data will be decentralized with links to data replacing the MARC strings of data. The same resource described in the static, two-dimensional MARC record in Appendix A becomes a springboard for knowledge exploration when visualized through the BIBFRAME model. Appendix B shows a visualization of BIBFRAME’s powerful use of links to illuminate relationships among resources. The BIBFRAME visualization shares some of the attractions of browsing open stacks in a library—where the patron is in search of a particular resource, but in the process, is led to other related and relevant resources on the shelf. BIBFRAME will promote a more systematic and efficient retrieval and exploration of library resources than available when physically browsing the stacks or consulting a MARC-based catalog.

BIBFRAME: Who benefits?

First and foremost, the library community benefits from BIBFRAME as this system presents a new data model for libraries. It provides a contemporary way for libraries to realize cost savings in creating bibliographic data to share and exchange among their peers. BIBFRAME relies on links to avoid the duplicative efforts of manually creating multiple individual records for the same resource. By creating a single resource description in BIBFRAME for a work, and then linking that description to all versions of the work, and to other related resources, such as translations, movies, dramatizations, music, etc., libraries will be able to describe more resources, more quickly and with increased efficiency. See Appendix B for an example. By relying on links to identify relationships, BIBFRAME will endow bibliographic data with entirely new dimensions that will be a benefit to both libraries and information seekers. Libraries will be better able to reveal the depth and breadth of their holdings, illuminating resources and connections that are vital to the community of knowledge seekers. BIBFRAME can help a local public library publish its holdings for a particular resource in such a way that if a reader searches for that resource in a Google search, the search engine could highlight the library’s holdings. See Appendix C for an example.

The library community will also benefit from BIBFRAME’s ability to make bibliographic work relevant in the twenty-first century, with data communication possibilities beyond the scope of the MARC Format. BIBFRAME employs resources beyond the library community. It enables librarians to embrace a wider range of data-sharing formats and technologies, with a reciprocal increase in choices of methodologies to employ in sharing library data. Authority work has historically been one of the most costly aspects of bibliographic description. BIBFRAME’s use of controlled identifiers over the MARC Format’s reliance on controlled text strings for entity description will lessen considerably the time and costs associated with maintenance of authority data. One controlled string of data may appear in thousands of MARC records. If that controlled string of data changes, a fairly common occurrence, all MARC records containing that controlled string of data need to be changed as well. Maintenance of MARC records can be costly and time-consuming. A controlled identifier does not change, even if the controlled string of data associated with that identifier changes; BIBFRAME thus reduces the time and the costs of bibliographic maintenance. In a similar way, BIBFRAME will integrate into the current model of cooperative bibliographic data and will provide libraries worldwide with the means to increase the visibility of their collections. BIBFRAME’s use of controlled identifiers, which are language-neutral, over MARC controlled text strings, which are language-dependent, facilitates wider international sharing of bibliographic description, with the same beneficial return on investment that results from the use of controlled identifiers over controlled text strings.
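The maintenance argument above can be made concrete with a small sketch. It is an illustration under stated assumptions rather than a description of any real catalog system: the headings, the identifier "n0000001", the record layout, and the function names are all hypothetical.

```python
OLD_HEADING = "Smith, J."
NEW_HEADING = "Smith, Jane, 1950-"

# Controlled-string approach: every record stores the heading text itself.
string_records = [{"title": f"Book {i}", "author": OLD_HEADING}
                  for i in range(1000)]


def update_heading(records, old, new):
    """Rewrite each record carrying the old heading; return how many changed."""
    changed = 0
    for record in records:
        if record["author"] == old:
            record["author"] = new
            changed += 1
    return changed


# Controlled-identifier approach: records store only a stable identifier,
# and the heading text lives in one place (a hypothetical authority table).
authority = {"n0000001": OLD_HEADING}
id_records = [{"title": f"Book {i}", "author_id": "n0000001"}
              for i in range(1000)]


def display_author(record):
    """Resolve the identifier to its current heading at display time."""
    return authority[record["author_id"]]


string_edits = update_heading(string_records, OLD_HEADING, NEW_HEADING)

authority["n0000001"] = NEW_HEADING  # the only change needed
identifier_edits = 1

print(string_edits, identifier_edits)
```

With strings, the number of edits grows with the number of records that carry the heading; with an identifier, the heading changes in one place and the records themselves are untouched.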

There are many unrevealed library resources backed by authoritative and bibliographic data that deserve to be brought to the attention of the community of knowledge seekers. BIBFRAME is not just on the World Wide Web, it is a part of the World Wide Web and through its use of links and World Wide Web technologies, BIBFRAME enables this rich and authoritative bibliographic data embedded in MARC records and in library catalogs to be harvested by Web-based search engines and made more accessible to the community of knowledge seekers. With this wider exposure of bibliographic data, the community of knowledge seekers will not need to connect directly with a particular library; the library’s data will be brought directly to the community. The virtual “stack browsing” experience that results from this wider exposure can result in unimaginable discoveries. When the well-crafted and authoritative data that librarians have been creating for years is joined with the technology of the World Wide Web and existing linking models, the possibilities for data sharing and knowledge dispersion are enhanced and augmented.

Appendix A

A MARC bibliographic record for the resource Tolstoy’s War and Peace, with MARC encoding elements highlighted. One resource searched, one resource identified:

Appendix B

A visualization of BIBFRAME’s powerful use of links to illuminate relationships among resources, using Tolstoy’s War and Peace:

A search for the novel War and Peace will reveal not only all editions of the novel, but also reveal the related translations, films, television programs, musical works, art work, etc., and even related resources that have similar subject content.

Note the Instance links in the visualization. Two editions of War and Peace require two MARC records, each of which must duplicate the title, author, subjects, and other information. With BIBFRAME, one record is created representing War and Peace, thereby recording the title, author, and subjects only once. Two smaller, non-duplicative records would be created, one for each of the two editions, and then linked to the main work War and Peace. Information will have only been entered once. In this way, using BIBFRAME, catalogers would nominally be able to describe more resources not only more efficiently but also more quickly because we are capitalizing on links and not relying on duplicative effort.

Appendix C

BIBFRAME can help a local public library publish its holdings for a particular resource in such a way that if someone searches for that resource in a Google search, the search engine could highlight the library’s holdings.

Here is a search result that you might see today:

BIBFRAME VERSUS INTEGRATED LIBRARY SYSTEM (ILS)

BIBFRAME is not an ILS. BIBFRAME is a tool through which we are utilizing linked data techniques to increase the visibility and usage of library data on the Web but a next-generation ILS must still be created to fully utilize the BIBFRAME tool⁴.

BIBFRAME EDITOR

The BIBFRAME Editor (BFE) is a tool that enables the organization of this information through the input of BIBFRAME vocabulary elements⁴.

BIBFRAME AND LINKED DATA

This section is drawn from the Library of Congress BIBFRAME Manual⁴.

BIBFRAME is a linked data project that seeks to lower barriers to accessing library data, partly by adopting contemporary data practices, but more by fostering an environment that is not just on the World Wide Web but part of the World Wide Web. Library bibliographic data is built upon a solid infrastructure of authoritative names and subjects. It is reliable, consistent, and “clean,” thanks to its use of regulated standards. But it is encased in a data format that is not easily understood or easily deployed by non-library professionals.

With BIBFRAME and linked data, the library community has an opportunity to make its controlled and well-crafted bibliographic data accessible to a global audience. Wider accessibility of a library’s bibliographic data makes the library’s resources and holdings known and available to “outsiders.” If one of those outsiders, for example, is Google, then exposing library bibliographic data in this way can translate into more relevant search results for users, and more patrons utilizing library collections.

What is Linked Data?

The Web supports linked, related documents. It also allows for linking related data and stating the relationship amongst resources. The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. Key technologies that support linked data include the following:

Uniform Resource Identifiers (URIs) - a generic means to identify entities or concepts in the world
Hypertext Transfer Protocol (HTTP) - a simple yet universal mechanism for retrieving resources, or descriptions of resources
Resource Description Framework (RDF) - a generic graph-based data model with which to structure and link data that describes things in the world.

Using Anglo-American Cataloging Rules (AACR2), Resource Description & Access (RDA), and MARC 21 for the creation of authority and bibliographic records in library environments results in “flat” records that live in silos of data and are not integrated with the Web. By transitioning from a static two-dimensional collocated record to decentralized data with links that illuminate relationships, linked data potentially increases the visibility and usage of library data on the Web. Integrating library data with the large number of structured data sources and links on the Web thus potentially enhances the sharing of library data with a wider audience. Moreover, linked data allows for a fuller implementation of RDA.

Linked data is integral to the Semantic Web, a collaborative effort led by the World Wide Web Consortium (W3C) to provide a framework that allows data to be shared and reused across application, enterprise, and community boundaries.⁴

What is a Web of data?

The semantic web of data provides a structure that allows machines to return information about the relationships between resources; it makes use of the existing HTTP protocol and common linked data standards such as RDF to provide the semantic structure. The traditional web of documents is characterized by a flat web of links between documents and files posted on the web.

Web of Documents	Web of Data
information resources	“real-world objects”
links between documents	links between things
unstructured data	structured data
implicit semantics	explicit semantics
for human consumption	for humans and machines

A Web of Data uses a set of best practices for publishing and linking structured data on the Web with technologies that are more generic, more flexible, and which make it easier for data consumers to discover and integrate data from a large number of sources and links.

Resource Description Framework (RDF)

RDF is the standard model for exchange of data on the Web. RDF structures relationships between resources, people, and things on the web, and uses a graph model to represent the relationships. RDF and related standards are maintained by the World Wide Web Consortium (W3C).
The RDF data model consists of:

Triple statements (informally called “triples”)
URIs and IRIs
Ontologies and vocabularies

Triple Statements

RDF uses triples to make systematized statements about semantic data. The subject, predicate, and object are the basis of the triple statement, and can be modeled using graph data. Graph data is used for the semantic web, and represents the relationships between resources, books, people, etc. in a way that computers can process the information.

This is a graph data model of the triple statement "This work was written by this author."

Subjects, predicates, and objects can all be identified by URIs and Internationalized Resource Identifiers (IRIs). In RDF, dereferencing a URI or IRI returns content to be read by humans and machines via content negotiation, the use of redirects, or the minting of hash (fragment) identifiers. Humans can get a Hypertext Markup Language (HTML) page to read, and machines can retrieve an RDF Extensible Markup Language (XML) file upon which they can interpret and act.
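Content negotiation can be sketched as a server choosing a representation from the client's HTTP Accept header. The code below is a simplified illustration, not a complete implementation of the HTTP rules: it ignores partial wildcards such as text/* and the precedence of more specific media ranges, and the function names and the list of available formats are assumptions made for the example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    preferences = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        preferences.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(preferences, key=lambda pref: pref[1], reverse=True)


def negotiate(header, available):
    """Pick the first acceptable media type the server can provide."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]
        if media_type in available:
            return media_type
    return None


# Representations a hypothetical server offers for one resource.
AVAILABLE = ["text/html", "application/rdf+xml"]

print(negotiate("text/html,application/xhtml+xml;q=0.9", AVAILABLE))
print(negotiate("application/rdf+xml", AVAILABLE))
```

A browser that asks for HTML gets the human-readable page, while a client that asks for application/rdf+xml gets the machine-readable description of the same resource.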

Uniform Resource Identifiers (URIs) and Internationalized Resource Identifiers (IRIs)

On the traditional Web, URIs are used primarily for Web documents -- to link to them, and to access them in a browser. The notion of resource identity was not so important on the traditional Web; a URL simply identified whatever we see when we type it into a browser. On the Semantic Web, URIs identify not just Web documents, but also real-world objects like people and cars, and even abstract ideas and non-existing things like a mythical unicorn.

The IRI was defined by the Internet Engineering Task Force (IETF) in 2005 as a new internet standard to extend upon the existing URI scheme. While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese or Japanese kanji, Korean, Cyrillic characters, and so forth. IRIs are defined by RFC 3987.
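The relationship between the two identifier types can be illustrated by converting an IRI to a URI: non-ASCII characters are encoded as UTF-8 bytes and percent-escaped. This is a simplified sketch that assumes the host name is already ASCII (internationalized host names need separate handling); the IRI and the helper name `iri_to_uri` are hypothetical.

```python
from urllib.parse import quote, unquote


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes.

    Assumes an ASCII host name; reserved URI characters and existing
    percent-escapes are left untouched.
    """
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%~")


iri = "http://example.org/works/Война_и_мир"  # hypothetical IRI with Cyrillic
uri = iri_to_uri(iri)

print(uri)
print(unquote(uri) == iri)  # the conversion is reversible
```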

Triple Statements and URIs/IRIs

The subject of a triple is the URI identifying the described resource. The object can either be a simple literal value, like a string, number, or date; or the URI of another resource that is somehow related to the subject. The predicate, in the middle, indicates what kind of relation exists between subject and object, e.g., this is the name or date of birth (in the case of a literal), or the employer or someone the person knows (in the case of another resource). The predicate is also identified by a URI. These predicate URIs come from vocabularies, collections of URIs that can be used to represent information about a certain domain.

IRIs and literals together provide the basic material for writing down RDF statements. In addition, it is sometimes handy to be able to talk about resources without bothering to use a global identifier. Such a resource, one without a URI, is called a blank node.
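The three kinds of terms (URI, literal, and blank node) can be seen side by side when triples are written out in the line-based N-Triples syntax. The following is a minimal sketch rather than a complete serializer: it handles only plain string literals, and every URI and label in it is a hypothetical placeholder.

```python
def term(value: str, kind: str) -> str:
    """Format one RDF term in N-Triples syntax."""
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    if kind == "literal":
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise ValueError(f"unknown term kind: {kind}")


def ntriple(s, p, o) -> str:
    """Format one triple; predicates must be URIs, subjects cannot be literals."""
    if p[1] != "uri" or s[1] == "literal":
        raise ValueError("invalid subject or predicate")
    return f"{term(*s)} {term(*p)} {term(*o)} ."


PERSON = ("http://example.org/people/alice", "uri")
NAME = ("http://example.org/vocab/name", "uri")
KNOWS = ("http://example.org/vocab/knows", "uri")
SOMEONE = ("b0", "bnode")  # a resource with no global identifier

lines = [
    ntriple(PERSON, NAME, ("Alice", "literal")),   # object is a literal
    ntriple(PERSON, KNOWS, SOMEONE),               # object is a blank node
    ntriple(SOMEONE, NAME, ("Bob", "literal")),    # blank node as subject
]
print("\n".join(lines))
```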

There are multiple ways of creating a URI. The Library of Congress typically works through ID.LOC.GOV, the Library of Congress Linked Data Service, where a base is defined for any given dataset. ID.LOC.GOV will be explored in further detail in Unit 4 of this manual.

Vocabularies and Ontologies

Vocabularies and ontologies allow us to add meaning and relationship information in triple statements, and are in standard formats so that computers can process the meaningful relationships and serve meaningful search results to humans. Vocabularies and ontologies are the basic building blocks for inference techniques on the Semantic Web. Ontologies are a means of organizing and conceptualizing a domain of interest, and tend to be used for more complex collections of terms. Vocabularies are used when such complexity is not necessary. Different institutions develop unique vocabularies and your BIBFRAME use will comply with local norms and guidelines.
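Because a vocabulary is, in practice, a collection of URIs that share a common base, terms are often written in a short prefixed form and expanded to full URIs when needed. The sketch below shows that expansion. The `bf` entry uses the namespace published for the BIBFRAME 2.0 vocabulary and `rdf` is the W3C RDF namespace; the helper name `expand` is hypothetical.

```python
PREFIXES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed: str) -> str:
    """Expand a prefixed name such as 'bf:Work' into a full URI."""
    prefix, _, local = prefixed.partition(":")
    if prefix not in PREFIXES or not local:
        raise KeyError(f"cannot expand: {prefixed}")
    return PREFIXES[prefix] + local


print(expand("bf:Work"))
print(expand("rdf:type"))
```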

BIBFRAME LINKED DATA: A CONCEPTUAL STUDY ON THE PREVAILING CONTENT STANDARDS AND DATA MODEL

By Jung-Ran Park, Andrew Brenza and Lori Richards⁷

Abstract

The BIBFRAME model is designed with a high degree of flexibility in that it can accommodate any number of existing models as well as models yet to be developed within the Web environment. The model’s flexibility is intended to foster extensibility. This study discusses the relationship of BIBFRAME to the prevailing content standards and models employed by cultural heritage institutions across museums, archives, libraries, historical societies, and community centers or those in the process of being adopted by cultural heritage institutions. This is to determine the degree to which BIBFRAME, as it is currently understood, can be a viable and extensible framework for bibliographic description and exchange in the Web environment. We highlight the areas of compatibility as well as areas of incompatibility. BIBFRAME holds the promise of freeing library data from the silos of online catalogs permitting library data to interact with data both within and outside the library community. We discuss some of the challenges that need to be addressed in order to optimize the potential capabilities that the BIBFRAME model holds.

1. Introduction

Over the last several decades, the library community has been faced with the challenge of remaining relevant as an authoritative source of bibliographic data within the larger networked environment of the Web. This relevance has particularly been tested by what a number of information professionals see as the library community’s reliance on resource description standards such as Machine Readable Cataloging (MARC), which neither fully support the establishment of relationships between resources across the Web at large nor optimize library data for machine readability. As a result, the vast majority of bibliographic data held in libraries has been locked in library catalogs, which, although automated, essentially function as electronic equivalents of the physical card catalogs of a hundred years ago [1].

However, due to the rapidly changing technology environment, there is now the opportunity for the library community to expose the data created by cataloging and metadata professionals and to establish interconnections to related resources across the Web [2]. Newer technologies, such as those developed by the World Wide Web Consortium’s (W3C) linked open data (LOD) initiative under the banner of the Semantic Web, offer libraries the potential to permit library data to be read and indexed by major online search engines, enhancing user access to the authoritative bibliographic data that it has been the library community’s historic role to create. As the World Wide Web Consortium defines it, the Semantic Web “is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation” [3]. In other words, the Semantic Web is a method whereby those who are creating content on the Web can mark up this content with specific types of metadata in such a way that machines, meaning Web browsers and other applications, can better understand it and use it in novel ways.

Already a number of prominent libraries have developed projects that have published library data that are in compliance with Semantic Web principles, including the Swedish National Library, the French National Library (BnF), the British Library, the Spanish National Library, the German National Library as well as the OCLC [2]. Additionally, implementation of Semantic Web technologies like W3C’s Resource Description Framework (RDF) within the library community holds the potential for enriching user experience by permitting users to explore the diverse interconnections between resources through optimizing the machine readability of library data. Lastly, by altering the cataloging process to conform to LOD standards, libraries are afforded the opportunity to reduce cataloging costs through a reduction in duplicate cataloging efforts and to better leverage existing bibliographic data produced elsewhere.

In response to these challenges and opportunities, the Library of Congress (LOC) has developed a high-level model of bibliographic description called the Bibliographic Framework Initiative or BIBFRAME, which aims not only to replace MARC but to provide a framework for optimizing library data within the networked environment. BIBFRAME is essentially an entity-relationship model which uses the Web as architecture and a Resource Description Framework/Extensible Markup Language (RDF/XML) serialization for the description of bibliographic resources. It involves a radical reconceptualization of bibliographic description, eliminating the static, bibliographic record as the product of cataloging in favor of a series of machine readable statements that result in a graph of interconnected entities.

The purpose of this paper will be to examine the development of BIBFRAME through a comprehensive review of relevant literature. We will begin with an overview of BIBFRAME by LOC, outlining the history and structure of the model [in Section 2]. We will then examine the relationship of BIBFRAME to other relevant bibliographic models and content standards including MARC [in Section 3.1], Functional Requirements for Bibliographic Records (FRBR) [in Section 3.2], Resource Description and Access (RDA) [in Section 3.3], and Semantic Web [in Section 3.4]. We will highlight areas of compatibility as well as areas of incompatibility when known. Then, we will end the paper with some concluding remarks.

2. History and overview of BIBFRAME

Officially established in 2011 by the Library of Congress, the Bibliographic Framework Initiative, or BIBFRAME, is a high-level model designed to facilitate the bibliographic description of information resources as well as the exchange of bibliographic data in the networked environment. In 2012 the Library of Congress contracted Zepheira, a consulting firm that specializes in the deployment of semantic web technologies, to assist with the development of the model. In addition to its work with the Library of Congress, Zepheira has also played, in partnership with Google, Yahoo, and Bing, a key role in the development of Schema.org, a common set of web developer metadata schemas designed to describe websites in support of the indexing efforts of the Internet’s major search engines. Over its brief history, BIBFRAME has produced and published a vocabulary for the model, a number of discussion papers related to the vocabulary or other aspects of BIBFRAME implementation, and tools for data conversion.

In its essence, BIBFRAME is an entity-relation model similar to the model put forth in the Functional Requirements for Bibliographic Records (FRBR). As such, it consists of entities and attributes designed for the description of resources typically managed by cultural heritage institutions. As a result of this entity-relation model, BIBFRAME emphasizes its focus on capturing data elements relevant to bibliographic description, such as title, author, publisher, etc., instead of the creation of complete bibliographic records, which has historically been the focus of the library community. In this way, BIBFRAME establishes a framework for bibliographic description that clearly separates information related to the intellectual contents of resources from their physical properties.

Within this entity-relation model, BIBFRAME is further modeled within RDF/XML in order to bring the model in line with Semantic Web principles. The use of RDF/XML allows users of the model to identify entities and to describe the relationships between them more clearly and completely. Moreover, it permits these relationships to be processed more easily by machines, making library data more conducive to the Web environment. In other words, it allows library data to be found more easily by Internet search engines and, by extension, users. At the heart of this development is the use of Uniform Resource Identifiers, or URIs, to name entities and data values, instead of text strings. Thus, the entire BIBFRAME vocabulary of entities and properties has been rendered in URI form.

In summary, BIBFRAME utilizes Web architecture for the description, maintenance, and exchange of bibliographic data in order to accomplish three primary goals [4]:

Differentiate clearly between conceptual content and its physical manifestation(s) (e.g., works and instances).

Focus on unambiguously identifying information entities.

Leverage and expose relationships between and among entities.

2.1 The BIBFRAME model

The newest BIBFRAME model, version 2.0, consists of three core class entities [5, 6]. These are defined below:

Work: “a resource reflecting a conceptual essence of the cataloged resource” [5]

Instance: “a material embodiment of a work” [5]

Item: “an actual copy (physical or electronic) of an instance” [5].
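The three core classes and the way they nest can be sketched with plain Python data classes. This is only an illustration of the Work, Instance, and Item hierarchy described above: the field names are not BIBFRAME property names, and the barcodes and locations are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """An actual copy of an Instance."""
    barcode: str
    location: str


@dataclass
class Instance:
    """A material embodiment of a Work."""
    publisher: str
    date: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Work:
    """The conceptual essence of the cataloged resource."""
    title: str
    creator: str
    instances: list[Instance] = field(default_factory=list)


def count_copies(work: Work) -> int:
    """Count every Item held across all Instances of a Work."""
    return sum(len(instance.items) for instance in work.instances)


edition = Instance(
    publisher="Signet Classics",
    date="1998",
    items=[Item("placeholder-001", "Main stacks"),
           Item("placeholder-002", "Reserve desk")],
)
work = Work("Romeo and Juliet", "Shakespeare, William", [edition])

print(count_copies(work))
```

Keeping the copy-level data on Item and the publication data on Instance means that a second edition can be added to the same Work without repeating the title or creator.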

As these entities and their definitions make clear, BIBFRAME, like FRBR, separates the intellectual content of a resource (creative work) from its physical realization (instance). However, instead of FRBR’s four entity classes (work, expression, manifestation, and item), BIBFRAME models only three. Thus, although BIBFRAME and FRBR are conceptually related, it appears that BIBFRAME has simplified the number of entity classes required for bibliographic description.

Below (Figure 1) is a graphical depiction of the BIBFRAME model that highlights the relationships between these core entities.

While presenting the evolution of the latest version of BIBFRAME 2.0 from the previous version, McCallum reports the participation of vendors in linked data: “Another major step is now beginning to happen as the vendors who supply many of the services in the community have started to explore linked data, and they are the community’s essential innovators” [7, p. 84].

BIBFRAME offers a significant amount of flexibility with resource description. Beyond the core relationships among Work, Instance, and Item, the BIBFRAME documentation notes that other relationships can also be described. Namely, works can be related to works, instances to instances, works to instances, and instances to works [8]. Beyond the main classes of entities, BIBFRAME also includes a number of properties that are related to each entity. For instance, the creative Work class contains properties that, as one researcher notes, reflect traditional bibliographic elements such as title, creator, language, etc. [9] as well as specific resource Work types that can be used to increase the granularity of a work’s description. These properties include resource-type concepts like audio, text, and movingimage.

The instance class contains properties which serve to describe the physical “embodiment” of resources. These properties include terms that overlap with those of the work class such as title and creator, as well as those that describe the aspects of a resource at the manifestation level, such as publisher [9]. Although there is overlap in terminology between the work and instance class, the modeling of these properties in RDF/XML serves as a means to disambiguate terms with the same name through the assignment of a specific URI. Thus, despite identical text names, the use of URIs serves to identify properties within their specific classes.

To put it plainly, BIBFRAME attempts to be content standard and model agnostic. Its framework is intended to be flexible enough to accommodate existing models (FRBR, MARC, etc.) and content standards (RDA, VRA, DACS) as well as models and standards that have yet to be developed. Thus, BIBFRAME appears to be poised to provide the library community with a new model of bibliographic description and exchange that takes full advantage of the Web as architecture. Furthermore, the model also promises to make library data more visible on the Web, not only to the benefit of users looking for library resources but also for re-use in contexts outside of the library community. Finally, it appears that BIBFRAME will permit the full description of relationships between and among resources, enhancing user experience of library information.

2.2 BIBFRAME profiles

It is worth noting that the high degree of flexibility and extensibility built into the model comes with a cost. The under-specification of the model, which is what lends it flexibility, means that there are no built-in mechanisms within the model or its RDF schemas that guide and constrain the generation of BIBFRAME data [10]. Nevertheless, the initiative proposes the use of BIBFRAME profiles to address this issue. A BIBFRAME profile can be understood as “a document, or set of documents, that puts a Profile (e.g. local cataloging practices) into a broader context of functional requirements, domain models, guidelines on syntax and usage, and possibly data formats” [10]. In other words, a BIBFRAME Profile serves as a kind of template for the generation of BIBFRAME descriptions through the establishment of metadata structure and value constraints. BIBFRAME data can be validated against relevant profiles in order to ensure conformance to an established metadata structure.
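The idea of validating a description against structure and value constraints can be sketched as follows. This is not the BIBFRAME Profile format: the dictionary layout, the field names, the allowed language codes, and the `validate` helper are all hypothetical, chosen only to show how a local profile might constrain otherwise free-form data.

```python
# A hypothetical local profile: which fields are required and which values are allowed.
PROFILE = {
    "title": {"required": True},
    "language": {"required": True, "allowed": {"eng", "fre", "spa"}},
    "note": {"required": False},
}


def validate(description: dict, profile: dict) -> list[str]:
    """Return a list of problems; an empty list means the description conforms."""
    errors = []
    for name, rule in profile.items():
        if name not in description:
            if rule.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and description[name] not in allowed:
            errors.append(f"value not allowed for {name}: {description[name]}")
    for name in description:
        if name not in profile:
            errors.append(f"field not in profile: {name}")
    return errors


print(validate({"title": "Romeo and Juliet", "language": "eng"}, PROFILE))
print(validate({"language": "xx"}, PROFILE))
```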

However, it should be noted that BIBFRAME profiles exist externally to the model and must be developed within the context of local needs and practices, likely within an application used by cataloguers to capture bibliographic data. In other words, a BIBFRAME profile matches the metadata structures needed within a given context. As long as the overall structure of the data conforms to the BIBFRAME model, then that data should remain interoperable on the Web. Thus, it appears that the initiative is attempting to balance the need for a flexible structure within the model itself and the need to contain that flexibility within a viable framework that can produce consistent and reliable data at the local level.

The study in [11] compares locally created Dublin Core metadata scheme-based application profiles from a number of institutions and digital projects (n = 8). The results of the study present the commonalities and variations of locally developed application profiles and shed light on the effects of resource type and subject domain on naming conventions. The experiences and lessons drawn from the implementation processes of locally developed metadata application profiles are invaluable in the sense that they offer insights and efficient mechanisms for metadata planning and reuse. Thus, the study may shed light on the development of BIBFRAME application profiles in local practice settings.

3. Relationship of BIBFRAME to prevailing content standards and models

It is the intention of the BIBFRAME initiative to design the model in such a way that it can serve not only as the standard encoding and interchange format of bibliographic data within the library community but also as a model for integrating library data within the Web environment more generally. As such, the model is designed with a high degree of flexibility in the hope that it can accommodate any number of existing models as well as models yet to be developed. Put simply, the model’s flexibility is intended to foster extensibility. The following sections will discuss the relationship of BIBFRAME to the prevailing content standards and models employed by cultural heritage institutions, or those in the process of being adopted by cultural heritage institutions, in an effort to determine the degree to which BIBFRAME, as it is currently understood, can be a viable and extensible framework for bibliographic description and exchange in the Web environment.

3.1 Machine readable cataloging (MARC)

BIBFRAME is intended to replace MARC as the encoding and exchange format for the bibliographic data produced by the library community. But why? What is it about MARC’s design that requires the format to be replaced?

First of all, the design of MARC can perhaps be best understood as an exchange format which emphasizes the display of bibliographic information about specific library holdings within electronic catalogs. As a result of this emphasis, MARC records can be conceived as aggregates of information that include descriptions of both the conceptual essence of resources as well as aspects of their physicality [4]. These aggregates are realized in the cataloging process through the application of content standards such as AACR2 and now RDA and are captured, for the most part, in a series of tagged literals or tagged text strings. Ultimately, the overarching structure of MARC records and the content rules used to realize them serve as means to display bibliographic data in much the same way as the physical card catalogs which were its predecessor [1]. MARC’s design has served the library community well over the years and has, as the Library of Congress points out in their introductory paper on the BIBFRAME model, allowed librarians to accomplish three important bibliographic tasks [4]:

To capture information about the intellectual essence of resources

To capture information on the physical aspects of resources

To capture information about the management of resources such as control numbers and record handling codes

However, within the current context of the Web environment coupled with the increased processing capabilities of modern computers and applications, MARC’s design presents the library community with a number of structural difficulties that limit the potential uses of bibliographic data. First of all, MARC’s reliance on the use of literals as identifiers for resources and the elements that compose bibliographic records limits the ability of machines to process MARC information [4]. As a result, variations or equivalences of literals are difficult for machines to parse. Secondly, MARC does not separate information regarding the intellectual content of a resource and its physical carrier clearly enough [4]. Even with adjustments to MARC, such as those included in RDA, an FRBR-based content standard that makes a clearer distinction between the content and carrier, the very format of MARC will not allow machines to utilize it fully [12]. Thirdly, the structure of MARC records, although information rich, is poor at expressing relationships between bibliographic elements in ways that machines can easily understand [13]. Again, even with adjustments to MARC, such as MARC/XML, a serialization intended to increase the machine readability of MARC records, the use of content standards like AACR2, which were developed primarily with display issues in mind, significantly hinders the processing of MARC data [14]. Ultimately, this means that library data is unable to interact with the vast majority of computer applications automatically, limiting the exposure of bibliographic data on the Web, preventing the rich relationships between data elements from being realized and effectively hiding bibliographic information from online users.

BIBFRAME is designed to address these issues. To begin, as one researcher notes, BIBFRAME is not only designed to replace MARC as an encoding and exchange format but to offer a complete re-conception of bibliographic description itself, one that is in line with the capabilities of the Web environment [15]. BIBFRAME accomplishes this in a number of ways. First, BIBFRAME replaces the idea of the catalog record with the notion that a resource is defined by a discrete series of bibliographic elements. These elements clearly distinguish between the intellectual content of a resource, its physical carrier, and the various entities responsible for its production. Freed from the record as a bundle of data elements, the individual elements are better able to interact in computer applications, and the cataloguer is better able to describe relationships between elements. Secondly, text strings or literals are replaced by URIs, or Uniform Resource Identifiers. By using URIs to identify bibliographic elements and their values, machines are better able to process the bibliographic information and to utilize the relationships described between them. These two elements, when built upon a Web-based architecture and serialized in RDF/XML, permit BIBFRAME bibliographic data to interact more freely on the Web.
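The difference between identifying an author with a text string and with a URI can be shown in a small sketch. Two descriptions name the same person with different literals and so fail to group together, whereas a shared URI groups them trivially. The name strings, the URI, and the record layout are hypothetical.

```python
# The same author recorded as two different text strings.
by_literal = [
    {"title": "Hamlet", "author": "Shakespeare, William, 1564-1616"},
    {"title": "Macbeth", "author": "William Shakespeare"},
]

# The same author identified by one (hypothetical) URI.
AUTHOR_URI = "http://example.org/agents/shakespeare"
by_uri = [
    {"title": "Hamlet", "author": AUTHOR_URI},
    {"title": "Macbeth", "author": AUTHOR_URI},
]


def group_by_author(records):
    """Group titles under whatever value identifies the author."""
    groups = {}
    for record in records:
        groups.setdefault(record["author"], []).append(record["title"])
    return groups


print(len(group_by_author(by_literal)))  # strings differ, so two groups
print(group_by_author(by_uri))           # one identifier, one group
```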

However, despite these changes and the claim that it is standard agnostic, the BIBFRAME initiative also claims that BIBFRAME will be backwards compatible with MARC, meaning that MARC will be mapped to BIBFRAME in such a way that MARC data can be automatically converted to BIBFRAME data without loss of information. Indeed, the BIBFRAME initiative has already developed tools that are available on its website which can translate MARC data into BIBFRAME 2.0 (Figure 2) [16]. As the relationship between MARC elements and BIBFRAME entities may be complex, may even be many-to-many, as one researcher notes [17], the success of such a mapping remains to be seen.

Figure 2. Screenshot of the BIBFRAME comparison service results page showing MARC data (left) and BIBFRAME RDF/XML data (right) for Terry Flanagan’s Snoopy on wheels.
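The general shape of such a conversion can be sketched as a table that routes each MARC field to a BIBFRAME entity. This is a deliberately simplified illustration and not the Library of Congress conversion tool: real converters work at the subfield level and the mappings can be many-to-many. The tag-to-entity assignments, the property names, and the record values below are assumptions made for the example. Fields with no mapping are collected rather than dropped, which makes potential information loss visible.

```python
# Hypothetical, simplified mapping from MARC tags to (entity, property).
TAG_MAP = {
    "100": ("work", "creator"),          # main entry, personal name
    "245": ("work", "title"),            # title statement
    "260": ("instance", "publication"),  # publication information
    "020": ("instance", "isbn"),         # ISBN
}

# A flattened, MARC-like record with placeholder values.
marc_record = {
    "100": "Shakespeare, William",
    "245": "Romeo and Juliet",
    "260": "Signet Classics, 1998",
    "020": "placeholder-isbn",
    "500": "Placeholder general note",
}


def convert(record: dict) -> dict:
    """Split a flat record into work-level and instance-level data."""
    result = {"work": {}, "instance": {}, "unmapped": {}}
    for tag, value in record.items():
        if tag in TAG_MAP:
            entity, name = TAG_MAP[tag]
            result[entity][name] = value
        else:
            result["unmapped"][tag] = value
    return result


converted = convert(marc_record)
print(converted["work"])
print(converted["instance"])
print(converted["unmapped"])
```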

3.2 Functional requirements for bibliographic records (FRBR)

Published in 1998 by the International Federation of Library Associations (IFLA), the final draft of the Functional Requirements for Bibliographic Records provided a radical re-conception of bibliographic description. In essence, FRBR is an entity-relation model which is composed of four primary classes (work, expression, manifestation, and item) that separate the intellectual content of resources from various aspects of their physical properties, resulting in a new emphasis on the component pieces of bibliographic data rather than the bibliographic record as a whole [15]. As BIBFRAME, with its three primary entity classes (work, instance, and item), is related, at least superficially, to FRBR, and considering the likelihood of FRBR’s international acceptance as the standard model of bibliographic description, it is useful to compare the two models to determine the degree of compatibility and potential interoperability.

At least on the surface, BIBFRAME and FRBR appear to be closely related. Both models employ the entity-relation approach to bibliographic description and divide the bibliographic record into component pieces which are attached as attributes to entities. As noted, FRBR defines four primary entities for bibliographic description. These are as follows:

Work: “a distinct intellectual or artistic creation” [18]. As such, a work is abstract, pertaining to the intellectual content of a resource as separate from its physical existence. For example, Shakespeare’s Romeo and Juliet is a work apart from all of the various editions (print and electronic), performances, and films that have embodied it.

Expression: “the intellectual or artistic realization of a work in the form of alpha-numeric, musical, or choreographic notation, sound, image, object, movement, etc., or any combination of such forms” [18]. For example, the English text of Romeo and Juliet, as separate from the various ways it is presented in different editions, is an expression of the work.

Manifestation: “the physical embodiment of an expression of a work” [18]. For example, the 1998 Signet Classics edition of Romeo and Juliet is a manifestation. In other words, when the expression of a work takes on a physical form, as text, film, sound recording, etc., it becomes a manifestation.

Item: “a single exemplar of a manifestation” [18]. For example, an item is a single copy of the 1998 Signet Classics edition of Romeo and Juliet.

As can be seen, the FRBR main entities represent a hierarchical movement from abstraction to specificity of a particular information resource [17]. In a similar fashion, BIBFRAME is constructed of entities in a hierarchical fashion, but instead of FRBR’s four levels, BIBFRAME defines three [4]:

Work: “a resource reflecting a conceptual essence of the cataloged resource”

Instance: “a material embodiment of a work”

Item: “an actual copy (physical or electronic) of an instance”

Thus, although BIBFRAME only uses three main entity classes, there is still the same movement from abstraction to specificity as represented in the FRBR hierarchy. Nevertheless, the lack of conformance to the FRBR hierarchy has resulted in much discussion, and, perhaps, even some confusion about how BIBFRAME relates to FRBR. For instance, there appears to be some disagreement in the literature regarding the exact relationship between BIBFRAME and FRBR entities, especially with regard to how the BIBFRAME entities may represent conflations of FRBR entities. Although a number of researchers espouse a correspondence between the BIBFRAME work entity and the FRBR entities work and expression [13, 15, 16, 19], at least one researcher sees a correspondence only between BIBFRAME Work and FRBR Work [20]. Similarly, it appears that most researchers see a correspondence between BIBFRAME instance and FRBR manifestation entities [13, 15, 19], while others see a correspondence between BIBFRAME instance and FRBR manifestation and expression [20].
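The two readings reported in the literature can be written down as alternative mappings and compared mechanically. The Work and Instance rows follow the positions summarized above; the Item-to-item row is an assumption added so that each mapping covers all four FRBR entities, and the names `VIEW_A`, `VIEW_B`, and `disputed` are hypothetical.

```python
# Reading 1: BIBFRAME Work conflates FRBR work and expression.
VIEW_A = {
    "Work": {"work", "expression"},
    "Instance": {"manifestation"},
    "Item": {"item"},  # assumed, not discussed in the comparison above
}

# Reading 2: BIBFRAME Instance conflates FRBR expression and manifestation.
VIEW_B = {
    "Work": {"work"},
    "Instance": {"expression", "manifestation"},
    "Item": {"item"},  # assumed, not discussed in the comparison above
}


def invert(view: dict) -> dict:
    """Map each FRBR entity to the BIBFRAME class that absorbs it."""
    return {frbr: bf for bf, frbr_set in view.items() for frbr in frbr_set}


def disputed(a: dict, b: dict) -> list[str]:
    """List the FRBR entities that the two readings place differently."""
    inv_a, inv_b = invert(a), invert(b)
    return sorted(e for e in inv_a if inv_a[e] != inv_b.get(e))


print(disputed(VIEW_A, VIEW_B))
```

Laid out this way, the disagreement reduces to a single FRBR entity, which suggests that the dispute concerns where expression belongs rather than the overall hierarchy.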

Perhaps some of the difficulty of mapping BIBFRAME to FRBR lies in the basic ambiguity of the meaning of the respective concepts. For instance, as is noted by IFLA, the FRBR concept of work is an abstraction, meaning that it is hard to define its “precise boundaries” and that the divisions between works and between works and expressions may in fact be culturally dependent [18]. Furthermore, as other researchers have noted, efforts at operationalizing the concept of work have led to at least two different conceptions of the concept. For instance, some have argued that a work can be conceived as the intellectual content of an endeavor with no “assumptions about how it is physically realized,” while, from a different point of view, a work can be conceived as the sum of all common attributes (author, title, etc.) from a set of manifestations [17]. Perhaps complicating the matter is the fact that neither BIBFRAME’s nor FRBR’s hierarchy constitutes a definable bibliographic whole. For instance, although FRBR’s entities are organized hierarchically, and are often pictured within a box, there is no single concept to which this hierarchy relates [19]. The need for a kind of super-entity has been noted well in the literature [19]. It would seem that these questions regarding FRBR are equally applicable to BIBFRAME since BIBFRAME does not include a super-entity that encapsulates the work and instance entities. Thus, it appears that there may still be some serious conceptual difficulties that need to be overcome if BIBFRAME, as an entity-relation model, is to be a viable framework for bibliographic description.

Nevertheless, because BIBFRAME appears to be a simplified version of FRBR, perhaps some of the conceptual difficulties regarding FRBR will not negatively affect BIBFRAME as much. For instance, perhaps BIBFRAME’s conflation of FRBR’s work and expression concepts is useful since it is sometimes difficult to determine the boundaries between a work and its expression. However, since the BIBFRAME initiative has suggested that its model is agnostic, meaning that it can be applied to any model, it must be able to be mapped clearly to other models if it is to foster interoperability. Yet, as one researcher notes, to make the model completely agnostic may be unrealistic, since to be perfectly interoperable, both models require almost equivalent semantics and granularity, a situation which would suggest the redundancy of one of the models [2]. This does not seem to be the case between FRBR and BIBFRAME, which means that the initiative may need to re-examine the possibilities of BIBFRAME working with other models.

3.3 Resource description and access (RDA)

BIBFRAME is designed to be content standard agnostic, meaning that the model does not include requirements or specifications for the use of any particular content standard for bibliographic description. In fact, per the initiative, BIBFRAME is intentionally underspecified so that any content standard may be applied successfully within the context of the model, including those that have yet to be developed [4]. Thus, this intentional under-specification is designed to maximize the extensibility of the model and to help ensure its usefulness in a wide range of extant and future information management contexts and use scenarios, as well as for the widest variety of current and future resource types [4].

However, since the BIBFRAME initiative has positioned the model to be the replacement for MARC as the primary method of bibliographic description and data exchange between libraries, the initiative is doing more than simply ensuring the openness of the model to accommodate RDA and other content standards. Per the initiative, the designers are planning on taking an active look at the elements in RDA and other content standards, including the Anglo-American Cataloging Rules, Second Edition (AACR2). As a number of researchers have noted, it appears that BIBFRAME is also being designed to specifically accommodate RDA [1, 13, 20], which suggests that this particular content standard may be playing a stronger role in the design of the model than may have been suggested initially. As BIBFRAME is still under development, it remains to be seen exactly to what degree RDA plays a role in the design of the model and what effects this might have on the model’s extensibility.

Nevertheless, BIBFRAME designers suggest that the use of profiles will be another way to accommodate a variety of content standards within the model. A BIBFRAME profile is “a document, or set of documents, that puts a Profile (e.g., local cataloguing practices) into a broader context of functional requirements, domain models, guidelines on syntax and usage, and possibly data formats” [10]. According to the initiative, such profiles can be used to define constraints in the creation of BIBFRAME records such as those required by any content standard, including RDA.

As other researchers have noted, RDA may not have gone far enough in distinguishing the content from the carrier of information resources [1, 14]. This potential fundamental flaw in the content standard may pose further difficulties in mapping RDA to BIBFRAME. Such difficulties are presented in the study [21], which shows the uneven mapping between existing RDA classes and BIBFRAME 2.0, particularly for the RDA Expression class. The study demonstrates many-to-many relationships in the mapping between RDA and BIBFRAME. Nevertheless, as BIBFRAME is in a relatively early stage of development, the nature and magnitude of these difficulties remain to be seen.

3.4 Semantic web

The current Web environment is structured in such a way that machines, and thus users, are unable to take full advantage of the links that are established among and between resources. The Web is composed of Web pages and hypertext links that describe neither the nature of the links that connect pages together nor the nature of the data (content) contained in those pages. As many researchers note, the current Web is a “Web of Documents” rather than a “Web of Data” [22, 23]. As a result, current search mechanisms, such as the major search engines, are limited in their ability to utilize information on the Web, relying almost solely on harvesting algorithms to index the content of Web pages and then to match this indexed information against the search terms entered by users. While, as one researcher notes, this method has served the Web well, permitting users to locate needed resources within the vast sea of online information, it lacks the ability to lead users to related content, even when complex and intelligent relevancy algorithms are employed [14]. Furthermore, within the context of the library community, it means that most library data remains relatively difficult to locate online and relatively static with regard to other online resources relevant to library holdings. Library data, in its current form, remains in the proverbial silo of its online catalogs.

However, through the employment of Semantic Web technologies, there is the potential to expand the uses of library data in the Web environment and thereby to enhance user experience of this data. On the Web today, a typical hyperlink connects resources but leaves the nature of the connection unexplained. Through the use of Semantic Web and Linked Data principles, such as the use of URIs to identify resources and the embedding of URIs in RDF statements, the nature of these connections can be exposed. In this scenario, a hyperlink can be defined in almost any way that the user can imagine, indicating that the link points to a reference, an author, a subject, an authority, etc. Machines can then use this data to “infer” other resources that have been described similarly, such as resources with the same subject heading as the one in question, and permit users to explore these relationships more readily.

At the heart of the Semantic Web are four principles that Tim Berners-Lee, inventor of the World Wide Web and founder and director of the W3C, set forth in his paper entitled “Linked Data” [24]. These principles define the nature of Linked Data as it can be implemented in the current Web environment. Furthermore, they serve as a framework and guide for those interested in making their Web content viable within the Semantic Web, as some conformance to a standard model is required for successful implementation. These principles are as follows:

1. Use URIs as names for things [24].

2. Use Hypertext Transfer Protocol (HTTP) URIs so that people can look up those names [24].

3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL) [24].

4. Include links to other URIs, so that they can discover more things [24].

Perhaps most significantly, the conception of Linked Data requires the use of URIs to identify resources or, more specifically, the data elements of resources (Principle 1). As was mentioned in the discussion on MARC above, the use of text strings to identify resources makes machine processing difficult. The shift to URIs as identifiers means that machines can better establish the identity of resources: a single resource known by different names can be recognized as one, and different resources known by the same name can be told apart. Furthermore, the shift to URIs also signals the shift in understanding with regard to the nature of information resources as described in the above FRBR section. It emphasizes the identification of discrete data elements within information resources rather than the identification of the resource as a whole. In other words, it emphasizes the atomization of resources into their relevant components.
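The difference between identifying by text string and identifying by URI can be illustrated with a minimal Python sketch. The records and the `http://example.org/agent/...` URIs are made-up placeholders for illustration, not real authority identifiers, and `group_titles` is a hypothetical helper.

```python
# Hypothetical catalog entries: one author appears under two name strings,
# and two different authors share one name string.
# All URIs are illustrative placeholders, not real authority identifiers.
records = [
    {"title": "Book A", "author_name": "Twain, Mark", "author_uri": "http://example.org/agent/1"},
    {"title": "Book B", "author_name": "Clemens, Samuel", "author_uri": "http://example.org/agent/1"},
    {"title": "Book C", "author_name": "Smith, John", "author_uri": "http://example.org/agent/2"},
    {"title": "Book D", "author_name": "Smith, John", "author_uri": "http://example.org/agent/3"},
]


def group_titles(records, key):
    """Group titles by the chosen identifier field (name string or URI)."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record["title"])
    return groups


# Grouping by name string splits one author in two and merges two authors.
by_name = group_titles(records, "author_name")
# Grouping by URI keeps each author's titles together and apart from others.
by_uri = group_titles(records, "author_uri")
```

Grouping by name string places Book C and Book D under one heading even though their authors differ, while grouping by URI brings Book A and Book B together despite the different name forms.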

Principle 2 emphasizes the need for a common scheme for the definition of URIs. Since HTTP is already the foundation of data transfer on the Web and since it appears to be serving its function well, Berners-Lee suggests that using this common protocol for the definition of URIs will increase the usefulness of data described in ways that comply with the Semantic Web. Furthermore, as the BIBFRAME initiative notes, these URI schemes should not be obscure, even if they are represented in HTTP, in order to facilitate data interaction and reuse [4].

Principle 3 emphasizes the need for a common framework for the exchange of information described with URIs. Typically this means the use of RDF for the modeling of data, which, as the BIBFRAME initiative notes, is the most common framework within the Linked Open Data (LOD) community [4]. As a conceptual framework for representing resources on the Web [15], RDF can be understood as a kind of syntax for structuring data in such a way that it fosters the machine readability of that data through the use of URIs and the delineation of relationships between data elements. RDF is typically rendered in XML, but other languages, such as N3, Turtle, and N-Triples, are also used [22]. In its basic format RDF consists of statements, called triples, which, like sentences, contain subjects, predicates, and objects. A basic RDF statement might read as “Book A (subject), Written By (predicate), Author A (object),” where Book A, Written By, and Author A are all identified by URIs, with the possible exception of the object, which could be populated with a text string [22]. The power of this model is that the type of relationship between resources (Book A and Author A) is defined (Written By). Figure 3 illustrates this statement graphically. Thus, as a result of delineating relationships between data elements, tools called “reasoners” can make inferences about the data [19].

Figure 3. Graphical depiction of a basic RDF statement.
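The “Book A, Written By, Author A” statement can also be written out as data. The following is a minimal sketch in plain Python rather than an RDF library; the `http://example.org/` namespace, the `writtenBy` and `title` property names, and the `Triple` and `to_ntriples` helpers are all assumptions made for illustration, and the literal escaping is simplified.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the thing being described
    predicate: str                # URI naming the relationship
    obj: str                      # URI, or a plain text string
    obj_is_literal: bool = False  # True when obj is a text string


def to_ntriples(t: Triple) -> str:
    """Render one triple as a single N-Triples line (simplified escaping)."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


EX = "http://example.org/"  # placeholder namespace (assumption)

# Subject, predicate, and object are all URIs.
written_by = Triple(EX + "book/A", EX + "vocab/writtenBy", EX + "author/A")
# The object may instead be a text string, as the paragraph above notes.
title = Triple(EX + "book/A", EX + "vocab/title", "Book A", obj_is_literal=True)

for triple in (written_by, title):
    print(to_ntriples(triple))
```

N-Triples is used here only because it writes one statement per line; the same two statements could equally be rendered in RDF/XML or Turtle.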

A reasoner is a software application that can make logical inferences based on a set of statements, or axioms, provided to it through queries. Although there are many query languages that can be used to access and manipulate data modeled in RDF, the SPARQL Protocol and RDF Query Language (SPARQL) has emerged as the most popular [23]. For instance, a reasoner, beginning with a SPARQL query to a database that contained the above RDF statement, could use that statement to make inferences about other books written by Author A and present those to users without the user specifically querying the system to do so (Figure 4). Furthermore, there are no restrictions on the number of RDF triples that can be created for a particular resource, which fosters the development of rich data graphs, or the decentralized interconnections between data elements, within the Web environment. Although RDF is not a data format but a model for representing data elements on the Web, it has been serialized in a number of ways. BIBFRAME has been modeled in RDF/XML, but other serializations, like N-Triples, ATOM, and JSON, also exist, and the initiative claims that any data format that conforms to the standard model of URIs embedded in triples should be compliant with the BIBFRAME model [4].

Figure 4. Graphical depiction of a reasoner using RDF statements to infer additional resources.
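The kind of inference depicted in Figure 4 can be approximated with a short Python sketch. This is plain pattern matching over an in-memory set of triples, not a real reasoner or a SPARQL engine; the `http://example.org/` URIs, the `writtenBy` property, and the `related_by_author` function are hypothetical names chosen for illustration.

```python
EX = "http://example.org/"  # placeholder namespace (assumption)
WRITTEN_BY = EX + "vocab/writtenBy"

# A tiny in-memory graph of (subject, predicate, object) statements.
triples = {
    (EX + "book/A", WRITTEN_BY, EX + "author/A"),
    (EX + "book/B", WRITTEN_BY, EX + "author/A"),
    (EX + "book/C", WRITTEN_BY, EX + "author/B"),
}


def related_by_author(book, triples):
    """Return other books that share at least one author with `book`."""
    # Step 1: find the authors of the starting book.
    authors = {o for s, p, o in triples if s == book and p == WRITTEN_BY}
    # Step 2: find every other book linked to one of those authors.
    return sorted(
        s for s, p, o in triples
        if p == WRITTEN_BY and o in authors and s != book
    )


suggestions = related_by_author(EX + "book/A", triples)
```

Because the relationship is typed, a user looking at Book A can be offered Book B without having asked for it, which is the behavior the paragraph above attributes to a reasoner.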

Principle 4 encourages broad use of the connections established through the first three principles [4]. Thus, data that has been described in conformance with the above principles can be considered Linked Data and Semantic Web compliant. However, if the URIs expose, point to, or otherwise include information that is made freely available for reuse on the Web, such as through a Creative Commons license, this data can be considered Linked Open Data, not just Linked Data.

As stated earlier, a number of prominent libraries have published library data in compliance with Semantic Web principles [2]. Even though these projects are not BIBFRAME projects, they are generally in line with FRBR principles of bibliographic description. It is worth examining the degree to which the BIBFRAME model itself conforms to the current understanding of Linked Data and the Semantic Web. First, BIBFRAME has defined URIs for all BIBFRAME entities and properties within the BIBFRAME namespace. This is particularly important as some properties that belong to different classes have identical names. The use of URIs serves as a clear means to disambiguate these properties. Secondly, as has been noted, BIBFRAME has been modeled in RDF/XML [25].
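How a namespace turns a short name into a full URI can be sketched as follows. The example shows a related case rather than the one described above: two properties with the same local name, `title`, that live in different vocabularies and so expand to different URIs. The `bf` and `dcterms` prefix labels and the `expand` helper are illustrative choices; the namespace URIs are those published for BIBFRAME 2.0 and Dublin Core Terms.

```python
# Prefix labels are a local convention; only the full URIs carry identity.
PREFIXES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'bf:title' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


bf_title = expand("bf:title")
dc_title = expand("dcterms:title")
```

The two expanded values differ even though both short names end in `title`, so software comparing URIs will not confuse them.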

In addition to these two factors, the BIBFRAME model, like FRBR, deconstructs bibliographic records into their component pieces through the entity-relation conception of bibliographic description. Taken together, these elements suggest that BIBFRAME conforms well to the current understanding of Linked Data and the Semantic Web. Furthermore, even though the initiative has rendered the model in RDF/XML, BIBFRAME is also designed to be compliant with other data formats that conform to the structured use of URIs within the syntax of triple statements. Thus, it also appears that BIBFRAME is, at least in principle, poised to integrate library data with other data produced within contexts outside the library community. This aspect too suggests that BIBFRAME is Semantic Web friendly.

4. Discussion

There are challenges that may hinder the widespread adoption of BIBFRAME within the library community. In addition to the modeling difficulties and potential conceptual misalignment of BIBFRAME in relation to MARC, FRBR, RDA, Linked Data, and RDF, there are difficulties posed by complex resource types such as audiovisual materials, manuscripts, and serial publications [26]. Additionally, although MARC is in essence an exchange format for bibliographic data, it has become so intertwined with the content standards applied to it, first AACR2 and now RDA, that this union may further entrench it within the library community. Without consensus regarding the fate of MARC, it may be difficult to persuade MARC’s adherents to adopt BIBFRAME, even if BIBFRAME proves to offer more capabilities to catalogers.

There may be significant conceptual difficulties with mapping RDA to BIBFRAME. For instance, RDA was developed within the context of the FRBR entity-relationship model. As such, RDA separates resources into FRBR’s four main entity classes: Work, Expression, Manifestation and Item. However, as has already been noted, BIBFRAME’s main entity classes do not align with FRBR’s classes in an exact manner [20]. This lack of alignment may make the mapping between RDA and BIBFRAME difficult.
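The misalignment can be made concrete with a deliberately coarse Python sketch. The mapping below is a simplification assumed for illustration: the text establishes only that BIBFRAME conflates FRBR’s Work and Expression, and the pairing of Manifestation with Instance follows the common reading of Instance as a material embodiment. `FRBR_TO_BIBFRAME` and `reverse_candidates` are hypothetical names.

```python
# Coarse, assumed class-level mapping from FRBR (as used by RDA) to BIBFRAME.
FRBR_TO_BIBFRAME = {
    "Work": "Work",
    "Expression": "Work",        # conflated with Work in BIBFRAME
    "Manifestation": "Instance",
    "Item": "Item",
}


def reverse_candidates(mapping):
    """For each BIBFRAME class, list the FRBR classes that map onto it."""
    reverse = {}
    for frbr_class, bf_class in mapping.items():
        reverse.setdefault(bf_class, []).append(frbr_class)
    return reverse


candidates = reverse_candidates(FRBR_TO_BIBFRAME)
# BIBFRAME classes with more than one FRBR source cannot be mapped back
# unambiguously without extra information.
ambiguous = {bf for bf, sources in candidates.items() if len(sources) > 1}
```

Going from FRBR to BIBFRAME is straightforward under this simplification, but a BIBFRAME Work has two possible FRBR counterparts, which is one way the mapping difficulty described above can arise.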

Although it appears that BIBFRAME conforms to current conceptions of Linked Data and the Semantic Web, there are still a number of issues worth considering. First, since the usefulness of the relationships delineated through the RDF triples depends on the quality and stability of the resources to which they are linked, the BIBFRAME initiative will have to determine the degree to which it will maintain its own controlled vocabularies and ontologies versus relying on others to do so. Ontologies suitable for the Linked Data environment are taxonomies and thesauri that meet the W3C Web Ontology Language (OWL) standard [22]. For example, the Library of Congress Subject Headings modeled in the Simple Knowledge Organization System (SKOS) framework is an OWL-compliant ontology.

The existence of high-quality, stable ontologies is a particularly relevant concern with regard to the use and reuse of Linked Open Data resources. For instance, as one researcher notes, many LOD ontologies and vocabularies are developed in the context of research projects, which means that for a particular moment they may be up-to-date, accurate, and in compliance with current standards, though this does not ensure continued governance and maintenance [12]. Thus, the reliance on such vocabularies could present the threat of obsolescence should governing bodies discontinue their activities. It therefore appears that BIBFRAME will need to assess the stability of ontologies and vocabularies, such as those for resource type, and determine if it is better to develop and maintain its own within the BIBFRAME namespace or to link to resources outside the initiative.

Secondly, although BIBFRAME claims that the model should be interoperable with any serialization using triples and URIs, the fact that the initiative has serialized the model in RDF/XML may be a limitation. In other words, because the initiative has limited its serialization within a single framework, it may discourage implementation in other formats. As one researcher notes, it may be better for the initiative to provide potential implementers with examples from a number of possible serializations in order to demonstrate the model’s flexibility, extensibility, and potential for interoperability [2].

Thirdly, there may be difficulties with viably implementing the BIBFRAME model which are rooted in the nature of RDF itself. As the study in [19] notes in its comparison of BIBFRAME, FRBR, and RDA, there is nothing in RDF that prevents people from making nonsensical RDF triples. In other words, there are no validation mechanisms for the creation of RDF statements, as there are for well-formed XML or HTML documents. While, as the researchers note, BIBFRAME has proposed the use of profiles in order to establish content rules and constraints on the creation of BIBFRAME records, these do not prevent potential difficulties with the integration of BIBFRAME data elements with data elements modeled in other frameworks such as FRBR.
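A rough idea of what a profile-style constraint might look like is sketched below. This is not the BIBFRAME profile format; it is a hypothetical Python check that assumes each resource has a declared type and that a profile lists, for each property, the expected subject and object types. All URIs and names (`TYPES`, `PROFILE`, `violations`) are placeholders.

```python
EX = "http://example.org/"  # placeholder namespace (assumption)
WRITTEN_BY = EX + "vocab/writtenBy"

# Declared type of each resource (hypothetical data).
TYPES = {
    EX + "book/A": "Work",
    EX + "author/A": "Agent",
    EX + "place/X": "Place",
}

# Hypothetical profile: property -> (expected subject type, expected object type).
PROFILE = {
    WRITTEN_BY: ("Work", "Agent"),
}


def violations(triples, types, profile):
    """Return the triples whose subject or object type breaks the profile."""
    problems = []
    for s, p, o in triples:
        if p not in profile:
            continue  # the profile says nothing about this property
        want_subject, want_object = profile[p]
        if types.get(s) != want_subject or types.get(o) != want_object:
            problems.append((s, p, o))
    return problems


sensible = (EX + "book/A", WRITTEN_BY, EX + "author/A")
nonsensical = (EX + "place/X", WRITTEN_BY, EX + "book/A")  # a place "written by" a book

flagged = violations([sensible, nonsensical], TYPES, PROFILE)
```

Both statements are equally valid as bare triples; only the added constraint layer separates the sensible one from the nonsensical one, and a check of this kind covers only data created under the same profile.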

However, perhaps the biggest threat to BIBFRAME as a mechanism to expose library data in a Semantic Web friendly way lies in the fact that, like the framework itself, the Semantic Web is still under development. For instance, as has been noted in the literature, understanding of what actually constitutes Linked Data is still under debate [19]. Since the very underpinning of the Semantic Web is still in flux, there is a possibility that any operationalization of the concept will change in the future. Thus, if the current methods for creating Linked Data alter significantly in the future, and if data described with current methods cannot be easily translated into the newer modes, then BIBFRAME Linked Data could potentially become obsolete, resulting in the relegation of library data to yet another, but different, silo.

This final point may also be exacerbated by the very fact that BIBFRAME is a model for the description of bibliographic data within the library community itself. For instance, as some researchers have noted, for data to be truly integrated in the Web, what is required is a common model for data description that includes not only bibliographic data but data of all types [2]. In other words, BIBFRAME, as a model for the description of bibliographic data, may not be intuitively understood by others outside the library community, which may result in a lack of implementation and difficulties with the integration of data embedded in other frameworks. This is particularly important as BIBFRAME data is intended for use outside of the library community, especially with regard to the authority data such as controlled subject headings that have been the province of the library community for so long [2, 13]. Thus, while BIBFRAME holds the promise of freeing library data from the silos of online catalogs and permitting library data to interact with data both within and outside the library community, there may still be challenges to overcome in order to optimize these capabilities.

5. Conclusion

It is the intention of the BIBFRAME initiative to design the model in such a way that it not only can serve as the standard encoding and interchange format of bibliographic data within the library community but also be a model for integrating library data within the Web environment more generally. As such, the BIBFRAME model is designed with a high degree of flexibility that can accommodate any number of existing models as well as models yet to be developed within the Web environment. The model’s flexibility is intended to foster extensibility.

However, regarding the model itself, there appears to be a significant need to consider the creation of a super-entity that would encapsulate the work and instance entities. With regard to the cataloging requirements for the description of complex resources such as audiovisual materials and serial publications, the creation of such a super-entity would solve a number of bibliographic description challenges. The existence of a super-entity would permit the description of resources and relationships that are currently difficult to model within the existing framework. Resources that do exhibit intellectual content or that are primarily event based would be easier to depict if such a super-entity were present.

BIBFRAME attempts to be content standard and model agnostic. Its framework is intended to be flexible enough to accommodate existing models. While increasing its extensibility, the framework may also result in an uncertainty of its application in specific cataloging contexts. This too may limit the willingness of the library community to invest in its adoption. Furthermore, even though BIBFRAME’s potential for extensibility is intended to foster its adoption in a wide range of bibliographic contexts and to work equally well for divergent descriptive needs, its ability to accommodate most if not all modeling and content standards currently in use or yet to be invented may be optimistic. In this regard, BIBFRAME’s ability to support widespread interoperability needs to be further addressed.

In this study we discussed the relationship of BIBFRAME to the prevailing content standards and models employed by cultural heritage institutions in order to determine the degree to which BIBFRAME can be a viable and extensible framework for bibliographic description and exchange in the Web environment. Despite the promise of improved data management, sharing, and usage offered through the BIBFRAME model, there are various challenges that must be overcome for its adoption within the library community. However, if the initiative can overcome what will likely be significant challenges to the implementation of the model, BIBFRAME appears to be poised to become the next standard of bibliographic description and exchange for the library community and beyond. Furthermore, the model also promises to make library data more visible on the Web, not only to the benefit of users looking for library resources but also for reuse in contexts outside of the library community. Finally, it appears that BIBFRAME will permit the full description of relationships between and among resources, enhancing and enriching the user experience of library information.

BIBFRAME TOOLS & RESOURCES

Here are some great online resources to learn about BIBFRAME:

Bibliographic Framework Initiative – Library of Congress website with official BIBFRAME information, specifications, FAQ, tools, news, and more.
BIBFRAME Training at the Library of Congress
BIBFRAME Webcasts & Presentations - Library of Congress
BIBFRAME Editor – Open source editing software downloadable from Github.
BIBFRAME Listserv – Bibliographic Framework Transition Initiative Forum.
BIBFRAME.ORG – An index site for the BIBFRAME Initiative, Model & Vocabulary, and Implementation and Testing sites.
Bibframe2Schema.org – A community initiative with the following initial objectives: the creation of a reference mapping from BIBFRAME 2.0 to Schema.org, and the development and sharing of reference software implementation(s) to enrich BIBFRAME data with Schema.org terms and to create Schema.org terms from BIBFRAME 2.0 data.
Zepheira – Linked Data and BIBFRAME training from the company which was consulted by the Library of Congress to develop the BIBFRAME specifications.

BIBFRAME SEMANTIC WEB AND LINKED DATA QUIZ

List of questions, answers, and quizzes on BIBFRAME, Semantic Web, and Linked Data from Library and Information Science Questions Answers Quizzes.

The semantics of something is the [(a) structure of something (b) meaning of something (c) appearance of something (d) none of the above]
Who first coined the term "semantic web?" [(a) Larry Page (b) Tim Berners-Lee (c) Sergey Brin (d) Jerry Yang]
The Semantic Web is a web that is able to describe things in a way that computers can [(a) convert (b) not understand (c) understand (d) compile]
What's another name some people use for "semantic web?" [(a) Web 3.0 (b) Web 2.1 (c) Web 2.0]
"SemanticWebVision" is a future where [(a) Web information has exact meaning (b) Computers can integrate information from the web (c) Web information can be understood and processed by computers]
What is Linked Data? [(a) a set of techniques for expressing, exposing, and publishing data (b) a set of best practices for publishing and connecting structured data on the Web (c) using Web technologies to connect data that is related but stored in different locations]

BIBFRAME VIDEOS

Videos about the Bibliographic Framework Initiative of the Library of Congress.

Video 1

Title: BIBFRAME Frequently Asked Questions

Creator: Librarianship Studies & Information Technology.

Original Published Date: May 5, 2020

Runtime: 5 minutes

Video 2

Title: From MARC to BIBFRAME: An Introduction

Creator: ALCTS, American Library Association

Original Published Date: Nov 18, 2015

Runtime: 63 minutes

Video 3

Title: BIBFRAME 2.0, the Library of Congress Pilot & Next Steps

Summary: Beacher Wiggins highlighted the goals and accomplishments of the Library of Congress' BIBFRAME 2.0 pilot to date and gave an overview of the next steps. Nate Trail discussed the development of the BIBFRAME database, a complete base file that includes all the bibliographic records from Voyager. Catalogers use the bibliographic data in combination with authority data from id.loc.gov to describe resources for the BIBFRAME 2.0 pilot. Trail demonstrated features of the BIBFRAME database interface including end-user search capability, use of linked data queries to highlight relationships between resources and how the database interacts with data entered through the BIBFRAME editor. Jodi Williamschen and Les Hawkins discussed profiles developed for the BIBFRAME editor and illustrated workflows such as creation of a new work and instance, adding instance descriptions to an existing work, updating of an initial bibliographic control (IBC) description and creating new expressions relating to an existing work.

Creator: Library of Congress

Original Published Date: Mar 8, 2019

Runtime: 59 minutes

Video 4

Title: 20200212 BIBFRAME Progress at the Library of Congress

Creator: ALCTS, American Library Association

Original Published Date: May 5, 2020

Runtime: 59 minutes

Video 5

Title: Cataloging for the Future FRBR, BIBFRAME & Linked Data

Creator: SEFLIN Training Library SEFLIN

Original Published Date: Aug 16, 2018

Runtime: 124 minutes

Video 6

Title: Beyond MARC: BIBFRAME and the Future of Bibliographic Data

Summary: The Bibliographic Framework Initiative, or BIBFRAME, is intended to provide a replacement to the MARC format as an encoding standard for library catalogs. Its aim is to move library data into a Linked Data format, allowing it to interact with other data on the Web. This session will cover the basics of BIBFRAME, describe what it can provide for users of library catalogs that MARC can’t, and outline what librarians should be aware of regarding this change in the cataloging landscape.

Creator: Reaching Across Illinois Library System

Original Published Date: Nov 14, 2015

Runtime: 48 minutes

Video 7

Title: BIBFRAME Goes International

Summary: BIBFRAME, a Library of Congress data model for bibliographic description designed to replace the MARC standards, is seeing activity around the globe. In this presentation, Library staff review current activity of the BIBFRAME model on the international scene, including how Europe is exploring BIBFRAME, the interest in other parts of the globe and a large new project closer to home. Included are developments in the roles being played in the support of BIBFRAME by the Program for Cooperative Cataloging and the Italian book and metadata supplier, Casalini Libri.

Creator: Library of Congress

Original Published Date: May 24, 2019

Runtime: 61 minutes

Video 8

Title: 20190522 BIBFRAME Implementation

Summary: An ALCTS webinar.

This session will give an overview of the SVDE project and its history, along with the ongoing work of the Transformation Council. A particular focus will be on the development of BIBFRAME through SVDE in relation to other projects, such as initiatives at LC, LD4P, and the PCC.

The SHARE Virtual Discovery Environment (SVDE) project, launched by Casalini Libri in 2017, is a community-driven, vendor-supported initiative to establish an effective working environment for the use of linked data by libraries within a global context with the additional goal to “future-proof” descriptive metadata. The vendor-partners, Casalini Libri and @Cult, have created complex processes that enrich and convert MARC library data to BIBFRAME. The converted data are then utilized in a virtual discovery environment based upon the three-layered structure of the BIBFRAME data model (www.share-vde.org). Enriched MARC data along with BIBFRAME triples are returned to each library for local analysis and experimentation.

SVDE has a flexible approach to implementation through incremental change. This development continues to be guided by the 21 member libraries and community experts through the SVDE Transformation Council and associated subcommittees. The SVDE Transformation Council began work in August 2018 to evaluate the MARC to BIBFRAME conversion from Phase II of the project. The Transformation Council began by reviewing several reports that member libraries created in response to local data conversion review. We based our initial feedback on these documents to articulate how the conversion process meets our expectations and needs. Looking for commonalities among pain points led us to develop methods to identify, test, and make recommendations for improvement of the conversion process. The first set of recommendations was ready for implementation in early 2019 and has been incorporated in the current conversion specifications, but work is ongoing to refine these specifications and continue improving processes.

Creator: ALCTS, American Library Association

Original Published Date: Nov 7, 2019

Runtime: 57 minutes

Video 9

Title: Transform Your Catalog: One University’s Experience with BIBFRAME-based Discovery

Summary: Stephanie Kaceli, Library Director at Cairn University, will share Cairn’s involvement and testing with Innovative’s new library platform, Inspire Discovery. Then Martha Rice Sanders, Senior Consultant at Innovative, will dig into the choice to use BIBFRAME instead of MARC data to drive Inspire and what that means for libraries in the future.

Creator: CharlestonConference

Original Published Date: Jan 23, 2020

Runtime: 65 minutes

USED FOR

Bibliographic Framework
Bibliographic Framework Initiative

REFERENCES

1. Miller, Eric; Uche Ogbuji; Victoria Mueller; Kathy MacDougall (21 November 2012). Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services (PDF) (Report). Library of Congress. https://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf (accessed December 10, 2017).

2. BIBFRAME. Wikipedia. https://en.wikipedia.org/wiki/BIBFRAME (accessed December 10, 2017).

3. Library of Congress, "Bibliographic Framework Initiative," https://www.loc.gov/bibframe/ (accessed April 4, 2020).

4. Library of Congress BIBFRAME Manual, https://www.loc.gov/aba/pcc/bibframe/BIBFRAME-Manual-Final-2019-07-12.pdf (accessed May 20, 2020).

5. James M. Day, Library Technology Launchpad, BIBFRAME: Basics and Resources, https://libtechlaunchpad.com/2016/03/21/bibframe-basics-and-resources/ (accessed May 20, 2020).

6. Library of Congress, https://www.loc.gov/aba/pcc/bibframe/BIBFRAME%20paper%2020140501.docx (accessed May 20, 2020).

7. Jung-Ran Park, Andrew Brenza and Lori Richards (April 22nd 2020). BIBFRAME Linked Data: A Conceptual Study on the Prevailing Content Standards and Data Model [Online First], IntechOpen, DOI: 10.5772/intechopen.91849. Available from: https://www.intechopen.com/online-first/bibframe-linked-data-a-conceptual-study-on-the-prevailing-content-standards-and-data-model

