Ronald Snijder

Metadata Standards and Information Analysis

A Survey of Current Metadata Standards and the Underlying Models

2001


1 Introduction

In recent years standards have emerged concerning the description of information resources on the Internet. This thesis surveys several standards: the Dublin Core Metadata Initiative (DCMI), the Warwick Framework and the Resource Description Framework (RDF), and the Digital Object Identifier (DOI). Although created for different purposes by different groups of people, these standards are closely related. This thesis will describe the standards in detail, and explore the relations between them. The list of standards is by no means comprehensive; other standards such as the TEI Header (TEI 2001), MPEG-7 (Martinez 2001), the OCLC/RLG Preservation Metadata standard (OCLC/RLG 2001), or the IEEE standard on learning objects (IEEE 2001) are not discussed.

Furthermore, this thesis surveys the models that can serve as a basis for the metadata standards. These models are the IFLA Functional Requirements for Bibliographic Records and the INDECS Metadata Framework. The thesis tries to clarify the differences between the models and to find out whether the models cover all aspects of information resources on a network.

Credit where credit is due

A thesis like this cannot be written without the help and support of a lot of people. First of all I would like to thank dr. Gerhard Riesthuis for his comments and tough questions. I received additional comments from dr. Maja Zumer. My old friend Diebert Jan van Rhijn introduced me to Trudi Noordermeer MSc, who directed me to the subject of this thesis. My manager Jeroen Liefferink kindly granted me study leave, even in busy times. I am indebted to fellow student and co-worker Tom Hendriks for being my 'partner in crime'. Adriënne Baars-Schuyt MA and Erik Smelt MA volunteered to correct my English. My friends and family helped me a lot by not minding my absence during this time.

But most of all I would like to thank my wife Dorien. Without her love and patience, this thesis - and all of my studies - would have been completely impossible. This thesis is therefore dedicated to her.



Table of Contents
 
1 Introduction
2 What is metadata?
3 The Dublin Core Metadata Initiative (DCMI)
   3.1 Introduction to the Dublin Core (DC)
   3.2 Elements of the Dublin Core
   3.3 The principles of the DC
   3.4 The original elements
   3.5 Extending the element set: description of images
   3.6 The Canberra Qualifiers and simplicity
   3.7 Defining unqualified DC and the 1:1 principle
   3.8 Standardization, qualification and the DOI/INDECS project
   3.9 Dublin Core Qualified
   3.10 Special interests: extensions to the Dublin Core
4 The Warwick Framework and the Resource Description Framework
   4.1 Introduction to the Warwick Framework
   4.2 General principles of metadata
   4.3 The Warwick Framework architecture
   4.4 Introduction to the Resource Description Framework
5 The Digital Object Identifier (DOI)
   5.1 Introduction to the Digital Object Identifier
   5.2 The DOI System
6 The Functional Requirements for Bibliographic Records (FRBR) and the INDECS Metadata Framework
   6.1 Introduction to the Functional Requirements for Bibliographic Records (FRBR)
   6.2 Entities of the FRBR
   6.3 FRBR User tasks
   6.4 The FRBR as basis for the Dublin Core and the DOI?
   6.5 Introduction to the INDECS Metadata Framework
   6.6 Entities of the INDECS Metadata Framework
   6.7 The INDECS model and the Functional Requirements
7 Conclusion: towards a theory of metadata?
   7.1 Concepts of creations
   7.2 General principles and concepts of network-metadata
   7.3 Fixation of creations and generated metadata
   7.4 Possible directions for network-metadata standards
Appendix Literature



 
2 What is metadata?

A thesis about metadata standards should contain at least one definition of metadata. Using the literature the following list of definitions can be made:

  • 'Data about data' (Boehm 1999; Butterfield 1995; Daniel and Lagoze 1997; Henze and Schefczik 1997; Lynch 1998; Rust 1998; Weibel 1997)
  • 'A (usually brief) characterization of the individual Information Objects in the collection of a library' (Smith 1996)
  • 'Classifying the content of Web objects' (Marchiori 1998)
  • 'Metadata is data associated with objects which relieves their potential users of having to have full advance knowledge of their existence or characteristics' (Dempsey and Heery 1998)
  • 'The Internet-age term for structured data about data' (EU-NSF 1999)
  • 'A small summary of characteristics of each available resource' (Wood 1999)
  • '[A] relationship that someone claims to exist between two entities' (Rust and Bide 2000)


Almost all definitions mention data - structured or not - that describes something about another data resource. Mostly, the term is used when this data resource resides on the Internet. 
Some definitions seem to use quite a different angle:

  • 'There is no clear line between content and meta-content' (Guha 1996/1997b)
  • 'The lifeblood of commerce systems' (Erickson 1998)

This thesis uses the first definition in the list for metadata: data about data. The metadata will most likely describe a digital information resource, but this is not a requirement. The metadata itself is also likely to be digital.

A subject like metadata standards has many aspects, roughly divisible into theoretical and technical aspects. The theoretical aspects may lead to a better understanding of information resources and the means to make them more useful. Technical aspects are another important part of the standards. The acceptance and implementation of the solutions provided by the standards depend on the way they fit into existing technologies. Creating a solution optimized for the Gopher protocol would not lead to worldwide acceptance, while the use of HTML could. In spite of the importance of technology, this thesis will focus not on technology but on the concepts on which the standards are based. If the concepts are clear, a rapid change of technology loses some of its impact.


3 The Dublin Core Metadata Initiative (DCMI)

3.1 Introduction to the Dublin Core (DC)

The Dublin Core (DC) may be considered to be the best-known metadata project today. From its start in 1995 the DC evolved into 'the leading initiative for improving resource discovery on the Web'. (Weibel 2000) During a series of Metadata Workshops the DC was created and further refined. This process will be discussed in this chapter after an introduction to the Dublin Core element set.

The DC was created as an answer to the question: 'Why is it so difficult to find items of interest on the Internet or the World Wide Web?' (Weibel, Godby et al. 1995) Automatically indexing all information resources on the Internet does not function very well. Searching in very large collections may yield large result sets. If the collection contains resources of different fields of study, the differences in jargon used may cause additional problems. For instance, the term 'period' is used quite differently in mathematics and in geology. Furthermore, resources may not have a description at all except a filename, which may not describe its contents. (Weibel 1995; Weibel, Godby et al. 1995)

The solution to this problem is found in creating a description for the resource, so that a metadata record performs more or less the same function as a catalogue record. Given the enormous number of resources, it is not possible to create a large and complex description, like a UNIMARC or MARC21 record or a TEI Header1. The number of resources also makes it impossible for trained librarians to create all the descriptions. Therefore, those responsible for the resources should be able to create the description themselves. This requires a simple format that is easy to understand and maintain. The description should follow a standard, enabling automated tools to 'harvest' and manipulate the descriptions.

The description of the information resources is created using Dublin Core elements. An element is a pre-defined string or label, which is paired to a value. In its simplest form it looks like this: 'Author = Ronald Snijder'. A list of fifteen elements is defined, which will be discussed in the next paragraph. Unlike a more formal description method like the AACR2, the use of the Dublin Core is not bound to strict rules. All elements are optional, and may be repeated without any constraint. The syntax of the elements is simple: just the name - or 'Identifier' - of the element, and the value of the element. The value may consist of free text or it may be taken from a standardized resource. The description may reside in a separate file or it may be a part of the information resource itself. While no formal syntax rules are defined, several syntax recommendations have been created for generic text files, HTML and the Resource Description Framework (RDF). These are discussed in (Hillmann 2000). 
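
The pairing of an element with a value can be sketched in a few lines of Python. This is only an illustration and not a syntax defined by the Dublin Core: the 'Element = value' notation is the informal one used in this thesis, and the helper names `parse_statement` and `values_of` are assumptions made for this example.

```python
# Illustrative sketch only: the Dublin Core prescribes no syntax, so the
# 'Element = value' notation and the helpers below are assumptions.

def parse_statement(line):
    """Split one 'Element = value' line into an (element, value) pair."""
    element, _, value = line.partition("=")
    return element.strip(), value.strip()


# A description is a plain list of pairs, so every element may be left out
# (optional) or may occur more than once (repeatable).
lines = [
    "Title = Metadata Standards and Information Analysis",
    "Author = Ronald Snijder",
    "Date = 2001",
]
record = [parse_statement(line) for line in lines]


def values_of(record, element):
    """Return all values of an element; an empty list if it is absent."""
    return [value for name, value in record if name == element]
```

Because nothing is mandatory, asking for an element that was never supplied, such as Publisher here, simply yields an empty list.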

If needed, the contents of the element may be refined or further explained by a qualifier. A qualifier gives additional information on the contents of an element, e.g. by telling which controlled vocabulary or which encoding scheme is used. This vocabulary or scheme may be any 'standard list', ranging from the Library of Congress Subject Headings for a subject to an ANSI standard for dates. The following example illustrates this: 'Date (scheme = ISO 8601) = 2001-03-12'. ISO 8601 is a standard for representing dates, here in the format YYYY-MM-DD (year in four digits, followed by month in two digits, followed by day in two digits). This qualifier makes it clear that the date is March 12, 2001 instead of December 3, 2001. Any qualifier imaginable may be used. However, it should be noted that a list of recommended qualifiers is available in (DCMI 2000).
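
The effect of naming the scheme can be shown with Python's standard `datetime` module. This is a minimal sketch; the slash-separated value '12/03/2001' is a hypothetical unqualified date added here only to show the ambiguity that the qualifier removes.

```python
from datetime import date, datetime

# With the scheme known to be ISO 8601 (YYYY-MM-DD) there is one reading.
iso_value = "2001-03-12"
iso_date = date.fromisoformat(iso_value)

# Without a scheme, a value such as '12/03/2001' (a hypothetical example)
# has two plausible readings, depending on local conventions.
ambiguous = "12/03/2001"
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()

print(iso_date.strftime("%d %B %Y"))
print(day_first == iso_date, month_first == iso_date)
```

Only the day-first reading of the unqualified value agrees with the ISO 8601 date; a harvesting tool has no way of knowing which reading was meant unless the scheme is stated.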

The primary goal of the Dublin Core is resource discovery, which was defined in the first Metadata Workshop - held in Dublin, Ohio, USA - as the most pressing need, regardless of the subject or the complexity of the information resource. The resources to be discovered and described are of the type 'document-like objects' (DLOs). This restriction was made in the first Workshop, for two reasons. Firstly, the participants of the Workshop felt that most information sought on the Internet is contained in such objects. Secondly, if the standard could be applied satisfactorily to familiar objects, it could later be extended to other types of resources. A DLO is not strictly defined, but explained using examples. An electronic newspaper article is a DLO, while a collection of slides without any description or annotation is not. Generally speaking, the content of this type of information resource is mostly text, and the required metadata is very similar to the description of analogue textual documents.

Restricting the focus in the first Metadata Workshop eliminated the need for a very large set of metadata elements. No elements were created for copyright information, cost or archival status. The Workshop aimed to define a core set of elements, which could be universally applied for resource discovery. The result was the Dublin Metadata Core Element Set, containing thirteen elements. In the third Workshop this set of elements was extended to fifteen. (Weibel and Miller 1997) It was recognized from the start that a list of just fifteen elements might not suffice for all uses. The set therefore may be extended with new elements. Of course, other users of the Dublin Core may not recognize the new elements, but it was believed that any program using this set would at least be able to use the DC elements. 

The DC element set had to be simple, but it should also be possible to 'map' it into other, more complex systems like MARC21. The solution to this problem is twofold: refinement of elements using qualifiers, and extension of the original element set. Information placed in an element can be refined using a qualifier. For instance, an information resource in Dutch may also have a translated title in English. Because all elements may be repeated, a DC description would contain two Title elements, one containing a Dutch title and one containing an English title. Adding a qualifier to the two elements - like the required MARC21 field (240 for the primary title, 242 for the other title) - simplifies the 'mapping process'. Adding additional metadata elements extends the original element set. A possible extension of the element set could encompass an element for the organization 'holding' the information resource.
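
How a program might keep using the core elements while setting aside extensions it does not recognize can be sketched as follows. The element identifiers are the fifteen listed in paragraph 3.2; the function `split_record`, the extension element 'HoldingOrganization' and its value are hypothetical names invented for this illustration.

```python
# The fifteen element identifiers listed in paragraph 3.2.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}


def split_record(record):
    """Separate the core DC pairs from locally defined extensions."""
    core = [(name, value) for name, value in record if name in DC_ELEMENTS]
    extensions = [(name, value) for name, value in record
                  if name not in DC_ELEMENTS]
    return core, extensions


# 'HoldingOrganization' is a hypothetical extension element.
record = [
    ("Title", "A Survey of Current Metadata Standards"),
    ("Creator", "Ronald Snijder"),
    ("HoldingOrganization", "Example University Library"),
]
core, extensions = split_record(record)
```

A tool that only knows the core set can work with `core` and ignore `extensions`, while a tool from the community that defined the extension can use both.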

3.2 Elements of the Dublin Core

The Dublin Core consists of the following elements, as copied from (DCMI 1999). The Identifier is used in the DC description, while the Name is used as a 'label'2 describing the element. The comment helps to describe the function of the different elements, and the way they may be used in a Dublin Core description.
 

'Element: Title

Name: Title
Identifier: Title
Definition: A name given to the resource.
Comment: Typically, a Title will be a name by which the resource is formally known.

Element: Creator

Name: Creator
Identifier: Creator
Definition: An entity primarily responsible for making the content of the resource.
Comment: Examples of a Creator include a person, an organisation, or a service. Typically, the name of a Creator should be used to indicate the entity.

Element: Subject

Name: Subject and Keywords
Identifier: Subject
Definition: The topic of the content of the resource.
Comment: Typically, a Subject will be expressed as keywords, key phrases or classification codes that describe a topic of the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme.

Element: Description

Name: Description
Identifier: Description
Definition: An account of the content of the resource.
Comment: Description may include but is not limited to: an abstract, table of contents, reference to a graphical representation of content or a free-text account of the content.

Element: Publisher

Name: Publisher
Identifier: Publisher
Definition: An entity responsible for making the resource available
Comment: Examples of a Publisher include a person, an organisation, or a service. Typically, the name of a Publisher should be used to indicate the entity.

Element: Contributor

Name: Contributor
Identifier: Contributor
Definition: An entity responsible for making contributions to the content of the resource.
Comment: Examples of a Contributor include a person, an organisation, or a service. Typically, the name of a Contributor should be used to indicate the entity.

Element: Date

Name: Date
Identifier: Date
Definition: A date associated with an event in the life cycle of the resource.
Comment: Typically, Date will be associated with the creation or availability of the resource. Recommended best practice for encoding the date value is defined in a profile of ISO 8601 and follows the YYYY-MM-DD format.

Element: Type

Name: Resource Type
Identifier: Type
Definition: The nature or genre of the content of the resource.
Comment: Type includes terms describing general categories, functions, genres, or aggregation levels for content. Recommended best practice is to select a value from a controlled vocabulary (for example, the working draft list of Dublin Core Types). To describe the physical or digital manifestation of the resource, use the FORMAT element.

Element: Format

Name: Format
Identifier: Format
Definition: The physical or digital manifestation of the resource.
Comment: Typically, Format may include the media-type or dimensions of the resource. Format may be used to determine the software, hardware or other equipment needed to display or operate the resource. Examples of dimensions include size and duration. Recommended best practice is to select a value from a controlled vocabulary (for example, the list of Internet Media Types defining computer media formats).

Element: Identifier

Name: Resource Identifier
Identifier: Identifier
Definition: An unambiguous reference to the resource within a given context.
Comment: Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.  Example formal identification systems include the Uniform Resource Identifier (URI) (including the Uniform Resource Locator (URL)), the Digital Object Identifier (DOI) and the International Standard Book Number (ISBN).

Element: Source

Name: Source
Identifier: Source
Definition: A Reference to a resource from which the present resource is derived.
Comment: The present resource may be derived from the Source resource in whole or in part. Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.

Element: Language

Name: Language
Identifier: Language
Definition: A language of the intellectual content of the resource.
Comment: Recommended best practice for the values of the Language element is defined by RFC 1766, which includes a two-letter Language Code (taken from the ISO 639 standard), followed optionally, by a two-letter Country Code (taken from the ISO 3166 standard). For example, 'en' for English, 'fr' for French, or 'en-uk' for English used in the United Kingdom.

Element: Relation

Name: Relation
Identifier: Relation
Definition: A reference to a related resource.
Comment: Recommended best practice is to reference the resource by means of a string or number conforming to a formal identification system.

Element: Coverage

Name: Coverage
Identifier: Coverage
Definition: The extent or scope of the content of the resource.
Comment: Coverage will typically include spatial location (a place name or geographic coordinates), temporal period (a period label, date, or date range) or jurisdiction (such as a named administrative entity). Recommended best practice is to select a value from a controlled vocabulary (for example, the Thesaurus of Geographic Names) and that, where appropriate, named places or time periods be used in preference to numeric identifiers such as sets of coordinates or date ranges.

Element: Rights

Name: Rights Management
Identifier: Rights
Definition: Information about rights held in and over the resource.
Comment: Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.'


3.3 The principles of the DC

In the first Workshop, several principles for the Dublin Core were defined. All further development of the DC should be governed by these principles in order to keep it small, understandable, and flexible. The principles are intrinsicality, extensibility, syntax-independence, optionality, repeatability, and modifiability.

  • Intrinsicality. The DC describes properties that are a part of the object itself, and can be discovered 'by having the work in hand'. Examples are the intellectual content or the physical form of the object. Extrinsic data describe the use of the object, for instance access to a resource or the costs associated with it. If it is necessary to describe extrinsic data, this may be achieved by extending the element set.
  • Extensibility. Extending the DC is not only useful for describing extrinsic data, but also for describing data that is not part of the element set. It gives users the flexibility to use fields specific to their needs. Furthermore, the specification of the Dublin Core will develop and change. If the original elements are not abandoned while new elements are added, backward compatibility will be maintained.
  • Syntax-Independence. The DC is intended to be used within very different applications, possibly using different technologies. Defining a formal syntax would impede this, limiting the ways to implement the element set.
  • Optionality. All elements are optional for two reasons. Firstly, not all elements may be useful or be known for all kinds of information resources. The second reason is simplicity. A person who is not a professional librarian and to whom not all elements make sense may describe the information resource. This way, the user may restrict himself to the elements best suited for his situation.
  • Repeatability. Every element may be repeated without any constraint. For instance, if the described information resource has ten authors, the DC description will contain ten Author elements.
  • Modifiability. All the elements described in the first Workshop are defined in such a way that no further explanation is necessary. However, the DC elements should enable different groups of people to describe 'their' information resources. To do so effectively, a qualifier can modify the definition of any element. The definition cannot change completely, but it can be 'narrowed down'. The use of qualifiers simplifies mapping the DC elements to other information systems and it enables a more precise description of the resource. The Subject element could be used to describe an information resource in several ways:
    • 'Subject = The design of (super) computers'. Here free text is used to describe the subject.
    • 'Subject (scheme=Dewey Decimal System) = 004.251 Supercomputers--systems design'. This element uses a term from a pre-defined list: the Dewey Decimal System.


3.4 The original elements

As described in paragraph 3.1 the Dublin Core element set originally consisted of thirteen elements. Below are the elements, as defined in the first Metadata Workshop. For a detailed discussion of those elements see (Weibel, Godby et al. 1995).

  • Subject. The topic addressed by the work
  • Title. The name of the object
  • Author. The person(s) primarily responsible for the intellectual content of the object
  • Publisher. The agent or agency responsible for making the object available
  • OtherAgent. The person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work
  • Date. The date of publication
  • ObjectType. The genre of the object, such as novel, poem, or dictionary
  • Form. The data representation of the object, such as Postscript file or Windows executable file
  • Identifier. String or number used to uniquely identify the object
  • Relation. Relationship to other objects
  • Source. Objects, either print or electronic, from which this object is derived, if applicable
  • Language. Language of the intellectual content
  • Coverage. The spatial locations and temporal durations characteristic of the object
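
The relation between these thirteen names and the identifiers quoted in paragraph 3.2 can be made explicit in a small sketch. The renaming table below is an assumption inferred from comparing the two sets of definitions, not a mapping stated in the Workshop reports; the helper `to_current_name` is likewise hypothetical.

```python
# The thirteen element names from the first Metadata Workshop.
ORIGINAL_ELEMENTS = [
    "Subject", "Title", "Author", "Publisher", "OtherAgent", "Date",
    "ObjectType", "Form", "Identifier", "Relation", "Source",
    "Language", "Coverage",
]

# Assumed correspondences, inferred from the definitions. Names that are
# not listed here are taken to be unchanged.
RENAMED = {
    "Author": "Creator",
    "OtherAgent": "Contributor",
    "ObjectType": "Type",
    "Form": "Format",
}


def to_current_name(name):
    """Translate an original element name to its later identifier."""
    return RENAMED.get(name, name)


current = {to_current_name(name) for name in ORIGINAL_ELEMENTS}
```

Under this assumed mapping the thirteen original elements cover thirteen of the fifteen later identifiers; Description and Rights have no counterpart in the original set.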


In the first Workshop, the foundations for the DC were created. The second Workshop enabled the creation of the Warwick Framework. This concept - and its influence on the Resource Description Framework - will be described in chapter 4. Although it is closely related to the DC, this concept is quite different and it should not be described as an integral part of the Dublin Core. The following paragraphs discuss the evolution of the DC as it developed during the Workshops.

3.5 Extending the element set: description of images

The third Metadata Workshop focused on the question of how the DC could be applied to the description of images. (Weibel 1997; Weibel and Miller 1997) Two very interesting results emerged from this Workshop and the discussions following it. Firstly, it was concluded that images are not very different from DLOs. Secondly, two elements were added to the DC element set.

3.5.1 Images vs. Document-Like Objects
In the first Workshop, the concept of the document-like object was launched. During the third Workshop the question whether an image is a document-like object was answered. The implications are clear. If an image were different from a DLO, a different set of elements would be required. Using two different element sets causes new problems. The first is a lack of simplicity. Describing and searching information resources using two metadata standards is more complicated than using one standard. Furthermore, information resources on the Internet usually contain both textual information and - sometimes animated - images. Describing those resources using two standards would not be an easy thing to do.

The outcome of the Workshop was that discovery of images could be enabled using the DC, albeit with an extended element set. This implies that images are DLOs and not something completely different. What defines a document-like object is not its content, but whether or not it is fixed. If the resource shows the same content to all users, it is fixed. This class of objects does not only contain texts and images, but also movies, speeches and other resources. Objects that are non-document-like are able to generate different images for each user. One could think of databases or other applications that create images 'on the fly', based on user input and available data.

Of course images differ from textual resources. Indexing textual resources is relatively easy, while extracting information from images is not. When using digital images various technical aspects come into play, which are all necessary to display the image correctly. Another aspect is formed by different versions of the same image. These problems were recognized, but not solved in the third Workshop. 

3.5.2 Extension of the Dublin Core
The element set was extended with two new elements: Description and Rights. In the first version of the DC, the content of a resource was described using Subject. In this version of the Dublin Core two fields enable content description: Subject and Description. The Subject field is reserved for a keyword-like description of the content of the resource, while the Description field is to be used for describing the content in more detail. In the case of textual resources, the Description field is intended for abstracts. In the case of graphical resources it may be used for a longer content description.

Intrinsicality is one of the principles that guide the DC. Aspects that are not an integral part of the resource should not be described. Following this principle, intellectual property rights cannot be considered an integral part of the resource, so a Rights field should not be needed. In this case, however, pragmatic considerations prevailed. Rights management is of such importance to the use of images that not providing a Rights field would be a large handicap. Weibel states it like this: 'Resource description is a messy business - ask any cataloger.' (Weibel and Miller 1997) The field is intended to describe usage restrictions or to contain a link to a description of usage restrictions.

3.6 The Canberra Qualifiers and simplicity

The main subjects of the fourth Metadata Workshop were the further definition and structuring of the elements, and ways to extend the element set. (Weibel, Iannella et al. 1997) Different views on the DC were expressed, and several qualifiers were defined.

3.6.1 Simplicity vs. qualifiers
The discussions on these topics revealed two points of view, best described as the Minimalist and the Structuralist view. From the start the DC was intended to provide a way of describing information resources usable by very different groups of people. Keeping the element set simple enhances the ease of use and the exchange of resource descriptions between different user groups. This simplicity also has a downside since the description cannot be as accurate as a more complex resource description.

From the Minimalist point of view the most important aspect of the DC is its simplicity. If all elements have the same meaning in all communities that use the Dublin Core, creating and exchanging metadata is - relatively - easy. This will also simplify the use of automated tools. Using additional qualifiers to modify the meaning of an element endangers the interoperability of the descriptions. The Structuralists emphasize the advantages of - formally defined - qualifiers. Using those qualifiers enables them to create a more precise description, which may better suit the needs of a specific user group.

Since the definition of the DC element set is based on consensus, a compromise had to be found. The primary goal of resource description is to enable retrieval of information resources. It is assumed that the use of qualifiers improves the retrieval result. Another assumption is that a simple element set is essential. Merging those assumptions leads to the solution of using qualifiers but keeping the number of qualifiers to a minimum.

3.6.2 The Canberra Qualifiers

During the Workshop several qualifiers were formalized and encoding DC elements in HTML 2.0 was discussed. The list of qualifiers includes:

  • Language. To specify the language of the descriptor field, not the language of the information resource itself. The following example describes a subject in Dutch: 'Subject (language=nl) = Vergelijking van metadata standaards'.
  • Scheme. To specify a context for the interpretation of description in an element. If the element description is based on a standard, the qualifier describes it. Example: 'Subject (scheme=Dewey Decimal System) = 004.251 Supercomputers--systems design'.
  • Type (sub-element Name). To specify a facet of an element. Using the example of 'Author = Ronald Snijder', the Type of author could be 'personalName' to distinguish between names of persons and names of organizations. This element could be qualified as 'Author (type=personalName) = Ronald Snijder'.
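
The qualified statements in the examples above can be taken apart mechanically. The following sketch parses the informal 'Element (qualifier=value) = content' notation used in this thesis; that notation is not an official Dublin Core syntax, and the pattern `STATEMENT` and the function `parse_qualified` are assumptions made for this illustration. Only a single qualifier per statement is handled.

```python
import re

# Pattern for 'Element = value' with one optional '(qualifier=value)' part.
STATEMENT = re.compile(
    r"^\s*(?P<element>\w+)\s*"
    r"(?:\(\s*(?P<qname>\w+)\s*=\s*(?P<qvalue>[^)]*?)\s*\))?"
    r"\s*=\s*(?P<value>.*)$"
)


def parse_qualified(line):
    """Return (element, qualifier, value); qualifier is None or a pair."""
    match = STATEMENT.match(line)
    if match is None:
        raise ValueError(f"not an element statement: {line!r}")
    qualifier = None
    if match["qname"] is not None:
        qualifier = (match["qname"], match["qvalue"])
    return match["element"], qualifier, match["value"].strip()


examples = [
    "Subject (language=nl) = Vergelijking van metadata standaards",
    "Subject (scheme=Dewey Decimal System) = 004.251 Supercomputers--systems design",
    "Author (type=personalName) = Ronald Snijder",
    "Author = Ronald Snijder",
]
parsed = [parse_qualified(line) for line in examples]
```

A program that does not understand a qualifier can drop the middle item of each result and still be left with a usable unqualified element and value.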


3.7 Defining unqualified DC and the 1:1 principle

The fifth Workshop aimed to create the final definition for the 15 elements, known as the 'Finnish Finish'. (Miller, Paul and Gill 1997; Weibel and Hakala 1998) The elements Date, Coverage and Relation were discussed. This last element plays an important part in the 1:1 Principle, which states that every resource should have its own description.

3.7.1 The Date element
This element was first defined as the time at which a resource was made available in the form for which the description was made. This definition was thought to be too narrow, since other dates regarding the information resource may be equally important. In the Workshop, the definition was extended to a date associated with the creation or the availability of the resource. This definition leaves room for different interpretations of which dates are to be included in the element. Some may find this 'unsatisfactory ambiguous', (Weibel and Hakala 1998) although this may also be interpreted as flexibility. The Workshop was also a starting point for the creation of Date types to be used in a qualified version of the DC.

3.7.2 The Coverage element
Defining this element was not a simple task either. From the first Workshop onwards, consensus on this element was not easily found. Finally, it was agreed that the element should enable searches on the space - in every sense of the word - or the period of time covered by the resource. By using this element as described, the DC aimed to simplify searching for local resources.

3.7.3 The Relation element and the 1:1 Principle
Describing information resources derived from other resources - one could think of an image of a painting - is not always an easy task. Should the description only cover the information resource itself or should the content also be described? For example, if a person creates a digital image of the painting Guernica by Pablo Picasso and publishes it on the Internet, what should be the content of the field Creator? The photographer or Picasso?

The solution to this problem was found in the 1:1 Principle, which states that every information resource should be described separately and every metadata description should describe just one information resource. It is recommended to link these descriptions by using the Relation element. In the unqualified version of the DC no rules are given for the description of the type of relation or how to identify the related information resource. Using the example of the previous paragraph two descriptions should be created, one for Picasso's painting and one for the digital image.
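
The Guernica example can be written out as two linked descriptions. This is a sketch under stated assumptions: the identifiers are hypothetical placeholders rather than real URIs, the titles and the photographer's name are invented, and the helper `related_records` is not part of the Dublin Core.

```python
# Two separate descriptions, one per resource (the 1:1 Principle).
# All identifiers below are hypothetical placeholders.
painting = [
    ("Identifier", "urn:example:guernica-painting"),
    ("Title", "Guernica"),
    ("Creator", "Pablo Picasso"),
]
digital_image = [
    ("Identifier", "urn:example:guernica-photo"),
    ("Title", "Digital image of Guernica"),
    ("Creator", "The photographer"),
    ("Relation", "urn:example:guernica-painting"),
]


def related_records(record, collection):
    """Return the records that this record's Relation elements point to."""
    targets = {value for name, value in record if name == "Relation"}
    return [
        other for other in collection
        if any(name == "Identifier" and value in targets
               for name, value in other)
    ]


collection = [painting, digital_image]
linked = related_records(digital_image, collection)
```

Each description now has exactly one Creator that is unambiguously correct for the resource it describes, and the Relation element carries the link from the image to the painting.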

3.8 Standardization, qualification and the DOI/INDECS project

The sixth Workshop resulted in several developments that may be considered crucial to the DC. (Weibel 1999; Werf 1999) Firstly, a process of standardization was started resulting in a more formal organization and the creation of two standards: DC version 1.0 (DCMI 1998) and DC version 1.1 (DCMI 1999). The second development encompassed the development of qualification mechanisms. Guidelines were developed for the use of qualifiers. Thirdly, cooperation was started with representatives of the DOI/INDECS project. This cooperation could have had far reaching consequences, as will be described in the last chapter.

3.8.1 Standardization of the DC
A rather informal process guided the creation and the evolution of the DC. As the number of interested parties increased, the need for a more formal forum was felt as well. This resulted in the establishment of the Dublin Core Directorate in 1998, as well as several committees and working groups. From this point onwards, the formal name used is the Dublin Core Metadata Initiative. Furthermore, a Request for Comments (RFC) was submitted to the Internet Engineering Task Force (IETF). This RFC 2413 functions as an 'Internet standard'. (DCMI 1998) Subsequently, this document was slightly rewritten - resulting in (DCMI 1999) - and submitted to NISO (National Information Standards Organization) and CEN (the European Committee for Standardization). The encoding of DC elements in HTML was also formalized.

3.8.2 Qualification mechanisms
Once the development of the 'unqualified' version of the DC was completed, more attention was given to creating a mechanism for extending the element set with qualifiers. Several reasons support the extension. Firstly, using a classification or controlled vocabulary enables a more precise description and may enhance retrieval results. For example, using free text in the Subject element may result in 'Subject = The design of (super) computers'. Using a qualifier results in a more precise description, such as 'Subject (scheme=Dewey Decimal System) = 004.251 Supercomputers--systems design'. Secondly, the content of DC elements is far less ambiguous if encoding rules are applied to it. An obvious example is the notation of dates. Furthermore, a standard 'substructure' for different elements could be created. For example, a Creator element could just contain a name or it could contain formalized name and address data. This is further explained in paragraph 4.4.2, Principles of the Resource Description Framework. The last reason is authority control. If applied to certain elements, the values could be uniquely identified. For a discussion on this topic, see also (Vellucci 2000).

As can be concluded from the examples above, the qualifiers used may be very different. To ensure a minimum level of interoperability, restrictions are needed. It is advisable to use qualification schemes that are maintained by external parties, for example, the Dewey Decimal System or the Medical Subject Headings (MeSH). Even so, different user groups may use different schemes. This affects interoperability in two ways: it affects the exchange of metadata, and it affects searching in metadata records. If different groups want to exchange metadata records, a very strict use of qualifier schemes is compulsory, ensuring that all records are truly 'interchangeable'. Searching metadata records from different user groups requires a less strict protocol. If the application used to search the data is able to use the different DC elements, a search may still produce useful results. Of course, if the application is also able to use the qualification scheme, the results may be better; using standardized qualification schemes may thus improve the search results.

Another - pragmatic - restriction to the use of qualifiers is the Dumb-Down Principle: one should be able to ignore all qualifiers and use the description as if it were unqualified. (Baker 2000; Werf 1999) Ignoring the qualifier results in loss of precision, but resource discovery - the main function of the DC - is still possible. While the possible list of qualifiers is endless, the list of DC elements is not. Therefore, automated agents working with the Dublin Core can be expected to recognize at least the elements. So, if a qualifier is not recognized, the Dumb-Down Principle ensures that the information retrieved is still usable and that resource discovery is possible. The use of qualifiers and the Dumb-Down Principle is discussed in detail in (Lagoze 2001).
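
The effect of the Dumb-Down Principle can be sketched in a few lines of Python. The record structure used here (a list of statements with optional 'scheme' and 'refinement' keys) is an assumption made for this illustration and is not prescribed by the DC; the Date statement is a hypothetical example.

```python
# A qualified description: each statement has an element, a value and,
# optionally, qualifiers. The structure itself is an illustrative assumption.
qualified_record = [
    {
        "element": "Subject",
        "value": "004.251 Supercomputers--systems design",
        "scheme": "Dewey Decimal System",
    },
    {
        # Hypothetical statement, added to show an element refinement.
        "element": "Date",
        "value": "2001-06-30",
        "refinement": "created",
        "scheme": "ISO 8601",
    },
    {
        "element": "Title",
        "value": "Metadata standards and information analysis",
    },
]


def dumb_down(record):
    """Ignore all qualifiers and keep only (element, value) pairs."""
    return [(statement["element"], statement["value"]) for statement in record]


unqualified_record = dumb_down(qualified_record)
```

An agent that knows only the fifteen elements can still search the unqualified result. What is lost is precision: the agent no longer knows that the Subject value is a Dewey class or that the date refers to creation.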

3.8.3 The DOI/INDECS project and the Dublin Core
Representatives of several metadata projects were invited to the sixth Workshop. One of these projects is the DOI/INDECS project. The Digital Object Identifier project has its roots in the content industries (publishing, music, visual arts), and aims to create a unique identifier for all sources that are subject to intellectual property rights. Identifying the sources enables rights owners to manage these rights and make transactions in an online environment (e-trade). This project will be described in more detail in chapter 5. The goal of the Interoperability of Data in E-commerce Systems (INDECS) project was to define a data model to support this. A more detailed description can be found in chapter 6.

It was recognized that cooperation would benefit both the DC project and the DOI/INDECS project. Although the projects have a different focus - resource description vs. managing intellectual property rights - the underlying data model could very well be the same. A common data model would enable the exchange of metadata, and support the goals of both user groups. One of the consequences might have been the creation of a new version of the Dublin Core - DC 2.0 - but the associated Working Group was deactivated in October 2000, apparently without achieving results.

3.9 Dublin Core Qualified

The focus of the seventh Workshop was on qualifiers. (Peereboom 2000; Weibel 2000) One of the results was a common set of qualifiers useful to all users of the DC. Furthermore, work started on a metadata registry designed to register and share local extensions of the DC.

3.9.1 Common qualifiers
As mentioned in paragraph 3.8.2, a restricted use of qualifiers enhances both the precision of the description and the interoperability between different DC applications. One way to enable this is the definition of a common set of qualifiers, to be found in (DCMI 2000). The qualifiers are divided into qualifiers that refine the meaning of an element and qualifiers that identify an encoding scheme. The latter include classifications or controlled vocabularies and rules on formal notations, for instance on the interpretation of dates.

3.9.2 Extending the Core
One of the basic principles of the DC is extensibility. The core element set could never cover all relevant aspects of an information resource within a certain user group. Several user groups created local extensions to the element set. To support the extensions, work on a metadata registry started. Extensions to the core can be registered and shared amongst user groups. This could prevent duplication and enhance interoperability.

3.10 Special interests: extensions to the Dublin Core

The main topic of the eighth Workshop may be described as extensions for special interests. (Weibel and Koch 2000) As a result of this Workshop several developments arose, which will be discussed in this paragraph. These include application profiles and the creation of special interest groups.

3.10.1 Application profiles
In (Heery and Patel 2000) the idea of an application profile is discussed. An application profile is defined as a metadata scheme created for a specific purpose. It consists of data elements drawn from one or more standard metadata schemes (called 'namespaces'). A customized metadata scheme is created, but all its elements are defined elsewhere. The creation of standardized schemes is done by groups of people called 'Standard makers'. Examples would be the DCMI organization or the International DOI Federation. Another group of people is responsible for the use of metadata in a specific environment: the 'Implementers'.

By creating an application profile the Implementers are able to declare how they use standard schemes. An application profile may be shared with other groups with similar interests, and thus interoperability of metadata is enhanced. The following principles apply to application profiles:

  • Elements may be drawn from one or more existing namespaces. Only elements of existing metadata schemes are allowed. A namespace is defined as a complete metadata scheme.
  • No introduction of new data elements is allowed. If the Implementer needs to use elements that are not defined by any existing namespace, the Implementer must create - and maintain - a new namespace for these elements.
  • Permitted schemes and values may be specified. The application profile may state the data format to use, for instance for dates or names. A controlled vocabulary with allowable values may also be declared.
  • Standard definitions may be refined. The definition of metadata elements used in the namespaces may be refined. The application profile may not expand the definition, but only narrow it down.
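
The principles above can be sketched in Python as follows. This is only an illustration under stated assumptions: (Heery and Patel 2000) do not prescribe a machine-readable form, and the 'LOCAL' namespace, the element ShelfMark, the date pattern and the vocabulary are all hypothetical.

```python
import re

# Namespaces: each stands for a complete metadata scheme. "DC" holds a subset
# of the Dublin Core elements; "LOCAL" is a hypothetical namespace that the
# Implementer creates and maintains for elements not defined elsewhere.
NAMESPACES = {
    "DC": {"Title", "Creator", "Subject", "Date"},
    "LOCAL": {"ShelfMark"},
}

# The application profile: the elements in use, with optional restrictions
# that narrow down (never expand) what the namespace allows.
PROFILE = {
    ("DC", "Title"): {},
    ("DC", "Date"): {"pattern": r"\d{4}-\d{2}-\d{2}"},             # permitted format
    ("DC", "Subject"): {"vocabulary": {"Metadata", "Libraries"}},  # permitted values
    ("LOCAL", "ShelfMark"): {},
}


def undefined_elements(profile, namespaces):
    """Return the profile elements that no namespace defines (should be empty)."""
    return [(ns, el) for (ns, el) in profile if el not in namespaces.get(ns, set())]


def validate(record, profile):
    """Return a list of problems for a record of ((namespace, element), value) pairs."""
    problems = []
    for key, value in record:
        if key not in profile:
            problems.append(f"{key[0]}:{key[1]} is not part of the profile")
            continue
        rule = profile[key]
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            problems.append(f"{key[0]}:{key[1]} has an unexpected format: {value}")
        if "vocabulary" in rule and value not in rule["vocabulary"]:
            problems.append(f"{key[0]}:{key[1]} uses a value outside the vocabulary: {value}")
    return problems
```

The function undefined_elements checks the rule that no new data elements are introduced, while validate checks the permitted schemes and values declared by the profile.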


Like the Warwick Framework - discussed in the next chapter - this is a theoretical concept. It attempts to place the common practice of extending metadata schemes into a framework which is consistent with the principles guiding the Dublin Core. At this stage, it is not clear in what form this concept can be implemented.

3.10.2 Special interest groups
As the core element set was already fixed, the attention focused on ways to extend it to satisfy the needs of specialized groups of users. To support the extension, several working groups were created during the seventh and eighth Workshops. Among others, the subjects of these groups cover education, government, libraries, business and moving pictures. The work of these groups may lead to several extensions to the DC. In the future, these extensions will be stored in the DCMI Registry, which is still under construction. The concept of application profiles may provide a connection point.


4 The Warwick Framework and the Resource Description Framework

4.1 Introduction to the Warwick Framework

The concept of the Warwick Framework (WF) results from an analysis of the DC and other forms of metadata. It seeks to 'store' several sets of metadata, each separately accessible. The WF is not implemented in an automated system, but serves as a concept to explore the possibilities of such a system. This concept influenced the Resource Description Framework (RDF), developed by the World Wide Web Consortium (W3C) as an infrastructure for the use of structured metadata sets. In contrast to the WF, the RDF is an implementation built in XML. This chapter will describe the Warwick Framework concept and the concept and architecture of the Resource Description Framework.

The Warwick Framework concept is designed to place several metadata sets - independent of each other - in one place. The term for this place is a framework. This framework contains two kinds of objects: containers and packages. A package is defined as a metadata set, such as a complete DC description. A container is defined as a storage place for other containers or for packages. In paragraph 4.3, the architecture of the WF is discussed in more detail.

During the second DC Workshop - held in Warwick in the United Kingdom - the Dublin Core was evaluated, after a year of working with the element set. (Dempsey and Weibel 1996; Hakala, Husby et al. 1996; Lagoze 1996; Lagoze, Lynch et al. 1996) Although the concept of a simple metadata set was considered to be useful, several issues emerged. Firstly, the question arose whether the DC qualified as a true standard suitable for implementation in automated systems. Such a standard requires clearly defined rules, whereas the DC is very loosely defined. Secondly, the DC was kept simple to enable all authors to create a description of their content, but the ability of these authors to do so was questioned. Furthermore, the loose nature of the DC rules leaves room for inconsistent descriptions. An example of this may be found in (Banerjee 1999). The third issue concerns the purpose of the DC. It is aimed at content description, leaving aside administrative or operational metadata. Lastly, some elements of the DC - coverage, source and relation - may be considered too dependent on a certain domain. If these elements fit into the DC, other elements may also apply, and that may result in an extended instead of a core element set.

This resulted in three questions, as quoted from (Lagoze 1996):
 

'1. Should the number of elements in the Dublin Core be expanded or contracted? Some workshop attendees felt that in order for the Core to succeed as a tool for authors, its number of elements should be restricted to only the most basic descriptive elements. Others saw the need for new fields such as terms and conditions or administrator. 
2. Should the syntax of the Core be strictly defined or left unstructured? Many attendees wanted to avoid the painful syntax wars that are familiar to those who have participated in standards efforts. However, without a stricter definition of syntax, the Dublin Core does not provide the level of interoperability for which it was intended. 
3. Should the Core be targeted solely at the existing WWW architecture, or extend that architecture? There is a strong argument for specifying a metadata standard that can be implemented within the existing World Wide Web framework (browsers, servers, HTML specification, etc.). However, the Web is clearly not the model for the optimal information infrastructure, and many of its flaws are the subject of active discussion in the IETF, W3C, and other venues. Many of the Workshop attendees felt that it was important to describe a metadata framework that extends existing WWW technology and provides guidance on how that technology might evolve.' 


These questions were fundamental to the development of the DC element set. As can be read in the previous chapter, the path that was ultimately chosen lies between the extremes. Although the core set has expanded to fifteen elements, its size may still be considered manageable. If needed, local extensions are possible. The same holds true for the syntax rules. An extended syntax using qualifiers may be applied, but only if the Dumb-Down Principle is obeyed. Also, guidelines for the application of the DC in HTML exist, but the current version of the DC is still independent of HTML and XML.

4.2 General principles of metadata

At the time of the second Workshop, the answers were not clear, and the participants tried to find them by defining and examining general principles of metadata. The Warwick Framework is based on these principles. As can be seen from the following paragraphs, the concept of the WF is broader than the Dublin Core. The purpose of the DC is describing data for the goal of resource discovery, whereas the WF is an architecture in which different metadata sets - used for different goals - may be stored. The following principles were defined:
  • Metadata takes a variety of forms, both specialized and general. The DC element set could be described as a certain class of metadata, aimed at the discovery of information resources. Other metadata sets in this class are UNIMARC and MARC21 or the TEI header. These are far more complex than the DC, but still aim to describe a wide variety of sources. Furthermore, specific user groups use specialized formats, for example the Content Standard for Digital Geospatial Metadata (CSDGM) as used by the Federal Geographic Data Committee.
Apart from this class of metadata, other sets of metadata may be useful as well, such as terms and conditions for the use of resources; administrative data; content ratings, ranging from filters on violent or sexual content to descriptions of educational resources; provenance, describing the origin or source of the described object; and structural data, defining the components of complex objects.

The boundaries between the metadata classes are not very clear. The DC gives room to descriptions of relations to other sources and MARC records also serve as administrative data for libraries.

  • New metadata sets will develop as the networked information matures. In recent years information and communication technology changed quickly enabling new ways to publish and share information resources. New developments will create new possibilities requiring other metadata sets.
  • Different communities will propose, design, and be responsible for different types of metadata. Different user groups will create metadata sets suitable for their specific needs. While librarians are interested in describing information resources to make them accessible to their patrons, the owners of intellectual property rights of the same resources may be best served by descriptions of those rights. The notation and syntax of these metadata sets may vary greatly from text strings to executable programs.
  • There are many 'users' of metadata. Not only the creators of metadata differ greatly, but also the users. Some metadata will be used by automated agents or will consist of executable programs, while others are meant to be read by humans. Some data formats - like XML - may serve both goals. Although primarily useful for automated processing, 'raw' XML is readable for humans.
  • Metadata and data have similar behaviors and characteristics. There is no clear boundary between data and metadata. One could view metadata in a certain context as data. For example, one could use an annotated list of book titles as metadata for a description of a collection or as data for a bibliography. Furthermore, the metadata itself may contain descriptive data. For a more detailed discussion on this subject, see also (Guha 1996/1997a; Guha 1996/1997b).
  • The metadata sets associated with an object may be physically collocated or may be referenced indirectly. While the information resource may consist of one object or of several linked objects, the same holds true for the metadata sets 'attached' to it. The different objects do not necessarily reside in the same depot. Following this principle, one metadata set - or part of a metadata set - may be linked to several information objects. 


4.3 The Warwick Framework architecture

As described in the first paragraph of this chapter the WF aims to assemble separate metadata sets into one framework. It consists of two components: containers and packages. As explained in paragraph 4.1 a package is a complete description. Packages are stored in a container. The state of the container may be transient or persistent. A transient container is not stored as a file on a server, but exists only as a 'transport object between repositories, clients and agents'. (Lagoze, Lynch et al. 1996) A persistent container is stored on a server and is accessible through an identifier. The container itself may also be a part of a larger object, which contains both the data (the information resource) and the metadata (the container).

The packages are defined as independent objects, meaning that the different metadata sets are separated. The user of the container must be able to recognize the type of the package, and must be able to skip unknown - or unwanted - packages. For instance, one package may contain a complete UNIMARC description and another package may contain a DC description. If the user knows this, he or she has the possibility to use the UNIMARC description instead of the DC description. The packages may be encrypted, giving the creators of a package control over access to its contents. No definition is given for the format of the packages. It may consist of ASCII texts, executable code or any other suitable format.

Three types of packages are defined:

  • Metadata set. A package containing the actual metadata. This package must also contain data about what kind of metadata set is used. For instance, handling a MARC21 record - with its complex structure - is quite different from handling a DC description.
  • Indirect. The content of this package is a reference to another object, stored elsewhere on the network. The referenced information resource may have its own metadata, and different containers - describing different information resources - may share it.
  • Container. The package may be another container. No limits are defined to this recursion.
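
The recursive structure of containers and packages may be sketched in Python as follows. The WF itself defines no syntax or implementation, so the class names, fields and the function find_packages are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class MetadataSetPackage:
    """A package holding actual metadata, plus the kind of metadata set used."""
    set_type: str   # e.g. "DC" or "MARC21"
    content: str


@dataclass
class IndirectPackage:
    """A package that only refers to an object stored elsewhere on the network."""
    reference: str


@dataclass
class Container:
    """A storage place for packages; a container may itself act as a package."""
    packages: List["Package"] = field(default_factory=list)


Package = Union[MetadataSetPackage, IndirectPackage, Container]


def find_packages(container: Container, wanted_type: str) -> Iterator[MetadataSetPackage]:
    """Yield metadata sets of the wanted type, skipping all other packages.

    Nested containers are searched recursively, since the WF puts no limit
    on the depth of the recursion.
    """
    for package in container.packages:
        if isinstance(package, Container):
            yield from find_packages(package, wanted_type)
        elif isinstance(package, MetadataSetPackage) and package.set_type == wanted_type:
            yield package


# Illustrative content only; the values are placeholders.
inner = Container([MetadataSetPackage("DC", "Title=Guernica")])
outer = Container([
    MetadataSetPackage("MARC21", "placeholder for a MARC21 record"),
    IndirectPackage("http://www.example.org/terms-and-conditions"),
    inner,
])
```

A user interested only in DC descriptions can call find_packages(outer, "DC"); the MARC21 record and the indirect package are skipped, as the WF requires for unknown or unwanted packages.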


One of the assumptions of the DC is that the author of an information resource also creates the associated metadata. This may not be the case with the WF, because its creators describe two types of containers: internally referenced and externally referenced containers. The internally referenced container is associated with the information resource by the person or organization responsible for the resource. While an externally referenced container also describes the information resource, it is not the responsibility of the 'owner' of that resource. This container may even be unknown to the owner.

4.3.1 Unresolved issues
During the second Workshop several issues were discussed for which no resolution was found. The semantic interaction of overlapping metadata sets was considered to be fundamental to the WF. Several separate packages may contain data with roughly the same meaning, but in a different format. Consider a package with a UNIMARC record and a package with a DC description. Both packages aim to describe the contents of the information resource, but the format used is quite different. Extracting useful information by means of an automated agent becomes very complex since it has to 'understand' both formats. Adding a third package with descriptive information will only aggravate the situation. Furthermore, because a container may hold other containers, more or less equivalent metadata sets may be found on different levels in the Framework. The flexibility of the WF may lead to the creation of very complex - but useless - applications.

Other issues concerned the implementation of the WF. As described before, the metadata sets in all packages are of a certain type, such as a DC description, a TEI Header or a MARC21 record. These types should be registered and understood by the software used to process them. Data encoding is the second implementation issue discussed. Two types of syntax should be defined: one for the containers themselves, and one for the contents of the packages. Because of the variety of the metadata, different formats may be used. The following example may illustrate this: one package could contain a DC description in plain text, while the next package contains a MARC21 record. The complex formatting of that record requires more handling to make it useful, while plain text can be read directly. Another area of concern is efficiency. The complexity of the WF applications should not lead to low performance. Repository access is another issue. A protocol should be developed to enable the automated retrieval of the separate containers and packages.

4.3.2 Extending the Framework
In (Daniel and Lagoze 1997) the concept of the WF is further extended. This is done using a model called Distributed Active Relationships (DARs). As can be read from the following paragraphs the model bears important similarities with the Resource Description Framework (RDF). The model is based on four principles, partly described in paragraph 4.2 General principles of metadata, and partly new. The principles are:

  • There is no essential distinction between data and metadata. This was described earlier as 'Metadata and data have similar behaviors and characteristics'. What is considered to be metadata depends on the context, and one may speak of metadata only if a certain relationship between two information resources exists.
  • Resources can be related without regard for their location. This principle was earlier phrased as 'The metadata sets associated with an object may be physically collocated or may be referenced indirectly'.
  • There is no single 'about' relationship. Several different relationships exist between information resources. Again, this was phrased earlier in: 'New metadata sets will develop as the networked information matures' and 'Different communities will propose, design, and be responsible for different types of metadata'.
  • The computational power of the networked information environment makes it possible to consider active or dynamic relationships between data sets. The metadata required may not be stored in a file, but created 'on the fly' by an automated agent. It may also enable different levels of access to a resource, depending on predefined rules.

From a conceptual point of view, only the last principle may be considered to be new.

The model is an extension of the WF in two ways. One aspect is that the content of the information resource is considered to be another package, instead of something outside the Framework. The consequence is that access to the 'content package' may be governed by rules defined in another package. Here the 'computational power' comes into play. The second extension to the WF is that relationships between different information resources - stored in packages - are stored in one or more separate packages. The relationship is an information resource in itself. Again, this enables automated agents to process its contents. One of the relations defined could be terms and conditions for accessing a content package. Depending on the contents of the 'relationship package', access is granted or denied.

The Resource Description Framework has similar features. To describe an information resource, properties - with the same characteristics as relationships - may be defined, together with values for these properties. The RDF will be discussed in more detail in the following paragraphs. It is not a theoretical concept like the Warwick Framework, but is implemented in an application.

4.4 Introduction to the Resource Description Framework

The Resource Description Framework (RDF) is developed under the auspices of the World Wide Web Consortium (W3C). It is designed as an architecture to enable the sharing of metadata sets; although designed using contributions of several parties, it may be considered to be an implementation of the principles guiding the WF. (Bray 1998; Iannella 1999a; Miller, Eric 1998; Miller, Eric, Childress et al. 1999; W3C 2001; Weibel 1999) The RDF is an application built upon XML, and the following paragraphs will contain a short discussion of the XML architecture, and the principles and architecture of the RDF. 

4.4.1 A short description of XML
While the concepts of the DC and the WF are independent of a specific technical format, the RDF is not. It is an application using XML, and therefore a short - and incomplete - description of XML is included in this thesis. The eXtensible Markup Language (XML) is defined by the World Wide Web Consortium. Like HTML, it is derived from SGML. It is used to create information objects consisting of elements. The elements are encoded using tags and attributes. In contrast to HTML, tags are used only to define the structure of documents, and not the layout. While it is possible to define tags freely, XML also gives the opportunity to define strictly ruled applications such as the RDF. The tags function as 'containers' for digital data, regardless of the format of that data. Furthermore, the markup language contains a facility called 'namespace'. A namespace is a set of pre-defined elements, and possible attributes to these elements. (W3C 2000) In the RDF application, the different metadata standards are defined in such namespaces, called schemas.

This example of XML describes a 'collection' of two documents, an article from D-Lib Magazine, and a book on libraries in the 16th century.

<?xml version = "1.0"?>
<collection>
  <article>
   <title>The Dublin Core Metadata Initiative : Mission, Current
    Activities, and Future Directions</title>
   <year>2000</year>
   <magazine>D-Lib Magazine (6) 32</magazine>
   <keyword>Dublin Core</keyword>
   <keyword>DCMI</keyword>
   <keyword>Metadata</keyword>
  </article>
  <book> 
   <title>Bibliotheken en historie in de 16e eeuw</title>
   <author>door L. Brummel</author>
   <year>1967</year>
   <publisher>'s-Gravenhage : Nijhoff</publisher>
   <page>19 p</page>
   <annotation> Afscheidscollege Amsterdam G.U.</annotation>
   <keyword>Libraries</keyword>
   <keyword>16th Century</keyword>
   <keyword>Brummel</keyword>
   <signature>(SISO) 022.3</signature>
   <discovered_in>Picarta</discovered_in>
   <discovered_in>NCC</discovered_in>
   <contained_by>Openbare Bibliotheek Amsterdam</contained_by>
   <contained_by>Universiteit van Amsterdam</contained_by>
  </book>
</collection>


A much larger collection is found at the Koninklijke Bibliotheek, the national library of the Netherlands. Its catalogue contains about 2,000,000 XML records that are directly available on the Internet. (KB 2001)
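
Because the tags in the small 'collection' example only define structure, such a document can be processed by generic software. The following sketch uses the ElementTree module of the Python standard library on a shortened copy of that example; the function name summarize and the shortening of the records are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the 'collection' example from this paragraph.
COLLECTION_XML = """<?xml version="1.0"?>
<collection>
  <article>
   <title>The Dublin Core Metadata Initiative</title>
   <year>2000</year>
   <keyword>Dublin Core</keyword>
   <keyword>DCMI</keyword>
  </article>
  <book>
   <title>Bibliotheken en historie in de 16e eeuw</title>
   <year>1967</year>
   <keyword>Libraries</keyword>
   <keyword>16th Century</keyword>
  </book>
</collection>"""


def summarize(root):
    """Return (document type, title, keywords) for every item in the collection."""
    return [
        (item.tag, item.findtext("title"), [k.text for k in item.findall("keyword")])
        for item in root
    ]


root = ET.fromstring(COLLECTION_XML)
summary = summarize(root)
```

The program needs no knowledge of the meaning of the tags. It only relies on the structure: a collection contains items, and every item has one title and any number of keywords.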

4.4.2 Principles of the Resource Description Framework 
In contrast to the WF, the Resource Description Framework is implemented using the markup language XML. The RDF aims to describe resources, using a collection of properties. A resource is defined as any object that can be identified by a Uniform Resource Identifier (URI). A URI may be described as a 'network-address' of an information resource, so the RDF may be used for any object found on a network - e.g. the Internet - as long as it has its own address. A collection of properties is called an RDF description. Every property is defined by a property type and has values. These values may either be atomic values or other resources. Atomic values consist of one 'piece of information', like a text string or a row of numbers. If the value is another resource, then that resource may also have its own set of properties. A set of properties - for example the elements of the DC - is called a schema. The RDF is illustrated in Figure 1.


Figure 1 RDF description

The RDF description contains Resource 1. Resource 1 has several properties of different types. Some of the property values are atomic, while one other is a resource in itself, with two other properties. Figure 2 contains an example, with the - fictional - website www.ronaldsnijder.nl.


Figure 2 Example of an RDF Description: Website www.ronaldsnijder.nl

As can be seen, several properties of the website are described. The value of the 'author property' consists of another resource, with two properties. In this example, the property types are more or less randomly chosen. In contrast to this example, the diverse metadata sets use standardized elements instead of randomly chosen elements. The elements are the properties in the RDF model; the contents of the elements are the values of the RDF properties. Every metadata set may be defined in an information resource accessible through the Internet. This information resource contains the property types that are allowed for use with that particular metadata set. There is no central depot for storing these resources, nor is there one central responsible organization. Everyone may define and manage a schema, as long as it is accessible via a networking address. Furthermore, an RDF description may use several schemas, as is illustrated in Figure 3.


Figure 3 Using schemas

The resource description in Figure 3 uses several schemas. The title and author of the website are described using the Dublin Core elements. Furthermore, an element of the Australian Government Locator Service (AGLS 2001) is used to describe the subject. More information about the author is described using the vCard metadata element set. (Iannella 1999b) In this case, the information on the author is stored in the - non-existing - website www.allauthorsoftheworld.org.

The RDF description of the website would look like this:

<?xml version="1.0"?>
<RDF xmlns = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
 xmlns:DC = "http://dublincore.org/#"
 xmlns:AGLS = "http://naa.gov.au/AGLS#">

  <Description about = "http://www.ronaldsnijder.nl">
   <DC:Title>Metadata standards and information analysis</DC:Title>
   <DC:Creator HREF =
       "http://www.allauthorsoftheworld.org/ronaldsnijder#"/>
   <AGLS:Function>Information Management - Internet</AGLS:Function>
  </Description>

</RDF>


In the example above, the information on the Creator can be found using a link to another website. Title and Function are both atomic values, defined by the DC and AGLS respectively. There is no need for different RDF resources to reside in separate files; all RDF information may also be stored in a single file.
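
How an automated agent might read such a description can be sketched in a few lines of Python. This is only an illustration, not part of the RDF specification: the function name read_description is hypothetical, the namespace addresses are copied from the example above, and the XML declaration is left out of the embedded string for simplicity. The sketch treats a property carrying an HREF attribute as a reference to another resource and every other property as an atomic value.

```python
import xml.etree.ElementTree as ET

# Namespace addresses copied from the example description.
RDF_NS = "http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
DC_NS = "http://dublincore.org/#"
AGLS_NS = "http://naa.gov.au/AGLS#"

DOCUMENT = """<RDF xmlns="http://w3.org/TR/1999/PR-rdf-syntax-19990105#"
     xmlns:DC="http://dublincore.org/#"
     xmlns:AGLS="http://naa.gov.au/AGLS#">
  <Description about="http://www.ronaldsnijder.nl">
    <DC:Title>Metadata standards and information analysis</DC:Title>
    <DC:Creator HREF="http://www.allauthorsoftheworld.org/ronaldsnijder#"/>
    <AGLS:Function>Information Management - Internet</AGLS:Function>
  </Description>
</RDF>"""


def read_description(xml_text):
    """Return the described resource and a list of
    (schema, property type, kind of value, value) tuples."""
    root = ET.fromstring(xml_text)
    description = root.find(f"{{{RDF_NS}}}Description")
    properties = []
    for prop in description:
        # ElementTree writes qualified names as "{namespace}local-name".
        schema, _, name = prop.tag[1:].partition("}")
        if prop.get("HREF") is not None:
            properties.append((schema, name, "resource", prop.get("HREF")))
        else:
            properties.append((schema, name, "atomic", (prop.text or "").strip()))
    return description.get("about"), properties


resource, properties = read_description(DOCUMENT)
print(resource)
for entry in properties:
    print(entry)
```

Because every property carries the address of its schema, the agent can tell a DC Title from an equally named element of another schema without any prior agreement on element names.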

The RDF model puts certain limitations on the definition of schemas, since all schemas must comply with the XML syntax rules. This is not necessarily a bad thing. Using a technical standard enhances the interoperability of data. Furthermore, the definition of XML - and of the RDF - is maintained by a public organization - the W3C - and not by a software vendor. All technical documentation is publicly accessible, enabling a widespread implementation. No software vendor stands to gain from creating proprietary standards, as happened with the web browsers Internet Explorer and Netscape Navigator.

4.4.3 RDF as a solution for the unresolved WF issues?
In paragraph 4.3.1 several unresolved issues concerning the WF were described. The RDF model has several features that may overcome these issues. Semantic overlap is seen as the most fundamental problem facing the WF. This may be partly overcome. The diverse schemas - or packets - are described in separate files, using standardized syntax rules. By following the syntax rules, an automated agent should be able to process the contents of the RDF description. The example used in paragraph 4.3.1 Unresolved issues consisted of a UNIMARC record and a DC description. If each is defined by its 'own' schema, an automated agent may find the same types of data - for instance: the title - using the correct properties.

Semantic overlap cannot be solved completely by applying syntax rules. Like the WF, the RDF enables the use of diverse metadata sets on different levels. All properties may consist of other resources, creating the diverse levels of the description. Again, this flexibility leads to the same disadvantage as the WF. Packages containing more or less equivalent metadata sets may still reside on different levels.

Some other implementation issues may be resolved. One of the problems concerned the registration of the separate metadata packages. Every set should be described, and its type should be recognizable. The definition of the schemas solves this. The organization responsible for the set maintains the schema, including its type. While the contents of the schema - the properties - are freely definable, the syntax rules enforce a standardized way to share information. This also solves the data-encoding problem: the 'containers' use the XML definition. Furthermore, the contents of these containers may be of different formats. The last issues concern repository access and efficiency. Access to the separate resources is implied in the definition of RDF, because a resource always has a URI. No information on the efficiency of RDF implementations was found.


5 The Digital Object Identifier (DOI)

5.1 Introduction to the Digital Object Identifier

Whereas the goal of the Dublin Core may simply be described as 'resource discovery', the ultimate goal of the Digital Object Identifier (DOI) is 'rights management'. While the DOI also uses metadata elements, it is not primarily designed for finding resources, but to identify abstract concepts or tangible objects. The DOI project started in 1996, as a reaction by the - commercial - publishers to the challenges presented by E-commerce. (Erickson 1998; Erickson 1999; Morris 1999; Paskin 1999; Paskin 2001a; Paskin 2001b; Rust 1998) In the era before the popularity of the Internet, the description of resources was considered to be a task for the 'library world' and managing intellectual property rights was a task for the 'publishing world'. The possibility to publish information online changed that. Information - or 'content' - no longer has to reside on a physical carrier, but can be downloaded from a network. As a consequence, finding and buying a resource may be combined in one action.

Another consequence is that all information resources need to be uniquely identified. If an article is published in a journal, the combined data of the journal (title, year, issue number, ISSN) and the article itself (title, author, page numbers) are sufficient to identify and retrieve the article. When the same article is only available as a resource on a network, other means of identification are necessary. Furthermore, the holders of the intellectual property rights on that resource grant the use of the resource. When different persons or organizations own different rights on the same resource, administrating these rights becomes a complex matter. To further complicate matters, different users may be granted different uses of the same resource. As an example, a distinction could be made between educational uses - by scholars and students - and commercial use of scientific information resources. This type of administration can only be achieved by automation and standardization of data. The DOI aims to support this. 

Identification of information resources is not new. Before defining the DOI, Paskin discusses two 'identification schemes': the ISBN and the URL. (Paskin 2001b) He finds both to be insufficient. The ISBN does not identify an information resource, but a physical object - hard covers and paperbacks often have different ISBNs, while the text is the same - and its use for information resources smaller than books is problematic. URLs do not identify the content of the information resource, but its location. A radical change of the contents may not lead to a new URL, while a change of location - which does not necessarily mean a change of content - does. 

The DOI is described as a 'persistent identifier of intellectual property entities'. An entity is defined as something that is identified ranging from an abstract notion to a tangible object. Intellectual property can be defined as 'creations of the mind: inventions, literary and artistic works, and symbols, names, images, and designs used in commerce'. (WIPO 2001) This means that the DOI can be used to identify anything to which intellectual property rights apply, ranging from ideas to a small piece of a musical recording. In the literature, the term granularity is used to describe the different levels or parts of an information resource. As can be seen from the examples above, the DOI may be used on every level of granularity, depending on the requirements of the person or organization using it. The identification is not limited to entities owned by persons or organizations. It may also be applied to entities in the public domain. Furthermore, the identifier is persistent, meaning that a change of ownership of the entity does not change it.

Contrary to what one may expect, the DOI is not an application for the management of rights, but it may act as a component of such a system. The DOI is used to create unique identifiers. A comparison with the ISBN may clarify this: assigning a unique number to a book is extremely useful, but publishing and selling books encompasses more than that. Changes in ownership or other rights concerning an entity are not administered within the DOI description itself. Of course, managing intellectual property rights concerning an entity depends on a good identification of that entity. For instance, the original text of the Odyssey is part of the public domain, while a recent Dutch translation is not. To distinguish between those two entities, a form of identification is necessary.

Furthermore, some metadata is an integral part of the DOI description. Therefore, the creation of metadata is not exclusively done by libraries or other information brokers, but it becomes a task of the entity's publisher or creator. While the Dublin Core also permits authors to create metadata for their own creations, its guidelines are very loose. The metadata used in the DOI description must follow strict rules, defined in a DOI Application Profile.

The identification of entities is done by the use of a unique identifier, combined with a minimal set of metadata elements. Both the structure of the identifier and the metadata elements will be discussed in the next paragraph. As is the case with the Dublin Core, the DOI and its metadata are governed by certain principles. Instead of discovering the principles themselves, the creators of the DOI made the pragmatic choice to use guidelines defined by others. The identification of resources is guided by the requirements for Uniform Resource Names (URN), which can be found in (Sollins and Masinter 1994):

  • Global scope. The name is not bound to a location; its meaning is the same everywhere.
  • Uniqueness. The same name may not be given to two different resources.
  • Persistence. The name must remain the same indefinitely.
  • Scalability. The name can be assigned to any possible resource.
  • Legacy support. The name must be able to support legacy naming conventions, if the other requirements can be applied to those conventions.
  • Extensibility. It must be possible to expand the naming scheme.
  • Independence. The organization responsible for the names must be totally independent.


The creation of the metadata scheme was guided by another set of principles, derived from the INDECS project. This project will be described in more detail in chapter 6. As can be seen from chapter 3 the INDECS project also influenced the development of the Dublin Core. The principles identified are:

  • Unique identification. Every entity must be uniquely defined within an identified namespace. This may be considered the core of the DOI. Every entity is given a unique number, within the DOI identification scheme (the 'identified namespace').
  • Functional granularity. Entities are only defined if a need to do so arises. The following example may illustrate this. In popular music, small pieces of other songs (samples) are used. Because this has consequences for the rights owners of the sampled song, the sample must be identified, while normally only the complete song would be identified.
  • Designated authority. The creator of the metadata must be identified without doubts about its identity.
  • Application independence. The metadata scheme should be independent of the technology used.
  • Appropriate access. Everyone must have access to the metadata needed. The consequence of this principle is that not all metadata must be accessible to everyone. In certain circumstances, some metadata would be inaccessible for certain users.
The defined metadata scheme will be described in more detail in paragraph 5.2.2. Description: the metadata used in the DOI.

5.2 The DOI System

To achieve its goals, the identification scheme is embedded in the DOI System. This system consists of several components, which will be discussed in the next paragraphs:

  • Enumeration
  • Description
  • Resolution
The use of the DOI system is governed by policies, as defined by the International DOI Federation (IDF), which is responsible for the management of the DOI. Apart from policies governing the relation between the IDF and other organizations, some policies concern the use of the DOI and its metadata. While finding the metadata of the DOI is a function of the DOI System, the reverse - from metadata to a DOI - is not. This function is left to other service providing organizations.

The DOI uses the Handle System. Originally intended for digital libraries, the Handle System is designed as a 'naming service' for obtaining digital objects. On the Internet, the Domain Name System (DNS) is used for a similar goal. The DNS maps the domain name used in a browser address (URL) - like www.loc.gov - to an Internet Protocol (IP) address, a number usually written as four groups of up to three digits, which is used to find the correct website. Using a URL containing more information than just the domain - such as www.loc.gov/copyright/circs/circ1.html - enables obtaining a specific information resource. If the resource were moved, the URL would point to the wrong location, making it unusable. The Handle System does not point directly to a web address, but it uses a unique name to which one or more network addresses are attached. Therefore, if the location of the resource is changed, the name of the resource remains the same. (Lannom 1999)

5.2.1 Enumeration: constructing unique identifiers
One of the cornerstones of the DOI is the assignment of unique identifiers. All identifiers are created using a defined structure. The DOI is not intended to change, and it is meant to be an 'opaque string' (dumb number). No information about the resource or the - current - owner of the resource should be derived from the issued identification. The DOI syntax is a NISO standard (note 3).

The DOI consists of a prefix and a suffix, separated by a forward slash. It is not limited to numbers, but may contain all printable characters from Unicode v2.0 (note 4). There is no limitation on the length of either the prefix or the suffix. All prefixes start with the string '10.' to distinguish DOIs from other implementations of the Handle System. All other characters in the prefix make up the identification of the organization - called a Registrant - registering the entities. If desirable, the organization may apply for several prefixes. The Registrant creates the DOI suffix, which may be any string of characters desired. This enables the Registrant to use already applied identification codes, for instance an ISBN.

Following are examples of valid DOIs:

  • 10.100X/12345
  • 10.100X/ISBN-0-684-84815-5
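
The structure described above can be checked with a small Python sketch. It is not taken from the NISO standard; the function name split_doi is hypothetical, and splitting at the first forward slash (so that a suffix may itself contain slashes) is an assumption of this sketch.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first forward slash.

    Checks only the rules mentioned in the text: a '10.' prefix followed by
    a Registrant code, a non-empty suffix, and printable characters.
    """
    prefix, slash, suffix = doi.partition("/")
    if not slash:
        raise ValueError("a DOI needs a prefix and a suffix separated by '/'")
    if not prefix.startswith("10.") or len(prefix) == len("10."):
        raise ValueError("the prefix must be '10.' followed by a Registrant code")
    if not suffix:
        raise ValueError("the suffix must not be empty")
    if not doi.isprintable():
        raise ValueError("only printable characters are allowed")
    return prefix, suffix


print(split_doi("10.100X/12345"))
print(split_doi("10.100X/ISBN-0-684-84815-5"))
```

Note that the sketch deliberately does not interpret the suffix: although the second example embeds an ISBN, the DOI as a whole remains an opaque string.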


5.2.2 Description: the metadata used in the DOI
The DOI itself does not reveal anything about the entity, which makes related metadata essential. Furthermore, the DOI may identify an - unrecorded - performance or an abstract 'work', making direct inspection of its contents impossible. To overcome this, a small 'metadata kernel' is defined to be used in conjunction with the identifier. It is designed to be as limited in scope as possible, and it should be applicable to any entity identifiable by the DOI. Similar to the Dublin Core, the metadata kernel is meant to be the minimum set of elements, applicable to all entities. Also, the DOI metadata scheme is expandable, using an Application Profile.

The DOI Application Profile (DOI-AP) is defined as 'the functional specification of an application (or set of applications) of the DOI System to a class of intellectual property entities that share a common set of attributes.' (Paskin 2001b) As can be seen, the DOI-AP acts as an addition to the metadata kernel. As with the DC, the additions can be used for metadata that is specific to a certain type of entity. An example would be the number of pages, which is a necessity for paper-based information resources, but is quite useless for an MP3 music file. Maintenance of the DOI Application Profiles is done by one or more organizations responsible for the DOI. DOI Application Profiles may overlap other DOI Application Profiles, and may be as narrow or as broad as is required by the users. Every DOI should be assigned to at least one DOI-AP.

Examples of possible DOI Application Profiles:

  • Academic journal articles
  • MP3 music files
  • E-books
  • Biomedical photographs


The following table lists the DOI metadata kernel, as copied from (Paskin 2001b):
 
 
Element: DOI
  • Definition: A DOI
  • Status: Mandatory
  • Number: 1 only
  • Allowed values: DOI
  • Possible genre qualifications: -

Element: DOI-AP
  • Definition: A class of entities with common attributes
  • Status: Mandatory
  • Number: 1 minimum
  • Allowed values: From DOI-AP tables
  • Possible genre qualifications: -

Element: Identifier
  • Definition: A unique identifier (e.g. from a legacy scheme) applied to the entity
  • Status: Qualified by AP
  • Number: 1 minimum
  • Allowed values: Any alphanumeric string but when present must include an identifier type, e.g. ISBN.
  • Possible genre qualifications: Define in application: it is normal to include a legacy identifier if one exists.

Element: Title
  • Definition: A name by which the entity is known
  • Status: Mandatory
  • Number: 1 minimum
  • Allowed values: Any alphanumeric string
  • Possible genre qualifications: Define in application; a value of "untitled" may be allowable in certain APs.

Element: Type
  • Definition: The primary structural type of the entity
  • Status: Mandatory
  • Number: 1 only
  • Allowed values: From: Abstraction; Tangible Manifestation; Intangible Manifestation; Performance
  • Possible genre qualifications: -

Element: Mode
  • Definition: The primary sensory mode by which the entity is intended to be perceived
  • Status: Mandatory
  • Number: 1 minimum
  • Allowed values: From: Visual; Audio; Audio+Visual; Abstract
  • Possible genre qualifications: Define in application; a value of "unknown" may be allowable in certain APs.

Element: Primary agent
  • Definition: The name or identifier of the primary agent(s) (normally but not necessarily the creator).
  • Status: Mandatory
  • Number: All primary agents. (1 minimum, but all entities fulfilling the same agent role must be included.)
  • Allowed values: Identifier or Name from an agreed namespace
  • Possible genre qualifications: The specification of the Primary Agent for any AP is determined by the DOI-AP rules.

Element: Agent role
  • Definition: The role(s) played by the primary agent(s)
  • Status: Mandatory
  • Number: 1 minimum
  • Allowed values: Role code from an agreed namespace
  • Possible genre qualifications: -

Apart from the metadata elements, administrative data such as the Registrant, the date of registration and the record version number must also be kept in a DOI registration. Metadata on intellectual property rights concerning the entities are not a part of the metadata kernel. This is done deliberately. While intellectual property rights may change on a regular basis, the data in the kernel is intended to be static.
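
The cardinality and value rules of the kernel listed above lend themselves to a simple automated check. The following Python sketch is an illustration only: the names KernelRecord and kernel_problems are hypothetical, the sample values are invented, and the sketch ignores the allowances an Application Profile may make (such as a Mode of "unknown") as well as the administrative data.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed values as listed for the Type and Mode elements of the kernel.
ALLOWED_TYPES = {"Abstraction", "Tangible Manifestation",
                 "Intangible Manifestation", "Performance"}
ALLOWED_MODES = {"Visual", "Audio", "Audio+Visual", "Abstract"}


@dataclass
class KernelRecord:
    doi: str                    # exactly one
    doi_aps: List[str]          # at least one
    titles: List[str]           # at least one
    structural_type: str        # exactly one, from ALLOWED_TYPES
    modes: List[str]            # at least one, from ALLOWED_MODES
    primary_agents: List[str]   # at least one
    agent_roles: List[str]      # at least one
    identifiers: List[str] = field(default_factory=list)  # requirement set by the AP


def kernel_problems(record):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not record.doi.startswith("10."):
        problems.append("DOI: must start with '10.'")
    required = {
        "DOI-AP": record.doi_aps,
        "Title": record.titles,
        "Mode": record.modes,
        "Primary agent": record.primary_agents,
        "Agent role": record.agent_roles,
    }
    for element, values in required.items():
        if not values:
            problems.append(f"{element}: at least one value is required")
    if record.structural_type not in ALLOWED_TYPES:
        problems.append(f"Type: '{record.structural_type}' is not an allowed value")
    for mode in record.modes:
        if mode not in ALLOWED_MODES:
            problems.append(f"Mode: '{mode}' is not an allowed value")
    return problems


# Invented sample record, for illustration only.
article = KernelRecord(
    doi="10.100X/12345",
    doi_aps=["Academic journal articles"],
    titles=["An example article"],
    structural_type="Intangible Manifestation",
    modes=["Visual"],
    primary_agents=["A. Author"],
    agent_roles=["Author"],
)
print(kernel_problems(article))
```

Since the kernel is intended to be static, such a check would typically be run once, at registration time.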

5.2.3 Resolution: obtaining information resources
Resolution is defined here as submitting an identifier to a network service and receiving information about the identifier. An example is typing a text string like www.loc.gov in a browser, which is resolved to a numeric Internet Protocol address, used to find the website of the Library of Congress. This text string - the URL - points to an address on the Internet, while the DOI is a unique identifier - a name - of a resource. Through the Handle System, this name is linked to a network address. Therefore, if the location of the resource changes, the DOI remains the same. Only the linked address changes.

On a network like the Internet, several copies of the same information resource may exist. For example, several websites have 'mirror sites' to balance the traffic created by the visitors of the site. Also, the same entity may exist in several data formats. An example would be an article, available as an HTML file, a PDF file and as an MS Word document. The DOI System is able to automatically resolve a DOI to a desired address, depending on rules created by the rights owner of the entity. The address may be a URL or other network address, but also another DOI. This creates the possibility to use a DOI of an abstract work - for instance: The Odyssey - and resolve to a specific document, such as a Dutch translation of the work. The Registrant is responsible for the maintenance of the addresses. All DOIs - and the addresses attached to them - are registered in a central repository.
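
The principle of resolution can be sketched as follows. This Python fragment is a toy model and not the Handle System protocol: the class name Registry, the labels 'html' and 'pdf', the 'doi:' marker for a DOI-to-DOI link and all addresses are assumptions made for the illustration. The single rule implemented here is: use the requested format if it is registered, otherwise the first registered address.

```python
class Registry:
    """Toy central repository: each DOI maps to one or more labelled addresses."""

    def __init__(self):
        self._addresses = {}

    def register(self, doi, addresses):
        # addresses: dict of label -> URL, or label -> "doi:<another DOI>"
        if not addresses:
            raise ValueError("at least one address is required")
        self._addresses[doi] = dict(addresses)

    def resolve(self, doi, prefer=None, max_hops=5):
        """Follow DOI-to-DOI links until a network address is found."""
        for _ in range(max_hops):
            addresses = self._addresses[doi]
            first = next(iter(addresses.values()))
            target = addresses.get(prefer, first)
            if not target.startswith("doi:"):
                return target
            doi = target[len("doi:"):]
        raise LookupError("too many DOI-to-DOI hops")


registry = Registry()
# An abstract work that resolves to a specific translation (hypothetical DOIs).
registry.register("10.100X/odyssey", {"default": "doi:10.100X/odyssey-nl"})
registry.register("10.100X/odyssey-nl", {
    "html": "http://example.org/odyssey-nl.html",
    "pdf": "http://example.org/odyssey-nl.pdf",
})
print(registry.resolve("10.100X/odyssey", prefer="pdf"))

# The resource moves: only the attached address changes, the DOI stays the same.
registry.register("10.100X/odyssey-nl", {"html": "http://example.net/moved/odyssey-nl.html"})
print(registry.resolve("10.100X/odyssey"))
```

The second call shows why the name is persistent: references to 10.100X/odyssey keep working after the move, because only the Registrant's entry in the repository was updated.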



 
6 The Functional Requirements for Bibliographic Records (FRBR) and the INDECS Metadata Framework

6.1 Introduction to the Functional Requirements for Bibliographic Records (FRBR)

The concepts of both the Dublin Core and the Digital Object Identifier were created as a reaction to the challenges and new opportunities imposed by the digitalization of information resources. The same may also be said of the Functional Requirements. While the new technology enabled the creation of large databases containing bibliographic information, it also brought new forms of information resources, which may be accessed through a network. These changes - combined with an increased need to reduce costs and a growth of published information resources - were the main reasons behind a study undertaken by an IFLA Study Group in 1992 to accomplish the following: 'to produce a framework that would provide a clear, precisely stated, and commonly shared understanding of what it is that the bibliographic record aims to provide information about, and what it is that we expect the record to achieve in terms of answering user needs.' (IFLA 1998)

The Study Group was to find a 'minimal level' of cataloguing, reducing the effort of national bibliographic agencies to create a new catalogue record. On the other hand, the record should also meet all the essential user needs. By defining such a level, a newly created catalogue record could contain less information than a 'full level record', but would still be useful for the users of the record. The study resulted in the Functional Requirements for Bibliographic Records, published in 1998.

The FRBR aims to provide a framework that identifies the objects of interest to the user of bibliographic data. In the IFLA Study the term used for these objects is 'entities'. Here the term has a slightly different meaning than defined by the DOI. While the DOI uses the term only for notions or objects to which intellectual property rights apply, the entities in the FRBR model also encompass the creators and the possible subjects of those notions and objects. Three groups of entities are defined: firstly, products of intellectual or artistic endeavor; the second group encompasses the persons or groups responsible for the products of the first group; and thirdly the subjects of the products. The INDECS metadata model - also discussed in this chapter - uses a similar definition for the entities of the first group. In the following paragraphs these entities are discussed in more detail.

The framework does not only encompass entities, but also the attributes - or properties - of the entities and the relations that exist between the entities. The attributes and relations are recorded in the bibliographic record. When a user consults a record to perform a task, certain attributes and relations are used. The study aims to uncover the relations between the user tasks and the recorded information. For instance, if the user attempts to identify a work, one of the attributes to use is the title of the work. While the user tasks will be discussed in more detail, not all defined relations will be discussed.

The FRBR aims to cover all varieties of materials. This includes textual, musical, cartographic, audio-visual, graphic and three-dimensional materials, physical and digital media formats etc. What the model does not cover are the attributes and relationships reflected in authority records. The entities that are recorded in authority records - such as persons, organizations and concepts - are defined, but the additional data required for the use in authority records is not analyzed. While this may be considered to be an important extension of the model, the focus of the IFLA study and time constraints prohibited this.

6.2 Entities of the FRBR

The FRBR defines several entities - divided in three groups - which represent the key objects of interest to users of bibliographic data. While using the phrase 'aggregate and component entities', the model also recognizes granularity. An entity may consist of several other entities, or an entity may be part of a larger entity. The same holds true for the DOI, which aims to identify and describe any concept or object or parts of it. Within the Dublin Core the notion of granularity is more implicit. The 1:1 Principle states that different information resources should be described separately and linked together using the Relation element. To describe smaller parts of an information resource, the recommended qualifiers for this element (DCMI 2000) can be used.

6.2.1 Work, Expression, Manifestation and Item
The first group lists the entities concerning the products of intellectual or artistic endeavor:

  • Work
  • Expression
  • Manifestation
  • Item


A Work is defined as a distinct intellectual or artistic creation. It is an abstract notion, which is recognized by one or more realizations of the Work, called Expressions. For instance, a user may be interested in Hamlet by William Shakespeare. While several versions and translations of Hamlet exist, they are all realizations of the same Work. The same holds true for a musical Work: the song Come Together was not only performed by the Beatles, but also by various other artists. Yet the same Work may be recognized. The definition of a Work enables the linking of several Expressions - like translations or several versions - to the same Work.

An Expression is defined as an intellectual or artistic realization of a Work in any form of notation, sound, image, object, etc. Aspects that are not integral to the realization of the Work are not considered to be a part of an Expression. An example of this is the typeface and layout of a text. If inherent aspects of the Expression are changed, a new Expression is created. Examples of this are the transition from spoken word to a written text or the translation from one language into another. Also, if a Work is revised or updated - which is quite common in a digital environment - the result is considered to be a new Expression. Using this definition, different versions of the same Work may be identified.

A Manifestation is defined as the physical embodiment of an Expression. All objects with the same physical form and intellectual content are considered to be the same Manifestation. A change of the physical form results in a new Manifestation. An example would be an HTML file, a PDF file and an MS Word document all containing the same text: these are three Manifestations of one Expression. This definition enables the selection of a Manifestation with the required physical aspects.

The Item is defined as a single exemplar of a Manifestation. If several copies of a Manifestation exist, the Item enables the identification of a single object.
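
The four entities of the first group form a hierarchy, which may be sketched as a simple data structure. The Python fragment below is an illustration and not part of the FRBR: it models only the 'realized through', 'embodied in' and 'exemplified by' chain, the attribute names are hypothetical, and the sample data (formats and copy numbers) are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    identifier: str                 # a single exemplar, e.g. one copy


@dataclass
class Manifestation:
    physical_form: str              # e.g. "HTML file", "PDF file"
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str                # e.g. "English text", "Dutch translation"
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


def forms_available(work):
    """All physical forms in which any Expression of the Work is embodied."""
    return sorted({m.physical_form
                   for e in work.expressions
                   for m in e.manifestations})


def count_items(work):
    """Number of single exemplars that ultimately belong to the Work."""
    return sum(len(m.items)
               for e in work.expressions
               for m in e.manifestations)


hamlet = Work("Hamlet")
english = Expression("English text")
dutch = Expression("Dutch translation")
english.manifestations.append(Manifestation("HTML file", [Item("copy-1"), Item("copy-2")]))
english.manifestations.append(Manifestation("PDF file", [Item("copy-3")]))
dutch.manifestations.append(Manifestation("PDF file", [Item("copy-4")]))
hamlet.expressions.extend([english, dutch])

print(forms_available(hamlet))
print(count_items(hamlet))
```

Both Expressions are linked to the same Work, so a user interested in Hamlet finds the translation as well as the original, and can then select a Manifestation by its physical form.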

6.2.2 Person and Corporate body
The second group lists the entities responsible for the content, production and dissemination, or the custodianship of the entities in the first group:

  • Person
  • Corporate body


A Person is defined as an individual - living or deceased - that is in some way responsible for the creation or realization of a Work or is the subject of a Work. A Corporate body is defined as an organization or a named group of people, which is in some way responsible for the creation or realization of a Work or which is the subject of a Work. This enables the identification of the same Person or Corporate body in a consistent manner.

6.2.3 Concept, Object, Event and Place
The third group lists the entities, which may serve as a subject of works:

  • Concept
  • Object
  • Event
  • Place


A Concept is defined as an abstract notion or idea. An Object is defined as a material thing. An Event is defined as an action or occurrence. A Place is defined as a location. This enables the identification of the same entities in a consistent manner, and the relations between them and a Work. The entities described in the first and the second group may also serve as subjects of a Work.

6.3 FRBR User tasks

The Study Group defined four user tasks. These tasks describe the type of usage that is made of bibliographic data. By defining these tasks, the model provides the possibility to map the attributes and relationships of the defined entities to the supported user task. The following tasks were defined:

  • To find entities that correspond to the user's stated search. If the user conducts a search in a set of bibliographic data - using attributes or relationships of entities - he or she should retrieve one or more entities. One important attribute of a Work is the title, and the user should be able to find it by searching on the title. Probably the most important relationship of a Work is the responsibility for it (e.g. who created the Work). A search in bibliographic data using the author should result in the correct Work.
  • To identify an entity. The bibliographic record should confirm that the described entity is indeed the one sought. The record should also distinguish between several entities with the same characteristics. One could think of several editions of the same book.
  • To select an entity that is appropriate to the user's needs. The user should be able to select - or reject - an entity, based on content, physical format etc. If one has found two texts - one in the form of an HTML file and one in the Adobe Acrobat format - the user should be able to choose the preferred format. Furthermore, a user may choose to obtain a translation of the novel War and Peace instead of a version in Russian.
  • To acquire or obtain access to the entity described. The user has the ability to purchase, download or borrow an entity.


The next paragraph will discuss the relations between the Dublin Core and the DOI metadata standards, and the FRBR.

6.4 The FRBR as basis for the Dublin Core and the DOI?

While the Functional Requirements primarily aim to describe the minimal functionality of records in national bibliographies, its concept of the entities framework and the user tasks may also be applied to the Dublin Core and the DOI. Both metadata standards should enable the user to find, identify and use information and documents in the broadest sense. The framework of the FRBR describes the 'information resources' of the Dublin Core and the 'entities' of the DOI. The user tasks describe the uses made of them. 

In the following paragraphs the support of the entities framework and the user tasks is discussed. The level of support is discussed using the basic requirements for national bibliographic records as defined in the FRBR (Chapter 7 - Basic Requirements for National Bibliographic Records). These requirements describe the minimal functionality of a set of bibliographic data. The user should be able to do the following:

  • Find all Manifestations or a specific Manifestation, which embody the Works for which a given Person or Corporate body is responsible
  • Find all Manifestations or a specific Manifestation, which embody the various Expressions of a given Work. This can be achieved by using the title.
  • Find all Manifestations which embody the Works on a given subject
  • Find all Manifestations which embody the Works in a given series
  • Find a particular Manifestation when its identifier is known
  • Identify a Work
  • Identify an Expression of a Work
  • Identify a Manifestation
  • Select a Work
  • Select an Expression
  • Select a Manifestation
  • Obtain a Manifestation


The support of the tasks defined in the previous paragraph may be seen as the absolute minimum. The requirements for any system to support bibliographic information would encompass - at the very least - more sophisticated search functionalities. 

6.4.1 The Dublin Core and the FRBR
The 'functionality' of the Dublin Core is discussed in (Attig 1998). The author did not use the requirements as described in the last paragraph, but discusses the four user tasks in general and their implementation in the DC. According to Attig, the Find task is best supported by the use of the elements Title, Creator, Contributor and Subject, while the elements Language, Coverage and Format may be used to restrict the search.

The elements Creator and Contributor may indeed be used for finding the persons or groups that are responsible for an information resource. The role of the Title element and the Subject element is also clear. When looking at the minimal requirements defined in the FRBR for finding Manifestations using an identifier, the element Identifier may serve to find those entities. The definition of the Relation element is broad enough to use it to find all entities in the same series. A series of documents share a common relation, and therefore the Relation element may be suitable to describe that relation.

The retrieval of entities - especially the more 'formalized' search on series and identifier - depends heavily on standardization of the input. As stated before, the DC neither imposes strict guidelines on this point nor claims to prescribe binding rules. The Dublin Core may be used as a starting point for organizations with a common interest, which may create specific guidelines for the description of information resources. Within those communities rules or guidelines may be created to formalize the input. The formalization of the input should enhance the retrieval of relevant information resources. 

While Attig discusses the user task of identification briefly, the FRBR discusses numerous attributes of entities that are to be used. Among them are the title, the persons or organizations responsible, the language, the publisher/distributor and the date of publication/distribution. The corresponding DC elements are: Title, Creator, Contributor, Language, Publisher and Date. Furthermore, Attig discusses the selection task using the Description and the Coverage element as primary source, assuming that selection is primarily based on the contents of the retrieved information resources. The FRBR also emphasizes attributes concerning the form and type of medium of the information resource. This is not surprising. The Dublin Core is mostly used for a single type of medium - a resource on a network - while the FRBR applies to all media types available in a national bibliography. The coverage of the obtain task also reflects this. Attig mentions a correct address in the Identifier element of the DC, while the FRBR mentions several attributes ranging from title to access restrictions.

If the functionality of the Dublin Core is compared to the minimal requirements for bibliographic records, one may conclude that the DC meets those requirements more than halfway. The elements of the DC correspond to the attributes used in the FRBR. If the elements are used in a consistent manner - using a form of authority control and/or 'input guidelines' - the requirements of the FRBR may be met, but only within the boundaries of the community that enforces the guidelines. Using the Dublin Core elements in a less formalized manner does not lead to consistent descriptions. Inconsistent descriptions may be more useful than no description at all, but consistency is crucial for bibliographic records.
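
The correspondence between the four user tasks and the DC elements can be summarized as a small lookup table. The following Python sketch merely restates the mapping reported above; the names TASK_ELEMENTS and supported_tasks are illustrative assumptions and are not part of the Dublin Core or the FRBR.

```python
# Illustrative mapping of the user tasks to Dublin Core elements,
# restating the discussion above; not defined by the DC or the FRBR.
TASK_ELEMENTS = {
    "find": {"Title", "Creator", "Contributor", "Subject"},
    "identify": {"Title", "Creator", "Contributor", "Language", "Publisher", "Date"},
    "select": {"Description", "Coverage"},
    "obtain": {"Identifier"},
}


def supported_tasks(record):
    """Return the tasks for which the record fills at least one element."""
    filled = {name for name, value in record.items() if value}
    return sorted(task for task, elements in TASK_ELEMENTS.items() if elements & filled)


# A hypothetical, sparsely filled DC description.
record = {"Title": "War and Peace", "Creator": "Tolstoy, Leo", "Identifier": ""}
print(supported_tasks(record))
```

A filled element is only a necessary condition: as argued above, the tasks are actually supported only when the values are entered consistently.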

6.4.2 The DOI and the FRBR
The main purpose of the DOI is rights management and the metadata associated with it serves a slightly different purpose than the metadata in the DC. The function of the metadata is to describe the entity identified by the DOI instead of making the entity easier to retrieve. The associated metadata is kept to a minimum, although it may be extended. Finding entities using the attached metadata is not a function of the DOI; this task is left to others. Unlike the Dublin Core the input is governed by strict guidelines defined in the DOI Application Profiles.

Not surprisingly the Find task is only partly supported. The DOI number, Identifier, Title and Primary agent may be used but the DOI metadata kernel does not contain elements for finding entities by subject or series. Identifying entities is the main purpose of the DOI and therefore emphasis is placed on the DOI number to uniquely identify each entity. Title, Identifier, Primary agent - and to a lesser extent Type - support this task. No explicit support exists for relationships to other entities or for describing the characteristics of the entities' medium. The same also holds true for the selection task. The obtain task is more strongly developed as resolution is one of the main components of the DOI System.

Comparing the functionality of the DOI metadata to the minimal requirements leads to the conclusion that the requirements are not met. The DOI is not intended to be used for resource discovery, but for the identification of entities. The larger element set of the Dublin Core makes it more suitable to the requirements of the FRBR, but the DC lacks consistent input rules. Contrary to this, the DOI uses strict guidelines and obtaining entities is one of its strong points. 
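
The difference in coverage of the Find task can be made explicit by listing, for each access point in the minimal requirements, the elements named in the text. The sketch below is an illustration only: the access point labels, the two dictionaries and the function missing are assumptions made for this example, and the pairing of Primary agent with the responsible person or body is an interpretation.

```python
# Access points behind the minimal 'find' requirements listed in 6.4.
FIND_ACCESS_POINTS = ["responsible party", "title", "subject", "series", "identifier"]

# Elements named in the text as usable for each access point (illustrative).
DUBLIN_CORE = {
    "responsible party": ["Creator", "Contributor"],
    "title": ["Title"],
    "subject": ["Subject"],
    "series": ["Relation"],
    "identifier": ["Identifier"],
}
DOI_KERNEL = {
    "responsible party": ["Primary agent"],  # assumed correspondence
    "title": ["Title"],
    "identifier": ["DOI number", "Identifier"],
}


def missing(standard):
    """Return the access points for which the standard offers no element."""
    return [point for point in FIND_ACCESS_POINTS if not standard.get(point)]


print(missing(DUBLIN_CORE))
print(missing(DOI_KERNEL))
```

The sketch compares element sets only; it says nothing about the consistency of the input, which is where the DOI is stronger than the DC.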

As stated before, the DOI may act as a component of a larger system. Combining the DOI - and its metadata - with an application to manage bibliographical data or intellectual property rights could result in a system that may very well comply with the FRBR requirements. 

6.5 Introduction to the INDECS Metadata Framework

'In the print world, sales and rights transactions tend to be considered separately; in the digital world, all transactions are rights transactions'. (Morris 1999) This quote best describes the background for the INDECS project. INDECS is an acronym for Interoperability of Data for Electronic Commerce Systems, the name of a project running from the end of 1998 until 2000, with support of the European Commission. It brought together different organizations representing creators, publishers and managers of 'content in the digital environment'. Starting point is the assumption that trading content - to which intellectual property laws apply - will be done using a network as distribution channel. Automation is necessary to manage all the intellectual property (IP) transactions. To do this effectively, all activities must be identified and described using a standard. The INDECS Metadata Framework is that standard. (Bearman, Miller et al. 1999; Paskin 2001b; Rust 1998; Rust and Bide 2000)

The INDECS Framework aims to create interoperable metadata for electronic commerce - or E-commerce - meaning that the metadata must be usable in as many ways as possible. This eliminates the need to create separate metadata for different media, for different functions - like cataloguing, discovery or rights management - for different levels of metadata - from simple to complex - for different languages or territories and for different technology platforms. All these can be seen as trade barriers for E-commerce.

INDECS uses a fairly simple model for commerce: 'People make stuff. People use stuff. People do deals about stuff' (Rust and Bide 2000). The name used in the INDECS model for 'stuff' is 'creations'. Creations are formally defined as the output of a creative activity, and may be a tangible object or a concept. Commerce is seen in a broad context. Not only transactions with financial aim are described, but also transactions that enable people to access creations freely. An example of this is the use of books from a library or downloading copyright-free files. 

The INDECS model uses the following definition of metadata: 'An item of metadata is a relationship that someone claims to exist between two entities.' This reveals a different approach to metadata. While the Dublin Core describes properties of an object, the INDECS model uses a more 'relation oriented' approach. The object itself is not the most important part of the model, but the relationships that exist between the objects. An event is the type of relation that is central in the INDECS Framework. Not surprisingly the events described concern the creation, modification, use and 'publishing' of entities, and the conditions to enable these events: transactions, agreements, offers and payments. Furthermore, several axioms about metadata for E-commerce were defined:

  • Metadata is critical. Trading goods and services in an online environment depends more on identifiers and descriptions (metadata) than trading in the 'real world'. 
  • Stuff is complex. Managing intellectual property rights is a complex matter. Most digital creations consist of several pieces of intellectual property, which all need to be managed. Like the FRBR, the INDECS model acknowledges several aspects of products of intellectual endeavor as described in paragraph 6.2.1. To all aspects, intellectual property rights may apply.
  • Metadata is modular. E-commerce metadata consists of pieces created by different people. The traded 'stuff' consists of several pieces of intellectual property, each with its own metadata attached to it.
  • Transactions need automation. In the digital environment the number of transactions is expected to grow fast. This can only be managed using automated tools.
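
The definition of metadata quoted above, a relationship that someone claims to exist between two entities, can be sketched as a record with four fields. This is a minimal illustration under stated assumptions: the class MetadataItem, its field names, the relation label and the two claimants are hypothetical and do not come from the INDECS documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    claimant: str  # the party who claims the relationship exists
    source: str    # first entity
    relation: str  # kind of relationship claimed
    target: str    # second entity


# Two hypothetical parties making the same claim independently.
claims = [
    MetadataItem("Museum catalogue", "Pablo Picasso", "created", "Guernica"),
    MetadataItem("Publisher database", "Pablo Picasso", "created", "Guernica"),
]


def claimants_for(source, relation, target, items):
    """List everyone who claims the given relationship."""
    return [
        item.claimant
        for item in items
        if (item.source, item.relation, item.target) == (source, relation, target)
    ]


print(claimants_for("Pablo Picasso", "created", "Guernica", claims))
```

Keeping the claimant with every item reflects the modular character of the metadata: pieces created by different people remain attributable to their authors.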


Apart from these axioms, the creation of INDECS metadata is guided by several principles. As they are discussed in paragraph 5.1, they are only briefly listed here:

  • Unique identification. Every entity should be uniquely identified within an identified workspace.
  • Functional granularity. It should be possible to identify an entity whenever it needs to be distinguished.
  • Designated authority. The author of an item of metadata should be securely identified.
  • Appropriate access. Everyone requires access to the metadata on which they depend, and privacy and confidentiality for their own metadata from those who are not dependent on it.
In the next paragraph, the entities defined in the INDECS model are discussed.

6.6 Entities of the INDECS Metadata Framework

The definition of an entity is the same as in the DOI: an entity is something that is identified. Any entity has five types of attributes: labels, quantities, qualities, types and roles. This list of attributes functions as the metadata set for that particular entity. While all other attributes describe the entity itself, role is a part played or function fulfilled by an entity - during an event - in relation to another entity or entities. The entities contain several types of roles: 

  • Agent role. The active entity in an event. If the creation of the painting Guernica were described, the agent would be Pablo Picasso.
  • Input role. The passive, qualifying or supportive entity in an event. If the translation of the novel War and Peace is described, the original Russian novel plays the input role.
  • Output role. The created or changed entity in an event. Using the example of Picasso's painting, the output role would be played by the painting Guernica.
  • Context. An entity within which an event takes place or a situation exists, in other words: the time and place at which something takes place.
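
The four roles can be illustrated with a small data structure that records which entities play which role in an event. The sketch below is an assumption made for illustration, not the INDECS schema: the class Event, its field names and the function roles_of are hypothetical, the event type names are those listed in paragraph 6.6.1, and the context values and the unnamed translator are added only to complete the examples.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                                    # event type, see 6.6.1
    agents: list = field(default_factory=list)   # active entities
    inputs: list = field(default_factory=list)   # passive or supportive entities
    outputs: list = field(default_factory=list)  # created or changed entities
    context: dict = field(default_factory=dict)  # time and place


def roles_of(event, entity):
    """Return the roles an entity plays in the given event."""
    roles = []
    if entity in event.agents:
        roles.append("agent")
    if entity in event.inputs:
        roles.append("input")
    if entity in event.outputs:
        roles.append("output")
    return roles


painting = Event(
    kind="creatingEvent",
    agents=["Pablo Picasso"],
    outputs=["Guernica"],
    context={"time": "1937", "place": "Paris"},
)
translation = Event(
    kind="transformingEvent",
    agents=["translator (unnamed)"],
    inputs=["War and Peace (Russian original)"],
    outputs=["War and Peace (translation)"],
)

print(roles_of(painting, "Guernica"))
print(roles_of(translation, "War and Peace (Russian original)"))
```

The role is stored with the event rather than with the entity, which matches the description of role as the only attribute that depends on an event.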
6.6.1 Relations
Relations are the most important entities in the INDECS model, which is reflected in the number of possible relation types and the subdivision of every relation type. Three types of relations are defined, where each type is divided in subtypes:
  • Event. This type describes all events in the artistic and the commercial processes concerning creations. It contains the following types: expression (an event which is a creation in itself, like a performance of a play), creatingEvent, transformingEvent, usingEvent, disseminatingEvent (making a creation accessible, for instance the publishing of a book), transaction, agreement, offer, and payment.
  • Roles. As described in the previous paragraph, this type is subdivided in an agent role, an input role, an output role and a context role.
  • Situation. A situation is a relation between entities that remains constant, contrary to the relations described above. It is divided in the types: posessingSituation and association (defined as a relation based on the verb 'to be').


6.6.2 Parties
While Relation entities describe possible actions, Party entities describe actors. Party entities do not only encompass people and groups of people - as is the case with authority files used in libraries - but also animals and plants. The following entities are defined:

  • HumanBeing
  • Organization. A group of HumanBeings, whether they are a legal entity or not.
  • Ensemble. A special type of Organization; a group of creators.
  • Animal
  • Plant
6.6.3 Creations
Creations are the 'stuff' in the model of commerce described in paragraph 6.5. The types of entities defined in the INDECS model seem to overlap those of the FRBR. Even the names are mostly the same: item, manifestation, and expression. The overlap is not complete, however. The INDECS Framework does not make the distinction between an abstract notion - for instance, the song Come Together - and the different realizations of it, such as performances by The Beatles and by other artists. The term Expression is used here for events that are creations in themselves, like a performance of a play. An Expression may be recorded on a medium and this medium then becomes a Manifestation. An example of an Expression is the Live Aid concert, performed in 1985. A videotape containing a recording of this concert is a related Manifestation. The entity Abstraction is equivalent to Work in the FRBR and the definitions of Manifestation and Item carry the same meaning in both models.
The complete list of entities:
  • Artifact. The generic definition for all things that are created, whether they are Items, Manifestations, or Formats.
  • Item. A single instance of an Artifact.
  • Manifestation. A type of Artifact in which Abstractions or Expressions are recognized. All Artifacts with the same physical form and intellectual content are considered to be the same Manifestation.
  • Format. This is used for descriptions of Artifacts that need content to be used as a Manifestation. Examples of this are books without words or blank DVDs.
  • Expression. As explained before, this is a 'creative' event.
  • Abstraction. A creation that is a concept.
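
The relations between these entity types can be sketched with a few classes, using the Live Aid example from the text. This is a simplified illustration under stated assumptions: the class and field names are hypothetical, Format and the generic Artifact are left out, and an Item is modelled as an instance of a Manifestation rather than of any Artifact.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Abstraction:
    """A creation that is a concept (the FRBR Work)."""
    name: str


@dataclass
class Expression:
    """A creative event, such as a performance."""
    name: str


@dataclass
class Manifestation:
    """An Artifact in which an Abstraction or Expression is recognized."""
    description: str
    embodies: Union[Abstraction, Expression]


@dataclass
class Item:
    """A single instance (simplified: of a Manifestation)."""
    label: str
    manifestation: Manifestation


concert = Expression("Live Aid concert, 1985")
videotape = Manifestation("Videotape recording of the concert", embodies=concert)
copy = Item("One copy of the videotape", manifestation=videotape)


def origin(item):
    """Follow an Item back to the Abstraction or Expression it carries."""
    return item.manifestation.embodies


print(origin(copy).name)
```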
Identifiers form an important part of the description of creations. Therefore they are specifically mentioned in the INDECS model and an initial list of identifiers is given. Among them are the DOI, the ISBN and the ISSN.

6.6.4 IPR transactions
As stated before, transactions on intellectual property play an important part. Therefore, the INDECS Framework contains several entities, describing a situation, an agreement and an offer. A considerable number of attributes of Agreements are described, starting with the parties to the agreement. Secondly a list of rights agreements is defined. This includes Permission, Prohibition, Requirement and IPRTransfer (the transfer of intellectual property rights). The third attribute of Agreements is an Offer, being an event in which the terms of a possible agreement are set.
The following entities are defined:

  • IprStatement. It is defined as a situation that describes the ownership of intellectual property rights.
  • Agreement. An Agreement is used to record where, when and by whom a deal was concluded. 


6.6.5 Intellectual property and Assertions
Apart from the above-mentioned entities, the INDECS model defines the entities Intellectual property and Assertions. 

  • Intellectual property. This is a legal concept, which is defined by national and international law, and is to be used in metadata systems concerning intellectual property.
  • Assertions. An event in which a party makes a truthful claim about something. This may range from a simple statement - like: 'this book has 250 pages' - to a complex description of various rights owned by several parties on a book, for instance the rights for a translation in Dutch or the movie rights.
In the next paragraph, the differences and points of agreement between the two models are discussed.

6.7 The INDECS model and the Functional Requirements

As can be concluded from the preceding paragraphs both the INDECS model and the Functional Requirements are concerned with notions or objects to which intellectual property rights apply. The two models describe things and abstractions as well as properties and actions related to them. Both use more or less the same definition for those notions and objects, but the models are defined for different purposes and focus on different points. 

The starting point of the FRBR is the user of bibliographic data and the ways he or she wants to make use of that data. To enable this, the objects and abstractions described in the bibliographic data are analyzed. The focus of the FRBR lies therefore on tangible objects - ranging from books to skyscrapers - and the abstractions contained by them. Contrary to this, the INDECS model aims to describe all aspects of transactions concerning things and notions to which intellectual property law applies. Here events are the primary focal point, together with the actors and the things and notions participating in the event. This leads to several significant differences. 

The first difference between the two models is the way people and groups of people are treated. In the FRBR they are seen as a means to find or identify something, while the INDECS model emphasizes what the role is of a person or a group. To put it in other words: what has this group or person done? Another consequence of the 'contradiction' between objects and events is that the INDECS Framework describes events that are in themselves a creation, like a concert or the performance of a play. To that particular event, participants - such as composers, musicians or actors - could be attached. In the analysis of the FRBR, the entities described by bibliographic records are in an abstract state (Work and Expression) or in a tangible state (Manifestation and Item). The entities cannot be an event in themselves. When the event does lead to an entity described in a bibliographic record, the event would serve as a subject of the entity. To complicate matters, the INDECS model uses the term 'expression' to define the creative event; a term also used by the FRBR for something completely different.

Perhaps the most significant difference lies in the scope of the models. The FRBR is restricted to the 'library world', while the INDECS model aims to describe all kinds of events, whether they take place in a commercial setting or not. The broad scope of the INDECS Framework also leads to a much more complicated model. The complexity of the model may impede a full implementation of it. As of now only descriptions of documents are made operational, such as the DOI or the Onix database created by EDItEUR. (EDItEUR 2001) No data was found regarding a system based on events, the main focal point of INDECS.

Still, both models have some aspects in common. Firstly, both are created in a reaction to the changes introduced by the large-scale employment of communication and information technology. This creates new problems - the 'information overload' - but also creates new possibilities. One of these possibilities encompasses the use of small bits of a document, which is widely used today. Both models reflect this in their notion of granularity. Another consequence of digitalization is that identifying a document becomes of greater importance. Both models emphasize identification. The second similarity is found in the analysis of the 'stuff'. The analysis of notions and objects to which intellectual property rights apply bears great similarities in both models. Both recognize an abstract notion, which is embodied in a manifestation (or in different manifestations). This manifestation itself is exemplified in one or more items. The FRBR model also contains an expression to differentiate between several realizations of the same abstract notion; where the INDECS Framework would simply define different manifestations. Apart from this, the models use a similar analysis for creations.


7 Conclusion: towards a theory of metadata?

In this final chapter, an attempt is made to find and understand the concepts on which the metadata standards are based. These concepts encompass two kinds of objects: information sources or creations, and metadata used in a networked environment. A conceptual framework concerning creations and network-metadata is defined, which will be discussed in the next paragraphs. By comparing the findings of the creators of the diverse standards, the current state of affairs may be found. Furthermore, some possible directions for the future are discussed. Other authors have also commented on the developments concerning metadata, each using a different angle: (Caplan 2000; Gorman 1999; Gradmann 1998; Wood 1999).

7.1 Concepts of creations

The concept of creations plays an important role in all metadata standards, whether it is defined as an information resource, a document-like object, an intellectual property entity, a product of intellectual or artistic endeavor, or a creation. All terms refer to the same concept of something - tangible or abstract - that is created using a certain amount of intellectual labor. The objects described may be as various as the definitions, ranging from a skyscraper to a small part of a melody. For reasons of simplicity the term 'creation' will be used in this chapter. Ultimately, all metadata is created to simplify the handling of creations. This handling may take many forms, such as searching, identifying, using, selling or purchasing. While the term creation may apply to many different things, all creations share several aspects. These aspects are:

  • Fixation.
  • Granularity.
  • States of a creation. 


7.1.1 Fixation
While it is implicit in all discussed metadata standards except the Dublin Core, fixation is an important aspect of creations. Adding descriptive data to a fixed entity does make sense, but what if the entity is changing constantly, or its content is generated depending on several factors? Examples of this are constantly changing stock information, the output of a web cam or an application that creates images 'on the fly', based on user input and available data. These kinds of objects may require a different conceptual framework. Furthermore, it is interesting to note that there is no conceptual difference between textual and other creations. As a consequence, all creations could be described - in theory at least - using the same tools. This gives room to a new dream, not just of bibliographic control, but also of a way to make all creations accessible in a uniform manner.

7.1.2 Granularity
All discussed standards recognize in some way the notion of granularity, which states that a creation may consist of several other creations, or a creation may be part of a larger creation. The notion in itself is not new; good examples are chapters of a book or musical themes in an opera. What may be considered to be a recent development is the increased need to describe the smaller parts of a creation. While this need has always existed - for example, describing smaller parts of a musical creation - in recent years the notion of granularity has become more important. The digitization of creations could be a contributing factor, since it enables more ways of manipulating the creations, or parts of them. The use of information and communication technology also creates new - commercial - possibilities for the owners of creations, which is the main reason for defining the DOI and the INDECS framework. 

A consequence of granularity is that the creation is not something isolated, but it is related in some way to other creations. The Dublin Core has defined the Relation element to accommodate this, to be combined with the recommended qualifiers such as 'Is Version Of', 'Has Version', 'Is Part Of' and 'Has Part'. (DCMI 2000) The DOI does not contain this functionality, but its main function is to identify creations and it is designed to be part of a bigger implementation. Relations are the main focus of the INDECS framework, which includes relations between creations. The FRBR also emphasizes relations between the entities it describes.
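
The qualified Relation element can be pictured as a set of statements linking one creation to another. The sketch below pairs each recommended qualifier mentioned above with its counterpart and derives the reverse statement; the function with_inverses and the identifiers 'book' and 'chapter-1' are hypothetical and serve only as an illustration.

```python
# The recommended Relation qualifiers named above, paired with their inverses.
INVERSE = {
    "Is Part Of": "Has Part",
    "Has Part": "Is Part Of",
    "Is Version Of": "Has Version",
    "Has Version": "Is Version Of",
}


def with_inverses(relations):
    """Add the reverse statement for every (source, qualifier, target) triple."""
    result = set(relations)
    for source, qualifier, target in relations:
        result.add((target, INVERSE[qualifier], source))
    return result


# A hypothetical chapter described as part of a book.
relations = {("chapter-1", "Is Part Of", "book")}
for statement in sorted(with_inverses(relations)):
    print(statement)
```

Deriving the reverse statement means that a description made at the level of the part also becomes visible from the description of the whole, which is the practical use of granularity in retrieval.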

7.1.3 States of a creation
The aspect most described in the literature is of course the states of a creation. Both the FRBR and the INDECS framework define three states of a creation. The first is an abstract state describing the 'pure' creation, regardless of the way it is fixed. It cannot be sensed directly, but its presence is detected via a medium. The second state defines the medium used to fixate the creation. To use the example of War and Peace, it has been fixated in a novel - with several translations - but also as a movie. The third state defines a singular item of the fixation. It is interesting to see that both the FRBR and the INDECS model share these views. The three states may be considered to be a commonly held belief, because these views are not disputed in any literature known to the author.

Other aspects are not so undisputed. The FRBR and the INDECS framework have defined an 'Expression', but its meaning is quite different, showing the different backgrounds of the models. As its name states, the FRBR is written to be used with bibliographical records. Those records describe tangible objects and abstractions. Therefore, the records do not take into account creations that may be sensed, but which are not fixed in any form of medium. This class of creations is only performed; it is an event. Those events are of great importance for those taking part in them, and for those whose - abstract - creation is performed. To put it in other words: it is important to the actors/musicians and the writers/composers. The INDECS framework is designed to accommodate the needs of all 'stakeholders' concerning intellectual property rights, and so these events are described. 

Traditionally, libraries - the main producers of bibliographic records - have placed emphasis on the different versions of the same creation. Different editions and publications of the same creation can still be linked together and it is not surprising that the FRBR defined this state of a creation. In a digitized environment where documents are updated frequently, describing several versions of the same creation can be extremely useful. Apparently, this level of detail is not needed in the INDECS framework. Several editions of the same creation would be described as different Manifestations. 

7.2 General principles and concepts of network-metadata

As can be concluded from the previous chapters, several groups and persons attempted to define the nature of metadata. Like creations, metadata is not a new concept, but its definition has changed recently to accommodate the challenges and opportunities created by the digitized environment. In this chapter, an attempt is made to define those differences. In the following paragraphs - for want of a better term - the term 'network-metadata' is used to define the descriptions used in a digitized and networked environment. While both types are of course a form of metadata, the term network-metadata is used to discern it from more traditional forms of metadata such as bibliographic records. For the sake of clarity the differences between network-metadata and bibliographic records are slightly overstated. In reality the boundaries between bibliographic records and network-metadata are not sharply defined.

Instead of exactly repeating all principles of all types of metadata described in the previous chapters, the following paragraphs discuss the general outlines of network-metadata. To ensure a generic view, definitions and principles specific for one metadata set are avoided. Furthermore, an attempt is made to uncover implicit assumptions. The following principles may be defined:

  • The principle of specialization.
  • The principle of direct accessible creations.


7.2.1 The principle of specialization
The most distinctive characteristic of network-metadata is that it does not aim to define a complete description of a resource. While a bibliographic record - and its guidelines - attempts to describe all relevant aspects of a creation, all network-metadata sets cover only a certain part. 

The specialization is reflected in several ways in the discussed standards. Firstly, the goal of the Dublin Core is limited to discovering creations, and the DOI aims to identify creations. Of course, the description of the DC may be used to identify a creation, and the associated description of the DOI could be used in a search. But both standards lack the complex possibilities of UNIMARC or MARC21. Not only the goals differ, but also the background of the network-metadata creators and its users. The DOI originates from the 'publishing world' and is primarily focused on owners of intellectual property, which are mostly found in the same environment. The DC is used more frequently by governmental and non-profit organizations, where more emphasis is placed on sharing information. Protecting the intellectual property rights of the creations shown would probably not be the first item on the priority list. 

This is a direct consequence of the organizational model used by the network-metadata creators. The model does not encompass one central authority - such as a national library - but it consists of several groups or organizations, working independently from each other. Every group has the freedom to define new standards or use standards defined by others. Both strategies have certain advantages: a new standard can be tailored to specific needs, while using pre-defined standards simplifies the exchange of metadata records. Not surprisingly, this model developed after the explosive use of the Internet. The DC is clearly a decentralized structure allowing maximal freedom to its users. While the DOI is more strictly organized, it too places a strong emphasis on decentralization. If several groups create distinct metadata sets describing the same creation, the need arises to relate them. Hence the development of the concept of the Warwick Framework and the RDF.

7.2.2 The principle of direct accessible creations
Although not always mentioned explicitly, it is assumed that most creations described by network-metadata are digital, and are accessible via a network. Analogously, one could say that bibliographic records are mostly used to describe 'book-like' creations that are not accessed directly, but are accessed via the service of a library or information center. It is not argued here that network-metadata is used only for electronic creations, just as bibliographical records are not used exclusively for books and journals. However, a very important aspect of electronic creations is direct accessibility: the creation can be opened directly from its location on the network. Furthermore, most network-metadata will possess the same qualities, taking the form of digital data that are to be accessed via a network. This leads to several specific aspects of network-metadata: there is no need for a description that replaces the creation, network-metadata may be a part of a creation, and network-metadata may be created dynamically.

Bibliographic records are designed to create a complete description of a creation. Because the users of those records usually have no direct access to the creation, they must base their decision about the usefulness of a creation on the record. If the creation is directly accessible the user is able to consult it, eliminating the need for a full record. Even if access is limited - which may be the case with commercially available creations - the creation's owner will ensure that the creation contains publicly accessible information about the creation's content. 

7.3 Fixation of creations and generated metadata

Like all types of metadata, network-metadata may exist separately or as an integral part of the creation. Examples of this are placing Dublin Core elements in a Web page, or the combination of content and metadata in the MPEG-7 standard. The close relation with creations and the digitalization of both network-metadata and creations also leads to the possibility of creating network-metadata in a dynamic way. If the contents of certain network-metadata elements are linked to relevant parts of the creation, they may change automatically if the content of the creation is updated. This process could even be taken a step further, by generating the network-metadata automatically when it is needed, ensuring the most actual description possible. 
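
The idea of generating network-metadata at the moment it is needed can be sketched in a few lines. The example below is an illustration only and not a method described in any of the discussed standards: the function generate_metadata is hypothetical, and taking the title from the first non-empty line and the description from the second is an assumed heuristic.

```python
from datetime import date


def generate_metadata(text, today=None):
    """Derive a few Dublin Core style elements from the current content."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {
        "Title": lines[0] if lines else "",
        "Description": lines[1] if len(lines) > 1 else "",
        "Date": (today or date.today()).isoformat(),
    }


# The description follows the content: regenerate after every update.
before = generate_metadata("Stock report\n\nPrices rose today.")
after = generate_metadata("Stock report\n\nPrices fell today.")
print(before["Description"])
print(after["Description"])
```

Because the description is computed from the content, it cannot lag behind an update; the price is that only elements which can be derived mechanically are available this way.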

As can be seen from the previous paragraphs, the boundaries between creations and network-metadata are unclear. Technological changes give rise to new forms of creations, and ways of describing them. This may affect even the main characteristic of creations - fixation - while the implications for network-metadata are far from clear. This makes describing the characteristics of creations and network-metadata not an easy task, and the current attempt cannot claim to be a complete coverage of the subject. The next paragraph will attempt to look ahead at possible developments.

7.4 Possible directions for network-metadata standards

This thesis concludes by briefly discussing possible directions for network-metadata standards. Given the fast-paced development of recent years, it is to be expected that more will change in the future. An important development is the creation of new network-metadata standards, such as MPEG-7 (Martinez 2001), the OCLC/RLG Preservation Metadata standard (OCLC/RLG 2001), or the IEEE standard on learning objects (IEEE 2001). Each of these standards is designed to serve a different purpose, and more standards are expected to be published. This development is also visible in the Dublin Core, as it moves toward extended versions for special interest groups.

Together with the expansion of network-metadata sets, a need will probably arise to find a common denominator and to find ways to combine the different sets. Both the DC and the RDF may play an important part in that respect. The Dublin Core element set is the best-known network-metadata set and has proven to be relatively stable. Furthermore, if used in a consistent manner it complies with the minimal requirements of the FRBR. This may be seen as a measure of the quality of the DC, insofar as the FRBR document is used as a criterion. Combining metadata sets will most likely be done with RDF. Its close relation with the upcoming markup language XML and the support of the W3C make it appealing for different groups.
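How RDF might combine different metadata sets can be shown with a small sketch in which one rdf:Description holds elements from two namespaces. The RDF and Dublin Core namespace URIs are the published ones; the second metadata set, its namespace http://example.org/rights#, its element name and the function combine_descriptions are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
EX_NS = "http://example.org/rights#"  # hypothetical second metadata set


def combine_descriptions(resource_uri, dc_values, extra_values):
    """Return RDF/XML describing one resource with elements from two sets.

    Each metadata set keeps its own namespace, so element names from
    different sets cannot be confused with each other.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    description = ET.SubElement(
        root,
        "{%s}Description" % RDF_NS,
        {"{%s}about" % RDF_NS: resource_uri},
    )
    for namespace, values in ((DC_NS, dc_values), (EX_NS, extra_values)):
        for name, value in values.items():
            ET.SubElement(description, "{%s}%s" % (namespace, name)).text = value
    return ET.tostring(root, encoding="unicode")


print(combine_descriptions(
    "http://example.org/thesis",
    {"title": "Metadata Standards and Information Analysis",
     "creator": "Ronald Snijder"},
    {"accessCondition": "freely available"},
))
```

The point of the sketch is that the combination requires no changes to either element set: the namespaces keep the sets apart, while the shared rdf:Description ties them to the same resource.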

When the management of all types of - electronic - creations is considered, only parts of the total solution are currently visible. The discussed network-metadata standards are still developing and are each directed toward a specialized goal. Furthermore, the INDECS framework is not yet fully implemented in applications; once it is, this may lead to a different perspective. And last but not least, changes in technology will give rise to new types of created objects, requiring new types of management. Given all this, it is safe to conclude that the current state of affairs is just the beginning.


Appendix 1: Literature 

AGLS (2001). "Australian Government Locator Service".
 http://www.naa.gov.au/recordkeeping/gov_online/agls/summary.html

Attig, John (1998). "Dublin Core Metadata and the Cataloging Rules", Pennsylvania State University Libraries.
 http://www.ala.org/alcts/organization/ccs/ccda/tf-tei5.html#dublin

Baker, Thomas (2000). "A Grammar of Dublin Core".  D-Lib Magazine 6(October).
 http://www.dlib.org/dlib/october00/baker/10baker.html

Banerjee, Kyle (1999). "Challenges of Using Metadata in a Library Setting: the Collection And Management of Electronic Links (CAMEL) Project at Oregon State University", Oregon State University.
 http://ucs.orst.edu/~banerjek/papers/camel.html

Bearman, David, Eric Miller, et al. (1999). "A Common Model to Support Interoperable Metadata : Progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities".  D-Lib Magazine 5(1).
 http://www.dlib.org/dlib/january99/bearman/01bearman.html

Boehm, Carla (1999). "The Metadata Bear. Or: Bearing the weight of accessibility".  Journal of educational media 24(3): 177-190.

Bray, Tim (1998). "RDF and Metadata", XML.com.
 http://www.xml.com/xml/pub/98/06/rdf.html

Butterfield, Kevin L. (1995). "Catalogers and the Creation of Metadata Systems : a collaborative vision at the University of Michigan", University of Michigan.
 http://www.oclc.org/oclc/man/colloq/butter.htm

Caplan, Priscilla (2000). "International Metadata Initiatives: Lessons in Bibliographic Control", Library of Congress.
 http://lcweb.loc.gov/catdir/bibcontrol/caplan_paper.html

Daniel, Ron and Carl Lagoze (1997). "Extending the Warwick Framework : From Metadata Containers to Active Digital Objects".  D-Lib Magazine(November).
 http://www.dlib.org/dlib/november97/daniel/11daniel.html

DCMI (1998). "Dublin Core Element Set, Version 1.0: Reference Description", Dublin Core Metadata Initiative.
 http://dublincore.org/documents/1998/09/dces

DCMI (1999). "Dublin Core Element Set, Version 1.1: Reference Description", Dublin Core Metadata Initiative.
 http://dublincore.org/documents/1999/07/02/dces

DCMI (2000). "Dublin Core Qualifiers", Dublin Core Metadata Initiative.
 http://dublincore.org/documents/2000/07/11/dcmes-qualifiers

Dempsey, Lorcan and Stuart Weibel (1996). "The Warwick Metadata Workshop : A Framework for the Deployment of Resource Description".  D-Lib Magazine(November).
 http://www.dlib.org/dlib/july96/07weibel.html

Dempsey, Lorcan and Rachel Heery (1998). "Metadata: A current view of practice and issues".  Journal of documentation : devoted to the recording, organization and dissemination of specialized knowledge 54(2): 145-172.

EDItEUR (2001). "ONIX International".
 http://www.editeur.org/onix.html

Erickson, John (1998). "Metadata Initiatives and the DOI: Implications for Electronic Commerce and Copyright Management Services".  TRIALOGUE - Publishing News for Publishers, Vendors, and Librarians(8).
 http://www.ybp.com/yrm/trialogue/898/898metad.htm

Erickson, John (1999). "The DOI and Rights Management: Tying Up Loose Ends".  TRIALOGUE - Publishing News for Publishers, Vendors, and Librarians(11).
 http://www.ybp.com/yrm/trialogue/1199/1199doi.htm

EU-NSF (1999). "Metadata for Digital Libraries: a Research Agenda", EU-NSF Working Group on Metadata.
 http://www.iei.pi.cnr.it/DELOS/NSF/Metadata.html
 http://www.ercim.org/publication/ws-proceedings/EU-NSF/metadata.pdf

Gorman, Michael (1999). "Metadata or Cataloguing? A False Choice".  Journal of Internet Cataloging 2(1): 5-22.

Gradmann, Stefan (1998). "Cataloguing vs. Metadata: old wine in new bottles?", Pica.
 http://www.ifla.org/IV/ifla64/007-126e.htm

Guha, R.V. (1996/1997a). "Meta Content Framework".
 http://www.xspace.net/hotsauce/mcf.html

Guha, R.V. (1996/1997b). "Towards a theory of meta-content".
 http://www.xspace.net/hotsauce/mc.html

Hakala, Juha, Ole Husby, et al. (1996). "Warwick framework and Dublin core set provide a comprehensive infrastructure for network resource description", IFLA.
 http://www.ifla.org/documents/libraries/cataloging/metadata/warwick.htm

Heery, Rachel and Manjula Patel (2000). "Application profiles: mixing and matching metadata schemas".  Ariadne(25).
 http://www.ariadne.ac.uk/issue25/app-profiles/intro.html

Henze, Volker and Michael Schefczik (1997). "Metadaten : Beziehungen zwischen Dublin Core Set, Warwick Framework und Datenformaten".  Bibliotheksdienst 31(3): 413-419.

Hillmann, Diane (2000). "Using Dublin Core".
 http://dublincore.org/documents/usageguide/

Iannella, Renato (1999a). "An Idiot's Guide to the Resource Description Framework", University of Queensland.
 http://www.dstc.edu.au/Research/Projects/rdf/RDF-Idiot.html

Iannella, Renato (1999b). "Representing vCard v3.0 in RDF".
 http://dstc.edu.au/Research/Projects/rdf/draft-ianella-vcard-rdf-00.txt

IEEE, Learning Object Metadata Working Group (2001). "Standard for Information Technology - Education and Training Systems - Learning Objects and Metadata", IEEE.
 http://ltsc.ieee.org/wg12/

IFLA, Study Group on the Functional Requirements for Bibliographic Records (1998). "Functional Requirements for Bibliographic Records : Final Report", International Federation of Library Associations and Institutions.
 http://www.ifla.org/VII/s13/frbr/frbr.pdf

KB, Koninklijke Bibliotheek (2001). "Expert Centre : KB-catalogue", Koninklijke Bibliotheek.
 http://www.kb.nl/kb/resources/frameset_kb.html?/kb/sbo/catalogus/kbcatalogus-en.html

Lagoze, Carl (1996). "The Warwick Framework : A Container Architecture for Diverse Sets of Metadata".  D-Lib Magazine(July/August).
 http://www.dlib.org/dlib/july96/lagoze/07lagoze.html

Lagoze, Carl, Clifford A. Lynch, et al. (1996). "The Warwick Framework : A Container Architecture for Aggregating Sets of Metadata".
 http://cs-tr.cs.cornell.edu/Dienst/Repository/2.0/Body/ncstrl.cornell/TR96-1593/html
 http://www.ifla.org/documents/libraries/cataloging/metadata/warwick2.htm

Lagoze, Carl (2001). "Keeping Dublin Core Simple : Cross-Domain Discovery or Resource Description?".  D-Lib Magazine(January).
 http://www.dlib.org/dlib/january01/lagoze/01lagoze.html

Lannom, Laurence (1999). "Handle System Overview", CNRI.
 http://www.icsti.org/icsti/forum/fo9904.html - lannom

Lynch, Clifford (1998). "The Dublin Core Descriptive Metadata Program: Strategic Implications for Libraries and Networked Information Access".  ARL : A Bimonthly Newsletter of Research Library Issues and Actions(196, February).
 http://www.arl.org/newsltr/196/dublin.html

Marchiori, Massimo (1998). "The limits of Web metadata, and beyond".  Computer Networks and ISDN Systems 30: 1-9.

Martinez, José M. (2001). "Overview of the MPEG-7 Standard (version 5.0)", ISO, International Standards Organization.
 http://www.darmstadt.gmd.de/mobile/MPEG7/Documents/W3445.htm

Miller, Eric (1998). "An Introduction to the Resource Description Framework".  D-Lib Magazine 4(May).
 http://www.dlib.org/dlib/may98/miller/05miller.html

Miller, Eric, Eric Childress, et al. (1999). "Making Progress: The Resource Description Framework (RDF)".  Journal of Internet Cataloging 1(4): 53-58.

Miller, Paul and Tony Gill (1997). "DC5: the search for Santa".  Ariadne November 1997(12).
 http://www.ariadne.ac.uk/issue12/metadata/

Morris, Sally (1999). "Metadata and rights".  VINE: Very Informal Newsletter on Library Automation(117): 30-34.

OCLC/RLG (2001). "Preservation Metadata for Digital Objects: A Review of the State of the Art", OCLC/RLG Working Group on Preservation Metadata.
 http://www.oclc.org/digitalpreservation/presmeta_wp.pdf

Paskin, Norman (1999). "DOI: Current Status and Outlook".  D-Lib Magazine 5(May).
 http://www.dlib.org/dlib/may99/05paskin.html

Paskin, Norman (2001a). "Position paper for W3C Workshop on Digital Rights Management for the Web (22/23 January, 2001)", International DOI Foundation.
 http://www.doi.org/001219_IDF_DRM_pos_paper.htm

Paskin, Norman (2001b). "The DOI(r) Handbook : Version 1.0.0 February 2001", International DOI Foundation.
 http://www.doi.org/handbook_2000/index.html

Peereboom, Marianne (2000). "Dublin Core Qualified: Metadata voor het nieuwe millennium".  Informatie professional : magazijn voor informatiewerkers 4(4): 20-23.

Rust, Godfrey (1998). "Metadata: The Right Approach : An Integrated Model for Descriptive and Rights Metadata in E-commerce".  D-Lib Magazine 4(July/August).
 http://www.dlib.org/dlib/july98/rust/07rust.html

Rust, Godfrey and Mark Bide (2000). "The <indecs> metadata framework : Principles, model and data dictionary", Indecs Framework Ltd.
 http://www.indecs.org/pdf/framework.pdf

Smith, Terence R. (1996). "The Meta-Information Environment of Digital Libraries".  D-Lib Magazine(July/August).
 http://www.dlib.org/dlib/july96/new/07smith.html

Sollins, Karen and Larry Masinter (1994). "RFC 1737: Functional Requirements for Uniform Resource Names".
 http://www.w3.org/Addressing/rfc1737.txt

TEI, Text Encoding Initiative Consortium (2001). "Text Encoding Initiative Homepage".
 http://www.tei-c.org/

Vellucci, Sherry L. (2000). "Metadata and Authority Control".  Library resources and technical services 44(1): 33-43.

W3C (2000). "Extensible Markup Language (XML)", World Wide Web Consortium.
 http://www.w3c.org/XML/

W3C (2001). "Semantic Web Activity: Resource Description Framework (RDF)", World Wide Web Consortium.
 http://www.w3.org/RDF

Weibel, Stuart (1995). "Metadata: The Foundations of Resource Description".  D-Lib Magazine(July).
 http://www.dlib.org/dlib/july95/07weibel.html

Weibel, Stuart, Jean Godby, et al. (1995). "OCLC/NCSA Metadata Workshop Report", OCLC/NCSA.
 http://dublincore.org/workshops/dc1/report.shtml

Weibel, Stuart (1997). "Discovering Online Resources. The Dublin Core: A Simple Content Description Model for Electronic Resources", Arts and Humanity Data Service.
 http://www.ahds.ac.uk/public/metadata/disc_03.html

Weibel, Stuart, Renato Iannella, et al. (1997). "The 4th Dublin Core Metadata Workshop Report : DC-4 March 3 - 5, 1997 National Library of Australia, Canberra".  D-Lib Magazine(June).
 http://www.dlib.org/dlib/june97/metadata/06weibel.html

Weibel, Stuart and Eric Miller (1997). "Image Description on the Internet : A Summary of the CNI/OCLC Image Metadata Workshop September 24 - 25, 1996 Dublin, Ohio".  D-Lib Magazine(January).
 http://www.dlib.org/dlib/january97/oclc/01weibel.html

Weibel, Stuart and Juha Hakala (1998). "DC-5: The Helsinki Metadata Workshop : A Report on the Workshop and Subsequent Developments".  D-Lib Magazine(February).
 http://www.dlib.org/dlib/february98/02weibel.html

Weibel, Stuart (1999). "The State of the Dublin Core Metadata Initiative April 1999".  D-Lib Magazine 5(April).
 http://www.dlib.org/dlib/april99/04weibel.html

Weibel, Stuart (2000). "The Dublin Core Metadata Initiative: the Frankfurt Focus and the Year 2000".  Zeitschrift für Bibliothekswesen und Bibliographie : Organ des Vereins Deutscher Bibliothekare und des Vereins der Diplombibliothekare an wissenschaftlichen Bibliotheken 47(1): 3-13.

Weibel, Stuart and Traugott Koch (2000). "The Dublin Core Metadata Initiative : Mission, Current Activities, and Future Directions".  D-Lib Magazine 6(12).
 http://www.dlib.org/dlib/december00/weibel/12weibel.html

Werf, Titia van der (1999). "DONOR en Dublin Core Metadata".  Informatie professional : magazijn voor informatiewerkers 3(3): 23-28.

WIPO (2001). "About Intellectual Property", World Intellectual Property Organization.
 http://www.wipo.org/about-ip/en

Wood, Andrew (1999). "Metadata - The Ghosts of Data Past, Present, and Future".
 http://archive.dstc.edu.au/RDU/reports/Sympos97/metafuture.html


© Ronald Snijder