1 Introduction
In recent years, standards have emerged for the description of information resources on the Internet. This thesis surveys several of them: the Dublin Core Metadata Initiative (DCMI), the Warwick Framework and the Resource Description Framework (RDF), and the Digital Object Identifier (DOI). Although created for different purposes by different groups of people, these standards are closely related. This thesis describes the standards in detail and explores the relations between them. The list of standards is by no means comprehensive; other standards such as the TEI Header (TEI 2001), MPEG-7 (Martinez 2001), the OCLC/RLG Preservation Metadata standard (OCLC/RLG 2001) and the IEEE standard on learning objects (IEEE 2001) are not discussed. Furthermore, this thesis surveys the models that can serve as a basis for the metadata standards: the IFLA Functional Requirements for Bibliographic Records and the INDECS Metadata Framework. It tries to clarify the differences between these models and to determine whether they cover all aspects of information resources on a network.
Credit where credit is due
A thesis like this cannot be written without the help and support of many people. First of all, I would like to thank dr. Gerhard Riesthuis for his comments and tough questions. I received additional comments from dr. Maja Zumer. My old friend Diebert Jan van Rhijn introduced me to Trudi Noordermeer MSc, who directed me to the subject of this thesis. My manager Jeroen Liefferink kindly granted me study leave, even in busy times. I am indebted to fellow student and co-worker Tom Hendriks for being my 'partner in crime'. Adriënne Baars-Schuyt MA and Erik Smelt MA volunteered to correct my English. My friends and family helped me a lot by not minding my absence during this time. But most of all I would like to thank my wife Dorien. Without her love and patience, this thesis - and all of my studies - would have been completely impossible.
This thesis is therefore dedicated to her.
2 What is metadata?
A thesis about metadata standards should contain at least one definition of metadata. From the literature, the following list of definitions can be compiled:
A subject like metadata standards has many aspects, roughly divisible into theoretical and technical aspects. The theoretical aspects may lead to a better understanding of information resources and of the means to make them more useful. Technical aspects are another important part of the standards: the acceptance and implementation of the solutions provided by the standards depend on how well they fit into existing technologies. A solution optimized for the Gopher protocol would not gain worldwide acceptance, while one using HTML could. In spite of its importance, technology is not the focus of this thesis; it focuses instead on the concepts on which the standards are based. If the concepts are clear, rapid changes in technology lose some of their impact.
3 The Dublin Core Metadata Initiative (DCMI)
3.1 Introduction to the Dublin Core (DC)
The Dublin Core (DC) may be considered the best-known metadata project today. From its start in 1995, the DC evolved into 'the leading initiative for improving resource discovery on the Web'. (Weibel 2000) The DC was created and further refined during a series of Metadata Workshops. This process will be discussed in this chapter after an introduction to the Dublin Core element set.
The DC was created as an answer to the question: 'Why is it so difficult to find items of interest on the Internet or the World Wide Web?' (Weibel, Godby et al. 1995) Automatically indexing all information resources on the Internet does not work very well. Searching in very large collections may yield large result sets. If the collection contains resources from different fields of study, differences in jargon may cause additional problems. For instance, the term 'period' is used quite differently in mathematics and in geology. Furthermore, a resource may have no description at all except a filename, which may not describe its contents. (Weibel 1995; Weibel, Godby et al. 1995)
The solution to this problem lies in creating a description for each resource, so that a metadata record performs more or less the same function as a catalogue record. Given the enormous number of resources, it is not possible to create a large and complex description, like a UNIMARC or MARC21 record or a TEI Header.¹ The number of resources also makes it impossible for trained librarians to create all the descriptions. Therefore, those responsible for the resources should be able to create the descriptions themselves. This requires a simple format that is easy to understand and maintain. The description should follow a standard, enabling automated tools to 'harvest' and manipulate the descriptions.
The information resources are described using Dublin Core elements. An element is a pre-defined string or label, which is paired with a value.
In its simplest form it looks like this: 'Author = Ronald Snijder'. A list of fifteen elements is defined, which will be discussed in the next paragraph. Unlike more formal description methods such as AACR2, the use of the Dublin Core is not bound to strict rules. All elements are optional and may be repeated without any constraint. The syntax of the elements is simple: just the name - or 'Identifier' - of the element, and the value of the element. The value may consist of free text or it may be taken from a standardized source. The description may reside in a separate file or it may be part of the information resource itself. While no formal syntax rules are defined, several syntax recommendations have been created for generic text files, HTML and the Resource Description Framework (RDF). These are discussed in (Hillmann 2000).
If needed, the contents of an element may be refined or further explained by a qualifier. A qualifier gives additional information on the contents of an element, e.g. by stating which controlled vocabulary or encoding scheme is used. This vocabulary or scheme may be any 'standard list', ranging from the Library of Congress Subject Headings for a subject to an ANSI standard for dates. The following example illustrates this: 'Date (scheme = ISO 8601) = 2001-03-12'. ISO 8601 is a standard for representing dates in the format YYYY-MM-DD (year in four digits, followed by month in two digits, followed by day in two digits). The qualifier makes it clear that the date is March 12, 2001, not December 3, 2001. Any qualifier imaginable may be used; however, a list of recommended qualifiers is available in (DCMI 2000).
The primary goal of the Dublin Core is resource discovery, which the first Metadata Workshop - held in Dublin, Ohio, in the USA - identified as the most pressing need, regardless of the subject or the complexity of the information resource.
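The element-value pairs and the qualified date described above can be illustrated with a small parser. This is a minimal sketch in Python, assuming a hypothetical one-statement-per-line text format modelled on the examples in this paragraph ('Element = value' and 'Element (scheme = X) = value'); it is not one of the DC syntax recommendations discussed in (Hillmann 2000). Because elements are optional and repeatable, the record is kept as an ordered list rather than a mapping from element names to single values. The two titles are invented placeholders.

```python
import re
from datetime import date

# Hypothetical plain-text notation, one statement per line:
#   Element = value
#   Element (scheme = X) = value
STATEMENT = re.compile(
    r"^\s*(\w+)\s*(?:\(\s*scheme\s*=\s*([^)]+?)\s*\))?\s*=\s*(.+?)\s*$"
)


def parse_record(text):
    """Parse DC statements into (element, scheme, value) tuples.

    Elements are optional and may be repeated, so the record is an
    ordered list instead of a dict keyed by element name.
    """
    record = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no statement
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a DC statement: {line!r}")
        element, scheme, value = match.groups()  # scheme is None if absent
        record.append((element, scheme, value))
    return record


def interpret(element, scheme, value):
    """Use the scheme qualifier, when recognised, to interpret the value."""
    if element == "Date" and scheme == "ISO 8601":
        # YYYY-MM-DD: the month always precedes the day.
        return date.fromisoformat(value)
    return value  # no scheme, or one this sketch does not know: keep the text


EXAMPLE = """
Author = Ronald Snijder
Title = Een voorbeeldtitel
Title = An example title
Date (scheme = ISO 8601) = 2001-03-12
"""

for element, scheme, value in parse_record(EXAMPLE):
    print(element, repr(interpret(element, scheme, value)))
```

Only the Date statement is converted, because ISO 8601 is the only scheme this sketch recognizes; the repeated Title element simply produces two entries in the list.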
The resources to be discovered and described are of the type 'document-like objects' (DLOs). This restriction was made in the first Workshop for two reasons. Firstly, the participants in the Workshop felt that most information sought on the Internet is contained in such objects. Secondly, if the standard worked satisfactorily for familiar objects, it could later be extended to other types of resources. A DLO is not strictly defined, but explained using examples: an electronic newspaper article is a DLO, while a collection of slides without any description or annotation is not. Generally speaking, the content of this type of information resource is mostly text, and the required metadata is very similar to the description of analogue textual documents.
Restricting the focus in the first Metadata Workshop eliminated the need for a very large set of metadata elements. No elements were created for copyright information, cost or archival status. The Workshop aimed to define a core set of elements that could be universally applied for resource discovery. The result was the Dublin Metadata Core Element Set, containing thirteen elements. In the third Workshop this set was extended to fifteen. (Weibel and Miller 1997)
It was recognized from the start that a list of just fifteen elements might not suffice for all uses. The set may therefore be extended with new elements. Of course, other users of the Dublin Core may not recognize the new elements, but it was believed that any program using this set would at least be able to use the DC elements. The DC element set had to be simple, but it also had to allow 'mapping' into other, more complex systems like MARC21. The solution to this problem is twofold: further explanation of elements using qualifiers, and extension of the original element set. Information placed in an element can be refined using a qualifier.
For instance, an information resource in Dutch may also have a translated title in English. Because all elements may be repeated, a DC description would contain two Title elements, one containing the Dutch title and one containing the English title. Adding a qualifier to the two elements - such as the corresponding MARC21 field (245 for the primary title, 242 for the translated title) - simplifies the 'mapping process'. Adding metadata elements extends the original element set; a possible extension could encompass an element for the organization 'holding' the information resource.
3.2 Elements of the Dublin Core
The Dublin Core consists of the following elements, as copied from (DCMI 1999). The Identifier is used in the DC description, while the Name is used as a 'label'² describing the element. The Comment helps to describe the function of the different elements, and the way they may be used in a Dublin Core description.
'Element: Title
In the first Workshop, several principles for the Dublin Core were defined. All further development of the DC should be governed by these principles in order to keep it small, understandable, and flexible. The principles are intrinsicality, extensibility, syntax-independence, optionality, repeatability, and modifiability.
The DC describes properties that are part of the object itself, and can be discovered 'by having the work in hand'. Examples are the intellectual content or the physical form of the object. Extrinsic data describe the use of the object, for instance access to a resource or the costs associated with it. If it is necessary to describe extrinsic data, this may be achieved by extending the element set.
Extending the DC is useful not only for describing extrinsic data, but also for describing other data that is not part of the element set. It gives users the flexibility to use fields specific to their needs. Furthermore, the specification of the Dublin Core will develop and change. As long as the original elements are kept when new elements are added, backward compatibility will be maintained.
The DC is intended to be used within very different applications, possibly using different technologies. Defining a formal syntax would impede this, limiting the ways to implement the element set.
All elements are optional, for two reasons. Firstly, not all elements may be useful, or even known, for all kinds of information resources. The second reason is simplicity: the resource may be described by someone who is not a professional librarian and to whom not all elements make sense. Optionality allows such a person to restrict himself or herself to the elements best suited to the situation.
Every element may be repeated without any constraint. For instance, if the described information resource has ten authors, the DC description will contain ten Author elements.
All the elements described in the first Workshop are defined in such a way that no further explanation is necessary. However, the DC elements should enable different groups of people to describe 'their' information resources. To do so effectively, a qualifier can modify the definition of any element. The definition cannot change completely, but it can be 'narrowed down'.
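Because a qualifier may only narrow down an element's definition, a program can also discard it and still hold a true, if less precise, statement; paragraph 3.8.2 discusses this as the Dumb-Down Principle. The Python sketch below illustrates that reasoning under stated assumptions: the `Statement` class, the `dumb_down` and `read` functions and the qualifier labels 'LCSH' and 'ISO 8601' are illustrative choices, not part of any DC specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One DC statement; the qualifier, if any, narrows the element."""
    element: str
    value: str
    qualifier: Optional[str] = None  # e.g. the name of a vocabulary or scheme


def dumb_down(record):
    """Drop every qualifier, keeping element and value.

    A qualifier may narrow an element but never change it completely,
    so the unqualified statement remains valid, only less precise.
    """
    return [Statement(s.element, s.value) for s in record]


def read(statement, known_qualifiers):
    """Keep a qualifier only if this (hypothetical) agent understands it."""
    if statement.qualifier in known_qualifiers:
        return statement
    return Statement(statement.element, statement.value)


record = [
    Statement("Subject", "Metadata", qualifier="LCSH"),
    Statement("Date", "2001-03-12", qualifier="ISO 8601"),
]

# An agent that only knows ISO 8601 still sees a usable Subject statement.
print([read(s, {"ISO 8601"}) for s in record])
```

The design choice here is that an unknown qualifier is silently dropped rather than treated as an error, which keeps resource discovery possible at the cost of precision.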
The use of qualifiers simplifies mapping the DC elements to other information systems, and it enables a more precise description of the resource. The Subject element could be used to describe an information resource in several ways:
As described in paragraph 3.1, the Dublin Core element set originally consisted of thirteen elements. Below are the elements as defined in the first Metadata Workshop. For a detailed discussion of those elements see (Weibel, Godby et al. 1995).
3.5 Extending the element set: description of images
The third Metadata Workshop focused on the question of how the DC could be applied to the description of images. (Weibel 1997; Weibel and Miller 1997) Two very interesting results emerged from this Workshop and the discussions following it. Firstly, it was concluded that images are not very different from DLOs. Secondly, two elements were added to the DC element set.
3.5.1 Images vs. Document-Like Objects
The outcome of the Workshop was that discovery of images could be enabled using the DC, albeit with an extended element set. This implies that images are DLOs and not something completely different. What defines a document-like object is not its content, but whether or not it is fixed. If the resource shows the same content to all users, it is fixed. This class of objects contains not only texts and images, but also movies, speeches and other resources. Non-document-like objects, by contrast, can generate different content for each user; one could think of databases or other applications that create images 'on the fly', based on user input and available data.
Of course, images differ from textual resources. Indexing textual resources is relatively easy, while extracting information from images is not. When digital images are used, various technical aspects come into play, all of which are necessary to display the image correctly. Another aspect is formed by different versions of the same image. These problems were recognized, but not solved, in the third Workshop.
3.5.2 Extension of the Dublin Core
Intrinsicality is one of the principles that guide the DC: aspects that are not an integral part of the resource should not be described. Strictly following this principle, intellectual property rights would not count as such an aspect, and a Rights field would not be necessary. In this case pragmatic considerations prevailed. Rights management is of such importance to the use of images that omitting a Rights field would be a large handicap. Weibel states it like this: 'Resource description is a messy business - ask any cataloger.' (Weibel and Miller 1997) The field is intended to describe usage restrictions or to contain a link to a description of usage restrictions.
3.6 The Canberra Qualifiers and simplicity
The main subject of the fourth Metadata Workshop was to further define and structure the elements and to define ways to extend the element set. (Weibel, Iannella et al. 1997) Different views on the DC were expressed, and several qualifiers were defined.
3.6.1 Simplicity vs. qualifiers
From the Minimalist point of view, the most important aspect of the DC is its simplicity. If all elements have the same meaning in all communities that use the Dublin Core, creating and exchanging metadata is - relatively - easy. This also simplifies the use of automated tools. Using additional qualifiers to modify the meaning of an element endangers the interoperability of the descriptions. The Structuralists emphasize the advantages of - formally defined - qualifiers. Using those qualifiers enables them to create a more precise description, which may better suit the needs of a specific user group. Since the definition of the DC element set is based on consensus, a compromise had to be found. The primary goal of resource description is to enable retrieval of information resources. It is assumed that the use of qualifiers improves the retrieval result. Another assumption is that a simple element set is essential. Combining those assumptions leads to the solution of using qualifiers while keeping the number of qualifiers to a minimum.
3.6.2 The Canberra Qualifiers
The fifth Workshop, known as the 'Finnish Finish', aimed to create the final definition of the 15 elements. (Miller, Paul and Gill 1997; Weibel and Hakala 1998) The elements discussed were Date, Coverage and Relation. The last of these plays an important part in the 1:1 Principle, which states that every resource should have its own description.
3.7.1 The Date element
3.7.2 The Coverage element
3.7.3 The Relation element and the 1:1 Principle
The solution to this problem was found in the 1:1 Principle, which states that every information resource should be described separately and every metadata description should describe just one information resource. It is recommended to link these descriptions using the Relation element. The unqualified version of the DC gives no rules for describing the type of relation or for identifying the related information resource. In the example of the previous paragraph, two descriptions should be created: one for Picasso's painting and one for the digital image.
3.8 Standardization, qualification and the DOI/INDECS project
The sixth Workshop resulted in several developments that may be considered crucial to the DC. (Weibel 1999; Werf 1999) Firstly, a process of standardization was started, resulting in a more formal organization and the creation of two standards: DC version 1.0 (DCMI 1998) and DC version 1.1 (DCMI 1999). The second development was the creation of qualification mechanisms, including guidelines for the use of qualifiers. Thirdly, cooperation was started with representatives of the DOI/INDECS project. This cooperation could have had far-reaching consequences, as will be described in the last chapter.
3.8.1 Standardization of the DC
3.8.2 Qualification mechanisms
As can be concluded from the examples above, the qualifiers used may be very different. To ensure a minimum level of interoperability, restrictions are needed. It is advisable to use qualification schemes that are maintained by external parties, for example the Dewey Decimal Classification or the Medical Subject Headings (MeSH). Even so, different user groups may use different schemes. This affects interoperability in two ways: it affects the exchange of metadata, and it affects searching in metadata records. If different groups want to exchange metadata records, a very strict use of qualification schemes is compulsory, ensuring that all records are truly 'interchangeable'. Searching metadata records from different user groups requires a less strict protocol. If the application used to search the data is able to use the different DC elements, a search may still produce useful results; if the application is also able to use the qualification scheme, the results may be better. Using standardized qualification schemes may therefore improve the search results.
Another - pragmatic - restriction on the use of qualifiers is the Dumb-Down Principle: one should be able to ignore all qualifiers and use the description as if it were unqualified. (Baker 2000; Werf 1999) Ignoring the qualifier results in a loss of precision, but resource discovery - the main function of the DC - is still possible. While the possible list of qualifiers is endless, the list of DC elements is not. Therefore, automated agents working with the Dublin Core can be expected to at least recognize the elements. So, if a qualifier is not recognized, the Dumb-Down Principle ensures that the information retrieved is still usable and that resource discovery is possible. The use of qualifiers and the Dumb-Down Principle is discussed in detail in (Lagoze 2001).
3.8.3 The DOI/INDECS project and the Dublin Core
It was recognized that cooperation would benefit both the DC project and the DOI/INDECS project. Although the projects have a different focus - resource description vs. managing intellectual property rights - the underlying data model could very well be the same. A common data model would enable the exchange of metadata and support the goals of both user groups. One of the consequences might have been the creation of a new version of the Dublin Core - DC 2.0 - but the associated Working Group was deactivated in October 2000, apparently without achieving results.
The focus of the seventh Workshop was on qualifiers. (Peereboom 2000; Weibel 2000) One of the results was a common set of qualifiers useful to all users of the DC. Furthermore, work started on a metadata registry designed to register and share local extensions of the DC.
3.9.1 Common qualifiers
3.9.2 Extending the Core
3.10 Special interests: extensions to the Dublin Core
The main topic of the eighth Workshop may be described as extensions for special interests. (Weibel and Koch 2000) As a result of this Workshop, several developments arose, which will be discussed in this paragraph. These include application profiles and the creation of special interest groups.
3.10.1 Application profiles
By creating an application profile, implementers are able to declare how they use standard schemes. An application profile may be shared with other groups with similar interests, thus enhancing the interoperability of metadata. The following principles apply to application profiles:
3.10.2 Special interest groups
4 The Warwick Framework and the Resource Description Framework
4.1 Introduction to the Warwick Framework
The concept of the Warwick Framework (WF) results from an analysis of the DC and other forms of metadata. It seeks to 'store' several sets of metadata, each separately accessible. The WF is not implemented in an automated system, but serves as a concept to explore the possibilities of such a system. This concept influenced the Resource Description Framework (RDF), developed by the World Wide Web Consortium (W3C) as an infrastructure for the use of structured metadata sets. Unlike the WF, the RDF is an actual implementation, built in XML (the Extensible Markup Language). This chapter will describe the Warwick Framework concept and the concept and architecture of the Resource Description Framework.
The Warwick Framework concept is designed to place several metadata sets - independent of each other - in one place. The term for this place is a framework. This framework contains two kinds of objects: containers and packages. A package is defined as a metadata set, such as a complete DC description. A container is defined as a storage place for other containers or for packages. In paragraph 4.3, the architecture of the WF is discussed in more detail.
During the second DC Workshop - held in Warwick in the United Kingdom - the Dublin Core was evaluated after a year of working with the element set. (Dempsey and Weibel 1996; Hakala, Husby et al. 1996; Lagoze 1996; Lagoze, Lynch et al. 1996) Although the concept of a simple metadata set was considered useful, several issues emerged. Firstly, the question arose whether the DC qualified as a true standard suitable for implementation in automated systems. Such a standard requires clearly defined rules, whereas the DC is very loosely defined. Secondly, the DC was kept simple to enable all authors to create a description of their content, but the ability of these authors to do so was questioned. Furthermore, the loose nature of the DC rules leaves room for inconsistent descriptions.
An example of this may be found in (Banerjee 1999). The third issue concerns the purpose of the DC: it is aimed at content description, leaving aside administrative or operational metadata. Lastly, some elements of the DC - coverage, source and relationship - may be considered too dependent on a particular domain. If these elements fit into the DC, other domain-specific elements may also apply, which may result in an extended instead of a core element set. This resulted in three questions, as quoted from (Lagoze 1996):
'1. Should the number of elements in the Dublin Core be expanded or contracted? Some workshop attendees felt that in order for the Core to succeed as a tool for authors, its number of elements should be restricted to only the most basic descriptive elements. Others saw the need for new fields such as terms and conditions or administrator.
4.2 General principles of metadata
At the time of the second Workshop, the answers were not clear, and the participants tried to find them by defining and examining general principles of metadata. The Warwick Framework is based on these principles. As can be seen from the following paragraphs, the concept of the WF is broader than the Dublin Core. The purpose of the DC is to describe data for resource discovery, whereas the WF is an architecture in which different metadata sets - used for different goals - may be stored. The following principles were defined:
The boundaries between the metadata classes are not very clear. The DC leaves room for descriptions of relations to other resources, and MARC records also serve as administrative data for libraries.
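The two kinds of objects defined in paragraph 4.1, packages holding one complete metadata set and containers holding packages or other containers, can be sketched as two small Python classes. This is an illustration under assumptions, not a specification of the WF: the class names, the package type labels ('DC', 'UNIMARC') and the identifier string are hypothetical, and the format of a package's content is left open, as in the Framework itself. In this sketch a container counts as persistent when it has an identifier, and a reader asks only for the package types it wants, skipping all others.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Package:
    """One complete metadata set, such as a DC description."""
    package_type: str   # type label the software must recognise
    content: object     # format deliberately left open


@dataclass
class Container:
    """Holds packages and, recursively, other containers."""
    items: List[Union[Package, "Container"]] = field(default_factory=list)
    identifier: Optional[str] = None  # only persistent containers have one

    @property
    def persistent(self):
        return self.identifier is not None

    def packages(self, wanted):
        """Yield packages of the wanted types, skipping all others."""
        for item in self.items:
            if isinstance(item, Container):
                yield from item.packages(wanted)
            elif item.package_type in wanted:
                yield item


container = Container(
    items=[
        Package("DC", {"Title": ["An example title"]}),
        Container([Package("UNIMARC", "<UNIMARC record>")]),
    ],
    identifier="hypothetical-container-1",
)

# A user who prefers UNIMARC skips the DC package entirely.
print([p.package_type for p in container.packages({"UNIMARC"})])  # ['UNIMARC']
```

A transient container would simply be a `Container` without an identifier, existing only while it is passed between programs.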
As described in the first paragraph of this chapter, the WF aims to assemble separate metadata sets into one framework. It consists of two components: containers and packages. As explained in paragraph 4.1, a package is a complete description. Packages are stored in a container. The state of the container may be transient or persistent. A transient container is not stored as a file on a server, but exists only as a 'transport object between repositories, clients and agents'. (Lagoze, Lynch et al. 1996) A persistent container is stored on a server and is accessible through an identifier. The container itself may also be part of a larger object, which contains both the data (the information resource) and the metadata (the container). The packages are defined as independent objects, meaning that the different metadata sets are kept separate. The user of the container must be able to recognize the type of each package, and must be able to skip unknown - or unwanted - packages. For instance, one package may contain a complete UNIMARC description and another package may contain a DC description. If the user knows this, he or she can choose to use the UNIMARC description instead of the DC description. The packages may be encrypted, enabling the creators of a package to control access to its contents. No definition is given for the format of the packages; a package may consist of ASCII text, executable code or any other suitable format. Three types of packages are defined:
4.3.1 Unresolved issues
Other issues concerned the implementation of the WF. As described before, the metadata set in each package is of a certain type, such as a DC description, a TEI Header or a MARC21 record. These types should be registered and understood by the software used to process them. Data encoding is the second implementation issue discussed. Two types of syntax should be defined: one for the containers themselves, and one for the contents of the packages. Because of the variety of the metadata, different formats may be used. The following example illustrates this: one package could contain a DC description in plain text, while the next package contains a MARC21 record. The complex formatting of that record requires more handling to make it useful, while plain text can be read directly. Another area of concern is efficiency: the complexity of WF applications should not lead to low performance. Repository access is another issue. A protocol should be developed to enable the automated retrieval of the separate containers and packages.
4.3.2 Extending the Framework
The model extends the WF in two ways. Firstly, the content of the information resource is considered to be another package, instead of something outside the Framework. The consequence is that access to the 'content package' may be governed by rules defined in another package. Here the 'computational power' comes into play. Secondly, relationships between different information resources - stored in packages - are themselves stored in one or more separate packages. The relationship is an information resource in itself. Again, this enables automated agents to process its contents. One of the relations defined could be the terms and conditions for accessing a content package. Depending on the contents of the 'relationship package', access is granted or denied.
The Resource Description Framework has similar features. To describe an information resource, properties may be defined - with the same characteristics as relationships - and values may be assigned to these properties. This will be discussed in more detail in the following paragraphs. The RDF is not a theoretical concept like the Warwick Framework, but has actually been implemented.
4.4 Introduction to the Resource Description Framework
The Resource Description Framework (RDF) was developed under the auspices of the World Wide Web Consortium (W3C). It is designed as an architecture to enable the sharing of metadata sets; although designed using contributions of several parties, it may be considered an implementation of the principles guiding the WF. (Bray 1998; Iannella 1999a; Miller, Eric 1998; Miller, Eric, Childress et al. 1999; W3C 2001; Weibel 1999) The RDF is an application built upon XML, and the following paragraphs contain a short discussion of the XML architecture, and of the principles and architecture of the RDF.
4.4.1 A short description of XML
This example of XML describes a 'collection' of two documents: an article from D-Lib Magazine and a book on libraries in the 16th century.
<?xml version = "1.0"?>
4.4.2 Principles of the Resource Description Framework
The RDF description contains Resource 1. Resource 1 has several properties of different types. Some of the property values are atomic, while one is a resource in itself, with two properties of its own. Figure 2 contains an example, with the - fictional - website www.ronaldsnijder.nl.
As can be seen, several properties of the website are described. The value of the 'author' property consists of another resource, with two properties. In this example, the property types are more or less randomly chosen. In practice, unlike in the previous example, metadata sets use standardized elements instead of randomly chosen ones. The elements are the properties in the RDF model - the contents of the elements are the values of the RDF properties. Every metadata set may be defined in an information resource accessible through the Internet, called a schema. This information resource contains the property types that may be used with that particular metadata set. There is no central repository for storing these resources, nor is there one central responsible organization. Everyone may define and manage a schema, as long as it is accessible via a network address. Furthermore, an RDF description may use several schemas, as is illustrated in Figure 3.
The resource description in Figure 3 uses several schemas. The title and author of the website are described using Dublin Core elements. Furthermore, an element of the Australian Government Locator Service (AGLS 2001) is used to describe the subject. More information about the author is described using the vCard metadata element set. (Iannella 1999b) In this case, the information on the author is stored on the - non-existent - website www.allauthorsoftheworld.org. The RDF description of the website would look like this:
<?xml version="1.0"?>
The RDF model puts certain limitations on the definition of schemas, since all schemas must comply with the XML syntax rules. This is not necessarily a bad thing: using a technical standard enhances the interoperability of data. Furthermore, the definition of XML - and of the RDF - is maintained by a public organization - the W3C - and not by a software vendor. All technical documentation is publicly accessible, enabling widespread implementation. Since no single vendor controls the standard, no vendor can gain an advantage by creating proprietary extensions, as happened with the web browsers Internet Explorer and Netscape Navigator.
4.4.3 RDF as a solution for the unresolved WF issues?
Semantic overlap cannot be solved completely by applying syntax rules. Like the WF, the RDF enables the use of diverse metadata sets on different levels. All properties may consist of other resources, creating the diverse levels of the description. Again, this flexibility leads to the same disadvantage as the WF. Packages containing more or less equivalent metadata sets may still reside on different levels. Some other implementation issues may be resolved. One of the problems
concerned the registration of the separate metadata packages. Every set
should be described, and its type should be recognizable. The definition
of the schemas solves this. The organization responsible for the set maintains
the schema, including its type. While the contents of the schema - the
properties - are freely definable, the syntax rules enforce a standardized
way to share information. This also solves the data-encoding problem: the
'containers' use the XML definition. Furthermore, the contents of these
containers may be of different formats. The last issues concern repository
access and efficiency. Access to the separate resources is implied in the
definition of RDF, because a resource always has a URI. No information
on the efficiency of RDF implementations was found.
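Because every property is named by a URI in the namespace of its schema, the 'type' of each metadata package can be recovered mechanically from the description itself. The following minimal sketch assumes that statements are plain (subject, property URI, value) tuples and splits each property URI at its last '#' or '/', a common convention rather than a rule of the RDF specification. The function names and example URIs are hypothetical.

```python
from collections import defaultdict


def split_property(uri):
    """Split a property URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def group_by_schema(statements):
    """Group (subject, property URI, value) statements into one package per schema namespace."""
    packages = defaultdict(list)
    for subject, prop, value in statements:
        namespace, name = split_property(prop)
        packages[namespace].append((subject, name, value))
    return dict(packages)


if __name__ == "__main__":
    statements = [
        ("http://www.example.org/", "http://purl.org/dc/elements/1.1/title", "Example site"),
        ("http://www.example.org/authors/jdoe", "http://www.w3.org/2001/vcard-rdf/3.0#FN", "J. Doe"),
    ]
    for schema, props in group_by_schema(statements).items():
        print(schema, props)
```

The grouping requires no central registry: the namespace URI both identifies the schema and points to where it is maintained.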
5 The Digital Object Identifier (DOI)
5.1 Introduction to the Digital Object Identifier
Whereas the goal of the Dublin Core may simply be described as 'resource discovery', the ultimate goal of the Digital Object Identifier (DOI) is 'rights management'. While the DOI also uses metadata elements, it is not primarily designed for finding resources, but for identifying abstract concepts or tangible objects. The DOI project started in 1996 as a reaction by commercial publishers to the challenges presented by E-commerce. (Erickson 1998; Erickson 1999; Morris 1999; Paskin 1999; Paskin 2001a; Paskin 2001b; Rust 1998) Before the Internet became popular, the description of resources was considered a task for the 'library world', and managing intellectual property rights a task for the 'publishing world'. The possibility to publish information online changed that. Information - or 'content' - no longer resides on a physical carrier, but can be downloaded from a network. As a consequence, finding and buying a resource may be combined in one action. Another consequence is that all information resources need to be uniquely identified. If an article is published in a journal, the combined data of the journal (title, year, issue number, ISSN) and of the article itself (title, author, page numbers) are sufficient to identify and retrieve the article. When the same article is only available as a resource on a network, other means of identification are necessary. Furthermore, the use of the resource is granted by the holders of the intellectual property rights in it. When different persons or organizations own different rights in the same resource, administering these rights becomes a complex matter. To complicate matters further, different users may be granted different uses of the same resource. As an example, a distinction could be made between educational use - by scholars and students - and commercial use of scientific information resources.
This type of administration can only be achieved by automation and standardization of data, and the DOI aims to support this. Identification of information resources is not new. Before defining the DOI, Paskin discusses two 'identification schemes': the ISBN and the URL (Paskin 2001b). He finds both insufficient. The ISBN does not identify an information resource but a physical object - hardcovers and paperbacks often have different ISBNs, while the text is the same - and its use for information resources smaller than books is problematic. URLs do not identify the content of an information resource, but its location. A radical change of the contents may not lead to a new URL, while a change of location - which does not necessarily mean a change of content - does. The DOI is described as a 'persistent identifier of intellectual property entities'. An entity is defined as anything that is identified, ranging from an abstract notion to a tangible object. Intellectual property can be defined as 'creations of the mind: inventions, literary and artistic works, and symbols, names, images, and designs used in commerce' (WIPO 2001). This means that the DOI can be used to identify anything to which intellectual property rights apply, ranging from ideas to a small piece of a musical recording. In the literature, the term granularity is used to describe the different levels or parts of an information resource. As the examples above show, the DOI may be used at every level of granularity, depending on the requirements of the person or organization using it. The identification is not limited to entities owned by persons or organizations; it may also be applied to entities in the public domain. Furthermore, the identifier is persistent, meaning that a change of ownership of the entity does not change it. Contrary to what one may expect, the DOI is not an application for the management of rights, but it may act as a component of such a system.
The DOI is used to create unique identifiers. A comparison with the ISBN may clarify this: assigning a unique number to a book is extremely useful, but publishing and selling books encompasses more than that. Changes in ownership or other rights concerning an entity are not administered within the DOI description itself. Of course, managing intellectual property rights concerning an entity depends on good identification of that entity. For instance, the original text of the Odyssey is part of the public domain, while a recent Dutch translation is not. To distinguish between those two entities, a form of identification is necessary. Furthermore, some metadata is an integral part of the DOI description. Therefore, the creation of metadata is no longer done exclusively by libraries or other information brokers, but becomes a task of the entity's publisher or creator. While the Dublin Core also permits authors to create metadata for their own creations, its guidelines are very loose. The metadata used in the DOI description must follow strict rules, defined in a DOI Application Profile. Entities are identified by a unique identifier combined with a minimal set of metadata elements. Both the structure of the identifier and the metadata elements will be discussed in the next paragraph. As is the case with the Dublin Core, the DOI and its metadata are governed by certain principles. Instead of formulating their own principles, the creators of the DOI made the pragmatic choice to use guidelines defined by others. The identification of resources is guided by the requirements for Uniform Resource Names (URN), which can be found in Sollins and Masinter (1994):
To achieve its goals, the identification scheme is embedded in the DOI System. This system consists of several components, which will be discussed in the next paragraphs:
The DOI uses the Handle System. Designed for use in digital libraries, the Handle System acts as a 'naming service' for obtaining digital objects. On the Internet, the Domain Name System (DNS) serves a similar goal. The DNS maps the domain name in a web address (URL), such as www.loc.gov, to a numeric Internet Protocol (IP) address (in IPv4, four numbers of up to three digits each), which is used to find the correct web server. A URL containing more information than just the domain, such as www.loc.gov/copyright/circs/circ1.html, points to a specific information resource. If the resource were moved, that URL would point to the wrong location, making it unusable. The Handle System does not point directly to a web address, but uses a unique name to which one or more network addresses are attached. Therefore, if the location of the resource changes, the name of the resource remains the same (Lannom 1999).
5.2.1 Enumeration: constructing unique identifiers
The DOI consists of a prefix and a suffix, separated by a forward slash. It is not limited to digits: it may contain any printable character from Unicode v2.04. There is no limit on the length of either the prefix or the suffix. All prefixes start with the string '10.' to distinguish DOIs from other implementations of the Handle System. The remaining characters of the prefix identify the organization - called a Registrant - that registers the entities. If desired, an organization may apply for several prefixes. The Registrant creates the DOI suffix, which may be any string of characters. This enables the Registrant to reuse existing identification codes, for instance an ISBN. Following are examples of valid DOIs:
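Independently of the concrete examples, the syntax rules just described can be sketched as a small parser. This is a minimal sketch that checks only the rules stated in this paragraph (a '/' separator, a prefix starting with '10.' followed by a Registrant code, a non-empty suffix, printable characters); it splits at the first '/', so a suffix may itself contain slashes. The function, the `Doi` type and the Registrant code 1234 are hypothetical, and this is not an official DOI validator.

```python
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # '10.' followed by the Registrant's code
    suffix: str  # any string chosen by the Registrant, e.g. an existing ISBN


def parse_doi(text: str) -> Doi:
    """Split a DOI at the first '/' and check the rules described above."""
    prefix, sep, suffix = text.partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing '/' or empty suffix: {text!r}")
    if not prefix.startswith("10.") or len(prefix) == 3:
        raise ValueError(f"prefix must be '10.' plus a Registrant code: {text!r}")
    if not all(ch.isprintable() for ch in text):
        raise ValueError(f"non-printable character in: {text!r}")
    return Doi(prefix, suffix)


if __name__ == "__main__":
    # Hypothetical Registrant 10.1234 reusing an ISBN-like code as its suffix.
    print(parse_doi("10.1234/0-123-45678-9"))
```

Because the suffix is opaque to the parser, a Registrant can embed its own numbering scheme without any change to the DOI rules.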
The DOI Application Profile (DOI-AP) is defined as 'the functional specification of an application (or set of applications) of the DOI System to a class of intellectual property entities that share a common set of attributes' (Paskin 2001b). As can be seen, the DOI-AP acts as an addition to the metadata kernel. As with the DC, these additions can be used for metadata that is specific to a certain type of entity. An example would be the number of pages, which is necessary for paper-based information resources, but quite useless for an MP3 music file. The DOI Application Profiles are maintained by one or more organizations responsible for the DOI. DOI Application Profiles may overlap one another, and may be as narrow or as broad as the users require. Every DOI should be associated with at least one DOI-AP. Examples of possible DOI Application Profiles:
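Whatever the concrete profiles, the division of labor between the fixed kernel, the profile-specific elements and the separately kept administrative data can be sketched as a data structure with a validity check. This is a minimal sketch under stated assumptions: the kernel tuple lists only the element names mentioned in this thesis (DOI, Identifier, Title, Type, Primary agent), which need not be the complete official kernel, and the 'printed-text' profile with its page_count element is hypothetical, echoing the number-of-pages example above.

```python
from dataclasses import dataclass, field

# Assumed kernel elements: only those named in this thesis, possibly incomplete.
KERNEL = ("doi", "identifier", "title", "type", "primary_agent")


@dataclass
class ApplicationProfile:
    name: str
    extra_elements: tuple  # elements specific to this class of entities


@dataclass
class DoiRecord:
    kernel: dict
    profiles: list
    profile_data: dict = field(default_factory=dict)
    # Administrative data, kept outside the descriptive kernel.
    registrant: str = ""
    registered_on: str = ""
    version: int = 1
    # Deliberately no rights fields: rights change, the kernel is meant to be static.

    def problems(self):
        """Return a list of rule violations (empty if the record is valid)."""
        issues = [f"missing kernel element: {e}" for e in KERNEL if not self.kernel.get(e)]
        if not self.profiles:
            issues.append("at least one Application Profile is required")
        for profile in self.profiles:
            for element in profile.extra_elements:
                if element not in self.profile_data:
                    issues.append(f"{profile.name}: missing {element}")
        return issues
```

A record for a printed book would then carry the kernel, the 'printed-text' profile and a page count; the same kernel with an audio profile would simply omit the page count.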
Apart from the metadata elements, administrative data such as the Registrant, the date of registration and the record version number must also be kept in a DOI registration. Metadata on intellectual property rights concerning the entities is not part of the metadata kernel. This is done deliberately: while intellectual property rights may change regularly, the data in the kernel is intended to be static.
5.2.3 Resolution: obtaining information resources
On a network like the Internet, several copies of the same information
resource may exist. For example, several websites have 'mirror sites' to
balance the traffic created by the visitors of the site. Also, the same
entity may exist in several data formats. An example would be an article,
available as an HTML file, a PDF file and an MS Word document. The DOI
System is able to automatically resolve a DOI to a desired address, depending
on rules created by the rights owner of the entity. The address may be
a URL or other network address, but also another DOI. This creates the
possibility to use the DOI of an abstract work - for instance, The Odyssey -
and resolve it to a specific document, such as a Dutch translation of the
work. The Registrant is responsible for the maintenance of the addresses.
All DOIs - and the addresses attached to them - are registered in a central
repository.
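The resolution step, together with the Handle System idea that a persistent name is bound to changeable addresses, can be sketched with an in-memory table. This is a minimal sketch, not the Handle System protocol or any real resolver API: the table, the DOIs and URLs, the use of MIME types as format labels and the 'preferred format' rule (standing in for the rights owner's rules) are all hypothetical.

```python
# Hypothetical resolution table: each DOI maps to typed targets, not to one URL.
RESOLUTION = {
    "10.1234/odyssey": [("doi", "10.1234/odyssey-nl")],  # abstract work -> translation
    "10.1234/odyssey-nl": [
        ("text/html", "http://mirror-a.example.org/odyssey-nl.html"),
        ("text/html", "http://mirror-b.example.org/odyssey-nl.html"),
        ("application/pdf", "http://www.example.org/odyssey-nl.pdf"),
    ],
}


def resolve(doi, preferred_format, table=RESOLUTION, max_hops=5):
    """Follow DOI-to-DOI links, then return every address in the preferred format."""
    seen = set()
    for _ in range(max_hops):
        if doi in seen:
            raise ValueError(f"resolution loop at {doi}")
        seen.add(doi)
        targets = table.get(doi)
        if targets is None:
            raise KeyError(f"unregistered DOI: {doi}")
        next_dois = [value for kind, value in targets if kind == "doi"]
        if next_dois:
            doi = next_dois[0]  # simplification: follow the first DOI target
            continue
        # All mirrors in the requested format are returned, so a client can pick one.
        return [value for kind, value in targets if kind == preferred_format]
    raise ValueError("too many resolution steps")


if __name__ == "__main__":
    print(resolve("10.1234/odyssey", "application/pdf"))
    # Moving the PDF only changes the table entry; the DOI itself stays the same.
    RESOLUTION["10.1234/odyssey-nl"][2] = ("application/pdf", "http://archive.example.org/odyssey-nl.pdf")
    print(resolve("10.1234/odyssey", "application/pdf"))
```

The loop guard matters once DOIs may point to other DOIs, since a careless registration could otherwise send a resolver around in circles.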
6 The Functional Requirements for Bibliographic Records (FRBR) and the INDECS Metadata Framework
6.1 Introduction to the Functional Requirements for Bibliographic Records (FRBR)
The concepts of both the Dublin Core and the Digital Object Identifier were created as a reaction to the challenges and new opportunities brought about by the digitalization of information resources. The same may be said of the Functional Requirements. While the new technology enabled the creation of large databases containing bibliographic information, it also brought new forms of information resources, which may be accessed through a network. These changes - combined with an increased need to reduce costs and a growth in published information resources - were the main reasons behind a study undertaken by an IFLA Study Group in 1992 to accomplish the following: 'to produce a framework that would provide a clear, precisely stated, and commonly shared understanding of what it is that the bibliographic record aims to provide information about, and what it is that we expect the record to achieve in terms of answering user needs.' (IFLA 1998) The Study Group was to find a 'minimal level' of cataloguing, reducing the effort national bibliographic agencies need to create a new catalogue record. On the other hand, the record should still meet all essential user needs. By defining such a level, a newly created catalogue record could contain less information than a 'full level record', but would still be useful for the users of the record. The study resulted in the Functional Requirements for Bibliographic Records, published in 1998. The FRBR aims to provide a framework that identifies the objects of interest to the users of bibliographic data. In the IFLA study the term used for these objects is 'entities'. Here the term has a slightly different meaning than in the DOI. While the DOI uses the term only for notions or objects to which intellectual property rights apply, the entities in the FRBR model also encompass the creators and the possible subjects of those notions and objects.
Three groups of entities are defined: firstly, products of intellectual or artistic endeavor; secondly, the persons or groups responsible for the products of the first group; and thirdly, the subjects of the products. The INDECS metadata model - also discussed in this chapter - uses a similar definition for the entities of the first group. In the following paragraphs these entities are discussed in more detail. The framework encompasses not only entities, but also the attributes - or properties - of the entities and the relations that exist between them. The attributes and relations are recorded in the bibliographic record. When a user consults a record to perform a task, certain attributes and relations are used. The study aims to uncover the relations between the user tasks and the recorded information. For instance, if the user attempts to identify a work, one of the attributes to use is the title of the work. While the user tasks will be discussed in more detail, not all defined relations will be discussed. The FRBR aims to cover all varieties of materials. This includes textual, musical, cartographic, audio-visual, graphic and three-dimensional materials, physical and digital media formats, etc. What the model does not cover are the attributes and relationships reflected in authority records. The entities that are recorded in authority records - such as persons, organizations and concepts - are defined, but the additional data required for use in authority records is not analyzed. While this may be considered an important extension of the model, the focus of the IFLA study and time constraints prohibited it. The FRBR defines several entities - divided into three groups - which represent the key objects of interest to users of bibliographic data. Through its notion of 'aggregate and component entities', the model also recognizes granularity.
An entity may consist of several other entities, or an entity may be part of a larger entity. The same holds true for the DOI, which aims to identify and describe any concept or object or parts of it. Within the Dublin Core the notion of granularity is more implicit. The 1:1 Principle states that different information resources should be described separately and linked together using the Relation element. To describe smaller parts of an information resource, the recommended qualifiers for this element (DCMI 2000) can be used.
6.2.1 Work, Expression, Manifestation and Item
An Expression is defined as an intellectual or artistic realization of a Work in any form of notation, sound, image, object, etc. Aspects that are not integral to the realization of the Work, such as the typeface and layout of a text, are not considered part of an Expression. If inherent aspects of the Expression are changed, a new Expression is created. Examples are the transition from spoken word to written text, or the translation from one language into another. Also, if a Work is revised or updated - which is quite common in a digital environment - the result is considered a new Expression. Using this definition, different versions of the same Work may be identified. A Manifestation is defined as the physical embodiment of an Expression. All objects with the same physical form and intellectual content are considered to be the same Manifestation. A change of physical form results in a new Manifestation; for example, an HTML file, a PDF file and an MS Word document containing the same text are three different Manifestations. This definition enables the selection of a Manifestation with the required physical aspects. An Item is defined as a single exemplar of a Manifestation. If several copies of a Manifestation exist, the Item enables the identification of a single object.
6.2.2 Person and Corporate body
6.2.3 Concept, Object, Event and Place
The Study Group defined four user tasks. These tasks describe the type of usage that is made of bibliographic data. By defining these tasks, the model makes it possible to map the attributes and relationships of the defined entities to the user tasks they support. The following tasks were defined:
6.4 The FRBR as basis for the Dublin Core and the DOI?
While the Functional Requirements primarily aim to describe the minimal functionality of records in national bibliographies, their framework of entities and their user tasks may also be applied to the Dublin Core and the DOI. Both metadata standards should enable the user to find, identify and use information and documents in the broadest sense. The framework of the FRBR describes the 'information resources' of the Dublin Core and the 'entities' of the DOI; the user tasks describe the uses made of them. In the following paragraphs, the support for the entities framework and the user tasks is discussed. The level of support is assessed using the basic requirements for national bibliographic records as defined in the FRBR (Chapter 7 - Basic Requirements for National Bibliographic Records). These requirements describe the minimal functionality of a set of bibliographic data. The user should be able to do the following:
6.4.1 The Dublin Core and the FRBR
The elements Creator and Contributor may indeed be used for finding the persons or groups that are responsible for an information resource. The role of the Title element and the Subject element is also clear. For the minimal FRBR requirement of finding Manifestations by identifier, the Identifier element may serve. The definition of the Relation element is broad enough to use it to find all entities in the same series: the documents in a series share a common relation, and the Relation element may be suitable to describe it. The retrieval of entities - especially the more 'formalized' search on series and identifier - depends heavily on standardization of the input. As stated before, the DC does not impose strict guidelines here, nor does it claim to prescribe binding rules. The Dublin Core may be used as a starting point by organizations with a common interest, which may create specific guidelines for the description of information resources. Within those communities, rules or guidelines may be created to formalize the input, which should enhance the retrieval of relevant information resources. While Attig discusses the user task of identification briefly, the FRBR discusses numerous attributes of entities that are to be used. Among them are the title, the persons or organizations responsible, the language, the publisher/distributor and the date of publication/distribution. The corresponding DC elements are: Title, Creator, Contributor, Language, Publisher and Date. Furthermore, Attig discusses the selection task using the Description and Coverage elements as primary sources, assuming that selection is primarily based on the contents of the retrieved information resources. The FRBR also emphasizes attributes concerning the form and type of medium of the information resource. This is not surprising.
The Dublin Core is mostly used for a single type of medium - a resource on a network - while the FRBR applies to all media types in a national bibliography. The coverage of the obtain task also reflects this: Attig mentions a correct address in the Identifier element of the DC, whereas the FRBR mentions several attributes ranging from title to access restrictions. Comparing the functionality of the Dublin Core to the minimal requirements for bibliographic records, one may conclude that the DC meets those requirements more than halfway. The elements of the DC correspond to the attributes used in the FRBR. If the elements are used consistently - with a form of authority control and/or 'input guidelines' - the requirements of the FRBR may be met, but only within the boundaries of the community that enforces the guidelines. Using the Dublin Core elements in a less formalized manner does not lead to consistent descriptions. Inconsistent descriptions may be more useful than no description at all, but consistency is crucial for bibliographic records.
6.4.2 The DOI and the FRBR
Not surprisingly, the Find task is only partly supported. The DOI number, Identifier, Title and Primary agent may be used, but the DOI metadata kernel does not contain elements for finding entities by subject or series. Identifying entities is the main purpose of the DOI, and therefore emphasis is placed on the DOI number to uniquely identify each entity. Title, Identifier, Primary agent - and to a lesser extent Type - support this task. No explicit support exists for relationships to other entities or for describing the characteristics of the entity's medium. The same holds true for the selection task. The obtain task is more strongly developed, as resolution is one of the main components of the DOI System. Comparing the functionality of the DOI metadata to the minimal requirements leads to the conclusion that the requirements are not met. The DOI is not intended to be used for resource discovery, but for the identification of entities. The larger element set of the Dublin Core makes it more suitable for the requirements of the FRBR, but the DC lacks consistent input rules. By contrast, the DOI uses strict guidelines, and obtaining entities is one of its strong points. As stated before, the DOI may act as a component of a larger system. Combining the DOI - and its metadata - with an application to manage bibliographic data or intellectual property rights could result in a system that may very well comply with the FRBR requirements.
6.5 Introduction to the INDECS Metadata Framework
'In the print world, sales and rights transactions tend to be considered separately; in the digital world, all transactions are rights transactions'. (Morris 1999) This quote best describes the background of the INDECS project. INDECS is an acronym for Interoperability of Data for Electronic Commerce Systems, the name of a project that ran from the end of 1998 until 2000 with support from the European Commission.
It brought together different organizations representing creators, publishers and managers of 'content in the digital environment'. The starting point is the assumption that content - to which intellectual property laws apply - will be traded using a network as distribution channel. Automation is necessary to manage all the intellectual property (IP) transactions. To do this effectively, all activities must be identified and described using a standard. The INDECS Metadata Framework is that standard. (Bearman, Miller et al. 1999; Paskin 2001b; Rust 1998; Rust and Bide 2000) The INDECS Framework aims to create interoperable metadata for electronic commerce - or E-commerce - meaning that the metadata can be used in as many ways as possible. This eliminates the need for separate metadata for different media, for different functions (such as cataloguing, discovery or rights management), for different levels of metadata (from simple to complex), for different languages or territories and for different technology platforms. All these can be seen as trade barriers for E-commerce. INDECS uses a fairly simple model for commerce: 'People make stuff. People use stuff. People do deals about stuff' (Rust and Bide 2000). The name used in the INDECS model for 'stuff' is 'creations'. Creations are formally defined as the output of a creative activity, and may be a tangible object or a concept. Commerce is seen in a broad context. Not only transactions with a financial aim are described, but also transactions that enable people to access creations freely. Examples are the use of books from a library or the downloading of copyright-free files. The INDECS model uses the following definition of metadata: 'An item of metadata is a relationship that someone claims to exist between two entities.' This reveals a different approach to metadata. While the Dublin Core describes properties of an object, the INDECS model uses a more 'relation oriented' approach.
The most important part of the model is not the object itself, but the relationships that exist between objects. The event is the type of relation that is central in the INDECS Framework. Not surprisingly, the events described concern the creation, modification, use and 'publishing' of entities, and the conditions that enable these events: transactions, agreements, offers and payments. Furthermore, several axioms about metadata for E-commerce were defined:
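The INDECS definition of metadata as a relationship that someone claims between two entities, with the event as the central relation, can be sketched as follows. This is an illustrative modelling choice under stated assumptions, not the data structure of the INDECS Framework itself: the `Assertion` and `Event` types, the role names and all identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One item of metadata: a relationship someone claims between two entities."""
    claimant: str
    subject: str
    relation: str
    obj: str


@dataclass(frozen=True)
class Event:
    """An event in which entities take part in roles (creation, use, publishing, ...)."""
    event_id: str
    kind: str
    participants: tuple  # (role, entity) pairs


def event_to_assertions(event, claimant):
    """Express an event as claims linking each participant to the event in its role."""
    return [Assertion(claimant, entity, role, event.event_id)
            for role, entity in event.participants]


def claims_by(assertions, claimant):
    """Select the claims made by one party; different parties may claim different things."""
    return [a for a in assertions if a.claimant == claimant]
```

Because every statement records who makes the claim, conflicting descriptions of the same event (for instance, two parties naming different translators) can coexist and be weighed, rather than one silently overwriting the other.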
6.6 Entities of the INDECS Metadata Framework
The definition of an entity is the same as in the DOI: an entity is something that is identified. Any entity has five types of attributes: labels, quantities, qualities, types and roles. This list of attributes functions as the metadata set for that particular entity. While all other attributes describe the entity itself, a role is a part played or function fulfilled by an entity - during an event - in relation to another entity or entities. Entities can take several types of roles:
Relations are the most important entities in the INDECS model, which is reflected in the number of possible relation types and the subdivision of every relation type. Three types of relations are defined, and each type is divided into subtypes:
Creations are the 'stuff' in the model of commerce described in paragraph 6.5. The types of entities defined in the INDECS model seem to overlap those of the FRBR; even the names are mostly the same: item, manifestation and expression. The overlap is only partial, however. The INDECS Framework does not distinguish between an abstract notion - for instance, the song Come Together - and its different realizations, such as performances by The Beatles and by other artists. Here the term Expression is used for events that are creations in themselves, like the performance of a play. An Expression may be recorded on a medium, and this medium then becomes a Manifestation. An example of an Expression is the Live Aid concert, performed in 1985; a videotape containing a recording of this concert is a related Manifestation. The entity Abstraction is equivalent to Work in the FRBR, and the definitions of Manifestation and Item carry the same meaning in both models. This is used for descriptions of Artifacts that need content to be used as a Manifestation. Examples of this are books without words or blank DVDs. The complete list of entities:
6.6.4 IPR transactions
6.7 The INDECS model and the Functional Requirements
As can be concluded from the preceding paragraphs, both the INDECS model and the Functional Requirements are concerned with notions or objects to which intellectual property rights apply. The two models describe things and abstractions as well as properties and actions related to them. Both use more or less the same definition for those notions and objects, but the models are defined for different purposes and focus on different points. The starting point of the FRBR is the user of bibliographic data and the ways he or she wants to make use of that data. To enable this, the objects and abstractions described in the bibliographic data are analyzed. The focus of the FRBR therefore lies on tangible objects - ranging from books to skyscrapers - and the abstractions they contain. By contrast, the INDECS model aims to describe all aspects of transactions concerning things and notions to which intellectual property law applies. Here events are the primary focal point, together with the actors and the things and notions participating in the event. This leads to several significant differences. The first difference between the two models is the way people and groups of people are treated. In the FRBR they are seen as a means to find or identify something, while the INDECS model emphasizes the role of a person or a group. In other words: what has this group or person done? Another consequence of the contrast between objects and events is that the INDECS Framework describes events that are creations in themselves, like a concert or the performance of a play. Participants - such as composers, musicians or actors - could be attached to such an event. In the analysis of the FRBR, the entities described by bibliographic records are in an abstract state (Work and Expression) or in a tangible state (Manifestation and Item). The entities cannot be an event in themselves.
When an event does lead to an entity described in a bibliographic record, the event would serve as a subject of the entity. To complicate matters, the INDECS model uses the term 'expression' for the creative event, a term the FRBR uses for something completely different. Perhaps the most significant difference lies in the scope of the models. The FRBR is restricted to the 'library world', while the INDECS model aims to describe all kinds of events, whether they take place in a commercial setting or not. The broad scope of the INDECS Framework also leads to a much more complicated model, and this complexity may impede a full implementation. So far, only descriptions of documents have been made operational, such as the DOI or the ONIX database created by EDItEUR (EDItEUR 2001). No data was found regarding a system based on events, the main focal point of INDECS. Still, both models have some aspects in common. Firstly, both were created in reaction to the changes introduced by the large-scale use of communication and information technology. This creates new problems - the 'information overload' - but also new possibilities. One of these possibilities is the use of small parts of a document, which is widespread today. Both models reflect this in their notion of granularity. Another consequence of digitalization is that identifying a document becomes more important, and both models emphasize identification. The second similarity is found in the analysis of the 'stuff'. The analysis of notions and objects to which intellectual property rights apply is very similar in both models. Both recognize an abstract notion, which is embodied in one or more manifestations. Each manifestation is in turn exemplified in one or more items.
The FRBR model also contains an expression to differentiate between several realizations of the same abstract notion, where the INDECS Framework would simply define different manifestations. Apart from this, the models use a similar analysis for creations.
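The structural difference described above can be made concrete with a small sketch: an FRBR-style hierarchy of Work, Expression, Manifestation and Item, and a function that collapses the Expression level so that each realization in each physical form becomes a separate manifestation of the abstraction, as the INDECS analysis of creations would have it. This is an illustrative sketch only; the class layout, function names and Odyssey data are hypothetical, and the INDECS use of 'Expression' for a creative event is deliberately not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    label: str  # a single exemplar, e.g. one copy on a shelf


@dataclass
class Manifestation:
    form: str  # physical embodiment, e.g. "PDF file"
    items: list = field(default_factory=list)


@dataclass
class Expression:
    realization: str  # e.g. "Dutch translation"
    manifestations: list = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: list = field(default_factory=list)


def flatten_frbr(work):
    """Walk Work -> Expression -> Manifestation -> Item, yielding one path per Item."""
    for e in work.expressions:
        for m in e.manifestations:
            for i in m.items:
                yield (work.title, e.realization, m.form, i.label)


def to_indecs_style(work):
    """Drop the Expression level: each (realization, form) pair becomes one manifestation."""
    return {(e.realization, m.form): [i.label for i in m.items]
            for e in work.expressions for m in e.manifestations}
```

No information is lost in the collapse, since the realization survives as part of each manifestation's key; what disappears is the explicit entity that lets FRBR group all manifestations of one translation together.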
7 Conclusion: towards a theory of metadata?
In this final chapter, an attempt is made to find and understand the concepts on which the metadata standards are based. These concepts encompass two kinds of objects: information sources or creations, and metadata used in a networked environment. The following paragraphs define and discuss a conceptual framework for creations and network-metadata. Comparing the findings of the creators of the various standards may reveal the current state of affairs. Furthermore, some possible directions for the future are discussed. Other authors have also commented on developments in metadata, each from a different angle (Caplan 2000; Gorman 1999; Gradmann 1998; Wood 1999).
The concept of creations plays an important role in all metadata standards, whether it is defined as an information resource, a document-like object, an intellectual property entity, a product of intellectual or artistic endeavor, or a creation. All these terms refer to the same concept: something, tangible or abstract, that is created using a certain amount of intellectual labor. The objects described may be as varied as the definitions, ranging from a skyscraper to a small part of a melody. For simplicity, the term 'creation' is used in this chapter. Ultimately, all metadata is created to simplify the handling of creations. This handling may take many forms, such as searching, identifying, using, selling or purchasing. While the term creation may apply to many different things, all creations share several aspects. These aspects are:
7.1.2 Granularity
A consequence of granularity is that a creation is not isolated, but related in some way to other creations. The Dublin Core defines the Relation element for this purpose, to be combined with recommended qualifiers such as 'Is Version Of', 'Has Version', 'Is Part Of' and 'Has Part' (DCMI 2000). The DOI lacks this functionality, but its main function is to identify creations, and it is designed to be part of a larger implementation. Relations are the main focus of the INDECS framework, which includes relations between creations. The FRBR also emphasizes relations between the entities it describes.
7.1.3 States of a creation
Other aspects are more disputed. The FRBR and the INDECS framework both define an 'Expression', but with quite different meanings, reflecting the different backgrounds of the models. As its name indicates, the FRBR is written to be used with bibliographic records. Those records describe tangible objects and abstractions. Consequently, they do not account for creations that can be perceived but are not fixed in any medium. Creations of this class are only performed: they are events. Those events are of great importance to those taking part in them, and to those whose (abstract) creation is performed. In other words, they matter to actors and musicians as well as to writers and composers. The INDECS framework is designed to accommodate the needs of all 'stakeholders' concerning intellectual property rights, and therefore describes these events. Traditionally, libraries, the main producers of bibliographic records, have placed emphasis on the different versions of the same creation. Different editions and publications of the same creation can thus still be linked together, and it is not surprising that the FRBR defines this state of a creation. In a digitized environment where documents are updated frequently, describing several versions of the same creation can be extremely useful. Apparently, this level of detail is not needed in the INDECS framework, which would describe several editions of the same creation as different Manifestations.
7.2 General principles and concepts of network-metadata
As the previous chapters show, several groups and individuals have attempted to define the nature of metadata. Like creations, metadata is not a new concept, but its definition has changed recently to accommodate the challenges and opportunities created by the digitized environment. This section attempts to define those differences.
In the following paragraphs, for want of a better term, 'network-metadata' denotes the descriptions used in a digitized and networked environment. While both are of course forms of metadata, the term network-metadata is used to distinguish these descriptions from more traditional forms of metadata such as bibliographic records. For the sake of clarity, the differences between network-metadata and bibliographic records are slightly overstated; in reality the boundaries between the two are not sharply defined. Instead of repeating all principles of all types of metadata described in the previous chapters, the following paragraphs discuss the general outlines of network-metadata. To ensure a generic view, definitions and principles specific to one metadata set are avoided. Furthermore, an attempt is made to uncover implicit assumptions. The following principles may be defined:
The specialization is reflected in several ways in the discussed standards. First, the goal of the Dublin Core is limited to discovering creations, and the DOI aims to identify creations. Of course, a DC description may be used to identify a creation, and the description associated with a DOI could be used in a search. However, both standards lack the rich descriptive capabilities of UNIMARC or MARC21. Not only do the goals differ, but so do the backgrounds of the network-metadata creators and their users. The DOI originates from the 'publishing world' and is primarily focused on owners of intellectual property, who are mostly found in the same environment. The DC is used more frequently by governmental and non-profit organizations, where more emphasis is placed on sharing information; protecting the intellectual property rights of the creations they show would probably not be their first priority. This diversity is a direct consequence of the organizational model used by the network-metadata creators. The model does not encompass one central authority, such as a national library, but consists of several groups or organizations working independently from each other. Every group has the freedom to define new standards or use standards defined by others. Both strategies have certain advantages: a new standard can be tailored to specific needs, while using pre-defined standards simplifies the exchange of metadata records. Not surprisingly, this model developed after the explosive growth of the Internet. The DC is clearly a decentralized structure allowing maximal freedom to its users. While the DOI is more strictly organized, it too places a strong emphasis on decentralization. If several groups create distinct metadata sets describing the same creation, the need arises to relate them. Hence the development of the Warwick Framework and the RDF.
7.2.2 The principle of directly accessible creations
Bibliographic records are designed to provide a complete description of a creation. Because the users of those records usually have no direct access to the creation, they must base their decision about its usefulness on the record. If the creation is directly accessible, the user can consult it, eliminating the need for a full record. Even if access is limited, as may be the case with commercially available creations, the creation's owner will ensure that the creation carries publicly accessible information about its content.
7.3 Fixation of creations and generated metadata
Like all types of metadata, network-metadata may exist separately or as an integral part of the creation. Examples are Dublin Core elements placed in a Web page, or the combination of content and metadata in the MPEG-7 standard. The close relation between network-metadata and creations, and the digitization of both, also make it possible to create network-metadata dynamically. If the contents of certain network-metadata elements are linked to relevant parts of the creation, they may change automatically when the content of the creation is updated. This process could be taken a step further by generating the network-metadata automatically whenever it is needed, ensuring the most up-to-date description possible. As can be seen from the previous paragraphs, the boundaries between creations and network-metadata are unclear. Technological changes give rise to new forms of creations and new ways of describing them. This may even affect the main characteristic of creations, fixation, while the implications for network-metadata are far from clear. Describing the characteristics of creations and network-metadata is therefore not an easy task, and the current attempt cannot claim to cover the subject completely. The next section looks ahead at possible developments.
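The dynamic generation of network-metadata described in section 7.3 can be sketched in a few lines of Python using only the standard library. The sketch assumes an HTML page whose `<title>` element holds the title of the creation; each time the page is served, a Dublin Core title is derived from the current content, so the description follows any update of the creation. The `DC.`-prefixed `<meta>` names and the `schema.DC` link follow a common convention for embedding Dublin Core in HTML; the function name `generate_dc_meta` and the example page are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # HTMLParser reports tag names in lower case
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def generate_dc_meta(page_html, modified):
    """Derive Dublin Core <meta> tags from the page itself at request time,
    so the description always reflects the current content."""
    parser = TitleExtractor()
    parser.feed(page_html)
    parser.close()
    fields = {"DC.title": parser.title.strip(), "DC.date": modified}
    tags = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for name, value in fields.items():
        # escape() also escapes quotes, so values are safe inside the attribute
        tags.append(f'<meta name="{name}" content="{escape(value)}">')
    return "\n".join(tags)


if __name__ == "__main__":
    page = "<html><head><title>Annual report 2001</title></head><body>...</body></html>"
    print(generate_dc_meta(page, "2001-06-01"))
```

If the page's title is edited, the next call yields the new `DC.title` without anyone maintaining a separate record. Which parts of a creation each element should be linked to is left open here, as it is in the discussion above.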
7.4 Possible directions for network-metadata standards
This thesis concludes by briefly discussing possible directions for network-metadata standards. Given the fast pace of development in recent years, more change is to be expected. An important development is the creation of new network-metadata standards, such as MPEG-7 (Martinez 2001), the OCLC/RLG Preservation Metadata standard (OCLC/RLG 2001), or the IEEE standard on learning objects (IEEE 2001). Each of these standards is designed to serve a different purpose, and more standards are expected to be published. This development is also visible in the Dublin Core, as it moves toward extended versions for special interest groups. As network-metadata sets multiply, a need will probably arise to find a common denominator and ways to combine the different sets. Both the DC and the RDF may play an important part in that respect. The Dublin Core element set is the best-known network-metadata set and has proven to be relatively stable. Furthermore, if used consistently, it meets the minimal requirements of the FRBR. This may be seen as a measure of the quality of the DC, insofar as the FRBR report is used as a criterion. Combining metadata sets will most likely be done with RDF. Its close relation to XML, the emerging markup language, and the support of the W3C make it appealing to different groups.
When the management of all types of (electronic) creations is considered, only parts of the total solution are currently visible. The discussed network-metadata standards are still developing and are directed toward specialized goals. Furthermore, the INDECS framework is not yet fully implemented in applications, and its implementation may lead to a different perspective. Last but not least, changes in technology will give rise to new types of created objects, requiring new types of management.
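How RDF allows metadata sets from different groups to be combined can be illustrated without any RDF library. In the following Python sketch, each set is expressed as (subject, predicate, object) statements about the same resource URI; because the statements share that subject, their union forms a single description, serialized here in the N-Triples syntax. The Dublin Core namespace `http://purl.org/dc/elements/1.1/` is the real one, but the rights vocabulary, the resource URI and the values are hypothetical, and a production system would use an RDF toolkit rather than this hand-written serializer.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
RIGHTS = "http://example.org/rights#"     # hypothetical rights vocabulary


def to_triples(subject, namespace, record):
    """Turn a flat {element: value} record into (subject, predicate, object) triples."""
    return {(subject, namespace + element, value) for element, value in record.items()}


def escape_literal(value):
    """Escape a string for use as an N-Triples literal (backslash first)."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize triples with URI subjects/predicates and literal objects,
    sorted so the output is stable."""
    return "\n".join(
        f'<{s}> <{p}> "{escape_literal(o)}" .' for s, p, o in sorted(triples)
    )


if __name__ == "__main__":
    resource = "http://example.org/annual-report"  # hypothetical creation
    discovery = to_triples(resource, DC, {"title": "Annual report", "creator": "J. Smith"})
    rights = to_triples(resource, RIGHTS, {"holder": "Example Publisher"})
    # Both sets describe the same subject, so a plain set union merges them.
    print(to_ntriples(discovery | rights))
```

Because the second set only adds statements about the same subject, neither group needs to adopt the other's element set; this independence is one reason RDF is a candidate for combining the growing number of network-metadata sets.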
Given all this, it is safe to conclude that the current state of affairs is just the beginning.
Appendix 1: Literature
AGLS (2001). "Australian Government Locator Service".
Attig, John (1998). "Dublin Core Metadata and the Cataloging Rules", Pennsylvania State University Libraries.
Baker, Thomas (2000). "A Grammar of Dublin Core". D-Lib Magazine 6(October).
Banerjee, Kyle (1999). "Challenges of Using Metadata in a Library Setting: the Collection And Management of Electronic Links (CAMEL) Project at Oregon State University", Oregon State University.
Bearman, David, Eric Miller, et al. (1999). "A Common Model to Support Interoperable Metadata: Progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities". D-Lib Magazine 5(1).
Boehm, Carla (1999). "The Metadata Bear. Or: Bearing the weight of accessibility". Journal of educational media 24(3): 177-190.
Bray, Tim (1998). "RDF and Metadata", XML.com.
Butterfield, Kevin L. (1995). "Catalogers and the Creation of Metadata Systems: a collaborative vision at the University of Michigan", University of Michigan.
Caplan, Priscilla (2000). "International Metadata Initiatives: Lessons in Bibliographic Control", Library of Congress.
Daniel, Ron and Carl Lagoze (1997). "Extending the Warwick Framework: From Metadata Containers to Active Digital Objects". D-Lib Magazine (November).
DCMI (1998). "Dublin Core Element Set, Version 1.0: Reference Description", Dublin Core Metadata Initiative.
DCMI (1999). "Dublin Core Element Set, Version 1.1: Reference Description", Dublin Core Metadata Initiative.
DCMI (2000). "Dublin Core Qualifiers", Dublin Core Metadata Initiative.
Dempsey, Lorcan and Stuart Weibel (1996). "The Warwick Metadata Workshop: A Framework for the Deployment of Resource Description". D-Lib Magazine (November).
Dempsey, Lorcan and Rachel Heery (1998). "Metadata: A current view of practice and issues". Journal of documentation: devoted to the recording, organization and dissemination of specialized knowledge 54(2): 145-172.
EDItEUR (2001). "ONIX International".
Erickson, John (1998). "Metadata Initiatives and the DOI: Implications for Electronic Commerce and Copyright Management Services". TRIALOGUE - Publishing News for Publishers, Vendors, and Librarians (8).
Erickson, John (1999). "The DOI and Rights Management: Tying Up Loose Ends". TRIALOGUE - Publishing News for Publishers, Vendors, and Librarians (11).
EU-NSF (1999). "Metadata for Digital Libraries: a Research Agenda", EU-NSF Working Group on Metadata.
Gorman, Michael (1999). "Metadata or Cataloguing? A False Choice". Journal of Internet Cataloging 2(1): 5-22.
Gradmann, Stefan (1998). "Cataloguing vs. Metadata: old wine in new bottles?", Pica.
Guha, R.V. (1996/1997a). "Meta Content Framework".
Guha, R.V. (1996/1997b). "Towards a theory of meta-content".
Hakala, Juha, Ole Husby, et al. (1996). "Warwick framework and Dublin core set provide a comprehensive infrastructure for network resource description", IFLA.
Heery, Rachel and Manjula Patel (2000). "Application profiles: mixing and matching metadata schemas". Ariadne (25).
Henze, Volker and Michael Schefczik (1997). "Metadaten: Beziehungen zwischen Dublin Core Set, Warwick Framework und Datenformaten". Bibliotheksdienst 31(3): 413-419.
Hillmann, Diane (2000). "Using Dublin Core".
Iannella, Renato (1999a). "An Idiot's Guide to the Resource Description Framework", University of Queensland.
Iannella, Renato (1999b). "Representing vCard v3.0 in RDF".
IEEE, Learning Object Metadata Working Group (2001). "Standard for Information Technology - Education and Training Systems - Learning Objects and Metadata", IEEE.
IFLA, Study Group on the Functional Requirements for Bibliographic Records (1998). "Functional Requirements for Bibliographic Records: Final Report", International Federation of Library Associations and Institutions.
KB, Koninklijke Bibliotheek (2001). "Expert Centre: KB-catalogue", Koninklijke Bibliotheek.
Lagoze, Carl (1996). "The Warwick Framework: A Container Architecture for Diverse Sets of Metadata". D-Lib Magazine (July/August).
Lagoze, Carl, Clifford A. Lynch, et al. (1996). "The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata".
Lagoze, Carl (2001). "Keeping Dublin Core Simple: Cross-Domain Discovery or Resource Description?". D-Lib Magazine (January).
Lannom, Laurence (1999). "Handle System Overview", CNRI.
Lynch, Clifford (1998). "The Dublin Core Descriptive Metadata Program: Strategic Implications for Libraries and Networked Information Access". ARL: A Bimonthly Newsletter of Research Library Issues and Actions (196, February).
Marchiori, Massimo (1998). "The limits of Web metadata, and beyond". Computer Networks and ISDN Systems 30: 1-9.
Martinez, José M. (2001). "Overview of the MPEG-7 Standard (version 5.0)", ISO, International Standards Organization.
Miller, Eric (1998). "An Introduction to the Resource Description Framework". D-Lib Magazine 4(May).
Miller, Eric, Eric Childress, et al. (1999). "Making Progress: The Resource Description Framework (RDF)". Journal of Internet Cataloging 1(4): 53-58.
Miller, Paul and Tony Gill (1997). "DC5: the search for Santa". Ariadne (12, November).
Morris, Sally (1999). "Metadata and rights". VINE: Very Informal Newsletter on Library Automation (117): 30-34.
OCLC/RLG (2001). "Preservation Metadata for Digital Objects: A Review of the State of the Art", OCLC/RLG Working Group on Preservation Metadata.
Paskin, Norman (1999). "DOI: Current Status and Outlook". D-Lib Magazine 5(May).
Paskin, Norman (2001a). "Position paper for W3C Workshop on Digital Rights Management for the Web (22/23 January, 2001)", International DOI Foundation.
Paskin, Norman (2001b). "The DOI® Handbook: Version 1.0.0 February 2001", International DOI Foundation.
Peereboom, Marianne (2000). "Dublin Core Qualified: Metadata voor het nieuwe millennium". Informatie professional: magazijn voor informatiewerkers 4(4): 20-23.
Rust, Godfrey (1998). "Metadata: The Right Approach: An Integrated Model for Descriptive and Rights Metadata in E-commerce". D-Lib Magazine 4(July/August).
Rust, Godfrey and Mark Bide (2000). "The <indecs> metadata framework: Principles, model and data dictionary", Indecs Framework Ltd.
Smith, Terence R. (1996). "The Meta-Information Environment of Digital Libraries". D-Lib Magazine (July/August).
Sollins, Karen and Larry Masinter (1994). "RFC 1737: Functional Requirements for Uniform Resource Names".
TEI, Text Encoding Initiative Consortium (2001). "Text Encoding Initiative Homepage".
Vellucci, Sherry L. (2000). "Metadata and Authority Control". Library resources and technical services 44(1): 33-43.
W3C (2000). "Extensible Markup Language (XML)", World Wide Web Consortium.
W3C (2001). "Semantic Web Activity: Resource Description Framework (RDF)", World Wide Web Consortium.
Weibel, Stuart (1995). "Metadata: The Foundations of Resource Description". D-Lib Magazine (July).
Weibel, Stuart, Jean Godby, et al. (1995). "OCLC/NCSA Metadata Workshop Report", OCLC/NCSA.
Weibel, Stuart (1997). "Discovering Online Resources. The Dublin Core: A Simple Content Description Model for Electronic Resources", Arts and Humanities Data Service.
Weibel, Stuart, Renato Iannella, et al. (1997). "The 4th Dublin Core Metadata Workshop Report: DC-4 March 3 - 5, 1997 National Library of Australia, Canberra". D-Lib Magazine (June).
Weibel, Stuart and Eric Miller (1997). "Image Description on the Internet: A Summary of the CNI/OCLC Image Metadata Workshop September 24 - 25, 1996 Dublin, Ohio". D-Lib Magazine (January).
Weibel, Stuart and Juha Hakala (1998). "DC-5: The Helsinki Metadata Workshop: A Report on the Workshop and Subsequent Developments". D-Lib Magazine (February).
Weibel, Stuart (1999). "The State of the Dublin Core Metadata Initiative April 1999". D-Lib Magazine 5(April).
Weibel, Stuart (2000). "The Dublin Core Metadata Initiative: the Frankfurt Focus and the Year 2000". Zeitschrift für Bibliothekswesen und Bibliographie: Organ des Vereins Deutscher Bibliothekare und des Vereins der Diplombibliothekare an wissenschaftlichen Bibliotheken 47(1): 3-13.
Weibel, Stuart and Traugott Koch (2000). "The Dublin Core Metadata Initiative: Mission, Current Activities, and Future Directions". D-Lib Magazine 6(12).
Werf, Titia van der (1999). "DONOR en Dublin Core Metadata". Informatie professional: magazijn voor informatiewerkers 3(3): 23-28.
WIPO (2001). "About Intellectual Property", World Intellectual Property Organization.
Wood, Andrew (1999). "Metadata - The Ghosts of Data Past, Present, and Future".
© Ronald Snijder