Information architecture for digital libraries
First Monday

Information architecture for digital libraries by Scott J. Simon



Abstract
This paper surveys information architecture in the context of digital libraries. Key concepts are defined as well as common attributes of information architectures in general. Communications standards — including hybrid TCP/IP–OSI, CORBA, and Web services — are explored, as well as the history of information architecture and related models. A number of digital library projects are analyzed with a focus on their distinct architectures. The key role of information architecture in the design and development of the twenty–first century digital library is detailed throughout.

Contents

Introduction
Concepts
Standards
Digital library projects
Conclusion

 


 

Introduction

As libraries of the twenty–first century continue to migrate to the Web, many complex questions surface: What exactly is a digital library? What standards and technologies are utilized by such libraries? What role does interoperability play? Can the emerging discipline of information architecture provide guidance?

These questions provide a framework for a survey of digital libraries that explores information architecture and architectural models mainly from a technology and communications standards perspective. Related digital libraries research in collections management, classification systems, and distribution models will not be addressed here. While a relatively recent phenomenon, digital libraries research extends back at least a decade to the National Science Foundation’s Digital Libraries Initiative (1994–2004). The Digital Libraries Initiative funded a number of digital library projects, and the architectures employed varied widely. Digital libraries initiatives that will be examined closely include the decentralized agent–based University of Michigan Digital Library (UMDL) and the federated open architecture of the Networked Computer Science Technical Reference Library (NCSTRL). In addition, the centralized service–based Distributed National Electronic Resource (DNER) developed by the U.K.’s Joint Information Systems Committee (JISC) will demonstrate yet another distinct architectural model for digital library design.

It is useful to distinguish between a digital library (also commonly referred to as an ‘electronic’, ‘virtual’, or ‘hybrid’ library) that provides digital content and services only (for example, an online interface and search engine linking directly to electronic journal articles), and a digital library that provides both digital and physical content and services (for example, an online interface and search engine linking directly to electronic journal articles as well as information for accessing physical collections). The distinction is primarily an architectural one, and architecting a digital library by necessity involves the discipline of information architecture. Ideally, it is the information architect who defines the service, clarifies the vision, determines content and functionality, specifies how users will find information, and maps how the services provided will change and grow over time, all dependent upon the needs of users.

Information is an essential component of digital libraries; just as essential are context, users, and services. A structural relationship between all of these components is implied in any information architecture. A common attribute of architecture is standards. Perhaps the most visible standards–based architecture today is the Internet’s hybrid TCP/IP–OSI. Operating in interdependent layers, the Internet is a modular design that enables communications between hardware and software and interoperability between systems.

 

++++++++++

Concepts

Digital library:

The term “digital library” is one that is quite prevalent. As the Association of Research Libraries (1995) observes, “There are many definitions of a ‘digital library.’ Terms such as ‘electronic library’ and ‘digital library’ are often used synonymously.” The authors go on to identify the following common elements of digital libraries:

  • The digital library is not a single entity.
  • The digital library requires technology to link [multiple resources].
  • The linkages between the many digital libraries and information services are transparent to the end users.
  • Universal access to digital libraries and information services is a primary goal.
  • Digital library collections are not limited to document surrogates. They extend to digital artifacts that cannot be represented or distributed in printed formats.

An important distinction is made between a digital library that provides digital content only and a digital library that provides both physical and digital content. The term “digital library” implies a resource–discovery service that extends the range and reach of physical and/or electronic resources in a digital environment, through a common online information search–and–retrieval interface that presents multiple library catalogs and/or proprietary databases as a single “digital catalog.”

A digital library operates in collaboration with resource and service providers to offer a variety of content in multiple formats regardless of physical location and promotes resource sharing among traditional libraries. As a result, the digital library builds on and expands resources and services that were once only available at the local level.

Information architecture (IA):

According to typical definitions, architecture is the “art and science of building structures.” As Wyllys (2000) observes, information architecture shares its history with that of the pure discipline of architecture. In fact, it was Richard Saul Wurman, chairman of the American Institute of Architects’ 1976 national conference, who first coined the term “information architecture.” Wurman envisioned information architecture as the science and art of creating an instruction for organized space. He viewed the problems of gathering, organizing, and presenting information as closely related to the problems an architect faces in designing a building that will serve the needs of its occupants. According to Wurman, the architect must:

  • ascertain (i.e., gather information about) those needs;
  • organize the needs into a coherent pattern that clarifies their nature and interactions; and,
  • design a building that will — by means of its rooms, fixtures, machines, and layout, i.e., flow of people and materials — meet the occupants’ needs.

The gathering, organizing, and presenting of information to serve a purpose, or set of purposes, is primarily an architectural task, a task that Wurman (in Bradford, 1996) was the first to articulate. To Wurman, the information architect is:

  1. The individual who organizes the patterns inherent in data, making the complex clear.
  2. A person who creates the structure or map of information that allows others to find their personal paths to knowledge.
  3. Part of an emerging twenty–first century profession addressing the needs of the age and focused on clarity, human understanding, and the science of the organization of information.

Morville and Rosenfeld (2007) propose several component definitions for information architecture. In their estimation, IA is:

  1. The combination of organization, labeling, and navigation schemes within an information system.
  2. The structural design of an information space to facilitate task completion and intuitive access to content.
  3. The art and science of structuring and classifying Web sites and intranets to help people find and manage information.
  4. An emerging discipline and community of practice focused on bringing principles of design and architecture to the digital landscape.

The main job of an information architect is outlined by Morville and Rosenfeld below. Note that Morville and Rosenfeld’s original definition has been modified slightly through the substitution of “services” for “site.” While much of the information architecture literature focuses on Web sites, the change is made to shift the focus to services in the context of digital libraries.

According to Morville and Rosenfeld, the information architect should:

  • Prepare a definition of “what the [service] will actually be and how it will work”;
  • Clarify the mission and vision for the [service], balancing the needs of its sponsoring organization and the needs of its audiences;
  • Determine what content and functionality the [service] will provide;
  • Specify how users will find information with the [service] by defining its organization, navigation, labeling, and searching systems; and,
  • Map out how the [service] will accommodate change and growth over time.

Miller (2001) describes three types of architecture that make up the larger discipline of information architecture.

  • Technical architecture concerns the detailed specifications for individual or multiple information systems, components, and protocols for communication between components. Such related issues as the purpose of the system or user interaction are implicit at best.
  • Functional architecture is concerned with systems– and/or user–focused processes. This relates to functions that a system is required to perform or functions the user may want the system to undertake. Whether the focus is on the system, users, or both must be made explicit.
  • Landscape architecture defines systems boundaries and describes the relationships between users, resources, and technology. How far these relationships can be realized at the architectural level remains questionable.

Information architecture is the combination of technical, functional, and landscape architectures. It documents systems, processes, and relationships, along with their environment.

As defined above, information architecture is a discipline that structures information sites and services. Often found in conjunction with information architecture is the term “content management”, defined as the discipline of collecting, managing, and publishing content, often with the help of sophisticated software–based content management systems. Although information architecture may imply content management, the two terms are not interchangeable.

Information architecture is inherently multidisciplinary as Andrew Dillon (2000) has been apt to note (see Figure 1).

 

Figure 1: Contributory disciplines: Information architecture (Dillon, 2000).

 

Figure 2 depicts the relationship between the three components inherent in information architecture: discipline, role, and community. There is the discipline of information architecture, the role of information architect, and the community of architects themselves. Information architecture makes use of these components to design information environments that consist of users, context, and content and applications. Context can include goals, politics, culture, and resources. Users encompass information needs, audience types, expertise, tasks, and ecology. Content and applications include document types, objects, structure, attributes, and meta–information. The purpose of the model is to illustrate the components of information architecture and the relationships between them while calling attention to the fact that information architecture involves much more than just information.

 

Figure 2: Three circles of information architecture 3.0 (Morville, 2006).

 

 

++++++++++

Standards

One of the most important aspects of architecture is standards, or rules of operation. Such standards are especially important in architectures in which communications between systems are necessary. Examples of standards in communications architecture include hybrid TCP/IP–OSI, CORBA, and Web services. It is important to note that such standards are not new; many have been around for a decade or more. Knowledge of these standards and how they interoperate is essential to understanding the architectures of digital libraries.

Standards are often divided into functional units or layers, such as the physical, data–link, Internet, transport, and application layers of hybrid TCP/IP–OSI. These layers are open standards that promote interoperability between hardware and software from different vendors.

Benefits of standards include modularity and reusability. Standards enable a modular design approach that subdivides architectural components into independent modules that address specific tasks. Such modules allow for changes to be made to specific architectural objects without affecting the functionality of other objects. Standardized modules are interchangeable and reusable, which encourages efficiency in functionality. Standards also enable scalability, with the potential for millions of constituent objects, and extensibility, with the integration of many new objects.

The Internet is an excellent example of a standards–based architecture. Communications standards for the Internet originated with several standards–governing bodies: the Internet Engineering Task Force or IETF, International Organization for Standardization (ISO), and International Telecommunications Union–Telecommunications Standards Sector (ITU–T). The various standards that these governing bodies employed evolved into the hybrid model that is prevalent today.

Hybrid TCP/IP–OSI

The Internet comprises a layered set of standards known as the Hybrid TCP/IP–OSI Standards Architecture. The five layers of standards are the result of combining two very different standards architectures: Transmission Control Protocol/Internet Protocol (TCP/IP) and Open Systems Interconnection (OSI). Figure 3 illustrates the TCP/IP layers, OSI layers, and the combined five layers that make up the Hybrid TCP/IP–OSI Standards Architecture.

 

Figure 3: TCP/IP, OSI, and Hybrid TCP/IP–OSI Standards Architecture layers.
Source: Panko (2006).

| TCP/IP layer | OSI layer | Hybrid TCP/IP–OSI layer | Purpose |
|---|---|---|---|
| Application | Application (7) | Application (5) | Allows two application programs to communicate effectively. |
| | Presentation (6) | | |
| | Session (5) | | |
| Transport | Transport (4) | Transport (4) | Allows two computers to communicate even if they are of different platform types. |
| Internet | Network (3) | Internet (3) | Governs the transmission of packets across multiple networks, via a mesh of routers. Each pair of routers is connected by a single network (subnet). |
| Subnet | Data Link (2) | Data Link (2) | Governs the transmission of frames within a single network (subnet). |
| | Physical (1) | Physical (1) | Governs the transmission of individual bits within a single network (subnet). |

 

TCP/IP’s upper–layer standards are governed by a loose confederation of working groups composed of persons throughout the world and coordinated under the umbrella of the Internet Engineering Task Force, or IETF. The IETF makes recommendations regarding new technical standards and architectures to the Internet Architecture Board (IAB). The basic communications standards with which all Internet–attached hosts comply are the Internet Protocol (IP), Transmission Control Protocol (TCP), and HyperText Transfer Protocol (HTTP).

The lower–layer (subnet) standards are the work of the International Organization for Standardization (ISO) and International Telecommunications Union–Telecommunications Standards Sector (ITU–T). Together they developed the joint standards architecture called the ISO Reference Model for Open Systems Interconnection (OSI). In relation to communication networks, OSI, introduced in the late 1970s, was their landmark contribution. The goal of the OSI model was to allow the interchange of information across multiple vendor environments. The OSI model as illustrated in the second column of Figure 3 presents an architectural framework for communications networks based on a functional layered approach. It consists of seven layers, each with a set of discrete functions and an interface specification to the layers above and below it.

The value of the layered approach is in the isolation of one layer’s functionality from each other layer’s functionality. This isolation allows for changes to be made to one layer without altering other layers. For example, changes in the error–correction protocols at Layer 2 do not require changes in the Layer 3 network routing algorithms or the application–dependent functions at higher layers. While the conceptual model of OSI has withstood the test of time, TCP/IP has supplanted the specific protocol specifications for the upper layers.
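
The isolation described above can be illustrated with a minimal Python sketch. It is not drawn from Panko (2006) or from any real network stack; the class names `DataLink` and `Internet` and the two checksum functions are hypothetical stand-ins. The point is only that the Layer 3 code stays unchanged when the Layer 2 error-detection scheme is swapped.

```python
import zlib
from typing import Callable, Tuple

# Two interchangeable Layer 2 error-detection schemes (illustrative only).
def checksum_sum(data: bytes) -> int:
    return sum(data) % 65536

def checksum_crc32(data: bytes) -> int:
    return zlib.crc32(data)

class DataLink:
    """Hypothetical Layer 2: frames a packet with an error-detection code."""

    def __init__(self, check: Callable[[bytes], int]):
        self.check = check

    def send(self, packet: bytes) -> Tuple[bytes, int]:
        return packet, self.check(packet)

    def receive(self, frame: Tuple[bytes, int]) -> bytes:
        packet, code = frame
        if self.check(packet) != code:
            raise ValueError("frame corrupted")
        return packet

class Internet:
    """Hypothetical Layer 3: relies only on send() and receive() below it."""

    def __init__(self, link: DataLink):
        self.link = link

    def route(self, destination: str, payload: bytes) -> bytes:
        packet = destination.encode() + b"|" + payload
        frame = self.link.send(packet)        # hand the packet down to Layer 2
        delivered = self.link.receive(frame)  # simulate arrival at the next hop
        return delivered.split(b"|", 1)[1]    # strip the Layer 3 addressing

# The same Layer 3 code runs over either Layer 2 scheme.
for check in (checksum_sum, checksum_crc32):
    assert Internet(DataLink(check)).route("10.0.0.2", b"hello") == b"hello"
```

Because `Internet` depends only on the interface of the layer below, replacing `checksum_sum` with `checksum_crc32` requires no change to the routing code, which is the property the layered model is meant to guarantee.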

 

Figure 4: Protocols for Hybrid TCP/IP–OSI standards architecture.
Source: Panko (2006).
Application Layer: HTTP (HyperText Transfer Protocol)
Transport Layer: TCP (Transmission Control Protocol)
Internet Layer: IP (Internet Protocol)
Data Link Layer: PPP (Point–to–Point Protocol)
Physical Layer: 56K/DSL/Cable/Fiber Modem

 

Protocols are specific types of standards for communications between peer processes, that is, processes at the same layer but on different machines (Panko, 2006). The specific protocols that govern the exchange of messages for the five layers of the hybrid TCP/IP–OSI standards architecture are illustrated in Figure 4. The five layers of the hybrid TCP/IP–OSI standards architecture work together to empower the Internet and allow application programs on different computers, often clients and servers, to communicate effectively.
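
How the five layers cooperate can be sketched as a toy encapsulation exercise in Python. This is a simplification under stated assumptions: real headers are binary structures, not bracketed labels, and the function names `send` and `receive` are hypothetical. Each layer on the sending machine wraps the unit handed down from above, and the peer layer on the receiving machine removes exactly that wrapper.

```python
# Protocols from Figure 4, listed from the top of the stack downward.
STACK = [
    ("Application", "HTTP"),
    ("Transport", "TCP"),
    ("Internet", "IP"),
    ("Data Link", "PPP"),
]

def send(message: str) -> bytes:
    """Wrap the message in one labelled header per layer, top to bottom."""
    unit = message
    for _layer, protocol in STACK:
        unit = f"[{protocol}]{unit}"
    # The physical layer is modelled as nothing more than raw bytes.
    return unit.encode("ascii")

def receive(signal: bytes) -> str:
    """Peer processes strip the headers in the reverse order."""
    unit = signal.decode("ascii")
    for _layer, protocol in reversed(STACK):
        header = f"[{protocol}]"
        if not unit.startswith(header):
            raise ValueError(f"expected {protocol} header")
        unit = unit[len(header):]
    return unit

wire = send("GET /catalog")
print(wire)           # b'[PPP][IP][TCP][HTTP]GET /catalog'
print(receive(wire))  # GET /catalog
```

The outermost header belongs to the lowest layer, so the receiving machine processes the data link header first and the application header last.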

CORBA

As technology progresses, computers change dramatically along with their platforms, operating systems, programming languages, and protocols. While a particular computer is chosen in many cases because it is best suited for a particular need, over time this can lead to a collection of computers that are far from compatible. With the rise of the Internet and distributed systems, computers have extended to vast networks and compatibility has become a necessity.

Integrating diverse computer systems is highly problematic. While it would be easier to integrate a collection of computers that are technically similar, many computer platforms are specialized for a particular service. To replace such systems is not practical. Only a standardized architecture that enables diverse computer systems to communicate effectively can solve the problem.

Common Object Request Broker Architecture (CORBA) is an open, vendor–independent specification for an architecture and infrastructure that allows diverse computer applications to work together over networks. Created by a private consortium known as the Object Management Group (OMG), CORBA is principally designed to solve the problem of computer integration and interoperability. Interoperability is enabled by two specific parts of the architecture: the OMG Interface Definition Language (IDL) and the standardized protocols GIOP (General Inter–ORB Protocol) and IIOP (Internet Inter–ORB Protocol). These allow a CORBA–based program on any computer, operating system, programming language, and network to interoperate with a CORBA–based program on any other computer, operating system, programming language, and network. OMG IDL has been an ISO International Standard for several years, and GIOP and IIOP protocols have been in place since October 2001 (OMG, 2002).

In addition, CORBA defines a mandatory TCP/IP–based protocol for interoperability over the Internet and most intranets and integrates its functionality within the hybrid TCP/IP–OSI standards architecture at the application level on both clients and servers. CORBA clients run on personal digital assistants (PDAs), laptops, desktop machines, and even mainframes; CORBA servers run on all of these machine types as well. The specification standardizes sophisticated resource management and fault tolerance for reliable server–side installations.

Supporting CORBA on the application side is the Object Management Architecture (OMA), a collection of standardized objects performing standardized functions, including services such as transaction handling and security (OMG, 2002).
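
The division of labor between an interface definition and a wire protocol can be suggested with a small Python sketch. It does not use CORBA, OMG IDL, GIOP, or any real ORB library: the abstract class `Catalog` merely plays the role an IDL interface would play, JSON text stands in for a language–neutral wire format, and `ToyBroker`, `CatalogStub`, and the sample holdings are all hypothetical.

```python
import json
from abc import ABC, abstractmethod

class Catalog(ABC):
    """Stand-in for an IDL interface: the contract both sides agree on."""

    @abstractmethod
    def lookup(self, title: str) -> str: ...

class CatalogServant(Catalog):
    """Server-side implementation; the sample holdings are invented."""

    def __init__(self):
        self._holdings = {"Sample Title": "shelf-A1"}

    def lookup(self, title: str) -> str:
        return self._holdings.get(title, "not held")

class ToyBroker:
    """Hypothetical broker: decodes a request and dispatches it to an object."""

    def __init__(self):
        self._objects = {}

    def register(self, name: str, servant: Catalog) -> None:
        self._objects[name] = servant

    def handle(self, wire_request: str) -> str:
        request = json.loads(wire_request)
        servant = self._objects[request["object"]]
        result = getattr(servant, request["operation"])(*request["args"])
        return json.dumps({"result": result})

class CatalogStub(Catalog):
    """Client-side proxy: same interface, but every call crosses the 'wire'."""

    def __init__(self, broker: ToyBroker, name: str):
        self._broker = broker
        self._name = name

    def lookup(self, title: str) -> str:
        wire = json.dumps(
            {"object": self._name, "operation": "lookup", "args": [title]}
        )
        return json.loads(self._broker.handle(wire))["result"]

broker = ToyBroker()
broker.register("catalog", CatalogServant())
client = CatalogStub(broker, "catalog")
print(client.lookup("Sample Title"))  # shelf-A1
```

The client codes against `Catalog` alone and never touches `CatalogServant`. In this sketch both ends happen to be Python, but only the text request crosses the boundary, which is the sense in which a shared interface plus a shared protocol permits implementations in different languages and on different platforms.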

Web services

Another form of architecture that is growing in popularity is Web services. While Web services include architecture, the term also encompasses the standards, technology, and service models that make Web services possible. With that said, what exactly are Web services? As Tracy Gardner (2001) has put it, “Web services are interoperable building blocks for constructing applications.” Web services are a new form of Web application that is self–contained, self–describing, modular, and can be published, located, and invoked across the Web.

In the context of a digital library, Web services might include distributed search and retrieval, authentication, interlibrary loan requests, document translation, and payment. These services are accessed with a Web portal, such as a search–and–retrieval interface.

While the hybrid TCP/IP–OSI standards architecture enables users to connect to Web applications, Web services architecture enables Web applications to connect to other Web applications. Three roles are implicit to Web services: service provider, service requester, and service registry. The interactions between the three roles have been defined as “publish, find, and bind.” IBM’s Web services Architecture Diagram (see Figure 5) depicts the three roles and their interaction.

 

Figure 5: IBM Web services architecture diagram (Gardner, 2001).

 

Following Figure 5, the service provider publishes a service description to a service registry. A service requester then finds the service description via the service registry. The service description contains sufficient information for the service requester to bind to the service provider to use the service. (A service description is the metadata describing the service, and a bind is the step that allows an application to connect to a Web service at a particular Web location and begin interacting with it.)
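
The publish, find, and bind interactions can be sketched in a few lines of Python. This is an in–memory illustration only: the class names, the `ENDPOINTS` table that stands in for the network, and the URL `http://library.example/search` are hypothetical and are not part of any Web services standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class ServiceDescription:
    """Metadata a provider publishes about its service."""
    name: str
    keywords: List[str]
    endpoint: str  # where a requester should bind (hypothetical URL)

class ServiceRegistry:
    def __init__(self):
        self._descriptions: List[ServiceDescription] = []

    def publish(self, description: ServiceDescription) -> None:
        self._descriptions.append(description)

    def find(self, keyword: str) -> List[ServiceDescription]:
        return [d for d in self._descriptions if keyword in d.keywords]

# Stand-in for the network: maps an endpoint to the code that serves it.
ENDPOINTS: Dict[str, Callable[[str], str]] = {}

def bind(description: ServiceDescription) -> Callable[[str], str]:
    """Connect to the provider named in the description."""
    return ENDPOINTS[description.endpoint]

# Provider: deploy a service, then publish its description.
ENDPOINTS["http://library.example/search"] = lambda q: f"results for {q}"
registry = ServiceRegistry()
registry.publish(
    ServiceDescription("catalog-search", ["search", "catalog"],
                       "http://library.example/search")
)

# Requester: find a matching description, bind, and invoke.
found = registry.find("search")
service = bind(found[0])
print(service("owls"))  # results for owls
```

The requester never needs advance knowledge of the provider; everything required to connect is carried in the description retrieved from the registry.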

Standards are integral to the interoperability required by Web services and critical for any large–scale adoption of the architecture. While there is currently no Web–services standards organization, key industry leaders have agreed on a set of XML–based open standards in order to implement the Web services architecture. These standards include Web Services Description Language (WSDL); Universal Description, Discovery, and Integration (UDDI); and Simple Object Access Protocol (SOAP).

WSDL is the standard for capturing service descriptions, UDDI is the standard for specifying distributed registries of Web services, and SOAP is the standard for XML–based information exchange between distributed applications. These standards enable Web services to publish (WSDL, UDDI), find (WSDL, UDDI), and bind (WSDL, SOAP) in an interoperable manner.
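
To give a sense of what XML–based information exchange looks like in practice, the following Python sketch builds and reads a SOAP 1.1 style envelope using only the standard library. The envelope namespace is the one defined for SOAP 1.1; the service namespace, the `search` operation, and its parameters are hypothetical, and no WSDL or UDDI processing is attempted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SVC_NS = "http://library.example/search"               # hypothetical service

def build_request(operation: str, params: Dict[str, str]) -> bytes:
    """Serialize an operation call as Envelope > Body > operation > params."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

def parse_request(xml_bytes: bytes) -> Tuple[str, Dict[str, str]]:
    """Recover the operation name and parameters on the receiving side."""
    envelope = ET.fromstring(xml_bytes)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = list(body)[0]
    operation = call.tag.split("}", 1)[1]  # drop the {namespace} prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params

message = build_request("search", {"query": "digital libraries", "max": "10"})
print(parse_request(message))
# ('search', {'query': 'digital libraries', 'max': '10'})
```

Because the message is plain XML, the sender and receiver need agree only on the envelope structure and the service vocabulary, not on a programming language or platform.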

 

++++++++++

Digital library projects

Information architecture in the context of digital libraries is by no means a new phenomenon. There are numerous digital library projects that utilize a variety of information architectures; select examples will demonstrate information architecture’s role in digital library design. The following have been chosen because of their well–defined and distinct architectures; the analyses are meant to clarify their respective architectures rather than critique them. These digital library projects include the University of Michigan Digital Library (part of the Digital Library Initiative Phase I), the Networked Computer Science Technical Reference Library (part of the Digital Library Initiative Phase II), and the Distributed National Electronic Resource (U.K.’s JISC).

Digital Library Initiative/National Science Foundation (DLI/NSF):

Sponsored in part by the National Science Foundation (NSF), the Digital Library Initiative (NSF, 2002a; 2002b) focused on radically advancing the means to collect, store, and organize digital information using user–friendly Internet–based search–and–retrieval mechanisms.

The Digital Library Initiative supported the storage and manipulation of large collections of materials in electronic format by sponsoring research into networked information systems, and necessary infrastructure to manipulate information on the Internet. Related technological considerations include the question of how to search and display selected materials from large digital collections. The Digital Library Initiative consisted of two phases, detailed below.

Phase I

The Digital Library Initiative Phase I (http://www.dli2.nsf.gov/dlione/) began in 1994 and was completed in 1998. It comprised six digital library projects in a joint initiative of the National Science Foundation (NSF), Department of Defense Advanced Research Projects Agency (DARPA), and National Aeronautics and Space Administration (NASA). The main objective of these projects was the development of state–of–the–art tools for information discovery, management, retrieval, and analysis. Collegiate partners included the University of California at Berkeley, University of California at Santa Barbara, Carnegie Mellon University, University of Illinois at Urbana–Champaign, University of Michigan, and Stanford University. Of particular interest is the decentralized agent–based architecture of the University of Michigan Digital Library Project.

University of Michigan Digital Library Project:

As a participant in the Digital Library Initiative Phase I, the University of Michigan Digital Library Project (UMDL; http://www.si.umich.edu/UMDL/) developed a decentralized architecture based on the notion of a software agent. Agents are elements of a digital library collection or service and are highly encapsulated pieces of software that have the following two special properties (Birmingham, 1995):

  • Autonomy: Agents both compute something and determine preferences. Agents have the ability to reason about how they use their resources. In this way, an agent only fulfills those services consistent with its preferences. This reasoning is an extension of traditional computer programs.
  • Negotiation: Autonomous agents negotiate with other agents to gain access to resources or capabilities. The process of negotiation will often consist of a “conversation sequence,” wherein multiple messages are exchanged according to some agreed protocol, which itself can be negotiated.

Autonomous agents by definition allow for decentralized control and as a result should enable significant scaling of the UMDL. There is no master program that requires updating or need for global coordination since agents control services at the local level. Negotiation complements autonomy by creating a binding commitment between agents to exchange services.

 

Figure 6: UMDL agent–based architecture (Birmingham, 1995).

 

Three classes of agents make up the UMDL architecture (see Figure 6):

  • User Interface Agents (UIAs) provide a communication wrapper around a user interface that performs two functions. First, it encapsulates user queries in the form required for UMDL protocols. Second, it publishes a profile of the user to appropriate agents, which enables mediators to guide the search process.
  • Mediators perform a variety of functions, including all tasks that are required to refer a query from a UIA to a collection, monitor the progress of the query, transmit the results of a query, and perform necessary translation and bookkeeping. There are two main types of mediators: registry agents, which capture the address and contents of each collection, and query–planning agents, which receive queries and route them to collections. There is also a special class of mediators called facilitators that mediate negotiation among agents.
  • Collection Interface Agents (CIAs) provide a communication wrapper for a collection of information and perform translation tasks between the mediator and collection.

With user interface, mediator, and collection interface agents, the UMDL’s agent–based architecture is potentially both scalable and modular, with the capacity to add specialized agent–based services as needed. UMDL’s agent–based architecture demonstrates that a decentralized digital library is a viable option.
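
The flow of a query through the three agent classes can be sketched as follows. This Python outline is not the UMDL implementation: it omits autonomy, negotiation, and facilitators entirely, the planner ignores the user profile it receives, and all class names, method names, and sample collections are hypothetical. It shows only how a user interface agent, two kinds of mediator, and collection interface agents might hand a query along.

```python
from typing import Dict, List, Set

class CollectionInterfaceAgent:
    """Wraps one collection and answers queries in a common form."""

    def __init__(self, name: str, topics: Set[str], documents: List[str]):
        self.name = name
        self.topics = topics
        self.documents = documents

    def search(self, term: str) -> List[str]:
        return [d for d in self.documents if term.lower() in d.lower()]

class RegistryAgent:
    """Mediator that records which collections exist and what they cover."""

    def __init__(self):
        self._collections: List[CollectionInterfaceAgent] = []

    def register(self, cia: CollectionInterfaceAgent) -> None:
        self._collections.append(cia)

    def collections_for(self, topic: str) -> List[CollectionInterfaceAgent]:
        return [c for c in self._collections if topic in c.topics]

class QueryPlanningAgent:
    """Mediator that routes a query to the relevant collections."""

    def __init__(self, registry: RegistryAgent):
        self.registry = registry

    def route(self, query: Dict[str, str]) -> Dict[str, List[str]]:
        return {
            cia.name: cia.search(query["term"])
            for cia in self.registry.collections_for(query["topic"])
        }

class UserInterfaceAgent:
    """Encapsulates a user request together with a user profile."""

    def __init__(self, planner: QueryPlanningAgent, profile: str):
        self.planner = planner
        self.profile = profile

    def ask(self, topic: str, term: str) -> Dict[str, List[str]]:
        query = {"topic": topic, "term": term, "profile": self.profile}
        return self.planner.route(query)

registry = RegistryAgent()
registry.register(CollectionInterfaceAgent(
    "maps", {"geography"}, ["Atlas of Michigan", "Great Lakes charts"]))
registry.register(CollectionInterfaceAgent(
    "cs-reports", {"computing"}, ["Agents in digital libraries"]))

uia = UserInterfaceAgent(QueryPlanningAgent(registry), profile="student")
print(uia.ask("geography", "michigan"))  # {'maps': ['Atlas of Michigan']}
```

Adding a collection requires only registering one more collection interface agent; neither the user interface agent nor the query planner changes, which is the modularity the agent–based design aims for.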

Phase II

The Digital Library Initiative Phase II (http://www.dli2.nsf.gov/) began in 1998 and ended in 2004. Phase II built on the Phase I Initiative by focusing on supporting research to develop the next generation of digital libraries; to promote the use and usability of globally distributed, networked information resources; and to encourage the creation of innovative applications. Support for DLI Phase II was provided by the National Science Foundation (NSF), Defense Advanced Research Projects Agency (DARPA), National Library of Medicine (NLM), Library of Congress (LOC), National Aeronautics and Space Administration (NASA), National Endowment for the Humanities (NEH), and Federal Bureau of Investigation (FBI). These agencies worked in partnership with the Institute of Museum and Library Services (IMLS), Smithsonian Institution (SI), and National Archives and Records Administration (NARA) to fund hundreds of research projects undertaken by universities across the country. The research projects generally consisted of architectural components of a digital library; a perspicuous example is the federated open architecture of the Networked Computer Science Technical Reference Library (NCSTRL).

Networked Computer Science Technical Reference Library:

As an example of a federated digital library based on an open architecture, the Networked Computer Science Technical Reference Library (NCSTRL; http://www.ncstrl.org/) provides a contrast to more centralized architectures. The NCSTRL comprises more than 100 institutions that provide a federated digital library for computer science materials in the form of individual collections and accompanying library services (Leiner, 1998).

As a federated digital library, NCSTRL features an open architecture that consists of a partitioned set of well–defined services, each with a well–defined protocol that specifies the interface to that service. As a result, organizations are autonomous at the local level and free to mix and match digital library services, with the only requirement being that the interface be consistent with the agreed upon protocol. The various organizations that make up the federated digital library are free to aggregate whatever technologies are deemed appropriate to satisfy user needs so long as the interface is consistent with the overall architecture. In this way, a federation of individually designed digital library systems provides a seamless collection of resources and services that is at the same time interoperable with the broader community.

The NCSTRL open architecture has three aspects: a common underlying infrastructure that supports the creation of multiple services; the decomposition of library functions into a set of well–defined services, each with a well–defined protocol to interface with that service; and mechanisms for interoperability among library systems that may not share the same service decomposition (Leiner, 1998). The supporting infrastructure starts with digital objects, a data structure whose components include data and a unique identifier. The unique identifier for digital objects is called a handle, and the naming service is the handle system. The handle system enables the association of unique identifiers with their respective digital objects. In addition, the system’s repository access protocol (RAP) manages the storing and retrieving of digital objects in a repository.

 

Figure 7: NCSTRL open architecture services and interactions (Leiner, 1998).

 

The NCSTRL open architecture consists of four key services: repository service, index service, collection service, and user interface service (see Figure 7). The repository service manages the deposit, storage, and access of digital objects. The index service manages the discovery of digital objects via query. The collection service manages the aggregation of sets of digital objects into specialized collections. The user–interface service provides a gateway to the various resources and services that make up the federated digital library. Underlying the interaction of the four services are the handle system and the repository access protocol.
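
A rough Python sketch may help show how the four services and the handle system fit together. It is an in–memory illustration under several assumptions: the class and method names are hypothetical, the handles such as `example/tr-1` are invented, and the `deposit` and `access` methods merely stand in for the repository access protocol rather than implementing it.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass(frozen=True)
class DigitalObject:
    handle: str  # unique identifier
    data: str

class HandleSystem:
    """Naming service: resolves a handle to the repository holding the object."""

    def __init__(self):
        self._locations: Dict[str, "Repository"] = {}

    def register(self, handle: str, repository: "Repository") -> None:
        self._locations[handle] = repository

    def resolve(self, handle: str) -> "Repository":
        return self._locations[handle]

class Repository:
    """Repository service: deposit, storage, and access of digital objects."""

    def __init__(self, handles: HandleSystem):
        self._store: Dict[str, DigitalObject] = {}
        self._handles = handles

    def deposit(self, obj: DigitalObject) -> None:
        self._store[obj.handle] = obj
        self._handles.register(obj.handle, self)

    def access(self, handle: str) -> DigitalObject:
        return self._store[handle]

class IndexService:
    """Index service: discovery of digital objects via query."""

    def __init__(self):
        self._terms: Dict[str, Set[str]] = {}

    def add(self, obj: DigitalObject) -> None:
        for word in obj.data.lower().split():
            self._terms.setdefault(word, set()).add(obj.handle)

    def query(self, word: str) -> List[str]:
        return sorted(self._terms.get(word.lower(), set()))

class CollectionService:
    """Collection service: groups handles into named collections."""

    def __init__(self):
        self._collections: Dict[str, Set[str]] = {}

    def add(self, collection: str, handle: str) -> None:
        self._collections.setdefault(collection, set()).add(handle)

    def members(self, collection: str) -> Set[str]:
        return self._collections.get(collection, set())

class UserInterfaceService:
    """Gateway that combines the other services to answer a user search."""

    def __init__(self, index: IndexService, collections: CollectionService,
                 handles: HandleSystem):
        self.index, self.collections, self.handles = index, collections, handles

    def search(self, collection: str, word: str) -> List[str]:
        members = self.collections.members(collection)
        hits = [h for h in self.index.query(word) if h in members]
        return [self.handles.resolve(h).access(h).data for h in hits]
```

A search then proceeds by asking the index for matching handles, filtering them by collection membership, resolving each handle to its repository, and fetching the object. Because each step goes through a narrow interface, any one service could in principle be replaced by a different local implementation.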

Distributed National Electronic Resource:

The Distributed National Electronic Resource (DNER) is of interest as an international digital library test bed that featured a variety of quality–assured electronic and physical resources including scholarly journals, monographs, textbooks, learning objects, abstracts, manuscripts, maps, music scores, still images, geospatial images, and other kinds of vector and numeric data, as well as moving picture and sound collections.

Developed in the United Kingdom by the Joint Information Systems Committee (JISC; http://www.jisc.ac.uk/), DNER provided an information environment that was easily accessible, and a comprehensive information resource for use by learners, teachers and researchers within the U.K.’s higher and further education community. DNER’s architecture was subsequently incorporated into JISC’s online collection services.

DNER’s service–based architecture accesses content in the form of collections held locally, JISC collections made available through DNER content providers, and external proprietary collections.

 

Figure 8: DNER functional model (Powell and Lyon, 2001).

 

DNER supports three high-level activities: discovery, access, and use. As can be seen in Figure 8, the functional architecture consists of an iterative process that starts with interaction (authenticate and buildLandscape), which initiates discovery of resources (survey, discover), then access of resources (detail, request, authorize, access), and finally leads to use (useRecord, useResource).
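The ordering of steps in the functional model can be written down as a simple lookup structure. The sketch below assumes a strictly linear pass through the steps named above; the names FUNCTIONAL_MODEL, activity_of, and next_step are hypothetical, and the iteration back to earlier steps shown in Figure 8 is not modeled.

```python
# Activities and their steps, in the order described in the text.
FUNCTIONAL_MODEL = {
    "interaction": ["authenticate", "buildLandscape"],
    "discovery": ["survey", "discover"],
    "access": ["detail", "request", "authorize", "access"],
    "use": ["useRecord", "useResource"],
}

# Flattened sequence of steps across all activities.
STEPS = [step for steps in FUNCTIONAL_MODEL.values() for step in steps]


def activity_of(step):
    """Return the high-level activity that a step belongs to."""
    for activity, steps in FUNCTIONAL_MODEL.items():
        if step in steps:
            return activity
    raise ValueError(f"unknown step: {step}")


def next_step(step):
    """Return the step that follows, or None after the final step."""
    position = STEPS.index(step)
    return STEPS[position + 1] if position + 1 < len(STEPS) else None
```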

 

Figure 9: DNER technical architecture (Powell and Lyon, 2001).

 

The DNER technical architecture (see Figure 9) provides the framework for a variety of services to interact with one another, allowing end-users to move between services in a uniform manner. This enables DNER to function as a coherent whole, rather than a set of isolated services.

Based on portals, brokers, aggregators and content providers, the DNER technical architecture consists of an interactive layer of services at the discovery level. Portals are common search–and–retrieval interfaces that provide a single point of contact for a variety of networked services. Brokers are services that fan out search queries to multiple targets; they sit between the portal and the content–provider target, interacting with both using Z39.50 and the Bath Profile, and returning results formatted as Dublin Core XML records. Aggregators are services that aggregate metadata records from several repositories, facilitating the discovery process. Portals, brokers, and aggregators work together with content providers to offer a seamless and integrated resource–discovery service for the end user.
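The broker's fan–out role can be illustrated with a short Python sketch. It makes several simplifying assumptions: the targets are in–memory functions rather than Z39.50 servers, the record wrapper element and the helper names (make_target, broker_search, to_dublin_core_xml) are hypothetical, and only three Dublin Core elements are emitted.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def make_target(records):
    """Build a stand-in content-provider target that searches its own records."""
    def search(term):
        return [r for r in records if term.lower() in r["title"].lower()]
    return search


def broker_search(term, targets):
    """Fan one query out to every target and merge the results."""
    results = []
    for name, search in targets.items():
        for record in search(term):
            results.append({**record, "source": name})
    return results


def to_dublin_core_xml(record):
    """Encode selected fields of a record as simple Dublin Core XML."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # wrapper element name is an assumption
    for field in ("title", "creator", "identifier"):
        if field in record:
            ET.SubElement(root, f"{{{DC_NS}}}{field}").text = record[field]
    return ET.tostring(root, encoding="unicode")


# Hypothetical targets standing in for two content providers.
targets = {
    "provider-a": make_target([{"title": "Digital maps", "identifier": "a:1"}]),
    "provider-b": make_target([{"title": "Map archives", "creator": "Example Author"},
                               {"title": "Sound recordings", "identifier": "b:2"}]),
}
hits = broker_search("map", targets)
```

From the portal's point of view, a call such as broker_search("map", targets) looks like a single search, even though each target was queried separately.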

Portals are also able to gather metadata records from remote content providers using the Open Archives Initiative Protocol for Metadata Harvesting (OAI–PMH, referred to here as OAI). This enables them to build local databases that contain copies of the records provided by remote content providers.
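A harvesting request and the resulting local database can be sketched as follows. The sketch assumes version 2.0 of the protocol (its ListRecords verb, oai_dc metadata prefix, and XML namespaces), which may differ from the version in use when DNER was designed; the base URL, the helper names, and the trimmed sample response are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, hand-written response; a real one also carries responseDate and request.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:provider.example:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Survey of historic maps</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a provider's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def harvest(response_xml):
    """Copy identifier/title pairs from a ListRecords response into a local dict."""
    root = ET.fromstring(response_xml)
    database = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        database[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return database


url = list_records_url("http://provider.example/oai")
local_db = harvest(SAMPLE_RESPONSE)
```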

In addition, portals are alerted to new resources using RDF Site Summary (RSS). RSS is a Resource Description Framework (RDF) application for syndicated news feeds on the Web. News items are described using Dublin Core–based descriptions and then exchanged as RDF/XML files. RSS is used to exchange metadata about any frequently updated material.
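A portal's handling of such a feed can be sketched with Python's standard XML parser. The feed below is a hand–written RSS 1.0 sample with hypothetical URLs and values, and the function name new_resources is an assumption; the sketch only shows how Dublin Core descriptions might be read out of the RDF/XML.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://provider.example/new.rss">
    <title>New resources</title>
    <link>http://provider.example/</link>
    <description>Recently added items</description>
    <items>
      <rdf:Seq><rdf:li rdf:resource="http://provider.example/item/1"/></rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://provider.example/item/1">
    <title>Survey of historic maps</title>
    <link>http://provider.example/item/1</link>
    <dc:creator>Example Author</dc:creator>
    <dc:date>2001-05-18</dc:date>
  </item>
</rdf:RDF>
"""


def new_resources(feed_xml):
    """Return one dict per item in the feed, using its Dublin Core fields."""
    root = ET.fromstring(feed_xml)
    alerts = []
    for item in root.findall("rss:item", NS):
        alerts.append({
            "about": item.get(f"{{{NS['rdf']}}}about"),
            "title": item.findtext("rss:title", namespaces=NS),
            "creator": item.findtext("dc:creator", namespaces=NS),
            "date": item.findtext("dc:date", namespaces=NS),
        })
    return alerts


alerts = new_resources(SAMPLE_FEED)
```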

At the access level, authentication, authorization, collection–description, service–description, resolver (resolution–service), and institutional profiling services interact with portals, brokers, aggregators, and content providers to enable the retrieval of resources.

Theoretically, resolvers find the most appropriate copy or copies of a resource through knowledge of the network topology, access rights, price, and similar factors. To do this, the resolver must be aware of who the end users are, where they are, what institution they are affiliated with, and what they have access to. Because of these requirements, it was determined that resolution is best carried out locally, where knowledge of the user’s access rights and preferences is more likely to be available.
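One way to picture the resolver's decision is as a filter followed by a ranking. The sketch below is an assumption rather than the DNER design: it considers only institutional access rights and price, ignores network topology, and uses hypothetical names (Copy, User, resolve) and values throughout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One available copy of a resource."""
    url: str
    licensed_to: frozenset  # institutions with access; empty means open to all
    price: float


@dataclass(frozen=True)
class User:
    name: str
    institution: str


def resolve(copies, user):
    """Pick the cheapest copy the user's institution may access, or None."""
    accessible = [
        c for c in copies
        if not c.licensed_to or user.institution in c.licensed_to
    ]
    return min(accessible, key=lambda c: c.price, default=None)


# Hypothetical copies of the same article held by different providers.
copies = [
    Copy("http://publisher.example/article", frozenset(), 12.0),
    Copy("http://library-a.example/article", frozenset({"University A"}), 0.0),
]
member = User("reader-1", "University A")
visitor = User("reader-2", "University B")
```

Here resolve(copies, member) selects the locally licensed copy, while resolve(copies, visitor) falls back to the priced copy, which illustrates why the resolver needs to know the user's affiliation.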

End user interaction with DNER takes place over HTTP, with content delivered as HTML to standard Web browsers. Portals, brokers, aggregators, and content providers interact using a mixture of Z39.50, OAI, and RSS, with the majority of metadata transferred conforming to simple Dublin Core encoded using XML.

 

++++++++++

Conclusion

Information architecture is inherent in the design and development of digital libraries. Functional, technical, and landscape architectures combine to form the structure of a digital library. The principal role of the information architect is to define and document these structures. While digital libraries utilize a variety of architectures, they share many common attributes. Perhaps the most obvious of these is that of providing for user needs. User needs define the range of services that the architecture is designed to provide, and these services are in turn facilitated by standards. As with the Internet, standards, or rules of operation, play an integral role in architecture. Standards enable communication between components and facilitate interoperability. Many architectures, such as the hybrid TCP/IP–OSI of the Web, feature several interdependent layers of standards that combine to enable specialized functionality. Standards also enable scalability of architectural components to meet growing demand, and extensibility, the integration of new components. Thus, many design changes and improvements can be made to a core architecture that would not otherwise be possible. While the architectural models analyzed here reveal significant divergences in digital library design, they all employ common standards. As the examples indicate, information architecture has a defining role in digital library design, one that will determine to a large extent the future of the twenty–first century digital library.

 

About the author

Dr. Scott Simon is a musician and information scientist in the School of Library and Information Sciences at the University of South Florida. He completed studies in both Music Performance and Music Information Retrieval while earning his Ph.D. in Interdisciplinary Information Science in 2005 at the University of North Texas.

 

References

Association of Research Libraries, 1995. “Definition and purpose of a digital library,” at http://www.ifla.org/documents/libraries/net/arl-dlib.txt, accessed 27 November 2008.

W.P. Birmingham, 1995. “An agent–based architecture for digital libraries,” D–Lib Magazine, volume 1, number 1, at http://www.dlib.org/dlib/July95/07birmingham.html, accessed 27 November 2008.

P. Bradford (editor), 1996. Information architects. Introduction by Richard Saul Wurman. Zurich, Switzerland: Graphis Press.

A. Dillon, 2000. “Information architecture: Why, what & when?,” PowerPoint presentation delivered at ASIS Summit 2000 in Boston, at http://www.asis.org/Conferences/Summit2000/dillon/index.htm, accessed 27 November 2008.

T. Gardner, 2001. “An introduction to Web services,” Ariadne, issue 29 (September), at http://www.ariadne.ac.uk/issue29/gardner/, accessed 27 November 2008.

B.M. Leiner, 1998. “The NCSTRL approach to open architecture for the confederated digital library,” D–Lib Magazine, volume 4, number 12 (December), at http://www.dlib.org/dlib/december98/leiner/12leiner.html, accessed 27 November 2008.

P. Miller, 2001. “Architects of the information age,” Ariadne, issue 29 (September), at http://www.ariadne.ac.uk/issue29/miller/, accessed 27 November 2008.

P. Morville, 2006. “Information architecture 3.0,” Semantic Studios (29 November), at http://www.semanticstudios.com/publications/semantics/000149.php, accessed 27 November 2008.

P. Morville and L. Rosenfeld, 2007. Information architecture for the World Wide Web. Third edition. Sebastopol, Calif.: O’Reilly.

National Science Foundation, 2002a. “Digital libraries initiative,” at http://www.dli2.nsf.gov/dlione/, accessed 27 November 2008.

National Science Foundation, 2002b. “Digital libraries initiative phase 2,” at http://www.dli2.nsf.gov/projects.html, accessed 27 November 2008.

Object Management Group (OMG), 2002. “The Object Management Group (OMG),” at http://www.omg.org/, accessed 27 November 2008.

R.R. Panko, 2006. Business data communications and networking. Sixth edition. Upper Saddle River, N.J.: Prentice Hall.

A. Powell and L. Lyon, 2001. “The DNER technical architecture: Scoping the information environment,” United Kingdom Office for Library and Information Networking, University of Bath (18 May), at http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/dner-arch.html, accessed 27 November 2008.

R.E. Wyllys, 2000. “Information architecture,” University of Texas at Austin, Graduate School of Library and Information Science, at http://www.gslis.utexas.edu/%7El38613dw/readings/InfoArchitecture.html, accessed 27 November 2008.

 


Editorial history

Paper received 24 May 2008; accepted 8 November 2008.


Copyright © 2008, First Monday.

Copyright © 2008, Scott J. Simon.

Information architecture for digital libraries
by Scott J. Simon
First Monday, Volume 13, Number 12 - 1 December 2008
http://www.firstmonday.org/ojs/index.php/fm/article/viewArticle/2183/2059




