Planning, implementing and managing online repositories: Lessons learned from the KnowGenesis Library
by Saurabh Kudesia
Established in 2005, the KnowGenesis Online Library for Technical Communication (http://www.knowgenesis.org/tc) is India’s first online repository dedicated to accelerating knowledge sharing and promoting self-learning in the field of technical communication. The Library is available free of cost and requires only a one-time free registration to access its material. Its popularity can be gauged from the fact that, within a year of its launch, it attracted more than 24,000 visitors, gained more than 1,500 subscribers, and grew its hosted content from a few documents to more than 2,000 important documents, presentations, tutorials and links.
The KnowGenesis (KG) Library presents a unique case for repository designers to study the complex design and implementation process that contributed to the stability and overall success rate of the online Library.
This paper shares the design and implementation challenges faced by the KnowGenesis team and presents the approach used to match user requirements with the Library design. Based on the lessons learned during the process, the paper also presents a specific set of guidelines and recommends methodologies that can assist in developing and managing medium and large-scale repositories.
In the last few decades there has been a rapid expansion of research on digital networks. The education community has provided excellent opportunities to connect information sources electronically, resulting in emerging, universally accessible, digital libraries. Digital libraries are no longer restricted to a collection of content gathered on behalf of users but are extending their scope as institutions provide a range of services in a digital environment (Borgman, 1999). Hence, a digital library service is an assemblage of digital computing, storage, and communications machinery together with the software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, storing, cataloguing, finding, and disseminating information (Gladney, et al., 1994). In addition to digital collection and management tools, a digital library provides an environment to support the information life cycle (Duguid, 1997).
Digital libraries are based on a distributed technology environment (Fox, 1994) and require technology to link resources and to make information services available to end users (Association of Research Libraries [ARL], 1995). The easier and wider access to information enables digital libraries to extend their services to digital artefacts that cannot be represented or distributed in printed formats (ARL, 1995). These libraries also serve as communal repositories of knowledge, allowing users to define and guide the development of specific libraries (Wright, et al., 2002).
Background of the KnowGenesis Library
While many Web sites catered to different aspects of technical communication, very few were able to meet the overall learning requirements of technical writers. Some prominent online collections provide comprehensive bibliographic data related to the field but are somewhat restricted in their operation. While the services related to these collections are free, a user has to visit the referred site to read the content. Content availability is frequently governed by the site hosting the content rather than by the library itself, so some content is available only to subscribers of the parent site. There is no way to know whether a resource is available on the parent site without visiting it.
Propelled by the need to fill gaps observed during preliminary research, and specifically to meet the requirements of a growing Indian technical writing community, the KnowGenesis Library was conceptualized with the aim to:
- Accelerate and promote self-learning through online support;
- Establish a networked, non-centralized approach to search and discovery of resources;
- Create a culture that fosters contribution to and use of the Library;
- Bring together the expertise distributed throughout the technical writing community and convert it to forms that are shareable through the digital Library, for the benefit of all;
- Develop tools for professional development and continuous learning aimed at addressing specific issues in technical writing education;
- Share best practices, activities and methodologies within the community;
- Foster synergistic relationships between contributors and users; and,
- Provide impetus for similar initiatives within and outside the community.
One of the biggest challenges in library planning is to provide users with access to information that has been evaluated, organized and preserved in the most useful format (McMillan, 2000). Special attention was therefore given to design, construction, and management aspects of the KnowGenesis Library in order to fully utilize the distributed effort and energies of a broadly engaged community. Methodologies such as user-centered design (Norman and Draper, 1986), task-centered design (Gould, et al., 1991; Lewis and Rieman, 1993), and Participatory Design (PD) (Greenbaum and Kyng, 1991; Schuler and Namioka, 1993) were considered in order to involve users early in the design process and get their continuous and systematic feedback during and after the development of the Library.
With user requirements in focus, major consideration was given to issues such as:
- Affordability: keeping the initial cost as low as possible;
- Sustainability: requiring minimal ongoing investment;
- Repeatability: reusing the model to serve other requirements of the community;
- Openness: minimizing software costs;
- Compatibility: supporting a variety of document formats;
- Commonality: offering a common-denominator solution with elements readily accessible to subscribers; and,
- Scalability: eventually being able to serve more than 10,000 users.
As the potential participants were distributed, with most project communication and coordination activities taking place over the Internet, a broader user-centric approach was adopted to cover the four major dimensions of data/collection, system/technology, users, and usage (Fuhr, et al., 2001). The initial framework was planned to serve different aspects of the Library, such as:
- The underlying system and its components, including classical information retrieval evaluation methods and techniques as well as overall systems performance;
- The interface and interaction level of the activities between the user and the system;
- Support for different access and usage strategies;
- Situational and contextual factors, such as organizational and group issues; and,
- Design outcomes, with an emergent flavor, that can help to manage the unpredictable lines of inquiry being pursued in the broader community of participants (Wright, et al., 2002).
Based on the initial research, the important technical requirements for Library development were identified and categorized as below:
- Mechanism for data collection and dissemination: including content (partial/full, diversity, size and quality of the content) (Fuhr, et al., 2001);
- Metacontent requirements: indexing, citation, level of detail, classification (Fischer, 2001a; 2001b);
- Management: access rights, user management, document maintenance, growth requirements, ease of use, workflow, and the approval, consumption, synthesis and distribution of documents;
- Technology: interface (user/admin), searching, printing, repository structure, and supported document model and format;
- Information access: uploading, access, modification, deletion; and,
- Tools: minimum cost, ease of operation, availability of support documentation and online community.
In selecting the tool for developing the Library, immediate administrative and usage requirements, in addition to cost, were treated as minimum requirements (see Table 1).
Table 1: Categorized minimum requirements for KG repository.
- Built-in Applications: Database Reports; Document Management; Help Desk/Bug Reporting; Link Management; Mail Form; Polls; Product Management; Project Tracking; Search Engine; Subscriptions; Syndicated Content (RSS); User Contributions
- Ease of Use: Friendly URLs; Server Page Language; Spell Checker; Template Language; UI Levels; Undo; WYSIWYG Editor
- Interoperability: Content Syndication (RSS); FTP Support; UTF-8 Support; WAI Compliant; WebDAV Support; XHTML Compliant
- Security: Granular Privileges; LDAP Authentication; Session Management; Versioning; SSL Logins; SSL Pages; Content Approval; Email Verification
- Management: Content Scheduling; Inline Administration; Online Administration; Package Deployment; Themes/Skins; Trash; Web Statistics; Web-based Style/Template Management; Web-based Translation Management
- Support: Commercial Manuals; Commercial Support; Commercial Training; Developer Community; Online Help
- Performance: Advanced Caching; Page Caching
- Flexibility: Content Reuse; Extensible User Profiles; Interface Localization; Metadata
Because of its rich feature set and the variety of means it offers to communicate on the Web, Mambo Open Source was found suitable to satisfy the Library requirements.
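The selection step described above, in which the Table 1 entries act as minimum requirements, can be pictured as a simple set comparison. The following is a minimal sketch only: the requirement names are taken from Table 1, while the candidate systems and their feature sets are invented for illustration and do not describe any real product or the evaluation the KnowGenesis team actually performed.

```python
# Hypothetical sketch: requirement names come from Table 1; the candidate
# systems and their feature sets are invented purely for illustration.
MINIMUM_REQUIREMENTS = {
    "Content Approval",
    "Email Verification",
    "Friendly URLs",
    "Search Engine",
    "Versioning",
    "WYSIWYG Editor",
}

# Placeholder candidates; these are not real evaluation results.
CANDIDATES = {
    "candidate_a": {
        "Content Approval", "Email Verification", "Friendly URLs",
        "Search Engine", "Versioning", "WYSIWYG Editor", "Polls",
    },
    "candidate_b": {
        "Content Approval", "Friendly URLs", "Search Engine",
        "WYSIWYG Editor",
    },
}


def missing_requirements(features, required):
    """Return the required features that a candidate does not offer."""
    return set(required) - set(features)


def shortlist(candidates, required):
    """Return the names of candidates that meet every minimum requirement."""
    return sorted(
        name
        for name, features in candidates.items()
        if not missing_requirements(features, required)
    )


if __name__ == "__main__":
    for name, features in CANDIDATES.items():
        gaps = sorted(missing_requirements(features, MINIMUM_REQUIREMENTS))
        print(name, "missing:", gaps or "nothing")
    print("shortlist:", shortlist(CANDIDATES, MINIMUM_REQUIREMENTS))
```

A check of this kind only filters out candidates that fail a minimum requirement; weighing cost, stability and extensibility among the remaining candidates is still a judgment call.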
Figure 1: KG library Architecture.
Mambo is an extremely lightweight and efficient, yet sophisticated, content management system that supports most of the tasks that content editors and site visitors care about, including:
- A useful site structure and navigation system;
- Allowing non-technical content editors to update content, add new pages or change navigation menu items;
- Supporting a completely configurable graphic design, with flexibility to modify the Web site and CMS to match the exact requirements;
- Facilitating internal work sharing;
- Providing accessible sites, search engine optimization and human readable URLs; and,
- Offering lots of plugins to support a wide range of common needs.
User participation planning
The initial requirements focused on developing the Library as a collaborative process, in which the users of the Library can be the real contributors of the content and even take on different roles and responsibilities based on the development requirements and their individual interest and expertise.
Based on the nature of their engagement in the project, the participants are categorized into three primary groups: Administrators, Moderators and the community. The Administrators are responsible for policy guidance and for managing the core facets of the Library (services, users, collections and technology), including developing and operating its core infrastructure. The Moderators are volunteers who have shown an interest in helping KnowGenesis with the day-to-day activities of the Library and its usage. The community includes individuals and institutions that have an interest in seeing the Library develop and that are involved as contributors based on their interests and expertise.
Two main hierarchies exist for User Groups: one for access to the frontend (to allow users to log into the Web site and view designated sections and pages) and one for backend administration access (see Figure 2). In general, access provided to a parent group (like Registered) is inherited by a child group (like Author) unless specifically denied.
Users in the Super Administrator group cannot be deleted and cannot be switched to another group.
Figure 2: Hierarchical access mechanism and membership provided in KG.
Table 2: Categorized membership provided in KG.
User Groups: Access privileges
Front End
- Registered: These Users are able to log in to the frontend Web site. Additional information (sections and pages) may be available to a user once logged in.
- Authors: These Users are given access to submit new content and edit their own content items/pages by logging into the frontend.
- Editors: These Users are given access to submit and edit any content by logging into the frontend.
- Publishers: These Users are given access to submit, edit and publish any content by logging into the frontend.
Back End
- Manager: The Manager Group is generally restricted to matters of content creation.
- Administrator: The Administrator Group has slightly restricted access to the backend (Administrator) functions.
- Super Administrator: The Super Administrator Group has access to all of the backend (Administrator) functions and can perform the site's Global Configuration.
The access rights can be changed by the administrators to reflect the change in roles and responsibilities of the user.
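The parent/child inheritance between user groups described above can be sketched as follows. This is a minimal illustration under stated assumptions: the group names follow Table 2 (in the singular), but the permission names, data structures and functions are hypothetical and are not Mambo's actual API or database schema.

```python
# Hypothetical sketch: group names follow Table 2, but the permission names,
# data structures and functions are illustrative assumptions, not Mambo's API.

# Each group points to its parent; the frontend and backend trees are separate.
PARENT = {
    "Registered": None,
    "Author": "Registered",
    "Editor": "Author",
    "Publisher": "Editor",
    "Manager": None,
    "Administrator": "Manager",
    "Super Administrator": "Administrator",
}

# Permissions granted directly to each group (names are made up).
GRANTED = {
    "Registered": {"login_frontend"},
    "Author": {"submit_content", "edit_own_content"},
    "Editor": {"edit_any_content"},
    "Publisher": {"publish_content"},
    "Manager": {"manage_content"},
    "Administrator": {"admin_functions"},
    "Super Administrator": {"global_configuration"},
}

# Permissions explicitly denied to a group (and therefore to its children).
DENIED = {}


def effective_permissions(group):
    """Collect permissions from the root of the tree down to `group`."""
    chain = []
    while group is not None:
        chain.append(group)
        group = PARENT[group]
    permissions = set()
    for name in reversed(chain):  # apply the root group first
        permissions |= GRANTED.get(name, set())
        permissions -= DENIED.get(name, set())
    return permissions


def change_group(user, new_group):
    """Move a user (a dict with a 'group' key) to another group."""
    if user["group"] == "Super Administrator":
        raise PermissionError(
            "Super Administrators cannot be switched to another group"
        )
    if new_group not in PARENT:
        raise ValueError(f"unknown group: {new_group}")
    user["group"] = new_group
```

For example, `effective_permissions("Publisher")` includes the hypothetical `login_frontend` permission granted to Registered, while `change_group` refuses to move a Super Administrator, mirroring the restriction stated above.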
Figure 3: KnowGenesis Library frontend.
Figure 4: KnowGenesis Library backend.
- Security. Apart from role-based access, two additional access control parameters, Public and Registered, were assigned to content items, menu items, modules and components. Items published as Public can be viewed and accessed by anonymous Web visitors, while anything assigned Registered access can be viewed or accessed only by registered users of the Library who have logged into the Web site via the frontend.
- User Registration. To register, visitors are prompted for Name, Username, Email and Password. An email with an activation link is then sent to the address provided by the visitor. On clicking the activation link, the account is activated and the user is able to log in. This feature not only verifies that the visitor exists and has a valid email address but also lets users choose their own password at registration. It also gives the Administrator a better overview of activated and non-activated accounts.
- Content Uploading and Management. Any user created as Author, Editor, Publisher, Manager, Administrator or Super Administrator is considered a Special User and can submit news, articles, FAQs and Web links. These Special Users are the only ones able to access an item with the 'Special' access parameter. An entire content module may be hidden from Public and Registered users by specifying its access as Special. Uploaded content can be scheduled for publication on a specified date or configured to be available for a specified duration. The Library also automatically picks the appropriate content items to show site visitors based on rules; for instance, the Library home page automatically displays only the most recent news stories, posts, uploaded content, and added events.
- Online Editing and Saving. When a user edits a file, Mambo changes its status to Checked Out to prevent two users from editing the same document at the same time, thus preventing loss of data upon saving. The Administrator has been granted privileges to forcibly check in all checked-out files.
- Archiving. The KnowGenesis Library supports archiving content according to predefined rules, with the additional facility of restoring or trashing archived content.
- Language Management. The KnowGenesis Library supports a long list of languages for the core text on the frontend of the Library. Currently, only English has been enabled.
- Messaging. The Messaging System facilitates workflow events and lets administrators send notes or messages to other Mambo Administrators. It has been customized to notify administrators of events such as new content being submitted. In addition, the Library provides a facility to send a message by email to one or more groups of users.
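Two of the mechanisms listed above, access levels combined with scheduled publication and the Checked Out editing lock, can be sketched in a few lines. This is a hedged illustration only: the class, field and function names below are assumptions made for the example and do not reproduce Mambo's code or database schema.

```python
# Hypothetical sketch of the visibility and check-out rules described above.
# All names here are illustrative assumptions, not Mambo's actual API.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SPECIAL_GROUPS = {
    "Author", "Editor", "Publisher",
    "Manager", "Administrator", "Super Administrator",
}
ADMIN_GROUPS = {"Administrator", "Super Administrator"}


@dataclass
class ContentItem:
    title: str
    access: str = "Public"  # "Public", "Registered" or "Special"
    publish_from: Optional[datetime] = None   # scheduled start, if any
    publish_until: Optional[datetime] = None  # scheduled end, if any
    checked_out_by: Optional[str] = None      # username holding the lock


def can_view(item, group, now):
    """Decide visibility; `group` is None for an anonymous visitor."""
    if item.publish_from is not None and now < item.publish_from:
        return False  # not yet published
    if item.publish_until is not None and now >= item.publish_until:
        return False  # publication window has ended
    if item.access == "Public":
        return True
    if item.access == "Registered":
        return group is not None  # any logged-in user
    if item.access == "Special":
        return group in SPECIAL_GROUPS
    return False


def check_out(item, username):
    """Lock an item for editing so that two users cannot edit it at once."""
    if item.checked_out_by not in (None, username):
        raise PermissionError(
            f"{item.title!r} is checked out by {item.checked_out_by}"
        )
    item.checked_out_by = username


def check_in(item, username, group):
    """Release the lock; administrators may force a check-in."""
    if item.checked_out_by is None:
        return
    if item.checked_out_by != username and group not in ADMIN_GROUPS:
        raise PermissionError(
            "only the editing user or an administrator may check in"
        )
    item.checked_out_by = None
```

In this sketch a second call to `check_out` by a different user raises an error until the first user, or an administrator, calls `check_in`, which is the data-loss protection the bullet on online editing describes.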
Important lessons learned
- Working with a content management system. No single content management system (CMS) was found to be perfect during the tool assessment, and almost all systems required some modification to fit the requirements of KnowGenesis. It is therefore important to set library design priorities and to manage tradeoffs between time and other requirements such as features, stability and extensibility during the planning phase. While an open source CMS may have no initial cost, the cost of setting up the hardware, initial installation and configuration still has to be accounted for. Evaluation of the browser and platform compatibility of the selected CMS, along with the programming skills required to set up and maintain the library, should therefore form an integral part of the technical requirement assessment.
- Initial planning. The KnowGenesis Library has done many things right, reflecting the time and analysis that went into initial planning. So far, the initiative has kept its focus on the principles of affordability, sustainability, repeatability, openness, compatibility, commonality, and scalability. Regular testing of Library features and services ensures that the Library's performance meets user requirements.
- Keep administrative requirements to a minimum. The KnowGenesis Library experience shows that computer-mediated communication technologies (email, news, Web forums) can be vital assets in establishing and supporting online communities. Solutions that minimize and simplify human involvement at all levels of the library's operation, and tools or approaches that allow users to diagnose their problems without the assistance of library administrators, should be encouraged and should form a core aspect of library design.
- Balancing usability. Determining the quality of the actual user experience was a difficult task, as very few of the Library subscribers explicitly complained. It is therefore important for online repository designers to seek explicit feedback about the quality of a given service or to track user behavior to get a real picture. Different methodologies and channels should be used to understand usability. Regular polls and email discussions are some of the methods that KnowGenesis uses to secure feedback from users.
- Ensuring social facilitation. The KG framework experimented with intertwining social processes and technical artifacts to develop community-driven designs (Scharff, 2002). One major challenge was to manage the large amount of work required to support the social facilitation of a diverse set of participants, and to encourage them to get involved in the design, development and testing of the Library. Providing regular encouragement and due credit for individual contributions on different platforms is important in encouraging further participation.
- Securing content distribution. From a service perspective, it is indeed challenging to manage secure distribution of content into and out of the library environment and to keep it secure from intruders. The KG Library has several checkpoints to ensure that content is safe in all aspects and authentication is required at all levels to manage the regular operations.
To make sure that the Library can keep pace with the growing user requirements and to make the best use of the technical developments in the future, other important enhancements, features and facilities were identified that need to be incorporated in the future.
Table 3: Planned future enhancements for the KG repository.
- Applications: Blog; Chat; Classifieds; Contact Management; Data Entry; Discussion/Forum; Events Calendar; FAQ Management; File Distribution; Graphs and Charts; Groupware; Guest Book; Job Postings; My Page; Newsletter; Photo Gallery; Surveys; Tests/Quizzes
- Flexibility: Multilingual Content; Multilingual Content Integration; Multi-Site Deployment; URL Rewriting; Wiki Aware
- Security: Audit Trail; Login History; Pluggable Authentication
- Support: Professional Hosting; Professional Services; Public Forum; Public Mailing List; Third-Party Developers; User Conference
- Ease of Use: Email to Discussion
- Performance: Database Replication
- Legends: Initiated; Near completion; Completed; Due
While some of the planned enhancements are already implemented, the majority are in the final stage, waiting to be integrated into the Library.
Success so far
Within a year of its establishment, the KG community has come together to support the specific educational needs of technical writing. It has fundamentally changed by exploring new ways of sharing information, tools and services.
Partnering with projects focused on improving education provides a scalable model for addressing the social aspects of library building. Within a year of its inception and with the efforts and support of volunteers, the KnowGenesis Library is now home to more than 4,000 important documents, presentations, case studies, papers, and other items of interest to the community. The total volume of content is in excess of eight GB and there are over 2,000 subscribers. The Library is also earning support from international organizations. It is actively engaged in promoting fresh concepts and ideas to break new ground in collaboration for the community. While the Library has been busy developing itself as an important resource for technical writers (Bhange, 2006), international use is growing, with approximately one-third of traffic coming from outside India.
About the author
Saurabh is the deputy director of the Technical Communication Department at UTStarcom Inc., a world leader in providing IP-based telecommunications equipment and solutions. The majority of his team members are in China, with the rest in the U.S., Canada, and India.
He has over nine years of professional experience as a writer, editor, columnist, and evaluator, and has proven expertise in handling complex documentation projects in various domains for a global audience. He has a track record of successfully completing more than 22 documentation projects and eight medium/large size knowledge bases for IT/ITES, BPO, and product-based companies/clients in the U.S., U.K., Canada, China, Italy, Philippines, Taiwan, Japan, South Africa, and India. He has extensive experience in developing, implementing, and managing cost-effective document development life cycles and process flows to increase team efficiency, deliverable quality, and customer satisfaction.
He was awarded the Best Project Adherence Award by the CEO, HSBC Global Technology Center (India), for his unique methodological approach towards documentation development. He recently contributed to and reviewed ISO/IEC 26512 (standards for acquirers and suppliers of user documentation).
He is the associate editor of Directives, a newsletter published by the Society for Technical Communication (STC) Management Special Interest Group (SIG), and co-founder and editor-in-chief of the KnowGenesis International Journal for Technical Communication (IJTC), India’s first online journal for technical communication (www.knowgenesis.net).
He is a member of the Society for Technical Communication (STC) and a member of the Publicity and Communications Committee, HCIHyderabad, India.
Notes
1. See http://tc.eserver.org/about/.
2. For more information, see the Mambo Administrator Manual at http://help.mamboserver.com/index.php?option=com_content&task=category&sectionid=16&id=101&Itemid=121.
3. Several such approaches are given by Merwe, et al., 2003.
4. As of 23 January 2007.
5. See "KnowGenesis proud to be associated with Digital Curation Centre (DCC) UK," available at http://www.knowgenesis.org/tc/index.php?option=content&task=view&id=133&Itemid=2.
Acknowledgments
The success of the KnowGenesis Library is shared by the individual contributors whose continuous involvement in various roles and capacities provides critical support to the Library. KnowGenesis' biggest asset is its community. Within that community are those who search for and upload hundreds of reference documents, original articles, presentations and tutorials that have helped to make the KG Library a truly great open source project.
References
Association of Research Libraries (ARL), 1995. Definition and purpose of a digital library, at http://www.ifla.org/documents/libraries/net/arl-dlib.txt, accessed 10 January 2007.
C. Bhange, 2006. Virtual and digital libraries, at http://eprints.rclis.org/archive/00007987/02/virtual_and_digital_library.pdf, accessed 10 January 2007.
C.L. Borgman, 1999. What are digital libraries? Competing visions, Information Processing and Management, volume 35, number 3, pp. 227-243.
P. Duguid, 1997. Report of the Santa Fe Planning Workshop on Distributed Knowledge Work Environments: Digital Libraries, at http://www.si.umich.edu/SantaFe/, accessed 10 January 2007.
E.A. Fox, 1994. Source book on digital libraries. Blacksburg, Va.: Computer Science Dept., Virginia Tech.
G. Fischer, 2001a. Communities of interest: Learning through the interaction of multiple knowledge systems, Proceedings of the 24th IRIS Conference (Bergen, Norway) at http://l3d.cs.colorado.edu/~gerhard/papers/iris24.pdf, accessed 20 February 2008.
G. Fischer, 2001b. External and sharable artifacts as sources for social creativity in communities of interest, Proceedings of the Fifth International Roundtable Conference on Computational and Cognitive Models of Creative Design (Heron Island, Australia).
N. Fuhr, P. Hansen, M. Mabe, A. Micsik and I. Sølvberg, 2001. Digital libraries: A generic classification and evaluation scheme, Proceedings, ECDL 2001, Fifth European Conference on Research and Advanced Technology for Digital Libraries (Darmstadt, Germany, 4-9 September), Lecture Notes in Computer Science, number 2163, pp. 187-199, and at http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_etal:01.pdf, accessed 20 February 2008.
H.M. Gladney, N.J. Belkin, Z. Ahmed, E.A. Fox, R. Ashany and M. Zemankova, 1994. Digital library: Gross structure and requirements (Report from a workshop), IBM Research Report, RJ 9840, at http://www.ifla.org/documents/libraries/net/rj9840.pdf, accessed 20 February 2008.
J.D. Gould, S.J. Boies and C. Lewis, 1991. Making usable, useful, productivity-enhancing computer applications, Communications of the ACM, volume 34, number 1, pp. 74-85. http://dx.doi.org/10.1145/99977.99993
J. Greenbaum and M. Kyng (editors), 1991. Design at work: Cooperative design of computer systems. Hillsdale, N.J.: L. Erlbaum Associates.
C. Lewis and J. Rieman, 1993. Task-centered user interface design: A practical introduction, at http://www.hcibib.org/tcuid/, accessed 10 January 2007.
Mambo Administrator Manual, at http://help.mamboserver.com/index.php?option=com_content&task=category&sectionid=16&id=101&Itemid=121, accessed 10 January 2007.
G. McMillan, 2000. The digital library: Without a soul can it be a library? Proceedings of the Tenth VALA Biennial Conference (Melbourne, Australia), at http://www.vala.org.au/vala2000/2000pdf/McMillan.PDF, accessed 10 January 2007.
J. van der Merwe, P. Gausman, C.D. Cranor and R. Akhmarov, 2003. Design, implementation and operation of a large enterprise content distribution network, Proceedings of the Eighth International Workshop on Web Content Caching and Distribution, at http://www.research.att.com/~kobus/docs/wcw2003.pdf, accessed 10 January 2007.
D.A. Norman and S.W. Draper (editors), 1986. User-centered system design: New perspectives on human-computer interaction. Hillsdale, N.J.: L. Erlbaum Associates.
E.D. Scharff, 2002. Open source: A conceptual framework for collaborative artifact and knowledge construction, Ph.D. Thesis, Department of Computer Science, University of Colorado, Boulder, at http://www.cs.colorado.edu/events/defenses/2001-2002/scharff.html, accessed 20 February 2008.
D. Schuler and A. Namioka (editors), 1993. Participatory design: Principles and practices. Hillsdale, N.J.: L. Erlbaum Associates.
M. Wright, M. Marlino and T. Sumner, 2002. Meta-design of a community digital library, D-Lib Magazine, volume 8, number 5, at http://www.dlib.org/dlib/may02/wright/05wright.html, accessed 10 January 2007.
Paper received 3 May 2007; accepted 25 January 2008.
Copyright © 2008, First Monday.
Copyright © 2008, Saurabh Kudesia.
First Monday, Volume 13, Number 2 - 4 February 2008