First Monday

From genesis to revelation of an online resource: The North Carolina History and Fiction Digital Library by Elizabeth H. Smith

This paper provides an account of the development of the North Carolina History and Fiction Digital Library from an idea into a worldwide resource in two years. The principal investigator for the project introduces the Web site and gives practical and technical information that can be used as a model in other digital projects. The paper includes use statistics; reactions to the site; suggestions for how students, historians, genealogists, and other researchers can use the site; and plans for enhancements.


Features of the site
Uses of the site
Genesis of an idea
Designing an idea
The real work begins
Projects within a project
Plans for the future





The North Carolina History and Fiction Digital Library (NCH&FDL) at East Carolina University’s (ECU) Joyner Library meets the needs of students, historians, genealogists, and other researchers who are interested in eastern North Carolina. During the 2003–2004 grant year, staff digitized and made full–text searchable local history and fiction related to 29 eastern North Carolina counties. The initial NCH&FDL site included 169 texts (more than 24,000 pages) and 40 zoomable maps. Significant support for the creation of this project came from a US$49,954 NC ECHO (Exploring Cultural Heritage Online) Digitization Grant for 2003–2004. These U.S. LSTA (Library Services and Technology Act) funds were made possible through a grant from the Institute of Museum and Library Services, administered by the State Library of North Carolina, a division of the Department of Cultural Resources. Additional funding of US$10,000 was received through the Content Star Search Award competition from Apex CoVantage (formerly Apex ePublishing) to develop an enhanced Dare County page through a partnership with the Outer Banks History Center.

Local history selected from Joyner Library’s North Carolina Collection (NCC) includes definitive county histories such as Henry T. King’s Sketches of Pitt County (1911) and Benjamin B. Winborne’s Colonial and State Political History of Hertford County (1906). Fiction titles related to the counties were selected from the Roberts Collection in Joyner Library’s NCC, which is a unique body of more than 1,200 works of fiction with a North Carolina setting. This collection dates from 1720 with Miscellanea Aurea: Or the Golden Medley, which contains the first reference to “Carolina” in fiction (Killigrew, 1720). Snow L. and B.W.C. Roberts of Durham donated this valuable collection to Joyner Library in 2001 because of East Carolina University’s support for North Carolina studies. Rare fiction works in the Digital Library include Benjamin Barker’s Blackbeard; or the Pirate of the Roanoke (1847), Mary Ann Bryan Mason’s A Wreath from the Woods of Carolina (1859), and David Morrill’s The Passing Clouds (1903).

The Web site offers cross–title searching, presents the digitized materials in an appropriate scholarly context, and includes instructional materials for the classroom. Tagging of texts follows the standards of the Text Encoding Initiative (TEI), which was founded in 1987 to develop guidelines for encoding machine–readable texts in the humanities and social sciences. Conversion of scanned files, which included re–keying text and TEI tagging, was outsourced to Apex CoVantage. Library staff provided quality assurance, developed the user interface, worked with public school teachers to develop instructional materials, linked TEI tagged texts to the Library’s online catalog, and publicized the project.

Conversion costs per page were one of the surprises during the project. History texts averaged US$2.06 per page and fiction texts averaged US$1.39 per page. Costs for conversion were calculated at a rate of US$0.87 per kilobyte. The major reasons for the price difference were the average amount of text on each page and the amount of TEI tagging required for history titles, which contained more names. Pages in most fiction titles have fewer words than those in most history books, so there are fewer kilobytes per page. With less text per page and less tagging, conversion costs for fiction titles were less per page than for history titles. A locally developed file to record actual charges for each title and to estimate future costs helped in determining the total number of pages to be selected. With lower conversion costs, the project booklist was increased from the original 67 titles to 169 titles.
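The relationship between the per–kilobyte rate and the per–page averages can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the cost of a page is simply the US$0.87 rate multiplied by the kilobytes of converted text on that page, and the function names are hypothetical rather than taken from the locally developed cost file.

```python
RATE_PER_KB = 0.87  # US$ per kilobyte of converted text, as reported above


def kb_per_page(cost_per_page, rate=RATE_PER_KB):
    """Kilobytes of converted text implied by an average per-page cost."""
    return cost_per_page / rate


def estimate_cost(pages, avg_kb_per_page, rate=RATE_PER_KB):
    """Estimated conversion cost in US$ for one title (hypothetical helper)."""
    return pages * avg_kb_per_page * rate


if __name__ == "__main__":
    for genre, cost in (("history", 2.06), ("fiction", 1.39)):
        print(f"{genre}: about {kb_per_page(cost):.2f} KB per page")
    # A hypothetical 300-page fiction title priced at the fiction average.
    print(f"US${estimate_cost(300, kb_per_page(1.39)):.2f}")
```

Under that assumption, the reported averages imply roughly 2.37 kilobytes of tagged text per history page and roughly 1.60 kilobytes per fiction page, which is consistent with the explanation that history pages carried more words and more name tagging.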


Figure 1: NCH&FDL Home Page.



The grant staff; the Digital Initiatives team; staff from Systems, NCC, Cataloging, and Administrative Services; and numerous graduate and undergraduate student assistants were real team players. The Joyner Library Digital Editorial Board was already in place to give administrative guidance for the project. Other groups were organized to advise project staff. The Content Advisory Board included an ECU English professor who teaches North Carolina literature, an ECU history professor who taught North Carolina history, and a former Joyner Library faculty member who is director of instructional technology for the North Carolina Department of Public Instruction. A K–12 Team, which included the head of the Teaching Resources Center and a Joyner Library reference librarian who had taught middle school language arts for several years, helped with planning educational resources for the site. The successful completion of the grant project would not have been possible without the cooperation of everyone involved.



Features of the site

At the Teachers’ Assembly, Wrightsville, N.C., 12 June 1901, Judge Walter Clark, President of the N.C. Literary and Historical Association, put forth a suggestion related to the study of North Carolina, which is on the Digital Library home page (See Figure 1).

“The State has a great history. Its people have shown themselves equal to every call upon them and equal to every occasion. But that history has not yet been presented as it should be. To excite interest in its study we must make it interesting ... . Then, with an interesting history interestingly told, what more is needed? You need a wider audience.” (Clark, 1911)

The NCH&FDL is a resource that meets those needs suggested more than 100 years ago.

The home page features a navigation bar with choices for links to resources “about” the Digital Library and to the list of digitized texts arranged by author or title. Entering a search term in the appropriate box returns a list of texts in which the term appears. To access a county page, a user may click on the county on the map or in the alphabetical list.

The zoomable map in the upper left corner of each county page is the historic soil survey map for the county (See Figure 2 for soil survey map), except for Dare County, which has a 1938 National Park Service map of the Cape Hatteras National Seashore.


Figure 2: Pitt County (NC) Soil Survey Map.



In the center of each county page is the county seal along with a link to the county government Web site. The Travel Map in the upper right corner links to an online edition of the North Carolina State Transportation Map. Each county page has an almanac style display with information related to the county, such as county seat, date formed, and origin of name. There are links to NC ECHO Cultural Heritage Sites, public libraries, and National Register of Historic Places. Other resources on each county page include pronunciation of county name, “about” the county, population history from census records, and an additional reading list. Digitized titles related to the county are listed with bibliographical information, reading level, abstract, author biographical sketch, lesson plans, maps from some books, and a link to the catalog record.

User comments received since the Digital Library became available on 1 July 2004 indicate the various ways that people approach the site and the useful features they have identified. The use statistics for the Web site are most revealing (See Table 1 for selected statistics). During the month it went online, 5,787 visitors accessed the Web site. During the 10 months after the Web site was announced, there were 156,470 visitors to the site, with an average of 508 visitors per day who spent an average of 12 minutes per visit. The total page views for that time were 6,084,928.


Table 1: Statistics from NCH&FDL Web site (1 July 2004–30 April 2005).
Most active countries: United States; United Kingdom; European Union
Most active states: North Carolina; New Jersey
Most active cities: Reston, Va.; San Mateo, Calif.; Herndon, Va.; Sterling, Va.; Manassas, Va.; Greenville, N.C.
Most active cities in N.C.: Chapel Hill




Uses of the site

The Web site offers an armchair cyber–tour around eastern North Carolina, guiding users to some familiar places and to other interesting resources. It can be used to plan a trip or to provide supplemental information following a visit to any of the 29 counties. Genealogists have found it useful and some have contacted the NCC about additional resources related to their family research. Historians or students of literature could assess the accuracy of the historical fiction since some authors’ works are more historically correct than others. It is a valuable source for works by women authors such as Catherine Albertson and Bettie Freshwater Pool. Others might be interested in comparing descriptions of local areas in the early nineteenth century to information in the early twentieth century soil surveys and to current descriptions that may be found on the county government Web sites. The Digital Library provides access to rare books on pirates and to information on other topics such as Quakers, who are mentioned in both history and fiction documents.



Genesis of an idea

Several simultaneous events during the summer of 2002 resulted in the creation of the North Carolina History and Fiction Digital Library. The head of preservation and conservation at Joyner Library was reassigned to work in the NCC; the library had received a gift of fiction related to North Carolina from Snow L. and B.W.C. Roberts; and NC ECHO announced a new Digitization Grant for 2003–2004. A new faculty member at ECU, influenced by Kathy Nawrot’s article about the value of using historical fiction to teach history and literature (Nawrot, 1996), suggested a digital project that would combine Joyner Library’s local history and historical fiction resources. Thus, the NCH&FDL developed from an idea to become a worldwide resource within two years.

From the beginning developers wanted a Web site that would be user friendly and easy to expand with new resources or components. A library staff member designed a system management database (See Figure 3 for database entry) to house all of the information on the Web site and to allow global updates. The information extracted from the database can be used in many ways, such as creating multiple Web pages or generating reports.


Figure 3: Database entry for Plantation Sketches (1906).



A full–text search returns a display of all digitized titles in which the term appears with the number of occurrences within each title. Selecting a title from the list displays the text of the work with a method for accessing each occurrence of the term, which is displayed in red (See Figure 4 for display).
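The behavior described above (a list of titles with occurrence counts, then highlighted hits within a selected title) can be illustrated with a short sketch. This is not the TextML software the library used; the function names, the sample titles, and the bracket markers standing in for red highlighting are all assumptions made for illustration.

```python
import re


def _term_pattern(term):
    # Whole-word, case-insensitive match for the search term.
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def search_titles(texts, term):
    """Return (title, occurrences) pairs for titles containing the term."""
    pattern = _term_pattern(term)
    hits = []
    for title, body in texts.items():
        count = len(pattern.findall(body))
        if count:
            hits.append((title, count))
    # Most occurrences first; ties broken alphabetically by title.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


def highlight(body, term, open_mark="[[", close_mark="]]"):
    """Wrap each occurrence of the term in markers (stand-in for red text)."""
    pattern = _term_pattern(term)
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, body)


if __name__ == "__main__":
    # Hypothetical sample texts, not taken from the Digital Library.
    sample = {
        "Sample Title A": "The Quakers settled here. Quakers met weekly.",
        "Sample Title B": "No mention of the term.",
    }
    for title, count in search_titles(sample, "Quakers"):
        print(title, count)
        print(highlight(sample[title], "Quakers"))
```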


Figure 4: Returns for search term Quakers.



TEI and XML (eXtensible Markup Language) tagging are the basis for index searching that allows cross–title searching for personal, place, ship, organization, publication, and war names. Contextualization was important to make the site more valuable for all age groups. Abstracts, author biographies, catalog links, and almanac–style information about the counties were some of the initial features identified for the county pages.
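As a rough illustration of how tagged names can support cross–title index searching, the sketch below builds an index from regularized names to the titles in which they occur. It assumes un-namespaced TEI–style markup in which names are tagged as name elements with type and reg attributes; the document titles and sample fragments are hypothetical, and the site's actual indexing software is not shown here.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_name_index(documents):
    """Map (name type, regularized name) to the set of titles containing it."""
    index = defaultdict(set)
    for title, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for element in root.iter("name"):
            name_type = element.get("type")
            # Fall back to the printed form when no regularized form is given.
            regular = element.get("reg") or "".join(element.itertext()).strip()
            index[(name_type, regular)].add(title)
    return index


if __name__ == "__main__":
    # Hypothetical fragments; real TEI files are far larger.
    docs = {
        "Sample Title A": (
            '<text><p><name type="person" reg="Frazier, John Hamilton">'
            "Jno. H. Frazier</name> lived near "
            '<name type="place">Pamlico</name>.</p></text>'
        ),
        "Sample Title B": (
            '<text><p><name type="person" reg="Frazier, John Hamilton">'
            "J. H. Frazier</name> spoke.</p></text>"
        ),
    }
    index = build_name_index(docs)
    print(sorted(index[("person", "Frazier, John Hamilton")]))
```

Because both printed forms carry the same regularized value, a search for the person finds both titles even though the name is abbreviated differently in each.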

Time and funding constraints mandated a focused project; therefore, the 29 North Carolina counties for which the NCC collects comprehensively were chosen for the Digital Library. The focus of the project became local history titles published prior to 1923, adult fiction from 1876–1922, and children’s fiction published prior to 1923. This time period was selected because all books published before 1923 are in the public domain and not under copyright. Books published in 1923 and after have various copyright protections as outlined in Gasaway’s “When Works Pass into the Public Domain.” If access to the U.S. copyright renewal records on Professor Michael Lesk’s page at the Rutgers University School of Communication, Information and Library Studies site had been available during the selection stage of this project, copyright questions could have been answered easily for newer fiction books. For example, this copyright renewal database would have indicated that the copyright had not been renewed for Bothwell’s Lost Colony: The Mystery of Roanoke Island (1953) and it could have been digitized for the Dare County component. The Lesk copyright renewal file includes books published 1923–1963 for which the copyright has been renewed. Books published in 1964 and after are still under copyright; however, prior to 1989 copyrighted books had to have proper notice and registration. Currently the U.S. Copyright Office records database includes titles dated from 1978 to the present. The time involved in securing permissions limited the number of copyrighted works digitized. Focusing on the eastern counties, which needed additional online resources, guaranteed a larger selection of local history titles from Joyner Library’s NCC.
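The selection rules summarized in this paragraph can be expressed as a small decision function. The sketch below is a deliberate simplification of the rules as the paragraph states them at the time of writing: it ignores the notice and registration requirements mentioned for works before 1989, and the function name and return strings are hypothetical.

```python
def copyright_status(pub_year, renewed=None):
    """Classify a U.S. book by publication year, following the text above.

    renewed: True or False if the renewal records have been checked,
    or None if they have not been consulted yet.
    """
    if pub_year < 1923:
        return "public domain"
    if pub_year <= 1963:
        # Works from 1923-1963 stayed protected only if renewed.
        if renewed is None:
            return "check renewal records"
        return "under copyright" if renewed else "public domain (not renewed)"
    return "under copyright"


if __name__ == "__main__":
    print(copyright_status(1911))
    # The text's example: a 1953 title whose copyright was not renewed.
    print(copyright_status(1953, renewed=False))
    print(copyright_status(1970))
```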

The goal was to include one history and one fiction title related to each of the 29 counties; however, there was very little published history for some counties and no fiction related to some counties. Basic selection resources for the project were North Carolina History: An Annotated Bibliography (Jones, 1995), and North Carolina Fiction, 1734–1957: An Annotated Bibliography (Powell, 1958).

Out–of–print soil surveys, which were completed for most counties in the early part of the twentieth century, became the primary history titles for all counties except Dare. The historic soil survey maps, which accompany the published soil surveys, were included on county pages.

These soil surveys, which were developed through a partnership of federal, regional, state, and local agencies and organizations, were used as the main source for information about each county. The original soil surveys and accompanying maps for 28 eastern North Carolina counties, which were published between 1905 and 1938, are valuable sources of historic information about counties. Conventional signs on the maps give cultural, relief, and drainage information such as cities, villages, roads, buildings, railroads, schools, churches, cemeteries, bluffs, stony and gravelly areas, and rivers. Many older place names on these maps are not found on newer county maps.




TEI and XML provide the structural foundation of the Digital Library. The TEI standards and encoding scheme help libraries, museums, publishers, and scholars represent texts for online research and teaching. TEI’s “Guidelines for Electronic Text Encoding and Interchange,” first published in April 1994, include a tag set that is compatible with XML.

XML is a simplified version of SGML (Standard Generalized Markup Language) that allows structural information (metadata) to be recorded in files alongside the keyed text. It separates the description of structure in a document, e.g., <title>, from the information about its appearance such as “please make all titles centered and in italics.” The XML tag <title> is contained within the document while the instruction “please make all titles centered and in italics” is in a stylesheet that accompanies the document. A stylesheet makes it easier to have, and to change, a common appearance across a set of documents, or to have different display instructions for different types of users. This was very helpful when proofreaders encountered odd characters throughout a converted document or across documents. The technical staff usually took one look at some of these examples and replied, “No problem. We can handle that with a stylesheet.” Global changes through stylesheets converted the errors, which were usually characters such as superscripts or long dashes, to correct text.
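The separation of structure from appearance can be shown with a toy example. The sketch below is not XSLT and is not the project's stylesheet; it uses a plain Python dictionary as a stand-in "stylesheet" that maps element names to display rules, with hypothetical element names and CSS values, to show that the appearance can change without touching the document.

```python
import xml.etree.ElementTree as ET
from html import escape

# Stand-in "stylesheet": element name -> display instruction (assumed values).
STYLESHEET = {
    "title": "text-align:center; font-style:italic",
    "p": "",
}


def render(xml_text, stylesheet):
    """Render each top-level element using the display rule for its tag."""
    root = ET.fromstring(xml_text)
    parts = []
    for element in root:
        style = stylesheet.get(element.tag, "")
        text = escape("".join(element.itertext()))
        attr = f' style="{style}"' if style else ""
        parts.append(f"<div{attr}>{text}</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    doc = "<doc><title>Sketches of Pitt County</title><p>First paragraph.</p></doc>"
    print(render(doc, STYLESHEET))
    # The same document with a different appearance: only the rules change.
    print(render(doc, {"title": "font-weight:bold"}))
```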

XML structure allows XML search and browsing software to address and deliver a part of a document rather than a whole file as HTML (HyperText Markup Language) does. XML is a set of tagging instructions that allows one to make a set of tags that are Web–friendly. These tags describe the hierarchic structure of a document, just as the TEI tag set does, rather than its appearance on a computer screen as HTML does. XML allows you to specify types of content that correspond to certain elements. The TEI Web site gives a good introduction to TEI and XML.
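To illustrate what "delivering a part of a document" means in practice, the sketch below retrieves a single division from a structured text by its number rather than returning the whole file. The div, head, and p element names and the n attribute follow common TEI usage, but the sample text and the helper function are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def get_division(xml_text, number):
    """Return the text of the division whose n attribute equals number."""
    root = ET.fromstring(xml_text)
    division = root.find(f".//div[@n='{number}']")
    if division is None:
        return None
    # Join the text pieces of the division, dropping empty whitespace runs.
    pieces = (piece.strip() for piece in division.itertext())
    return " ".join(piece for piece in pieces if piece)


if __name__ == "__main__":
    # Hypothetical two-chapter text.
    book = (
        '<text><div n="1"><head>Chapter I</head><p>Early settlers.</p></div>'
        '<div n="2"><head>Chapter II</head><p>The county seat.</p></div></text>'
    )
    print(get_division(book, 2))
```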



Designing an idea

When sufficient titles had been identified, project planning began and the principal investigator, with assistance from Digital Initiatives staff, wrote a letter of intent for the Digitization Grant. The initial list of 67 titles was adequate for the grant proposal. The library funded TEI training, zooming technology for maps, TextML software for searching, and a color overhead scanner. After permission to write a full grant application was received in December, the principal investigator outlined an implementation schedule and began writing the full grant proposal. Technical staff began developing a prototype Web site for Pitt and Hertford counties that included all of the proposed county page resources. Library staff received weeklong TEI training in January and graduate assistants began post processing for the four pilot titles. This TEI training and post processing experience prepared staff to work productively in supervisory positions at the start of the grant period. For future projects, books with fewer pages should be selected for the pilot. The significant texts, however, provided numerous examples to use in defining local practices such as: “If a person’s entire first and middle name are known from abbreviation or from earlier text, then both are used in the regularization. Ex. <name type="person" reg="Frazier, John Hamilton">Jno. H. Frazier</name>.”

Work continued on the project during the grant review process as if it would receive funding. During weekly meetings, grant staff answered questions and made suggestions for improvements. Work continued on refining technical specifications, prioritizing the list of proposed titles, and securing copyright permission for a few titles that would contribute significantly to some of the county resources. As expected, everyone contacted granted permission to digitize texts. Initially, the University of North Carolina Press granted copyright permission for author biographies from the Dictionary of North Carolina Biography (Powell, 1979), but subsequent requests were denied.

Evaluation of overhead color scanners and zooming technology software began after the grant application had been submitted. An Indus Book Scanner 5200 was selected based on the following:

EyeSpy™ Image Server from AXS Technologies Inc. was chosen for zooming based on the following requirements:

After the grant was awarded, grant staff jobs were advertised, a library administrative staff group was organized to oversee expenditures of grant funds, and naming conventions and other documentation needed for the project were updated. Students scanned soil survey pamphlets on a flatbed scanner and prepared the first conversion batch. UNC Library Photographic Services scanned the soil survey maps and saved them on CDs in 300 dpi TIFF (Tagged Image File Format). This cost was less than the proposal estimate, so the remaining money was used for other vendor conversion services. The overhead color scanner and zooming server software were ordered through the normal campus acquisition process.



The real work begins

At the start of the grant year, the principal investigator “walked” the paperwork through the campus process and secured all approvals within 24 hours so grant staff could begin working within the first week. The major campus activity was for the campus grants office to assign a grant number and designate budget lines so that payroll and vendor records could be created. Careful attention to campus processes, grant reporting requirements, and vendor invoicing procedures resulted in a smooth expenditure of funds before the grant deadline. The approval process for initial invoices, which were mailed to the library, took at least a week. Later in the year, however, Apex began e–mailing invoices, which was helpful near the grant deadline. The only budget revision reallocated funds designated for mailing printed copies of scans to the conversion vendor. At the vendor’s suggestion, texts were sent using FTP (File Transfer Protocol) and a laptop was purchased for making DVD backups. Images were stored on the servers in the appropriate presentation format and multiple backup copies were stored on DVD–R media in TIFF.

In August students began preparing abstracts for books, author biographies (See Figure 5 for abstract and author biography entry), descriptions about the counties, additional reading lists, population histories, and links to Web sites related to each county.


Figure 5: Abstract and author biography for History of the Presbyterian Church in New Bern, N.C. (1886).



Technical support staff placed converted text files returned from the vendor on a Web site that students used for proofreading and for determining reading levels for the texts. Since readability results using Microsoft Word and the Fry Readability Program were similar, Microsoft Word was used to determine the reading levels of the texts. Three full pages of text from the beginning, middle, and end of each document were copied into separate Word documents and the Flesch–Kincaid Grade Level results were averaged to determine the reading level.
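The averaging procedure can be sketched in code using the published Flesch–Kincaid Grade Level formula, 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The sketch below is an approximation under stated assumptions: the syllable counter is a naive vowel–group heuristic, so results will not match Microsoft Word exactly, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Naive estimate: one syllable per run of vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text):
    """Flesch-Kincaid Grade Level for one sample of text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(word) for word in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


def reading_level(samples):
    """Average the grade level over samples (beginning, middle, end)."""
    return sum(fk_grade(sample) for sample in samples) / len(samples)


if __name__ == "__main__":
    # Hypothetical page samples standing in for three full pages of text.
    pages = [
        "The county was formed early. Settlers farmed the land.",
        "Trade moved along the river. Towns grew near the landings.",
        "The railroad arrived later. Markets expanded across the region.",
    ]
    print(f"Estimated grade level: {reading_level(pages):.1f}")
```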

All of the contextualization work was done in the NCC under the supervision of the principal investigator while the technical work of scanning, transferring files, quality assurance, and Web site and database development was completed in the Digital Initiatives Unit of the Systems Department. The principal investigator prioritized titles to be scanned, resolved questions related to quality of originals, made decisions related to authorities for regularization of names, monitored expenditures, served as liaison with the conversion vendor, prepared reports to the granting agency, worked with technical staff in preparing county page components, and charted costs to determine if additional titles needed to be scanned in order to expend all project funds.



Projects within a project

The original project proposal included one history title for each of the 29 counties and fiction titles for about half of the counties. Additional funding through the Apex Content Star Search Award supported the conversion of an additional 31 titles for an enhanced Dare County component. A significant fiction title related to Dare County is Grace I. Witham’s Basil the Page, for which Joyner Library has the only copies of both the British and American editions cataloged in WorldCat (See Figure 6 for cover image).


Figure 6: Image of Basil the Page cover.



For the Apex award application, Joyner Library proposed to collaborate with the Outer Banks History Center (OBHC) in Manteo to digitize some titles from their collection and to digitize all local history and fiction titles related to Dare County (published prior to 1923) that were in the North Carolina and Roberts collections. Since the history titles selected from the OBHC related to the Battle of Roanoke Island, several Union regimental histories from Joyner Library, which included descriptions and images related to eastern North Carolina, were digitized (See Figure 7 for map).


Figure 7: Map from Bearing Arms in the Twenty-Seventh Massachusetts Regiments of Volunteers Infantry during the Civil War, 1861-1865 (1883).



An enhanced page for Beaufort County, which includes a Pirate Resources component (−files/additionalPages/Beaufort−pirate.html), was introduced in March 2004 at the NCCAT (North Carolina Center for the Advancement of Teaching) reunion weekend, “‘Ahoy There Mates’ Pirates Ahead!” The component includes five fiction titles about pirates and three titles from the library’s Rare Book Collection that have become some of the most visited resources in the digital library: A General History of the Pyrates (1724,−image.html), The History of the Pyrates (1728,−image.html), and Stockton’s Buccaneers and Pirates of Our Coasts (1898,−image.html). The Pirate Resources include a lesson plan for teachers, a research study guide for ECU students ( eaufort−pirate−research.html), and a zoomable edition of an early pirate map from The History of the Pyrates (1728) (See Figure 8 for map).


Figure 8: Map from The History of the Pyrates (1728).






There were no major roadblocks during the grant year, but a few frustrations made daily work worrisome. Since the overhead scanner was not put into production until February, more than half of the grant year was over before post processing of converted files began. Therefore, there was not enough time left to complete quality assurance before the end of the grant year. Two library staff members, who were reassigned from the project, were not replaced. During most of the year, grant staff and students in the digital initiatives area were allowed to work only between 8 A.M. and 5 P.M., Monday–Friday. Toward the end of the project, however, production staff members were permitted to work on post processing at remote sites. Distance work arrangements with flexible schedules were already in place for proofreading staff who worked under the principal investigator. Another frustration involved the level of TEI tagging completed by the vendor. The vendor’s international staff, working in India, could not interpret names that have more than one meaning, such as Pamlico, which can be the name of a river, a county, a sound, a soil, a town in Pamlico County, or the town Pamlico Beach in Beaufort County. Library staff should complete this level of coding in future projects.



Plans for the future

The education component of the site will continue to be developed. During a seminar to introduce the Web site to educators, teachers wrote lesson plans that were added to the LEARN NC database. A senior education major at ECU, who worked as a proofreader, wrote lesson plans, and a graduate assistant prepared descriptions for illustrations so that images could be made searchable. GIS (Geographic Information Systems) mapping to make the zoomable maps searchable would be a valuable enhancement. Grant project staff held exploratory discussions with ECU geography faculty; however, the project has not been pursued.




A project like this would be much easier if all grant staff had at least one year of prior experience. It would be wonderful if grant staff had the luxury to spend an additional year refining a site and completing all of the enhancements that could make it more educational. Joyner Library has a unique opportunity with a Heritage Partners Grant for 2004–2005 that will increase the number of digitized resources related to the 29 counties and add artifacts from Historic Hope Foundation, the Tobacco Farm Life Museum, and the Country Doctor Museum to enhance the resources already available on the site. Expanding the Digital Library to include all of North Carolina would be a good service to students, historians, genealogists, and other researchers who are interested in other regions of the state. ECU students and other library users have already expressed disappointment that the North Carolina History and Fiction Digital Library does not include all of North Carolina.


About the author

Elizabeth H. Smith is Professor of Academic Library Services and librarian in the North Carolina Collection at Joyner Library, East Carolina University, Greenville, North Carolina. She was principal investigator for the 2003–2004 NC ECHO (Exploring Cultural Heritage Online) Digitization Grant project that created the North Carolina History and Fiction Digital Library.
E–mail: smithe [at] mail [dot] ecu [dot] edu



Special thanks to Michael Reece, Production Coordinator in Digital Services, for providing images for this article courtesy of the NCH&FDL at East Carolina University.



Walter Clark, 1911. “How Can Interest be Aroused in the Study of the History of North Carolina?” North Carolina Booklet, volume 11 (October): pp. 82–98.

H.G. Jones (compiler), 1995. North Carolina History: An Annotated Bibliography. Westport, Conn.: Greenwood Press.

Mr. Killigrew, 1720. Miscellanea Aurea: or The Golden Medley. London: Printed for A. Bettesworth and J. Pemberton.

Kathy Nawrot, 1996. “Making Connections with Historical Fiction,” The Clearing House, volume 69, number 6 (July/August), pp. 343–345.

William S. Powell (editor), 1979. Dictionary of North Carolina Biography. Chapel Hill: University of North Carolina Press.

William S. Powell (editor), 1958. North Carolina Fiction, 1734–1957: An Annotated Bibliography. Chapel Hill: University of North Carolina Library.

Michael Reece, Diana Williams, and Ann Stocks, 2004. “EyeSpy as a Zooming Platform,” accessed 29 July 2005.

Editorial history

Paper received 8 May 2005; accepted 19 July 2005.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.

From genesis to revelation of an online resource: The North Carolina History and Fiction Digital Library by Elizabeth H. Smith
First Monday, Volume 10, Number 8 - 1 August 2005