First Monday

Writing photo captions for the Web

by Ruth Garner, Mark Gillingham, and Yong Zhao

Photographs are rarely self-sufficient. They need captions. A caption tells us something about the person or thing photographed, and also something about the photographer. In this article, we discuss how to write photo captions for the Web, drawing on examples from adults’ and children’s work.


Photo captions on the Web
Children’s picture-taking and captioning for the Web
Captioning of robot photos
Captioning and didacticism





Photo captions are, in a sense, parasitical. They depend for their existence on the images to which they are attached. Photographs can appear captionless or nearly so, but captions never appear without photographs. Why would they?

To be truly parasitical, though, a photo caption would have to offer a photograph nothing useful in return, and that’s not the case. Some photographs require captions. Think about crime scene photos or the Army hospital photos of soldiers’ wounds and scars taken during the American Civil War. These photographs would be of little use for police work or for medical training without captions.

In fact, even a photograph that does not require much of a caption can benefit from one. Recently, while glancing at portions of Camera Lucida (Barthes, 1981), we paused to look at a photograph of Queen Victoria on horseback [1]. Reading the caption beneath the photograph, we paid close attention to the title (it identified the woman on horseback as the Queen). We paid less attention, we admit, to the information about the photographer and the year in which the photograph was taken (George W. Wilson, 1863). We found a quotation from Virginia Woolf interesting. Someone — Barthes, we assume, rather than the photographer, an archivist, or Barthes’s publisher — had included a few rather unflattering words from Woolf about the Queen.

There have been mass-produced photographs, with captions attached to them, since the mid-nineteenth century. Alan Trachtenberg (1989) wrote that early photographs struck viewers as familiar (like drawings), but also as strange in their "mirror-like detail" [2], their lifelikeness. Captions may have assisted viewers in making sense of the earliest images.



Photo captions on the Web

Some 150 years after the first cheap, mass-produced photographs became available, we have the World Wide Web, and non-text objects, including photographs, appear throughout it. Most of the photographs we’ve seen on the Web are captioned, though not all are.

Online newspapers seem to have the most informative photo captions, and that makes sense to us. After all, most newspaper people think it’s their job to keep the public informed. Text informs, and so does a photograph with a good caption. The online newspapers that we access most frequently are not exactly the same as the print versions of the same papers: they have links to archived articles and to video and audio clips. When there is breaking news, information is updated. That means that some text and some photographs (captions too, of course) change during the course of a day.

The usefulness of an informative photo caption can be seen in a recent article in the New York Times Magazine (Jacobs, 2003). Near the start of the article is a photograph that, on its own, is somewhat ambiguous: Is that haze an actual gas attack? Who are those people trying to secure their gas masks? They aren’t wearing proper military uniforms, so they can’t be soldiers, can they? Who are they then? What’s going on? An informative caption clarifies matters: "Members of the news media during a mock gas attack at a Pentagon 'boot camp' at Fort Dix, N.J." This photograph and two others appeared alongside a longish text. When we read through the text for more information about the mock gas attack, we learned that the exercise was meant to introduce wannabe war correspondents to the realities of war, one of the realities being an N.B.C. (nuclear-biological-chemical) assault.

Even though the Times piece on the mock gas attack appeared in both online and print versions of the paper, we see the piece as an example of what Web usability guru Jakob Nielsen advises people to do when they write for the Web. Nielsen and his colleagues have suggested that people read most articles on the Web in a no-nonsense, fast, focused fashion [3]. They want to have access to a good search engine (absent that, to their browser’s "Find" command). They also use certain elements in a text to help them find what they’re looking for. These elements include headings, text that is highlighted in some fashion, bulleted lists, and pictures and their captions. For such readers, an uncaptioned photograph isn’t useful. Neither is a photograph with a minimally informative or a misleading caption.



Children’s picture-taking and captioning for the Web

Many adults may doubt that children can take pictures that are worth viewing. When we began observing the work of young photographers in an after-school program in the small town of Baldwin, Michigan [4], we were surprised by three things. One was the children’s comfort in working with digital cameras, computers, and image-editing software. The second was the sophistication of their subject choices. Instead of doing what Susan Sontag (1977) described as typical behavior for amateur photographers — trying for a beautiful photograph by photographing something beautiful (a sunset perhaps) — they made offbeat choices and photographed things such as the abandoned buildings of an old cement plant and pet mice. The third thing that surprised us was the technical quality of the children’s work [5].

When we began our observations in Baldwin, we were so amazed at the photographs themselves that we didn’t pay much attention to photo captions. We noted that the children provided captions for work that Mac, the program coordinator, published on the program Web site, and we left it at that.

More recently, as we’ve thought more about photo captions, we’ve wondered whether children might have some difficulty writing good captions. We know that children produce ambiguous references in their writing. A number of years ago, Bartlett and Scribner (1981) analyzed the writing of children in third through sixth grade and found that the children had great difficulty when they attempted to refer to one of two same-sex, same-age people. They used "the boy" or "the man," failing to clarify which boy or man they had in mind. Referential ambiguity of this sort would be a problem in captioning. In the Times photo of members of the news media fumbling with their gas masks, for example, the words "the man" in a caption would be confusing. "Which man?" readers might well ask.

We resolved to take a closer look at the Michigan children’s work. We would examine some of their photo captions, keeping an eye out for two problems: (a) minimal or misleading information, and (b) ambiguous references to people or objects in the photos. An opportunity presented itself when a group of sixth- and seventh-grade children and their adult robotics adviser, Diane, took some robot pictures and downloaded them to a computer. We would have a chance to observe the captioning of the robot photos.



Captioning of robot photos

There are currently four robot enthusiasts in the Baldwin after-school program — Jessica, Kimberlyn, Phillip, and Sarah. They meet with Diane every Tuesday and Thursday to design, build, and operate robots. They use LEGO Mindstorms robot kits, a commercial version of the construction materials that Seymour Papert (1993) gave children over a decade ago. The kits have an autonomous microcomputer that can be programmed with a PC. The kits also have LEGO bricks, motors, gears, and sensors.

As we studied the children and their snazzy little machines prior to any captioning work, we decided that we were looking at both science and art. It is science and engineering, of course, that are needed to build an artificial creature that can be made to move about purposefully. However, there is art in the robots as well. They are designed by the children, and they are obviously vehicles for self-expression.

In addition to science and art, there seems to be social interaction in work with robots. MIT scientist Rodney Brooks (2002) has noted that people engage robots. He gives the example of his MIT colleague, Sherry Turkle. Turkle visited his lab and reported that she felt happy when a robot turned its head to follow her movements. At one point, she found herself competing with another visitor for the robot’s attention. There is also almost-parental pride in giving birth to a robot: "Meet Oscar, my robot. As robots go, he’s pretty useless: He knows how to turn around when he bumps into an object. He beeps when he sees a light (sometimes), and he can do a little back-and-forth dance. That’s it. For that matter, he falls apart on a pretty regular basis. Any seventh grader could probably do better. But hey, he’s mine, and I’m proud" (Brown, 1998). In Baldwin, we heard the children engage their robots: Kimberlyn said, "You better work or else" (it did work, eventually). Sarah coaxed her robot, "Come on, you can do it" (move along a black line).

We asked the four robot enthusiasts in Baldwin (some of whom are also involved in photography) to provide captions for 14 robot pictures. Some of the photographs, just like the mock gas attack photo in the Times, were difficult to interpret without a caption: What were those various building materials? We recognized LEGO bricks, motors, gears. What were those colorful wobbly things on Jessica’s and Kimberlyn’s robot? Was Phillip doing programming on the laptop computer? And why was Diane covering her face in apparent distress? (We imagined some sort of off-camera robot crash, pieces of the machine flying off in all directions. We hoped that the children’s caption would tell us if we were right or not.)

The captioning session was very informal. Two of us, Diane, and the four children were present. We started the session by showing the children a captioned photograph, an old school picture from Baldwin. We told the children that we planned to use a few of the robot pictures in an online article, but the photos needed captions. We hoped that this would give them a clear (and authentic) purpose for writing captions. We said nothing, of course, about features of interest to us (i.e., informativeness, references to people or objects), as we wanted to see what sort of captions the children would write without any assistance from us.

The children worked as a group in captioning, often engaging in overlapping talk. Consistent with what linguist Deborah Tannen (1989) has described, the overlapping talk seemed to be cooperative and rapport-building rather than interruptive. Part of a comment from one child was often repeated by another child. We scribbled notes on prints of the photographs, repeating what we’d written whenever we had any doubt that we’d accurately recorded exactly what the children had said. We encouraged the children to revise captions if they wished, making certain before moving on to another photograph that we had in our scribbles what the group considered to be a "final" caption. We did not use a tape recorder. We thought that the children would be more comfortable if they weren’t being recorded.

How did the children do? Quite well, we think. Two of us judged 11 of the 14 final captions (79 percent of them) to be informative and without referential ambiguity. Consider, for example, Figure 1. This photograph shows Phillip going through training exercises, learning programming.

Figure 1: A photograph of Phillip during training exercises. The children wrote an informative caption for this photo.

The group’s first effort was "Phillip going through training." That caption was revised, as two children suggested that more detail was needed: "Phillip going through training in programming the robot." Phillip reminded the others that the training exercises were "shockingly boring" (his words). The others agreed, and the boringness idea was added to the caption. The children’s final caption for the photo was: "Phillip going through boring training in programming the robot." There could have been more information in the caption, of course. The children might have identified and explained the equipment being used — the laptop computer, the transmitter — but adding that information would have made the caption quite a bit longer. Remembering Morkes and Nielsen’s (1997) suggestion that people writing for the Web keep text short and to the point, we think that the length of the children’s caption is just about right.

Consider another photograph and its caption. Figure 2 shows Kimberlyn and Jessica searching for a particular part for their robot.

Figure 2: A photograph of Kimberlyn and Jessica searching for a robot part. The children experimented with dialogue in their caption for this photo.

We liked what the group did with the caption for this photo. Instead of describing what is going on in the picture, they decided to tell us with a bit of talk: "Mrs. V. [the children’s name for Diane], we can’t find the part in the kit." The talk is natural sounding. That is undoubtedly because the children who say this (often, according to Diane) wrote the caption.

There were, as we said, three captions that struck us as less successful. In two, the photographs showed a large collection of specific objects used in robot-building (e.g., a microcomputer, LEGO bricks, motors, gears, and sensors, even those colorful wobbly things that we’d noticed), but the captions were very general. They referred only to "the parts." We felt that these captions were minimally informative, that naming specific objects would have been more useful to viewers. For the third (see Figure 3), a photograph of Jessica’s and Kimberlyn’s colorful robot, Diane hoped that the children would emphasize the robot’s "personality" (her word) in their caption. She said as much. However, the children went in a different direction.

Figure 3: A photograph of Jessica’s and Kimberlyn’s robot. The children’s caption for this photograph referred to objects that were not in the picture.

Their final caption was: "The robot works very good. It can see black lines and follow them." An obvious (and easily fixed) problem with this caption is that the children used "good" when they should have used "well." Slightly less obvious, perhaps, is the fact that there are no black lines in this picture. A caption that refers to things not in evidence might confuse viewers. In an effort to make sense of the caption, viewers might make the mistake of looking at the lines on the table or the lines on the wall behind the robot. Neither, of course, has anything to do with the black lines that the children have in mind, the ones they had laid out for demonstrations before captioning began (the black lines of a race course).

We were right, by the way, in our guess that Phillip was programming. The children confirmed that during captioning. We were wrong about Diane’s covering her face because of an off-camera robot crash. It turns out that the program on the computer wasn’t working as the children and Diane had hoped it would, and everyone was frustrated. Designing, building, and (especially) operating robots can be frustrating, the children told us. "You have to have patience," Jessica added.



Captioning and didacticism

It wasn’t an issue with the robot photos, but some photographs are captioned with quite extraordinary zeal and passion. In our experience, this sort of captioning is often done by someone other than the photographer, perhaps an editor who has compiled a set of images and wants to get them into the hands of viewers in order to convince them of something. The zeal and passion of the captions are such that viewers’ interpretation of images is constrained.

Susan Sontag (1977) gave the example of Walker Evans’s photographs of sharecroppers that were accompanied by "eloquent prose written (sometimes overwritten) by James Agee" [6]. The prose, Sontag wrote, was intended to deepen the readers’ empathy with the sharecroppers’ lives. We do not mean to suggest — and Sontag didn’t either — that uncaptioned photographs are not interpretations of the world. They are. There is (Sontag’s word) a "didacticism" [7] in the whole enterprise of photography, as there is in the associated enterprise of captioning photographs. Our point is that photographers, writers, and editors should avoid extremely didactic captions that, in the making of moral observations, cease to inform.

Alan Trachtenberg (1989) provided another example of didacticism in captioning. He described how Northern photographers and publishers during the American Civil War wanted to make visible the mutilations of war (very evident in the photographs from the corps of photographers organized by Mathew Brady). However, they also wanted to sustain public faith in the cause. The solution in a number of publications was to present images within a tightly controlled framework of political interpretation: support of the Union, hatred for the practice of slavery and the Southern aristocracy. Pictures were surrounded with text, some of it in the form of extended captions. The captions in one of the books provide, as Trachtenberg put it, "assistance to the reader who may not see the image exactly in the light intended by the editor" [8].




Photo captions — the good ones, at least — are informative. Without the caption for the Queen Victoria photograph, we might recognize the woman pictured as someone rich and famous (she sits so regally on horseback, after all), but we might not know which rich and famous person she is.

Does that matter? It doesn’t, if we are skimming through the Barthes (1981) book simply to take note of the great variety of photographers’ subject matter. If, however, we find this particular photograph of historical interest, if we are studying it, we surely will want to know more — who the woman is, when the photograph was taken, and so on. For someone studying a photograph, an image is seldom self-sufficient. A caption is required.

A photograph requiring a caption need not be a portrait of a queen, and it need not be a photograph reproduced in a book. It might be an online photograph of a robot. We found certain photographs taken by the Baldwin robotics group difficult to interpret without a caption. We guessed right about some subject matter, but not about all. The children’s captions clarified matters for us.

What makes a photo caption a good one? We’d say that Morkes and Nielsen (1997) — and the four robot enthusiasts in Baldwin — got it about right: Captions should provide information, they shouldn’t mislead or confuse, they shouldn’t be excessively didactic, they should refer to people or objects in the photos without ambiguity, they should be relatively short and to the point. With the best ones, as Alan Trachtenberg (1989) noted, image and text seem "harmoniously combined" [9].


About the Authors

Ruth Garner currently serves as an evaluator for after-school programs in Michigan. Her most recent book (edited with Yong Zhao and Mark Gillingham) is Hanging Out: Community-Based After-School Programs for Children (Westport, Conn.: Bergin and Garvey, 2002).

Mark Gillingham heads the technology unit at the Great Books Foundation in Chicago. He co-authored (with Ruth Garner) the book Internet Communication in Six Classrooms: Conversations Across Time, Space, and Culture (Mahwah, N.J.: Erlbaum, 1996).

Yong Zhao is Associate Professor of Technology and Education at Michigan State University. He directs a federally funded consortium of urban and rural after-school programs. E-mail:


Notes

1. Barthes, 1981, p. 56.

2. Trachtenberg, 1989, p. 4.

3. See, e.g., Morkes and Nielsen, 1997.

4. See, e.g., Garner, Zhao, and Gillingham, 2002.

5. We’ve discovered recently that a group of researchers at the University of Birmingham (see, e.g., Sharples et al., forthcoming) has found much the same thing. The group asked children at different age levels to take pictures with single-use film cameras and to talk about their photographs. Even the youngest children produced photos of reasonable technical quality.

6. Sontag, 1977, p. 72.

7. Sontag, 1977, p. 7.

8. Trachtenberg, 1989, p. 96.

9. Trachtenberg, 1989, p. 99.


References

R. Barthes, 1981. Camera lucida. Translated by R. Howard. New York: Hill and Wang.

E.J. Bartlett and S. Scribner, 1981. "Text and content: An investigation of referential organization in children’s written narratives," In: C.H. Frederiksen and J.F. Dominic (editors). Writing: Process, development and communication. Hillsdale, N.J.: Erlbaum, pp. 153-167.

R.A. Brooks, 2002. Flesh and machines: How robots will change us. New York: Pantheon.

J. Brown, 1998. "I, robot? My robot!" (2 October), at, accessed 15 March 2003.

R. Garner, Y. Zhao, and M. Gillingham, 2002. "Children’s use of new technology for picture-taking," First Monday, volume 7, number 9 (September), at, accessed 5 March 2003.

A. Jacobs, 2003. "My week at embed boot camp," New York Times on the Web (2 March), at, accessed 5 March 2003.

J. Morkes and J. Nielsen, 1997. "Concise, SCANNABLE, and objective: How to write for the Web," at, accessed 5 March 2003.

S. Papert, 1993. The children’s machine. New York: Basic Books.

M. Sharples, L. Davison, G.V. Thomas, and P.D. Rudman, forthcoming. "Children as photographers: An analysis of children’s photographic behaviour and intentions at three age levels," Visual Communication.

S. Sontag, 1977. On photography. New York: Picador.

D. Tannen, 1989. Talking voices. Cambridge: Cambridge University Press.

A. Trachtenberg, 1989. Reading American photographs. New York: Hill and Wang.

Editorial history

Paper received 26 March 2003; revised 8 April 2003; accepted 31 July 2003.

Copyright ©2003, First Monday

Copyright ©2003, Ruth Garner, Mark Gillingham, and Yong Zhao

First Monday, Volume 8, Number 9 - 1 September 2003