How Sound Can Aid the Information Seeker in Information Retrieval...
April 26, 1999
Sonically enhanced retrieval allows for more natural communication between computer and user, allowing users to employ two senses to solve a problem.
- Stephen Brewster, University of Glasgow
Until recently, the primary use of sound with computers has been entertainment and simple user feedback. Yet sound is an integral and invaluable piece of the puzzle in solving the information retrieval match problem. Sonification, the use of non-speech audio to convey information, can especially aid the information seeker in the information retrieval match process.
Considering that human communication is primarily carried out via speech and that we rely on audio cues in our daily lives, it is interesting that more research and application development has not occurred in the use of sound with computers. Bill Buxton, a noted researcher in sonification-related fields, aptly observed that “Archaeologists of the future, upon finding a 1980 personal computer, could conclude that the creatures living at that time had very poor hearing. Why else would they design a tool with no audible output except a ‘beep’?”
Before discussing why sound, and sonification in particular, has only recently emerged as a powerful tool in the computer world, it is important that we begin with a common set of definitions. Sound is the quickly varying pressure wave within a medium. Sound is produced when the air is disturbed in some way, such as by a vibrating object. For the purposes of this paper, sound means audible sound, which is the sensation (as detected by the ear) of very small rapid changes in the air pressure above and below a static value.
“Who the hell wants to hear actors talk?”, attributed to Harry Warner (Warner Brothers) when asked about the future of “talkies” in the motion picture industry, appears laughable now, yet it can be compared to our present-day outlook on the use of sound with computers. There are some distinct and unique advantages to using sound: sound, like time itself, is well suited to represent temporal information; changes in sound over time are readily apparent; sound can be used to detect anomalies in data; sound can replace distracting visual elements, such as on a map display; there is less interference between tasks that use different senses; sound can be utilized by individuals who are deprived of vision; sound has an insistent quality that is suitable for alarms, signals, and email notification; sound is well suited for signaling the ongoing status of background activities; nonverbal sound can provide a sense of place; and sound can enhance “events” (the physical interactions between things) when they are hard to assess accurately with the eyes or are too transitory to be seen.
Interest in sonification as an additional tool to aid information retrieval has accelerated because of a number of converging factors. A better understanding of information retrieval theory, more powerful and affordable computers, and advances in sound synthesis technology have synergistically made it possible to take advantage of sonification. However, I believe the real driving force behind the use of sonification in computing has been the need to deal with the increasing number, size, and complexity of data sets that challenge existing visualization techniques.
The realization by various practitioners and researchers that sonification may aid information retrieval has begun to affect the way data is examined. The difficulty of sorting out relationships increases as the number of variables to examine increases. Stuart Smith and his colleagues speak of using sonification to find what they refer to as signatures in the data: a signature is a feature or set of features that “pop out” to the analyst. Seismologists, for example, have been aware of some of the advantages of sonification since the 1960’s. Since seismic data is acoustic in nature, it made sense to replay seismic recordings at audio rates rather than look at text readouts, making it possible to review twenty-four hours of data in several minutes (Hayward 1994).
Like sonification, the utilization of speech with computers has also progressed rapidly over the last few years, for many of the same reasons listed above. Although speech has advantages in human-to-human communication, there are aspects of language that make sonification the better choice for some computer applications, and specifically for information retrieval: speech is slow, and sometimes the signal or data changes before the interface reacts; speech is serial, so the user must hear it from beginning to end; speech is similar to text, whereas non-speech sounds are similar to graphical icons; a non-speech sound can represent a concept for which many words may be needed; speech is language dependent, whereas non-speech sound is universal; speech presents data that has already been digested into words by someone else; and finally, sonification is good at revealing correlations and repetitions.
Because of these factors, sonification has been used for centuries to help users obtain more information. Sonar is perhaps the most thoroughly investigated practical use of sonification. As early as 1490, Leonardo da Vinci reported hearing ships at a great distance by submerging one end of a long tube under the water and placing the other end near his ear. Drums were used in many early cultures to communicate, and military strategists employed drums (and bugles) to convey nearly every maneuver used in battle because of their lack of ambiguity.
As technology advanced, so did our use of sonification, even though it remained somewhat limited. Morse code became a standard communication system. Hans Geiger, in the early 1900’s, realized that sonification could help users discover data that had previously gone unnoticed with a visual display while, at the same time, satisfying the end user’s need to keep eyes free for other tasks. In an ironic and “titanic” twist of fate, a patent for an echo-ranging device was filed five days after the Titanic sank in 1912. Programmers in the early days of computing discovered that a “cheap” radio placed near the processing unit would pick up interference and could thus reveal loops that might otherwise go unnoticed (Bly).
Recent sonification uses include the pulse-oximeter. Developed in the 1980’s, the pulse-oximeter produces a tone that varies in pitch with the level of oxygen in a patient’s blood (extended to a six-parameter medical workstation by Fitch and Kramer in the 1990’s). Sonification also helped diagnose a problem with the Voyager 2 space mission in the early 1980’s as the craft began its traversal of the rings of Saturn. Controllers were unable to pinpoint the problem using visual displays, which showed only noise. When the data was played through a music synthesizer (using an Apple II), a “machine gun” sound was heard during a critical period. This led to the discovery that the problem was caused by high-speed collisions with electromagnetically charged micrometeoroids.
In another recent utilization of sonification, Edward Yeung, in the early 1980’s, investigated sound as a means of representing the multivariate data common in chemistry. Seven variables of sound were matched with seven chemical variables. Professional chemists serving as test subjects were able to understand the different patterns of the sound representations and correctly classify the chemicals with 90% accuracy before training and 98% after training. Finally, a discovery known as the Quantum Whistle provides insight into ways that sonification may be used in the future. After months of unsuccessfully studying visual oscilloscope traces for evidence of an oscillation predicted by quantum theory, the physicists Davis and Packard used sonification to listen to their experiment. Using liquid helium, they heard a faint whistling, which proved to be the first evidence that the oscillations actually occur. These cases illustrate the ability of the auditory system to extract the underlying structure and temporal aspects of complex signals, which are often important in scientific exploration and discovery.
Research in sound and sonification has been held back for several reasons, partly because of one of its advantages: sound is by its very nature interdisciplinary. As a result, many groups from varying disciplines were researching sound, but there was little communication between them. Steve Frysinger (James Madison University), an early proponent of combining sound and computers, states that the literature “essentially amounted to one or two papers per decade from the 1940’s through the 1970’s.” In 1980, spurred by a paper by Sara Bly, a small group consisting of Joe Mezrich, Dean Wallraff, Sara Bly, Dave Lunney, and Frysinger met to discuss issues relating to sound. Auditory Data Representation (ADR), as Frysinger then called it, received an unenthusiastic reception from the computer world in 1980, probably because of the difficulty and expense of providing sound hardware and software, as well as the lack of a distinct discipline for “sound.”
Around 1985, some of the earlier logistical problems (Bly had had to build special hardware and an interface) were partially solved by technological changes resulting from the music industry’s adoption of a standard protocol for interfacing electronic sound synthesis and processing equipment to computers. Known as MIDI (Musical Instrument Digital Interface), this protocol eventually resulted in a wide range of equipment and interfaces becoming available to the researcher.
In 1990, Greg Kramer, of the Santa Fe Institute Studies in the Sciences of Complexity, attempted to pull many of these groups together, resulting in the International Conference on Auditory Display. Much of the research in this paper derives from the proceedings of the ICAD organization. (Note that the “C” in ICAD now stands for Committee.) Kramer advocates that the term “Auditory Display” be used to describe non-speech audio conveying information in the computer interface. Auditory Display, as he defines it, is divided into three techniques: Sonification, Auditory Icons, and Earcons.
Sonification, as previously defined, is the use of non-speech audio to convey information: the technique of translating multi-dimensional data directly into sound dimensions. More specifically, it is the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication or interpretation. Kramer determined that, by its very nature, sonification is interdisciplinary and thus integrates concepts from human perception, acoustics, design, the arts, and engineering.
Sonification can include the use of data to control a sound generator for the purpose of monitoring and analyzing that data. It is important to note that the sound generated need not have any direct relationship to the data. (This distinction will become clearer when Auditory Icons and Earcons are discussed.) Another interesting aspect of sonification is that removing the display one step further from the data may actually make it richer and more coherent. A mechanic listening to a car provides a good example: to the eye, nothing may appear wrong, but an experienced mechanic can garner information from the pitch about the RPM (revolutions per minute), whether all cylinders are firing, and so on, just as sound can help a doctor determine information about the human body.
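To make the idea of data controlling a sound generator concrete, the following is a minimal sketch of direct data-to-sound mapping. The linear mapping and the frequency range are arbitrary choices for illustration, and the code computes frequencies rather than driving real audio hardware:

```python
# Minimal sonification sketch: map a data series onto pitch.
# The frequency range (220-880 Hz, two octaves) is an arbitrary choice.

def value_to_frequency(value, lo, hi, f_min=220.0, f_max=880.0):
    """Linearly map a data value in [lo, hi] to a frequency in Hz."""
    if hi == lo:
        return f_min
    fraction = (value - lo) / (hi - lo)
    return f_min + fraction * (f_max - f_min)

def sonify(series):
    """Turn a data series into one frequency per sample."""
    lo, hi = min(series), max(series)
    return [value_to_frequency(v, lo, hi) for v in series]

readings = [0.0, 2.5, 5.0, 7.5, 10.0]
print(sonify(readings))  # rising pitch tracks rising values
```

A monitoring application would feed these frequencies to a tone generator in real time; the point is simply that the sound is driven by the data, not that it resembles the data's source.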
While it is important to realize the valuable contribution sonification can make now and in future developments, it is not a panacea for everything that ails information retrieval. In some cases, redundant audio information is less effective than audio-only information due to the phenomenon of visual dominance. (Tzelgov et al., 1987) As Dr. Gluck states, “It really does depend!” Our sense of vision often seems much more dominant than our sense of hearing. (Ackerman 1990, Tuan 1993) In fact, sound and sonification present many problems, including: low resolution of many auditory variables; limited spatial precision; lack of absolute values; certain sounds can be very annoying to users; sonification can interfere with speech communication; there can be an absence of persistence, with sound simply fading into the background; printouts are unavailable, so the auditory track may need to be played over and over; and there can be user limitations, such as the aural equivalent of color blindness.
Auditory Icons, a second technique of Auditory Display, are caricatures of naturally occurring sounds that convey information by analogy with everyday events. (Gaver, 1986) Gaver believed that one way sonification could be useful is to assist users in remembering the meaning of non-speech audio signals by metaphor, by having the meaning associated with a sound correspond to the meaning of similar sounds in the everyday world.
Two advantages of using Auditory Icons are that users tend to be more engaged and that users gain flexibility because they do not always have to attend to the screen for information. Theoretically, the advantage of Auditory Icons lies in the intuitiveness of the mapping between the sounds and their meaning. Gaver sees the mapping as a part of the hearing process and developed a toolkit, ENO, to generate auditory icons.
One interesting aspect of the development of Auditory Icons is that Gaver questioned the basic notion of listening. In 1979, Vanderveer presented subjects with thirty common sounds, such as the tearing of paper or applause, and found that subjects tended to identify the sounds in terms of the objects and events that caused them. The test subjects described the sensory qualities only when they could not identify the source events. Based on these and his own findings, Gaver argued that auditory displays should be built around real-world sounds, believing that we hear the source of a sound and the attributes of that source.
Gaver created some effective Auditory Icons for the early Macintosh computer systems. For example, he used our association of reverberation with empty space to give the user a reverberant “clunk” when saving a file, with the amount of reverberation providing a good cue as to how much free space was left on the disk; and placing a file into the Macintosh “trash can” was accompanied by an appropriate “tinny crash”. Gaver employed Auditory Icons in another, less effective way, by attempting to heighten awareness of a distributed community of users by emulating real sounds, such as a typewriter and other office equipment and procedures, to create a more “natural” environment.
The mapping between sounds and their meaning, however, is also the weakness of Auditory Icons. Researchers and practitioners have had difficulty providing perceptual metaphors and developing real-world analogies for computer operations such as renaming a file. Although some may consider Auditory Icons frivolous, it is important to remember how they can aid information retrieval for the visually impaired, or in critical applications where sound can reduce the risk of error.
Gaver believed that there are two types of listening, which he called musical listening and everyday listening, with everyday listening dominating. One aspect of sound not otherwise discussed in this paper is music; because of time and the limits of this investigation and project, music is examined only as it is used in Auditory Display. Before discussing Earcons, the third type of Auditory Display, which uses musical tones, it is important to note the basic elements of musical theory and how they relate to sonification and information retrieval. These elements, listed below, can be used in a variety of ways to help discriminate information, and several examples are provided in the remaining sections of this paper.
- Pitch: the highness or lowness (frequency) of a sound; the primary basis for traditional melody, and the element of music found most useful in information retrieval.
- Rhythm: the relative changes in the timing of the attacks of successive events.
- Tempo: the speed of events.
- Dynamics: the relative loudness of events (static or varying).
- Timbre: the difference in spectral content and energy over time (what differentiates a saxophone from a flute).
- Location: where the sound is coming from.
One of the most straightforward ways of using sound to describe changes in the magnitude of a single variable is to code it by pitch (Mansur, Blattner, and Joy 1985), since pitch is the feature of a sound by which listeners can arrange sounds on a scale from “lowest” to “highest”.
Earcons, a third type of Auditory Display, are “non-verbal audio messages that are used in the computer/user interface to provide information to the user about some computer object, operation or interaction.” (Blattner, 1989) Earcons are based on synthetic musical tones and are constructed from simple building blocks called motives, generally one to three notes long. By using several dimensions of sound, such as pitch, timbre, and rhythm, Earcons can be combined in different ways to produce a tonal pattern sufficiently distinct to be a recognizable entity that can be communicated to the user. Earcons have been used in the largest number of real-world applications, the simplest being auditory alarms and warning sounds, such as those in airplane cockpits and hospitals.
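The way motives combine into compound Earcons can be sketched as below. The motives, note choices, and concept names are invented for illustration and are not Blattner's actual designs:

```python
# Sketch of earcon construction: each motive is a short sequence of
# (pitch, duration) notes, and compound earcons concatenate motives.
# All motives and concept names here are hypothetical examples.

MOTIVES = {
    "file":   [("C4", 0.2), ("E4", 0.2)],
    "folder": [("G3", 0.3)],
    "open":   [("C5", 0.1), ("C5", 0.1), ("G4", 0.2)],
    "delete": [("E4", 0.2), ("C4", 0.4)],
}

def earcon(*concepts):
    """Combine motives into a single earcon (a flat sequence of notes)."""
    notes = []
    for concept in concepts:
        notes.extend(MOTIVES[concept])
    return notes

print(earcon("open", "file"))  # "open" motive followed by "file" motive
```

The family structure is the point: because “open file” and “open folder” share the “open” motive, a user who has learned a few motives can decode compound messages they have never heard before.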
Stephen Brewster and the Glasgow Interactive Systems Group have studied the effectiveness of adding sound to tool palettes. By using Earcons to indicate the current tool and to signal tool changes, users could tell which tool was active regardless of where they were looking. Results showed a significant reduction in the number of tasks performed with the wrong tool. An important consideration was not to make the tools any more annoying to the user. Earcons associated with buttons, scrollbars, and menus reduced the time needed to recover from errors, as well as the time taken to complete tasks.
In other uses of music-related sonification, Beaudouin-Lafon and Conversey added sound to scrollbars using an auditory illusion called Shepard-Risset tones, which increase or decrease in pitch indefinitely (similar to an Escher endless staircase). Hudson and Smith (1996) constructed longer Earcons that convey useful information about incoming email messages, referred to as audio glances.
An audio glance can present an overview of the overall properties of an object aurally rather than visually. Although primarily associated with email, the implications of this feature for information retrieval are, I think, just beginning to be realized. For example, when doing a basic Internet search, the user may receive some feedback from the search, such as a ranking. Earcons could be used in several ways to aid the match process. By assigning Earcons to certain types of documents (text only, text with graphics, graphics only, music, video, etc.), the user could know before downloading what type of document would be received, or perhaps whether it should be downloaded at all. Another application of audio glances to information retrieval could be achieved by assigning a pitch or Earcon corresponding to the length of time a document would take to download, based on its size.
Another way pitch or Earcons may be used is by assigning a different pitch or Earcon to several of the keywords in a search. The user interface could be set to play the pitches or Earcons in the order of their priority in the document. For example, a user could search on the keywords “one” OR “two”, with “one” designated as a high pitch and “two” as a low pitch. When the results were returned to the screen, the user could highlight each of the “hits” and, in addition to the traditional ranking and brief description of each site, learn which of the two keywords was most prevalent in the document from which pitch was played back first.
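This keyword-pitch scheme can be sketched as follows; the pitch labels and the simple word-frequency counting are illustrative assumptions rather than any existing interface:

```python
# Hypothetical sketch of the keyword-pitch idea: each search keyword is
# assigned a pitch, and for a given document the pitches play in order
# of keyword prevalence, most frequent keyword first.

KEYWORD_PITCH = {"one": "high", "two": "low"}  # assumed assignment

def playback_order(document_text, keyword_pitch):
    """Return pitches ordered by keyword frequency in the document."""
    words = document_text.lower().split()
    counts = {kw: words.count(kw) for kw in keyword_pitch}
    ranked = sorted(keyword_pitch, key=lambda kw: counts[kw], reverse=True)
    return [keyword_pitch[kw] for kw in ranked]

doc = "one two one one two"
print(playback_order(doc, KEYWORD_PITCH))  # ['high', 'low']
```

Hearing “high” before “low” would tell the user, without reading anything, that the first keyword dominates that hit.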
Elizabeth D. Mynatt, another noted sonic researcher, prophesied in 1995 that one should “Imagine driving to work, listening to auditory cues, navigate to the file menu which has a print option. You print reports so that they are waiting for you when you arrive. Start your email application and listen to any new email messages, delete and file others.” Again, one must consider the implications of Internet access for information retrieval. Researchers are currently working on auditory techniques that can convey information normally conveyed through layout. Using a combination of speech, sonification, music, and Earcons, layouts could be enhanced on screen or rendered in an auditory-only mode for mobile use. By using pauses, special voices or effects for hyperlinks, fast-forward and rewind buttons, a history knob, a headline mode, and the like, designers will make it possible for users to search the Internet while driving to work.
In other related research and applications, sound has provided a means of expanding the representational aspects of cartography and visualization. In 1994, John Krygier identified a range of sound variables that parallel graphic variables:
- Location: the location of a sound in a two- or three-dimensional sound space.
- Loudness: the magnitude of a sound.
- Pitch: the highness or lowness (frequency) of a sound.
- Register: the relative location of a pitch in a given range of pitches.
- Timbre: the general prevailing quality or characteristic of a sound.
- Duration: the length of time a sound is (or is not) heard.
- Rate of change: the relation between the durations of sound and silence over time.
- Order: the sequence of sounds over time.
- Attack/decay: the time it takes a sound to reach its maximum/minimum.
Krygier produced a cartographic example using sonification to provide information to the user. When the mouse was positioned over a county on a map, the computer played a pitch corresponding to the high, medium, or low mean journey-to-work time of county residents. This technique, the sonic probe, is intended as a way to increase the number of variables an analyst can consider simultaneously. Although the same information could be conveyed by color or symbol size, the technique is especially useful for the visually impaired.
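The classification step behind such a sonic probe can be sketched as below; the county names, commute values, and thresholds are invented for illustration, and a real probe would play the tone rather than return a label:

```python
# Sketch of a sonic probe: the county under the pointer is classified
# into a low, medium, or high pitch by its data value.
# Data and thresholds are hypothetical examples.

MEAN_COMMUTE_MINUTES = {"Adams": 12.0, "Baker": 24.0, "Clark": 38.0}

def probe(county, data, low_cut=15.0, high_cut=30.0):
    """Return the pitch class for a county's mean journey-to-work value."""
    value = data[county]
    if value < low_cut:
        return "low pitch"
    if value < high_cut:
        return "medium pitch"
    return "high pitch"

print(probe("Clark", MEAN_COMMUTE_MINUTES))  # high pitch
```

Sweeping the pointer across the map would then produce a sequence of tones, letting the analyst hear the spatial pattern of the variable while the eyes attend to a different one.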
Although this paper has attempted to focus on “all” users, it is impossible to ignore the effects of both visual and auditory display on the visually impaired. Much of the early research and application development in sonification began in the treatment of individuals with visual impairment. One of the main deprivations caused by blindness is the problem of access to information, and this information bottleneck must be overcome for visually disabled users to access information. As the Graphical User Interface (GUI) has become the standard on the Internet, it is even more important that alternative modes of interaction be developed for the blind and visually impaired.
In conclusion, this paper has provided research findings and (hopefully) some insights to help answer three key questions: How can sonification improve sight-based (visual display) information retrieval for the information seeker? How does the difference between speech, non-speech, and music affect information retrieval? And who can benefit from audio-based retrieval? The answer to all three can be summed up in one sentence: depending on the situation, sonification (non-speech), speech, and music can all improve sight-based (visual display) information retrieval for information seekers. Sonification is particularly relevant and timely in aiding the information seeker in the information retrieval match process because of the need to comprehend, interpret, and understand an abundance of data. Because visualization is often insufficient for comprehending certain features in the data, the properties and advantages of audio are proving increasingly useful for presenting data to users.
THE FUTURE IS HEAR...
The continued trend towards complex tasks and data challenge our abilities to make sense of things via abstractions such as numbers and categories. Perceptual presentations such as sonification present a means for tapping innate meaning-making skills. Matching these perceptual and cognitive capacities with appropriate tasks and designing effective display systems to effect this match is going to be a large and increasingly subtle challenge.
-Greg Kramer (Email from Greg Kramer to Dennis Hage, April 18, 1999)