* Published in Proceedings of the 1996 International Computer Music Conference.
Conservatorium of Music
University of Tasmania
School of Arts and Media
La Trobe University
This paper describes a new interface for interactive sound spatialisation by the public over three-dimensional (3-D) loudspeaker arrays. The interface uses ultrasound to implement an adaptation of F. R. Moore's General Model for Spatial Processing of Sounds and offers clear ergonomic and computational advantages over existing strategies. It is designed for use by untrained individuals but could be equally useful in a performance context, especially where the performer is remote from the projected sound field.
1 Introduction
The design of expressive public interactive music installations requires an approach centred on guiding the public through specific sets of activities. An activity-based approach is necessary to elicit appropriate music-generating behaviour from participants. Physical restrictions on participants' actions are generally needed to keep behaviour within functional bounds. In addition, where adequate restraint of behaviour is unfeasible, designers may also need to moderate the sensitivity of sound-generating mechanisms.
First-time users of interactive installations may not fully recognise the relationship between their own actions and the resulting sound. Designers of public interactive works must therefore establish effective techniques that help participants grasp this relationship quickly. Participants need to be aware of the consequences of their actions, because it is through informed behaviour that composers can best communicate aesthetic material in an interactive context.
This paper discusses a new method for interactive sound spatialisation by the public. The system has been devised for The Talking Chair [Mott, 1995], an interactive sound installation that enables untrained individuals to manipulate compositional algorithms and control the 3-D motion of a virtual sound object around their body. We have developed a design with strong spatialised visual reference points to help first-time users with interactive sound spatialisation. While the interface is designed to control both music-generating algorithms and the spatialisation of sound objects, discussion of the compositional element is beyond the scope of this paper.
2 Earlier Design
Our current work is part of an ongoing investigation into interactive music. In collaboration with designers and animators, we have produced works exploring spatial sound manipulation by the public. Squeezebox was made after The Talking Chair in 1994 and was produced in part to examine the use of restricted movement in public interaction. Unlike The Talking Chair, which allows unhindered spatial gestures within a given region, the Squeezebox interface uses four pneumatic pistons to ensure smooth gestures from the public.
The impetus for designing a new spatial controller came from a need to revisit human factors associated with The Talking Chair. The first, 1994 version of the installation (Figure 1) implemented John Chowning's theories of moving sound simulation [Chowning, 1971] and used an ultrasound wand as its interface. A single seated participant moved the wand through a cubic zone above the lap to control the motion of sound. Ultrasound pulses transmitted from the wand were picked up by three fixed receivers, and gestures were mapped as Cartesian coordinates. A mirrored display cabinet was situated in front of the listener as a visual aid to the spatial navigation of sound. The viewing cabinet, using a magic box technique [Popular Mechanics Co., 1913], displayed the illuminated wand tip as a glowing ball moving about a reflected image of the participant's head. The position of the ball relative to the head corresponded to the perceived position of sound in the space surrounding the listener.
Figure 1: 1994 version of The Talking Chair
Regrettably, while the magic box system worked effectively, its use required detailed written explanation and some practice. We found that few people in museum environments fully read the detailed diagrams; as a result, the cabinet often proved confusing and distracting. The complexity of the interface also led a few individuals to believe that the cabinet itself was the object of the interaction! The magic box display additionally requires a degree of control over ambient lighting that is not always practical in public spaces. A simpler approach was in order.
In attempting to remedy shortcomings in our original human-machine interface, our two main aims were to a) efficiently map the spatial gestures of participants to factors simulating the spatial motion of sound, and b) provide an interface design that explicitly informs the participant of the spatial relationship between their body and the virtual sound object.
3 The General Model
We chose F. R. Moore's General Model for Spatial Processing of Sounds [Moore, 1983] as the basis for our interface design. The model is readily adaptable to an ultrasound-based user interface and has qualities that are highly instructive in the interactive positioning of sound relative to the body.
The model was originally developed to address problems of listener perspective in multi-speaker systems by generating audio delays and amplitudes with respect to the loudspeakers rather than the listener. In this way, listeners in different locations within a sound field each receive a different perspective of the same spatial event. Loudspeakers in spatial arrays are modelled as windows in a room whose size and shape are determined by the number and placement of the speakers. This inner room is surrounded by a larger outer room in which illusory sonic objects may move freely. The audience is contained within the inner room and perceives virtual sound objects through the windows. Spatialisation is simulated by measuring the lengths of the direct and reflected sound paths from the virtual object to each window; these distances are used to calculate individual amplitudes and time delays for each loudspeaker channel.
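To illustrate the direct-path part of the model, the following Python sketch computes a delay and an amplitude for each window (loudspeaker) from the distance between a virtual source and that window. It is not Moore's implementation: it ignores reflected paths and wall occlusion, assumes a speed of sound of 343 m/s, and uses a simple inverse-distance amplitude law clamped at a hypothetical reference distance. The function name `direct_path_params` is our own.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def direct_path_params(source, windows, ref_distance=1.0):
    """Return a (delay_s, amplitude) pair for each window (loudspeaker).

    source:  (x, y, z) position of the virtual sound object, in metres.
    windows: list of (x, y, z) window/loudspeaker positions, in metres.

    Only direct paths are modelled: reflected paths and wall occlusion,
    both part of Moore's full model, are ignored in this sketch.
    """
    params = []
    for window in windows:
        d = math.dist(source, window)
        delay = d / SPEED_OF_SOUND
        # Inverse-distance amplitude, clamped to 1 inside ref_distance
        # so a source very close to a window cannot blow up the gain.
        amplitude = ref_distance / max(d, ref_distance)
        params.append((delay, amplitude))
    return params


# Usage: a source 3 m along x, with windows at x = +1 m and x = -1 m.
for delay, amp in direct_path_params((3.0, 0.0, 0.0),
                                     [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]):
    print(f"delay {delay * 1000:.2f} ms, amplitude {amp:.2f}")
```

The nearer window receives the earlier and louder signal, and because the calculation is made per loudspeaker rather than per listener, each listener position hears its own perspective of the event.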
We were attracted to the General Model not because of issues of listener perspective, but because of its potential to produce an interface design of great ergonomic clarity. By constructing a user interface as a physical analogue of the inner room, we at once provide an efficient mechanism for real-time spatial control and a strong visual tool for communicating the position of sound with respect to the listener.
4 Implementation Overview
The Sound Sphere
The interface design incorporates a small sphere suspended from a stalk in front of the listener. Six ultrasound receivers are embedded in the surface of this sound sphere, each positioned to correspond directly to one of the loudspeakers of The Talking Chair. An alignment tube is included in the interface so that participants can adjust the seat height to position their head at the centre of the sound field. An ultrasound-emitting wand is moved around the sound sphere to position the sound relative to the speakers of the sculpture. A simple four-stage instruction chart (Figure 2) will show participants how to interact with the sculpture. The instructions cover a) the use of the alignment device, b) how to change sounds, and c) how to position sound objects relative to the sculpture and consequently to the body of the participant. The remaining nuances of interaction are left for participants to discover.
Amplitudes and Delays
The interface uses ultrasound to measure the distance from the wand to each receiver. These distance measurements determine both the amplitude and the delay time of sound for each loudspeaker channel. Audio amplitudes are to be attenuated with a linear, rather than an inverse cubic [Moore, 1989], relationship to distance, as linear methods are more useful for interactive control [Ballan et al., 1994].
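The mapping from measured distances to per-channel settings can be sketched as follows. It assumes linear attenuation that reaches zero at a hypothetical maximum distance (`max_distance_mm`), and a hypothetical scale factor (`room_scale`) that converts distances around the small sound sphere into room-sized delays; neither value comes from the paper, and the speed of sound is assumed to be 343 m/s.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def channel_params(distances_mm, max_distance_mm=300.0, room_scale=40.0):
    """Map wand-to-receiver distances to (amplitude, delay_s) per channel.

    distances_mm:    one measured distance per receiver/loudspeaker, in mm.
    max_distance_mm: hypothetical distance at which a channel falls silent.
    room_scale:      hypothetical factor converting interface distances
                     around the sound sphere into room-sized path lengths.
    """
    params = []
    for d in distances_mm:
        # Linear attenuation: 1 at the receiver, 0 at max_distance_mm and beyond.
        amplitude = max(0.0, 1.0 - d / max_distance_mm)
        # Scale the interface distance up to the room, then convert to time.
        delay = (d / 1000.0) * room_scale / SPEED_OF_SOUND
        params.append((amplitude, delay))
    return params


print(channel_params([0.0, 150.0, 300.0, 450.0]))
```

A linear law changes gain at a constant rate as the wand moves, which is one reason it can feel more predictable under the hand than a steep inverse-power law.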
In our implementation of the General Model, only the direct (not reflected) paths from the wand to each receiver are measured. Omitting reflected paths simplifies the model, but the sound sphere introduces a further complication: it distorts the ultrasound field around it.
In the General Model, windows whose direct sound paths are obstructed by walls receive signal levels of zero. Where a window is suddenly shadowed by an obstructing wall, signal levels decline sharply, and it is necessary to interpolate between amplitude values to avoid audible clicks [Moore, 1989]. With our interface design, part of the ultrasound signal is diffracted around the surface of the sound sphere, so smooth amplitude transitions occur as windows (receivers) become gradually shadowed. Distance measurements from the wand to shadowed receivers already include the extra path length that results from the sound wave following the curve of the surface, and are thus a truer representation in this context.
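Where amplitudes do change abruptly, for example between successive control updates, the interpolation Moore recommends can be as simple as a per-sample linear ramp. The sketch below assumes a block is a list of samples; `ramp_gain` is a hypothetical helper, not code from the installation.

```python
def ramp_gain(block, old_gain, new_gain):
    """Scale a block of samples by a gain that moves linearly from
    old_gain to new_gain, reaching new_gain on the last sample.

    Spreading the change across the block avoids the step discontinuity
    (heard as a click) that an instantaneous gain change would cause.
    """
    n = len(block)
    out = []
    for i, x in enumerate(block):
        g = old_gain + (new_gain - old_gain) * (i + 1) / n
        out.append(x * g)
    return out


print(ramp_gain([1.0, 1.0, 1.0, 1.0], 0.0, 1.0))  # [0.25, 0.5, 0.75, 1.0]
```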
While we can obtain stable amplitude measurements at each receiver, we will not map these readings to loudspeaker amplitudes. Our transmitter is not fully omnidirectional and consequently models a sound object with a directional radiation pattern. Such directionality, together with the inverse square nature of the received signal strength, would make the interaction unsuitable for untrained users. A mechanism of this type is, however, worthy of continued investigation in a performance context.
Reverberation is to be implemented using off-the-shelf effects units, with the level determined by the distance of the wand from the sound sphere. As in the 1994 version of The Talking Chair, we will implement John Chowning's local and global reverberation techniques [Chowning, 1971], which we have found to be highly efficient and effective in simulating changes in distance.
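The following sketch follows the commonly cited form of Chowning's scheme: the direct signal is proportional to 1/D and the total reverberant signal to 1/√D, with the reverberation split into a global share (1/D) sent to all channels and a local share (1 − 1/D) sent to the channels near the source. The mapping from the wand-to-sphere gap to the normalised distance D (`wand_to_distance` and its `unit_mm` parameter) is a hypothetical choice, not taken from the paper.

```python
import math


def chowning_levels(distance):
    """Return (direct, global_reverb, local_reverb) amplitudes for a
    normalised virtual distance D (values below 1 are treated as 1)."""
    d = max(distance, 1.0)
    direct = 1.0 / d
    reverb = 1.0 / math.sqrt(d)            # total reverberant level
    global_rev = reverb / d                # share sent to all channels
    local_rev = reverb * (1.0 - 1.0 / d)   # share sent near the source
    return direct, global_rev, local_rev


def wand_to_distance(gap_mm, unit_mm=20.0):
    """Hypothetical mapping: each unit_mm of wand-to-sphere gap adds one
    unit to the normalised distance D, starting from D = 1 at the surface."""
    return 1.0 + gap_mm / unit_mm


print(chowning_levels(wand_to_distance(60.0)))
```

As D grows, the direct signal falls faster than the reverberation, and an increasing share of the reverberation becomes local, which is the cue Chowning used to suggest distance.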
The attenuation of individual audio channels is to be performed by a custom-built MIDI-controlled mixer. The device, which uses analog circuitry, can control four independent sound objects in a six-loudspeaker array. It controls the signal level of audio sent to each loudspeaker as well as the levels of both local and global reverberation. The mixer also contains insert points for the six individually tapped delays of one virtual object.
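The mixer's MIDI implementation is not specified here, so the following sketch assumes a hypothetical layout: each sound object on its own MIDI channel, loudspeaker levels on Control Change numbers 20 to 25, and local and global reverberation on 26 and 27. Only the Control Change message format itself (status byte 0xB0 combined with the channel number, followed by controller and value bytes in the range 0 to 127) is standard MIDI.

```python
def cc_message(channel, controller, value):
    """Build a 3-byte MIDI Control Change message; value is clamped to 0-127."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127):
        raise ValueError("MIDI channel or controller number out of range")
    value = max(0, min(127, int(round(value))))
    return bytes([0xB0 | channel, controller, value])


def mixer_messages(obj, speaker_gains, local_rev, global_rev, first_cc=20):
    """Hypothetical mixer layout: sound object `obj` (0-3) on MIDI channel
    `obj`; six speaker levels on CCs first_cc..first_cc+5, then local and
    global reverberation on the next two CCs. Gains are in the range 0-1."""
    if not 0 <= obj <= 3:
        raise ValueError("the mixer handles four sound objects (0-3)")
    if len(speaker_gains) != 6:
        raise ValueError("expected six loudspeaker gains")
    levels = list(speaker_gains) + [local_rev, global_rev]
    return [cc_message(obj, first_cc + i, g * 127) for i, g in enumerate(levels)]


for msg in mixer_messages(1, [1, 0, 0, 0, 0, 0], 0, 1):
    print(msg.hex())
```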
At the time of writing, the audio delay software required for spatialisation is being implemented on a DSP56002 processor [Motorola Inc., 1993]. Although the processor is overkill for the simple delay processing required (essentially memory lookup and interpolation), we chose this route because we eventually plan to move all of the spatial processing for The Talking Chair and other projects onto the DSP and eliminate most of the analog circuitry currently in use.
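The memory lookup and interpolation that the delay processing needs can be illustrated with a circular-buffer delay line read at a fractional position using linear interpolation. This is a Python sketch of the general technique, not the DSP56002 code; the class name and its parameters are hypothetical.

```python
import math


class FractionalDelay:
    """Circular-buffer delay line read with linear interpolation."""

    def __init__(self, max_delay_samples):
        # Two spare slots keep the oldest needed sample from being overwritten.
        self.size = int(max_delay_samples) + 2
        self.buf = [0.0] * self.size
        self.write = 0

    def process(self, x, delay_samples):
        """Store input sample x and return the signal delayed by
        delay_samples (which may be fractional)."""
        if not 0 <= delay_samples <= self.size - 2:
            raise ValueError("delay out of range")
        self.buf[self.write] = x
        read_pos = self.write - delay_samples
        i = math.floor(read_pos)
        frac = read_pos - i
        a = self.buf[i % self.size]        # older of the two neighbours
        b = self.buf[(i + 1) % self.size]  # newer of the two neighbours
        self.write = (self.write + 1) % self.size
        return a + frac * (b - a)


line = FractionalDelay(max_delay_samples=4)
print([line.process(x, 1.5) for x in [1.0, 0.0, 0.0, 0.0]])  # [0.0, 0.5, 0.5, 0.0]
```

One such delay line per loudspeaker channel, each read at the delay derived from its wand-to-receiver distance, gives the six tapped delays of a virtual object.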
Ultrasound distance measurements are performed by a PIC16C84 microcontroller [Edwards, 1994] [Microchip Technology Inc., 1994]. The unit produces its output as MIDI messages, which will be received by a Macintosh computer running the FORMULA music language. In addition to controlling music-generating hardware, the Macintosh will control the DSP56002 and the custom mixer via MIDI.
5 Ultrasound Specification
Choice of Frequency
Ultrasound transducers of the type readily available to us are highly directional devices. For them to provide useful distance information for interactive spatialisation, their omnidirectional response must be enhanced. This ensures that, at all angles, the direct-path signal reaching a given receiver is stronger than any reflected signals. The 1994 ultrasound implementation used 40 kHz transducers. The wand housed the transmitting transducer (Tx) and was moved through a zone monitored by three receivers. We found that fixing a 16 mm (diameter) marble in front of the Tx produced sufficient reflection for distance (i.e. timing) measurements to work, no matter which direction the wand was pointed in.
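Distance follows from timing by multiplying the one-way time of flight by the speed of sound. The sketch below uses the common approximation c ≈ 331.3 + 0.606·T m/s (T in °C); the paper does not say whether any temperature compensation is applied, so treat that parameter as an assumption.

```python
def speed_of_sound(temp_c=20.0):
    """Approximate speed of sound in air (m/s) at temp_c degrees Celsius."""
    return 331.3 + 0.606 * temp_c


def tof_to_distance_mm(tof_us, temp_c=20.0):
    """Convert a one-way time of flight in microseconds to a distance in mm.

    m/s multiplied by microseconds gives micrometres, hence the 1e-3 factor.
    """
    return speed_of_sound(temp_c) * tof_us * 1e-3


print(f"{tof_to_distance_mm(100.0):.2f} mm")
```

Because the direct path is the shortest, the first arrival that rises above the reflections is the one that carries the distance information, which is why the direct signal must dominate at every angle.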
The original wand would be impractical in the new implementation, as the sonar wave must be diffracted, at least partially, around a sound sphere of perhaps 80 mm diameter. The new wand uses 25 kHz, which has a wavelength of around 13.5 mm. Although this is still much smaller than 80 mm, our tests showed just enough diffraction around the sound sphere for usable distance (i.e. timing) measurements at all but the most distant receivers. A 40 kHz signal is far less usable, as its shorter wavelength (8.5 mm) results in far greater reflection.
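The wavelengths quoted above follow from λ = c/f. With an assumed c of 343 m/s the sketch gives slightly larger values (about 13.7 mm and 8.6 mm) than those in the text, which correspond to a speed of sound of roughly 338 to 340 m/s. The ratio of sphere diameter to wavelength shows that both frequencies are well below the size of the obstacle, with 25 kHz the closer of the two.

```python
def wavelength_mm(freq_hz, c=343.0):
    """Wavelength in mm for a frequency in Hz and a speed of sound c in m/s."""
    return c / freq_hz * 1000.0


SPHERE_DIAMETER_MM = 80.0  # approximate sound sphere size from the text

for f in (25_000, 40_000):
    lam = wavelength_mm(f)
    print(f"{f // 1000} kHz: wavelength {lam:.1f} mm, "
          f"sphere diameter / wavelength = {SPHERE_DIAMETER_MM / lam:.1f}")
```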
This lower operating frequency also allows small (6 mm diameter) electret mics to be used as the sonar receivers. Because the mic diaphragm is smaller than the wavelength, they have a near-omnidirectional response pattern in free air, and thus a hemispherical response when mounted flush on the sound sphere surface. 25 kHz is somewhat above the intended operating frequency of the electret mics, which are designed for audio applications, but with sufficient amplification and appropriate bandpass filtering they are quite usable.
Modifications to 25 kHz Transducers
Attempts to improve the omnidirectional response of 25 kHz transducers by using a marble to scatter the beam have been unsuccessful. This is possibly due partly to design differences between the two transducers and partly to the difference in wavelength. The 25 kHz Tx radiates directly from the piezoelectric surface via a grille, the centre 9 mm of which is blocked off, allowing sound through only the surrounding annulus. An enhanced omnidirectional response has been achieved with a conical device (Figure 3) attached to the front of the Tx. In our design, the inner concentric cone directs the annular wavefront to the tip of an outer cone, where it is finally released into the surrounding air through a 4 mm diameter hole. This hole is much smaller than the 13.5 mm wavelength, so the radiation is nearly omnidirectional. With the ultrasound cone attached, the radiation in any direction is uniform within 6 dB, except for a shallow 9 dB null at 165 degrees. This compares favourably with the radiation pattern originally measured on the bare transducer, which showed a variation of 26 dB (Figure 4).
The current hardware implementation uses a PIC16C84 microcontroller to generate the blips and process the received signals. It provides manual adjustment of the blip rate (30 to 60 ms between blips) to allow use in spaces with differing reverberation. It also measures all six received signals near-simultaneously, so that a complete set of measurements is made at each blip, rather than cycling through six blips (and waiting six times as long before the system's response to a wand movement can be perceived).
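The benefit of capturing all six receivers per blip can be sketched as follows. The sketch assumes each receiver's amplified and filtered signal is available as a sampled envelope after the blip, a simplification, since the PIC firmware is not described at that level. The first threshold crossing on each channel then gives that receiver's time of flight, and one blip yields a full set. The helper names and the threshold are hypothetical.

```python
def first_arrivals(envelopes, threshold, sample_period_us):
    """For one blip, return the time (us) at which each receiver's envelope
    first reaches threshold, or None if it never does."""
    times = []
    for env in envelopes:
        t = None
        for n, v in enumerate(env):
            if v >= threshold:
                t = n * sample_period_us
                break
        times.append(t)
    return times


def update_interval_ms(blip_interval_ms, receivers=6, simultaneous=True):
    """Time until a complete set of receiver measurements is available."""
    return blip_interval_ms if simultaneous else blip_interval_ms * receivers


print(first_arrivals([[0, 0, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]], 0.5, 10))
print(update_interval_ms(60), update_interval_ms(60, simultaneous=False))
```

At the slowest blip rate, simultaneous capture delivers a full update every 60 ms instead of every 360 ms.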
The MIDI data byte output of the PIC is currently 7 bits wide, and the resolution of distance measurements is 6 mm. Data filtering techniques are to be applied in the Macintosh computer to smooth and interpolate between values.
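With 7-bit values and 6 mm steps, the representable range is 0 to 127 × 6 = 762 mm, assuming a value of 0 corresponds to zero distance. The paper does not name the filter to be used on the Macintosh; the sketch below substitutes a one-pole exponential smoother as one plausible choice, with a hypothetical `alpha` parameter that trades smoothness against lag.

```python
RESOLUTION_MM = 6.0  # distance step per MIDI data value, from the text


def midi_to_mm(value):
    """Convert a 7-bit MIDI data byte to a distance in mm (assumes 0 -> 0 mm)."""
    if not 0 <= value <= 127:
        raise ValueError("MIDI data bytes are 7 bits (0-127)")
    return value * RESOLUTION_MM


class Smoother:
    """One-pole exponential smoothing (an assumed filter choice)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # 0 < alpha <= 1; smaller is smoother but slower
        self.state = None

    def update(self, x):
        if self.state is None:
            self.state = x  # start from the first reading, not from zero
        else:
            self.state += self.alpha * (x - self.state)
        return self.state


s = Smoother(alpha=0.5)
print([s.update(midi_to_mm(v)) for v in (0, 2, 2, 2)])
```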
The Tx is pulsed with bursts of 8 cycles of the 25 kHz signal (each burst lasts 0.32 ms). This signal is generated in software and drives the Tx from two PIC data lines via a differential summing amp. The final waveform is symmetrical, swinging ±20 V, with zero average DC. Lower-frequency clicks from the Tx, which had earlier been a problem, are thus almost inaudible, despite the higher power used to compensate for losses in the cone structure.
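The burst arithmetic (8 cycles at 25 kHz lasting 0.32 ms) and the zero-DC property of driving the transducer from two antiphase data lines can be checked with a small model. It assumes ideal square-wave logic levels and treats the differential summing amp as simply scaling the difference between the two lines to ±20 V; the function name and the sampling parameter are hypothetical.

```python
def burst(cycles=8, freq_hz=25_000, samples_per_cycle=4, amplitude_v=20.0):
    """Model a transmit burst from two antiphase logic lines.

    Returns (duration_s, waveform), where the waveform is the scaled
    difference between line A and line B, sampled samples_per_cycle
    times per cycle.
    """
    if samples_per_cycle % 2:
        raise ValueError("samples_per_cycle must be even")
    half = samples_per_cycle // 2
    line_a, line_b = [], []
    for _ in range(cycles):
        line_a += [1] * half + [0] * half  # A high for the first half-cycle
        line_b += [0] * half + [1] * half  # B high for the second half-cycle
    waveform = [amplitude_v * (a - b) for a, b in zip(line_a, line_b)]
    return cycles / freq_hz, waveform


duration, wave = burst()
print(f"{duration * 1000:.2f} ms, peak {max(wave)} V, trough {min(wave)} V, "
      f"mean {sum(wave) / len(wave)} V")
```

Because each line spends equal time high and low in antiphase, their difference averages to zero, so no DC step is applied to the transducer at the start or end of a burst.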
6 Conclusion
The new Talking Chair user interface has significant advantages over the original magic box device. The most notable advance is the shift from a 2-D frame of reference to a true 3-D method. Freedom of movement is more limited with the new interface, owing to partial obstruction by the attachment stalk and the need to navigate the wand close to a solid surface. We believe, however, that the device's strong ability to communicate notions of spatial location far outweighs obstruction concerns in a context of public interaction.
References
[Ballan et al., 1994] Oscar Ballan, Luca Mozzoni and Davide Rocchesso. Sound Spatialisation in Real Time by First-Reflection Simulation. Proceedings of the 1994 International Computer Music Conference, pp.475-476, 1994.
[Chowning, 1971] John Chowning. The Simulation of Moving Sound Sources, Journal of the Audio Engineering Society 19(1): pp.1-6, 1971.
[Edwards, 1994] Scott Edwards. The PIC Source Book, Scott Edwards, 1994.
[Microchip Technology Inc., 1994] PIC16C84 Data Book, DS30081C (Preliminary), Microchip Technology Inc., 1994.
[Moore, 1983] F. Richard Moore. A General Model for Spatial Processing of Sounds, Computer Music Journal 7(3): pp.6-15, 1983.
[Moore, 1989] F. Richard Moore. Spatialization of Sounds over Loudspeakers. In Max V. Mathews and John R. Pierce (Eds.): Current Directions in Computer Music Research, MIT Press, Cambridge, Massachusetts, pp.89-103, 1989.
[Motorola Inc., 1993] DSP56002 Digital Signal Processor User's Manual, Motorola Inc., 1993.
[Mott, 1995] Iain Mott. The Talking Chair: Notes on a Sound Sculpture, Leonardo 28(1): pp.69-70, 1995.
[Popular Mechanics Co., 1913] An Electric Illusion Box in The Boy Mechanic, Volume I, Popular Mechanics Co., Chicago, pp.130-131, 1913.