
Abstractly Related and Spatially Simultaneous Auditory-Visual Objects

Andrew D. Lyons
Composition Unit
The Sydney Conservatorium of Music.
The University of Sydney
Sydney NSW 2000 Australia
Email: tstex at(nospam)


This paper discusses various design issues related to the integrated synthesis of 3D sound and 3D graphics. Issues of particular concern are those related to a style of audio-visual integration in which the perceived attributes of sonic and visual phenomena are mapped to each other. The relationship between auditory and visual perception can draw on synesthesia, mental imagery and creativity. Unique problems result from the combination of these cross-modal design concerns and concerns for dynamic and realistic spatialisation. These problems are discussed within the context of design for reproduction involving traditional screen and multiple channel audio theatre systems. Works by the author are used as examples including a recently completed 3D audio-visual work for DVD performance called "Heisenberg". The paper concludes with a call for greater treatment of visual artifacts involved in music cognition within music education.

1 Introduction

Works of the sort discussed here can be described using a schema of four design criteria. Issues related to these design criteria and their combination form the basis of all discussion in this paper. They will be referred to in this paper numerically as denoted in table 1.1 below:

1. Spatialised Sound - The realistic and dynamic spatialisation of sonic objects for performance using multiple speakers or headphones.

2. Spatially Simultaneous Sonic and Visual Objects - The visual representation of spatialised sound source locations using 3D computer graphics and a single screen display.

3. Abstractly Related Sonic and Visual Objects - The representation of mental imagery resulting from music and the abstraction into the visual domain of other cognitive artifacts attributable to sonic objects.

4. Effective Design - The development of audiovisual works which satisfy the fundamental design tenets of both sonic and visual arts.

Table 1.1 The four design criteria applied by the author.

2 Primary Problems

In the author’s research, each of the first three criteria in table 1.1 has been responsible for what may be considered simple or primary problems. Complex or compound problems begin to arise when attempts are made to integrate the first three criteria in a way that satisfies the demands of the fourth criterion. Because the compound problems often result from combinations of primary problems, each shall be discussed in turn.

2.1 Primary Problems Related to Sound Spatialisation

2.1.1 Design Criterion One

3D sound spatialisation is defined here as any system which is based on, or extends, the original John Chowning (1971) spatial synthesis system. 3D sound systems as defined here do more than just pan sounds around an n-speaker array. A 3D sound system should also synthesise distance cues, Doppler shifts, the six early reflections of sound off walls, local and global reverberation and perhaps include other features in order to generate an aural virtual reality. (Begault 1994) For the creation of Heisenberg an automated 3D sound spatialisation system of this sort dubbed "Pan" was designed.

2.1.2 Spatial Resolution in Sound Spatialisation

It is well known that the number of speakers available in a specific speaker array has a direct relationship to the degree of spatial precision achievable with that array. (Begault 1994) As more speakers are added to an array, each individual sound source can be differentiated more easily from others, and its location determined more accurately. (Begault 1994)

The greatest spatial resolution in the ITU-R BS.775-1 5.1 speaker array corresponds to the sixty degree arc immediately in front of the audience - where three speakers and the visual display are normally located. The author observed that the movement of spatially coincident sounds and images across this frontal sixty degree arc precipitated a generalised perceptual plausibility which in turn enhanced the evocative power of the acousmatic space beyond the area of the screen.

Besides the problematic limited spatial resolution inherent in five speaker arrays, the absence of speakers in the vertical plane might also be considered a limitation. The absence of speakers above and below the horizontal plane removes the ability to create sonic images above and below the audience. Heisenberg was consequently animated so that all motion takes place close to a horizontal plane level with the camera.

2.1.3 Scale Issues and the Inverse Square Law

One problem associated with all spatial synthesis systems involves ambiguity surrounding the correct function with which to create distance attenuation effects. The inverse square law is widely suggested to be the correct function with which to attenuate the amplitude and spectral content of a sound source to create the illusion of distance. (Dodge 1997) (Roads 1997) (Begault 1994) In Moore (1990) however, the inverse cube law is stated to provide a more perceptually tenable relationship between distance and amplitude. It would seem that the exponent in this function holds the key to creating different relationships between distance and attenuation.

It is well known that the volume of an extremely loud sound decays over greater distances than that of a quieter one. (Bregman 1990) When working with digitised sound files however it is not always good practice to digitally store such scales of amplitude due to problems with either clipping or narrow bit resolution. By making the exponent in the distance inversion function a parameter called "scale", it was possible to work with an optimally sampled sound file and create either the perception of a nearby insect or a distant jumbo jet. This level of control makes itself most apparent in Heisenberg in the dramatic yet slowly changing Doppler shifts created by fast moving, loud, distant sound sources. Without the scale parameter, distant sound sources and the dramatic Doppler shifts created by large relative velocities would be inaudible. This is a feature not documented in some classic texts on spatialisation nor implemented in some proprietary 3D sound systems.
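The role of the "scale" exponent can be illustrated with a minimal sketch. The exact function used in Pan is not given in this paper, so the form 1 / distance**scale, the function name distance_gain and the min_distance clamp are all assumptions made for illustration.

```python
def distance_gain(distance, scale=2.0, min_distance=1.0):
    """Amplitude gain of the assumed form 1 / distance**scale.

    scale=2.0 gives an inverse square curve and scale=3.0 an inverse cube
    curve. min_distance is a hypothetical clamp that keeps the gain finite
    when a source comes very close to the listener.
    """
    d = max(distance, min_distance)
    return 1.0 / d ** scale


# The same normalised sound file at 100 units of distance:
insect_like = distance_gain(100.0, scale=2.0)  # fades quickly: a small, quiet source
jet_like = distance_gain(100.0, scale=0.5)     # still audible: a loud, distant source
```

A smaller exponent makes the level fall away more slowly with distance, which is one way a single optimally sampled file could stand in for sources of very different apparent power.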

2.2 Problems Related to Spatially Simultaneous Sonic and Visual Objects.

2.2.1 Design Criterion Two

In our day to day life we are able to attribute sounds causally to an event or object which is more often than not visible in some way. In this way our vision and hearing provide complementary information about the location of events around us. It has been shown that each modality influences the spatial localisation of a stimulus source established by the other. (Lewis, Beauchamp and DeYoe 2000) The synthesis of such spatial audio-visual relationships lends considerable authenticity to virtual environments. (Begault 1990) While there is a large amount of visual bias in situations when discrepancies between visual and auditory spatial location occur, (Welch and Warren 1980) the correlation of cues from each modality may serve to encourage and reinforce the perception of intended abstract relationships between auditory and visual objects.

2.2.2 Temporal resolution

The main strength of Pan for the author is its ability to provide perfect temporal synchronisation between sounds, their trajectories and their animated representations. No external devices are needed to achieve accuracy within the range of one audio sample. This kind of accuracy does however seem a little like overkill when one considers the temporal resolution of the television screens for which the animation is prepared. With such displays there is no satisfactory means to visually represent a sound which has a dynamic spectral envelope that endures no longer than the minimum duration for which a frame can be displayed. This is one twenty fifth of a second in the PAL format used in Australia and Britain and roughly one thirtieth of a second for the NTSC format used in the United States.

This problem becomes more insidious when percussive passages take place at a regular rate that is slightly out of phase with the redraw rate of whatever television standard is being used. For this reason all percussive passages in Heisenberg are synchronised with the 25 frames per second redraw rate of the PAL television standard. In most cases this means tempos of 93.75 bpm are used. While it remains impossible to accurately represent percussive sounds with any temporal detail, at least it is possible to match their onset and offset with the onset and offset of an image. Creating passages that synchronise in this way, and at this rate, creates an effect commonly referred to as strobing. Strobing and flicker effects are regarded favourably in some circles and not so favourably in others, and ideally screens and projection systems with much faster redraw rates will be available in the long term.
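The relationship between tempo and frame rate can be checked with a short calculation. The sketch below is not taken from Pan. The function names are hypothetical, and it assumes that a passage counts as synchronised when each beat (or chosen subdivision of a beat) lasts a whole number of frames.

```python
from fractions import Fraction


def frames_per_beat(bpm, fps=25, subdivision=1):
    """Exact number of video frames per beat, or per subdivision of a beat."""
    return Fraction(str(fps)) * 60 / (Fraction(str(bpm)) * subdivision)


def is_frame_locked(bpm, fps=25, subdivision=1):
    """True when every onset at this rate falls exactly on a frame boundary."""
    return frames_per_beat(bpm, fps, subdivision).denominator == 1


def tempo_for_frames(frames, fps=25):
    """Tempo in bpm at which one beat lasts the given whole number of frames."""
    return float(Fraction(str(fps)) * 60 / frames)
```

At 25 frames per second, 93.75 bpm gives 16 frames per beat, so sixteenth notes (subdivision=4) each last exactly 4 frames. A tempo of 120 bpm gives 12.5 frames per beat and would drift against the redraw rate.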

2.2.3 Spatial Dimensions

When creating spatial works for cinema style performances, one becomes aware of a great disparity between the spatial range available for representing objects sonically and the spatial range available for representing objects visually. Irrespective of spatial resolution, a sonic object can be located anywhere around the audience on a horizontal plane when using five surround speakers. The representation of all visual objects however must vie for space within whatever dimensions are afforded by the available screen.

2.2.4 Calibrating for Spatial Simultaneity.

In a 3D animation in which the camera is both moving around a scene and panning from side to side, it is important that sound sources and their visual representation move together relative to the camera position and angle of rotation. If the camera pans to the left, then related sonic and visual objects should move off towards the right together relative to the central axis of view. It is one of the functions of Pan to ensure that all sounds are spatialised relative to the position and rotation of the camera being used to shoot the animation. It is essential that the audiovisual field remains spatially coincident during camera transformations, because ultimately the position and rotation of the camera represents the spatial position and alignment of the audience during theatre performances.
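A minimal sketch of camera-relative spatialisation on the horizontal plane follows. It does not reproduce Pan. The coordinate convention (y forward, x to the right, yaw in degrees and positive to the right) and the function name relative_azimuth are assumptions made for illustration.

```python
import math


def relative_azimuth(source_xy, camera_xy, camera_yaw_deg):
    """Return (azimuth, distance) of a source as seen from the camera.

    Azimuth is in degrees: 0 is the central axis of view, positive values
    are to the right, and the result is wrapped to the range -180..180.
    """
    dx = source_xy[0] - camera_xy[0]
    dy = source_xy[1] - camera_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # world bearing of the source
    azimuth = (bearing - camera_yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, math.hypot(dx, dy)


# A source straight ahead of an unrotated camera...
ahead = relative_azimuth((0.0, 10.0), (0.0, 0.0), 0.0)
# ...appears 30 degrees to the right once the camera pans 30 degrees left.
after_pan = relative_azimuth((0.0, 10.0), (0.0, 0.0), -30.0)
```

Feeding the same azimuth and distance to both the panner and the renderer is one way to keep a sound and its visual object together during camera moves.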

An extension of this problem is the need to spatially calibrate the visual field to match the rendered sound field. To calibrate the sound field spatially it should only be necessary to centre the circle of speakers around the supposed central audience position and arrange them using the angles indicated in the ITU 5.1 theatre sound specification. To calibrate the visual field however one should consider - before the animation is created - the angular relationship that exists between the central audience position and the side edges of the screen in the target theatre situation. This angle must then be used as the field of vision in the camera that shoots the 3D animation. In this way sounds and associated objects will appear to be spatially coincident. Figure 2.1 below may help explain this problem.
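The angle in question follows from simple trigonometry. The sketch below assumes a flat screen centred on the viewing axis, and the function name and example dimensions are hypothetical.

```python
import math


def horizontal_fov_deg(screen_width, viewing_distance):
    """Angle in degrees subtended by the screen's side edges at the central seat.

    Both arguments must use the same unit of length.
    """
    return math.degrees(2.0 * math.atan((screen_width / 2.0) / viewing_distance))


# Example: a 6 m wide screen viewed from 8 m away.
fov = horizontal_fov_deg(6.0, 8.0)
```

In this example the camera's horizontal field of vision would be set to roughly 41 degrees before the animation is rendered.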

Figure 2.1 - Field of Vision and Spatial coincidence.

2.3 Problems Related to Abstractly Related Sonic and Visual Objects

2.3.1 Design Criterion Three

It has been a major objective of the author’s research to create animations that explore cross-modal exchanges between audition and vision, and which explore mental imagery associated with the experience of listening to acousmatic and computer music. The author appreciates that mental imagery takes highly individual forms, and the imagery of one does not often resemble the imagery of another when responding to the same piece of music. It is hoped however that research describing universal styles and recurring themes in mental imagery, synesthesia and other cross-modal categorisation systems will help offset this subjectivity. A system involving this research has been utilised in the development of Schwarzchild, Loucid and Heisenberg.

2.3.2 Mental Imagery

When discussing mental imagery it is important to differentiate imagery specific to each sensory modality. Auditory, visual, kinesthetic, tactile, and olfactory imagery are all separate but related fields of cognitive psychology. (Richardson 1999) Such is the dominance of ocularity that in its broader usage "mental imagery" may be taken to mean the visual variety of mental imagery. The auditory kind of mental imagery is generally referred to as "auditory imagery" in this and other literature. (Reisberg 1992)

It has been suggested that fifty percent of people experience mental imagery of some sort when listening to music. (Huron 1999) Recent PET and EEG scans of the human brain have confirmed the employment of neural areas usually associated with vision in the exercise of musical tasks. (Nakamura et al. 1999) (Platel et al. 1997) Psychologists regard music as being a powerful source of mental imagery. (Quittner 1980) The use of mental imagery in music therapy techniques such as the Bonny Method of Guided Imagery (Goldberg 1995) has created a body of research describing the features of such imagery. (Cho 2002) (Lem 1998) Some recurring themes in mental imagery in response to musical stimuli include "'Nature scenes' i.e. sun, sky, ocean, plant and animal, etc., 'Journey', 'I' and 'Emotion' i.e. happiness, sadness, depression, etc." (Cho 2001)

2.3.3 Synesthesia

While many experience mental imagery as a result of musical audition, only one in twenty five thousand people experience synesthetic perception, in which coloured forms appear involuntarily in the field of vision in response to sonic stimuli. (Cytowic 1992) Auditory-visual synesthetic imagery may be described as the superimposition of geometric figures known as "photisms" over the normal field of vision. While being largely flat and figurative, synesthetic photisms do have a sense of extruded depth.

Following a query by the author regarding dimensional features in synesthetic photisms, a synesthete named Sarah provided the following description of her synesthetic perception: "I have sound>colour synesthesia; to put it simply I see colours, and sometimes patterns, when hearing sound. To me these colours and patterns seem two-dimensional. When I hear words and simple sounds the colours are also simple; flat or graduating colours depending on what I'm hearing. When the sounds have some sort of rhythm, as with music or even poetry, these colours form moving patterns… To me it looks like a sort of filter or overlay. I'm not even sure sometimes if the colours are projected into space at all, or if I'm just seeing them "in my mind's eye" so to speak. I still see these colours with my eyes closed, for instance. In fact I see them more strongly with my eyes closed."

A synesthete named Lisa then offered a description of her perception: "My sound->sight synesthesia is sometimes projected as flat, like Sarah's, but mine can just as easily be 3d with depth. For instance, I was riding in a friend's van the other day, and the side door wasn't all the way shut. It kept making this noise that to me looked like slightly asymmetrical orangey-yellow cylinders coming from the top left of my vision to somewhat near me on my bottom right. (It drove me nuts for almost an hour, 'til I was able to re-close it.)"

There is a notable difference between these descriptions of synesthetic perception and the three dimensional landscapes and journeys common in non-pathological mental imagery. (Cho 2001) For the development of spatially dynamic audio-visual works, other systems must be drawn on to complement any understanding of cross-modal exchange based on synesthetic perception.

2.3.4 A Systematic Approach

The author has devised a systematic approach to the visual representation of aural attributes which may assist others in developing such works. The first stage in this process involves reducing imagery to four main levels.

This reduction involves an illustrative consideration of the neural dispersion of cognitive activity responsible for each of: synesthesia, spatial cognition, associative mental imagery, and causal analysis. The reduction of auditory-visual imagery using this schema permits the further analysis of each of the four main levels using more specialised modal translation systems.




Level One - Sub-Cortical

Level Two

Level Three - Right Frontal

Level Four - Left Frontal


Table 2.1 - The author’s four part reduction of auditory-visual imagery. (It should be noted that the neural locations are very generalised.)

2.3.5 Level One - Synesthesia

The first level of the four part scheme in table 2.1 describes relationships between sound and visual imagery purported to take place at a sub-cortical level. Many neurologists involved in research into synesthesia believe that this is the area of the brain responsible for synesthetic perception. (Cytowic 1989) (Baron-Cohen and Harrison 1996)

Research into synesthesia has suggested constant relationships between certain aural qualities and resultant visual percepts. (Marks 1978) In many of the exchanges between the modalities of audition and vision in synesthetic perception, intensity plays a significant role. (Marks 1978) In the context of aural perception, intensity is a product of both high frequency spectra and loudness. In visual perception intensity is a function of brightness. The relationship of aural intensity to visual intensity experienced by most synesthetes is similar to that of non-synesthetes. (Marks 1978) Some cross-modal relationships are set out below in Table 2.2:

Aural Feature - Visual Counterpart

High Pitch - Small (bright) photism

Low Pitch - Large (dull) photism

High Loudness - Bright photism

Low Loudness - Dull photism

"Coarse" sonic texture - Rough/sawtooth shapes

"Smooth" sonic texture - Smooth flowing shapes

Table 2.2 Mapping aural features to visual figurative features in synesthetic photisms.

While it is not explicitly described in any cross-modal systems, intensity due to loudness can also be associated with the size of an object. This is perhaps related to the way that objects become larger and louder as they come closer to an observer. It may be noted that a direct relationship between loudness and size would at times conflict with the idea that objects with primarily high frequency spectra are small in visible size. There is also a potential for conflict in situations involving loud sounds with a low pitch - they need to be both bright and dull at the same time. The same applies to quiet, high frequency sounds.
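The conflicts described above can be made concrete with a small sketch built on the pitch and loudness rows of Table 2.2 together with the loudness-to-size association. It is an illustration only and not the author's system. The linear mappings, the normalisation of pitch and loudness to the range 0 to 1, the tolerance value and all names are assumptions.

```python
def photism_cues(pitch, loudness):
    """Visual cues suggested separately by pitch and loudness (both 0..1)."""
    return {
        "size_from_pitch": 1.0 - pitch,        # high pitch -> small photism
        "brightness_from_pitch": pitch,        # high pitch -> bright photism
        "brightness_from_loudness": loudness,  # loud -> bright photism
        "size_from_loudness": loudness,        # loud -> large (closer) object
    }


def attribution_conflicts(pitch, loudness, tolerance=0.5):
    """Names of the visual attributes for which the two cues disagree strongly."""
    cues = photism_cues(pitch, loudness)
    conflicts = []
    if abs(cues["brightness_from_pitch"] - cues["brightness_from_loudness"]) > tolerance:
        conflicts.append("brightness")
    if abs(cues["size_from_pitch"] - cues["size_from_loudness"]) > tolerance:
        conflicts.append("size")
    return conflicts
```

Under these assumptions a loud, low pitched sound such as attribution_conflicts(0.1, 0.9) reports a brightness conflict, while a loud, high pitched sound such as attribution_conflicts(0.9, 0.9) reports a size conflict. A designer must then decide which cue takes priority.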

Besides these conflicts of attribution, the relationship between aural texture and visual texture would seem to be intuitive. Sounds with sawtooth-like amplitude envelopes precipitate sawtooth-shaped photisms, and sounds with smooth envelopes produce smooth photisms. Outside the context of synesthetic photisms, this sonic-visual topological equivalence is also described as "physiognomic perception". (Davies 1978)

Colour is considered to be a highly subjective and widely varying visual attribute of sound by both synesthetes and non-synesthetes.

2.3.6 Level Two – Spatial Aspects

In a positional context, the relationship between space and music employed is the uniform co-existence of sonic and visual objects. While there is a large amount of visual bias in situations when discrepancies between visual and auditory spatial locations occur, (Welch and Warren 1980) the careful correlation of cues from each modality may serve to reinforce the perception of intended abstract relationships between auditory and visual objects.

In a design context however, the relationship between space and sound can become more complex. In classical literature, architecture is often described as having a relationship to music in the use of proportion. (Yi 1991) (Bragdon 1939) (Ong 1994) (Treib 1996) Comparisons based on the broader principle of gestalt isomorphism are also useful in formulating relationships between music and architecture. (Wertheimer 1938) (Lyons 2000).

A dynamic variant of isomorphism may however be more useful in temporal works. Such an idea has been expounded by Candace Brower (1998), who describes Density 21.5 by Edgar Varese in terms of pathways, containment and blockage. This idea of containment and release was a central visual design feature in the author’s 2000 animation "Loucid" and then again in Heisenberg.

In some contexts the relationship between human gesture and music (Battey 1998) can be more useful in describing spatial-auditory relationships. This is for three reasons. The first is the powerful existential relationship between kinesthetic, auditory and visual cognition. (Priest 1998) The second is related to the fact that human gesture is at once spatial and temporal whereas architecture is spatial, but frozen in time. The third is the ability of human gesture to be performed within a confined visual field - such as that afforded by a visual display.

2.3.7 Level Three – Associative Imagery

The relationship of mental imagery to sound stimuli is highly subjective in nature. Generally the cognition of such imagery involves exchanges between encoded properties and associative memory as described in Kosslyn’s (1994) protomodel of visual perception. For this reason such imagery is delimited by one’s visual experiences and mediated by the cognition of similarity. (Sloman 1999)

Level three concerns mental imagery created by a low level, primary process style of association. (Dailey 1995) It is imagery that is associated in a fuzzy way with general auditory features during passive listening. Associative imagery as described here is that utilised in the Bonny Method of Guided Imagery. (Goldberg 1995) Level three imagery is usually derived from the overall gestalt of a sonic passage. No effort is made to isolate individual sounds or to determine their sources. The associated question in this stage is: "What does that sound remind me of?"

2.3.8 Level Four - Causal Attribution

This final level describes an analytical approach to attributing a sound source of initially unknown origin to an object / event. As opposed to the previous stage - which is passive, involuntary and not consciously directed - this stage is active, conscious, constructive and analytic. It depends as much on memory as the associative imagery described above, however the way in which memory is accessed is more directed. In this stage each individual sound source is addressed in series, and the question asked: "What physical object could be making that sound?" The overall nature of the scene is then constructed from the composite of each individual object.

2.3.9 Resolving the Four Levels.

It may be apparent at this point that each level of the four part system would suggest different types of imagery. In the author’s work different levels of the system are given priority in each scene. In some scenes - such as the green room and the last cloudfall scene in Heisenberg - associative imagery is the dominant guide to visual design. In other scenes the causal approach is taken. Generally once this decision has been made the detail of a scene is developed using the spatial and figurative levels. Imagery suggested by these two low levels is only implemented where it doesn’t conflict with any imagery pre-determined by the causal or associative levels.

2.4 Problems Related to Effective Design.

2.4.1 Limiting the Scope.

Design aesthetics will only be discussed briefly, in reference to aspects of the author’s own approach to design where it pertains to the sort of works and problems being discussed here.

2.4.2 Designing Sound Spatialisation

When animating sound sources, dynamic and spectral qualities are dramatically modified by the proximity of a sound source to the camera/microphone position. The application of compression, dynamic loudness attenuation, or dynamic spectral filtering may compromise the illusion of moving sound sources. The relative loudness of sounds in a piece and certain rhythmic effects are dependent on sound source proximity.

Doppler pitch shifts create a situation where a sound’s pitch content is bent upward while approaching the camera and bent downward while travelling away. The idea that this might be useful in any melodic sense is defeated by the fact that a sound can only move towards the camera for a certain amount of time before it collides with it. Similarly a sound can only travel away from the camera for a certain amount of time until it is inaudible.
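The size of the pitch bend, and the time for which it can be sustained, can be estimated with the standard formula for a source moving relative to a stationary listener. The sketch below is not drawn from Pan. It assumes a speed of sound of 343 metres per second, and the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, an assumed value for air


def doppler_ratio(radial_velocity, c=SPEED_OF_SOUND):
    """Ratio of heard frequency to emitted frequency for a moving source.

    radial_velocity is the rate of change of the source-to-camera distance:
    negative while approaching, positive while receding.
    """
    if radial_velocity <= -c:
        raise ValueError("source must approach slower than the speed of sound")
    return c / (c + radial_velocity)


def seconds_until_collision(distance, approach_speed):
    """How long an approaching source can sustain its upward pitch bend."""
    return distance / approach_speed
```

A source approaching at 34.3 metres per second is heard about 11 percent sharp (a ratio of roughly 1.11), but if it starts 100 metres away it reaches the camera in under three seconds.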

2.4.3 Timbral Composition

In the development of sonic material for Heisenberg, primary sonic constituents were initially chosen on the basis of desirable timbral properties. In combining sounds the author’s concerns are divided between various approaches. In a more analytic approach, consideration is given to the spectro-morphology of initially selected sounds. (Smalley 1986, 1997)

An advantage of the elemental system of categorisation (Stewart 1987) is that it can be applied equally well to both sonic and visual objects. Throughout the author’s animated works there are literal implementations of this system. Fire, water, clouds and landscapes are all visual representational components in various pieces. Also, elemental systems have the rare attribute of near universality in category systems across all human cultures.

With elemental and spectro-morphological considerations in mind, sounds of differing natures are combined to produce balance, juxtaposition and other structural features where desired. By necessity this must be done with a mind to spatial trajectories and visual composition. In Heisenberg at least, fractal noise and other random procedures are then used to make decisions and generate detail at the microcosmic level.

2.4.4 Visual Concerns

A primary visual concern for the author is the composition of objects within the frame of the screen. In most scenes in Heisenberg for example, attempts are made to ensure that there is a balanced distribution of visible objects within the screen area. At no time for instance is there nothing to be seen on-screen, and in most cases at least one instance of the majority of auditory objects can be seen. Ideally the composition of objects will be balanced at all times.

The choice of hue for a scene is generally developed using the elemental or associative cross-modal systems described above. In Heisenberg, the use of high levels of saturation and the tendency towards chromatic homogeneity within a section are based purely on aesthetic choices.

3 Complex Problems

3.1 Overview

With the concerns and problems native to each of the four separate design criteria established, it is now possible to consider conflicts arising from certain combinatorial permutations of these criteria.

3.2 Occlusion

3.2.1 Occlusion

The most significant complex problem for the author in the creation of works such as Heisenberg is that of visual occlusion. In visual perception, when opaque objects are superimposed, the foreground object always occludes those behind it. The closer an object is to the viewer, the larger its perceived size and its capacity to conceal objects behind it. In a scene in which objects are moving and changing size and shape, occlusion problems arise constantly. Some examples are discussed below.

3.2.2 Occlusion and Spatial Arrangement

Occlusion sets up limitations on the way in which objects can be arranged spatially. The position and size of an object must be taken into account when planning the animation of a scene. When combined with a concern for cross-modal mapping and physiognomic perceptual styles, a tendency arises for scenes to be full of layers of objects at increasing depth. In Schwarzchild, Loucid and Heisenberg large dull objects – which resemble wall planes, ground planes or entire rooms – are used to represent low frequency sounds. As sounds become higher in frequency they tend to become smaller and move into the foreground. Objects within the room must be animated spatially in such a way that they do not pass through the geometry associated with the low frequency sound object.

3.2.3 Occlusion and Audio-Visual Proximity

Occlusion problems affect the proximity of sonic objects to the camera during spatial animation. In the visual domain, if an object is too close to the camera it can not only conceal the rest of the scene, it can also intersect or overlap the position of the camera. In 3D animation, a geometry-camera intersection will create unwanted artifacts. In the auditory domain proximity also creates loudness levels which may distort audio signals and even damage audio amplification equipment.

To avoid such effects of proximity it is necessary to hand animate objects carefully so that no camera collisions occur. In systems that are animated using algorithmic techniques, this can be achieved by converting Cartesian position coordinates to polar coordinates and setting a minimum distance limit between objects and the camera location. To avoid sudden collision type motion when the minimum distance is reached, a Gaussian envelope filter can be applied to distance data to smooth out sudden changes in motion.
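The two steps described above can be sketched as follows. This is an illustrative reconstruction and not the code used for Heisenberg. It works on the horizontal plane only, and the function names, the edge handling and the default sigma are assumptions.

```python
import math


def clamp_to_min_distance(x, y, cam_x, cam_y, min_distance):
    """Push a point out to min_distance from the camera, keeping its direction."""
    dx, dy = x - cam_x, y - cam_y
    radius = max(math.hypot(dx, dy), min_distance)  # polar radius, clamped
    angle = math.atan2(dy, dx)                      # polar angle, unchanged
    return cam_x + radius * math.cos(angle), cam_y + radius * math.sin(angle)


def gaussian_smooth(values, sigma=2.0):
    """Smooth a per-frame series with a normalised Gaussian window.

    Edge frames are repeated so the output has the same length as the input.
    """
    reach = max(1, int(3 * sigma))
    offsets = range(-reach, reach + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    last = len(values) - 1
    smoothed = []
    for i in range(len(values)):
        acc = 0.0
        for k, w in zip(offsets, weights):
            acc += w * values[min(max(i + k, 0), last)]
        smoothed.append(acc / total)
    return smoothed


# Distances per frame for an object that would otherwise hit the camera.
raw = [5.0, 3.0, 1.0, 0.2, 1.0, 3.0, 5.0]
limited = [max(d, 1.0) for d in raw]
eased = gaussian_smooth(limited, sigma=1.0)
```

Because each smoothed value is a weighted average of already clamped distances, smoothing after clamping cannot bring an object closer than the minimum distance.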

3.3 Other Criteria Conflicts

3.3.1 Aural Immersion - Field of View Conflicts

One conflict between the four design criteria in Table 1.1 results from the aforementioned desire to frame, within a screen of limited size, an object associated with every sound audible within that scene. This of course creates a conflict of interest when one wishes to immerse the audience in sound and yet have the sound sources visible at the same time.

In Heisenberg, this is solved by arranging multiple instances of certain objects around the camera. In this way the audience is immersed in sound, and at least one instance of each sound object is visible whichever way the camera points. The audience should even be able to infer the appearance of invisible instances from one visible instance of that object type. In Heisenberg objects which exist in only one instance tend to be spatialised so that for at least fifty percent of the time they are clearly visible in the field of vision. Instances are usually different waveforms of a similar type, or a de-correlated and delayed version of the original instance. (Kendall 1995)
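The number of instances needed can be estimated with a simple geometric sketch. The paper does not state how instances are arranged, so the assumption here is that they are spaced evenly on a circle around the camera and treated as points, ignoring occlusion and vertical position. All names are hypothetical.

```python
import math


def min_instances(fov_deg):
    """Fewest evenly spaced instances that keep one inside the field of view."""
    return math.ceil(360.0 / fov_deg)


def visible_instances(count, fov_deg, camera_yaw_deg):
    """Indices of the evenly spaced instances that fall within the field of view."""
    half = fov_deg / 2.0
    visible = []
    for i in range(count):
        azimuth = i * 360.0 / count
        relative = (azimuth - camera_yaw_deg + 180.0) % 360.0 - 180.0
        if abs(relative) <= half:
            visible.append(i)
    return visible
```

With a sixty degree field of view, for example, min_instances(60) gives six instances, and at least one of them remains in frame for any camera rotation.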

3.4 Feedback Loops in Audio-Visual Design

3.4.1 Overview

Problems like occlusion, which are due to simple conflicts of interest, can usually be solved with a little compromise. Difficulties arise when the manipulation of an attribute in either the visual or aural domain adversely affects related attributes in the other domain. Problems involving multiple conflicts of interest, compounded by a potential for feedback between spatial, auditory and visual attributes of a scene, can be a little trickier to conceptualise solutions for. Their solution usually involves tweaking spatial audio-visual relationships until the problematic system settles down and becomes acceptable in both audio and visual design domains.

3.4.2 Integrated Spatial Audio Visual Composition

When spatial simultaneity and cross-modal exchanges are a creative concern, feedback between sonic, visual and spatial elements has the potential to enter infinite feedback loops. When adding a new sound to a sonic passage, the new sound may suggest a certain object, and while the new sound may blend nicely in amongst pre-existing audio material the new object may not. Finding a spatial location for that object where occlusion isn’t a problem will then become an issue. If the object is moving, altering its position will affect its Doppler shifts and other dynamic audio information. This in turn may affect the way that sound would look in the first place. Changing the object’s look may fix this, so long as animation and other data don’t need to be changed with the new visual features. Still, with the object added and altered thus, the overall sonic dynamic of the entire passage may have altered. Other objects in the scene may need to be altered and added to represent these changes faithfully. The entire visual stratum of the scene may in fact need to be re-designed. Besides any re-animation this may require, this may also in turn affect the perception of the new object relative to the new nature of the scene. The altered object type may not now be appropriate.

This kind of thing can go on indefinitely. Mentally pre-visualising a solution that permits the system to work on all levels saves a lot of time otherwise spent on trial and error. In the author’s experience, pre-visualised scenes are also generally the most satisfactory visually.

3.4.3 Low Frequency and Reverberation

As described in section 3.2.2, low frequency sounds are often mapped onto ground planes, wall planes or entire rooms. The animation of low frequency objects therefore has the potential to feed back into the reverberation in the scene. In turn, reverberation has the capacity to feed back into general visual qualities within the scene, especially if Doppler shifting of wall reflections is enabled.

4 Conclusion – Problems in Music Education

The initial problem for many attempting to develop interdisciplinary works such as those described here might be the lack of a conceptual starting point. This may be due in part to the traditional absence of discussion about visual artifacts involved in music cognition in music scholarship and education. While the visual arts have a considerable tradition of depicting auditory phenomena visually (Kandinsky 1912), visual-auditory relationships are not an integral part of traditional music scholarship.

This may be related to the fact that mental imagery is experienced less often by trained musicians. (Huron 1999) This in turn has been attributed to a shift in music cognition towards linguistic areas of the brain as linguistic abstractions of music are assimilated during traditional music education. (Crowder and Pitt 1992) Bio-musicologists have suggested that the emphasis in western music on reading notated music from left to right, as one would read a written language, has had a major influence on the cognition of music and the evolution of the art as a whole. (Wallin 1991) This is perhaps reflected in the popularity of linguistic and grammatical models of music, such as the generative theory of tonal music (GTTM) of Lerdahl and Jackendoff. (1985) As Sir Thomas Beecham once quipped, "A musicologist is a man who can read music but can't hear it."

While the system of Common Music Notation (CMN) underlying western music scholarship is still useful for musicians and composers working with traditional western music and instruments, it is not always applicable to the music of other cultures, or to many new types of electronic music.

For some time musicologists have considered the possible shortcomings of describing music purely in terms derived from CMN. (Nattiez 1978) (Padham 1996) CMN and many models of music based on it are now increasingly regarded as outmoded and irrelevant by modern sonic artists working with new technology and a broad timbral palette. (Wishart 1985) More recently it has been shown that cognitive musicology relying on CMN fails to meet basic adequacy criteria. (Leman 1999)

Many have argued that musical experience has ineffable qualities, that is, qualia that cannot be expressed in any other way. (Raffman 1993) There is in fact no a priori reason why the analysis and discussion of music should be limited to linguistic models based on CMN, when it has been shown repeatedly that a significant degree of musical experience does not reduce well to CMN or related linguistic models.

Of the new music models developed in the 20th century, many make extensive use of visual analogies that do not involve CMN. (Mattis 1992) (Palombini 1993) (Smalley 1986, 1997) A visual approach to sonic art based on mental imagery is much more compatible with dominant dual coding theories of cognition, in which linguistic modes of cognition are complemented by mental imagery modes, and vice versa. (Paivio 1978) Music analysis based on such descriptive techniques might be described as Musicography. (Palombini 1993) (Lyons 1999)

The development of Musicography as a field within music research and education would do much to stimulate creativity in music culture. (Dailey 1994) While there can be no doubt that linguistic approaches to music discussion and education are valuable, there is increasingly no reason why they should not be complemented by analysis of the integral visual aspects of musical experience. Such a change to music scholarship would certainly do much to propagate musical concerns in future interdisciplinary artworks. The absence of such teaching will, however, continue to hold back the development of musicians seeking to work with new media and interdisciplinary art.

5 References

Baron-Cohen, S., and J. Harrison. eds. 1996. Synesthesia: Classic and Contemporary Readings. Oxford: Blackwells.

Battey, Bret. 1998. An Investigation into the Relationship between Language, Gesture and Music.

Begault, Durand R. 1994. 3D Sound for Virtual Reality and Multimedia. Cambridge, MA: Academic Press.

Brower, Candace. 1997. "Pathway, Blockage, and Containment in 'Density 21.5'." Theory and Practice 22. New York: Columbia University Press.

Davies, John Booth. 1978. The Psychology of Music. Stanford University Press.

Bragdon, Claude. 1978, (c1939). The Beautiful Necessity - Architecture as frozen Music. Wheaton, Ill.: Theosophical Pub. House.

Bregman, Albert S. 1990. Auditory Scene Analysis: The Perceptual Organisation of Sound. Cambridge, Mass.: Bradford Books, MIT Press.

Campbell, Joseph. 1957. Man and Time: Papers from the Eranos Yearbooks. New York: Princeton University Press.

Cho, Young-soo. 2001. "The influence of music on individual mental imagery." Korean Journal of Music Therapy. Vol. 3, No. 1, pp. 31-49.

Cho, Young-soo. 2002. A Bibliography of UMI Dissertations Dealing with Mental Imagery. Referenced 3-5-2002.

Chowning, J. 1971. "The simulation of moving sound sources." Journal of the Audio Engineering Society. Vol. 19, No. 1.

Crowder, R.G.; Pitt, M.A. 1992. "Research on Memory/Imagery for Musical Timbre." In Reisberg, D. (Ed.) Auditory Imagery. Hillsdale, New Jersey: Lawrence Erlbaum.

Dodge, Charles; Jerse, Thomas A. 1997. Computer Music: Synthesis, Composition, and Performance. 2nd ed. New York: Schirmer Books; London: Prentice Hall International.

Cytowic, Richard E. 1989. Synesthesia: A Union of the Senses. New York: Springer Verlag.

Dailey, Audrey R. 1995. Creativity, Primary Process Thinking, Synesthesia, and Physiognomic Perception. Unpublished doctoral dissertation. University of Maine.

Goldberg, Frances Smith. 1995. "The Bonny Method of Guided Imagery." In Wigram, T.; Saperston, B.; West, R. (Eds.) The Art and Science of Music Therapy. Harwood Academic Publishers.

Harley, Maria Anna. 1994. Space and Spatialization in Contemporary Music: History, ideas and Implementation. Montreal: McGill University. Unpublished doctoral dissertation.

Huron, D. 1999. "Music838 Exam Questions and Answers." Ohio State University.

Kosslyn, S.M. 1994. Image and Brain: The resolution of the imagery debate. Cambridge, MA: MIT Press.

Leman, Marc. 1999. "Adequacy Criteria for Models of Musical Cognition." In J.N. Tabor (Ed.) Navigating New Musical Horizons. Westport CT: Greenwood Publishing Company.

Lewis, J.W.; Beauchamp, M.S.; DeYoe, E.A. 2000. "A comparison of visual and auditory motion processing in human cerebral cortex." Cerebral Cortex. 10(9), pp. 873-88.

Lyons, Andrew D. 1999. A Course in Applied Musicography. Unpublished report.

Lyons, Andrew D. 2000. Gestalt Approaches to the Gesamtkunstwerk. Unpublished paper.

Lyons, Andrew D. 2001. "Synaesthesia: A Cognitive Model of Cross Modal Association." Consciousness, Literature and the Arts. Spring 2001.

Marks, L. E. 1978. The Unity of the Senses: Interrelations among the Modalities. New York: Academic Press.

Marin, Servio Tulio. 1994. The concept of the visonual. Aural and visual associations in twentieth century music theatre. Unpublished Doctoral Dissertation. University of California San Diego.

Mattis, Olivia. 1992. Edgard Varese and The Visual Arts. Unpublished Doctoral Dissertation. Stanford University.

Moore, F. Richard. 1990. Elements of Computer Music. Englewood Cliffs, N.J.: Prentice Hall.

Nattiez, Jean-Jacques. 1990. Music and Discourse: Toward a Semiology of Music. Translated from French by Carolyn Abbate. Princeton University Press.

Ong, Tze-Boon. 1994. Music as a Generative Process in Architectural Form and Space Composition. Unpublished Doctoral Dissertation. Rice University: Houston, Texas.

Priest, Stephen. 1998. Merleau-Ponty. London: Routledge.

Reisberg, D. (Ed.) 1992. Auditory Imagery. Hillsdale, New Jersey: Lawrence Erlbaum.

Richardson, J.T.E. 1999. Imagery. East Sussex, UK: Psychology Press Ltd.

Paivio, A. 1978. "The relationship between verbal and perceptual codes." In E.C. Carterette and M.P. Friedman (Eds.) Handbook of Perception: Vol. VIII. New York: Academic Press.

Palombini, Carlos. 1993. Pierre Schaeffer’s Typo Morphology of Sonic Objects. University of Durham. Unpublished Doctoral Dissertation.

Quittner, A.L. 1980. The Facilitative Effects of Music on Mental Imagery: A Multiple Measures Approach. Unpublished master's thesis. Florida State University, Tallahassee, Florida.

Roads, Curtis. et al. (eds.) 1997. Musical signal processing. Exton, PA: Swets & Zeitlinger.

Seashore, Carl E. 1967. Psychology of Music. New York: Dover Publications.

Shepard, Roger N.; and Cooper, Lynn A. 1982. Mental Images and Their Transformations. Cambridge, MA: MIT Press.

Sloman, Steven A. and Rips, Lance J. (eds.) 1999. Similarity and Symbols in Human Thinking. Cambridge, Mass.: MIT Press.

Smalley, Denis. 1986. "Spectro-morphology and Structuring Processes." In S. Emmerson, ed., The Language of Electroacoustic Music. London: Macmillan. pp. 61-93.

Smalley, Denis. 1997. "Spectromorphology: Explaining Sound-Shapes." Organised Sound, 2(2). Cambridge University Press.

Stein, Barry E.; and Meredith, M. Alex. 1993. The Merging of the Senses. Cambridge, Mass.: MIT Press.

Stewart, R.J. 1987. Music and the Elemental Psyche. Wellingborough: Aquarian Press.

Summer, Lisa. 1985. "Imagery and Music." Journal of Mental Imagery. 9(4). New York: Brandon House.

Treib, Marc. 1996. Space Calculated in Seconds. Princeton, N.J.: Princeton University Press.

Tuchman, Maurice; Freeman, Judi and Blotkamp, Carel. (eds.) 1986. The Spiritual in Art: Abstract Painting 1890-1985. New York: Abbeville Press.

Wertheimer, Max. 1938. "Laws of Organization in Perceptual Forms." In Ellis, W. A Source Book of Gestalt Psychology. London: Routledge & Kegan Paul.

Wallin, Nils Lennart. 1991. Biomusicology: Neurophysiological, Neuropsychological and Evolutionary Perspectives on the Origins and Purposes of Music. Stuyvesant, NY: Pendragon Press.

Wishart, Trevor. 1985. On Sonic Art. York: Imagineering Press.

Yi, Dae-Am. 1991. Musical Analogy in Gothic and Renaissance Architecture. Unpublished Doctoral Dissertation. University of Sydney: Sydney, Australia.