On Combining Image And Sound In Film

Many screenwriters and filmmakers find it very difficult to come up with ways of using sound more cinematically and creatively. Whenever they try, all they get is silence in their heads. It’s almost as if the muses themselves panicked and stampeded out of their minds the moment they heard the word “sound”.

This is not because sound doesn’t have much cinematic potential – it does. The problem lies in the way film sound is thought about. So that’ll be the subject of today’s post: the wrong way of thinking about sound, and the alternative that can break the impasse.

Movies are so compelling because they’re made of the same stuff as reality – electromagnetic waves the eyes can see and mechanical waves the ears can hear. It is a mistake, though, to think that faithfully imitating perceptual reality will more effectively draw audiences into the story world. It is in fact the main cause of the creative impasse, like trying to complete a shape in a sliding tile puzzle that has no empty space to move the tiles around.

First, combining image and sound according to ready-made perceptual configurations amounts to mere mechanical reproduction, a big no-no in any form of representation.

Second, real does not necessarily mean authentic.

But most importantly, mechanically adding sound to images for the sake of veridicality can have the undesired effect of creating an attention deficit in the audience.

Attention is what makes a film possible. If a movie fails to engage the audience’s attention, then it is not a journey through a fictional world but a mere pile of celluloid or pixels.

The ability to get the sustained and undivided attention of an audience is what sets a good filmmaker apart from the rest. Filmmaking is in fact all about designing the cognitive processes required to create the illusion of reality in the audience’s mind. It’s the art of guiding attention through the skilful use of cinematic techniques such as camera movement, composition, editing, and sound.

But anyone who’s had a go knows that this is not easy. Why? Because attention is metabolically expensive. It requires a lot of energy, of which the brain has only so much at its disposal. If it’s to make it through the day, it needs to find ways of making ends meet. In the case of perception, which is the outcome of attention, the brain goes about saving energy by storing and organising the sensory information it finds in the environment by means of two systems known as event files and schemas.

Event files are where all the relevant sensory information about people, places, events and objects is stored, so that the next time the brain encounters them, it can retrieve the necessary information – characteristic visual features, sounds, and so on – quickly and efficiently.

Schemas are mental structures for organising more complex types of information: general knowledge about sequences of events, rules, norms, procedures, and social situations that have been acquired through experience. Film is a good example of a schema. It contains all the norms and conventions required to make sense of a movie, all of which we have acquired through experience, i.e., repeated exposure.

This system makes sense: why waste energy reinventing the wheel each time? By using event files and schemas, the brain can quickly and efficiently form a percept whenever it detects a familiar cue.

The brain also has a surveillance system which constantly monitors the environment by scanning every one of the 11 million bits of data that the sensory organs send each second. The brain constantly compares all this data against its existing event files and schemas. If everything is as expected, then it can carry on running in economy mode. Only if anything deviates from the expectations or if anything suddenly changes will the brain ‘wake up’ and deploy its attentional resources to find out whether these deviations or changes represent a threat, an opportunity, or nothing worth investing energy in.
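
If it helps to see that logic laid bare, here’s a deliberately crude sketch of this ‘economy mode’ in Python. It’s an analogy, not neuroscience: event files behave like a cache keyed by a familiar cue, and attention is deployed only when the incoming data deviates from what’s stored.

```python
# Toy model of 'economy mode': familiar cues are served from stored
# event files; attention is deployed only when the input deviates.

event_files = {
    "dog": {"visual": "four legs, fur, wagging tail", "auditory": "bark"},
    "door": {"visual": "rectangular panel, hinges", "auditory": "creak"},
}

def perceive(cue, incoming):
    stored = event_files.get(cue)
    if stored == incoming:
        return f"economy mode: stored percept for '{cue}' reused"
    # Deviation detected: 'wake up', spend attentional resources,
    # and update the event file with what was learnt.
    event_files[cue] = incoming
    return f"attention deployed: '{cue}' did not match expectations"

print(perceive("dog", {"visual": "four legs, fur, wagging tail", "auditory": "bark"}))
print(perceive("dog", {"visual": "four legs, fur, wagging tail", "auditory": "meow"}))
```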

And this is why it is risky to think that combining image and sound realistically will draw audiences deeper into the story world. Ready-made perceptual configurations can have the opposite effect of telling the audience’s brain that there’s no need to expend attentional resources, as everything is as it should be.

This instinct towards naturalness is understandable. From a survival point of view, one major advantage of being able to acquire information about the same event through multiple sensory channels is that it allows the brain to verify the truthfulness of our perceptions. After all, despite their complexity, our senses are still prone to misperceptions and illusions which could potentially be deadly. Therefore, having senses that carry information along separate pathways allows the brain to cross-check our perceptions and confirm that they’re accurate. So it’s little surprise that many feel realistic combinations of image and sound in film are the best approach, lest the brain detect the perceptual fallacy that a film is.

But the truth is that we don’t need to worry about veridicality when it comes to film. First, for reasons we’ll see later, the brain is hardwired to automatically accept images and sounds that happen simultaneously in time and space as belonging to the same object, person, or event. In normal circumstances it may immediately carry out a reappraisal and either confirm that it was right or realise that it was wrong. But in the case of film, because it knows it is a schema where everything happens for a reason, it would never discard the image–sound connection as wrong.

Also, film is a form of pretend play. Pretend play allows us to modify representations of reality in our heads. It is good in that it opens up a world of new possibilities for exploring different options. But for it to be effective, the brain needs a way of making sure that real and imagined don’t get mixed up – it would be disastrous if we took a fire or a lion to be imagined. The brain gets around this problem by creating a copy of the original percept and then activating a decoupling mechanism that dissociates the copy from reality. That way we can modify the copy as much as we like without jeopardising the truth value of the real percept. So, with our safety secured, the brain is happy to suspend disbelief and go along with whatever comes its way, no matter how out of this world and improbable it may be.

Veridicality, therefore, is not something a filmmaker should be worrying about when it comes to combining image and sound in film.

Does all this mean we should avoid perceptual realism? Not at all – so long as it serves a specific cinematic purpose. It should be a deliberate choice aimed at having some kind of effect on how the audience will perceive the scene. It’s a bit like deciding whether to use a standard medium shot that feels natural and safe instead of a dramatic low-angle, or mid-key lighting as opposed to low-key.

Nor does it mean that we can combine image and sound willy-nilly. The combination of the two must meet one fundamental requirement: it must be done in a way that the brain can make sense of. And what is that way?

Since filmmaking is all about hacking the perceptual and cognitive processes of the brain, it’s simply a matter of finding the right process to hack. This shouldn’t be too difficult, since the brain too has to combine images and sounds that have been captured separately into a coherent and meaningful whole.

The brain receives information about the environment through five different sensory organs. Each captures a different spectrum of physical reality. The eyes pick up electromagnetic waves, the ears mechanical waves, the nose chemical substances in the air, and so on. Also, each sense is processed for the most part in a different region of the brain.

To integrate these multiple sources of information into a unified and meaningful percept, the brain has to solve what is commonly known as the binding problem: it must determine which features belong to the same object or event. And it has to do so at two different levels – physical and semantic.

At the physical level, the brain has to determine what multisensory stimuli belong together. It does so by searching for patterns of neural activity across the different cerebral regions.

The senses send signals to the brain which cause the cerebral parts responsible for processing each sense to become active with neural activity. There are other aspects of reality such as time and space that also have their own dedicated regions and that also get activated by sensory stimuli. So what happens every time we perceive a subject is that the regions in the brain responsible for processing vision, sound, time, and space, all fire up at the same time, and this is how the brain determines what stimuli belong together – by detecting overlapping patterns of neural activity across its different regions.
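
A rough way to picture physical binding is as a coincidence detector: events from separate channels that fall within a narrow temporal window get fused. The sketch below is a toy model, and the 100-millisecond window is an illustrative figure, not a measured constant.

```python
# Toy model of physical binding: fuse audio and visual events that
# overlap within a narrow temporal window.

BINDING_WINDOW = 0.1  # seconds; illustrative, not a measured constant

visual_events = [("lips move", 12.02), ("door slams", 15.40)]
audio_events = [("speech", 12.05), ("bang", 15.43), ("dog bark", 20.00)]

def bind(visuals, audios, window=BINDING_WINDOW):
    fused = []
    for v_label, v_time in visuals:
        for a_label, a_time in audios:
            if abs(v_time - a_time) <= window:
                fused.append((v_label, a_label))
    return fused

print(bind(visual_events, audio_events))
# [('lips move', 'speech'), ('door slams', 'bang')] - the bark,
# with no visual counterpart, is left unbound.
```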

At the semantic level, the brain solves the problem of integration also by searching for overlapping patterns of information. The only difference is that the information is of a semantic nature instead of physical.

Here the problem is not to determine what belongs together but how it belongs together. If the brain grouped multisensory stimuli physically but not semantically, we would experience the world incoherently. We would be able to recognise the different sounds and images but they would not be grouped meaningfully. On a busy road, while talking to someone, we might perceive the noise of car engines as coming out of their mouth and their words as coming out of car engines. Everything would be random and incoherent and we would not be able to use that information to guide our actions and decisions effectively.

You can get a sense of the nature of the binding problem at the semantic level by trying to answer the following question about an assortment of triangles, squares, and circles of various sizes and colours: which three figures are alike?

Any luck? Are you maybe thinking, “It depends…?”

What makes it impossible to answer this question is that there are three dimensions of information – shape, size, and colour – spread over three different layers – triangles, squares, and circles – but no way to link them. In order to answer the question, you’d need to be given the specific dimension you must use as a reference point or unifying value for making meaningful associations across the different layers. If the question were “Which three figures are alike in the dimension of size?”, then your brain would be able to look for patterns of the “size” cue across the three layers, and thus connect the layers meaningfully. If it were colour, then it would look for overlapping patterns of this cue instead.
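
Here’s the same puzzle rendered as a few lines of Python. The six figures are made-up stand-ins, but the logic is the point: until a unifying dimension is supplied, ‘alike’ is undefined; once one is chosen, the layers connect immediately.

```python
# Stand-in for the illustration: six figures, three dimensions of
# information spread over three shape 'layers'.

figures = [
    {"shape": "triangle", "size": "small", "colour": "red"},
    {"shape": "triangle", "size": "large", "colour": "blue"},
    {"shape": "square",   "size": "small", "colour": "green"},
    {"shape": "square",   "size": "large", "colour": "red"},
    {"shape": "circle",   "size": "small", "colour": "blue"},
    {"shape": "circle",   "size": "large", "colour": "red"},
]

def alike(figs, dimension):
    """Group figures by the chosen unifying dimension."""
    groups = {}
    for fig in figs:
        groups.setdefault(fig[dimension], []).append(fig)
    return groups

# Only once a dimension is chosen can the layers be linked:
for value, group in alike(figures, "colour").items():
    print(value, "->", [f["shape"] for f in group])
# red -> ['triangle', 'square', 'circle']
# blue -> ['triangle', 'circle']
# green -> ['square']
```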

This is how it would work in a simple real-life audiovisual situation. Let’s say the brain wants to determine what gender a person is. In such a case, the physical elements are the face in the visual channel and speech in the auditory channel. The semantic element is “gender”. “Gender” then will be the linking value or dimension the brain will use to integrate audiovisual information meaningfully.

The brain will then start looking for overlapping patterns across the two channels. At the visual level it will find things like skin texture and bone structure. At the auditory level, it will find things like pitch and sound power. It will then fuse them and the result will be a coherent percept that communicates something meaningful, i.e., the gender of the person.

We could change this parameter to “truthfulness” and the final percept would be different. The brain would be searching for different types of cues in each sensory channel that normally indicate whether a person is telling the truth or not – sweat and stress in the voice for example – and the final percept would have a different meaning.

For the record, in perceptual terms, dimensions are all the different things going on in the environment on which we could potentially focus our attention and the layers are the senses.

So how does all this apply to film?

You may recall what I said earlier about the brain being hardwired to automatically accept images and sounds that happen simultaneously in time and space as being causally connected. This is because of physical binding. It is so common that the auditory, visual, temporal, and spatial processors fire up simultaneously in the brain that evolution has concocted a rule that goes something like, “If auditory and visual stimuli happen simultaneously in time and space, then automatically synchronise them”. That’s just what we get at the movie theatre.

As for semantic binding, this is the principle we need to follow to combine image and sound in a way that the brain can make sense of, whether the combination is faithful to our everyday perceptual reality or not. It is the process a filmmaker can exploit to construct auditory and visual elements in ways that serve the dramatic, narrative, and cinematic needs of the story, and not just the brain’s demand for veridical perception.

The process is the same: to select the auditory and visual elements that will overlap to form an audiovisual pattern, and to manipulate them so that they are congruent with each other by way of sharing a common unifying dimension or value.

So first you need to select a unifying value to integrate image and sound meaningfully. Then you need to include features in both the image and sound channels that a) are semantically related to that unifying value and b) have a counterpart in the other channel that the brain can associate. Or put more simply, you need features in both image and sound that are related to a unifying value and that combined form an audiovisual pattern that the brain can detect and make sense of within the context of the story.

One phenomenon that clearly demonstrates how semantic binding works in film is the ability to successfully use different types of music with the same set of images. Say we have as the visual setting a couple by the beach at sunset. If we add a romantic melody, it will work perfectly well, but so will a suspenseful tune. Why?

In the first case, the visuals contain elements that, at least in our culture, are perceived as romantic: a secluded natural setting and dim light that incline couples to more freely express their feelings. The music, too, contains elements that are perceived as romantic: simple chord progressions, a predictable linear melody, use of a major key, and so on. In such a case, the brain has no problem finding overlapping patterns across image and sound that it can integrate meaningfully, i.e., that the couple are about to consummate their love for each other.

But the visuals also contain elements that we tend to associate with danger. In the darkness our sexual inhibitions may decrease, but our darker side may also feel freer to come out. In the darkness we also feel more vulnerable. Therefore, the elements that characterise suspenseful music – dissonant chords, eerie intervals, non-linear sounds, a minor key and so on – will also align well with the visual elements of danger.

In short, a lonely beach at sunset is as good a setting for romance as it is for murder. Therefore, whether we use a romantic melody or a suspenseful tune, the brain will find corresponding patterns in both image and sound and will align them in our minds either way. Each alignment, though, will produce a very different meaning.
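
To make those mechanics explicit, here’s the beach example as a small Python sketch. The feature tags are illustrative shorthand for the perceptual associations described above, not an established vocabulary:

```python
# The beach scene as data: the same visuals support two unifying
# values; the chosen music decides which pattern the brain completes.

beach_visuals = {
    "secluded natural setting": {"romance"},
    "dim light": {"romance", "danger"},  # lowers inhibitions, but also signals vulnerability
}
romantic_music = {
    "simple chord progressions": {"romance"},
    "predictable linear melody": {"romance"},
    "major key": {"romance"},
}
suspense_music = {
    "dissonant chords": {"danger"},
    "eerie intervals": {"danger"},
    "minor key": {"danger"},
}

def pattern(unifying_value, visual, auditory):
    """Features in each channel that share the unifying value."""
    return {
        "visual": [f for f, values in visual.items() if unifying_value in values],
        "auditory": [f for f, values in auditory.items() if unifying_value in values],
    }

print(pattern("romance", beach_visuals, romantic_music))
print(pattern("danger", beach_visuals, suspense_music))
```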

As we saw earlier, in everyday perception our needs and intentions dictate the unifying value we choose to focus on, which in turn dictates what aspects of the environment make it into our perceptual field.

And as we’ve just seen, in film, the cinematic needs of the story (romance, suspense, and so on) dictate the unifying value, and the filmmaker determines what auditory and visual elements will be selected to align with this value, so that both image and sound work harmoniously with each other to create the desired effect and meaning.

But there’s more to it. This unifying value will not just determine what auditory and visual elements go in but also how they will be manipulated to make the overlapping pattern work. In the beach example, we would use different camera angles, framing, and editing for the romantic option than we would use for the suspenseful one.

In summary, in film, the unifying value is determined by the cinematic needs of the story and the scene in question. The unifying value in turn determines what auditory and visual elements will be required and how they need to be manipulated so as to create correspondences or patterns between image and sound, so that the brain can make the right associations.

Two scenes from two different films that fabulously illustrate this principle are the chopper scene in Predator (1987) and the chopper scene in Apocalypse Now (1979). They are ideal because both films share the same subject matter, war, and both scenes share the same setting and many elements: a chopper, soldiers, and a stereo playing music. But each scene has a different purpose, and each therefore requires a very different set of choices regarding the selection and manipulation of auditory and visual elements to create an effective audiovisual pattern.

Predator. This is one of those films where the audience must be made to care for the team as a whole and not just for the main character. It is in fact the group’s bantering, their comradeship, and how well they work as a team that makes the film so enjoyable to watch.

The chopper scene plays a key role in that respect. Its aim is to establish and build that sense of comradeship among the team and their knack of teasing each other. Comradeship therefore is the unifying value that drives the interaction between image and sound. And this is how the auditory and visual elements were selected, manipulated, and combined to serve that purpose:

On the visual side of things, the inside of the helicopter is lit with a dim red light to create a sense of warmth and intimacy, perfect for creating an atmosphere conducive to bonding. Framing consists of medium close-ups, and editing mostly of action and reaction shots that not only create a stronger connection with the characters but also show the nuances of their interactions and the sense of camaraderie emerging between them.

Sound-wise, the dialogue consists mostly of their bantering. “Long Tall Sally”, a 1956 Rock and Roll song, is playing in the background with a small portable stereo player as its source. This is an interesting choice of music, since Rock and Roll has traditionally been used for male bonding purposes. And that’s just what the song is doing here: helping them bond and putting them in the right frame of mind for bantering and developing that sense of care for each other that is so essential to the plot, and to the audience getting to like them and wanting to spend time with them. As for the manipulation of the music, because it is there for the benefit of the characters, it has to be diegetic, so the frequency range of the song had to be adjusted to make it sound like it is coming from a small stereo, and also so as not to interfere with the dialogue.
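
For the technically curious, here’s a minimal sketch of that last step – the kind of frequency-range reduction sometimes called ‘futzing’ in post-production – using scipy. The cutoff frequencies and file names are illustrative assumptions, not values from the film’s actual mix:

```python
# Minimal sketch: band-limit full-range music so it reads as coming
# from a small portable speaker. Cutoffs are illustrative.

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfilt

def futz(in_path, out_path, low_hz=300.0, high_hz=3500.0):
    rate, audio = wavfile.read(in_path)
    audio = audio.astype(np.float64)
    # 4th-order Butterworth band-pass: small speakers reproduce
    # little energy below a few hundred Hz or above a few kHz.
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=rate, output="sos")
    filtered = sosfilt(sos, audio, axis=0)
    wavfile.write(out_path, rate, filtered.astype(np.int16))

# Hypothetical file names, for illustration only:
futz("long_tall_sally.wav", "long_tall_sally_futzed.wav")
```

As we’ll see in a moment, the Apocalypse Now scene makes the opposite choice, leaving the music essentially full-range so that it reads as large rather than small.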

The correspondences between image and sound that form the audiovisual pattern, then, are the dim red light, which works well with the Rock and Roll to create an atmosphere conducive to bonding; the reduction of the frequency range of the music, which gives a natural sense of space and allows the dialogue to be clear; and the clarity of the dialogue, which in turn plays well with the medium close-ups, which serve the purpose of conveying the sense of emerging camaraderie.

Apocalypse Now. This film is about the moral ambiguity of war, particularly of the Vietnam War. It reveals this theme through the actions of US Army soldiers whose moral values rapidly disintegrate as a result of their participation in a futile, morally unjustified, war. Most notorious is their use of Western cultural artefacts (Wagner, T.S. Eliot…) as ‘weapons’ intended to represent a greater “civilised” power that can easily subjugate the indigenous peoples of Vietnam.

The “Ride of the Valkyries” scene captures these thematic elements very skilfully. Its aim is to display the scale and might of the US Army and the way they exploit Western cultural artefacts to tyrannise the invaded. The unifying value that will drive the interaction between image and sound, then, is scale (of superiority).

Visually, the scene consists of a large number of spectacular extreme wide shots of the fleet getting ready to attack and then charging at the inhabitants of the village.

Aurally, we have Wagner’s “Ride of the Valkyries” (from his opera Die Walküre) being played at full blast by a soldier from a stereo inside the helicopter because “It scares the hell out of the slopes”, and the sound of large explosions.

If in Predator images and sounds were warm and intimate, here they are large, distant, and even intimidating. Most interesting is the manipulation of the music. In both scenes the source of the music is a stereo player. But in Apocalypse Now the music had to be manipulated very differently to serve the unifying value of scale and superiority. Even though the music is technically diegetic, it had to be made to work ambi-diegetically, since reducing the frequency range the way Predator does would defeat the purpose of the scene: to display scale and might. It wouldn’t match the extreme wide shots, and it would not sound large, threatening, and imposing. It would also be asking too much of the audience to believe that the “slopes” would be able to hear a thin-sounding tune, let alone be intimidated by it.

The final point I’d like to make is, why bother? After all, most films seem to be doing just fine with the naturalistic approach.

The reason is simple: the audience. When people invest money, energy and two hours of their time, they want to get a return for it. And what would that return be? Pleasure.

As I mentioned earlier, film is a form of cognitive pretend play. Pretend play is a behaviour that has survival value. Anything that is good for our survival comes with a ‘thank you’ gift from our genes – a shot of dopamine and other feel-good chemicals. And this is ultimately what the audience are after and pay for.

Films are like a gym for the mind. They allow us to hone one of the most fundamental skills we humans need to survive our environment: pattern recognition. That’s what films are, a system of interconnected patterns. And that’s great, because the brain gets a kick out of completing patterns. It loves to impose order on an otherwise highly chaotic environment. It constantly looks for coincidences that alert it to possible causal relationships between events. And when it makes the right connections, that’s when dopamine flows into the bloodstream.

The most common types of patterns used in film tend to be patterns of shapes, light, colour, sound, movement, cause and effect, time, space, behaviour, character, and action. But the relationship between image and sound itself is another rich source of pattern, one that filmmakers seldom exploit. It offers the opportunity to create patterns that convey meaning in a non-linear, more interesting way, and it offers the opportunity to take the brain out of its perceptual slumber. What is there to lose by bothering? And what is there to gain? The bottom line is that, when it comes to film, the brain wants to be engaged, and the more layers of engagement, the better. The more patterns to solve, the bigger the fix of dopamine the audience will get.

What I’ve been talking about in this post is far from everything that there is to know about the relationship between image and sound. You can think of the organising principle I’ve described as the overall strategy. Then there are tactics that offer myriad ways of putting image and sound together. Ultimately, it’s all about creating a whole that is greater than the mere sum of the parts, and that requires a dynamic process.

Exploring what dynamic means requires a different approach to film sound than the one I’ve been taking so far. So shortly I’ll be putting evolution and cognition aside for some time and instead exploring the relationship through the lens of semiotics, which deals with human-made meaning.

But before jumping into this fascinating world of meaning making, I’d like to pause and take stock of some of the things I’ve been talking about so far. There’s nothing like seeing things in action, so in my next post I’ll be discussing sound in Lars von Trier’s Dancer in the Dark and Breaking the Waves.

I hope you’ll be joining me. Till then, have a great time.

Film As A Simulation Of The Brain’s Mind

In my previous post, I suggested that it is more helpful to look at story as a simulation rather than in terms of plot and character when it comes to thinking about sound cinematically, at least in the early stages. So today I’d like to explore the concept of simulation more in detail.

A simulation is a model of some aspect of reality that allows us to safely carry out experiments, learn new things, and practise skills that we can then apply to real-life situations. To that end, it needs to have some kind of interface that gives us the means to interact with the virtual world it represents. A flight simulator, for example, has a real cockpit that can tilt in any direction, that makes real sounds, and that has real controls linked to a computer system that interprets the pilot’s actions and moves the cockpit accordingly as the pilot reacts to faithfully recreated settings and situations such as airports, mountains, dangerous weather conditions, and emergency landings. All these elements allow the pilot to become immersed in the situation at hand and experience it ‘for real’.

Stories, too, are powerful simulations that allow us to explore and learn about the social world we inhabit. As Jonathan Gottschall puts it in his book The Storytelling Animal, they are a place “where people go to practice the key skills of human social life”. And the interface that allows us to interact with this simulation is identification with fictional characters.

It makes sense. All species are hardwired to learn specifically about that which is essential to surviving their environment. To us humans, one of these essentials is being able to figure out other people’s needs and intentions so that we can adjust our actions accordingly. Not an easy task, since we have to cohabit and cooperate with large numbers of strangers we know nothing about.

If stories are simulations of the social world, then it makes sense that we interact with them by ‘stepping into the shoes’ of a fictional character. Through identification with them we get to feel their longings, frustrations, virtues and flaws. We experience first-hand their struggles, the moral dilemmas they face, and the consequences of the choices they make. We walk their walk and that’s how we learn.

But how exactly does identification work as an interface? It is easy to see how we can use a cockpit to interact with a simulation, but identification? Stepping into the character’s shoes sounds great as a metaphor but if you’re entrusted with the task of performing the feat, it can leave you feeling a bit baffled.

Luckily, there’s a way of bringing down to earth this concept of identification with fictional characters by looking at it from a more biological point of view.

I’ll start with transportation. This is another term that goes hand-in-hand with identification. Rule number one is that, in order to step into the character’s shoes, the audience need to be transported to the story world first. Another metaphor, but this time it is actually closer to the real thing, since films transport us to the story world almost literally – though it is not our legs that take us there, but the physiological responses and emotions we feel in the body as a result of being exposed to the story events.

Physiological responses ‘transport’ us because, if we’re feeling anything, it means our brains think we’re there. If our brains think we’re there, we might as well be there. And why do our brains think we’re there?

Films work as an illusion because they exploit loopholes in the perceptual and cognitive processes that we evolved to help us navigate the environment. One of them is the communication time lapse that exists between the unconscious and the conscious brain.

In very simple terms, the brain works like this: it interprets the data that the senses have picked up in the environment, it establishes the context, it evaluates the information according to this context, it determines its importance, and it decides on the best course of action. Is it a threat? Run. An opportunity maybe to reproduce? Strike a sexy pose. A significant change? Get closer and find out more.

The first thing we feel as a result of this process is the physiological response, which tends to be quite basic and whose main function is to prepare the muscles to move. Then comes a fully-fledged emotion, which contains more detailed information about the required response. A racing heart in itself does not tell us much. If it comes with a rush of fear or of lust, then we get a much more precise idea of what it’s all about.
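
As a toy illustration of that pipeline – stimulus in, evaluation, physiological response, then the emotion that disambiguates it – consider this sketch. The stimuli and responses are invented for the example:

```python
# Toy version of the evaluate-then-respond pipeline described above.

def evaluate(stimulus):
    if stimulus == "lion":
        return {"body": "racing heart", "emotion": "fear", "action": "run"}
    if stimulus == "attractive stranger":
        return {"body": "racing heart", "emotion": "lust",
                "action": "strike a sexy pose"}
    return {"body": "calm", "emotion": "indifference", "action": "carry on"}

# The same physiological response carries different meanings; the
# emotion is what tells them apart:
print(evaluate("lion"))
print(evaluate("attractive stranger"))
```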

On the whole, we’re not aware of these processes. This is because the brain operates in two basic modes, unconscious and conscious. The unconscious mode is much faster at processing things than the conscious. It can organise the neurochemistry and behaviours of our system within 80 milliseconds, whereas it takes the conscious mode about 250 milliseconds to catch up with things.

It is thanks to this gap that we experience films the way we do. When we’re watching a movie, all our brains know for the first 250 milliseconds is that the senses are sending information that is organised in patterns of light, colour, sound, movement and behaviour that feel just like the real world. So the brain does with this information what it evolved to do. It processes and evaluates it, and it prepares the body for the right action. Luckily, though, just before we actually stampede out of the theatre at the sight of a dinosaur, the conscious brain realises that it’s just a movie.

In short, by the time the conscious brain has figured out that we’re only watching a movie, the unconscious brain has already made a full cognitive evaluation of a situation it deemed to be real and it has triggered all sorts of physiological and emotional responses in our bodies. These bodily sensations are what anchor us in the story world. As far as the unconscious brain is concerned, we are there.

All this works very well for transportation into the story world, but that’s only the first stage. Other things still need to happen for identification with a character to take place.

One good way of understanding identification is by looking at what it is not – empathy. These two terms are often used interchangeably, but they are not the same. We can, for example, empathise with humans and animals alike, but we can only identify with humans. Making this distinction is crucial.

Empathy is when we recognise and share the emotions and feelings of another being. We see their situation from their perspective and as a result we get to feel what they feel. This is possible thanks to a mirroring system that we evolved to deal with our social world. It works by replicating in our biology the neural processes that happen in the brain for coordinating and carrying out actions.

Let’s say that a man decides to open a door. A set of neurons will fire in his brain and activate the regions responsible for coordinating the actions involved. The motor system will then receive instructions and perform the task. Thanks to this mirroring system, if I watch this man carry out this action, the same neurons will fire in my brain and activate the same regions involved in the operation, except that my motor system will not perform the action.

This mechanism works just the same with emotions. By picking up very subtle sensory cues from things such as facial expressions, body language, and tone of voice in another person, the brain is able to replicate in our bodies all the neural processes involved.

It is easy to see from an adaptive point of view the benefits of having such a system. It gives us first-hand information about the intentions and motivations of others and about their mental and emotional states, and it helps us adjust our responses accordingly. If you’ve ever seen someone in distress and immediately understood that all they needed was a hug or a few friendly words, that was your mirror neurons giving you a tip. Or if you’ve ever had a feeling that someone was lying to you or was up to something, that was your mirror neurons too.

As for film, you can imagine how useful these mirror neurons are when it comes to getting the audience to connect both biologically and emotionally with the characters on the screen. Sadly, though, this is not enough. For identification to happen, it is not enough to simply get to experience the same emotions and thoughts as the character. The audience must reach those mental and emotional states through the same cognitive processes that took the character there.

With empathy, things happen there and then, mostly thanks to our mirroring system. Identification is much wider in scope. It requires the audience to align not only with the character’s feelings and emotions but also with his or her cognitive processes – how he or she reasons, evaluates things, solves problems, sets goals, formulates plans, and so on. That is why we can empathise with an animal such as a dog but not identify with it. Dogs feel emotions similar to ours because they are mammals, but they use different cognitive processes to solve their problems and as a result we cannot ‘step into’ them.

There is one assumption in NLP (Neuro-Linguistic Programming) that makes it easier to understand this concept of identification with fictional characters: “The map is not the territory”.

The territory refers to reality, the physical world that exists independently of our experience of it. The map refers to our minds and to the model of the world we have built through our perceptions, personal experiences, culture, and what we have learnt from the significant others that have been present since early in our lives. It contains, among other things, our beliefs and values, which play a key part in determining the decisions we make and the actions we take.

We only ever get to know our own version of reality. Because it is so unique and personal, we can’t really step into someone else’s mind and experience the world through the lens of their model, not unless he or she is a fictional character, that is.

And that brings us back to story as simulation of the social world and identification as its interface.

If our mental model of the world determines the cognitive processes by which our brains perceive, interpret, and react to events in the environment…

And if identification is achieved by getting the audience to feel and think what the character does through the cognitive processes that took the character there, i.e. through the same mental model…

And if simulations allow us to create models of some aspect of the world that we can use in its stead…

… then we can achieve identification with a character by creating a customised mental model of the character’s world that contains the values, beliefs, and experiences that will lead to the emotions and behaviours we want to explore in the simulation-story. We can then ‘install’ such a model in the audience’s minds. This way, their brains will be operating within that specific mental model, and as a result, they will walk the character’s walk and arrive at the same thoughts and emotions by means of the same filters that led the character to do, think, and feel what he or she did.

Equally important is that this whole process generates a lot of neural activity in the brain. The resulting synaptic connections are what get stored in the nervous system as memories. When this happens, the film has served its evolutionary purpose, and the audience get a reward in the form of a shot of feel-good chemicals such as dopamine. That’s why films that get identification right tend to do well at the box office.

One of the main tools for achieving identification with characters is narrative structure. The set-up, for example, is all about ‘installing’ in the audience’s brains the specific parameters in the character’s model of the world that will cause him or her to take the actions and make the decisions that will lead to his or her success or demise. From that point on, even if the character is not present in the scene, the audience will be evaluating the events from within their position of identification. “Are these events good or bad?” And “Will they facilitate achievement of the goal or hamper it?”

This tool, narrative structure, is the staple of all forms of fiction. But there are different forms of storytelling – novels, theatre, film – and each, as anyone who’s had a go at adaptation will know, offers dramatic possibilities unique to its own format. Each allows you to get into the character’s mind in ways that the other formats can’t.

What is unique about film is that stories are told through the arbitrary combination of images and sounds which are arranged according to an established cinematic code.

It is easy to see the advantages such a feature brings to a simulation. First, the presence of the two primary senses adds realism and acts as a form of cross-verification. The brain knows it is prone to perceptual illusions. Seeing is believing, and so is hearing, though each alone leaves plenty of room for mistakes. But when you both hear and see, there is no doubt in the brain’s mind that the perception must have been accurate. This makes film an even more effective illusion.

Also, sound alone does an incredible job of transporting and immersing us in the story world. Being a mechanical wave, it literally touches us and can, through sympathetic resonance, influence the body’s rhythms, and with them our mental and emotional states.

Then there are point-of-view visuals and point-of-hearing sound, both of which can take identification to deeper levels. Although, be warned, in and of itself this technique is not enough to get the audience to identify with a character. In horror films, for instance, a point-of-view shot of the victim does not necessarily lead to identification with the killer (identification, remember, is a process).

But where the real possibilities lie is in the actual relationship between image and sound – in how they are combined meaningfully for the purpose of creating a specific mental or emotional effect.

In reality, the brain uses emotions and perception to guide our actions. After carrying out its evaluation, it activates the right behavioural programs, i.e. emotions, and it determines which bits of information on each sensory channel are most useful for the task. If most of the important information comes in the form of, say, sound waves, it will reduce the presence of other sensory data. The result will be a streamlined percept that includes only what is essential to perform the task at hand efficiently, and that will have excluded any distracting sensory stimuli.
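
As a crude sketch of that streamlining, imagine each sensory stream scored by how useful it is to the task at hand, with everything below a threshold pruned from the percept. The scores and the threshold are invented for the example:

```python
# Toy model of percept streamlining: keep only the sensory streams
# most useful to the current task; prune the rest.

channels = {
    "footsteps approaching from behind": 0.9,  # auditory
    "reflections in the shop window": 0.2,     # visual
    "smell of fresh bread": 0.1,               # olfactory
}

def streamline(weighted, threshold=0.5):
    """Return the streamlined percept: channels above the threshold."""
    return [name for name, usefulness in weighted.items()
            if usefulness >= threshold]

print(streamline(channels))  # ['footsteps approaching from behind']
```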

In filmmaking, the director takes charge of this process. He or she creates a percept, or movie, by manipulating and combining images and sounds so as to fool the brain into interpreting things in a way that will lead it to trigger the physiological and emotional responses required by the narrative. Filmmaking is the art of hijacking the brain and tricking it into thinking that the film is a meaningful percept that it, the brain, created by itself.

So, on one side: identification requires that the audience come to feel and think what the character does by the same perceptual and cognitive means.

On the other: filmmakers can capture image and sound separately and then recombine the two arbitrarily to manipulate the perceptual and cognitive processes that guide the audience through the narrative.

In summary, we can use the relationship between image and sound to deepen identification. Or to give it some slack, if that’s what the narrative requires – for example, when the character is a dubious figure and you need to break identification so the audience can step back, reflect on what has transpired, and thus learn the moral lesson of the film. Identification, after all, moves along a continuum. The manipulation of audiovisual information can help things move along this continuum.

In the last few paragraphs I’ve mentioned the phrase ‘the relationship between image and sound’ several times. This is what film sound is all about. Not the sounds. Not even the act of combining the two. The relationship itself, as an entity in its own right, is the actual cinematic device, the dynamic that breathes life into film sound.

So, in my next post, I’ll be talking about the guiding principle that can make this partnership go from ‘mere combination of image and sound’ to ‘meaningful relationship between image and sound’.

The Story of Story

I ended my first post by concluding that until we begin to understand film sound at a deeper level, we will not get it past the creative impasse it is stuck in right now. If you ask me, the first thing we need to do is to start thinking of film sound as a subsystem within a system, rather than in terms of sound effects and music whose sole purpose is to add realism and emotion to the images.

A good way to start this process is by defining the system sound belongs to: the film. A film is a story told in pictures and sound. So far so good. But that leaves us with another question. What is a story? Not so easy to answer. Yet, this is the first question we need to address if we are to understand this concept of film sound as a subsystem and as a cinematic tool that takes on an active role in the process of narration.

Over 2000 years ago, Aristotle set the precedent in his Poetics for how story would be defined in Western culture. Most attempts today still revolve around the same type of questions: is it action, character, plot, conflict, form, content…?

I spent some time delving into such questions, and I gained some interesting insights, but unfortunately, they proved to be almost (not entirely) futile when it came to grasping the cinematic role of film sound. I decided to look somewhere else, and I found the answer in a somewhat unexpected place: evolution.

The main force behind evolution is natural selection. This is the process by which a behavioural, physical, or physiological genetic mutation either makes it into the permanent genetic make-up of a species or dies away. The selective criterion is simple: Does the mutation give an organism a competitive advantage that makes it more capable of adapting to its environment and fitter to survive it? Yes? Pass. No? Out.

From an evolutionary point of view, stories are a behavioural adaptation. It follows then that they must have given us some sort of adaptive advantage. They have. And this advantage holds not only the key to understanding what makes a good story, and therefore a good film (I will talk about this in another post), but it also provides very important clues to the cinematic potential of film sound. So, let’s dive right into the story of story.

Long before we could tell tales, we were a chimp-like species, much like any other. There was nothing remarkable about us. Then, one day, the story goes, a group of us were outcompeted by other apes and found ourselves having to find another way of foraging. Smart things that we are, in the process of solving this crisis, we invented a new way of cooperating: collaboration, which is similar to cooperation in that it requires the members of a group to work together towards achieving a goal but with a number of important differences.

Cooperation involves working together towards the same goal, but ultimately, participants do it for their own benefit. In the case of chimps, for instance, if teaming up and cooperating with another chimp makes it more likely for them to achieve their aim, then they’ll go along with it, but essentially, they prefer to acquire and eat their food alone if the circumstances allow for it.

With collaboration, however, our thinking became geared toward figuring out ways to coordinate our actions with those of others in order to achieve a joint goal that had been pre-agreed (cooperation lacked this element of predetermination).

To us today, this may not seem like a big deal, but back then, it was a revolutionary way of doing things that required us to push our cognitive skills to the limit. We had to develop the ability to form shared goals, to assign each member of the group an individual task, and to understand how both our own task and that of the others fit within the scheme of things. Then we had to focus our joint attention on the same aim and synchronise our actions in order to achieve our goal. This behaviour was so radically different that it led to us splitting from the chimp lineage and becoming an altogether new species: the human race.

In itself, this was not enough to turn us into what we eventually became, which is the most successful species on earth. It allowed us to form slightly larger societies than other species, but that was about it. The real leap would come much later, with a rare mutation in the brain that gave us the capacity for story.

Before this mutation happened, our brain was modular in nature. That is, it had separate modules to process different aspects of the environment. There was one module for inanimate objects, one for artefacts, one for animals, one for members of the same species, and so on. Our brain also had modules with dedicated types of intelligence designed to solve specific problems. For instance, it had a technical intelligence for building tools, such as a hammer or a knife, and a social intelligence for making sense of things like facial expressions.

These intelligences were pre-programmed by evolution to obey the rules of the natural world and they could not be consciously changed or controlled. A lot of brains still work like this. A bird, for example, cannot consciously decide to change the way it goes about building its nest since the process is hardwired into its brain.

Having such a brain structure meant that we could only process the environment literally. Animals were animals, people were people, flowers were flowers, and ice was ice. There was a fire here and a lion there. Although we had the capacity to have imaginative thoughts within each module – we could imagine that if we hit something with a hammer it might break – we could not bring knowledge from one module to another. This was because there were no neural networks connecting them that could provide such cross-over of information.

Then, this magic mutation changed the wiring of the brain, causing the separate modules to be able to communicate with one another. Thus, the story of how metaphor, fantasy, and wild imagination were born, and how we went from, “Careful! Lion there!” to, “Once upon a time, there lived in a land far, far away a man with a lion’s head, surrounded by the flames of eternal fire. His wife, who had eyes blue like the sky, and was beautiful like a flower, but had snakes for hair, spent her days spinning the fates of the inhabitants of this land.”

To see this in action, you only have to visit the section of any Natural History Museum holding objects dating from 35,000 years ago. There, you’ll find practical utensils like spears, cups, hammers and so on. Skip forward a couple of thousand years or so and you’ll start finding more bizarre things, like the 33,000-year-old lion-headed man carving from Hohlenstein-Stadel in Germany.

Anyone with a bit of common sense would have bet that such a futile and potentially dangerous behavioural feature alone would have been enough to bring our entire species to an end – daydreaming in the caveman days does not sound like a wise thing to do. But for some strange reason, that wasn’t the case. On the contrary. Not only did our fellow Neanderthals, who didn’t develop this capacity for telling stories about people and places that didn’t exist, start disappearing at an alarming rate from this point on, but we also started living in larger and larger settlements.

What happened was that stories allowed us to invent narratives about our past that gave us a sense of belonging together. Because our bonds were built on a mental ground, there were no limits to how many people these bonds could unite in one single stroke. Compared to our other fellow ape species, who still had to rely on one-to-one grooming to build trust among each other, this represented a big advantage. It meant we could build infinitely bigger communities, achieve more together, and protect each other more efficiently.

We could also imagine stories about the future, better worlds where the problems we faced had been eradicated. We could imagine the values, beliefs, and behaviours that made such worlds possible.

Something very interesting is happening here. On one side, we have a newly-acquired set of cognitive abilities that allow us to form shared goals, to focus our collective attention on them, and to synchronise our actions in order to achieve them. On the other, we have the newly-acquired ability to imagine other worlds and the behaviours required for them to exist.

When we attend to a story collectively, our minds unite and become tuned to the same scenario. The story triggers in us the same thoughts, emotions and learning experiences.

If we put the two skills together, shared intentionality and story, what we get is the ability to synchronise our collective beliefs, values, and behaviours so as to bring our imagined worlds into existence. And that’s just what we did. The outcome? The creation of a new human-made environment: the social domain.

One thing about this new environment is that it is a lot more unforeseeable than the physical and biological habitats we had been inhabiting so far. In the physical realm, we can be sure that the sun will rise in the morning and set in the evening. In the biological realm, we can safely bet that a hungry lion will make lunch of us or that a poisonous mushroom will make us ill at best. It is not so easy, however, to guess what folk with complex psychological computations dictating their behaviour will do next. And that’s what this new environment did to us. It turned us into highly unpredictable and often cunning creatures difficult to figure out.

No need to worry. We had stories, and they turned out to be the solution to the very problem they had created. They became invaluable tools for helping us learn to navigate this new complex social world. Through stories, we could learn the values, attitudes and behaviours that would help us function in our societies or we could explore new behaviours and their consequences. We could also use stories to sharpen our ability to make inferences from and scrutinise other minds – their inner worlds, expectations, intentions, and motivations. This, in turn, would help us make better decisions and adjust our behaviours accordingly, so as to get the best possible outcome.

That’s precisely why stories got the thumbs-up from natural selection. They increased our capability to adapt to our environment. They made us more suitable to survive this new cognitive niche we had just moved into.

One word that sums up very well the adaptive role of stories is LEARNING, but learning with a twist.

Biologically speaking, learning is the process by which information about the environment is stored in our nervous system as memories. It is an essential adaptation for any creature to be able to survive. The only problem is that for learning to take place, we must have some form of direct experience with the environment, something that comes with its risks. What if we don’t survive the experience?

That’s where the twist comes in. Stories provide us with an effective means for learning about the environment and for expanding our repertoire of beneficial behaviours without putting ourselves at risk. They allow us to learn, not through direct experience, but through SIMULATION.

It is when we look at story and film from that perspective – as a simulation of the social world rather than in terms of plot or character – that the cinematic role of sound slowly begins to reveal itself. In actual fact, the whole concept of filmmaking takes on a fresh dimension.

Simulation, therefore, will be the subject of my next blog post.

Why Sound (Still) Is the Ugly Duckling of Filmmaking

When I first started working as a sound recordist, I had so much enthusiasm. I meticulously studied the script for every project I took on and carefully thought about the sounds I wanted to capture and the things I wanted to look out for on the set. When filming started, I gave it my everything. That was in the beginning. By the end of year one, I had learnt that asking for more time to figure something out was equivalent to Oliver Twist begging for more gruel.

Eventually, I decided to quit recording to concentrate on postproduction. Things didn't get much better. As a sound editor, I found myself spending my time mostly “fixing it in post”. As a sound designer, except for a few occasions, I spent my hours mostly looking for sound effects that would “sit nicely” with the image. That was it.

At first, I couldn’t comprehend why there was such a negative attitude towards a part of filmmaking that has so much potential. But then, all those stories I had read as a student about the coming of sound to cinema made me realise this was something we inherited, and it still lurks in the collective unconscious of filmmakers today.

The story of the coming of sound to cinema reads like some sort of tale of terror—the tale of how sound mercilessly murdered the beautiful language of silent cinema. The truth is that's what happened. The transition to sound was an apocalypse.

By the time sound came, filmmakers had created a unique way of telling stories through the use of editing techniques and camera movements that had the power to infiltrate the viewer’s mind with the same fluidity and magic of dreams. Then sound happened, and the whole process was turned upside down. Cameras had to be locked in sound-proof booths, filming had to be done in sound studios, actors had to stay put in fixed spots in order to be within range of the microphone, editing had to succumb to the physical laws of real time and space...the list goes on.

If that wasn’t enough, many businesses that couldn’t afford the technology went under, and so did the careers of well-established directors and stars who could not adapt to sound.

And if that still wasn’t enough, audiences simply loved the novelty. They wanted to hear actors talk on the screen. They couldn’t get enough of that which most filmmakers hated: the “talkies.”

It is understandable that many filmmakers in the silent era grew to hate sound and what it had done to their precious art. Gone were the days of roaming the earth unencumbered and free to take the camera where they wanted. That’s what they thought, anyway.

Luckily, they were wrong. A few refused to succumb to “canned theatre,” knowing in their hearts there must be more to sound than talk. They became the big heroes of the transition. Lubitsch, Clair, Mamoulian, Vidor, and the Soviets (though more in theory than in practice, since they couldn’t afford the technology!) all summoned the courage to overcome the odds and rescue sound film from the claws of photographed plays, and thus propelled it into a new era of exploration.

Their courage and their trials and errors led them to the discovery of the true soul of sound films. They realised that the commonly held belief that everything that was seen on screen had to be heard and only that which was seen could be heard was nonsense. It dawned on them that they could film with the camera silently and then add the sound, and that in turn led them to realise they could manipulate sound to suit the dramatic needs of their stories, they could evoke mood and atmosphere in ways they couldn't with the image, they could add new levels of fluidity by using asynchronous sound, and rather than spoon-feeding audiences, they could engage their curiosity by combining image and sound in ways that required the audience’s active engagement and interpretation.

Whatever happened to their inquisitive spirit? It seemed to die with them. Once they were gone, things went back to "normal", and sound resumed its passive role as mere accompaniment to the image. The sound technology used in films today may be state-of-the-art, but sound as a narrative and cinematic tool has barely evolved. If anything, it has gone backwards.

The crux of the matter is that we don’t really understand non-musical sound as a creative form of expression. If you think about it, we’ve been recording images since our cave days. And we have been scrutinising and perfecting this practice for over 2000 years. In contrast, we’ve only been recording sound for just over 100 years. In actual fact, even less. It was not until we entered the era of magnetic tape recorders in the 1940s that sound recording technology became widely available and people were able to start experimenting freely with sound as an expressive medium. In film, it was only in 1979 that the term sound designer was introduced to motion pictures, in recognition of the contribution this role made to the medium (or more specifically, of the contribution Walter Murch made to Apocalypse Now).

To that, we have to add that there is an alarmingly low number of books on the subject of film sound, and the ones available are either written by sound designers for sound designers or by scholars for scholars. Unfortunately, most of them are obscure to the same extent they are interesting. For example, the books of Michel Chion, one of the most influential figures in film sound theory (and in my career), are fabulously insightful but hardly make one’s beach vacation more relaxing. And if you are a screenwriter or director wanting to extract some practical advice out of them, you better be prepared to forego a few hours of strolling along the seashore aimlessly and instead spend that time digging for the treasures buried in these books.

As for books that deal with filmmaking techniques in general, they all surely have a chapter on film sound. Some are longer and more detailed than others, but their content can invariably be boiled down to one sentence:

Film sound can be diegetic/non-diegetic, simultaneous/non-simultaneous, and synchronous/asynchronous; it consists of dialogue, sound effects, and music; and its role is to enhance the audience's experience, create mood, and elicit emotion.

That’s it. Hardly surprising then that sound is used mostly as mere accompaniment to the image.

The problem is that these principles feel as if they were written in stone. Not many have questioned them and not many have wondered why such principles have failed to inspire a more creative use of sound in film. This type of blind acceptance is a very common problem. We only have to look at the timeline of art history to see how artists often spend many decades stuck in one way of doing things, taking for granted that’s just how things are done. Until, that is, someone the likes of Da Vinci or Picasso comes along with a very different vision, breaking all the conventions that had been written in stone up till that point in history, and suddenly everyone realises that there was another way of doing things after all.

I can’t help feeling that’s what’s happening with film sound. We’re stuck with a theory which, if you ask me, barely scratches the surface.

It’s not as if the real voice of film sound hasn’t been discovered yet. As I mentioned earlier, a few pioneers in the early era had their moments of great revelation, and a few directors more recently, like Darren Aronofsky for example, have used sound incredibly well. The problem is that not many seem to be noticing, let alone following in their steps. Again, in my opinion, things are this way due to a lack of proper understanding of sound’s place in cinematic language.

We need to start opening our minds to a new way of thinking about film sound: as an active element that holds as much power as any lighting, camera, or editing technique, and as a subsystem within a system, rather than as a nice sound effect here and there, or as music that guides the emotional responses of the audience. Film sound is not something that happens mostly in post-production. Film sound has to start with the screenwriter using sound as an active narrative element, continue with the director using it as a cinematic tool in its own right, and end with the sound designer bringing it all to life in an aesthetically pleasing and coherent manner.

“This sounds grand”, you may be thinking, “but how do I do that? And where do I start?”

I myself have thought long and hard about all this, and for some time, when I started asking these questions, my mind was blank.

My breakthrough came when I realised the solution is to understand film sound at a deep level. That’s how creativity works. It starts with the process of gathering information. Then there’s a period of incubation when we don’t consciously think about this information, and during which the unconscious part of the brain starts making associations internally. After that, insights and ideas start emerging as if by magic. And because they have been processed unconsciously by the brain, they feel organic to the whole we are trying to create rather than artificially imposed.

My approach, therefore, will be to talk about film sound – and sound in general – from many different perspectives, with the hope that this knowledge will slowly make it into the unconscious of screenwriters and directors, and then back out in the form of inspiration and insights that are put into useful form and that give rise to films that offer us all a richer, more fulfilling cinematic experience.

It will be a long journey that will start with my next post, where I will be talking about story from an evolutionary point of view.