Last week, I was lucky enough to attend a seminar given by James Johnston (JJ), who is presently the Chief Scientist of DTS, Inc. He has been called the father of perceptual audio coding for his pioneering contributions (including the MP3 standard) that revolutionized digital audio. His accomplishments during a 26-year career at AT&T Bell Labs have, among other achievements, made possible the distribution of digital music and digital radio over the Internet. For more details, I've put a brief bio at the end of this post.
This post may become long and rambling at times, since I want to write down everything I remember from the seminar before it is lost to the annals of time and memory. I want to share whatever I gleaned from his talk with fellow forum members, since there were many interesting things I was exposed to during that one hour.
JJ gave his talk in a conference room full of some of the best people in the perceptual audio (and music) field. That's why he started off by apologizing to all the musicians in the room for helping invent MP3. He then went on to elaborate on the science and art of acoustics, human hearing and digital music. Here's the synopsis:
1. If you want to hear live music, Be There!
JJ explained analytically the acoustic wave equations that result from a musical concert in small to large halls. I will gloss over the mathematics for brevity, but the short point is, they are exceedingly complex! Concert halls (or any room, for that matter) set up multiple modes of resonance and patterns of constructive and destructive interference, all particular to the room geometry and materials. If you want to capture the acoustic experience and reproduce it from a system, theoretically you would need infinite channels to capture the spatial and time information completely. For the mathematically inclined, the infinite assumption can be relaxed if you sample the spatial dimension at the Nyquist rate, based on the fundamental frequency of the highest note produced during the performance. Then you also have to sample at the Nyquist rate in time, at each spatial point. For the non-engineers, in English this means we need a lot of microphones (mics), placed in a lot of places in the hall, all sampling fast enough in time to capture and accurately reproduce the original musical signal. This is a practical impossibility: no recording studio today can deploy that many mics, and no current recording medium can hold that much information.
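To put rough numbers on this, here is a back-of-the-envelope sketch in Python. The figures are my own assumptions, not JJ's: I take 20 kHz as the highest frequency of interest, 343 m/s for the speed of sound, and a 30 m x 20 m x 12 m hall; the mic spacing follows from half the shortest wavelength.

```python
# Back-of-the-envelope estimate of how many microphones a "complete"
# spatial capture of a hall would need. All figures are my own assumptions,
# not JJ's: 20 kHz as the highest frequency of interest, 343 m/s for the
# speed of sound, and a 30 m x 20 m x 12 m hall.

SPEED_OF_SOUND = 343.0      # m/s, air at roughly 20 C
F_MAX = 20_000.0            # Hz, upper edge of the audible band (assumed)

# Shortest wavelength to resolve, and the spatial Nyquist spacing
# (half a wavelength between sample points along each axis).
lambda_min = SPEED_OF_SOUND / F_MAX          # ~0.017 m
spacing = lambda_min / 2.0                   # ~8.6 mm between mics

hall_dims = (30.0, 20.0, 12.0)               # metres (assumed hall size)
mics_per_axis = [int(d / spacing) + 1 for d in hall_dims]
total_mics = mics_per_axis[0] * mics_per_axis[1] * mics_per_axis[2]

# Each mic must also be sampled in time at >= 2 * F_MAX.
samples_per_second = total_mics * 2 * F_MAX

print(f"Mic spacing:        {spacing * 1000:.1f} mm")
print(f"Mics needed:        {total_mics:,}")
print(f"Samples per second: {samples_per_second:,.0f}")
```

Even with these fairly tame assumptions the count comes out at over ten billion mics, each needing its own 40 kHz sample stream, which is exactly the practical impossibility JJ was pointing at.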
This is also a theoretical impossibility, for the following reasons. The soundfield is not uniform over the concert hall (obviously). The soundfield is also not uniform between points separated by just 3-4 inches (non-obvious). This is again because of the complex relationship between the wavelengths of the music and the geometry of the concert hall. JJ pointed out a phenomenon which can be observed if you go to concerts regularly. I don't, so forum members who do can verify or comment on this. JJ said that most people will move their heads slightly from side to side during a performance to adapt to the changing soundfield. So, if a listener likes bright sound but the soundfield suddenly becomes diffuse around their head (for some frequency combinations), they will try to adjust for the best local sweet spot by moving their head slightly till they are again happy with what they hear. This means the soundfield changes, even in the small space around a listener's head, enough for there to be perceptual differences at different points around the head region. Of course, JJ is an engineer, so he showed us measurements of this phenomenon as well. The point is, in order to accurately reproduce the musical experience for a single listener at some theoretical central point in the hall (maybe in an equilateral triangle with the musicians' stage), you would need mics placed in a hemisphere around the head, at very small spatial intervals. And, as with Heisenberg's principle, the measurement affects the thing being measured. In English: if you place that many mics, they will absorb sound energy (in order for the system to capture and reproduce the signal), so the presence of the mics will fundamentally affect the soundfield and change the very sound they are trying to capture.
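As a sanity check on the 3-4 inch claim, here is a quick wavelength comparison (my numbers, not from the talk): once the wavelength shrinks to around the size of that separation, two points a few inches apart can sit on very different parts of the room's interference pattern.

```python
# Quick check (my numbers, not from the talk) of why the soundfield can
# differ at points only 3-4 inches apart: once the wavelength is comparable
# to that separation, two such points can sit on very different parts of
# the room's interference pattern.

SPEED_OF_SOUND = 343.0   # m/s
head_span = 0.09         # metres, roughly 3.5 inches

for freq in (100, 500, 1_000, 2_000, 5_000, 10_000):
    wavelength = SPEED_OF_SOUND / freq
    print(f"{freq:>6} Hz  wavelength = {wavelength * 100:6.1f} cm  "
          f"= {wavelength / head_span:5.1f} x the 9 cm span")
```

In the bass the wavelength dwarfs the head, so the field looks locally smooth; in the top couple of octaves it drops below the 9 cm span, which is where the small head movements JJ described start to matter.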
The whole point of the above discussion is that you cannot capture the complete physical experience of being in a concert hall just by placing a small number of mics on the soundstage and around the hall. The effect of the hall is invariably lost because of the physics of the acoustic phenomena. Which is also why stereo sound CANNOT capture the live music experience, no matter what recording techniques you try. Real acoustics are all around your head, from every direction. There is no front soundstage in isolation. And we do not have infinite bandwidth to store and capture all this information. In a nutshell, if you want to hear the concert exactly the way it sounded live, then be there when the concert is being played live.
2. A Centre channel centres the soundstage
JJ pointed out a series of interesting experiments carried out in 1934-36 in which the effect of adding a centre speaker was measured on a large number of people. I don't recall the names of the experimenters now, but if someone does, feel free to tell me. In each trial, the listener got a better sense of depth, better localization of the front stage and slight height perception. Overall, the front soundstage was judged much better than with two (left-right) speakers. The interesting thing was, this was true for music as well as movie dialogue. JJ sadly shook his head and said that the audio industry had not learnt the lessons of something we knew so many years back; they still don't incorporate a centre speaker as standard. According to JJ, the centre channel is as important for music as the left and right channels, and not just for voice. The centre fixes the soundstage and conveys depth and distance (difference) information to our ears, which helps our brain form an impression of the placement of the musicians much more easily than with only two channels. As an engineer, this makes sense to me, assuming the information for the centre channel is correctly captured. JJ advised people to give as much care and attention to the centre speaker as they would to the stereo pair. Again, the message was clear: two-channel sound is not the be-all and end-all. At a minimum, three channels are needed for accurate audio rendition of the front stage.
A brief discussion of quadraphonic sound came up at this point. JJ was emphatic: quadraphonic was a horrible, terrible mistake. All it did was take information from the stereo channels and add it as delayed sound to the rears. There was no centre channel, and no new information was being conveyed by the rear channels. People would ask, what exactly am I paying extra for? It was a flawed approach to begin with.
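To make that concrete, here is a toy sketch of the derived-rear idea JJ was criticizing. The 20 ms delay and 0.5 gain are illustrative values of my own, not a specification of any actual quadraphonic matrix; the point is simply that the rears are pure functions of the existing left and right signals.

```python
import numpy as np

# Toy illustration of the derived-rear approach JJ criticized: the rear
# channels are built purely from delayed, attenuated copies of the existing
# left/right signals, so they carry no information about the real rear
# soundfield. The 20 ms delay and 0.5 gain are illustrative values of my
# own, not a specification of any actual quadraphonic matrix.

def derive_rears(left, right, sample_rate=48_000, delay_ms=20.0, gain=0.5):
    delay = int(sample_rate * delay_ms / 1000.0)
    pad = np.zeros(delay)
    # Delay the fronts, scale them down, and call them "rears".
    rear_left = gain * np.concatenate([pad, left])[: len(left)]
    rear_right = gain * np.concatenate([pad, right])[: len(right)]
    return rear_left, rear_right

# Example: one second of arbitrary stereo material.
t = np.linspace(0.0, 1.0, 48_000, endpoint=False)
left = np.sin(2 * np.pi * 440 * t)
right = np.sin(2 * np.pi * 554 * t)
rear_left, rear_right = derive_rears(left, right)
# Everything in the rears is a function of left/right alone; nothing
# captured from the hall behind the listener is present.
```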
3. Hear Ye, Ear Ye
Our body is a wonderful piece of organic machinery. Our sensory organs have evolved to be sophisticated yet subtle conveyors of complex information to our brain, which then sorts and manages all these data streams before making a decision. JJ covered human hearing in five slides. He helpfully mentioned that it takes a new student at least a semester or more to understand what he had just spoken about. So I won't confuse forum members by writing about the details of the human auditory system beyond a few points that are experimentally verifiable. The theory behind the ears is very complex, much of it still unknown, evolving and open to new research. In fact, there was an interesting exchange between JJ and a professor of perception during these slides. JJ said, "And so this is what happens, and we don't really know why." The professor said, "Well, now we do, and I'll explain it to you after the lecture." JJ replied, "No, now you think you know why, and you're not sure yet if your theory is completely correct. I'll still be interested in listening to you while remaining a skeptic." This was typical of the question-and-answer exchanges that went on during the talk. The interesting point for me was that these were all excellent people in their field, yet they agreed to disagree while maintaining their faith in their understanding of the phenomenon. A lesson for all of us on the forum while arguing about our respective audio systems?
Back to the ear. The important results (for us) after analyzing what goes on in the ear are the following:
1. If you listen to something differently, you will REMEMBER different things. This is not an illusion.
2. If you have reason to assume things may be different, you will most likely listen differently. Therefore, you will remember different things.
These points are taken from this presentation online, which is an expanded version of the five slides JJ showed us: www.aes.org/sections/pnw/ppt/jj/hashighlevel.ppt
Which again goes to prove that unless listening tests are conducted without bias (and blind), there is a high chance of the listener hearing something that is not in the audio source. But s/he will hear it as reality. They are not lying, but the test has failed, since non-audio parameters have marred its accuracy. It is the same with taste and smell; the two are strongly associated. When you smell an apple and then bite into it, you will get an apple taste. If you pinch your nose shut and don't know what you're biting into, the apple may taste like a raw potato. When doing a listening test, try to isolate all bias and prejudice. This is humanly almost impossible to do. Which is why I now understand tests where listeners could not differentiate between two amplifiers of widely different price points in a blind test, but could make out which sound they preferred once they listened to them separately, non-blind. They heard differently because of their inherent bias towards what they liked. And they truthfully reported what they heard. The fault was with the non-blind second test, not the listeners.
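For anyone who wants to try this at home, here is a minimal sketch of how a blind ABX-style comparison can be scored, assuming a simple guessing model. This is my own illustration, not a procedure from the talk: if the listener truly cannot tell the two devices apart, each trial is a coin flip, so we ask how likely their score would be from chance alone.

```python
from math import comb

# Minimal sketch (my own, not from the talk) of scoring a blind ABX-style
# listening test. If the listener genuinely cannot tell the two devices
# apart, each trial is a coin flip, so we ask how likely their score would
# be from guessing alone.

def p_value_by_guessing(correct, trials):
    """One-sided binomial tail: probability of getting at least `correct`
    answers right out of `trials` by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 12 correct identifications out of 16 blind trials.
print(f"p = {p_value_by_guessing(12, 16):.3f}")   # ~0.038, unlikely to be luck
```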
I am going to end the first part of this post here (it's long enough already) and write up the second part in a few days. I'll keep editing as I recall any missed points. Here is JJ's bio for those interested:
He received the BSEE and MSEE from CMU in 1975 and 1976, and since then has worked at Bell Laboratories, AT&T Research, Microsoft, and DTS on basic research in audio signal processing, audio perception, perceptual coding of both audio and video, acoustics, acoustic processing, and related subjects. He invented a number of basic techniques used in perceptual audio coding, especially in MP3 and the MPEG-2 Advanced Audio Coding (AAC) standard. In addition, he has developed loudness (as well as smart intensity) models, room correction algorithms, loudspeakers, acoustic diffusers, array loudspeakers, microphone techniques, and a variety of other things that combine physical acoustics, human perception, and signal processing. These achievements have also influenced international standards for audio transmission, such as the MP3 standard, widely used in computer networks, and they are the foundation of the electronic music distribution business, including AAC players and jukebox systems.