One of the most contentious issues in CD mastering these days is relative loudness. The tendency to try to make CDs as loud as possible in the mastering stage (and increasingly even during mixing) has become so common that many people today view it as what “mastering” is. However, many are unaware of the major tradeoffs in sonic quality introduced in the quest to be the loudest track on the shuffle playlist. Hopefully, some background information on this aspect of the mastering process will leave you better able to make the right decisions for your project.
The decibel (1/10th of a Bel, named after Alexander Graham Bell) is a relative unit of measure expressing a ratio between some reference level and the level you are measuring. Since it is a ratio and not an absolute measurement, it can mean a lot of different things and always needs a reference suffix to be a useful number. Some common decibel readings used in various audio fields are dBSPL (for sound pressure levels traveling through air), dBu (for electrical levels in Volts), dBm (for power levels in milliwatts), dBVU (volume units, the standard measurement of electrical audio levels on mixing consoles and analog tape machines), and dBFS (or full scale, the standard measurement level for digital audio levels). The latter two are the most relied upon in audio mixing and mastering.
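As a rough illustration of what "a ratio expressed in decibels" means, here is a minimal Python sketch. The formulas are the standard ones (20·log10 for amplitude-like quantities such as voltage or sound pressure, 10·log10 for power) and are not taken from this article; the function names are only illustrative.

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of amplitudes (voltage, sound pressure, sample values)."""
    return 20.0 * math.log10(ratio)


def power_ratio_to_db(ratio: float) -> float:
    """Decibels for a ratio of powers (for example, milliwatts against 1 mW)."""
    return 10.0 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Inverse of amplitude_ratio_to_db."""
    return 10.0 ** (db / 20.0)


# Doubling an amplitude is about +6 dB; doubling a power is about +3 dB.
print(round(amplitude_ratio_to_db(2.0), 2))    # 6.02
print(round(power_ratio_to_db(2.0), 2))        # 3.01
# A level 20 dB below the reference is one tenth of the reference amplitude.
print(round(db_to_amplitude_ratio(-20.0), 3))  # 0.1
```

The reference suffix (SPL, u, m, VU, FS) simply says what the denominator of the ratio is.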
Analog Recording Levels
The major shortcoming of analog recording systems, historically, was always the noise floor of the storage or playback medium, such as tape hiss or surface noise (crackles and pops) on vinyl records. All analog recording and playback media have some sort of inherent noise (though today it is often very low). Design engineers continuously tried to get as far away from this noise floor as possible by achieving greater headroom, or higher output levels, from various recording formats. The result was that the program material was more clearly audible and farther above the noise. Once clarity and audibility stopped being the main problem, greater headroom was typically used as a sort of “reserve level” so that peaks in program material could be accurately represented without distorting. The main limitations on how loud one could make a piece of music were either excessive distortion on analog tape and vinyl above a certain level, or the ability of a needle to track a record groove in a playable fashion.
With the advent of digital recording and playback technologies in the early 1980s, the primary perceived advantage was the tremendous increase in dynamic range and headroom due to a greatly lowered noise floor. Binary digits have no inherent hiss or crackle (in fact, the CD playback format actually required the introduction of a small amount of randomized noise, known as dither, to cover up the distortion created by the lowest audio bit switching on and off), so most audio engineers believed that they would now have the ability to allow peak levels in music to occasionally be much higher than the average level. This would then lessen the need for compression and limiting (which reduce the level of peaks so that the average level can be raised further away from the formerly problematic noise floor).
In digital audio, an instantaneous moment of sound is described by a string of ones and zeroes, and there is a limit to the loudest signal that can be described numerically (for CD audio, each sample is a 16-bit word, so the largest value that word can hold sets the limit). A different kind of dB scale was needed to take this into account. This is known as dBFS, or decibels relative to full scale. As opposed to the VU scale, in which zero is the average operating level that program material swings above and below, zero in dBFS is the absolute highest level allowed. It is the very top end of the scale, and all usable audio program falls below it. The dBFS scale uses negative numbers to represent audio program level below the maximum of zero. In studios with both types of metering present, a point on the negative dBFS scale is correlated with a point on the dBVU scale. Typically this is something like -20 dBFS = 0 dBVU, so that “0 dB” on a VU meter leaves approximately 20 dB of headroom for signal peaks on the dBFS scale.
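To make the dBFS scale concrete, here is a small Python sketch. It assumes 16-bit signed samples with 32768 treated as full scale (a common convention, not something this article specifies) and uses the -20 dBFS = 0 dBVU alignment mentioned above; the function names are hypothetical. Because a VU meter averages, the conversion is only meaningful for a steady signal such as a test tone.

```python
import math

# Assumption: signed 16-bit audio, with 2**15 treated as "full scale".
FULL_SCALE_16_BIT = 2 ** 15


def sample_to_dbfs(sample: int, full_scale: int = FULL_SCALE_16_BIT) -> float:
    """Level of a single sample value relative to full scale, in dBFS."""
    if sample == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(abs(sample) / full_scale)


def dbfs_to_vu(dbfs: float, alignment_dbfs: float = -20.0) -> float:
    """Map a steady level in dBFS to a VU reading, given the studio alignment."""
    return dbfs - alignment_dbfs


print(round(sample_to_dbfs(16384), 2))  # -6.02 (half of full scale)
print(dbfs_to_vu(-20.0))                # 0.0  (the alignment point)
print(dbfs_to_vu(-14.0))                # 6.0  (6 dB over "0 VU", still 14 dB below full scale)
```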
There are two different types of ballistics (or response times) used with dBFS metering. The first is peak level: a very fast response used to see the highest instantaneous signal peaks. Many mastering engineers choose a level between -0.1 dBFS and -0.3 dBFS as the ceiling that the highest peaks should remain at or below (this small amount of headroom is left to compensate for intersample peaks, where the top of the arc of a waveform described by two adjacent samples can sometimes create a signal of higher level than either sample represents). The other form of ballistics, more commonly used as a reference for CD loudness, is RMS or “root mean square” metering. This is a way of averaging level over a longer period of time that is similar to the ballistics of a mechanical VU (volume units) meter. It corresponds much more closely to the human perception of loudness than peak level does. So if you play a whole CD track while watching it on a typical digital meter, you will get a peak level that might reach anywhere from -0.5 dBFS to 0 dBFS, as well as an RMS level that is lower. The distance between the peak levels and the RMS levels in a song is where the big changes have occurred in the last decade. The peak levels can’t get any louder. They are already at 0 dBFS. But the RMS levels, which correspond to loudness or volume, have been creeping up.
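The difference between the two readings can be sketched in a few lines of Python. This is a simplified illustration, not how any particular meter is built: it assumes samples normalized to the range -1.0 to 1.0, measures the whole signal at once rather than over a moving window, and ignores intersample peaks. Some meters also add an offset so that a full-scale sine reads 0 dB RMS; that convention is not applied here.

```python
import math


def peak_dbfs(samples):
    """Highest instantaneous sample level, in dBFS (samples normalized to +/-1.0)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def rms_dbfs(samples):
    """Root-mean-square (average) level over the whole signal, in dBFS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


# One second of a full-scale 441 Hz sine wave at the CD sample rate.
sample_rate = 44100
sine = [math.sin(2 * math.pi * 441 * n / sample_rate) for n in range(sample_rate)]

rms = rms_dbfs(sine)
peak_to_rms = peak_dbfs(sine) - rms

print(round(rms, 2))          # -3.01
print(round(peak_to_rms, 2))  # 3.01
```

Even a pure tone that touches full scale has an RMS level about 3 dB below its peak. Real music normally has a much larger gap, and that gap is what heavy limiting removes.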
Take a listen to any CD from the mid 80s to the early 90s and you’ll find RMS levels that are usually somewhere between -18 dBFS and -12 dBFS. Two examples from this time span are represented as waveform displays below:
My Bloody Valentine - “Only Shallow” (Loveless, Sire Records, 1991)
No mastering credit. Highest average RMS level -17.3 dBFS / Maximum peak level -4.2 dBFS:
Nirvana - “Heart-Shaped Box” (In Utero, DGC Records, 1993)
Bob Ludwig, Gateway Mastering. Highest average RMS level -12.7 dBFS / Maximum peak level -0.2 dBFS:
These two examples show a few different things:
First off, for many music fans, the “Loveless” album from My Bloody Valentine is one of the hallmarks of a “huge” sounding record. It is a tremendous wall of sound. The punchline is that it is one of the quietest CDs (in terms of RMS level) even in a relatively “quiet” period of CD mastering. One could easily get another 4 dB of volume out of this CD without even beginning to limit the peaks.
Look at the rectangular window as a box you can fill up with sound (the waveform). The top and bottom of the box represent the maximum level allowed: 0 dBFS. The average level of the music can be seen as the area where the waveform is solid black. When the solid area gets thicker, the average level goes up, and when it gets thinner, the level goes down.
The jagged bits and spikes coming off the solid area are the peaks. The more white background you can see peeking through the black waveform, the bigger the distance between the peak level and the average level.
You can see in the MBV track that the peaks don’t even come anywhere near the maximum allowed level, and although the waveform is dense (due to the density of the music) it has a shape that varies and the peaks stick up at random heights out of the average area. The changing shape is a visual representation of the dynamics in the music. It’s easy to see that the concern for this CD was simply for how it sounded and for retaining the dynamics, not for making it louder by filling the box and bringing the peaks down into the area of the average level. No one seemed bothered by this when it was originally released. People simply turned it up to the volume they found appropriate at home.
Next is a CD that was near the loud end of the spectrum in its day. “In Utero” can be placed around the beginning of the current era of inflated loudness. Using the “box of sound” analogy again, you can see that the goal is starting to become using all the area available, by shaving all the peaks off at the same level (peak limiting) and then turning the level up until those peaks almost hit the edges of the box. This track subjectively seems quite a bit louder than the MBV track, but still has a musical sound and a fair amount of dynamic range. You can still see a changing shape in the average level, and plenty of white space between the average area and the peaks.
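The figures quoted for these two tracks can be turned into a quick back-of-the-envelope calculation. The Python sketch below uses only the RMS and peak numbers given in the captions above. Treat the peak-to-RMS gap as a rough indicator, since the highest RMS reading and the maximum peak need not occur at the same moment in a song.

```python
# Levels in dBFS, as quoted in the captions above.
tracks = {
    "Only Shallow": {"rms": -17.3, "peak": -4.2},
    "Heart-Shaped Box": {"rms": -12.7, "peak": -0.2},
}


def headroom_db(peak_dbfs: float) -> float:
    """How far the whole track could be turned up before its peaks reach 0 dBFS."""
    return 0.0 - peak_dbfs


def peak_to_rms_db(peak_dbfs: float, rms_dbfs: float) -> float:
    """Distance between the loudest peak and the loudest average level."""
    return peak_dbfs - rms_dbfs


for title, level in tracks.items():
    print(
        title,
        round(headroom_db(level["peak"]), 1),
        round(peak_to_rms_db(level["peak"], level["rms"]), 1),
    )
# Only Shallow 4.2 13.1
# Heart-Shaped Box 0.2 12.5

# Raising "Only Shallow" by a plain 4 dB of gain, with no limiting at all:
gain_db = 4.0
print(round(tracks["Only Shallow"]["peak"] + gain_db, 1))  # -0.2
print(round(tracks["Only Shallow"]["rms"] + gain_db, 1))   # -13.3
```

With 4 dB of plain gain, the My Bloody Valentine track would peak at the same level as the Nirvana track and land within about half a decibel of its RMS level, without touching a single peak.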
Next, look at an example that illustrates the current trend in CD mastering:
Radiohead - “Dollars and Cents” (Amnesiac, Capitol Records, 2001)
Bob Ludwig, Gateway Mastering. Highest average RMS level -6.3 dBFS / Maximum peak level -0.09 dBFS:
Here you can see that the box of sound is pretty well filled up! Both the peaks and much of the average material go right up to the edge for much of the song. The denser the black area is, the less distance between the peak and average levels. So the average level is much louder, and the overall dynamics of the song are less apparent. In the case of this Radiohead album, it is done in a way that is still shockingly transparent to the music (compared with what you see here) and “Amnesiac” is a remarkably good sounding CD (more on this later). But, if you consider that the same engineer mastered the Nirvana album shown above only 8 years earlier, you can see how other considerations, aside from simply the best sound for the material, have entered the picture. Filling the box is becoming increasingly important (to some!).
Finally, take a look at how things have changed with two releases of the same CD seven years apart. Admittedly, the example below has a bit of a wild card thrown in (namely that the artist, Iggy Pop, takes full credit in the liner notes of the re-release for the sonic decisions in the newer version), but it’s a startling example of how the acceptable delivery of modern music has changed:
Stooges - “Search and Destroy” (Raw Power, Columbia Records, 1990 CD release)
No mastering credit. Highest average RMS level -13.9 dBFS / Maximum peak level -1.7 dBFS:
Stooges - “Search and Destroy” (Raw Power, Sony Records, 1997 remastered CD release)
Sony Mastering. Highest average RMS level -2.58 dBFS / Max peak level 0.0 dBFS (constant digital overs):
THIS BOX IS FULL!!!!!!!!!! There is a difference of less than 3 decibels between the loudest average part of this track and the loudest digital word that can be represented as sound. This is what we call a true sausage. This record is shockingly loud, but also just shocking. The volume in this case has been achieved with almost constant clipping of the original waveforms. Which brings us to:
There are a number of ways to reduce the difference between the loudest and quietest parts of the music (that is, to reduce the dynamic range), which is what is being done to music on a “louder” CD. Here are some brief descriptions of the most common among them:
- COMPRESSION is a process in which you choose a threshold level “x” and tell the device that above this threshold the gain will be reduced, so that for a given increase of “y” decibels at the input, the output level will only be “z” decibels louder. This is expressed as a ratio such as 2:1 (where 2 is y and 1 is z), meaning that for every 2 decibels above the threshold level at the input to the compressor, the output will only increase by 1 decibel. This has the result of taking the louder parts and making them come out quieter, allowing you to turn the whole thing up at the output stage, since the previously loudest parts are now compressed downward.
- LIMITING is simply compression at a very high ratio (generally between 10:1 and ∞:1, or infinity to one). This sort of high-ratio compression was originally used in radio broadcasting where a live broadcast had to accommodate a wide range of levels with the certainty that nothing would overload. No matter what level the input is above the threshold, the output level either varies only very slightly or not at all.
- DIGITAL “BRICKWALL” LIMITING is a more recent development. This process can be provided by either software plug-ins or a dedicated hardware box. These limiters process digital information rather than an analog electrical signal, introducing the ability to “look ahead” by buffering (or temporarily storing) the data so that they can effectively plan ahead on how to limit the signal peak before it happens. Used modestly, these devices can raise gain quite a bit by limiting a small number of peaks, sacrificing a few dynamic moments for a safe (digitally speaking) overall gain increase. However, many engineers use brickwall limiting very immodestly, creating noticeable distortion artifacts and drastically changing the balance of the mix.
- CLIPPING or purposefully overloading the inputs of analog to digital (A/D) converters is the latest in “loudening” technology. Some A/D converter devices can handle this questionable use better than others, but in all cases the result is that the peak of a waveform is simply lopped off. This is probably the most dubious way to remove the dynamics from music. The Stooges remaster displayed above is probably the most dramatic example of this technique, and it is certainly the most ear-fatigue-inducing way to get things loud. Below are two closeups of waveforms from some of the examples shown previously.
Radiohead - “Dollars and Cents”:
In this example, a very loud but musical sounding master was achieved through a layered approach to compression that probably began during tracking, continued through the mixing, and was finished off in mastering. There are no clipped waveforms here as shown above, and the result is quite loud but not terribly crushed or distorted music.
Stooges (remaster) - “Search and Destroy”:
In this example, you can see most vividly how the waveforms are so clipped that large areas are simply flat lines where musical detail used to be. There are large amounts of music that simply disappear into the non-existent area above 0 dBFS. All these flat lines sound like crunchy, your-stereo-is-broken type stuff. Some people like it. Fortunately, not everyone does (yet?)!
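The techniques listed earlier can be sketched in Python to show why compression and clipping behave so differently. This is a deliberately simplified model under stated assumptions: the compressor is reduced to its static input/output curve (no attack or release times and no look-ahead), a limiter is treated as the same curve with a very high or infinite ratio, and clipping is shown as a hard ceiling on sample values normalized to +/-1.0. The function names and example numbers are illustrative only.

```python
def compress_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static compressor curve: output level (dB) for a given input level (dB)."""
    if level_db <= threshold_db:
        return level_db  # below the threshold nothing changes
    # Above the threshold, every `ratio` dB of input yields only 1 dB of output.
    return threshold_db + (level_db - threshold_db) / ratio


def hard_clip(sample: float, ceiling: float = 1.0) -> float:
    """Clipping: anything beyond the ceiling is simply lopped off."""
    return max(-ceiling, min(ceiling, sample))


# An input 6 dB over a -10 dB threshold:
print(compress_db(-4.0, -10.0, 2.0))           # -7.0  (2:1 compression)
print(compress_db(-4.0, -10.0, float("inf")))  # -10.0 (infinity:1 limiting)

# Samples pushed past full scale come back as a flat line at the ceiling.
overdriven = [0.5, 1.2, 1.5, 1.3, 0.4, -1.4]
print([hard_clip(s) for s in overdriven])      # [0.5, 1.0, 1.0, 1.0, 0.4, -1.0]
```

The compressor and limiter turn loud passages down while preserving the waveform's shape. The clipper replaces three different peak values with the same number, which is the flat line visible in the Stooges closeup.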
The overall point of this is that there’s still no free lunch. The reason CDs were quieter in the past was that it took a while for it to occur to people to try to hijack the volume knob from listeners. People spent a long time mixing their music to sound just the way they wanted it. Typically, they didn’t want someone to take that music and make radical or drastic changes to it after hearing it only a handful of times in a mastering session. The job of the mastering engineer was just to balance out any inconsistencies and transfer it to the delivery medium.
In this age, we all tend to listen to music in much noisier environments and generally, perhaps, pay less attention to the music we hear. In such an environment, it is tempting to try to make your music “shout out” the loudest. However, the only way to blast into people’s ears louder than the last song is to make sonic sacrifices to your original mixes. Much of today’s modern music can certainly jump out at you from even the tinniest of computer speakers, but often doesn’t stand up to any serious scrutiny on a good full-range playback system. And it’s often chock-full of pumping compression, distortion and other ear-fatiguing artifacts. Highly compressed or limited music with no dynamic range is physically difficult to listen to for any length of time. This “hearing fatigue” doesn’t present itself as obviously as aching muscles do in other forms of physical fatigue, so it’s not obvious to the listener that he or she is being affected. But if you ever wonder why you don’t like modern music as much as older recordings, or why you don’t like to listen to it for long periods of time (much less over the years), this physical and mental hearing fatigue is a big part of the reason.
The situation has gotten so out of hand that there is now a feature in iTunes called “Sound Check” that goes through your whole library and analyzes the average volume of all the songs, making a change to the metadata associated with the loudest tracks that tells the player to play them at a lower volume. This is a pretty imperfect solution (often, ironically, it leaves sparse or acoustic music at much higher levels than thick rock music), but it is an attempt to mitigate the unsettling and sometimes dangerous (depending on how loud you listen to music) level differences that exist in digitally delivered music today.
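The general idea of this kind of playback normalization can be sketched as follows. This is not Apple's algorithm, which is not described here: the target level of -14 dBFS, the attenuate-only rule, and the use of the RMS figures quoted earlier in this article as stand-ins for "average volume" are all assumptions made purely for illustration.

```python
# Hypothetical target level; real players choose their own reference.
TARGET_DBFS = -14.0


def playback_gain_db(track_rms_dbfs: float, target_dbfs: float = TARGET_DBFS) -> float:
    """Gain a player could store as metadata so loud tracks play back quieter.

    Assumption: tracks are only ever turned down, never up.
    """
    return min(0.0, target_dbfs - track_rms_dbfs)


# Highest average RMS levels quoted earlier in this article.
library = {
    "Only Shallow": -17.3,
    "Heart-Shaped Box": -12.7,
    "Dollars and Cents": -6.3,
    "Search and Destroy (1997 remaster)": -2.58,
}

for title, rms in library.items():
    print(f"{title}: {playback_gain_db(rms):+.2f} dB")
```

Under these assumptions, the 1997 Stooges remaster would be turned down by roughly 11.4 dB, while the My Bloody Valentine track would be left alone. Once the player takes the volume knob back, the sacrifices made for loudness buy nothing.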
There is a happy medium for most projects using the powerful tools available to manage gain and dynamic range in mastering. Familiarize yourself and your mastering engineer with a few examples of music you believe sounds good and bad. This can be the best tool to communicate your sonic preferences and to help your album reach its fullest potential while preserving the important sonic decisions made during the arrangement and mixing stages.