November 2003
Oversampling Versus Upsampling: Differences Explained
The year 1982 should be long remembered as the year the audio industry changed forever. A joint venture by electronics giants Sony and Philips launched the compact disc for audio -- along with accompanying players -- to a marketplace that demonstrated its love of the new format by gobbling up an astonishing one billion CDs in the first ten years.
The reasons for this popularity are many, but the main selling feature was arguably a perceived increase in sound quality. Ironically, in the early days, CD-player sound was noticeably sub-optimal, because of a digital audio phenomenon called aliasing, which distorted the music's pitch and amplitude. Early filters moderately reduced aliasing, but it was not until the technique of oversampling was incorporated into CD-player design that the effect was significantly diminished and CD sound improved.
Throughout the 80s, oversampling techniques improved from the original 2x (two times) oversampling to 8x oversampling. Then, in the 90s, another technique -- called upsampling by its originators -- was incorporated into CD-player design. This development quickly became a source of contention in the audio industry. Many authorities argued that oversampling and upsampling were mathematically the same, while others swore there were significant sound improvements with the new process.
What are the differences between oversampling and upsampling? This article will first explain sampling and the aliasing effect before it attempts to explain the differences between these two corrective techniques.
What we perceive as sound is actually our response to minute air-pressure changes that are created by vibrating bodies such as vocal cords. Digital sound, or digital audio, is simply a numeric representation or model of these changes, and is made possible by a technique called sampling.
Analogies are helpful when describing sampling, so imagine that inside your analog-to-digital converter there is a sampler which contains a camera that takes pictures of the incoming soundwave. Each pictured point of the wave is assigned a number, and in turn, each of these numbers is coded in binary, as a series of digits (one for each bit of resolution), each with a value of 1 or 0.
In actuality, the "camera" is a very fast electronic switch called -- logically enough -- a sampling switch. In response to a switching signal, this electronic circuit opens and closes hundreds of thousands of times per second, allowing an incredible number of samples per second to be taken. As the soundwave enters the sampler, it is transformed into pulses that represent the binary code. The remarkable thing about sampling is that if enough "pictures" -- or samples -- are taken, sufficient information about the soundwave is preserved so that it can be accurately represented with the pulsed binary code.
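As a rough illustration of the "camera" analogy, the sketch below takes a few evenly spaced "pictures" of a sine wave and codes each one as a 16-bit binary word. The function names, the 1kHz test tone, and the simple rounding scheme are assumptions made for this example; they do not describe the circuitry of any real converter.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, num_samples, bits=16):
    """Sample a unit-amplitude sine and round each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1  # largest positive code (32767 for 16 bits)
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                        # instant of the n-th "picture"
        level = math.sin(2 * math.pi * freq_hz * t)   # wave level at that instant
        codes.append(round(level * max_code))         # nearest available number
    return codes

def to_binary(code, bits=16):
    """Two's-complement bit string for a signed integer code."""
    return format(code & (2 ** bits - 1), f"0{bits}b")

# Five samples of an assumed 1kHz tone at the CD rate of 44.1kHz.
codes = sample_and_quantize(1000, 44_100, 5)
for code in codes:
    print(code, to_binary(code))
```

Each printed line pairs a sample's number with the string of 1s and 0s that would be stored for it.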
Remarkable though it is, digitally representing sound by numbers has inherent limitations. This is because sound is a continuous wave, and waves have an infinite number of points that could, in theory, be assigned a number or digitized. Imagine trying to form a wave using blocks and you have a visual idea of the limits and difficulties of digital audio.
Of course, the more blocks you have, the more likely it is that you could build a wave that is as smooth as the original. That's why the number of times per second the sampling switch opens and closes, or the switching frequency, is important to the quality of the recording.
Switching frequency cannot be infinitely fast and is not randomly assigned. To determine switching frequency, a mathematical equation is used to produce a value called the Nyquist limit. In 1928, while conducting research on telegraph-transmission theory, Harry Nyquist of AT&T developed a theory on frequencies that later became known as the Nyquist-Shannon Sampling Theorem, after being formally proved by Bell Labs' Claude Shannon in 1949. This theorem stated that "when converting from an analog signal to digital (or otherwise sampling a signal at discrete intervals), the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version."
The constraint within this theory has become known as the Nyquist limit, and while a lot has changed in telecommunications since then, this constraint still holds true. If the rate of sampling is too slow, sound quality will suffer because too much information has been left out. CDs therefore have a sampling frequency of 44.1kHz, and the Nyquist limit determines that the highest frequency a CD can reproduce is half that rate, or 22.05kHz.
Mathematics describes a perfect world: Simply plug the numbers into Nyquist's theorem, calculate the Nyquist limit, and everything will sound perfect. Unfortunately, electronic circuits such as sampling switches care little about math; they exist in the mathematically imperfect real world.
The sampling switch creates distortions called aliasing frequencies. Alias frequencies occur above the Nyquist limit (22.05kHz for CDs), and these frequencies enter the sampler because it cannot distinguish between the true frequencies of the sampled sound and these sampling-switch distortions. Filters called low-pass, or anti-aliasing, filters are therefore built into CD players and they attempt to block frequencies above 22.05kHz from entering the digital-to-analog converter. What is allowed to enter is referred to as the baseband. With CD players, the baseband usually ranges from 20Hz to 20,000Hz.
Alias frequencies, mind you, are not always considerate enough to lie above the baseband, so first I will discuss aliasing on the digital-to-analog-conversion side. Imagine that a sampler had a sampling frequency of 10kHz. The Nyquist limit determines that the highest frequency reproduced is therefore 5kHz, so the filter only allows inputs with frequencies below 5kHz, and these comprise the baseband. The filter works well with, for example, an incoming sound with a frequency of 4kHz. The sampling switch samples not only the 4kHz, but also produces the alias frequency of 6kHz (10kHz - 4kHz), which lies out of the baseband, and so is easily filtered.
However, when the incoming sound frequencies approach the sampling frequency, problems arise on the analog-to-digital-conversion side. Let's say the incoming frequency is 9kHz. The alias frequency will then be 1kHz (10kHz - 9kHz), which is within the baseband and cannot be filtered out. Sound quality consequently suffers as this alias distortion passes underneath the wire. To minimize this type of aliasing artifact, the analog input to an A/D converter should also be filtered of frequencies above its Nyquist limit. A 5kHz low-pass filter before the sampler would prevent the 9kHz input from producing that 1kHz alias signal.
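The arithmetic behind the 10kHz examples can be sketched in a few lines. The helper name below is made up for this illustration, and the function models only the folding of a single pure tone; it is not a description of any particular sampler.

```python
def alias_frequency(input_hz, sample_rate_hz):
    """Frequency at which a pure tone shows up after sampling (0 to the Nyquist limit)."""
    folded = input_hz % sample_rate_hz            # a sampler cannot tell f from f + k * Fs
    return min(folded, sample_rate_hz - folded)   # the upper half mirrors onto the lower half

SAMPLE_RATE_HZ = 10_000  # the sampler from the example
for tone_hz in (4_000, 9_000):
    print(tone_hz, "Hz in ->", alias_frequency(tone_hz, SAMPLE_RATE_HZ), "Hz out")
```

The 4kHz tone comes out at 4kHz, while the 9kHz tone comes out at 1kHz, inside the baseband, which is why it has to be filtered before it reaches the sampler.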
Furthermore, it is no easy matter to design a filter that allows frequencies of 20kHz to pass while blocking frequencies only a couple of kilohertz higher, such as 22.05kHz. A filter with a very steep slope is necessary for this very quick change of blocking 0 decibels to blocking a full 90 decibels within 2.05kHz. The steep slopes in filters designed to do this cause phase shifts and group delays within the filter circuitry, which also impair sound quality.
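To put a number on "very steep," the following back-of-the-envelope sketch averages the required attenuation over the width of the transition band, measured in octaves. The function name is an assumption for this example, and real filters are not specified by a single average slope, so treat the result as an order-of-magnitude figure only.

```python
import math

def required_slope_db_per_octave(pass_hz, stop_hz, attenuation_db):
    """Average slope needed to reach attenuation_db between pass_hz and stop_hz."""
    octaves = math.log2(stop_hz / pass_hz)   # width of the transition band in octaves
    return attenuation_db / octaves

# Pass 20kHz untouched, block 22.05kHz by 90 decibels (figures from the text).
slope = required_slope_db_per_octave(20_000, 22_050, 90)
print(f"about {slope:.0f} dB per octave")
```

The result is on the order of 640dB per octave, far beyond what a simple analog filter with a handful of components can deliver.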
"If you can't change the world, change yourself" suggests a popular saying. In the digital-audio world, the corollary would be "If you can't change the filters, change what has to be filtered." As we mentioned earlier, the sampling switch creates the aliasing frequencies and therefore aliasing frequencies are dependent on sampling frequency. To prevent the aliasing frequencies from entering the baseband and thereby distorting the sound, one solution is to simply increase the frequency of the sampling switch. This technique -- which became known as oversampling -- greatly improved CD-player sound because it made the alias distortion filterable by simpler anti-aliasing filters.
The first oversamplers raised the frequencies to 2x the sampling rate of 44.1kHz, or 88.2kHz. The audio industry has now standardized at an 8x oversampling rate, which means a CD's sampling frequency is increased to 352.8kHz before it enters the digital-to-analog converter. This effectively moves the aliasing frequencies to values near 300kHz, much higher than the original 22.05kHz. Instead of having to filter out all sound within a couple of kHz (20kHz to 22.05kHz), the filters have a couple of hundred kHz with which to attenuate the aliasing frequencies. Consequently, none of the problems that exist when filtering a steep slope are encountered. And, since these filters no longer have steep slopes, they do not further distort the sound with phase shift or group delay.
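The figures in this paragraph can be checked with a short calculation. The sketch assumes the baseband tops out at 20kHz and uses the same "sampling rate minus signal frequency" rule as the earlier 10kHz example to locate the lowest alias; the names are invented for the illustration.

```python
CD_RATE_HZ = 44_100     # CD sampling frequency, from the text
AUDIO_TOP_HZ = 20_000   # assumed upper edge of the baseband

def image_gap(oversampling_ratio):
    """Return (output rate, lowest alias frequency, room left for the filter), in Hz."""
    output_rate_hz = CD_RATE_HZ * oversampling_ratio
    lowest_image_hz = output_rate_hz - AUDIO_TOP_HZ   # alias of a 20kHz tone
    return output_rate_hz, lowest_image_hz, lowest_image_hz - AUDIO_TOP_HZ

for ratio in (2, 4, 8):
    rate, image, gap = image_gap(ratio)
    print(f"{ratio}x: rate {rate / 1000:.1f}kHz, "
          f"lowest alias {image / 1000:.1f}kHz, filter room {gap / 1000:.1f}kHz")
```

At 8x the lowest alias lands at 332.8kHz, leaving roughly 313kHz between it and the top of the baseband for the filter to do its work.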
Oversampling improved sound quality in another way: It spread out the quantization distortion, thereby allowing it to be more effectively dithered and noise-shaped. This occurred because oversampling increases the bandwidth from 22.05kHz to 176.4kHz. The oversampling technique, therefore, has greatly improved CD-player sound quality, and one would think the apex of digital audio has been achieved. But then another technique -- upsampling -- appeared.
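A simplified way to see the benefit of this spreading is sketched below. It assumes the quantization distortion behaves like noise of fixed total power spread evenly from zero up to the Nyquist limit, and it ignores dither and noise shaping entirely; the function names are hypothetical.

```python
import math

CD_RATE_HZ = 44_100  # CD sampling frequency, from the text

def noise_bandwidth_hz(oversampling_ratio):
    """Bandwidth (zero to the Nyquist limit) over which the noise is spread."""
    return CD_RATE_HZ * oversampling_ratio / 2

def in_band_noise_reduction_db(oversampling_ratio):
    """Drop in noise power left inside the original band, under the even-spread assumption."""
    return 10 * math.log10(oversampling_ratio)

for ratio in (1, 2, 8):
    print(f"{ratio}x: bandwidth {noise_bandwidth_hz(ratio) / 1000:.2f}kHz, "
          f"in-band noise down {in_band_noise_reduction_db(ratio):.2f}dB")
```

Under these assumptions, 8x oversampling leaves only one-eighth of the noise power in the original 22.05kHz band, a reduction of about 9dB before any noise shaping is applied.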
The development of upsampling immediately proved to be a source of controversy. Like oversampling, upsampling increases the frequency (usually to 192kHz) in order to increase aliasing frequencies and thereby move them out of baseband. One difference between upsampling and oversampling is where in the process they occur. Upsamplers are usually found near the end of the digital process, just in front of the D/A converter, partly because they generate so much more data -- 8x upsampling, for example, creates eight times the data. Because of the large amount of data, upsamplers are sometimes located in separate chassis -- dCS's Purcell is an example of this -- but that's not usually the case. Audio Aero's Capitole Mk II CD player and Orpheus Laboratories' One digital-to-analog converter, for example, both use Anagram Technologies' Adaptive Time Filtering (ATF) 24/192 upsampler, a small device that resides inside their chassis.
But aside from this difference, many people claim the technique is still simply oversampling. Others, meanwhile, feel that regardless of the nomenclature, upsampling still improves CD-player sound quality to the point that it can finally be equal with the highest fidelity.
Oversampling versus upsampling
To end the confusion, we turn to Thierry Heeb, a founder of the pioneering Swiss firm Anagram Technologies, which develops and manufactures advanced high-performance audio-signal-processing solutions, including the ATF 24/192 upsampler found in numerous CD players and DACs. Thierry clearly defines the two techniques: "Oversampling is an upsampling process where the ratio between output sampling frequency and input sampling frequency is an integer larger than 1. Upsampling is any kind of transformation providing an output sampling frequency that is higher than the input sampling frequency and not necessarily a[n integer] ratio."
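Thierry's definitions translate directly into a small test on the ratio of the two sampling rates. The sketch below is only an illustration of those definitions; the function name and the wording of the labels are assumptions.

```python
from fractions import Fraction

def classify(input_rate_hz, output_rate_hz):
    """Label a rate conversion using the definitions quoted above."""
    ratio = Fraction(output_rate_hz, input_rate_hz)   # exact ratio, reduced to lowest terms
    if ratio <= 1:
        return ratio, "not an increase in sampling frequency"
    if ratio.denominator == 1:
        return ratio, "upsampling with an integer ratio (oversampling)"
    return ratio, "upsampling with a non-integer ratio"

for out_rate in (352_800, 192_000):
    ratio, label = classify(44_100, out_rate)
    print(f"44.1kHz -> {out_rate / 1000}kHz: ratio {ratio}, {label}")
```

Going from 44.1kHz to 352.8kHz is a ratio of exactly 8, so it counts as oversampling; going to 192kHz is a ratio of 640/147, which only the broader term covers.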
Why then the debate? As Thierry explains, "The difference [between oversampling and upsampling] is, to my opinion, more related to the clocking and jitter problem as well as to the statistical distribution of errors or artifacts." Once again, this takes us full circle to the point that oversampling and upsampling occur in different parts of digital audio systems.
Clock jitter, also known as time-domain distortion, alters the time at which the digitized sound information reaches a certain point. The digital information is accurate but arrives slightly too soon or too late, which affects the quality of the sound. It occurs when the samples that are output are not synchronized with the input samples. Of course, today's technology prevents a lot of clock-jitter problems with CD players, but problems still arise when combining multiple digital-audio devices.
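The effect of a sample arriving slightly early or late can be simulated with a toy model. The sketch below compares a sine wave's value at the ideal instant with its value at a randomly shifted instant; the 1ns jitter figure, the uniform random timing errors, and the function name are all arbitrary assumptions for illustration, not measurements of any real player.

```python
import math
import random

def worst_jitter_error(freq_hz, max_jitter_s, sample_rate_hz=44_100,
                       num_samples=1000, seed=0):
    """Largest amplitude error caused by random timing errors on a unit sine wave."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    worst = 0.0
    for n in range(num_samples):
        t = n / sample_rate_hz                          # ideal sample instant
        dt = rng.uniform(-max_jitter_s, max_jitter_s)   # early (-) or late (+)
        ideal = math.sin(2 * math.pi * freq_hz * t)
        shifted = math.sin(2 * math.pi * freq_hz * (t + dt))
        worst = max(worst, abs(shifted - ideal))
    return worst

for tone_hz in (1_000, 10_000):
    print(tone_hz, "Hz tone, worst error:", worst_jitter_error(tone_hz, 1e-9))
```

In this model the same timing error does more damage to a high-frequency tone than to a low one, because the waveform is changing faster at the moment the sample is misplaced.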
A circuit called a phase-locked loop (PLL), for example, can be used to recover the clock signal when two digital devices are connected. The PLL tracks the frequency of the input, compensating for small variations from the center frequency at which that data is supposed to be entering. Therefore, the output from a well-designed PLL creates less clock jitter than its input even though it is not totally independent of that input clock.
"Even if both device clocks are at exactly the same frequency, they will almost certainly not be in phase," says Thierry. "Let's consider the clocking and jitter problem. In an oversampling system, the input sampling rate (Fs) is increased 8 times, and the output sampling rate is therefore 8 x Fs. This 8 x Fs clock is generated by an 8 x phase-locked loop based on Fs. So the two clocks are strongly linked and any imperfection appearing on the Fs clock will be reflected in the 8 x Fs clock. With upsampling we use unrelated clocks to drive the input and the output respectively. This means that even if the input clock is imperfect, the output clock will remain as precise as it is."
Thierry explains that while oversampling provides a digital signal with a clear spectrum up to 8 x Fs / 2, or the Nyquist limit, it does not address the problem of jitter in depth. Oversampling, he says, can allow the use of a "lighter output reconstruction filter with all the benefits it brings, but we are not isolated from clock imperfections that would pass on to the 8 x Fs clock." Upsampling, on the other hand, can help to overcome the clock imperfections.
As an example, Thierry turns to a scenario involving a digital-to-analog converter: "The digital input stream is PLL-ed and passed to the upsampler. The upsampler will output data at the rate given by the local output clock. Provided this one is of very high quality and the upsampler does good jitter rejection on the input clock, the D/A converters are clocked by a signal that is unrelated to the quality of the input clock. Moreover, having the output clock in the vicinity of the D/A chips ensures that those later will work in the best conditions. I think this jitter question is the main point of so-called sonic differences between oversampling and upsampling."
"The second point is a bit more mathematical," Thierry cautions. "If you use an oversampling of 8 x Fs, you remain synchronous to the input clock. Any artifacts introduced by the oversampling process will be time correlated with the input clock and thus appear at regularly spaced moments in time."
This means, Thierry explains, that by using an upsampling process (as described above, e.g., asynchronous with two separate clocks), the artifacts due to the upsampling process are likely to occur at any moment in time, not only at specific points. "This kind of spreads the artifacts in time, thus becoming less noticeable to the ear," he says.
"Basically if we lived in a world of perfect clocks and perfect hardware, then there would not be any sonic differences between oversampling and upsampling," Thierry concludes. "But we are in the real world with its imperfections, and as such, an upsampler may have an advantage over an oversampler."
Whether or not upsamplers will shake the foundations of the audio industry as profoundly as the original introduction of digital audio in 1982 is far from clear. More certain is the conclusion that those audiophiles in search of ever higher fidelity in sound reproduction continue to be rewarded by advances in digital technology. From filters to oversampling to upsampling, the gap between the mathematical "perfect world," whose potential we first glimpsed in 1982, and the "real world with its imperfections," is steadily decreasing.
Copyright © 2003 SoundStage!
All Rights Reserved