I am not going to contest that digital (even in theory) is an approximation, primarily because it doesn't satisfy the criteria of the sampling theorem. The sampling theorem only holds for band-limited signals; one wouldn't know where to put the anti-aliasing/imaging filters if the signal were not band-limited. A 5-minute song is time limited => not band limited.
The fact also has it that "the loss" in the digital medium is a ONE-TIME phenomenon. After a signal has been approximated, it never changes on the storage medium.
My conclusion is that digital suffers from a single (and predictable) flaw (loss in rounding/approximation), but analog suffers from a number of flaws.
From my reading of responses on this thread (I've just quoted two but there are others), I feel there may be confusion among us about where in the entire sampling process the "loss" in signal information actually occurs. So I will try to explain my understanding of what the sampling theorem is, in an effort to reconcile these differences, and learn something myself in the process.
The Nyquist-Shannon sampling theorem states (for a sine wave of frequency f) that if you sample this signal (i.e., pick points (samples) off this signal at periodic time instants) at a rate greater than twice the signal frequency f, you will lose NO information about the original signal and will be able to reconstruct it faithfully from ONLY the samples you have and knowledge of the sampling rate. The details of the theorem also describe the process of reconstruction, i.e., the mathematical operations needed on the samples to obtain the original sine wave in the time domain. There is NO ambiguity about this; the mathematics are well-defined and consistent.
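To make this concrete, here is a minimal numerical sketch (Python/NumPy; the 3 Hz tone, 8 Hz rate and time window are toy values of my own choosing, not anything from a real audio chain) of sampling a sine above its Nyquist rate and rebuilding values between the samples with the Whittaker-Shannon interpolation formula:

```python
import numpy as np

# Toy demonstration (not audio code): sample a 3 Hz sine at 8 Hz, i.e.
# above its Nyquist rate of 6 Hz, then rebuild the signal BETWEEN the
# samples with the interpolation formula x(t) = sum_n x[n]*sinc(fs*t - n).
f, fs = 3.0, 8.0
n = np.arange(0, 256)                      # sample indices (32 s of signal)
samples = np.sin(2 * np.pi * f * n / fs)

t = np.linspace(10, 20, 500)               # dense time grid, away from the edges
recon = np.array([np.sum(samples * np.sinc(fs * ti - n)) for ti in t])

err = np.max(np.abs(recon - np.sin(2 * np.pi * f * t)))
print(f"max reconstruction error: {err:.1e}")
# The residual error comes only from truncating the formula's infinite
# sum to 256 samples; it shrinks as the sampled span grows.
```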
The theorem extends to any arbitrary signal that is BAND-LIMITED, i.e., has information up to a maximum frequency f_m and no information beyond this frequency. This works because Fourier proved that any periodic signal can be composed by superimposing sine and cosine signals at integer multiples of a fundamental frequency (called harmonics). For non-periodic signals, the Fourier Transform is the limiting case of the Fourier series; both are mathematically precise and consistently defined. Hence, an arbitrary signal whose spectrum extends to infinite frequency needs sine/cos components at arbitrarily high frequencies to create it, but a band-limited signal needs only components confined to a finite frequency range. And that is the "trick" behind why we can recreate a band-limited (periodic or non-periodic) analog signal from its samples alone.
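As a toy illustration (NumPy again, with made-up numbers), here is a band-limited approximation of a square wave built from a handful of harmonics; because its highest component is known, a finite sample rate suffices to capture it:

```python
import numpy as np

# Fourier-series sketch: a true square wave needs infinitely many odd
# harmonics, but truncating the series gives a BAND-LIMITED signal.
# f0 and the harmonic count are arbitrary choices for illustration.
t = np.linspace(0, 1, 2000, endpoint=False)
f0 = 5.0
harmonics = [1, 3, 5, 7, 9]                 # nothing above 9*f0 = 45 Hz
band_limited = sum((4 / (np.pi * k)) * np.sin(2 * np.pi * k * f0 * t)
                   for k in harmonics)
# Since band_limited has no content above 45 Hz, sampling it at any rate
# above 90 Hz captures it exactly, per the theorem.
```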
In essence, the knowledge that there is no information beyond a certain frequency (the "frequency domain" perspective) "compensates for" the lack of information between the samples (the "time domain" perspective). This statement is a very loose condensation of the precise mathematics of the theorem, but it is reasonably apt for someone who wants to intuitively understand where the infinity of one representation (the analog signal) is being accounted for/balanced in the other representation (the frequency spectrum of the signal).
So, if we assume that audio signals for music are band-limited to 20 Hz - 20 kHz (relying on tests and measurements from various studies/labs/people showing that humans cannot hear beyond 20 kHz, even with golden ears or implants), then the sampling theorem IS completely applicable to audio signals in the real world: the Nyquist rate is 2 x 20 kHz = 40 kHz, and CD audio's 44.1 kHz clears it with a small guard band. With my description so far, there has been no approximation encountered in the sampling process.
So where are all the places that approximations to the theorem manifest themselves in a real-world sampling operation on actual audio signals? The first place is the operation of sampling itself. The theorem assumes that the sampler is modeled by a Dirac delta function (a pulse of infinitesimal duration). This function is a mathematical entity and cannot exist in the real world in its ideal form. However, we can and have come extremely close to it, because today's electronics can create pulses nanoseconds to microseconds in duration, which is "good enough" for sampling operations. This approximation does not have a measurable detrimental impact on the samples themselves. In fact, such approximations (creating a signal of very small duration or magnitude with respect to the rest of the system to approximate the ideal infinitesimal signal) are made all the time in microelectronics, including assuming the transistor base current I_b to be zero in the small-signal transistor models that many engineers still use as a first phase of design.
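A rough numerical check of how benign this approximation is (the 20 kHz tone, 44.1 kHz rate and 1-microsecond aperture below are assumptions for illustration, not measurements from any real converter): averaging over a tiny window instead of sampling at an instant barely moves the samples.

```python
import numpy as np

# Compare ideal point sampling against averaging the signal over a
# 1 us aperture around each sample instant (a crude model of a
# finite-width sampling pulse). Values are illustrative assumptions.
f = 20_000.0                      # worst case: a 20 kHz tone
fs = 44_100.0
aperture = 1e-6                   # 1 microsecond sampling window
t_n = np.arange(100) / fs

ideal = np.sin(2 * np.pi * f * t_n)
offsets = np.linspace(-aperture / 2, aperture / 2, 50)
windowed = np.mean(np.sin(2 * np.pi * f * (t_n[:, None] + offsets)), axis=1)

# Deviation is on the order of 1e-4 of full scale: negligible.
print(f"max deviation from ideal sampling: {np.max(np.abs(ideal - windowed)):.1e}")
```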
The next approximation is more significant, because it does have an impact on the fidelity of the information captured in the samples: any digital process has finite word length. What this means is that any data captured in any digital device (including the computer, which is based on binary, and hence digital, logic) can be represented by only a certain number of electronic states. In the computer, these are the bits contained in each "word" of information in memory/storage. This is where the 16-bit word of CD-based audio comes from. And no finite number of bits can represent the continuum of real-number values that the mathematics of the sampling theorem assumes. So there is definitely a loss of information in EACH sample about the precise magnitude/value of the original signal at the instant the sample was taken.
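Here is a small sketch of that rounding loss (the 1 kHz test tone and its amplitude are arbitrary choices of mine):

```python
import numpy as np

# Round samples to 16-bit integers and measure the rounding error.
fs = 44_100
t = np.arange(fs) / fs
x = 0.8 * np.sin(2 * np.pi * 1000 * t)      # amplitude within [-1, 1]

q = np.round(x * 32767).astype(np.int16)    # 16-bit quantization
x_back = q / 32767.0                        # back to float for comparison

err = np.max(np.abs(x - x_back))
print(f"max rounding error: {err:.2e}")     # about half an LSB
print(f"LSB size:           {1 / 32767:.2e}")
# This per-sample rounding is the one-time, bounded loss discussed above.
```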
Does this cause a significant degradation in the fidelity of the sampled representation? It depends on whom you speak to. Some folks think 16 bits is enough to capture the dynamic range of today's music; others feel 24 or even 32 bits should be the standard. However, don't be fooled into thinking that capturing the audio signal with an analog process is free from this loss of fidelity either. As Thad, ReignofChaos, ThatGuy, Ranjeetrain, and others have admirably pointed out the deficiencies of the analog capture process, I will just reiterate one example: the grooves in an LP are of finite depth and width too! Which means, again, that the precise magnitude of the original audio signal is being recast into the range of values representable by the finite dimensions of the vinyl groove. Both processes approximate the original signal, one in terms of the bit depth of the samples captured, the other in terms of the physical dimensions of the LP grooves or the resolution of the magnetic particle density on the master analog tape. Summary message: the act of recording is itself an approximating process, whether analog or digital.
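For reference, the usual back-of-envelope figure of about 6 dB of dynamic range per bit can be computed directly (this ignores dither and noise shaping, which refine the picture in practice):

```python
import math

# Dynamic range of an N-bit representation: 20*log10(2^N) ~= 6.02*N dB.
for bits in (16, 24, 32):
    print(f"{bits}-bit: {20 * math.log10(2 ** bits):.1f} dB")
# 16-bit: 96.3 dB, 24-bit: 144.5 dB, 32-bit: 192.7 dB
```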
Once the signal is stored in the computer as a series of bits (now with reduced fidelity because of the limited word length per sample), this stored signal can be used to completely and faithfully reproduce the original analog signal, according to the sampling theorem (and in reality), accepting again that the magnitude of the output signal may not be the exact value of the original (to repeat: because of the finite word-length representation). At this point, except for the "rounding error" (which is what I assume Ranjeetrain means in his post?), this signal contains ALL the information in the original analog signal. There is NO further loss in this representation. I am repeating the same statement to drive home the point, so apologies to those who've already grasped it :|
So how do we recover the original signal from the sampled version? This is where the biggest approximation to the sampling theorem takes place, and it requires a bit of mathematics to understand completely. I will try to give an intuitive description; the mathematics are well described in any signal processing textbook (see sections on anti-aliasing and sampling reconstruction or similar... or use Google). The sampled signal, by the very nature of the sampling operation, has frequency components that contain the original signal's frequency components plus periodically repeating copies of them (spectral images), i.e., higher-frequency components that are an artifact of the sampling process. We don't want these reproduced in the reconstructed signal, since they are added (wrong) information. What we want is to extract only the frequency information contained in the original signal. And how do we know where to cut off the frequency?
That is why the sampling theorem assumes BAND-LIMITED input signals! We know that the original signal did not have any frequency content above f_m (the maximum frequency contained in the signal). Hence, if we design a low-pass filter that only allows frequencies up to f_m through (the reconstruction or anti-imaging filter; its counterpart placed before the sampler is the anti-aliasing filter ThatGuy mentioned), and pass the signal reconstructed from the samples through this filter, the output will contain only the frequency content of the original signal. And we have recovered exactly what we sampled, because the relation between the Fourier transform of a signal and the signal itself is one-to-one. In other words, given the frequency content of a signal, only that unique signal corresponds to it. So where is the problem?
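To see those images concretely, here is a sketch (NumPy/SciPy; the 1 kHz tone and the rates are arbitrary values picked only to keep the numbers readable) that zero-stuffs a sampled tone, a crude stand-in for an ideal impulse train, and then low-passes the images away:

```python
import numpy as np
from scipy import signal

fs, up = 8000, 4
n = np.arange(2048)
x = np.sin(2 * np.pi * 1000 * n / fs)      # 1 kHz tone sampled at 8 kHz

stuffed = np.zeros(len(x) * up)
stuffed[::up] = x                          # impulse train at the 32 kHz rate

spec = np.abs(np.fft.rfft(stuffed))
freqs = np.fft.rfftfreq(len(stuffed), d=1 / (fs * up))
print("strong components (Hz):", freqs[spec > 0.25 * spec.max()])
# -> 1000 Hz plus images at 7000, 9000 and 15000 Hz

# A low-pass filter with cutoff below fs/2 keeps only the original tone.
b, a = signal.butter(8, 1200, fs=fs * up)
clean = signal.filtfilt(b, a, stuffed)
```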
The problem is the filtering operation. A perfect low-pass filter (a so-called "brick-wall" filter) is again a mathematical idealisation; it is not possible for electrical devices to suddenly cut off frequencies after a certain maximum. That would be a discontinuous frequency response, unimplementable in its ideal form. As engineers, we have come very close to the ideal, just as with the Dirac delta approximation, but we have to be honest and say that this is an approximation to the mathematics of the theorem. How approximate? That depends on the implementation of the filter, and that is where different designers can work their magic and experience in filter design, noise reduction, et al. And as Asit pointed out a while back, "ringing" is a very real and problematic phenomenon at this stage as well. I am not going into the details of any of these, only because this post is trying to clearly mark the points where approximations enter the digital process, not to explore the intricacies of each approximation.
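A quick sketch of that trade-off (a windowed-sinc FIR; the 44.1 kHz / 20 kHz figures are the usual CD-audio numbers, the tap counts are arbitrary): more taps buy a steeper transition band, but the brick wall is never reached, and longer filters ring longer near transients.

```python
import numpy as np
from scipy import signal

fs, cutoff = 44_100, 20_000
for numtaps in (21, 201, 2001):
    taps = signal.firwin(numtaps, cutoff, fs=fs)   # windowed-sinc low-pass
    w, h = signal.freqz(taps, worN=8192, fs=fs)
    mag = np.abs(h)
    band = w[(mag < 0.99) & (mag > 0.01)]          # where the response falls
    print(f"{numtaps:5d} taps -> transition band ~ {band.max() - band.min():.0f} Hz")
# Longer filters are steeper in frequency but have longer impulse
# responses, hence more ringing near sharp transients (Gibbs phenomenon).
```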
Another way to think about the approximation is that the Inverse Fourier Transform of an ideal low-pass filter (frequency-domain perspective) is a sinc function (time-domain perspective). In the time domain, this sinc function has to be "convolved" (another mathematical operation) with the sampled signal to get back the original signal. So where is the problem? Well, the sinc function is again a mathematical idealisation: it has a main lobe and infinitely many side lobes that decay over time. Yes, infinite. And this is where we have traded the infinity neglected in the sampling process by reintroducing it in the reconstruction process. So, you see, the infinity of the real-number representation never goes away; we just juggle it from one stage to another. In any case, a real-world sinc signal is extremely difficult to create. I said the same thing a few sentences back when I said that an ideal low-pass filter is extremely difficult to implement: the time and frequency domain perspectives are just two ways of looking at the same thing.
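The same compromise viewed in the time domain (toy numbers again, reusing the 3 Hz / 8 Hz example from earlier): reconstruct with a sinc kernel cut off after a finite number of side lobes, and the error shrinks as more of the infinite tail is kept, but it never quite vanishes.

```python
import numpy as np

f, fs = 3.0, 8.0
n = np.arange(-400, 400)
samples = np.sin(2 * np.pi * f * n / fs)
t = np.linspace(-0.5, 0.5, 400)

for lobes in (8, 32, 128):
    keep = np.abs(n) <= lobes                # truncate the kernel support
    recon = np.array([np.sum(samples[keep] * np.sinc(fs * ti - n[keep]))
                      for ti in t])
    err = np.max(np.abs(recon - np.sin(2 * np.pi * f * t)))
    print(f"keeping +/-{lobes:3d} samples of sinc tail -> max error {err:.1e}")
# The error keeps decreasing as more side lobes are included, mirroring
# the ever-steeper (but never ideal) low-pass filter in frequency.
```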
I hope this post makes the approximations to the sampling theorem in the real-world digitisation process a little clearer. If any of my assumptions or explanations are inadvertently in error, I will be glad to understand and correct them, and also learn something new in the process.
-Jinx