So in another thread, talking about differences in digital sources (and specifically brands of USB sticks vs one another), I was challenged to present reasoned evidence for my 'scepticism'. I foolishly promised I would, so here goes. I'll start with the basics and work my way up to the issues that are more directly relevant.
My background, by the way, is undergraduate and postgraduate engineering and mathematics, followed by a professional career in the digital domain (although not in audio). Let's just say my current job involves number crunching and fast computers (lots of them).
I'll concentrate on the following:
Myth #1: digital HiFi is at the forefront of digital technology and science doesn't understand and can't explain many of the subtleties involved
Myth #2: many of the limitations from the analog world can be translated and applied to digital audio; specifically, every component in the chain matters, down to USB sticks and hard drives
Obviously, if you want to validate my claims and/or read more, please Google the keywords below. I will try to stick to universally accepted terminology, and it should be reasonably clear when I'm quoting facts and when I'm stating subjective opinion.
So what is digital audio?
As you know, microphones use membranes which vibrate in response to sound and translate the oscillations into an electrical signal, ie variations in voltage, sent down a cable. This signal can be amplified and sent to a speaker, which turns the electrical signal back into sound waves by moving its membrane according to the incoming signal.
The electrical signals to and from microphones and speakers are said to be an "analog" of the sound; they are examples of "analog" signals.
There are several benefits in converting this information into digital format. I will explain some of these benefits further below, but let's look at how it all works first.
The standard representation of digital audio in computers, video and on CDs is called PCM (pulse-code modulation). The basic idea is simple. You feed the analog signal into an analog-to-digital converter (ADC). The ADC samples the analog signal at regular intervals and outputs a series of digital readings ("samples") of the signal. The word "digital", by the way, means discrete (discontinuous): each sample is a numeric representation of the strength (amplitude) of the signal at that point in time. This series of digital samples constitutes the digital audio data. A WAV file on your computer is essentially this (a short header followed by the samples). And because the information is described as numbers, it can be stored on digital media (hard drives, USB sticks, servers, CDs, DVDs, etc) and sent over digital communication channels (eg the Internet, your home wireless network, or the cable between your digital source and your DAC).
The functional opposite of the ADC is the digital-to-analog converter, aka the DAC, which takes linear PCM data and reproduces the original analog signal so that it can be fed to a pre-amplifier stage (and ultimately speakers).
There are 2 key factors to consider here:
a) the sampling period, ie the time spacing between samples. This is usually expressed as a frequency in Hz (ie the number of samples per second). The higher the sampling frequency, the more tightly spaced the samples and the higher the frequencies you can capture from the original analog signal. There is a theoretical upper limit to the frequencies that can be represented through sampling, the so-called Nyquist frequency, which is half the sampling frequency. So, for example, with 44.1kHz sampling, which is the rate used on CDs, you can theoretically capture and represent anything up to 22.05kHz. For perfect reconstruction, however, you need an ideal filter that passes all frequencies below the cut-off unchanged while completely suppressing all those above it (commonly called a brickwall filter). Unfortunately such a filter is unattainable both in practice and in theory, so a cut-off of 18-20kHz is more realistic, and of course it's no coincidence that 44.1kHz was chosen to cover roughly the frequency band that humans can hear. DACMagic owners will be familiar with these filters, by the way, as they have a couple to choose from. Usually, though, the filter and its parameters are a fixed part of the design.
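To make the Nyquist point concrete, here is a minimal Python sketch (the function names are my own, purely illustrative). It computes the Nyquist frequency for CD sampling and shows that a tone above it does not simply vanish: its samples are indistinguishable from those of a lower "alias" frequency, which is why the anti-aliasing and reconstruction filters matter.

```python
import math

def nyquist(fs_hz: float) -> float:
    """Highest frequency that can be represented at sample rate fs_hz."""
    return fs_hz / 2

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a pure tone of f_hz appears after sampling at fs_hz."""
    # Fold the input frequency back into the band [0, fs/2].
    return abs(f_hz - round(f_hz / fs_hz) * fs_hz)

FS = 44_100
print(nyquist(FS))                   # 22050.0
print(alias_frequency(15_000, FS))   # 15000 (below Nyquist, unchanged)
print(alias_frequency(25_000, FS))   # 19100 (above Nyquist, folds down)

# The samples of a 25 kHz tone match those of a phase-inverted 19.1 kHz tone.
tone_25k = [math.sin(2 * math.pi * 25_000 * n / FS) for n in range(8)]
tone_19k = [-math.sin(2 * math.pi * 19_100 * n / FS) for n in range(8)]
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(tone_25k, tone_19k)))  # True
```

Once sampled, there is no way to tell the two tones apart, so anything above the Nyquist frequency has to be filtered out before the ADC.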
b) the resolution (bit depth) of the individual samples. You will have heard CD described as 44.1kHz / 16-bit: the first figure is the sampling frequency, the second is the number of bits per sample, which determines how many possible values each sample can take. 16 bits give you 65,536 possible values, so the sample values in that example range from -32768 to +32767 (from the most negative to the most positive amplitude).
So what are the key differences between analog and digital representation? First of all, an analog signal has, in theory, infinite resolution; the digital representation does not. In practice, however, analog signals are subject to noise and are sensitive to small fluctuations. It is difficult to detect when such degradation occurs, and impossible to correct it when it does. A digital system of comparable performance is more complex, but with the benefit that degradation can not only be detected but corrected as well. Examples of this follow below.
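A small Python sketch of what 16-bit quantisation means in practice. Scaling a normalised amplitude by 32767 and clipping is one common convention, assumed here for illustration; real ADCs and software differ in the details (and usually add dither).

```python
BITS = 16
LEVELS = 2 ** BITS                 # 65,536 possible values
MIN_CODE = -(2 ** (BITS - 1))      # -32768
MAX_CODE = 2 ** (BITS - 1) - 1     # +32767

def quantize(x: float) -> int:
    """Map a normalised amplitude in [-1.0, 1.0] to a signed 16-bit code."""
    code = round(x * MAX_CODE)
    return max(MIN_CODE, min(MAX_CODE, code))   # clip to the valid range

print(LEVELS, MIN_CODE, MAX_CODE)                    # 65536 -32768 32767
print(quantize(0.5), quantize(-1.0), quantize(1.2))  # 16384 -32767 32767

# Rule of thumb for the theoretical dynamic range of an ideal N-bit quantiser
# driven by a full-scale sine wave: about 6.02 * N + 1.76 dB.
print(round(6.02 * BITS + 1.76, 1))  # 98.1
```

The last line is where the often-quoted "about 96-98 dB" dynamic range of CD comes from.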
So how do you store the data and what are the challenges?
Formats
There are many audio file formats, generally falling into 3 categories: uncompressed (eg WAV or raw PCM), losslessly compressed (eg FLAC or Apple Lossless), and lossy compressed (eg MP3, AAC, Vorbis). The format describes how the audio is organised in the file.
The most 'vanilla' format is raw PCM or WAV, which is not much more than the PCM data, sample by sample, organised in a long sequence.
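To show how thin the WAV "wrapper" really is, here is a sketch using Python's standard `wave` module. The 1 kHz test tone and the 10 ms length are arbitrary choices of mine. For plain 16-bit PCM the module writes a 44-byte header, and reading the file back returns exactly the PCM bytes that went in.

```python
import io
import math
import struct
import wave

FS = 44_100
N = 441  # 10 ms of mono audio

# Build 10 ms of a 1 kHz sine as signed 16-bit samples.
samples = [round(10_000 * math.sin(2 * math.pi * 1_000 * n / FS)) for n in range(N)]
pcm = struct.pack(f"<{N}h", *samples)   # little-endian, 2 bytes per sample

# Wrap the raw PCM in a WAV container (mono, 16-bit, 44.1 kHz), in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(FS)
    w.writeframes(pcm)
wav_bytes = buf.getvalue()

print(len(pcm), len(wav_bytes))   # 882 926: the samples plus a 44-byte header

# Reading the file back yields exactly the same PCM bytes.
with wave.open(io.BytesIO(wav_bytes), "rb") as r:
    print(r.readframes(r.getnframes()) == pcm)  # True
```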
FLAC is getting a lot of attention right now. It uses linear prediction to convert the audio samples into a series of small, largely uncorrelated numbers (the prediction residual) which can be stored efficiently, and it typically reduces the overall size to roughly half that of the raw PCM data, depending on the music. The encoding is completely reversible, ie the data can be decompressed into a copy identical to the original, hence the term lossless.
MP3 and other lossy compression methods generally build on principles developed by the French mathematician Fourier in the early 19th century. He showed that a periodic signal can be expressed as a sum of simple oscillating (sine and cosine) functions, and he provided methods and equations for doing so. Although audio was not what he had in mind, his theory applies very well to it: think of it as sound being broken down into its component frequencies. As it happens, if you store the audio signal as the amplitudes of its frequency components rather than sample by sample as described above, the size can be reduced dramatically (this is a little simplified, but you get the idea). The 'lossy' part comes primarily from the fact that these formats also use psychoacoustic models to throw away sound components that (allegedly) can't be heard. This type of processing is easier once the sound wave has been broken down into its components.
Any digital audio format that will be sent to a DAC has to be converted back into raw PCM first. This is usually done in real time by the playback device or software en route to the DAC. The DAC does not know whether the audio was previously compressed or not, and with lossless compression it should make no difference anyway, unless the software or music server is doing something strange while decoding the samples and streaming them to the DAC.
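Here is a much simplified Python sketch of the prediction idea. FLAC offers several fixed polynomial predictors as well as general LPC, followed by Rice coding of the residual; this sketch shows only an order-2 fixed predictor and skips the entropy-coding stage entirely, so treat it as an illustration rather than FLAC's actual encoder. The function names are hypothetical.

```python
import math

def encode_order2(x: list[int]) -> tuple[list[int], list[int]]:
    """Split samples into 2 warm-up samples plus order-2 prediction residuals."""
    warmup = x[:2]
    # Predict each sample by extending the straight line through the previous two.
    residual = [x[n] - (2 * x[n - 1] - x[n - 2]) for n in range(2, len(x))]
    return warmup, residual

def decode_order2(warmup: list[int], residual: list[int]) -> list[int]:
    """Invert encode_order2 exactly (pure integer arithmetic, nothing is lost)."""
    x = list(warmup)
    for e in residual:
        x.append(e + 2 * x[-1] - x[-2])
    return x

FS = 44_100
signal = [round(10_000 * math.sin(2 * math.pi * 440 * n / FS)) for n in range(1_000)]
warmup, residual = encode_order2(signal)

print(max(abs(s) for s in signal))     # close to 10000
print(max(abs(e) for e in residual))   # a few tens, so far fewer bits are needed
print(decode_order2(warmup, residual) == signal)  # True
```

The residuals are much smaller than the samples, so they can be stored in fewer bits, yet decoding reproduces every sample bit for bit.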
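A minimal Python sketch of Fourier decomposition, using a naive discrete Fourier transform (DFT). MP3 and AAC actually use a related transform (the MDCT) combined with a psychoacoustic model; this only illustrates the principle that a mix of tones turns into a few large frequency coefficients. The two test tones are my own arbitrary choice.

```python
import cmath
import math

def dft_magnitudes(x: list[float]) -> list[float]:
    """Naive discrete Fourier transform; returns |X[k]| for k = 0..N-1."""
    n_total = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total) for n in range(n_total)))
        for k in range(n_total)
    ]

N = 64
# A "chord" of two pure tones: 5 cycles and 12 cycles per 64 samples.
x = [math.sin(2 * math.pi * 5 * n / N) + 0.5 * math.sin(2 * math.pi * 12 * n / N)
     for n in range(N)]

mags = dft_magnitudes(x)[: N // 2]     # for real input, the upper half mirrors these
peaks = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(peaks))                   # [5, 12]
print(round(mags[5]), round(mags[12])) # 32 16
```

Sixty-four samples collapse into two significant coefficients (every other bin is essentially zero), which is the intuition behind storing frequency components instead of samples.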
Storage and transmission, S/PDIF and USB
As touched upon earlier, one of the clever aspects of digital systems is the ability to detect and correct errors in stored and transmitted data. There is a multitude of techniques to ensure that data is transmitted without errors, even across unreliable media or networks. The scientific discipline concerned with these problems is called 'information theory'; its father is the American mathematician Claude Shannon, whose landmark 1948 paper established the discipline and brought it to worldwide attention.
Even today there are 2 basic ways to design an error-control system:
* Automatic repeat-request (ARQ): The transmitter sends the data and also an error detection code, which the receiver uses to check for errors, and requests retransmission of erroneous data.
* Forward error correction (FEC): The transmitter encodes the data with an error-correcting code (ECC) and sends the coded message. The receiver never sends any messages back to the transmitter. The receiver decodes what it receives into the "most likely" data. The codes are designed so that it would take an "unreasonable" amount of noise to trick the receiver into misinterpreting the data.
It is possible to combine the two, so that minor errors are corrected without retransmission, and major errors are detected and a retransmission requested. Incidentally, most wireless communication is built like this, because without FEC it would often suffer packet error rates close to 100%, and with ARQ alone it would achieve very low goodput (useful throughput).
All error detection codes transmit more bits than were in the original data. Typically the transmitter sends a fixed number of original data bits, followed by a fixed number of check bits which are derived from the data by some deterministic algorithm. The receiver applies the same algorithm to the received data bits and compares its output to the received check bits; if the values do not match, an error has occurred at some point during the transmission.
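The ARQ approach can be sketched in a few lines of Python. Assumptions, all mine: a toy channel that flips bits at random, CRC-32 (from the standard `zlib` module) as the error detection code, simple stop-and-wait retransmission, and an error-free reverse path for the ACK/NAK. Real protocols are considerably more elaborate.

```python
import random
import zlib

def noisy_channel(frame: bytes, rng: random.Random, flip_prob: float) -> bytes:
    """Toy channel: flip each bit independently with probability flip_prob."""
    out = bytearray(frame)
    for i in range(len(out) * 8):
        if rng.random() < flip_prob:
            out[i // 8] ^= 1 << (i % 8)
    return bytes(out)

def send_with_arq(payload: bytes, rng: random.Random, flip_prob: float) -> tuple[bytes, int]:
    """Stop-and-wait ARQ: resend the frame until the receiver's CRC-32 check passes."""
    attempts = 0
    while True:
        attempts += 1
        # Transmitter appends a 4-byte CRC-32 as the error detection code.
        frame = payload + zlib.crc32(payload).to_bytes(4, "big")
        received = noisy_channel(frame, rng, flip_prob)
        data, check = received[:-4], received[-4:]
        # Receiver recomputes the CRC: a match acts as an ACK, a mismatch as a NAK.
        if zlib.crc32(data).to_bytes(4, "big") == check:
            return data, attempts

rng = random.Random(42)
message = b"44.1kHz/16-bit PCM frame"
delivered, attempts = send_with_arq(message, rng, flip_prob=0.005)
print(delivered == message, attempts)
```

Even on a channel where most frames arrive damaged, the receiver ends up with an exact copy; the cost is extra attempts (lower goodput), not corrupted data.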
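The simplest possible example of check bits is a single even-parity bit. The CRCs used in practice are far stronger, but the principle is the same: the receiver re-applies the transmitter's rule and compares. This Python sketch (hypothetical function names) also shows the limitation of such a weak code: two flipped bits cancel out and go unnoticed.

```python
def add_even_parity(bits: list[int]) -> list[int]:
    """Append one check bit so that the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(word: list[int]) -> bool:
    """Receiver re-applies the same rule: an odd count of 1s means an error."""
    return sum(word) % 2 == 0

data = [1, 0, 1, 1, 0, 0, 1]
word = add_even_parity(data)          # [1, 0, 1, 1, 0, 0, 1, 0]
print(parity_ok(word))                # True

one_flip = word.copy()
one_flip[2] ^= 1
print(parity_ok(one_flip))            # False: single-bit error detected

two_flips = word.copy()
two_flips[2] ^= 1
two_flips[5] ^= 1
print(parity_ok(two_flips))           # True: a double error slips through
```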
Error-correcting codes are redundant data that is added to the message on the sender side. If the number of errors is within the capability of the code being used, the receiver can use the extra information to discover the locations of the errors and correct them.
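The textbook example of an error-correcting code is Hamming(7,4): 3 check bits protect 4 data bits, and the pattern of failed checks (the "syndrome") points directly at a single flipped bit. CDs and hard drives use much stronger codes (Reed-Solomon and others), so this Python sketch is only meant to show the principle.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    c = c.copy()
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4   # 0 means "no error detected"
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
for pos in range(7):                   # corrupt each bit position in turn
    corrupted = codeword.copy()
    corrupted[pos] ^= 1
    assert hamming74_decode(corrupted) == data
print("all single-bit errors corrected")
```

No message goes back to the sender: the receiver repairs the data on its own, which is why this approach suits one-way links and storage.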
Error-correcting codes are suitable for simplex communication, ie communication that occurs in one direction only, such as broadcasting. They are also used in computer data storage, for example on CDs, DVDs and hard drives.
S/PDIF for digital audio is simplex (one-way). As far as I know it has no error correction and only very rudimentary error detection, but I couldn't find much information on this when I searched the net. Maybe someone can fill in? What is clear is that the standard was designed nearly 30 years ago, in a way that made it easy (read: cheap) for manufacturers to implement. It's a shame that it is still the prevailing standard, because its weaknesses, although relatively easy to overcome today, will continue to fuel marketing hype and digital audio cable debates...
So let me try to put this into perspective. The Ethernet standard for computer networks has been around since the 70s. The original experimental version ran at roughly twice the data rate of CD audio, while today's Gigabit Ethernet is capable of roughly 700 times the speed needed to stream CD audio. Every Ethernet frame carries a checksum (a CRC) so the receiver can detect corruption, the link is duplex, ie data flows in both directions, and protocols layered on top of it, such as TCP, make sure lost or damaged data is re-sent. Your computer almost certainly contains an Ethernet chip already, but otherwise an Ethernet card can be bought for about £5. An Ethernet cable costs a couple of £ and there is no need to pay any more, because errors will be detected and corrected if and when they occur, and in any case the error rates on a cheapo cable are very low. To my mind, this casts significant doubt on the value of spending tens or even hundreds of £ on a digital cable for S/PDIF, but I am aware this is a highly controversial topic...
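The arithmetic behind those ratios, as a quick Python sketch. It uses nominal link rates and ignores protocol overhead, so the ratios are upper bounds rather than usable throughput.

```python
SAMPLE_RATE = 44_100      # samples per second per channel
BIT_DEPTH = 16            # bits per sample
CHANNELS = 2              # stereo

cd_bps = SAMPLE_RATE * BIT_DEPTH * CHANNELS
print(cd_bps)             # 1411200, ie about 1.4 Mbit/s

# Nominal link rates in bits per second (protocol overhead ignored).
links = [("experimental Ethernet", 2.94e6),
         ("Fast Ethernet", 100e6),
         ("Gigabit Ethernet", 1e9)]
for name, link_bps in links:
    print(f"{name}: {link_bps / cd_bps:.1f}x CD audio")
```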
So then along came DACs with USB interfaces. I for one was excited about this, because the USB protocol has robust error detection and retransmission mechanisms, so I was hoping it would be the end of over-priced digital audio components and cables. I was therefore all the more disappointed to learn that USB also offers an 'isochronous' transfer mode, which is one-way: it still offers error detection, but no retry or guarantee of delivery, because no handshake packets are sent back as part of an isochronous transfer. This is the transfer mode that DACs typically use. Again, disappointingly, it simplifies the design of the DAC but has the inherent weakness that data integrity is not guaranteed, very much like S/PDIF.
Fortunately, new DACs are starting to arrive on the market that use the standard 'Bulk' USB transfer mode. Implemented properly, this should eliminate any theoretical chance of transfer errors, jitter, etc. Specifically, the DAC can have the host computer re-send any packets that arrived with errors and, separately, use its own internal clock to drive the digital-to-analog conversion, which means no jitter (or, at least, no jitter that has anything to do with the way the data is streamed into the DAC).
You can compare this with a USB printer. Clearly you don't expect your printed document to come out with garbled words or spelling mistakes caused by USB transfer errors. And you won't get any, because the protocol corrects them. Also, the printer prints at its own rate, and the computer is not aware of (and doesn't care) exactly how quickly the ink is sprayed onto the paper. Instead, the printer asks the computer for data when it needs it and reads ahead into local memory, so that it always has enough data available to keep printing.
Wavelength, Ayre and dCS make such DAC products today, but I'm hoping this will become standard practice. There is no reason such a DAC has to cost £1000 or more; remember that you can buy a USB printer for £25 that already implements all of this.
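A toy Python model of the "receiver asks for data" idea. This is not USB or any real DAC firmware: the packet size, watermark and 0-3 tick host latency are assumptions of mine, and `simulate_pull_dac` is a hypothetical name. The DAC's own clock consumes one sample per tick; the host's timing only affects when data arrives, not when it is played, and as long as requests go out early enough the output is identical to the source with no drop-outs.

```python
import random
from collections import deque

def simulate_pull_dac(source: list[int], packet: int = 8, watermark: int = 24,
                      seed: int = 1) -> tuple[list[int], int]:
    """Toy receiver-clocked DAC that requests data from the host when it runs low.

    Returns the samples played and the number of ticks, after playback started,
    on which the FIFO was empty (an underrun, ie an audible drop-out).
    """
    rng = random.Random(seed)
    fifo: deque[int] = deque()
    in_flight: deque[tuple[int, list[int]]] = deque()  # (arrival_tick, samples)
    next_index, tick, underruns = 0, 0, 0
    played: list[int] = []
    started = False
    while len(played) < len(source):
        # 1. Deliver packets whose random host latency has elapsed, in request order.
        while in_flight and in_flight[0][0] <= tick:
            fifo.extend(in_flight.popleft()[1])
        # 2. The receiver requests more whenever buffered + requested data is low.
        queued = len(fifo) + sum(len(s) for _, s in in_flight)
        while queued < watermark and next_index < len(source):
            chunk = source[next_index:next_index + packet]
            next_index += len(chunk)
            in_flight.append((tick + rng.randint(0, 3), chunk))
            queued += len(chunk)
        # 3. Pre-fill once, then the DAC's own clock takes exactly one sample per tick.
        started = started or len(fifo) >= min(watermark, len(source))
        if started:
            if fifo:
                played.append(fifo.popleft())
            else:
                underruns += 1
        tick += 1
    return played, underruns

source = list(range(1_000))
played, underruns = simulate_pull_dac(source)
print(played == source, underruns)   # True 0
```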
So finally -- do USB sticks and hard drives make a difference?
I doubt it, and there are several reasons.
First of all, both use error-detecting and error-correcting codes, so there is no loss of data unless there is a catastrophic failure of the device (which would very much be noticed).
Second, hard drives and USB sticks use filesystems to store and organise the files and data they contain. There are many flavours of filesystem around, but typically it will be one of NTFS (Windows), HFS/HFS+ (Mac), Ext (Linux) or FAT/FAT32 (from MS-DOS, but still common on USB sticks and portable hard drives because of its simplicity and portability). The filesystem adds another layer of organisation, and in some cases integrity checking, on top of the raw data.
Thirdly, hard drives and USB sticks read and write data in 'sectors'. For a given file on your device, the filesystem contains tables and references that tell you which sectors to read to fetch your data, and the file may be split into many parts in various locations on the drive. There is no concept of 'streaming' data at a fixed rate from a hard drive or a USB stick: it will simply read and return the sector(s) you ask for, and the speed of this will vary depending on several factors, including the capabilities of the drive and (for a spinning hard drive) whether the data is at the beginning or the end of the disk.
In order to 'stream' audio data at a fixed rate from a hard drive or a USB stick, you must load the required data into a buffer and stream data out of that buffer at a fixed rate. As far as the hard drive or the USB stick is concerned, all that matters is that it can read data quickly enough for the buffer never to become empty. If that happened there would be a drop-out in the sound, not subtle differences.
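A toy Python model of how a filesystem finds a fragmented file, loosely inspired by FAT's chain of table entries (real FAT works in clusters, keeps its table on disk, and so on; the tiny sector size and function names here are illustrative assumptions). However the file is scattered across the "drive", following the chain gives back exactly the same bytes.

```python
import random
from typing import Optional

SECTOR_SIZE = 4  # bytes per sector; tiny so the example stays readable

def write_file(disk: dict[int, bytes], fat: dict[int, Optional[int]],
               data: bytes, free: list[int]) -> int:
    """Store data in whatever free sectors are available; return the first sector."""
    chunks = [data[i:i + SECTOR_SIZE] for i in range(0, len(data), SECTOR_SIZE)]
    sectors = [free.pop() for _ in chunks]
    for i, (sector, chunk) in enumerate(zip(sectors, chunks)):
        disk[sector] = chunk
        # Each table entry points at the next sector of the file; None ends the chain.
        fat[sector] = sectors[i + 1] if i + 1 < len(sectors) else None
    return sectors[0]

def read_file(disk: dict[int, bytes], fat: dict[int, Optional[int]], first: int) -> bytes:
    """Follow the chain of sector numbers and concatenate the pieces."""
    out, sector = bytearray(), first
    while sector is not None:
        out += disk[sector]
        sector = fat[sector]
    return bytes(out)

track = b"PCM samples, byte for byte"
for seed in (1, 2, 3):
    free = list(range(100))
    random.Random(seed).shuffle(free)   # a differently fragmented "drive" each time
    disk, fat = {}, {}
    first = write_file(disk, fat, track, free)
    print(read_file(disk, fat, first) == track)   # True each time
```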
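A rough Python simulation of the buffering argument. The numbers (16 KiB reads, a 256 KiB buffer, the read-time ranges) are made-up assumptions, not measurements of any real drive. Playback drains the buffer at the CD rate every millisecond while the "drive" refills it in chunks that take a variable time. A drive that is fast on average never lets the buffer run dry, however erratic its timing; one that is too slow empties it, which would be a drop-out rather than a subtle change in sound.

```python
import random

BYTES_PER_MS = 176.4   # CD audio: 44,100 samples/s x 2 channels x 2 bytes

def min_buffer_level(read_ms: tuple[int, int], chunk: int = 16_384,
                     capacity: int = 262_144, seconds: int = 60,
                     seed: int = 0) -> float:
    """Lowest buffer fill level (bytes) seen while playing from a toy drive.

    Each chunk read takes a random time within read_ms milliseconds. A result
    of zero or below means the buffer ran dry at some point (a drop-out).
    """
    rng = random.Random(seed)
    level = float(capacity)          # start with a full buffer (pre-buffering)
    lowest = level
    next_read_done = None            # ms at which the read in progress completes
    for ms in range(seconds * 1_000):
        if next_read_done is None and level + chunk <= capacity:
            next_read_done = ms + rng.randint(*read_ms)   # start a new read
        if next_read_done is not None and ms >= next_read_done:
            level += chunk
            next_read_done = None
        level -= BYTES_PER_MS        # playback at a fixed rate
        lowest = min(lowest, level)
    return lowest

print(min_buffer_level((1, 20)) > 0)     # True: erratic but fast enough
print(min_buffer_level((100, 200)) > 0)  # False: too slow, the buffer empties
```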
In light of all this, it is hard to see how a USB stick from Sony can 'sound' different to one from Kingston.
Sorry for the long post, but I hope that addresses some of the issues...