MajorFubar said:
Well I'm probably into the realms of conjecture again, but I'd say a faster sample-rate is more important than a greater bit-depth. In analogue-recording terms it's like increasing the speed of the tape (well sort of), where bit-depth is more like increasing the track-width (well sort of).
Why should it make a difference? Firstly, the Nyquist rule isn't wrong, it's just that - for example - a 10kHz tone sampled at 44.1kHz has only four distinct samples per cycle. With clever filtering that's enough to get a decent approximation of the sound on replay, but it's still pretty obvious that a greater number of samples in the first instance will do a better job. And real music isn't anywhere near as simple as a solitary 10kHz sine-wave.
That's my take on it, anyway.
Major,
There are many fields in engineering where a layman's common sense will allow a good grasp of the fundamentals. Digital audio isn't one of those fields. Mental images of sine waves turned into little staircases by sampling, and the feeling that 'more samples are better', don't fully capture what is going on.
A 10kHz tone sampled at 44.1kHz will give a perfect rendition once it has been through the reconstruction filter, not a decent approximation. The idea that it is 'pretty obvious' a greater number of samples will do a better job is layman's common-sense conjecture getting in the way of the math.
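The claim about the 10kHz tone can be checked numerically. The following is a rough sketch of my own, not anything from a particular DAC: it samples a 10kHz sine at 44.1kHz and rebuilds values between the samples with Whittaker-Shannon (sinc) interpolation, which is the ideal reconstruction the sampling theorem describes. The record length `N`, the helper names and the choice of test points are all assumptions made for illustration. The small residual error comes from using a finite record rather than an infinite one, not from having 'too few' samples per cycle.

```python
import math

FS = 44_100.0   # sample rate in Hz
F0 = 10_000.0   # tone frequency in Hz
N = 4001        # length of the finite record (an arbitrary choice)


def sinc(x):
    """Normalised sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


# The stored samples of the 10kHz tone.
samples = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]


def reconstruct(t):
    """Value of the rebuilt waveform at time t (seconds) via sinc interpolation."""
    u = t * FS  # time expressed in sample periods
    return sum(s * sinc(u - n) for n, s in enumerate(samples))


def max_error(points=50):
    """Worst mismatch against the true sine at times that fall between samples."""
    mid = N // 2  # stay near the middle of the record, away from the ends
    worst = 0.0
    for k in range(points):
        t = (mid + k / 7.0 + 0.123) / FS
        true_value = math.sin(2 * math.pi * F0 * t)
        worst = max(worst, abs(reconstruct(t) - true_value))
    return worst


if __name__ == "__main__":
    print("max reconstruction error:", max_error())
```

Making `N` larger shrinks the error further, which is what you would expect if the samples already hold all the information and only the truncated sum is at fault.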
As you rightly point out, music isn't a 10kHz sine wave, but what it is (for the purposes of digital audio) is a complex signal that is band limited to 20kHz. When sampled at a frequency greater than 40kHz, all the information will be captured.
So, you now have a mental image of a sine wave with a little squiggle on the peak of one cycle - how does that get captured? That little squiggle can be thought of as being made up of a combination of sine waves of different frequencies - its frequency components. As the signal is band limited, that squiggle cannot have any frequency components above 20kHz - we know that because we have filtered them out. We also know that if we sample at more than twice the maximum frequency we can recreate all the component frequencies of the squiggle, and therefore the squiggle gets fully captured.
I think the problem comes from not mentally putting our imaginary spiky 'sampling won't get this signal right' waveform through an imaginary low pass filter before we do our imaginary sampling.
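To see why the low pass filter has to come first, here is a small sketch (the frequencies and function names are my own picks for illustration). A 30kHz component, which is above half the 44.1kHz sample rate, produces exactly the same sample values as an inverted 14.1kHz tone. Once sampled, the two cannot be told apart, which is why anything above the band limit has to be removed before sampling rather than 'captured' by it.

```python
import math

FS = 44_100.0            # sample rate in Hz
F_HIGH = 30_000.0        # component above FS / 2 that should have been filtered out
F_ALIAS = FS - F_HIGH    # 14.1kHz, where the unfiltered component lands


def sample(freq, n):
    """n-th sample of a unit sine at freq Hz, sampled at FS."""
    return math.sin(2 * math.pi * freq * n / FS)


def max_difference(count=1000):
    """Largest gap between the 30kHz samples and the inverted 14.1kHz samples."""
    return max(abs(sample(F_HIGH, n) + sample(F_ALIAS, n)) for n in range(count))


if __name__ == "__main__":
    print("largest difference:", max_difference())
```

The difference is zero apart from floating point rounding, so the sampler really does see the two tones as the same thing.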
Higher sampling frequency does have benefits - most notably it gives an easier time to the analogue filter stages after the DAC, but that is nothing to do with more samples doing a better job, just the realities of analogue filter design.
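As a rough illustration of the filter point, the sketch below estimates how steep an analogue filter would have to be at different sample rates. It uses the standard textbook formula for the minimum order of a Butterworth low pass filter, and the figures are assumptions of mine, not anything from a real design: pass 20kHz with no more than 1dB loss, and be 60dB down by the time the first image appears at the sample rate minus 20kHz. Real DACs also lean on digital oversampling, which this deliberately ignores.

```python
import math


def butterworth_order(fs, passband_hz=20_000.0, ripple_db=1.0, stop_db=60.0):
    """Minimum Butterworth order to fall from passband_hz to the first image.

    The first image of a signal band limited to passband_hz starts at
    fs - passband_hz, so that is taken as the stopband edge. The ripple_db
    and stop_db defaults are illustrative assumptions.
    """
    stopband_hz = fs - passband_hz
    numerator = math.log10((10 ** (stop_db / 10) - 1) / (10 ** (ripple_db / 10) - 1))
    denominator = 2 * math.log10(stopband_hz / passband_hz)
    return math.ceil(numerator / denominator)


if __name__ == "__main__":
    for fs in (44_100.0, 96_000.0, 192_000.0):
        print(f"{fs / 1000:g}kHz sampling: order {butterworth_order(fs)}")
```

The required order drops sharply as the sample rate rises, because the filter gets far more room between 20kHz and the first image. That is the 'easier time' for the analogue stage, and it has nothing to do with the extra samples describing the audio band any better.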