TL;DR: it’s best not to put too much stock in measurement graphs.
I’m with
@Gray here… if you knew how much the graph changes just from wiggling the position while the contact points of the pads don’t even move, or from a different-width measurement head, you would put far less faith in graphs. This is why good measurements require averaging several “seatings,” and dialing in a decent data set takes time.
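To make the “averaging of several seatings” point concrete, here’s a rough sketch (purely illustrative, not any reviewer’s actual method): simulate five reseatings of the same headphone, where each one shifts the measured curve a little, then average them and look at the spread.

```python
import numpy as np

rng = np.random.default_rng(42)
freqs = np.logspace(np.log10(20), np.log10(20000), 200)  # 20 Hz to 20 kHz

# Stand-in "true" response in dB SPL (made-up shape, just for illustration).
true_response = 90 + 5 * np.sin(np.log10(freqs))

# Each seating adds its own positional error, modeled here as random noise.
seatings = np.array([true_response + rng.normal(0, 1.5, freqs.size)
                     for _ in range(5)])

average = seatings.mean(axis=0)  # the curve a careful reviewer would publish
spread = seatings.std(axis=0)    # how much position alone moved the graph
```

The spread is the part single-seating graphs never show you: it tells you how much of the wiggle in any one curve is just placement.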
Furthermore, whether the rig is an expensive full head-and-torso simulator, a pair of ears on a stick, or just a straight tube with a mic on one end, a headphone will NEVER sound the same as it measures. Graphs may seem like a way for reviewers to add some objectivity and legitimacy to their opinions, but the big headphone manufacturers understand there are still inaccuracies and variable results even with full measurement rigs that cost well above $50k. If you buy a head-and-torso simulator, it comes with a factory-measured frequency response showing how that individual sample varies from other measurement heads off the same production line. So even the rigs themselves are not comparable to each other.

In fact, your own ears act as a filter that colors and “EQ’s” the sound of everything you hear, more uniquely than a fingerprint. Your “neutral” is different from someone else’s “neutral.” At best, reviewers disclose that frequencies above a certain pitch can no longer be accurately measured, but they don’t emphasize that point, since it would undercut the perception of legitimacy they are going for with graphs in the first place, and sometimes they gaslight the reader by saying the reader won’t know how to interpret the graphs anyway.
Many measurement aficionados apply a compensation curve that is their “best guess” at how to change a graph from what was measured to what they feel they heard. Since everyone’s guess (and target curve) is a bit different, this is one more reason measurements done by different entities aren’t comparable to each other.
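The mechanics of compensation are simple, which is exactly why the choice of target matters so much. A hypothetical sketch (both targets are made up for illustration): “compensating” means subtracting a target curve from the raw measurement, both in dB, so two different targets turn the SAME raw data into two different-looking graphs.

```python
import numpy as np

freqs = np.logspace(np.log10(20), np.log10(20000), 100)
raw = 90 + 3 * np.sin(freqs / 2000)          # stand-in raw measurement (dB)

target_a = 90 + 6 * np.log10(freqs / 1000)   # made-up tilted target curve
target_b = np.full_like(freqs, 90.0)         # made-up flat target curve

# Compensation is just subtraction in dB; the shape of the result depends
# entirely on which target you picked.
compensated_a = raw - target_a
compensated_b = raw - target_b
```

Identical measurement, different compensation, different-looking “result” — which is why curves compensated against different targets can’t be compared side by side.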
Lastly… measurements can easily be influenced and manipulated. Put a rubber band around the headband and earcups to increase clamp pressure, and the headphone measures more linear and usually gains better bass extension. If multiple measurements are taken, a reviewer can select the ones that back up the opinion they want to push in a review. Sometimes headphones are tested at unrealistically high volumes, outside their intended use, just to show them “failing” in some way. Comparisons can also be visually misleading, because the grapher chooses how to overlay and scale two curves to emphasize or de-emphasize differences. And reviewers who use graphs rarely give equal weight to brain burn-in… our amazing ability to get used to sounds and compensate for them so they seem more “natural” over time.
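The overlay trick is just arithmetic. A toy sketch (the 3 dB gap is hypothetical): the same fixed difference between two curves fills more or less of the plot depending on the y-axis range the grapher chooses, so the “same” comparison can look dramatic or negligible.

```python
difference_db = 3.0  # hypothetical gap between two headphones at some pitch

# The visual size of the gap is just its fraction of the chosen axis range.
for axis_range_db in (10, 40, 100):
    fraction = difference_db / axis_range_db
    print(f"{axis_range_db:>3} dB axis: gap fills {fraction:.1%} of plot height")
```

A zoomed-in 10 dB axis makes the gap look three times larger than a 30 dB one, with no change to the data at all.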
Don’t prostrate yourself at the altar of graphs. At best, they’re an imprecise science and a crutch for teasing apart minute differences; at worst, they’re tools used to mislead and support a narrative. A graph only measures amplitude at a given pitch, and far too much conjecture gets built on those two dimensions of data. And if a reviewer is overly positive or negative, ask yourself: “where does the money go?”