I think measurements can predict preference with good reliability up to a certain point. And maybe perfect measurements + perfect analysis would give perfect predictions up to the point where individual preferences diverge, but I'm not there yet.
When doing a speaker design, I always start with an extensive suite of measurements that takes days to run. Assuming the drivers aren't rejected because of problems the measurements turn up, crossover design then begins. Because of all those measurements I have a very good idea of what will be happening out in the room, which imo is important.
I have a specific target curve in mind, but there will inevitably be forks in the road where choices must be made, and those choices are made by ear. I always enlist a second set of ears, free from the unavoidable bias I carry as the designer who knows which curve looks most "right". Then, once the design is as finished as I can get it, I ship it off to be auditioned by listeners (calibrated by a very good reference system) whose ears have proven superior to my own. Fortunately I have all these measurements on file, so when feedback comes in I can translate their words into a specific crossover adjustment.
We don't have the facilities for double-blind testing but we do use single-blind testing in the final phase, where the gradations are small.
Regarding the predictive power of Harman's Spin-o-rama suite of measurements (of which mine is a more laborious version), backed by their extensive controlled blind listening tests: I think it's great that they find reliable correlations between measurements and subjective preference. I'm geek enough to get into that sort of thing, but skeptic enough not to make it my "audio religion".
Here are four limitations of Harman's system for predicting preference from measurements which ime DO NOT get enough attention from my fellow geeks:
1. The listening tests were single speaker vs. single speaker. Each speaker is quickly shuffled into the same location along the centerline of a 21-foot-wide, 30-foot-long, well-treated room. So the speakers are abnormally far from the sidewalls and get little boundary reinforcement, and the relatively short decay times of the super-well-treated (though not overly dead) room will favor speakers with a wide radiation pattern. Narrow-pattern speakers will tend to sound too dry in such a room.
2. The measurements reveal neither non-linear distortion nor time-domain distortion, aside from their relatively minor effects on frequency response. In general, non-linear distortion is not an issue unless a speaker is pushed beyond its linear-excursion and/or thermal limits, but multiple researchers have found that the time domain matters. And distortions that arrive later in time (such as diffraction), and are therefore less likely to be masked by similar sounds, may be unexpectedly audible and objectionable. (Revel's top-of-the-line Salon 2 is a very low-diffraction design, so they know it matters, even if their Spin-o-rama measurements don't highlight it.)
3. The tests do not adequately investigate spatial qualities, because they compare single speakers. Yes, you can hear some spatial information from a single speaker, but imo that may in many cases be largely a function of radiation-pattern width: more in-room reflections make the speaker less obviously the sound source. If a pair of speakers is particularly good at recreating a holographic image of the recording, that will not show up. And spatial qualities apparently matter a great deal: according to a study by Wolfgang Klippel, cited in Toole's book, spatial qualities account for 50% of our perception of "naturalness" (realism and accuracy) and 70% of our perception of "pleasantness" (general satisfaction or preference). Not making an adequate subjective evaluation of a speaker's spatial qualities is arguably a major flaw unless comparisons are limited to speakers with very similar spatial qualities.
4. The sample of speakers evaluated is too small and too homogeneous for general extrapolations to be universally reliable. My understanding is that 70 single speakers were evaluated under controlled double-blind conditions, and from what I can tell, most of them were obvious competitors to Harman's offerings (in particular the Infinity and Revel lines). So the study was about "what beats our competitors", not about "what's the best way to make speakers". Unfortunately, many people think the study has said all there is to say about the best way to make speakers, but look closely and it becomes obvious that this was not the focus. For instance, conspicuous by its absence was the bipolar Mirage M1, which was Floyd Toole's speaker of choice after it performed exceptionally well in blind testing while he was at Canada's National Research Council (NRC). Anybody else curious about why it sounded so good? If so, your best bet is probably to read Stereophile's review.
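To put a rough number on "later in time" from point 2 above: for a distant listener on axis, the wave diffracted at a baffle edge travels roughly an extra tweeter-to-edge distance, so it arrives about that distance divided by the speed of sound after the direct sound. This is a minimal sketch, assuming a single edge, far-field on-axis listening, and c ≈ 343 m/s; the distances are hypothetical, not taken from any particular speaker.

```python
# Rough far-field, on-axis estimate of how late a baffle-edge diffraction
# arrives relative to the direct sound. Geometry is hypothetical.
SPEED_OF_SOUND_M_S = 343.0  # approx. speed of sound in air at ~20 °C


def edge_diffraction_delay_ms(tweeter_to_edge_m: float) -> float:
    """Extra arrival time (ms) of the edge-diffracted wave for a distant,
    on-axis listener. In the far field the extra path length is roughly the
    tweeter-to-edge distance, so delay ~= distance / c."""
    return tweeter_to_edge_m / SPEED_OF_SOUND_M_S * 1000.0


# Compare a few plausible tweeter-to-edge distances.
for d in (0.05, 0.10, 0.20):
    print(f"{d * 100:.0f} cm to edge -> ~{edge_diffraction_delay_ms(d):.2f} ms late")
```

Real baffles have several edges at different distances, so in practice there is a spread of delayed arrivals rather than a single one.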
Lest you think the above gripes are just the sour grapes ravings of a geek who's not part of the inner circle, these same limitations are mentioned (though not in as much detail) in one of Sean Olive's papers on the subject, "A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements: Part II – Development of the Model":
"LIMITATIONS OF MODEL
"The conclusions of this study may only be safely generalized to the conditions in which the tests were performed. Some of the possible limitations are listed below.
"1. Up to this point, the model has been tested in one listening room.
"2. The model doesn’t include variables that account for nonlinear distortion (and to a lesser extent, perceived spatial attributes).
"3. The model is limited to the specific types of loudspeakers in our sample of 70."
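For reference, the model from that paper is usually quoted as a linear combination of four variables derived from the spinorama data. This is a minimal sketch, assuming the commonly cited coefficients (worth checking against the paper itself); the function and argument names are my own, and deriving the inputs from raw measurements is not shown.

```python
import math


def olive_preference(nbd_on: float, nbd_pir: float, lfx_hz: float, sm_pir: float) -> float:
    """Predicted preference rating from Olive's regression model, using the
    commonly cited coefficients (an assumption here; verify against the paper).

    nbd_on  -- narrow-band deviation of the on-axis response (dB)
    nbd_pir -- narrow-band deviation of the predicted in-room response (dB)
    lfx_hz  -- low-frequency extension: -6 dB point of the bass roll-off (Hz)
    sm_pir  -- smoothness (r^2 of a regression line) of the predicted in-room response
    """
    lfx = math.log10(lfx_hz)  # the model uses the log of the extension frequency
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir


# Hypothetical speaker; these input values are invented purely for illustration.
example_score = olive_preference(nbd_on=0.4, nbd_pir=0.35, lfx_hz=50.0, sm_pir=0.85)
print(f"predicted preference: {example_score:.1f}")
```

None of the four inputs captures non-linear distortion, time-domain behavior, or stereo imaging, which matches the limitations the paper lists.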
My understanding is that Harman's model is roughly 87% reliable at predicting listener preference in their tests (which have the above limitations). That's very good, but it means that even with their (imo inadequately diverse) sample, there are things going on that the model doesn't explain. That may not matter to most people, but my guess is that it matters to some, and those people really have no recourse but to give ears the final say.
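One way to see how much room that leaves: if the ~87% figure is read as a correlation coefficient r between predicted and actual preference ratings (an interpretation I'm assuming, not one stated above), squaring it gives the fraction of rating variance the model accounts for.

```python
# Assumes "87%" is a Pearson correlation r between predicted and actual
# preference ratings; that interpretation is an assumption.
r = 0.87
explained = r ** 2           # fraction of rating variance the model accounts for
unexplained = 1 - explained  # fraction left to factors the model doesn't capture
print(f"explained: {explained:.0%}, unexplained: {unexplained:.0%}")
```

Under that reading, roughly a quarter of the variation in listener ratings comes from something the four measured variables don't capture.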