Do you measure the response in the room as well? Curious whether flat-measuring speakers stay flat in a room, or do they need to be DSP'ed?
Here are my thoughts on why speakers need to measure well, based on my limited understanding, and not necessarily with respect to sounding good. Smoothness and consistency of the frequency response, both on-axis and at off-axis angles, is often a good target from a speaker design point of view. The flatness of the on-axis response can be sacrificed to some extent if that gives a more consistent response at other off-axis angles, both horizontally and vertically. This also results in a smooth power response and smooth directivity natively. To achieve these targets, unwanted acoustical resonances should be controlled, the speaker box/acoustical construct must have an appropriate shape to aid sound radiation, the drivers should be of good quality, and the overall acoustic concept (directivity control and so on) should be well thought out and implemented. Finally, whether all this sounds good also depends a lot on people's preferences. As an example, in terms of measurements, some people prefer wide directivity and some prefer narrower directivity. Directivity is a crucial factor in often-discussed things like overall tonality, sound stage dimensions, imaging, etc.
In the past there was a common misconception that a flat on-axis response should be the target for a speaker. On-axis is just one direction out of all the directions in which the speaker radiates sound. So, logically, for it all to sound "consistent", the off-axis response is often more important. Good quality drivers put in an appropriate construct (box etc.) will take care of most of these things. Then comes the crossover. The crossover also impacts directivity, so the drivers must be crossed over with a good amount of attention paid to the directivity transitions between them. Otherwise, dips and humps in the power response and the resulting directivity changes will have a significant audible impact on the overall sound, such as harshness and general unpleasantness.
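To put a rough number on the crossover point above, here is a minimal sketch of vertical lobing at the crossover frequency. It assumes a very idealized model that is not from the post: woofer and tweeter are treated as two point sources playing at equal level and in phase at the crossover frequency, separated vertically by a center-to-center spacing. The function names and the example numbers (2 kHz, 15 cm and 8 cm) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def offaxis_level_db(freq_hz, spacing_m, angle_deg):
    """Summed level of two equal, in-phase point sources at a vertical
    off-axis angle, relative to the on-axis sum (in dB)."""
    # Extra distance the sound from one driver travels at this angle
    path_diff = spacing_m * math.sin(math.radians(angle_deg))
    # Phase difference between the two arrivals
    phase_diff = 2 * math.pi * freq_hz * path_diff / SPEED_OF_SOUND
    magnitude = abs(math.cos(phase_diff / 2))
    if magnitude < 1e-12:
        return float("-inf")  # complete cancellation in this ideal model
    return 20 * math.log10(magnitude)


def first_null_angle_deg(freq_hz, spacing_m):
    """Angle of the first cancellation null, or None when the spacing is
    below half a wavelength (no full null anywhere up to 90 degrees)."""
    wavelength = SPEED_OF_SOUND / freq_hz
    ratio = wavelength / (2 * spacing_m)
    if ratio > 1:
        return None
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    for spacing in (0.15, 0.08):  # hypothetical driver spacings in metres
        print(spacing, first_null_angle_deg(2000, spacing))
        for angle in (0, 10, 20, 30):
            print("  ", angle, round(offaxis_level_db(2000, spacing, angle), 2))
```

In this simplified model, moving the drivers closer together (or crossing lower) pushes the null further off axis, which is one reason driver spacing and crossover frequency get so much attention. Real drivers have their own directivity and real crossovers have phase shift, so treat this only as an illustration.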
Another reason for needing uniformity in the frequency response is if one wants to EQ via DSP or analog circuits. EQ changes the response by the same amount in every direction, so if the speaker's response is not consistent/smooth both on and off axis, correcting one axis may introduce unwanted peaks and dips at other angles. Per the explanation above, that can create an unwanted overall response from the listener's perspective. Another aspect of how well a speaker takes DSP is that the drivers should have good power handling and linearity to handle boosts/cuts well.
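A tiny worked example of the EQ point, as a sketch only. The response numbers are made up for illustration (they are not measurements of any speaker), and `flatten_on_axis` is a hypothetical helper. It assumes the EQ simply adds the same gain in dB at each frequency to every radiation angle, since the filter acts on the signal before it reaches the drivers.

```python
def flatten_on_axis(on_axis_db, off_axis_db, target_db=0.0):
    """Build an EQ that flattens the on-axis response to target_db and
    apply the same EQ to the off-axis response.

    Both inputs are dicts mapping frequency (Hz) to level (dB) and must
    share the same frequency keys."""
    eq = {f: target_db - level for f, level in on_axis_db.items()}
    new_on = {f: on_axis_db[f] + eq[f] for f in on_axis_db}
    new_off = {f: off_axis_db[f] + eq[f] for f in off_axis_db}
    return eq, new_on, new_off


# Hypothetical data: the on-axis response has a 3 dB dip at 2 kHz that
# the off-axis response does not share.
on_axis = {500: 0.0, 1000: 0.0, 2000: -3.0, 4000: 0.0}
off_axis = {500: -0.5, 1000: -1.0, 2000: -1.5, 4000: -3.0}

eq, new_on, new_off = flatten_on_axis(on_axis, off_axis)

if __name__ == "__main__":
    print("EQ gain:      ", eq)
    print("on-axis after:", new_on)
    print("off-axis after:", new_off)
```

With these made-up numbers the on-axis response ends up flat, but the off-axis curve, which was falling smoothly before, now has a bump at 2 kHz. If the dip had been present at all angles, the same EQ would have fixed all of them.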
Next comes the room. Room treatment, at least to some extent, is necessary with common wide-directivity boxed speakers, as they illuminate the room significantly. So directivity control plays a significant role here too. Cardioid and super/hypercardioid radiation patterns aim to mitigate room interaction to some extent and so reduce room-induced peaks and dips in the response. The Kii Three, Dutch & Dutch 8c, Genelec 8351, etc. are some examples of speakers that incorporate the above aspects into their design. Floor bounce, ceiling bounce, etc. will all try to 'color' the sound. Again, the directivity control implemented is important here.
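For a feel of where floor bounce lands, here is a rough sketch that estimates the first cancellation frequency from simple geometry. The assumptions are mine, not from the post: a perfectly reflective flat floor, a point source, and a cancellation wherever the reflected path is longer than the direct path by half a wavelength. The function names and the example heights and distance are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def floor_bounce_path_difference(driver_height_m, ear_height_m, distance_m):
    """Extra path length of the floor reflection compared with the direct
    sound, using the mirror-image source below the floor."""
    direct = math.hypot(distance_m, driver_height_m - ear_height_m)
    reflected = math.hypot(distance_m, driver_height_m + ear_height_m)
    return reflected - direct


def first_notch_hz(driver_height_m, ear_height_m, distance_m):
    """Lowest frequency at which the reflection arrives half a wavelength
    late and cancels the direct sound (idealized)."""
    delta = floor_bounce_path_difference(driver_height_m, ear_height_m, distance_m)
    return SPEED_OF_SOUND / (2 * delta)


if __name__ == "__main__":
    # Hypothetical setup: driver 0.9 m high, ears 1.0 m high, 3 m away
    print(round(first_notch_hz(0.9, 1.0, 3.0), 1), "Hz")
```

For typical living-room geometry this tends to land in the low hundreds of Hz, and further notches follow at odd multiples of the first. A speaker that radiates less energy toward the floor at those frequencies will excite this less, which is where the directivity control comes in.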
These are a limited set of explanations as to why good-measuring speakers are needed in general. I am stopping here and not stepping into other measurement-related aspects, as I am really tired of typing.