Here is my attempt to extract some information from the step response plot. Technically, the step response is the integral of the impulse response of a device under test (DUT), which here is a 3-way speaker whose driver responses are shaped by the crossover circuit.
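As a rough sketch of that relationship, assuming the impulse response is available as uniformly sampled data (the function name and the toy data below are hypothetical, not taken from any particular measurement program), the step response is a running sum of the impulse samples scaled by the sample period:

```python
import numpy as np


def step_from_impulse(h, fs):
    """Approximate the step response as the running integral of an impulse response.

    h  : impulse response samples (1-D array)
    fs : sample rate in Hz
    """
    dt = 1.0 / fs                # sample period in seconds
    return np.cumsum(h) * dt     # rectangle-rule integral up to each sample


# Toy check: a discrete stand-in for a unit-area Dirac impulse at t = 0.
fs = 48_000
h = np.zeros(10)
h[0] = fs                        # height fs over one sample of width 1/fs -> area 1
print(step_from_impulse(h, fs))  # a flat step of height 1
```

Measurement software may scale or normalize the plotted step differently, so compare shapes rather than absolute heights.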
For a loudspeaker, the step response shows the "time coherence" of the different drivers tied together by the crossover. In a typical 3-way speaker with the tweeter, midrange, and woofer mounted on a flat baffle, measured on a typical axis (at or near the tweeter center), the tweeter response arrives at the mic first, followed by the midrange response and then the woofer response.
In the above picture, the initial peak around zero seems to be the tweeter response, the negative peak around 200 µs appears to be the midrange, and the positive peak around 1 ms appears to be the woofer.
The fact that all three peaks are clearly separated indicates that the speaker is not time-coherent, i.e. the drivers are not "time-aligned". In fact, the distance between the acoustic centers of the tweeter, midrange, and woofer can be estimated by measuring the delay between the peak locations and multiplying it by the speed of sound to get a distance (in mm or cm).
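A minimal sketch of that conversion, assuming a speed of sound of 343 m/s (air at roughly 20 °C) and using the approximate peak times read off the plot above (0, 200 µs, 1 ms). The helper name is hypothetical. Note that the delay between peaks also includes any delay added by the crossover filters, so the result is only a rough estimate:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 °C


def delay_to_mm(delay_s, c=SPEED_OF_SOUND_M_S):
    """Convert an arrival-time difference (seconds) into a path-length difference (mm)."""
    return delay_s * c * 1000.0


# Approximate peak times read off the plot (seconds).
peak_times_s = {"tweeter": 0.0, "midrange": 200e-6, "woofer": 1e-3}

for name, t in peak_times_s.items():
    offset_mm = delay_to_mm(t - peak_times_s["tweeter"])
    print(f"{name}: about {offset_mm:.1f} mm behind the tweeter")
```

With these numbers the midrange comes out roughly 69 mm behind the tweeter and the woofer about 343 mm (about 34 cm) behind it.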
Ideally we want the step response to look like a right triangle: a sharp initial rise followed by a gradual decay, with no peaks going up and down. As a step down from that ideal, we would like to see smooth transitions throughout the response, without any abrupt peaks and dips.
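To illustrate the ideal shape, here is a sketch that models a hypothetical time-coherent speaker as nothing more than a first-order high-pass (a gentle bass roll-off); this modelling choice is an assumption for illustration only. Its step response jumps up at once and then decays without any wiggles. The `count_direction_changes` helper is a hypothetical, crude metric for "peaks going up and down":

```python
import math


def highpass_step(fc_hz, fs_hz, n_samples):
    """Step response of a discrete first-order RC high-pass filter.

    Hypothetical stand-in for a time-coherent speaker whose only deviation
    from flat is a bass roll-off at fc_hz.
    """
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * fc_hz)  # RC time constant for cutoff fc_hz
    a = rc / (rc + dt)                  # coefficient of the RC high-pass recursion
    out = []
    y_prev, x_prev = 0.0, 0.0           # system at rest before the step
    for _ in range(n_samples):
        x = 1.0                         # unit step: input is 1 from n = 0 onward
        y = a * (y_prev + x - x_prev)   # y[n] = a * (y[n-1] + x[n] - x[n-1])
        out.append(y)
        y_prev, x_prev = y, x
    return out


def count_direction_changes(y):
    """Count how often the slope of y flips sign (ignoring flat segments)."""
    slopes = [cur - prev for prev, cur in zip(y, y[1:]) if cur != prev]
    return sum(1 for s0, s1 in zip(slopes, slopes[1:]) if (s0 > 0) != (s1 > 0))


step = highpass_step(fc_hz=40.0, fs_hz=48_000.0, n_samples=2_000)
print(round(step[0], 3), count_direction_changes(step))
```

The idealized high-pass gives 0 direction changes, whereas a wiggly sequence such as `[0, 1, 0, 1, 0]` gives 3.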
Since a time delay between drivers also implies a phase difference between them, this response also says something about the relative phase of the drivers. For example, in this case the tweeter and woofer appear to be in phase, whereas the midrange appears to be 180 degrees out of phase with the tweeter and woofer.
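For a rough feel of how delay maps to phase: a pure delay τ shifts the phase by 360° × f × τ, so the shift grows with frequency, while a polarity inversion is a constant 180° at every frequency. The sketch below assumes a hypothetical 3 kHz crossover frequency (not given in the post) together with the roughly 200 µs tweeter-to-midrange delay read off the plot; the function names are illustrative only:

```python
def delay_to_phase_deg(delay_s, freq_hz):
    """Phase lag (degrees) that a pure time delay produces at one frequency."""
    return 360.0 * freq_hz * delay_s


def wrap_deg(phase_deg):
    """Wrap a phase angle into the range [-180, 180) degrees."""
    return (phase_deg + 180.0) % 360.0 - 180.0


# Hypothetical example: 200 us delay evaluated at an assumed 3 kHz crossover.
lag = delay_to_phase_deg(200e-6, 3_000.0)
print(f"raw lag: {lag:.1f} deg, wrapped: {wrap_deg(lag):.1f} deg")
```

At 3 kHz this gives a 216° lag, which wraps to -144°.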
Time-coherent devices are good from a theoretical perspective, but how much of that translates into the listening experience is a controversial subject.
Frequency response plots both on and off axis (and hence the directivity), in combination with other measurements, seem to correlate more strongly with perceived sound quality (although the impulse response, step response, and frequency response are all tightly connected mathematically).
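The mathematical link mentioned above can be sketched with NumPy: the frequency response is the Fourier transform of the impulse response, and the step response is its running integral. The function name is hypothetical, and the example uses an ideal unit-area impulse (a perfectly flat device) rather than real measurement data:

```python
import numpy as np


def responses_from_impulse(h, fs):
    """Derive frequency response magnitude (dB) and step response from one impulse response.

    Both are scaled by the sample period so they approximate the continuous-time
    Fourier transform and integral, respectively.
    """
    dt = 1.0 / fs
    spectrum = np.fft.rfft(h) * dt                 # approximate continuous-time transform
    freqs = np.fft.rfftfreq(len(h), d=dt)          # bin frequencies in Hz
    mag_db = 20.0 * np.log10(np.maximum(np.abs(spectrum), 1e-12))  # guard against log10(0)
    step = np.cumsum(h) * dt                       # running integral of h
    return freqs, mag_db, step


fs = 48_000
h = np.zeros(64)
h[0] = fs                # unit-area impulse: an ideal, perfectly flat device
freqs, mag_db, step = responses_from_impulse(h, fs)
print(mag_db.max(), mag_db.min(), step[-1])
```

For this ideal impulse the magnitude is flat at about 0 dB and the step holds at 1; a real measurement would show both the frequency-domain deviations and the time-domain features discussed above, derived from the same data.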
More knowledgeable or experienced people could probably glean more from the step response, so let's wait for others to comment on its other aspects.
Thanks
Vineeth