I agree: what's happening at mid and high frequencies does not translate directly to low frequencies, because, as you say, "for lower frequencies, we hear differently."
It takes at least one wavelength before we even begin to register bass energy, and multiple wavelengths before we begin to register pitch. Given that bass wavelengths are longer than the reflection paths in our rooms, we cannot perceptually separate the "direct sound" in the bass region from the later-arriving reflected sound, at least in rooms of the size we have for home audio. The good news is that this makes in-room bass measurements reliable predictors of perception, unlike in-room measurements further up the frequency range.
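To put rough numbers on the wavelength argument, here is a quick sketch. The speed of sound (343 m/s) and the 5 m room length are my own assumptions for illustration, not measurements of any particular room:

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees C
ROOM_LENGTH_M = 5.0          # hypothetical home listening room length

def wavelength_m(freq_hz):
    """Wavelength in metres for a given frequency in Hz."""
    return SPEED_OF_SOUND_M_S / freq_hz

def period_ms(freq_hz):
    """Time for one full cycle, in milliseconds."""
    return 1000.0 / freq_hz

for f in (20, 40, 80):
    wl = wavelength_m(f)
    # How many room lengths fit into a single wavelength
    ratio = wl / ROOM_LENGTH_M
    print(f"{f} Hz: wavelength {wl:.1f} m "
          f"({ratio:.1f}x the room length), one cycle {period_ms(f):.1f} ms")
```

At 40 Hz the wavelength comes out to roughly 8.6 m and one cycle takes 25 ms, so in a room that size the sound has already bounced off the far wall and come back before a single cycle has finished.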
That being said, the more closely the in-room reflections approach a continuum, the more effectively the ear's critical-band averaging characteristic perceptually smooths out the room-induced peaks and dips.
The idea is to activate so many modes - to generate so many room-interaction peaks and dips by having multiple subs in acoustically very different locations - that we no longer have lone peaks and dips sticking out like sore thumbs. If a particular modal peak is only strongly excited by one out of four subs, then instead of that peak sticking out 8 dB above the average, maybe now it only sticks out 2 dB above the average. (The actual interaction between the outputs of the different subs is more complicated than this implies because the reflections are interacting in semi-random phase, which results in a lot of decorrelation in the bass region; the hypothetical in the preceding sentence is just for illustration.)
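For anyone who wants to play with the hypothetical above, here is a minimal sketch. It assumes the subs' outputs add as powers (fully decorrelated) and that the other subs are perfectly flat at that frequency; both are simplifications of mine, and the function name is made up for this example:

```python
import math

def db_to_power(db):
    """Convert a level in dB to a relative power ratio."""
    return 10 ** (db / 10)

def power_to_db(power):
    """Convert a relative power ratio to dB."""
    return 10 * math.log10(power)

def averaged_peak_db(peak_db, n_subs):
    """Height of a modal peak above the average when only ONE of n_subs
    excites it by peak_db and the rest are flat (0 dB) at that frequency.
    Assumes incoherent (power) summation, which is a simplification."""
    total_power = db_to_power(peak_db) + (n_subs - 1) * db_to_power(0.0)
    return power_to_db(total_power / n_subs)

for n in (1, 2, 4):
    print(f"{n} sub(s): peak sticks out {averaged_peak_db(8.0, n):.1f} dB")
```

Under these assumptions an 8 dB peak excited by one sub out of four ends up around 3.7 dB above the average. A different summation assumption gives a different number, so treat this the same way as the 2 dB figure: as an illustration of the trend, not a prediction.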
In practice there is often still some frequency region that rises a few dB above the average. Because the spatial variance is decreased, this issue is probably present throughout the room, rather than being confined to a small area as is typically the case with a single sub. This is beneficial because now EQ can be used to fix the issue WITHOUT the likelihood of simultaneously ruining the frequency response elsewhere in the room.
Here is Earl Geddes, the originator of the asymmetrically-distributed multisub paradigm that I subscribe to. I suggest watching at least the first 16 minutes or so:
And Matthew Poes again, cued up to where he describes his version of Earl Geddes' setup approach: