As you say, level of engagement is a personal thing: your mental state vs. my mental state. It might also be substance dependent. Live music itself often doesn't give a high degree of emotional engagement. So, if the real thing doesn't always do it, then one shouldn't expect a stereo system to always produce EE.
As I said above, suspension of disbelief doesn't really have to do with EE. It has to do with the believability of the presentation, i.e. whether it sounds real. If your aural memory is well adjusted to live, unamplified music and you hear a system that just nails the tone, space (imaging and soundstage), dynamics, etc., such that closing your eyes gives you a believable presentation, then that can result in suspension of disbelief. That doesn't mean that you actually forget you are listening to a reproduction, but that it has the simulacrum of a live performance.
Of course, where your system sits along that curve of realness of sound is somewhat subjective, although I would argue that those who are well versed in live music (particularly music heard up close, like most recordings) will converge on system types that do similar things well, which translates to a lack of audible artifacts and high dynamics.