I'd be more concerned about the coloration of the SET than the horns.
Distortion is one of the main reasons we hear differences in amplifiers. For example, the brightness and harshness of solid state is not due to a frequency response error; it's caused by higher-order harmonic distortion (although at what is often called a 'low level'). SETs, OTOH, tend to make a prominent 2nd harmonic, which results in 'bloom' and 'warmth'. From these two examples we can see that the ear is far more sensitive to higher-order harmonics; this is because it uses them to gauge sound pressure.
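To put some numbers on 'low level': THD is usually quoted as the RMS sum of the harmonics divided by the fundamental, which counts a 2nd and a 7th exactly alike. Here's a minimal Python sketch of that calculation, plus a hypothetical order-weighted variant (`order_weighted_thd` and its `n/2` weight are my own arbitrary choices for illustration, not any published metric) showing how two amps with the same THD figure can look quite different once higher orders are given more weight.

```python
import math


def thd(harmonics):
    """Total harmonic distortion as a ratio.

    harmonics[0] is the fundamental's amplitude, harmonics[1] the 2nd, and so on.
    Every harmonic counts the same, regardless of its order.
    """
    fundamental, *rest = harmonics
    return math.sqrt(sum(h * h for h in rest)) / fundamental


def order_weighted_thd(harmonics, weight=lambda n: n / 2):
    """Hypothetical variant: scale harmonic n by weight(n) before summing,
    so higher orders count for more. The n/2 default is illustrative only."""
    fundamental, *rest = harmonics
    weighted = (weight(n) * h for n, h in enumerate(rest, start=2))
    return math.sqrt(sum(w * w for w in weighted)) / fundamental


second_only = [1.0, 0.01]                   # 1% 2nd harmonic
seventh_only = [1.0, 0, 0, 0, 0, 0, 0.01]   # 1% 7th harmonic
print(thd(second_only), thd(seventh_only))  # both about 0.01
print(order_weighted_thd(second_only), order_weighted_thd(seventh_only))  # about 0.01 vs 0.035
```

Plain THD can't tell these two apart; any weighting that favors higher orders can.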
But there is more to it than just that! The presence of the 2nd or 3rd harmonic (both of which the ear treats the same way) can mask the higher orders if that 2nd or 3rd is present in sufficient quantity. This is why tube amplifiers sound smoother than solid state, even though they have more higher-order distortion.
So far this is all easily confirmed both by listening and by measurement; the two are in agreement.
Now, a circuit that generates a 2nd harmonic as its primary distortion component has what is known mathematically as a 'quadratic non-linearity'. If the circuit generates a 3rd harmonic as its primary distortion component, it has what is called a 'cubic non-linearity'. When the former dominates, the harmonics fall off more slowly as their order increases than they do in a circuit dominated by the latter.
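A quick way to see the two kinds of non-linearity is to push a sine wave through a toy transfer curve and look at the harmonics that come out. The sketch below assumes idealized polynomial curves with arbitrary coefficients (`quadratic` and `cubic` are hypothetical, not models of any real tube or transistor) and measures the harmonic amplitudes with a plain DFT over one period of the input.

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=5, n=256):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def quadratic(x):
    """Toy 'quadratic non-linearity' (coefficient chosen for illustration)."""
    return x + 0.1 * x ** 2


def cubic(x):
    """Toy 'cubic non-linearity' (coefficient chosen for illustration)."""
    return x + 0.1 * x ** 3


for name, curve in [("quadratic", quadratic), ("cubic", cubic)]:
    print(name, [round(a, 4) for a in harmonic_amplitudes(curve, 1.0)])
```

At unit drive the quadratic curve gives a 2nd harmonic of 0.05 and nothing else, while the cubic gives a 3rd of 0.025 (and nudges the fundamental up to 1.075). That follows from cos²t = (1 + cos 2t)/2 and cos³t = (3 cos t + cos 3t)/4. Real devices mix many terms, so how fast the higher orders fall off depends on the actual curve; this only shows which harmonic each kind of term produces.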
So if we can build a circuit with a cubic non-linearity (assuming that true linearity is out of reach), it will be less colored because it has less distortion overall; not least, its primary distortion component will be at a lower level than when a quadratic non-linearity is the main influence.
So now the question is: how do you do that? In simple terms, the quadratic non-linearity is what you see in single-ended circuits, while the cubic non-linearity is what you see in fully differential, balanced circuits.
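Why differential circuits lean cubic: if two matched devices are driven in antiphase and their outputs subtracted, every even-order term shows up identically in both halves and cancels. A minimal sketch, assuming perfectly matched halves and a hypothetical device curve `device()` with both an x² and an x³ term; in a real circuit any mismatch leaves some residual even-order distortion.

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=5, n=256):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def device(x):
    """Hypothetical single device with an even (x^2) and an odd (x^3) term."""
    return x + 0.1 * x ** 2 + 0.05 * x ** 3


def single_ended(x):
    return device(x)


def differential(x):
    """Two identical devices in antiphase, outputs subtracted and halved:
    (device(x) - device(-x)) / 2 = x + 0.05 * x**3, so the x**2 term is gone."""
    return (device(x) - device(-x)) / 2


print("single-ended", [round(a, 4) for a in harmonic_amplitudes(single_ended, 1.0)])
print("differential", [round(a, 4) for a in harmonic_amplitudes(differential, 1.0)])
```

The single-ended version shows a 2nd (0.05) and a 3rd (0.0125); the differential version keeps the same 3rd but the 2nd drops to numerical noise.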
If you mix the two types of circuit, as is common in traditional tube amplifiers like the Dynaco ST70 (single-ended input, push-pull output), you get both types of non-linearity. When this happens a more prominent 5th harmonic is present, and distortion drops off at a slower rate as the order of the harmonic increases.
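Here's a toy version of that mix: a quadratic 'driver' feeding a cubic 'output stage'. Cubing (x + 0.1x²) produces x⁴, x⁵ and x⁶ terms that neither stage has on its own, and the x⁵ term is what makes the 5th. The stage names and coefficients below are hypothetical, chosen only for illustration, and don't model any actual ST70 circuit.

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=7, n=256):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def se_stage(x):
    """Hypothetical single-ended driver: quadratic curve."""
    return x + 0.1 * x ** 2


def pp_stage(x):
    """Hypothetical push-pull output stage: cubic curve."""
    return x + 0.1 * x ** 3


def cascade(x):
    """Driver feeding the output stage (single-ended in, push-pull out)."""
    return pp_stage(se_stage(x))


for name, stage in [("driver", se_stage), ("output", pp_stage), ("cascade", cascade)]:
    fifth = harmonic_amplitudes(stage, 1.0)[4]
    print(f"{name:8s} 5th harmonic: {fifth:.2e}")
```

Each stage alone shows only numerical noise at the 5th; the cascade has a real one, 0.003/16 (about 1.9e-4) at unit drive, since the expansion contains 0.003x⁵ and cos⁵t carries a cos 5t term of 1/16. The cascade also picks up 4th and 6th harmonics that neither stage makes alone.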
So far I've kept the effects of negative feedback out of this conversation, but feedback has a dramatic effect on what we hear. I'll leave that for later, as I only have so much time.
Anyway, one advantage of SETs is that as power is decreased, distortion falls off to unmeasurable levels. This is all about that 'first watt', which has to be clean, and SETs are pretty good about it. By comparison, large push-pull amps are not: below a certain power level (usually around 5% or so), their distortion starts to go back up. This is why SET users often talk about that 'magical inner detail'.
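The two behaviors can be sketched with toy curves: a quadratic one standing in for an SET, and a small dead zone around zero standing in for the crossover region of a push-pull output stage. Both `quadratic()` and `crossover()` (with its `dead` width) are hypothetical; a real class AB stage is biased to shrink that dead zone, so treat this as a cartoon of the trend, not a measurement.

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=15, n=1024):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def thd_at(transfer, amplitude):
    """THD (ratio) over harmonics 2..15 for a sine of the given peak amplitude."""
    fundamental, *rest = harmonic_amplitudes(transfer, amplitude)
    return math.sqrt(sum(h * h for h in rest)) / fundamental


def quadratic(x):
    """Hypothetical SET-like curve: dominant 2nd harmonic."""
    return x + 0.1 * x ** 2


def crossover(x, dead=0.02):
    """Hypothetical push-pull output with a small dead zone around zero
    (a crude stand-in for crossover distortion)."""
    if abs(x) <= dead:
        return 0.0
    return x - math.copysign(dead, x)


# Output power goes as amplitude squared, so each step here is a 10 dB drop.
for amplitude in (1.0, 0.316, 0.1, 0.0316):
    print(f"A={amplitude:<6} quadratic {100 * thd_at(quadratic, amplitude):7.3f}%   "
          f"crossover {100 * thd_at(crossover, amplitude):7.3f}%")
```

For the quadratic curve the 2nd harmonic relative to the fundamental is exactly 0.05·A, so its THD falls in step with output voltage. The dead zone does the opposite: it is a fixed-size error, so it becomes a larger fraction of the signal as the level drops.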
But if you have a fully differential circuit (and no feedback), you can also achieve this linearly decreasing distortion curve as power is decreased, with no rise in distortion at lower power levels. And of course you can build small push-pull amps if you want; good-quality examples are rare, and I would surmise they are never compared directly, watt for watt, with SETs.
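Why 'linearly decreasing' fits a cubic curve: for y = x + b·x³ driven at peak amplitude A, the 3rd harmonic relative to the fundamental is (bA³/4)/(A + 3bA³/4), which is about bA²/4 when bA² is small. Since power goes as A², THD ends up roughly proportional to power: halve the power and you roughly halve the distortion, with no upturn at low level. A minimal check, assuming a hypothetical `cubic()` curve with b = 0.1:

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=7, n=256):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def thd_at(transfer, amplitude):
    """THD (ratio) over harmonics 2..7 for a sine of the given peak amplitude."""
    fundamental, *rest = harmonic_amplitudes(transfer, amplitude)
    return math.sqrt(sum(h * h for h in rest)) / fundamental


def cubic(x):
    """Hypothetical fully differential stage without feedback: odd-order curve."""
    return x + 0.1 * x ** 3


# Halve the power (divide the amplitude by sqrt(2)) at each step.
amplitude = 1.0
for _ in range(5):
    print(f"relative power {amplitude ** 2:.4f}  THD {100 * thd_at(cubic, amplitude):.3f}%")
    amplitude /= math.sqrt(2)
```

At full drive the ratio per halving is a little under 2 (the 3bA³/4 term in the fundamental shrinks too), and it approaches 2 as the level drops.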
SETs are traditionally power-challenged, as it is very difficult to build an output transformer that has bandwidth and power at the same time. This is why the 300B was king in the 1990s, why the 2A3 ascended the throne 20 years ago, and why the type 45 power tube (good for about a watt) is now the object of admiration. But if you have a fully differential amplifier, you can have power and bandwidth at the same time, while using the same power triodes (if you want).
So IMO/IME there is no argument for SETs, other than someone simply not having heard something that is both sonically and measurably better.
Some argue that SETs are very dynamic, and it is often true that they sound that way. But in reality, the amp is being driven hard enough that the higher-order harmonics show up as its distortion increases (SETs are often at 10% THD at full power), and the place that power is most often needed is on transients. Since the ear uses these harmonics to sense sound pressure, and since they show up on transients, presto: you have an amp that sounds 'dynamic', but it's really distortion masquerading as such. Just by reading this, you may find I've ruined it for you, because of how our brains process music. To avoid this phenomenon, the speaker should be efficient enough that the amp is never driven that hard: hence horns.
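A toy illustration of higher orders climbing with drive: for an exponential transfer curve the harmonics have a closed form, exp(A cos t) = I₀(A) + 2·Σ Iₖ(A)·cos kt (modified Bessel functions), and for small A the k-th harmonic relative to the fundamental grows roughly as A^(k−1). So each doubling of drive raises the 2nd by about 6 dB relative to the fundamental but the 5th by about 24 dB. The `curve()` below is a hypothetical asymmetric curve, not a model of any real tube; it only shows the trend, and says nothing about how the ear responds.

```python
import math


def harmonic_amplitudes(transfer, amplitude, max_order=5, n=256):
    """Peak amplitude of harmonics 1..max_order of transfer(amplitude * cos t),
    computed with a plain DFT over exactly one period."""
    samples = [transfer(amplitude * math.cos(2 * math.pi * i / n)) for i in range(n)]
    amps = []
    for k in range(1, max_order + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        amps.append(2 * math.hypot(re, im) / n)
    return amps


def curve(x):
    """Hypothetical asymmetric transfer curve (exponential). It contains every
    power of x, so it produces every harmonic. The -1 only removes DC."""
    return math.exp(x) - 1.0


for drive in (0.25, 0.5, 1.0, 2.0):
    h = harmonic_amplitudes(curve, drive)
    rel = [f"{100 * a / h[0]:.4f}%" for a in h[1:]]
    print(f"drive {drive}: 2nd..5th relative to fundamental {rel}")
```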
A mark of any good system is that it does not sound loud even when it is. That takes clean power, which SETs cannot provide except at low power. This is often why SET owners will tell you that '90dB is plenty loud enough for me'. If the higher-order harmonics were not present, it would be natural to turn the volume up higher.