On the subject of networking with the Extreme:
I'm taking a few shortcuts here and simplifying some things to keep this reasonably readable.
Your network setup does influence sound. Every component connected to it has an effect; even your mobile phone using your Wi-Fi for something totally unrelated to streaming introduces activity on your network ports and cables. This includes browsing your favourite What's Best Forum. There are ways to reduce this with smart (managed) switches/routers, or by using VLANs to segment your network, but this is advanced networking, not typically used in domestic situations.
All network activity causes noise: every data packet travelling your domestic network introduces electrical activity that travels across your network, which is typically just one "subnet" (one broadcast domain).
This also means your music server will "see" a good share of the data packets travelling your network: broadcast and multicast packets, and packets for addresses the switch has not learned yet, are delivered to every device, and the server has to "investigate" each packet it receives to check whether it is addressed to it.
A basic (unmanaged) domestic switch forwards that broadcast, multicast and unknown-destination traffic to all of its ports. A smart (managed) switch gives you a degree of control over this, so you can segment your network and reduce traffic on specific links. Again, this is advanced networking, and none of the "audiophile" switches support it. Now, before you think "gotta have", smart switches apply processing to the data stream, inspect certain parts of every data packet passing through, use more power, and have their own noise signature. So there are pluses and minuses to using one in the first place. Network utilisation and the number of active devices in your network will determine whether it nets out positive or not.
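To make the forwarding behaviour above concrete, here is a toy model of an Ethernet learning switch with port-based VLANs, plus the simplified destination-address check a network interface applies to frames it receives. It is a sketch under stated assumptions, not how any particular switch is implemented: the class and function names, the port layout and the MAC addresses are all hypothetical, and real switches add aging timers, IGMP snooping, trunk ports and more.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


def is_group_address(mac: str) -> bool:
    # The lowest bit of the first octet marks a group (multicast or broadcast) MAC address.
    return int(mac.split(":")[0], 16) & 0x01 == 1


@dataclass
class LearningSwitch:
    # Port number -> VLAN id. An unmanaged switch behaves as if every port were in the same VLAN.
    port_vlan: dict[int, int]
    # (VLAN id, MAC address) -> port, learned from the source address of incoming frames.
    mac_table: dict[tuple[int, str], int] = field(default_factory=dict)

    def receive(self, in_port: int, src: str, dst: str) -> list[int]:
        """Return the ports a frame arriving on in_port is sent out of."""
        vlan = self.port_vlan[in_port]
        self.mac_table[(vlan, src)] = in_port  # learn which port the sender is behind
        out = None if is_group_address(dst) else self.mac_table.get((vlan, dst))
        if out is not None:
            # Known unicast destination: exactly one port (none if it is the arrival port).
            return [] if out == in_port else [out]
        # Broadcast, multicast or not-yet-learned destination: flood within the same VLAN only.
        return sorted(p for p, v in self.port_vlan.items() if v == vlan and p != in_port)


def nic_accepts(dst: str, own_mac: str, joined_groups: set[str]) -> bool:
    """Simplified receive filter of a network interface that is not in promiscuous mode."""
    dst = dst.lower()
    if dst == own_mac.lower() or dst == BROADCAST:
        return True  # addressed to this device, or to everyone (e.g. ARP, DHCP)
    if is_group_address(dst):
        return dst in joined_groups  # multicast only if this device subscribed to the group
    return False  # unicast for another device, e.g. a flooded unknown-destination frame


if __name__ == "__main__":
    # Hypothetical home layout: 1 = router, 2 = Wi-Fi access point, 3 = music server, 4 = NAS.
    home = LearningSwitch({1: 1, 2: 1, 3: 1, 4: 1})
    print(home.receive(2, "02:00:00:00:00:02", BROADCAST))           # [1, 3, 4]
    print(home.receive(1, "02:00:00:00:00:01", "02:00:00:00:00:02"))  # [2]

    # Same switch split into two VLANs; ports 1 and 5 are hypothetical router legs, one per VLAN.
    segmented = LearningSwitch({1: 20, 2: 20, 3: 10, 4: 10, 5: 10})
    print(segmented.receive(2, "02:00:00:00:00:02", BROADCAST))      # [1]
```

In the unmanaged case a broadcast from the phone's access point reaches the music server's port; once the phone and the server sit in different VLANs, the same broadcast stays on the phone's side and traffic between the two VLANs has to pass through a router.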
100Mbit networking (100BASE-TX) uses 2 differential data pairs; 1Gbit networking (1000BASE-T) uses all 4. Data is transported as a modulated voltage over these pairs, and modulating a voltage introduces certain types of noise. In a switch, the ports are galvanically isolated by an isolation transformer on each pair, whose center tap is connected to ground, usually through a simple filter network. One of the functions of this is to break "ground current paths".
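As a worked example of why gigabit needs all four pairs, the line rates follow from the standard parameters of each mode. These figures (symbol rate, 4B5B coding, PAM-5 signalling) come from the IEEE 802.3 definitions of 100BASE-TX and 1000BASE-T, not from measurements of the Extreme:

```latex
\begin{aligned}
\text{100BASE-TX:}\quad R &= 125\,\text{MBd} \times \tfrac{4}{5}\ (\text{4B5B coding}) = 100\ \text{Mbit/s, one pair per direction} \\
\text{1000BASE-T:}\quad R &= 4\ \text{pairs} \times 125\,\text{MBd} \times 2\ \tfrac{\text{bit}}{\text{symbol}}\ (\text{PAM-5}) = 1000\ \text{Mbit/s, all four pairs in both directions}
\end{aligned}
```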
Moving to fiber, we have optical links: SFP modules convert electrical signals to light pulses and vice versa. So arguably there is no real benefit in terms of reducing network-activity-induced noise; in fact, the conversion process adds activity inside your appliances. An SFP module can easily consume 1 to 1.5 watts, which does not seem like much, but in this context it is a lot. On the plus side, there is no path for ground currents or electrical noise, whatever their source, to travel along your fiber links. There are many types and makes of SFP modules, with obvious differences in power efficiency, robustness, error correction, quality of the optical receivers/transceivers, etc. SFP+ (10G) modules can apply a higher degree of error correction, and some even have built-in "reclocking" or jitter reduction; yes, this draws more power, so again there are positives and negatives. Industrial versions are built to operate in harsh environments, such as extreme temperatures, heavy vibration, or areas with strong RFI/EMI pollution. What you can get by buying industrial grade is better component quality and tolerances, higher selection grades, more robust PCB mounting and/or layout, better error correction algorithms, better filtering and, most of the time, lower power consumption. The downside is hefty price tags. I do have a few here.
Your internet router performs quite a bit of processing. It almost always performs something called NAT (Network Address Translation), meaning it translates between the public IP address your internet provider gives you and a private IP range you use inside your home. There is both a functional and a security aspect to it. Without it, each of your devices would require a unique IP address on the whole internet, and the number of available addresses is limited (IPv4 offers roughly 4.3 billion), which is one reason we are moving from IPv4 to IPv6, with its vastly larger address space. The security aspect is that your devices cannot be reached directly from any other device in the world unless you explicitly allow it. The router usually also provides DHCP services (assigning a unique address to each device on your local network), can provide DNS caching and often runs firewall software. It often provides Wi-Fi as well. It can be quite a busy device.
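The address translation described above can be sketched as a small lookup table: outgoing packets get their private source address and port rewritten to the public address and a fresh port, and replies are mapped back through the same table. This is a toy model of port-translating NAT (NAPT) under loudly stated assumptions: one public IPv4 address, no protocol field, no timeouts and no connection tracking. The class name `ToyNapt` and all addresses are hypothetical; 203.0.113.7 is a documentation address standing in for your provider-assigned IP.

```python
import itertools
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, TCP/UDP port)


class ToyNapt:
    """Toy NAT with port translation (NAPT), the kind most home routers perform.

    Assumptions: a single public IPv4 address, no protocol field, no timeouts
    and no connection tracking; real routers handle all of these.
    """

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._out: Dict[Endpoint, int] = {}  # LAN endpoint -> public port
        self._in: Dict[int, Endpoint] = {}   # public port -> LAN endpoint

    def outbound(self, lan: Endpoint) -> Endpoint:
        """Rewrite the source of a packet leaving the home network."""
        if lan not in self._out:
            port = next(self._next_port)
            self._out[lan] = port
            self._in[port] = lan
        return (self.public_ip, self._out[lan])

    def inbound(self, public_port: int) -> Optional[Endpoint]:
        """Find the LAN device a reply belongs to; None means no mapping, so it is dropped."""
        return self._in.get(public_port)


if __name__ == "__main__":
    nat = ToyNapt("203.0.113.7")
    print(nat.outbound(("192.168.1.20", 51000)))  # ('203.0.113.7', 40000)
    print(nat.outbound(("192.168.1.30", 51000)))  # ('203.0.113.7', 40001)
    print(nat.inbound(40001))                     # ('192.168.1.30', 51000)
    print(nat.inbound(45000))                     # None: unsolicited, dropped
```

The last line shows the security aspect mentioned above: a packet arriving from the internet without a matching table entry has nowhere to go, which is why devices behind NAT are not directly reachable unless a mapping is configured on purpose.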
By now it should be clear that this is a very complex system with many variables in play. Every network is likely to be unique: different routers, different switches, different devices using it, different traffic patterns. It is unlikely that there are two identically performing network setups anywhere in the world at any given time.
Now, how does all of this influence the playback quality of the Extreme? It does; there is no way around it. So what we have done is run a whole lot of different network setups and combinations to identify the largest disturbances to sound quality. You can take measures to minimize their influence and get repeatable results, up to a degree.
The copper network port of the Extreme will provide you with good and repeatable sound quality in virtually all environments. It will sound largely similar in all environments, even in the presence of heavy RFI/EMI pollution.
The fiber network port of the Extreme offers a somewhat different perspective, though it does not significantly change the overall sound quality or voicing. The plus side is a certain degree of "isolation" (there is really more to it than that, but it's too complicated to elaborate on here; the term suffices for this purpose). The downside is additional processing and slightly higher power consumption.
It tends to net out positive, with for example blacker backgrounds, improved clarity and more focus, without impacting voicing. The downside of the increased focus is "sharper edges" to images, and some SFP modules can introduce a degree of mechanical quality to the sound, the reclocking SFP+ modules being about the worst in that respect.
So the recommendations we make, including the recommended SFP modules and FMC (fiber media converter), are based on repeatable results in different environments. Some combinations give an impression of higher resolution, but it's important to note that increased noise is often perceived as increased resolution. The fatiguing aspect of this usually goes unnoticed, as comparative listening sessions are often short, using a few test tracks that people skip through quickly so they remember enough detail to make a meaningful A/B comparison. It is rarely evaluated long term: over weeks, listening in different moods/mindsets, at different levels of physical or mental fatigue, at different times of the day with varying levels of power grid pollution, or comparing how you perceive the difference in the first 30 minutes with how you perceive it after a few hours of continuous listening. Again, there are many variables to evaluate.
Therefore our recommendation is to simply use copper networking initially and let the Extreme burn in / settle in your environment; so far, everybody who has bought one and used it this way has been fully satisfied. Apply basic voicing measures as you would with any appliance, such as power cords, USB cables, footers etc., to adjust it to your taste. Then, if you feel so inclined, turn to tweaking your network environment. And don't take anything for granted there, as your results are not guaranteed to mirror others'.
It is relatively risk-free to jump straight to fiber when using the components we have tested long term in various environments, but it really is optional. It is a relatively minor investment with value-for-money gains, though. The downside is that it sometimes needs manual intervention: if you power cycle the server, you sometimes have to power cycle the FMC too, or pull the copper network cable from it so it generates a link fault that resets the interface. But that is really quite a minor issue.
M12 is an industrial version of UTP/copper networking, so it performs at a higher level overall; it is the best I've heard so far, and it's also VERY expensive. It connects to the copper network port of the Extreme. If money is no object, the Telegartner gold is likely to perform best, but I have not auditioned that specific model personally.
Now keep in mind that network tweaking and audiophile networking products are relatively new; surely there are gains to be made there. But do be aware of all aspects of performance. The Ether Regen looks nicely done; I have one on order, initial feedback is promising, and it's not outrageously priced. The downside is that it runs at 100Mbit, meaning file copying can be slow. For bulk copying you always have the option to temporarily replace it with a 1Gb switch, though. I do look forward to testing it.
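To put a rough number on "file copying can be slow" at 100Mbit, here is a back-of-the-envelope calculation. The function name, the 1 TB library size and the 90% efficiency factor (the share of line rate assumed to be left after protocol overhead) are all hypothetical; at 1Gbit the disks in the server or NAS may well become the real bottleneck, so treat the gigabit figure as a best case.

```python
def copy_time_hours(size_gb: float, link_mbit: float, efficiency: float = 0.9) -> float:
    """Rough transfer time in hours.

    size_gb: amount of data in gigabytes (10**9 bytes).
    link_mbit: link speed in Mbit/s (100 or 1000 here).
    efficiency: assumed share of the line rate left after protocol overhead;
    0.9 is a guess, not a measured value.
    """
    bits = size_gb * 1e9 * 8
    seconds = bits / (link_mbit * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 1 TB music library copied over each link speed.
    for speed in (100, 1000):
        print(speed, "Mbit/s:", round(copy_time_hours(1000, speed), 1), "hours")
    # 100 Mbit/s: 24.7 hours
    # 1000 Mbit/s: 2.5 hours
```

Whatever efficiency you assume, the ratio stays the same: a bulk copy takes about ten times as long at 100Mbit, which is why temporarily swapping in a 1Gb switch for large transfers makes sense.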