It does seem unlikely that poor workmanship would be duplicated so identically in all four locations that it is causing the same problem to occur everywhere. One would have to guess that the problem is further upstream.
Q 5: Is this data center simply too complex to be reliable?
Without access to the drawings, there is no way to answer this. However, every designer knows that reliability can often be reduced by complexity, and complexity is something that every article about the NSA facility has emphasized. Would that be the cause of major electrical faults? It’s doubtful, but overly complex designs could have something to do with those fault surges getting through and damaging other equipment.
Did the designers get caught up in the intrigue of a huge project and try to make it ultra-reliable through overly sophisticated engineering? Was there insufficient time for a real value engineering review? A true “VE” exercise should streamline a design by eliminating components and complexity that add cost while degrading reliability or failing to substantively improve performance. (Today, unfortunately, “VE” usually just means cost cutting.) There are few projects where a good VE review is as important as it is in high-availability data center designs.
Q 6: Does the facility just meet minimums, or is it more robust where needed?
In the old motion picture “The Towering Inferno” the architect says to the contractor: “Code isn’t good enough for this building. This building has to be better than code.” This doesn’t mean that data centers should be over-specified to the point of energy inefficiency or wasteful spending. But codes and standards are “minimum performance” documents, and there are places where minimums should be critically evaluated when designing a mission-critical data center, especially one of this magnitude.
Conductors in these facilities are often oversized to provide for growth, configuration flexibility, and reduced heat. Switchgear is often more robust than it might theoretically need to be. Cooling towers, pumps and chillers are conservatively selected. And, as discussed in Q 7 below, ground resistance must be way less than the 25 Ohm NEC maximum. As already noted, however, government may want “the best” but isn’t likely to budget for it. So it would not be surprising if this facility, which undoubtedly warrants better than “minimum acceptable”, has been relegated to that level by cost and time constraints that may again prove the “penny wise and pound foolish” adage.
Q 7: Is the grounding system appropriate to a facility of this magnitude and sophistication?
The code maximum of 25 Ohms ground resistance [NEC 250.53(A)(2)] is way too high for the ground system of a mission-critical data center. ANSI/J-STD-607-A, which should be used for data centers as well as telecom installations, calls for a maximum of 5 Ohms. And hopefully, no one included Isolated Ground or “IG” circuits in the design, although they would not likely have anything to do with the present problem.
In a data center, IG circuits are a waste of copper and a potential source of multiple ground paths and eddy current problems. But proper installation of a technology ground reference system is not easy to achieve. Electricians like to make nice neat bends in wires, but the high-frequency spikes that grounds are supposed to drain away don’t go around sharp corners. Getting electricians to properly radius every bend can be challenging, particularly when a large number of craftsmen are working on a project under heavy time pressure. Since a robust ground reference is necessary for breakers to clear fault conditions before they can propagate upstream, might we question whether sufficient attention was given to this part of the design and installation?