Data Center News - CoreSite Connect[ED] Blog

CoreSite Delivers Customized Liquid Cooling System to Support GPUaaS and Enterprise AI

AI is unquestionably the hottest technology across virtually all industry sectors. It’s also dependent on infrastructure that is best delivered in select, AI-ready, certified data centers. In this blog, I’ll focus on an AI-specific production environment – GPU One, a GPU as a Service (GPUaaS) neocloud from STN, Inc., offered at CoreSite’s CH2 data center.

But there’s more to the story. I’ll also describe the collaboration with our partner Atomatic Mechanical Services to design and build a customized direct-to-chip liquid cooling system to ensure the continuous operation of GPU One.

The Back Story

The cooling system supporting the GPUaaS deployment in CH2, designed and installed by Atomatic Mechanical Services.

STN is a next-generation infrastructure company specializing in cloud services (including neocloud), managed IT services and secure AI platforms for companies building the future. From high-growth AI startups to enterprise innovation teams, STN helps customers navigate complexity with expert consulting, vendor-agnostic strategy and data center infrastructure that’s ready for live deployment of AI workloads.

STN approached CoreSite about hosting GPU One in CH2. Having worked with CoreSite before and seen CH2, STN was confident it was an ideal data center in which to deploy its flagship service. CH2 was purpose-built to support the full range of customers’ computing requirements, with the operational resilience and infrastructure needed to provide the power, cooling and interconnection capabilities enterprises require to implement AI, ML and other high-density applications.

Other factors leading to this choice were:

  • CH2’s NVIDIA DGX-Ready Data Center program certification
  • CoreSite’s ability to design and build, in collaboration with Atomatic, a custom liquid cooling system specifically to support STN’s deployment, which comprises 1,536 NVIDIA B200 GPUs running on high-performance servers deployed across 24 racks
  • Near-zero-latency direct connections to major cloud service providers
  • Data center site-to-site interconnection management via CoreSite’s Open Cloud Exchange® (OCX) and intra-site cross connects
  • CH2’s extraordinary load-bearing architecture, which supports the massive weight (over 120,000 pounds) of the GPU One installation and its dedicated liquid cooling system
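The deployment figures above imply some useful back-of-envelope numbers. As a sketch: the GPU count, rack count and total weight come from the deployment itself, while the eight-GPUs-per-server figure is an assumption typical of B200-class systems, not something confirmed here.

```python
# Back-of-envelope density figures for the GPU One deployment.
# 1,536 GPUs, 24 racks and 120,000 lb are from the deployment description;
# GPUS_PER_SERVER is an assumption (typical of HGX-style B200 servers).

TOTAL_GPUS = 1536
RACKS = 24
GPUS_PER_SERVER = 8            # assumption, not confirmed by the deployment
TOTAL_WEIGHT_LB = 120_000      # installation plus dedicated cooling system

gpus_per_rack = TOTAL_GPUS // RACKS                   # 64 GPUs per rack
servers_per_rack = gpus_per_rack // GPUS_PER_SERVER   # 8 servers per rack
avg_weight_per_rack_lb = TOTAL_WEIGHT_LB / RACKS      # averaged over the footprint,
                                                      # including the cooling skid

print(gpus_per_rack, servers_per_rack, avg_weight_per_rack_lb)
```

At 64 GPUs per rack, the density is well beyond what air cooling can handle, which is why the direct-to-chip liquid cooling system described below was essential.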

Last, but certainly not least, was the Illinois Department of Commerce and Economic Opportunity Retail Sales Tax Exemption program. Qualifying data center customers can receive a 10.25% sales tax exemption on eligible equipment and software deployed within CH2.

Another critical factor for the GPU One installation was CoreSite’s data center operations staff’s collaborative culture, which made working with Atomatic smooth right from the get-go. Let’s look closer at how that helped drive the success of this one-of-a-kind solution.

Atomatic’s Mission-Critical System Experience Was … Mission-Critical

A welder working on the underfloor piping beneath the deployment. 

CH2 was built to support liquid cooling for high-density workloads, but GPU One represents a new type of cloud offering – neocloud. It was clear that the project would need a systems provider with mission-critical infrastructure experience and data center expertise.

Sometimes the stars align. I learned about Atomatic at an industry conference, right about the time we were working through STN’s requirements. I was introduced by one of the board members and engaged with Paige Fugger, a Sales Engineer, and was soon also in touch with Bryan O’Neill, Vice President of Construction. What intrigued our team was not only that Atomatic has worked with data centers, but that they have extensive experience in the mission-critical healthcare industry, installing complex HVAC and chilled water systems in environments where uptime and cleanliness are literally a matter of life and death. They have also proven that they can work without interrupting operations, whether in a data center or a hospital.

Atomatic has capabilities that streamline design, fabrication and project execution. They were able to develop the schematics, pre-fabricate parts of the pumping system at their facilities to accelerate the construction schedule, and plan how and when their LU 597 Pipefitters and other technicians would be in CH2.

That last remark doesn’t truly represent the depth of collaboration involved. But suffice it to say that, with weekly meetings and a lot of sweat equity, a neocloud was enabled. Bottom line: they approach data centers as mission-critical infrastructure, just as we do, and understand that time to market is a major business driver for our clients.

All of that is very important, but there’s another element that I need to include – the core values of our companies align. I’d be happy to tell you more about that, just get in touch.

Concurrent GPU One Deployment and Customer Primary Liquid Cooling System Installation

As the STN GPU deployment was underway, Atomatic’s build-out of the customer primary liquid cooling system began. The chillers, pumps, telemetry and control units, and thermal storage reservoir were assembled on-site, on a skid adjacent to the GPU One cabinets. Additionally, underfloor piping (the customer primary cooling loop) was assembled beneath and surrounding the GPU One cabinets, with taps into each cabinet.

The customer primary liquid cooling system pumps chilled water through the cooling loop at 380 gallons per minute. The water is routed to an in-cabinet CDU (coolant distribution unit) and then to the direct-to-chip cold plates in each server. Heat from the NVIDIA chips is absorbed by the cold plates; ultimately, the warmed water is sent back to the data center chiller. The process is continuous.
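To give a sense of scale, the heat a chilled-water loop can carry follows from Q = ṁ·c_p·ΔT. Here is a minimal sketch: only the 380 GPM flow rate comes from the deployment; the 10 °C supply/return delta-T and the water properties are illustrative assumptions, not measured values from this system.

```python
# Rough heat-removal estimate for a 380 GPM chilled-water loop, using Q = m_dot * c_p * dT.
# Only the 380 GPM figure is from the deployment; dT and fluid properties are assumptions.

GPM_TO_LPS = 3.785411784 / 60   # US gallons per minute -> liters per second
FLOW_GPM = 380.0                # loop flow rate, from the deployment
DELTA_T_C = 10.0                # assumed temperature rise across the cold plates
WATER_DENSITY_KG_L = 1.0        # approximate density of water
WATER_CP = 4186.0               # specific heat of water, J/(kg*K)

mass_flow_kg_s = FLOW_GPM * GPM_TO_LPS * WATER_DENSITY_KG_L   # ~24 kg/s
heat_watts = mass_flow_kg_s * WATER_CP * DELTA_T_C

print(f"{heat_watts / 1e6:.2f} MW")
```

At these assumed conditions the loop would move on the order of a megawatt of heat, roughly the scale a multi-rack B200 deployment demands; the actual delta-T and capacity of the GPU One loop may differ.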

The liquid cooling system features 2N redundancy and can rapidly fail over to the concurrent backup pumps and the water reservoir on the skid, should a power interruption require CH2 to switch over to generator power. In such a case, the customer primary liquid cooling system would be interrupted for only a matter of seconds.

This is critical not just to ensure the continuous operation of GPU One, but also to meet warranty requirements on NVIDIA’s B200 chips and the servers in which they reside.

This liquid cooling system was designed and built with these factors in mind – and to the best of our knowledge at CoreSite, it’s the only liquid cooling system backed by its own service level agreement.

Skild AI Robotics Brain Leverages GPU One

Skild AI, a company that creates AI foundation models and software designed to drive a diverse range of robotic devices and applications, is leveraging GPU One to host and power its Skild Brain robotics application.

As described in TechRepublic, “This [Skild Brain] isn’t just about smarter robots – it’s about fundamentally reshaping how work gets done. Imagine construction sites where robots navigate dangerous environments alongside humans, manufacturing floors where machines adapt to new tasks without reprogramming, and hospitals where robotic assistants handle complex procedures.”

Skild AI co-founder Abhinav Gupta expounds on this in the article, saying that, with general-purpose robots that can “safely perform any automated task, in any environment,” companies can expand robot capabilities while addressing the labor crisis head-on. The company is already targeting the construction, manufacturing and security sectors, where dangerous or repetitive tasks could be automated.

An AI foundation model trained on large-scale data, Skild Brain learns general skills, adapts to new environments and keeps improving over time. 5G connectivity between Skild Brain and various quadruped, humanoid, tabletop and other robot form factors is enabled by the near-zero-latency connectivity of STN’s GPU One.

Another Success Story Demonstrating That Where You Put Your Infrastructure Matters

Of course, none of this could work without the direct-to-chip liquid cooling required to keep the NVIDIA B200 GPUs and the high-performance servers running at peak capability.

CoreSite’s innovative and collaborative culture, along with its strong industry partnerships and its focus on AI-ready data center design and operation, has been key to the success of STN’s GPU One deployment, as well as GPU One-hosted applications such as Skild Brain.

It’s just one more example of CoreSite’s oft-stated tenet: “Where you put your infrastructure matters.”

Know More

To delve even deeper into how CoreSite supports cutting-edge AI, check out a new guest blog by Sabur Mian, CEO and Co-Founder of STN, detailing how STN’s neocloud hosts the Skild Brain – a general-purpose operating system that manages inferences and directs robot actions, leveraging the first-ever truly intelligent foundational model – in CoreSite’s CH2 data center.

When you are ready, contact us to discuss your digital business objectives and challenges.