
The Future of Federal Supercomputing

Author’s Note: As an Artificial Intelligence (AI) practitioner, I constantly think about compute capacity. Testing production-ready, robust infrastructure to serve and monitor Large Language Model (LLM) apps is one of my key responsibilities as an engineer with FoundationaLLM. My experience with solutions that apply LLMs has naturally made me curious about the infrastructure that powers the development of sophisticated language models. This curiosity, coupled with my interest in policymaking, inspired me to look deeper into the history of federally sanctioned supercomputing projects and offer my thoughts on the future of public-private supercomputing partnerships. I hope you enjoy reading the perspective I offer here. Please feel free to voice your opinion in the comments!

In 1996, President Bill Clinton signed the Comprehensive Nuclear-Test-Ban Treaty, adopted by the U.N. General Assembly, which prohibits the detonation of nuclear weapons for testing. Though the U.S. Senate did not ratify the treaty, it reflected growing interest in halting nuclear testing to advance non-proliferation. Indeed, Clinton’s predecessor, George H. W. Bush, had signed a unilateral moratorium on nuclear testing in 1992.1 Though well-intentioned, these nuclear policy maneuvers raised an important concern: How would the U.S. verify, without testing, the functionality of the nuclear weapons that guaranteed security for itself and its allies? So began the mandate of the National Nuclear Security Administration (NNSA), established by Congress in 2000.2 The NNSA had a specific tool in mind to verify decades-old weapons systems: supercomputing.

IBM’s Blue Gene/L supercomputer was no doubt impressive. Its modular architecture enabled it to reach a peak performance of 360 teraflops (trillion floating-point operations per second).3 The grandeur of this machine was certainly not lost on the scientists who crammed into a San Francisco room in 2004 to watch it simulate the detonation of a Nixon-era nuclear weapon.4 Arguably more significant than the test itself, however, were its consequences: policymakers recognized that supercomputing was an essential part of America’s national security strategy. Heightened interest in supercomputing catalyzed further development of High-Performance Computing (HPC) algorithms for other industries, such as drug development and weather modeling. Today’s premier supercomputer, Aurora, introduced last November at Argonne National Laboratory, far surpasses Blue Gene/L and aims to advance Artificial Intelligence (AI) research.5
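
For a rough sense of how far the field has come, the sketch below compares Blue Gene/L’s cited peak with an assumed exascale figure for Aurora; the Aurora number is an order-of-magnitude assumption for illustration only, not a figure from this article or its sources.

```python
# Back-of-envelope scale comparison. Blue Gene/L's 360-teraflop peak is cited
# above; the exascale figure for Aurora (~1e18 FLOPS) is an order-of-magnitude
# assumption for illustration only.

BLUE_GENE_L_PEAK_FLOPS = 360e12   # 360 teraflops, as cited above
AURORA_ASSUMED_FLOPS = 1e18       # assumed roughly exascale peak

ratio = AURORA_ASSUMED_FLOPS / BLUE_GENE_L_PEAK_FLOPS
print(f"Approximate peak-performance ratio: {ratio:,.0f}x")  # ~2,778x
```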

Given the distinguished history of America’s public supercomputers, continued government support for new projects seems assured. I argue, however, that the federal government could better advance its scientific research goals, without compromising national security, by leaning on private-sector hyperscalers for its supercomputing needs.

Industry analysts often refer collectively to the public cloud businesses of Microsoft, Amazon, and Google as the hyperscalers. The terminology is sensible: nearly two decades of sustained capital expenditures (CapEx) on data centers have given these firms a commanding lead over their competitors and the federal government. According to Charles Fitzgerald, who served in strategy leadership roles at Microsoft from 1989 to 2008, sustained CapEx is the most significant factor underlying the trio’s dominance of the public cloud market.6

[Chart: hyperscaler capital expenditure growth over time]

Image Credit: Charles Fitzgerald’s platformonomics.com

The hyperscalers have a track record of expanding compute offerings for customers at a meteoric rate, and HPC users have taken notice. According to a June 2021 press release from Microsoft Azure (Microsoft’s public cloud offering), a cluster of 164 of the firm’s top-tier ND-series Virtual Machines (VMs) delivers performance on par with a Top 20 supercomputer from November 2020.7
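
To see roughly how a cluster of VMs reaches that scale, here is a hedged back-of-envelope estimate of aggregate theoretical throughput; the per-VM GPU count and per-GPU peak are assumptions based on published NVIDIA A100 specifications, not figures taken from the press release.

```python
# Rough aggregate-throughput estimate for a cluster of GPU VMs. GPUS_PER_VM and
# PEAK_FP64_TFLOPS_PER_GPU are illustrative assumptions (8 NVIDIA A100 GPUs per
# VM, ~9.7 TFLOPS FP64 peak per GPU); only the VM count comes from the text.

NUM_VMS = 164                      # cluster size cited above
GPUS_PER_VM = 8                    # assumed ND-series configuration
PEAK_FP64_TFLOPS_PER_GPU = 9.7     # assumed A100 FP64 peak (non-tensor-core)

aggregate_pflops = NUM_VMS * GPUS_PER_VM * PEAK_FP64_TFLOPS_PER_GPU / 1000
print(f"Theoretical aggregate FP64 peak: ~{aggregate_pflops:.1f} PFLOPS")
# ~12.7 PFLOPS of theoretical peak, the same order of magnitude as the sustained
# performance of systems near the top 20 of that era.
```

Sustained benchmark results are always lower than theoretical peak, so the press release’s comparison rests on measured runs rather than an estimate like this; the sketch only shows why the claim is plausible.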

For scientific research applications, the hyperscalers, which rapidly integrate hardware advances and offer seemingly boundless capacity, provide an appealing alternative to government-funded supercomputers. Non-sensitive workloads with looser compliance requirements might even be able to use public cloud capacity rather than sovereign clouds. Transitioning critical national security workloads, however, poses a greater challenge. The hyperscalers may be willing to provide government environments, subject to strict oversight, for especially sensitive workloads, but they will only assume that responsibility if the federal government signals a long-term commitment. As technology companies come under increasing scrutiny, and as political polarization makes the Congressional appropriations process ever more harrowing, the hyperscalers may understandably be reluctant to make that bet.

Maintaining America’s technological edge is a goal most federal policymakers share. As China’s leadership accelerates strategic private-sector investment, it is sensible for the U.S. federal government to recognize the private sector’s compute supremacy and mobilize it to serve the public interest. We have much to thank federal supercomputers for, and they will likely continue to propel extraordinary scientific feats. Ultimately, however, policymakers must pursue higher-impact partnerships with the hyperscalers.

Footnotes

1 www.armscontrol.org/act/1999-09/press-releases/senate-rejects-comprehensive-test-ban-treaty-clinton-vows-continue

2 www.energy.gov/nnsa/about-nnsa

3 www.energy.gov/nnsa/nnsas-high-performance-computing-achievements

4 www.nbcnews.com/id/wbna6745422

5 www.anl.gov/aurora

6 platformonomics.com/2024/04/follow-the-capex-the-clown-car-race-checkered-flag/

7 azure.microsoft.com/en-us/blog/azure-announces-general-availability-of-scaleup-scaleout-nvidia-a100-gpu-instances-claims-title-of-fastest-public-cloud-super/