Elon Musk’s xAI claims to have launched the world’s ‘most powerful’ AI training cluster, Colossus, with plans to expand it even further.
The tech CEO and billionaire announced the launch of the Colossus 100k H100 training cluster in a post on X (Sep. 2), applauding the “excellent work by the [xAI] team, Nvidia and our many partners/suppliers.”
“This weekend, the @xAI team brought our Colossus 100k H100 training cluster online,” he wrote. “From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months.”
Colossus comprises a massive 100,000 Nvidia H100 GPUs, dwarfing the hardware behind rival systems; ChatGPT, for example, is estimated to run on around 30,000 A100 GPUs. The cluster was built in just 122 days, and there are reportedly plans for it to double in size down the line.
This cutting-edge system is housed in a facility in Memphis, Tennessee, formerly owned by Electrolux. According to earlier reports by The Information, Musk envisions transforming this location into a massive “gigafactory of compute.”
The Nvidia GPUs used for Colossus are among the most coveted technology products today. Their soaring demand has propelled Nvidia’s market capitalization and enabled it to briefly become the world’s most valuable company. Acquiring these GPUs can be difficult, as all the major players in the market want the same chips. To navigate this issue, xAI managed to bypass the competition by obtaining the initial batch of GPUs that were first allocated to Tesla, Musk revealed on X in June.
Tesla had no place to send the Nvidia chips to turn them on, so they would have just sat in a warehouse.
The south extension of Giga Texas is almost complete. This will house 50k H100s for FSD training.
— Elon Musk (@elonmusk) June 4, 2024
How could Colossus affect the wider AI sector?
The sheer number of GPUs goes some way towards backing Musk’s claim that Colossus is the ‘most powerful AI training system in the world.’ It adds another competitor to the already crowded generative AI market and could help cement xAI’s Grok as a serious contender, with xAI’s Grok 2 already nipping at the heels of OpenAI’s GPT-4.
Grok 2 was trained on only roughly 15,000 GPUs, so Colossus’ 100,000 GPUs represent more than six times the compute capacity, and the resulting performance gains are likely to be sizeable. Future versions of Grok trained on Colossus could therefore put renewed pressure on OpenAI, Google, and others in the generative AI space to deliver new results.
Featured image: Midjourney