December 8, 2024

AWS Expands AI Infrastructure with New Trainium2 Chips and Project Rainier



Amazon Web Services (AWS) has announced the availability of its new EC2 Trn2 instances powered by Trainium2 chips. The chips are purpose-built for AI training and are said to deliver significantly higher performance and a better price-performance ratio, accelerating the development and deployment of complex AI models.

Focus on Performance Increase and Cost Efficiency

According to AWS, the new Trn2 instances deliver 20.8 petaflops of compute per instance. Compared with the previous GPU-based EC2 P5 instances, AWS promises up to 40 percent better price-performance, allowing developers and companies to train AI models more efficiently and cost-effectively.
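For orientation, the sketch below shows how such an instance could be requested with the AWS SDK for Python (boto3). The instance type name trn2.48xlarge and the AMI ID are assumptions and placeholders, not figures from the article, and would need to be adapted to your account and region.

```python
# Minimal sketch: requesting a Trainium2-backed EC2 instance via boto3.
# Assumptions (not from the article): the instance type name "trn2.48xlarge"
# and the AMI ID are placeholders; use a Neuron-compatible AMI in your region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="trn2.48xlarge",     # assumed Trn2 instance type name
    MinCount=1,
    MaxCount=1,
)
print("Launched:", response["Instances"][0]["InstanceId"])
```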

Innovative Architecture for Scalable Performance

A Trn2 UltraServer combines four Trn2 instances via the NeuronLink interconnect, scaling compute to 83.2 petaflops per UltraServer. This allows even very large AI models with up to one trillion parameters to be trained and served with significantly reduced training and inference times and lower latency.
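The 83.2-petaflop figure follows directly from the per-instance number; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope check of the UltraServer scaling claim:
# one Trn2 UltraServer = four Trn2 instances linked via NeuronLink.
PETAFLOPS_PER_TRN2_INSTANCE = 20.8  # per the article
INSTANCES_PER_ULTRASERVER = 4       # per the article

ultraserver_petaflops = PETAFLOPS_PER_TRN2_INSTANCE * INSTANCES_PER_ULTRASERVER
print(f"UltraServer peak compute: {ultraserver_petaflops:.1f} petaflops")  # 83.2
```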

"Project Rainier": Next-Generation AI Cluster

With "Project Rainier," AWS goes one step further. Here, hundreds of Trainium2 UltraServers are combined into an EC2 UltraCluster. This infrastructure is already being used in organizations like Anthropic to optimize and train large language models like Claude for Amazon Bedrock. This will enable customers to efficiently train and use models with trillions of parameters in real-time. AWS emphasizes that not only the sheer size of the clusters, but above all the optimized architecture of the Trainium2 UltraServers with improved data distribution and resource allocation is responsible for the performance increase. This shortens training times without reaching the limits of conventional network architectures.

Nvidia Blackwell GPUs and Outlook on Trainium3

In addition to the Trainium2-based instances, AWS also introduced the new EC2 P6 instances, which are based on Nvidia's Blackwell GPUs. These offer up to 2.5 times higher performance compared to the previous generation and are specifically optimized for computationally intensive generative AI applications.

AWS also announced the successor to Trainium2: Trainium3. The chip will be manufactured on a 3-nanometer process and is expected to offer four times the performance of Trainium2 while also improving energy efficiency. Trainium3 will be used in future versions of the UltraServers, enabling customers to train models even faster and deploy them in real time.

Bibliography:
https://www.heise.de/news/AWS-Neue-Cloud-Instanzen-mit-Trainium2-Chips-fuer-mehr-KI-Leistung-10191488.html
https://de.investing.com/news/company-news/aws-stellt-neue-trainium2-kichips-fur-ec2instanzen-vor-93CH-2798963
https://rpa-ki.de/articles/news-zu-ki-k%C3%BCnstliche-intelligenz/1216-aws-neue-cloud-instanzen-mit-trainium2-chips-f%C3%BCr-mehr-ki-leistung
https://de.qz.com/amazon-ki-training-chip-trainium-aws-cloud-computing-1851712293
https://futurezone.at/b2b/amazon-supercomputer-nvidia-rainier-ultracluster-ultraserver-ki-trainium2/402984065
https://www.golem.de/news/ki-training-aws-baut-ki-cluster-mit-hunderttausenden-chips-fuer-anthropic-2412-191421.html
https://www.hardwareluxx.de/index.php/news/allgemein/wirtschaft/65032-aws-ultracluster-mit-hunderttausenden-trainium-chips-in-planung.html
https://x.com/iXmagazin/status/1865372184534733241
https://aws.amazon.com/de/ai/machine-learning/trainium/