AMD has announced the AMD Instinct MI100 accelerator, which the company describes as the world’s fastest HPC GPU and the first x86 server GPU to surpass the 10 teraflops (FP64) performance barrier. The MI100 is supported by accelerated compute platforms from Dell, Gigabyte, HPE and Supermicro. Combined with AMD EPYC CPUs and the ROCm 4.0 open software platform, it is designed to propel new discoveries.
The MI100 is built on the new AMD CDNA architecture and, paired with 2nd Gen AMD EPYC processors, enables new classes of accelerated systems for HPC and AI. It offers up to 11.5 TFLOPS of peak FP64 performance for HPC and up to 46.1 TFLOPS of peak FP32 Matrix performance for AI and machine learning workloads. With AMD Matrix Core technology, the MI100 delivers a 7x boost in theoretical peak FP16 floating-point performance for AI training workloads compared with AMD's previous-generation accelerators.
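The 7x figure is a ratio of theoretical peaks, which can be checked with one division. This sketch takes the MI100's peak FP16 Matrix rate of 184.6 TFLOPS from Table 1 and assumes the previous-generation comparison point is the AMD Instinct MI50 at a peak FP16 rate of 26.5 TFLOPS; that MI50 figure is an assumption and does not appear in this article.

```python
# Peak FP16 throughput figures, in TFLOPS.
MI100_FP16_MATRIX_TFLOPS = 184.6  # from Table 1
MI50_FP16_TFLOPS = 26.5  # ASSUMPTION: previous-generation MI50 peak FP16, not stated in the article


def speedup(new: float, old: float) -> float:
    """Return how many times higher `new` is than `old`."""
    return new / old


ratio = speedup(MI100_FP16_MATRIX_TFLOPS, MI50_FP16_TFLOPS)
print(f"FP16 peak uplift: {ratio:.2f}x")
```

Under that assumption the ratio comes out just under 7 (about 6.97x), which is consistent with the rounded 7x claim.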
Table 1 below summarizes the specifications of the AMD Instinct MI100:
| Specification | AMD Instinct MI100 |
|---|---|
| Compute Units | 120 |
| Stream Processors | 7,680 |
| FP64 TFLOPS (Peak) | Up to 11.5 |
| FP32 TFLOPS (Peak) | Up to 23.1 |
| FP32 Matrix TFLOPS (Peak) | Up to 46.1 |
| FP16/FP16 Matrix TFLOPS (Peak) | Up to 184.6 |
| INT4/INT8 TOPS (Peak) | Up to 184.6 |
| bFloat16 TFLOPS (Peak) | Up to 92.3 |
| HBM2 Memory | 32 GB |
| Memory Bandwidth | Up to 1.23 TB/s |
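The peak compute figures in Table 1 follow from the stream processor count (120 compute units x 64 = 7,680) and the engine clock. The sketch below assumes a peak engine clock of 1,502 MHz, which this article does not state, and counts one fused multiply-add (two floating-point operations) per stream processor per cycle for vector FP32. The per-format multipliers relative to vector FP32 (1/2 for FP64, 2x for FP32 Matrix, 4x for bFloat16, 8x for FP16 Matrix) are inferred from the table itself rather than taken from an AMD specification.

```python
STREAM_PROCESSORS = 7680  # 120 compute units x 64 stream processors each
PEAK_CLOCK_GHZ = 1.502  # ASSUMPTION: peak engine clock, not given in the article
FLOPS_PER_SP_PER_CYCLE = 2  # one fused multiply-add counts as two floating-point ops

# ASSUMPTION: throughput multipliers relative to vector FP32, inferred from Table 1.
RATE_MULTIPLIERS = {
    "FP64": 0.5,
    "FP32": 1.0,
    "FP32 Matrix": 2.0,
    "bFloat16": 4.0,
    "FP16 Matrix": 8.0,
}


def peak_tflops(multiplier: float) -> float:
    """Peak throughput in TFLOPS for a given rate multiplier."""
    # SPs x ops/cycle x clock in GHz gives GFLOPS; divide by 1000 for TFLOPS.
    return STREAM_PROCESSORS * FLOPS_PER_SP_PER_CYCLE * PEAK_CLOCK_GHZ * multiplier / 1000


# Sanity check: the stream processor count matches 120 CUs of 64 SPs.
assert 120 * 64 == STREAM_PROCESSORS

peaks = {name: round(peak_tflops(m), 1) for name, m in RATE_MULTIPLIERS.items()}
for name, value in peaks.items():
    print(f"{name:12s} {value:6.1f} TFLOPS")
```

With these assumptions, every rounded result matches the corresponding "Up to" value in Table 1, which suggests the published peaks are simple multiples of one clock-times-width product.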
AMD also highlighted the following key features of the Instinct MI100 accelerator:
- New AMD CDNA Architecture: Engineered to power AMD GPUs for the exascale era, offering exceptional performance and power efficiency.
- Leading FP64 and FP32 Performance for HPC Workloads: Delivers industry-leading 11.5 TFLOPS peak FP64 performance and 23.1 TFLOPS peak FP32 performance, enabling scientists and researchers across the globe to accelerate discoveries in industries including life sciences, energy, finance, academia, government and defense.
- New Matrix Core Technology for HPC and AI: Supercharged performance for a full range of single- and mixed-precision matrix operations, such as FP32, FP16, bFloat16, INT8 and INT4, engineered to boost the convergence of HPC and AI.
- 2nd Gen AMD Infinity Fabric Technology: The Instinct MI100 provides twice the peer-to-peer (P2P) peak I/O bandwidth over PCIe 4.0, with up to 340 GB/s of aggregate bandwidth per card through three AMD Infinity Fabric Links. In a server, MI100 GPUs can be configured as up to two fully connected quad-GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for faster data sharing.
- Ultra-Fast HBM2 Memory: Features 32GB of high-bandwidth HBM2 memory at a clock rate of 1.2 GHz, delivering an ultra-high 1.23 TB/s of memory bandwidth to support large data sets and help eliminate bottlenecks in moving data in and out of memory.
- Support for Industry’s Latest PCIe Gen 4.0: Designed with PCIe Gen 4.0 technology, providing up to 64 GB/s of peak theoretical transport data bandwidth from CPU to GPU.
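The memory, PCIe and Infinity Fabric bandwidth figures in the list above can be reproduced with simple arithmetic. This sketch rests on assumptions the article does not state: a 4,096-bit HBM2 interface (four stacks of 1,024 bits) transferring data on both clock edges at 1.2 GHz; an x16 PCIe 4.0 link at 16 GT/s per lane, with the 64 GB/s figure counting both directions before 128b/130b encoding overhead; and an even split of the 552 GB/s hive bandwidth across the six links of a fully connected four-GPU hive, which implies about 92 GB/s per Infinity Fabric link. The last step also assumes the 340 GB/s card aggregate counts all three links plus the PCIe link.

```python
from math import comb

# HBM2 memory bandwidth
HBM2_BUS_WIDTH_BITS = 4096  # ASSUMPTION: four HBM2 stacks x 1,024 bits each
HBM2_CLOCK_GHZ = 1.2  # memory clock quoted in the article
TRANSFERS_PER_CLOCK = 2  # double data rate: one transfer on each clock edge
hbm2_gbs = HBM2_BUS_WIDTH_BITS * HBM2_CLOCK_GHZ * TRANSFERS_PER_CLOCK / 8  # bits -> bytes

# PCIe 4.0 host link
PCIE4_GT_PER_LANE = 16  # PCIe 4.0 signalling rate per lane, in GT/s
PCIE_LANES = 16  # ASSUMPTION: x16 link
per_direction_gbs = PCIE4_GT_PER_LANE * PCIE_LANES / 8  # raw rate, one direction: 32 GB/s
bidirectional_gbs = 2 * per_direction_gbs  # both directions combined: 64 GB/s
effective_per_direction_gbs = per_direction_gbs * 128 / 130  # after 128b/130b line encoding

# Infinity Fabric peer-to-peer links
HIVE_GPUS = 4  # quad-GPU hive, from the article
HIVE_P2P_GBS = 552  # per-hive P2P bandwidth, from the article
LINKS_PER_CARD = 3  # Infinity Fabric Links per card, from the article
links_in_hive = comb(HIVE_GPUS, 2)  # fully connected: one link per GPU pair -> 6
per_link_gbs = HIVE_P2P_GBS / links_in_hive  # ASSUMPTION: even split across links -> 92 GB/s
# ASSUMPTION: the card aggregate counts all three links plus the PCIe 4.0 link.
card_aggregate_gbs = LINKS_PER_CARD * per_link_gbs + bidirectional_gbs

print(f"HBM2 bandwidth:       {hbm2_gbs:.1f} GB/s (~{hbm2_gbs / 1000:.2f} TB/s)")
print(f"PCIe 4.0 x16:         {bidirectional_gbs:.0f} GB/s raw, "
      f"{effective_per_direction_gbs:.1f} GB/s effective per direction")
print(f"Infinity Fabric link: {per_link_gbs:.0f} GB/s")
print(f"Card aggregate I/O:   {card_aggregate_gbs:.0f} GB/s")
```

Under these assumptions the script reports 1,228.8 GB/s for HBM2 (the quoted 1.23 TB/s), 64 GB/s raw for PCIe 4.0 (about 31.5 GB/s effective per direction), 92 GB/s per Infinity Fabric link and 340 GB/s of aggregate card I/O. Note that three links per card is exactly what a fully connected four-GPU hive needs, since each GPU links to the other three.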
Paving the path to Exascale Computing
AMD has given Oak Ridge National Laboratory access to the AMD Instinct MI100 accelerator to help its scientists and researchers advance their work with the accelerator's compute and interconnect performance. According to AMD, the Instinct MI100 delivers new classes of accelerated systems and true heterogeneous compute capabilities for HPC and AI. AMD also pledges to continue delivering the performance, capabilities and scale needed to power current and future HPC workloads.
The following sources were consulted before this article was published: