Intel’s latest plan to fend off rivals in high-performance computing (HPC) workloads involves a CPU with large stacks of high-bandwidth memory (HBM) and new kinds of accelerators, plus its long-awaited data center GPU, which will go head-to-head against Nvidia’s most powerful chips.
The Intel Xeon Max processor is part of the company’s new branding for HPC products. The new Xeon Max CPUs are optimized for HPC deployments alongside the Max Series GPUs. The CPU was previously known as “Sapphire Rapids HBM,” and it combines a next-gen Xeon CPU with onboard HBM to increase memory bandwidth for HPC applications.
The Xeon Max CPUs
The Xeon Max Series will pack up to 56 performance cores based on the same Golden Cove microarchitecture as Intel’s 12th-Gen Core CPUs, which debuted last year. Like the vanilla Sapphire Rapids chips coming next year, these chips will support DDR5, PCIe 5.0, and Compute Express Link (CXL) 1.1, which will enable memory to be directly attached to the CPU over PCIe 5.0.
Xeon Max has a thermal design power (TDP) of 350W and comes with 20 accelerators built for artificial intelligence and HPC workloads. The accelerator types include Intel Advanced Vector Extensions 512 (AVX-512), Intel Deep Learning Boost (DL Boost), Intel Data Streaming Accelerator (DSA), and Intel Advanced Matrix Extensions (AMX).
With AVX-512, Intel claimed a Xeon Max-based system can provide double the deep learning training performance of a system using AMD’s high-end Epyc 7763 CPU on the MLPerf DeepCAM benchmark. With AMX, the company said, the Xeon Max system could be 3.6 times faster. As with any vendor benchmark, these performance claims should be taken with a grain of salt.
Unlike vanilla Sapphire Rapids, Xeon Max will come with 64GB of HBM2e, giving the CPU roughly 1TB/s of memory bandwidth and more than 1GB per core.
With 64GB of HBM2e per CPU, a dual-socket server with two Xeon Max CPUs will pack 128GB. This is significant because the HBM can serve as system memory, so you can skip installing any DDR5 modules if that capacity is enough for you.
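The capacity figures above reduce to simple arithmetic. The sketch below uses only the numbers quoted in this article (64GB of HBM2e per CPU, up to 56 cores, roughly 1TB/s of bandwidth) to work out the per-core and per-server totals. The names are illustrative, and "roughly 1TB/s" is assumed to mean 1,000GB/s for the sake of a round estimate.

```python
# Figures quoted in the article; all names here are illustrative only.
HBM_PER_CPU_GB = 64
MAX_CORES = 56
HBM_BANDWIDTH_GBPS = 1000  # assumption: "roughly 1TB/s" taken as 1,000 GB/s


def hbm_per_core_gb(hbm_gb=HBM_PER_CPU_GB, cores=MAX_CORES):
    """HBM capacity available to each core on a fully enabled chip."""
    return hbm_gb / cores


def server_hbm_gb(sockets, hbm_gb=HBM_PER_CPU_GB):
    """Total HBM capacity in a server with the given number of sockets."""
    return sockets * hbm_gb


def bandwidth_per_core_gbps(bandwidth=HBM_BANDWIDTH_GBPS, cores=MAX_CORES):
    """Rough share of HBM bandwidth per core if all cores pull evenly."""
    return bandwidth / cores


print(f"HBM per core:       {hbm_per_core_gb():.2f} GB")
print(f"Dual-socket HBM:    {server_hbm_gb(2)} GB")
print(f"Bandwidth per core: {bandwidth_per_core_gbps():.1f} GB/s")
```

The per-core bandwidth figure is only an even split of the headline number, not a measured value.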
Jeff McVeigh, corporate vice president and general manager of Intel’s Super Compute Group, said this configuration, called HBM-only mode, can help data center operators save money and power, and there is no need for any code changes for software to recognize HBM.
But for data center operators who want to use DDR memory as extra capacity or as the system memory, there are options. In HBM flat mode, the HBM and DDR act as two memory regions, but code changes are needed for the software to recognize this. In HBM caching mode, the HBM acts as a cache for the DDR; this requires no code changes.
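The trade-offs between the three modes can be summarized as a small decision helper. This is a hypothetical rule of thumb built only from the descriptions above, not Intel guidance: the function name, its parameters, and the idea of choosing by working-set size are all assumptions made for illustration.

```python
from enum import Enum

HBM_PER_CPU_GB = 64  # per-CPU HBM2e capacity quoted in the article


class HbmMode(Enum):
    HBM_ONLY = "HBM-only"   # no DDR installed, no code changes
    FLAT = "HBM flat"       # HBM and DDR as two regions, code changes needed
    CACHE = "HBM caching"   # HBM caches DDR, no code changes


def suggest_mode(working_set_gb, sockets=2, can_modify_code=False):
    """Hypothetical heuristic for picking a memory mode.

    If the working set fits in HBM, DDR can be skipped entirely.
    Otherwise, flat mode is suggested only when the software can be
    changed to place data in the HBM region; caching mode is the
    fallback because it needs no code changes.
    """
    hbm_capacity_gb = sockets * HBM_PER_CPU_GB
    if working_set_gb <= hbm_capacity_gb:
        return HbmMode.HBM_ONLY
    if can_modify_code:
        return HbmMode.FLAT
    return HbmMode.CACHE


print(suggest_mode(100).value)
print(suggest_mode(500).value)
print(suggest_mode(500, can_modify_code=True).value)
```

A real deployment would weigh more than capacity, such as access patterns and cost, so treat this as a reading aid for the paragraph above and nothing more.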
McVeigh claimed that HBM helps Xeon Max significantly improve performance per watt over AMD’s HPC-focused Epyc 7773X, which comes with 768MB of L3 cache. With DDR5 memory installed, Intel said a Xeon Max-based system uses 63 percent less power than the Epyc-based system to provide the same level of performance on the High-Performance Conjugate Gradients benchmark. With only HBM, the Xeon Max system uses 67 percent less power, according to Intel.
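A power saving at equal performance can be restated as a performance-per-watt ratio: if a system needs a fraction r less power for the same work, its performance per watt is 1 / (1 - r) times that of the baseline. The sketch below applies that conversion to Intel's two claimed figures. The function name is illustrative, and the results inherit all the caveats of the vendor's own numbers.

```python
def perf_per_watt_gain(power_reduction):
    """Convert a fractional power saving at equal performance
    into a performance-per-watt multiple over the baseline."""
    if not 0 <= power_reduction < 1:
        raise ValueError("power_reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - power_reduction)


# Intel's claimed savings versus the Epyc 7773X system on HPCG.
with_ddr5 = perf_per_watt_gain(0.63)
hbm_only = perf_per_watt_gain(0.67)

print(f"With DDR5: {with_ddr5:.2f}x performance per watt")
print(f"HBM only:  {hbm_only:.2f}x performance per watt")
```

By this conversion, the claimed savings work out to roughly 2.7 times and 3 times the baseline's performance per watt, respectively.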
The Max Series GPUs
While Intel’s Data Center GPU Max Series lacks a creative brand name like Xeon, the company is hoping the accelerator formerly known as Ponte Vecchio will make it more competitive with data center GPUs from Nvidia, which has a solid lead, and AMD, which is catching up.
The chipmaker called the Max Series GPU its “highest density processor” because it packs more than 100 billion transistors into a system-on-package comprising 47 chiplets, known as “tiles” in Intel lingo. These tiles are brought together on the package using Intel’s advanced packaging technologies: embedded multi-die interconnect bridge (EMIB) and Foveros.
The Max Series GPU comes with up to 128 cores based on the Intel Xe HPC microarchitecture, an HPC-focused branch of the chipmaker’s Xe GPU architecture. McVeigh said this allows the GPU’s most potent configuration to provide 52 teraflops of peak FP64 throughput, a key measure for HPC.
The GPU also comes with up to 128 ray tracing units, which are geared for traditional simulation software, digital content creation, and pre-visualization applications. Each GPU has 16 Xe Link ports, allowing multiple GPUs to communicate directly.
Like Xeon Max, the Max Series GPU comes equipped with HBM2e, except the capacity in this case goes up to 128GB. The GPU also packs a lot of cache, with a maximum of 408MB of Rambo L2 cache (Rambo stands for “random access memory, bandwidth optimized”) and up to 64MB of L1 cache.