Details
Grace Hopper DGX™ - The Universal Superchip and System for AI
The NVIDIA Grace Hopper Superchip architecture is the first true heterogeneous accelerated platform for high-performance computing (HPC) and AI workloads. It accelerates applications with the strengths of both GPUs and CPUs while providing the simplest and most productive distributed heterogeneous programming model to date. Scientists and engineers can focus on solving the world’s most important problems.
AI models and deep neural networks are rapidly growing in size and complexity in response to the most pressing challenges in business and research. The computational capacity needed to support today's modern AI workloads has outpaced traditional data centre architectures. Modern techniques that exploit model parallelism are colliding with the limits of inter-GPU bandwidth as developers build increasingly large accelerated computing clusters and push the limits of data centre scale. A new approach is needed - one that delivers almost limitless AI computing scale to break through the barriers to faster insights.
NVIDIA NVLink 4th Generation
Fourth-generation NVLink lets applications access peer GPU memory directly with loads, stores, and atomic operations, so accelerated applications can solve larger problems more easily than ever.
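As an illustration of that programming model, here is a minimal CUDA sketch (hypothetical two-GPU setup, error handling omitted): once peer access is enabled, a kernel running on one GPU can load, store, and atomically update a buffer that physically resides on another NVLink-connected GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Runs on GPU 0 but touches buffers that live in GPU 1's memory:
// a direct load+store and a direct atomic, no explicit copies.
__global__ void scale_peer(float *peer_buf, int *peer_hits, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        peer_buf[i] *= s;           // load + store across NVLink
        atomicAdd(peer_hits, 1);    // atomic on peer memory
    }
}

int main() {
    const int n = 1 << 20;
    float *buf;
    int *hits;

    cudaSetDevice(1);                          // allocations live on GPU 1
    cudaMalloc(&buf, n * sizeof(float));
    cudaMemset(buf, 0, n * sizeof(float));
    cudaMalloc(&hits, sizeof(int));
    cudaMemset(hits, 0, sizeof(int));

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);          // map GPU 1's memory into GPU 0's address space

    scale_peer<<<(n + 255) / 256, 256>>>(buf, hits, n, 2.0f);
    cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}
```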
The NVIDIA GH200 NVL2 fully connects two GH200 Superchips with NVLink, delivering up to 288GB of high-bandwidth memory, 10 terabytes per second (TB/s) of memory bandwidth, and 1.2TB of fast memory. The GH200 NVL2 offers up to 3.5X more GPU memory capacity and 3X more bandwidth than the NVIDIA H100 Tensor Core GPU in a single server for compute- and memory-intensive workloads.
NVLink-C2C
NVLink-C2C memory coherency increases developer productivity, performance, and the amount of GPU-accessible memory. CPU and GPU threads can now concurrently and transparently access both CPU and GPU resident memory, allowing developers to focus on algorithms instead of explicit memory management.
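A minimal sketch of what this buys developers, using cudaMallocManaged so it also runs on non-coherent platforms: one pointer is handed to both processors, and no cudaMemcpy appears anywhere. On GH200, NVLink-C2C hardware coherency backs accesses like these (and, with Address Translation Services, even plain system allocations become GPU-accessible); on other systems the CUDA driver migrates pages instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU side of the shared data structure: plain pointer dereference.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;                     // GPU writes the shared buffer
}

int main() {
    const int n = 1024;
    int *data;
    cudaMallocManaged(&data, n * sizeof(int));   // one allocation, visible to CPU and GPU

    for (int i = 0; i < n; ++i) data[i] = i;     // CPU initializes the same memory

    increment<<<(n + 255) / 256, 256>>>(data, n);
    cudaDeviceSynchronize();                     // no explicit copies in either direction

    printf("data[42] = %d (expected 43)\n", data[42]);
    cudaFree(data);
    return 0;
}
```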
The NVIDIA GH200 Grace Hopper Superchip combines the NVIDIA Grace™ and Hopper™ architectures using NVIDIA® NVLink®-C2C to deliver a CPU+GPU coherent memory model for accelerated AI and HPC applications. With a coherent interface delivering 900 gigabytes per second (GB/s), the superchip provides 7X the bandwidth of PCIe Gen5. And with HBM3 and HBM3e GPU memory, it supercharges accelerated computing and generative AI. GH200 runs all NVIDIA software stacks and platforms, including NVIDIA AI Enterprise, the HPC SDK, and Omniverse™.
Data Center Scalability with NVIDIA Networking
NVIDIA DGX GH200 is the only AI supercomputer that offers a shared memory space of 19.5TB across 32 Grace Hopper Superchips, providing developers with over 30X more fast-access memory to build massive models.
DGX GH200 is the first supercomputer to pair Grace Hopper Superchips with the NVIDIA NVLink Switch System, which allows 32 GPUs to be united as one data-centre-size GPU. Multiple DGX GH200 systems can be connected using NVIDIA InfiniBand to provide even more computing power. This architecture provides 10X more bandwidth than the previous generation, delivering the power of a massive AI supercomputer with the simplicity of programming a single GPU.
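To software, that scale is mostly invisible. The hedged sketch below (illustrative buffer sizes, no error checking) uses NCCL, which is not named on this page but is the usual collective library on DGX systems: a single process issues one all-reduce per visible GPU, and NCCL routes the traffic over NVLink/NVSwitch within a system and over InfiniBand between systems.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <nccl.h>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 16) ndev = 16;                     // fixed-size arrays below

    ncclComm_t comms[16];
    cudaStream_t streams[16];
    float *bufs[16];
    const size_t count = 1 << 20;

    ncclCommInitAll(comms, ndev, nullptr);        // one communicator per visible GPU

    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamCreate(&streams[d]);
        cudaMalloc(&bufs[d], count * sizeof(float));
        cudaMemset(bufs[d], 0, count * sizeof(float));
    }

    ncclGroupStart();                             // batch one collective per GPU
    for (int d = 0; d < ndev; ++d)
        ncclAllReduce(bufs[d], bufs[d], count, ncclFloat, ncclSum,
                      comms[d], streams[d]);      // in-place sum across all GPUs
    ncclGroupEnd();

    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(streams[d]);
        ncclCommDestroy(comms[d]);
    }
    printf("all-reduce complete across %d GPUs\n", ndev);
    return 0;
}
```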
Giant Memory for Giant Models
As the complexity of AI models has increased, the technology to develop and deploy them has become more resource intensive. However, using the NVIDIA Grace Hopper Superchip architecture, DGX GH200 achieves excellent power efficiency.
Each NVIDIA Grace Hopper Superchip combines a CPU and a GPU in one unit, connected with superfast NVIDIA NVLink-C2C. The Grace™ CPU uses LPDDR5X memory, which consumes one-eighth the power of traditional DDR5 system memory while providing 50% more bandwidth than eight-channel DDR5. And because the Grace CPU and Hopper™ GPU sit on the same module, their interconnect consumes 5X less power and provides 7X the bandwidth of the latest PCIe technology used in other systems.
Hopper - 4th Generation Tensor Cores & Precisions
First introduced in the NVIDIA Volta architecture, NVIDIA Tensor Core technology has brought dramatic speedups to AI, bringing down training times from weeks to hours and providing massive acceleration to inference.
The NVIDIA Hopper architecture builds upon these innovations, accelerating and simplifying AI adoption and extending the power of Tensor Cores to HPC with precisions such as Tensor Float 32 (TF32) and 64-bit floating point (FP64).
TF32 works just like FP32 while delivering speedups of up to 20X for AI, with no code changes required. Using NVIDIA Automatic Mixed Precision, researchers can gain an additional 2X performance from FP16 by adding just a couple of lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA H200 Tensor Core GPUs create an incredibly versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, H200 also enables matrix operations in full, IEEE-compliant FP64 precision.
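As a concrete sketch of the "no code change" claim (illustrative sizes, uninitialized inputs, no error checking): in cuBLAS, a standard FP32 GEMM opts into TF32 Tensor Core execution with a single math-mode call, while the API, inputs, and outputs stay FP32. Frameworks expose the same switch, and NVIDIA Automatic Mixed Precision covers the FP16 path.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1024;
    float *A, *B, *C;
    cudaMalloc(&A, n * n * sizeof(float));       // FP32 operands, as in any SGEMM
    cudaMalloc(&B, n * n * sizeof(float));
    cudaMalloc(&C, n * n * sizeof(float));

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);  // the one-line TF32 opt-in

    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, A, n, B, n, &beta, C, n);            // FP32 API, TF32 Tensor Core math
    cudaDeviceSynchronize();

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```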
Proven Infrastructure Solutions Built with Trusted Data Center Leaders
As an NVIDIA Elite Partner, we offer a portfolio of infrastructure solutions incorporating the Hopper architecture and the best of the NVIDIA DGX POD reference architecture.
Delivered fully integrated and ready to deploy, these solutions make data centre AI deployments simpler and faster for IT teams.
| Part No. | DGX-GH200 |
|---|---|
| Manufacturer | NVIDIA |
| End of Life? | No |
| Performance | 128 petaFLOPS of FP8 AI performance |
| Compatible CPU(s) | NVIDIA GH200 |
| Max # Cores | 2,304 Arm® Neoverse V2 cores with SVE2 4X 128b |
| No. of GPUs | 256 |
| GPU Memory | 144TB |
| Memory Expansion | 19.5TB |
| Storage Capacity | OS: 2x 960GB NVMe SSDs; Internal: 30TB (8x 3.84TB) NVMe SSDs |
| Supported OS | Ubuntu Linux host OS |