The World’s Most Powerful Ray Tracing GPU
Do you require higher performance for artificial intelligence (AI) training and inference, high-performance computing (HPC) or graphics? NVIDIA® Accelerators for HPE help solve the world’s most important scientific, industrial, and business challenges with AI and HPC.
Visualize complex content to create cutting-edge products, tell immersive stories, and reimagine cities of the future.
Extract new insights from massive datasets.
Hewlett Packard Enterprise servers with NVIDIA accelerators are designed for the age of elastic computing, providing unmatched acceleration at every scale.
Feature
Specification
GPU Architecture
NVIDIA Ampere
NVIDIA Third-Generation Tensor Cores
160 total Tensor Cores (40 cores per GPU, 4 GPUs)
NVIDIA CUDA Cores (shading units)
5120 total FP32 CUDA Cores (1280 cores per GPU, 4 GPUs)
NVIDIA RT Cores
40 total RT Cores (10 cores per GPU, 4 GPUs)
Double-Precision Performance (FP64)
Not applicable
Single-Precision Performance
FP32: 4x 4.5 TFLOPS
Tensor Float 32 (TF32): 4x 9 TFLOPS, 4x 18 TFLOPS*
Half-Precision Performance
FP16: 4x 17.9 TFLOPS, 4x 35.9 TFLOPS*
Bfloat16: Not applicable
Integer Performance
INT8: 4x 35.9 TOPS, 4x 71.8 TOPS*
GPU Memory
64 GB GDDR6 (16 GB per GPU, 4 GPUs)
Memory Bandwidth
4x 200 GB/s
ECC
Yes
Interconnect Bandwidth
Not applicable
System Interface
PCIe Gen 4, x16 lanes
Form Factor
PCIe full height/length, double width (dual slot)
Multi-Instance GPU (MIG)
No support
Max Power Consumption
250 W
Thermal Solution
Passive
Graphics APIs
DirectX 12.0, Shader Model 5.1, OpenGL 4.6, Vulkan 1.1
Compute APIs
CUDA, DirectCompute, OpenCL, OpenACC
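The board-level totals in the table follow from multiplying each per-GPU figure by the four GPUs on the card. The short Python sketch below reproduces that arithmetic as a sanity check. The dictionary keys and the `board_totals` helper are illustrative names, not part of any NVIDIA or HPE tool. The table lists throughput and memory bandwidth per GPU ("4x ..."), so the summed values for those two rows are nominal aggregates only.

```python
# Per-GPU figures copied from the specification table.
GPUS_PER_BOARD = 4

per_gpu = {
    "tensor_cores": 40,
    "cuda_cores": 1280,
    "rt_cores": 10,
    "memory_gb": 16,
    "fp32_tflops": 4.5,           # nominal when summed across GPUs
    "memory_bandwidth_gbs": 200,  # nominal when summed across GPUs
}


def board_totals(per_gpu_specs, gpu_count=GPUS_PER_BOARD):
    """Multiply each per-GPU figure by the number of GPUs on the board."""
    return {name: value * gpu_count for name, value in per_gpu_specs.items()}


totals = board_totals(per_gpu)
for name, value in totals.items():
    print(f"{name}: {value}")
```

The core and memory totals match the table entries (160 Tensor Cores, 5120 CUDA Cores, 40 RT Cores, 64 GB).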