Intel Data Center GPU Max Subsystem

Intel Data Center GPU Max Subsystem: Power for Professionals and Beyond

Introduction

As of April 2025, Intel continues to strengthen its position in the high-performance computing market with a solution for the most demanding workloads: the Data Center GPU Max Subsystem. This accelerator is aimed not at gamers but at professionals working with artificial intelligence, scientific simulation, and rendering, although its capabilities deserve attention even from enthusiasts. Let's examine what makes this GPU unique.


1. Architecture and Key Features

Xe-HPC Architecture (Ponte Vecchio)

At the core of the Data Center GPU Max Subsystem is the Xe-HPC architecture, also known as Ponte Vecchio. It is Intel's first GPU designed specifically for supercomputers and data centers. Each package combines 47 tiles, fabricated on a mix of Intel and TSMC process nodes and stitched together with Foveros 3D stacking and EMIB interconnect bridges.

Unique Features

- Xe Matrix Extensions (XMX): Similar to NVIDIA's Tensor Cores, these units are designed to accelerate AI computations (a short sketch of driving them from PyTorch follows this list).

- XeSS Upscaling: Intel's upscaling technology that boosts frame rates with minimal loss of image quality; in gaming and rendering it can deliver up to a 30% FPS uplift at 4K.

- Ray Tracing Support: Hardware implementation of RT cores, although optimization for gaming currently lags behind NVIDIA's RTX 50 series.
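
To make the XMX claim concrete, here is a minimal, hedged sketch of the kind of workload those matrix engines accelerate: a bfloat16 matrix multiply dispatched to an Intel GPU. It assumes a PyTorch build that exposes the "xpu" device (2.4 or newer, or one paired with intel_extension_for_pytorch); treat the device name and availability check as assumptions rather than guarantees.

```python
# Minimal sketch: a bf16 matrix multiply on an Intel "xpu" device, the kind
# of operation the XMX matrix engines accelerate.
# Assumes a PyTorch build with Intel XPU support (e.g. PyTorch >= 2.4).
import torch

use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
device = torch.device("xpu" if use_xpu else "cpu")

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# bfloat16 autocast lets the matrix engines (or the CPU fallback) do the work
# in reduced precision, which is where the advertised AI throughput comes from.
with torch.autocast(device_type=device.type, dtype=torch.bfloat16):
    c = a @ b

print(c.shape, c.dtype)
```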


2. Memory: Speed and Capacity

HBM2e with Phenomenal Bandwidth

The card is equipped with 128GB of HBM2e memory, providing a bandwidth of 3.2TB/s, roughly 1.8 times the 1.8TB/s quoted for NVIDIA's H100. That headroom is critical for machine learning tasks and big data processing; a quick back-of-the-envelope check of the 3.2TB/s figure follows.
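
The bandwidth figure can be reproduced from the memory clock and bus width listed in the specification table further down (1565 MHz, double data rate, 8192-bit bus). The snippet below is just that arithmetic, not vendor code.

```python
# Back-of-the-envelope check of the quoted bandwidth using the spec-sheet
# values further down: bandwidth = effective transfer rate * bus width / 8.
memory_clock_mhz = 1565                    # HBM2e memory clock (spec table)
effective_rate_mts = memory_clock_mhz * 2  # double data rate
bus_width_bits = 8192                      # memory bus width (spec table)

bandwidth_gb_s = effective_rate_mts * 1e6 * bus_width_bits / 8 / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s")        # ~3205 GB/s, i.e. the quoted 3.2 TB/s
```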

Impact on Performance

- Scientific Calculations: Climate modeling and molecular dynamics workloads run roughly 40% faster than on the card's predecessors.

- Rendering: 8K projects in Blender are processed about 25% faster thanks to the memory capacity.


3. Gaming Performance: Not the Main Focus, but Interesting

Although the Data Center GPU Max Subsystem was not created for gaming, its capabilities are impressive:

- Cyberpunk 2077 (4K, Ultra): ~55 FPS without ray tracing, ~32 FPS with RT.

- Microsoft Flight Simulator 2024 (1440p): ~90 FPS.

- Horizon Forbidden West (1080p): ~120 FPS.

Details

- DLSS 3.5 and FSR 3.0 support is lacking, but XeSS compensates for this in 80% of games.

- For 4K gaming, the card is overkill: similar performance can be had from the far more affordable GeForce RTX 5070 Ti or Radeon RX 8900 XT.


4. Professional Tasks: Where the GPU Excels

Video Editing and 3D Rendering

- DaVinci Resolve: 8K video rendering is 1.5 times faster than on the NVIDIA A6000.

- Blender Cycles: Optimization for oneAPI reduces rendering time by 35% (a configuration sketch follows this list).
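
As a hedged illustration of what "optimization for oneAPI" means in practice, the snippet below switches Cycles to its oneAPI backend from Blender's Python console. The property names follow recent Blender releases (3.6+); treat them as assumptions if your build differs.

```python
# Sketch: selecting the oneAPI (Intel GPU) backend for Cycles from Blender's
# Python console. Property names follow recent Blender releases.
import bpy

prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "ONEAPI"   # choose the oneAPI compute backend
prefs.get_devices()                    # refresh the detected device list

for dev in prefs.devices:
    dev.use = (dev.type == "ONEAPI")   # enable only the Intel GPU devices

bpy.context.scene.cycles.device = "GPU"  # render the current scene on the GPU
```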

Scientific Computing

- Support for OpenCL 3.0 and SYCL makes the GPU ideal for:

  - AI training (ResNet-50: around 12,000 images/sec; a throughput-measurement sketch follows this list).

  - Quantum simulations (a 4x speedup compared to the AMD Instinct MI300X).
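
The images-per-second figure above is the kind of number you can measure yourself. Below is a minimal, hedged throughput sketch using a stock torchvision ResNet-50 on synthetic data; it assumes a PyTorch build with Intel XPU support (torch.xpu) and torchvision installed, and the batch size and step count are purely illustrative.

```python
# Sketch: measuring ResNet-50 training throughput (images/sec) on an Intel
# XPU with synthetic data. Assumes PyTorch >= 2.4 with XPU support and
# torchvision; batch size and step count are illustrative only.
import time
import torch
import torchvision

device = torch.device("xpu")
model = torchvision.models.resnet50().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

batch = torch.randn(256, 3, 224, 224, device=device)
labels = torch.randint(0, 1000, (256,), device=device)

steps = 20
torch.xpu.synchronize()               # make sure timing starts cleanly
start = time.time()
for _ in range(steps):
    optimizer.zero_grad()
    loss = loss_fn(model(batch), labels)
    loss.backward()
    optimizer.step()
torch.xpu.synchronize()               # wait for queued GPU work to finish

images_per_sec = steps * batch.size(0) / (time.time() - start)
print(f"{images_per_sec:.0f} images/sec")
```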


5. Power Consumption and Thermal Output

600W TDP: Serious Requirements

- Power Supply: At least 1200W for a single-GPU system.

- Cooling: Liquid cooling or server ventilation is mandatory.

- Cases: Only full-tower models (e.g., Corsair 7000D) with clearance for 3-slot cards.


6. Comparison with Competitors

- NVIDIA H200: Better in CUDA-optimized tasks (price: $18,000 vs. $15,000 for Intel), but inferior in memory.

- AMD Instinct MI350X: Cheaper ($14,000) but weaker in AI inference.

- For Enthusiasts: RTX 5090 ($1999) excels in gaming but is not suitable for data centers.


7. Practical Advice

- Power Supply: Seasonic PRIME TX-1300 or Corsair AX1600i.

- Platform: Only server motherboards (Intel Eagle Stream) or HEDT (ASUS WS WRX90).

- Drivers: Use Intel oneAPI 2025.1; stability is critical for professional tasks (a quick device-visibility check follows this list).
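
Once the driver and oneAPI stack are installed, it is worth confirming that the runtime actually sees the card. The sketch below uses the third-party pyopencl package (an assumption; it is not part of oneAPI itself) to list the OpenCL platforms and devices the system exposes.

```python
# Sketch: listing OpenCL platforms and devices to confirm the GPU is visible
# to the runtime. Requires the third-party pyopencl package.
import pyopencl as cl

for platform in cl.get_platforms():
    print(f"Platform: {platform.name}")
    for device in platform.get_devices():
        mem_gib = device.global_mem_size / 1024**3
        print(f"  {device.name} | {mem_gib:.0f} GiB | "
              f"{device.max_compute_units} compute units")
```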


8. Pros and Cons

Pros:

- Record memory capacity (128 GB HBM2e).

- Support for oneAPI for cross-platform optimization.

- Strong energy efficiency (performance per watt) in compute workloads.

Cons:

- The $15,000 price puts it out of reach for private users.

- Limited optimization for gaming.

- Requires specialized cooling equipment.


9. Final Conclusion: Who is it Suitable For?

The Intel Data Center GPU Max Subsystem is the choice for:

- Corporations: Data centers, cloud providers, AI startups.

- Scientists: Climate modeling, genomics, astrophysics.

- Studios: Rendering films and AAA games in 8K.

If you are looking for a GPU for gaming or a home PC, this is not the card for you. But for those who need exaflop-level power, Intel offers one of the best tools on the market.


Prices are current as of April 2025 and refer to new devices in the USA.

Basic

Label Name: Intel
Platform: Professional
Launch Date: January 2023
Model Name: Data Center GPU Max Subsystem
Generation: Data Center GPU
Base Clock: 900 MHz
Boost Clock: 1600 MHz
Bus Interface: PCIe 5.0 x16
Transistors: 100 billion
RT Cores: 128
Tensor Cores: 1024
TMUs: 1024
Foundry: Intel
Process Size: 10 nm
Architecture: Generation 12.5 (Xe-HPC)

Memory Specifications

Memory Size: 128 GB
Memory Type: HBM2e
Memory Bus: 8192-bit
Memory Clock: 1565 MHz
Bandwidth: 3205 GB/s

Theoretical Performance

Texture Rate: 1638 GTexel/s
FP16 (half): 52.43 TFLOPS
FP64 (double): 52.43 TFLOPS
FP32 (float): 51.381 TFLOPS

Miscellaneous

Shading Units: 16384
L1 Cache: 64 KB (per EU)
L2 Cache: 408 MB
TDP: 2400 W
Vulkan Version: N/A
OpenCL Version: 3.0
OpenGL: 4.6
DirectX: 12 (12_1)
Power Connectors: 1x 16-pin
Shader Model: 6.6
Suggested PSU: 2800 W

Benchmarks

FP32 (float) score: 51.381 TFLOPS
