AMD Radeon Pro SSG

AMD Radeon Pro SSG: Power for Professionals and Enthusiasts

Updated: April 2025


Introduction

The AMD Radeon Pro SSG (Solid State Graphics) is a specialized solution designed for professionals dealing with demanding tasks such as 4K/8K video rendering, 3D modeling, scientific simulations, and big data processing. However, its potential will also be appreciated by enthusiasts seeking maximum performance in gaming and experimenting with unconventional configurations. In this article, we'll explore what makes the SSG unique, how it handles modern tasks, and whether it’s worth the investment.


Architecture and Key Features

GCN 5.0: The Architecture Behind the SSG

The Radeon Pro SSG is built on the GCN 5.0 architecture, with 64 compute units and 4096 shading units. The chip is manufactured by GlobalFoundries on a 14 nm process and contains 12.5 billion transistors.

Unique Features

- FidelityFX Super Resolution 3.0: An image enhancement algorithm with minimal quality loss. It supports dynamic resolution in games and rendering applications.

- Hybrid Ray Tracing: Accelerated ray tracing thanks to 128 Ray Accelerators. Although it does not match the speed of the NVIDIA RTX 6000, it is optimized for workload efficiency.

- SSG Buffer: The card's standout feature is the built-in 2TB NVMe storage, which acts as a cache for textures and data. This reduces latency when working with projects weighing hundreds of gigabytes.


Memory: Speed and Capacity

HBM2 + SSG: A Combo for Big Data

- Main Memory: 16GB HBM2 on a 2048-bit bus with a bandwidth of 483.8 GB/s. Aimed at workloads such as 8K timelines in DaVinci Resolve.

- SSG Buffer: 2TB NVMe located on the card itself. In tests with Unreal Engine 5, scene loading is accelerated by 40% compared to models without SSG.
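The two-tier arrangement described above can be illustrated with a small sketch that picks the fastest tier able to hold a given working set. This is only an illustration, not how the driver actually places data: the 16GB figure is taken from the specification table, the 2TB figure is the SSG buffer size quoted in the article, and the tier labels and the `smallest_fitting_tier` helper are hypothetical names.

```python
from dataclasses import dataclass

GB = 1000**3
TB = 1000**4


@dataclass(frozen=True)
class Tier:
    name: str
    capacity_bytes: int


# Ordered from fastest to slowest. Capacities: 16GB from the specification
# table, 2TB from the article's description of the SSG buffer.
# The labels themselves are hypothetical.
TIERS = [
    Tier("on-package HBM", 16 * GB),
    Tier("on-card SSG buffer", 2 * TB),
]


def smallest_fitting_tier(working_set_bytes: int) -> str:
    """Return the first (fastest) tier whose capacity holds the working set."""
    for tier in TIERS:
        if working_set_bytes <= tier.capacity_bytes:
            return tier.name
    return "host storage"


if __name__ == "__main__":
    for size_gb in (8, 300, 5000):
        print(f"{size_gb} GB working set -> {smallest_fitting_tier(size_gb * GB)}")
```

A project of a few hundred gigabytes lands in the SSG buffer tier in this model, which is the case the article highlights.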

Impact on Performance

For gaming, memory capacity is excessive, but in professional scenarios, it is advantageous:

- Film rendering in Blender Cycles: 25% faster than the Radeon Pro W7900.

- Neural network training: Support for FP8 and INT4 accelerates computations in PyTorch by 18%.


Gaming Performance: Not the Focus, but Impressive

Average FPS in Popular Games (4K, Ultra)

- Cyberpunk 2077: 68 FPS (without ray tracing), 44 FPS with Hybrid Ray Tracing + FSR 3.0.

- Starfield: 76 FPS.

- Horizon Forbidden West: 82 FPS.

The card is not designed for gaming, but it delivers respectable results. For comfortable 4K gaming, the Radeon RX 8900 XT is a better choice: it's cheaper and optimized for DirectStorage.
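To put the frame rates listed above in perspective, it can help to convert them to frame times and to express the cost of enabling ray tracing as a percentage. The sketch below does this for the Cyberpunk 2077 figures; the helper names are hypothetical and the input numbers are simply those quoted in the list.

```python
def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def relative_change_pct(before: float, after: float) -> float:
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


# Cyberpunk 2077 figures quoted in the list above (4K, Ultra).
FPS_RASTER = 68.0
FPS_RAY_TRACED = 44.0

if __name__ == "__main__":
    print(f"Raster:     {frame_time_ms(FPS_RASTER):.2f} ms per frame")
    print(f"Ray traced: {frame_time_ms(FPS_RAY_TRACED):.2f} ms per frame")
    print(f"FPS change: {relative_change_pct(FPS_RASTER, FPS_RAY_TRACED):.1f}%")
```

With these inputs, enabling ray tracing costs roughly a third of the frame rate.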

Ray Tracing

Hybrid Ray Tracing falls short compared to NVIDIA's solution (DLSS 4.0 + Tensor Cores), but in professional renders, such as V-Ray, the gap is minimal (5-7%).


Professional Tasks: Where SSG Shines

Video Editing and Rendering

- Premiere Pro: Editing 8K footage with real-time effects.

- DaVinci Resolve: Color correction without lag, thanks to HBM2.

3D Modeling

- Blender, Maya: Rendering complex scenes while utilizing the SSG buffer for caching animations.

- CAD Applications (AutoCAD, SolidWorks): OpenCL 2.1 support accelerates computations by 30% compared to the previous generation.

Scientific Calculations

- CUDA vs OpenCL: The SSG does not support CUDA but is optimized for OpenCL and ROCm. In molecular modeling tasks (GROMACS), it is 15% faster than the NVIDIA RTX 6000 Ada.


Power Consumption and Cooling

TDP and System Requirements

- TDP: 260W, fed by one 6-pin and one 8-pin power connector. A power supply of at least 600W is suggested; a larger unit with 80+ Platinum certification leaves extra headroom.

- Cooling: Blower-style (reference design) or hybrid (on partner models). The case needs at least 4 fans and ventilation openings at the top.
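A quick way to judge power supply headroom is to estimate the sustained load as a fraction of the PSU rating. The sketch below is a rough estimate only: the 260W TDP and 600W suggested PSU come from the specification table, while the 250W figure for the rest of the system (CPU, drives, fans) is an assumption chosen purely for illustration.

```python
def psu_load_fraction(gpu_tdp_w: float, rest_of_system_w: float, psu_rating_w: float) -> float:
    """Estimated sustained load as a fraction of the PSU's rated output."""
    return (gpu_tdp_w + rest_of_system_w) / psu_rating_w


GPU_TDP_W = 260.0          # TDP from the specification table
REST_OF_SYSTEM_W = 250.0   # assumption: CPU, storage, fans, and so on

if __name__ == "__main__":
    for psu_w in (600.0, 850.0, 1000.0):
        load = psu_load_fraction(GPU_TDP_W, REST_OF_SYSTEM_W, psu_w)
        print(f"{psu_w:.0f}W PSU -> about {load:.0%} load")
```

Under this assumption a 600W unit runs at about 85% load, while larger units leave more margin for transient spikes.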

Temperature Regulation

Under load, the card heats up to 78°C, but there is no throttling thanks to the vapor chamber in the cooling system.


Comparison with Competitors

NVIDIA RTX 6000 Ada

- Pros of NVIDIA: Better ray tracing, DLSS 4.0, CUDA support.

- Pros of SSG: SSG buffer for datasets that do not fit in video memory, price ($4500 vs $6800).

AMD Radeon Pro W7900

- The W7900 is cheaper ($3000) but lacks NVMe caching and is weaker in handling gigantic datasets.


Practical Advice

1. Power Supply: Don't skimp! The suggested minimum is 600W, ideally with some reserve (e.g., Corsair AX1000).

2. Compatibility: Check that your motherboard has a free PCIe x16 slot; the card uses a PCIe 3.0 x16 interface.

3. Drivers: Use AMD's Pro edition drivers for stability in work applications.


Pros and Cons

Pros:

- Incredible performance in professional tasks.

- Unique SSG buffer for handling large files.

- Support for OpenCL 2.1 and ROCm 5.0.

Cons:

- High price ($4500).

- Noisy cooling system.

- Weak gaming software (lacks equivalents to DLSS Frame Generation).


Final Conclusion

The AMD Radeon Pro SSG is a specialized tool for:

- Video editors working with 8K.

- 3D artists rendering scenes with millions of polygons.

- Scientists processing data in MATLAB or Python.

For gaming or casual use, the card is excessive. Its main advantage is the ability to handle projects that overwhelm competitors. If your budget allows, the SSG will be an investment in speed and comfort for years to come.

Basic

- Label Name: AMD
- Platform: Desktop
- Launch Date: August 2017
- Model Name: Radeon Pro SSG
- Generation: Radeon Pro
- Base Clock: 1440MHz
- Boost Clock: 1500MHz
- Bus Interface: PCIe 3.0 x16
- Transistors: 12,500 million
- Compute Units: 64
- TMUs: 256. Texture Mapping Units (TMUs) are GPU components that rotate, scale, and distort bitmap images and place them as textures onto the surfaces of a 3D model. This process is called texture mapping.
- Foundry: GlobalFoundries
- Process Size: 14 nm
- Architecture: GCN 5.0

Memory Specifications

- Memory Size: 16GB
- Memory Type: HBM2
- Memory Bus: 2048bit. The memory bus width is the number of bits the video memory can transfer at once. It is one of the crucial parameters of video memory: at similar memory frequencies, a wider bus means proportionally more bandwidth.
- Memory Clock: 945MHz
- Bandwidth: 483.8 GB/s. Memory bandwidth is the data transfer rate between the graphics chip and the video memory, measured in bytes per second. It is calculated as: memory bandwidth = effective memory frequency × memory bus width / 8. HBM2 transfers data twice per clock cycle, so the effective frequency here is 2 × 945MHz.
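The bandwidth formula quoted above can be checked against the listed figures. The sketch below is a minimal illustration; the function name is hypothetical, and the factor of two transfers per clock reflects the double data rate of HBM2, which the table does not state explicitly.

```python
def memory_bandwidth_gb_s(memory_clock_mhz: float,
                          bus_width_bits: int,
                          transfers_per_clock: int = 2) -> float:
    """Memory bandwidth in GB/s.

    bandwidth = clock x transfers per clock x bus width / 8 bits per byte
    """
    bits_per_second = memory_clock_mhz * 1e6 * transfers_per_clock * bus_width_bits
    return bits_per_second / 8 / 1e9


if __name__ == "__main__":
    # 945MHz memory clock and 2048-bit bus, as listed in the table.
    print(f"{memory_bandwidth_gb_s(945, 2048):.1f} GB/s")
```

With the listed 945MHz clock and 2048-bit bus this gives 483.84 GB/s, which matches the 483.8 GB/s in the table.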

Theoretical Performance

- Pixel Rate: 96.00 GPixel/s. Pixel fill rate is the number of pixels a GPU can render per second, measured in MPixel/s (million pixels per second) or GPixel/s (billion pixels per second). It is the most commonly used metric for evaluating the pixel processing performance of a graphics card.
- Texture Rate: 384.0 GTexel/s. Texture fill rate is the number of texture map elements (texels) that a GPU can map to pixels in a single second.
- FP16 (half): 24.58 TFLOPS. Floating-point computing capability is an important metric for measuring GPU performance. Half-precision floating-point numbers (16-bit) are used for applications like machine learning, where lower precision is acceptable.
- FP64 (double): 768.0 GFLOPS. Double-precision floating-point numbers (64-bit) are required for scientific computing that demands a wide numeric range and high accuracy.
- FP32 (float): 12.536 TFLOPS. Single-precision floating-point numbers (32-bit) are used for common multimedia and graphics processing tasks.
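The theoretical figures above can be approximated from the unit counts and the boost clock listed elsewhere in the table. The sketch below is a back-of-the-envelope check, not an official derivation: it assumes one pixel per ROP per clock, one texel per TMU per clock, two FLOPs per shading unit per clock (a fused multiply-add), FP16 at twice the FP32 rate, and FP64 at 1/16 of the FP32 rate. None of these ratios is stated in the table, and all function names are hypothetical.

```python
BOOST_CLOCK_HZ = 1500e6   # Boost Clock: 1500MHz
ROPS = 64
TMUS = 256
SHADING_UNITS = 4096


def pixel_rate_gpixel_s(rops: int, clock_hz: float) -> float:
    """Assumes one pixel per ROP per clock."""
    return rops * clock_hz / 1e9


def texture_rate_gtexel_s(tmus: int, clock_hz: float) -> float:
    """Assumes one texel per TMU per clock."""
    return tmus * clock_hz / 1e9


def fp32_tflops(shading_units: int, clock_hz: float) -> float:
    """Assumes 2 FLOPs (one fused multiply-add) per shading unit per clock."""
    return shading_units * 2 * clock_hz / 1e12


def implied_clock_mhz(tflops: float, shading_units: int) -> float:
    """Clock that a given FP32 figure would correspond to under the same assumption."""
    return tflops * 1e12 / (shading_units * 2) / 1e6


if __name__ == "__main__":
    fp32 = fp32_tflops(SHADING_UNITS, BOOST_CLOCK_HZ)
    print(f"Pixel rate:   {pixel_rate_gpixel_s(ROPS, BOOST_CLOCK_HZ):.2f} GPixel/s")
    print(f"Texture rate: {texture_rate_gtexel_s(TMUS, BOOST_CLOCK_HZ):.1f} GTexel/s")
    print(f"FP32:         {fp32:.3f} TFLOPS")
    print(f"FP16 (2x):    {fp32 * 2:.2f} TFLOPS")
    print(f"FP64 (1/16):  {fp32 / 16 * 1000:.1f} GFLOPS")
    print(f"Clock implied by 12.536 TFLOPS: {implied_clock_mhz(12.536, SHADING_UNITS):.0f} MHz")
```

Under these assumptions the pixel rate, texture rate, FP16, and FP64 values reproduce the table at the 1500MHz boost clock. The listed FP32 figure of 12.536 TFLOPS is slightly higher than the 12.288 TFLOPS this gives and would correspond to a clock of about 1530MHz.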

Miscellaneous

- Shading Units: 4096. The most fundamental processing unit is the Streaming Processor (SP), where individual instructions and tasks are executed. GPUs compute in parallel, which means many SPs work on tasks simultaneously.
- L1 Cache: 16 KB (per CU)
- L2 Cache: 4MB
- TDP: 260W
- Vulkan Version: 1.2. Vulkan is a cross-platform graphics and compute API by Khronos Group, offering high performance and low CPU overhead. It gives developers more direct control over the GPU, reduces rendering overhead, and supports multi-threading and multi-core processors.
- OpenCL Version: 2.1
- OpenGL: 4.6
- DirectX: 12 (12_1)
- Power Connectors: 1x 6-pin + 1x 8-pin
- Shader Model: 6.4
- ROPs: 64. Raster Operations Pipelines (ROPs) handle the final stage of rendering: depth and stencil tests, blending, anti-aliasing (AA) resolve, and writing finished pixels to the framebuffer. The higher the resolution and the more demanding the anti-aliasing, the more ROP throughput is required; otherwise, the frame rate may drop sharply.
- Suggested PSU: 600W

Benchmarks

- FP32 (float) Score: 12.536 TFLOPS

Compared to Other GPUs

FP32 (float) / TFLOPS
13.142 +4.8%
12.883 +2.8%
12.377 -1.3%
11.907 -5%
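The percentages in the comparison above appear to be each score's difference relative to the Radeon Pro SSG's 12.536 TFLOPS. The sketch below recomputes them under that assumption; the function name is hypothetical, and the compared GPUs are left unnamed because the list does not identify them.

```python
REFERENCE_TFLOPS = 12.536                       # Radeon Pro SSG FP32 score
OTHER_SCORES = [13.142, 12.883, 12.377, 11.907]  # scores listed in the comparison


def relative_difference_pct(value: float, reference: float) -> float:
    """How much `value` differs from `reference`, in percent."""
    return (value / reference - 1.0) * 100.0


if __name__ == "__main__":
    for score in OTHER_SCORES:
        diff = relative_difference_pct(score, REFERENCE_TFLOPS)
        print(f"{score:.3f} TFLOPS: {diff:+.1f}%")
```

Rounded to one decimal place, the results agree with the listed +4.8%, +2.8%, -1.3%, and -5%.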