
NVIDIA Jetson AGX Orin 64 GB

NVIDIA Jetson AGX Orin 64 GB: Power for AI and Professionals

An overview of the embedded solution for tomorrow's developers

1. Architecture and Key Features: Next-Generation AI Core

The NVIDIA Jetson AGX Orin 64 GB is not a graphics card but a high-performance embedded module for developers, built on the NVIDIA Ampere architecture. It is manufactured on Samsung's 8 nm process and combines 2048 CUDA cores, 64 Tensor Cores, and 2 NVDLA deep learning accelerators for inference workloads such as computer vision.

Key features:

- Support for CUDA and Tensor Cores — the foundation for running neural networks and machine learning algorithms.

- RTX technologies (via SDK compatibility): ray tracing and DLSS are available in specialized applications, though not optimized for gaming.

- JetPack SDK — a unique ecosystem for developing software for robotics, drones, and autonomous systems.

2. Memory: Capacity vs. Speed

The Jetson AGX Orin is equipped with 64 GB of LPDDR5 memory with a bandwidth of 204.8 GB/s. This energy-efficient memory suits data-intensive tasks such as real-time video processing or training neural networks.

However, its peak bandwidth is lower than that of the GDDR6X or HBM memory used in gaming and data-center GPUs, which limits its suitability for graphics rendering. For professional workflows (e.g., inference with YOLO or ResNet models), memory capacity plays the crucial role, allowing large datasets to be processed without swapping.
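To put the 204.8 GB/s figure in context for real-time video, the following minimal sketch estimates how much bandwidth a raw video stream consumes. The frame format (3840x2160, 3 bytes per pixel, 60 FPS) and the function names are illustrative assumptions rather than figures from NVIDIA, and a real pipeline reads and writes each frame several times, so treat the result as a lower bound.

```python
# Rough estimate of the memory bandwidth consumed by a raw video stream.
# The frame format used below is a hypothetical example.

MEMORY_BANDWIDTH_GB_S = 204.8  # bandwidth quoted for the module


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def stream_gb_per_s(width: int, height: int, bytes_per_pixel: int, fps: float) -> float:
    """Bandwidth needed to touch every frame of the stream once, in GB/s."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e9


def bandwidth_share(stream_gb_s: float, total_gb_s: float = MEMORY_BANDWIDTH_GB_S) -> float:
    """Fraction of the total memory bandwidth used by the stream."""
    return stream_gb_s / total_gb_s


if __name__ == "__main__":
    rate = stream_gb_per_s(3840, 2160, 3, 60)
    print(f"Stream rate: {rate:.3f} GB/s")
    print(f"Share of memory bandwidth: {bandwidth_share(rate):.2%}")
```

Under these assumptions a single pass over the stream uses well under 1% of the available bandwidth, which leaves room for several streams and for the repeated reads that pre-processing and inference add.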

3. Gaming Performance: Not the Main Focus

The Jetson AGX Orin is not designed for gaming, but when run through cloud solutions or emulators (e.g., Steam on Linux), it demonstrates modest results:

- Cyberpunk 2077 (1080p, Low): ~25-30 FPS (without RTX).

- Fortnite (1440p, Medium): ~40-45 FPS (with DLSS in Performance mode).

4K support is limited due to a lack of optimized drivers. Ray tracing is possible through the Vulkan API but leads to a drop in FPS to 15-20. The device is better regarded as a tool for developing game AI rather than running AAA projects.

4. Professional Tasks: Where Orin Shines

- Video Editing: Processing 8K videos in DaVinci Resolve using CUDA acceleration.

- 3D Modeling: Rendering in Blender (Cycles) is 30% faster than with Jetson Xavier.

- Scientific Calculations: Support for CUDA and OpenCL enables running simulations in MATLAB or COMSOL.

- AI Inference: Processing up to 200 frames per second in real time for models like DetectNet.
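A throughput figure such as 200 frames per second translates into a per-frame time budget. The sketch below shows the conversion and a simple way to measure achieved throughput; `fake_inference` is a hypothetical stand-in workload, not a DetectNet call, and would be replaced with a real model invocation.

```python
import time
from typing import Callable


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def measure_fps(process_frame: Callable[[], None], num_frames: int = 100) -> float:
    """Run the workload num_frames times and return the achieved frames per second."""
    start = time.perf_counter()
    for _ in range(num_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return num_frames / elapsed


def fake_inference() -> None:
    """Hypothetical stand-in for a model call; replace with real inference."""
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(f"Budget at 200 FPS: {frame_budget_ms(200)} ms per frame")
    print(f"Measured throughput: {measure_fps(fake_inference):.0f} FPS")
```

At 200 FPS every stage of the pipeline (capture, pre-processing, inference, post-processing) has to fit into 5 ms per frame, unless the stages are overlapped.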

5. Power Consumption and Heat Dissipation: Efficiency First

The module has a configurable TDP of 15-60 W, selected through power modes. The standard configuration uses passive cooling, but an active cooler (e.g., Noctua NH-L9i) is recommended for prolonged loads.
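Efficiency can be expressed as energy per processed frame. The sketch below pairs the 60 W upper power limit with the 200 FPS inference figure quoted earlier; combining those two numbers is an assumption made for illustration, since the source does not state the power draw at that throughput.

```python
def energy_per_frame_j(power_w: float, fps: float) -> float:
    """Energy spent per frame in joules (watts divided by frames per second)."""
    return power_w / fps


def frames_per_joule(power_w: float, fps: float) -> float:
    """Frames processed per joule of energy."""
    return fps / power_w


if __name__ == "__main__":
    power_w = 60.0   # upper TDP limit
    fps = 200.0      # inference throughput quoted earlier (assumed to apply at 60 W)
    print(f"Energy per frame: {energy_per_frame_j(power_w, fps):.2f} J")
    print(f"Frames per joule: {frames_per_joule(power_w, fps):.2f}")
```

Running the same calculation with measured power and throughput for each power mode shows which mode gives the best efficiency for a given model.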

Housing tips:

- Choose solutions with ventilation holes (e.g., Waveshare JetBox).

- Avoid installation in airtight boxes — risk of overheating.

6. Comparison with Competitors: No Equivalents?

There are few direct competitors in the AI module segment:

- AMD Ryzen V2000: Better for graphics but weaker in neural networks (price: ~$1200).

- Qualcomm RB5: Energy-efficient but only 16 GB of RAM ($899).

- NVIDIA RTX A2000: More powerful in rendering but requires a PC ($2500).

The Jetson Orin wins due to its balance of price ($1999) and specialization for AI.

7. Practical Tips: Building a System

- Power Supply: 65 Watts (20V/3.25A) via a Barrel Jack connector.

- Compatibility: Ubuntu 22.04 LTS, Docker, ROS 2.

- Drivers: Update the JetPack SDK (current version — 6.5).

Important: Do not use Orin as a replacement for a desktop GPU — the absence of DisplayPort/HDMI requires output through USB-C.

8. Pros and Cons

Pros:

- Best-in-class memory capacity for AI tasks.

- Energy efficiency and compactness.

- Support for NVIDIA Omniverse and Isaac Sim.

Cons:

- High cost ($1999).

- Limited gaming performance.

- Setup difficulties for beginners.

9. Final Conclusion: Who is the Jetson AGX Orin For?

This module is designed for:

- AI Engineers developing autonomous robots and drones.

- Research Laboratories needing a portable solution for simulations.

- Companies implementing computer vision into real products.

If you're looking for a GPU for gaming or 3D design, consider the RTX 5000 series. But if your goal is to create a smart device of the future, the Jetson AGX Orin 64 GB is an indispensable tool.

Prices are current as of April 2025. The device is available through NVIDIA's official partners and specialized IT stores.

Basic

- Label Name: NVIDIA

- Platform: Professional

- Launch Date: March 2023

- Model Name: Jetson AGX Orin 64 GB

- Generation: Tegra

- Bus Interface: PCIe 4.0 x4

- Transistors: Unknown

- Tensor Cores: Specialized processing units designed for deep learning, providing higher training and inference throughput than standard FP32 execution. They enable rapid computations in areas such as computer vision, natural language processing, speech recognition, text-to-speech conversion, and personalized recommendations. Two notable applications of Tensor Cores are DLSS (Deep Learning Super Sampling) and AI Denoiser for noise reduction.

- TMUs: Texture Mapping Units (TMUs) are GPU components that rotate, scale, and distort bitmap images and place them as textures onto any plane of a given 3D model. This process is called texture mapping.

- Foundry: Samsung

- Process Size: 8 nm

- Architecture: Ampere

Memory Specifications

- Memory Size: 64 GB

- Memory Type: LPDDR5

- Memory Bus: 256 bit. The memory bus width is the number of bits the memory can transfer in a single transfer cycle. The wider the bus, the more data moves at once, which makes it one of the crucial memory parameters: at similar memory frequencies, the bus width determines the bandwidth.

- Memory Clock: 1600 MHz

- Bandwidth: 204.8 GB/s. Memory bandwidth is the data transfer rate between the graphics chip and the memory, measured in bytes per second. It is calculated as: Memory Bandwidth = Effective Data Rate x Memory Bus Width / 8, where the effective data rate is the number of transfers per second (for LPDDR5, a multiple of the listed memory clock) and the division by 8 converts bits to bytes.
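The listed figures can be checked against the bandwidth formula. Plugging the 1600 MHz memory clock in directly gives only 51.2 GB/s; the quoted 204.8 GB/s corresponds to an effective rate of 6400 million transfers per second, four times the listed clock. The sketch below reproduces that arithmetic; the function names are illustrative.

```python
def bandwidth_gb_s(data_rate_mt_s: float, bus_width_bits: int) -> float:
    """Bandwidth in GB/s from a data rate in MT/s and a bus width in bits."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mt_s * 1e6 * bytes_per_transfer / 1e9


def implied_data_rate_mt_s(bandwidth: float, bus_width_bits: int) -> float:
    """Data rate in MT/s implied by a bandwidth in GB/s and a bus width in bits."""
    return bandwidth * 1e9 / (bus_width_bits / 8) / 1e6


if __name__ == "__main__":
    print(bandwidth_gb_s(1600, 256))           # listed clock used directly
    print(bandwidth_gb_s(6400, 256))           # effective LPDDR5 data rate
    print(implied_data_rate_mt_s(204.8, 256))  # rate implied by the quoted bandwidth
```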

Theoretical Performance

- Pixel Rate: 41.60 GPixel/s. Pixel fill rate is the number of pixels a GPU can render per second, measured in MPixels/s (million pixels per second) or GPixels/s (billion pixels per second). It is the most commonly used metric for the pixel processing performance of a graphics card.

- Texture Rate: 83.20 GTexel/s. Texture fill rate is the number of texture map elements (texels) that a GPU can map to pixels in a single second.

- FP16 (half): 10.65 TFLOPS. Floating-point throughput is an important metric of GPU performance. Half-precision floating-point numbers (16-bit) are used for applications like machine learning, where lower precision is acceptable.

- FP64 (double): 2.662 TFLOPS. Double-precision floating-point numbers (64-bit) are required for scientific computing that demands a wide numeric range and high accuracy.

- FP32 (float): 5.432 TFLOPS. Single-precision floating-point numbers (32-bit) are used for common multimedia and graphics processing tasks.
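Peak FP32 throughput is conventionally computed as 2 x shading units x clock, because each shading unit can issue one fused multiply-add (two floating-point operations) per cycle. The page does not list a GPU clock, so the sketch below works backwards from the 5.432 TFLOPS figure and the 2048 shading units to the clock those numbers imply. That clock is a derived value, not an official specification.

```python
def peak_tflops(shading_units: int, clock_ghz: float) -> float:
    """Theoretical FP32 peak: one FMA (2 FLOPs) per shading unit per cycle."""
    return 2 * shading_units * clock_ghz / 1000.0


def implied_clock_ghz(tflops: float, shading_units: int) -> float:
    """Clock in GHz implied by a TFLOPS figure and a shading unit count."""
    return tflops * 1000.0 / (2 * shading_units)


if __name__ == "__main__":
    clock = implied_clock_ghz(5.432, 2048)
    print(f"Implied GPU clock: {clock:.3f} GHz")
    print(f"Round trip: {peak_tflops(2048, clock):.3f} TFLOPS")
```

These are theoretical peaks; real workloads reach a fraction of them depending on memory access patterns and occupancy.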

Miscellaneous

- SM Count: Multiple Streaming Processors (SPs), together with other resources such as warp schedulers, registers, and shared memory, form a Streaming Multiprocessor (SM), also referred to as a GPU's major core. The SM can be considered the heart of the GPU, similar to a CPU core; registers and shared memory are scarce resources within it.

- Shading Units: 2048. The most fundamental processing unit is the Streaming Processor (SP), where individual instructions are executed. GPUs compute in parallel, which means many SPs work on tasks simultaneously.

- L1 Cache: 128 KB (per SM)

- L2 Cache: 256 KB

- TDP: 60 W

- Vulkan Version: 1.3. Vulkan is a cross-platform graphics and compute API by the Khronos Group, offering high performance and low CPU overhead. It gives developers direct control over the GPU, reduces rendering overhead, and supports multi-threading and multi-core processors.

- OpenCL Version: 3.0

- OpenGL: 4.6

- DirectX: 12 Ultimate (12_2)

- CUDA (compute capability): 8.7

- Shader Model: 6.7

- ROPs: Raster Operations Pipelines (ROPs) perform the final stage of rendering: depth and stencil testing, blending, anti-aliasing resolve, and writing finished pixels to the framebuffer. The more demanding the anti-aliasing and resolution settings in a game, the higher the load on the ROPs; too little ROP throughput can cause a sharp drop in frame rate.

Benchmarks

FP32 (float) score: 5.432 TFLOPS

Compared to other GPUs, FP32 (float) in TFLOPS:

- Radeon Instinct MI6: 5.796 (+6.7%)

- Radeon Pro V7350X2: 5.613 (+3.3%)

- Jetson AGX Orin 64 GB: 5.432

- Radeon R9 390: 5.222 (-3.9%)

- Tesla K40c: 5.147 (-5.2%)
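The percentages in the comparison are relative differences against the Jetson AGX Orin 64 GB score. A minimal sketch of that calculation, using the values listed above (the dictionary and function names are illustrative):

```python
REFERENCE = "Jetson AGX Orin 64 GB"

# FP32 scores in TFLOPS, as listed in the comparison.
FP32_TFLOPS = {
    "Radeon Instinct MI6": 5.796,
    "Radeon Pro V7350X2": 5.613,
    "Jetson AGX Orin 64 GB": 5.432,
    "Radeon R9 390": 5.222,
    "Tesla K40c": 5.147,
}


def percent_vs_reference(scores: dict, reference: str) -> dict:
    """Relative difference of each score against the reference, in percent."""
    base = scores[reference]
    return {
        name: round((value / base - 1) * 100, 1)
        for name, value in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    for name, diff in percent_vs_reference(FP32_TFLOPS, REFERENCE).items():
        print(f"{name}: {diff:+.1f}%")
```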