NVLink vs. PCIe: Understanding The Differences

PCIe is the interface most GPUs use today, but NVIDIA also offers NVLink, a dedicated high-speed interconnect for linking GPUs. Why did NVIDIA, a leader in the AI industry, build its own interconnect when a mature standard already exists? And what is the difference between NVLink and PCIe? This article answers these questions.

What is PCIe
What is NVLink
NVLink vs. PCIe
NVLink Limitations
How to choose
FAQs
Conclusion

What is PCIe

PCIe (Peripheral Component Interconnect Express) has been the mainstream way to attach a GPU for a long time. After many years of use, it has strong compatibility and has become the industry-standard interface. Whether in a home gaming PC or a regular server, PCIe handles data transmission reliably. It also connects many other devices, such as network cards and storage drives.

PCIe is essentially a general-purpose peripheral bus architecture. It forms a tree structure with the CPU’s root complex (or the chipset) at the top and switches and devices below it. In this architecture, traffic between devices travels up the tree toward the host, through a switch or the root complex. While this provides very high compatibility, it shows limitations as AI training demand grows: direct GPU-to-GPU (peer-to-peer) communication is limited and platform-dependent, and it still runs over the comparatively slow PCIe link.

*Figure 1: PCIe ports (Source: en.wikipedia.org)*

However, although the interconnect bandwidth of PCIe GPUs is lower than that of NVLink GPUs, the computing performance of the GPU itself is broadly similar. PCIe Gen5 x16 provides about 64 GB/s in each direction, while PCIe Gen6, which uses PAM4 signaling, reaches about 128 GB/s in each direction. For workloads that do not rely heavily on high-speed GPU interconnect, such as small to medium model training and inference deployment, GPU interconnect bandwidth has little impact on overall performance.
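The Gen5 and Gen6 figures above can be reproduced from the per-lane signaling rates. The following is a rough sketch, not a precise model: the function name and tables are illustrative, the 128b/130b efficiency applies to Gen3 through Gen5, and treating Gen6 as 100% efficient is a simplifying assumption that ignores FLIT and FEC overhead.

```python
# Approximate per-direction PCIe bandwidth from per-lane signaling rates.
# Real throughput is lower once packet headers and flow control are counted.

RATE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0}  # per lane, per direction
# 128b/130b line encoding for Gen3-Gen5; Gen6 efficiency of 1.0 is an assumption.
ENCODING_EFFICIENCY = {3: 128 / 130, 4: 128 / 130, 5: 128 / 130, 6: 1.0}


def pcie_one_way_gb_per_s(gen: int, lanes: int = 16) -> float:
    """Return the approximate one-way bandwidth in GB/s for a PCIe link."""
    gigabits_per_s = RATE_GT_PER_S[gen] * ENCODING_EFFICIENCY[gen] * lanes
    return gigabits_per_s / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen in (5, 6):
        one_way = pcie_one_way_gb_per_s(gen)
        print(f"Gen{gen} x16: ~{one_way:.0f} GB/s one-way, "
              f"~{2 * one_way:.0f} GB/s bidirectional")
```

Gen5 x16 comes out at roughly 63 GB/s per direction, which is usually rounded to the 64 GB/s quoted in the text.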

PCIe key features:

• Hub-based communication via CPU/chipset
• Standard interface with strong compatibility
• Tree topology
• Bandwidth scaling via PCIe lanes, e.g., x16
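To make the tree topology concrete, here is a minimal sketch of how traffic between two devices is routed in such a tree. The topology and the device names (`switch0`, `gpu2`, and so on) are hypothetical, chosen only for illustration; real platforms differ in how many switches and root ports they have.

```python
# Hypothetical PCIe tree: each device points to its parent; the root has none.
PARENT = {
    "root_complex": None,
    "switch0": "root_complex",
    "switch1": "root_complex",
    "gpu0": "switch0",
    "gpu1": "switch0",
    "gpu2": "switch1",
    "nic0": "switch1",
}


def path_to_root(node):
    """List the nodes from `node` up to the root complex, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def route(src, dst):
    """Return the hops from src to dst through their lowest common ancestor."""
    up = path_to_root(src)
    down = path_to_root(dst)
    ancestors_of_dst = set(down)
    # First node on the way up from src that also sits above dst.
    turn = next(node for node in up if node in ancestors_of_dst)
    return up[: up.index(turn) + 1] + list(reversed(down[: down.index(turn)]))


if __name__ == "__main__":
    print(route("gpu0", "gpu1"))  # GPUs behind the same switch
    print(route("gpu0", "gpu2"))  # GPUs behind different switches
```

In this model, two GPUs behind the same switch can exchange data through that switch, while GPUs behind different switches must cross the root complex, which is the longer host path described above.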

What is NVLink

NVLink is NVIDIA’s GPU interconnect, and it is most closely associated with the SXM platform. In today’s fast-growing AI era, to further improve speed, NVIDIA introduced SXM, a socketed GPU module form factor that mounts flat on the baseboard instead of plugging into a PCIe slot. An SXM baseboard supports up to 8 GPUs. This design targets AI model training, such as LLMs, and enables fast direct communication between GPUs without going through other devices.

*Figure 2: NVLink (Source: www.nvidia.com)*

NVLink is a high-speed interconnect protocol designed specifically for GPU communication and built for short-distance, high-quality links. It uses NVIDIA’s high-speed SerDes signaling (NVHS) over differential pairs, and because the links are short, signal loss and interference are easier to control. NVLink scales bandwidth through multiple parallel links: each GPU has multiple NVLink links, and each link consists of several high-speed lanes. For example, in H100 (NVLink 4), each GPU provides 18 links at about 25 GB/s per direction each, for about 450 GB/s per direction (900 GB/s bidirectional). In the newer Blackwell architecture (NVLink 5), total bandwidth reaches around 1.8 TB/s bidirectional.
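The H100 arithmetic above is simply links multiplied by per-link bandwidth. A small sketch, using only the figures quoted in the text (the function name is illustrative):

```python
from typing import Tuple


def nvlink_bandwidth_gb_per_s(links: int, per_link_one_way: float) -> Tuple[float, float]:
    """Return (one-way, bidirectional) aggregate bandwidth in GB/s."""
    one_way = links * per_link_one_way
    return one_way, 2 * one_way


if __name__ == "__main__":
    # H100 (NVLink 4): 18 links at about 25 GB/s per direction each.
    one_way, bidirectional = nvlink_bandwidth_gb_per_s(links=18, per_link_one_way=25.0)
    print(f"H100 NVLink: ~{one_way:.0f} GB/s one-way, ~{bidirectional:.0f} GB/s bidirectional")

    # Compared with the ~64 GB/s one-way figure for PCIe Gen5 x16.
    print(f"Ratio to PCIe Gen5 x16: ~{one_way / 64.0:.1f}x")
```

The ratio of roughly 7x is a peak-bandwidth comparison only; it says nothing about how much of that bandwidth a given workload actually uses.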

Because its links are short, NVLink does not rely on complex long-distance FEC (forward error correction) schemes. Its lightweight packet and flit mechanism gives it lower latency and higher efficiency.

NVLink key features:

• Direct GPU-to-GPU communication
• Multiple parallel links per GPU
• Support for GPU mesh interconnect

Why is NVLink necessary now?

In the past, the main goal was to connect different types of devices into one system, and the demand for bandwidth was modest. With the rise of AI training, raw computation is no longer the main bottleneck. The focus has shifted to data movement, especially data transfer between GPUs.

In model parallelism and distributed training, a large amount of intermediate data must be exchanged frequently between GPUs. If the GPUs rely on PCIe alone, every transfer takes the same slower path through the host, which limits performance. NVLink was created to solve this problem and has gradually become a key technology in AI infrastructure.
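To get a feel for how link bandwidth affects this kind of exchange, the sketch below estimates the time for one all-reduce across GPUs. The ring all-reduce cost model is an assumption chosen for illustration (the article does not name a specific collective), and it counts bandwidth only: latency, topology, and overlap with computation are ignored. The 10 GB payload is a made-up example value.

```python
def ring_allreduce_seconds(payload_gb: float, num_gpus: int, one_way_gb_per_s: float) -> float:
    """Ideal, bandwidth-only time for a ring all-reduce of payload_gb per GPU.

    In a ring all-reduce each GPU sends (and receives) about
    2 * (N - 1) / N times the payload size in total.
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single GPU
    traffic_gb = 2 * (num_gpus - 1) / num_gpus * payload_gb
    return traffic_gb / one_way_gb_per_s


if __name__ == "__main__":
    payload_gb = 10.0  # hypothetical amount of data to reduce per step
    for name, bandwidth in (("PCIe Gen5 x16", 64.0), ("NVLink 4 (H100)", 450.0)):
        seconds = ring_allreduce_seconds(payload_gb, num_gpus=8, one_way_gb_per_s=bandwidth)
        print(f"{name}: ~{seconds * 1000:.0f} ms per all-reduce")
```

Under these assumptions the transfer time scales inversely with link bandwidth, so the gap between the two interconnects grows with the amount of data exchanged per step.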

NVLink vs. PCIe

NVLink and PCIe differ in many ways. Bandwidth and performance get the most attention, but some people in the community also look at RAS coverage, SerDes differences, and link scaling methods. The table below summarizes these differences.

Feature	PCIe	NVLink
Bandwidth	Gen5 x16: ~64 GB/s per direction (~128 GB/s bidirectional); Gen6 x16: ~128 GB/s per direction (~256 GB/s bidirectional)	NVLink 4: ~450 GB/s per direction (~900 GB/s bidirectional); NVLink 5: ~900 GB/s per direction (~1.8 TB/s bidirectional)
Latency	Higher, due to CPU/chipset path	Low, direct GPU-to-GPU links
Topology	CPU-controlled tree structure	Mesh / point-to-point GPU interconnect
Direct GPU-to-GPU	Limited; traffic goes via CPU/chipset	Yes, direct GPU communication
Use Cases	General servers, gaming	AI training, model parallelism
Scaling	Add lanes (x16/x32)	Add links per GPU
Reliability (RAS)	Protocol + platform-dependent	CRC + retransmission
SerDes	Long-reach, needs FEC/retimers	Short-reach NVHS, lower latency
Expansion	Scale by lane width	Scale by parallel links

NVLink Limitations

However, newer does not mean perfect. From discussions in the Reddit community, we can clearly see that NVLink still has some limitations and is not suitable for everyone.

First, a very common misunderstanding is that NVLink provides “memory pooling” or “shared VRAM.” NVLink does not make multiple GPUs act as a single GPU with larger memory. Each GPU still has its own independent memory space, and NVLink only provides a faster path for data exchange. Data still needs to move between GPUs, and even over NVLink this is much slower than a GPU’s local memory bandwidth (for example, A100 memory bandwidth is roughly 1.5 to 2 TB/s, while its NVLink total is 600 GB/s bidirectional).

Second, the benefit of NVLink depends heavily on the workload. In data parallelism, each GPU holds a full copy of the model and processes its own batch independently; GPUs communicate mainly to synchronize gradients, so for smaller models NVLink brings little benefit. In model parallelism, computation is often sequential, and communication happens only a limited number of times during the forward and backward passes, so the overall speed gain is also limited. In many reported tests, NVLink improves performance by about 30%–40%, not by an order of magnitude. Changing the connection type does not transform performance; it makes the existing work more efficient. NVLink is also still evolving.
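One way to see why a much faster link yields only a modest end-to-end gain is an Amdahl's-law style estimate: only the communication share of each step gets faster. This is a simplified sketch, and the 30% communication share and 7x link speedup below are illustrative assumptions, not measurements.

```python
def overall_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Amdahl's-law estimate of end-to-end speedup.

    comm_fraction: share of step time spent on GPU-to-GPU communication (0..1).
    comm_speedup:  how many times faster that communication becomes.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if comm_speedup <= 0:
        raise ValueError("comm_speedup must be positive")
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


if __name__ == "__main__":
    # Assumed values: 30% of each step is communication, and the link is 7x faster.
    gain = overall_speedup(comm_fraction=0.3, comm_speedup=7.0)
    print(f"Estimated overall speedup: {gain:.2f}x")
```

With these assumed inputs the estimate is about 1.35x, in line with the 30%–40% range mentioned above. If communication were only 5% of each step, the same link would give a gain of just a few percent.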

Third, NVLink usually depends on SXM platforms or dedicated bridges rather than standard PCIe slots. This means higher hardware cost, stricter thermal design, and a more limited ecosystem than PCIe. Therefore, for small-scale deployment or general computing scenarios, PCIe is still a more practical and cost-effective choice.

How to choose

For large-scale model training with model parallelism, NVLink is the better choice. It speeds up communication between GPUs and makes training more efficient. Note, however, that it is not a plug-and-play solution: after choosing an NVLink system, the software must also be optimized to take advantage of it.

If you are training models at home or at small scale, PCIe is still a good fit. Its mature ecosystem means you rarely have to worry about compatibility. Moreover, when the number of GPUs is small, the bandwidth gap matters less. In this case, PCIe is powerful enough.

Finally, the GPU interconnect is only one part of the system. An AI deployment depends not only on the links between GPUs (NVLink or PCIe) but also on how well the system works with the network layer, where optical modules and cabling come into play.

FAQs

#1 Is NVLink always better?
No. NVLink is designed for GPU communication. In many real applications, its advantage cannot be fully used, and PCIe is still more suitable for most scenarios in terms of cost and general use.

#2 Will NVLink replace PCIe?
No. PCIe is a general standard interface, while NVLink is a specialized solution for GPU interconnect. They are completely different in concept and direction. They will coexist for a long time and serve different needs.

Conclusion

PCIe’s general-purpose design, including its host-rooted tree, device enumeration, and hot-plug support, gives it broad compatibility but also limits how far it can go as a GPU interconnect. NVLink, a product of the AI era, makes communication between GPUs more direct and efficient. Both interfaces have their own strengths and weaknesses, and NVLink still needs time to develop. What matters most is to understand the differences between them and choose according to your own use case.

Maddy
