NVIDIA Announces H100 NVL - Maximum Memory Server Card for Large Language Models

While this year's Spring GTC event doesn't bring any new GPUs or GPU architectures from NVIDIA, the company is still in the process of rolling out new products based on the Hopper and Ada Lovelace GPUs it introduced last year. At the high end of the market, the company today is announcing a new H100 accelerator variant specifically aimed at large language model users: the H100 NVL.

The H100 NVL is an interesting variant of NVIDIA's H100 PCIe card that, in a sign of the times and NVIDIA's extensive success in the AI field, targets a singular market: large language model (LLM) deployment. There are a few things that make this card atypical for NVIDIA's usual server fare, not the least of which is that it consists of two PCIe H100 cards that come already bridged together, but the big draw is the large memory capacity. The combined dual-GPU card offers 188GB of HBM3 memory, 94GB per card, delivering more memory per GPU than any other NVIDIA part to date, even within the H100 family.

Comparison of NVIDIA H100 Accelerator Specifications

|  | H100 NVL | H100 PCIe | H100 SXM |
| --- | --- | --- | --- |
| FP32 CUDA Cores | 2 x 16896? | 14592 | 16896 |
| Tensor Cores | 2 x 528? | 456 | 528 |
| Boost Clock | 1.98GHz? | 1.75GHz | 1.98GHz |
| Memory Clock | ~5.1Gbps HBM3 | 3.2Gbps HBM2e | 5.23Gbps HBM3 |
| Memory Bus Width | 6144-bit | 5120-bit | 5120-bit |
| Memory Bandwidth | 2 x 3.9TB/sec | 2TB/sec | 3.35TB/sec |
| VRAM | 2 x 94GB (188GB) | 80GB | 80GB |
| FP32 Vector | 2 x 67 TFLOPS? | 51 TFLOPS | 67 TFLOPS |
| FP64 Vector | 2 x 34 TFLOPS? | 26 TFLOPS | 34 TFLOPS |
| INT8 Tensor | 2 x 1980 TOPS | 1513 TOPS | 1980 TOPS |
| FP16 Tensor | 2 x 990 TFLOPS | 756 TFLOPS | 990 TFLOPS |
| TF32 Tensor | 2 x 495 TFLOPS | 378 TFLOPS | 495 TFLOPS |
| FP64 Tensor | 2 x 67 TFLOPS? | 51 TFLOPS | 67 TFLOPS |
| Interconnect | NVLink 4, 18 links (900GB/sec) | NVLink 4 (600GB/sec) | NVLink 4, 18 links (900GB/sec) |
| GPU | 2 x GH100 (814mm2) | GH100 (814mm2) | GH100 (814mm2) |
| Transistor Count | 2 x 80B | 80B | 80B |
| TDP | 700-800W | 350W | 700-800W |
| Manufacturing Process | TSMC 4N | TSMC 4N | TSMC 4N |
| Interface | 2 x PCIe 5.0 (quad slot) | PCIe 5.0 (dual slot) | SXM5 |
| Architecture | Hopper | Hopper | Hopper |

Driving this SKU is a specific niche: memory capacity. Large language models such as the GPT family are in many respects memory capacity bound, as they will quickly fill up even an H100 accelerator just to hold all of their parameters (175B in the case of the largest GPT-3 models). As a result, NVIDIA has opted to put together a new H100 SKU that offers a bit more memory per GPU than its regular H100 parts, which top out at 80GB per GPU.
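
To put that capacity constraint in rough numbers, here is a quick back-of-the-envelope sketch (our own arithmetic, not NVIDIA's figures) of how much memory the weights alone of a 175B-parameter model need at common inference precisions:

```python
# Back-of-the-envelope sketch: memory needed just to hold a dense model's
# weights, ignoring KV cache and activations. Byte counts per parameter are
# assumptions for typical inference precisions, not NVIDIA-published figures.
def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the memory needed to hold the weights, in gigabytes."""
    return num_params * bytes_per_param / 1e9

gpt3_params = 175e9  # largest GPT-3 configuration

for precision, nbytes in [("FP16/BF16", 2), ("FP8/INT8", 1)]:
    total_gb = weight_footprint_gb(gpt3_params, nbytes)
    print(f"{precision}: {total_gb:.0f} GB of weights "
          f"-> {total_gb / 94:.1f}x one 94GB H100 NVL GPU, "
          f"{total_gb / 80:.1f}x one 80GB H100")
```

At FP16 the weights alone come to roughly 350GB, which is why even 94GB GPUs get deployed in pairs or larger groups for models of this class.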

Under the hood, what we're looking at is essentially a special bin of the GH100 GPU placed on a PCIe card. All GH100 GPUs come with six stacks of HBM memory, either HBM2e or HBM3, with a capacity of 16GB per stack. However, for yield reasons, NVIDIA only ships its regular H100 parts with five of the six HBM stacks enabled. So while there is nominally 96GB of VRAM on each GPU, only 80GB is available on regular SKUs.

The H100 NVL, in turn, is the mythical fully enabled SKU, with all six stacks switched on. By turning on the sixth HBM stack, NVIDIA is able to access the additional memory and memory bandwidth it affords. It will have some material impact on yields, just how much is a closely guarded NVIDIA secret, but the LLM market is apparently big enough, and willing to pay a high enough premium for nearly perfect GH100 packages, to make it worth NVIDIA's while.

Even then, it should be noted that customers don't get access to quite all of the 96GB per card. Rather, with a total capacity of 188GB of memory, they effectively get 94GB per card. NVIDIA didn't go into detail on this design quirk in our pre-briefing ahead of today's keynote, but we suspect this is also for yield reasons, giving NVIDIA some slack to disable bad cells (or layers) within the HBM3 memory stacks. The net result is that the new SKU offers 14GB more memory per GH100 GPU, a 17.5% memory increase. Meanwhile, aggregate memory bandwidth for the card is 7.8TB/second, which works out to 3.9TB/second for the individual boards.
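
Those bandwidth and capacity figures are easy to sanity-check from the table above; the short sketch below redoes the arithmetic, treating the ~5.1Gbps/pin data rate as the approximate value it is:

```python
# Sanity check of the figures quoted above, using the spec table's numbers.
# The ~5.1Gbps/pin HBM3 data rate is NVIDIA's approximate value, so treat the
# result as an estimate rather than an official spec.
def hbm_bandwidth_tbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in TB/s for a given bus width and per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps / 1000  # GB/s -> TB/s

per_gpu = hbm_bandwidth_tbs(6144, 5.1)        # six 1024-bit HBM3 stacks enabled
print(f"Per GPU:  ~{per_gpu:.1f} TB/s")       # ~3.9 TB/s
print(f"Per card: ~{per_gpu * 2:.1f} TB/s")   # ~7.8 TB/s aggregate across both GPUs
print(f"Capacity uplift: {14 / 80:.1%} over an 80GB H100")  # 17.5%
```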

Besides the increased memory capacity, in many ways the individual cards within the larger dual-GPU/dual-card H100 NVL look a lot like the SXM5 version of the H100 placed on a PCIe card. Whereas the regular H100 PCIe is held back by the use of slower HBM2e memory, fewer active SMs/tensor cores, and lower clock speeds, the tensor core performance figures NVIDIA is quoting for the H100 NVL are all on par with the H100 SXM5, indicating that this card isn't further cut down like the regular PCIe card. We're still waiting on the final, complete specifications for the product, but assuming everything here is as presented, then the GH100s going into the H100 NVL would represent the highest-binned GH100s currently available.

And an emphasis on the plural is needed here. As noted earlier, the H100 NVL is not a single GPU part, but rather is a dual-GPU/dual-card part, and it presents itself to the host system as such. The hardware itself is based on two PCIe form factor H100s strapped together using three NVLink 4 bridges. Physically, this is virtually identical to NVIDIA's existing H100 PCIe design, which can already be paired using NVLink bridges, so the difference isn't in the construction of the two-card/four-slot behemoth, but rather in the quality of the silicon inside. Put another way, you can strap together regular H100 PCIe cards today, but it wouldn't match the memory bandwidth, memory capacity, or tensor throughput of the H100 NVL.
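
Since the H100 NVL enumerates as two GPUs rather than one, software treats it like any other NVLink-bridged pair. As a purely illustrative sketch (generic PyTorch, not NVIDIA-specific code, and assuming the two halves of one NVL pair show up as cuda:0 and cuda:1), splitting a model across the pair looks like this:

```python
# Minimal sketch of how software sees the H100 NVL: two CUDA devices, not one.
# Generic PyTorch model-parallel code; device indices assume the two halves of
# one NVL pair enumerate as cuda:0 and cuda:1.
import torch
import torch.nn as nn

assert torch.cuda.device_count() >= 2, "expects both GPUs of the NVL pair visible"

# Split a toy two-stage model across the pair; activations crossing the stage
# boundary travel over the NVLink bridges rather than the PCIe bus.
stage0 = nn.Sequential(nn.Linear(4096, 4096), nn.GELU()).to("cuda:0")
stage1 = nn.Sequential(nn.Linear(4096, 4096), nn.GELU()).to("cuda:1")

x = torch.randn(8, 4096, device="cuda:0")
h = stage0(x).to("cuda:1")   # inter-GPU copy; NVLink-accelerated on an NVL pair
y = stage1(h)
print(y.shape)
```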

Surprisingly, despite the stellar specifications, the TDPs stay nearly the same. The H100 NVL is a 700W to 800W part, which breaks down to 350W to 400W per board, the lower bound of which is the same TDP as the regular H100 PCIe. In this case NVIDIA looks to be prioritizing compatibility over peak performance, as few server chassis can handle PCIe cards over 350W (and fewer still over 400W), meaning TDPs have to stay put. Still, given the higher performance figures and memory bandwidth, it's unclear how NVIDIA is affording the extra performance. Power binning can go a long way here, but it may also be a case of NVIDIA giving the card a higher than usual boost clock speed, since the target market is primarily concerned with tensor performance and won't be lighting up the entire GPU at once.

Otherwise, NVIDIA's decision to release what is essentially the best H100 bin is an unusual choice given their general preference for SXM parts, but it's a decision that makes sense in the context of what LLM customers need. Large SXM-based H100 clusters can easily scale up to 8 GPUs, but the amount of NVLink bandwidth available between any two is bottlenecked by the need to go through NVSwitches. For just a two-GPU configuration, pairing a set of PCIe cards is much more direct, with the fixed link guaranteeing 600GB/second of bandwidth between the cards.

But perhaps more important than that is simply the ability to quickly deploy the H100 NVL into existing infrastructure. Rather than requiring the installation of purpose-built HGX H100 carrier boards to pair up GPUs, LLM customers can just drop H100 NVLs into new server builds, or use them as a relatively quick upgrade to existing server builds. NVIDIA is going after a very specific market here, after all, so the usual SXM advantage (and NVIDIA's ability to throw its collective weight around) may not apply.

All told, NVIDIA is pitching the H100 NVL as offering 12x the GPT3-175B inference throughput of the last-generation HGX A100 (8 H100 NVLs vs. 8 A100s). For customers looking to deploy and scale up their systems for LLM workloads as quickly as possible, that is certainly going to be tempting. As noted earlier, the H100 NVL brings nothing new in terms of architectural features, much of the performance uplift here comes from the Hopper architecture's new transformer engines, but the H100 NVL will serve a specific niche as the fastest PCIe H100 option, and the option with the largest GPU memory pool.

Wrapping things up, according to NVIDIA, the H100 NVL cards will begin shipping in the second half of this year. The company isn't quoting a price, but for what is essentially a top-bin GH100 part, we'd expect them to fetch a premium price, especially in light of how the explosion in LLM usage is turning into a new gold rush for the server GPU market.
