
NVIDIA L40S GPU Overview and Its Impact on Generative AI and Optical Modules

The NVIDIA L40S GPU, built on the Ada Lovelace architecture with 48 GB of GDDR6 memory and 864 GB/s of bandwidth, delivers over 1.45 PFLOPS of tensor performance and strong FP16/FP32 throughput for generative AI training and inference, while its lower power draw and GDDR6 design may influence demand for mid-range optical modules in data centers.


At SIGGRAPH 2023, NVIDIA announced the new L40S GPU and the OVX server built around it, targeting generative AI model training and inference with improved computational efficiency.

The L40S uses the Ada Lovelace architecture, featuring 48 GB of GDDR6 memory, 864 GB/s of bandwidth, fourth-generation Tensor Cores, and an FP8 Transformer Engine, delivering more than 1.45 PFLOPS of tensor throughput. Benchmarks show higher efficiency than the A100 in both fine-tuning and inference workloads.
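
To put those headline numbers in context, a quick back-of-envelope roofline check shows where the card shifts from memory-bound to compute-bound. The peak figures below are the specs quoted above; the example workload (a hypothetical 40B-parameter model decoded at batch size 1 in FP8) is an illustration, not a benchmark.

# Back-of-envelope roofline check for the L40S, using the specs quoted above.
# Peak numbers are the published figures; the example workload is hypothetical.

PEAK_TENSOR_FLOPS = 1.45e15   # ~1.45 PFLOPS tensor throughput (FP8)
MEM_BANDWIDTH     = 864e9     # 864 GB/s GDDR6

# Ridge point: arithmetic intensity (FLOPs per byte moved) above which the GPU
# becomes compute-bound rather than memory-bound.
ridge = PEAK_TENSOR_FLOPS / MEM_BANDWIDTH
print(f"ridge point ~ {ridge:.0f} FLOPs/byte")

# Hypothetical decode step of a 40B-parameter model in FP8 (1 byte per weight),
# batch size 1: roughly 2 FLOPs per parameter per token, all weights streamed.
flops_per_token = 2 * 40e9
bytes_per_token = 40e9
intensity    = flops_per_token / bytes_per_token   # ~2 FLOPs/byte -> memory-bound
tokens_per_s = MEM_BANDWIDTH / bytes_per_token     # bandwidth-limited upper bound
print(f"intensity ~ {intensity:.0f} FLOPs/byte, ~{tokens_per_s:.0f} tokens/s ceiling")

The gap between the ridge point and the intensity of small-batch decoding is why batching and FP8 quantization matter so much for inference throughput on this class of hardware.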

Compared with the A100, the L40S differs in several respects (a rough spec comparison is sketched after this list):

(1) It uses mature GDDR6 memory, which offers lower bandwidth than HBM but benefits from better availability and supply stability.

(2) FP16 performance exceeds the A100's, and FP32 performance is significantly higher, making it better suited to scientific computing.

(3) Power consumption is reduced, helping lower data‑center energy usage.

(4) According to Super Micro data, the L40S offers better price‑performance than the A100.
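
Treating these figures as approximations, a rough side-by-side looks like the sketch below. The L40S numbers are those cited in this article; the A100 numbers are the commonly published 80 GB SXM specs, not figures from this article, and the TFLOPS-per-watt ratio is only a crude proxy for points (2) and (3).

# Rough L40S vs. A100 comparison; A100 figures are commonly published
# 80 GB SXM specs (an assumption, not quoted from this article).

specs = {
    "L40S":      {"mem_gb": 48, "bw_gbs": 864,  "fp32_tflops": 91.6, "tdp_w": 350},
    "A100 80GB": {"mem_gb": 80, "bw_gbs": 2039, "fp32_tflops": 19.5, "tdp_w": 400},
}

for name, s in specs.items():
    perf_per_watt = s["fp32_tflops"] / s["tdp_w"]   # crude efficiency proxy
    print(f"{name:10s} {s['mem_gb']} GB | {s['bw_gbs']} GB/s | "
          f"{s['fp32_tflops']} TFLOPS FP32 | {perf_per_watt:.3f} TFLOPS/W")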

The L40S connects to CPUs via a 16‑lane PCIe Gen 4 interface (64 GB/s bidirectional), while NVIDIA Grace Hopper uses NVLink‑C2C for up to 900 GB/s, seven times faster than PCIe Gen 5.
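
The practical effect of that gap is easiest to see as a transfer-time estimate. The sketch below assumes a hypothetical 40 GB checkpoint and the peak per-direction rates implied by the figures above; real transfers run below peak.

# Time to move a hypothetical 40 GB checkpoint between CPU and GPU at the
# peak interface rates quoted above (actual transfers will be slower).

checkpoint_gb = 40.0

pcie_gen4_x16_gbs = 32.0   # ~32 GB/s per direction (64 GB/s bidirectional)
nvlink_c2c_gbs    = 450.0  # ~450 GB/s per direction (900 GB/s total)

for name, rate in [("PCIe Gen 4 x16", pcie_gen4_x16_gbs),
                   ("NVLink-C2C", nvlink_c2c_gbs)]:
    print(f"{name:15s}: {checkpoint_gb / rate:.2f} s")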

With 18,176 CUDA cores, the L40S provides nearly five times the FP32 performance of the A100, accelerating complex calculations and data‑intensive analysis.
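
That roughly five-fold figure follows directly from core count and clock. The sketch below assumes a boost clock of about 2.52 GHz (a commonly published value, not from this article) and the A100's published 19.5 TFLOPS FP32 peak.

# Peak FP32 estimate from core count and clock (2 FMA FLOPs per core per cycle).
# The ~2.52 GHz boost clock is an assumed, commonly published figure.

cuda_cores  = 18176
boost_clock = 2.52e9                               # Hz (assumption)
l40s_fp32   = cuda_cores * 2 * boost_clock / 1e12  # TFLOPS
a100_fp32   = 19.5                                 # published A100 FP32 peak, TFLOPS

print(f"L40S ~ {l40s_fp32:.1f} TFLOPS FP32, "
      f"~{l40s_fp32 / a100_fp32:.1f}x the A100's {a100_fp32} TFLOPS")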

It also includes 142 third-generation RT cores delivering 212 TFLOPS of ray-tracing performance within a 350 W power envelope, making it suitable for real-time rendering, product design, and 3D content creation.

For large generative AI models, the L40S delivers up to 1.2× the inference performance and 1.7× the training performance of the A100.

The OVX server can house up to eight L40S GPUs; NVIDIA claims it can fine-tune a GPT-3 40B model on 8.6 billion tokens in seven hours and generate 80 images per minute with Stable Diffusion XL.
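
Taken at face value, those server-level numbers imply roughly the per-GPU rates below; this is simple arithmetic on the figures above, not an additional benchmark.

# Per-GPU breakdown of the claimed OVX throughput (8x L40S), arithmetic only.

gpus           = 8
images_per_min = 80   # Stable Diffusion XL, whole server
finetune_hours = 7    # GPT-3 40B fine-tune, whole server

per_gpu_images_min = images_per_min / gpus
seconds_per_image  = 60 / per_gpu_images_min

print(f"~{per_gpu_images_min:.0f} images/min per GPU, ~{seconds_per_image:.0f} s per image")
print(f"fine-tune wall clock: {finetune_hours} h on {gpus} GPUs "
      f"= {finetune_hours * gpus} GPU-hours")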

Analysis suggests that because the L40S uses PCIe Gen 4, its impact on demand for 800 Gbps optical modules is limited, but its cost‑performance advantage and mature memory technology may boost adoption by cloud providers, potentially benefiting mid‑range (200 Gbps/400 Gbps) optical modules.
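
One way to see why the host interface caps optical-module requirements: the sketch below converts the PCIe Gen 4 x16 rate into network terms, under the simplifying assumption that a single GPU's scale-out traffic cannot usefully exceed what its host link can feed.

# Back-of-envelope: network bandwidth one L40S can realistically drive,
# assuming scale-out traffic is capped by the PCIe Gen 4 x16 host link.

pcie_gen4_x16_gbytes = 32.0                      # GB/s per direction
pcie_gen4_x16_gbits  = pcie_gen4_x16_gbytes * 8  # = 256 Gbps

for optic in (200, 400, 800):
    note = "saturable" if optic <= pcie_gen4_x16_gbits else "host-link limited"
    print(f"{optic} Gbps optic: {note} (host link ~ {pcie_gen4_x16_gbits:.0f} Gbps)")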

NVIDIA’s FY2024 Q2 results show record revenue of $13.51 billion, driven by a 171% YoY increase in data‑center sales, largely from generative AI workloads, and the company plans to launch GH200‑based OEM servers in Q3 2023.

Overall, the L40S’s hardware specifications, improved efficiency for AI workloads, and competitive pricing position it as a compelling option for data‑center operators and may stimulate growth in the optical‑module market.

Tags: performance, GPU, NVIDIA, generative AI, data center, L40S, optical modules
Written by Architects' Tech Alliance

Sharing project experiences and insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices, and solutions.