Tag: NeMo


DataFunSummit
Jan 21, 2025 · Artificial Intelligence

NVIDIA NeMo Full Stack: End‑to‑End Large Language Model Training, Alignment, and RLHF

This article presents NVIDIA's NeMo technology stack for end‑to‑end large language model (LLM) development, covering the full training software pipeline, model alignment with reinforcement learning from human feedback (RLHF), performance optimizations such as model parallelism, FP8 precision, TensorRT‑LLM inference, and dynamic load balancing, as well as future research directions.

GPU optimization · LLM · NeMo
24 min read
DataFunSummit
Oct 2, 2024 · Artificial Intelligence

NVIDIA’s Solutions for Large Language Models: NeMo Framework, TensorRT‑LLM, and Retrieval‑Augmented Generation

This article explains NVIDIA’s end‑to‑end stack for large language models, covering the NeMo Framework for data processing, training, and deployment, the open‑source TensorRT‑LLM inference accelerator, and the Retrieval‑Augmented Generation (RAG) technique that enriches model outputs with external knowledge.

AI acceleration · NVIDIA · NeMo
17 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering an overview of Riva, updates to its Chinese ASR models, its TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva
13 min read
DataFunTalk
Mar 15, 2024 · Artificial Intelligence

NVIDIA’s NeMo Framework and TensorRT‑LLM: Full‑Stack Solutions for Large Language Models and Retrieval‑Augmented Generation

This article explains NVIDIA’s end‑to‑end ecosystem for large language models, covering the NeMo Framework’s data processing, distributed training, model fine‑tuning, inference acceleration with TensorRT‑LLM, deployment via Triton, and Retrieval‑Augmented Generation (RAG) techniques that enhance model reliability and performance.

AI · NVIDIA · NeMo
16 min read
DataFunTalk
Dec 6, 2023 · Artificial Intelligence

Distributed Training Techniques and Quantitative Analysis for Large Language Models (GPT‑175B)

This article presents a comprehensive overview of state‑of‑the‑art distributed training methods for large language models, using GPT‑175B as a case study to analyze memory, communication, and compute overheads, and to recommend practical optimization strategies such as tensor, pipeline, and sequence parallelism, the ZeRO‑1 optimizer, and selective activation checkpointing.

GPU memory optimization · LLM · Megatron
22 min read
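The memory analysis sketched in the GPT‑175B summary above can be made concrete with a back‑of‑envelope calculation. The per‑parameter byte counts below are the standard figures for mixed‑precision Adam training; this is an illustration of the kind of analysis the article describes, not numbers taken from the article itself:

```python
# Back-of-envelope memory for GPT-175B mixed-precision training with Adam.
# Per parameter: 2 B fp16 weight + 2 B fp16 gradient + 4 B fp32 master copy
# + 4 B Adam first moment + 4 B Adam second moment = 16 B (before activations).
PARAMS = 175e9
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16

total_tb = PARAMS * BYTES_PER_PARAM / 1e12
print(f"Model states: {total_tb:.1f} TB")  # ~2.8 TB, far beyond any single GPU

# ZeRO-1 shards only the optimizer states (fp32 master + moments = 12 B/param)
# across N data-parallel ranks; fp16 weights and gradients stay replicated.
def zero1_per_gpu_gb(n_ranks: int) -> float:
    per_param = 2 + 2 + 12 / n_ranks
    return PARAMS * per_param / 1e9

print(f"Per GPU with ZeRO-1 over 64 ranks: {zero1_per_gpu_gb(64):.0f} GB")
```

Even with optimizer states sharded by ZeRO‑1, hundreds of gigabytes per rank remain, which is why the article pairs it with tensor, pipeline, and sequence parallelism and selective activation checkpointing.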