Tag

model acceleration


Baidu Geek Talk
May 15, 2024 · Artificial Intelligence

Accelerating Large Model Training and Inference with Baidu Baige AIAK‑LLM: Challenges, Techniques, and Optimizations

The talk outlines how Baidu's Baige AIAK‑LLM suite tackles the exploding compute demands of trillion‑parameter models by boosting Model FLOPS Utilization (MFU) through advanced parallelism, memory‑saving recomputation, zero‑offload, adaptive scheduling, and cross‑chip orchestration, delivering 30‑60% speedups in training and inference and packaging the stack as a unified cloud product.

AI infrastructure · Baidu · MFU
25 min read
DaTaobao Tech
Apr 26, 2024 · Artificial Intelligence

Accelerating Stable Diffusion Models: Evaluation of FlashAttention2, OneFlow, DeepCache, Stable-Fast, and LCM-LoRA

Our benchmark of FlashAttention2, OneFlow, DeepCache, Stable‑Fast, and LCM‑LoRA on Stable Diffusion models shows that DeepCache combined with PyTorch 2.2 consistently cuts inference time by 40‑50% with minimal code changes, while OneFlow offers 20‑40% speedups when compatible, making DeepCache the recommended default acceleration.

DeepCache · FlashAttention2 · LCM-LoRA
10 min read
Rare Earth Juejin Tech Community
Apr 22, 2024 · Artificial Intelligence

PP-LCNet: A Lightweight CPU-Optimized Convolutional Neural Network

PP-LCNet is a lightweight convolutional neural network designed for Intel CPUs that leverages MKLDNN acceleration, H‑Swish activation, selective SE modules, larger kernels, and expanded fully‑connected layers to achieve higher accuracy without increasing inference latency across image classification, detection, and segmentation tasks.

CPU optimization · MKLDNN · deep learning
25 min read
Alimama Tech
Dec 14, 2023 · Artificial Intelligence

AI-Driven Content Risk Control: System Evolution and Optimization at Alibaba

Alimama's AI‑driven content risk platform has evolved from simple rule matching to a data‑centric, serverless architecture that integrates large‑model acceleration, decision‑tree compilation, high‑throughput vector retrieval, and elastic word matching, delivering sub‑100 ms text moderation and sub‑1 s image moderation while remaining stable under peak promotional traffic.

AI · DevOps · content moderation
25 min read
360 Smart Cloud
Nov 20, 2023 · Artificial Intelligence

Overview of Recent Open‑Source AI Models and Tools (November 2023)

This article summarizes a collection of newly released open‑source AI projects covering natural‑language processing, multimodal processing, intelligent agents, recommendation systems, and model training acceleration, providing brief descriptions, key capabilities, and links to their repositories.

AI · Recommendation systems · large language models
9 min read
DataFunTalk
Oct 12, 2021 · Artificial Intelligence

PaddleNLP v2.1 Release: Taskflow One‑Click NLP, Few‑Shot Learning Enhancements, and 28× Text Generation Acceleration

PaddleNLP v2.1 introduces an industrial‑grade Taskflow covering eight NLP scenarios with one‑click usage, a three‑line few‑shot learning paradigm that boosts small‑sample performance, and a FasterTransformer‑based inference engine that delivers up to a 28‑fold speedup for text generation, all backed by extensive model and algorithm integrations.

Artificial Intelligence · NLP · PaddleNLP
7 min read
Tencent Tech
Feb 27, 2020 · Artificial Intelligence

How to Speed Up Deep Learning Models: Cutting-Edge Acceleration Techniques

Deep learning models are often slow to train and deploy because of their size, but a range of acceleration methods — model architecture optimization, pruning, quantization, knowledge distillation, and distributed training — can dramatically improve speed and efficiency while preserving accuracy.

deep learning · distributed training · knowledge distillation
14 min read