Tagged articles
6 articles
SuanNi
Feb 23, 2026 · Artificial Intelligence

How FireRed-Image-Edit Sets New Standards for AI-Powered Image Editing

FireRed-Image-Edit, an open‑source instruction‑driven diffusion model, combines massive high‑quality data, a dual‑stream multimodal architecture, progressive training, and a comprehensive multi‑dimensional benchmark to achieve unprecedented pixel‑level control and human‑like editing performance across diverse visual tasks.

AI · Training Strategies · data engineering
12 min read
Architect
Jan 29, 2025 · Artificial Intelligence

How Janus‑Pro Redefines Multimodal AI with Bigger Models and New Training Strategies

DeepSeek’s newly released Janus‑Pro series (1B and 7B) advances multimodal AI by decoupling visual understanding from visual generation, using an optimized three‑stage training recipe, greatly expanded training data, and larger LLM backbones; the article reports performance that matches or exceeds leading models from Meta, Google, OpenAI, and Stability AI.

DeepSeek · Janus-Pro · Model Scaling
6 min read
NewBeeNLP
Nov 11, 2024 · Artificial Intelligence

What Do Recent Multimodal LLM Papers Reveal About Vision‑Language Models?

This article surveys ten recent multimodal large language model papers, covering vision representation laws, a stricter instruction benchmark, safety impacts of visual adaptation, the Mini‑Gemini architecture, automatic pruning, vision capability boosting, long‑context transfer, efficient token sparsification, math reasoning, and hallucination mitigation.

Benchmark · Training Strategies · Vision-Language Models
18 min read
Baobao Algorithm Notes
May 9, 2024 · Artificial Intelligence

Inside DeepSeek‑V2: How Multi‑Head Latent Attention Cuts KV‑Cache and Boosts Performance

This article provides an in‑depth technical analysis of DeepSeek‑V2, covering its 236B‑parameter scale, the Multi‑Head Latent Attention optimization that reduces KV‑cache memory, architectural details, training pipelines, infrastructure choices, and results on benchmarks such as MMLU and instruction following.

AI Architecture · DeepSeek · Model Optimization
17 min read
NewBeeNLP
Feb 22, 2024 · Artificial Intelligence

Practical Tips for CPT, SFT, and LoRA in Large Language Model Fine‑Tuning

This article shares hands‑on guidance on using continual pre‑training (CPT), supervised fine‑tuning (SFT), and LoRA adapters for large language models, covering dataset size requirements, learning‑rate scheduling, warm‑up ratios, epoch strategies, and practical routing choices based on real‑world experiments.

CPT · LLM fine-tuning · LoRA
12 min read
DataFunSummit
Nov 18, 2021 · Artificial Intelligence

Enterprise Applications and Research of Speech Translation

This article reviews recent advances in speech translation, discusses ByteDance's practical deployments, compares cascade and end‑to‑end modeling approaches, introduces improved encoder‑decoder architectures and training strategies, and reports state‑of‑the‑art results on the IWSLT 2021 benchmark.

AI · ByteDance · End-to-End
15 min read