SuanNi
Mar 4, 2026 · Artificial Intelligence
How to Fit Large Language Models into Cars and Robots: A Hardware‑Aware Scaling Law
This article presents a hardware-aware co-design framework for large language models deployed on edge devices. It reveals a scaling law that trades off model accuracy against inference latency, and shows how roofline analysis combined with systematic search can quickly find Pareto-optimal architectures on devices such as the NVIDIA Jetson Orin.
AI inference · Edge Computing · Pareto optimization
15 min read
