How LoRA Enables Multimodal Capabilities in Large Language Models
This article compares two ways to add vision to large language models—training a native multimodal model from scratch or attaching a visual module to a pretrained LLM—then details the VoRA approach that uses LoRA adapters to inject visual knowledge without extra inference cost.
