Weekly AI paper roundup: protein design, open‑source agent, HunyuanOCR, Olmo 3
This weekly roundup covers five recent AI papers: HumanSense, a human‑centric benchmark for multimodal LLMs; JAM‑2, a system for de novo antibody design; the open‑source Olmo 3 language model family; Lumine, a generalist agent for 3D open worlds; and HunyuanOCR, a lightweight vision‑language model for OCR. Each entry summarizes the paper's core contributions, results, and link.
HumanSense: a human‑centric benchmark for multimodal LLMs
Multimodal large language models (MLLMs) show great promise for human‑like interaction, yet the field lacks a fine‑grained, human‑centric evaluation framework. A team from Xi'an Jiaotong University and Ant Group introduced HumanSense, a comprehensive benchmark that measures both deep understanding of extended multimodal context and the generation of empathetic, context‑aware responses. Evaluation reveals substantial gaps in current leading MLLMs on advanced interaction tasks, and the authors propose a multi‑stage, modality‑progressive reinforcement‑learning method, HumanSense‑Omni‑Reasoning, which markedly improves performance on high‑level reasoning and interaction benchmarks.
JAM‑2: Fully computational design of drug‑like antibodies with high success rates
The paper presents JAM‑2, a universal from‑scratch protein design system that, for the first time, efficiently designs VHH‑Fc antibodies and full‑length monoclonal antibodies (mAbs) with drug‑like affinity and developability. Across 16 previously unseen targets, JAM‑2 produced binding molecules for every target, with average success rates of 39% for VHH‑Fc and 18% for mAb designs. Paper link: https://go.hyper.ai/3Mfna
Olmo 3: Open‑source language model series
Olmo 3 introduces a family of open‑source language models at 7B and 32B parameters that achieve industry‑leading performance on long‑context reasoning, function calling, programming, instruction following, general dialogue, and knowledge retrieval. The release provides the complete model flow, including every training stage, checkpoint, dataset, and dependency, enabling end‑to‑end reconstruction and deployment. Paper link: https://go.hyper.ai/HgvWV
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
Lumine is the first open‑source generalist agent framework capable of executing complex, hour‑long tasks in rich 3D open‑world environments. It adopts a human‑like interaction paradigm, using a vision‑language model to unify perception, reasoning, and action in an end‑to‑end pipeline. The system processes raw pixel input at 5 Hz and emits precise keyboard‑and‑mouse actions at 30 Hz, invoking the reasoning module only when necessary. Paper link: https://go.hyper.ai/6qg4A
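The decoupled rates described above (slow frame intake, faster action emission, reasoning only on demand) can be sketched as a simple control loop. This is an illustrative sketch, not code from the paper; the function names (`run_episode`, `needs_reasoning`, `plan`, `act`) are hypothetical stand‑ins for the agent's components.

```python
# Sketch of a decoupled perception/action loop in the style Lumine describes:
# frames arrive at a low rate (5 Hz), actions are emitted at a higher rate
# (30 Hz), and the expensive reasoning step runs only when triggered.
# All names here are illustrative, not from the paper.

PERCEPTION_HZ = 5
ACTION_HZ = 30
ACTIONS_PER_FRAME = ACTION_HZ // PERCEPTION_HZ  # 6 actions per observed frame


def run_episode(frames, needs_reasoning, plan, act):
    """Consume raw frames; replan only when `needs_reasoning` fires,
    otherwise keep decoding actions from the current plan."""
    actions = []
    current_plan = None
    for frame in frames:
        if current_plan is None or needs_reasoning(frame):
            current_plan = plan(frame)  # slow VLM reasoning call
        for step in range(ACTIONS_PER_FRAME):
            actions.append(act(current_plan, step))  # fast action decoding
    return actions
```

With stub components (e.g. `plan` returning the frame id and `act` returning a `(plan, step)` pair), three input frames yield eighteen actions, six per frame, with a replan only on frames that trigger `needs_reasoning`.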
HunyuanOCR Technical Report
The report introduces HunyuanOCR, a commercial‑grade, open‑source, lightweight (1 B‑parameter) vision‑language model for OCR tasks. Its architecture couples a native Vision Transformer (ViT) with a compact large language model (LLM) via an MLP adapter. Empirical results show HunyuanOCR surpasses existing commercial APIs, traditional OCR pipelines, and larger models such as Qwen3‑VL‑4B. Paper link: https://go.hyper.ai/KxstF
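The ViT‑plus‑adapter‑plus‑LLM coupling described above can be illustrated with a toy forward pass: vision features are projected by an MLP adapter into the LLM's embedding space and prepended to the text embeddings. This is a minimal sketch with made‑up dimensions; the function names and shapes are assumptions for illustration, not HunyuanOCR's actual implementation.

```python
import numpy as np

# Toy sketch of a ViT -> MLP adapter -> LLM input pipeline.
# Dimensions and names are illustrative only.
rng = np.random.default_rng(0)

VIT_DIM, LLM_DIM = 64, 128


def vit_encode(image_patches):
    """Stand-in for the ViT encoder: one feature vector per patch."""
    return rng.standard_normal((len(image_patches), VIT_DIM))


def mlp_adapter(vision_feats, w1, w2):
    """Two-layer MLP projecting vision features into the LLM embedding space."""
    hidden = np.maximum(vision_feats @ w1, 0.0)  # ReLU
    return hidden @ w2


def build_llm_input(image_patches, text_embeds, w1, w2):
    """Prepend projected vision tokens to the text token embeddings."""
    vision_tokens = mlp_adapter(vit_encode(image_patches), w1, w2)
    return np.concatenate([vision_tokens, text_embeds], axis=0)
```

For 16 image patches and 8 text tokens, the LLM would receive a sequence of 24 embeddings, all in its own hidden dimension.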
HyperAI Super Neural
Deconstructing the sophistication and universality of technology, covering cutting-edge AI for Science case studies.
