Turning Multi‑Teacher Conflict into Dynamic Constraints: Robust Reasoning Alignment for Multimodal LLMs (ICML 2026)
APO (Autonomous Preference Optimization) converts the drift and conflict among multiple teacher multimodal LLMs into dynamic negative constraints while treating consensus as a positive preference, enabling robust concept alignment and superior diagnostic accuracy on the CXR‑MAX benchmark, as demonstrated by extensive ICML‑2026 experiments.
