AI Weekly Digest Issue 1 – Fine‑Grained Video Understanding, Structured JSON Output, and Market Insights

This issue covers a Tsinghua University method for fine‑grained video understanding with large multimodal models, explains how to make open‑source models reliably emit valid JSON, reviews the cloud giants' AI‑driven quarterly earnings and the challenges facing Chinese AI firms, and introduces a new text‑to‑image tool, Recraft.ai.

ZhongAn Tech Team

Weekly AI Digest – Issue 1 (Second week of November 2024)

ZhongAn's AI team has launched a weekly internal newsletter in which a rotating algorithm specialist curates recent AI research, engineering advances, market news, and useful tools for colleagues interested in AI.

Rotating Editor

Sun Jie: three years at ZhongAn, experienced in image and video processing, and familiar with the advertising and claims‑handling business units.

New Technology Relevant to Our Projects

Paper Highlight: A Tsinghua University paper presented at NeurIPS 2024 proposes a new paradigm for fine‑grained video understanding with large multimodal models. It splits annotation into two stages, coarse static enhancement followed by fine dynamic enhancement, uses multiple large models to enhance the annotations themselves (self‑enhancement), and finally applies a discriminative model to select the best textual annotation.
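The two‑stage, multi‑model pipeline described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: `model_a` and `model_b` are hypothetical toy functions standing in for large multimodal models, and `toy_discriminator` is a hypothetical keyword‑weight scorer standing in for a trained discriminative model. Only the control flow (coarse captions, refinement of every draft by every model, then discriminative selection) follows the description.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical annotator signature: (clip_id, draft caption) -> caption.
# An empty draft means "describe the clip from scratch".
Annotator = Callable[[str, str], str]


@dataclass
class Candidate:
    lineage: str  # which models produced this caption, e.g. "A->B"
    text: str


def coarse_static_stage(clip: str, models: dict[str, Annotator]) -> list[Candidate]:
    """Stage 1: each model writes a coarse, static (scene-level) caption."""
    return [Candidate(name, fn(clip, "")) for name, fn in models.items()]


def fine_dynamic_stage(clip: str, drafts: list[Candidate],
                       models: dict[str, Annotator]) -> list[Candidate]:
    """Stage 2: every model refines every draft with dynamic detail."""
    return [Candidate(f"{d.lineage}->{name}", fn(clip, d.text))
            for d in drafts for name, fn in models.items()]


def select_best(cands: list[Candidate], score: Callable[[str], float]) -> Candidate:
    """Discriminative step: keep the single highest-scoring caption."""
    return max(cands, key=lambda c: score(c.text))


# --- Toy stand-ins (hypothetical, for illustration only) ---
def model_a(clip: str, draft: str) -> str:
    return "a person at a desk" if not draft else draft + ", signing a form"


def model_b(clip: str, draft: str) -> str:
    return "an office" if not draft else draft + ", frowning"


# Keyword weights standing in for a trained discriminative model.
WEIGHTS = {"signing": 2.0, "frowning": 1.5, "person": 1.0, "desk": 0.5}


def toy_discriminator(text: str) -> float:
    words = set(text.replace(",", " ").split())
    return sum(w for k, w in WEIGHTS.items() if k in words)


models = {"A": model_a, "B": model_b}
drafts = coarse_static_stage("clip_001", models)
refined = fine_dynamic_stage("clip_001", drafts, models)
best = select_best(drafts + refined, toy_discriminator)
print(best.lineage, "|", best.text)  # A->A | a person at a desk, signing a form
```

Scoring the stage‑1 drafts together with the refinements lets the selector fall back to a coarse caption when no refinement improves on it.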

Related Projects: Material quality inspection and video‑based claim processing.

Business Pain Point: Our business needs annotations at widely varying granularity, from whole scenes down to specific actions or facial expressions, and existing AI models are too coarse to deliver it. After we deployed a large multimodal model inspection system in July, its accuracy plateaued at 70‑80%, so we need methods that align the model's granularity with business requirements.

Insights from Sun Jie:

Combining multiple large‑model workflows with a discriminative model offers a low‑cost, automated fine‑annotation solution that can improve the model’s “business awareness”.

The system is engineering‑oriented, and its open‑source components, such as scene segmentation, can be applied directly to our projects.

Ensuring Open‑Source Large Models Output Correct JSON

Technical Pain Point: Structured JSON output is essential for integrating generative models with downstream systems. Unlike OpenAI’s API, most open‑source models still produce malformed JSON at a non‑trivial rate, especially after fine‑tuning or when using smaller models.
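Whatever the generation method, downstream systems still need a gate that rejects malformed output before it propagates. The sketch below, using only Python's standard library, pulls the first JSON object out of a chatty response and checks it against a minimal required‑keys schema. The names `REQUIRED`, `extract_json`, and `validate` are hypothetical; a production system would more likely validate against a full JSON Schema, for example with the `jsonschema` package.

```python
import json
from typing import Optional

# Hypothetical minimal "schema": required key -> expected Python type.
REQUIRED = {"label": str, "confidence": float}


def extract_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj  # decoding began at "{", so obj is a dict
        except json.JSONDecodeError:
            start = text.find("{", start + 1)  # try the next candidate brace
    return None


def validate(obj: Optional[dict], required: dict = REQUIRED) -> list:
    """List schema violations; an empty list means the object is usable."""
    if obj is None:
        return ["no parseable JSON object"]
    errors = []
    for key, typ in required.items():
        if key not in obj:
            errors.append(f"missing key: {key}")
        elif not isinstance(obj[key], typ):
            errors.append(f"{key} should be {typ.__name__}")
    return errors


reply = 'Sure! Here is the result: {"label": "claim", "confidence": 0.92} Hope this helps.'
print(validate(extract_json(reply)))                      # []
print(validate(extract_json('{"label": "claim", "conf')))  # ['no parseable JSON object']
```

A non‑empty error list can be turned into a retry prompt that asks the model to fix exactly those problems.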

How to “Reverse Engineer” OpenAI’s Structured Output: The technique behind OpenAI’s August 2024 structured‑output update is dynamic constrained decoding, which the open‑source SGLang library also implements. JSON Schema constraints are applied during generation, and at each step candidate tokens are filtered by regex checks so that only format‑valid tokens can be chosen. By combining prompt engineering, SGLang, and post‑processing, essentially any open‑source large model can be given a reliable structured‑output mode.
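To make constrained decoding concrete, the toy sketch below applies a token mask at every greedy decoding step. It is an illustration under simplifying assumptions, not SGLang's implementation: the schema, vocabulary, and `toy_scores` function are hypothetical, and instead of compiling the schema into a regex or finite‑state machine, as real constrained‑decoding libraries typically do, it enumerates the handful of strings a tiny enum‑only schema allows and checks prefixes against them.

```python
import json
from itertools import product

# Toy schema (hypothetical): "category" is an enum, "valid" is a boolean.
CATEGORIES = ["scene", "action", "expression"]
BOOLS = ["true", "false"]

# Toy vocabulary (hypothetical): real tokenizers split text far more finely.
VOCAB = ['{"category": "', "scene", "action", "expression", '", "valid": ',
         "true", "false", "}", "Sure", "! ", "Here", "<eos>"]


def allowed_outputs() -> set:
    """Every string the schema permits; enumerable only because the schema is tiny."""
    return {f'{{"category": "{c}", "valid": {b}}}' for c, b in product(CATEGORIES, BOOLS)}


def toy_scores(text: str) -> dict:
    """Stand-in for model logits: this 'model' prefers chatty, off-format tokens."""
    prefs = {"Sure": 5.0, "! ": 4.5, "Here": 4.0, "expression": 3.0, "false": 2.5}
    return {tok: prefs.get(tok, 1.0) for tok in VOCAB}


def constrained_greedy_decode(max_steps: int = 20) -> str:
    targets = allowed_outputs()
    text = ""
    for _ in range(max_steps):
        # Token mask: keep a token only if the extended text is still a prefix
        # of some permitted output; <eos> is legal only once the text is complete.
        legal = {
            tok: s for tok, s in toy_scores(text).items()
            if (tok == "<eos>" and text in targets)
            or (tok != "<eos>" and any(t.startswith(text + tok) for t in targets))
        }
        best = max(legal, key=legal.get)  # greedy choice among legal tokens only
        if best == "<eos>":
            return text
        text += best
    raise RuntimeError("no complete output within max_steps")


out = constrained_greedy_decode()
print(out)              # {"category": "expression", "valid": false}
print(json.loads(out))  # {'category': 'expression', 'valid': False}
```

The stand‑in model would rather open with "Sure", but the mask never allows it; inside the permitted structure, its own preferences still pick `expression` and `false`. The constraint guarantees the format, not the correctness of the content.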

Sun Jie adds that intervening mid‑generation with supplemental prompts not only enforces format but may also boost overall answer accuracy.
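One way to intervene mid‑generation is the interleaved‑template style that SGLang's frontend encourages: the program writes the JSON keys and punctuation itself, adds a short hint to the prompt before each field, and lets the model generate only the field values. The plain‑Python sketch below mimics that pattern without SGLang. `fill_template`, `stub_generate`, and the field hints are hypothetical, and `stub_generate` is a canned stand‑in for a real model call that accepts a stop string.

```python
import json
from typing import Callable

# Hypothetical model interface: (prompt, stop) -> generated continuation.
Generate = Callable[[str, str], str]


def fill_template(task: str, fields: list, generate: Generate) -> str:
    """Build a JSON object one field at a time.

    The program writes every key and quote; the model only writes values.
    Before each value, a supplemental hint is appended to the prompt
    (the mid-generation intervention); hints never enter the output.
    """
    prompt = task + "\n"
    values = {}
    for key, hint in fields:
        prompt += f'# {hint}\n"{key}": "'
        raw = generate(prompt, '"')
        # Enforce the stop string locally in case the backend ignored it.
        value = raw.split('"', 1)[0].strip()
        values[key] = value
        prompt += value + '"\n'
    # json.dumps handles escaping, so the final string is always valid JSON.
    return json.dumps(values)


def stub_generate(prompt: str, stop: str) -> str:
    """Canned stand-in for a model: answers based on the latest hint, ignores `stop`."""
    latest_hint = prompt.splitlines()[-2]
    if "one of: scene, action, expression" in latest_hint:
        return 'action" plus some trailing chatter'
    return "claimant signs the form on camera"


FIELDS = [
    ("category", "Answer with one of: scene, action, expression"),
    ("reason", "Give one short sentence explaining the category"),
]
out = fill_template("Describe claim video clip_001.", FIELDS, stub_generate)
print(out)  # {"category": "action", "reason": "claimant signs the form on camera"}
```

Because keys and punctuation never come from the model, the format cannot drift; whether the extra hints also improve answer accuracy is something to measure on our own data.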

Market and Voices

Cloud giants reported strong AI‑driven revenue growth in Q3 2024: Microsoft's Azure and Google Cloud each grew more than 30% year over year, with Microsoft attributing 12 percentage points of Azure's growth to AI demand. Google's CEO said that more than a quarter of all new code at Google is generated by AI, a claim that sparked debate.

Surveys show that 76% of programmers are using or plan to use AI‑assisted coding, and that 92% of U.S. developers already use AI coding tools.

In China, SenseTime is facing heavy losses and layoffs, with AI‑related spending cited as a major factor. An AI‑generated analysis suggests that the company focus on its core business, accelerate commercialization, optimize its cost structure, and explore compute rental as a revenue stream.

AI Tools Worth Trying This Week

Recraft.ai – New Benchmark for Text‑to‑Image

In head‑to‑head benchmark comparisons, Recraft.ai achieves a win rate above 70%. It offers a designer‑friendly canvas for generation, upscaling, and editing, with even limited 3D perception. Its main drawback is that it does not support Chinese text. The free tier provides 50 credits, enough for 25 generations.

Reference Links

[1] Detailed reading: https://developer.aliyun.com/article/1632397
[2] Official website: https://www.recraft.ai/
