Tagged articles

computer vision

667 articles · Page 2 of 7

May 26, 2025 · Artificial Intelligence

Solving Technical Challenges at JD Retail: Multi‑Reward Models, LLM‑Based Query Expansion, Model Pruning, and Reinforcement Learning

This article details how JD Retail's young algorithm engineers tackled a series of AI engineering problems—including advertising image quality assessment with multi‑reward models, large‑language‑model‑driven query expansion, FFT‑and‑RDP‑based model pruning, and agent‑centric reinforcement learning—while sharing practical growth insights and code snippets.

AI · Model Optimization · Query Expansion

0 likes · 15 min read

JD Tech

May 20, 2025 · Artificial Intelligence

How Re‑parameterization and Adaptive Learning Boost Visual Deep Learning Efficiency

The award‑winning project from Tsinghua University and JD Retail introduces re‑parameterization model design, cross‑scene adaptive learning, and platform‑aware compression to overcome accuracy‑efficiency trade‑offs in visual deep learning, achieving over 20% accuracy gains and more than 50% inference speedup in real‑world e‑commerce deployments.

AI research · adaptive models · computer vision

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

May 19, 2025 · Artificial Intelligence

How 3D Modeling Powers Digital Humans: From 3DMM to NeRF

The article explains what digital humans are, reviews the evolution of 3D modeling techniques—from early 2D hand‑drawn methods and 3DMM to deep‑learning‑based implicit models like NeRF—and discusses current challenges and future research directions.

3D modeling · 3DMM · NeRF

0 likes · 7 min read

AIWalker

May 18, 2025 · Artificial Intelligence

YOLOE: Open‑Source Real‑Time Anything Detector Beats YOLO‑World v2

YOLOE unifies object detection and segmentation in a single efficient model that supports text, visual, and prompt‑free inference, introduces RepRTA, SAVPE, and LRPC strategies, and achieves higher AP with up to three‑fold lower training cost and 1.4× faster inference on GPUs and mobile devices, as demonstrated by extensive LVIS and COCO experiments.

Prompt Engineering · Real-time · YOLOE

0 likes · 29 min read

DaTaobao Tech

May 16, 2025 · Artificial Intelligence

JianYi: AI‑Powered Image Segmentation and Matting System for Taobao Home‑Decoration

The article introduces JianYi, an in‑house image segmentation and matting system for Taobao's home‑decoration business that supports product, human, and panoramic segmentation with multi‑modal interaction, achieving high‑precision real‑time performance and powering AI tools such as "Jiazuo" and "Fang Wo Jia".

artificial-intelligence · computer vision · deep learning

0 likes · 11 min read

Bilibili Tech

May 16, 2025 · Artificial Intelligence

How FineVQ Sets New Standards for Fine‑Grained UGC Video Quality Assessment

The article introduces FineVD, the first large‑scale multi‑dimensional UGC video quality dataset, and presents FineVQ, a unified model that predicts quality scores, attributes, and distortion types across six dimensions, achieving state‑of‑the‑art performance on multiple benchmarks and cross‑dataset evaluations.

FineVQ · Multimodal · UGC

0 likes · 9 min read

AI Frontier Lectures

May 15, 2025 · Artificial Intelligence

OverLoCK: How a Bio‑Inspired Three‑Stage ConvNet Beats Transformers on Vision Tasks

OverLoCK introduces a bio‑inspired depth‑stage decomposition that splits a network into Base‑Net, Overview‑Net and Focus‑Net, and a novel Context‑Mix dynamic convolution, achieving state‑of‑the‑art accuracy on image classification, detection and segmentation while balancing speed and model size.

ConvNet · computer vision

0 likes · 11 min read

AI Frontier Lectures

May 15, 2025 · Artificial Intelligence

DefMamba: How Deformable Scanning Boosts Vision State‑Space Models

DefMamba introduces a deformable visual state‑space model that dynamically adjusts scanning paths and reference points, preserving spatial structure and improving feature capture, achieving state‑of‑the‑art results on ImageNet classification, COCO detection, and ADE20K segmentation while reducing computational cost.

DefMamba · Deformable Scanning · State Space Model

0 likes · 23 min read

AIWalker

May 14, 2025 · Artificial Intelligence

How HGO‑YOLO Achieves 87.4% Accuracy at 56 FPS with Only 4.6 MB Parameters

This paper presents HGO‑YOLO, a lightweight real‑time anomaly‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP with just 4.6 MB of parameters and 56 FPS on CPU, and validates its performance across multiple datasets and hardware platforms.

Anomaly Detection · Lightweight Models · YOLO

0 likes · 25 min read

AIWalker

May 13, 2025 · Artificial Intelligence

PixelHacker: Diffusion‑Based Image Inpainting with Latent Class Guidance Beats SOTA

PixelHacker introduces a latent class guidance (LCG) paradigm that injects foreground and background embeddings into a diffusion model, training on 14 million image‑mask pairs and achieving state‑of‑the‑art structural and semantic consistency across Places2, CelebA‑HQ and FFHQ benchmarks.

Diffusion Models · PixelHacker · SOTA

0 likes · 16 min read

AIWalker

May 12, 2025 · Artificial Intelligence

DefMamba: A Deformable Multi‑Scale Visual Foundation Model that Boosts Vision Tasks

DefMamba introduces a multi‑scale backbone, deformable Mamba modules, and a dynamic scanning strategy to preserve image spatial structure, achieving state‑of‑the‑art performance on image classification, object detection, and semantic segmentation benchmarks.

DefMamba · Semantic Segmentation · computer vision

0 likes · 23 min read

Meituan Technology Team

Apr 24, 2025 · Artificial Intelligence

Meituan AI Recruitment: Join Our Advanced Technology Teams

Meituan's AI recruitment page showcases diverse opportunities across AI infrastructure, intelligent interaction, visual intelligence, and intelligent products, featuring roles from algorithm engineers to product managers working on cutting-edge technologies including large models, intelligent agents, and multimodal systems.

AI recruitment · Intelligent agents · Multimodal AI

0 likes · 5 min read

php Courses

Apr 23, 2025 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article explains how to set up a PHP environment, control a camera, and use the OpenCV library to perform real‑time face detection and recognition with code examples, demonstrating a practical security solution for applications such as access control and surveillance.

PHP · Real-time · computer vision

0 likes · 6 min read

Liangxu Linux

Apr 22, 2025 · Artificial Intelligence

Top 10 Open-Source OCR Projects on GitHub Ranked by Stars

This article compiles a ranked list of ten popular open-source OCR projects on GitHub, summarizing each tool’s key capabilities—such as multimodal text extraction, PDF linearization, layout analysis, and multilingual support—along with star counts and direct repository links for developers seeking ready-to-use OCR solutions.

GitHub · Multimodal · OCR

0 likes · 9 min read

JD Cloud Developers

Apr 22, 2025 · Artificial Intelligence

How AI Turns 2D Videos into Immersive 3D Spatial Content at Scale

Leveraging 3D vision and AIGC, JD Retail’s R&D team converts abundant 2D video assets into high‑quality stereoscopic 3D space videos through a pipeline that includes monocular depth estimation, novel view synthesis, multi‑branch inpainting, and MV‑HEVC encoding, validated by ICME 2025 and a new StereoV1K dataset.

3D video · AIGC · Depth Estimation

0 likes · 26 min read

JD Tech Talk

Apr 22, 2025 · Artificial Intelligence

End-to-End 3D Spatial Video Generation via Monocular Depth Estimation, Novel View Synthesis, and MV-HEVC Encoding

Leveraging AI-driven monocular depth estimation, novel view synthesis, and MV‑HEVC encoding, the JD Retail Content R&D team presents an end‑to‑end pipeline that converts 2D video assets into high‑quality immersive 3D spatial videos, introduces the large‑scale StereoV1K dataset, and demonstrates superior performance over existing methods.

3D video generation · AIGC · MV-HEVC

0 likes · 22 min read

Amap Tech

Apr 21, 2025 · Artificial Intelligence

Lenna: Language‑Enhanced Reasoning Detection Assistant and a Chain‑of‑Thought Image Editing Framework Using Multimodal Large Language Models

At ICASSP 2025, Gaode’s two accepted papers present Lenna, a language‑enhanced reasoning detection assistant that adds a DET token to multimodal LLMs and achieves state‑of‑the‑art accuracy on RefCOCO benchmarks, and a chain‑of‑thought image‑editing framework that converts complex prompts into segmented masks and repair prompts for diffusion‑based inpainting, surpassing existing methods.

AI · Chain-of-Thought · ICASSP

0 likes · 10 min read

Python Programming Learning Circle

Apr 19, 2025 · Artificial Intelligence

Building an AI‑Powered Dou Dizhu Card‑Playing Assistant with YOLOv5 and DouZero

This tutorial explains how to create an AI‑driven Dou Dizhu (Chinese poker) assistant that captures game screenshots, uses YOLOv5 for card detection, leverages the DouZero model for optimal move prediction, and provides a PyQt5 UI for real‑time play assistance, including environment setup and code examples.

AI · DouZero · PyQt5

0 likes · 13 min read

Alibaba Cloud Developer

Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · computer vision · deep learning

0 likes · 5 min read

DataFunTalk

Apr 18, 2025 · Artificial Intelligence

Applying ByteDance’s Doubao‑1.5 Vision Model for Image Counting and Automated Annotation

The article demonstrates how ByteDance’s new Doubao‑1.5 multimodal model can be used to locate and count objects in images—such as sushi plates, street signs, and cartoon hats—by generating coordinates and overlaying visual annotations through a concise Python script.

AI · Doubao · Image Annotation

0 likes · 5 min read

AIWalker

Apr 16, 2025 · Artificial Intelligence

Plug‑and‑Play Multi‑Scale Attention: A Seamless Boost for Model Performance

This article reviews recent multi‑scale attention breakthroughs—including EMA, MSDA, VWA, and related modules—showing how they improve accuracy, cut FLOPs by up to 70%, and can be inserted into existing models with minimal effort, backed by code and paper links.

Model Efficiency · Plug-and-Play · computer vision

0 likes · 10 min read

JD Retail Technology

Apr 16, 2025 · Artificial Intelligence

AI‑Driven 3D Spatial Video Generation from Monocular 2D Content with MV‑HEVC Encoding

This work presents an end‑to‑end AI pipeline that transforms existing monocular 2D videos into immersive 3D spatial streams by combining DINO‑v2‑based depth estimation, multi‑branch view synthesis, and MV‑HEVC encoding, achieving up to 33 % BD‑Rate reduction, 31 % speed gains, state‑of‑the‑art visual quality, and real‑time production suitability, validated on the new StereoV1K benchmark and deployed in JD.Vision’s e‑commerce catalog.

3D video · AI generation · AIGC

0 likes · 21 min read

AI Frontier Lectures

Apr 13, 2025 · Artificial Intelligence

How HINT’s Hierarchical Multi‑Head Attention Boosts Image Restoration Quality

The paper introduces HINT, a Transformer‑based image restoration model that employs Hierarchical Multi‑Head Attention (HMHA) and a Query‑Key Cache Updating (QKCU) module to eliminate attention redundancy, achieving superior PSNR/SSIM scores across low‑light enhancement, dehazing, desnowing, denoising, and deraining tasks while maintaining low model complexity.

Hierarchical Attention · Transformer · computer vision

0 likes · 10 min read

AI Frontier Lectures

Apr 11, 2025 · Artificial Intelligence

How Q-Insight Uses Reinforcement Learning to Make AI Truly Understand Image Quality

Q-Insight, a multimodal large model introduced by Peking University and Volcano Engine, uses reinforcement learning with the Group Relative Policy Optimization (GRPO) algorithm to evaluate image quality, providing detailed reasoning, degradation detection, and zero‑shot comparison, and outperforms state‑of‑the‑art methods on multiple benchmarks.

AI · Video Cloud · computer vision

0 likes · 10 min read

AI Frontier Lectures

Apr 10, 2025 · Artificial Intelligence

How WonderTurbo Generates Interactive 3D Worlds in Just 0.72 Seconds

WonderTurbo introduces a real‑time 3D scene generation pipeline that accelerates both geometry and appearance modeling to under a second per view, using StepSplat, QuickDepth, and FastPaint modules, achieving up to 15× speedup while maintaining high visual quality.

3D generation · Depth Completion · Geometry Modeling

0 likes · 16 min read

AIWalker

Apr 7, 2025 · Artificial Intelligence

TurboFill: High‑Quality Image Inpainting in Just 4 Steps

TurboFill introduces a fast image‑inpainting model that trains an inpainting adapter on a few‑step text‑to‑image diffusion backbone, achieving state‑of‑the‑art results with only four diffusion steps while dramatically reducing computational cost.

Diffusion Models · TurboFill · computer vision

0 likes · 17 min read

AI Frontier Lectures

Apr 4, 2025 · Artificial Intelligence

How OverLoCK Redefines Vision Backbones with Dynamic Convolution

OverLoCK, a new vision backbone inspired by human top‑down attention, combines a three‑stage decomposition, dynamic ContMix convolutions and top‑down guidance to achieve state‑of‑the‑art performance on ImageNet classification, COCO detection and ADE20K segmentation while maintaining strong trade‑offs.

OverLoCK · Top-down Attention · Vision Backbone

0 likes · 10 min read

Python Programming Learning Circle

Mar 29, 2025 · Artificial Intelligence

Hand Gesture Detection Using OpenCV and Python: Skin Color and Contour Processing

This article presents a step‑by‑step tutorial for building a hand‑gesture detection system in Python using OpenCV, covering video capture, skin‑color detection via YCrCb conversion, contour extraction, and full source code for processing frames and visualizing results.

Hand Gesture · computer vision · opencv

0 likes · 6 min read
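
The skin‑color step named in the summary above can be sketched in plain Python, without OpenCV. This is not the tutorial's code: the conversion uses the standard BT.601 weights, while the Cr/Cb thresholds and the function names are assumptions chosen for illustration and would need tuning for real footage.

```python
# Illustrative sketch only; thresholds are a common heuristic, not values from the tutorial.

def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb) using BT.601 luma weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y, cr, cb

# Hypothetical skin range in the Cr/Cb plane (assumption).
CR_RANGE = (133, 173)
CB_RANGE = (77, 127)

def is_skin(r, g, b):
    """Return True when the pixel's chroma falls inside the assumed skin range."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return CR_RANGE[0] <= cr <= CR_RANGE[1] and CB_RANGE[0] <= cb <= CB_RANGE[1]

def skin_mask(image):
    """Map an image (rows of (r, g, b) tuples) to a binary mask (rows of 0/1)."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

Thresholding on Cr and Cb while ignoring Y is what makes this kind of mask relatively insensitive to brightness; contour extraction would then run on the resulting binary mask.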

AI Frontier Lectures

Mar 29, 2025 · Artificial Intelligence

How MMGDreamer Achieves Precise Geometry Control in 3D Indoor Scene Generation

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that combine text, image, and relational cues to generate highly realistic, geometrically controllable 3D indoor scenes, outperforming prior methods on multiple quantitative and qualitative benchmarks.

3D scene generation · AI research · computer vision

0 likes · 11 min read

AIWalker

Mar 27, 2025 · Artificial Intelligence

MagicColor: First Multi‑Instance AI Sketch‑Coloring System for Professional‑Grade Comics

MagicColor introduces a novel multi‑instance sketch‑coloring framework that uses a two‑stage self‑play training strategy, instance guidance, and edge‑aware pixel‑level color matching to automatically produce high‑quality, consistent colors for multiple line‑art instances, outperforming prior GAN and diffusion‑based methods.

AI · Diffusion Models · Multi-Instance

0 likes · 16 min read

AIWalker

Mar 25, 2025 · Artificial Intelligence

ContinuousSR: Reconstructing Continuous High-Resolution Signals from Discrete Low-Resolution Images

ContinuousSR introduces a Pixel-to-Gaussian paradigm that models images as continuous Gaussian fields, enabling arbitrary‑scale super‑resolution with 0.9 dB PSNR gains and up to 19.5× faster rendering compared to existing methods.

Arbitrary-Scale SR · ContinuousSR · Pixel-to-Gaussian

0 likes · 5 min read

AI Frontier Lectures

Mar 25, 2025 · Artificial Intelligence

Can Mixed‑Modality Graphs Unlock Precise 3D Indoor Scene Generation?

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that jointly enhance geometric control and realism in 3D indoor scene synthesis, outperforming state‑of‑the‑art methods across multiple quantitative and qualitative benchmarks.

3D scene generation · AI · computer vision

0 likes · 12 min read

AI Frontier Lectures

Mar 24, 2025 · Artificial Intelligence

How MambaIRv2 Boosts Image Restoration with Attentive State‑Space Design

MambaIRv2 is an image restoration backbone that replaces Mamba’s causal scanning with an attentive state‑space module, so a single scanning direction suffices. It reduces parameters and computation while delivering superior performance on lightweight and classic super‑resolution, JPEG artifact removal, and denoising tasks, according to results presented at CVPR 2025.

MambaIRv2 · attention · computer vision

0 likes · 8 min read

AntTech

Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a novel algorithm that enhances multimodal large language models' ability to perceive and reason about graphical user interfaces by integrating text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, achieving state‑of‑the‑art performance on multiple GUI benchmarks.

CVPR2025 · GUI Understanding · MLLM

0 likes · 5 min read

AIWalker

Mar 13, 2025 · Artificial Intelligence

YOLOE: Real‑Time Open‑World Object Detection and Segmentation Unveiled

The paper introduces YOLOE, a new YOLO‑based model that supports text, visual, and no‑prompt open‑world detection and segmentation, detailing its lightweight RepRTA, SAVPE, and LRPC modules and showing benchmark gains in speed and zero‑shot performance on LVIS and COCO.

YOLOE · benchmark · computer vision

0 likes · 9 min read

php Courses

Mar 13, 2025 · Artificial Intelligence

Real-Time Image Processing with PHP and OpenCV: A Step-by-Step Tutorial

This tutorial guides PHP developers through installing OpenCV and the php‑opencv extension, capturing live video, displaying frames in a browser, and performing real‑time face detection using Haar cascades, providing a practical introduction to computer‑vision tasks in PHP.

Image processing · PHP · Real-time

0 likes · 6 min read

AIWalker

Mar 8, 2025 · Artificial Intelligence

IMAGPose: A Unified Conditional Framework for Photo‑Realistic Pose‑Guided Person Generation (NeurIPS 2024)

IMAGPose introduces a unified conditional diffusion framework that combines feature‑level, image‑level, and cross‑view attention modules to generate high‑fidelity, photo‑realistic person images under diverse pose and multi‑view scenarios, outperforming prior SOTA methods on DeepFashion and Market‑1501.

AI · Diffusion Models · computer vision

0 likes · 22 min read

AIWalker

Mar 8, 2025 · Artificial Intelligence

Trainable HVI Color Space Turns Dark Photos into Cinematic Images – CVPR 2025

The paper introduces the first trainable HVI color space and a lightweight CIDNet network that jointly model intensity and chrominance, eliminating color bias and brightness artifacts in low‑light image enhancement and achieving state‑of‑the‑art results on ten benchmark datasets.

CIDNet · CVPR 2025 · HVI color space

0 likes · 12 min read

AIWalker

Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025 · GIFNet · Image Fusion

0 likes · 20 min read

AIWalker

Mar 6, 2025 · Artificial Intelligence

How SCMHSA Improves Transformer Next‑Frame Prediction by Reducing Semantic Dilution

The paper introduces a Semantic‑Concentrated Multi‑Head Self‑Attention (SCMHSA) module and a new embedding‑space loss to address semantic dilution and loss‑target mismatch in Transformer‑based video next‑frame prediction, demonstrating significant PSNR and MSE gains across four benchmark datasets.

Embedding Loss · SCMHSA · Semantic Dilution

0 likes · 23 min read

AIWalker

Mar 1, 2025 · Artificial Intelligence

UltraFusion HDR: AI-Generated HDR Algorithm Captures Detail and Balances Exposure

The UltraFusion HDR algorithm combines generative AI with traditional exposure fusion to recover details and produce natural‑looking high‑dynamic‑range images even when the exposure gap reaches up to 9 EV, turning over‑exposed or under‑exposed shots into high‑quality photos.

Generative AI · HDR · Image Fusion

0 likes · 6 min read

Xiaohongshu Tech REDtech

Feb 27, 2025 · Artificial Intelligence

SAFE: A Lightweight General AI Image Detection Method Achieving 96.7% Accuracy Across 33 Test Subsets

SAFE is a lightweight AI‑image detection framework using only 1.44 M parameters and 2.30 B FLOPs that preserves fine‑grained artifacts through crop‑based preprocessing, invariant augmentations, and high‑frequency wavelet features, achieving an average 96.7 % accuracy across 33 test subsets and strong generalization to unseen GAN and diffusion generators.

AI image detection · computer vision · deep learning

0 likes · 11 min read

AI Product Manager Community

Feb 26, 2025 · Artificial Intelligence

How Alibaba Cloud’s Open‑Source Wan 2.1 Sets New Benchmarks in Video Generation

Alibaba Cloud’s newly open‑sourced visual generation model Wan 2.1 achieves a VBench score of 86.22%, outperforms leading models, runs on consumer‑grade GPUs with only 8.2 GB VRAM, and supports multi‑task video creation, marking a significant step for open‑source video AI.

Alibaba Cloud · benchmark · computer vision

0 likes · 6 min read

Xiaohongshu Tech REDtech

Feb 24, 2025 · Artificial Intelligence

AIDE: Hybrid Feature Detector for AI‑Generated Image Detection and the Chameleon Benchmark

The paper introduces AIDE, a hybrid AI‑generated image detector that fuses low‑level pixel statistics with high‑level semantic embeddings, and the manually curated Chameleon benchmark of ~26 000 diverse, high‑realism images, showing AIDE surpasses nine state‑of‑the‑art methods by up to 4.6 % while highlighting remaining challenges on this tougher dataset.

AI-generated image detection · benchmark dataset · computer vision

0 likes · 14 min read

AIWalker

Feb 23, 2025 · Artificial Intelligence

D-FINE Redefines Bounding-Box Regression to Reach State-of-the-Art Real-Time Detection

D-FINE introduces Fine-grained Distribution Refinement and Global Optimal Localization Self-Distillation to overhaul DETR's bounding-box regression, achieving 54‑59% AP on COCO and Objects365 at 78‑124 FPS while surpassing YOLO and RT-DETR in both accuracy and speed.

DETR · Real-time · Self‑Distillation

0 likes · 25 min read

AIWalker

Feb 19, 2025 · Artificial Intelligence

YOLOv12 Unveiled: Boosted Performance and Speed for Real‑Time Detection

YOLOv12 introduces an attention‑centric architecture, a lightweight area attention module, and the R‑ELAN aggregation network, delivering consistent mAP gains and lower latency across N, S, M, L and X model scales while surpassing previous YOLO versions and other real‑time detectors.

Attention Mechanism · Real-time · YOLOv12

0 likes · 8 min read

DevOps

Feb 17, 2025 · Artificial Intelligence

Microsoft OmniParser V2.0: A Visual Agent Parsing Framework for Enhanced UI Understanding

Microsoft's OmniParser V2.0 transforms large language models such as DeepSeek‑R1, GPT‑4o, and Qwen‑2.5VL into visual AI agents by accurately detecting interactive UI elements, providing semantic descriptions, and generating structured representations that boost inference speed, reduce latency by 60%, and dramatically improve benchmark accuracy.

AI Agent · DeepSeek · GPT-4o

0 likes · 7 min read

Python Programming Learning Circle

Feb 11, 2025 · Artificial Intelligence

Face Swapping in Python Using dlib, OpenCV, and Procrustes Alignment

This article demonstrates how to create a Python script that automatically detects facial landmarks with dlib, aligns two images using Procrustes analysis, corrects color differences, and blends the faces together with OpenCV, providing complete code and step‑by‑step explanations.

Procrustes · computer vision · dlib

0 likes · 12 min read
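
The Procrustes alignment step mentioned in the summary above fits a scale, rotation, and translation that maps one set of landmarks onto another in the least‑squares sense. The sketch below is not the article's code: it uses a 2D closed form based on complex numbers instead of a matrix decomposition, and the function names are hypothetical.

```python
import cmath

def fit_similarity(src, dst):
    """Least-squares similarity transform (scale + rotation + translation).

    src and dst are equal-length lists of (x, y) points. Returns complex (a, t)
    such that dst is approximately a * z + t for each source point z = x + iy.
    """
    zs = [complex(x, y) for x, y in src]
    ws = [complex(x, y) for x, y in dst]
    z_mean = sum(zs) / len(zs)
    w_mean = sum(ws) / len(ws)
    # Closed-form solution on centred points: a = sum(conj(z) * w) / sum(|z|^2).
    num = sum((z - z_mean).conjugate() * (w - w_mean) for z, w in zip(zs, ws))
    den = sum(abs(z - z_mean) ** 2 for z in zs)
    a = num / den
    t = w_mean - a * z_mean
    return a, t

def apply_similarity(a, t, points):
    """Apply the transform to a list of (x, y) points."""
    mapped = [a * complex(x, y) + t for x, y in points]
    return [(w.real, w.imag) for w in mapped]

def scale_and_angle(a):
    """Split the complex coefficient into (scale, rotation angle in radians)."""
    return abs(a), cmath.phase(a)
```

In a face‑swap setting the two point lists would be the facial landmarks detected in each image, and the fitted transform would be used to warp one face onto the other before color correction and blending.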

php Courses

Feb 10, 2025 · Artificial Intelligence

Real-Time Face Recognition Using PHP and OpenCV

This article explains how to set up a PHP environment with OpenCV, control a camera to capture images, perform real-time face detection using Haar cascades, train and apply an LBPH face recognizer, and integrate the results into a security system.

PHP · Real-time · computer vision

0 likes · 5 min read
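
The LBPH recognizer mentioned in the summary above is built on local binary patterns. As a rough illustration in Python (not the article's PHP code), the sketch below computes the basic 8‑neighbour pattern and a single histogram over a grayscale image. The bit ordering and function names are assumptions, and a full LBPH recognizer would additionally split the face into a grid of cells and concatenate one histogram per cell.

```python
def lbp_code(patch):
    """8-bit local binary pattern of a 3x3 patch (3 rows of 3 intensities)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (assumed convention).
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a grayscale image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

Because each code depends only on whether neighbours are brighter or darker than the centre pixel, the histogram changes little under uniform lighting shifts, which is one reason the technique is popular for simple face recognition.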

AIWalker

Feb 9, 2025 · Artificial Intelligence

Douyin’s BDVQAGroup Secures Global Runner‑Up in DXOMARK Image Quality Challenge at CVPR 2024

At the NTIRE workshop of CVPR 2024, Douyin’s BDVQAGroup placed second worldwide in the DXOMARK portrait quality track with its SampleIQA model, which combines data re‑sampling, a Swin Transformer backbone, a twin‑network ranking loss, and content‑aware cropping to outperform existing state‑of‑the‑art IQA methods.

DXOMARK · NTIRE2024 · SampleIQA

0 likes · 10 min read

JD Tech

Feb 5, 2025 · Artificial Intelligence

Tech Insight: Highlights of Ten JD Retail Technology Papers Published in Top AI Conferences (2024)

Tech Insight presents concise overviews of ten JD retail technology papers accepted at top AI conferences in 2024, covering topics such as open‑vocabulary object detection, multi‑scenario ranking, diversity‑aware re‑ranking, a diversified product search dataset, semi‑supervised query classification, plug‑in CTR models, and methods to mitigate LLM hallucinations.

AI · Information Retrieval · Ranking

0 likes · 17 min read

DataFunSummit

Jan 28, 2025 · Artificial Intelligence

Few-Shot Learning for Multi-New-Class Scenarios: Challenges, Methodology, and Experimental Evaluation

This article introduces a novel few‑shot learning approach tailored for multi‑new‑class scenarios, discusses its background, problem definition, proposed parallel training framework, hierarchical fine‑tuning method, and presents extensive experiments demonstrating superior performance and computational efficiency.

computer vision · few-shot learning · hierarchical fine-tuning

0 likes · 10 min read

DataFunSummit

Jan 27, 2025 · Artificial Intelligence

Intelligent Plastic Bottle Sorting: Challenges, Multimodal AI Methods, High‑Speed Performance, and Commercialization Path

This article examines the state and challenges of plastic bottle recycling, presents multimodal AI‑driven sorting methods using RGB and NIR data, discusses high‑speed sorting performance, and outlines a commercial pathway that balances precision, speed, and cost for large‑scale deployment.

computer vision · high-speed sorting · industrial automation

0 likes · 12 min read

Huolala Tech

Jan 23, 2025 · Artificial Intelligence

How AI Transforms Freight Safety: Real-Time Risk Detection and Intervention

This article explains how AI technologies are applied to freight safety, detailing the challenges of traditional controls, the architecture of a real‑time AI safety system, data processing, risk detection, tiered interventions, and the resulting improvements in accuracy and operational efficiency.

AI · Risk Detection · computer vision

0 likes · 7 min read

AIWalker

Jan 21, 2025 · Artificial Intelligence

UltraFusion HDR: How AIGC Enhances Dynamic Imaging to Capture Detail and Balance Exposure

The UltraFusion HDR algorithm, developed by Shanghai AI Lab with CUHK and Zhejiang University, combines generative AI with exposure fusion to recover detail and balance lighting even when exposure differences reach up to 9 EV, enabling high‑quality images from ordinary cameras without hardware upgrades.

Dynamic Range · Generative AI · HDR

0 likes · 6 min read

JD Retail Technology

Jan 21, 2025 · Artificial Intelligence

Tech Insight: Selected JD Retail Technology Papers in Artificial Intelligence (2024)

Tech Insight highlights ten 2024 JD Retail Technology AI papers presented at top conferences—including CVPR, SIGIR, WWW, AAAI and IJCAI—that advance open‑vocabulary object detection, unified search‑recommendation, pre‑ranking consistency, diversity‑aware re‑ranking, a diversified product‑search dataset, graph‑based query classification, plug‑in CTR models, parallel ad‑ranking, trajectory‑based CTR stability, and task‑aware decoding for large language models.

CTR Prediction · E‑commerce · Information Retrieval

0 likes · 20 min read

Python Programming Learning Circle

Jan 14, 2025 · Artificial Intelligence

Age Prediction Using OpenCV and Deep Learning with Python

This tutorial explains how to use OpenCV, pre‑trained deep‑learning models, and Python to automatically detect faces and predict a person's age from static images or real‑time video, covering model selection, project structure, script usage, result analysis, and ways to improve accuracy.

Age Estimation · Caffe · computer vision

0 likes · 18 min read

AIWalker

Jan 13, 2025 · Artificial Intelligence

Multi-View Transformer (MVFormer) Sets New Top‑1 Accuracy Records in Classification, Detection, and Segmentation

The paper proposes MVFormer, a Vision Transformer that combines a Multi‑View Normalization (MVN) module and a Multi‑View Token Mixer (MVTM) to diversify feature learning, achieving state‑of‑the‑art Top‑1 accuracy of 83.4%‑84.6% on ImageNet‑1K and superior performance on COCO detection and ADE20K segmentation while using comparable or fewer parameters and MACs.

Multi-View Normalization · Token Mixer · Vision Transformer

0 likes · 25 min read

AIWalker

Jan 12, 2025 · Artificial Intelligence

CubeFormer: A Simple Yet Effective Lightweight Image Super‑Resolution Baseline

CubeFormer introduces a novel cube attention mechanism and dual transformer blocks that dramatically improve feature diversity, enabling a lightweight image super‑resolution model to achieve state‑of‑the‑art PSNR and visual detail across multiple benchmarks while keeping parameters low.

computer vision · cube attention · deep learning

0 likes · 21 min read

Java Tech Enthusiast

Jan 12, 2025 · Artificial Intelligence

AgiBot World: Large-Scale Multi‑Robot Embodied AI Dataset Release

AgiBot World, billed as the world's first large‑scale robot dataset captured entirely in realistic environments, provides trajectories ten times longer and scene coverage a hundred times greater than prior collections. It features over 80 daily‑life skills recorded by a 32‑DOF robot with advanced sensing and applies rigorous multi‑stage quality control; future releases are slated to reach a million runs and millions of simulated trajectories.

Embodied AI · computer vision · large dataset

0 likes · 9 min read

AIWalker

Jan 11, 2025 · Artificial Intelligence

CAS-ViT: The Fastest, Strongest Vision Transformer for Mobile Image Classification & Detection

CAS‑ViT introduces a convolutional additive self‑attention mechanism that dramatically reduces the computational cost of Vision Transformers, achieving state‑of‑the‑art accuracy on image classification, object detection, and segmentation while being deployable on mobile devices.

Efficient Models · Self-Attention · Vision Transformer

0 likes · 19 min read

AIWalker

Jan 11, 2025 · Artificial Intelligence

Arc2Face: Identity‑Conditioned Face Generation Model Delivering High‑Consistency, High‑Quality AI Portraits

Arc2Face is an identity‑conditioned face synthesis foundation model that projects ArcFace embeddings into the CLIP space of a fine‑tuned Stable Diffusion, using up‑sampled WebFace42M and high‑quality FFHQ/CelebA‑HQ data to achieve far‑superior facial similarity and consistency compared with existing methods such as FaceSwap and InstantID, as demonstrated by extensive quantitative and visual experiments.

Arc2Face · Face Generation · Identity Conditioning

0 likes · 7 min read

Python Programming Learning Circle

Dec 18, 2024 · Artificial Intelligence

Object Detection in Python Using Template Matching

This article demonstrates how to perform object detection in Python without machine‑learning frameworks by using OpenCV’s template‑matching functions, covering single‑object detection, multi‑object detection with thresholding, and providing complete code examples for loading images, matching, locating matches, drawing bounding boxes, and visualizing results.

Template Matching · computer vision · opencv

0 likes · 6 min read

php Courses

Dec 18, 2024 · Artificial Intelligence

Using PHP to Access the Camera and Perform Face Detection with OpenCV

This article explains how to install OpenCV and php-facedetect libraries, write PHP code to capture images from a webcam, perform face detection using the pico library, and display the results, providing a step‑by‑step guide for object detection with PHP.

Camera · PHP · computer vision

0 likes · 5 min read

Test Development Learning Exchange

Dec 6, 2024 · Artificial Intelligence

Using pytesseract and Pillow for OCR: Installation, Configuration, and Accuracy Improvement Techniques

This guide explains how to install Tesseract OCR and the Python libraries pytesseract and Pillow, configure the engine path, perform image-to-text extraction with example code, and apply various preprocessing, detection, and post‑processing methods to significantly improve OCR accuracy.

OCR · Python · computer vision

0 likes · 8 min read

Architecture Digest

Dec 5, 2024 · Artificial Intelligence

NeurIPS 2024 Best Paper Introduces Visual Autoregressive Modeling (VAR) for Image Generation

A recent NeurIPS 2024 best‑paper award highlights a novel Visual Autoregressive Modeling (VAR) approach that uses multi‑scale token prediction to improve image generation, while the surrounding article also mentions a free book giveaway and a legal dispute involving the paper's author.

NeurIPS · Visual Autoregressive Modeling · artificial-intelligence

0 likes · 5 min read

php Courses

Dec 5, 2024 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article explains how to set up a PHP environment, control a camera, and use the OpenCV library to perform real-time face detection and recognition with code examples, enabling security applications such as access control and monitoring systems.

PHP · Real-time · computer vision

0 likes · 6 min read

Test Development Learning Exchange

Nov 30, 2024 · Artificial Intelligence

Basic Image Processing with OpenCV: Reading, Displaying, and Manipulating Images in Python

This tutorial introduces basic image processing techniques using OpenCV in Python, covering image reading, displaying, grayscale conversion, cropping, resizing, rotation, flipping, and saving, with step‑by‑step code examples and explanations to help beginners apply these operations in real projects.

Python · computer vision · opencv

0 likes · 8 min read

Test Development Learning Exchange

Nov 30, 2024 · Artificial Intelligence

Popular Python Libraries for Image Processing with Installation Commands and Code Samples

This article introduces ten widely used Python image‑processing libraries—including Pillow, OpenCV, scikit‑image, imageio, mahotas, SimpleITK, imgaug, face_recognition, Pyradiomics, and tqdm—provides brief descriptions, pip installation commands, and runnable code examples to help developers choose the right tool for their computer‑vision tasks.

Python · computer vision · machine learning

0 likes · 10 min read

DaTaobao Tech

Nov 27, 2024 · Artificial Intelligence

FuseAnyPart: Diffusion‑Driven Facial Parts Swapping via Multiple Reference Images

FuseAnyPart is a diffusion‑model‑based facial part swapping technique that fuses features from multiple reference images via mask‑based fusion and additive injection modules, delivering high‑fidelity, consistent face edits with lower computational cost, outperforming prior methods on CelebA‑HQ and FaceForensics++ and already boosting commercial AIGC applications.

computer vision · diffusion model · facial part swapping

0 likes · 9 min read

Python Programming Learning Circle

Nov 27, 2024 · Artificial Intelligence

Open‑Source Bird Species Detection with TensorFlow, MobileNet V2 and OpenCV

A hobbyist builds a Python‑based bird‑recognition system using TensorFlow's SSD OpenImages model, a MobileNet V2 classifier from TensorFlow Hub, and OpenCV, shares the open‑source code on GitHub, discusses early results, challenges like accuracy and non‑maximum suppression, and outlines future improvements.

Bird Detection · TensorFlow · computer vision

0 likes · 8 min read

DaTaobao Tech

Nov 25, 2024 · Artificial Intelligence

Open‑Set Object Detection and Visual Grounding: Analysis of YOLO‑World, Grounding DINO, and YOLO11

The article surveys state‑of‑the‑art open‑set object detection and visual‑grounding models (Grounding DINO, YOLO‑World, and the latest YOLO11), detailing their architectures, training strategies, and experimental results on home‑decoration datasets. The results show that open‑set detectors recognize unseen objects while YOLO11 excels on known categories, and that combining the two approaches yields the best performance, broadening what detectors can do in real‑world applications.

Grounding DINO · Visual Grounding · YOLO-World

0 likes · 15 min read

Baidu Geek Talk

Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91% recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Model Deployment · PP-ShiTuV2 · PaddleX

0 likes · 8 min read

JD Tech Talk

Nov 14, 2024 · Artificial Intelligence

Can Human Feedback Make Advertising Image Generation Reliable? Introducing RFNet

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to automatically evaluate and fine‑tune diffusion models, dramatically increasing the usable rate of e‑commerce advertising images while preserving visual quality.

Diffusion Models · Human Feedback · RFNet

0 likes · 8 min read

Bilibili Tech

Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AI · Game Recognition · Multimodal

0 likes · 14 min read

Test Development Learning Exchange

Nov 4, 2024 · Artificial Intelligence

Image Processing with Python: Pillow and OpenCV Guide

This guide demonstrates how to perform common image processing tasks in Python using the Pillow and OpenCV libraries, covering reading, displaying, saving, resizing, cropping, rotating, converting to grayscale, adding text, compositing, blurring, sharpening, enhancing, and extracting image metadata.

Python · computer vision · opencv

0 likes · 5 min read

Tencent Cloud Developer

Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · CLIP · Diffusion Models

0 likes · 19 min read

php Courses

Oct 11, 2024 · Artificial Intelligence

Using PHP to Access a Webcam and Perform Object (Face) Detection with OpenCV

This tutorial explains how to install OpenCV and php-facedetect, write PHP code to capture images from a webcam, perform face detection, and display the results, providing step‑by‑step commands and a complete example script.

PHP · computer vision · face detection

0 likes · 6 min read

php Courses

Sep 25, 2024 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article demonstrates how to set up a PHP environment, control a camera, and integrate OpenCV for real-time face detection and recognition, providing code examples and a complete workflow to enhance security applications.

PHP · computer vision · face recognition

0 likes · 5 min read

Open Source Tech Hub

Sep 12, 2024 · Artificial Intelligence

Master Double-Digit OCR with ddddocr: Deep Learning Library for PHP & Python

This article introduces ddddocr, an open‑source deep‑learning OCR library aimed at CAPTCHA recognition, explains its background, key features, and installation steps, and provides detailed PHP examples for basic OCR, target detection, and slider detection.

OCR · PHP · Python

0 likes · 9 min read

Sohu Tech Products

Sep 11, 2024 · Artificial Intelligence

Low‑Cost 3D Reconstruction Using 3D Gaussian Splatting

This article explains how to create high‑quality 3D scenes from ordinary video footage by slicing frames with ffmpeg, extracting camera poses with COLMAP, and applying 3D Gaussian Splatting to replace traditional mesh‑texture pipelines, dramatically lowering equipment costs and data size.

3D reconstruction · COLMAP · FFmpeg

0 likes · 6 min read

Volcano Engine Developer Services

Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.

PixelLM · StoryDiffusion · computer vision

0 likes · 22 min read

Python Programming Learning Circle

Sep 5, 2024 · Artificial Intelligence

Face Detection with Haar Cascade and Face Recognition Using LBPH in OpenCV

This article explains the fundamentals of face detection using the Haar‑cascade algorithm, how to train and apply detectors with OpenCV, and introduces the Local Binary Patterns Histograms (LBPH) method for face recognition, covering data preparation, parameter selection, and matching techniques.

Haar cascade · LBPH · Python

0 likes · 13 min read

AntTech

Sep 3, 2024 · Artificial Intelligence

2024 Inclusion Bund Conference AI Innovation Competition and Deepfake Challenge Results

The 2024 Inclusion Bund Conference in Shanghai announced the winners of its newly added AI Innovation Competition, including the AFAC Financial Intelligence Contest and the Global Deepfake Attack‑Defense Challenge, highlighting participation from over 7,000 teams across more than 20 countries and showcasing cutting‑edge deepfake detection achievements.

AI · FinTech · Innovation Competition

0 likes · 7 min read

JD Cloud Developers

Aug 29, 2024 · Artificial Intelligence

How AI Powers E‑Commerce Content Compliance and Price Governance

This article explains how e‑commerce platforms use AI‑driven content compliance to detect malicious products, price manipulation, and counterfeit goods, outlining the technical challenges, core business metrics, model‑based solutions for detecting overpricing, and personal growth advice for compliance engineers.

AI · NLP · computer vision

0 likes · 9 min read

Architecture Development Notes

Aug 28, 2024 · Artificial Intelligence

Master Drawing with OpenCV in Rust: Lines, Shapes, and Text Made Easy

This tutorial walks you through using OpenCV's Rust bindings to draw lines, geometric shapes, and custom text on images, detailing each function's parameters and providing complete, ready‑to‑run code examples for practical image processing tasks.

Image processing · computer vision · drawing

0 likes · 8 min read

Bilibili Tech

Aug 27, 2024 · Artificial Intelligence

Multimodal Video Scene Classification for Adaptive Video Processing

The paper presents a multimodal video scene classification system that leverages CLIP‑generated pseudo‑labels and a fine‑tuned image encoder to automatically identify nature, animation/game, and document scenes, enabling more effective adaptive transcoding, intelligent restoration, and quality assessment for user‑generated content on platforms such as Bilibili.

Bilibili multimedia · CLIP · Multimodal Learning

0 likes · 17 min read

Rare Earth Juejin Tech Community

Aug 22, 2024 · Artificial Intelligence

Understanding Faster R-CNN: Architecture, Training, and Experimental Results

This article provides an in‑depth overview of the Faster R‑CNN object detection framework, covering its background, key innovations such as the Region Proposal Network, detailed algorithmic principles, training procedures, experimental results on PASCAL VOC and MS COCO, and a reproducible PyTorch implementation.

Faster R-CNN · PyTorch · RPN

0 likes · 14 min read

php Courses

Jul 26, 2024 · Artificial Intelligence

Real-Time Image Processing with PHP and OpenCV

This tutorial explains how PHP developers can install OpenCV and the php-opencv extension, write code to capture webcam video, display live frames in a browser, and perform real-time face detection using computer‑vision techniques.

PHP · Real-time · computer vision

0 likes · 6 min read

Baidu Geek Talk

Jul 24, 2024 · Artificial Intelligence

AI-Driven Fusion of Peking Opera Characters with Ink-Wash Painting Style Using PaddleGAN

Li Yilin’s AI project blends Peking Opera characters with traditional ink‑wash painting by using PaddleHub for style transfer and PaddleGAN’s First‑Order Motion model for facial motion, then adds music and Wav2Lip lip‑sync, producing videos that modernize Chinese heritage and gauge public cultural awareness.

AI · PaddleGAN · Peking Opera

0 likes · 9 min read

Full-Stack Cultivation Path

Jul 17, 2024 · Artificial Intelligence

Open-Source PDF Toolkit Delivers High-Accuracy Layout and Formula Detection

PDF‑Extract‑Kit is an open‑source toolkit that combines high‑accuracy layout detection, formula detection, formula recognition, and OCR for PDFs, and the article details its model comparisons, evaluation on academic and textbook datasets, and step‑by‑step instructions for running it on Windows or macOS, including Apple Silicon.

OCR · PDF-Extract-Kit · computer vision

0 likes · 6 min read

Kuaishou Tech

Jul 16, 2024 · Artificial Intelligence

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

LivePortrait is an open‑source, controllable portrait video generation framework that transfers facial expressions and poses from a driving video to static or dynamic portraits in real time, leveraging a 69M‑frame mixed video‑image training set, stitching and retargeting modules, and achieving high quality with low latency.

AI · Real-time · Video Animation

0 likes · 14 min read

Alibaba Cloud Big Data AI Platform

Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open‑source video generation system from Alibaba Cloud AI Platform, introduces a Diffusion Transformer‑based architecture, a Hybrid Motion Module, and Slice VAE to enable image‑to‑video, text‑to‑video, and unlimited‑length video creation, generating clips of up to 144 frames at 720p on modest GPU memory.

AI · EasyAnimate · Generative AI

0 likes · 5 min read

Architecture and Beyond

Jul 7, 2024 · Artificial Intelligence

How Does ControlNet Extend Stable Diffusion for Precise Image Generation?

This article explains the core principles of Stable Diffusion, its training pipeline and limitations, then details how ControlNet adds controllable signals to diffusion models, outlines its architecture, ecosystem of model variants, and showcases diverse real‑world applications.

AI · ControlNet · Diffusion Models

0 likes · 16 min read

Rare Earth Juejin Tech Community

Jul 4, 2024 · Artificial Intelligence

Parsing and Visualizing COCO Keypoint Detection Annotations with Python

This tutorial explains how to explore the COCO keypoint detection annotation files, describes their JSON structure and fields, and provides step‑by‑step Python code using json, Pillow, and matplotlib to load images, extract keypoints, and draw both points and skeletal connections for visual analysis.

COCO · Python · annotation parsing

0 likes · 12 min read

Selected Java Interview Questions

Jul 3, 2024 · Artificial Intelligence

Integrating OpenCV with Java and Spring Boot for Face Detection and Recognition

This guide provides a comprehensive walkthrough of installing OpenCV, using its Java API for image and video face detection, implementing face comparison, creating custom GUI windows, and integrating the library into a Spring Boot application with detailed code examples and common troubleshooting tips.

Custom GUI · Java · Spring Boot

0 likes · 25 min read

Kuaishou Tech

Jul 1, 2024 · Artificial Intelligence

Short-Form Video Quality Assessment Competition at CVPR NTIRE 2024: Dataset, Challenge Overview, and Top Winning Solutions

The CVPR NTIRE 2024 short-form video quality assessment competition introduced the KVQ dataset, attracted over 200 teams, evaluated submissions using SROCC and PLCC metrics, and highlighted the winning approaches of SJTU MMLab, IH‑VQA, and TVQE, showcasing advances in AI‑driven video quality evaluation.

AI competition · NTIRE 2024 · computer vision

0 likes · 9 min read

DaTaobao Tech

Jul 1, 2024 · Artificial Intelligence

Recent Progress in Vision-Language Models (VLMs)

Over the past year, Vision‑Language Models have surged from early multimodal experiments to competitive open‑source systems rivaling GPT‑4, driven by higher‑resolution processing, richer vision encoders, better projection layers, and larger curated datasets, yet they still face evaluation difficulties, hallucinations, speed limits, and limited multimodal output.

computer vision · deep learning · large language models

0 likes · 24 min read

Kuaishou Large Model

Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

The article introduces I2V‑Adapter, a lightweight plug‑in for Stable Diffusion‑based video diffusion models that converts a single static image into a coherent video without altering the original text‑to‑video (T2V) architecture, and details its design, frame‑similarity prior, experimental results, and real‑world applications.

AI · Diffusion Models · I2V-Adapter

0 likes · 9 min read

Kuaishou Tech

Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Diffusion Models · I2V-Adapter

0 likes · 8 min read
