Tagged articles
650 articles
Python Programming Learning Circle
Apr 19, 2025 · Artificial Intelligence

Building an AI‑Powered Dou Dizhu Card‑Playing Assistant with YOLOv5 and DouZero

This tutorial explains how to create an AI‑driven Dou Dizhu (Chinese poker) assistant that captures game screenshots, uses YOLOv5 for card detection, leverages the DouZero model for optimal move prediction, and provides a PyQt5 UI for real‑time play assistance, including environment setup and code examples.

AI · Computer Vision · DouZero
13 min read

Alibaba Cloud Developer
Apr 18, 2025 · Artificial Intelligence

How the New 14B End‑to‑End Video Model Generates Custom 720p Clips from Two Images

The open‑sourced 14‑billion‑parameter Tongyi Wanxiang video model can create high‑quality 720p videos that seamlessly connect user‑provided start and end images, offering controllable, personalized video generation with prompt‑driven camera motions and easy access via its website, GitHub, Hugging Face, and ModelScope.

AI model · Computer Vision · Deep Learning
5 min read

AIWalker
Apr 16, 2025 · Artificial Intelligence

Plug‑and‑Play Multi‑Scale Attention: A Seamless Boost for Model Performance

This article reviews recent multi‑scale attention breakthroughs—including EMA, MSDA, VWA, and related modules—showing how they improve accuracy, cut FLOPs by up to 70%, and can be inserted into existing models with minimal effort, backed by code and paper links.

Computer Vision · Deep Learning · Plug-and-Play
10 min read

JD Retail Technology
Apr 16, 2025 · Artificial Intelligence

AI‑Driven 3D Spatial Video Generation from Monocular 2D Content with MV‑HEVC Encoding

This work presents an end‑to‑end AI pipeline that transforms existing monocular 2D videos into immersive 3D spatial streams by combining DINO‑v2‑based depth estimation, multi‑branch view synthesis, and MV‑HEVC encoding. It reports up to 33% BD‑Rate reduction and 31% speed gains, state‑of‑the‑art visual quality, and real‑time production suitability, is validated on the new StereoV1K benchmark, and is deployed in JD.Vision’s e‑commerce catalog.

3D video · AI Generation · AIGC
21 min read

AI Frontier Lectures
Apr 13, 2025 · Artificial Intelligence

How HINT’s Hierarchical Multi‑Head Attention Boosts Image Restoration Quality

The paper introduces HINT, a Transformer‑based image restoration model that employs Hierarchical Multi‑Head Attention (HMHA) and a Query‑Key Cache Updating (QKCU) module to eliminate attention redundancy, achieving superior PSNR/SSIM scores across low‑light enhancement, dehazing, desnowing, denoising, and deraining tasks while maintaining low model complexity.

Computer Vision · Hierarchical Attention · Image Restoration
10 min read

AI Frontier Lectures
Apr 11, 2025 · Artificial Intelligence

How Q-Insight Uses Reinforcement Learning to Make AI Truly Understand Image Quality

Q-Insight, a multimodal large model introduced by Peking University and Volcano Engine, applies reinforcement learning with the Group Relative Policy Optimization (GRPO) algorithm to image quality assessment, providing detailed reasoning, degradation detection, and zero‑shot comparison, and outperforming state‑of‑the‑art methods on multiple benchmarks.

AI · Computer Vision · Video Cloud
10 min read

AI Frontier Lectures
Apr 10, 2025 · Artificial Intelligence

How WonderTurbo Generates Interactive 3D Worlds in Just 0.72 Seconds

WonderTurbo introduces a real‑time 3D scene generation pipeline that accelerates both geometry and appearance modeling to under a second per view, using StepSplat, QuickDepth, and FastPaint modules, achieving up to 15× speedup while maintaining high visual quality.

3D generation · Computer Vision · Depth Completion
16 min read

AIWalker
Apr 7, 2025 · Artificial Intelligence

TurboFill: High‑Quality Image Inpainting in Just 4 Steps

TurboFill introduces a fast image‑inpainting model that trains a repair adapter on a few‑step text‑to‑image diffusion backbone, achieving state‑of‑the‑art results with only four diffusion steps while dramatically reducing computational cost.

Computer Vision · TurboFill · diffusion models
17 min read

AI Frontier Lectures
Apr 4, 2025 · Artificial Intelligence

How OverLoCK Redefines Vision Backbones with Dynamic Convolution

OverLoCK, a new vision backbone inspired by human top‑down attention, combines a three‑stage decomposition, dynamic ContMix convolutions and top‑down guidance to achieve state‑of‑the‑art performance on ImageNet classification, COCO detection and ADE20K segmentation while maintaining strong trade‑offs.

Computer Vision · OverLoCK · Top-down Attention
10 min read

AIWalker
Mar 27, 2025 · Artificial Intelligence

MagicColor: First Multi‑Instance AI Sketch‑Coloring System for Professional‑Grade Comics

MagicColor introduces a novel multi‑instance sketch‑coloring framework that uses a two‑stage self‑play training strategy, instance guidance, and edge‑aware pixel‑level color matching to automatically produce high‑quality, consistent colors for multiple line‑art instances, outperforming prior GAN and diffusion‑based methods.

AI · Computer Vision · Multi-Instance
16 min read

AI Frontier Lectures
Mar 25, 2025 · Artificial Intelligence

Can Mixed‑Modality Graphs Unlock Precise 3D Indoor Scene Generation?

MMGDreamer introduces a mixed‑modality graph and a dual‑branch diffusion model that jointly enhance geometric control and realism in 3D indoor scene synthesis, outperforming state‑of‑the‑art methods across multiple quantitative and qualitative benchmarks.

3D scene generation · AI · Computer Vision
12 min read

AI Frontier Lectures
Mar 24, 2025 · Artificial Intelligence

How MambaIRv2 Boosts Image Restoration with Attentive State‑Space Design

MambaIRv2 is an image restoration backbone that replaces Mamba’s causal scanning with an attentive state‑space module, enabling single‑direction scanning with fewer parameters and less computation, and delivering superior performance on lightweight and classic super‑resolution, JPEG artifact removal, and denoising tasks, as reported in its CVPR 2025 paper.

Computer Vision · Image Restoration · MambaIRv2
8 min read

AntTech
Mar 14, 2025 · Artificial Intelligence

MP-GUI: Modality Perception with Multimodal Large Language Models for GUI Understanding

The CVPR 2025 paper "MP-GUI: Modality Perception with MLLMs for GUI Understanding" presents a novel algorithm that enhances multimodal large language models' ability to perceive and reason about graphical user interfaces by integrating text, visual, and spatial signals through specialized perception modules and a dynamic fusion gate, achieving state‑of‑the‑art performance on multiple GUI benchmarks.

CVPR2025 · Computer Vision · GUI Understanding
5 min read

AIWalker
Mar 13, 2025 · Artificial Intelligence

YOLOE: Real‑Time Open‑World Object Detection and Segmentation Unveiled

The paper introduces YOLOE, a new YOLO‑based model that supports text, visual, and no‑prompt open‑world detection and segmentation, detailing its lightweight RepRTA, SAVPE, and LRPC modules and showing benchmark gains in speed and zero‑shot performance on LVIS and COCO.

Benchmark · Computer Vision · YOLOE
9 min read

php Courses
Mar 13, 2025 · Artificial Intelligence

Real-Time Image Processing with PHP and OpenCV: A Step-by-Step Tutorial

This tutorial guides PHP developers through installing OpenCV and the php‑opencv extension, capturing live video, displaying frames in a browser, and performing real‑time face detection using Haar cascades, providing a practical introduction to computer‑vision tasks in PHP.

Computer Vision · Face Detection · Image Processing
6 min read

AIWalker
Mar 8, 2025 · Artificial Intelligence

IMAGPose: A Unified Conditional Framework for Photo‑Realistic Pose‑Guided Person Generation (NeurIPS 2024)

IMAGPose introduces a unified conditional diffusion framework that combines feature‑level, image‑level, and cross‑view attention modules to generate high‑fidelity, photo‑realistic person images under diverse pose and multi‑view scenarios, outperforming prior SOTA methods on DeepFashion and Market‑1501.

AI · Computer Vision · diffusion models
22 min read

AIWalker
Mar 8, 2025 · Artificial Intelligence

Trainable HVI Color Space Turns Dark Photos into Cinematic Images – CVPR 2025

The paper introduces the first trainable HVI color space and a lightweight CIDNet network that jointly model intensity and chrominance, eliminating color bias and brightness artifacts in low‑light image enhancement and achieving state‑of‑the‑art results on ten benchmark datasets.

CIDNet · CVPR 2025 · Computer Vision
12 min read

AIWalker
Mar 7, 2025 · Artificial Intelligence

How GIFNet’s Low‑Level Interaction Breakthrough Enables Universal Multimodal Fusion Across Tasks

The paper introduces GIFNet, a three‑branch network that leverages low‑level visual tasks and a cross‑fusion gating mechanism to achieve a single, task‑agnostic image‑fusion model with dramatically reduced computation, strong generalization to unseen modalities, and even single‑modal enhancement capabilities.

CVPR2025 · Computer Vision · GIFNet
20 min read

AIWalker
Mar 6, 2025 · Artificial Intelligence

How SCMHSA Improves Transformer Next‑Frame Prediction by Reducing Semantic Dilution

The paper introduces a Semantic‑Concentrated Multi‑Head Self‑Attention (SCMHSA) module and a new embedding‑space loss to address semantic dilution and loss‑target mismatch in Transformer‑based video next‑frame prediction, demonstrating significant PSNR and MSE gains across four benchmark datasets.

Computer Vision · Embedding Loss · SCMHSA
23 min read

AIWalker
Mar 1, 2025 · Artificial Intelligence

UltraFusion HDR: AI-Generated HDR Algorithm Captures Detail and Balances Exposure

The UltraFusion HDR algorithm combines generative AI with traditional exposure fusion to recover details and produce natural‑looking high‑dynamic‑range images even when the exposure gap reaches up to 9 EV, turning over‑exposed or under‑exposed shots into high‑quality photos.

Computer Vision · HDR · Image Fusion
6 min read

Xiaohongshu Tech REDtech
Feb 27, 2025 · Artificial Intelligence

SAFE: A Lightweight General AI Image Detection Method Achieving 96.7% Accuracy Across 33 Test Subsets

SAFE is a lightweight AI‑image detection framework using only 1.44M parameters and 2.30B FLOPs that preserves fine‑grained artifacts through crop‑based preprocessing, invariant augmentations, and high‑frequency wavelet features, achieving an average 96.7% accuracy across 33 test subsets and strong generalization to unseen GAN and diffusion generators.

AI image detection · Computer Vision · Deep Learning
11 min read

Xiaohongshu Tech REDtech
Feb 24, 2025 · Artificial Intelligence

AIDE: Hybrid Feature Detector for AI‑Generated Image Detection and the Chameleon Benchmark

The paper introduces AIDE, a hybrid AI‑generated image detector that fuses low‑level pixel statistics with high‑level semantic embeddings, and the manually curated Chameleon benchmark of ~26,000 diverse, high‑realism images, showing AIDE surpasses nine state‑of‑the‑art methods by up to 4.6% while highlighting remaining challenges on this tougher dataset.

AI-generated image detection · Computer Vision · Deep Learning
14 min read

AIWalker
Feb 19, 2025 · Artificial Intelligence

YOLOv12 Unveiled: Boosted Performance and Speed for Real‑Time Detection

YOLOv12 introduces an attention‑centric architecture, a lightweight regional attention module, and the R‑ELAN aggregation network, delivering consistent mAP gains and lower latency across N, S, M, L and X model scales while surpassing previous YOLO versions and other real‑time detectors.

Attention Mechanism · Benchmark · Computer Vision
8 min read

DevOps
Feb 17, 2025 · Artificial Intelligence

Microsoft OmniParser V2.0: A Visual Agent Parsing Framework for Enhanced UI Understanding

Microsoft's OmniParser V2.0 transforms large language models such as DeepSeek‑R1, GPT‑4o, and Qwen‑2.5VL into visual AI agents by accurately detecting interactive UI elements, providing semantic descriptions, and generating structured representations that boost inference speed, reduce latency by 60%, and dramatically improve benchmark accuracy.

AI Agent · Computer Vision · DeepSeek
7 min read

php Courses
Feb 10, 2025 · Artificial Intelligence

Real-Time Face Recognition Using PHP and OpenCV

This article explains how to set up a PHP environment with OpenCV, control a camera to capture images, perform real-time face detection using Haar cascades, train and apply an LBPH face recognizer, and integrate the results into a security system.

Computer Vision · PHP · Real-Time
5 min read

AIWalker
Feb 9, 2025 · Artificial Intelligence

Douyin’s BDVQAGroup Secures Global Runner‑Up in DXOMARK Image Quality Challenge at CVPR 2024

At the NTIRE workshop of CVPR 2024, Douyin’s BDVQAGroup placed second worldwide in the DXOMARK portrait quality track with its SampleIQA model, which combines data resampling, a Swin‑Transformer backbone, a twin‑network ranking loss and content‑aware cropping to outperform existing state‑of‑the‑art IQA methods.

Computer Vision · DXOMARK · Deep Learning
10 min read

JD Tech
Feb 5, 2025 · Artificial Intelligence

Tech Insight: Highlights of Ten JD Retail Technology Papers Published in Top AI Conferences (2024)

Tech Insight presents concise overviews of ten JD retail technology papers accepted at top AI conferences in 2024, covering topics such as open‑vocabulary object detection, multi‑scenario ranking, diversity‑aware re‑ranking, a diversified product search dataset, semi‑supervised query classification, plug‑in CTR models, and methods to mitigate LLM hallucinations.

AI · Computer Vision · e‑commerce
17 min read

DataFunSummit
Jan 28, 2025 · Artificial Intelligence

Few-Shot Learning for Multi-New-Class Scenarios: Challenges, Methodology, and Experimental Evaluation

This article introduces a novel few‑shot learning approach tailored for multi‑new‑class scenarios, covering its background, problem definition, parallel training framework, and hierarchical fine‑tuning method, and presents extensive experiments demonstrating superior performance and computational efficiency.

Computer Vision · Few‑Shot Learning · hierarchical fine-tuning
10 min read

DataFunSummit
Jan 27, 2025 · Artificial Intelligence

Intelligent Plastic Bottle Sorting: Challenges, Multimodal AI Methods, High‑Speed Performance, and Commercialization Path

This article examines the state and challenges of plastic bottle recycling, presents multimodal AI‑driven sorting methods using RGB and NIR data, discusses high‑speed sorting performance, and outlines a commercial pathway that balances precision, speed, and cost for large‑scale deployment.

Computer Vision · high-speed sorting · industrial automation
12 min read

Huolala Tech
Jan 23, 2025 · Artificial Intelligence

How AI Transforms Freight Safety: Real-Time Risk Detection and Intervention

This article explains how AI technologies are applied to freight safety, detailing the challenges of traditional controls, the architecture of a real‑time AI safety system, data processing, risk detection, tiered interventions, and the resulting improvements in accuracy and operational efficiency.

AI · Computer Vision · Logistics
7 min read

AIWalker
Jan 21, 2025 · Artificial Intelligence

UltraFusion HDR: How AIGC Enhances Dynamic Imaging to Capture Detail and Balance Exposure

The UltraFusion HDR algorithm, developed by Shanghai AI Lab with CUHK and Zhejiang University, combines generative AI with exposure fusion to recover detail and balance lighting even when exposure differences reach up to 9 EV, enabling high‑quality images from ordinary cameras without hardware upgrades.

Computer Vision · Dynamic Range · HDR
6 min read

JD Retail Technology
Jan 21, 2025 · Artificial Intelligence

Tech Insight: Selected JD Retail Technology Papers in Artificial Intelligence (2024)

Tech Insight highlights ten 2024 JD Retail Technology AI papers presented at top conferences—including CVPR, SIGIR, WWW, AAAI and IJCAI—that advance open‑vocabulary object detection, unified search‑recommendation, pre‑ranking consistency, diversity‑aware re‑ranking, a diversified product‑search dataset, graph‑based query classification, plug‑in CTR models, parallel ad‑ranking, trajectory‑based CTR stability, and task‑aware decoding for large language models.

CTR prediction · Computer Vision · E‑commerce
20 min read

Python Programming Learning Circle
Jan 14, 2025 · Artificial Intelligence

Age Prediction Using OpenCV and Deep Learning with Python

This tutorial explains how to use OpenCV, pre‑trained deep‑learning models, and Python to automatically detect faces and predict a person's age from static images or real‑time video, covering model selection, project structure, script usage, result analysis, and ways to improve accuracy.

Age Estimation · Caffe · Computer Vision
18 min read

AIWalker
Jan 13, 2025 · Artificial Intelligence

Multi-View Transformer (MVFormer) Sets New Top‑1 Accuracy Records in Classification, Detection, and Segmentation

The paper proposes MVFormer, a Vision Transformer that combines a Multi‑View Normalization (MVN) module and a Multi‑View Token Mixer (MVTM) to diversify feature learning, achieving state‑of‑the‑art Top‑1 accuracy of 83.4% to 84.6% on ImageNet‑1K and superior performance on COCO detection and ADE20K segmentation while using comparable or fewer parameters and MACs.

Computer Vision · Deep Learning · Multi-View Normalization
25 min read

AIWalker
Jan 12, 2025 · Artificial Intelligence

CubeFormer: A Simple Yet Effective Lightweight Image Super‑Resolution Baseline

CubeFormer introduces a novel cube attention mechanism and dual transformer blocks that dramatically improve feature diversity, enabling a lightweight image super‑resolution model to achieve state‑of‑the‑art PSNR and visual detail across multiple benchmarks while keeping parameters low.

Computer Vision · Deep Learning · cube attention
21 min read

Java Tech Enthusiast
Jan 12, 2025 · Artificial Intelligence

AgiBot World: Large-Scale Multi‑Robot Embodied AI Dataset Release

AgiBot World, the first global‑scale robot dataset captured in fully realistic environments, provides ten‑fold longer trajectories and hundred‑fold greater scene coverage than prior collections. It features over 80 daily‑life skills recorded by a 32‑DOF robot with advanced sensing and applies rigorous multi‑stage quality control; future releases are slated to reach a million runs and millions of simulated trajectories.

Computer Vision · Embodied AI · Robotics
9 min read

AIWalker
Jan 11, 2025 · Artificial Intelligence

Arc2Face: Identity‑Conditioned Face Generation Model Delivering High‑Consistency, High‑Quality AI Portraits

Arc2Face is an identity‑conditioned face synthesis foundation model that projects ArcFace embeddings into the CLIP space of a fine‑tuned Stable Diffusion, using up‑sampled WebFace42M and high‑quality FFHQ/CelebA‑HQ data to achieve far‑superior facial similarity and consistency compared with existing methods such as FaceSwap and InstantID, as demonstrated by extensive quantitative and visual experiments.

Arc2Face · Computer Vision · Face Generation
7 min read

Python Programming Learning Circle
Dec 18, 2024 · Artificial Intelligence

Object Detection in Python Using Template Matching

This article demonstrates how to perform object detection in Python without machine‑learning frameworks by using OpenCV’s template‑matching functions, covering single‑object detection, multi‑object detection with thresholding, and providing complete code examples for loading images, matching, locating matches, drawing bounding boxes, and visualizing results.

Computer Vision · OpenCV · Template Matching
6 min read

php Courses
Dec 18, 2024 · Artificial Intelligence

Using PHP to Access the Camera and Perform Face Detection with OpenCV

This article explains how to install the OpenCV and php-facedetect libraries, write PHP code to capture images from a webcam, perform face detection using the pico library, and display the results, providing a step‑by‑step guide for object detection with PHP.

Camera · Computer Vision · Face Detection
5 min read

Test Development Learning Exchange
Dec 6, 2024 · Artificial Intelligence

Using pytesseract and Pillow for OCR: Installation, Configuration, and Accuracy Improvement Techniques

This guide explains how to install Tesseract OCR and the Python libraries pytesseract and Pillow, configure the engine path, perform image-to-text extraction with example code, and apply various preprocessing, detection, and post‑processing methods to significantly improve OCR accuracy.

Computer Vision · OCR · Python
8 min read

php Courses
Dec 5, 2024 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article explains how to set up a PHP environment, control a camera, and use the OpenCV library to perform real-time face detection and recognition with code examples, enabling security applications such as access control and monitoring systems.

Computer Vision · PHP · Real-Time
6 min read

Test Development Learning Exchange
Nov 30, 2024 · Artificial Intelligence

Basic Image Processing with OpenCV: Reading, Displaying, and Manipulating Images in Python

This tutorial introduces basic image processing techniques using OpenCV in Python, covering image reading, displaying, grayscale conversion, cropping, resizing, rotation, flipping, and saving, with step‑by‑step code examples and explanations to help beginners apply these operations in real projects.

Computer Vision · OpenCV · Python
8 min read

Test Development Learning Exchange
Nov 30, 2024 · Artificial Intelligence

Popular Python Libraries for Image Processing with Installation Commands and Code Samples

This article introduces ten widely used Python image‑processing libraries—including Pillow, OpenCV, scikit‑image, imageio, mahotas, SimpleITK, imgaug, face_recognition, Pyradiomics, and tqdm—provides brief descriptions, pip installation commands, and runnable code examples to help developers choose the right tool for their computer‑vision tasks.

Computer Vision · OpenCV · Python
10 min read

DaTaobao Tech
Nov 27, 2024 · Artificial Intelligence

FuseAnyPart: Diffusion‑Driven Facial Parts Swapping via Multiple Reference Images

FuseAnyPart is a diffusion‑model‑based facial part swapping technique that fuses features from multiple reference images via mask‑based fusion and additive injection modules, delivering high‑fidelity, consistent face edits with lower computational cost, outperforming prior methods on CelebA‑HQ and FaceForensics++ and already boosting commercial AIGC applications.

Computer Vision · diffusion model · facial part swapping
9 min read

Python Programming Learning Circle
Nov 27, 2024 · Artificial Intelligence

Open‑Source Bird Species Detection with TensorFlow, MobileNet V2 and OpenCV

A hobbyist builds a Python‑based bird‑recognition system using TensorFlow's SSD OpenImages model, a MobileNet V2 classifier from TensorFlow Hub, and OpenCV, shares the open‑source code on GitHub, discusses early results, challenges like accuracy and non‑maximum suppression, and outlines future improvements.

Bird Detection · Computer Vision · OpenCV
8 min read

DaTaobao Tech
Nov 25, 2024 · Artificial Intelligence

Open‑Set Object Detection and Visual Grounding: Analysis of YOLO‑World, Grounding DINO, and YOLO11

The article surveys state‑of‑the‑art open‑set object detection and visual‑grounding models (Grounding DINO, YOLO‑World, and the latest YOLO11), detailing their architectures, training strategies, and experimental results on home‑decoration datasets. The results show that open‑set detectors recognize unseen objects while YOLO11 excels on known categories, and that integrating both approaches yields superior performance, which broadens the practical use of detectors in real‑world applications.

Computer Vision · Deep Learning · Grounding DINO
15 min read

Baidu Geek Talk
Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91 % recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Computer Vision · Deep Learning · Model Deployment
8 min read

JD Tech Talk
Nov 14, 2024 · Artificial Intelligence

Can Human Feedback Make Advertising Image Generation Reliable? Introducing RFNet

This paper presents a multimodal Reliable Feedback Network (RFNet) and a consistency regularization method that use human feedback to automatically evaluate and fine‑tune diffusion models, dramatically increasing the usable rate of e‑commerce advertising images while preserving visual quality.

Computer Vision · Human Feedback · RFNet
8 min read

Bilibili Tech
Nov 8, 2024 · Artificial Intelligence

AI-Powered Game Recognition for League of Legends Live Streaming on Bilibili

Bilibili’s AI‑driven game‑recognition system extracts real‑time LoL events through OCR, hero detection and hot‑spot tagging, generating high‑energy timestamps and interactive overlays that let viewers jump to key moments and view detailed statistics, enhancing spectator engagement and analytical capabilities across major esports tournaments.

AI · Computer Vision · Game Recognition
14 min read

Test Development Learning Exchange
Nov 4, 2024 · Artificial Intelligence

Image Processing with Python: Pillow and OpenCV Guide

This guide demonstrates how to perform common image processing tasks in Python using the Pillow and OpenCV libraries, covering reading, displaying, saving, resizing, cropping, rotating, converting to grayscale, adding text, compositing, blurring, sharpening, enhancing, and extracting image metadata.

Computer Vision · OpenCV · Python
5 min read

Tencent Cloud Developer
Oct 30, 2024 · Artificial Intelligence

Comprehensive Survey of AIGC Research: Papers, Resources, and Technical Overview

This survey acts as a comprehensive portal that organizes AIGC research across seven domains—text, image, and audio generation, cross‑modal association, text‑guided image and audio synthesis, and supporting resources—detailing seminal models such as GPT, Diffusion, CLIP, DALL·E, Stable Diffusion, MusicLM, and key papers that shaped each field.

AIGC · CLIP · Computer Vision
19 min read

php Courses
Sep 25, 2024 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article demonstrates how to set up a PHP environment, control a camera, and integrate OpenCV for real-time face detection and recognition, providing code examples and a complete workflow to enhance security applications.

Computer Vision · PHP · face recognition
0 likes · 5 min read
Sohu Tech Products
Sep 11, 2024 · Artificial Intelligence

Low‑Cost 3D Reconstruction Using 3D Gaussian Splatting

This article explains how to create high‑quality 3D scenes from ordinary video footage by extracting frames with ffmpeg, estimating camera poses with COLMAP, and applying 3D Gaussian Splatting in place of traditional mesh‑texture pipelines, dramatically lowering equipment costs and data size.

3D reconstruction · COLMAP · Computer Vision
0 likes · 6 min read
Volcano Engine Developer Services
Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.

Computer Vision · PixelLM · StoryDiffusion
0 likes · 22 min read
Python Programming Learning Circle
Sep 5, 2024 · Artificial Intelligence

Face Detection with Haar Cascade and Face Recognition Using LBPH in OpenCV

This article explains the fundamentals of face detection using the Haar‑cascade algorithm, how to train and apply detectors with OpenCV, and introduces the Local Binary Patterns Histograms (LBPH) method for face recognition, covering data preparation, parameter selection, and matching techniques.

Computer Vision · Face Detection · Haar cascade
0 likes · 13 min read
AntTech
Sep 3, 2024 · Artificial Intelligence

2024 Inclusion Bund Conference AI Innovation Competition and Deepfake Challenge Results

The 2024 Inclusion Bund Conference in Shanghai announced the winners of its newly added AI Innovation Competition, including the AFAC Financial Intelligence Contest and the Global Deepfake Attack‑Defense Challenge, highlighting participation from over 7,000 teams across more than 20 countries and showcasing cutting‑edge deepfake detection achievements.

AI · Computer Vision · Dataset
0 likes · 7 min read
JD Cloud Developers
Aug 29, 2024 · Artificial Intelligence

How AI Powers E‑Commerce Content Compliance and Price Governance

This article explains how e‑commerce platforms use AI‑driven content compliance to detect malicious products, price manipulation, and counterfeit goods, outlining the technical challenges, core business metrics, model‑based solutions for detecting over‑pricing, and career growth advice for compliance engineers.

AI · Computer Vision · NLP
0 likes · 9 min read
Bilibili Tech
Aug 27, 2024 · Artificial Intelligence

Multimodal Video Scene Classification for Adaptive Video Processing

The paper presents a multimodal video scene classification system that leverages CLIP‑generated pseudo‑labels and a fine‑tuned image encoder to automatically identify nature, animation/game, and document scenes, enabling more effective adaptive transcoding, intelligent restoration, and quality assessment for user‑generated content on platforms such as Bilibili.

Bilibili multimedia · CLIP · Computer Vision
0 likes · 17 min read
Rare Earth Juejin Tech Community
Aug 22, 2024 · Artificial Intelligence

Understanding Faster R-CNN: Architecture, Training, and Experimental Results

This article provides an in‑depth overview of the Faster R‑CNN object detection framework, covering its background, key innovations such as the Region Proposal Network, detailed algorithmic principles, training procedures, experimental results on PASCAL VOC and MS COCO, and a reproducible PyTorch implementation.

Computer Vision · Deep Learning · Faster R-CNN
0 likes · 14 min read
php Courses
Jul 26, 2024 · Artificial Intelligence

Real-Time Image Processing with PHP and OpenCV

This tutorial explains how PHP developers can install OpenCV and the php-opencv extension, write code to capture webcam video, display live frames in a browser, and perform real-time face detection using computer‑vision techniques.

Computer Vision · PHP · Real-Time
0 likes · 6 min read
Baidu Geek Talk
Jul 24, 2024 · Artificial Intelligence

AI-Driven Fusion of Peking Opera Characters with Ink-Wash Painting Style Using PaddleGAN

Li Yilin’s AI project blends Peking Opera characters with traditional ink‑wash painting by using PaddleHub for style transfer and PaddleGAN’s First‑Order Motion model for facial motion, then adds music and Wav2Lip lip‑sync, producing videos that modernize Chinese heritage and gauge public cultural awareness.

AI · Computer Vision · Deep Learning
0 likes · 9 min read
Full-Stack Cultivation Path
Jul 17, 2024 · Artificial Intelligence

Open-Source PDF Toolkit Delivers High-Accuracy Layout and Formula Detection

PDF‑Extract‑Kit is an open‑source toolkit that combines high‑accuracy layout detection, formula detection, formula recognition, and OCR for PDFs, and the article details its model comparisons, evaluation on academic and textbook datasets, and step‑by‑step instructions for running it on Windows or macOS, including Apple Silicon.

Computer Vision · OCR · PDF-Extract-Kit
0 likes · 6 min read
Kuaishou Tech
Jul 16, 2024 · Artificial Intelligence

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

LivePortrait is an open‑source, controllable portrait video generation framework that transfers facial expressions and poses from a driving video to static or dynamic portraits in real time, leveraging a 69M‑frame mixed video‑image training set, stitching and retargeting modules, and achieving high quality with low latency.

AI · Computer Vision · Deep Learning
0 likes · 14 min read
Alibaba Cloud Big Data AI Platform
Jul 15, 2024 · Artificial Intelligence

How EasyAnimate v3 Generates High‑Resolution Videos with Diffusion Transformers

EasyAnimate v3, an open‑source video generation system from Alibaba Cloud AI Platform, introduces a Diffusion Transformer‑based architecture, a Hybrid Motion Module, and Slice VAE to enable image‑to‑video, text‑to‑video, and unlimited‑length video creation at up to 720p resolution and 144 frames on modest GPU memory.

AI · Computer Vision · Diffusion Transformer
0 likes · 5 min read
Selected Java Interview Questions
Jul 3, 2024 · Artificial Intelligence

Integrating OpenCV with Java and Spring Boot for Face Detection and Recognition

This guide provides a comprehensive walkthrough of installing OpenCV, using its Java API for image and video face detection, implementing face comparison, creating custom GUI windows, and integrating the library into a Spring Boot application with detailed code examples and common troubleshooting tips.

Computer Vision · Custom GUI · Face Detection
0 likes · 25 min read
Kuaishou Tech
Jul 1, 2024 · Artificial Intelligence

Short-Form Video Quality Assessment Competition at CVPR NTIRE 2024: Dataset, Challenge Overview, and Top Winning Solutions

The CVPR NTIRE 2024 short-form video quality assessment competition introduced the KVQ dataset, attracted over 200 teams, evaluated submissions using SROCC and PLCC metrics, and highlighted the winning approaches of SJTU MMLab, IH‑VQA, and TVQE, showcasing advances in AI‑driven video quality evaluation.

AI competition · Computer Vision · Dataset
0 likes · 9 min read
DaTaobao Tech
Jul 1, 2024 · Artificial Intelligence

Recent Progress in Vision-Language Models (VLMs)

Over the past year, Vision‑Language Models have surged from early multimodal experiments to competitive open‑source systems rivaling GPT‑4, driven by higher‑resolution processing, richer vision encoders, better projection layers, and larger curated datasets, yet they still face evaluation difficulties, hallucinations, speed limits, and limited multimodal output.

Computer Vision · Deep Learning · Vision-Language Models
0 likes · 24 min read
Kuaishou Large Model
Jun 27, 2024 · Artificial Intelligence

How I2V-Adapter Turns Images into Videos with Minimal Training

The article introduces I2V‑Adapter, a lightweight plug‑in for Stable Diffusion‑based video diffusion models that converts a single static image into a coherent video without altering the original T2V architecture, and details its design, frame‑similarity prior, experimental results, and real‑world applications.

AI · Computer Vision · I2V-Adapter
0 likes · 9 min read
Kuaishou Tech
Jun 26, 2024 · Artificial Intelligence

I2V-Adapter: A Lightweight Image‑to‑Video Adapter for Stable Diffusion Video Diffusion Models

The I2V-Adapter paper introduces a plug‑and‑play lightweight module that enables static images to be converted into dynamic videos using Stable Diffusion‑based text‑to‑video diffusion models without altering the original architecture or pretrained parameters, achieving competitive quality with far less training cost.

AI · Computer Vision · I2V-Adapter
0 likes · 8 min read
Ops Development & AI Practice
Jun 22, 2024 · Artificial Intelligence

Why Transformers Revolutionized AI: From NLP to Vision and Speech

Transformers, introduced in 2017, have reshaped neural networks by leveraging attention mechanisms to outperform RNNs and CNNs across NLP, computer vision, and speech tasks, offering parallel processing, long‑range dependency capture, and versatile applications such as translation, text generation, image classification, and speech recognition.

Attention Mechanism · Computer Vision · Deep Learning
0 likes · 6 min read
AntTech
Jun 18, 2024 · Artificial Intelligence

Ant Group’s 24 Papers Featured at CVPR2024: Topics and Abstracts

The IEEE CVPR2024 conference in Seattle accepted 2,719 papers out of 11,532 submissions, and Ant Group contributed 24 papers covering computer vision, deep learning, digital humans, large models, multimodal remote sensing, vision‑language distillation, federated incremental learning, model‑stealing defense, and more, with one selected as a highlight paper.

Ant Group · CVPR2024 · Computer Vision
0 likes · 17 min read
Model Perspective
Jun 17, 2024 · Artificial Intelligence

Can Diffusion Equations Restore Damaged Paintings? A Practical Guide

This article explains how diffusion equation methods can be applied to digitally repair paintings damaged by spots, covering the mathematical representation of images, the underlying heat‑transfer analogy, step‑by‑step inpainting procedures, and improvements such as total variation flow to preserve edges.

Computer Vision · Image Restoration · diffusion equation
0 likes · 4 min read
Rare Earth Juejin Tech Community
Jun 16, 2024 · Artificial Intelligence

HRNet Source Code Walkthrough: Keypoint Dataset Construction, Online Data Augmentation, and Training Pipeline

This article provides a detailed walkthrough of the HRNet source code, covering how the COCO keypoint dataset is built, the online data‑augmentation techniques applied during training, and the end‑to‑end training and inference procedures for human pose estimation.

Computer Vision · Deep Learning · HRNet
0 likes · 36 min read
Meituan Technology Team
Jun 13, 2024 · Artificial Intelligence

Overview of Meituan's Selected CVPR 2024 Papers and Online Sharing Event

Meituan's tech team highlights seven CVPR 2024 papers—spanning OCR pre‑training, long‑tail semi‑supervised learning, visual AIGC, audio‑visual segmentation and synthetic‑data detection—provides detailed abstracts and experimental results, and announces an online author‑talk session on June 27.

Audio-Visual Segmentation · CVPR 2024 · Computer Vision
0 likes · 18 min read
DataFunSummit
Jun 11, 2024 · Artificial Intelligence

AI Technology Evolution, Commercial Drivers, and Practical Applications in Iron Spectrum Image Recognition, Smart Clause Libraries, and Fire Detection

This article examines the commercial forces behind AI technology evolution, explores academic research on iron‑spectrum (ferrographic) image recognition, and details product‑level deployments such as smart clause libraries and fire‑detection systems, while highlighting challenges, strategic approaches, and future outlooks for AI adoption.

AI · Computer Vision · Product Development
0 likes · 21 min read
php Courses
May 30, 2024 · Artificial Intelligence

Real-Time Face Recognition with PHP and OpenCV

This article demonstrates how to set up a PHP environment with OpenCV, control a camera to capture images, and implement real-time face detection and recognition using Haar cascades and LBPH algorithms, providing code examples for building a security-oriented facial recognition system.

Computer Vision · OpenCV · PHP
0 likes · 6 min read
Liangxu Linux
May 26, 2024 · Artificial Intelligence

Can Palette-Based Recoloring Transform Pokémon Images Without Neural Networks?

This article presents a mathematically modeled algorithm that extracts color palettes from any Pokémon image and applies them to another, optimizing the swap via deep‑feature distance and dense color‑transform space, demonstrating superior visual results and subjective evaluations compared to traditional hue‑shift and other recoloring methods.

Computer Vision · Deep Learning · color transfer
0 likes · 14 min read
Huolala Tech
May 23, 2024 · Artificial Intelligence

How to Detect and Remove Moiré Patterns with AI and Diffusion Models

This article explains the nature of moiré patterns in digital imaging, reviews manual mitigation techniques, introduces direct and indirect AI‑based recognition methods—including traditional feature extraction and deep‑learning models such as CNNs and diffusion frameworks—and details practical applications and evaluation metrics used by Huolala.

AI · Computer Vision · Deep Learning
0 likes · 17 min read
Meituan Technology Team
May 16, 2024 · Artificial Intelligence

CMIngre: A Cross‑Modal Ingredient‑Level Dataset for Chinese Food Understanding

The CMIngre dataset, created by Meituan’s R&D platform and Tianjin University, offers 8,001 image‑text pairs of 429 Chinese dishes with 95,290 ingredient bounding boxes, enabling fine‑grained ingredient detection and cross‑modal retrieval tasks, and baseline experiments show DINO and CLIP models achieve the strongest performance.

Computer Vision · cross-modal retrieval · food understanding
0 likes · 44 min read
Xiaohongshu Tech REDtech
May 10, 2024 · Artificial Intelligence

SIF3D: Sense‑Informed Forecasting of 3D Human Motion with Multimodal Attention

SIF3D is a scene‑aware 3D human motion forecasting framework that fuses observed motion, 3D point‑cloud scenes, and gaze through novel ternary intention‑aware and semantic‑coherence‑aware attention mechanisms, encoding with PointNet++ and Transformers, and decoding with a graph‑convolutional network, achieving state‑of‑the‑art results on GIMO and GTA‑1M benchmarks.

3D scene understanding · CVPR2024 · Computer Vision
0 likes · 15 min read
AntTech
May 6, 2024 · Artificial Intelligence

AGAP: A Simple Technique for Editing 3D Neural Radiance Fields Explained in Plain Language

The article introduces AGAP, an open‑source method from Ant Research Institute that replaces complex 3D neural radiance field editing with a 2D appearance aggregation approach, enabling easy, loss‑less 3D image manipulation comparable to 2D Photoshop, and is showcased in a short explanatory video.

3D editing · AGAP · Computer Vision
0 likes · 3 min read
Alimama Tech
Apr 24, 2024 · Artificial Intelligence

Mask‑Guided Diffusion for Precise Product Image Generation

Mask‑Guided Diffusion combines instance‑mask training, Masked Canny ControlNet, and Mask‑guided Attribute Binding to preserve product details, correctly bind attributes, fix hand distortion, and generate uniform colored backgrounds, enabling merchants to quickly create high‑quality, controllable product images with Stable Diffusion.

AI · Computer Vision · ControlNet
0 likes · 16 min read
php Courses
Apr 16, 2024 · Artificial Intelligence

Using PHP and OpenCV for Camera‑Based Object Detection

This tutorial explains how to install required libraries, write PHP code that captures images from a webcam, uses OpenCV and php‑facedetect to detect faces, and displays the results with annotated bounding boxes, providing a foundation for further object detection projects.

Camera · Computer Vision · Face Detection
0 likes · 6 min read
php Courses
Apr 11, 2024 · Artificial Intelligence

How to Use PHP and OpenCV for Real-Time Camera Image Processing

This tutorial explains how PHP developers can install OpenCV and the php‑opencv extension, capture video from a webcam, display live frames in a browser, and perform basic real‑time image processing such as face detection using OpenCV’s cascade classifier.

Computer Vision · Face Detection · OpenCV
0 likes · 5 min read