Tagged articles
650 articles
Page 3 of 7
Kuaishou Tech
Mar 6, 2024 · Artificial Intelligence

Short Video Quality Assessment Competition (KVQ) at CVPR NTIRE 2024

The CVPR NTIRE 2024 workshop hosts the first short‑video quality assessment competition, introducing the KVQ dataset of 4,200 videos across nine scenes, providing training/validation data, a baseline 3D Swin‑Transformer model, detailed competition rules, rewards, and organizer contacts.

AI · Computer Vision · Dataset
0 likes · 7 min read
DaTaobao Tech
Mar 6, 2024 · Artificial Intelligence

AI Clothing Graffiti Project: Implementation and Optimization of AIGC Technology in Taobao Life 2

The AI Clothing Graffiti Project in Taobao Life 2 leverages Stable Diffusion, ControlNet, and LoRA to let users generate and stylize clothing designs via text‑image prompts, employing parallel processing, face repair, and content filtering, and has launched successfully, inviting algorithm engineers to join the team.

AI · AIGC · Computer Vision
0 likes · 14 min read
DataFunSummit
Mar 5, 2024 · Artificial Intelligence

AI-Driven Intelligent Management and Regulation of Mold Temperature in Smart Manufacturing

This article explores how artificial intelligence, computer vision, and control algorithms are applied to smart manufacturing for intelligent mold temperature detection, cooling flow regulation, and full‑process system alerts, presenting a detailed solution architecture, key technologies, and a real‑world case study.

AI · Computer Vision · Mold Temperature Control
0 likes · 11 min read
NewBeeNLP
Mar 4, 2024 · Artificial Intelligence

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations

This article presents a GitHub‑hosted collection of 25 recent research papers on Mamba and its variants, summarizing each work’s core contributions across sequence modeling, vision, medical imaging, graph analysis, and multimodal tasks, and highlighting their performance gains over prior methods.

Computer Vision · Deep Learning · Mamba
0 likes · 13 min read
Architects' Tech Alliance
Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Redefines Video Generation with 3‑D Consistency and World Simulation

OpenAI’s Sora model introduces a diffusion‑transformer approach that generates high‑fidelity videos up to 60 seconds long, with consistent 3‑D camera motion, long‑term object persistence, and the ability to simulate interactive digital worlds, backed by a detailed technical report and research paper.

Computer Vision · OpenAI · Sora
0 likes · 9 min read
Xiaohongshu Tech REDtech
Jan 31, 2024 · Artificial Intelligence

Encoding‑Alignment‑Interaction (EAI) Framework for Full‑Body Human Motion Forecasting

The Encoding‑Alignment‑Interaction (EAI) framework predicts full‑body human motion—including detailed hand joints—by extracting spatio‑temporal features with DCT and GCNs, aligning heterogeneous body‑hand representations via Cross‑Context Alignment, and modeling semantic and physical interactions through Cross‑Context Interaction, achieving state‑of‑the‑art accuracy on the GRAB dataset.

Computer Vision · EAI framework · cross-context alignment
0 likes · 15 min read
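The EAI summary mentions DCT-based temporal encoding of motion. As a minimal sketch, assuming the common motion-forecasting practice of projecting each joint coordinate's trajectory onto an orthonormal DCT-II basis (the exact EAI pipeline is not described in this listing), the encode and decode steps look like this. `dct_encode`, `dct_decode`, and the sample trajectory are hypothetical:

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector of length n."""
    mat = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n)) for t in range(n)])
    return mat

def dct_encode(trajectory, num_coeffs):
    """Project one joint coordinate over time onto the first num_coeffs DCT bases."""
    n = len(trajectory)
    basis = dct_matrix(n)
    return [sum(basis[k][t] * trajectory[t] for t in range(n)) for k in range(num_coeffs)]

def dct_decode(coeffs, n):
    """Rebuild a length-n trajectory; the basis is orthonormal, so its transpose inverts it."""
    basis = dct_matrix(n)
    return [sum(coeffs[k] * basis[k][t] for k in range(len(coeffs))) for t in range(n)]

# Hypothetical x-coordinate of one joint over 8 frames.
traj = [0.0, 0.1, 0.3, 0.6, 0.8, 0.9, 1.0, 1.0]
full = dct_encode(traj, len(traj))
recon = dct_decode(full, len(traj))
print(max(abs(a - b) for a, b in zip(traj, recon)))  # prints a value close to zero
smooth = dct_decode(dct_encode(traj, 3), len(traj))  # low-pass approximation from 3 coefficients
```

Keeping only the first few coefficients acts as a low-pass filter, which is one reason DCT features suit smooth body trajectories.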
DaTaobao Tech
Jan 31, 2024 · Artificial Intelligence

Highlights of Recent AI Research Papers from Top Conferences (2023)

The article curates standout AI papers from 2023 CCF‑A conferences—including CVPR, ICLR, ACM MM, and INFORMS—showcasing advances such as Swin‑Transformer video quality assessment, cross‑modal e‑commerce product search, transformer‑based vehicle routing heuristics, diffusion‑driven dance generation, and reinforcement‑learning inventory replenishment.

AI · Computer Vision · Multimedia
0 likes · 23 min read
Huolala Tech
Jan 25, 2024 · Artificial Intelligence

How Open‑Vocabulary Detection and Segment‑Anything Are Revolutionizing Visual AI at Huolala

This article reviews traditional computer‑vision tasks—classification, detection, and segmentation—highlights their limitations, introduces open‑vocabulary detection and segment‑anything models such as GLIP, Grounding DINO, and SAM, and details how Huolala applies these advances to driver‑license, packing, and vehicle‑sticker inspections for safer, more efficient AI‑driven operations.

Computer Vision · Segmentation · object detection
0 likes · 20 min read
AsiaInfo Technology: New Tech Exploration
Jan 12, 2024 · Artificial Intelligence

Exploring NeRF: From Theory to Real-World 3D Reconstruction Tools

This article introduces Neural Radiance Fields (NeRF) as a cutting‑edge AI technique for high‑quality 3D reconstruction, explains its core principles and advantages, outlines a step‑by‑step workflow for building a NeRF, reviews popular tools such as Luma AI, NVIDIA Instant NeRF and NeRFStudio, and offers a forward‑looking summary of its potential and challenges.

3D reconstruction · AI · Computer Vision
0 likes · 12 min read
21CTO
Dec 17, 2023 · Artificial Intelligence

Remembering Tang Xiaoou: The Visionary Behind Modern Facial Recognition

AI pioneer Tang Xiaoou, co‑founder of SenseTime and former director of leading computer‑vision labs, passed away in December 2023, leaving a legacy of groundbreaking facial‑recognition algorithms, influential mentorship, and a profound impact on the global artificial‑intelligence community.

AI Pioneer · Computer Vision · artificial intelligence
0 likes · 7 min read
We-Design
Dec 13, 2023 · Artificial Intelligence

How AI-Powered Beauty Filters Evolved: From Classic Portraits to Real-Time Video Effects

This article traces the evolution of beauty filter technology from ancient artistic enhancements to modern AI-driven real-time video effects, detailing key techniques like face detection, skin smoothing, AR integration, and shifting user preferences, while reflecting on its cultural impact on social media aesthetics.

AI · AR · Computer Vision
0 likes · 9 min read
Airbnb Technology Team
Dec 8, 2023 · Artificial Intelligence

Leveraging Image Aesthetics and Photo Sorting Algorithms to Enhance Airbnb Listings

Airbnb’s new computer‑vision pipeline trains a deep‑learning aesthetic model with an EMD loss to rank photos, automatically sorts new‑listing images by design and room type, and scales real‑time similarity search via HNSW‑based ANN on AWS OpenSearch, boosting click‑through, bookings, and enabling unsupervised visual recommendations.

Airbnb · Computer Vision · Deep Learning
0 likes · 9 min read
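The Airbnb summary says the aesthetic model is trained with an EMD loss. The post's exact formulation is not given in this listing; the sketch below assumes the normalized Earth Mover's Distance over ordered rating buckets used by NIMA-style aesthetic models, with r = 2. `emd_loss` and `mean_score` are hypothetical names:

```python
def cdf(probs):
    """Cumulative distribution over ordered score buckets (e.g. ratings 1..10)."""
    total, out = 0.0, []
    for p in probs:
        total += p
        out.append(total)
    return out

def emd_loss(pred, target, r=2):
    """Normalized EMD between two distributions defined on the same ordered buckets."""
    if len(pred) != len(target):
        raise ValueError("distributions must have the same number of buckets")
    cp, ct = cdf(pred), cdf(target)
    n = len(pred)
    return (sum(abs(a - b) ** r for a, b in zip(cp, ct)) / n) ** (1.0 / r)

def mean_score(probs):
    """Expected rating with buckets numbered 1..len(probs); usable as a ranking key."""
    return sum((i + 1) * p for i, p in enumerate(probs))

# Hypothetical 5-bucket example: the true rating is bucket 3.
target = [0.0, 0.0, 1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # one bucket off
far = [1.0, 0.0, 0.0, 0.0, 0.0]   # two buckets off
print(emd_loss(near, target), emd_loss(far, target))
```

Unlike cross-entropy, this loss grows with how far the predicted probability mass sits from the true rating, so a prediction one bucket off costs less than one two buckets off. A photo's ranking score can then be taken as the predicted distribution's mean.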
IT Services Circle
Dec 6, 2023 · Artificial Intelligence

AI Image Outpainting: Unexpected Transformations and How It Works

The article showcases a series of humorous and surprising AI‑generated image expansions from Douyin, explains the underlying outpainting technology, and discusses why such tools are both entertaining and useful despite occasional odd results.

AI · Computer Vision · Deep Learning
0 likes · 6 min read
Python Programming Learning Circle
Nov 30, 2023 · Artificial Intelligence

Common Python Libraries for Computer Vision Projects

This article introduces ten popular Python libraries for computer vision, describing their main features, typical applications, and providing concise code examples to help beginners and practitioners quickly choose and use the right tools for image processing and deep learning tasks.

Computer Vision · Image Processing · Python
0 likes · 10 min read
DataFunTalk
Nov 24, 2023 · Artificial Intelligence

Open Vocabulary Detection Contest 2023: Summary of Winning Teams' Technical Solutions

The article reviews the Open Vocabulary Detection Contest organized by the Chinese Society of Image and Graphics and 360 AI Institute, describing the competition setup, dataset characteristics, and detailed winning approaches that combine Detic, CLIP, prompt learning, and multi‑stage pipelines to achieve strong few‑shot and zero‑shot object detection performance.

CLIP · Computer Vision · competition
0 likes · 17 min read
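Several of the winning pipelines above lean on CLIP to name detected regions. As a hedged sketch of that classification step only, assuming region and prompt embeddings have already been produced by some image and text encoder, each region is scored against class-prompt embeddings by cosine similarity followed by a temperature-scaled softmax. The toy 3-D vectors, the hypothetical `classify_region` helper, and the 0.01 temperature (CLIP's logit scale of 100) are illustrative assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def classify_region(region_emb, class_text_embs, temperature=0.01):
    """Softmax over cosine similarities between one region and each class-prompt embedding."""
    logits = {name: cosine(region_emb, emb) / temperature for name, emb in class_text_embs.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {name: math.exp(v - peak) for name, v in logits.items()}
    total = sum(exps.values())
    return {name: v / total for name, v in exps.items()}

# Toy embeddings standing in for encoded prompts such as "a photo of a cat".
prompts = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
scores = classify_region([0.9, 0.1, 0.0], prompts)
print(max(scores, key=scores.get))  # cat
```

Because the class list is only a set of text prompts, new categories can be added at inference time without retraining the region classifier, which is what makes the setup open-vocabulary.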
Test Development Learning Exchange
Nov 16, 2023 · Artificial Intelligence

Building a Python Image Editing Tool with Pillow, OpenCV, and NumPy

This guide demonstrates how to create a custom image editing tool in Python by leveraging the Pillow, OpenCV, and NumPy libraries, providing step‑by‑step code examples for opening, resizing, filtering, converting to grayscale, edge detection, rotation, channel manipulation, blurring, contour extraction, and color adjustment.

Computer Vision · Image Processing · NumPy
0 likes · 6 min read
Network Intelligence Research Center (NIRC)
Nov 9, 2023 · Artificial Intelligence

How Wav2Lip Achieves Accurate Speech‑Driven Lip Sync with Expert Discriminators

The article analyzes the limitations of traditional speech‑driven lip‑sync methods and explains how Wav2Lip introduces a pretrained multi‑frame expert sync discriminator, a two‑stage GAN training pipeline, and a specialized generator architecture to produce high‑quality, audio‑aligned facial videos.

Computer Vision · Deep Learning · GAN
0 likes · 7 min read
Tencent Tech
Nov 9, 2023 · Artificial Intelligence

How Adaptive Skinning Model Boosts Low-Cost High-Quality 3D Face Reconstruction

This article introduces the Adaptive Skinning Model (ASM), a low‑cost yet high‑precision 3D face reconstruction technique that leverages Gaussian‑Mixture skinning weights and dynamic bone binding to surpass traditional 3DMM methods and achieve state‑of‑the‑art results on multiple benchmarks.

3D face reconstruction · Computer Vision · Gaussian mixture model
0 likes · 13 min read
Huawei Cloud Developer Alliance
Oct 31, 2023 · Artificial Intelligence

Edge‑Cloud AI Powers Student Fatigue‑Driving Detection – Challenge Cup Winners

The 18th Challenge Cup showcased cutting‑edge student projects on fatigue‑driving detection, with Huawei Cloud’s edge‑cloud collaborative topic drawing nearly a thousand participants and five top teams demonstrating AI‑driven solutions that combine incremental training, low‑light enhancement, and lightweight models for real‑time safety alerts.

AI · Computer Vision · Edge Computing
0 likes · 6 min read
Python Programming Learning Circle
Oct 26, 2023 · Artificial Intelligence

Animal Recognition Techniques Using Deep Learning and Image Processing

This article reviews animal recognition technology, covering its background, basic principles, image‑processing, feature extraction, machine‑learning and deep‑learning methods, dataset construction, preprocessing, and feature‑selection techniques, and provides Python code examples for implementing CNNs and traditional classifiers.

Computer Vision · Deep Learning · Image Processing
0 likes · 18 min read
Network Intelligence Research Center (NIRC)
Oct 23, 2023 · Artificial Intelligence

How Multiple‑Instance Learning Boosts Context Understanding in Video Anomaly Detection

The article reviews the CVPR 2021 MIST framework, explaining how a multiple‑instance pseudo‑label generator and a self‑guided attention encoder work together with sparse continuous sampling to improve context awareness and detection accuracy in weakly‑supervised video anomaly detection.

Attention Encoder · Computer Vision · Multiple Instance Learning
0 likes · 9 min read
DaTaobao Tech
Oct 13, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Core Principles and Technical Architecture

The article demystifies Stable Diffusion by explaining its low‑cost latent‑space design and conditioning mechanisms, comparing it to autoregressive, VAE, flow‑based and GAN models, detailing the iterative noise‑to‑image process, token‑based text‑to‑image control, version differences, common generation issues, and providing implementation code examples.

AI image generation · Computer Vision · Cross-Attention
0 likes · 15 min read
Meituan Technology Team
Oct 11, 2023 · Artificial Intelligence

Meituan Vision AI Research Highlights and Open‑Source Releases

This article compiles Meituan's cutting‑edge computer‑vision research and engineering achievements—including CVPR award‑winning segmentation, YOLOv6 releases, GPU inference optimizations, the Food2K dataset, and numerous paper digests—to provide practical insights for visual AI practitioners.

CVPR · Computer Vision · Deep Learning
0 likes · 11 min read
Kuaishou Large Model
Sep 27, 2023 · Artificial Intelligence

DVIS: Decoupled Framework that Sets New SOTA in Video Instance Segmentation

DVIS introduces a decoupled video instance segmentation framework that splits the task into segmentation, tracking, and refinement modules, achieving state-of-the-art performance across VIS, VPS, and VSS benchmarks while maintaining low computational overhead, and demonstrates robustness in both online and offline settings.

Computer Vision · Deep Learning · Transformer
0 likes · 12 min read
Rare Earth Juejin Tech Community
Sep 16, 2023 · Artificial Intelligence

Understanding DeepSort: A Classic Multi-Object Tracking Algorithm

This article introduces the fundamentals of object tracking in computer vision, explains classic algorithms such as SORT and its deep learning extension DeepSort, describes their underlying mechanisms including Kalman filtering, Hungarian assignment, feature extraction via CNNs, and provides references and code resources for further study.

CNN · Computer Vision · DeepSort
0 likes · 10 min read
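SORT-style trackers match Kalman-predicted track boxes to new detections by solving an assignment problem over an IoU-based cost. The sketch below shows only that association step, under two stated simplifications: it uses IoU alone (DeepSort additionally combines Mahalanobis motion distance and CNN appearance distance in a matching cascade), and it swaps the Hungarian algorithm for a brute-force search over permutations, which reaches the same optimum but is only practical for a handful of boxes. `associate` and `min_iou` are hypothetical names:

```python
from itertools import permutations

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, min_iou=0.3):
    """Return sorted (track_idx, det_idx) pairs minimizing total (1 - IoU) cost.

    Brute force over permutations stands in for the Hungarian algorithm.
    """
    if not tracks or not detections:
        return []
    cost = [[1.0 - iou(t, d) for d in detections] for t in tracks]
    transpose = len(tracks) > len(detections)
    if transpose:  # enumerate assignments from the smaller side
        cost = [list(col) for col in zip(*cost)]
    rows, cols = len(cost), len(cost[0])
    best, best_perm = float("inf"), None
    for perm in permutations(range(cols), rows):
        total = sum(cost[r][c] for r, c in enumerate(perm))
        if total < best:
            best, best_perm = total, perm
    pairs = list(enumerate(best_perm))
    if transpose:
        pairs = [(c, r) for r, c in pairs]
    # Gate out pairs whose overlap is too small to be the same object.
    return sorted((t, d) for t, d in pairs if iou(tracks[t], detections[d]) >= min_iou)

tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]                      # predicted track boxes
dets = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]  # new detections
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```

Unmatched detections (here index 2) would typically start new tracks, and tracks left unmatched for several frames would be dropped. In practice `scipy.optimize.linear_sum_assignment` is a common replacement for the brute-force search.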
Rare Earth Juejin Tech Community
Aug 26, 2023 · Artificial Intelligence

Using AI and RPA to Solve Slider Captcha: A Practical Implementation with YOLOv8 and PyAutoGUI

This article demonstrates how to combine AI‑based object detection (YOLOv8) with robotic process automation (pyautogui) to automatically locate, drag and release slider captchas, covering data preparation, model training, screen capture, coordinate extraction, mouse simulation, and robustness improvements.

AI · Captcha · Computer Vision
0 likes · 15 min read
DataFunSummit
Aug 24, 2023 · Artificial Intelligence

Panoramic Indoor Layout Estimation with Vision Transformer (PanoViT)

This article introduces the PanoViT model, a vision‑transformer‑based approach for indoor layout estimation from panoramic images, covering its research background, architectural components, experimental results on public datasets, and step‑by‑step usage within ModelScope.

3D reconstruction · Computer Vision · Deep Learning
0 likes · 8 min read
Rare Earth Juejin Tech Community
Aug 24, 2023 · Artificial Intelligence

Neural Style Transfer with PyTorch: Theory and Implementation

This article introduces neural style transfer, explains its underlying principles using VGG19 feature extraction, content and style loss definitions, and provides a complete PyTorch implementation with code for loading images, extracting features, computing Gram matrices, and optimizing the output image.

Computer Vision · Deep Learning · PyTorch
0 likes · 14 min read
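The style loss described above compares Gram matrices of VGG19 feature maps. A framework-free sketch of that computation follows, assuming feature maps given as nested lists shaped [channels][height][width] and normalized by C * H * W as in the official PyTorch neural style tutorial (other implementations normalize differently). The function names are illustrative:

```python
def gram_matrix(features):
    """Gram matrix of a feature map shaped [channels][height][width].

    Each channel is flattened to a vector; entry (i, j) is the inner product of
    channels i and j, divided by the total element count C * H * W.
    """
    flat = [[v for row in channel for v in row] for channel in features]
    c, n = len(flat), len(flat[0])
    return [[sum(a * b for a, b in zip(flat[i], flat[j])) / (c * n) for j in range(c)]
            for i in range(c)]

def style_loss(gen_features, style_features):
    """Mean squared difference between the Gram matrices of two feature maps."""
    g, s = gram_matrix(gen_features), gram_matrix(style_features)
    c = len(g)
    return sum((g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)

# Toy 2-channel, 2x2 feature map.
feats = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]
print(gram_matrix(feats))  # [[0.25, 0.125], [0.125, 0.25]]
```

Because every spatial position contributes to the inner products in the same way, rearranging positions identically across all channels leaves the Gram matrix unchanged. That is why it captures texture and style rather than layout.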
Top Architect
Aug 22, 2023 · Artificial Intelligence

Face Recognition Search: Principles, Implementation Steps, and Applications

This article explains the background, core principles, preprocessing, feature extraction, matching algorithms, and practical application scenarios of face recognition search, and provides detailed reference implementations with Java and OpenCV code examples for building a complete system.

Computer Vision · Deep Learning · Image Processing
0 likes · 15 min read
DaTaobao Tech
Aug 21, 2023 · Artificial Intelligence

Action Sensitivity Learning for Temporal Action Localization

The paper presents Action Sensitivity Learning (ASL), a framework that models frame‑wise importance at both class‑level (via learnable Gaussian distributions) and instance‑level (using quality scores), integrates these weights into classification and regression losses, adds a contrastive InfoNCE term, and achieves state‑of‑the‑art temporal action localization performance across six benchmark datasets.

Action Sensitivity Learning · Computer Vision · Deep Learning
0 likes · 8 min read
Rare Earth Juejin Tech Community
Aug 17, 2023 · Artificial Intelligence

Getting Started with YOLOv8 on the Ultralytics Platform: Installation, Command‑Line Usage, and Model Training

This article introduces the YOLOv8 object‑detection framework on the Ultralytics platform, covering environment setup, command‑line and Python APIs for inference, model‑file options, result interpretation, data annotation, training procedures, and exporting models to various deployment formats.

Computer Vision · Model Training · Python
0 likes · 14 min read
Rare Earth Juejin Tech Community
Aug 16, 2023 · Artificial Intelligence

Deep Dive into OCR – Chapter 2: Development and Classification of OCR Technology

This article provides a comprehensive overview of OCR technology, detailing the evolution from traditional hand‑crafted methods to modern deep‑learning approaches, describing image preprocessing, text detection and recognition pipelines, summarizing classic machine‑learning algorithms, and presenting a practical OpenCV implementation with Python code.

Computer Vision · Deep Learning · OCR
0 likes · 23 min read
Rare Earth Juejin Tech Community
Aug 12, 2023 · Artificial Intelligence

An Introduction to OCR: Concepts, History, Applications, Datasets, and Technical Workflow

This article provides a comprehensive overview of Optical Character Recognition (OCR), covering its definition, historical development, classification, real‑world applications, technical pipeline, common challenges, mitigation strategies, popular datasets, model performance comparisons, and leading open‑source platforms.

Computer Vision · Datasets · Deep Learning
0 likes · 16 min read
Model Perspective
Aug 2, 2023 · Artificial Intelligence

How Segment Anything (SAM) Is Revolutionizing Image Segmentation

This article explains the fundamentals of image segmentation, introduces the open‑source Segment Anything Model (SAM) and its massive SA‑1B dataset, outlines SAM's unique promptable, real‑time capabilities, and explores its wide‑ranging future applications across AR/VR, content creation, and scientific research.

AI · Computer Vision · SAM
0 likes · 7 min read
Meituan Technology Team
Jul 27, 2023 · Artificial Intelligence

Street Scene Understanding: Segmentation Technology, Research Progress, and Business Applications

Meituan’s Street‑Scene Understanding team built a high‑precision, efficient segmentation system that aligns motion and static semantics, mines hard examples, iterates models via a data‑model loop, and pursues unified open‑world segmentation, winning multiple CVPR 2023 awards and powering map production, autonomous delivery and store‑scene reconstruction.

AI · CVPR 2023 · Computer Vision
0 likes · 31 min read
php Courses
Jul 24, 2023 · Artificial Intelligence

Image Edge Enhancement Using PHP and OpenCV

This article explains how to perform image edge enhancement by installing PHP and the OpenCV library, importing images, invoking OpenCV functions, selecting edge detection algorithms such as Sobel or Canny, processing the image with custom code, and displaying or saving the enhanced result.

Computer Vision · Edge Detection · Image Processing
0 likes · 5 min read
Rare Earth Juejin Tech Community
Jul 24, 2023 · Artificial Intelligence

Understanding Slide-Transformer: An Efficient Local Attention Module for Vision Transformers

This article explains the Slide-Transformer paper, describing how the proposed Slide Attention replaces inefficient Im2Col‑based local attention with depthwise convolutions and a deformable shift module, achieving high efficiency, flexibility, and hardware‑agnostic performance for Vision Transformers.

Computer Vision · Deep Learning · Deformable Shift
0 likes · 13 min read
Huolala Tech
Jul 21, 2023 · Artificial Intelligence

Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation

Recent advances in visual language models enable zero-shot multimodal tasks, and this article explores their application to open-set object detection, prompt learning, and promptable surgical instrument segmentation, highlighting methods like CLIP, CoOp, and the DetPro framework with experimental results across multiple benchmarks.

Computer Vision · Visual-Language Models · multimodal
0 likes · 12 min read
php Courses
Jul 21, 2023 · Artificial Intelligence

Image Segmentation with PHP and OpenCV

This tutorial explains how to perform image segmentation using the OpenCV library in PHP, covering environment setup, library import, image loading, grayscale conversion, thresholding, result display, and saving the segmented output.

Computer Vision · OpenCV · PHP
0 likes · 4 min read
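The tutorial above performs grayscale conversion followed by thresholding through PHP's OpenCV bindings. To show what those two steps compute, here is a plain-Python sketch (not the PHP extension's API), assuming an RGB image stored as nested lists of (r, g, b) tuples. The 0.299/0.587/0.114 weights are the luma coefficients OpenCV documents for RGB-to-gray conversion (OpenCV itself stores images in BGR order), and the threshold rule mirrors OpenCV's THRESH_BINARY: maxval where the pixel exceeds the threshold, 0 elsewhere. Function names are hypothetical:

```python
def to_gray(rgb_image):
    """Convert [[(r, g, b), ...], ...] to grayscale using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def threshold_binary(gray, thresh, maxval=255):
    """Binary threshold: maxval where pixel > thresh, else 0 (foreground/background mask)."""
    return [[maxval if p > thresh else 0 for p in row] for row in gray]

# Toy 2x2 RGB image: white, black, dark red, bright green.
img = [[(255, 255, 255), (0, 0, 0)], [(200, 50, 50), (30, 200, 30)]]
gray = to_gray(img)
print(gray)                         # [[255, 0], [95, 130]]
print(threshold_binary(gray, 127))  # [[255, 0], [0, 255]]
```

A fixed global threshold like this works when foreground and background differ clearly in brightness; unevenly lit images usually call for adaptive or Otsu thresholding instead.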
php Courses
Jul 18, 2023 · Artificial Intelligence

Implementing Face Recognition with PHP and OpenCV

This article provides a step‑by‑step tutorial on installing OpenCV and the PHP OpenCV extension on Ubuntu, then demonstrates how to write PHP code for face detection and recognition using OpenCV's cascade classifier and FisherFaceRecognizer, complete with example scripts and usage instructions.

Computer Vision · OpenCV · PHP
0 likes · 7 min read
php Courses
Jul 17, 2023 · Artificial Intelligence

Implementing Facial Landmark Detection with PHP and OpenCV

This tutorial demonstrates how to set up PHP and OpenCV, install necessary libraries, write and run a PHP script that detects faces and extracts facial landmarks, and saves the annotated image, providing a practical introduction to facial landmark detection in computer vision.

Computer Vision · Facial Landmark Detection · Image Processing
0 likes · 5 min read
Rare Earth Juejin Tech Community
Jul 12, 2023 · Artificial Intelligence

Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance

This article provides an in‑depth overview of Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation of pre‑trained position embeddings, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.

Computer Vision · Deep Learning · Fine‑tuning
0 likes · 25 min read
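The patch-to-token step described above can be sketched without any framework. This minimal illustration assumes an image stored as nested lists shaped [height][width][channels] whose sides are divisible by the patch size; in ViT each flattened patch is then mapped by a learned linear projection and summed with a position embedding, which this sketch leaves out. `patchify` is a hypothetical name:

```python
def patchify(image, patch_size):
    """Split an image [H][W][C] into flattened, non-overlapping patches.

    Returns N = (H / P) * (W / P) vectors, each of length P * P * C,
    ordered row by row to form the token sequence.
    """
    h, w = len(image), len(image[0])
    p = patch_size
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patch = [value
                     for y in range(top, top + p)
                     for x in range(left, left + p)
                     for value in image[y][x]]
            patches.append(patch)
    return patches

image = [[[y * 4 + x] for x in range(4)] for y in range(4)]  # 4x4 single-channel toy image
print(patchify(image, 2))  # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

With a 224 × 224 RGB input and 16 × 16 patches, this yields 14 × 14 = 196 tokens of length 16 × 16 × 3 = 768, the sequence ViT-B/16 uses at that resolution.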
Kuaishou Large Model
Jul 7, 2023 · Artificial Intelligence

How HairStep Revolutionizes Single-View 3D Hair Reconstruction

This paper introduces HairStep, a novel intermediate representation combining Strand Maps and Depth Maps, and demonstrates how it reduces domain gap and improves single‑view 3D hair reconstruction accuracy across multiple algorithms, supported by new annotated datasets (HiSa, HiDa) and fair evaluation metrics.

3D hair reconstruction · Computer Vision · Dataset
0 likes · 11 min read
Efficient Ops
Jun 26, 2023 · Artificial Intelligence

How Multimodal AI Is Revolutionizing Credit Card Fraud Detection

Amid tightening financial regulations, ICBC's software team proposes a multimodal AI anti‑fraud framework that combines image, video, and structured data to detect deep‑fake, mask, and forged‑document attacks, enriches verification with cross‑modal cues, and outlines future expansion to text and speech modalities.

AI · Computer Vision · Deep Learning
0 likes · 7 min read
Programmer DD
Jun 20, 2023 · Artificial Intelligence

Yann LeCun: Today's AI Still Below Dog Level – Inside Meta’s Voicebox, MusicGen & I‑JEPA

Meta’s chief AI scientist Yann LeCun warned that current large language models still fall short of human and even dog intelligence, citing their lack of real‑world understanding, while Meta unveiled three new generative AI models—Voicebox for speech, MusicGen for music, and I‑JEPA for image reasoning—showcasing both progress and remaining limitations.

Computer Vision · Music generation · Speech synthesis
0 likes · 7 min read
Xiaohongshu Tech REDtech
Jun 20, 2023 · Artificial Intelligence

Open-Vocabulary Object Attribute Recognition with OvarNet: A Unified Framework for Detection and Attribute Classification

At CVPR 2023 the Xiaohongshu team presented OvarNet, a unified one‑stage Faster‑RCNN model built on CLIP that uses prompt learning and knowledge distillation to jointly detect objects and recognize open‑vocabulary attributes, achieving state‑of‑the‑art results on VAW, MS‑COCO, LSA and OVAD datasets.

Computer Vision · Multimodal Learning · attribute recognition
0 likes · 12 min read
Meituan Technology Team
Jun 15, 2023 · Artificial Intelligence

Meituan Technical Team's 8 CVPR 2023 Papers: Overview and Insights

This article reviews eight CVPR 2023 papers selected by Meituan’s technology team, covering self‑supervised learning, domain adaptation, federated learning, object detection, 3D reconstruction, GAN‑based pre‑training, RGB‑T tracking, vision‑language navigation, and visual‑textual layout generation, highlighting each work’s methodology, experiments, and reported performance gains.

3D Object Detection · CVPR 2023 · Computer Vision
0 likes · 15 min read
Alimama Tech
Jun 14, 2023 · Artificial Intelligence

Intelligent Live‑Streaming Video Editing Techniques and Practices

Alimama’s end‑to‑end intelligent clipping system automatically transforms long live‑stream e‑commerce videos into short, high‑quality ads by segmenting streams, classifying speech with GPT‑based tags, selecting visually appealing clips, arranging coherent storylines, and applying effects, achieving 96% classification accuracy and improved advertising efficiency.

AI · Computer Vision · Content Optimization
0 likes · 14 min read
Network Intelligence Research Center (NIRC)
Jun 9, 2023 · Artificial Intelligence

2023 NIRC PhD Graduates Reveal Cutting-Edge AI and Network Intelligence Research

In 2023 the Network Intelligence Research Center celebrated its largest PhD graduating class: seven scholars whose dissertations span deep‑vision hand‑gesture estimation, multi‑scenario network transmission, graph alignment, interactive streaming, knowledge‑defined networking, wireless body‑area networking, and more, showcasing significant AI‑driven advances and high‑impact publications.

Computer Vision · Deep Learning · Graph Alignment
0 likes · 30 min read
DataFunSummit
May 31, 2023 · Artificial Intelligence

Evolution of Face Detection Techniques: Datasets, Research Directions, and Future Work

This article reviews the evolution of face detection, covering the WIDER FACE dataset, major research directions such as feature fusion, label assignment, auxiliary supervision, anchor‑free methods, and NAS‑based designs, summarizes key papers from S3FD to MogFace, introduces ModelScope implementations, and outlines future challenges and opportunities.

AI research · Computer Vision · Datasets
0 likes · 13 min read
Test Development Learning Exchange
May 27, 2023 · Artificial Intelligence

Eight Essential OpenCV Examples for Image Processing

This article introduces eight fundamental OpenCV examples—including image reading, display, grayscale conversion, edge detection, resizing, Gaussian blur, and face detection—providing concise Python code snippets and explanations to help readers quickly apply these common computer‑vision techniques.

Code Examples · Computer Vision · Image Processing
0 likes · 5 min read
DataFunTalk
May 13, 2023 · Artificial Intelligence

Multimedia Content Understanding at Weibo: Video Summarization, Quality Assessment, OCR, Embedding, and CV‑CUDA Optimization

This article presents Weibo's comprehensive multimedia content understanding pipeline, covering video summarization techniques, quality assessment models, OCR advancements, video embedding strategies, and the performance benefits of CV‑CUDA acceleration, while highlighting real‑world applications and engineering trade‑offs.

CV-CUDA, Computer Vision, Deep Learning
0 likes · 32 min read
AntTech
May 6, 2023 · Artificial Intelligence

Wu Wenjun AI Science and Technology Award Honors Tsinghua and Ant Group's Unconstrained Human Portrait Perception and Understanding Technology

The 2022 Wu Wenjun Artificial Intelligence Science and Technology Award recognized a decade‑long collaborative effort by Tsinghua University and Ant Group's security lab for breakthrough research on unconstrained human portrait perception and understanding, highlighting three core scientific discoveries, extensive academic impact, and large‑scale commercial applications in identity verification.

AI Awards, Ant Group, Computer Vision
0 likes · 5 min read
Baidu Tech Salon
Apr 25, 2023 · Game Development

How to Build Sensor‑Free Motion Games with PP‑TinyPose and FastDeploy

This article explains how to develop sensor‑less motion-controlled games by leveraging the PP‑TinyPose keypoint detection model and FastDeploy inference tool, detailing the required setup, code snippets, and a reusable PyQt5 framework for creating webcam‑driven interactive demos.

AI, Computer Vision, FastDeploy
0 likes · 11 min read
DataFunSummit
Apr 20, 2023 · Artificial Intelligence

SenseTime Unveils Multimodal ‘SenseNova’ Large Model System and Its Industry Applications

SenseTime introduced its visual‑centric multimodal large‑model platform SenseNova, detailing model scaling, extensive AI infrastructure, diverse industry deployments such as autonomous driving and generative content, and the challenges of compute efficiency and data acquisition in the race for advanced AI.

AI Infrastructure, Computer Vision, large models
0 likes · 13 min read
Baidu Tech Salon
Apr 14, 2023 · Artificial Intelligence

How PaddleDepth and Paddle3D Enable Low‑Cost 3D Vision Development

This article examines the challenges of 3D vision data acquisition and explains how Baidu's PaddleDepth and Paddle3D toolkits provide low‑cost depth collection, super‑resolution, and end‑to‑end perception pipelines, showcasing performance on KITTI and Middlebury datasets with code examples.

3D vision, Computer Vision, Depth estimation
0 likes · 12 min read
AntTech
Apr 12, 2023 · Artificial Intelligence

Ant Technology Research Institute Interactive Intelligence Lab – 13 Papers Accepted at CVPR 2023 and Recent AI Research Highlights

The Ant Technology Research Institute’s Interactive Intelligence Lab announced that 13 of its papers were accepted at CVPR 2023, alongside other recent achievements in generative models and 3D vision, highlighting collaborations with top universities and summarizing the lab’s contributions to artificial intelligence research.

3D vision, CVPR, Computer Vision
0 likes · 6 min read
Baidu Tech Salon
Apr 7, 2023 · Artificial Intelligence

Ambiguity-Resistant Semi-supervised Learning (ARSL) for Single-stage Object Detection

ARSL, an ambiguity‑resistant semi‑supervised learning framework for single‑stage object detection, introduces Joint‑Confidence Estimation and Task‑Separation Assignment to resolve selection and assignment ambiguities in pseudo‑labels, thereby markedly improving pseudo‑label quality and achieving state‑of‑the‑art AP gains on COCO benchmarks.

ARSL, Computer Vision, Semi-supervised Learning
0 likes · 8 min read
Baidu Geek Talk
Mar 16, 2023 · Artificial Intelligence

PaddleDetection v2.6 Release: PP-YOLOE Family Expansion and Advanced Detection Algorithms

PaddleDetection v2.6 expands the PP‑YOLOE family with rotated‑object, small‑object, dense‑object, and ultra‑lightweight edge‑GPU models; upgrades the PP‑Human and PP‑Vehicle toolboxes; releases semi‑supervised, few‑shot, and distillation learning methods; adds numerous state‑of‑the‑art algorithms; and improves the infrastructure with support for Python 3.10, EMA filtering, and AdamW.

Baidu, Computer Vision, Deep Learning
0 likes · 14 min read
政采云技术
Mar 9, 2023 · Artificial Intelligence

Comprehensive Overview of Object Detection: From Traditional Methods to Modern Deep Learning Models

This article provides a comprehensive overview of object detection: it describes traditional sliding‑window approaches and deep‑learning‑based two‑stage and one‑stage models such as R‑CNN, Faster R‑CNN, and the YOLO series, and discusses current challenges, improvement directions, and future research trends in the field.

Computer Vision, Deep Learning, R-CNN
0 likes · 29 min read
Python Programming Learning Circle
Mar 8, 2023 · Artificial Intelligence

Using ddddocr SDK for Captcha Recognition in Python

This article introduces the open‑source ddddocr SDK, demonstrates how to install it and use it in Python to automatically solve three common captcha types—slider, click‑based, and alphanumeric—providing code examples and result explanations for each.

Captcha, Computer Vision, OCR
0 likes · 4 min read
Meituan Technology Team
Feb 23, 2023 · Artificial Intelligence

Food2K: A Large-Scale Food Image Dataset and Progressive Region Enhancement Network

This article reviews the Food2K dataset and the Progressive Region Enhancement Network for large‑scale food image recognition, both presented in an IEEE T‑PAMI 2023 paper, detailing dataset construction, method design, extensive experiments, ablation studies, visualizations, and future research directions.

Computer Vision, Dataset, Fine-Grained Classification
0 likes · 31 min read
DaTaobao Tech
Feb 20, 2023 · Mobile Development

AR Foot Measurement and Hand Try-On Algorithms for Mobile Vision

The article presents a mobile‑vision solution that combines lightweight detection, line detection, segmentation, and 3‑D point‑cloud reconstruction to measure foot length to within 3 mm, and a MANO‑based hand try‑on system that predicts full mesh vertices for real‑time watch, phone, and ring fitting on smartphones.

AR, Computer Vision, Foot Measurement
0 likes · 18 min read
AsiaInfo Technology: New Tech Exploration
Feb 20, 2023 · Industry Insights

Why Pre‑trained Large Models Are the New Infrastructure for AI Applications

Pre‑trained large models are emerging as the foundational infrastructure for AI across industries; this article analyzes their technical advantages, application trends in NLP, CV and multimodal domains, presents a telecom customer‑service case study with performance benchmarks, and outlines future deployment challenges and research directions.

Computer Vision, NLP, Prompt Tuning
0 likes · 23 min read
DataFunTalk
Feb 11, 2023 · Artificial Intelligence

Accelerating Computer Vision Pipelines with CV-CUDA: Reducing Complexity and Performance Bottlenecks

This article explains how moving image preprocessing and post‑processing to GPU with the open‑source CV‑CUDA library dramatically reduces system complexity, eliminates CPU‑GPU bottlenecks, and delivers up to thirty‑fold performance gains for computer‑vision workloads across training and inference stages.

CV-CUDA, Computer Vision, Deep Learning
0 likes · 16 min read
DataFunTalk
Jan 12, 2023 · Artificial Intelligence

Tencent AI Lab's Advances in High‑Fidelity 3D Face Digitization and Evaluation

This article presents Tencent AI Lab's recent research on efficient 3D face digitization—including single‑photo, multi‑photo, and RGB‑D selfie pipelines—describes a detailed production workflow, introduces a new evaluation benchmark (REALY), and shares insights from a technical Q&A session.

3D face reconstruction, AI Lab, Computer Vision
0 likes · 11 min read
DataFunTalk
Jan 8, 2023 · Artificial Intelligence

Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Images

The paper introduces ABPN, an Adaptive Blend Pyramid Network that achieves precise, high‑quality skin retouching and garment wrinkle removal on 4K‑8K photos in real time by combining a context‑aware local retouching layer with a novel adaptive blend pyramid layer, addressing challenges of artifact‑free detail preservation and efficient high‑resolution processing.

Computer Vision, Deep Learning, adaptive blend pyramid
0 likes · 16 min read
Kuaishou Audio & Video Technology
Dec 30, 2022 · Artificial Intelligence

Unlocking Realistic Bokeh: Depth‑Aware Algorithms Behind Holiday Video Effects

This article explains the optical principles of bokeh (out‑of‑focus blur), describes a depth‑aware variable‑focus algorithm developed by Kuaishou’s audio‑video team, and details practical optimizations such as saliency detection, edge‑preserving weighting, and adaptive light‑spot effects that enable realistic, customizable holiday video filters.

Bokeh, Computer Vision, Depth estimation
0 likes · 11 min read
DataFunTalk
Dec 27, 2022 · Artificial Intelligence

Efficient Training for Very Large‑Scale Face Recognition and the FFC Framework

This article reviews the challenges of ultra‑large‑scale face recognition, presents existing solutions such as metric learning, PFC and VFC, and details the proposed FFC framework with dual loaders, ID groups, probe and gallery networks, plus experimental results showing its cost‑effective performance.

AI, Computer Vision, Deep Learning
0 likes · 7 min read
Kuaishou Tech
Dec 26, 2022 · Artificial Intelligence

ICDAR 2023-DSText Video Text Reading Competition Overview

The ICDAR 2023-DSText competition, launching on February 15, 2023, focuses on dense and small text detection and recognition in video, providing a YouTube‑sourced dataset of 100 videos, two challenge tasks, a detailed timeline, eligibility rules, and a list of international sponsoring institutions.

Computer Vision, Dataset, ICDAR
0 likes · 6 min read
Alibaba Cloud Developer
Dec 19, 2022 · Artificial Intelligence

How AI Transforms Football Video Analysis: Detection, Tracking, and Event Recognition

This article explores how artificial intelligence techniques such as deep learning, object detection, multi‑object tracking, and coordinate projection are applied to football video analysis to automatically detect the ball and players, map their positions onto the field, and recognize key events like shots and goals.

AI, Computer Vision, Sports Analytics
0 likes · 16 min read
DataFunTalk
Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Computer Vision, Deep Learning, Model architecture
0 likes · 16 min read
Laiye Technology Team
Dec 16, 2022 · Artificial Intelligence

Efficient Production of Scene-specific OCR Models Using an AI Platform

This article explains how a unified AI platform enables rapid, data‑driven creation, training, deployment, and evaluation of OCR models for visually distinct text regions such as seals, meter readings, license plates, and VIN codes, while minimizing hardware and annotation costs.

AI Platform, Computer Vision, Kubeflow
0 likes · 7 min read
DataFunSummit
Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar, 3D Avatar, Computer Vision
0 likes · 15 min read
DataFunTalk
Dec 5, 2022 · Artificial Intelligence

MogFace: A High‑Performance Face Detector with Dynamic Label Assignment, FP Context Analysis, and Pyramid‑Level Supervision

The article presents MogFace, a state‑of‑the‑art face detection system that combines a dynamic label‑assignment strategy, false‑positive context analysis, and pyramid‑layer ground‑truth supervision to achieve multiple top‑ranked results on the WIDER FACE benchmark, and details its architecture, observations, and experimental validation.

Computer Vision, MogFace, dynamic label assignment
0 likes · 7 min read
DataFunTalk
Nov 17, 2022 · Artificial Intelligence

Enhance the Visual Representation via Discrete Adversarial Training

The Alibaba AAIG team proposes Discrete Adversarial Training (DAT), which leverages VQGAN‑based discretization to generate natural‑looking adversarial samples that improve visual representation robustness and transferability across classification, self‑supervised learning, and object detection tasks without sacrificing accuracy, achieving new state‑of‑the‑art results on multiple benchmarks.

Computer Vision, Robustness, Visual Representation
0 likes · 12 min read
Tencent Cloud Developer
Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix—Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model—featuring innovations such as massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing that achieve top cross‑modal retrieval results.

Computer Vision, Multimodal AI, advertising technology
0 likes · 18 min read
Shopee Tech Team
Nov 10, 2022 · Artificial Intelligence

ShopeeVideo OCR: Multi-language Text Recognition System for E-commerce Video

ShopeeVideo OCR is a multi‑language text‑recognition system for Southeast Asian e‑commerce videos that unifies detection, Transformer‑based recognition, layout analysis, and large‑scale synthetic data generation to handle Indonesian, Filipino, English, Vietnamese, Thai and Chinese scripts, delivering industry‑leading accuracy and winning thirteen ICDAR first‑place awards.

Computer Vision, Deep Learning, Multi-language OCR
0 likes · 15 min read
Rare Earth Juejin Tech Community
Nov 9, 2022 · Artificial Intelligence

Detailed Explanation of Fully Convolutional Networks (FCN) for Semantic Segmentation

This article provides a comprehensive, beginner‑friendly overview of semantic segmentation, focusing on the pioneering Fully Convolutional Network (FCN) architecture, its variants (FCN‑32s, FCN‑16s, FCN‑8s), underlying concepts, loss computation, and practical tips for working with the VOC dataset.
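The FCN‑8s variant fuses class‑score maps taken from three depths of the network before upsampling to full resolution. A minimal NumPy sketch of that fusion order, under stated simplifications: nearest‑neighbour upsampling stands in for FCN's learned (bilinear‑initialized) transposed convolutions, the cropping FCN performs is skipped by assuming H and W are divisible by 32, and the helper names upsample and fcn8s_fuse are hypothetical.

```python
import numpy as np

def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) score map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fcn8s_fuse(score32, score16, score8):
    """Fuse class-score maps at strides 32, 16 and 8, FCN-8s style.

    Shapes: (C, H/32, W/32), (C, H/16, W/16), (C, H/8, W/8).
    Returns a (C, H, W) score map.
    """
    fuse16 = upsample(score32, 2) + score16  # skip from pool4 (as in FCN-16s)
    fuse8 = upsample(fuse16, 2) + score8     # FCN-8s additionally adds pool3
    return upsample(fuse8, 8)                # back to input resolution


num_classes, h, w = 21, 64, 96  # 21 = 20 VOC classes + background
rng = np.random.default_rng(0)
scores = fcn8s_fuse(rng.standard_normal((num_classes, h // 32, w // 32)),
                    rng.standard_normal((num_classes, h // 16, w // 16)),
                    rng.standard_normal((num_classes, h // 8, w // 8)))
labels = scores.argmax(axis=0)  # per-pixel class prediction
print(scores.shape, labels.shape)  # (21, 64, 96) (64, 96)
```

FCN‑32s would return upsample(score32, 32) directly; each extra skip lets finer, shallower features sharpen the coarse prediction.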

AlexNet, Computer Vision, FCN
0 likes · 14 min read
Zhuanzhuan Tech
Nov 9, 2022 · Artificial Intelligence

Applying OCR to Game Skin Recognition: Filtering Owned Skins and Tolerant Text Matching

This article describes how OCR technology is used in a game marketplace to automatically extract skin parameters from user‑uploaded images, outlines methods for separating owned skin regions from background using color analysis, and presents a tolerant matching solution based on Rabin‑Karp hashing to handle OCR errors.
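The summary does not spell out how the article makes Rabin-Karp tolerant of OCR errors. One common approach is a pigeonhole filter: if a pattern occurs with at most k substituted characters, splitting it into k+1 pieces guarantees that at least one piece matches exactly. The sketch below assumes substitution-only errors (no inserted or dropped characters); the function names rabin_karp_all and tolerant_find are hypothetical, and this is not presented as the article's own method.

```python
def rabin_karp_all(text, pat, base=256, mod=1_000_000_007):
    """Return every start index where `pat` occurs exactly in `text`."""
    n, m = len(text), len(pat)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    h_pat = h_win = 0
    for i in range(m):
        h_pat = (h_pat * base + ord(pat[i])) % mod
        h_win = (h_win * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # A hash hit is confirmed by direct comparison to rule out collisions.
        if h_win == h_pat and text[i:i + m] == pat:
            hits.append(i)
        if i + m < n:
            # Roll the window: drop text[i], append text[i + m].
            h_win = ((h_win - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits


def tolerant_find(text, pat, k):
    """Start indices where `pat` matches `text` with at most k substituted chars."""
    m, n = len(pat), len(text)
    if m == 0 or m > n:
        return []
    if k >= m:  # every window qualifies; the pigeonhole filter would be incomplete
        return list(range(n - m + 1))
    pieces = k + 1
    size = m // pieces
    candidates = set()
    for j in range(pieces):
        lo = j * size
        hi = m if j == pieces - 1 else lo + size  # last piece takes the remainder
        # Pigeonhole: an occurrence with <= k errors has >= 1 error-free piece.
        for pos in rabin_karp_all(text, pat[lo:hi]):
            start = pos - lo
            if 0 <= start <= n - m:
                candidates.add(start)
    result = []
    for start in sorted(candidates):
        # Verify each candidate window by counting substitutions (Hamming distance).
        mismatches = sum(a != b for a, b in zip(text[start:start + m], pat))
        if mismatches <= k:
            result.append(start)
    return result


print(tolerant_find("Skin: Drag0n Lore (AWP)", "Dragon Lore", k=2))  # [6]
```

Insertions or deletions (for example two glyphs merged into one) shift the alignment, so handling those would need an edit-distance check in place of the Hamming count used here.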

Computer Vision, Game Development, Image Processing
0 likes · 10 min read
DataFunSummit
Oct 19, 2022 · Artificial Intelligence

Series Six of the Integer Intelligence Autonomous Driving Dataset Collection – Overview and Highlights

This article presents a comprehensive overview of several publicly available autonomous driving datasets, focusing on Series Six of the Integer Intelligence collection, which includes StreetLearn, UTBM RoboCar, Multi‑Vehicle Stereo Event Camera, comma2k19, the Annotated Laser Dataset, Ford, and Oxford RobotCar, detailing their sources, download links, publication years, key features, and research relevance.

Computer Vision, Datasets, Robotics
0 likes · 10 min read
Baidu Geek Talk
Oct 17, 2022 · Artificial Intelligence

OCR Technology: PaddleOCR and Paddle.js Integration

The article explains OCR fundamentals and details how Baidu’s open‑source PaddleOCR suite can be converted and run in browsers via the @paddlejs-models/ocr SDK, describing model initialization, detection and CRNN‑based recognition pipelines, and presenting benchmark results that show the newer ch_PP-OCRv2 model achieving higher accuracy and faster inference than the mobile variant.
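CRNN‑style recognizers are usually trained with CTC loss, and the simplest way to turn their per‑timestep outputs into text is greedy (best‑path) decoding: take the most likely class at each step, merge consecutive repeats, then drop the blank symbol. A minimal sketch, assuming blank is class 0 and a hypothetical character list; PaddleOCR's own post‑processing may differ in details such as score handling.

```python
import numpy as np

def ctc_greedy_decode(probs: np.ndarray, charset: str) -> str:
    """Best-path CTC decoding.

    probs: (T, C) per-timestep class scores; class 0 is the CTC blank and
           class i >= 1 maps to charset[i - 1] (an assumed layout).
    """
    best = probs.argmax(axis=1).tolist()  # most likely class at each timestep
    chars = []
    prev = -1
    for idx in best:
        # Emit only when the class changes (merging repeats) and is not blank.
        if idx != prev and idx != 0:
            chars.append(charset[idx - 1])
        prev = idx
    return "".join(chars)


# Usage: one-hot scores for the path a a _ a b b _ _ c (_ = blank).
path = [1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(np.eye(5)[path], "abcd"))  # aabc
```

The blank between the two a's is what lets CTC emit a doubled letter; without it the repeats would collapse into a single "a".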

AI, Computer Vision, OCR
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Oct 12, 2022 · Artificial Intelligence

Unlock Vision AI: How EasyCV Streamlines Datasets and Model Training

This article introduces EasyCV, an open‑source all‑in‑one visual algorithm platform that abstracts diverse data sources, provides SOTA self‑supervised models, and offers ready‑to‑download datasets for image classification, object detection, segmentation, and pose estimation, complete with configuration examples.

Computer Vision, Datasets, Deep Learning
0 likes · 9 min read
AntTech
Sep 27, 2022 · Artificial Intelligence

Ant Group’s Research Institute Publishes Four NeurIPS 2022 Papers on Advanced Computer Vision and AI

Ant Group’s Ant Technology Research Institute had four papers from its Visual Intelligence Lab accepted at NeurIPS 2022, covering rank diminishing in deep networks, geometry‑aware 3D image synthesis, dynamic discriminators for GANs, and uncertainty‑aware hierarchical refinement for incremental classification, highlighting the institute’s cutting‑edge AI research.

AI research, Computer Vision, Deep Learning
0 likes · 8 min read
Zhengtong Technical Team
Sep 22, 2022 · Artificial Intelligence

How YOLOv5 Powers Real‑Time City Management Video Analysis

This article explains the background, workflow, and technical details of using the YOLOv5 one‑stage object detection algorithm to enable fast, accurate video analytics for urban management, covering data augmentation, backbone design, FPN‑PAN neck, and prediction output processing.
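The prediction output processing step in YOLOv5, as in most one‑stage detectors, ends with confidence filtering followed by non‑maximum suppression (NMS). A plain‑Python sketch of greedy NMS, assuming boxes in (x1, y1, x2, y2) pixel coordinates and thresholds chosen to mirror the defaults commonly used with YOLOv5; the official code delegates the suppression to torchvision.ops.nms and adds per‑class offsets, both omitted here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Greedy NMS; returns indices of the kept boxes, highest score first."""
    # 1) Discard low-confidence predictions, 2) sort the rest by score.
    order = sorted((i for i in range(len(boxes)) if scores[i] >= conf_thres),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # 3) Suppress remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thres]
    return keep


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.82 and is suppressed; box 3 never reaches NMS because its score is below the confidence threshold.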

AI, Computer Vision, Deep Learning
0 likes · 8 min read
HomeTech
Sep 20, 2022 · Artificial Intelligence

Deep Learning for Image Classification: Classic Networks, Attention Mechanisms, and Their Application to Fine‑Grained Classification and Automotive Series Recognition

This article reviews the evolution of deep‑learning image‑classification networks, surveys attention mechanisms for fine‑grained tasks, describes the CVPR 2022 FGVC9 competition solution using RegNetY and random attention cropping, and discusses its deployment in automotive series recognition along with future challenges.

CVPR, Computer Vision, Deep Learning
0 likes · 19 min read
Programmer DD
Sep 13, 2022 · Artificial Intelligence

Why AI Porn Detection Still Struggles: Key Challenges Explained

AI-based porn detection uses deep neural networks to classify images, but faces tough hurdles such as visual similarity with benign content, subjective standards for nudity, and vulnerabilities from training‑data dependence, meaning human moderators remain essential for reliable safety.

AI moderation, Computer Vision, Content Safety
0 likes · 3 min read
DataFunSummit
Sep 6, 2022 · Artificial Intelligence

Recent Advances in Self‑Supervised Learning for Text Recognition (OCR)

This article reviews recent progress in applying self‑supervised learning to OCR text recognition, covering mainstream model architectures, key considerations for self‑supervised tasks on text images, and detailed analyses of representative papers such as SeqCLR, SimAN, and DiG, highlighting their designs, experiments, and results.

Computer Vision, OCR, contrastive learning
0 likes · 20 min read
ByteDance Terminal Technology
Sep 1, 2022 · Artificial Intelligence

Hybrid Computer Vision and Deep Learning for Automated UI Background Color Extraction and Assertion

This article presents a hybrid pipeline combining traditional computer vision techniques and deep learning models to automatically extract and verify text background colors in UI automation screenshots, effectively addressing challenges like limited training data and complex borders to significantly reduce manual inspection costs while achieving high accuracy and robustness in production environments.

Automated Testing, Computer Vision, Deep Learning
0 likes · 10 min read
DevOps
Aug 23, 2022 · Artificial Intelligence

Intelligent Automation Testing: Self‑Healing and Machine‑Learning Techniques

This article reviews the evolution of automated testing toward intelligent solutions, explaining self‑healing mechanisms, machine‑learning‑driven object recognition, computer‑vision and OCR approaches, industry tools such as Healenium and Airtest, and future prospects for zero‑code AI‑powered test automation.

AI, Computer Vision, OCR
0 likes · 13 min read
DaTaobao Tech
Aug 19, 2022 · Artificial Intelligence

SepLUT: Separable Lookup Tables for Real-time Image Enhancement

SepLUT, a new separable lookup‑table framework, splits color enhancement into a 1‑D LUT for component‑independent (per‑channel) adjustments and a 3‑D LUT for component‑correlated (cross‑channel) adjustments, both predicted by a lightweight CNN; the design is quantization‑friendly, runs in real time in ISP settings, and achieves state‑of‑the‑art results on the FiveK benchmark.
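To make the two‑stage idea concrete, here is a NumPy sketch of how a per‑channel 1‑D LUT followed by a 3‑D LUT with trilinear interpolation could be applied to an RGB image in [0, 1]. The LUT sizes, the helper names apply_1d_lut and apply_3d_lut, and the identity tables in the usage are assumptions for illustration; in SepLUT both tables are predicted per image by the lightweight CNN, which is not shown.

```python
import numpy as np

def apply_1d_lut(img: np.ndarray, lut1d: np.ndarray) -> np.ndarray:
    """Per-channel curves. img: (H, W, 3) in [0, 1]; lut1d: (3, N) output values."""
    grid = np.linspace(0.0, 1.0, lut1d.shape[1])
    out = np.empty_like(img, dtype=float)
    for c in range(3):
        # Piecewise-linear interpolation of each channel's own curve.
        out[..., c] = np.interp(img[..., c], grid, lut1d[c])
    return out


def apply_3d_lut(img: np.ndarray, lut3d: np.ndarray) -> np.ndarray:
    """Trilinear lookup. lut3d: (M, M, M, 3), indexed as [r, g, b] -> RGB."""
    m = lut3d.shape[0]
    pos = np.clip(img, 0.0, 1.0) * (m - 1)             # continuous lattice coordinates
    i0 = np.minimum(np.floor(pos).astype(int), m - 2)  # lower lattice corner
    f = pos - i0                                       # fractional offsets in [0, 1]
    out = np.zeros_like(img, dtype=float)
    # Accumulate the 8 surrounding lattice points with their trilinear weights.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((f[..., 0] if dr else 1 - f[..., 0])
                     * (f[..., 1] if dg else 1 - f[..., 1])
                     * (f[..., 2] if db else 1 - f[..., 2]))
                vals = lut3d[i0[..., 0] + dr, i0[..., 1] + dg, i0[..., 2] + db]
                out += w[..., None] * vals
    return out


# Usage: identity tables leave the image unchanged.
rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))
lut1d = np.tile(np.linspace(0.0, 1.0, 16), (3, 1))  # identity curves
axis = np.linspace(0.0, 1.0, 9)
lut3d = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)  # identity cube
print(np.allclose(apply_3d_lut(apply_1d_lut(img, lut1d), lut3d), img))  # True
```

The split is what keeps the 3‑D table small: the cheap 1‑D curves handle per‑channel tone changes at fine resolution, leaving only cross‑channel effects to the coarse cube.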

Computer Vision, Deep Learning, Real-Time
0 likes · 12 min read
FunTester
Aug 18, 2022 · Artificial Intelligence

How AI Can Automate UI Testing: Building Image‑Based Anomaly Detection

This article examines the evolution of mobile UI testing toward AI‑driven approaches, outlines the challenges of large‑scale apps, and details a practical workflow for constructing image‑based anomaly datasets, training a ResNet‑18 model, and iterating on detection performance.

AI testing, Computer Vision, Deep Learning
0 likes · 13 min read
Beike Product & Technology
Aug 12, 2022 · Artificial Intelligence

Green Area Generation Method Based on Pix2pix Model

This paper proposes a pix2pix‑based method to automatically generate green areas for large‑scale outdoor 3D scene modeling, detailing dataset creation via OpenCV segmentation, model training, region partitioning, and experimental results showing a 93.8% acceptance rate, significantly improving efficiency over manual drawing.

3D modeling, Computer Vision, GAN
0 likes · 14 min read
ITPUB
Jul 21, 2022 · Artificial Intelligence

From Blur to Brilliance: How AI‑Powered Image Quality Assessment Transformed 58.com’s Recruitment Images

This article reviews image quality assessment fundamentals, modern CNN‑based IQA models, and their deployment at 58.com to automatically score, filter, and rank millions of recruitment photos, achieving a drop in low‑quality images from 9% to zero while boosting overall accuracy to 94.7%.

Business Application, CNN, Computer Vision
0 likes · 19 min read