Tagged articles

650 articles

Page 3 of 7

Mar 6, 2024 · Artificial Intelligence

Short Video Quality Assessment Competition (KVQ) at CVPR NTIRE 2024

The CVPR NTIRE 2024 workshop hosts the first short‑video quality assessment competition, introducing the KVQ dataset of 4,200 videos across nine scenes, providing training/validation data, a baseline 3D Swin‑Transformer model, detailed competition rules, rewards, and organizer contacts.

AI · Computer Vision · Dataset

0 likes · 7 min read

DaTaobao Tech

Mar 6, 2024 · Artificial Intelligence

AI Clothing Graffiti Project: Implementation and Optimization of AIGC Technology in Taobao Life 2

The AI Clothing Graffiti Project in Taobao Life 2 uses Stable Diffusion, ControlNet, and LoRA to let users generate and stylize clothing designs from text and image prompts, with parallel processing, face repair, and content filtering in the pipeline. The feature has launched, and the team is inviting algorithm engineers to join.

AI · AIGC · Computer Vision

0 likes · 14 min read

DataFunSummit

Mar 5, 2024 · Artificial Intelligence

AI-Driven Intelligent Management and Regulation of Mold Temperature in Smart Manufacturing

This article explores how artificial intelligence, computer vision, and control algorithms are applied to smart manufacturing for intelligent mold temperature detection, cooling flow regulation, and full‑process system alerts, presenting a detailed solution architecture, key technologies, and a real‑world case study.

AI · Computer Vision · Mold Temperature Control

0 likes · 11 min read

NewBeeNLP

Mar 4, 2024 · Artificial Intelligence

A Curated Tour of Mamba Papers: 25 Cutting‑Edge State‑Space Model Innovations

This article presents a GitHub‑hosted collection of 25 recent research papers on Mamba and its variants, summarizing each work’s core contributions across sequence modeling, vision, medical imaging, graph analysis, and multimodal tasks, and highlighting their performance gains over prior methods.

Computer Vision · Deep Learning · Mamba

0 likes · 13 min read

Architects' Tech Alliance

Feb 18, 2024 · Artificial Intelligence

How OpenAI’s Sora Redefines Video Generation with 3‑D Consistency and World Simulation

OpenAI’s Sora model introduces a diffusion‑transformer approach that generates high‑fidelity, 60‑second videos with consistent 3‑D camera motion, long‑term object persistence, and the ability to simulate interactive digital worlds, backed by a detailed technical report and research paper.

Computer Vision · OpenAI · Sora

0 likes · 9 min read

Xiaohongshu Tech REDtech

Jan 31, 2024 · Artificial Intelligence

Encoding‑Alignment‑Interaction (EAI) Framework for Full‑Body Human Motion Forecasting

The Encoding‑Alignment‑Interaction (EAI) framework predicts full‑body human motion—including detailed hand joints—by extracting spatio‑temporal features with DCT and GCNs, aligning heterogeneous body‑hand representations via Cross‑Context Alignment, and modeling semantic and physical interactions through Cross‑Context Interaction, achieving state‑of‑the‑art accuracy on the GRAB dataset.

Computer Vision · EAI framework · cross-context alignment

0 likes · 15 min read

DaTaobao Tech

Jan 31, 2024 · Artificial Intelligence

Highlights of Recent AI Research Papers from Top Conferences (2023)

The article curates standout AI papers from 2023 CCF‑A conferences—including CVPR, ICLR, ACM MM, and INFORMS—showcasing advances such as Swin‑Transformer video quality assessment, cross‑modal e‑commerce product search, transformer‑based vehicle routing heuristics, diffusion‑driven dance generation, and reinforcement‑learning inventory replenishment.

AI · Computer Vision · Multimedia

0 likes · 23 min read

Huolala Tech

Jan 25, 2024 · Artificial Intelligence

How Open‑Vocabulary Detection and Segment‑Anything Are Revolutionizing Visual AI at Huolala

This article reviews traditional computer‑vision tasks—classification, detection, and segmentation—highlights their limitations, introduces open‑vocabulary detection and segment‑anything models such as GLIP, Grounding DINO, and SAM, and details how Huolala applies these advances to driver‑license, packing, and vehicle‑sticker inspections for safer, more efficient AI‑driven operations.

Computer Vision · Segmentation · object detection

0 likes · 20 min read

AsiaInfo Technology: New Tech Exploration

Jan 12, 2024 · Artificial Intelligence

Exploring NeRF: From Theory to Real-World 3D Reconstruction Tools

This article introduces Neural Radiance Fields (NeRF) as a cutting‑edge AI technique for high‑quality 3D reconstruction, explains its core principles and advantages, outlines a step‑by‑step building workflow, reviews popular tools such as Luma AI, NVIDIA Instant NeRF and Nerfstudio, and offers a forward‑looking summary of its potential and challenges.

3D reconstruction · AI · Computer Vision

0 likes · 12 min read

Python Programming Learning Circle

Jan 8, 2024 · Artificial Intelligence

Human Skin Detection with OpenCV: YCrCb and HSV Based Methods

This article demonstrates how to use Python's OpenCV library for human skin detection by converting images to YCrCb and HSV color spaces, applying Gaussian blur, Otsu thresholding, and range‑based segmentation, with step‑by‑step installation, code examples, and visual results.

Computer Vision · HSV · Image Processing

0 likes · 10 min read
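
The range-based YCrCb segmentation mentioned in the summary above can be illustrated without OpenCV. The following is a minimal pure-Python sketch, not the article's code: the conversion uses the standard full-range BT.601 coefficients, while the Cr/Cb bounds (133 to 173 and 77 to 127) and all function names are assumptions chosen for illustration and would need tuning for real images and lighting.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cr, Cb), full-range BT.601."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128

    def clamp(v):
        # Round to the nearest integer and keep the value in 0..255.
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cr), clamp(cb)


# Assumed, illustrative skin bounds on the chroma channels.
CR_RANGE = (133, 173)
CB_RANGE = (77, 127)


def is_skin(r, g, b):
    """Return True when the pixel's Cr and Cb both fall inside the assumed ranges."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return CR_RANGE[0] <= cr <= CR_RANGE[1] and CB_RANGE[0] <= cb <= CB_RANGE[1]


def skin_mask(image):
    """Build a 0/1 mask for an image given as rows of (r, g, b) tuples."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

Calling `skin_mask` on a small list of RGB rows returns a mask of the same shape. Thresholding only the chroma channels is what makes this approach less sensitive to brightness than thresholding RGB directly.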

21CTO

Dec 17, 2023 · Artificial Intelligence

Remembering Tang Xiao'ou: The Visionary Behind Modern Facial Recognition

AI pioneer Tang Xiao'ou, co‑founder of SenseTime and former director of leading computer‑vision labs, passed away in December 2023, leaving a legacy of groundbreaking facial‑recognition algorithms, influential mentorship, and a profound impact on the global artificial‑intelligence community.

AI Pioneer · Computer Vision · artificial intelligence

0 likes · 7 min read

We-Design

Dec 13, 2023 · Artificial Intelligence

How AI-Powered Beauty Filters Evolved: From Classic Portraits to Real-Time Video Effects

This article traces the evolution of beauty filter technology from ancient artistic enhancements to modern AI-driven real-time video effects, detailing key techniques like face detection, skin smoothing, AR integration, and shifting user preferences, while reflecting on its cultural impact on social media aesthetics.

AI · AR · Computer Vision

0 likes · 9 min read

Airbnb Technology Team

Dec 8, 2023 · Artificial Intelligence

Leveraging Image Aesthetics and Photo Sorting Algorithms to Enhance Airbnb Listings

Airbnb’s new computer‑vision pipeline trains a deep‑learning aesthetic model with an EMD loss to rank photos, automatically sorts new‑listing images by design and room type, and scales real‑time similarity search via HNSW‑based ANN on AWS OpenSearch, boosting click‑through, bookings, and enabling unsupervised visual recommendations.

Airbnb · Computer Vision · Deep Learning

0 likes · 9 min read
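
The EMD loss named in the summary above compares two distributions over ordered score buckets. The sketch below shows one commonly used form, the normalized distance between cumulative distributions; it is an assumption that this matches the formulation in the Airbnb article, and the function name and the default exponent `r=2` are illustrative.

```python
def emd_loss(predicted, target, r=2):
    """Earth Mover's Distance between two distributions over ordered bins.

    Both inputs are sequences of probabilities of equal length.
    The distance is computed from the gap between their running sums (CDFs).
    """
    if len(predicted) != len(target):
        raise ValueError("distributions must have the same number of bins")
    cdf_pred = 0.0
    cdf_true = 0.0
    total = 0.0
    for p, q in zip(predicted, target):
        cdf_pred += p
        cdf_true += q
        total += abs(cdf_pred - cdf_true) ** r
    return (total / len(predicted)) ** (1.0 / r)
```

Unlike a plain cross-entropy, this loss grows with how far the predicted mass sits from the true bucket, so predicting a neighbouring score is penalized less than predicting a distant one.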

IT Services Circle

Dec 6, 2023 · Artificial Intelligence

AI Image Outpainting: Unexpected Transformations and How It Works

The article showcases a series of humorous and surprising AI‑generated image expansions from Douyin, explains the underlying outpainting technology, and discusses why such tools are both entertaining and useful despite occasional odd results.

AI · Computer Vision · Deep Learning

0 likes · 6 min read

Python Programming Learning Circle

Nov 30, 2023 · Artificial Intelligence

Common Python Libraries for Computer Vision Projects

This article introduces ten popular Python libraries for computer vision, describing their main features, typical applications, and providing concise code examples to help beginners and practitioners quickly choose and use the right tools for image processing and deep learning tasks.

Computer Vision · Image Processing · Python

0 likes · 10 min read

DataFunTalk

Nov 24, 2023 · Artificial Intelligence

Open Vocabulary Detection Contest 2023: Summary of Winning Teams' Technical Solutions

The article reviews the Open Vocabulary Detection Contest organized by the Chinese Society of Image and Graphics and 360 AI Institute, describing the competition setup, dataset characteristics, and detailed winning approaches that combine Detic, CLIP, prompt learning, and multi‑stage pipelines to achieve strong few‑shot and zero‑shot object detection performance.

CLIP · Computer Vision · competition

0 likes · 17 min read

Test Development Learning Exchange

Nov 16, 2023 · Artificial Intelligence

Building a Python Image Editing Tool with Pillow, OpenCV, and NumPy

This guide demonstrates how to create a custom image editing tool in Python by leveraging the Pillow, OpenCV, and NumPy libraries, providing step‑by‑step code examples for opening, resizing, filtering, converting to grayscale, edge detection, rotation, channel manipulation, blurring, contour extraction, and color adjustment.

Computer Vision · Image Processing · NumPy

0 likes · 6 min read

Network Intelligence Research Center (NIRC)

Nov 9, 2023 · Artificial Intelligence

How Wav2Lip Achieves Accurate Speech‑Driven Lip Sync with Expert Discriminators

The article analyzes the limitations of traditional speech‑driven lip‑sync methods and explains how Wav2Lip introduces a pretrained multi‑frame expert sync discriminator, a two‑stage GAN training pipeline, and a specialized generator architecture to produce high‑quality, audio‑aligned facial videos.

Computer Vision · Deep Learning · GAN

0 likes · 7 min read

Tencent Tech

Nov 9, 2023 · Artificial Intelligence

How Adaptive Skinning Model Boosts Low-Cost High-Quality 3D Face Reconstruction

This article introduces the Adaptive Skinning Model (ASM), a low‑cost yet high‑precision 3D face reconstruction technique that leverages Gaussian‑Mixture skinning weights and dynamic bone binding to surpass traditional 3DMM methods and achieve state‑of‑the‑art results on multiple benchmarks.

3D face reconstruction · Computer Vision · Gaussian mixture model

0 likes · 13 min read

Huawei Cloud Developer Alliance

Oct 31, 2023 · Artificial Intelligence

Edge‑Cloud AI Powers Student Fatigue‑Driving Detection – Challenge Cup Winners

The 18th Challenge Cup showcased cutting‑edge student projects on fatigue‑driving detection, with Huawei Cloud’s edge‑cloud collaborative topic drawing nearly a thousand participants and five top teams demonstrating AI‑driven solutions that combine incremental training, low‑light enhancement, and lightweight models for real‑time safety alerts.

AI · Computer Vision · Edge Computing

0 likes · 6 min read

Python Programming Learning Circle

Oct 26, 2023 · Artificial Intelligence

Animal Recognition Techniques Using Deep Learning and Image Processing

This article reviews animal recognition technology, covering its background, basic principles, image‑processing, feature extraction, machine‑learning and deep‑learning methods, dataset construction, preprocessing, and feature‑selection techniques, and provides Python code examples for implementing CNNs and traditional classifiers.

Computer Vision · Deep Learning · Image Processing

0 likes · 18 min read

Network Intelligence Research Center (NIRC)

Oct 23, 2023 · Artificial Intelligence

How Multiple‑Instance Learning Boosts Context Understanding in Video Anomaly Detection

The article reviews the CVPR 2021 MIST framework, explaining how a multiple‑instance pseudo‑label generator and a self‑guided attention encoder work together with sparse continuous sampling to improve context awareness and detection accuracy in weakly‑supervised video anomaly detection.

Attention Encoder · Computer Vision · Multiple Instance Learning

0 likes · 9 min read

DaTaobao Tech

Oct 13, 2023 · Artificial Intelligence

Understanding Stable Diffusion: Core Principles and Technical Architecture

The article demystifies Stable Diffusion by explaining its low‑cost latent‑space design and conditioning mechanisms, comparing it to autoregressive, VAE, flow‑based and GAN models, detailing the iterative noise‑to‑image process, token‑based text‑to‑image control, version differences, common generation issues, and providing implementation code examples.

AI image generation · Computer Vision · Cross-Attention

0 likes · 15 min read

Meituan Technology Team

Oct 11, 2023 · Artificial Intelligence

Meituan Vision AI Research Highlights and Open‑Source Releases

This article compiles Meituan's cutting‑edge computer‑vision research and engineering achievements—including CVPR award‑winning segmentation, YOLOv6 releases, GPU inference optimizations, the Food2K dataset, and numerous paper digests—to provide practical insights for visual AI practitioners.

CVPR · Computer Vision · Deep Learning

0 likes · 11 min read

Kuaishou Large Model

Sep 27, 2023 · Artificial Intelligence

DVIS: Decoupled Framework that Sets New SOTA in Video Instance Segmentation

DVIS introduces a decoupled video instance segmentation framework that splits the task into segmentation, tracking, and refinement modules, achieving state-of-the-art performance across VIS, VPS, and VSS benchmarks while maintaining low computational overhead, and demonstrates robustness in both online and offline settings.

Computer Vision · Deep Learning · Transformer

0 likes · 12 min read

Rare Earth Juejin Tech Community

Sep 16, 2023 · Artificial Intelligence

Understanding DeepSort: A Classic Multi-Object Tracking Algorithm

This article introduces the fundamentals of object tracking in computer vision, explains classic algorithms such as SORT and its deep learning extension DeepSort, describes their underlying mechanisms including Kalman filtering, Hungarian assignment, feature extraction via CNNs, and provides references and code resources for further study.

CNN · Computer Vision · DeepSort

0 likes · 10 min read
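
The association step that SORT-style trackers perform, as mentioned in the summary above, can be sketched as matching predicted track boxes to new detections by overlap. This is a simplified illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, the `min_iou` threshold of 0.3 is arbitrary, and a brute-force search over permutations stands in for the Hungarian algorithm, so it is only practical for a handful of boxes. It also omits the Kalman prediction and the CNN appearance features that DeepSort adds.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(tracks, detections, min_iou=0.3):
    """Return (track_index, detection_index) pairs with maximal total IoU.

    Brute force over permutations: a stand-in for Hungarian assignment.
    Pairs whose IoU is below min_iou are dropped and stay unmatched.
    """
    if len(tracks) > len(detections):
        # Swap roles so the shorter list is always the one being assigned.
        swapped = match(detections, tracks, min_iou)
        return sorted((t, d) for d, t in swapped)
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(detections)), len(tracks)):
        score = sum(iou(tracks[t], detections[d]) for t, d in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return [(t, d) for t, d in best_pairs
            if iou(tracks[t], detections[d]) >= min_iou]
```

Detections left out of the returned pairs would typically start new tracks, and tracks left unmatched would age until they are dropped.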

Rare Earth Juejin Tech Community

Aug 26, 2023 · Artificial Intelligence

Using AI and RPA to Solve Slider Captcha: A Practical Implementation with YOLOv8 and PyAutoGUI

This article demonstrates how to combine AI‑based object detection (YOLOv8) with robotic process automation (pyautogui) to automatically locate, drag and release slider captchas, covering data preparation, model training, screen capture, coordinate extraction, mouse simulation, and robustness improvements.

AI · Captcha · Computer Vision

0 likes · 15 min read

DataFunSummit

Aug 24, 2023 · Artificial Intelligence

Panoramic Indoor Layout Estimation with Vision Transformer (PanoViT)

This article introduces the PanoViT model, a vision‑transformer‑based approach for indoor layout estimation from panoramic images, covering its research background, architectural components, experimental results on public datasets, and step‑by‑step usage within ModelScope.

3D reconstruction · Computer Vision · Deep Learning

0 likes · 8 min read

Rare Earth Juejin Tech Community

Aug 24, 2023 · Artificial Intelligence

Neural Style Transfer with PyTorch: Theory and Implementation

This article introduces neural style transfer, explains its underlying principles using VGG19 feature extraction, content and style loss definitions, and provides a complete PyTorch implementation with code for loading images, extracting features, computing Gram matrices, and optimizing the output image.

Computer Vision · Deep Learning · PyTorch

0 likes · 14 min read
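
The Gram-matrix style loss mentioned in the summary above can be shown on plain lists, without PyTorch or VGG19. In this sketch each feature map is assumed to be already flattened to one list of activations per channel; dividing by the number of spatial positions and averaging the squared differences are illustrative normalization choices, since implementations differ on these constants.

```python
def gram_matrix(features):
    """Channel-by-channel correlation of feature maps.

    `features` is a list of C channels, each a flat list of H*W activations.
    Entry (i, j) is the dot product of channels i and j, divided by H*W.
    """
    positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(chan_i, chan_j)) / positions
         for chan_j in features]
        for chan_i in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    gram_gen = gram_matrix(generated)
    gram_style = gram_matrix(style)
    channels = len(gram_gen)
    squared = sum(
        (gram_gen[i][j] - gram_style[i][j]) ** 2
        for i in range(channels)
        for j in range(channels)
    )
    return squared / (channels * channels)
```

Because the Gram matrix sums over spatial positions, it keeps which channels fire together but discards where they fire, which is why it captures texture and style rather than layout.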

Top Architect

Aug 22, 2023 · Artificial Intelligence

Face Recognition Search: Principles, Implementation Steps, and Applications

This article explains the background, core principles, preprocessing, feature extraction, matching algorithms, and practical application scenarios of face recognition search, and provides detailed reference implementations with Java and OpenCV code examples for building a complete system.

Computer Vision · Deep Learning · Image Processing

0 likes · 15 min read

DaTaobao Tech

Aug 21, 2023 · Artificial Intelligence

Action Sensitivity Learning for Temporal Action Localization

The paper presents Action Sensitivity Learning (ASL), a framework that models frame‑wise importance at both class‑level (via learnable Gaussian distributions) and instance‑level (using quality scores), integrates these weights into classification and regression losses, adds a contrastive InfoNCE term, and achieves state‑of‑the‑art temporal action localization performance across six benchmark datasets.

Action Sensitivity Learning · Computer Vision · Deep Learning

0 likes · 8 min read

Rare Earth Juejin Tech Community

Aug 17, 2023 · Artificial Intelligence

Getting Started with YOLOv8 on the Ultralytics Platform: Installation, Command‑Line Usage, and Model Training

This article introduces the YOLOv8 object‑detection framework on the Ultralytics platform, covering environment setup, command‑line and Python APIs for inference, model‑file options, result interpretation, data annotation, training procedures, and exporting models to various deployment formats.

Computer Vision · Model Training · Python

0 likes · 14 min read

Rare Earth Juejin Tech Community

Aug 16, 2023 · Artificial Intelligence

Deep Dive into OCR – Chapter 2: Development and Classification of OCR Technology

This article provides a comprehensive overview of OCR technology, detailing the evolution from traditional hand‑crafted methods to modern deep‑learning approaches, describing image preprocessing, text detection and recognition pipelines, summarizing classic machine‑learning algorithms, and presenting a practical OpenCV implementation with Python code.

Computer Vision · Deep Learning · OCR

0 likes · 23 min read

Rare Earth Juejin Tech Community

Aug 12, 2023 · Artificial Intelligence

An Introduction to OCR: Concepts, History, Applications, Datasets, and Technical Workflow

This article provides a comprehensive overview of Optical Character Recognition (OCR), covering its definition, historical development, classification, real‑world applications, technical pipeline, common challenges, mitigation strategies, popular datasets, model performance comparisons, and leading open‑source platforms.

Computer Vision · Datasets · Deep Learning

0 likes · 16 min read

Model Perspective

Aug 2, 2023 · Artificial Intelligence

How Segment Anything (SAM) Is Revolutionizing Image Segmentation

This article explains the fundamentals of image segmentation, introduces the open‑source Segment Anything Model (SAM) and its massive SA‑1B dataset, outlines SAM's unique promptable, real‑time capabilities, and explores its wide‑ranging future applications across AR/VR, content creation, and scientific research.

AI · Computer Vision · SAM

0 likes · 7 min read

Test Development Learning Exchange

Aug 1, 2023 · Artificial Intelligence

How to Build a One‑Click Face Swap System with OpenCV, dlib, and Flask

This guide walks through installing required libraries, preparing source and target images, detecting and aligning facial landmarks with dlib, swapping faces using OpenCV, displaying and saving the result, and adding a Flask‑based image‑upload interface to automate the one‑click face swap workflow.

Computer Vision · FaceSwap · Flask

0 likes · 7 min read

Meituan Technology Team

Jul 27, 2023 · Artificial Intelligence

Street Scene Understanding: Segmentation Technology, Research Progress, and Business Applications

Meituan’s Street‑Scene Understanding team built a high‑precision, efficient segmentation system that aligns motion and static semantics, mines hard examples, iterates models via a data‑model loop, and pursues unified open‑world segmentation, winning multiple CVPR 2023 awards and powering map production, autonomous delivery and store‑scene reconstruction.

AI · CVPR 2023 · Computer Vision

0 likes · 31 min read

php Courses

Jul 24, 2023 · Artificial Intelligence

Image Edge Enhancement Using PHP and OpenCV

This article explains how to perform image edge enhancement by installing PHP and the OpenCV library, importing images, invoking OpenCV functions, selecting edge detection algorithms such as Sobel or Canny, processing the image with custom code, and displaying or saving the enhanced result.

Computer Vision · Edge Detection · Image Processing

0 likes · 5 min read

Rare Earth Juejin Tech Community

Jul 24, 2023 · Artificial Intelligence

Understanding Slide-Transformer: An Efficient Local Attention Module for Vision Transformers

This article explains the Slide-Transformer paper, describing how the proposed Slide Attention replaces inefficient Im2Col‑based local attention with depthwise convolutions and a deformable shift module, achieving high efficiency, flexibility, and hardware‑agnostic performance for Vision Transformers.

Computer Vision · Deep Learning · Deformable Shift

0 likes · 13 min read

Huolala Tech

Jul 21, 2023 · Artificial Intelligence

Visual Language Models Power Open-Set Detection and Surgical Tool Segmentation

Recent advances in visual language models enable zero-shot multimodal tasks, and this article explores their application to open-set object detection, prompt learning, and promptable surgical instrument segmentation, highlighting methods like CLIP, CoOp, and the DetPro framework with experimental results across multiple benchmarks.

Computer Vision · Visual-Language Models · multimodal

0 likes · 12 min read

php Courses

Jul 21, 2023 · Artificial Intelligence

Image Segmentation with PHP and OpenCV

This tutorial explains how to perform image segmentation using the OpenCV library in PHP, covering environment setup, library import, image loading, grayscale conversion, thresholding, result display, and saving the segmented output.

Computer Vision · OpenCV · PHP

0 likes · 4 min read

php Courses

Jul 18, 2023 · Artificial Intelligence

Implementing Face Recognition with PHP and OpenCV

This article provides a step‑by‑step tutorial on installing OpenCV and the PHP OpenCV extension on Ubuntu, then demonstrates how to write PHP code for face detection and recognition using OpenCV's cascade classifier and FisherFaceRecognizer, complete with example scripts and usage instructions.

Computer Vision · OpenCV · PHP

0 likes · 7 min read

php Courses

Jul 17, 2023 · Artificial Intelligence

Implementing Facial Landmark Detection with PHP and OpenCV

This tutorial demonstrates how to set up PHP and OpenCV, install necessary libraries, write and run a PHP script that detects faces and extracts facial landmarks, and saves the annotated image, providing a practical introduction to facial landmark detection in computer vision.

Computer Vision · Facial Landmark Detection · Image Processing

0 likes · 5 min read

ByteFE

Jul 12, 2023 · Artificial Intelligence

Image Processing and WebAssembly: From Basic Filters to OpenCV Applications

This article explores image processing techniques from basic filters to advanced OpenCV applications, demonstrating how WebAssembly enables high-performance image processing in web browsers.

AssemblyScript · Computer Vision · Filters

0 likes · 16 min read

Rare Earth Juejin Tech Community

Jul 12, 2023 · Artificial Intelligence

Comprehensive Guide to Vision Transformer (ViT): Architecture, Patch Tokenization, Embedding, Fine‑tuning, and Performance

This article provides an in‑depth overview of Vision Transformer (ViT), covering its Transformer‑based architecture, patch‑to‑token conversion, token and position embeddings, fine‑tuning strategies such as 2‑D interpolation, experimental results versus CNNs, and the model’s broader significance for multimodal AI research.

Computer Vision · Deep Learning · Fine‑tuning

0 likes · 25 min read
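
The patch-to-token conversion mentioned in the summary above amounts to cutting the image into a grid of fixed-size squares and flattening each one into a vector. The sketch below does this for a single-channel image stored as a list of rows; the function name and the row-major patch order are assumptions for illustration, and the learned linear projection and position embeddings that ViT applies afterwards are left out.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened square patches.

    Patches are returned in row-major order, so an H x W image yields
    (H // patch_size) * (W // patch_size) tokens of length patch_size ** 2.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


# A 4 x 4 image whose pixel value encodes its position, cut into 2 x 2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = image_to_patches(image, 2)
```

For the 4 x 4 example this produces four tokens of four values each. The sequence length therefore grows quadratically as the patch size shrinks, which is the main cost trade-off when choosing it.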

Kuaishou Large Model

Jul 7, 2023 · Artificial Intelligence

How HairStep Revolutionizes Single-View 3D Hair Reconstruction

This paper introduces HairStep, a novel intermediate representation combining Strand Maps and Depth Maps, and demonstrates how it reduces domain gap and improves single‑view 3D hair reconstruction accuracy across multiple algorithms, supported by new annotated datasets (HiSa, HiDa) and fair evaluation metrics.

3D hair reconstruction · Computer Vision · Dataset

0 likes · 11 min read

Efficient Ops

Jun 26, 2023 · Artificial Intelligence

How Multimodal AI Is Revolutionizing Credit Card Fraud Detection

Amid tightening financial regulations, ICBC's software team proposes a multimodal AI anti‑fraud framework that combines image, video, and structured data to detect deep‑fake, mask, and forged‑document attacks, enriches verification with cross‑modal cues, and outlines future expansion to text and speech modalities.

AI · Computer Vision · Deep Learning

0 likes · 7 min read

Programmer DD

Jun 20, 2023 · Artificial Intelligence

Yann LeCun: Today's AI Still Below Dog Level – Inside Meta’s Voicebox, MusicGen & I‑JEPA

Meta’s chief AI scientist Yann LeCun warned that current large language models still fall short of human and even dog intelligence, citing their lack of real‑world understanding, while Meta unveiled three new generative AI models—Voicebox for speech, MusicGen for music, and I‑JEPA for image reasoning—showcasing both progress and remaining limitations.

Computer Vision · Music generation · Speech synthesis

0 likes · 7 min read

Xiaohongshu Tech REDtech

Jun 20, 2023 · Artificial Intelligence

Open-Vocabulary Object Attribute Recognition with OvarNet: A Unified Framework for Detection and Attribute Classification

At CVPR 2023 the Xiaohongshu team presented OvarNet, a unified one‑stage Faster‑RCNN model built on CLIP that uses prompt learning and knowledge distillation to jointly detect objects and recognize open‑vocabulary attributes, achieving state‑of‑the‑art results on VAW, MS‑COCO, LSA and OVAD datasets.

Computer Vision · Multimodal Learning · attribute recognition

0 likes · 12 min read

Meituan Technology Team

Jun 15, 2023 · Artificial Intelligence

Meituan Technical Team's 8 CVPR 2023 Papers: Overview and Insights

This article reviews eight CVPR 2023 papers selected by Meituan’s technology team, covering self‑supervised learning, domain adaptation, federated learning, object detection, 3D reconstruction, GAN‑based pre‑training, RGB‑T tracking, vision‑language navigation, and visual‑textual layout generation, highlighting each work’s methodology, experiments, and reported performance gains.

3D Object Detection · CVPR 2023 · Computer Vision

0 likes · 15 min read

Alimama Tech

Jun 14, 2023 · Artificial Intelligence

Intelligent Live‑Streaming Video Editing Techniques and Practices

Alimama’s end‑to‑end intelligent clipping system automatically transforms long live‑stream e‑commerce videos into short, high‑quality ads by segmenting streams, classifying speech with GPT‑based tags, selecting visually appealing clips, arranging coherent storylines, and applying effects, achieving 96% classification accuracy and improved advertising efficiency.

AI · Computer Vision · Content Optimization

0 likes · 14 min read

Network Intelligence Research Center (NIRC)

Jun 9, 2023 · Artificial Intelligence

2023 NIRC PhD Graduates Reveal Cutting-Edge AI and Network Intelligence Research

In 2023 the Network Intelligence Research Center celebrated its largest PhD graduating class: seven scholars whose dissertations span deep‑vision hand‑gesture estimation, multi‑scenario network transmission, graph alignment, interactive streaming, knowledge‑defined networking, wireless body‑area networking, and more, showcasing significant AI‑driven advances and high‑impact publications.

Computer Vision · Deep Learning · Graph Alignment

0 likes · 30 min read

DataFunSummit

May 31, 2023 · Artificial Intelligence

Evolution of Face Detection Techniques: Datasets, Research Directions, and Future Work

This article reviews the evolution of face detection, covering the WIDER FACE dataset and major research directions such as feature fusion, label assignment, auxiliary supervision, anchor‑free methods, and NAS‑based designs. It summarizes key papers from S3FD to MogFace, introduces ModelScope implementations, and outlines future challenges and opportunities.

AI research · Computer Vision · Datasets

0 likes · 13 min read

Test Development Learning Exchange

May 27, 2023 · Artificial Intelligence

Eight Essential OpenCV Examples for Image Processing

This article introduces eight fundamental OpenCV examples—including image reading, display, grayscale conversion, edge detection, resizing, Gaussian blur, and face detection—providing concise Python code snippets and explanations to help readers quickly apply these common computer‑vision techniques.

Code Examples · Computer Vision · Image Processing

0 likes · 5 min read

DataFunTalk

May 13, 2023 · Artificial Intelligence

Multimedia Content Understanding at Weibo: Video Summarization, Quality Assessment, OCR, Embedding, and CV‑CUDA Optimization

This article presents Weibo's comprehensive multimedia content understanding pipeline, covering video summarization techniques, quality assessment models, OCR advancements, video embedding strategies, and the performance benefits of CV‑CUDA acceleration, while highlighting real‑world applications and engineering trade‑offs.

CV-CUDA · Computer Vision · Deep Learning

0 likes · 32 min read

AntTech

May 6, 2023 · Artificial Intelligence

Wu Wenjun AI Science and Technology Award Honors Tsinghua and Ant Group's Unconstrained Human Portrait Perception and Understanding Technology

The 2022 Wu Wenjun Artificial Intelligence Science and Technology Award recognized a decade‑long collaborative effort by Tsinghua University and Ant Group's security lab for breakthrough research on unconstrained human portrait perception and understanding, highlighting three core scientific discoveries, extensive academic impact, and large‑scale commercial applications in identity verification.

AI Awards · Ant Group · Computer Vision

0 likes · 5 min read

Baidu Tech Salon

Apr 25, 2023 · Game Development

How to Build Sensor‑Free Motion Games with PP‑TinyPose and FastDeploy

This article explains how to develop sensor‑less motion-controlled games by leveraging the PP‑TinyPose keypoint detection model and FastDeploy inference tool, detailing the required setup, code snippets, and a reusable PyQt5 framework for creating webcam‑driven interactive demos.

AI · Computer Vision · FastDeploy

0 likes · 11 min read

DataFunSummit

Apr 20, 2023 · Artificial Intelligence

SenseTime Unveils Multimodal ‘SenseNova’ Large Model System and Its Industry Applications

SenseTime introduced its visual‑centric multimodal large‑model platform SenseNova, detailing model scaling, extensive AI infrastructure, diverse industry deployments such as autonomous driving and generative content, and the challenges of compute efficiency and data acquisition in the race for advanced AI.

AI Infrastructure · Computer Vision · large models

0 likes · 13 min read

Baidu Tech Salon

Apr 14, 2023 · Artificial Intelligence

How PaddleDepth and Paddle3D Enable Low‑Cost 3D Vision Development

This article examines the challenges of 3D vision data acquisition and explains how Baidu's PaddleDepth and Paddle3D toolkits provide low‑cost depth collection, super‑resolution, and end‑to‑end perception pipelines, showcasing performance on KITTI and Middlebury datasets with code examples.

3D vision · Computer Vision · Depth estimation

0 likes · 12 min read

AntTech

Apr 12, 2023 · Artificial Intelligence

Ant Technology Research Institute Interactive Intelligence Lab – 13 Papers Accepted at CVPR 2023 and Recent AI Research Highlights

The Ant Technology Research Institute’s Interactive Intelligence Lab announced that 13 of its papers were accepted at CVPR 2023, alongside other recent achievements in generative models and 3D vision, highlighting collaborations with top universities and summarizing the lab’s contributions to artificial intelligence research.

3D vision · CVPR · Computer Vision

0 likes · 6 min read

Baidu Tech Salon

Apr 7, 2023 · Artificial Intelligence

Ambiguity-Resistant Semi-supervised Learning (ARSL) for Single-stage Object Detection

ARSL, an ambiguity‑resistant semi‑supervised learning framework for single‑stage object detection, introduces Joint‑Confidence Estimation and Task‑Separation Assignment to resolve selection and assignment ambiguities in pseudo‑labels, thereby markedly improving pseudo‑label quality and achieving state‑of‑the‑art AP gains on COCO benchmarks.

ARSL · Computer Vision · Semi-supervised Learning

0 likes · 8 min read

Baidu Geek Talk

Mar 16, 2023 · Artificial Intelligence

PaddleDetection v2.6 Release: PP-YOLOE Family Expansion and Advanced Detection Algorithms

PaddleDetection v2.6 expands the PP‑YOLOE family with rotating, small‑object, dense‑object, and ultra‑lightweight edge‑GPU models, upgrades PP‑Human and PP‑Vehicle toolboxes, releases semi‑supervised, few‑shot and distillation learning methods, adds numerous state‑of‑the‑art algorithms, and improves infrastructure with Python 3.10, EMA filtering and AdamW support.

Baidu · Computer Vision · Deep Learning

0 likes · 14 min read

政采云技术

Mar 9, 2023 · Artificial Intelligence

Comprehensive Overview of Object Detection: From Traditional Methods to Modern Deep Learning Models

This article provides a comprehensive overview of object detection, describing traditional sliding‑window approaches and deep‑learning‑based two‑stage and one‑stage models such as R‑CNN, Faster R‑CNN, and the YOLO series, and discussing current challenges, improvement directions, and future research trends in the field.

Computer Vision · Deep Learning · R-CNN

0 likes · 29 min read

Python Programming Learning Circle

Mar 8, 2023 · Artificial Intelligence

Using ddddocr SDK for Captcha Recognition in Python

This article introduces the open‑source ddddocr SDK, demonstrates how to install it and use it in Python to automatically solve three common captcha types—slider, click‑based, and alphanumeric—providing code examples and result explanations for each.

Captcha · Computer Vision · OCR

0 likes · 4 min read

Meituan Technology Team

Feb 23, 2023 · Artificial Intelligence

Food2K: A Large-Scale Food Image Dataset and Progressive Region Enhancement Network

This article reviews the Food2K dataset and the proposed Progressive Region Enhancement Network for large‑scale food image recognition, detailing dataset construction, method design, extensive experiments, ablation studies, visualizations, and future research directions, as reported in the IEEE T‑PAMI 2023 paper.

Computer Vision · Dataset · Fine-Grained Classification

0 likes · 31 min read

DaTaobao Tech

Feb 20, 2023 · Mobile Development

AR Foot Measurement and Hand Try-On Algorithms for Mobile Vision

The article presents a mobile‑vision solution that combines lightweight detection, line detection, segmentation and 3‑D point‑cloud reconstruction to measure foot length within 3 mm error, and a MANO‑based hand‑try‑on system that predicts full mesh vertices for real‑time watch, phone and ring fitting on smartphones.

AR · Computer Vision · Foot Measurement

0 likes · 18 min read

AsiaInfo Technology: New Tech Exploration

Feb 20, 2023 · Industry Insights

Why Pre‑trained Large Models Are the New Infrastructure for AI Applications

Pre‑trained large models are emerging as the foundational infrastructure for AI across industries; this article analyzes their technical advantages, application trends in NLP, CV and multimodal domains, presents a telecom customer‑service case study with performance benchmarks, and outlines future deployment challenges and research directions.

Computer Vision · NLP · Prompt Tuning

0 likes · 23 min read

DataFunSummit

Feb 15, 2023 · Artificial Intelligence

Accelerating Computer Vision Pipelines with CV‑CUDA: Reducing Complexity and Boosting Performance

This article examines how moving image pre‑ and post‑processing to GPU with NVIDIA's CV‑CUDA reduces software complexity, alleviates CPU bottlenecks, and delivers up to thirty‑fold throughput gains for computer‑vision workloads across training and inference pipelines.

CV-CUDA · Computer Vision · Deep Learning

0 likes · 17 min read

DataFunTalk

Feb 11, 2023 · Artificial Intelligence

Accelerating Computer Vision Pipelines with CV-CUDA: Reducing Complexity and Performance Bottlenecks

This article explains how moving image preprocessing and post‑processing to GPU with the open‑source CV‑CUDA library dramatically reduces system complexity, eliminates CPU‑GPU bottlenecks, and delivers up to thirty‑fold performance gains for computer‑vision workloads across training and inference stages.

CV-CUDA · Computer Vision · Deep Learning

0 likes · 16 min read

DataFunTalk

Jan 12, 2023 · Artificial Intelligence

Tencent AI Lab's Advances in High‑Fidelity 3D Face Digitization and Evaluation

This article presents Tencent AI Lab's recent research on efficient 3D face digitization—including single‑photo, multi‑photo, and RGB‑D selfie pipelines—describes a detailed production workflow, introduces a new evaluation benchmark (REALY), and shares insights from a technical Q&A session.

3D face reconstruction · AI Lab · Computer Vision

0 likes · 11 min read

DataFunTalk

Jan 8, 2023 · Artificial Intelligence

Adaptive Blend Pyramid Network for Real-Time Local Retouching of Ultra High-Resolution Images

The paper introduces ABPN, an Adaptive Blend Pyramid Network that achieves precise, high‑quality skin retouching and garment wrinkle removal on 4K‑8K photos in real time by combining a context‑aware local retouching layer with a novel adaptive blend pyramid layer, addressing challenges of artifact‑free detail preservation and efficient high‑resolution processing.

Computer Vision · Deep Learning · adaptive blend pyramid

0 likes · 16 min read

Kuaishou Audio & Video Technology

Dec 30, 2022 · Artificial Intelligence

Unlocking Realistic Bokeh: Depth‑Aware Algorithms Behind Holiday Video Effects

This article explains the optical principles of bokeh (scatter blur), describes a depth‑aware variable‑focus algorithm developed by Kuaishou’s audio‑video team, and details practical optimizations such as saliency detection, edge‑preserving weighting, and adaptive spot‑light effects that enable realistic, customizable holiday video filters.

Bokeh · Computer Vision · Depth estimation

0 likes · 11 min read

DataFunTalk

Dec 27, 2022 · Artificial Intelligence

Efficient Training for Very Large‑Scale Face Recognition and the FFC Framework

This article reviews the challenges of ultra‑large‑scale face recognition, presents existing solutions such as metric learning, PFC and VFC, and details the proposed FFC framework with dual loaders, ID groups, probe and gallery networks, plus experimental results showing its cost‑effective performance.

AI · Computer Vision · Deep Learning

0 likes · 7 min read

Kuaishou Tech

Dec 26, 2022 · Artificial Intelligence

ICDAR 2023-DSText Video Text Reading Competition Overview

The ICDAR 2023-DSText competition, launching on February 15, 2023, focuses on dense and small text detection and recognition in video, providing a YouTube‑sourced dataset of 100 videos, two challenge tasks, a detailed timeline, eligibility rules, and a list of international sponsoring institutions.

Computer Vision · Dataset · ICDAR

0 likes · 6 min read

Alibaba Cloud Developer

Dec 19, 2022 · Artificial Intelligence

How AI Transforms Football Video Analysis: Detection, Tracking, and Event Recognition

This article explores how artificial intelligence techniques such as deep learning, object detection, multi‑object tracking, and coordinate projection are applied to football video analysis to automatically detect the ball and players, map their positions onto the field, and recognize key events like shots and goals.

AI · Computer Vision · Sports Analytics

0 likes · 16 min read

DataFunTalk

Dec 17, 2022 · Artificial Intelligence

Multimodal Pre‑training Techniques and Applications – Overview, OPPOVL Dataset, Architecture, and Performance

This article presents a comprehensive overview of multimodal pre‑training, describing its motivation, architecture choices, large‑scale Chinese image‑text dataset construction, training optimizations, performance benchmarks, downstream applications, and a Q&A session that highlights practical deployment considerations.

Computer Vision · Deep Learning · Model architecture

0 likes · 16 min read

Laiye Technology Team

Dec 16, 2022 · Artificial Intelligence

Efficient Production of Scene-specific OCR Models Using an AI Platform

This article explains how a unified AI platform enables rapid, data‑driven creation, training, deployment, and evaluation of OCR models for visually distinct text regions such as seals, meter readings, license plates, and VIN codes, while minimizing hardware and annotation costs.

AI Platform · Computer Vision · Kubeflow

0 likes · 7 min read

Alipay Experience Technology

Dec 12, 2022 · Artificial Intelligence

How Alipay Powers Mobile Vision: Architecture, Challenges, and Future Directions

This article reviews Alipay's mobile visual algorithm ecosystem, detailing its diverse application scenarios, technical challenges, architectural framework, lightweight design strategies, scalable modeling techniques, and future research directions for edge AI on billions of devices.

Algorithm Optimization · Computer Vision · edge AI

0 likes · 20 min read

DataFunSummit

Dec 9, 2022 · Artificial Intelligence

Volcano Engine Virtual Digital Human Technology Overview

This article provides a comprehensive overview of Volcano Engine's virtual digital human platform, detailing its definition, AI‑driven and human‑driven classifications, 2D and 3D technical architectures, multi‑modal perception, interaction capabilities, application scenarios, and future development directions.

2D avatar · 3D Avatar · Computer Vision

0 likes · 15 min read

DataFunTalk

Dec 5, 2022 · Artificial Intelligence

MogFace: A High‑Performance Face Detector with Dynamic Label Assignment, FP Context Analysis, and Pyramid‑Level Supervision

The article presents MogFace, a state‑of‑the‑art face detection system that combines a dynamic label‑assignment strategy, false‑positive context analysis, and pyramid‑layer ground‑truth supervision to achieve multiple top‑ranked results on the WIDER FACE benchmark, and details its architecture, observations, and experimental validation.

Computer Vision · MogFace · dynamic label assignment

0 likes · 7 min read

DataFunTalk

Nov 17, 2022 · Artificial Intelligence

Enhance the Visual Representation via Discrete Adversarial Training

The Alibaba AAIG team proposes Discrete Adversarial Training (DAT), which leverages VQGAN‑based discretization to generate natural‑looking adversarial samples that improve visual representation robustness and transferability across classification, self‑supervised learning, and object detection tasks without sacrificing accuracy, achieving new state‑of‑the‑art results on multiple benchmarks.

Computer Vision · Robustness · Visual Representation

0 likes · 12 min read

Tencent Cloud Developer

Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix—Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model—featuring innovations such as massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing that achieve top cross‑modal retrieval results.

Computer Vision · Multimodal AI · advertising technology

0 likes · 18 min read

Shopee Tech Team

Nov 10, 2022 · Artificial Intelligence

ShopeeVideo OCR: Multi-language Text Recognition System for E-commerce Video

ShopeeVideo OCR is a multi‑language text‑recognition system for Southeast Asian e‑commerce videos that unifies detection, Transformer‑based recognition, layout analysis, and large‑scale synthetic data generation to handle Indonesian, Filipino, English, Vietnamese, Thai and Chinese scripts, delivering industry‑leading accuracy and winning thirteen ICDAR first‑place awards.

Computer Vision · Deep Learning · Multi-language OCR

0 likes · 15 min read

Rare Earth Juejin Tech Community

Nov 9, 2022 · Artificial Intelligence

Detailed Explanation of Fully Convolutional Networks (FCN) for Semantic Segmentation

This article provides a comprehensive, beginner‑friendly overview of semantic segmentation, focusing on the pioneering Fully Convolutional Network (FCN) architecture, its variants (FCN‑32s, FCN‑16s, FCN‑8s), underlying concepts, loss computation, and practical tips for working with the VOC dataset.

AlexNet · Computer Vision · FCN

0 likes · 14 min read

Zhuanzhuan Tech

Nov 9, 2022 · Artificial Intelligence

Applying OCR to Game Skin Recognition: Filtering Owned Skins and Tolerant Text Matching

This article describes how OCR technology is used in a game marketplace to automatically extract skin parameters from user‑uploaded images, outlines methods for separating owned skin regions from background using color analysis, and presents a tolerant matching solution based on Rabin‑Karp hashing to handle OCR errors.

Computer Vision · Game Development · Image Processing

0 likes · 10 min read
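
The tolerant matching mentioned in the summary above can be illustrated with a small sketch. It is not the article's implementation: it assumes that matching works by comparing Rabin-Karp rolling hashes of short character windows (k-grams) between the OCR output and each known skin name. The function names `kgram_hashes` and `best_match`, the window size `k`, the `threshold`, and the sample skin names are all hypothetical.

```python
BASE = 257            # polynomial base for the rolling hash (assumed value)
MOD = (1 << 61) - 1   # large prime modulus keeps collisions unlikely


def kgram_hashes(text, k):
    """Return the Rabin-Karp rolling hashes of every length-k window in text."""
    if len(text) < k:
        return set()
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = {h}
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % MOD  # drop the oldest character
        h = (h * BASE + ord(text[i])) % MOD      # append the newest character
        hashes.add(h)
    return hashes


def best_match(ocr_text, candidates, k=2, threshold=0.5):
    """Pick the candidate whose k-grams are best covered by the OCR text.

    Returns (name, score); name is None when no score reaches the threshold.
    """
    ocr = kgram_hashes(ocr_text, k)
    best, best_score = None, 0.0
    for name in candidates:
        grams = kgram_hashes(name, k)
        if not grams:
            continue
        score = len(grams & ocr) / len(grams)  # share of the name's k-grams found
        if score > best_score:
            best, best_score = name, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Hypothetical skin names; the OCR output misreads "o" as "0".
skins = ["Dragon Slayer", "Ice Queen", "Shadow Blade"]
name, score = best_match("Drag0n Slayer", skins)
print(name, round(score, 2))
```

A single misread character only disturbs the k-grams that overlap it, so the correct name still scores well above unrelated ones. The threshold trades tolerance against false matches and would need tuning on real OCR output.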

DataFunSummit

Oct 19, 2022 · Artificial Intelligence

Series Six of the Integer Intelligence Autonomous Driving Dataset Collection – Overview and Highlights

This article presents a comprehensive overview of several publicly available autonomous driving datasets, focusing on Series Six of the Integer Intelligence collection, which includes StreetLearn, UTBM RoboCar, Multi‑Vehicle Stereo Event Camera, comma2k19, the Annotated Laser Dataset, Ford, and Oxford RobotCar, detailing their sources, download links, publication years, key features, and research relevance.

Computer Vision · Datasets · Robotics

0 likes · 10 min read

Baidu Geek Talk

Oct 17, 2022 · Artificial Intelligence

OCR Technology: PaddleOCR and Paddle.js Integration

The article explains OCR fundamentals and details how Baidu’s open‑source PaddleOCR suite can be converted and run in browsers via the @paddlejs‑models/ocr SDK, describing model initialization, detection and CRNN‑based recognition pipelines, and presenting benchmark results that show the newer ch_PP‑OCRv2 model achieving higher accuracy and faster inference than the mobile variant.

AI · Computer Vision · OCR

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Oct 12, 2022 · Artificial Intelligence

Unlock Vision AI: How EasyCV Streamlines Datasets and Model Training

This article introduces EasyCV, an open‑source all‑in‑one visual algorithm platform that abstracts diverse data sources, provides SOTA self‑supervised models, and offers ready‑to‑download datasets for image classification, object detection, segmentation, and pose estimation, complete with configuration examples.

Computer Vision · Datasets · Deep Learning

0 likes · 9 min read

AntTech

Sep 27, 2022 · Artificial Intelligence

Ant Group’s Research Institute Publishes Four NeurIPS 2022 Papers on Advanced Computer Vision and AI

Ant Group’s Ant Technology Research Institute had four papers from its Visual Intelligence Lab accepted at NeurIPS 2022, covering rank diminishing in deep networks, geometry‑aware 3D image synthesis, dynamic discriminators for GANs, and uncertainty‑aware hierarchical refinement for incremental classification, highlighting the institute’s cutting‑edge AI research.

AI research · Computer Vision · Deep Learning

0 likes · 8 min read

Zhengtong Technical Team

Sep 22, 2022 · Artificial Intelligence

How YOLOv5 Powers Real‑Time City Management Video Analysis

This article explains the background, workflow, and technical details of using the YOLOv5 one‑stage object detection algorithm to enable fast, accurate video analytics for urban management, covering data augmentation, backbone design, FPN‑PAN neck, and prediction output processing.

AI · Computer Vision · Deep Learning

0 likes · 8 min read

HomeTech

Sep 20, 2022 · Artificial Intelligence

Deep Learning for Image Classification: Classic Networks, Attention Mechanisms, and Their Application to Fine‑Grained Classification and Automotive Series Recognition

This article reviews the evolution of deep‑learning image‑classification networks, surveys attention mechanisms for fine‑grained tasks, describes the CVPR 2022 FGVC9 competition solution using RegNetY and random attention cropping, and discusses its deployment in automotive series recognition along with future challenges.

CVPR · Computer Vision · Deep Learning

0 likes · 19 min read

Programmer DD

Sep 13, 2022 · Artificial Intelligence

Why AI Porn Detection Still Struggles: Key Challenges Explained

AI-based porn detection uses deep neural networks to classify images, but faces tough hurdles such as visual similarity with benign content, subjective standards for nudity, and vulnerabilities from training‑data dependence, meaning human moderators remain essential for reliable safety.

AI moderation · Computer Vision · Content Safety

0 likes · 3 min read

DataFunSummit

Sep 6, 2022 · Artificial Intelligence

Recent Advances in Self‑Supervised Learning for Text Recognition (OCR)

This article reviews recent progress in applying self‑supervised learning to OCR text recognition, covering mainstream model architectures, key considerations for self‑supervised tasks on text images, and detailed analyses of representative papers such as SeqCLR, SimAN, and DiG, highlighting their designs, experiments, and results.

Computer Vision · OCR · contrastive learning

0 likes · 20 min read

ByteDance Terminal Technology

Sep 1, 2022 · Artificial Intelligence

Hybrid Computer Vision and Deep Learning for Automated UI Background Color Extraction and Assertion

This article presents a hybrid pipeline combining traditional computer vision techniques and deep learning models to automatically extract and verify text background colors in UI automation screenshots, effectively addressing challenges like limited training data and complex borders to significantly reduce manual inspection costs while achieving high accuracy and robustness in production environments.

Automated Testing · Computer Vision · Deep Learning

0 likes · 10 min read

DevOps

Aug 23, 2022 · Artificial Intelligence

Intelligent Automation Testing: Self‑Healing and Machine‑Learning Techniques

This article reviews the evolution of automated testing toward intelligent solutions, explaining self‑healing mechanisms, machine‑learning‑driven object recognition, computer‑vision and OCR approaches, industry tools such as Healenium and Airtest, and future prospects for zero‑code AI‑powered test automation.

AI · Computer Vision · OCR

0 likes · 13 min read

DaTaobao Tech

Aug 19, 2022 · Artificial Intelligence

SepLUT: Separable Lookup Tables for Real-time Image Enhancement

SepLUT, a new separable lookup‑table framework, splits color enhancement into a 1‑D LUT for independent adjustments and a 3‑D LUT for correlated changes, predicted by a lightweight CNN, enabling quantizable, real‑time ISP performance with state‑of‑the‑art results on the FiveK benchmark.

Computer Vision · Deep Learning · Real-Time

0 likes · 12 min read

Python Crawling & Data Mining

Aug 18, 2022 · Artificial Intelligence

How to Quickly Extract Text from Images in Python Using ddddocr and OpenCV

This article walks through a Python OCR solution for a blank image output problem, demonstrates a working ddddocr code snippet, and suggests an alternative OpenCV preprocessing step, providing clear screenshots and concise explanations for effective image text extraction.

Computer Vision · Image Processing · OCR

0 likes · 3 min read

FunTester

Aug 18, 2022 · Artificial Intelligence

How AI Can Automate UI Testing: Building Image‑Based Anomaly Detection

This article examines the evolution of mobile UI testing toward AI‑driven approaches, outlines the challenges of large‑scale apps, and details a practical workflow for constructing image‑based anomaly datasets, training a ResNet‑18 model, and iterating on detection performance.

AI testing · Computer Vision · Deep Learning

0 likes · 13 min read

Beike Product & Technology

Aug 12, 2022 · Artificial Intelligence

Green Area Generation Method Based on Pix2pix Model

This paper proposes a pix2pix‑based method to automatically generate green areas for large‑scale outdoor 3D scene modeling, detailing dataset creation via OpenCV segmentation, model training, region partitioning, and experimental results showing a 93.8% acceptance rate, significantly improving efficiency over manual drawing.

3D modeling · Computer Vision · GAN

0 likes · 14 min read

ITPUB

Jul 21, 2022 · Artificial Intelligence

From Blur to Brilliance: How AI‑Powered Image Quality Assessment Transformed 58.com’s Recruitment Images

This article reviews image quality assessment fundamentals, modern CNN‑based IQA models, and their deployment at 58.com to automatically score, filter, and rank millions of recruitment photos, achieving a drop in low‑quality images from 9% to zero while boosting overall accuracy to 94.7%.

Business Application · CNN · Computer Vision

0 likes · 19 min read
