Tagged articles
650 articles
AntTech
Jul 18, 2022 · Artificial Intelligence

Trusted AI Research at Ant Group: Advances in Computer Vision, Watermark Defense, Robust Machine Learning, and Explainable NLG

Ant Group’s security labs present a series of cutting‑edge AI research achievements—including hierarchical multi‑granular classification for computer vision, watermark‑vaccine defenses, multi‑modal document understanding, robust and explainable machine learning, and logic‑driven data‑to‑text generation—highlighting their commitment to trustworthy and secure AI applications.

AI Safety · Computer Vision · Data2Text
12 min read
JD Tech
Jul 18, 2022 · Artificial Intelligence

AI-Powered Visual Defect Detection for Mobile App UI Testing: Methodology, Data Construction, Model Training, and Evaluation

This article presents an end‑to‑end AI‑driven visual testing solution for mobile applications, detailing the business pain points, data set construction, CNN‑based model design, training procedures, performance evaluation with ROC and confusion matrices, and future directions for improving defect detection accuracy.
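
The ROC and confusion-matrix evaluation mentioned above can be sketched in plain Python. This is a minimal illustration, not the article's code. It assumes a binary setup (label 1 = defective screen, 0 = normal) and per-image scores from some classifier, and the function names are hypothetical.

```python
# Minimal sketch of binary classifier evaluation (not the article's code).
# Assumption: label 1 = defective UI screenshot, label 0 = normal screenshot,
# and each score is a model probability for the "defective" class.

def confusion_matrix(labels, scores, threshold=0.5):
    """Count (tp, fp, tn, fn) when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1:          # predicted defective, actually normal
            fp += 1
        elif y == 0:             # predicted normal, actually normal
            tn += 1
        else:                    # predicted normal, actually defective
            fn += 1
    return tp, fp, tn, fn


def roc_curve(labels, scores):
    """Use every distinct score as a threshold; return (fpr, tpr) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, _, _ = confusion_matrix(labels, scores, t)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

For a multi-class defect taxonomy, the same functions can be applied one-vs-rest, once per defect class.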

Computer VisionDeep LearningImage Classification
0 likes · 14 min read
AI-Powered Visual Defect Detection for Mobile App UI Testing: Methodology, Data Construction, Model Training, and Evaluation
MaGe Linux Operations
MaGe Linux Operations
Jul 14, 2022 · Artificial Intelligence

How to Detect Nude Images with Python and Pillow: A Complete Guide

This article walks through building a Python3 program that uses the Pillow library to identify skin regions in images, applies color‑space heuristics to classify pixels, merges connected skin areas, and decides whether an image is pornographic based on configurable rules, complete with code samples and testing results.
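
A common choice for the per-pixel skin test is an explicit RGB rule. The sketch below assumes the widely cited Kovač/Peer/Solina daylight thresholds; the article's own thresholds and its region-merging and decision steps may differ and are not reproduced here.

```python
# Sketch of a per-pixel RGB skin rule (assumed thresholds, not the article's).

def is_skin_rgb(r, g, b):
    """Explicit RGB skin rule for uniform daylight (assumed thresholds)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15   # enough color spread
            and abs(r - g) > 15                      # red clearly above green
            and r > g and r > b)                     # red is the dominant channel


def skin_ratio(pixels):
    """Fraction of (r, g, b) tuples classified as skin; 0.0 for no pixels."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin_rgb(*p) for p in pixels) / len(pixels)
```

The pixel tuples could come, for example, from Pillow via `Image.open(path).convert("RGB").getdata()`. A ratio alone is a crude signal, which is why the article additionally merges connected skin regions before deciding.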

Computer Vision · Image Processing · Python
22 min read
58 Tech
Jul 14, 2022 · Artificial Intelligence

Image Quality Assessment Techniques and Their Application in 58.com Recruitment Image Filtering

This article reviews image quality assessment (IQA) methods, including full‑reference, reduced‑reference, and no‑reference approaches; covers typical datasets and evaluation metrics; describes CNN‑based models such as WaDIQaM, DBCNN and hyperIQA; and details a customized IQA solution deployed at 58.com to filter and rank recruitment images, which reduced the bad‑image rate from 9% to 0%.
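
IQA models are usually scored by how well their predictions agree with human mean opinion scores (MOS), most often via SROCC (rank correlation) and PLCC (linear correlation). The article summary does not give its formulas, so the following is a generic sketch of those two metrics. It omits the nonlinear logistic mapping that many IQA papers apply before computing PLCC.

```python
import math

# Generic sketch of the two standard IQA agreement metrics.

def pearson(x, y):
    """Pearson linear correlation (PLCC); undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def srocc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(pred), ranks(mos))


def plcc(pred, mos):
    """Pearson correlation on raw scores (no logistic fitting here)."""
    return pearson(pred, mos)
```

SROCC only rewards getting the ordering right, which suits a ranking-and-filtering use case; PLCC additionally penalizes a nonlinear relation between predicted and human scores.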

CNN · Computer Vision · IQA
17 min read
Alimama Tech
Jul 13, 2022 · Artificial Intelligence

Fully Automatic Template‑Free Image‑Text Creative Generation System

Alibaba Alimama’s fully automatic, template‑free image‑text creative generation system uses deep‑learning models across material mining, layout synthesis, on‑image copy generation, and visual attribute rendering to produce personalized ad creatives directly from product images and metadata, achieving roughly 19 % CTR lift over prior template‑based methods.

AI · Computer Vision · Generative Models
19 min read
DataFunTalk
Jul 12, 2022 · Artificial Intelligence

Applying Computer Vision for Content Safety in Live Streaming: Practices and Future Directions

This presentation details how Huya leverages computer‑vision algorithms to detect and mitigate risky content such as political, pornographic, and violent material in live‑streaming and short‑video platforms, describing system architecture, labeling strategies, algorithmic pipelines, real‑time moderation techniques, and future research directions.

AI Safety · Computer Vision · Risk Detection
11 min read
DaTaobao Tech
Jul 1, 2022 · Artificial Intelligence

Deep Generative Projection for High‑Fidelity Virtual Try‑On

The paper presents Deep Generative Projection (DGP), a virtual‑try‑on system that learns a realistic dressing distribution from unpaired images with StyleGAN, projects coarse garment‑human alignments into its latent space, refines details, and achieves higher fidelity and robustness than supervised SOTA methods without needing paired data.

Computer Vision · Unsupervised Learning · generative adversarial network
13 min read
DataFunTalk
Jun 30, 2022 · Artificial Intelligence

Self‑Augmented Unpaired Image Dehazing via Density and Depth Decomposition (D4)

The paper introduces D4, a self‑augmented unpaired image dehazing framework that decomposes the transmission map into fog density and scene depth, enabling realistic fog synthesis for data augmentation and achieving superior dehazing performance with fewer parameters and FLOPs on multiple benchmarks.
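
The decomposition D4 relies on is usually written with the atmospheric scattering model, I = J·t + A·(1 - t), where the transmission t = exp(-β·d) depends on fog density β and scene depth d. The sketch below applies that textbook model per pixel (scalar intensities in [0, 1], a single airlight value A) to show why separating β from d lets one re-synthesize fog at a different density. It is not D4's network, and the function names are hypothetical.

```python
import math

# Textbook atmospheric scattering model, per pixel (not D4's learned model).
# Assumptions: intensities in [0, 1], one global airlight value, known depth.

def transmission(beta, depth):
    """t = exp(-beta * depth): denser fog or farther points transmit less."""
    return math.exp(-beta * depth)


def add_haze(clear, depth, beta, airlight=1.0):
    """Hazy intensity I = J * t + A * (1 - t)."""
    t = transmission(beta, depth)
    return clear * t + airlight * (1 - t)


def remove_haze(hazy, depth, beta, airlight=1.0, t_min=0.1):
    """Invert the model; t is clamped to t_min to avoid amplifying noise."""
    t = max(transmission(beta, depth), t_min)
    return (hazy - airlight * (1 - t)) / t
```

Keeping the depth fixed and changing only `beta` yields a lighter or heavier fog version of the same scene, which is the kind of self-augmentation the summary describes.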

CVPR2022 · Computer Vision · Depth estimation
14 min read
AntTech
Jun 24, 2022 · Artificial Intelligence

Hierarchical Residual Network for Multi‑Granularity Classification (HRN) – CVPR 2022 Paper Overview

This article presents a CVPR 2022 paper by Zhejiang University and Ant Group that introduces a label‑relation‑tree‑based Hierarchical Residual Network (HRN) for improving multi‑granularity image classification, detailing its motivation, architecture, composite loss design, extensive experiments on fine‑grained datasets, and practical impact on content‑security applications.

CVPR2022 · Computer Vision · Deep Learning
12 min read
Meituan Technology Team
Jun 23, 2022 · Artificial Intelligence

Highlights of Six Meituan Papers Accepted at CVPR 2022

Meituan’s six CVPR 2022 papers advance computer vision by introducing a few‑sample model compression method, a language‑bridged video object segmentation approach, a single‑stage 3D visual grounding technique, a dynamic early‑exit image captioning system, a boosted black‑box adversarial attack, and a semi‑supervised video paragraph grounding framework.

3D grounding · CVPR 2022 · Computer Vision
15 min read
Xiaohongshu Tech REDtech
Jun 20, 2022 · Artificial Intelligence

Action Sequence Verification in Videos with CosAlignment Transformer (CAT)

The paper introduces Action Sequence Verification (ASV), a task that determines whether two videos follow the same ordered actions, provides the Chemical Sequence Verification dataset and re‑annotated COIN‑SV and Diving48‑SV sets, and proposes the CosAlignment Transformer (CAT) with intra‑step feature extraction, a Transformer‑based inter‑step encoder, and a sequence‑alignment loss that outperforms prior baselines and serves as a pre‑training model for video retrieval and classification.

Action Verification · Computer Vision · Dataset
7 min read
Xiaohongshu Tech REDtech
Jun 13, 2022 · Artificial Intelligence

Neighbor Transformer (NFormer): Robust Person Re-identification via Interactive Multi‑image Modeling

Neighbor Transformer (NFormer) introduces interactive multi‑image modeling for person re‑identification, using Landmark Agent Attention and Reciprocal Neighbor Softmax to efficiently fuse features across images, achieving state‑of‑the‑art accuracy and tighter embedding clusters on multiple benchmark datasets.

Computer Vision · Deep Learning · landmark agent attention
8 min read
DaTaobao Tech
Jun 10, 2022 · Artificial Intelligence

NeRF-Editing: Geometry Editing of Neural Radiance Fields

NeRF‑Editing, presented at CVPR 2022, introduces an interactive framework that lets users freely deform the geometry of neural radiance fields. It couples an explicit mesh with the implicit NeRF representation and propagates mesh‑vertex edits through tetrahedral ARAP optimization to bend camera rays during rendering, enabling realistic edits and animations of both synthetic and real‑world scenes; the authors report it as the first method of its kind.

3D reconstruction · ARAP deformation · Computer Vision
6 min read
ITPUB
Jun 9, 2022 · Artificial Intelligence

How 58’s Multi‑Label Image Recognition Boosts Semantic Search and Recommendations

This article details the design, data pipeline, model architecture, loss functions, and evaluation metrics of a large‑scale multi‑label image classification system built for 58.com, showing how it improves semantic similarity detection, recommendation, and content moderation across diverse business domains.

Computer Vision · Deep Learning · asymmetric loss
18 min read
Python Programming Learning Circle
Jun 9, 2022 · Artificial Intelligence

Python Nude Image Detection Using Pillow: Algorithm, Implementation, and Visualization

This tutorial explains how to build a Python program that detects nude images by analyzing skin-colored regions with Pillow, covering project setup, image preprocessing, pixel classification using RGB/HSV/YCrCb formulas, region merging, decision rules, and command‑line usage with optional visualization.
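
For the YCrCb variant of the pixel test, a frequently quoted rule accepts pixels with 133 ≤ Cr ≤ 173 and 77 ≤ Cb ≤ 127. The sketch below assumes those thresholds and the JPEG-style RGB-to-YCrCb conversion; the tutorial's exact formulas may differ.

```python
# Sketch of a YCrCb skin test (assumed conversion constants and thresholds).

def rgb_to_ycrcb(r, g, b):
    """JPEG-style conversion with 8-bit offsets (assumed constants)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y, cr, cb


def is_skin_ycrcb(r, g, b, cr_range=(133, 173), cb_range=(77, 127)):
    """Skin if chroma falls inside the assumed Cr/Cb box; luma Y is ignored."""
    _, cr, cb = rgb_to_ycrcb(r, g, b)
    return cr_range[0] <= cr <= cr_range[1] and cb_range[0] <= cb <= cb_range[1]
```

Ignoring Y makes the test less sensitive to lighting than a raw RGB rule, which is the usual motivation for checking skin in a chroma space.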

Computer Vision · Image Processing · Nude Detection
23 min read
58 Tech
Jun 9, 2022 · Artificial Intelligence

Multi‑Label Image Recognition for 58.com: Algorithm Design, Data Construction, and Model Optimization

This article presents a comprehensive study of multi‑label image recognition applied to 58.com’s business scenarios, covering problem motivation, dataset construction, evaluation metrics, mainstream deep‑learning methods, an asymmetric‑loss‑based optimization pipeline, and practical output schemes for recommendation and retrieval.
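
The asymmetric loss (ASL, Ridnik et al.) treats positives and negatives differently. Positives use a focusing exponent γ+, while negatives are shifted by a margin m (p_m = max(p - m, 0)) and down-weighted with a larger exponent γ-, so the many easy negatives in a multi-label image contribute little. A minimal per-sample sketch follows; the defaults γ+ = 0, γ- = 4, m = 0.05 are values recalled from the ASL paper and should be treated as assumptions, not as 58.com's settings.

```python
import math

# Per-sample asymmetric loss sketch; default hyperparameters are assumptions.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    clip=0.05, eps=1e-8):
    """Sum of ASL terms over labels; targets are 0/1 per label."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        if y == 1:
            # Positive term: (1 - p)^gamma_pos * log(p)
            total -= (1 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: very easy negatives (p <= clip) cost nothing.
            p_m = max(p - clip, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1 - p_m, eps))
    return total
```

With `gamma_pos=gamma_neg=0` and `clip=0` the function reduces to ordinary binary cross-entropy, which makes it easy to check against a baseline.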

Computer Vision · asymmetric loss · data annotation
17 min read
DaTaobao Tech
Jun 8, 2022 · Artificial Intelligence

Modeling Indirect Illumination for Inverse Rendering

The CVPR‑2022 paper by Alibaba’s Taobao Tech and Zhejiang University introduces a neural‑radiance‑field‑based method that directly models indirect illumination via a signed‑distance‑field geometry and spherical‑Gaussian visibility, avoiding costly path tracing and enabling more accurate recovery of geometry, material and lighting for realistic free‑viewpoint relighting.

BRDF · Computer Vision · indirect illumination
9 min read
Youku Technology
Jun 7, 2022 · Artificial Intelligence

Mobile Real-Time Portrait Segmentation for Youku Bullet Comment Passthrough

To enable real‑time bullet‑comment passthrough on Youku’s mobile app, the team built a million‑scale portrait dataset and designed the AirSegNet series—CPU, GPU, and server variants—using VGG‑style nets, edge‑aware losses, and hybrid CPU‑GPU inference, achieving 0.98 IoU and sub‑15 ms latency on most devices.

Computer Vision · Edge Computing · MNN Framework
13 min read
NetEase Smart Enterprise Tech+
Jun 2, 2022 · Artificial Intelligence

How Knowledge Distillation Shrinks Deep Neural Networks Without Losing Accuracy

Knowledge Distillation, a teacher‑student model compression technique, enables large, high‑performing deep neural networks to transfer their learned representations to smaller models, achieving comparable accuracy with faster inference, reduced resource consumption, and broader applicability in computer‑vision tasks.
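
The classic teacher-student objective (Hinton et al.) mixes a hard-label cross-entropy with a temperature-softened KL term between teacher and student outputs, scaled by T². The sketch below computes it for a single example; T = 4 and α = 0.5 are illustrative assumptions, and feature-matching variants such as FitNet's hint loss are not shown.

```python
import math

# Single-example knowledge-distillation loss; T and alpha are assumptions.

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # T^2 keeps soft-target gradients comparable in size across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    hard = -math.log(softmax(student_logits)[label])
    return alpha * hard + (1 - alpha) * soft
```

A higher temperature flattens the teacher's distribution, exposing how it ranks the wrong classes; that "dark knowledge" is what the smaller student learns from beyond the hard label.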

AI · Computer Vision · FitNet
14 min read
Java Backend Technology
May 28, 2022 · Artificial Intelligence

5 Mind-Blowing Open-Source Projects That Let You Control Faces, Erase Spiders, and Hack Wi-Fi

This article showcases five cutting‑edge open‑source projects—from a ROS‑based system that lets a gamepad animate facial muscles, to AI‑driven video inpainting, text‑to‑image generation, eye‑gaze computer control, and a comprehensive Wi‑Fi cracking toolkit—each pushing the boundaries of modern tech.

AI · Computer Vision · Robotics
6 min read
Youku Technology
May 18, 2022 · Artificial Intelligence

Subjective and Objective Quality of Experience of Free Viewpoint Videos – Paper Overview

This IEEE TIP paper presents a large‑scale subjective‑objective study of Free Viewpoint Video quality, introducing a cost‑saving two‑stage labeling workflow, a sparse‑frame benchmark model, and publicly releasing the dataset and code, with contributions from Alibaba’s Moku Lab and Jiangxi University researchers.

Computer Vision · Dataset · Free Viewpoint Video
5 min read
Code DAO
May 18, 2022 · Artificial Intelligence

A Practical Guide to PyTorch Visualization Tools for Deep Learning

This article walks through the core PyTorch visualization utilities—making image grids, drawing bounding boxes, segmentation masks, and keypoints—explaining why they are needed, how to set up the pipeline, and providing complete code examples for each computer‑vision task.

Bounding Boxes · Computer Vision · Keypoints
18 min read
DaTaobao Tech
May 11, 2022 · Artificial Intelligence

AdaInt: Learning Adaptive Intervals for 3D Lookup Tables in Real‑time Image Enhancement

AdaInt introduces a lightweight convolutional network that predicts non‑uniform sampling coordinates and basis 3D LUTs, using a differentiable binary‑search AiLUT‑Transform to enable end‑to‑end training, thereby delivering superior PSNR, negligible extra parameters, and real‑time color enhancement on ultra‑high‑resolution images, outperforming prior state‑of‑the‑art methods.
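
The core AdaInt idea, sampling a LUT at learned non-uniform positions and locating each input with a binary search, can be shown in one dimension. This is a simplified sketch under stated assumptions: the real method works on 3D LUTs with trilinear interpolation and a differentiable AiLUT-Transform, whereas here hypothetical interval widths are turned into vertices by a normalized cumulative sum and Python's `bisect` does the search.

```python
from bisect import bisect_right

# 1D simplification of an adaptive-interval LUT (not AdaInt's 3D operator).

def intervals_to_vertices(intervals):
    """Positive interval widths -> sorted vertices spanning [0, 1]."""
    total = sum(intervals)
    verts, acc = [0.0], 0.0
    for w in intervals:
        acc += w
        verts.append(acc / total)
    verts[-1] = 1.0  # guard against floating-point drift at the end
    return verts


def lut_lookup(x, vertices, values):
    """Piecewise-linear lookup on non-uniform vertices via binary search."""
    x = min(max(x, vertices[0]), vertices[-1])       # clamp to the LUT range
    i = bisect_right(vertices, x) - 1                # cell containing x
    i = min(i, len(vertices) - 2)                    # x == 1.0 uses the last cell
    x0, x1 = vertices[i], vertices[i + 1]
    w = (x - x0) / (x1 - x0)
    return values[i] * (1 - w) + values[i + 1] * w
```

Narrow intervals concentrate LUT vertices where the color mapping bends sharply, which is how non-uniform sampling can add accuracy without adding vertices.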

3D LUT · Computer Vision · Real-Time
11 min read
Bilibili Tech
May 10, 2022 · Artificial Intelligence

Glance Supervised Video Moment Retrieval via the ViGA Framework

The paper presents a glance‑supervised video moment retrieval approach in which annotators mark only a single frame inside each target moment, introduces the ViGA contrastive learning framework to exploit this weak temporal cue, and shows on three benchmarks performance that rivals fully supervised methods while keeping annotation cost minimal.

Computer Vision · Glance Supervision · ViGA
8 min read
Code DAO
May 10, 2022 · Artificial Intelligence

How Geometric Deep Learning Enables Spherical CNNs for Rotationally Equivariant Vision

The article explains why traditional planar CNNs fail on spherical data, describes how encoding rotational symmetry through continuous spherical representations and spherical harmonics leads to spherical convolutions that are rotation‑equivariant, and outlines the practical computation using harmonic coefficients.

Computer Vision · geometric-deep-learning · rotational equivariance
9 min read
Tencent Cloud Developer
Apr 27, 2022 · Artificial Intelligence

Alignment-Uniformity Representation Learning for Zero-shot Video Classification (AURL)

The AURL framework, presented by Pu Shi, introduces alignment‑uniformity aware representation learning for zero‑shot video classification, achieving up to 28 % top‑1 accuracy gains on UCF101 and HMDB51, and has already boosted business metrics in Tencent’s advertising, search, and video‑channel recommendation systems.

Alignment · Computer Vision · Deep Learning
19 min read
Python Programming Learning Circle
Apr 26, 2022 · Artificial Intelligence

Python Script for Adding Face Masks to CelebA Images Using the face_recognition Library

This article demonstrates how to use Python, the face_recognition library, and OpenCV/Pillow to automatically detect facial landmarks in CelebA images, generate and align mask overlays, and save both masked and binary mask versions for computer‑vision research and dataset augmentation.

Computer Vision · Image Processing · Python
11 min read
Meituan Technology Team
Apr 14, 2022 · Artificial Intelligence

Short Video Content Understanding and Generation Practices at Meituan

Meituan leverages computer‑vision techniques to tag, analyze, and automatically generate short videos across consumer and merchant scenarios, detailing hierarchical tag design, self‑supervised representation learning, fine‑grained food recognition, intelligent cover creation, and pixel‑level editing to enhance content discovery and presentation.

AI content generation · Computer Vision · fine-grained recognition
20 min read
Kuaishou Tech
Apr 11, 2022 · Artificial Intelligence

Kuaishou's Custom Video Matting Solution: Interactive Object Segmentation for Mobile Creators

Kuaishou's audio‑video technology team presents a self‑developed custom video matting system that combines foreground, interactive, and video object segmentation to let creators extract arbitrary subjects without green screens, featuring adaptive cropping, multi‑stage training, and deployment across Android and iOS devices.

Computer Vision · Deep Learning · Kuaishou
15 min read
Python Programming Learning Circle
Apr 9, 2022 · Artificial Intelligence

Image Resizing with OpenCV and PyTorch

This article explains how to resize images using OpenCV's cv2.resize function and how to scale multi‑dimensional tensors in PyTorch with torch.nn.functional.interpolate, providing detailed parameter descriptions and practical code examples for both single images and batch processing.

Computer Vision · Image Processing · PyTorch
6 min read
Meituan Technology Team
Apr 7, 2022 · Mobile Development

Zero‑Code Scripted Guidance for Mobile Apps Using CV and AI

The ASG system delivers stack‑agnostic, zero‑code in‑app guidance by combining traditional computer‑vision matching with deep‑learning detectors, enabling product teams to author scripts visually, cut development time below half a person‑day, boost task completion from 18 % to 35.7 %, and slash costs over 90 %.

Computer Vision · Mobile Development · image matching
31 min read
Kuaishou Large Model
Apr 6, 2022 · Artificial Intelligence

How Transformers Revolutionize Image Style Transfer: Introducing StyTr²

This article reviews the limitations of traditional CNN‑based image stylization, explains how Transformer architectures overcome these issues with global context and self‑attention, and presents the novel StyTr² method with content‑aware positional encoding that achieves superior, detail‑preserving style transfer results.

Computer Vision · Deep Learning · Transformer
8 min read
Tencent Architect
Apr 6, 2022 · Artificial Intelligence

Award-Winning AIoT Projects from the 2021 TencentOS Tiny AIoT Innovation Competition

The 2021 TencentOS Tiny AIoT Innovation Competition showcased over 50 original projects, including award‑winning multi‑functional pedestrian detection devices, AI‑enhanced smart wheelchairs, and endangered‑animal recognition systems, each demonstrating low‑power embedded AI, edge computing, and cloud integration for diverse real‑world applications.

AIoT · Computer Vision · Edge Computing
8 min read
Kuaishou Tech
Apr 6, 2022 · Artificial Intelligence

StyTr²: A Transformer‑Based Approach for Image Style Transfer

The paper proposes StyTr², a Transformer‑based image style transfer method that uses content‑aware positional encoding to preserve details and improve feature representation, achieving high‑quality stylization with better content structure and style patterns.

Computer Vision · Deep Learning · content-aware positional encoding
7 min read
Laiye Technology Team
Mar 25, 2022 · Artificial Intelligence

Laiye OCR Error‑Correction Model: Architecture, Implementation, and Evaluation

This article describes Laiye's OCR error‑correction system, detailing the background challenges of Chinese character recognition, the analysis of three possible solutions, the chosen post‑processing approach, model architecture, training data, loss design, online inference, and experimental results showing a measurable performance boost.

Chinese text · Computer Vision · Deep Learning
13 min read
JD Cloud Developers
Mar 21, 2022 · Artificial Intelligence

ViTAEv2 Breaks ImageNet Real Record with 91.2% Accuracy – How a 600M‑Parameter Model Redefines Few‑Shot Learning

JD Research Institute and the University of Sydney introduced ViTAEv2, a 600‑million‑parameter deep learning model that achieved a world‑leading 91.2% top‑1 accuracy on ImageNet Real without external data, demonstrating strong few‑shot learning, reducing labeling costs, and promising advances across many computer‑vision tasks.

AI model · Computer Vision · Deep Learning
4 min read
JD Retail Technology
Mar 7, 2022 · Artificial Intelligence

AI-Driven UI Testing: Data Collection, Model Development, and Deployment for Mobile App Anomaly Detection

This article presents a comprehensive study on applying AI and deep‑learning techniques to mobile UI testing, covering background challenges, feasibility research, abnormal sample construction, model design, training, evaluation, and future directions for intelligent test automation.

AI testing · Computer Vision · Model Training
13 min read
Kuaishou Large Model
Mar 4, 2022 · Artificial Intelligence

How Adaptive 3D Face Cutout Transforms Kuaishou’s AR Effects

This article explains the adaptive 3D face cutout technology behind Kuaishou's "3D Zoom Face" effect, detailing its problem‑solving approach, implementation workflow, camera‑control optimizations, and how it expands creative possibilities while lowering production costs for both creators and users.

3D rendering · AR effects · Computer Vision
16 min read
JD Cloud Developers
Mar 3, 2022 · Artificial Intelligence

How JD Explore’s Silver‑Bullet‑3D Dominated the SAPIEN ManiSkill Challenge

JD Explore Research Institute’s Visual and Multimedia Lab team “Silver‑Bullet‑3D” secured top positions in the 2021 SAPIEN ManiSkill Challenge by excelling in both imitation‑learning and rule‑based tracks, showcasing cutting‑edge computer‑vision and robotic‑arm control technologies that earned them international recognition.

AI competition · Computer Vision · Robotics
5 min read
Python Crawling & Data Mining
Feb 22, 2022 · Artificial Intelligence

Create a Dancing Word‑Cloud Video with Python and AI

This tutorial walks through downloading a dance video, extracting frames, using Baidu AI for person segmentation, generating word‑cloud masks, and stitching the results into a dancing word‑cloud video with Python, OpenCV and the WordCloud library.

Baidu AI · Computer Vision · OpenCV
8 min read
Kuaishou Tech
Feb 9, 2022 · Mobile Development

Kuaishou Mobile Mixed Reality System: Architecture, Algorithms, and Applications

This article presents Kuaishou's mobile mixed reality (MR) system, detailing its integration of deep learning, SLAM, and scene reconstruction for real‑time spatial computing, the design of a monocular depth‑estimation model, a lightweight 3D rendering engine, and its deployment across iOS and Android devices with various user‑facing effects.

Computer Vision · Depth estimation · Kuaishou
16 min read
Baobao Algorithm Notes
Jan 28, 2022 · Artificial Intelligence

How Masked Autoencoders Revolutionize Vision Pre‑Training: A Deep Dive

This article provides a detailed technical walkthrough of Masked Autoencoders (MAE) for computer vision, covering its BERT‑inspired masking strategy, asymmetric encoder‑decoder design, implementation specifics, experimental findings on mask ratios and decoder depth, and the resulting performance gains over supervised ViT models.
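
MAE's masking step is simple: shuffle the patch indices, keep the first fraction for the encoder, and later scatter the encoded patches back among shared mask tokens for the decoder. The sketch below assumes a 224×224 image cut into 16×16 patches (196 patches) and the 75% mask ratio reported in the MAE paper; the helper names are hypothetical.

```python
import random

# Sketch of MAE-style random masking and decoder-side restoration.

def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Return (kept_indices, masked_indices), each sorted."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)                      # uniform random permutation
    return sorted(order[:num_keep]), sorted(order[num_keep:])


def restore_with_mask_tokens(encoded_keep, keep, num_patches, mask_token):
    """Place encoder outputs at their original positions; fill the rest
    with a shared mask token, as the lightweight decoder expects."""
    full = [mask_token] * num_patches
    for idx, feat in zip(keep, encoded_keep):
        full[idx] = feat
    return full
```

Because the encoder only processes the 49 visible patches out of 196, the expensive part of pre-training runs on roughly a quarter of the sequence, which is the source of MAE's efficiency.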

Computer Vision · MAE · Masked Modeling
11 min read
Kuaishou Tech
Jan 27, 2022 · Artificial Intelligence

Kuaishou’s Self‑Developed Green‑Screen Matting Algorithm and Its Deployment in Kuaiying, Live Companion, and Cloud Editing

This article explains the principles, challenges, and implementation details of Kuaishou’s proprietary green‑screen matting algorithm, covering fine‑detail handling, color‑spill reduction, green‑reflection removal, and its real‑time deployment across mobile video‑editing and live‑streaming products.

Computer Vision · Kuaishou · Real-time Processing
13 min read
Kuaishou Tech
Jan 26, 2022 · Artificial Intelligence

Technical Overview of Kuaishou Y‑Tech Body‑Shaping Effects and Underlying Algorithms

This article explains how Kuaishou's Y‑Tech leverages human detection, keypoint localization, and image‑deformation algorithms such as stretching, triangulation and liquify, together with background‑distortion correction, to deliver seven stable, natural body‑shaping effects for short‑video applications.

AI · Computer Vision · body shaping
13 min read
Kuaishou Large Model
Jan 22, 2022 · Artificial Intelligence

How Kuaishou Achieves Realistic Body Beautification with AI‑Driven Pose Detection and Image Warping

This article explains Kuaishou’s Y‑tech body‑beautification pipeline, detailing how proprietary human pose detection, key‑point localization, and image‑warping techniques such as stretching, triangulation, and liquify are combined to create stable, natural effects like long‑leg, slim‑waist, and swan‑neck, while minimizing background distortion.

AI · Computer Vision · body beautification
15 min read
Baidu Geek Talk
Jan 17, 2022 · Artificial Intelligence

Unlocking Video AI: PaddleVideo’s Open‑Source Solutions for Sports, Media, and Safety

This article surveys PaddleVideo, Baidu's open‑source video AI toolkit, detailing its industry‑focused models for sports action recognition, multimodal tagging, intelligent production, interactive segmentation, drone detection, and medical imaging, while providing performance metrics and GitHub resources for each solution.

Computer Vision · Multimodal Learning · PaddleVideo
14 min read
DataFunSummit
Jan 5, 2022 · Artificial Intelligence

Improving Financial Micro‑Business Efficiency with OCR: Challenges, Applications, and an Intelligent Platform

This article explores how optical character recognition (OCR) technology can address the financing pain points of micro‑enterprises by automating document verification, enhancing risk assessment, and enabling an end‑to‑end intelligent OCR platform built on deep‑learning models, data pipelines, and deployment automation.

Computer Vision · Document Automation · Micro Business
15 min read
Code DAO
Dec 31, 2021 · Artificial Intelligence

Why RegNet Is the Most Flexible Architecture for Computer Vision

RegNet introduces a scalable design space defined by quantized linear functions, enabling flexible trade‑offs between accuracy, efficiency, and mobile deployment, and demonstrates superior performance compared with ResNet, EfficientNet, and other mobile‑optimized networks.
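
RegNet's quantized linear parameterization turns four numbers (initial width w_0, slope w_a, multiplier w_m, depth d) into per-block widths: u_j = w_0 + w_a·j, then log-space quantization s_j = round(log(u_j / w_0) / log(w_m)) and w_j = w_0·w_m^s_j, rounded to a multiple of 8. A plain-Python sketch follows; the example parameters in the check below are, to our knowledge, those of RegNetX-200MF, but treat that correspondence as an assumption.

```python
import math

# Sketch of RegNet's quantized linear width rule (q = 8 assumed).

def regnet_widths(w_0, w_a, w_m, depth, q=8):
    """Per-block widths from the quantized linear parameterization."""
    widths = []
    for j in range(depth):
        u_j = w_0 + w_a * j                                 # linear width
        s_j = round(math.log(u_j / w_0) / math.log(w_m))    # quantize in log space
        w_j = w_0 * w_m ** s_j                              # piecewise-constant width
        widths.append(int(round(w_j / q)) * q)              # multiple of q
    return widths


def stages(widths):
    """Group consecutive equal widths into (width, num_blocks) stages."""
    result = []
    for w in widths:
        if result and result[-1][0] == w:
            result[-1] = (w, result[-1][1] + 1)
        else:
            result.append((w, 1))
    return result
```

Because stage boundaries fall out of the quantization rather than being hand-chosen, sweeping only (w_0, w_a, w_m, d) spans a whole family of networks, which is what makes the design space easy to scale.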

Computer Vision · Deep Learning · Design Space
7 min read
Laiye Technology Team
Dec 31, 2021 · Artificial Intelligence

Overview of Table Recognition Techniques and Practical Implementation

This article reviews the challenges of extracting structured table data from images, compares two‑stage and end‑to‑end OCR approaches, evaluates four state‑of‑the‑art table‑recognition models (SPLERGE, CascadeTabNet, TableMASTER, UnetTable), and presents a practical deployment workflow with performance metrics.

AI · Computer Vision · Deep Learning
14 min read
Code DAO
Dec 29, 2021 · Artificial Intelligence

Understanding Stand-Alone Axial-Attention for Panoptic Segmentation

The paper proposes a stand‑alone axial‑attention mechanism that factorizes 2‑D self‑attention into two 1‑D attentions (along the height axis, then the width axis) to lower computational cost while preserving global context, introduces position‑sensitive self‑attention, integrates both into Axial‑ResNet and Axial‑DeepLab, and demonstrates strong results on four large segmentation datasets.

Axial Attention · Computer Vision · DeepLab
7 min read
Laravel Tech Community
Dec 27, 2021 · Artificial Intelligence

OpenCV 4.5.5 Release Highlights and New Features

OpenCV 4.5.5 introduces audio support in VideoCapture, updates SOVERSION handling, adds OpenVINO 2021.4.2 LTS compatibility, expands ONNX test coverage, upgrades protobuf, optimizes for RISC‑V, and enhances the G‑API module with numerous vectorized kernels, SIMD scheduling, and various bug fixes.

AIComputer VisionG-API
0 likes · 3 min read
Code DAO
Dec 22, 2021 · Artificial Intelligence

Understanding SimCLR: A Simple Contrastive Learning Framework for Visual Representations

This article explains SimCLR, the 2020 Google Research framework that advances self‑supervised visual pre‑training by using extensive data augmentations, a ResNet encoder, a projection‑head MLP, and the NT‑Xent loss to learn robust image representations that outperform many prior methods on ImageNet and other benchmarks.

Computer Vision · NT-Xent loss · ResNet
0 likes · 7 min read
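
The NT-Xent objective can be written out directly. This is a minimal pure-Python sketch. It assumes the two augmented views of image k sit at index k of `z_a` and `z_b`, and it uses an illustrative temperature of 0.5. A real training loop would compute the same quantity on batched tensors with automatic differentiation.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_a[k], z_b[k]).

    All 2N projections are L2-normalized. For each anchor, the other view of the
    same image is the positive, and the remaining 2N - 2 projections are negatives.
    Returns the mean loss over all 2N anchors.
    """
    z = [_normalize(v) for v in z_a + z_b]
    n = len(z_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other augmented view
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        # The denominator excludes the anchor's similarity with itself (k != i).
        log_denom = math.log(sum(math.exp(sims[k]) for k in range(2 * n) if k != i))
        total += log_denom - sims[pos]
    return total / (2 * n)
```

When the views are aligned, the positive similarity dominates the denominator and the loss is small. Swapping the pairs makes the positives look like negatives and raises the loss.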
ITPUB
Dec 13, 2021 · Artificial Intelligence

How Data Augmentation Boosts Machine Learning When Data Is Scarce

This article explains how data augmentation can alleviate overfitting by artificially expanding limited training sets, outlines common transformation techniques for images, text, and audio, and discusses the method's benefits, practical applications, and inherent limitations for machine‑learning practitioners.

Computer Vision · Deep Learning · data augmentation
0 likes · 6 min read
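
The sketch below is a minimal illustration of label-preserving image augmentation. It applies two common transformations, random horizontal flip and random crop, to a grayscale image stored as a list of pixel rows. The function names are hypothetical. Production pipelines would typically use a library such as torchvision or Albumentations instead.

```python
import random


def horizontal_flip(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position (randint is inclusive)."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, n_copies, seed=0):
    """Produce n_copies randomly flipped-and-cropped variants of one image.

    Every copy keeps the original label, so a small labelled set grows with
    plausible new examples instead of being memorized as-is.
    """
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        out = horizontal_flip(image) if rng.random() < 0.5 else image
        copies.append(random_crop(out, crop_h, crop_w, rng))
    return copies
```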
Code DAO
Dec 12, 2021 · Artificial Intelligence

Lightning Flash 0.3 Introduces New Tasks, Visualization Tools, Data Pipelines, and Registry API

Lightning Flash 0.3 expands the PyTorch Lightning ecosystem with eight new computer‑vision and NLP tasks, modular API design, integrated model hubs, visualisation callbacks, customizable data‑source hooks, and a central registry for model backbones, all illustrated with concrete code examples.

Computer Vision · Deep Learning · Lightning Flash
0 likes · 7 min read
Kuaishou Large Model
Dec 10, 2021 · Artificial Intelligence

How AI Restores Blurry Faces: Inside Kuaishou’s Y‑Tech High‑Definition Portrait Project

Image clarity matters in daily life, from personal memories to security. Kuaishou’s Y‑Tech team tackles image degradation by building paired low- and high-quality datasets and a style-based AI model that uses facial masks to restore high-definition portraits, preserving identity while enhancing detail.

AI · Computer Vision · Deep Learning
0 likes · 10 min read
Code DAO
Dec 5, 2021 · Artificial Intelligence

Why DropBlock Outperforms Dropout as an Image Regularizer

This article demonstrates how to implement DropBlock in PyTorch, explains why Dropout fails on image data, details the gamma calculation and mask generation, and shows visual comparisons that illustrate the superiority of contiguous region dropping over random pixel dropout.

Computer Vision · Deep Learning · DropBlock
0 likes · 11 min read
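
The gamma calculation and mask generation can be sketched in plain Python (the article itself uses PyTorch). The gamma formula follows the DropBlock paper (Ghiasi et al., 2018). It converts the target drop rate `1 - keep_prob` into a Bernoulli rate for block centres. That rate accounts for each centre zeroing `block_size**2` units and for centres being sampled only where a full block fits. Square feature maps and the helper names are our assumptions.

```python
import random


def dropblock_gamma(keep_prob, block_size, feat_size):
    """Bernoulli rate for block centres so that roughly (1 - keep_prob) of units drop."""
    return ((1.0 - keep_prob) / block_size ** 2
            * feat_size ** 2 / (feat_size - block_size + 1) ** 2)


def dropblock_mask(feat_size, keep_prob, block_size, rng):
    """Binary keep-mask (1 = keep) for a square feat_size x feat_size feature map."""
    gamma = dropblock_gamma(keep_prob, block_size, feat_size)
    half = block_size // 2
    mask = [[1] * feat_size for _ in range(feat_size)]
    # Sample centres only where a full block fits inside the map.
    for ci in range(half, feat_size - (block_size - 1 - half)):
        for cj in range(half, feat_size - (block_size - 1 - half)):
            if rng.random() < gamma:
                # Zero the whole contiguous block_size x block_size region.
                for i in range(ci - half, ci - half + block_size):
                    for j in range(cj - half, cj - half + block_size):
                        mask[i][j] = 0
    return mask


def apply_dropblock(feature, mask):
    """Apply the mask, then rescale kept units so the expected activation sum is preserved."""
    kept = sum(map(sum, mask))
    if kept == 0:
        return [[0.0] * len(row) for row in feature]
    scale = len(mask) * len(mask[0]) / kept
    return [[x * m * scale for x, m in zip(frow, mrow)]
            for frow, mrow in zip(feature, mask)]
```

Zeroing whole regions, rather than independent pixels, stops neighbouring activations from simply passing the dropped information along. Dropout cannot provide that on spatially correlated image features.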
Java Captain
Dec 4, 2021 · Artificial Intelligence

Java Spring Boot License Plate Recognition and Training System (Open‑Source)

This open-source project implements a license-plate detection and training system in Java, built with Spring Boot and Maven on top of OpenCV and JavaCPP. It supports multiple plate colors and both SVM and ANN algorithms. It also provides a browser/server (B/S) architecture with SQLite, Swagger documentation, and extensible image-recognition features.

Computer Vision · Deep Learning · Image Processing
0 likes · 4 min read
Kuaishou Large Model
Dec 3, 2021 · Artificial Intelligence

How Can Your Face Reveal Heart Rate? Exploring rPPG Technology

This article explains the principles of remote photoplethysmography (rPPG), how facial skin color changes caused by heartbeats can be captured by a camera to measure heart rate, respiration, SpO₂ and other physiological signals, and reviews traditional and data‑driven algorithms for robust signal extraction.

AI · Computer Vision · heart rate detection
0 likes · 7 min read
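
A classic traditional baseline uses only the green channel, where the pulse signal tends to be strongest. The hypothetical sketch below assumes the face region has already been detected and averaged upstream into one green value per frame. It picks the dominant frequency in a 0.7 to 4 Hz band (42 to 240 bpm, an illustrative choice) with a plain DFT. Practical systems add detrending, band-pass filtering, and motion-robust methods such as the data-driven ones the article reviews.

```python
import math


def estimate_heart_rate(green_means, fps, low_hz=0.7, high_hz=4.0):
    """Estimate heart rate (beats per minute) from per-frame mean green values.

    green_means: average green intensity of the facial skin region in each frame.
    The mean is removed, then DFT bins inside [low_hz, high_hz] are scanned and
    the frequency with the most power is converted to bpm.
    """
    n = len(green_means)
    mean = sum(green_means) / n
    signal = [g - mean for g in green_means]
    best_freq, best_power = None, -1.0
    for k in range(1, n // 2):
        freq = k * fps / n  # frequency of DFT bin k in Hz
        if not low_hz <= freq <= high_hz:
            continue
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        im = -sum(s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return None if best_freq is None else best_freq * 60.0
```

With 10 s of video at 30 fps, the frequency resolution is 0.1 Hz (6 bpm). Longer windows or interpolation between bins give finer estimates.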
Kuaishou Tech
Dec 1, 2021 · Industry Insights

Turning Sketches into Live AR Characters: Kuaishou’s All‑Things‑AR Technical Journey

This article details how Kuaishou transformed a user‑drawn sketch concept into the All‑Things‑AR feature, covering background inspiration, the end‑to‑end pipeline, data collection, mobile‑friendly segmentation model design, model optimizations, engineering integration, SLAM‑based camera localization, and final production results.

AR · Computer Vision · Mobile Development
0 likes · 15 min read
21CTO
Nov 27, 2021 · Artificial Intelligence

How Huawei’s “Genius Teen” Scaled AutoML to Millions of Phones

Zhong Zhao, one of Huawei’s “genius youth” hires with a 2.01-million-yuan annual salary, used AutoML to deploy high-precision image-pixel processing algorithms across tens of millions of Mate and P series smartphones. His work pioneered large-scale commercial use of AutoML and advanced mobile visual models with dynamic convolution kernels and adversarial data augmentation.

AutoML · Computer Vision · Deep Learning
0 likes · 9 min read
DeWu Technology
Nov 18, 2021 · Artificial Intelligence

Background Complexity Detection for Sneaker Images Using MobileNet, FPN, and Modified SAM

The project presents a lightweight MobileNet-FPN architecture enhanced with a modified spatial-attention module (SAM) that evaluates corner-based self-similarity to classify sneaker photo backgrounds. It achieves 96% test accuracy, above the baseline CNN, and meets the business targets of over 80% accuracy for hints and 90% for mandatory enforcement.

CNN · Computer Vision · Image Processing
0 likes · 12 min read
DataFunTalk
Nov 16, 2021 · Artificial Intelligence

InsightFace: Open‑Source 2D/3D Deep Face Analysis Toolbox with PaddlePaddle Support

InsightFace is an open‑source 2D/3D deep face analysis toolbox that implements a variety of detection, alignment and recognition algorithms, now supports PaddlePaddle with out‑of‑the‑box models, high‑throughput distributed training up to 60 million classes, and provides a one‑line demo script for quick testing.

ArcFace · Computer Vision · Deep Learning
0 likes · 3 min read
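
InsightFace is best known for the ArcFace loss. ArcFace adds an angular margin m to the angle between a feature and its true class centre before the softmax. The pure-Python sketch below shows only the logit computation. `s = 64` and `m = 0.5` are values commonly paired with ArcFace. Real implementations, including InsightFace's, also handle the case where θ + m exceeds π, which this sketch omits.

```python
import math


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """ArcFace logits: add margin m to the target-class angle, then scale by s.

    embedding and each row of class_weights are L2-normalized first, so each
    dot product equals cos(theta) between the feature and a class centre.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    e = normalize(embedding)
    logits = []
    for c, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(e, normalize(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # guard acos against rounding error
        if c == label:
            cos_t = math.cos(math.acos(cos_t) + m)  # margin only on the true class
        logits.append(s * cos_t)
    return logits
```

These logits feed a standard softmax cross-entropy. Because the true class must win even after its angle is enlarged by m, the learned embeddings become more compact per identity.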
Alibaba Terminal Technology
Nov 15, 2021 · Artificial Intelligence

How AI Powers Smart Home Workouts on Mobile: Alibaba Sports’ Pose‑Tracking

Alibaba Sports’ AI-powered smart workout system transforms a simple smartphone and a few square meters of space into an interactive home fitness solution, using MNN‑based pose estimation to recognize and correct dozens of exercises, while addressing challenges like accuracy, performance, and automated testing.

AI · Automated Testing · Computer Vision
0 likes · 11 min read
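
The summary does not spell out how pose estimates become exercise feedback. One plausible approach computes a joint angle from three keypoints and counts repetitions with two thresholds, as in the hypothetical sketch below. The keypoint layout, the squat example, and the 100°/160° thresholds are our assumptions, not Alibaba Sports’ implementation.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b between segments b->a and b->c (e.g. hip-knee-ankle)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def count_reps(knee_angles, down_below=100.0, up_above=160.0):
    """Count squats from a per-frame knee-angle series using hysteresis.

    A rep counts when the angle drops below down_below and later rises above
    up_above. The gap between the two thresholds keeps jitter in the pose
    estimate from being counted as extra reps.
    """
    reps, is_down = 0, False
    for angle in knee_angles:
        if not is_down and angle < down_below:
            is_down = True
        elif is_down and angle > up_above:
            is_down = False
            reps += 1
    return reps
```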
Amap Tech
Nov 4, 2021 · Artificial Intelligence

POI Signboard Image Retrieval: Technical Solution, Model Design, and Future Directions

To efficiently filter unchanged POI signboards, the authors propose a multimodal image‑retrieval system that combines enhanced global and local visual features with BERT‑encoded OCR text, using metric learning and alignment techniques to achieve over 95% accuracy while handling occlusion, viewpoint variation, and subtle text changes.

Computer Vision · Deep Learning · Multimodal Learning
0 likes · 17 min read
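
One simple way to combine the visual and text signals is late fusion of per-modality cosine similarities. The sketch below is hypothetical. The weights, threshold, and dictionary layout are illustrative assumptions, and it does not reproduce the metric learning or alignment the authors use to train the embeddings.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def signboard_unchanged(old, new, w_global=0.4, w_local=0.3, w_text=0.3, threshold=0.8):
    """Decide whether two captures show the same, unchanged signboard.

    old / new: dicts with 'global' and 'local' visual embeddings and a 'text'
    embedding of the OCR string (e.g. from a BERT encoder). A weighted sum of
    per-modality cosine similarities is compared with a threshold, so a text
    change can flag a signboard even when the images look alike.
    """
    score = (w_global * cosine(old["global"], new["global"])
             + w_local * cosine(old["local"], new["local"])
             + w_text * cosine(old["text"], new["text"]))
    return score >= threshold, score
```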
Laiye Technology Team
Sep 24, 2021 · Artificial Intelligence

Self‑Supervised Learning and Contrastive Methods for Computer Vision and OCR Applications

This article surveys self‑supervised learning techniques for computer‑vision tasks, explains common pretext tasks and contrastive loss designs, reviews representative models such as SimCLR, MoCo, SwAV and SimSiam, and demonstrates their practical impact on a captcha‑OCR system with measurable accuracy gains.

Computer Vision · Deep Learning · OCR
0 likes · 23 min read
Kuaishou Tech
Sep 17, 2021 · Artificial Intelligence

SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer

SnowflakeNet introduces a novel Snowflake Point Deconvolution architecture combined with a Skip-Transformer to progressively split seed points, enabling high‑quality point‑cloud completion that preserves fine‑grained geometric details such as smooth surfaces, sharp edges, and corners across dense and sparse datasets.

3D reconstruction · Computer Vision · Deep Learning
0 likes · 10 min read
JD Retail Technology
Sep 8, 2021 · Artificial Intelligence

ARShoe: Real-Time Augmented Reality Shoe Try-On System on Smartphones

The paper presents ARShoe, the first practical real‑time augmented reality shoe try‑on system for smartphones, detailing its multi‑branch neural network, foot pose estimation, rendering pipeline, a newly built foot dataset, and extensive experiments demonstrating high accuracy and over 30 FPS performance on multiple devices.

AR · Computer Vision · Mobile
0 likes · 6 min read
Baidu Geek Talk
Sep 8, 2021 · Artificial Intelligence

How PP‑OCRv2 Boosts OCR Speed and Accuracy with Five Key Innovations

The article provides a comprehensive technical overview of PaddleOCR's PP‑OCRv2, detailing its five major algorithmic enhancements, performance improvements over previous versions, historical milestones, core capabilities, and links to the open‑source repositories for developers interested in state‑of‑the‑art OCR solutions.

Computer Vision · Model Optimization · OCR
0 likes · 10 min read
NetEase Smart Enterprise Tech+
Sep 2, 2021 · Artificial Intelligence

How AI Detects Video Deepfakes: Techniques, Challenges, and Real-World Solutions

This article explores the rapid rise of AI‑generated video deepfakes, examines the four main manipulation techniques, discusses the inherent security risks, and presents NetEase Yidun’s comprehensive detection framework—including face‑detection‑based classification, semi‑supervised learning, feature fusion, and model distillation—to combat content‑security threats.

AI security · Computer Vision · Semi-supervised Learning
0 likes · 12 min read
Kuaishou Large Model
Aug 30, 2021 · Artificial Intelligence

How Kuaishou’s Y‑Tech Fixes Background Distortion in Portrait Beautification

This article explains the challenges of background distortion caused by portrait beautification effects, describes Kuaishou Y‑Tech’s line‑segment‑based optimization framework that preserves line slopes and triangle shapes, and demonstrates the algorithm’s effectiveness through before‑and‑after visual results.

Computer Vision · Image Processing · background correction
0 likes · 11 min read
Beike Product & Technology
Aug 13, 2021 · Artificial Intelligence

AI-Powered Intelligent Testing Platform for Frontend UI Quality Assurance

The article describes how an AI-driven testing platform combines computer‑vision, OCR, and machine‑learning techniques to automatically detect frontend UI and backend‑related quality issues in mobile apps, outlines its architecture, core capabilities, deployment workflow, and reports successful real‑world deployments and future plans.

AI testing · Computer Vision · frontend quality
0 likes · 11 min read
Alimama Tech
Aug 11, 2021 · Artificial Intelligence

Dynamic Descriptive Model: A Scalable Paradigm for High‑Quality Native Creative Generation

The Dynamic Descriptive Model (DDM) introduces a scalable pipeline that automatically harvests product assets, perceives their visual attributes, encodes designers’ expertise in an extended SVG‑based descriptive language, and generates high‑quality, native‑looking ad creatives at massive scale, delivering 5–80% CTR gains and tens of millions of daily outputs.

AI · Advertising · Computer Vision
0 likes · 13 min read
MaGe Linux Operations
Aug 9, 2021 · Artificial Intelligence

Top Python Libraries for Image Processing: A Practical Guide with Code

This article introduces the most popular Python image‑processing libraries, explains their core features, and provides ready‑to‑run code examples for tasks such as filtering, segmentation, and computer‑vision applications, helping readers quickly start working with images in Python.

Computer Vision · Image Processing · NumPy
0 likes · 9 min read
iQIYI Technical Product Team
Aug 6, 2021 · Artificial Intelligence

I2UV-HandNet: High‑Fidelity 3D Hand Mesh Reconstruction from Monocular RGB Images

I2UV-HandNet reconstructs high-fidelity 3D hand meshes from a single RGB image using an AffineNet encoder‑decoder to predict coarse UV maps and an SRNet super‑resolution module, trained on the SuperHandScan dataset, achieving real‑time performance and state‑of‑the‑art benchmark results, and targeting integration into next‑generation VR headsets without external controllers.

3D mesh · Computer Vision · Deep Learning
0 likes · 11 min read
MaGe Linux Operations
Jul 29, 2021 · Artificial Intelligence

Unlock Powerful Face Recognition with Python’s face_recognition Library

This article introduces the open‑source Python library face_recognition, explains how to install it, locate and extract faces, generate 128‑dimensional embeddings, compare faces, detect facial landmarks, apply virtual makeup, and build a simple custom face‑recognition application with complete code examples and visual results.

Computer Vision · Image Processing · face recognition
0 likes · 11 min read
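
face_recognition compares faces by the Euclidean distance between 128-dimensional encodings. As we understand the library, `face_recognition.compare_faces` accepts a match when the distance is at most a tolerance, 0.6 by default. The pure-Python sketch below re-implements only that comparison step for illustration. It is not the library's code, and it does not compute encodings.

```python
import math


def face_distance(known_encodings, candidate):
    """Euclidean distance from a candidate 128-d encoding to each known encoding."""
    return [math.dist(known, candidate) for known in known_encodings]


def compare_faces(known_encodings, candidate, tolerance=0.6):
    """True for each known face whose distance to the candidate is within tolerance."""
    return [d <= tolerance for d in face_distance(known_encodings, candidate)]
```

Lowering the tolerance makes matching stricter, trading fewer false matches for more missed ones.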
Python Programming Learning Circle
Jul 27, 2021 · Artificial Intelligence

Common Python Libraries for Image Processing: Overview and Code Examples

This article introduces the most widely used Python image‑processing libraries—including scikit‑image, NumPy, SciPy, Pillow, OpenCV‑Python, SimpleCV, Mahotas, SimpleITK, pgmagick, and Pycairo—explaining their key features and providing concise code snippets that demonstrate filtering, segmentation, enhancement, and computer‑vision tasks.

Computer Vision · Image Processing · NumPy
0 likes · 8 min read
Test Development Learning Exchange
Jul 21, 2021 · Artificial Intelligence

Drawing Shapes on Images with OpenCV in Python

This tutorial demonstrates how to use OpenCV in Python to read an image and draw basic shapes such as rectangles and circles by specifying coordinates, dimensions, colors, and line thickness, then display the edited image in a window.

Computer Vision · Drawing Shapes · Image Processing
0 likes · 2 min read
Test Development Learning Exchange
Jul 20, 2021 · Artificial Intelligence

Resizing Images with Python and OpenCV

This article demonstrates how to use Python's OpenCV library to read an image, display its original dimensions, resize it to a specified size, save the resized image, and handle user input to close the display windows.

Computer Vision · Image Processing · OpenCV
0 likes · 2 min read
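
A common pitfall when resizing with OpenCV is axis order: `img.shape` is `(height, width, channels)`, while `cv2.resize` takes `dsize` as `(width, height)`. The small hypothetical helper below keeps the aspect ratio and returns the size in the order `cv2.resize` expects.

```python
def fit_within(shape, max_w, max_h):
    """Largest (width, height) that fits in max_w x max_h while keeping aspect ratio.

    shape: an image shape such as img.shape, i.e. (height, width[, channels]).
    The result is in (width, height) order, as cv2.resize expects for dsize.
    """
    height, width = shape[:2]
    scale = min(max_w / width, max_h / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The result can be passed straight on, e.g. `cv2.resize(img, fit_within(img.shape, 640, 480), interpolation=cv2.INTER_AREA)`. `INTER_AREA` is generally preferred for shrinking. Note that the helper also upscales images smaller than the bounding box.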
Test Development Learning Exchange
Jul 17, 2021 · Artificial Intelligence

Face Recognition with OpenCV and Python

This tutorial explains the concept of facial recognition, describes how it works, and provides step‑by‑step instructions and code examples for implementing face detection and identification using OpenCV and Python, including installation, basic image handling, and a complete sample script.

Computer Vision · OpenCV · Python
0 likes · 4 min read
Kuaishou Large Model
Jul 15, 2021 · Artificial Intelligence

How Kuaishou’s YKit AI SDK Powers Mass‑Production of Viral Effects

The article details Kuaishou Y‑tech's YKit AI SDK architecture, its unified interface, modular design, performance optimizations, and three real‑world case studies that illustrate how the SDK enables large‑scale, high‑quality short‑video effects across diverse devices while addressing challenges of effect variety, performance, and cost.

AI SDK · AR · Computer Vision
0 likes · 14 min read
21CTO
Jul 14, 2021 · Artificial Intelligence

How a Chinese PhD’s Vision Research Earned a 2‑Million‑Yuan Huawei Offer

The article profiles Liao Minghui, a recent PhD graduate from Huazhong University of Science and Technology whose groundbreaking work in computer‑vision text detection earned him top honors, multiple patents, and a record‑breaking 2.01 million‑yuan annual salary offer from Huawei’s “Genius Youth” program.

Academic Achievement · Computer Vision · Huawei Recruitment
0 likes · 7 min read
Beike Product & Technology
Jul 8, 2021 · Artificial Intelligence

Raster‑to‑Vector Floorplan Reconstruction (R2V) for Standardized Housing Layouts

This article presents the motivation, definitions, related work, and a detailed R2V (Raster‑to‑Vector) modeling pipeline—including DNN segmentation, integer programming, and vector standardization—used by Beike to standardize diverse floor‑plan images, discusses challenges, and outlines future directions, while also noting recruitment opportunities.

Computer Vision · floorplan · integer optimization
0 likes · 20 min read
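
The summary does not detail the vector-standardization step. The hypothetical heuristic below is a much simpler stand-in than the integer-programming formulation. It snaps nearly horizontal or vertical wall segments to exact axis alignment, which is one way floor-plan vectors can be regularized. The 5° tolerance is an assumption, and this is not Beike's method.

```python
import math


def snap_segment(p, q, tol_deg=5.0):
    """Make a wall segment exactly horizontal or vertical if it is within tol_deg of one.

    p, q: (x, y) endpoints. Near-horizontal segments get the mean y of their
    endpoints, near-vertical ones the mean x; other segments are returned unchanged.
    """
    angle = math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0
    if min(angle, 180.0 - angle) <= tol_deg:      # near horizontal
        y = (p[1] + q[1]) / 2.0
        return (p[0], y), (q[0], y)
    if abs(angle - 90.0) <= tol_deg:              # near vertical
        x = (p[0] + q[0]) / 2.0
        return (x, p[1]), (x, q[1])
    return p, q
```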
Youku Technology
Jul 8, 2021 · Artificial Intelligence

Key Findings from Alibaba Moku Lab at ACM MM 2021

At ACM MM 2021, Alibaba’s Moku Lab presented four cutting‑edge studies: an interactive video inpainting system using user doodles, a decoupled IoU regression model for object detection, a spatio‑temporal distortion‑aware video quality assessment framework, and a multimodal emotional relationship recognition dataset and benchmark.

Computer Vision · Video Inpainting · multimodal emotion recognition
0 likes · 8 min read