Tagged articles
76 articles
Java Architect Essentials
Apr 17, 2026 · Backend Development

How to Integrate Tess4J OCR into a Spring Boot Application

This article explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4J, guides you through downloading language data, shows step‑by‑step Spring Boot integration with Maven dependencies and configuration classes, and provides test code for Chinese, English, and mixed‑language image recognition.

Language Data · OCR · Spring Boot
0 likes · 9 min read
Java Architect Handbook
Apr 1, 2026 · Backend Development

Integrating Tess4j OCR into a Spring Boot 3 Project

This guide explains OCR fundamentals, introduces Tesseract and Tess4j, shows how to download the required language data files, and provides step‑by‑step instructions with Maven configuration, Spring Boot properties, Java code, and test examples for Chinese, English, and mixed‑language image recognition.

OCR · Spring Boot · image recognition
0 likes · 11 min read
Architecture Digest
Mar 26, 2026 · Artificial Intelligence

How to Integrate Tess4j OCR into a Spring Boot 3 Application

This guide explains the fundamentals of OCR, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data files, configure a Spring Boot 3 project with Maven dependencies and YAML settings, and provides comprehensive test code for Chinese, English, and mixed‑language image recognition.

Artificial Intelligence · OCR · Spring Boot
0 likes · 9 min read
SpringMeng
Mar 25, 2026 · Backend Development

How to Perform OCR in SpringBoot Using Tess4j

This tutorial explains OCR fundamentals, introduces Tesseract and its Java wrapper Tess4j, shows how to download language data, integrate Tess4j into a SpringBoot 3 project with Maven configuration, and provides test code for Chinese, English, and mixed‑language image recognition while highlighting performance considerations.

Configuration · OCR · SpringBoot
0 likes · 9 min read
java1234
Mar 24, 2026 · Backend Development

How to Elegantly Perform OCR in Spring Boot 3 Using Tess4J

This tutorial explains OCR fundamentals, introduces the open‑source Tesseract engine and its Java wrapper Tess4J, shows how to download the required traineddata files, and provides step‑by‑step Spring Boot 3 integration, configuration, and test code for Chinese, English, and mixed‑language image recognition, plus important usage notes.

OCR · Spring Boot · image recognition
0 likes · 8 min read
Java Companion
Mar 22, 2026 · Backend Development

How to Seamlessly Integrate Tess4j OCR into a SpringBoot Application

This tutorial walks through the fundamentals of OCR, explains how to download the required Tesseract traineddata files, shows how to add Tess4j as a Maven dependency, configure SpringBoot with custom properties, and provides complete Java test code for Chinese, English, and mixed‑language image recognition, highlighting performance considerations and file‑naming requirements.

Backend · OCR · SpringBoot
0 likes · 9 min read
Woodpecker Software Testing
Jan 21, 2026 · Artificial Intelligence

Build an AI Agent with FastAPI & Alibaba Cloud: Text Q&A, Image Recognition, and Text‑to‑Image

This guide walks through designing and implementing an AI assistant that connects FastAPI to Alibaba Cloud large‑model services, supports streaming text Q&A, image understanding, text‑to‑image generation, network search, and MCP‑based map queries, with full front‑end and back‑end code examples.

AI chatbot · Alibaba Cloud · FastAPI
0 likes · 38 min read
Java Architecture Diary
May 19, 2025 · Artificial Intelligence

How Ollama 0.7 Unlocks Local Multimodal AI with One Command

Ollama 0.7 introduces a fully re‑engineered core that brings seamless multimodal model support, lists top visual models, showcases OCR and image analysis capabilities, explains technical breakthroughs, and provides a quick three‑step guide to deploy powerful local AI vision.

AI Engineering · AI models · Ollama
0 likes · 7 min read
php Courses
Apr 15, 2025 · Artificial Intelligence

Using PHP to Access a Camera and Perform Image Recognition

This article explains how to use PHP to control a camera via extensions such as OpenCV or FFmpeg, integrate image‑recognition libraries like Tesseract OCR, and apply these techniques to scenarios such as security monitoring, object detection, and facial‑recognition login, enhancing application intelligence.

AI · Camera · OpenCV
0 likes · 6 min read
Full-Stack Cultivation Path
Mar 7, 2025 · Artificial Intelligence

How AI Turned My Chaotic Home Inventory into an Organized System

The author describes the problems of wasted storage, expired food, hard‑to‑locate items, and duplicate purchases after moving house, then details an AI‑driven home inventory app built with Cursor, Trae, and large‑vision models that digitizes, classifies, and reminds about household goods, complete with architecture, implementation steps, and a comparative review of the AI tools used.

AI · Cursor · GPC classification
0 likes · 15 min read
ByteFE
Mar 7, 2025 · Artificial Intelligence

AI-Powered Home Inventory Management Application: Design, Implementation, and Experience

This article describes the development of an AI-driven home inventory management tool that addresses storage waste, food expiration, item locating, and duplicate purchases by integrating barcode scanning, image recognition, intelligent classification, and multimodal models, while also comparing the performance of Cursor and Trae IDEs and Claude‑3.5‑sonnet versus deepseek‑r1 models.

AI · barcode · home inventory
0 likes · 17 min read
Full-Stack Cultivation Path
Nov 25, 2024 · Artificial Intelligence

Get High-Quality OCR with Ollama-OCR in Just a Few Lines of Code

This guide shows how to set up the open‑source Ollama‑OCR tool, which leverages the Llama 3.2‑Vision multimodal model to perform high‑quality OCR, covering installation of Ollama, the vision model, the OCR package, and example code for plain‑text and Markdown outputs.

Llama 3.2-Vision · Multimodal LLM · Node.js
0 likes · 6 min read
Baidu Geek Talk
Nov 25, 2024 · Artificial Intelligence

PP-ShiTuV2: A General Image Recognition Pipeline in PaddleX

PP‑ShiTuV2, a PaddleX pipeline that integrates subject detection, deep feature encoding, and vector retrieval, delivers 91 % recall@1 on AliProducts, surpasses earlier models by over 20 points, runs efficiently on GPU and CPU, and offers simple installation, quick‑start code, and full fine‑tuning support.

Computer Vision · Deep Learning · Model Deployment
0 likes · 8 min read
DaTaobao Tech
May 17, 2024 · Artificial Intelligence

Understanding Convolutional Neural Networks: Theory, Architecture, and Practical Techniques

The article explains CNN fundamentals—convolution, pooling, and fully‑connected layers—illustrates their implementation for American Sign Language letter recognition, details parameter calculations, demonstrates data augmentation and transfer learning techniques, and highlights how these methods boost image‑classification accuracy to around 92%.

CNN · data augmentation · image recognition
0 likes · 19 min read
php Courses
May 10, 2024 · Artificial Intelligence

Using PHP to Operate a Camera and Perform Image Recognition

This article explains how to use PHP together with camera control libraries and image‑recognition tools such as OpenCV and Tesseract OCR to build intelligent applications, providing code examples and discussing practical use cases like security monitoring and face‑login.

Camera · OpenCV · PHP
0 likes · 5 min read
The Dominant Programmer
Mar 30, 2024 · Backend Development

Implement OCR in Spring Boot with Tess4J for Image Text Recognition

This guide shows how to integrate the open‑source Tesseract OCR engine into a Spring Boot application using the Tess4J Java wrapper, covering Chinese language data setup, Maven dependency configuration, bean creation, service implementation, and a unit test to verify image text extraction.

OCR · Spring Boot · image recognition
0 likes · 6 min read
Open Source Tech Hub
Mar 13, 2024 · Artificial Intelligence

How to Use Google Gemini AI in PHP to Solve Image CAPTCHAs

This guide shows how to set up a PHP project, install the Gemini PHP client, and use Google Gemini's multimodal model to recognize text and solve image CAPTCHAs, providing complete code examples, dependency instructions, and sample outputs.

Artificial Intelligence · Captcha · Gemini AI
0 likes · 6 min read
NetEase Cloud Music Tech Team
Dec 21, 2023 · Artificial Intelligence

Video and Image Technologies in NetEase Cloud Music: Architecture, Algorithms, and Applications

The article examines NetEase Cloud Music’s video and image technology stack—covering a four‑module architecture, algorithms for content understanding, intelligent production, moderation, and interactive effects—and explains how these systems enhance user experience, streamline backend processing, and position the platform for future AIGC‑driven innovations.

AI Algorithms · Multimodal Learning · Video processing
0 likes · 11 min read
Rare Earth Juejin Tech Community
Dec 11, 2023 · Frontend Development

Bypassing Juejin Slider Captcha with Puppeteer and Canvas Image Recognition

This article demonstrates how to use Puppeteer and the Canvas API to automate login on Juejin, extract the slider captcha image, apply grayscale and binarization processing to locate the gap, calculate the required drag distance, and simulate human‑like mouse movements with easing functions for successful verification.

Captcha · image recognition · web-scraping
0 likes · 17 min read
MoonWebTeam
Nov 9, 2023 · Mobile Development

Master Mobile E2E Testing with Appium: Setup, Principles, and Real‑World Examples

This comprehensive guide explains Appium’s cross‑platform architecture, walks through setting up an Android testing environment on macOS, demonstrates a full‑stack test case for an in‑app H5 page, and shares advanced techniques like a WebSocket‑based JS agent and OpenCV image‑recognition for challenging hybrid scenarios.

Android · Appium · E2E automation
0 likes · 16 min read
Huolala Tech
Sep 28, 2023 · Artificial Intelligence

How Mobile AI Transforms Logistics: Real‑World Image Algorithms at Huolala

This article explores Huolala's deployment of mobile AI image algorithms for driver document verification and vehicle sticker inspection, detailing model design, lightweighting, hybrid processing, data stream handling, and on‑device deployment that boost efficiency, privacy, and real‑time performance in logistics operations.

Edge Computing · Logistics · Mobile AI
0 likes · 13 min read
Rare Earth Juejin Tech Community
Aug 6, 2023 · Artificial Intelligence

Explaining Image Recognition: Logistic Regression and Convolutional Neural Networks

This article introduces the principles of image recognition, compares traditional logistic regression with convolutional neural networks, demonstrates their implementation using Python code, visualizes model weights, and explains key concepts such as padding, convolution, pooling, receptive fields, and multi‑layer feature extraction.

convolutional neural network · explainable AI · image recognition
0 likes · 12 min read
php Courses
Jun 21, 2023 · Backend Development

Using PHP to Recognize QR Codes and Output Their Content

This article explains how to use the PHP library phpqrcode (via Zxing) to read QR code images, extract their text content, and display it in a web browser, including installation steps and sample code.

PHP · QR code · image recognition
0 likes · 5 min read
Python Programming Learning Circle
Mar 21, 2023 · Artificial Intelligence

Analyzing WeChat Friend Data with Python: Gender, Avatar, Signature, and Location Insights

This tutorial demonstrates how to use Python libraries such as itchat, jieba, matplotlib, SnowNLP, and Tencent Youtu SDK to collect WeChat friend information and perform data analysis on gender distribution, avatar characteristics, signature text (including word‑cloud and sentiment analysis), and geographic location, presenting the results with visual charts and maps.

NLP · WeChat · data-analysis
0 likes · 14 min read
Zhuanzhuan Tech
Oct 20, 2022 · Artificial Intelligence

Automated Image Review System for Second‑Hand Product Listings on Zhuanzhuan Platform

This article describes how Zhuanzhuan’s B2C marketplace implemented an automated image review system using computer‑vision techniques such as image matching, regression and detection to verify product‑image consistency, clarity, anti‑tamper labels, cleanliness and centering, achieving a 50% reduction in manual workload.

image recognition · product verification
0 likes · 16 min read
Huolala Tech
Sep 10, 2022 · Artificial Intelligence

How AI Transforms Freight Safety: Real‑Time Risk Detection and Intervention

This article explains how AI technologies enable end‑to‑end freight safety monitoring, from pre‑trip and in‑trip risk identification to targeted interventions and governance, addressing challenges such as long‑tail data, small‑sample learning, fine‑grained classification, and multi‑level filtering.

AI · Deep Learning · Logistics
0 likes · 12 min read
DataFunTalk
Jul 12, 2022 · Artificial Intelligence

Applying Computer Vision for Content Safety in Live Streaming: Practices and Future Directions

This presentation details how Huya leverages computer‑vision algorithms to detect and mitigate risky content such as political, pornographic, and violent material in live‑streaming and short‑video platforms, describing system architecture, labeling strategies, algorithmic pipelines, real‑time moderation techniques, and future research directions.

AI Safety · Computer Vision · Risk Detection
0 likes · 11 min read
ITPUB
Jun 9, 2022 · Artificial Intelligence

How 58’s Multi‑Label Image Recognition Boosts Semantic Search and Recommendations

This article details the design, data pipeline, model architecture, loss functions, and evaluation metrics of a large‑scale multi‑label image classification system built for 58.com, showing how it improves semantic similarity detection, recommendation, and content moderation across diverse business domains.

Computer Vision · Deep Learning · asymmetric loss
0 likes · 18 min read
DataFunTalk
May 28, 2022 · Artificial Intelligence

Adversarial Examples for Captcha: Techniques, Applications, and Future Directions

This article presents a comprehensive overview of adversarial example research applied to captcha systems, covering the definition and history of adversarial attacks, geometric‑aware generation frameworks, FGSM‑based attack variants, experimental results, trade‑offs between image quality and attack strength, and future work such as AdvGAN integration.

AI Safety · Deep Learning · FGSM
0 likes · 14 min read
Code DAO
Dec 2, 2021 · Artificial Intelligence

Transfer Learning with ShuffleNetV2 for Flower Classification

This article walks through building a PyTorch ShuffleNetV2 model, preparing the Kaggle Flowers dataset, training with transfer learning on a GPU, visualizing loss and accuracy, and performing inference on five test images, achieving nearly 90% validation accuracy after 95 epochs.

CNN · PyTorch · ShuffleNetV2
0 likes · 19 min read
Youzan Coder
Nov 5, 2021 · Artificial Intelligence

AI-Powered Image Recognition for Fresh Produce Retail: System Design and Implementation

An AI‑driven image‑recognition system, using TensorFlow Lite and cameras on checkout scales, replaces barcode PLU lookup with hierarchical product categories, caches offline selections for incremental model updates, and delivers instant, offline‑capable identification, dramatically speeding fresh produce checkout, cutting labor costs, and offering a reusable framework for other retail sectors.

AI · Retail · TensorFlow
0 likes · 8 min read
Tencent Cloud Developer
Jun 29, 2021 · Information Security

Tencent Cloud Object Storage Content Security: Comprehensive Multi-Modal Content Moderation Solution

Tencent Cloud Object Storage Content Security offers a comprehensive, multi‑modal moderation solution—leveraging YouTu Lab’s advanced image, video, audio and text analysis—to automatically detect and handle prohibited material across hundreds of violation types, providing one‑click task initiation, configurable callbacks, and visual tracking for platforms such as social media, online education, e‑commerce, and gaming.

AI content moderation · Audio Analysis · Content Security
0 likes · 6 min read
Baidu Geek Talk
Jun 21, 2021 · Artificial Intelligence

Detecting Pornographic Videos with Dual‑Modal AI: Images + Audio

This article presents a technical overview of a multimodal AI framework that combines image and audio analysis to identify pornographic video content, detailing model architectures, feature extraction methods, and experimental results achieving 93.4% accuracy on a 3,000‑sample test set.

Audio Analysis · Deep Learning · Multimodal AI
0 likes · 6 min read
Youku Technology
Mar 12, 2021 · Mobile Development

Intelligent Component Testing Solution for Youku Mobile App

Youku’s intelligent component‑testing solution for its mobile app combines mock‑driven data factories, image‑recognition layout verification, and a data‑driven automation framework to dramatically cut regression effort, boost test stability, and now automates over 60% of component cases while covering more than 90% of frequently used UI components.

UI verification · component automation · image recognition
0 likes · 10 min read
Youku Technology
Mar 9, 2021 · Mobile Development

Design and Implementation of a Mobile Automation Testing Framework for Youku APP

The article describes how a three‑layer, cross‑platform mobile automation framework was designed and implemented for the Youku app, integrating driver, encapsulation, and test‑case layers with utilities, logging, image‑recognition and platform reporting to streamline regression testing, cut labor costs, and guide future enhancements.

Mobile Automation · Testing framework · UI testing
0 likes · 9 min read
Baidu Intelligent Testing
Jan 27, 2021 · Artificial Intelligence

Baidu Mini‑Program Online Quality Assurance System: AI‑Driven Automated Traversal, Page Anomaly Detection, and Cloud‑Phone Cluster

This article describes how Baidu built an end‑to‑end online quality‑assurance platform for its mini‑program ecosystem, leveraging AI‑powered automated traversal, intelligent page‑exception detection, and a scalable cloud‑phone cluster to identify red‑line issues, improve audit efficiency, and reduce manual effort.

AI · cloud phone · image recognition
0 likes · 20 min read
DataFunTalk
Dec 9, 2020 · Artificial Intelligence

WeChat Identify: From Object Detection to Large‑Scale Image Search – Technical Overview

This article details the evolution of WeChat’s Identify product, explaining its end‑to‑end image recognition pipeline—including object detection, multi‑label classification, mobile‑side detection, large‑scale retrieval, unsupervised clustering, and system architecture—while showcasing various application scenarios such as product, plant, and landmark recognition.

Computer Vision · Mobile AI · WeChat
0 likes · 12 min read
21CTO
Nov 3, 2020 · Artificial Intelligence

How Does Image Recognition Work? A Simple Guide to Core Principles

This article explains the fundamental principles of image recognition, covering how images are converted to numeric arrays, processed by scanning matrix blocks, and matched against patterns to identify objects such as text, faces, cats, dogs, or mice.

AI basics · Computer Vision · Convolution
0 likes · 4 min read
Tencent Cloud Developer
Mar 30, 2020 · Information Security

How AI Powers Real-Time Content Moderation for Live Streams

With the surge in online content, Tencent Cloud’s content security team outlines a multi‑layered AI approach—ranging from MD5 matching to deep‑learning multi‑label and fine‑grained image analysis, audio VAD and speech models, and adaptive text filtering—to detect and mitigate unsafe live‑stream material.

AI · Audio Detection · Text Filtering
0 likes · 17 min read
Huajiao Technology
Mar 3, 2020 · Mobile Development

Why UI Automation Matters for Mobile Apps and Using Appium with Cucumber

This article explains why UI automation testing is crucial for complex mobile apps, introduces Appium as a cross‑platform open‑source solution, demonstrates organizing test cases with Cucumber and Page Object patterns, details element locating strategies, custom steps, workflow architecture, and discusses current limitations and improvement plans.

Appium · Cucumber · Page Object
0 likes · 18 min read
360 Quality & Efficiency
Jan 2, 2020 · Mobile Development

Common Element Locating Strategies in Appium for Mobile Automation

This article introduces Appium's basic element locating techniques—including id, name, class name, XPath, UIAutomator, and relative coordinates—explains how to handle non‑unique elements through iteration or OCR, and demonstrates image‑based locating with OpenCV and screenshot code examples.

Appium · Element Locating · Mobile Automation
0 likes · 5 min read
Tencent Cloud Developer
Dec 26, 2019 · Artificial Intelligence

WeChat Scan-to-Identify (Scan Object) Feature: Overview, Technical Architecture, Data Construction, and Algorithmic Advances

WeChat’s iOS Scan‑to‑Identify feature lets users point a camera at any product or scene to instantly retrieve related e‑commerce, encyclopedia or news content, using a four‑pipeline architecture that builds massive annotated and deduplicated databases, advanced RetinaNet‑based detection, multi‑task metric learning, and scalable training, deployment and scheduling platforms, with plans to extend into domains like facial, vehicle and plant recognition.

AI · Computer Vision · WeChat
0 likes · 34 min read
Tencent Cloud Developer
Sep 19, 2019 · Artificial Intelligence

Inside Tencent Cloud OCR: Architecture, Performance, and Integration Guide

The article provides a comprehensive overview of Tencent Cloud’s OCR platform, detailing its service architecture, product capabilities, integration methods, performance metrics, engineering improvements, testing automation, and operational considerations, offering developers practical insights into building and deploying OCR solutions on the cloud.

Cloud AI · Computer Vision · OCR
0 likes · 10 min read
360 Quality & Efficiency
Jun 28, 2019 · Operations

Using Sikuli for GUI Automation: Installation, Python Integration, and Practical Tips

This article introduces Sikuli, an image‑based GUI automation tool, explains its origins, provides download links, details installation steps, demonstrates Python integration via the Lackey library and SikuliX API, shares useful code snippets, and highlights common pitfalls and overall considerations for test automation.

GUI automation · Lackey · Python
0 likes · 6 min read
Tencent Cloud Developer
Apr 16, 2019 · Artificial Intelligence

Building Image Recognition Systems: From Basics to Advanced AI Techniques

This article summarizes a computer‑vision salon where Dr. Ji Yongnan explains imaging pipelines, traditional feature‑based methods, deep‑learning breakthroughs, Tencent Cloud AI services, real‑world case studies, and answers audience questions about machine‑vision versus computer‑vision and data‑scarcity challenges.

AI applications · Computer Vision · Deep Learning
0 likes · 18 min read
iQIYI Technical Product Team
Dec 28, 2018 · Artificial Intelligence

AI‑Driven Visual Automation Testing Frameworks: Challenges, Opportunities, and the Aion Solution

The article examines shortcomings of traditional visual automation frameworks—weak cross‑platform support, ID dependence, and fragile screenshot matching—and shows how Aion’s hybrid approach, merging image‑processing segmentation with deep‑learning classification and OCR, delivers a more stable, cross‑platform, “visible‑to‑obtain” testing solution while acknowledging remaining accuracy challenges.

AI testing · OCR · UI2Code
0 likes · 11 min read
Java Captain
Sep 26, 2018 · Artificial Intelligence

Step-by-Step Guide to Using Baidu OCR API with Java

This article provides a comprehensive Java tutorial for accessing Baidu's OCR service, covering prerequisite setup, Maven dependencies, token acquisition, image-to‑Base64 conversion, HTTP request construction, and performance observations for Chinese, English, and mixed‑language image recognition.

API · Baidu OCR · Base64
0 likes · 9 min read
360 Tech Engineering
May 17, 2018 · Artificial Intelligence

Applying Image Recognition in UI Automation Testing with Sikuli

This article introduces how image‑recognition techniques, particularly using the Sikuli tool, can be applied to UI automation testing for both web and mobile applications, covering practical scenarios, core principles, a suite of useful functions, example code, and the advantages and limitations of the approach.

Computer Vision · Sikuli · UI automation
0 likes · 7 min read
360 Quality & Efficiency
May 16, 2018 · Fundamentals

Applying Image Recognition in UI Automation Testing with Sikuli

This article introduces the use of image‑recognition techniques, particularly the Sikuli tool, for UI automation testing, covering typical scenarios, underlying principles, key functions such as Find, click, wait, and type, as well as example code, and discusses the advantages and limitations of this approach.

Computer Vision · Jython · Sikuli
0 likes · 7 min read
Ctrip Technology
Mar 22, 2018 · Artificial Intelligence

Poetry Generation from Images: Design, Implementation, and Evaluation of Ctrip’s “Xiao Shi Ji” System

The article presents Ctrip’s “Xiao Shi Ji” system that combines large‑scale tourism knowledge graphs, image recognition, and deep‑learning‑based poetry generation to automatically compose Chinese classical poems from photos, evaluates its performance against human poets, and discusses the underlying AI techniques.

Poetry Generation · image recognition
0 likes · 14 min read
21CTO
Jan 6, 2018 · Artificial Intelligence

How Image Recognition Transforms Our World: Principles, Processes, and Future

This article explains the fundamentals of image recognition technology, its underlying principles, processing steps, neural‑network and nonlinear‑dimensionality‑reduction approaches, and highlights its wide‑range applications and future potential across many industries.

AI · Computer Vision · Neural Networks
0 likes · 11 min read
21CTO
Dec 19, 2017 · Artificial Intelligence

How Deep Neural Networks Decode Images: From CNNs to RNNs

This article explains the fundamental principles behind deep neural networks for image recognition, covering convolutional and recurrent architectures, their training processes, feature extraction mechanisms, and the emerging ability to generate automatic image captions.

Deep Learning · Recurrent Neural Network · convolutional neural network
0 likes · 13 min read
Baidu Intelligent Testing
Oct 27, 2017 · Mobile Development

From Zero to a Universal Android Script Testing Solution: Mixed‑Script Automation, Image‑Recognition, and Recording Tools

The article details how Baidu MTC designed and implemented a universal Android script testing platform that combines UIAutomator, a custom Clean‑SDK for popup handling, image‑recognition algorithms, and a recording‑playback tool to enable robust, non‑native mobile automated testing across thousands of devices.

Android · Script Recording · UIAutomator
0 likes · 12 min read
Architecture Digest
Sep 30, 2017 · Artificial Intelligence

Overview of Prominent Deep Learning Architectures for Computer Vision

This article surveys recent progress in deep learning by presenting key computer‑vision architectures such as AlexNet, VGG, GoogLeNet, ResNet, ResNeXt, R‑CNN, YOLO, SqueezeNet, SegNet and GANs, providing brief descriptions, their advantages, and links to original papers and Keras implementations.

Computer Vision · Deep Learning · Keras
0 likes · 16 min read
Qunar Tech Salon
Dec 5, 2016 · Artificial Intelligence

Understanding Convolutional Neural Networks for OCR and CAPTCHA Recognition

This article introduces the fundamentals of neural networks for image recognition, explains regression vs classification, describes convolution, pooling and fully connected layers, illustrates the classic LeNet‑5 model on the MNIST dataset, and shows how a TensorFlow‑based CNN can be trained to recognize CAPTCHA images, achieving high accuracy.

CNN · Captcha · LeNet-5
0 likes · 10 min read
Baidu Intelligent Testing
Apr 21, 2016 · Mobile Development

Integrating OpenCV with Appium for Automated Game Testing on Mobile Devices

This article describes how the MMGame testing team combined the open‑source Appium automation framework with OpenCV's image‑recognition capabilities to enable coordinate‑based testing of third‑party mobile games that lack accessible UI elements, detailing the workflow, implementation, results, and a comparison with other mobile testing tools.

Akaze · Appium · Mobile Automation
0 likes · 16 min read
Ctrip Technology
Jun 19, 2015 · Artificial Intelligence

Bank Card Scanning and Recognition Project Overview

This article describes a mobile payment‑focused bank card OCR project that extends an open‑source solution to support Chinese 19‑digit debit cards by introducing new algorithms for vertical coordinate detection, background filtering, single‑character recognition, and Luhn‑based checksum validation.

AI · Luhn algorithm · bank card OCR
0 likes · 7 min read
Baidu Tech Salon
May 9, 2014 · Artificial Intelligence

Connecting People and Services Through Visual Recognition: Insights from Baidu's Tech Salon

At Baidu’s Xierqi Night Talk, senior developers learned how the company’s new “Light Tap” visual‑recognition platform and open cloud services aim to link people with everyday services through camera‑based interactions, positioning image recognition as the leading O2O connection method over QR codes, NFC, and voice.

Artificial Intelligence · Baidu technology · cloud computing
0 likes · 10 min read