Control Web Pages with Hand Gestures Using Tampermonkey & MediaPipe
This tutorial shows how to inject JavaScript with Tampermonkey and use MediaPipe's hand‑gesture recognition to enable air‑gesture page scrolling, cursor movement, and click simulation, turning any web page into a touch‑free interface.
Introduction
Some readers asked whether Tampermonkey's ability to inject arbitrary front‑end JavaScript into a page could be combined with hand‑gesture recognition to control pages remotely, similar to air‑gesture page turning on a phone. The author explored the idea and implemented it.
Features
Up/down page scrolling controlled by hand gestures: open the left hand to scroll down, make a fist with the left hand to scroll up.
A simulated cursor moves with the right hand.
A fist gesture with the right hand triggers a click.
Additional gestures include a two‑hand "peace" sign to close the current page and a left‑hand thumbs‑up combined with the right hand to zoom, among others.
Implementation Principle
The solution simply combines Tampermonkey and MediaPipe hand‑gesture recognition.
Tampermonkey
Tampermonkey is a browser extension that lets users inject custom JavaScript into pages at load time, enabling enhancement, modification, or automation of web behavior. It can be used for auto‑login, data scraping, ad blocking, and more.
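As a rough illustration of the injection step, a userscript starts with a metadata block that tells Tampermonkey where and when to run it. The sketch below is not the author's script: the script name, namespace, match pattern, and the isTopLevelWindow helper are all assumptions made for the example.

```javascript
// ==UserScript==
// @name         Hand Gesture Page Control (example)
// @namespace    https://example.invalid/
// @version      0.1.0
// @description  Sketch: control the page with hand gestures
// @match        *://*/*
// @run-at       document-idle
// @grant        none
// ==/UserScript==

// Hypothetical helper: true when `win` is the top-level window, not an iframe.
function isTopLevelWindow(win) {
  return win.top === win.self;
}

(function () {
  "use strict";
  // Skip iframes so that at most one camera session is opened per tab.
  if (typeof window === "undefined" || !isTopLevelWindow(window)) return;
  console.log("[gesture-control] userscript injected on", location.href);
  // Gesture recognition setup would start here.
})();
```

Matching every URL is convenient for a demo; a narrower @match pattern limits which sites the script (and its camera prompt) can appear on.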
MediaPipe Hand‑Gesture Recognition
MediaPipe provides a library of AI and ML tools, including hand‑gesture detection. The demo uses the @mediapipe/tasks-vision NPM package to obtain a gesture recognizer.
// Import the vision tasks from the NPM package named above
import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";

// Load the WASM runtime, then create a recognizer that tracks up to two hands
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
const gestureRecognizer = await GestureRecognizer.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "https://storage.googleapis.com/mediapipe-tasks/gesture_recognizer/gesture_recognizer.task"
  },
  numHands: 2
});
Combining Both
By injecting the MediaPipe gesture code via a Tampermonkey script, the gesture recognizer runs on any web page. The script requests camera permission; the video preview is hidden to avoid visual clutter.
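The camera step described above might look roughly like the following. This is a minimal sketch, not the author's code: createHiddenVideo and startGestureLoop are hypothetical names, the styling used to hide the preview is one option among several, and the loop assumes the recognizer was created with runningMode: "VIDEO" so that recognizeForVideo can be called on each frame.

```javascript
// Hypothetical helper: build an invisible <video> element for the camera stream.
// `doc` defaults to the page's document; it is a parameter only so the helper
// can be exercised with a stub.
function createHiddenVideo(doc = document) {
  const video = doc.createElement("video");
  video.autoplay = true;
  video.muted = true;
  video.playsInline = true;
  // Keep the element in the DOM but out of sight and out of the way of clicks.
  Object.assign(video.style, {
    position: "fixed",
    width: "1px",
    height: "1px",
    opacity: "0",
    pointerEvents: "none",
  });
  doc.body.appendChild(video);
  return video;
}

// Hypothetical driver: ask for the camera, then run recognition once per frame.
// Assumes `recognizer` is a GestureRecognizer created with runningMode: "VIDEO".
async function startGestureLoop(recognizer, onResult) {
  const video = createHiddenVideo();
  // This call is what triggers the browser's camera permission prompt.
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();

  let lastTime = -1;
  function tick() {
    // Only process a frame when the video has actually advanced.
    if (video.currentTime !== lastTime) {
      lastTime = video.currentTime;
      onResult(recognizer.recognizeForVideo(video, performance.now()));
    }
    requestAnimationFrame(tick);
  }
  requestAnimationFrame(tick);
}
```

A caller would pass the recognizer and a callback, for example startGestureLoop(gestureRecognizer, handleResult), where handleResult is whatever function interprets the returned landmarks.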
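Gesture detection works by analyzing the landmark (key point) coordinates returned by MediaPipe. Each hand is reported as 21 landmarks whose x and y values are normalized to the image size, so the thresholds below are fractions of the frame rather than pixels. Simple distance checks between fingertips and finger bases differentiate gestures such as open hand, fist, and victory sign.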
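// dist is not shown in the original; assumed here to be 2D Euclidean distance
// between two landmarks in normalized image coordinates
function dist(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}
// Fingertip / finger-base landmark index pairs: index, middle, ring, pinky
const FINGERS = [[8, 5], [12, 9], [16, 13], [20, 17]];
// Determine if hand is open: all four fingertips are far from their bases
function isHandOpen(hand) {
  return FINGERS.filter(([tip, base]) => dist(hand[tip], hand[base]) > 0.1).length >= 4;
}
// Determine if hand is a fist: at least three fingertips are close to their bases
function isFist(hand) {
  return FINGERS.filter(([tip, base]) => dist(hand[tip], hand[base]) < 0.06).length >= 3;
}
// Victory sign: index and middle extended, ring and pinky folded
function isVictory(hand) {
  const extended = [8, 12];
  const folded = [16, 20];
  // A fingertip index minus 3 is the base joint of the same finger
  return (
    extended.every(i => dist(hand[i], hand[i - 3]) > 0.1) &&
    folded.every(i => dist(hand[i], hand[i - 3]) < 0.05)
  );
}
The hand object comes from MediaPipe and contains the positions of the key landmarks; custom logic maps these to the desired actions.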
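One way the "custom logic" could connect poses to the features listed earlier is sketched below. Everything here is an assumption for illustration: the function names, the action names, the SCROLL_STEP value, and the choice to mirror the x coordinate. The "Left"/"Right" labels follow MediaPipe's handedness categories, which may need swapping depending on whether the camera frame is mirrored.

```javascript
// Convert a normalized landmark (x, y in 0..1) to viewport pixels.
// x is mirrored on the assumption that the camera frame is not flipped,
// so moving the hand to the right moves the cursor to the right.
function toViewport(landmark, width, height) {
  return {
    x: Math.round((1 - landmark.x) * width),
    y: Math.round(landmark.y * height),
  };
}

// Hypothetical mapping from (hand label, pose) to an action name:
// left open -> scroll down, left fist -> scroll up,
// right fist -> click, right hand otherwise -> move the cursor.
function chooseAction(label, pose) {
  if (label === "Left") {
    if (pose === "open") return "scrollDown";
    if (pose === "fist") return "scrollUp";
    return "none";
  }
  if (label === "Right") return pose === "fist" ? "click" : "moveCursor";
  return "none";
}

// Arbitrary illustrative scroll distance per processed frame, in pixels.
const SCROLL_STEP = 40;

// Apply an action in the page. `cursorEl` is a hypothetical absolutely
// positioned element acting as the simulated cursor; it should have
// pointer-events: none so elementFromPoint sees the page beneath it.
function applyAction(action, point, cursorEl) {
  switch (action) {
    case "scrollDown":
      window.scrollBy(0, SCROLL_STEP);
      break;
    case "scrollUp":
      window.scrollBy(0, -SCROLL_STEP);
      break;
    case "moveCursor":
      cursorEl.style.transform = `translate(${point.x}px, ${point.y}px)`;
      break;
    case "click": {
      const target = document.elementFromPoint(point.x, point.y);
      if (target) {
        target.dispatchEvent(
          new MouseEvent("click", { bubbles: true, clientX: point.x, clientY: point.y })
        );
      }
      break;
    }
  }
}
```

Keeping chooseAction free of DOM calls makes the gesture-to-action table easy to adjust without touching the code that scrolls or clicks. A real script would likely also debounce the click so a held fist does not fire on every frame.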
Further Learning
Explore the official MediaPipe demo and NPM package documentation for more gestures and features. For deeper Tampermonkey scripting, refer to the "Tampermonkey Script Practical Guide".
Rare Earth Juejin Tech Community
Juejin, a tech community that helps developers grow.
