
How Combined Gestures Revolutionize Mobile Video Controls

This article explores the design of combined gestures for Baidu App's mobile video player, detailing the concept, implementation of long‑press‑plus‑slide gestures, user testing, performance results, and future extensions to improve interaction efficiency and immersion.

Baidu MEUX

Introduction

Video players on mobile devices need to keep the interface minimal while providing rich playback controls; therefore, well‑designed gestures can reduce visible UI elements and enable faster access to functions.

What Is a “Combined Gesture”?

A combined gesture links two or more basic gestures into a single continuous interaction. When the combination and its context are well chosen, it lets users reach functions more conveniently than a standalone gesture would.

Traditional gestures follow a two‑stage process: step 1 – interaction signal → step 2 – task execution. This linear model limits extensibility for video scenarios.

We introduced an “intent recognition” stage, creating a three‑stage model: step 1 – interaction signal → step 2 – intent recognition → step 3 – task execution. Each stage can be composed of basic gesture branches.
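The three-stage model can be sketched as a small state machine. The code below is a minimal illustration in Python, not the shipped implementation; the stage names follow the article, and everything else (class and method names) is hypothetical:

```python
from enum import Enum, auto

class Stage(Enum):
    IDLE = auto()
    SIGNAL = auto()      # step 1: interaction signal received
    INTENT = auto()      # step 2: intent recognition in progress
    EXECUTED = auto()    # step 3: task executed

class GesturePipeline:
    """Three-stage model: interaction signal -> intent recognition -> task execution."""

    def __init__(self):
        self.stage = Stage.IDLE

    def on_signal(self, gesture: str) -> None:
        # Step 1: a basic gesture (e.g. long-press) emits the interaction signal.
        self.stage = Stage.SIGNAL
        self.signal = gesture

    def on_follow_up(self, gesture: str) -> None:
        # Step 2: a follow-up basic gesture narrows the intent down.
        if self.stage is Stage.SIGNAL:
            self.stage = Stage.INTENT
            self.intent = (self.signal, gesture)

    def on_commit(self) -> str:
        # Step 3: committing (e.g. lifting the finger) executes the task;
        # releasing before an intent is recognized cancels the gesture.
        if self.stage is Stage.INTENT:
            self.stage = Stage.EXECUTED
            return f"execute {'+'.join(self.intent)}"
        return "cancel"

p = GesturePipeline()
p.on_signal("long-press")
p.on_follow_up("slide-up")
print(p.on_commit())  # -> execute long-press+slide-up
```

Each stage can branch on a different basic gesture, which is what makes the model extensible compared with the linear two-stage version.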

Long‑Press Combined Gesture to Activate a Quick Menu

Project Background

The early Baidu App video player had few controls, with features like “download” hidden in a basic menu. As more functions were added, the menu became crowded, requiring a left swipe to reveal hidden items, leading to user complaints about difficulty finding functions.

Competitive Research and Selection

We identified three common long‑press interaction patterns in competitors: (A) long‑press opens an independent control panel, (B) long‑press triggers a floating overlay, and (C) long‑press directly activates a specific function.

We chose pattern B (floating overlay) as the basis for our redesign, aiming to surface high‑frequency controls while preserving the existing “fast‑forward” long‑press behavior.

Design Solution

We defined a “long‑press + slide selection” combined gesture:

Step 1: Long‑press creates a new mode, emitting an interaction signal and displaying a floating menu.

Step 2: If the finger remains pressed, the system recognizes intent and allows the user to slide to a target function.

Step 3: Releasing the finger confirms and executes the selected function.

Two variations were defined:

Long‑press + upward slide triggers the quick‑menu function.

Long‑press + downward slide triggers fast‑forward, preserving legacy behavior.
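The two directional variations reduce to a simple dispatch on vertical displacement after the long-press. This Python sketch is illustrative only; the threshold value and function name are assumptions, not the shipped implementation:

```python
SLIDE_THRESHOLD_PX = 24  # assumed dead zone before a slide is recognized

def resolve_long_press_slide(dy: float) -> str:
    """Map vertical finger displacement after a long-press to an action.

    In screen coordinates, dy < 0 means the finger slid up; dy > 0 means down.
    """
    if dy <= -SLIDE_THRESHOLD_PX:
        return "quick-menu"      # long-press + upward slide
    if dy >= SLIDE_THRESHOLD_PX:
        return "fast-forward"    # long-press + downward slide (legacy behavior)
    return "none"                # still inside the dead zone

print(resolve_long_press_slide(-60))  # -> quick-menu
```

The dead zone keeps a plain long-press (with a slightly unsteady finger) from being misread as a slide.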

Fault‑Tolerance and Compatibility

In addition to the combined gesture, we support a tap fallback: if the user releases without sliding, a tap on the floating item still activates the corresponding control.
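The fallback path can be modeled as a second chance at selection: releasing without a slide keeps the floating menu open and tappable. A minimal sketch, assuming hypothetical names and a 24-pixel dead zone:

```python
THRESHOLD = 24  # assumed slide dead zone, in pixels

class QuickMenu:
    """Release without sliding keeps the floating menu open for a tap."""

    def __init__(self):
        self.open = False
        self.selected = None

    def on_long_press(self) -> None:
        self.open = True  # floating menu appears

    def on_slide(self, item: str) -> None:
        if self.open:
            self.selected = item  # finger slid onto a menu item

    def on_release(self, dy: float) -> str:
        if abs(dy) >= THRESHOLD and self.selected:
            self.open = False
            return f"execute {self.selected}"
        # No slide: keep the menu open and wait for a tap instead.
        return "await-tap"

    def on_tap(self, item: str) -> str:
        if self.open:
            self.open = False
            return f"execute {item}"
        return "ignored"

menu = QuickMenu()
menu.on_long_press()
print(menu.on_release(dy=0))       # -> await-tap (menu stays up)
print(menu.on_tap("download"))     # -> execute download
```

Supporting both paths means users who discover the menu but not the slide gesture can still complete the task.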

Usability Refinement

We conducted a small‑scale qualitative usability test with over ten participants. Most users mastered the “long‑press + slide” gesture after one or two attempts, confirming its learnability, though some details required further polishing.

Key refinements included:

Expanding the long‑press hot‑zone to cover most of the screen except system bars and existing long‑press buttons.

Allowing the floating menu to follow the finger’s vertical position, reducing movement distance.

Providing real‑time visual and haptic feedback to indicate intent recognition and selection.
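The expanded hot-zone amounts to a hit test over most of the screen minus a few excluded regions. An illustrative Python sketch; the coordinates and excluded rectangles below are placeholders standing in for the system bars and existing long-press buttons mentioned above:

```python
Rect = tuple[float, float, float, float]  # (x, y, width, height)

def contains(rect: Rect, x: float, y: float) -> bool:
    rx, ry, rw, rh = rect
    return rx <= x < rx + rw and ry <= y < ry + rh

def in_long_press_hot_zone(x: float, y: float,
                           screen: Rect, excluded: list[Rect]) -> bool:
    """Most of the screen is a hot zone, minus system bars and buttons."""
    return contains(screen, x, y) and not any(contains(r, x, y) for r in excluded)

screen = (0, 0, 1080, 2340)        # assumed screen bounds
status_bar = (0, 0, 1080, 90)      # assumed excluded system-bar region
print(in_long_press_hot_zone(540, 1200, screen, [status_bar]))  # -> True
```

Keeping the exclusion list explicit makes it easy to carve out new controls later without shrinking the hot zone elsewhere.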

Validation

We ran an A/B experiment: the control group kept the original long‑press fast‑forward behavior, while the test group used the long‑press quick‑menu with slide and tap modes. After two weeks, key metrics such as playback completion rate improved, and usage of high‑frequency functions increased significantly.

Further Exploration of Combined Gestures

Beyond the long‑press menu, we applied the combined‑gesture model to other video interactions:

Right‑swipe back gesture combined with upward drag activates “picture‑in‑picture” playback.

Two‑finger drag followed by two‑finger spread triggers “full‑screen” playback.

Conclusion

Designing convenient combined gestures reduces interaction steps and enhances immersion in mobile video playback. The case study demonstrates how a systematic gesture‑combination model, backed by user testing and data‑driven validation, can significantly improve user experience and functional efficiency.

User Experience · Interaction Design · Usability Testing · Combined Gestures · Mobile Gestures · Video Player UI
Written by Baidu MEUX

MEUX, Baidu Mobile Ecosystem UX Design Center, handling end-to-end experience design for user and commercial products in Baidu's mobile ecosystem. Send resumes to [email protected]