Can Your Keyboard Secrets Be Heard? Inside the Keytap Acoustic Attack
This article explains how the open‑source Keytap project captures short audio snippets from a microphone to reconstruct typed characters, outlines its four‑step process of data collection, model building, keystroke detection, and character identification, and compares it with related acoustic eavesdropping research.
Keytap is an open‑source tool that listens to the sound of your keyboard through the computer’s microphone and reconstructs what was typed. It keeps only 75‑100 ms of audio around each keystroke, so its dataset consists of short audio clips rather than a continuous recording.
Four‑Step Spy Method
1. Collect training data.
2. Build a prediction model and learn the data.
3. Detect when someone is typing.
4. Identify what they are typing.
Collect Training Data
Keytap records the audio 75‑100 ms before and after each keystroke, ignoring the rest of the signal. This sacrifices some information but reduces noise and data size. The waveform of a key press shows a primary peak followed by a smaller release peak about 150 ms later.
Each training sample is a short waveform snippet. When two keystrokes land within 75 ms of each other, their windows overlap and the snippets contain a mix of both sounds.
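The windowing step described above can be sketched roughly as follows. This is an illustrative sketch, not Keytap's actual code: the function name `extract_snippets`, its parameters, and the choice to skip presses near the edges of the recording are all assumptions made for the example.

```python
import numpy as np

def extract_snippets(audio, sample_rate, key_times, keys, half_window_ms=75.0):
    """Cut a fixed window centred on each key-press time.

    audio: 1-D array of samples; key_times: press times in seconds;
    keys: the label logged for each press. All names are hypothetical.
    """
    half = int(round(sample_rate * half_window_ms / 1000.0))
    dataset = {}
    for t, key in zip(key_times, keys):
        centre = int(round(t * sample_rate))
        start, end = centre - half, centre + half
        if start < 0 or end > len(audio):
            continue  # assumption: drop presses too close to the recording edges
        # Everything outside [start, end) is discarded for this press.
        dataset.setdefault(key, []).append(audio[start:end].copy())
    return dataset

# One second of silence at 16 kHz with two logged presses.
recording = np.zeros(16000)
clips = extract_snippets(recording, 16000, [0.5, 0.001], ["a", "b"])
```

With a 75 ms half-window at 16 kHz, each stored clip holds 2400 samples, which is why the dataset stays small compared with the full recording.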
Build a Prediction Model
The model aligns waveform peaks to compensate for latency, then refines alignment using a similarity metric (cross‑correlation). After alignment, a weighted average waveform is computed for each key, which serves as the reference for real‑time matching.
Cross‑correlation values (CC) indicate similarity; higher CC means a better match. Other similarity metrics could also be used.
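A minimal sketch of the align-then-average idea is shown below, using NumPy only. Several details are assumptions, not taken from Keytap: the first snippet serves as the alignment reference, shifts are circular (`np.roll`) for simplicity, the search range `max_shift` is arbitrary, and each snippet is weighted by its CC with the reference, since the source does not say how the weights are chosen.

```python
import numpy as np

def normalized_cc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def best_shift(reference, snippet, max_shift):
    """Search shifts in [-max_shift, max_shift] for the highest CC."""
    best, best_cc = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        cc = normalized_cc(reference, np.roll(snippet, shift))
        if cc > best_cc:
            best, best_cc = shift, cc
    return best, best_cc

def build_template(snippets, max_shift=32):
    """Align all snippets of one key and return their weighted average."""
    reference = snippets[0]  # assumption: first snippet is the reference
    ref_peak = int(np.argmax(np.abs(reference)))
    aligned, weights = [], []
    for s in snippets:
        # Coarse step: line up the largest peaks.
        s = np.roll(s, ref_peak - int(np.argmax(np.abs(s))))
        # Fine step: refine the alignment with cross-correlation.
        shift, cc = best_shift(reference, s, max_shift)
        aligned.append(np.roll(s, shift))
        weights.append(max(cc, 0.0))  # assumption: weight by similarity
    weights = np.asarray(weights)
    if weights.sum() == 0:
        weights = np.ones(len(aligned))
    return np.average(np.stack(aligned), axis=0, weights=weights)

# Three copies of the same synthetic click, recorded with different latency.
click = np.zeros(200)
click[100:103] = [1.0, -0.5, 0.25]
template = build_template([click, np.roll(click, 5), np.roll(click, -3)])
```

In this toy case the three clips differ only by a time offset, so the template comes out equal to the original click. Real recordings would differ in amplitude and noise as well.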
Detect Keyboard Activity
An adaptive threshold monitors the raw audio stream for large peaks that indicate a keystroke. The threshold adjusts based on the average signal level of the past few hundred milliseconds.
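One way to express such a detector is sketched below. The constants are illustrative guesses, not Keytap's values: a 300 ms history window, a trigger factor of 8 times the running average, a small absolute floor so silence does not trigger, and a 100 ms refractory period to avoid reporting the same keystroke twice.

```python
import numpy as np

def detect_keystrokes(audio, sample_rate, history_ms=300.0, factor=8.0,
                      refractory_ms=100.0, min_level=1e-3):
    """Return sample indices where the signal jumps above an adaptive threshold."""
    history = int(sample_rate * history_ms / 1000.0)
    refractory = int(sample_rate * refractory_ms / 1000.0)
    level = np.abs(audio)
    # Prefix sums give the mean level of any past window in O(1).
    csum = np.concatenate(([0.0], np.cumsum(level)))
    onsets = []
    last = -refractory
    for i in range(history, len(audio)):
        # Average level over the `history` samples before sample i.
        avg = (csum[i] - csum[i - history]) / history
        threshold = max(factor * avg, min_level)
        if level[i] > threshold and i - last >= refractory:
            onsets.append(i)
            last = i
    return onsets

# A quiet 50 Hz hum with one loud click half a second in.
rate = 16000
t = np.arange(rate) / rate
stream = 0.01 * np.sin(2 * np.pi * 50 * t)
stream[8000] = 1.0
onsets = detect_keystrokes(stream, rate)
```

Because the threshold follows the recent average, the same code should tolerate a louder or quieter room without retuning, at the cost of missing keystrokes that arrive during a burst of background noise.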
Identify the Pressed Key
When a keystroke is detected, the system compares the captured snippet against the averaged waveforms using cross‑correlation. The key with the highest CC score is selected as the predicted character.
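The matching step amounts to an argmax over per-key CC scores. The sketch below assumes the snippet has already been aligned and has the same length as each reference waveform. The names `predict_key` and `templates` are hypothetical, and the zero-lag normalized CC used here is one possible similarity metric.

```python
import numpy as np

def normalized_cc(a, b):
    """Zero-lag normalized cross-correlation of two equal-length arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def predict_key(snippet, templates):
    """Score the snippet against every key's reference and pick the best."""
    scores = {key: normalized_cc(snippet, ref) for key, ref in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Two synthetic "keys" with different dominant frequencies.
n = np.arange(200)
templates = {"a": np.sin(0.1 * n), "b": np.sin(0.3 * n)}
# A noisy observation of key "b".
observed = templates["b"] + 0.05 * np.cos(0.7 * n)
key, scores = predict_key(observed, templates)
```

Returning the full score table alongside the winner makes it possible to rank runner-up candidates, which can help when two keys sound alike.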
The method currently works best with mechanical keyboards.
"Chip‑Bag Spy"
Similar acoustic eavesdropping research includes the "Don’t Skype & Type!" paper, which extracts keystroke sounds from VoIP streams, and the "Visual Microphone" technique that recovers audio from high‑speed video of vibrating objects such as a chip bag.
These studies show that, with appropriate signal processing and machine‑learning models, it is possible to infer typed characters or spoken words from indirect acoustic or visual cues.
Resources
Blog: https://ggerganov.github.io/jekyll/update/2018/11/30/keytap-description-and-thoughts.html
Code: https://github.com/ggerganov/kbd-audio
Demo: https://ggerganov.github.io/jekyll/update/2018/11/24/keytap.html
Related paper (ASIACCS 2017): https://www.math.unipd.it/~dlain/papers/2017-skype.pdf
Visual Microphone paper (SIGGRAPH 2014): http://t.cn/EyZEZYI
21CTO