
Can Your Keyboard Secrets Be Heard? Inside the Keytap Acoustic Attack

This article explains how the open‑source Keytap project captures short audio snippets from a microphone to reconstruct typed characters, outlines its four‑step process of data collection, model building, keystroke detection, and character identification, and compares it with related acoustic eavesdropping research.

21CTO

Dec 3, 2018


Keytap is an acoustic side-channel tool that listens to the sound of your keyboard through the computer's microphone and reconstructs what was typed. It keeps only 75-100 ms of audio around each keystroke, so its dataset consists of short audio clips rather than a continuous recording.

Four‑Step Spy Method

1. Collect training data.
2. Build a prediction model and learn the data.
3. Detect when someone is typing.
4. Identify what they are typing.

Collect Training Data

Keytap records the audio 75‑100 ms before and after each keystroke, ignoring the rest of the signal. This sacrifices some information but reduces noise and data size. The waveform of a key press shows a primary peak followed by a smaller release peak about 150 ms later.

Each training sample is therefore a short waveform snippet. When two keys are pressed within about 75 ms of each other, their windows overlap and the sounds of both keystrokes mix in the same snippet.
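The windowing step can be sketched in a few lines of Python. This is a minimal illustration, not Keytap's actual code: it assumes the recording is a plain list of samples and that the sample index of each key press is already known. The names extract_snippets and overlapping_presses and their parameters are hypothetical.

```python
def extract_snippets(samples, press_indices, sample_rate, half_window_ms=75):
    """Keep a fixed window around each key press and drop everything else."""
    half = int(sample_rate * half_window_ms / 1000)
    snippets = []
    for idx in press_indices:
        start, end = idx - half, idx + half
        if start < 0 or end > len(samples):
            continue  # press too close to the edge of the recording
        snippets.append(samples[start:end])
    return snippets


def overlapping_presses(press_indices, sample_rate, min_gap_ms=75):
    """Return pairs of consecutive presses that are closer than min_gap_ms."""
    min_gap = int(sample_rate * min_gap_ms / 1000)
    ordered = sorted(press_indices)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]
```

Pairs returned by overlapping_presses are candidates for mixed snippets, which a training pipeline might choose to discard.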

Build a Prediction Model

To build the model, Keytap first aligns the recorded snippets of each key by their waveform peaks to compensate for latency, then refines the alignment using a similarity metric (cross-correlation). After alignment, it computes a weighted average waveform for each key, which serves as the reference for real-time matching.
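A rough sketch of this align-then-average idea follows. It is an illustration under stated assumptions rather than Keytap's implementation: the first snippet is used as the alignment reference, the similarity is a zero-lag normalized cross-correlation, and each snippet is weighted by its similarity to the reference. All function names and the max_refine parameter are hypothetical.

```python
import math


def peak_index(x):
    """Index of the sample with the largest absolute amplitude."""
    return max(range(len(x)), key=lambda i: abs(x[i]))


def shift(x, k):
    """Shift x right by k samples (left if k < 0), zero padded, same length."""
    out = [0.0] * len(x)
    for i, v in enumerate(x):
        if 0 <= i + k < len(x):
            out[i + k] = v
    return out


def similarity(a, b):
    """Normalized cross-correlation of two equal-length snippets at zero lag."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def build_template(snippets, max_refine=8):
    """Average the snippets of one key after coarse and fine alignment."""
    n = len(snippets[0])
    # Coarse alignment: move every snippet's peak to the center of the window.
    aligned = [shift(s, n // 2 - peak_index(s)) for s in snippets]
    ref = aligned[0]  # assumption: the first snippet acts as the reference
    refined, weights = [], []
    for s in aligned:
        # Fine alignment: try small shifts and keep the most similar one.
        best_k = max(range(-max_refine, max_refine + 1),
                     key=lambda k: similarity(ref, shift(s, k)))
        best = shift(s, best_k)
        refined.append(best)
        weights.append(max(similarity(ref, best), 0.0))
    total = sum(weights)
    if total == 0:
        raise ValueError("no usable snippets to average")
    # Weighted average: snippets that match the reference better count more.
    return [sum(w * s[i] for w, s in zip(weights, refined)) / total
            for i in range(n)]
```

Calling build_template once per key yields one reference waveform per key, which can be stored in a dictionary keyed by character.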

Cross‑correlation values (CC) indicate similarity; higher CC means a better match. Other similarity metrics could also be used.
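The article does not give the exact formula Keytap uses. One common definition, shown here as an assumption, is the normalized cross-correlation of two equal-length snippets x and y of N samples at zero lag:

```latex
\mathrm{CC}(x, y) =
  \frac{\sum_{i=1}^{N} x_i \, y_i}
       {\sqrt{\sum_{i=1}^{N} x_i^{2}} \; \sqrt{\sum_{i=1}^{N} y_i^{2}}},
\qquad -1 \le \mathrm{CC}(x, y) \le 1
```

Under this definition, a CC of 1 means the two snippets have the same shape up to a positive scale factor, so differences in overall loudness do not affect the score.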

Detect Keyboard Activity

An adaptive threshold monitors the raw audio stream for large peaks that indicate a keystroke. The threshold adjusts based on the average signal level of the past few hundred milliseconds.
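A detector of this kind might look like the sketch below. The specific rule (a peak must exceed a fixed multiple of the mean absolute amplitude over the preceding window) and the parameters history_ms, factor, floor, and refractory_ms are assumptions chosen for illustration, not values taken from Keytap.

```python
from collections import deque


def detect_keystrokes(samples, sample_rate, history_ms=300, factor=8.0,
                      floor=0.0, refractory_ms=75):
    """Return sample indices where the signal jumps well above recent levels."""
    history = deque(maxlen=int(sample_rate * history_ms / 1000))
    refractory = int(sample_rate * refractory_ms / 1000)
    running_sum = 0.0
    hits = []
    last_hit = -refractory
    for i, value in enumerate(samples):
        level = abs(value)
        full = len(history) == history.maxlen
        if full:
            # The threshold follows the average level of the recent past.
            threshold = max(factor * running_sum / len(history), floor)
            if level > threshold and i - last_hit >= refractory:
                hits.append(i)
                last_hit = i
            running_sum -= history[0]  # oldest sample is about to be dropped
        history.append(level)
        running_sum += level
    return hits
```

The refractory period keeps a single loud keystroke from being reported several times, and no detection happens until the history window has filled.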

Identify the Pressed Key

When a keystroke is detected, the system compares the captured snippet against the averaged waveforms using cross‑correlation. The key with the highest CC score is selected as the predicted character.
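The matching step can be illustrated as follows, assuming the per-key reference waveforms are stored in a dictionary and have the same length as the captured snippet. The function names are hypothetical, and the zero-lag normalized cross-correlation is one possible choice of CC score.

```python
import math


def cc(a, b):
    """Normalized cross-correlation of two equal-length snippets at zero lag."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_key(snippet, templates):
    """Score the snippet against every key template and return the best match."""
    scores = {key: cc(snippet, template) for key, template in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Returning the score alongside the key lets a caller reject low-confidence predictions instead of always emitting a character.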

The method currently works best with mechanical keyboards.

"Chip‑Bag Spy"

Similar acoustic eavesdropping research includes the "Don’t Skype & Type!" paper, which extracts keystroke sounds from VoIP streams, and the "Visual Microphone" technique that recovers audio from high‑speed video of vibrating objects such as a chip bag.

These studies show that, with appropriate signal processing and machine‑learning models, it is possible to infer typed characters or spoken words from indirect acoustic or visual cues.

Resources

Blog: https://ggerganov.github.io/jekyll/update/2018/11/30/keytap-description-and-thoughts.html

Code: https://github.com/ggerganov/kbd-audio

Demo: https://ggerganov.github.io/jekyll/update/2018/11/24/keytap.html

Related paper (ASIACCS 2017): https://www.math.unipd.it/~dlain/papers/2017-skype.pdf

Visual Microphone paper (SIGGRAPH 2014): http://t.cn/EyZEZYI
