How to Build a Raspberry Pi Baby Cry Detector with TensorFlow and Open‑Source Tools

This guide shows how to turn a Raspberry Pi into an automated baby monitor that records audio, trains a TensorFlow sound‑detection model, generates labeled datasets, runs real‑time inference, and sends push notifications via Platypush, while also integrating a camera and audio streaming.

Programmer DD

Jan 22, 2021

Becoming a parent means taking on many tasks that are not yet automated. This article explores automating one of them, baby‑cry detection, using a Raspberry Pi, an inexpensive USB microphone, and open‑source software.

Ideal Baby‑monitor Features

Runs on cheap hardware (e.g., Raspberry Pi with a USB mic).

Detects baby crying and notifies you (e.g., via phone) or logs events.

Plays audio on any device (speaker, phone, computer) without moving the speaker.

Includes a camera for visual checks or short video clips when crying starts.

Recording Audio Samples

First, set up a Raspberry Pi (model 3 or newer) with Linux on the SD card and connect a compatible USB microphone. Then install TensorFlow and the audio tools:

[sudo] apt-get install ffmpeg lame libatlas-base-dev alsa-utils
[sudo] pip3 install tensorflow

Use arecord -l to list capture devices, then record with the desired mic (e.g., plughw:2,0):

arecord -D plughw:2,0 -c 1 -f cd | lame - audio.mp3
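If several capture devices are attached, it can help to turn the arecord -l listing into the plughw:CARD,DEVICE identifiers that -D expects. The following is a minimal sketch: the helper name list_capture_devices and the sample text are illustrative, and the regular expression assumes the usual "card N: ... [description], device M:" line layout.

```python
import re

# Matches lines such as:
#   card 2: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
CARD_LINE = re.compile(r'^card (\d+): .*?\[(.*?)\], device (\d+):')


def list_capture_devices(arecord_output):
    """Return (alsa_device, description) pairs parsed from `arecord -l` text."""
    devices = []
    for line in arecord_output.splitlines():
        match = CARD_LINE.match(line)
        if match:
            card, description, device = match.groups()
            devices.append((f'plughw:{card},{device}', description))
    return devices


# Sample text in the shape `arecord -l` typically prints (illustrative only).
sample_output = """**** List of CAPTURE Hardware Devices ****
card 2: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
"""

print(list_capture_devices(sample_output))
```

On the Pi itself, the same function could be fed the real listing, for example via subprocess.run(['arecord', '-l'], capture_output=True, text=True).stdout.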

Record several minutes to hours of baby‑room audio, making sure to capture quiet periods, crying, and other background sounds. Stop recording with Ctrl+C, and repeat over multiple days.

Labeling Audio Samples

Copy the recordings to a computer and organize them:

~/datasets/sound-detect/audio
  ├─ sample_1
  │   ├─ audio.mp3
  │   └─ labels.json
  ├─ sample_2
  │   ├─ audio.mp3
  │   └─ labels.json
  ...

Create a labels.json for each sample, marking timestamps with "positive" (crying) or "negative" (no cry):

{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}
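To sanity-check a labels file before generating the dataset, the timestamps can be expanded into explicit segments. This sketch assumes that each MM:SS key marks the point where a label begins and that the label applies until the next key; the helper names to_seconds and labels_to_segments are hypothetical and not part of micmon.

```python
import json


def to_seconds(timestamp):
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = timestamp.split(':')
    return int(minutes) * 60 + int(seconds)


def labels_to_segments(labels):
    """Turn {timestamp: label} into (start, end, label) tuples.

    The last segment has end=None, meaning "until the end of the recording".
    """
    marks = sorted((to_seconds(ts), label) for ts, label in labels.items())
    segments = []
    for i, (start, label) in enumerate(marks):
        end = marks[i + 1][0] if i + 1 < len(marks) else None
        segments.append((start, end, label))
    return segments


labels = json.loads('''{
  "00:00": "negative",
  "02:13": "positive",
  "04:57": "negative",
  "15:41": "positive",
  "18:24": "negative"
}''')

for segment in labels_to_segments(labels):
    print(segment)
```

Printing the segments makes it easy to spot a mistyped timestamp, such as a crying segment that appears to last far longer than expected.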

Generating the Dataset

Clone and install the micmon library, which extracts FFT features and stores them in compressed .npz files.

git clone https://github.com/BlackLight/micmon.git
cd micmon
[sudo] pip3 install -r requirements.txt
[sudo] python3 setup.py build install

Run the data‑generation command, specifying frequency range (250‑2500 Hz), number of bins, and sample duration (2 s):

micmon-datagen \
    --low 250 --high 2500 --bins 100 \
    --sample-duration 2 --channels 1 \
    ~/datasets/sound-detect/audio  ~/datasets/sound-detect/data

The command creates .npz files in ~/datasets/sound-detect/data, each containing the spectral signature of a 2‑second audio slice.
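To give an intuition for what a "spectral signature" is, the sketch below computes one for a synthetic 2-second slice with NumPy. It is not micmon's implementation: the function spectral_signature, the equal-width averaging of FFT magnitudes, and the peak normalization are all assumptions made for illustration, using the same 250-2500 Hz range and 100 bins as the command above.

```python
import numpy as np


def spectral_signature(samples, sample_rate, low=250, high=2500, bins=100):
    """Average FFT magnitudes of one audio slice into `bins` frequency bands."""
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    in_range = magnitudes[(freqs >= low) & (freqs < high)]
    # Split the selected range into near-equal groups and average each one.
    signature = np.array([chunk.mean() for chunk in np.array_split(in_range, bins)])
    peak = signature.max()
    # Normalize to the strongest band (assumed here; avoids dividing by zero).
    return signature / peak if peak > 0 else signature


# Synthetic 2-second slice: a pure 1 kHz tone sampled at 44.1 kHz.
sample_rate = 44100
t = np.arange(2 * sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

signature = spectral_signature(tone, sample_rate)
print(signature.shape, int(signature.argmax()))
```

For the pure tone, almost all of the energy lands in the single band that covers 1 kHz. A real recording spreads energy over many bands, and that pattern is what the model learns to classify.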

Training the Model

Use the provided micmon Python API to load the datasets and train a TensorFlow/Keras model.

import os
from tensorflow.keras import layers
from micmon.dataset import Dataset
from micmon.model import Model

datasets_dir = os.path.expanduser('~/datasets/sound-detect/data')
model_dir = os.path.expanduser('~/models/sound-detect')
epochs = 2

datasets = Dataset.scan(datasets_dir, validation_split=0.3)
labels = ['negative', 'positive']
freq_bins = len(datasets[0].samples[0])

model = Model([
    layers.Input(shape=(freq_bins,)),
    layers.Dense(2 * freq_bins, activation='relu'),
    layers.Dense(int(0.75 * freq_bins), activation='relu'),
    layers.Dense(len(labels), activation='softmax'),
], labels=labels, low_freq=datasets[0].low_freq, high_freq=datasets[0].high_freq)

for epoch in range(epochs):
    for i, dataset in enumerate(datasets):
        print(f'[epoch {epoch+1}/{epochs}] [audio sample {i+1}/{len(datasets)}]')
        model.fit(dataset)
        evaluation = model.evaluate(dataset)
        print(f'Validation loss and accuracy: {evaluation}')

model.save(model_dir, overwrite=True)

With about 5 hours of baby‑room recordings, training can produce a model with more than 96 % accuracy.
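The network defined in the training script is small enough to run comfortably on a Raspberry Pi. As a rough worked example, the snippet below counts its trainable parameters, assuming freq_bins equals the 100 bins passed to micmon-datagen; dense_params is a hypothetical helper that applies the standard weights-plus-biases formula for a fully connected layer.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


freq_bins = 100   # assumed: matches --bins 100 from the datagen step
num_labels = 2    # 'negative' and 'positive'

# Same layer widths as the Dense layers in the training script.
layer_sizes = [2 * freq_bins, int(0.75 * freq_bins), num_labels]

total = 0
inputs = freq_bins
for units in layer_sizes:
    total += dense_params(inputs, units)
    inputs = units

print(layer_sizes, total)
```

Under these assumptions the layers have 200, 75, and 2 units, for 35,427 parameters in total.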

Running Inference on the Pi

Load the saved model and process microphone frames in real time:

import os
from micmon.audio import AudioDevice
from micmon.model import Model

model_dir = os.path.expanduser('~/models/sound-detect')
model = Model.load(model_dir)

audio_device = 'plughw:2,0'  # replace with your mic
with AudioDevice(system='alsa', device=audio_device) as source:
    for sample in source:
        source.pause()
        prediction = model.predict(sample)
        print(prediction)
        source.resume()

The script prints negative when no cry is detected and positive otherwise.

Sending Push Notifications with Platypush

Install Redis and Platypush with HTTP and Pushbullet support:

[sudo] apt-get install redis-server
[sudo] systemctl start redis-server.service
[sudo] systemctl enable redis-server.service
[sudo] pip3 install 'platypush[http,pushbullet]'

Configure Pushbullet in ~/.config/platypush/config.yaml:

backend.http:
  enabled: True
pushbullet:
  token: YOUR_TOKEN

Modify the detection script to emit a CustomEvent instead of printing:

#!/usr/bin/python3
import argparse, logging, os, sys
from platypush import RedisBus
from platypush.message.event.custom import CustomEvent
from micmon.audio import AudioDevice
from micmon.model import Model

# (argument parsing omitted for brevity)

model = Model.load(model_dir)
bus = RedisBus()
with AudioDevice(system=args.sound_server, device=args.sound_device,
                sample_duration=args.sample_duration, sample_rate=args.sample_rate,
                channels=args.channels, ffmpeg_bin=args.ffmpeg_bin, debug=args.debug) as source:
    for sample in source:
        source.pause()
        prediction = model.predict(sample)
        # sliding‑window logic omitted for brevity
        if has_change:
            evt = CustomEvent(subtype=args.event_type, state=prediction)
            bus.post(evt)
        source.resume()
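The script above leaves out its sliding-window logic, which decides when has_change is true. One possible way to fill that gap is sketched below: keep the most recent predictions in a fixed-size window and flip the overall state only when enough of them are positive. The class ChangeDetector and its parameters window_size and min_positive are hypothetical and are not taken from the original script.

```python
from collections import deque


class ChangeDetector:
    """Debounce per-sample predictions over a look-back window.

    Hypothetical helper: `window_size` and `min_positive` are illustrative
    parameters, not names taken from the original script.
    """

    def __init__(self, window_size=5, min_positive=2):
        self.window = deque(maxlen=window_size)
        self.min_positive = min_positive
        self.state = 'negative'

    def update(self, prediction):
        """Record a prediction; return True when the overall state flips."""
        self.window.append(prediction)
        positives = sum(1 for p in self.window if p == 'positive')
        new_state = 'positive' if positives >= self.min_positive else 'negative'
        has_change = new_state != self.state
        self.state = new_state
        return has_change


detector = ChangeDetector(window_size=5, min_positive=2)
stream = ['negative', 'positive', 'negative', 'positive', 'negative',
          'negative', 'negative', 'negative', 'negative']
for prediction in stream:
    if detector.update(prediction):
        print(f'state changed to {detector.state}')
```

In the detection loop, has_change would then be the return value of detector.update(prediction), and the event would carry detector.state, so that a single misclassified sample does not trigger a notification.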

Create hook scripts that react to the baby-cry event and send Pushbullet notes:

from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.custom import CustomEvent

@hook(CustomEvent, subtype='baby-cry', state='positive')
def on_baby_cry_start(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby is crying!')

@hook(CustomEvent, subtype='baby-cry', state='negative')
def on_baby_cry_stop(event, **_):
    pb = get_plugin('pushbullet')
    pb.send_note(title='Baby cry status', body='The baby stopped crying – good job!')

Adding a Camera Stream

Install the PiCamera integration:

[sudo] pip3 install 'platypush[http,camera,picamera]'

Configure the camera in ~/.config/platypush/config.yaml:

camera.pi:
  listen_port: 5001

Start streaming automatically with a hook that runs on ApplicationStartedEvent:

from platypush.context import get_plugin
from platypush.event.hook import hook
from platypush.message.event.application import ApplicationStartedEvent

@hook(ApplicationStartedEvent)
def on_application_started(event, **_):
    cam = get_plugin('camera.pi')
    cam.start_streaming()

View the video at http://raspberry-pi:8008/camera/pi/video.mjpg or with VLC (tcp/h264://raspberry-pi:5001).

Audio Streaming for Remote Listening

Clone and install micstream to serve live MP3 streams:

git clone https://github.com/BlackLight/micstream.git
cd micstream
[sudo] python3 setup.py install

Start a stream from the plughw:3,0 audio device, served at the /baby.mp3 endpoint, at 96 kbps on port 8088:

micstream -i plughw:3,0 -e '/baby.mp3' -b 96 -p 8088

Clients can listen at http://your-rpi:8088/baby.mp3.

Putting It All Together

Create a systemd user service that runs the detection script with appropriate arguments (e.g., -i plughw:2,0 -e baby-cry -w 10 -n 2 ~/models/sound-detect) and enable it to start on boot. The service monitors audio, triggers Platypush events, sends push notifications, and can optionally start the camera stream.

With this setup, you receive instant alerts when your baby cries, can view live video, and listen to the raw audio stream, while the underlying machine‑learning model can be retrained for any other sound‑detection use case.
