Building a Hands‑Free Voice Assistant with Neuron AI’s Multimodal Audio Providers

This guide explains how to use Neuron v3’s multimodal audio capabilities—including OpenAI and ElevenLabs text‑to‑speech and speech‑to‑text providers—to create a local, hands‑free voice assistant that captures audio, transcribes it, processes it via an agent, and plays back responses.

Open Source Tech Hub

Multimodal Support in Neuron v3

Neuron v3 adds full multimodal capabilities, allowing both audio input and output to be used inside AI agents. Audio components implement AIProviderInterface, so they can be integrated into an agent workflow and benefit from middleware, safety guards, and other agent features.

Typical Local Voice Assistant Flow

1. Capture audio from a microphone.
2. Send the audio to a Speech‑To‑Text (STT) service to obtain a transcript.
3. Pass the transcript to an Agent for processing.
4. Convert the agent’s textual reply to audio with a Text‑To‑Speech (TTS) service and play it.
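
The four steps above can be sketched as a single function whose stages are plain callables. This is a minimal sketch, not Neuron's API: `runVoiceTurn` and the stub closures are hypothetical names used only to show how the stages hand data to each other, so the flow can be run without a microphone or network access.

```php
<?php

/**
 * Runs one turn of the assistant: capture -> transcribe -> respond -> speak.
 * Every name here is hypothetical; the stages are injected as callables.
 *
 * @param callable(): string       $capture    returns the path of a recorded audio file
 * @param callable(string): string $transcribe takes an audio path, returns the transcript
 * @param callable(string): string $respond    takes the transcript, returns the reply text
 * @param callable(string): void   $speak      takes the reply text and plays it
 */
function runVoiceTurn(callable $capture, callable $transcribe, callable $respond, callable $speak): string
{
    $audioPath  = $capture();
    $transcript = trim($transcribe($audioPath));

    // Skip the agent call when nothing intelligible was recorded.
    if ($transcript === '') {
        return '';
    }

    $reply = $respond($transcript);
    $speak($reply);

    return $reply;
}

// Stub stages standing in for the microphone, STT, agent, and TTS services.
$spoken = [];
$reply = runVoiceTurn(
    fn (): string => '/tmp/input.wav',
    fn (string $path): string => ' what time is it ',
    fn (string $text): string => "You asked: {$text}",
    function (string $text) use (&$spoken): void {
        $spoken[] = $text;
    }
);

echo $reply, PHP_EOL; // prints "You asked: what time is it"
```

In a real setup, `$transcribe` would wrap a speech‑to‑text provider and `$speak` would wrap a text‑to‑speech provider plus an audio player. Keeping the stages injectable makes each one replaceable and testable on its own.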

Using an Audio Provider as an Agent

namespace App\Neuron;

use NeuronAI\Agent\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

class MyAgent extends Agent {
    protected function provider(): AIProviderInterface {
        return new OpenAITextToSpeech(
            key: 'OPENAI_API_KEY', // placeholder: pass your actual API key here
            model: 'gpt-4o-mini-tts',
            voice: 'alloy',
        );
    }
}

// Run the agent
$message = MyAgent::make()
    ->chat(new UserMessage('Hi!'))
    ->getMessage();

$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));
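
The `key` argument in the example is a placeholder string. One common way to supply the real value is to read it from an environment variable and fail early when it is missing. The helper below is a sketch under that assumption; `requireEnv` is a hypothetical name and not part of Neuron.

```php
<?php

/**
 * Returns the value of an environment variable or fails loudly.
 * Hypothetical helper, not part of Neuron.
 */
function requireEnv(string $name): string
{
    $value = getenv($name);

    // getenv() returns false when the variable is not set.
    if ($value === false || trim($value) === '') {
        throw new RuntimeException("Environment variable {$name} is not set.");
    }

    return $value;
}

// A dummy value is set here only so the example runs on its own.
putenv('OPENAI_API_KEY=sk-example');
$key = requireEnv('OPENAI_API_KEY');
```

The provider constructor could then receive `key: requireEnv('OPENAI_API_KEY')` instead of a hard-coded string.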

Direct Use of a TTS Provider

$provider = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

$message = $provider->chat(new UserMessage("Hi, I'm the creator of Neuron AI framework!"));
$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));
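
Both examples write the decoded audio with a bare `file_put_contents()` call, which returns `false` if the `assets` directory does not exist and happily writes garbage if the payload is not valid base64. A small wrapper can make those failures explicit. This is a sketch; `saveBase64Audio` is a hypothetical helper, not a Neuron function.

```php
<?php

/**
 * Decodes base64 audio and writes it to $path, creating the directory if needed.
 * Returns the number of bytes written. Hypothetical helper, not part of Neuron.
 */
function saveBase64Audio(string $base64, string $path): int
{
    // Strict mode makes base64_decode() return false on characters outside the alphabet.
    $bytes = base64_decode($base64, true);
    if ($bytes === false || $bytes === '') {
        throw new InvalidArgumentException('Audio payload is not valid base64.');
    }

    // Create the target directory; the second is_dir() check covers a concurrent mkdir.
    $dir = dirname($path);
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory {$dir}.");
    }

    $written = file_put_contents($path, $bytes);
    if ($written === false) {
        throw new RuntimeException("Could not write {$path}.");
    }

    return $written;
}

// Example with a tiny stand-in payload instead of real speech data.
$target = sys_get_temp_dir() . '/neuron-demo/speech.mp3';
$size = saveBase64Audio(base64_encode('fake-mp3-bytes'), $target);
```

With the examples above, the call would look like `saveBase64Audio($message->getAudio()->getContent(), __DIR__ . '/assets/speech.mp3')`.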

OpenAI Audio Providers

Text‑to‑Speech

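use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

$provider = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY', // placeholder: pass your actual API key here
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

$message = $provider->chat(new UserMessage('Hello from Neuron AI!'));

// getAudio() returns the audio content object; getContent() yields its base64 payload
$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));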

Speech‑to‑Text

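use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAISpeechToText;
// The three namespaces below are assumed; check them against your installed Neuron version.
use NeuronAI\Chat\Enums\SourceType;
use NeuronAI\Chat\Messages\ContentBlocks\AudioContent;
use NeuronAI\Chat\Messages\ContentBlocks\TextContent;

$provider = new OpenAISpeechToText(
    key: 'OPENAI_API_KEY', // placeholder: pass your actual API key here
    model: 'gpt-4o-transcribe',
);

// The text block gives the transcription model context about the recording
$message = $provider->chat(new UserMessage([
    new TextContent('This audio is about a math lesson. Take care of the technical words.'),
    new AudioContent(__DIR__ . '/assets/intro.mp3', SourceType::URL)
]));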

echo $message->getContent();
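
Before sending a recording for transcription, it can help to confirm that the file exists, has an expected extension, and is not oversized, so a bad capture fails locally instead of as a remote API error. The check below is a sketch: `assertAudioFile` is a hypothetical helper, and the extension list and 25 MB default are assumptions to adjust for the service in use.

```php
<?php

/**
 * Verifies that $path is a readable, non-empty audio file with an expected
 * extension and a size of at most $maxBytes. Hypothetical helper; the default
 * extension list and size limit are assumptions, not values taken from Neuron.
 */
function assertAudioFile(
    string $path,
    array $extensions = ['mp3', 'wav', 'm4a', 'webm'],
    int $maxBytes = 25 * 1024 * 1024
): void {
    if (!is_file($path) || !is_readable($path)) {
        throw new InvalidArgumentException("Audio file not found or unreadable: {$path}");
    }

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!in_array($extension, $extensions, true)) {
        throw new InvalidArgumentException("Unexpected audio extension: {$extension}");
    }

    $size = filesize($path);
    if ($size === false || $size === 0 || $size > $maxBytes) {
        throw new InvalidArgumentException("Audio file is empty or larger than {$maxBytes} bytes.");
    }
}

// Stand-in file so the example runs without a real recording.
$sample = sys_get_temp_dir() . '/neuron-demo-intro.mp3';
file_put_contents($sample, 'stand-in audio bytes');
assertAudioFile($sample); // returns silently when the file passes every check
```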

ElevenLabs Audio Providers

Text‑to‑Speech

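use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\ElevenLabs\ElevenLabsTextToSpeech;

$provider = new ElevenLabsTextToSpeech(
    key: 'ELEVENLABS_API_KEY', // placeholder: pass your actual API key here
    model: 'eleven_multilingual_v2', // adjust per ElevenLabs documentation
    voice: 'Rachel', // ElevenLabs' API addresses voices by ID; confirm whether the provider expects an ID here
);

$message = $provider->chat(new UserMessage('Hello from Neuron AI!'));

// getAudio() returns the audio content object; getContent() yields its base64 payload
$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));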

Speech‑to‑Text

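use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\ElevenLabs\ElevenLabsSpeechToText;
// The two namespaces below are assumed; check them against your installed Neuron version.
use NeuronAI\Chat\Enums\SourceType;
use NeuronAI\Chat\Messages\ContentBlocks\AudioContent;

$provider = new ElevenLabsSpeechToText(
    key: 'ELEVENLABS_API_KEY', // placeholder: pass your actual API key here
    model: 'scribe_v1', // ElevenLabs' Scribe transcription model; verify the current ID in their documentation
);

$message = $provider->chat(new UserMessage(
    new AudioContent(__DIR__ . '/assets/intro.mp3', SourceType::URL)
));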

echo $message->getContent();
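
A hands‑free assistant also needs a way to end the session without keyboard input. One option is to inspect each transcript for a spoken stop phrase before passing it to the agent. The function below is a sketch of that idea; `isStopCommand` and its default phrases are hypothetical, and the normalization only handles ASCII text.

```php
<?php

/**
 * Returns true when the transcript consists solely of one of the stop phrases,
 * ignoring case, punctuation, and extra whitespace (ASCII only).
 * Hypothetical helper, not part of Neuron.
 */
function isStopCommand(string $transcript, array $stopPhrases = ['stop', 'goodbye', 'exit']): bool
{
    // Lowercase, replace anything that is not a letter, digit, or space, then collapse whitespace.
    $normalized = strtolower($transcript);
    $normalized = preg_replace('/[^a-z0-9 ]+/', ' ', $normalized);
    $normalized = trim(preg_replace('/\s+/', ' ', $normalized));

    return in_array($normalized, $stopPhrases, true);
}

$endsSession = isStopCommand(' Goodbye! ');      // true: only the stop phrase was spoken
$keepsGoing  = isStopCommand('Stop the timer');  // false: the phrase is part of a longer request
```

Matching the whole transcript rather than a substring avoids ending the session when a stop word merely appears inside a longer request.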
Written by

Open Source Tech Hub
