Building a Hands‑Free Voice Assistant with Neuron AI’s Multimodal Audio Providers
This guide explains how to use Neuron v3’s multimodal audio capabilities—including OpenAI and ElevenLabs text‑to‑speech and speech‑to‑text providers—to create a local, hands‑free voice assistant that captures audio, transcribes it, processes it via an agent, and plays back responses.
Multimodal Support in Neuron v3
Neuron v3 adds full multimodal capabilities, allowing both audio input and output to be used inside AI agents. Audio components implement AIProviderInterface, so they can be integrated into an agent workflow and benefit from middleware, safety guards, and other agent features.
Typical Local Voice Assistant Flow
Capture audio from a microphone.
Send the audio to a Speech‑To‑Text (STT) service to obtain a transcript.
Pass the transcript to an Agent for processing.
Convert the agent’s textual reply to audio with a Text‑To‑Speech (TTS) service and play it.
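The four steps above can be glued together in one loop. Below is a minimal sketch, not a complete implementation: it assumes Linux's `arecord` for microphone capture and `mpg123` for playback (both platform-specific), and a hypothetical `ChatAgent` backed by a text LLM for step 3 (not shown in this guide). The provider options mirror the OpenAI examples covered later.

```php
<?php

use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAISpeechToText;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

$stt = new OpenAISpeechToText(key: 'OPENAI_API_KEY', model: 'gpt-4o-transcribe');
$tts = new OpenAITextToSpeech(key: 'OPENAI_API_KEY', model: 'gpt-4o-mini-tts', voice: 'alloy');

while (true) {
    // 1. Capture a few seconds of microphone audio (arecord is Linux-specific).
    exec('arecord -f cd -d 5 /tmp/input.wav');

    // 2. Transcribe the recording (see the STT examples below for content types).
    $transcript = $stt->chat(new UserMessage(
        new AudioContent('/tmp/input.wav', SourceType::URL)
    ))->getContent();

    // 3. Process the transcript with a regular text agent. ChatAgent is a
    //    placeholder name for an agent backed by a text LLM.
    $reply = ChatAgent::make()->chat(new UserMessage($transcript))->getMessage()->getContent();

    // 4. Synthesize the reply, save it, and play it back.
    $audio = $tts->chat(new UserMessage($reply))->getAudio()->getContent();
    file_put_contents('/tmp/reply.mp3', base64_decode($audio));
    exec('mpg123 -q /tmp/reply.mp3');
}
```

In practice you would add silence detection or a push-to-talk key instead of a fixed five-second recording window.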
Using an Audio Provider as an Agent
namespace App\Neuron;

use NeuronAI\Agent\Agent;
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\AIProviderInterface;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

class MyAgent extends Agent
{
    protected function provider(): AIProviderInterface
    {
        return new OpenAITextToSpeech(
            key: 'OPENAI_API_KEY',
            model: 'gpt-4o-mini-tts',
            voice: 'alloy',
        );
    }
}
// Run the agent
$message = MyAgent::make()
    ->chat(new UserMessage('Hi!'))
    ->getMessage();

$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));

Direct Use of a TTS Provider
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

$provider = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

$message = $provider->chat(new UserMessage("Hi, I'm the creator of Neuron AI framework!"));

$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));

OpenAI Audio Providers
Text‑to‑Speech
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAITextToSpeech;

$provider = new OpenAITextToSpeech(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-mini-tts',
    voice: 'alloy',
);

$message = $provider->chat(new UserMessage('Hello from Neuron AI!'));

$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));

Speech‑to‑Text
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\OpenAI\Audio\OpenAISpeechToText;
// TextContent, AudioContent, and SourceType also require use statements;
// check your Neuron version for their exact namespaces.

$provider = new OpenAISpeechToText(
    key: 'OPENAI_API_KEY',
    model: 'gpt-4o-transcribe',
);

$message = $provider->chat(new UserMessage([
    new TextContent('This audio is about a math lesson. Take care of the technical words.'),
    new AudioContent(__DIR__ . '/assets/intro.mp3', SourceType::URL),
]));

echo $message->getContent();

ElevenLabs Audio Providers
Text‑to‑Speech
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\ElevenLabs\ElevenLabsTextToSpeech;

$provider = new ElevenLabsTextToSpeech(
    key: 'ELEVENLABS_API_KEY',
    model: 'eleven_multilingual_v2', // adjust per ElevenLabs documentation
    voice: 'Rachel',
);

$message = $provider->chat(new UserMessage('Hello from Neuron AI!'));

$audioBase64 = $message->getAudio()->getContent();
file_put_contents(__DIR__ . '/assets/speech.mp3', base64_decode($audioBase64));

Speech‑to‑Text
use NeuronAI\Chat\Messages\UserMessage;
use NeuronAI\Providers\ElevenLabs\ElevenLabsSpeechToText;
// AudioContent and SourceType also require use statements;
// check your Neuron version for their exact namespaces.

$provider = new ElevenLabsSpeechToText(
    key: 'ELEVENLABS_API_KEY',
    model: 'scribe_v1', // ElevenLabs' transcription model; verify against current docs
);

$message = $provider->chat(new UserMessage(
    new AudioContent(__DIR__ . '/assets/intro.mp3', SourceType::URL)
));

echo $message->getContent();
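To complete step 4 of the assistant flow, the MP3 saved by any of the TTS examples above needs a system audio player. A small helper sketch follows; `afplay` (macOS) and `mpg123` (Linux) are assumptions about what is installed locally, so adjust for your platform.

```php
<?php

// Build a shell command that plays an MP3 file with a platform player.
// Assumes afplay on macOS and mpg123 elsewhere; adjust for your system.
function playbackCommand(string $file): string
{
    $player = PHP_OS_FAMILY === 'Darwin' ? 'afplay' : 'mpg123 -q';
    return $player . ' ' . escapeshellarg($file);
}

// Usage after any of the TTS examples above:
// exec(playbackCommand(__DIR__ . '/assets/speech.mp3'));
```

Using `escapeshellarg()` keeps paths with spaces or special characters safe to pass through `exec()`.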