Tagged articles

speech-to-text

13 articles

Mar 7, 2026 · Artificial Intelligence

Building a Hands‑Free Voice Assistant with Neuron AI’s Multimodal Audio Providers

This guide explains how to use Neuron v3’s multimodal audio capabilities—including OpenAI and ElevenLabs text‑to‑speech and speech‑to‑text providers—to create a local, hands‑free voice assistant that captures audio, transcribes it, processes it via an agent, and plays back responses.

Agent · ElevenLabs · Multimodal

0 likes · 5 min read

IT Services Circle

Jan 18, 2026 · Artificial Intelligence

Discover Four Open-Source AI-Powered Tools: LifeTrace, Voquill, RunCat365 & WindowPet

This article introduces four open-source projects—LifeTrace, Voquill, RunCat365, and WindowPet—detailing how they use AI, OCR, and lightweight cross-platform frameworks to automatically record digital activity, convert speech to clean text, visualize CPU load, and add customizable desktop pets, with GitHub links for each.

AI · Automation · Desktop Utilities

0 likes · 6 min read

Code Mala Tang

Nov 16, 2025 · Frontend Development

How to Build a Robust Speech‑to‑Text Feature in React with Tencent ASR

This article walks through the complete front‑end architecture and implementation details for integrating Tencent Cloud speech‑to‑text into a React app, covering token authentication, SDK initialization, event handling, cursor‑aware text insertion, character limits, permission handling, error management, and state management with MobX.

MobX · React · Tencent Cloud

0 likes · 11 min read

Instant Consumer Technology Team

Nov 7, 2025 · Artificial Intelligence

Three Open‑Source Gems: Local‑First Knowledge Hub, NL‑to‑SQL AI, and Private Speech‑to‑Text

This weekly roundup spotlights three open‑source tools—AFFiNE’s local‑first knowledge workspace, Vanna’s natural‑language‑to‑SQL AI framework, and Handy’s offline, privacy‑focused speech‑to‑text app—showcasing recent advances in knowledge management, data analysis, and secure voice transcription.

AI · Knowledge Management · Privacy

0 likes · 7 min read

Liangxu Linux

Nov 6, 2025 · Artificial Intelligence

8 Must‑Explore Open‑Source Projects: AI Prompt Tools, Voice Transcription, Browser Engine & More

This article introduces eight noteworthy open‑source projects—including an interactive prompt‑engineering tutorial, Claude Cookbooks, an offline speech‑to‑text tool, an eBook‑to‑audiobook converter, the Servo browser engine, a free programming‑books collection, a real‑time object‑detection model, and other popular repositories—each with brief descriptions and GitHub links.

AI tools · GitHub · Prompt engineering

0 likes · 7 min read

Wuming AI

Oct 19, 2025 · Artificial Intelligence

Can AI Voice Input Replace Typing? A Hands‑On Review of Daiti’s Offline Speech‑to‑Text Tool

The author evaluates Daiti, an offline AI voice‑to‑text application, describing personal typing pain points and the testing methodology, weighing pros such as speed, privacy, and custom dictionaries against cons such as limited model support and missing export features, and providing step‑by‑step installation, configuration, and cost details.

AI voice input · Daiti · Product Review

0 likes · 10 min read

IT Services Circle

Sep 4, 2025 · Artificial Intelligence

4 Open‑Source AI Tools: Datasets, K‑Line Model, Real‑Time Speech, Agent Toolbox

This article introduces four high‑impact open‑source AI projects—a curated high‑quality dataset collection, the Kronos financial K‑line model, WhisperLiveKit for real‑time speech transcription, and Youtu‑agent for building versatile AI agents—highlighting their features, usage, and GitHub links.

AI agents · datasets · financial modeling

0 likes · 6 min read

DataFunTalk

Mar 21, 2025 · Artificial Intelligence

OpenAI Unveils New STT and TTS Models: gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts – Performance, Pricing, and Demo

OpenAI announced three new speech models—two STT models (gpt-4o-transcribe and its lightweight gpt-4o-mini-transcribe) and one TTS model (gpt-4o-mini-tts)—showcasing strong accuracy on multilingual benchmarks, competitive pricing, and a quick‑start API demo for developers.

AI models · GPT-4o · OpenAI

0 likes · 8 min read

CSS Magic

Sep 29, 2024 · Artificial Intelligence

Can You Code Just by Speaking? A Hands‑Free Voice Guide to AI Programming Assistants

This article walks through how to enable voice-driven coding with GitHub Copilot, VS Code Speech, and Cursor, detailing plugin installation, configuration steps, shortcut keys, and tips for using system or input‑method speech input to create a seamless hands‑free AI coding experience.

AI coding assistant · Cursor · GitHub Copilot

0 likes · 10 min read

IT Services Circle

Feb 28, 2024 · Artificial Intelligence

Transcribing Audio and Video to Text with OpenAI Whisper and Faster‑Whisper

This article explains how to use OpenAI's Whisper and faster‑whisper, a faster reimplementation of the same model, to quickly convert audio or video files into searchable text, covering installation, Python code examples, a Swift client, and a Flask‑based server API for practical transcription workflows.

AI · Fast-Whisper · Python

0 likes · 6 min read

Code DAO

Dec 10, 2021 · Artificial Intelligence

Deep Learning for Automatic Speech Recognition (ASR): From Mel Spectrograms to CTC Decoding

This article explains the end‑to‑end deep‑learning pipeline for speech‑to‑text, covering audio digitization, preprocessing with librosa, conversion to Mel spectrograms and MFCCs, data augmentation, a CNN‑RNN architecture, CTC loss, decoding strategies and evaluation with word error rate.

ASR · Beam Search · CTC

0 likes · 13 min read

MaGe Linux Operations

Oct 8, 2020 · Artificial Intelligence

How to Transcribe Audio to Text with AssemblyAI’s Python API – Step‑by‑Step Guide

This tutorial walks you through setting up a Python environment, installing required dependencies, and using AssemblyAI’s high‑accuracy speech‑to‑text Web API to upload audio files, start transcription, and retrieve the transcribed text, including tips for handling API keys and checking transcription status.

API · AssemblyAI · Python

0 likes · 14 min read

MaGe Linux Operations

Feb 1, 2019 · Artificial Intelligence

Master Python Speech Recognition: Install, Process Audio Files, and Capture Live Voice

This guide walks you through the fundamentals of speech recognition, explains how modern systems work, shows how to choose and install the Python SpeechRecognition package, and demonstrates processing audio files, handling noise, using offsets, and capturing live microphone input with practical code examples.

audio-processing · machine-learning · microphone

0 likes · 16 min read
