Local AI Voice Companion – Realtime Voice Chat

Open-source toolkit for natural voice AI conversations with around 500 ms latency. Supports local LLMs and custom TTS voices.

RealtimeVoiceChat is an open-source project that lets you have fluid, spoken conversations with an AI model.

Unlike clunky voice assistants that make you wait between turns, this tool delivers near-human response times around 500ms by stitching together several cutting-edge AI components.

It’s the kind of system that makes you forget you’re talking to a machine – the responses come so fast you can actually interrupt the AI mid-sentence, just like a real conversation.

See It In Action

Demo from RealtimeVoiceChat’s official GitHub repo.

Features

  • Fluid Conversation: The main selling point. Speak and listen, aiming for a natural conversational flow.
  • Real-Time Feedback: You can see partial transcriptions as you speak and even the AI’s text response as it’s being generated, before the audio is ready. This helps reduce that “am I talking to a void?” feeling.
  • Low Latency Focus: The architecture is built around streaming audio chunks via WebSockets to keep delays minimal.
  • Smart Turn-Taking: It uses dynamic silence detection (turndetect.py) to figure out when you’ve finished your thought, adapting to the conversation’s pace (a sketch of the idea follows this list). This is more sophisticated than a fixed pause timer.
  • Flexible AI Brains: You’re not locked into one LLM. It defaults to Ollama, letting you run local models (which is great for privacy and cost), but it also supports OpenAI if you prefer cloud-based models. This is handled by llm_module.py.
  • Customizable Voices: You get choices for Text-to-Speech engines. The audio_module.py allows you to switch between options like Kokoro, Coqui, or Orpheus.
  • Web Interface: A clean, no-frills UI using Vanilla JavaScript and the Web Audio API.
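
To make the turn-taking idea concrete, here’s a minimal sketch of adaptive silence detection – not the project’s actual turndetect.py code, just the principle: the pause the detector waits for shrinks as the exchange speeds up, so quick back-and-forths get snappier responses. All names here are hypothetical.

    import time

    class AdaptiveTurnDetector:
        """Ends the user's turn after a silence window that adapts to pace."""

        def __init__(self, base_silence=1.2, min_silence=0.4, decay=0.9):
            self.required_silence = base_silence  # seconds of quiet that end a turn
            self.min_silence = min_silence        # never cut in faster than this
            self.decay = decay                    # shrink factor for quick exchanges
            self.last_voice_ts = None

        def on_voice_activity(self):
            # Call whenever the VAD flags speech in the incoming audio chunk.
            self.last_voice_ts = time.monotonic()

        def on_turn_completed(self):
            # Fast back-and-forth -> tolerate shorter pauses on the next turn.
            self.required_silence = max(self.min_silence,
                                        self.required_silence * self.decay)

        def turn_finished(self):
            # True once the user has been quiet long enough for the current pace.
            return (self.last_voice_ts is not None and
                    time.monotonic() - self.last_voice_ts >= self.required_silence)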

Use Cases

  • Personal AI Companion: Set it up with a local LLM via Ollama, tweak system_prompt.txt to give your AI a specific personality (an example prompt follows this list), and just chat. It’s quite different from typing at a model.
  • Voice-Controlled Information Kiosk: Imagine a scenario where you need hands-free access to information. You could adapt this to query specific knowledge bases or APIs via voice.
  • Accessibility Tool Exploration: For users who find typing difficult, a robust voice chat interface to an LLM could be a significant enabler.
  • Language Learning Practice: Configure the LLM to act as a conversation partner in a language you’re learning. The real-time nature could make practice more engaging.
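
For that companion use case, the personality really is just text in system_prompt.txt. A purely illustrative example (not the prompt that ships with the project):

    You are Ada, a relaxed, slightly sarcastic conversation partner.
    Speak the way people talk: one or two short sentences, no lists, no markdown.
    Ask a brief follow-up question when the user seems unsure.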

How to Use It

1. Install prerequisites:

    • For best performance, use a machine with a CUDA-enabled NVIDIA GPU
    • You’ll need Python 3.9+ if installing manually
    • Docker is recommended for simpler setup

2. Get the code:

    git clone https://github.com/KoljaB/RealtimeVoiceChat.git
    cd RealtimeVoiceChat
3. Choose your installation method:

    Docker installation (recommended for Linux/GPU):

    # Build the Docker images
    docker compose build
    # Start the services
    docker compose up -d
    # Pull your desired Ollama model
    docker compose exec ollama ollama pull hf.co/bartowski/huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-GGUF:Q4_K_M
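    # Optional: follow the logs to confirm the services came up
    docker compose logs -f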

    Manual installation:

    # On Windows, the provided script can handle setup
    install.bat

    # Or follow the manual steps (Linux/macOS/Windows):
    python -m venv venv
    source venv/bin/activate  # Linux/macOS
    .\venv\Scripts\activate   # Windows
    cd code
    pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 torchvision --index-url https://download.pytorch.org/whl/cu121
    pip install -r requirements.txt
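    # Optional sanity check: PyTorch should report the GPU as available (True)
    python -c "import torch; print(torch.cuda.is_available())"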

4. Run the application:

    • With Docker, it’s already running after docker compose up -d
    • If installed manually:

      # Navigate to the code directory
      cd code
      # Start the server
      python server.py

5. Access the interface:

    • Open your browser to http://localhost:8000
    • Grant microphone permissions when prompted
    • Click “Start” to begin chatting
    • Use “Stop” to end and “Reset” to clear the conversation

6. Customize as needed:

    • Edit configuration in Python files under the code/ directory
    • Change TTS engine in server.py and audio_module.py
    • Switch LLM backend in server.py and llm_module.py
    • Adjust STT settings in transcribe.py
    • Modify turn detection sensitivity in turndetect.py

Pros

  • Genuinely natural flow – The turn detection and interruption handling make conversations feel much more like talking with a person
  • Open source – Complete access to the code for customization and learning
  • Modular design – Easy to swap components like TTS engines or AI backends
  • Web-based interface – No special app needed beyond a modern browser
  • Local processing – Can run entirely on your own hardware
  • Docker support – Simplifies deployment and dependency management

Cons

  • Hardware requirements – Needs a decent GPU for good performance, especially for speech synthesis
  • Technical setup – Not plug-and-play; requires comfort with the command line and possibly Docker
  • Limited mobile support – Primarily designed for desktop browsers
  • Dependency complexity – Multiple ML libraries with their own requirements and potential conflicts
  • Configuration requires code edits – No simple settings UI for changing voices or models


FAQs

Q: Can I run RealtimeVoiceChat without an NVIDIA GPU?
A: Technically, yes – the components can fall back to CPU. However, performance for STT and TTS will be significantly slower, and you likely won’t achieve that “real-time” conversational feel. A powerful CUDA-enabled NVIDIA GPU is highly recommended.

Q: What LLMs are supported out of the box?
A: It’s primarily set up for Ollama, which allows you to run a wide variety of open-source LLMs locally (like Mistral, Llama, etc.). There’s also a connector for OpenAI models if you have an API key and prefer to use their services.
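
If you want to sanity-check that your local Ollama server is reachable before wiring it into the app, you can call Ollama’s HTTP API (default port 11434) directly. A minimal sketch using the requests package – substitute whichever model you actually pulled for "mistral":

    import requests

    # One-off, non-streaming completion from the local Ollama server.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "mistral", "prompt": "Say hello in five words.", "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["response"])  # the generated text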

Q: How do I change the AI’s voice?
A: You’ll need to edit server.py to change the START_ENGINE variable to your desired TTS engine (e.g., "coqui", "kokoro", "orpheus"). Then, you’d go into audio_module.py and adjust the engine-specific settings within the AudioProcessor.__init__ method, like voice model paths or speaker IDs.
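
As a rough illustration of the kind of edit that answer describes – START_ENGINE comes from the project itself, but the commented settings below are hypothetical placeholders, not the literal file contents:

    # server.py – pick the TTS engine by name
    START_ENGINE = "kokoro"   # or "coqui", "orpheus"

    # audio_module.py – engine-specific settings live in AudioProcessor.__init__;
    # the exact fields vary by engine, e.g. (hypothetical):
    # self.speaker_wav = "voices/my_speaker.wav"  # Coqui reference sample
    # self.voice = "af_heart"                     # Kokoro voice ID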

Q: Is it difficult to set up for someone new to Docker?
A: If you’re new to Docker, there might be a slight learning curve, but the docker compose build and docker compose up -d commands do most of the heavy lifting. The main challenge might be ensuring Docker itself is installed correctly and, if you’re using a GPU, that the NVIDIA Container Toolkit is set up so Docker can access the GPU. The provided docker-compose.yml handles the service definitions.

Q: How much RAM do I need?
A: At least 8GB, but 16GB or more is recommended, especially when running the LLM and TTS models together.

Try It Yourself

If you’ve been looking for a more natural way to interact with AI through conversation, RealtimeVoiceChat is worth setting up. The initial configuration takes some work, but the resulting experience feels significantly more fluid than most voice interfaces I’ve used.

The project is actively developed on GitHub, so if you run into issues or have ideas for improvements, the community there can help. I’d love to hear about your experiences if you give it a try!
