For years, intelligent voice agents on enterprise phone systems meant expensive proprietary platforms, vendor lock-in, and audio streaming to distant data centers. AVA — the open-source AI Voice Agent for Asterisk and FreePBX — challenges every one of those assumptions. Released this week as version 6.4.2, AVA integrates directly into the world’s most widely deployed open-source telephony platform and turns any existing PBX into an AI-powered voice agent with no external telephony vendor required.

The project, hosted on GitHub under the MIT license, positions itself as “the most powerful and flexible open-source AI voice agent for Asterisk/FreePBX.” Its key differentiators are a modular pipelined architecture, production-ready deployment baselines, and an uncompromising focus on low-latency, natural conversation — not the clunky touch-tone IVR trees that have frustrated callers for decades.

Target response latency: under 2 seconds
Production baselines: 6
Minimum cost per minute: $0.001
License: MIT

The Foundation: Asterisk & FreePBX

Asterisk is the backbone of enterprise telephony for tens of thousands of organizations worldwide. An open-source software PBX, it handles call routing, voicemail, conferencing, and SIP trunking. FreePBX is its graphical management layer. Together they power everything from small business phone systems to large contact center deployments. AVA plugs directly into this ecosystem using Asterisk’s ARI (Asterisk REST Interface) for call control and AudioSocket or RTP for audio transport, so there is no need to replace existing infrastructure, trunks, or phone numbers.

This integration is the key unlock: organizations that already operate Asterisk can layer AI voice capabilities on top of their existing setup, often in an afternoon, without replacing hardware or signing new carrier contracts.
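To make the AudioSocket transport more concrete, the sketch below splits a TCP byte stream into frames. It assumes the framing commonly documented for Asterisk's AudioSocket (a one-byte kind, a two-byte big-endian payload length, then the payload) and the kind values shown; both should be verified against the Asterisk documentation. The function and constant names are illustrative and are not taken from AVA's code.

```python
import struct
import uuid

# Assumed AudioSocket message kinds; verify against the Asterisk docs.
KIND_HANGUP = 0x00
KIND_UUID = 0x01
KIND_AUDIO = 0x10
KIND_ERROR = 0xFF

HEADER_SIZE = 3  # 1 byte kind + 2 bytes big-endian payload length


def make_frame(kind: int, payload: bytes) -> bytes:
    """Build one frame: header followed by payload."""
    return struct.pack(">BH", kind, len(payload)) + payload


def parse_frames(buf: bytes):
    """Return (frames, leftover), where frames is a list of (kind, payload).

    Bytes belonging to an incomplete trailing frame are returned as leftover
    so the caller can prepend them to the next TCP read.
    """
    frames = []
    pos = 0
    while len(buf) - pos >= HEADER_SIZE:
        kind, length = struct.unpack_from(">BH", buf, pos)
        if len(buf) - pos - HEADER_SIZE < length:
            break  # payload not fully received yet
        start = pos + HEADER_SIZE
        frames.append((kind, buf[start:start + length]))
        pos = start + length
    return frames, buf[pos:]


if __name__ == "__main__":
    call_id = uuid.UUID(int=1)  # hypothetical call identifier
    stream = make_frame(KIND_UUID, call_id.bytes) + make_frame(KIND_AUDIO, b"\x00\x01")
    frames, leftover = parse_frames(stream)
    print(len(frames), len(leftover))
```

Returning the leftover bytes matters because TCP delivers a byte stream, not messages: a single read can end in the middle of a frame.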

How It Works: The AI Pipeline

AVA’s architecture follows a clean, instrumented pipeline. When a caller dials in, Asterisk captures the audio stream and hands it to AVA’s AI engine over AudioSocket (TCP) or ExternalMedia (RTP/UDP). The engine processes the audio through three sequential AI stages (Speech-to-Text, Language Model reasoning, and Text-to-Speech) and returns synthesized audio to Asterisk, which plays it to the caller. The entire round trip is designed to complete in under two seconds.

# AVA Core Audio Pipeline
Caller → Asterisk / FreePBX → AVA AI Engine
STT (Speech→Text) → LLM (Reasoning) → TTS (Text→Audio)
→ Asterisk Playback → Caller
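
The three-stage flow can be sketched as a single function that runs each stage in order and records how long it takes. The stage callables, the `run_turn` name, and the budget constant are illustrative assumptions, not AVA's actual interfaces; real STT, LLM, and TTS stages would typically stream rather than run as blocking calls.

```python
import time
from typing import Callable, Dict, Tuple

LATENCY_BUDGET_S = 2.0  # the article's stated round-trip target


def run_turn(
    audio_in: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run one conversational turn and return (audio_out, per-stage timings)."""
    timings: Dict[str, float] = {}
    start = time.perf_counter()

    t = time.perf_counter()
    transcript = stt(audio_in)
    timings["stt"] = time.perf_counter() - t

    t = time.perf_counter()
    reply = llm(transcript)
    timings["llm"] = time.perf_counter() - t

    t = time.perf_counter()
    audio_out = tts(reply)
    timings["tts"] = time.perf_counter() - t

    timings["total"] = time.perf_counter() - start
    return audio_out, timings


# Stand-in stages so the sketch runs without any provider.
def fake_stt(audio: bytes) -> str:
    return "hello"


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    audio, timings = run_turn(b"\x00\x00", fake_stt, fake_llm, fake_tts)
    print(audio, timings["total"] < LATENCY_BUDGET_S)
```

Recording a timing per stage, rather than only the total, is what lets an operator see which provider dominates a slow turn.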

Voice Activity Detection (VAD) governs when the AI listens versus when it speaks. The engine uses streaming STT, sending audio in chunks while VAD detects speech, so that transcription keeps pace with the conversation. Post-TTS protection prevents echo loops by gating audio capture until playback completes. Latency histograms are instrumented directly in the engine, exposing timing data for every turn so operators can tune and benchmark their specific deployment.
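
A minimal sketch of the gating idea follows. It assumes a simple energy threshold as the speech detector (the VAD AVA actually uses is not described here), 16-bit PCM frames, and a hypothetical guard window after playback ends. The class name, method names, and default values are all illustrative.

```python
import array
import math


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit PCM samples (native byte order)."""
    samples = array.array("h")
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class CaptureGate:
    """Decides whether a captured frame should be forwarded to STT."""

    def __init__(self, threshold: float = 500.0, guard_ms: int = 200) -> None:
        self.threshold = threshold      # assumed energy threshold for "speech"
        self.guard_ms = guard_ms        # assumed extra mute time after playback
        self.blocked_until_ms = 0

    def note_playback(self, end_ms: int) -> None:
        """Record when TTS playback ends; capture stays gated until then plus the guard."""
        self.blocked_until_ms = end_ms + self.guard_ms

    def should_forward(self, frame: bytes, now_ms: int) -> bool:
        if now_ms < self.blocked_until_ms:
            return False                # post-TTS protection: ignore our own audio
        return rms(frame) >= self.threshold


if __name__ == "__main__":
    loud = array.array("h", [1000] * 160).tobytes()
    gate = CaptureGate()
    gate.note_playback(end_ms=1000)
    print(gate.should_forward(loud, now_ms=1100), gate.should_forward(loud, now_ms=1200))
```

A strict gate like this one trades barge-in responsiveness for echo safety; a production engine would need a way to relax it when the caller interrupts.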

Any organization can deploy intelligent, natural voice agents on their existing phone infrastructure — with full control over privacy, cost, and provider choice.

Six Deployment Baselines

AVA ships with six production-validated configuration profiles, covering the spectrum from fully cloud-based to fully air-gapped. Each baseline has been battle-tested and is designed to be operational with a single Docker Compose command.

Mode | Description | Best For | Cost/Min
OpenAI Realtime | Full cloud pipeline; highest voice quality and natural conversation flow | Enterprise, quick setup | Cloud API
Deepgram Voice Agent | Deepgram ecosystem with advanced STT features and Think stage reasoning | Advanced Deepgram users | Cloud API
Gemini Live | Google’s multimodal Gemini Flash model for real-time voice | Google ecosystem | Cloud API
Local Hybrid | Local STT/TTS with cloud LLM (OpenAI). Audio never leaves the premises | Privacy & compliance | ~$0.001–$0.003
Telnyx Hybrid | Local STT/TTS + Telnyx LLM; access to 53+ models at competitive pricing | Cost optimization | Telnyx Pricing
Fully Local | 100% on-premises. No cloud APIs of any kind. Vosk + Phi-3 + Piper | Air-gapped, maximum privacy | $0.00

The Local Hybrid mode is particularly compelling for cost-sensitive deployments: by keeping audio on-premises and routing only text to a cloud LLM, it achieves strong AI capability at roughly $0.001–$0.003 per minute, a fraction of the $0.05–$0.15 per minute or more that fully managed AI voice services can charge.
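
To put those per-minute figures in perspective, the short calculation below compares monthly spend at the quoted rates. The 10,000-minute monthly volume is an assumption chosen purely for illustration; the rates are the ones cited above.

```python
def monthly_cost(minutes: int, rate_per_minute: float) -> float:
    """Monthly spend in dollars, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


MINUTES_PER_MONTH = 10_000  # assumed call volume, for illustration only

local_low = monthly_cost(MINUTES_PER_MONTH, 0.001)
local_high = monthly_cost(MINUTES_PER_MONTH, 0.003)
managed_low = monthly_cost(MINUTES_PER_MONTH, 0.05)
managed_high = monthly_cost(MINUTES_PER_MONTH, 0.15)

# Best and worst case for how many times cheaper Local Hybrid is.
min_ratio = managed_low / local_high
max_ratio = managed_high / local_low

if __name__ == "__main__":
    print(f"Local Hybrid: ${local_low:,.2f} to ${local_high:,.2f}")
    print(f"Managed:      ${managed_low:,.2f} to ${managed_high:,.2f}")
    print(f"Ratio:        {min_ratio:.1f}x to {max_ratio:.1f}x")
```

At that assumed volume, Local Hybrid comes to roughly $10 to $30 per month against $500 to $1,500 for a managed service, before counting the cost of the hardware that runs local STT and TTS.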

What AVA Can Do

  • Live Two-Way Conversation: Natural dialogue with barge-in support, VAD gating, and post-TTS echo protection. Not IVR, but real conversation.
  • Tool Calling & Actions: The AI can execute real telephony actions mid-call: transfer to queues or extensions, send email summaries, drop voicemails, and more.
  • Outbound Dialer: Alpha outbound dialer with scheduling, answering machine detection (AMD), voicemail drop, consent gate, and DNC list support (in progress).
  • Multi-Language Support: Dynamic language detection and provider switching per call, with Russian-language backends shipping in v6.4.0.
  • On-Premises AI: Fully local mode runs Vosk (STT), Phi-3 (LLM), and Piper (TTS) with zero external dependencies, suitable for regulated environments.
  • Admin UI: Web-based dashboard with setup wizard, live log streaming, real-time metrics, YAML editor with validation, and Asterisk configuration audit.

Tool Calling: AI That Acts, Not Just Talks

One of AVA’s most powerful capabilities is its tool calling system, which allows the language model to trigger real actions during a phone call. When a caller says “transfer me to billing,” the LLM generates a structured tool call rather than a text response. AVA’s UnifiedTransferTool intercepts this, executes an ARI redirect, and the call is transferred — all seamlessly within the conversation.

Supported Tool Actions

  • Transfer to SIP/PJSIP extensions — direct endpoint transfer
  • Transfer to ACD queues — with position announcements and hold music
  • Send email summary — post-call transcript delivery
  • Voicemail drop — leave a message on outbound campaigns
  • Call hangup — graceful termination with confirmation
  • Caller recording — consent-managed audio capture (v6.3.2+)

This architecture transforms AVA from a voice interface into a telephony automation platform. The AI doesn’t just answer questions — it completes tasks on behalf of callers and operators alike.
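
The dispatch step can be sketched as a small registry that maps a tool name from the model's structured output to a handler. The JSON shape, the tool names, and the handlers below are hypothetical and do not reflect the interface of AVA's UnifiedTransferTool; the handlers return a description of the action instead of calling ARI so that the example stays self-contained.

```python
import json
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], str]
TOOLS: Dict[str, ToolHandler] = {}


def tool(name: str) -> Callable[[ToolHandler], ToolHandler]:
    """Decorator that registers a handler under a tool name."""
    def register(fn: ToolHandler) -> ToolHandler:
        TOOLS[name] = fn
        return fn
    return register


@tool("transfer")
def transfer(args: Dict[str, Any]) -> str:
    # A real handler would ask Asterisk to redirect the channel here.
    return f"redirect call to {args['target']}"


@tool("hangup")
def hangup(args: Dict[str, Any]) -> str:
    return "hang up call"


def dispatch(raw_tool_call: str) -> str:
    """Parse a structured tool call emitted by the LLM and run its handler."""
    call = json.loads(raw_tool_call)
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return f"unknown tool: {name}"  # never execute something unregistered
    return handler(call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "transfer", "arguments": {"target": "billing"}}'))
```

Keeping an explicit registry means the model can only trigger actions the operator has deliberately exposed, which matters when the actions affect live calls.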

Deployment & Administration

AVA ships as a Docker Compose application. The entire stack (AI engine, local AI server for on-premises models, and Admin UI) can be brought up with a single command. A preflight script checks system readiness, validates media directory permissions, and generates environment configuration. For GPU-accelerated local deployments, the local AI server may take 15–20 minutes on first startup to load LLM and TTS models; after that, the models persist across restarts.

The Admin UI, accessible at http://localhost:3003, provides a setup wizard that replaces the command-line configuration experience. Operators can switch providers, edit system prompts, stream live logs, and inspect Asterisk integration status — all through a browser. The YAML editor includes syntax validation to prevent configuration errors from taking the system down.

What’s Coming: The Roadmap

The AVA roadmap is ambitious. Near-term targets include speculative LLM inference — beginning inference as soon as a partial STT transcript is stable, potentially saving 300–1,500ms of latency per turn. A real-time call dashboard with live visualization of active calls is planned, as is voice biometrics for authentication. The team is also targeting streaming latency below 500ms end-to-end for future releases.
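
Since speculative inference is still a roadmap item, the sketch below only illustrates the general idea and makes no claim about how AVA will implement it. It assumes a partial transcript counts as "stable" once it has been repeated unchanged for a set number of consecutive STT updates, and that a speculative LLM result is kept only if the final transcript matches the partial it was started from. All names are hypothetical.

```python
from typing import List, Optional


def first_stable_index(partials: List[str], repeats: int = 2) -> Optional[int]:
    """Index of the update at which a partial has repeated `repeats` times in a row.

    Returns None if no partial ever becomes stable.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    run = 1
    for i in range(1, len(partials)):
        run = run + 1 if partials[i] == partials[i - 1] else 1
        if run >= repeats:
            return i
    return None


def speculation_hit(partials: List[str], final: str, repeats: int = 2) -> bool:
    """True if inference started on the stable partial would still be valid."""
    i = first_stable_index(partials, repeats)
    return i is not None and partials[i] == final


if __name__ == "__main__":
    updates = [
        "transfer",
        "transfer me",
        "transfer me to billing",
        "transfer me to billing",
    ]
    print(first_stable_index(updates), speculation_hit(updates, "transfer me to billing"))
```

A miss costs only the discarded LLM call, while a hit lets the reply start before the final transcript arrives, which is where the projected latency saving would come from.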

On the outbound side, the dialer is moving from alpha toward production hardening: DNC list enforcement, retry automation with outcome classification, and resilience improvements are all active development areas. The project encourages operator contributions, noting that telephony expertise is rare among developers — and that AVA’s own AI assistant can write the code if you describe what you need.

Why It Matters

The managed AI voice agent market is dominated by a handful of vendors who charge per-minute fees and require audio to flow through their cloud infrastructure. For many organizations — particularly those in healthcare, finance, legal, or government — this model is untenable. Privacy regulations, compliance requirements, and simple cost economics push against cloud-first voice AI.

AVA fills this gap in a way no commercial vendor currently does: a full-stack, production-ready AI voice agent that runs entirely on your own hardware, with no vendor dependency and no per-minute fees beyond the compute you already own. That it integrates natively with Asterisk — rather than requiring a rip-and-replace of existing telephony infrastructure — makes it uniquely accessible to the enormous installed base of Asterisk operators worldwide.

For healthcare, finance, and government operators constrained by data residency requirements, AVA’s fully local mode may be the only compliant path to AI voice capability.

AVA v6.4.2 is available now on GitHub under the MIT license. Documentation, installation guides, and the interactive setup wizard are included in the repository. The project welcomes contributions from telephony operators, with a contributor guide specifically written for those who have never used GitHub before.