
How to add an MCP server to Voice Live

Introduction

Voice Live supports connecting to remote Model Context Protocol (MCP) servers during a voice session. MCP integration enables the model to discover and invoke tools hosted on external services, such as documentation search, wiki lookup, or custom APIs, and incorporate tool results into spoken responses.

MCP server integration differs from function calling in these ways:

| Aspect | Function calling | MCP server |
| --- | --- | --- |
| Tool execution | Client-side | Server-side (managed by Voice Live) |
| Tool discovery | Client defines tools explicitly | Voice Live auto-discovers tools from MCP endpoint |
| Approval model | Not applicable | Configurable: "always" (default), "never", or per-tool dictionary |
| API version required | 2025-10-01 | 2026-04-10 or later |

Key concepts

  • MCPServer definition: Declare one or more MCP endpoints in the session configuration with server_label, server_url, and optional allowed_tools, headers, authorization, and require_approval.
  • Tool discovery: On session start, Voice Live calls each MCP server's tool listing endpoint and emits mcp_list_tools events.
  • Tool invocation: When the model decides to call an MCP tool, the service handles execution and streams response.mcp_call events.
  • Approval flow: When require_approval is set to "always" (the default), the client receives an mcp_approval_request conversation item and must respond with an mcp_approval_response before the call executes. Set require_approval to "never" for automatic execution, or use a per-tool dictionary to mix modes on the same server.

Approval modes

The require_approval property on each MCPServer controls whether tool calls need client-side approval before execution. It accepts a string or a per-tool dictionary.

| Mode | Value | Behavior |
| --- | --- | --- |
| Always (default) | "always" | Every tool call sends an mcp_approval_request to the client. The call doesn't execute until the client responds with mcp_approval_response and approve=true. |
| Never | "never" | Tool calls execute automatically. No approval event is sent. |
| Per-tool | {"always": ["tool_a"], "never": ["tool_b", "tool_c"]} | Each tool is assigned an approval mode individually. Tools not listed in either key default to "always". |

When to use each mode:

  • "always" — Use for tools that perform write operations, access sensitive data, or incur costs. The voice samples auto-approve subsequent calls to the same server within the same turn to reduce repeated prompts.
  • "never" — Use for read-only lookups, search APIs, or trusted internal tools where user confirmation adds latency without security benefit.
  • Per-tool dictionary — Use when a single MCP server exposes a mix of read-only and write tools. For example, a documentation server might allow search_docs without approval but require approval for submit_feedback.
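
The three modes can be read as a single lookup rule. The following is a minimal sketch of that rule in plain Python, not the service's implementation: resolve_approval_mode is a hypothetical helper, and treating a tool listed under both keys as "always" is an assumption chosen to err on the side of asking.

```python
from typing import Mapping, Sequence, Union

# A require_approval value is either a mode string or a per-tool dictionary.
ApprovalSetting = Union[str, Mapping[str, Sequence[str]]]


def resolve_approval_mode(require_approval: ApprovalSetting, tool_name: str) -> str:
    """Return "always" or "never" for one tool (hypothetical helper)."""
    if isinstance(require_approval, str):
        if require_approval not in ("always", "never"):
            raise ValueError(f"Unsupported approval mode: {require_approval!r}")
        return require_approval

    # Per-tool dictionary. Assumption: a tool listed under both keys is
    # treated as "always", the stricter of the two modes.
    if tool_name in require_approval.get("always", ()):
        return "always"
    if tool_name in require_approval.get("never", ()):
        return "never"
    # Tools not listed under either key default to "always".
    return "always"


policy = {"always": ["submit_feedback"], "never": ["search_docs"]}
print(resolve_approval_mode(policy, "search_docs"))
print(resolve_approval_mode(policy, "submit_feedback"))
print(resolve_approval_mode(policy, "some_other_tool"))
```

A helper like this can be useful on the client for logging or for predicting which calls will produce a spoken approval prompt.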

Note

In voice scenarios, each approval triggers a conversational prompt. Configure require_approval carefully to balance security with conversation flow. See Voice-native approval for implementation patterns.

For the full MCP event and type reference, see Voice Live API reference.

Learn how to connect remote MCP servers to a Voice Live session using the VoiceLive SDK for Python. This article builds on Quickstart: Create a Voice Live real-time voice agent, adding MCP server integration.

Reference documentation | Package (PyPI) | Additional samples on GitHub

Follow the how-to below or get the full sample code:

Prerequisites

  • An Azure subscription. Create one for free.
  • Python 3.10 or later. If you don't have a suitable version of Python installed, follow the instructions in the VS Code Python Tutorial for the easiest way to install Python on your operating system.
  • A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
  • azure-ai-voicelive package version 1.2.0 or later (MCP support requires api_version="2026-04-10" or later).
  • Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Tip

To use Voice Live with MCP, you don't need to deploy an audio model with your Foundry resource. Voice Live is fully managed, and the model is automatically deployed for you. For more information about model availability, see the Voice Live overview documentation.

Prepare the environment

Complete the Voice Live quickstart to set up your environment, configure authentication, and test your first Voice Live conversation.

MCP integration concepts

MCP server definition

Use the MCPServer class to declare each remote MCP endpoint. At minimum, provide server_label (a display name) and server_url (the MCP endpoint URL). Optionally restrict available tools with allowed_tools and configure the approval mode.

Approval modes

Control whether MCP tool calls require user approval before execution:

  • require_approval="never": The tool executes automatically when the model invokes it.
  • require_approval="always" (default): The client receives an mcp_approval_request and must respond before the tool runs.
  • Per-tool dictionary: Set require_approval={"never": ["tool_a"], "always": ["tool_b"]} for granular control.

API version requirement

MCP support requires api_version="2026-04-10" or later. Pass this value in the connect() call.
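
Because API versions are date-based strings, a client can check the configured version before it opens the connection. The sketch below is one possible guard, not part of the SDK: supports_mcp and MCP_MIN_API_VERSION are hypothetical names, and the handling of a suffix after the date (such as "-preview") is an assumption.

```python
from datetime import date

# Minimum API version for MCP support, as stated in this article.
MCP_MIN_API_VERSION = "2026-04-10"


def supports_mcp(api_version: str) -> bool:
    """Return True when api_version is dated on or after the MCP minimum."""
    # Assumption: the version starts with YYYY-MM-DD; any suffix is ignored.
    requested = date.fromisoformat(api_version[:10])
    return requested >= date.fromisoformat(MCP_MIN_API_VERSION)


for version in ("2025-10-01", "2026-04-10"):
    print(version, supports_mcp(version))
```

Failing fast on the client gives a clearer error than waiting for the service to reject MCP tool definitions.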

Define MCP servers

Define the MCP servers that Voice Live can use during the session. Each server is an MCPServer instance added to the tools list in the session configuration.

The following code defines two MCP servers: one with automatic tool execution and one that requires user approval before running.

# Define MCP servers that Voice Live can use during the session.
# Each server is an MCPServer instance added to the tools list.
mcp_tools: list[Tool] = [
    MCPServer(
        server_label="deepwiki",
        server_url="https://mcp.deepwiki.com/mcp",
        allowed_tools=["read_wiki_structure", "ask_question"],
        require_approval="never",
    ),
    MCPServer(
        server_label="azure_doc",
        server_url="https://learn.microsoft.com/api/mcp",
        require_approval="always",
    ),
]

In this sample:

  • The deepwiki server allows only read_wiki_structure and ask_question tools, with require_approval="never" for automatic execution.
  • The azure_doc server allows all tools on the endpoint, with require_approval="always" so users can review each call before execution.
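
The effect of allowed_tools can be pictured as a filter over the tools that discovery returns. The following sketch models that effect in plain Python. It doesn't show how Voice Live applies the filter internally, exposed_tools is a hypothetical helper, and the discovered tool names are illustrative.

```python
from typing import Iterable, Optional


def exposed_tools(discovered: Iterable[str],
                  allowed_tools: Optional[Iterable[str]]) -> list[str]:
    """Return the tools the model can call, given an optional allow-list."""
    if allowed_tools is None:
        # No allow-list: every tool on the endpoint is available.
        return list(discovered)
    allowed = set(allowed_tools)
    # Keep discovery order, and drop anything not explicitly allowed.
    return [name for name in discovered if name in allowed]


# Illustrative discovery result; real tool names come from the MCP server.
discovered = ["read_wiki_structure", "read_wiki_contents", "ask_question"]
print(exposed_tools(discovered, ["read_wiki_structure", "ask_question"]))
print(exposed_tools(discovered, None))
```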

Configure the session with MCP tools

Pass the MCP server definitions to the RequestSession tools list alongside your voice, modality, and turn-detection settings.

async def _setup_session(self, mcp_tools: list[Tool]):
    """Configure the VoiceLive session with MCP tools."""
    logger.info("Setting up voice conversation session with MCP tools...")

    # Create voice configuration
    voice_config: Union[AzureStandardVoice, str]
    if "-" in self.voice or ":" in self.voice:
        voice_config = AzureStandardVoice(name=self.voice)
    else:
        voice_config = self.voice

    # Create turn detection configuration
    turn_detection_config = ServerVad(
        threshold=0.5,
        prefix_padding_ms=300,
        silence_duration_ms=500)

    # Create session configuration with MCP tools in the tools list
    session_config = RequestSession(
        modalities=[Modality.TEXT, Modality.AUDIO],
        instructions=self.instructions,
        voice=voice_config,
        input_audio_format=InputAudioFormat.PCM16,
        output_audio_format=OutputAudioFormat.PCM16,
        turn_detection=turn_detection_config,
        input_audio_echo_cancellation=AudioEchoCancellation(),
        input_audio_noise_reduction=AudioNoiseReduction(type="azure_deep_noise_suppression"),
        tools=mcp_tools,
        tool_choice=ToolChoiceLiteral.AUTO,
        input_audio_transcription=AudioInputTranscriptionOptions(
            model="azure-speech" if "realtime" not in self.model.lower() else "whisper-1"
        ),
    )

    # Interim response bridges latency during MCP tool calls, but is only
    # supported on non-realtime model pipelines (e.g. gpt-4o-mini).
    if "realtime" not in self.model.lower():
        session_config.interim_response = LlmInterimResponseConfig(
            triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
            latency_threshold_ms=100,
            instructions="Create friendly interim responses indicating wait time due to "
                         "ongoing processing, if any. Do not include in all responses! "
                         "Do not say you don't have real-time access to information when "
                         "calling tools!",
        )
        logger.info("Interim response enabled for model %s", self.model)
    else:
        logger.info("Interim response skipped — not supported on realtime pipeline (%s)", self.model)

    conn = self.connection
    assert conn is not None
    await conn.session.update(session=session_config)
    logger.info("Session configuration with MCP tools sent")

In this sample:

  • RequestSession bundles MCP tools with audio format, voice, and turn detection settings.
  • connection.session.update(session=session_config) sends the full configuration to Voice Live.
  • Voice Live automatically discovers available tools from each MCP server after the session starts.
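
Two settings in the session setup depend on whether the model name contains "realtime": the input transcription model and whether the interim response is enabled. The sketch below isolates that branching so it can be checked without a connection. pipeline_options is a hypothetical helper, and the model name "gpt-realtime" is used only as an example input.

```python
def pipeline_options(model: str) -> dict:
    """Mirror the model-dependent choices made in _setup_session."""
    is_realtime = "realtime" in model.lower()
    return {
        # Realtime pipelines use whisper-1 here; other pipelines use azure-speech.
        "transcription_model": "whisper-1" if is_realtime else "azure-speech",
        # The sample enables the interim response only on non-realtime pipelines.
        "interim_response": not is_realtime,
    }


print(pipeline_options("gpt-4o-mini"))
print(pipeline_options("gpt-realtime"))
```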

Handle MCP events

Process MCP-specific events in the event loop. The key events are:

  • CONVERSATION_ITEM_CREATED with ItemType.MCP_CALL: An MCP tool call was triggered by the model.
  • RESPONSE_MCP_CALL_COMPLETED: The MCP call completed successfully.
  • RESPONSE_MCP_CALL_FAILED: The MCP call failed.
  • CONVERSATION_ITEM_CREATED with ItemType.MCP_APPROVAL_REQUEST: The server is requesting approval for a tool call.
  • CONVERSATION_ITEM_CREATED with ItemType.MCP_LIST_TOOLS: Tool discovery completed for a server.

async def _handle_event(self, event):
    """Handle different types of events from VoiceLive, including MCP events."""
    ap = self.audio_processor
    conn = self.connection
    assert ap is not None
    assert conn is not None

    if event.type == ServerEventType.SESSION_UPDATED:
        logger.info("Session ready: %s", event.session.id)
        await write_conversation_log(f"SessionID: {event.session.id}")
        await write_conversation_log(f"Model: {event.session.model}")
        await write_conversation_log(f"Voice: {event.session.voice}")
        await write_conversation_log("")
        self.session_ready = True
        ap.start_capture()

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
        logger.info("User started speaking - stopping playback")
        print("🎤 Listening...")
        ap.skip_pending_audio()
        # Approval call counter is NOT reset on speech — it tracks the
        # lifecycle of a task (reset on denial or after results are spoken)
        # But approved-servers-this-turn resets when user starts a new topic
        if self._pending_approval is None and self._mcp_call_in_progress <= 0:
            self._approved_servers_this_turn.clear()

        # Clear deferred response flags if no MCP calls are in progress.
        # Prevents stale _needs_response_create from re-triggering result
        # playback after the user interrupts.
        if self._mcp_call_in_progress <= 0:
            self._needs_response_create = False
            self._mcp_results_pending = False

        if self._active_response and not self._response_api_done:
            try:
                await conn.response.cancel()
            except Exception as e:
                if "no active response" not in str(e).lower():
                    logger.warning("Cancel failed: %s", e)

        # If an MCP call is running, mark current calls as stale (user is moving on)
        # and let the user know it's still in progress
        if self._mcp_call_in_progress > 0 and self._pending_approval is None:
            self._stale_mcp_items.update(self._active_mcp_items)
            logger.info("User spoke during MCP call — marking %d calls as stale", len(self._active_mcp_items))
            try:
                await conn.conversation.item.create(
                    item=MessageItem(
                        role="system",
                        content=[InputTextContentPart(
                            text="A tool call is still running in the background. The user just spoke. "
                                 "Respond to what the user said. If a tool result arrives later, "
                                 "briefly introduce it as a late result from an earlier request."
                        )],
                    )
                )
            except Exception as e:
                logger.warning("Failed to inject MCP status update: %s", e)

    elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED:
        logger.info("User stopped speaking")
        print("🤔 Processing...")

    elif event.type == ServerEventType.RESPONSE_CREATED:
        logger.info("Assistant response created")
        self._active_response = True
        self._response_api_done = False

    elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
        ap.queue_audio(event.delta)

    elif event.type == ServerEventType.RESPONSE_AUDIO_DONE:
        logger.info("Assistant finished speaking")
        print("🎤 Ready for next input...")

    elif event.type == ServerEventType.RESPONSE_TEXT_DONE:
        text = event.text if hasattr(event, 'text') else event.get("text", "")
        print(f"🤖 Assistant text:\t{text}")
        await write_conversation_log(f"Assistant Text Response:\t{text}")

    elif event.type == ServerEventType.RESPONSE_AUDIO_TRANSCRIPT_DONE:
        transcript = event.transcript if hasattr(event, 'transcript') else event.get("transcript", "")
        print(f"🤖 Assistant audio transcript:\t{transcript}")
        await write_conversation_log(f"Assistant Audio Response:\t{transcript}")

    elif event.type == ServerEventType.RESPONSE_DONE:
        logger.info("Response complete")
        await write_conversation_log("--- Response complete ---")
        self._active_response = False
        self._response_api_done = True

        # If an approval prompt needs to be injected, do it now that no response is active
        if self._approval_prompt_needed and self._pending_approval is not None:
            self._approval_prompt_needed = False
            await self._send_approval_voice_prompt(self._pending_approval, conn)
        # If MCP results are pending and all calls are now done, create response
        elif self._mcp_results_pending and self._mcp_call_in_progress <= 0 and self._pending_approval is None:
            self._mcp_results_pending = False
            try:
                await conn.response.create()
            except Exception:
                pass  # Best-effort
        # If a response.create was deferred due to collision, retry now
        elif self._needs_response_create:
            self._needs_response_create = False
            try:
                await conn.response.create()
            except Exception:
                pass  # Best-effort retry

    elif event.type == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED:
        transcript = event.transcript if hasattr(event, 'transcript') else event.get("transcript", "")
        logger.info("User said: %s", transcript)
        print(f"👤 You said:\t{transcript}")
        await write_conversation_log(f"User Input:\t{transcript}")

        # Interpret as an approval answer if we have a pending approval —
        # whether or not the prompt has finished speaking. This allows the
        # user to barge in with "yes" without waiting for the full prompt.
        if self._pending_approval is not None:
            await self._resolve_voice_approval(transcript, conn)

    elif event.type == ServerEventType.ERROR:
        msg = event.error.message
        # Reset response state — errors can terminate a response without RESPONSE_DONE
        self._active_response = False
        self._response_api_done = True
        if "Cancellation failed: no active response" not in msg:
            if "interim response" in msg.lower():
                logger.warning("Interim response not supported with this model pipeline (non-fatal)")
            elif "active response" in msg.lower():
                logger.debug("Response collision (expected during MCP flow): %s", msg)
            else:
                logger.error("VoiceLive error: %s", msg)
                print(f"Error: {msg}")
                await write_conversation_log(f"ERROR: {msg}")

    # MCP-specific events
    elif event.type == ServerEventType.MCP_LIST_TOOLS_IN_PROGRESS:
        logger.info("MCP list tools in progress for %s", event.item_id)

    elif event.type == ServerEventType.MCP_LIST_TOOLS_COMPLETED:
        logger.info("MCP list tools completed for %s", event.item_id)
        print("🔧 MCP tools discovered successfully")
        await write_conversation_log("MCP tools discovered successfully")

    elif event.type == ServerEventType.MCP_LIST_TOOLS_FAILED:
        logger.error("MCP list tools failed for %s", event.item_id)
        print("❌ MCP tool discovery failed")
        await write_conversation_log("ERROR: MCP tool discovery failed")

    elif event.type == ServerEventType.RESPONSE_MCP_CALL_IN_PROGRESS:
        logger.info("MCP call in progress for %s", event.item_id)
        print("⏳ MCP tool call in progress...")
        await write_conversation_log(f"MCP call in progress: {event.item_id}")
        self._mcp_call_in_progress += 1
        self._active_mcp_items.add(event.item_id)
        self._start_mcp_stall_timer(conn)

    elif event.type == ServerEventType.RESPONSE_MCP_CALL_COMPLETED:
        item_id = event.item_id
        self._mcp_call_in_progress = max(0, self._mcp_call_in_progress - 1)
        self._active_mcp_items.discard(item_id)
        self._cancel_mcp_stall_timer()
        if item_id in self._handled_mcp_completions:
            logger.debug("Ignoring duplicate MCP completion for %s", item_id)
        else:
            self._handled_mcp_completions.add(item_id)
            is_stale = item_id in self._stale_mcp_items
            self._stale_mcp_items.discard(item_id)
            logger.info("MCP call completed for %s (stale=%s)", item_id, is_stale)
            await write_conversation_log(f"MCP call completed: {item_id} (stale={is_stale})")
            await self._handle_mcp_call_completed(event, conn, is_stale=is_stale)

    elif event.type == ServerEventType.RESPONSE_MCP_CALL_FAILED:
        item_id = event.item_id
        logger.error("MCP call failed for %s", item_id)
        print("❌ MCP tool call failed")
        await write_conversation_log(f"ERROR: MCP call failed: {item_id}")
        self._mcp_call_in_progress = max(0, self._mcp_call_in_progress - 1)
        self._active_mcp_items.discard(item_id)
        self._stale_mcp_items.discard(item_id)
        self._cancel_mcp_stall_timer()
        # Kick the model to inform the user the tool call failed
        try:
            await conn.response.create()
        except Exception as e:
            if "active response" not in str(e).lower():
                logger.warning("Failed to create response after MCP failure: %s", e)

    elif event.type == ServerEventType.CONVERSATION_ITEM_CREATED:
        logger.info("Conversation item created: id=%s, type=%s", event.item.id, event.item.type)
        if event.item.type == ItemType.MCP_LIST_TOOLS:
            logger.info("MCP list tools item: server_label=%s", event.item.server_label)
        elif event.item.type == ItemType.MCP_CALL:
            await self._handle_mcp_call_arguments(event, conn)
        elif event.item.type == ItemType.MCP_APPROVAL_REQUEST:
            await self._handle_mcp_approval_request(event, conn)
    else:
        logger.debug("Unhandled event type: %s", event.type)

In this sample:

  • _handle_mcp_call_arguments waits for the full arguments to stream in via RESPONSE_MCP_CALL_ARGUMENTS_DONE, then waits for the response to complete.
  • _handle_mcp_call_completed receives the tool output and triggers a new response so the model can incorporate the result into its next spoken reply.
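
The handler tracks three things for each MCP call: whether it's still running, whether the user spoke while it ran (which makes the result stale), and whether its completion was already handled. The following is a simplified, self-contained sketch of that bookkeeping. McpCallTracker is a hypothetical class, it uses a set where the sample keeps a separate counter, and it omits the sample's pending-approval check.

```python
from typing import Optional


class McpCallTracker:
    """Track in-flight MCP calls, stale results, and duplicate completions."""

    def __init__(self) -> None:
        self.active: set[str] = set()   # calls currently running
        self.stale: set[str] = set()    # calls the user has talked past
        self.handled: set[str] = set()  # completions already processed

    def on_in_progress(self, item_id: str) -> None:
        self.active.add(item_id)

    def on_user_speech(self) -> None:
        # The user moved on: any result that arrives later is a late result.
        self.stale.update(self.active)

    def on_completed(self, item_id: str) -> Optional[bool]:
        """Return the stale flag, or None for a duplicate completion."""
        self.active.discard(item_id)
        if item_id in self.handled:
            return None
        self.handled.add(item_id)
        is_stale = item_id in self.stale
        self.stale.discard(item_id)
        return is_stale


tracker = McpCallTracker()
tracker.on_in_progress("call_1")
tracker.on_user_speech()
print(tracker.on_completed("call_1"))
print(tracker.on_completed("call_1"))
```

Keeping this state in one object, rather than spread across several attributes, makes the stale and duplicate rules easier to test in isolation.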

Handle approval requests

When a server is configured with require_approval="always", client code must handle the approval flow. Instead of blocking on console input, inject a system message so the model asks the user verbally and parse the spoken response.

async def _handle_mcp_approval_request(self, conversation_created_event, connection):
    """Handle MCP approval request by asking the user via voice."""
    if not isinstance(conversation_created_event, ServerEventConversationItemCreated):
        logger.error("Expected ServerEventConversationItemCreated")
        return
    if not isinstance(conversation_created_event.item, ResponseMCPApprovalRequestItem):
        logger.error("Expected ResponseMCPApprovalRequestItem")
        return

    mcp_approval_item = conversation_created_event.item
    approval_id = mcp_approval_item.id
    server_label = mcp_approval_item.server_label
    function_name = mcp_approval_item.name

    if not approval_id:
        logger.error("MCP approval item missing ID")
        return

    # Auto-deny after too many calls to the same server in one task.
    # This prevents infinite tool-call loops in voice UX.
    MAX_APPROVAL_CALLS_PER_TASK = 3
    current_count = self._approval_call_count.get(server_label, 0)
    if current_count >= MAX_APPROVAL_CALLS_PER_TASK:
        logger.info("Auto-denying %s — reached %d calls this task", function_name, current_count)
        print(f"   Auto-denied: {server_label}/{function_name} (max {MAX_APPROVAL_CALLS_PER_TASK} calls reached)")
        try:
            await connection.conversation.item.create(
                item=MCPApprovalResponseRequestItem(approval_request_id=approval_id, approve=False)
            )
        except Exception as e:
            logger.warning("Failed to send auto-deny: %s", e)
        return

    # Auto-approve if user already approved this server earlier in the same turn.
    # This avoids repeated approval prompts for consecutive calls to the same service.
    if server_label in self._approved_servers_this_turn:
        logger.info("Auto-approving %s — server already approved this turn", function_name)
        print(f"   Auto-approved: {server_label}/{function_name} (already approved this turn)")
        try:
            await connection.conversation.item.create(
                item=MCPApprovalResponseRequestItem(approval_request_id=approval_id, approve=True)
            )
        except Exception as e:
            logger.warning("Failed to send auto-approve: %s", e)
        return

    # If another approval is already pending, queue this one
    if self._pending_approval is not None:
        logger.info("Queuing approval for %s — another is already pending", function_name)
        self._approval_queue.append({
            "approval_id": approval_id,
            "server_label": server_label,
            "function_name": function_name,
        })
        return

    logger.info("MCP approval request: server=%s tool=%s", server_label, function_name)
    print(f"\n🔐 MCP Approval Request (voice-based):")
    print(f"   Server: {server_label}  Tool: {function_name}")

    # Store the pending approval. If no response is currently active,
    # send the voice prompt immediately. Otherwise, defer it to
    # RESPONSE_DONE to avoid colliding with an active response.
    self._pending_approval = {
        "approval_id": approval_id,
        "server_label": server_label,
        "function_name": function_name,
    }

    if not self._active_response:
        await self._send_approval_voice_prompt(self._pending_approval, connection)
    else:
        self._approval_prompt_needed = True

async def _send_approval_voice_prompt(self, pending: dict, connection):
    """Inject a system message asking the model to verbally request permission."""
    server = pending["server_label"]
    call_count = self._approval_call_count.get(server, 0)
    self._approval_call_count[server] = call_count + 1

    if call_count == 0:
        prompt = (
            "You MUST ask the user for explicit permission before proceeding. "
            f'Say exactly: "I\'d like to search the {server} service for information. '
            f'Do you approve? Please say yes or no."'
        )
    else:
        prompt = (
            "You MUST ask the user for permission again. "
            'Say exactly: "I need to do one more search to get complete information. '
            'Should I continue? Please say yes or no."'
        )

    try:
        await connection.conversation.item.create(
            item=MessageItem(
                role="system",
                content=[InputTextContentPart(text=prompt)],
            )
        )
        await connection.response.create()
    except Exception as e:
        logger.warning("Failed to send approval voice prompt: %s", e)

async def _resolve_voice_approval(self, transcript: str, connection):
    """Interpret the user's spoken response as approval or denial."""
    pending = self._pending_approval
    if pending is None:
        return

    text = transcript.strip().lower()

    # Match "yes" or "no" as whole words (word boundaries prevent false
    # positives from words like "yesterday" or "nobody").
    # Also accept "stop" and "cancel" as denial.
    approved = bool(re.search(r'\byes\b', text))
    denied = bool(re.search(r'\b(no|stop|cancel)\b', text))

    if not approved and not denied:
        # Ambiguous — ask again via the deferred prompt mechanism
        logger.info("Ambiguous approval response: %s", transcript)
        self._approval_prompt_needed = True
        return

    if approved and denied:
        # Conflicting signals — treat as denial for safety
        approved = False

    # Clear the pending state before sending the response
    self._pending_approval = None
    if approved:
        self._approved_servers_this_turn.add(pending["server_label"])
    else:
        self._approval_call_count.clear()  # Topic is over
        self._approved_servers_this_turn.discard(pending["server_label"])

    approval_response_item = MCPApprovalResponseRequestItem(
        approval_request_id=pending["approval_id"], approve=approved
    )
    try:
        await connection.conversation.item.create(item=approval_response_item)
    except Exception as e:
        logger.error("Failed to send approval response: %s", e)
        return
    logger.info("Voice approval resolved: %s for %s", approved, pending["function_name"])
    print(f"   Voice approval: {'Approved ✅' if approved else 'Denied ❌'}")
    await write_conversation_log(f"Voice approval: {'Approved' if approved else 'Denied'} for {pending['server_label']}")

    # Process next queued approval, if any
    await self._process_next_approval(connection)

async def _process_next_approval(self, connection):
    """Pop the next queued approval and ask via voice."""
    if not self._approval_queue:
        return
    next_approval = self._approval_queue.pop(0)
    self._pending_approval = next_approval

    # Send immediately if no response is active, otherwise defer
    if not self._active_response:
        await self._send_approval_voice_prompt(next_approval, connection)
    else:
        self._approval_prompt_needed = True

In this sample:

  • The mcp_approval_request conversation item contains server_label, name (the tool name), and arguments.
  • A system message instructs the model to verbally ask for permission.
  • MCPApprovalResponseRequestItem sends the decision back to Voice Live with approve=True or approve=False.
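
_handle_mcp_approval_request applies four checks in a fixed order before it prompts the user. The sketch below restates that order as a pure function so the precedence is easy to see and test. route_approval_request and its return labels are hypothetical names, not SDK types.

```python
from typing import Mapping, Optional, Set

MAX_APPROVAL_CALLS_PER_TASK = 3


def route_approval_request(server_label: str,
                           prompts_sent: Mapping[str, int],
                           approved_this_turn: Set[str],
                           pending_server: Optional[str]) -> str:
    """Decide what to do with a new approval request (hypothetical helper)."""
    # 1. Cap reached for this server in the current task: deny without asking.
    if prompts_sent.get(server_label, 0) >= MAX_APPROVAL_CALLS_PER_TASK:
        return "auto_deny"
    # 2. The user already said yes to this server during this turn.
    if server_label in approved_this_turn:
        return "auto_approve"
    # 3. Another approval is waiting for an answer: queue this request.
    if pending_server is not None:
        return "queue"
    # 4. Otherwise ask the user by voice.
    return "prompt"


print(route_approval_request("azure_doc", {}, set(), None))
print(route_approval_request("azure_doc", {"azure_doc": 3}, {"azure_doc"}, None))
```

The second call shows that the cap takes precedence over an earlier approval in the same turn, which is what stops a tool-call loop.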

Resolve voice-based approval

Parse the user's spoken transcript to determine approval. Use word-boundary regex to avoid false positives from words like "yesterday" or "nobody".

elif event.type == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED:
    transcript = event.transcript if hasattr(event, 'transcript') else event.get("transcript", "")
    logger.info("User said: %s", transcript)
    print(f"👤 You said:\t{transcript}")
    await write_conversation_log(f"User Input:\t{transcript}")

    # Interpret as an approval answer if we have a pending approval —
    # whether or not the prompt has finished speaking. This allows the
    # user to barge in with "yes" without waiting for the full prompt.
    if self._pending_approval is not None:
        await self._resolve_voice_approval(transcript, conn)

In this sample:

  • The transcript from CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED is matched against \byes\b and \b(no|stop|cancel)\b patterns.
  • Subsequent calls to the same server within the same turn are auto-approved to avoid repeated prompts.
  • After a configurable maximum (for example, 3 approvals), further calls are auto-denied and the model responds with what it has.
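
The matching rules can be tried on their own, without a live session. The following sketch extracts the regular expressions from _resolve_voice_approval into a standalone function. classify_approval is a hypothetical name, and the three-way return value (True, False, or None for an ambiguous answer) is a convenience of this sketch.

```python
import re
from typing import Optional


def classify_approval(transcript: str) -> Optional[bool]:
    """Return True (approved), False (denied), or None (ambiguous)."""
    text = transcript.strip().lower()
    # \b keeps "yes" from matching "yesterday" and "no" from matching "nobody".
    approved = re.search(r"\byes\b", text) is not None
    denied = re.search(r"\b(no|stop|cancel)\b", text) is not None
    if not approved and not denied:
        return None
    # Conflicting signals are treated as a denial for safety.
    return approved and not denied


for phrase in ("Yes, go ahead.", "Nobody called yesterday.", "Yes... actually no."):
    print(repr(phrase), classify_approval(phrase))
```

Simple keyword matching won't catch every phrasing, such as "sure" or "go ahead", so the sample asks again whenever the answer is ambiguous.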

Detect stalls during MCP tool calls

MCP tool calls can take several seconds. Use a repeating timer to proactively inform the user that the assistant is still waiting for results.

MCP_STALL_MAX_NOTIFICATIONS = 3

def _start_mcp_stall_timer(self, connection):
    """Start a repeating timer that verbally updates the user if an MCP call takes too long."""
    self._cancel_mcp_stall_timer()

    async def _stall_loop():
        stall_count = 0
        while self._mcp_call_in_progress > 0 and stall_count < self.MCP_STALL_MAX_NOTIFICATIONS:
            await asyncio.sleep(10)
            if self._mcp_call_in_progress <= 0:
                break
            stall_count += 1
            # Note: MCP calls cannot be cancelled via the API — only honest
            # status updates are possible until the server responds or times out.
            msg = ("The tool call is still running. "
                   "Briefly reassure the user that you're still waiting for results. "
                   "One short sentence only.")
            logger.info("MCP stall notification #%d", stall_count)
            try:
                await connection.conversation.item.create(
                    item=MessageItem(
                        role="system",
                        content=[InputTextContentPart(text=msg)],
                    )
                )
                await connection.response.create()
            except Exception as e:
                if "active response" in str(e).lower():
                    self._needs_response_create = True
                else:
                    logger.debug("Stall notification failed: %s", e)

    self._mcp_stall_task = asyncio.create_task(_stall_loop())

def _cancel_mcp_stall_timer(self):
    """Cancel the MCP stall timer if running."""
    if self._mcp_stall_task and not self._mcp_stall_task.done():
        self._mcp_stall_task.cancel()
    self._mcp_stall_task = None

In this sample:

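  • A timer wakes every 10 seconds and injects a system message that asks the model to briefly reassure the user that it's still waiting for results, up to 3 times.
  • In the event handler shown earlier, the timer is cancelled when the MCP call completes or fails. The loop also exits on its own once no MCP calls remain in progress.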
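
The loop's timing behavior can be checked without a Voice Live connection. The sketch below is a standalone variant of the stall loop, not the sample's code: notify_while_stalled, is_running, and notify are hypothetical names that stand in for the in-progress counter and the system-message injection, and the demo shortens the interval so it finishes quickly.

```python
import asyncio
from typing import Awaitable, Callable


async def notify_while_stalled(is_running: Callable[[], bool],
                               notify: Callable[[int], Awaitable[None]],
                               interval_s: float = 10.0,
                               max_notifications: int = 3) -> int:
    """Call notify after each interval while work is running, up to a cap."""
    sent = 0
    while is_running() and sent < max_notifications:
        await asyncio.sleep(interval_s)
        if not is_running():
            break  # the call finished while we were sleeping
        sent += 1
        await notify(sent)
    return sent


async def demo() -> list[int]:
    seen: list[int] = []

    async def record(n: int) -> None:
        seen.append(n)

    # Simulate a call that never finishes, polled every millisecond.
    await notify_while_stalled(lambda: True, record, interval_s=0.001)
    return seen


print(asyncio.run(demo()))
```

Capping the number of notifications keeps the assistant from repeating itself indefinitely when a server never responds.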

Run the sample

  1. Create the mcp-quickstart.py file with the following code:

    # -------------------------------------------------------------------------
    # Copyright (c) Microsoft Corporation. All rights reserved.
    # Licensed under the MIT License.
    # -------------------------------------------------------------------------
    
    """
    FILE: mcp-quickstart.py
    
    DESCRIPTION:
        This sample demonstrates how to use the Azure AI Voice Live SDK with MCP
        (Model Context Protocol) server integration. It shows how to define MCP
        servers, handle MCP tool call events, and implement an approval flow for
        tool calls that require user consent.
    
    USAGE:
        python mcp-quickstart.py --use-token-credential
    
        Set the environment variables with your own values before running the sample:
        1) AZURE_VOICELIVE_ENDPOINT - The Azure VoiceLive endpoint
        2) AZURE_VOICELIVE_API_KEY  - The Azure VoiceLive API key (if not using token credential)
    
    REQUIREMENTS:
        - azure-ai-voicelive
        - python-dotenv
        - pyaudio (for audio capture and playback)
        - azure-identity (for token credential authentication)
    """
    
    from __future__ import annotations
    import os
    import sys
    import argparse
    import asyncio
    import base64
    from datetime import datetime
    import logging
    import queue
    import re
    import signal
    from typing import Union, Optional, TYPE_CHECKING, cast
    
    from azure.core.credentials import AzureKeyCredential
    from azure.core.credentials_async import AsyncTokenCredential
    from azure.identity.aio import AzureCliCredential
    
    from azure.ai.voicelive.aio import connect
    from azure.ai.voicelive.models import (
        AudioEchoCancellation,
        AudioInputTranscriptionOptions,
        AudioNoiseReduction,
        AzureStandardVoice,
        InputAudioFormat,
        InputTextContentPart,
        InterimResponseTrigger,
        ItemType,
        LlmInterimResponseConfig,
        MCPApprovalResponseRequestItem,
        MCPServer,
        MessageItem,
        Modality,
        OutputAudioFormat,
        RequestSession,
        ResponseMCPApprovalRequestItem,
        ResponseMCPCallItem,
        ServerEventConversationItemCreated,
        ServerEventResponseMcpCallCompleted,
        ServerEventType,
        ServerVad,
        Tool,
        ToolChoiceLiteral,
    )
    from dotenv import load_dotenv
    import pyaudio
    
    if TYPE_CHECKING:
        from azure.ai.voicelive.aio import VoiceLiveConnection
    
    # Change to the directory where this script is located
    os.chdir(os.path.dirname(os.path.abspath(__file__)))
    
    # Environment variable loading
    load_dotenv('../.env', override=True)
    
    # Set up logging
    os.makedirs('logs', exist_ok=True)
    
    timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    
    # Conversation log filename (separate from debug log)
    _script_dir = os.path.dirname(os.path.abspath(__file__))
    conversation_logfilename = f"conversation_{timestamp}.log"
    
    logging.basicConfig(
        filename=f'logs/{timestamp}_voicelive.log',
        filemode="w",
        format='%(asctime)s:%(name)s:%(levelname)s:%(message)s',
        level=logging.INFO
    )
    logger = logging.getLogger(__name__)
    
    
    class AudioProcessor:
        """
        Handles real-time audio capture and playback for the voice assistant.
    
        Threading Architecture:
        - Main thread: Event loop and UI
        - Capture thread: PyAudio input stream reading
        - Send thread: Async audio data transmission to VoiceLive
        - Playback thread: PyAudio output stream writing
        """
    
        loop: asyncio.AbstractEventLoop
    
        class AudioPlaybackPacket:
            """Represents a packet that can be sent to the audio playback queue."""
            def __init__(self, seq_num: int, data: Optional[bytes]):
                self.seq_num = seq_num
                self.data = data
    
        def __init__(self, connection):
            self.connection = connection
            self.audio = pyaudio.PyAudio()
    
            # Audio configuration - PCM16, 24kHz, mono
            self.format = pyaudio.paInt16
            self.channels = 1
            self.rate = 24000
            self.chunk_size = 1200  # 50ms
    
            # Capture and playback state
            self.input_stream = None
    
            self.playback_queue: queue.Queue[AudioProcessor.AudioPlaybackPacket] = queue.Queue()
            self.playback_base = 0
            self.next_seq_num = 0
            self.output_stream: Optional[pyaudio.Stream] = None
    
            logger.info("AudioProcessor initialized with 24kHz PCM16 mono audio")
    
        def start_capture(self):
            """Start capturing audio from microphone."""
            def _capture_callback(in_data, _frame_count, _time_info, _status_flags):
                audio_base64 = base64.b64encode(in_data).decode("utf-8")
                asyncio.run_coroutine_threadsafe(
                    self.connection.input_audio_buffer.append(audio=audio_base64), self.loop
                )
                return (None, pyaudio.paContinue)
    
            if self.input_stream:
                return
    
            self.loop = asyncio.get_running_loop()
    
            try:
                self.input_stream = self.audio.open(
                    format=self.format,
                    channels=self.channels,
                    rate=self.rate,
                    input=True,
                    frames_per_buffer=self.chunk_size,
                    stream_callback=_capture_callback,
                )
                logger.info("Started audio capture")
            except Exception:
                logger.exception("Failed to start audio capture")
                raise
    
        def start_playback(self):
            """Initialize audio playback system."""
            if self.output_stream:
                return
    
            remaining = bytes()
    
            def _playback_callback(_in_data, frame_count, _time_info, _status_flags):
                nonlocal remaining
                frame_count *= pyaudio.get_sample_size(pyaudio.paInt16)
    
                out = remaining[:frame_count]
                remaining_local = remaining[frame_count:]
    
                while len(out) < frame_count:
                    try:
                        packet = self.playback_queue.get_nowait()
                    except queue.Empty:
                        out = out + bytes(frame_count - len(out))
                        continue
    
                    if not packet or not packet.data:
                        break
    
                    if packet.seq_num < self.playback_base:
                        continue
    
                    num_to_take = frame_count - len(out)
                    out = out + packet.data[:num_to_take]
                    remaining_local = packet.data[num_to_take:]
    
                remaining = remaining_local
    
                if len(out) >= frame_count:
                    return (out, pyaudio.paContinue)
                else:
                    return (out, pyaudio.paComplete)
    
            try:
                self.output_stream = self.audio.open(
                    format=self.format,
                    channels=self.channels,
                    rate=self.rate,
                    output=True,
                    frames_per_buffer=self.chunk_size,
                    stream_callback=_playback_callback
                )
                logger.info("Audio playback system ready")
            except Exception:
                logger.exception("Failed to initialize audio playback")
                raise
    
        def _get_and_increase_seq_num(self):
            seq = self.next_seq_num
            self.next_seq_num += 1
            return seq
    
        def queue_audio(self, audio_data: Optional[bytes]) -> None:
            """Queue audio data for playback."""
            self.playback_queue.put(
                AudioProcessor.AudioPlaybackPacket(
                    seq_num=self._get_and_increase_seq_num(),
                    data=audio_data))
    
        def skip_pending_audio(self):
            """Skip current audio in playback queue."""
            self.playback_base = self._get_and_increase_seq_num()
    
        def shutdown(self):
            """Clean up audio resources."""
            if self.input_stream:
                self.input_stream.stop_stream()
                self.input_stream.close()
                self.input_stream = None
            logger.info("Stopped audio capture")
    
            if self.output_stream:
                self.skip_pending_audio()
                self.queue_audio(None)
                self.output_stream.stop_stream()
                self.output_stream.close()
                self.output_stream = None
            logger.info("Stopped audio playback")
    
            if self.audio:
                self.audio.terminate()
            logger.info("Audio processor cleaned up")
    
    
    class MCPVoiceAssistant:
        """Voice assistant with MCP server integration."""
    
        def __init__(
            self,
            endpoint: str,
            credential: Union[AzureKeyCredential, AsyncTokenCredential],
            model: str,
            voice: str,
            instructions: str,
        ):
            self.endpoint = endpoint
            self.credential = credential
            self.model = model
            self.voice = voice
            self.instructions = instructions
            self.connection: Optional["VoiceLiveConnection"] = None
            self.audio_processor: Optional[AudioProcessor] = None
            self.session_ready = False
            self._active_response = False
            self._response_api_done = False
            self._pending_approval: Optional[dict] = None  # Currently active approval request
            self._approval_queue: list[dict] = []  # Queued approvals waiting to be asked
            self._approval_prompt_needed = False  # True when we need to inject the prompt at next RESPONSE_DONE
            self._mcp_call_in_progress = 0  # Count of active MCP tool calls
            self._handled_mcp_completions: set = set()  # Deduplicate MCP completion events
            self._needs_response_create = False  # Retry response.create at next RESPONSE_DONE
            self._approval_call_count: dict[str, int] = {}  # Per-server call count this turn
            self._mcp_item_to_server: dict = {}  # Map MCP item IDs to server_label/function_name
            self._approval_servers: set = set()  # Server labels that require approval
            self._mcp_stall_task: Optional[asyncio.Task] = None  # Timer for MCP stall detection
            self._active_mcp_items: set = set()  # Item IDs of currently in-progress MCP calls
            self._stale_mcp_items: set = set()  # MCP calls the user has moved on from
            self._approved_servers_this_turn: set = set()  # Servers user already approved this turn
            self._mcp_results_pending = False  # True when MCP calls completed but response.create deferred
    
        async def start(self):
            """Start the voice assistant session with MCP support."""
            try:
                logger.info("Connecting to VoiceLive API with model %s", self.model)
    
                # <define_mcp_servers>
                # Define MCP servers that Voice Live can use during the session.
                # Each server is an MCPServer instance added to the tools list.
                mcp_tools: list[Tool] = [
                    MCPServer(
                        server_label="deepwiki",
                        server_url="https://mcp.deepwiki.com/mcp",
                        allowed_tools=["read_wiki_structure", "ask_question"],
                        require_approval="never",
                    ),
                    MCPServer(
                        server_label="azure_doc",
                        server_url="https://learn.microsoft.com/api/mcp",
                        require_approval="always",
                    ),
                ]
                # </define_mcp_servers>
    
                # Track which servers require approval for per-turn loop prevention.
                # Servers with require_approval="always" are guarded to avoid
                # repeated approval prompts in voice UX — a design decision to keep
                # the voice conversation flow smooth. Servers with "never" are allowed
                # to make multiple calls (e.g. DeepWiki's read_wiki_structure →
                # ask_question pattern) since they don't interrupt the user.
                self._approval_servers = {
                    s.server_label for s in mcp_tools
                    if isinstance(s, MCPServer) and s.require_approval == "always"
                }
    
                # Connect with api_version="2026-01-01-preview" for MCP support
                async with connect(
                    endpoint=self.endpoint,
                    credential=self.credential,
                    model=self.model,
                    api_version="2026-01-01-preview",
                ) as connection:
                    self.connection = connection
    
                    # Initialize audio processor
                    ap = AudioProcessor(connection)
                    self.audio_processor = ap
    
                    # Configure session with MCP tools
                    await self._setup_session(mcp_tools)
    
                    # Start audio systems
                    ap.start_playback()
    
                    logger.info("Voice assistant with MCP ready! Start speaking...")
                    print("\n" + "=" * 70)
                    print("🎤 VOICE ASSISTANT WITH MCP READY")
                    print("Try saying:")
                    print("  • 'What is the GitHub repo fastapi about?'")
                    print("  • 'Search the Azure documentation for Voice Live API.'")
                    print("You may need to approve some MCP tool calls in the console.")
                    print("Press Ctrl+C to exit")
                    print("=" * 70 + "\n")
    
                    # Process events
                    await self._process_events()
            finally:
                if self.audio_processor:
                    self.audio_processor.shutdown()
    
        # <configure_session>
        async def _setup_session(self, mcp_tools: list[Tool]):
            """Configure the VoiceLive session with MCP tools."""
            logger.info("Setting up voice conversation session with MCP tools...")
    
            # Create voice configuration
            voice_config: Union[AzureStandardVoice, str]
            if "-" in self.voice or ":" in self.voice:
                voice_config = AzureStandardVoice(name=self.voice)
            else:
                voice_config = self.voice
    
            # Create turn detection configuration
            turn_detection_config = ServerVad(
                threshold=0.5,
                prefix_padding_ms=300,
                silence_duration_ms=500)
    
            # Create session configuration with MCP tools in the tools list
            session_config = RequestSession(
                modalities=[Modality.TEXT, Modality.AUDIO],
                instructions=self.instructions,
                voice=voice_config,
                input_audio_format=InputAudioFormat.PCM16,
                output_audio_format=OutputAudioFormat.PCM16,
                turn_detection=turn_detection_config,
                input_audio_echo_cancellation=AudioEchoCancellation(),
                input_audio_noise_reduction=AudioNoiseReduction(type="azure_deep_noise_suppression"),
                tools=mcp_tools,
                tool_choice=ToolChoiceLiteral.AUTO,
                input_audio_transcription=AudioInputTranscriptionOptions(
                    model="azure-speech" if "realtime" not in self.model.lower() else "whisper-1"
                ),
            )
    
            # Interim response bridges latency during MCP tool calls, but is only
            # supported on non-realtime model pipelines (e.g. gpt-4o-mini).
            if "realtime" not in self.model.lower():
                session_config.interim_response = LlmInterimResponseConfig(
                    triggers=[InterimResponseTrigger.TOOL, InterimResponseTrigger.LATENCY],
                    latency_threshold_ms=100,
                    instructions="Create friendly interim responses indicating wait time due to "
                                 "ongoing processing, if any. Do not include in all responses! "
                                 "Do not say you don't have real-time access to information when "
                                 "calling tools!",
                )
                logger.info("Interim response enabled for model %s", self.model)
            else:
                logger.info("Interim response skipped — not supported on realtime pipeline (%s)", self.model)
    
            conn = self.connection
            assert conn is not None
            await conn.session.update(session=session_config)
            logger.info("Session configuration with MCP tools sent")
        # </configure_session>
    
        async def _process_events(self):
            """Process events from the VoiceLive connection."""
            conn = self.connection
            assert conn is not None
            async for event in conn:
                try:
                    await self._handle_event(event)
                except Exception:
                    logger.exception("Error handling event %s (non-fatal)", getattr(event, 'type', '?'))
    
        # <handle_mcp_events>
        async def _handle_event(self, event):
            """Handle different types of events from VoiceLive, including MCP events."""
            ap = self.audio_processor
            conn = self.connection
            assert ap is not None
            assert conn is not None
    
            if event.type == ServerEventType.SESSION_UPDATED:
                logger.info("Session ready: %s", event.session.id)
                await write_conversation_log(f"SessionID: {event.session.id}")
                await write_conversation_log(f"Model: {event.session.model}")
                await write_conversation_log(f"Voice: {event.session.voice}")
                await write_conversation_log("")
                self.session_ready = True
                ap.start_capture()
    
            elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED:
                logger.info("User started speaking - stopping playback")
                print("🎤 Listening...")
                ap.skip_pending_audio()
                # The approval call counter is NOT reset on speech: it tracks the
                # lifecycle of a task (reset on denial or after results are spoken).
                # The approved-servers-this-turn set does reset when the user starts
                # a new topic.
                if self._pending_approval is None and self._mcp_call_in_progress <= 0:
                    self._approved_servers_this_turn.clear()
    
                # Clear deferred response flags if no MCP calls are in progress.
                # Prevents stale _needs_response_create from re-triggering result
                # playback after the user interrupts.
                if self._mcp_call_in_progress <= 0:
                    self._needs_response_create = False
                    self._mcp_results_pending = False
    
                if self._active_response and not self._response_api_done:
                    try:
                        await conn.response.cancel()
                    except Exception as e:
                        if "no active response" not in str(e).lower():
                            logger.warning("Cancel failed: %s", e)
    
                # If an MCP call is running, mark current calls as stale (user is moving on)
                # and let the user know it's still in progress
                if self._mcp_call_in_progress > 0 and self._pending_approval is None:
                    self._stale_mcp_items.update(self._active_mcp_items)
                    logger.info("User spoke during MCP call — marking %d calls as stale", len(self._active_mcp_items))
                    try:
                        await conn.conversation.item.create(
                            item=MessageItem(
                                role="system",
                                content=[InputTextContentPart(
                                    text="A tool call is still running in the background. The user just spoke. "
                                         "Respond to what the user said. If a tool result arrives later, "
                                         "briefly introduce it as a late result from an earlier request."
                                )],
                            )
                        )
                    except Exception as e:
                        logger.warning("Failed to inject MCP status update: %s", e)
    
            elif event.type == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED:
                logger.info("User stopped speaking")
                print("🤔 Processing...")
    
            elif event.type == ServerEventType.RESPONSE_CREATED:
                logger.info("Assistant response created")
                self._active_response = True
                self._response_api_done = False
    
            elif event.type == ServerEventType.RESPONSE_AUDIO_DELTA:
                ap.queue_audio(event.delta)
    
            elif event.type == ServerEventType.RESPONSE_AUDIO_DONE:
                logger.info("Assistant finished speaking")
                print("🎤 Ready for next input...")
    
            elif event.type == ServerEventType.RESPONSE_TEXT_DONE:
                text = event.text if hasattr(event, 'text') else event.get("text", "")
                print(f"🤖 Assistant text:\t{text}")
                await write_conversation_log(f"Assistant Text Response:\t{text}")
    
            elif event.type == ServerEventType.RESPONSE_AUDIO_TRANSCRIPT_DONE:
                transcript = event.transcript if hasattr(event, 'transcript') else event.get("transcript", "")
                print(f"🤖 Assistant audio transcript:\t{transcript}")
                await write_conversation_log(f"Assistant Audio Response:\t{transcript}")
    
            elif event.type == ServerEventType.RESPONSE_DONE:
                logger.info("Response complete")
                await write_conversation_log("--- Response complete ---")
                self._active_response = False
                self._response_api_done = True
    
                # If an approval prompt needs to be injected, do it now that no response is active
                if self._approval_prompt_needed and self._pending_approval is not None:
                    self._approval_prompt_needed = False
                    await self._send_approval_voice_prompt(self._pending_approval, conn)
                # If MCP results are pending and all calls are now done, create response
                elif self._mcp_results_pending and self._mcp_call_in_progress <= 0 and self._pending_approval is None:
                    self._mcp_results_pending = False
                    try:
                        await conn.response.create()
                    except Exception:
                        pass
                # If a response.create was deferred due to collision, retry now
                elif self._needs_response_create:
                    self._needs_response_create = False
                    try:
                        await conn.response.create()
                    except Exception:
                        pass  # Best-effort retry
    
            # <voice_approval_transcription>
            elif event.type == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED:
                transcript = event.transcript if hasattr(event, 'transcript') else event.get("transcript", "")
                logger.info("User said: %s", transcript)
                print(f"👤 You said:\t{transcript}")
                await write_conversation_log(f"User Input:\t{transcript}")
    
                # Interpret as an approval answer if we have a pending approval —
                # whether or not the prompt has finished speaking. This allows the
                # user to barge in with "yes" without waiting for the full prompt.
                if self._pending_approval is not None:
                    await self._resolve_voice_approval(transcript, conn)
            # </voice_approval_transcription>
    
            elif event.type == ServerEventType.ERROR:
                msg = event.error.message
                # Reset response state — errors can terminate a response without RESPONSE_DONE
                self._active_response = False
                self._response_api_done = True
                if "Cancellation failed: no active response" not in msg:
                    if "interim response" in msg.lower():
                        logger.warning("Interim response not supported with this model pipeline (non-fatal)")
                    elif "active response" in msg.lower():
                        logger.debug("Response collision (expected during MCP flow): %s", msg)
                    else:
                        logger.error("VoiceLive error: %s", msg)
                        print(f"Error: {msg}")
                        await write_conversation_log(f"ERROR: {msg}")
    
            # MCP-specific events
            elif event.type == ServerEventType.MCP_LIST_TOOLS_IN_PROGRESS:
                logger.info("MCP list tools in progress for %s", event.item_id)
    
            elif event.type == ServerEventType.MCP_LIST_TOOLS_COMPLETED:
                logger.info("MCP list tools completed for %s", event.item_id)
                print("🔧 MCP tools discovered successfully")
                await write_conversation_log("MCP tools discovered successfully")
    
            elif event.type == ServerEventType.MCP_LIST_TOOLS_FAILED:
                logger.error("MCP list tools failed for %s", event.item_id)
                print("❌ MCP tool discovery failed")
                await write_conversation_log("ERROR: MCP tool discovery failed")
    
            elif event.type == ServerEventType.RESPONSE_MCP_CALL_IN_PROGRESS:
                logger.info("MCP call in progress for %s", event.item_id)
                print("⏳ MCP tool call in progress...")
                await write_conversation_log(f"MCP call in progress: {event.item_id}")
                self._mcp_call_in_progress += 1
                self._active_mcp_items.add(event.item_id)
                self._start_mcp_stall_timer(conn)
    
            elif event.type == ServerEventType.RESPONSE_MCP_CALL_COMPLETED:
                item_id = event.item_id
                self._mcp_call_in_progress = max(0, self._mcp_call_in_progress - 1)
                self._active_mcp_items.discard(item_id)
                self._cancel_mcp_stall_timer()
                if item_id in self._handled_mcp_completions:
                    logger.debug("Ignoring duplicate MCP completion for %s", item_id)
                else:
                    self._handled_mcp_completions.add(item_id)
                    is_stale = item_id in self._stale_mcp_items
                    self._stale_mcp_items.discard(item_id)
                    logger.info("MCP call completed for %s (stale=%s)", item_id, is_stale)
                    await write_conversation_log(f"MCP call completed: {item_id} (stale={is_stale})")
                    await self._handle_mcp_call_completed(event, conn, is_stale=is_stale)
    
            elif event.type == ServerEventType.RESPONSE_MCP_CALL_FAILED:
                item_id = event.item_id
                logger.error("MCP call failed for %s", item_id)
                print("❌ MCP tool call failed")
                await write_conversation_log(f"ERROR: MCP call failed: {item_id}")
                self._mcp_call_in_progress = max(0, self._mcp_call_in_progress - 1)
                self._active_mcp_items.discard(item_id)
                self._stale_mcp_items.discard(item_id)
                self._cancel_mcp_stall_timer()
                # Kick the model to inform the user the tool call failed
                try:
                    await conn.response.create()
                except Exception as e:
                    if "active response" not in str(e).lower():
                        logger.warning("Failed to create response after MCP failure: %s", e)
    
            elif event.type == ServerEventType.CONVERSATION_ITEM_CREATED:
                logger.info("Conversation item created: id=%s, type=%s", event.item.id, event.item.type)
                if event.item.type == ItemType.MCP_LIST_TOOLS:
                    logger.info("MCP list tools item: server_label=%s", event.item.server_label)
                elif event.item.type == ItemType.MCP_CALL:
                    await self._handle_mcp_call_arguments(event, conn)
                elif event.item.type == ItemType.MCP_APPROVAL_REQUEST:
                    await self._handle_mcp_approval_request(event, conn)
            else:
                logger.debug("Unhandled event type: %s", event.type)
        # </handle_mcp_events>
    
        # <handle_approval>
        async def _handle_mcp_approval_request(self, conversation_created_event, connection):
            """Handle MCP approval request by asking the user via voice."""
            if not isinstance(conversation_created_event, ServerEventConversationItemCreated):
                logger.error("Expected ServerEventConversationItemCreated")
                return
            if not isinstance(conversation_created_event.item, ResponseMCPApprovalRequestItem):
                logger.error("Expected ResponseMCPApprovalRequestItem")
                return
    
            mcp_approval_item = conversation_created_event.item
            approval_id = mcp_approval_item.id
            server_label = mcp_approval_item.server_label
            function_name = mcp_approval_item.name
    
            if not approval_id:
                logger.error("MCP approval item missing ID")
                return
    
            # Auto-deny after too many calls to the same server in one task.
            # This prevents infinite tool-call loops in voice UX.
            MAX_APPROVAL_CALLS_PER_TASK = 3
            current_count = self._approval_call_count.get(server_label, 0)
            if current_count >= MAX_APPROVAL_CALLS_PER_TASK:
                logger.info("Auto-denying %s — reached %d calls this task", function_name, current_count)
                print(f"   Auto-denied: {server_label}/{function_name} (max {MAX_APPROVAL_CALLS_PER_TASK} calls reached)")
                try:
                    await connection.conversation.item.create(
                        item=MCPApprovalResponseRequestItem(approval_request_id=approval_id, approve=False)
                    )
                except Exception as e:
                    logger.warning("Failed to send auto-deny: %s", e)
                return
    
            # Auto-approve if user already approved this server earlier in the same turn.
            # This avoids repeated approval prompts for consecutive calls to the same service.
            if server_label in self._approved_servers_this_turn:
                logger.info("Auto-approving %s — server already approved this turn", function_name)
                print(f"   Auto-approved: {server_label}/{function_name} (already approved this turn)")
                try:
                    await connection.conversation.item.create(
                        item=MCPApprovalResponseRequestItem(approval_request_id=approval_id, approve=True)
                    )
                except Exception as e:
                    logger.warning("Failed to send auto-approve: %s", e)
                return
    
            # If another approval is already pending, queue this one
            if self._pending_approval is not None:
                logger.info("Queuing approval for %s — another is already pending", function_name)
                self._approval_queue.append({
                    "approval_id": approval_id,
                    "server_label": server_label,
                    "function_name": function_name,
                })
                return
    
            logger.info("MCP approval request: server=%s tool=%s", server_label, function_name)
            print("\n🔐 MCP Approval Request (voice-based):")
            print(f"   Server: {server_label}  Tool: {function_name}")
    
            # Store the pending approval. If no response is currently active,
            # send the voice prompt immediately. Otherwise, defer it to
            # RESPONSE_DONE to avoid colliding with an active response.
            self._pending_approval = {
                "approval_id": approval_id,
                "server_label": server_label,
                "function_name": function_name,
            }
    
            if not self._active_response:
                await self._send_approval_voice_prompt(self._pending_approval, connection)
            else:
                self._approval_prompt_needed = True
    
        async def _send_approval_voice_prompt(self, pending: dict, connection):
            """Inject a system message asking the model to verbally request permission."""
            server = pending["server_label"]
            call_count = self._approval_call_count.get(server, 0)
            self._approval_call_count[server] = call_count + 1
    
            if call_count == 0:
                prompt = (
                    "You MUST ask the user for explicit permission before proceeding. "
                    f'Say exactly: "I\'d like to search the {server} service for information. '
                    f'Do you approve? Please say yes or no."'
                )
            else:
                prompt = (
                    "You MUST ask the user for permission again. "
                    'Say exactly: "I need to do one more search to get complete information. '
                    'Should I continue? Please say yes or no."'
                )
    
            try:
                await connection.conversation.item.create(
                    item=MessageItem(
                        role="system",
                        content=[InputTextContentPart(text=prompt)],
                    )
                )
                await connection.response.create()
            except Exception as e:
                logger.warning("Failed to send approval voice prompt: %s", e)
    
        async def _resolve_voice_approval(self, transcript: str, connection):
            """Interpret the user's spoken response as approval or denial."""
            pending = self._pending_approval
            if pending is None:
                return
    
            text = transcript.strip().lower()
    
            # Match "yes" or "no" as whole words (word boundaries prevent false
            # positives from words like "yesterday" or "nobody").
            # Also accept "stop" and "cancel" as denial.
            approved = bool(re.search(r'\byes\b', text))
            denied = bool(re.search(r'\b(no|stop|cancel)\b', text))
    
            if not approved and not denied:
                # Ambiguous — ask again via the deferred prompt mechanism
                logger.info("Ambiguous approval response: %s", transcript)
                self._approval_prompt_needed = True
                return
    
            if approved and denied:
                # Conflicting signals — treat as denial for safety
                approved = False
    
            # Clear the pending state before sending the response
            self._pending_approval = None
            if approved:
                self._approved_servers_this_turn.add(pending["server_label"])
            else:
                self._approval_call_count.clear()  # Denial ends the task, so reset the per-server prompt counters
                self._approved_servers_this_turn.discard(pending["server_label"])
    
            approval_response_item = MCPApprovalResponseRequestItem(
                approval_request_id=pending["approval_id"], approve=approved
            )
            try:
                await connection.conversation.item.create(item=approval_response_item)
            except Exception as e:
                logger.error("Failed to send approval response: %s", e)
                return
            logger.info("Voice approval resolved: %s for %s", approved, pending["function_name"])
            print(f"   Voice approval: {'Approved ✅' if approved else 'Denied ❌'}")
            await write_conversation_log(f"Voice approval: {'Approved' if approved else 'Denied'} for {pending['server_label']}")
    
            # Process next queued approval, if any
            await self._process_next_approval(connection)
    
        async def _process_next_approval(self, connection):
            """Pop the next queued approval and ask via voice."""
            if not self._approval_queue:
                return
            next_approval = self._approval_queue.pop(0)
            self._pending_approval = next_approval
    
            # Send immediately if no response is active, otherwise defer
            if not self._active_response:
                await self._send_approval_voice_prompt(next_approval, connection)
            else:
                self._approval_prompt_needed = True
        # </handle_approval>
    
        # <mcp_stall_detection>
        MCP_STALL_MAX_NOTIFICATIONS = 3
    
        def _start_mcp_stall_timer(self, connection):
            """Start a repeating timer that verbally updates the user if an MCP call takes too long."""
            self._cancel_mcp_stall_timer()
    
            async def _stall_loop():
                stall_count = 0
                while self._mcp_call_in_progress > 0 and stall_count < self.MCP_STALL_MAX_NOTIFICATIONS:
                    await asyncio.sleep(10)
                    if self._mcp_call_in_progress <= 0:
                        break
                    stall_count += 1
                    # Note: MCP calls cannot be cancelled via the API — only honest
                    # status updates are possible until the server responds or times out.
                    msg = ("The tool call is still running. "
                           "Briefly reassure the user that you're still waiting for results. "
                           "One short sentence only.")
                    logger.info("MCP stall notification #%d", stall_count)
                    try:
                        await connection.conversation.item.create(
                            item=MessageItem(
                                role="system",
                                content=[InputTextContentPart(text=msg)],
                            )
                        )
                        await connection.response.create()
                    except Exception as e:
                        if "active response" in str(e).lower():
                            self._needs_response_create = True
                        else:
                            logger.debug("Stall notification failed: %s", e)
    
            self._mcp_stall_task = asyncio.create_task(_stall_loop())
    
        def _cancel_mcp_stall_timer(self):
            """Cancel the MCP stall timer if running."""
            if self._mcp_stall_task and not self._mcp_stall_task.done():
                self._mcp_stall_task.cancel()
            self._mcp_stall_task = None
        # </mcp_stall_detection>
    
        async def _handle_mcp_call_completed(self, mcp_call_completed_event, connection, *, is_stale=False):
            """Handle MCP call completed events."""
            if not isinstance(mcp_call_completed_event, ServerEventResponseMcpCallCompleted):
                logger.error("Expected ServerEventResponseMcpCallCompleted")
                return
    
            logger.info("MCP call completed for %s (stale=%s)", mcp_call_completed_event.item_id, is_stale)
            print("✅ MCP tool call completed successfully")
    
            # Clean up item mapping
            self._mcp_item_to_server.pop(mcp_call_completed_event.item_id, None)
    
            # Reset approval counter if no more approvals are pending (task complete)
            if self._pending_approval is None and not self._approval_queue:
                self._approval_call_count.clear()
    
            # If the user moved on during this call, tell the model it's a late result
            if is_stale:
                try:
                    await connection.conversation.item.create(
                        item=MessageItem(
                            role="system",
                            content=[InputTextContentPart(
                                text="This tool result is from an earlier request. The user has "
                                     "since moved on. Briefly introduce it as a late result, e.g. "
                                     "'By the way, those results from earlier just came in...' "
                                     "then share the key findings concisely."
                            )],
                        )
                    )
                except Exception as e:
                    logger.warning("Failed to inject late-result context: %s", e)
    
            # Batch response: only call response.create when ALL MCP calls for this
            # turn have completed. This prevents partial results and repeated tool calls.
            if self._mcp_call_in_progress <= 0 and self._pending_approval is None and not self._approval_queue:
                logger.info("All MCP calls complete — creating response")
                try:
                    await connection.response.create()
                except Exception as e:
                    if "active response" in str(e).lower():
                        self._needs_response_create = True
                    else:
                        logger.warning("Failed to create response after MCP calls: %s", e)
            else:
                self._mcp_results_pending = True
                logger.info("MCP calls still in progress (%d) — deferring response", self._mcp_call_in_progress)
    
        async def _handle_mcp_call_arguments(self, conversation_created_event, connection):
            """Log MCP call details and announce the tool call to the user via voice."""
            if not isinstance(conversation_created_event, ServerEventConversationItemCreated):
                logger.error("Expected ServerEventConversationItemCreated")
                return
            if not isinstance(conversation_created_event.item, ResponseMCPCallItem):
                logger.error("Expected ResponseMCPCallItem")
                return
    
            mcp_call_item = conversation_created_event.item
            server_label = mcp_call_item.server_label
            function_name = mcp_call_item.name
    
            logger.info("MCP Call triggered: server_label=%s, function_name=%s", server_label, function_name)
            print(f"🔧 MCP tool call: {server_label}/{function_name}")
            self._mcp_item_to_server[mcp_call_item.id] = f"{server_label}/{function_name}"
    
            # Announce the tool call to the user so they know something is
            # happening while the MCP call runs. Skip for approval-required
            # servers (the approval prompt handles communication) and skip
            # if an approval is already pending.
            if self._pending_approval is None and server_label not in self._approval_servers:
                try:
                    await connection.conversation.item.create(
                        item=MessageItem(
                            role="system",
                            content=[InputTextContentPart(
                                text="Briefly tell the user you're looking something up. One short sentence only."
                            )],
                        )
                    )
                    await connection.response.create()
                except Exception as e:
                    if "active response" not in str(e).lower():
                        logger.warning("Failed to create tool announcement: %s", e)
    
    
    def parse_arguments():
        """Parse command line arguments."""
        parser = argparse.ArgumentParser(
            description="Voice Assistant with MCP using Azure VoiceLive SDK",
        )
    
        parser.add_argument(
            "--api-key",
            help="Azure VoiceLive API key (or set AZURE_VOICELIVE_API_KEY env var)",
            type=str,
            default=os.environ.get("AZURE_VOICELIVE_API_KEY"),
        )
        parser.add_argument(
            "--endpoint",
            help="Azure VoiceLive endpoint (default: from AZURE_VOICELIVE_ENDPOINT env var)",
            type=str,
            default=os.environ.get("AZURE_VOICELIVE_ENDPOINT", "https://your-resource-name.services.ai.azure.com/"),
        )
        parser.add_argument(
            "--model",
            help="VoiceLive model to use (default: gpt-realtime)",
            type=str,
            default=os.environ.get("AZURE_VOICELIVE_MODEL", "gpt-realtime"),
        )
        parser.add_argument(
            "--voice",
            help="Voice to use for the assistant (default: en-US-Ava:DragonHDLatestNeural)",
            type=str,
            default=os.environ.get("AZURE_VOICELIVE_VOICE", "en-US-Ava:DragonHDLatestNeural"),
        )
        parser.add_argument(
            "--instructions",
            help="System instructions for the AI assistant",
            type=str,
            default=os.environ.get(
                "AZURE_VOICELIVE_INSTRUCTIONS",
                "You are a helpful AI assistant with access to MCP tools. "
                "Always respond in English. "
                "When a user asks a question, use the appropriate tool once to find information, "
                "then summarize the results conversationally. IMPORTANT: Never call the same tool "
                "more than once per user question. After receiving a tool result, always respond "
                "to the user with what you found — do not search again. "
                "Some tools require user approval before they can be used. When you receive a "
                "system message asking you to request permission, you MUST clearly ask the user "
                "for their explicit approval before proceeding. Always wait for the user to say "
                "yes or no. Never skip the approval question or assume permission is granted. "
                "If a tool result arrives after the conversation has moved to a different topic, "
                "briefly introduce it as a late result before sharing the findings.",
            ),
        )
        parser.add_argument(
            "--use-token-credential", help="Use Azure token credential instead of API key", action="store_true", default=False
        )
        parser.add_argument("--verbose", help="Enable verbose logging", action="store_true")
    
        return parser.parse_args()
    
    
    async def write_conversation_log(message: str) -> None:
        """Write a message to the conversation log."""
        log_path = os.path.join(_script_dir, 'logs', conversation_logfilename)
        def _write():
            with open(log_path, 'a', encoding='utf-8') as f:
                f.write(message + "\n")
        await asyncio.to_thread(_write)
    
    
    def main():
        """Main function."""
        args = parse_arguments()
    
        if args.verbose:
            logging.getLogger().setLevel(logging.DEBUG)
    
        if not args.api_key and not args.use_token_credential:
            print("❌ Error: No authentication provided")
            print("Please provide an API key using --api-key or set AZURE_VOICELIVE_API_KEY environment variable,")
            print("or use --use-token-credential for Azure authentication.")
            sys.exit(1)
    
        credential: Union[AzureKeyCredential, AsyncTokenCredential]
        if args.use_token_credential:
            credential = AzureCliCredential()
            logger.info("Using Azure token credential")
        else:
            credential = AzureKeyCredential(args.api_key)
            logger.info("Using API key credential")
    
        assistant = MCPVoiceAssistant(
            endpoint=args.endpoint,
            credential=credential,
            model=args.model,
            voice=args.voice,
            instructions=args.instructions,
        )
    
        def signal_handler(_sig, _frame):
            logger.info("Received shutdown signal")
            raise KeyboardInterrupt()
    
        signal.signal(signal.SIGINT, signal_handler)
        signal.signal(signal.SIGTERM, signal_handler)
    
        try:
            asyncio.run(assistant.start())
        except KeyboardInterrupt:
            print("\n👋 Voice assistant with MCP shut down. Goodbye!")
        except Exception as e:
            print("Fatal Error: ", e)
    
    
    if __name__ == "__main__":
        # Check audio system
        try:
            p = pyaudio.PyAudio()
            input_devices = [
                i for i in range(p.get_device_count())
                if cast(Union[int, float], p.get_device_info_by_index(i).get("maxInputChannels", 0) or 0) > 0
            ]
            output_devices = [
                i for i in range(p.get_device_count())
                if cast(Union[int, float], p.get_device_info_by_index(i).get("maxOutputChannels", 0) or 0) > 0
            ]
            p.terminate()
    
            if not input_devices:
                print("❌ No audio input devices found. Please check your microphone.")
                sys.exit(1)
            if not output_devices:
                print("❌ No audio output devices found. Please check your speakers.")
                sys.exit(1)
        except Exception as e:
            print(f"❌ Audio system check failed: {e}")
            sys.exit(1)
    
        print("🎙️  Voice Assistant with MCP - Azure VoiceLive SDK")
        print("=" * 65)
    
        main()
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Run the Python script:

    python mcp-quickstart.py
    
  4. Speak into your microphone. Try asking questions like "What tools do you have?" or "Search the Azure documentation for Voice Live API."

    • For the deepwiki server (require_approval="never"), tool calls execute automatically.
    • For the azure_doc server (require_approval="always"), the assistant asks for your permission by voice before it runs a tool. Say "yes" to approve, or "no", "stop", or "cancel" to deny.
  5. Press Ctrl+C to stop the session.
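
The sample decides whether a spoken reply approves or denies a tool call by matching whole words in the transcript. The following standalone sketch isolates that rule so you can try phrases without a live session. The function name classify_approval is hypothetical and isn't part of the SDK; the sample applies the same logic inline in _resolve_voice_approval.

```python
import re
from typing import Optional


def classify_approval(transcript: str) -> Optional[bool]:
    """Return True to approve, False to deny, or None when the reply is ambiguous."""
    text = transcript.strip().lower()

    # Word boundaries keep "yesterday" and "nobody" from matching "yes" and "no".
    approved = bool(re.search(r"\byes\b", text))
    denied = bool(re.search(r"\b(no|stop|cancel)\b", text))

    if not approved and not denied:
        return None   # Ambiguous: the caller should ask again.
    if approved and denied:
        return False  # Conflicting signals: deny for safety.
    return approved


if __name__ == "__main__":
    for phrase in ["Yes, go ahead", "no thanks", "yesterday nobody called", "yes... no, cancel"]:
        print(f"{phrase!r} -> {classify_approval(phrase)}")
```

A None result corresponds to the branch in the sample that sets _approval_prompt_needed so the assistant repeats the question after the current response finishes.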

MCP server configuration reference

| Parameter | Required | Description |
| --- | --- | --- |
| server_label | Yes | Display name for the MCP server. |
| server_url | Yes | URL of the remote MCP endpoint. |
| allowed_tools | No | List of tool names the model can call. If omitted, all tools are allowed. |
| require_approval | No | "never", "always" (default), or a per-tool dictionary. |
| headers | No | Extra HTTP headers to include in MCP requests. |
| authorization | No | Authorization token for MCP requests. |

For the complete REST API type definition, see MCPTool in the Voice Live API reference.
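
As a rough illustration of the parameters above, the following sketch checks a server definition written as a plain dictionary and fills in the documented default for require_approval. It's an assumption-laden helper for local sanity checks only: normalize_mcp_server is a hypothetical name, the dictionaries aren't SDK types, and the shape of a per-tool dictionary isn't validated because it isn't described here.

```python
REQUIRED_KEYS = {"server_label", "server_url"}
OPTIONAL_KEYS = {"allowed_tools", "require_approval", "headers", "authorization"}


def normalize_mcp_server(config: dict) -> dict:
    """Validate a plain-dict MCP server definition and apply the default approval mode."""
    missing = REQUIRED_KEYS - set(config)
    if missing:
        raise ValueError(f"Missing required parameter(s): {sorted(missing)}")

    unknown = set(config) - REQUIRED_KEYS - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"Unknown parameter(s): {sorted(unknown)}")

    # "always" is the documented default when require_approval is omitted.
    approval = config.get("require_approval", "always")
    # A per-tool dictionary is accepted as-is; its contents are not checked here.
    if not isinstance(approval, dict) and approval not in ("always", "never"):
        raise ValueError('require_approval must be "always", "never", or a dictionary')

    return {**config, "require_approval": approval}


if __name__ == "__main__":
    servers = [
        {
            "server_label": "deepwiki",
            "server_url": "https://mcp.deepwiki.com/mcp",
            "allowed_tools": ["read_wiki_structure", "ask_question"],
            "require_approval": "never",
        },
        # require_approval is omitted, so the default applies.
        {"server_label": "azure_doc", "server_url": "https://learn.microsoft.com/api/mcp"},
    ]
    for server in servers:
        normalized = normalize_mcp_server(server)
        print(normalized["server_label"], "->", normalized["require_approval"])
```

Running a check like this before you start a session can surface a missing server_url or a misspelled approval mode earlier than a failed tool discovery would.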

Learn how to connect remote MCP servers to a Voice Live session using the VoiceLive SDK for C#. This article builds on the Quickstart: Create a Voice Live real-time voice agent with MCP server integration.

Reference documentation | Package (NuGet) | Additional samples on GitHub

Follow the how-to below or get the full sample code:

Prerequisites

  • An Azure subscription. Create one for free.
  • .NET 8.0 SDK or later.
  • A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
  • Azure.AI.VoiceLive package version 1.1.0 or later (MCP support requires API version 2026-04-10 or later).
  • Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Tip

To use Voice Live with MCP, you don't need to deploy an audio model with your Foundry resource. Voice Live is fully managed, and the model is automatically deployed for you. For more information about model availability, see the Voice Live overview documentation.

Prepare the environment

Complete the Voice Live quickstart to set up your environment, configure authentication, and test your first Voice Live conversation.

MCP integration concepts

MCP server definition

Use the VoiceLiveMcpServerDefinition class to declare each remote MCP endpoint. At minimum, provide ServerLabel (a display name) and ServerUrl (the MCP endpoint URL). Optionally restrict available tools with AllowedTools and configure the approval mode.

Approval modes

Control whether MCP tool calls require user approval before execution:

  • RequireApproval = "never": The tool executes automatically when the model invokes it.
  • RequireApproval = "always" (default): The client receives an approval request and must respond before the tool runs.

API version requirement

MCP support requires API version 2026-04-10 or later.

Define MCP servers

Define the MCP servers that Voice Live can use during the session. Each server is a VoiceLiveMcpServerDefinition instance added to the tools list in the session configuration.

The following code defines two MCP servers: one with automatic tool execution and one that requires user approval before running.

/// <summary>
/// Define MCP servers that Voice Live can use during the session.
/// Each server is a VoiceLiveMcpServerDefinition instance added to the session options tools list.
/// </summary>
private List<VoiceLiveToolDefinition> DefineMCPServers()
{
    var mcpTools = new List<VoiceLiveToolDefinition>
    {
        new VoiceLiveMcpServerDefinition("deepwiki", "https://mcp.deepwiki.com/mcp")
        {
            AllowedTools = { "read_wiki_structure", "ask_question" },
            RequireApproval = BinaryData.FromString("\"never\""),
        },
        new VoiceLiveMcpServerDefinition("azure_doc", "https://learn.microsoft.com/api/mcp")
        {
            RequireApproval = BinaryData.FromString("\"always\""),
        },
    };

    return mcpTools;
}

In this sample:

  • The deepwiki server allows only read_wiki_structure and ask_question tools, with RequireApproval set to "never" for automatic execution.
  • The azure_doc server allows all tools on the endpoint, with RequireApproval set to "always" so users can review each call before execution.

Configure the session with MCP tools

Pass the MCP server definitions to the session options tools list alongside your voice, modality, and turn-detection settings.

private async Task SetupSessionAsync(CancellationToken cancellationToken)
{
    _logger.LogInformation("Setting up session with MCP tools...");

    var azureVoice = new AzureStandardVoice(_voice);
    var turnDetection = new ServerVadTurnDetection
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    };

    // Create session options and add MCP servers to the tools list
    var sessionOptions = new VoiceLiveSessionOptions
    {
        InputAudioEchoCancellation = new AudioEchoCancellation(),
        Model = _model,
        Instructions = _instructions,
        Voice = azureVoice,
        InputAudioFormat = InputAudioFormat.Pcm16,
        OutputAudioFormat = OutputAudioFormat.Pcm16,
        TurnDetection = turnDetection
    };

    // Enable input audio transcription so we receive
    // SessionUpdateConversationItemInputAudioTranscriptionCompleted events
    // (required for the voice-based approval flow).
    sessionOptions.InputAudioTranscription = new AudioInputTranscriptionOptions(
        _model.Contains("realtime", StringComparison.OrdinalIgnoreCase) ? "whisper-1" : "azure-speech");

    sessionOptions.Modalities.Clear();
    sessionOptions.Modalities.Add(InteractionModality.Text);
    sessionOptions.Modalities.Add(InteractionModality.Audio);

    // Add MCP servers to the tools list
    var mcpServers = DefineMCPServers();
    foreach (var tool in mcpServers)
    {
        sessionOptions.Tools.Add(tool);
    }

    // Track which servers require approval for per-turn loop prevention
    _approvalServers = new HashSet<string> { "azure_doc" };

    await _session!.ConfigureSessionAsync(sessionOptions, cancellationToken).ConfigureAwait(false);
    _logger.LogInformation("Session with MCP tools configured");
}

In this sample:

  • VoiceLiveSessionOptions bundles MCP tools with audio format, voice, and turn detection settings.
  • ConfigureSessionAsync(options) sends the full configuration to Voice Live.
  • Voice Live automatically discovers available tools from each MCP server after the session starts.

Handle MCP events

Process MCP-specific events in the event loop. The key events include MCP tool call creation, completion, failure, and approval requests.

private async Task HandleSessionUpdateAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
{
    switch (serverEvent)
    {
        case SessionUpdateSessionUpdated sessionUpdated:
            _logger.LogInformation("Session updated");
            WriteLog($"SessionID: {sessionUpdated.Session?.Id}");
            WriteLog($"Model: {_model}");
            WriteLog($"Voice: {_voice}");
            WriteLog("");
            if (_audioProcessor != null)
                await _audioProcessor.StartCaptureAsync().ConfigureAwait(false);
            break;

        case SessionUpdateInputAudioBufferSpeechStarted:
            Console.WriteLine("🎤 Listening...");
            if (_audioProcessor != null)
                await _audioProcessor.StopPlaybackAsync().ConfigureAwait(false);
            if (_responseActive && _canCancelResponse)
            {
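                try { await _session!.CancelResponseAsync(cancellationToken).ConfigureAwait(false); }
                catch { /* Best effort: the response may already have finished. */ }
                try { await _session!.ClearStreamingAudioAsync(cancellationToken).ConfigureAwait(false); }
                catch { /* Best effort: there may be no streamed audio left to clear. */ }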
            }
            // Do NOT reset _approvalCallCount here — the counter should only
            // reset on task completion (in MCP-call-completed when no pending/queued
            // approvals remain) or on explicit denial (in ResolveVoiceApprovalAsync).
            // Resetting on every speech-start would let the model retry denied calls.

            // Clear deferred response flags if no MCP calls are in progress.
            // Prevents stale needsResponseCreate from re-triggering result playback
            // after the user interrupts.
            if (_mcpCallInProgress <= 0)
            {
                _needsResponseCreate = false;
                _mcpResultsPending = false;
            }

            // Reset approved-servers-this-turn when user starts a new topic
            if (_pendingApproval == null && _mcpCallInProgress <= 0)
                _approvedServersThisTurn.Clear();

            // If an MCP call is running, mark the active calls as stale and tell the
            // model to answer the user now and present any late tool result as such
            if (_mcpCallInProgress > 0 && _pendingApproval == null)
            {
                foreach (var id in _activeMcpItems) _staleMcpItems.Add(id);
                _logger.LogInformation("User spoke during MCP call — marking {Count} calls as stale", _activeMcpItems.Count);
                try
                {
                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                    {
                        type = "conversation.item.create",
                        item = new
                        {
                            type = "message",
                            role = "system",
                            content = new[] { new { type = "input_text", text = "A tool call is still running in the background. The user just spoke. Respond to what the user said. If a tool result arrives later, briefly introduce it as a late result from an earlier request." } }
                        }
                    }), cancellationToken).ConfigureAwait(false);
                }
                catch (Exception ex) { _logger.LogWarning("Failed to inject MCP status update: {Error}", ex.Message); }
            }
            break;

        case SessionUpdateInputAudioBufferSpeechStopped:
            Console.WriteLine("🤔 Processing...");
            if (_audioProcessor != null)
                await _audioProcessor.StartPlaybackAsync().ConfigureAwait(false);
            break;

        case SessionUpdateResponseCreated:
            _responseActive = true;
            _canCancelResponse = true;
            break;

        case SessionUpdateResponseAudioDelta audioDelta:
            if (audioDelta.Delta != null && _audioProcessor != null)
                await _audioProcessor.QueueAudioAsync(audioDelta.Delta.ToArray()).ConfigureAwait(false);
            break;

        case SessionUpdateResponseAudioDone:
            Console.WriteLine("🎤 Ready for next input...");
            break;

        case SessionUpdateResponseDone:
            _responseActive = false;
            _canCancelResponse = false;
            WriteLog("--- Response complete ---");
            // If an approval prompt needs to be injected, do it now
            if (_approvalPromptNeeded && _pendingApproval != null)
            {
                _approvalPromptNeeded = false;
                await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
            }
            // If MCP results are pending and all calls are now done, create response
            else if (_mcpResultsPending && _mcpCallInProgress <= 0 && _pendingApproval == null)
            {
                _mcpResultsPending = false;
                try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
                catch { }
            }
            else if (_needsResponseCreate)
            {
                _needsResponseCreate = false;
                try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
                catch { }
            }
            break;

        case SessionUpdateError errorEvent:
            var msg = errorEvent.Error?.Message ?? "";
            if (!msg.Contains("no active response", StringComparison.OrdinalIgnoreCase))
            {
                // Suppress non-fatal interim/collision errors
                if (msg.Contains("interim response", StringComparison.OrdinalIgnoreCase))
                {
                    _logger.LogWarning("Interim response not supported with this model pipeline (non-fatal)");
                }
                else if (msg.Contains("active response", StringComparison.OrdinalIgnoreCase))
                {
                    _logger.LogDebug("Response collision (expected during MCP flow): {Message}", msg);
                }
                else
                {
                    Console.WriteLine($"❌ Error: {msg}");
                    WriteLog($"ERROR: {msg}");
                }
            }
            _responseActive = false;
            _canCancelResponse = false;
            break;

        // Transcription event — used for voice-based approval resolution
        case SessionUpdateConversationItemInputAudioTranscriptionCompleted transcription:
            var transcript = transcription.Transcript ?? "";
            _logger.LogInformation("User said: {Transcript}", transcript);
            Console.WriteLine($"👤 You said:\t{transcript}");
            WriteLog($"User Input:\t{transcript}");
            if (_pendingApproval != null)
            {
                await ResolveVoiceApprovalAsync(transcript, cancellationToken).ConfigureAwait(false);
            }
            break;

        // MCP-specific events
        case SessionUpdateMcpListToolsCompleted mcpListDone:
            Console.WriteLine("šŸ”§ MCP tools discovered successfully");
            WriteLog("MCP tools discovered successfully");
            _logger.LogInformation("MCP tools discovered for server");
            break;

        case SessionUpdateMcpListToolsFailed:
            Console.WriteLine("āŒ MCP tool discovery failed");
            WriteLog("ERROR: MCP tool discovery failed");
            break;

        case SessionUpdateResponseMcpCallInProgress mcpInProgress:
            Console.WriteLine("ā³ MCP tool call in progress...");
            WriteLog($"MCP call in progress: {mcpInProgress.ItemId}");
            _mcpCallInProgress++;
            _activeMcpItems.Add(mcpInProgress.ItemId ?? "");
            StartMcpStallTimer(cancellationToken);
            break;

        case SessionUpdateResponseMcpCallCompleted mcpCompleted:
        {
            var itemId = mcpCompleted.ItemId ?? "";
            _mcpCallInProgress = Math.Max(0, _mcpCallInProgress - 1);
            _activeMcpItems.Remove(itemId);
            CancelMcpStallTimer();
            if (_handledMcpCompletions.Contains(itemId))
            {
                _logger.LogDebug("Ignoring duplicate MCP completion for {ItemId}", itemId);
            }
            else
            {
                _handledMcpCompletions.Add(itemId);
                bool isStale = _staleMcpItems.Remove(itemId);
                _logger.LogInformation("MCP call completed for {ItemId} (stale={IsStale})", itemId, isStale);
                Console.WriteLine("✅ MCP tool call completed successfully");
                WriteLog($"MCP call completed: {itemId} (stale={isStale})");

                // Clean up item mapping
                _mcpItemToServer.Remove(itemId);

                // Reset approval counter if no more approvals are pending
                if (_pendingApproval == null && _approvalQueue.Count == 0)
                    _approvalCallCount.Clear();

                // If the user moved on during this call, tell the model it's a late result
                if (isStale)
                {
                    try
                    {
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                        {
                            type = "conversation.item.create",
                            item = new
                            {
                                type = "message",
                                role = "system",
                                content = new[] { new { type = "input_text", text = "This tool result is from an earlier request. The user has since moved on. Briefly introduce it as a late result, e.g. 'By the way, those results from earlier just came in...' then share the key findings concisely." } }
                            }
                        }), cancellationToken).ConfigureAwait(false);
                    }
                    catch (Exception ex) { _logger.LogWarning("Failed to inject late-result context: {Error}", ex.Message); }
                }

                // Batch response: only call response.create when ALL MCP calls for this
                // turn have completed. This prevents partial results and repeated tool calls.
                if (_pendingApproval == null && _approvalQueue.Count == 0 && _mcpCallInProgress <= 0)
                {
                    try
                    {
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
                    }
                    catch (Exception ex)
                    {
                        if (ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                            _needsResponseCreate = true;
                        else
                            _logger.LogWarning("Failed to create response after MCP call: {Error}", ex.Message);
                    }
                }
                else
                {
                    _mcpResultsPending = true;
                    _logger.LogInformation("MCP calls still in progress ({Count}) — deferring response", _mcpCallInProgress);
                }
            }
            break;
        }

        case SessionUpdateResponseMcpCallFailed mcpFailed:
        {
            var failedItemId = mcpFailed.ItemId ?? "";
            Console.WriteLine("āŒ MCP tool call failed");
            WriteLog($"ERROR: MCP call failed: {failedItemId}");
            _mcpCallInProgress = Math.Max(0, _mcpCallInProgress - 1);
            _activeMcpItems.Remove(failedItemId);
            _staleMcpItems.Remove(failedItemId);
            CancelMcpStallTimer();
            try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
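            catch (Exception ex) { _logger.LogDebug("response.create after failed MCP call was rejected: {Error}", ex.Message); }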
            break;
        }

        case SessionUpdateConversationItemCreated itemCreated
            when itemCreated.Item is SessionResponseMcpApprovalRequestItem mcpApproval:
            await HandleMCPApprovalAsync(mcpApproval, cancellationToken).ConfigureAwait(false);
            break;

        case SessionUpdateConversationItemCreated itemCreated:
            _logger.LogDebug("Conversation item created: {ItemType}", itemCreated.Item?.GetType().Name);
            // Track mcp_call items for server mapping and announce non-approval tool calls
            if (itemCreated.Item is SessionResponseMcpCallItem mcpCallItem)
            {
                var serverLabel = mcpCallItem.ServerLabel ?? "";
                var functionName = mcpCallItem.Name ?? "";
                var mcpItemId = mcpCallItem.Id ?? "";
                _logger.LogInformation("MCP Call triggered: server_label={Server}, function_name={Function}", serverLabel, functionName);
                Console.WriteLine($"🔧 MCP tool call: {serverLabel}/{functionName}");
                if (!string.IsNullOrEmpty(mcpItemId))
                    _mcpItemToServer[mcpItemId] = $"{serverLabel}/{functionName}";

                // Announce the tool call so the user knows something is happening
                if (_pendingApproval == null && !_approvalServers.Contains(serverLabel))
                {
                    try
                    {
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                        {
                            type = "conversation.item.create",
                            item = new
                            {
                                type = "message",
                                role = "system",
                                content = new[] { new { type = "input_text", text = "Briefly tell the user you're looking something up. One short sentence only." } }
                            }
                        }), cancellationToken).ConfigureAwait(false);
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
                    }
                    catch (Exception ex)
                    {
                        if (!ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                            _logger.LogWarning("Failed to create tool announcement: {Error}", ex.Message);
                    }
                }
            }
            break;

        default:
            _logger.LogDebug("Unhandled event: {EventType}", serverEvent.GetType().Name);
            break;
    }
}
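
The completion handler above defers response.create until every MCP call for the turn has finished, and ignores duplicate completion events. That rule can be isolated as a small state tracker. The following is a minimal sketch, not part of the Voice Live SDK: the McpCallTracker class and its Start and Complete methods are hypothetical names. It tracks item IDs in a set instead of the counter used in the sample, and it leaves out the pending-approval check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not an SDK type): tracks in-flight MCP calls and
// reports when it is safe to send a single response.create.
public sealed class McpCallTracker
{
    private readonly HashSet<string> _active = new();
    private readonly HashSet<string> _handled = new();

    // Record that an MCP call has started for the given item ID.
    public void Start(string itemId) => _active.Add(itemId);

    // Record a completion. Returns true only when this is the first
    // completion seen for the item and no other calls remain in flight.
    public bool Complete(string itemId)
    {
        _active.Remove(itemId);
        if (!_handled.Add(itemId))
            return false; // duplicate completion event: ignore it
        return _active.Count == 0;
    }
}
```

With this shape, the event loop would call Start when a call begins and send response.create only when Complete returns true.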

Handle approval requests

When a server is configured with RequireApproval = "always", client code must handle the approval flow. Instead of blocking on Console.ReadLine(), inject a system message so that the model asks the user verbally, then parse the spoken transcript for intent.

/// <summary>
/// Handle MCP approval request by asking the user via voice.
/// </summary>
private async Task HandleMCPApprovalAsync(SessionResponseMcpApprovalRequestItem approvalItem, CancellationToken cancellationToken)
{
    var approvalId = approvalItem.Id;
    var serverLabel = approvalItem.ServerLabel ?? "";
    var toolName = approvalItem.Name ?? "";

    if (string.IsNullOrEmpty(approvalId))
    {
        _logger.LogError("MCP approval item missing ID");
        return;
    }

    // If another approval is already pending, queue this one
    if (_pendingApproval != null)
    {
        _logger.LogInformation("Queuing approval for {Tool} — another is already pending", toolName);
        _approvalQueue.Enqueue(new ApprovalInfo(approvalId, serverLabel, toolName));
        return;
    }

    const int MaxApprovalCallsPerTask = 3;
    _approvalCallCount.TryGetValue(serverLabel, out var currentCount);
    if (currentCount >= MaxApprovalCallsPerTask)
    {
        _logger.LogInformation("Auto-denying {Tool} — reached {Count} calls this task", toolName, currentCount);
        Console.WriteLine($"   Auto-denied: {serverLabel}/{toolName} (max {MaxApprovalCallsPerTask} calls reached)");
        try
        {
            await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
            {
                type = "conversation.item.create",
                item = new
                {
                    type = "mcp_approval_response",
                    approval_request_id = approvalId,
                    approve = false
                }
            }), cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            _logger.LogWarning("Failed to send auto-deny: {Error}", ex.Message);
        }
        return;
    }

    // Auto-approve if user already approved this server earlier in the same turn
    if (_approvedServersThisTurn.Contains(serverLabel))
    {
        _logger.LogInformation("Auto-approving {Tool} — server already approved this turn", toolName);
        Console.WriteLine($"   Auto-approved: {serverLabel}/{toolName} (already approved this turn)");
        try
        {
            await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
            {
                type = "conversation.item.create",
                item = new
                {
                    type = "mcp_approval_response",
                    approval_request_id = approvalId,
                    approve = true
                }
            }), cancellationToken).ConfigureAwait(false);
        }
        catch (Exception ex)
        {
            _logger.LogWarning("Failed to send auto-approve: {Error}", ex.Message);
        }
        return;
    }

    _logger.LogInformation("MCP approval request: server={Server} tool={Tool}", serverLabel, toolName);
    Console.WriteLine();
    Console.WriteLine($"šŸ” MCP Approval Request (voice-based):");
    Console.WriteLine($"   Server: {serverLabel}  Tool: {toolName}");
    WriteLog($"Approval request: server={serverLabel} tool={toolName}");

    _pendingApproval = new ApprovalInfo(approvalId, serverLabel, toolName);

    if (!_responseActive)
    {
        await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
    }
    else
    {
        _approvalPromptNeeded = true;
    }
}

/// <summary>
/// Inject a system message asking the model to verbally request permission.
/// </summary>
private async Task SendApprovalVoicePromptAsync(CancellationToken cancellationToken)
{
    var pending = _pendingApproval;
    if (pending == null) return;

    var server = pending.ServerLabel;
    _approvalCallCount.TryGetValue(server, out var callCount);
    _approvalCallCount[server] = callCount + 1;

    string prompt;
    if (callCount == 0)
    {
        prompt = "You MUST ask the user for explicit permission before proceeding. "
               + $"Say exactly: \"I'd like to search the {server} service for information. "
               + "Do you approve? Please say yes or no.\"";
    }
    else
    {
        prompt = "You MUST ask the user for permission again. "
               + "Say exactly: \"I need to do one more search to get complete information. "
               + "Should I continue? Please say yes or no.\"";
    }

    try
    {
        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
        {
            type = "conversation.item.create",
            item = new
            {
                type = "message",
                role = "system",
                content = new[] { new { type = "input_text", text = prompt } }
            }
        }), cancellationToken).ConfigureAwait(false);
        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
    }
    catch (Exception ex)
    {
        _logger.LogWarning("Failed to send approval voice prompt: {Error}", ex.Message);
    }
}

In this sample:

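  • A system message instructs the model to verbally ask for permission. If a response is already active, the prompt is deferred by setting _approvalPromptNeeded.
  • An mcp_approval_response conversation item sends the decision back to Voice Live with approve set to true or false.
  • Approval requests that arrive while another approval is pending are queued and handled one at a time.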
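
The order of checks in HandleMCPApprovalAsync (queue, then auto-deny, then auto-approve, then ask) can be expressed as a pure function, which makes the policy easier to test in isolation. The following is a minimal sketch under that reading of the sample. ApprovalAction, ApprovalPolicy, and Decide are hypothetical names, not SDK types.

```csharp
using System;
using System.Collections.Generic;

public enum ApprovalAction { Queue, AutoDeny, AutoApprove, AskUser }

// Hypothetical helper (not an SDK type): mirrors the order of checks
// in HandleMCPApprovalAsync without touching session state.
public static class ApprovalPolicy
{
    public static ApprovalAction Decide(
        string serverLabel,
        bool approvalPending,
        IReadOnlyDictionary<string, int> promptCounts,
        ISet<string> approvedThisTurn,
        int maxCallsPerTask = 3)
    {
        // Only one approval is discussed with the user at a time.
        if (approvalPending) return ApprovalAction.Queue;

        // A missing entry leaves count at 0.
        promptCounts.TryGetValue(serverLabel, out var count);
        if (count >= maxCallsPerTask) return ApprovalAction.AutoDeny;

        // The user already said yes to this server during the current turn.
        if (approvedThisTurn.Contains(serverLabel)) return ApprovalAction.AutoApprove;

        return ApprovalAction.AskUser;
    }
}
```

Because the auto-deny check runs before the auto-approve check, a server that has reached the limit is denied even if the user approved it earlier in the turn, which matches the sample.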

Resolve voice-based approval

Parse the user's spoken transcript to determine approval. Use word-boundary regex to avoid false positives from words like "yesterday" or "nobody".

/// <summary>
/// Interpret the user's spoken response as approval or denial.
/// </summary>
private async Task ResolveVoiceApprovalAsync(string transcript, CancellationToken cancellationToken)
{
    var pending = _pendingApproval;
    if (pending == null) return;

    var text = transcript.Trim().ToLowerInvariant();

    bool approved = Regex.IsMatch(text, @"\byes\b");
    bool denied = Regex.IsMatch(text, @"\b(no|stop|cancel)\b");

    if (!approved && !denied)
    {
        // Ambiguous — ask again via the deferred prompt mechanism
        _logger.LogInformation("Ambiguous approval response: {Transcript}", transcript);
        _approvalPromptNeeded = true;
        return;
    }

    if (approved && denied)
    {
        // Conflicting signals — treat as denial for safety
        approved = false;
    }

    // Clear the pending state before sending the response
    _pendingApproval = null;
    if (approved)
        _approvedServersThisTurn.Add(pending.ServerLabel);
    else
    {
        _approvalCallCount.Clear();
        _approvedServersThisTurn.Remove(pending.ServerLabel);
    }

    try
    {
        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
        {
            type = "conversation.item.create",
            item = new
            {
                type = "mcp_approval_response",
                approval_request_id = pending.ApprovalId,
                approve = approved,
            }
        }), cancellationToken).ConfigureAwait(false);
    }
    catch (Exception ex)
    {
        _logger.LogError("Failed to send approval response: {Error}", ex.Message);
        return;
    }
    _logger.LogInformation("Voice approval resolved: {Approved} for {Tool}", approved, pending.FunctionName);
    Console.WriteLine($"   Voice approval: {(approved ? "Approved ✅" : "Denied ❌")}");
    WriteLog($"Approval resolved: {(approved ? "APPROVED" : "DENIED")} for {pending.ServerLabel}/{pending.FunctionName}");

    // Process next queued approval, if any
    await ProcessNextApprovalAsync(cancellationToken).ConfigureAwait(false);
}

/// <summary>
/// Pop the next queued approval and ask via voice.
/// </summary>
private async Task ProcessNextApprovalAsync(CancellationToken cancellationToken)
{
    if (_approvalQueue.Count == 0) return;

    var next = _approvalQueue.Dequeue();
    _pendingApproval = next;

    if (!_responseActive)
    {
        await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
    }
    else
    {
        _approvalPromptNeeded = true;
    }
}

In this sample:

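  • The transcript from SessionUpdateConversationItemInputAudioTranscriptionCompleted is matched against the \byes\b and \b(no|stop|cancel)\b patterns. A transcript that matches both is treated as a denial, and one that matches neither triggers a repeat prompt.
  • After an approval, subsequent calls to the same server within the same turn are auto-approved (in HandleMCPApprovalAsync) to avoid repeated prompts.
  • After a maximum number of approval prompts for a server (MaxApprovalCallsPerTask, 3 in this sample), further calls are auto-denied and the model responds with what it has.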
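
To see how the word-boundary patterns behave on their own, the matching step can be pulled out of the session code. The following is a minimal sketch that reuses the two patterns from the sample. VoiceDecision and VoiceApprovalParser are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public enum VoiceDecision { Approved, Denied, Ambiguous }

// Hypothetical helper (not an SDK type): classifies a transcript using the
// same patterns as ResolveVoiceApprovalAsync.
public static class VoiceApprovalParser
{
    private static readonly Regex Yes =
        new(@"\byes\b", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    private static readonly Regex No =
        new(@"\b(no|stop|cancel)\b", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static VoiceDecision Parse(string? transcript)
    {
        var text = transcript ?? "";
        bool approved = Yes.IsMatch(text);
        bool denied = No.IsMatch(text);

        // A denial word wins, even when "yes" is also present.
        if (denied) return VoiceDecision.Denied;
        if (approved) return VoiceDecision.Approved;
        return VoiceDecision.Ambiguous;
    }
}
```

With these patterns, "Yes, go ahead" is classified as Approved, "yes, actually no" as Denied, and "yesterday" or "nobody knows" as Ambiguous, because \b requires a word boundary on both sides of the keyword.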

Detect stalls during MCP tool calls

MCP tool calls can take several seconds. Use a repeating timer to proactively inform the user that the assistant is still waiting for results.

private void StartMcpStallTimer(CancellationToken ct)
{
    CancelMcpStallTimer();
    _mcpStallCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
    var token = _mcpStallCts.Token;
    _ = Task.Run(async () =>
    {
        int stallCount = 0;
        while (_mcpCallInProgress > 0 && stallCount < 3)
        {
            await Task.Delay(10000, token).ConfigureAwait(false);
            if (_mcpCallInProgress <= 0 || _session == null)
                break;
            stallCount++;
            // MCP calls cannot be cancelled — only honest status updates are possible.
            string msg = "The tool call is still running. Briefly reassure the user that you're still waiting for results. One short sentence only.";
            try
            {
                await _session.SendCommandAsync(BinaryData.FromObjectAsJson(new
                {
                    type = "conversation.item.create",
                    item = new
                    {
                        type = "message",
                        role = "system",
                        content = new[] { new { type = "input_text", text = msg } }
                    }
                }), token).ConfigureAwait(false);
                await _session.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), token).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                if (ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                    _needsResponseCreate = true;
            }
        }
    }, token);
}

private void CancelMcpStallTimer()
{
    if (_mcpStallCts != null)
    {
        _mcpStallCts.Cancel();
        _mcpStallCts.Dispose();
        _mcpStallCts = null;
    }
}

In this sample:

  • A 10-second interval timer injects system messages like "Tell the user you're still waiting" up to 3 times.
  • The timer is cancelled when the MCP call completes or fails, or when the user interrupts with barge-in.
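
The timer loop can also be written independently of the session, which makes the interval and the notification limit easy to test. The following is a minimal sketch, assuming the caller supplies the status check and the notification callback. StallNotifier and RunAsync are hypothetical names. Unlike the sample, this version catches the cancellation exception from Task.Delay and returns the number of notifications sent.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not an SDK type): invokes a callback at a fixed
// interval, up to a maximum count, while the work is still running.
public static class StallNotifier
{
    public static async Task<int> RunAsync(
        Func<bool> isStillRunning,
        Func<Task> notifyAsync,
        TimeSpan interval,
        int maxNotifications,
        CancellationToken token)
    {
        int sent = 0;
        try
        {
            while (sent < maxNotifications && isStillRunning())
            {
                await Task.Delay(interval, token).ConfigureAwait(false);

                // The call may have finished while the delay was pending.
                if (!isStillRunning()) break;

                await notifyAsync().ConfigureAwait(false);
                sent++;
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation is the normal way to stop the timer.
        }
        return sent;
    }
}
```

In the sample's terms, isStillRunning would check _mcpCallInProgress > 0, and notifyAsync would inject the reassurance system message followed by response.create.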

Run the sample

  1. Create the MCPQuickstart.cs file with the following code:

    // Copyright (c) Microsoft Corporation. All rights reserved.
    // Licensed under the MIT License.
    
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;
    using System.Threading;
    using System.Threading.Channels;
    using System.Threading.Tasks;
    using Azure.AI.VoiceLive;
    using Azure.Identity;
    using Microsoft.Extensions.Configuration;
    using Microsoft.Extensions.Logging;
    using NAudio.Wave;
    
    namespace Azure.AI.VoiceLive.Samples
    {
        /// <summary>
        /// MCP Quickstart - demonstrates MCP server integration with VoiceLive SDK.
        /// Shows how to define MCP servers, handle MCP tool calls, and implement
        /// an approval flow for tool calls that require user consent.
        /// </summary>
        public class Program
        {
            public static async Task<int> Main(string[] args)
            {
                // Setup configuration
                var configuration = new ConfigurationBuilder()
                    .AddJsonFile("appsettings.json", optional: true)
                    .AddEnvironmentVariables()
                    .Build();
    
                var apiKey = configuration["VoiceLive:ApiKey"] ?? Environment.GetEnvironmentVariable("AZURE_VOICELIVE_API_KEY");
                var endpoint = configuration["VoiceLive:Endpoint"] ?? Environment.GetEnvironmentVariable("AZURE_VOICELIVE_ENDPOINT") ?? "https://your-resource-name.services.ai.azure.com/";
                var model = configuration["VoiceLive:Model"] ?? Environment.GetEnvironmentVariable("AZURE_VOICELIVE_MODEL") ?? "gpt-realtime";
                var voice = configuration["VoiceLive:Voice"] ?? Environment.GetEnvironmentVariable("AZURE_VOICELIVE_VOICE") ?? "en-US-Ava:DragonHDLatestNeural";
                var instructions = configuration["VoiceLive:Instructions"] ?? "You are a helpful AI assistant with access to MCP tools. Use the tools to help answer user questions. Respond naturally and conversationally. Some tools require user approval before they can be used. When you receive a system message asking you to request permission, you MUST clearly ask the user for their explicit approval before proceeding. Always wait for the user to say yes or no. Never skip the approval question or assume permission is granted. If a tool result arrives after the conversation has moved to a different topic, briefly introduce it as a late result before sharing the findings.";
                var useTokenCredential = args.Length > 0 && args[0] == "--use-token-credential";
    
                // Setup logging
                using var loggerFactory = LoggerFactory.Create(builder =>
                {
                    builder.AddConsole();
                    builder.SetMinimumLevel(LogLevel.Information);
                });
    
                var logger = loggerFactory.CreateLogger<Program>();
    
                // Validate credentials
                if (string.IsNullOrEmpty(apiKey) && !useTokenCredential)
                {
                    Console.WriteLine("āŒ Error: No authentication provided");
                    Console.WriteLine("Set AZURE_VOICELIVE_API_KEY or use --use-token-credential.");
                    return 1;
                }
    
                // Check audio system
                if (!CheckAudioSystem(logger))
                    return 1;
    
                try
                {
                    VoiceLiveClient client;
                    if (useTokenCredential)
                    {
                        client = new VoiceLiveClient(new Uri(endpoint), new DefaultAzureCredential(), new VoiceLiveClientOptions());
                        logger.LogInformation("Using Azure token credential");
                    }
                    else
                    {
                        client = new VoiceLiveClient(new Uri(endpoint), new AzureKeyCredential(apiKey!), new VoiceLiveClientOptions());
                        logger.LogInformation("Using API key credential");
                    }
    
                    using var assistant = new MCPVoiceAssistant(client, model, voice, instructions, loggerFactory);
                    using var cts = new CancellationTokenSource();
    
                    Console.CancelKeyPress += (sender, e) =>
                    {
                        e.Cancel = true;
                        cts.Cancel();
                    };
    
                    await assistant.StartAsync(cts.Token).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    Console.WriteLine("\n👋 Voice assistant with MCP shut down. Goodbye!");
                }
                catch (Exception ex)
                {
                    logger.LogError(ex, "Fatal error");
                    Console.WriteLine($"āŒ Error: {ex.Message}");
                    return 1;
                }
    
                return 0;
            }
    
            private static bool CheckAudioSystem(ILogger logger)
            {
                try
                {
                    using var waveIn = new WaveInEvent { WaveFormat = new WaveFormat(24000, 16, 1), BufferMilliseconds = 50 };
                    waveIn.DataAvailable += (_, __) => { };
                    waveIn.StartRecording();
                    waveIn.StopRecording();
    
                    var buffer = new BufferedWaveProvider(new WaveFormat(24000, 16, 1)) { BufferDuration = TimeSpan.FromMilliseconds(200) };
                    using var waveOut = new WaveOutEvent { DesiredLatency = 100 };
                    waveOut.Init(buffer);
                    waveOut.Play();
                    waveOut.Stop();
    
                    logger.LogInformation("Audio system check passed");
                    return true;
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"āŒ Audio system check failed: {ex.Message}");
                    return false;
                }
            }
        }
    
        /// <summary>
        /// Voice assistant with MCP server integration.
        /// </summary>
        public class MCPVoiceAssistant : IDisposable
        {
            private readonly VoiceLiveClient _client;
            private readonly string _model;
            private readonly string _voice;
            private readonly string _instructions;
            private readonly ILogger<MCPVoiceAssistant> _logger;
            private readonly ILoggerFactory _loggerFactory;
    
            private VoiceLiveSession? _session;
            private AudioProcessor? _audioProcessor;
            private bool _disposed;
            private bool _responseActive;
            private bool _canCancelResponse;
    
            // Voice-based MCP approval state
            private record ApprovalInfo(string ApprovalId, string ServerLabel, string FunctionName);
            private ApprovalInfo? _pendingApproval;
            private readonly Queue<ApprovalInfo> _approvalQueue = new();
            private bool _approvalPromptNeeded;
            private int _mcpCallInProgress;
            private readonly HashSet<string> _handledMcpCompletions = new();
            private bool _needsResponseCreate;
            private readonly Dictionary<string, int> _approvalCallCount = new();
            private readonly Dictionary<string, string> _mcpItemToServer = new();
            private HashSet<string> _approvalServers = new();
            private CancellationTokenSource? _mcpStallCts;
            private readonly HashSet<string> _activeMcpItems = new();
            private readonly HashSet<string> _staleMcpItems = new();
            private bool _mcpResultsPending;
            private readonly HashSet<string> _approvedServersThisTurn = new();
            private static readonly string LogFilename = $"conversation_{DateTime.Now:yyyyMMdd_HHmmss}.log";
    
            public MCPVoiceAssistant(
                VoiceLiveClient client,
                string model,
                string voice,
                string instructions,
                ILoggerFactory loggerFactory)
            {
                _client = client;
                _model = model;
                _voice = voice;
                _instructions = instructions;
                _loggerFactory = loggerFactory;
                _logger = loggerFactory.CreateLogger<MCPVoiceAssistant>();
            }
    
            public async Task StartAsync(CancellationToken cancellationToken = default)
            {
                try
                {
                    _logger.LogInformation("Connecting to VoiceLive API with model {Model}", _model);
    
                    _session = await _client.StartSessionAsync(_model, cancellationToken).ConfigureAwait(false);
                    _audioProcessor = new AudioProcessor(_session, _loggerFactory.CreateLogger<AudioProcessor>());
    
                    await SetupSessionAsync(cancellationToken).ConfigureAwait(false);
    
                    await _audioProcessor.StartPlaybackAsync().ConfigureAwait(false);
                    await _audioProcessor.StartCaptureAsync().ConfigureAwait(false);
    
                    _logger.LogInformation("Voice assistant with MCP ready!");
                    Console.WriteLine();
                    Console.WriteLine(new string('=', 70));
                    Console.WriteLine("🎤 VOICE ASSISTANT WITH MCP READY");
                    Console.WriteLine("Try saying:");
                    Console.WriteLine("  • 'What is the GitHub repo fastapi about?'");
                    Console.WriteLine("  • 'Search the Azure documentation for Voice Live API.'");
                    Console.WriteLine("You may be asked to approve some MCP tool calls by voice.");
                    Console.WriteLine("Press Ctrl+C to exit");
                    Console.WriteLine(new string('=', 70));
                    Console.WriteLine();
    
                    await ProcessEventsAsync(cancellationToken).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    _logger.LogInformation("Shutting down...");
                }
                finally
                {
                    if (_audioProcessor != null)
                        await _audioProcessor.CleanupAsync().ConfigureAwait(false);
                }
            }
    
            // <define_mcp_servers>
            /// <summary>
            /// Define MCP servers that Voice Live can use during the session.
            /// Each server is a VoiceLiveMcpServerDefinition instance added to the session options tools list.
            /// </summary>
            private List<VoiceLiveToolDefinition> DefineMCPServers()
            {
                var mcpTools = new List<VoiceLiveToolDefinition>
                {
                    new VoiceLiveMcpServerDefinition("deepwiki", "https://mcp.deepwiki.com/mcp")
                    {
                        AllowedTools = { "read_wiki_structure", "ask_question" },
                        RequireApproval = BinaryData.FromString("\"never\""),
                    },
                    new VoiceLiveMcpServerDefinition("azure_doc", "https://learn.microsoft.com/api/mcp")
                    {
                        RequireApproval = BinaryData.FromString("\"always\""),
                    },
                };
    
                return mcpTools;
            }
            // </define_mcp_servers>
    
            // <configure_session>
            private async Task SetupSessionAsync(CancellationToken cancellationToken)
            {
                _logger.LogInformation("Setting up session with MCP tools...");
    
                var azureVoice = new AzureStandardVoice(_voice);
                var turnDetection = new ServerVadTurnDetection
                {
                    Threshold = 0.5f,
                    PrefixPadding = TimeSpan.FromMilliseconds(300),
                    SilenceDuration = TimeSpan.FromMilliseconds(500)
                };
    
                // Create session options and add MCP servers to the tools list
                var sessionOptions = new VoiceLiveSessionOptions
                {
                    InputAudioEchoCancellation = new AudioEchoCancellation(),
                    Model = _model,
                    Instructions = _instructions,
                    Voice = azureVoice,
                    InputAudioFormat = InputAudioFormat.Pcm16,
                    OutputAudioFormat = OutputAudioFormat.Pcm16,
                    TurnDetection = turnDetection
                };
    
                // Enable input audio transcription so we receive
                // SessionUpdateConversationItemInputAudioTranscriptionCompleted events
                // (required for the voice-based approval flow).
                sessionOptions.InputAudioTranscription = new AudioInputTranscriptionOptions(
                    _model.Contains("realtime", StringComparison.OrdinalIgnoreCase) ? "whisper-1" : "azure-speech");
    
                sessionOptions.Modalities.Clear();
                sessionOptions.Modalities.Add(InteractionModality.Text);
                sessionOptions.Modalities.Add(InteractionModality.Audio);
    
                // Add MCP servers to the tools list
                var mcpServers = DefineMCPServers();
                foreach (var tool in mcpServers)
                {
                    sessionOptions.Tools.Add(tool);
                }
    
                // Track which servers require approval for per-turn loop prevention.
                // Keep this set in sync with the servers that DefineMCPServers configures with "always".
                _approvalServers = new HashSet<string> { "azure_doc" };
    
                await _session!.ConfigureSessionAsync(sessionOptions, cancellationToken).ConfigureAwait(false);
                _logger.LogInformation("Session with MCP tools configured");
            }
            // </configure_session>
    
            private async Task ProcessEventsAsync(CancellationToken cancellationToken)
            {
                try
                {
                    await foreach (SessionUpdate serverEvent in _session!.GetUpdatesAsync(cancellationToken).ConfigureAwait(false))
                    {
                        await HandleSessionUpdateAsync(serverEvent, cancellationToken).ConfigureAwait(false);
                    }
                }
                catch (OperationCanceledException) { }
            }
    
            // <handle_mcp_events>
            private async Task HandleSessionUpdateAsync(SessionUpdate serverEvent, CancellationToken cancellationToken)
            {
                switch (serverEvent)
                {
                    case SessionUpdateSessionUpdated sessionUpdated:
                        _logger.LogInformation("Session updated");
                        WriteLog($"SessionID: {sessionUpdated.Session?.Id}");
                        WriteLog($"Model: {_model}");
                        WriteLog($"Voice: {_voice}");
                        WriteLog("");
                        if (_audioProcessor != null)
                            await _audioProcessor.StartCaptureAsync().ConfigureAwait(false);
                        break;
    
                    case SessionUpdateInputAudioBufferSpeechStarted:
                        Console.WriteLine("🎤 Listening...");
                        if (_audioProcessor != null)
                            await _audioProcessor.StopPlaybackAsync().ConfigureAwait(false);
                        if (_responseActive && _canCancelResponse)
                        {
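                            // Best effort: both calls may fail if the response has already finished,
                            // so failures are deliberately ignored here.
                            try { await _session!.CancelResponseAsync(cancellationToken).ConfigureAwait(false); }
                            catch { }
                            try { await _session!.ClearStreamingAudioAsync(cancellationToken).ConfigureAwait(false); }
                            catch { }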
                        }
                        // Do NOT reset _approvalCallCount here — the counter should only
                        // reset on task completion (in MCP-call-completed when no pending/queued
                        // approvals remain) or on explicit denial (in ResolveVoiceApprovalAsync).
                        // Resetting on every speech-start would let the model retry denied calls.
    
                        // Clear deferred response flags if no MCP calls are in progress.
                        // Prevents stale needsResponseCreate from re-triggering result playback
                        // after the user interrupts.
                        if (_mcpCallInProgress <= 0)
                        {
                            _needsResponseCreate = false;
                            _mcpResultsPending = false;
                        }
    
                        // Reset approved-servers-this-turn when user starts a new topic
                        if (_pendingApproval == null && _mcpCallInProgress <= 0)
                            _approvedServersThisTurn.Clear();
    
                        // If an MCP call is running, ask the user if they want to wait or skip
                        if (_mcpCallInProgress > 0 && _pendingApproval == null)
                        {
                            foreach (var id in _activeMcpItems) _staleMcpItems.Add(id);
                            _logger.LogInformation("User spoke during MCP call — marking {Count} calls as stale", _activeMcpItems.Count);
                            try
                            {
                                await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                                {
                                    type = "conversation.item.create",
                                    item = new
                                    {
                                        type = "message",
                                        role = "system",
                                        content = new[] { new { type = "input_text", text = "A tool call is still running in the background. The user just spoke. Respond to what the user said. If a tool result arrives later, briefly introduce it as a late result from an earlier request." } }
                                    }
                                }), cancellationToken).ConfigureAwait(false);
                            }
                            catch (Exception ex) { _logger.LogWarning("Failed to inject MCP status update: {Error}", ex.Message); }
                        }
                        break;
    
                    case SessionUpdateInputAudioBufferSpeechStopped:
                        Console.WriteLine("🤔 Processing...");
                        if (_audioProcessor != null)
                            await _audioProcessor.StartPlaybackAsync().ConfigureAwait(false);
                        break;
    
                    case SessionUpdateResponseCreated:
                        _responseActive = true;
                        _canCancelResponse = true;
                        break;
    
                    case SessionUpdateResponseAudioDelta audioDelta:
                        if (audioDelta.Delta != null && _audioProcessor != null)
                            await _audioProcessor.QueueAudioAsync(audioDelta.Delta.ToArray()).ConfigureAwait(false);
                        break;
    
                    case SessionUpdateResponseAudioDone:
                        Console.WriteLine("🎤 Ready for next input...");
                        break;
    
                    case SessionUpdateResponseDone:
                        _responseActive = false;
                        _canCancelResponse = false;
                        WriteLog("--- Response complete ---");
                        // If an approval prompt needs to be injected, do it now
                        if (_approvalPromptNeeded && _pendingApproval != null)
                        {
                            _approvalPromptNeeded = false;
                            await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
                        }
                        // If MCP results are pending and all calls are now done, create response
                        else if (_mcpResultsPending && _mcpCallInProgress <= 0 && _pendingApproval == null)
                        {
                            _mcpResultsPending = false;
                            try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
                            catch { }
                        }
                        else if (_needsResponseCreate)
                        {
                            _needsResponseCreate = false;
                            try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
                            catch { }
                        }
                        break;
    
                    case SessionUpdateError errorEvent:
                        var msg = errorEvent.Error?.Message ?? "";
                        if (!msg.Contains("no active response", StringComparison.OrdinalIgnoreCase))
                        {
                            // Suppress non-fatal interim/collision errors
                            if (msg.Contains("interim response", StringComparison.OrdinalIgnoreCase))
                            {
                                _logger.LogWarning("Interim response not supported with this model pipeline (non-fatal)");
                            }
                            else if (msg.Contains("active response", StringComparison.OrdinalIgnoreCase))
                            {
                                _logger.LogDebug("Response collision (expected during MCP flow): {Message}", msg);
                            }
                            else
                            {
                                Console.WriteLine($"❌ Error: {msg}");
                                WriteLog($"ERROR: {msg}");
                            }
                        }
                        _responseActive = false;
                        _canCancelResponse = false;
                        break;
    
                    // Transcription event — used for voice-based approval resolution
                    case SessionUpdateConversationItemInputAudioTranscriptionCompleted transcription:
                        var transcript = transcription.Transcript ?? "";
                        _logger.LogInformation("User said: {Transcript}", transcript);
                        Console.WriteLine($"👤 You said:\t{transcript}");
                        WriteLog($"User Input:\t{transcript}");
                        if (_pendingApproval != null)
                        {
                            await ResolveVoiceApprovalAsync(transcript, cancellationToken).ConfigureAwait(false);
                        }
                        break;
    
                    // MCP-specific events
                    case SessionUpdateMcpListToolsCompleted mcpListDone:
                        Console.WriteLine("🔧 MCP tools discovered successfully");
                        WriteLog("MCP tools discovered successfully");
                        _logger.LogInformation("MCP tools discovered for server");
                        break;
    
                    case SessionUpdateMcpListToolsFailed:
                        Console.WriteLine("❌ MCP tool discovery failed");
                        WriteLog("ERROR: MCP tool discovery failed");
                        break;
    
                    case SessionUpdateResponseMcpCallInProgress mcpInProgress:
                        Console.WriteLine("⏳ MCP tool call in progress...");
                        WriteLog($"MCP call in progress: {mcpInProgress.ItemId}");
                        _mcpCallInProgress++;
                        _activeMcpItems.Add(mcpInProgress.ItemId ?? "");
                        StartMcpStallTimer(cancellationToken);
                        break;
    
                    case SessionUpdateResponseMcpCallCompleted mcpCompleted:
                    {
                        var itemId = mcpCompleted.ItemId ?? "";
                        _mcpCallInProgress = Math.Max(0, _mcpCallInProgress - 1);
                        _activeMcpItems.Remove(itemId);
                        // Stop the stall timer only once no MCP calls remain in flight.
                        if (_mcpCallInProgress <= 0)
                            CancelMcpStallTimer();
                        if (_handledMcpCompletions.Contains(itemId))
                        {
                            _logger.LogDebug("Ignoring duplicate MCP completion for {ItemId}", itemId);
                        }
                        else
                        {
                            _handledMcpCompletions.Add(itemId);
                            bool isStale = _staleMcpItems.Remove(itemId);
                            _logger.LogInformation("MCP call completed for {ItemId} (stale={IsStale})", itemId, isStale);
                            Console.WriteLine("✅ MCP tool call completed successfully");
                            WriteLog($"MCP call completed: {itemId} (stale={isStale})");
    
                            // Clean up item mapping
                            _mcpItemToServer.Remove(itemId);
    
                            // Reset approval counter if no more approvals are pending
                            if (_pendingApproval == null && _approvalQueue.Count == 0)
                                _approvalCallCount.Clear();
    
                            // If the user moved on during this call, tell the model it's a late result
                            if (isStale)
                            {
                                try
                                {
                                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                                    {
                                        type = "conversation.item.create",
                                        item = new
                                        {
                                            type = "message",
                                            role = "system",
                                            content = new[] { new { type = "input_text", text = "This tool result is from an earlier request. The user has since moved on. Briefly introduce it as a late result, e.g. 'By the way, those results from earlier just came in...' then share the key findings concisely." } }
                                        }
                                    }), cancellationToken).ConfigureAwait(false);
                                }
                                catch (Exception ex) { _logger.LogWarning("Failed to inject late-result context: {Error}", ex.Message); }
                            }
    
                            // Batch response: only call response.create when ALL MCP calls for this
                            // turn have completed. This prevents partial results and repeated tool calls.
                            if (_pendingApproval == null && _approvalQueue.Count == 0 && _mcpCallInProgress <= 0)
                            {
                                try
                                {
                                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
                                }
                                catch (Exception ex)
                                {
                                    if (ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                                        _needsResponseCreate = true;
                                    else
                                        _logger.LogWarning("Failed to create response after MCP call: {Error}", ex.Message);
                                }
                            }
                            else
                            {
                                _mcpResultsPending = true;
                                _logger.LogInformation("MCP calls still in progress ({Count}) — deferring response", _mcpCallInProgress);
                            }
                        }
                        break;
                    }
    
                    case SessionUpdateResponseMcpCallFailed mcpFailed:
                    {
                        var failedItemId = mcpFailed.ItemId ?? "";
                        Console.WriteLine("❌ MCP tool call failed");
                        WriteLog($"ERROR: MCP call failed: {failedItemId}");
                        _mcpCallInProgress = Math.Max(0, _mcpCallInProgress - 1);
                        _activeMcpItems.Remove(failedItemId);
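                        _staleMcpItems.Remove(failedItemId);
                        // Drop the item-to-server mapping, as the completed path does.
                        _mcpItemToServer.Remove(failedItemId);
                        // Stop the stall timer only once no MCP calls remain in flight.
                        if (_mcpCallInProgress <= 0)
                            CancelMcpStallTimer();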
                        try { await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false); }
                        catch { }
                        break;
                    }
    
                    case SessionUpdateConversationItemCreated itemCreated
                        when itemCreated.Item is SessionResponseMcpApprovalRequestItem mcpApproval:
                        await HandleMCPApprovalAsync(mcpApproval, cancellationToken).ConfigureAwait(false);
                        break;
    
                    case SessionUpdateConversationItemCreated itemCreated:
                        _logger.LogDebug("Conversation item created: {ItemType}", itemCreated.Item?.GetType().Name);
                        // Track mcp_call items for server mapping and announce non-approval tool calls
                        if (itemCreated.Item is SessionResponseMcpCallItem mcpCallItem)
                        {
                            var serverLabel = mcpCallItem.ServerLabel ?? "";
                            var functionName = mcpCallItem.Name ?? "";
                            var mcpItemId = mcpCallItem.Id ?? "";
                            _logger.LogInformation("MCP Call triggered: server_label={Server}, function_name={Function}", serverLabel, functionName);
                            Console.WriteLine($"🔧 MCP tool call: {serverLabel}/{functionName}");
                            if (!string.IsNullOrEmpty(mcpItemId))
                                _mcpItemToServer[mcpItemId] = $"{serverLabel}/{functionName}";
    
                            // Announce the tool call so the user knows something is happening
                            if (_pendingApproval == null && !_approvalServers.Contains(serverLabel))
                            {
                                try
                                {
                                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                                    {
                                        type = "conversation.item.create",
                                        item = new
                                        {
                                            type = "message",
                                            role = "system",
                                            content = new[] { new { type = "input_text", text = "Briefly tell the user you're looking something up. One short sentence only." } }
                                        }
                                    }), cancellationToken).ConfigureAwait(false);
                                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
                                }
                                catch (Exception ex)
                                {
                                    if (!ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                                        _logger.LogWarning("Failed to create tool announcement: {Error}", ex.Message);
                                }
                            }
                        }
                        break;
    
                    default:
                        _logger.LogDebug("Unhandled event: {EventType}", serverEvent.GetType().Name);
                        break;
                }
            }
            // </handle_mcp_events>
    
            // <handle_approval>
            /// <summary>
            /// Handle MCP approval request by asking the user via voice.
            /// </summary>
            private async Task HandleMCPApprovalAsync(SessionResponseMcpApprovalRequestItem approvalItem, CancellationToken cancellationToken)
            {
                var approvalId = approvalItem.Id;
                var serverLabel = approvalItem.ServerLabel ?? "";
                var toolName = approvalItem.Name ?? "";
    
                if (string.IsNullOrEmpty(approvalId))
                {
                    _logger.LogError("MCP approval item missing ID");
                    return;
                }
    
                // If another approval is already pending, queue this one
                if (_pendingApproval != null)
                {
                    _logger.LogInformation("Queuing approval for {Tool} — another is already pending", toolName);
                    _approvalQueue.Enqueue(new ApprovalInfo(approvalId, serverLabel, toolName));
                    return;
                }
    
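                // Cap approval prompts per server per task. Requests beyond the cap are auto-denied
                // so the model cannot loop on repeated tool calls.
                const int MaxApprovalCallsPerTask = 3;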
                _approvalCallCount.TryGetValue(serverLabel, out var currentCount);
                if (currentCount >= MaxApprovalCallsPerTask)
                {
                    _logger.LogInformation("Auto-denying {Tool} — reached {Count} calls this task", toolName, currentCount);
                    Console.WriteLine($"   Auto-denied: {serverLabel}/{toolName} (max {MaxApprovalCallsPerTask} calls reached)");
                    try
                    {
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                        {
                            type = "conversation.item.create",
                            item = new
                            {
                                type = "mcp_approval_response",
                                approval_request_id = approvalId,
                                approve = false
                            }
                        }), cancellationToken).ConfigureAwait(false);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogWarning("Failed to send auto-deny: {Error}", ex.Message);
                    }
                    return;
                }
    
                // Auto-approve if user already approved this server earlier in the same turn
                if (_approvedServersThisTurn.Contains(serverLabel))
                {
                    _logger.LogInformation("Auto-approving {Tool} — server already approved this turn", toolName);
                    Console.WriteLine($"   Auto-approved: {serverLabel}/{toolName} (already approved this turn)");
                    try
                    {
                        await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                        {
                            type = "conversation.item.create",
                            item = new
                            {
                                type = "mcp_approval_response",
                                approval_request_id = approvalId,
                                approve = true
                            }
                        }), cancellationToken).ConfigureAwait(false);
                    }
                    catch (Exception ex)
                    {
                        _logger.LogWarning("Failed to send auto-approve: {Error}", ex.Message);
                    }
                    return;
                }
    
                _logger.LogInformation("MCP approval request: server={Server} tool={Tool}", serverLabel, toolName);
                Console.WriteLine();
                Console.WriteLine("🔐 MCP Approval Request (voice-based):");
                Console.WriteLine($"   Server: {serverLabel}  Tool: {toolName}");
                WriteLog($"Approval request: server={serverLabel} tool={toolName}");
    
                _pendingApproval = new ApprovalInfo(approvalId, serverLabel, toolName);
    
                if (!_responseActive)
                {
                    await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
                }
                else
                {
                    _approvalPromptNeeded = true;
                }
            }
    
            /// <summary>
            /// Inject a system message asking the model to verbally request permission.
            /// </summary>
            private async Task SendApprovalVoicePromptAsync(CancellationToken cancellationToken)
            {
                var pending = _pendingApproval;
                if (pending == null) return;
    
                var server = pending.ServerLabel;
                _approvalCallCount.TryGetValue(server, out var callCount);
                _approvalCallCount[server] = callCount + 1;
    
                string prompt;
                if (callCount == 0)
                {
                    prompt = "You MUST ask the user for explicit permission before proceeding. "
                           + $"Say exactly: \"I'd like to search the {server} service for information. "
                           + "Do you approve? Please say yes or no.\"";
                }
                else
                {
                    prompt = "You MUST ask the user for permission again. "
                           + "Say exactly: \"I need to do one more search to get complete information. "
                           + "Should I continue? Please say yes or no.\"";
                }
    
                try
                {
                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                    {
                        type = "conversation.item.create",
                        item = new
                        {
                            type = "message",
                            role = "system",
                            content = new[] { new { type = "input_text", text = prompt } }
                        }
                    }), cancellationToken).ConfigureAwait(false);
                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), cancellationToken).ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    _logger.LogWarning("Failed to send approval voice prompt: {Error}", ex.Message);
                }
            }
    
            // <voice_approval_transcription>
            /// <summary>
            /// Interpret the user's spoken response as approval or denial.
            /// </summary>
            private async Task ResolveVoiceApprovalAsync(string transcript, CancellationToken cancellationToken)
            {
                var pending = _pendingApproval;
                if (pending == null) return;
    
                var text = transcript.Trim().ToLowerInvariant();
    
                bool approved = Regex.IsMatch(text, @"\byes\b");
                bool denied = Regex.IsMatch(text, @"\b(no|stop|cancel)\b");
    
                if (!approved && !denied)
                {
                    // Ambiguous — ask again via the deferred prompt mechanism
                    _logger.LogInformation("Ambiguous approval response: {Transcript}", transcript);
                    _approvalPromptNeeded = true;
                    return;
                }
    
                if (approved && denied)
                {
                    // Conflicting signals — treat as denial for safety
                    approved = false;
                }
    
                // Clear the pending state before sending the response
                _pendingApproval = null;
                if (approved)
                    _approvedServersThisTurn.Add(pending.ServerLabel);
                else
                {
                    _approvalCallCount.Clear();
                    _approvedServersThisTurn.Remove(pending.ServerLabel);
                }
    
                try
                {
                    await _session!.SendCommandAsync(BinaryData.FromObjectAsJson(new
                    {
                        type = "conversation.item.create",
                        item = new
                        {
                            type = "mcp_approval_response",
                            approval_request_id = pending.ApprovalId,
                            approve = approved,
                        }
                    }), cancellationToken).ConfigureAwait(false);
                }
                catch (Exception ex)
                {
                    _logger.LogError("Failed to send approval response: {Error}", ex.Message);
                    return;
                }
                _logger.LogInformation("Voice approval resolved: {Approved} for {Tool}", approved, pending.FunctionName);
                Console.WriteLine($"   Voice approval: {(approved ? "Approved ✅" : "Denied ❌")}");
                WriteLog($"Approval resolved: {(approved ? "APPROVED" : "DENIED")} for {pending.ServerLabel}/{pending.FunctionName}");
    
                // Process next queued approval, if any
                await ProcessNextApprovalAsync(cancellationToken).ConfigureAwait(false);
            }
    
            /// <summary>
            /// Pop the next queued approval and ask via voice.
            /// </summary>
            private async Task ProcessNextApprovalAsync(CancellationToken cancellationToken)
            {
                if (_approvalQueue.Count == 0) return;
    
                var next = _approvalQueue.Dequeue();
                _pendingApproval = next;
    
                if (!_responseActive)
                {
                    await SendApprovalVoicePromptAsync(cancellationToken).ConfigureAwait(false);
                }
                else
                {
                    _approvalPromptNeeded = true;
                }
            }
            // </voice_approval_transcription>
            // </handle_approval>
    
            // <mcp_stall_detection>
            private void StartMcpStallTimer(CancellationToken ct)
            {
                CancelMcpStallTimer();
                _mcpStallCts = CancellationTokenSource.CreateLinkedTokenSource(ct);
                var token = _mcpStallCts.Token;
                _ = Task.Run(async () =>
                {
                    int stallCount = 0;
                    while (_mcpCallInProgress > 0 && stallCount < 3)
                    {
                        await Task.Delay(10000, token).ConfigureAwait(false);
                        if (_mcpCallInProgress <= 0 || _session == null)
                            break;
                        stallCount++;
                        // MCP calls cannot be cancelled — only honest status updates are possible.
                        string msg = "The tool call is still running. Briefly reassure the user that you're still waiting for results. One short sentence only.";
                        try
                        {
                            await _session.SendCommandAsync(BinaryData.FromObjectAsJson(new
                            {
                                type = "conversation.item.create",
                                item = new
                                {
                                    type = "message",
                                    role = "system",
                                    content = new[] { new { type = "input_text", text = msg } }
                                }
                            }), token).ConfigureAwait(false);
                            await _session.SendCommandAsync(BinaryData.FromObjectAsJson(new { type = "response.create" }), token).ConfigureAwait(false);
                        }
                        catch (Exception ex)
                        {
                            if (ex.Message.Contains("active response", StringComparison.OrdinalIgnoreCase))
                                _needsResponseCreate = true;
                        }
                    }
                }, token);
            }
    
            private void CancelMcpStallTimer()
            {
                if (_mcpStallCts != null)
                {
                    _mcpStallCts.Cancel();
                    _mcpStallCts.Dispose();
                    _mcpStallCts = null;
                }
            }
            // </mcp_stall_detection>
    
            private static void WriteLog(string message)
            {
                try
                {
                    var logDir = Path.Combine(Directory.GetCurrentDirectory(), "logs");
                    Directory.CreateDirectory(logDir);
                    var logPath = Path.Combine(logDir, LogFilename);
                    File.AppendAllText(logPath, $"[{DateTime.Now:HH:mm:ss}] {message}{Environment.NewLine}");
                }
                catch (IOException) { }
            }
    
            public void Dispose()
            {
                if (_disposed) return;
                CancelMcpStallTimer();
                _audioProcessor?.Dispose();
                _session?.Dispose();
                _disposed = true;
            }
        }
    
        /// <summary>
        /// Audio processor for real-time capture and playback.
        /// Same pattern as ModelQuickstart - handles PCM16 24kHz mono audio.
        /// </summary>
        public class AudioProcessor : IDisposable
        {
            private readonly VoiceLiveSession _session;
            private readonly ILogger<AudioProcessor> _logger;
    
            private const int SampleRate = 24000;
            private const int Channels = 1;
            private const int BitsPerSample = 16;
    
            private WaveInEvent? _waveIn;
            private WaveOutEvent? _waveOut;
            private BufferedWaveProvider? _playbackBuffer;
    
            private bool _isCapturing;
            private bool _isPlaying;
    
            private readonly Channel<byte[]> _audioSendChannel;
            private readonly ChannelWriter<byte[]> _audioSendWriter;
            private readonly ChannelReader<byte[]> _audioSendReader;
            private readonly Channel<byte[]> _audioPlaybackChannel;
            private readonly ChannelWriter<byte[]> _audioPlaybackWriter;
            private readonly ChannelReader<byte[]> _audioPlaybackReader;
    
            private Task? _audioSendTask;
            private Task? _audioPlaybackTask;
            private readonly CancellationTokenSource _cancellationTokenSource;
            private CancellationTokenSource _playbackCancellationTokenSource;
    
            public AudioProcessor(VoiceLiveSession session, ILogger<AudioProcessor> logger)
            {
                _session = session;
                _logger = logger;
    
                _audioSendChannel = Channel.CreateUnbounded<byte[]>();
                _audioSendWriter = _audioSendChannel.Writer;
                _audioSendReader = _audioSendChannel.Reader;
    
                _audioPlaybackChannel = Channel.CreateUnbounded<byte[]>();
                _audioPlaybackWriter = _audioPlaybackChannel.Writer;
                _audioPlaybackReader = _audioPlaybackChannel.Reader;
    
                _cancellationTokenSource = new CancellationTokenSource();
                _playbackCancellationTokenSource = new CancellationTokenSource();
            }
    
            public Task StartCaptureAsync()
            {
                if (_isCapturing) return Task.CompletedTask;
                _isCapturing = true;
    
                _waveIn = new WaveInEvent
                {
                    WaveFormat = new WaveFormat(SampleRate, BitsPerSample, Channels),
                    BufferMilliseconds = 50
                };
    
                _waveIn.DataAvailable += (sender, e) =>
                {
                    if (_isCapturing && e.BytesRecorded > 0)
                    {
                        var audioData = new byte[e.BytesRecorded];
                        Array.Copy(e.Buffer, 0, audioData, 0, e.BytesRecorded);
                        _audioSendWriter.TryWrite(audioData);
                    }
                };
    
                _waveIn.StartRecording();
                _audioSendTask = ProcessAudioSendAsync(_cancellationTokenSource.Token);
                _logger.LogInformation("Started audio capture");
                return Task.CompletedTask;
            }
    
            public Task StartPlaybackAsync()
            {
                if (_isPlaying) return Task.CompletedTask;
                _isPlaying = true;
    
                _waveOut = new WaveOutEvent { DesiredLatency = 100 };
                _playbackBuffer = new BufferedWaveProvider(new WaveFormat(SampleRate, BitsPerSample, Channels))
                {
                    BufferDuration = TimeSpan.FromSeconds(10),
                    DiscardOnBufferOverflow = true
                };
    
                _waveOut.Init(_playbackBuffer);
                _waveOut.Play();
    
                _playbackCancellationTokenSource = new CancellationTokenSource();
                _audioPlaybackTask = ProcessAudioPlaybackAsync();
                _logger.LogInformation("Audio playback ready");
                return Task.CompletedTask;
            }
    
            public async Task StopPlaybackAsync()
            {
                if (!_isPlaying) return;
                _isPlaying = false;
    
                while (_audioPlaybackReader.TryRead(out _)) { }
                _playbackBuffer?.ClearBuffer();
    
                if (_waveOut != null) { _waveOut.Stop(); _waveOut.Dispose(); _waveOut = null; }
                _playbackBuffer = null;
                _playbackCancellationTokenSource.Cancel();
    
                if (_audioPlaybackTask != null)
                {
                    await _audioPlaybackTask.ConfigureAwait(false);
                    _audioPlaybackTask = null;
                }
            }
    
            public async Task QueueAudioAsync(byte[] audioData)
            {
                if (_isPlaying && audioData.Length > 0)
                    await _audioPlaybackWriter.WriteAsync(audioData).ConfigureAwait(false);
            }
    
            public async Task CleanupAsync()
            {
                _isCapturing = false;
                if (_waveIn != null) { _waveIn.StopRecording(); _waveIn.Dispose(); _waveIn = null; }
                _audioSendWriter.TryComplete();
                if (_audioSendTask != null) await _audioSendTask.ConfigureAwait(false);
    
                await StopPlaybackAsync().ConfigureAwait(false);
                _cancellationTokenSource.Cancel();
                _logger.LogInformation("Audio processor cleaned up");
            }
    
            private async Task ProcessAudioSendAsync(CancellationToken ct)
            {
                try
                {
                    await foreach (var audioData in _audioSendReader.ReadAllAsync(ct).ConfigureAwait(false))
                    {
                        try { await _session.SendInputAudioAsync(audioData, ct).ConfigureAwait(false); }
                        catch { }
                    }
                }
                catch (OperationCanceledException) { }
            }
    
            private async Task ProcessAudioPlaybackAsync()
            {
                try
                {
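                    // Dispose the linked source when playback ends so it doesn't leak.
                    using var linkedCts = CancellationTokenSource.CreateLinkedTokenSource(
                        _playbackCancellationTokenSource.Token, _cancellationTokenSource.Token);
                    var ct = linkedCts.Token;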
    
                    await foreach (var audioData in _audioPlaybackReader.ReadAllAsync(ct).ConfigureAwait(false))
                    {
                        if (_playbackBuffer != null && _isPlaying)
                            _playbackBuffer.AddSamples(audioData, 0, audioData.Length);
                    }
                }
                catch (OperationCanceledException) { }
            }
    
            public void Dispose()
            {
                CleanupAsync().Wait();
                _cancellationTokenSource.Dispose();
            }
        }
    }
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Build and run the application:

    dotnet run
    
  4. Speak into your microphone. Try asking questions like "What tools do you have?" or "Search the Azure documentation for Voice Live API."

    • For the deepwiki server (RequireApproval = "never"), tool calls execute automatically.
    • For the azure_doc server (RequireApproval = "always"), the assistant asks you aloud to approve each tool call. Answer yes or no.
  5. Press Ctrl+C to stop the session.

MCP server configuration reference

Parameter | Required | Description
ServerLabel | Yes | Display name for the MCP server.
ServerUrl | Yes | URL of the remote MCP endpoint.
AllowedTools | No | List of tool names the model can call. If omitted, all tools are allowed.
RequireApproval | No | "never", "always" (default), or a per-tool dictionary.
Headers | No | Extra HTTP headers to include in MCP requests.
Authorization | No | Authorization token for MCP requests.

For the complete REST API type definition, see MCPTool in the Voice Live API reference.

Learn how to connect remote MCP servers to a Voice Live session using the VoiceLive SDK for Java. This article builds on the Quickstart: Create a Voice Live real-time voice agent with MCP server integration.

Reference documentation | Package (Maven) | Additional samples on GitHub

Follow the how-to below or get the full sample code:

Prerequisites

Tip

To use Voice Live with MCP, you don't need to deploy an audio model with your Foundry resource. Voice Live is fully managed, and the model is automatically deployed for you. For more information about model availability, see the Voice Live overview documentation.

Prepare the environment

Complete the Voice Live quickstart to set up your environment, configure authentication, and test your first Voice Live conversation.

MCP integration concepts

MCP server definition

Use the MCPServer type to declare each remote MCP endpoint. At minimum, provide serverLabel (a display name) and serverUrl (the MCP endpoint URL). Optionally restrict available tools with allowedTools and configure the approval mode.

Approval modes

Control whether MCP tool calls require user approval before execution:

  • requireApproval("never"): The tool executes automatically when the model invokes it.
  • requireApproval("always") (default): The client receives an approval request and must respond before the tool runs.
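
The approval response is a small JSON command. The following is a minimal sketch rather than SDK code: the class ApprovalResponses, its helper methods, and the request ID req_123 are hypothetical. The payload shape (a conversation.item.create command wrapping an mcp_approval_response item with approval_request_id and approve) mirrors the commands sent by the sample in this article. The sketch escapes the request ID so that an unexpected quote or backslash can't produce malformed JSON.

```java
class ApprovalResponses {

    // Escape characters that would break a JSON string literal.
    static String escapeJson(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build the command that approves or denies a pending MCP tool call.
    static String buildApprovalResponse(String approvalRequestId, boolean approve) {
        return "{\"type\":\"conversation.item.create\",\"item\":"
            + "{\"type\":\"mcp_approval_response\","
            + "\"approval_request_id\":\"" + escapeJson(approvalRequestId) + "\","
            + "\"approve\":" + approve + "}}";
    }

    public static void main(String[] args) {
        // Hypothetical request ID, for illustration only.
        System.out.println(buildApprovalResponse("req_123", true));
    }
}
```

The resulting string could then be passed to session.send(BinaryData.fromString(...)), as the sample does for its approve and deny commands.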

API version requirement

MCP support requires API version 2026-04-10 or later.

Define MCP servers

Define the MCP servers that Voice Live can use during the session. Each server is an MCPServer instance added to the tools list in the session configuration.

The following code defines two MCP servers: one with automatic tool execution and one that requires user approval before running.

/**
 * Define MCP servers that Voice Live can use during the session.
 * Each server is an MCPServer instance added to the session options tools list.
 */
private static List<VoiceLiveToolDefinition> defineMCPServers() {
    List<VoiceLiveToolDefinition> mcpTools = new ArrayList<>();

    mcpTools.add(new MCPServer("deepwiki", "https://mcp.deepwiki.com/mcp")
        .setAllowedTools(Arrays.asList("read_wiki_structure", "ask_question"))
        .setRequireApproval(BinaryData.fromString("never")));

    mcpTools.add(new MCPServer("azure_doc", "https://learn.microsoft.com/api/mcp")
        .setRequireApproval(BinaryData.fromString("always")));

    return mcpTools;
}

In this sample:

  • The deepwiki server allows only read_wiki_structure and ask_question tools, with requireApproval set to "never" for automatic execution.
  • The azure_doc server allows all tools on the endpoint, with requireApproval set to "always" so users can review each call before execution.
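
The allow-list rule described above can be modeled in a few lines. This is only an illustration of the documented behavior, not SDK code: Voice Live applies the filter on the service side, and the class AllowedToolsSketch, the method isToolAllowed, and the tool name read_wiki_contents are hypothetical names used for the example. The sketch treats an omitted (null) list as "all tools allowed" and makes no claim about how the service treats an empty list.

```java
import java.util.Arrays;
import java.util.List;

class AllowedToolsSketch {

    // A null (omitted) allow-list means every discovered tool is callable.
    static boolean isToolAllowed(List<String> allowedTools, String toolName) {
        return allowedTools == null || allowedTools.contains(toolName);
    }

    public static void main(String[] args) {
        List<String> deepwiki = Arrays.asList("read_wiki_structure", "ask_question");
        System.out.println(isToolAllowed(deepwiki, "ask_question"));       // true
        System.out.println(isToolAllowed(deepwiki, "read_wiki_contents")); // false
        System.out.println(isToolAllowed(null, "any_tool"));               // true
    }
}
```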

Configure the session with MCP tools

Pass the MCP server definitions to the session options tools list alongside your voice, modality, and turn-detection settings.

/**
 * Create session configuration with MCP servers in the tools list.
 */
private static VoiceLiveSessionOptions createSessionOptions(Config config) {
    ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
        .setThreshold(0.5)
        .setPrefixPaddingMs(300)
        .setSilenceDurationMs(500)
        .setInterruptResponse(true)
        .setAutoTruncate(true)
        .setCreateResponse(true);

    // Enable input audio transcription so we receive user speech as text
    AudioInputTranscriptionOptionsModel transcriptionModel = config.model.toLowerCase().contains("realtime")
        ? AudioInputTranscriptionOptionsModel.WHISPER_1
        : AudioInputTranscriptionOptionsModel.fromString("azure-speech");
    AudioInputTranscriptionOptions transcriptionOptions =
        new AudioInputTranscriptionOptions(transcriptionModel);

    VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
        .setInstructions(config.instructions)
        .setVoice(BinaryData.fromObject(new AzureStandardVoice(config.voice)))
        .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
        .setInputAudioFormat(InputAudioFormat.PCM16)
        .setOutputAudioFormat(OutputAudioFormat.PCM16)
        .setInputAudioSamplingRate(SAMPLE_RATE)
        .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
        .setInputAudioEchoCancellation(new AudioEchoCancellation())
        .setInputAudioTranscription(transcriptionOptions)
        .setTurnDetection(turnDetection);

    // Add MCP servers to the tools list
    List<VoiceLiveToolDefinition> mcpServers = defineMCPServers();
    options.setTools(mcpServers);

    return options;
}

In this sample:

  • VoiceLiveSessionOptions bundles MCP tools with audio format, voice, and turn detection settings.
  • The session configuration is sent to Voice Live after connecting.
  • Voice Live automatically discovers available tools from each MCP server after the session starts.

Handle MCP events

Process MCP-specific events in the event loop. The key events include MCP tool call creation, completion, failure, and approval requests.

/**
 * Handle incoming server events, including MCP-specific events
 * and voice-based approval flow.
 */
private static void handleServerEvent(SessionUpdate event, AudioProcessor audioProcessor,
                                       SessionState state, VoiceLiveSessionAsyncClient session) {
    ServerEventType eventType = event.getType();

    try {
        if (eventType == ServerEventType.SESSION_UPDATED) {
            System.out.println("✓ Session updated - starting microphone");
            writeLog("Session updated");
            audioProcessor.startCapture();

        } else if (eventType == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED) {
            System.out.println("🎤 Listening...");
            audioProcessor.skipPendingAudio();

            // Cancel any active response — prevents duplicate result playback
            // when the user interrupts during MCP result speech (matches C#/Python/JS)
            if (state.responseActive) {
                session.send(BinaryData.fromString("{\"type\":\"response.cancel\"}"))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err -> {});
            }

            // Clear deferred response flags if no MCP calls are in progress.
            // Without this, a stale needsResponseCreate from a collision during
            // the approval flow causes the model to re-speak results after the
            // user interrupts.
            if (state.mcpCallInProgress.get() <= 0) {
                state.needsResponseCreate = false;
                state.mcpResultsPending = false;
            }

            // Reset approved-servers-this-turn when user starts a new topic
            if (state.pendingApproval == null && state.mcpCallInProgress.get() <= 0) {
                state.approvedServersThisTurn.clear();
            }

            // If an MCP call is running and no approval is pending, mark as stale
            if (state.mcpCallInProgress.get() > 0 && state.pendingApproval == null) {
                state.staleMcpItems.addAll(state.activeMcpItems);
                System.out.println("[barge-in] Marking " + state.activeMcpItems.size() + " MCP calls as stale");
                sendSystemMessage(session,
                    "A tool call is still running in the background. The user just spoke. "
                    + "Respond to what the user said. If a tool result arrives later, "
                    + "briefly introduce it as a late result from an earlier request.")
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err -> {});
            }

        } else if (eventType == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED) {
            System.out.println("🤔 Processing...");

        } else if (eventType == ServerEventType.RESPONSE_CREATED) {
            state.responseActive = true;

        } else if (eventType == ServerEventType.RESPONSE_AUDIO_DELTA) {
            if (event instanceof SessionUpdateResponseAudioDelta) {
                SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
                byte[] audioData = audioEvent.getDelta();
                if (audioData != null && audioData.length > 0) {
                    audioProcessor.queueAudio(audioData);
                }
            }

        } else if (eventType == ServerEventType.RESPONSE_AUDIO_DONE) {
            System.out.println("🎤 Ready for next input...");

        } else if (eventType == ServerEventType.RESPONSE_DONE) {
            state.responseActive = false;
            System.out.println("✅ Response complete");
            writeLog("--- Response complete ---");

            // If an approval prompt needs to be injected, do it now
            if (state.approvalPromptNeeded && state.pendingApproval != null) {
                state.approvalPromptNeeded = false;
                sendApprovalVoicePrompt(state, session);
            // If MCP results are pending and all calls are now done, create response
            } else if (state.mcpResultsPending && state.mcpCallInProgress.get() <= 0 && state.pendingApproval == null) {
                state.mcpResultsPending = false;
                try {
                    session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe(v -> {}, err -> {});
                } catch (Exception e) {
                    // best-effort
                }
            } else if (state.needsResponseCreate) {
                // Deferred response.create — retry now that no response is active
                state.needsResponseCreate = false;
                try {
                    session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe(v -> {}, err -> {});
                } catch (Exception e) {
                    // best-effort retry
                }
            }

        } else if (eventType == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED) {
            String eventJson = BinaryData.fromObject(event).toString();
            String transcript = extractJsonField(eventJson, "transcript");
            System.out.println("👤 You said:\t" + transcript);
            writeLog("User Input:\t" + transcript);

            // Interpret as an approval answer if we have a pending approval
            if (state.pendingApproval != null) {
                resolveVoiceApproval(transcript, state, session);
            }

        } else if (eventType == ServerEventType.ERROR) {
            // Reset response state — errors can terminate a response without RESPONSE_DONE
            state.responseActive = false;
            if (event instanceof SessionUpdateError) {
                String msg = ((SessionUpdateError) event).getError().getMessage();
                // Compare case-insensitively and tolerate a missing message.
                String lowerMsg = msg == null ? "" : msg.toLowerCase();
                if (lowerMsg.contains("no active response")) {
                    // suppress
                } else if (lowerMsg.contains("interim response")) {
                    // non-fatal
                } else if (lowerMsg.contains("active response")) {
                    // expected during MCP flow
                } else {
                    System.out.println("❌ Error: " + msg);
                    writeLog("ERROR: " + msg);
                }
            }

        // MCP-specific events
        } else if (eventType == ServerEventType.MCP_LIST_TOOLS_COMPLETED) {
            System.out.println("🔧 MCP tools discovered successfully");
            writeLog("MCP tools discovered successfully");

        } else if (eventType == ServerEventType.MCP_LIST_TOOLS_FAILED) {
            System.out.println("❌ MCP tool discovery failed");
            writeLog("ERROR: MCP tool discovery failed");

        } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_IN_PROGRESS) {
            System.out.println("⏳ MCP tool call in progress...");
            writeLog("MCP call in progress");
            state.mcpCallInProgress.incrementAndGet();
            String inProgressJson = BinaryData.fromObject(event).toString();
            String inProgressItemId = extractJsonField(inProgressJson, "item_id");
            if (inProgressItemId != null) state.activeMcpItems.add(inProgressItemId);
            startMcpStallTimer(state, session);

        } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_COMPLETED) {
            String eventJson = BinaryData.fromObject(event).toString();
            String itemId = extractJsonField(eventJson, "item_id");
            state.mcpCallInProgress.updateAndGet(v -> Math.max(0, v - 1));
            if (itemId != null) state.activeMcpItems.remove(itemId);
            cancelMcpStallTimer(state);

            if (state.handledMcpCompletions.contains(itemId)) {
                // duplicate — ignore
            } else {
                state.handledMcpCompletions.add(itemId);
                boolean isStale = itemId != null && state.staleMcpItems.remove(itemId);
                System.out.println("✅ MCP tool call completed (stale=" + isStale + ")");
                writeLog("MCP call completed: " + itemId + " (stale=" + isStale + ")");
                state.mcpItemToServer.remove(itemId);

                // Reset approval counter if no more approvals pending
                if (state.pendingApproval == null && state.approvalQueue.isEmpty()) {
                    state.approvalCallCount.clear();
                }

                // If the user moved on during this call, tell the model it's a late result.
                // Chain any late-result context message with the response.create below
                // to ensure the system message arrives first.
                Mono<Void> preResponseMono = Mono.empty();
                if (isStale) {
                    preResponseMono = sendSystemMessage(session,
                        "This tool result is from an earlier request. The user has "
                        + "since moved on. Briefly introduce it as a late result, e.g. "
                        + "'By the way, those results from earlier just came in...' "
                        + "then share the key findings concisely.");
                }

                // Batch response: only call response.create when ALL MCP calls for this
                // turn have completed. This prevents partial results and repeated tool calls.
                if (state.pendingApproval == null && state.approvalQueue.isEmpty()
                        && state.mcpCallInProgress.get() <= 0) {
                    preResponseMono
                        .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe(v -> {}, err -> {
                            String errMsg = err.getMessage();
                            if (errMsg != null && errMsg.toLowerCase().contains("active response")) {
                                state.needsResponseCreate = true;
                            }
                        });
                } else {
                    preResponseMono
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe(v -> {}, err -> {});
                    state.mcpResultsPending = true;
                    System.out.println("[mcp] MCP calls still in progress (" + state.mcpCallInProgress.get() + ") or approval pending — deferring response");
                }
            }

        } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_FAILED) {
            System.out.println("❌ MCP tool call failed");
            writeLog("ERROR: MCP tool call failed");
            String failedJson = BinaryData.fromObject(event).toString();
            String failedItemId = extractJsonField(failedJson, "item_id");
            state.mcpCallInProgress.updateAndGet(v -> Math.max(0, v - 1));
            if (failedItemId != null) {
                state.activeMcpItems.remove(failedItemId);
                state.staleMcpItems.remove(failedItemId);
            }
            cancelMcpStallTimer(state);
            try {
                session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err -> {});
            } catch (Exception e) {
                // best effort
            }

        } else if (eventType == ServerEventType.CONVERSATION_ITEM_CREATED) {
            handleMCPConversationItem(event, state, session);
        }
    } catch (Exception e) {
        System.err.println("❌ Error handling event: " + e.getMessage());
    }
}
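
The handler above defers response.create whenever a response is already active and retries once RESPONSE_DONE arrives. The following sketch isolates that idea in a small, self-contained class. It's a simplification under stated assumptions: the class ResponseGate and its method names are hypothetical, and it checks the active flag before sending, whereas the sample sends the command and sets needsResponseCreate when the service reports an "active response" error.

```java
class ResponseGate {
    private boolean responseActive;
    private boolean createDeferred;

    // Call when the service reports that a response has started.
    synchronized void onResponseCreated() {
        responseActive = true;
    }

    // Ask to start a response. Returns true if response.create can be sent now.
    // Otherwise the request is remembered until the active response finishes.
    synchronized boolean requestCreate() {
        if (responseActive) {
            createDeferred = true;
            return false;
        }
        return true;
    }

    // Call when the active response finishes. Returns true if a deferred
    // response.create should be sent now.
    synchronized boolean onResponseDone() {
        responseActive = false;
        boolean sendNow = createDeferred;
        createDeferred = false;
        return sendNow;
    }

    public static void main(String[] args) {
        ResponseGate gate = new ResponseGate();
        gate.onResponseCreated();
        System.out.println(gate.requestCreate());  // false: a response is active
        System.out.println(gate.onResponseDone()); // true: send the deferred create
        System.out.println(gate.onResponseDone()); // false: nothing left to send
        System.out.println(gate.requestCreate());  // true: the session is idle
    }
}
```

Keeping the two flags behind one synchronized object is a design choice for the sketch: subscriber callbacks in the sample run on Schedulers.boundedElastic(), so the flags can be touched from more than one thread.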

Handle approval requests

When a server is configured with requireApproval("always"), client code must handle the approval flow. Rather than blocking the event handler on Scanner.nextLine(), inject a system message so the model asks the user aloud, then parse the transcribed reply as an approval or a denial.

/**
 * Handle MCP conversation items: approval requests, tool call announcements,
 * and item-to-server tracking.
 */
private static void handleMCPConversationItem(SessionUpdate event, SessionState state,
                                                VoiceLiveSessionAsyncClient session) {
    String eventJson = BinaryData.fromObject(event).toString();

    if (eventJson.contains("mcp_approval_request")) {
        // Extract approval details
        String approvalId = extractJsonField(eventJson, "id");
        String serverLabel = extractJsonField(eventJson, "server_label");
        String functionName = extractJsonField(eventJson, "name");

        if ("unknown".equals(approvalId)) {
            return;
        }

        final int MAX_APPROVAL_CALLS_PER_TASK = 3;
        int currentCount = state.approvalCallCount.getOrDefault(serverLabel, 0);
        if (currentCount >= MAX_APPROVAL_CALLS_PER_TASK) {
            System.out.println("   Auto-denied: " + serverLabel + "/" + functionName
                + " (max " + MAX_APPROVAL_CALLS_PER_TASK + " calls reached)");
            try {
                String denyJson = String.format(
                    "{\"type\":\"conversation.item.create\",\"item\":"
                    + "{\"type\":\"mcp_approval_response\","
                    + "\"approval_request_id\":\"%s\","
                    + "\"approve\":false}}",
                    approvalId);
                session.send(BinaryData.fromString(denyJson))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err ->
                        System.err.println("Failed to send auto-deny: " + err.getMessage()));
            } catch (Exception e) {
                System.err.println("Failed to send auto-deny: " + e.getMessage());
            }
            return;
        }

        // Auto-approve if user already approved this server earlier in the same turn
        if (state.approvedServersThisTurn.contains(serverLabel)) {
            System.out.println("   Auto-approved: " + serverLabel + "/" + functionName
                + " (already approved this turn)");
            try {
                String approveJson = String.format(
                    "{\"type\":\"conversation.item.create\",\"item\":"
                    + "{\"type\":\"mcp_approval_response\","
                    + "\"approval_request_id\":\"%s\","
                    + "\"approve\":true}}",
                    approvalId);
                session.send(BinaryData.fromString(approveJson))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err ->
                        System.err.println("Failed to send auto-approve: " + err.getMessage()));
            } catch (Exception e) {
                System.err.println("Failed to send auto-approve: " + e.getMessage());
            }
            return;
        }

        // If another approval is already pending, queue this one
        if (state.pendingApproval != null) {
            state.approvalQueue.add(
                new SessionState.ApprovalInfo(approvalId, serverLabel, functionName));
            return;
        }

        System.out.println();
        System.out.println("🔐 MCP Approval Request (voice-based):");
        System.out.println("   Server: " + serverLabel + "  Tool: " + functionName);
        writeLog("Approval request: server=" + serverLabel + " tool=" + functionName);

        state.pendingApproval =
            new SessionState.ApprovalInfo(approvalId, serverLabel, functionName);

        if (!state.responseActive) {
            sendApprovalVoicePrompt(state, session);
        } else {
            state.approvalPromptNeeded = true;
        }

    } else if (eventJson.contains("\"type\":\"mcp_call\"")) {
        // Track MCP call items and announce non-approval tool calls
        String itemId = extractJsonField(eventJson, "id");
        String serverLabel = extractJsonField(eventJson, "server_label");
        String functionName = extractJsonField(eventJson, "name");
        System.out.println("🔧 MCP tool call: " + serverLabel + "/" + functionName);
        state.mcpItemToServer.put(itemId, serverLabel + "/" + functionName);

        // Announce to the user if this server doesn't require approval
        if (state.pendingApproval == null && !state.approvalServers.contains(serverLabel)) {
            sendSystemMessage(session,
                "Briefly tell the user you're looking something up. One short sentence only.")
                .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                .subscribeOn(Schedulers.boundedElastic())
                .subscribe(v -> {}, err -> {});
        }
    }
}

/**
 * Inject a system message asking the model to verbally request permission.
 */
private static void sendApprovalVoicePrompt(SessionState state,
                                              VoiceLiveSessionAsyncClient session) {
    SessionState.ApprovalInfo pending = state.pendingApproval;
    if (pending == null) return;

    int callCount = state.approvalCallCount.getOrDefault(pending.serverLabel(), 0);
    state.approvalCallCount.put(pending.serverLabel(), callCount + 1);

    String prompt;
    if (callCount == 0) {
        prompt = "You MUST ask the user for explicit permission before proceeding. "
            + "Say exactly: \"I'd like to search the " + pending.serverLabel()
            + " service for information. Do you approve? Please say yes or no.\"";
    } else {
        prompt = "You MUST ask the user for permission again. "
            + "Say exactly: \"I need to do one more search to get complete information. "
            + "Should I continue? Please say yes or no.\"";
    }

    sendSystemMessage(session, prompt)
        .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
        .subscribeOn(Schedulers.boundedElastic())
        .subscribe(v -> {}, err ->
            System.err.println("❌ Failed to send approval voice prompt: " + err.getMessage()));
}

/**
 * Interpret the user's spoken response as approval or denial.
 */
private static void resolveVoiceApproval(String transcript, SessionState state,
                                           VoiceLiveSessionAsyncClient session) {
    SessionState.ApprovalInfo pending = state.pendingApproval;
    if (pending == null) return;

    String text = transcript.trim().toLowerCase();
    boolean approved = YES_PATTERN.matcher(text).find();
    boolean denied = NO_PATTERN.matcher(text).find();

    if (!approved && !denied) {
        // Ambiguous — ask again at next RESPONSE_DONE
        state.approvalPromptNeeded = true;
        return;
    }
    if (approved && denied) {
        approved = false; // conflicting signals — deny for safety
    }

    state.pendingApproval = null;
    if (approved) {
        state.approvedServersThisTurn.add(pending.serverLabel());
    } else {
        state.approvalCallCount.clear();
        state.approvedServersThisTurn.remove(pending.serverLabel());
    }

    System.out.println("   Voice approval: " + (approved ? "Approved ✅" : "Denied ❌"));
    writeLog("Approval resolved: " + (approved ? "APPROVED" : "DENIED") + " for " + pending.serverLabel() + "/" + pending.functionName());

    // Send approval/denial response via raw JSON.
    // Chain processNextApproval after the send completes to avoid racing.
    String approvalJson = String.format(
        "{\"type\":\"conversation.item.create\",\"item\":"
        + "{\"type\":\"mcp_approval_response\","
        + "\"approval_request_id\":\"%s\","
        + "\"approve\":%s}}",
        pending.approvalId(), approved);

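    session.send(BinaryData.fromString(approvalJson))
        .subscribeOn(Schedulers.boundedElastic())
        .subscribe(
            v -> {},
            error -> {
                System.err.println("❌ Failed to send approval response: " + error.getMessage());
                processNextApproval(state, session);
            },
            // Advance the queue in the completion callback: a send that completes
            // without emitting a value never invokes the onNext consumer.
            () -> processNextApproval(state, session)
        );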
}

/**
 * Pop the next queued approval and ask via voice.
 */
private static void processNextApproval(SessionState state,
                                          VoiceLiveSessionAsyncClient session) {
    SessionState.ApprovalInfo next = state.approvalQueue.poll();
    if (next == null) return;

    // Auto-approve if user already approved this server earlier in the same turn
    if (state.approvedServersThisTurn.contains(next.serverLabel())) {
        System.out.println("   Auto-approved (queued): " + next.serverLabel() + "/" + next.functionName());
        String approveJson = String.format(
            "{\"type\":\"conversation.item.create\",\"item\":"
            + "{\"type\":\"mcp_approval_response\","
            + "\"approval_request_id\":\"%s\","
            + "\"approve\":true}}",
            next.approvalId());
        session.send(BinaryData.fromString(approveJson))
            .subscribeOn(Schedulers.boundedElastic())
            .subscribe(
                v -> {},
                err -> {
                    System.err.println("Failed to send queued auto-approve: " + err.getMessage());
                    processNextApproval(state, session);
                },
                // Advance in the completion callback; onNext never fires
                // when the send completes without emitting a value.
                () -> processNextApproval(state, session));
        return;
    }

    state.pendingApproval = next;
    if (!state.responseActive) {
        sendApprovalVoicePrompt(state, session);
    } else {
        state.approvalPromptNeeded = true;
    }
}

In this sample:

  • A system message instructs the model to verbally ask for permission.
  • An mcp_approval_response item (MCPApprovalResponseRequestItem), sent in this sample as raw JSON through conversation.item.create, returns the decision to Voice Live.
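
The queueing behavior in the code above (one approval prompt at a time, later requests queued, and queued requests for an already-approved server approved without a second prompt) can be looked at in isolation from the SDK. The following is a minimal sketch, not part of the sample: ApprovalGate, Request, submit, and resolve are hypothetical names, and the class is single-threaded for clarity, whereas the sample keeps this state in volatile fields and concurrent collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

/** Hypothetical, SDK-free model of the approval queue. Not thread-safe. */
final class ApprovalGate {
    /** One approval request as received from the service. */
    static final class Request {
        final String approvalId;
        final String serverLabel;

        Request(String approvalId, String serverLabel) {
            this.approvalId = approvalId;
            this.serverLabel = serverLabel;
        }
    }

    private final Queue<Request> queue = new ArrayDeque<>();
    private final Set<String> approvedThisTurn = new HashSet<>();
    private Request pending;

    /** Returns true if the caller should prompt the user now, false if the request was queued. */
    boolean submit(Request request) {
        if (pending != null) {
            queue.add(request);
            return false;
        }
        pending = request;
        return true;
    }

    /**
     * Records the user's answer for the pending request, then drains the queue.
     * Returns the IDs of queued requests that were auto-approved because their
     * server was already approved in this turn. Draining stops at the first
     * request that needs its own prompt; that request becomes the new pending one.
     */
    List<String> resolve(boolean approved) {
        List<String> autoApproved = new ArrayList<>();
        if (pending == null) {
            return autoApproved;
        }
        if (approved) {
            approvedThisTurn.add(pending.serverLabel);
        } else {
            approvedThisTurn.remove(pending.serverLabel);
        }
        pending = null;

        Request next;
        while ((next = queue.poll()) != null) {
            if (approvedThisTurn.contains(next.serverLabel)) {
                autoApproved.add(next.approvalId);
            } else {
                pending = next; // needs its own voice prompt
                break;
            }
        }
        return autoApproved;
    }

    /** The request currently waiting for a spoken answer, or null. */
    Request pending() {
        return pending;
    }
}
```

In this sketch, the caller sends an approval response for each ID returned by resolve and starts a new voice prompt whenever pending() is non-null afterward.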

Resolve voice-based approval

Parse the user's spoken transcript to determine approval. Use word-boundary regex to avoid false positives from words like "yesterday" or "nobody".

} else if (eventType == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED) {
    String eventJson = BinaryData.fromObject(event).toString();
    String transcript = extractJsonField(eventJson, "transcript");
    System.out.println("👤 You said:\t" + transcript);
    writeLog("User Input:\t" + transcript);

    // Interpret as an approval answer if we have a pending approval
    if (state.pendingApproval != null) {
        resolveVoiceApproval(transcript, state, session);
    }

In this sample:

  • The transcript from CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED is matched against \byes\b and \b(no|stop|cancel)\b patterns.
  • Subsequent calls to the same server within the same turn are auto-approved to avoid repeated prompts.
  • After a configurable maximum (for example, 3 approvals), further calls are auto-denied and the model responds with what it has.
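
The word-boundary matching described above can be tried on its own. The following is a minimal sketch that reuses the two patterns from the sample; the VoiceApprovalClassifier class and its Decision enum are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that classifies a transcript as approve, deny, or unclear. */
final class VoiceApprovalClassifier {
    enum Decision { APPROVED, DENIED, UNCLEAR }

    // \b anchors the match to whole words, so "yesterday" and "nobody" do not match.
    private static final Pattern YES =
        Pattern.compile("\\byes\\b", Pattern.CASE_INSENSITIVE);
    private static final Pattern NO =
        Pattern.compile("\\b(no|stop|cancel)\\b", Pattern.CASE_INSENSITIVE);

    static Decision classify(String transcript) {
        if (transcript == null) {
            return Decision.UNCLEAR;
        }
        boolean yes = YES.matcher(transcript).find();
        boolean no = NO.matcher(transcript).find();
        if (yes && no) {
            return Decision.DENIED; // conflicting signals: deny for safety
        }
        if (yes) {
            return Decision.APPROVED;
        }
        if (no) {
            return Decision.DENIED;
        }
        return Decision.UNCLEAR; // caller should ask again
    }

    public static void main(String[] args) {
        System.out.println(classify("Yes, go ahead."));
        System.out.println(classify("Nobody called yesterday."));
        System.out.println(classify("Yes... actually no."));
    }
}
```

An UNCLEAR result corresponds to the branch in resolveVoiceApproval that sets approvalPromptNeeded so the question is asked again.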

Detect stalls during MCP tool calls

MCP tool calls can take several seconds. Use a repeating timer to proactively inform the user that the assistant is still waiting for results.

/**
 * Start a timer that verbally updates the user if an MCP call takes too long.
 */
private static void startMcpStallTimer(SessionState state,
                                         VoiceLiveSessionAsyncClient session) {
    cancelMcpStallTimer(state);
    final AtomicInteger stallCount = new AtomicInteger(0);
    state.mcpStallTimer = SCHEDULER.scheduleAtFixedRate(() -> {
        if (state.mcpCallInProgress.get() <= 0) {
            cancelMcpStallTimer(state);
            return;
        }
        int count = stallCount.incrementAndGet();
        if (count > 3) {
            cancelMcpStallTimer(state);
            return;
        }
        // MCP calls cannot be cancelled — only honest status updates are possible.
        String msg = "The tool call is still running. "
            + "Briefly reassure the user that you're still waiting for results. "
            + "One short sentence only.";
        sendSystemMessage(session, msg)
            .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
            .subscribeOn(Schedulers.boundedElastic())
            .subscribe(v -> {}, err -> {
                if (err.getMessage() != null
                    && err.getMessage().toLowerCase().contains("active response")) {
                    state.needsResponseCreate = true;
                }
            });
    }, 10, 10, TimeUnit.SECONDS);
}

/**
 * Cancel the MCP stall timer if running.
 */
private static void cancelMcpStallTimer(SessionState state) {
    ScheduledFuture<?> timer = state.mcpStallTimer;
    if (timer != null && !timer.isDone()) {
        timer.cancel(false);
    }
    state.mcpStallTimer = null;
}

In this sample:

  • A ScheduledExecutorService fires at a 10-second interval, injecting system messages up to 3 times.
  • The timer is cancelled when the MCP call completes or the user interrupts with barge-in.
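
The bounded-reminder pattern can also be separated from the Voice Live session so the stop conditions are easy to test. The following is a minimal sketch under stated assumptions: StallReminder, tick, and remindersSent are hypothetical names, and the reminder action is a plain Runnable standing in for the system message plus response.create that the sample sends.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical bounded reminder: fires at a fixed rate, then cancels itself. */
final class StallReminder {
    private final AtomicInteger sent = new AtomicInteger();
    private final AtomicReference<ScheduledFuture<?>> handle = new AtomicReference<>();
    private final int maxReminders;
    private final BooleanSupplier callInProgress;
    private final Runnable remind;

    StallReminder(int maxReminders, BooleanSupplier callInProgress, Runnable remind) {
        this.maxReminders = maxReminders;
        this.callInProgress = callInProgress;
        this.remind = remind;
    }

    /** One timer tick. Returns false once the timer has stopped itself. */
    boolean tick() {
        if (!callInProgress.getAsBoolean() || sent.get() >= maxReminders) {
            cancel();
            return false;
        }
        sent.incrementAndGet();
        remind.run();
        return true;
    }

    /** Schedules tick() at a fixed rate, replacing any earlier schedule. */
    void start(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        cancel();
        handle.set(scheduler.scheduleAtFixedRate(this::tick, period, period, unit));
    }

    /** Cancels the schedule without interrupting a tick that is already running. */
    void cancel() {
        ScheduledFuture<?> future = handle.getAndSet(null);
        if (future != null) {
            future.cancel(false);
        }
    }

    int remindersSent() {
        return sent.get();
    }
}
```

Calling start(scheduler, 10, TimeUnit.SECONDS) with maxReminders set to 3 would mirror the interval and limit used in the sample. Keeping tick() separate from the scheduling lets the stop conditions be exercised without waiting on a real timer.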

Run the sample

  1. Create the src/main/java/MCPQuickstart.java file with the following code:

    // Copyright (c) Microsoft Corporation. All rights reserved.
    // Licensed under the MIT License.
    
    import com.azure.ai.voicelive.VoiceLiveAsyncClient;
    import com.azure.ai.voicelive.VoiceLiveClientBuilder;
    import com.azure.ai.voicelive.VoiceLiveServiceVersion;
    import com.azure.ai.voicelive.VoiceLiveSessionAsyncClient;
    import com.azure.ai.voicelive.models.AudioEchoCancellation;
    import com.azure.ai.voicelive.models.AudioInputTranscriptionOptions;
    import com.azure.ai.voicelive.models.AudioInputTranscriptionOptionsModel;
    import com.azure.ai.voicelive.models.AudioNoiseReduction;
    import com.azure.ai.voicelive.models.AudioNoiseReductionType;
    import com.azure.ai.voicelive.models.AzureStandardVoice;
    import com.azure.ai.voicelive.models.ClientEventSessionUpdate;
    import com.azure.ai.voicelive.models.InputAudioFormat;
    import com.azure.ai.voicelive.models.InteractionModality;
    import com.azure.ai.voicelive.models.MCPServer;
    import com.azure.ai.voicelive.models.OutputAudioFormat;
    import com.azure.ai.voicelive.models.ServerEventType;
    import com.azure.ai.voicelive.models.ServerVadTurnDetection;
    import com.azure.ai.voicelive.models.SessionUpdate;
    import com.azure.ai.voicelive.models.SessionUpdateError;
    import com.azure.ai.voicelive.models.SessionUpdateResponseAudioDelta;
    import com.azure.ai.voicelive.models.VoiceLiveSessionOptions;
    import com.azure.ai.voicelive.models.VoiceLiveToolDefinition;
    import com.azure.core.credential.KeyCredential;
    import com.azure.core.credential.TokenCredential;
    import com.azure.core.util.BinaryData;
    import com.azure.identity.AzureCliCredentialBuilder;
    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;
    
    import javax.sound.sampled.AudioFormat;
    import javax.sound.sampled.AudioSystem;
    import javax.sound.sampled.DataLine;
    import javax.sound.sampled.LineUnavailableException;
    import javax.sound.sampled.SourceDataLine;
    import javax.sound.sampled.TargetDataLine;
    
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.Queue;
    import java.util.Set;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.regex.Pattern;
    
    /**
     * MCP Quickstart - demonstrates MCP server integration with the VoiceLive SDK.
     * Shows how to define MCP servers, handle MCP tool calls, and implement
     * an approval flow for tool calls that require user consent.
     *
     * <p><strong>Environment Variables Required:</strong></p>
     * <ul>
     *   <li>AZURE_VOICELIVE_ENDPOINT - The VoiceLive service endpoint URL</li>
     *   <li>AZURE_VOICELIVE_API_KEY - The API key (required if not using --use-token-credential)</li>
     * </ul>
     *
     * <p><strong>How to Run:</strong></p>
     * <pre>{@code
     * mvn compile exec:java -Dexec.mainClass="MCPQuickstart" -q
     * }</pre>
     */
    public final class MCPQuickstart {
    
        private static final String DEFAULT_MODEL = "gpt-realtime";
        private static final String DEFAULT_VOICE = "en-US-Ava:DragonHDLatestNeural";
        private static final String DEFAULT_INSTRUCTIONS =
            "You are a helpful AI assistant with access to MCP tools. "
            + "Use the tools to help answer user questions. "
            + "Respond naturally and conversationally. "
            + "Some tools require user approval before they can be used. When you receive a "
            + "system message asking you to request permission, you MUST clearly ask the user "
            + "for their explicit approval before proceeding. Always wait for the user to say "
            + "yes or no. Never skip the approval question or assume permission is granted. "
            + "If a tool result arrives after the conversation has moved to a different topic, "
            + "briefly introduce it as a late result before sharing the findings.";
    
        private static final String ENV_ENDPOINT = "AZURE_VOICELIVE_ENDPOINT";
        private static final String ENV_API_KEY = "AZURE_VOICELIVE_API_KEY";
    
        private static final int SAMPLE_RATE = 24000;
        private static final int CHANNELS = 1;
        private static final int SAMPLE_SIZE_BITS = 16;
        private static final int CHUNK_SIZE = 1200;
        private static final int AUDIO_BUFFER_SIZE_MULTIPLIER = 4;
    
        private MCPQuickstart() {
            throw new UnsupportedOperationException("Utility class");
        }
    
        private static final ScheduledExecutorService SCHEDULER = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "MCP-StallTimer");
            t.setDaemon(true);
            return t;
        });
    
        private static final Pattern YES_PATTERN = Pattern.compile("\\byes\\b", Pattern.CASE_INSENSITIVE);
        private static final Pattern NO_PATTERN = Pattern.compile("\\b(no|stop|cancel)\\b", Pattern.CASE_INSENSITIVE);
    
        private static final String LOG_FILENAME = "conversation_"
                + LocalDateTime.now().format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss")) + ".log";
    
        /**
         * Mutable session state shared across event handlers.
         * All fields are thread-safe (volatile or concurrent collections).
         */
        private static class SessionState {
            volatile ApprovalInfo pendingApproval;
            final Queue<ApprovalInfo> approvalQueue = new ConcurrentLinkedQueue<>();
            volatile boolean approvalPromptNeeded;
            final AtomicInteger mcpCallInProgress = new AtomicInteger(0);
            final Set<String> handledMcpCompletions = ConcurrentHashMap.newKeySet();
            volatile boolean needsResponseCreate;
            final Map<String, Integer> approvalCallCount = new ConcurrentHashMap<>();
            final Map<String, String> mcpItemToServer = new ConcurrentHashMap<>();
            volatile Set<String> approvalServers = Set.of();
            volatile ScheduledFuture<?> mcpStallTimer;
            volatile boolean responseActive;
            final Set<String> activeMcpItems = ConcurrentHashMap.newKeySet();
            final Set<String> staleMcpItems = ConcurrentHashMap.newKeySet();
            volatile boolean mcpResultsPending;
            final Set<String> approvedServersThisTurn = ConcurrentHashMap.newKeySet();
    
            static class ApprovalInfo {
                final String approvalId;
                final String serverLabel;
                final String functionName;
    
                ApprovalInfo(String approvalId, String serverLabel, String functionName) {
                    this.approvalId = approvalId;
                    this.serverLabel = serverLabel;
                    this.functionName = functionName;
                }
    
                String approvalId() { return approvalId; }
                String serverLabel() { return serverLabel; }
                String functionName() { return functionName; }
            }
        }
    
        private static class AudioPlaybackPacket {
            final int sequenceNumber;
            final byte[] audioData;
    
            AudioPlaybackPacket(int sequenceNumber, byte[] audioData) {
                this.sequenceNumber = sequenceNumber;
                this.audioData = audioData;
            }
        }
    
        /**
         * Audio processor for real-time capture and playback.
         */
        private static class AudioProcessor {
            private final VoiceLiveSessionAsyncClient session;
            private final AudioFormat audioFormat;
    
            private TargetDataLine microphone;
            private SourceDataLine speaker;
            private final AtomicBoolean isCapturing = new AtomicBoolean(false);
            private final AtomicBoolean isPlaying = new AtomicBoolean(false);
            private final BlockingQueue<AudioPlaybackPacket> playbackQueue = new LinkedBlockingQueue<>();
            private final AtomicInteger nextSequenceNumber = new AtomicInteger(0);
            private final AtomicInteger playbackBase = new AtomicInteger(0);
    
            AudioProcessor(VoiceLiveSessionAsyncClient session) {
                this.session = session;
                this.audioFormat = new AudioFormat(
                    AudioFormat.Encoding.PCM_SIGNED,
                    SAMPLE_RATE, SAMPLE_SIZE_BITS, CHANNELS,
                    CHANNELS * SAMPLE_SIZE_BITS / 8, SAMPLE_RATE, false
                );
            }
    
            void startCapture() {
                if (isCapturing.get()) return;
    
                try {
                    DataLine.Info micInfo = new DataLine.Info(TargetDataLine.class, audioFormat);
                    microphone = (TargetDataLine) AudioSystem.getLine(micInfo);
                    microphone.open(audioFormat, CHUNK_SIZE * AUDIO_BUFFER_SIZE_MULTIPLIER);
                    microphone.start();
                    isCapturing.set(true);
    
                    Thread captureThread = new Thread(this::captureAudioLoop, "VoiceLive-AudioCapture");
                    captureThread.setDaemon(true);
                    captureThread.start();
                    System.out.println("🎤 Microphone capture started");
                } catch (LineUnavailableException e) {
                    throw new RuntimeException("Failed to initialize microphone", e);
                }
            }
    
            void startPlayback() {
                if (isPlaying.get()) return;
    
                try {
                    DataLine.Info speakerInfo = new DataLine.Info(SourceDataLine.class, audioFormat);
                    speaker = (SourceDataLine) AudioSystem.getLine(speakerInfo);
                    speaker.open(audioFormat, CHUNK_SIZE * AUDIO_BUFFER_SIZE_MULTIPLIER);
                    speaker.start();
                    isPlaying.set(true);
    
                    Thread playbackThread = new Thread(this::playbackAudioLoop, "VoiceLive-AudioPlayback");
                    playbackThread.setDaemon(true);
                    playbackThread.start();
                    System.out.println("🔊 Audio playback started");
                } catch (LineUnavailableException e) {
                    throw new RuntimeException("Failed to initialize speaker", e);
                }
            }
    
            private void captureAudioLoop() {
                byte[] buffer = new byte[CHUNK_SIZE * 2];
                while (isCapturing.get() && microphone != null) {
                    try {
                        int bytesRead = microphone.read(buffer, 0, buffer.length);
                        if (bytesRead > 0) {
                            byte[] audioChunk = Arrays.copyOf(buffer, bytesRead);
                            session.sendInputAudio(BinaryData.fromBytes(audioChunk))
                                .subscribeOn(Schedulers.boundedElastic())
                                .subscribe(v -> {}, error -> {
                                    // getMessage() can be null, so guard before matching
                                    String message = error.getMessage();
                                    if (message == null || !message.contains("cancelled")) {
                                        System.err.println("❌ Error sending audio: " + message);
                                    }
                                });
                        }
                    } catch (Exception e) {
                        if (isCapturing.get()) {
                            System.err.println("❌ Error in audio capture: " + e.getMessage());
                        }
                        break;
                    }
                }
            }
    
            private void playbackAudioLoop() {
                while (isPlaying.get()) {
                    try {
                        AudioPlaybackPacket packet = playbackQueue.take();
                        if (packet.audioData == null) break;
                        if (packet.sequenceNumber < playbackBase.get()) continue;
                        if (speaker != null && speaker.isOpen()) {
                            speaker.write(packet.audioData, 0, packet.audioData.length);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
    
            void queueAudio(byte[] audioData) {
                if (audioData != null && audioData.length > 0) {
                    int seqNum = nextSequenceNumber.getAndIncrement();
                    playbackQueue.offer(new AudioPlaybackPacket(seqNum, audioData));
                }
            }
    
            void skipPendingAudio() {
                playbackBase.set(nextSequenceNumber.get());
                playbackQueue.clear();
                if (speaker != null && speaker.isOpen()) speaker.flush();
            }
    
            void shutdown() {
                isCapturing.set(false);
                if (microphone != null) { microphone.stop(); microphone.close(); microphone = null; }
                isPlaying.set(false);
                playbackQueue.offer(new AudioPlaybackPacket(-1, null));
                if (speaker != null) { speaker.stop(); speaker.close(); speaker = null; }
                System.out.println("🔇 Audio processor shut down");
            }
        }
    
        private static class Config {
            String endpoint;
            String apiKey;
            String model = DEFAULT_MODEL;
            String voice = DEFAULT_VOICE;
            String instructions = DEFAULT_INSTRUCTIONS;
            boolean useTokenCredential = false;
    
            static Config load(String[] args) {
                Config config = new Config();
                Properties props = loadProperties();
                if (props != null) {
                    config.endpoint = props.getProperty("azure.voicelive.endpoint");
                    config.apiKey = props.getProperty("azure.voicelive.api-key");
                    config.model = props.getProperty("azure.voicelive.model", DEFAULT_MODEL);
                    config.voice = props.getProperty("azure.voicelive.voice", DEFAULT_VOICE);
                }
                if (System.getenv(ENV_ENDPOINT) != null) config.endpoint = System.getenv(ENV_ENDPOINT);
                if (System.getenv(ENV_API_KEY) != null) config.apiKey = System.getenv(ENV_API_KEY);
    
                for (int i = 0; i < args.length; i++) {
                    switch (args[i]) {
                        case "--endpoint": if (i + 1 < args.length) config.endpoint = args[++i]; break;
                        case "--api-key": if (i + 1 < args.length) config.apiKey = args[++i]; break;
                        case "--model": if (i + 1 < args.length) config.model = args[++i]; break;
                        case "--voice": if (i + 1 < args.length) config.voice = args[++i]; break;
                        case "--use-token-credential": config.useTokenCredential = true; break;
                    }
                }
                return config;
            }
        }
    
        private static Properties loadProperties() {
            Properties props = new Properties();
            try (InputStream input = new FileInputStream("application.properties")) {
                props.load(input);
                return props;
            } catch (IOException e) {
                return null;
            }
        }
    
        // <define_mcp_servers>
        /**
         * Define MCP servers that Voice Live can use during the session.
         * Each server is an MCPServer instance added to the session options tools list.
         */
        private static List<VoiceLiveToolDefinition> defineMCPServers() {
            List<VoiceLiveToolDefinition> mcpTools = new ArrayList<>();
    
            mcpTools.add(new MCPServer("deepwiki", "https://mcp.deepwiki.com/mcp")
                .setAllowedTools(Arrays.asList("read_wiki_structure", "ask_question"))
                .setRequireApproval(BinaryData.fromString("never")));
    
            mcpTools.add(new MCPServer("azure_doc", "https://learn.microsoft.com/api/mcp")
                .setRequireApproval(BinaryData.fromString("always")));
    
            return mcpTools;
        }
        // </define_mcp_servers>
    
        // <configure_session>
        /**
         * Create session configuration with MCP servers in the tools list.
         */
        private static VoiceLiveSessionOptions createSessionOptions(Config config) {
            ServerVadTurnDetection turnDetection = new ServerVadTurnDetection()
                .setThreshold(0.5)
                .setPrefixPaddingMs(300)
                .setSilenceDurationMs(500)
                .setInterruptResponse(true)
                .setAutoTruncate(true)
                .setCreateResponse(true);
    
            // Enable input audio transcription so we receive user speech as text
            AudioInputTranscriptionOptionsModel transcriptionModel = config.model.toLowerCase().contains("realtime")
                ? AudioInputTranscriptionOptionsModel.WHISPER_1
                : AudioInputTranscriptionOptionsModel.fromString("azure-speech");
            AudioInputTranscriptionOptions transcriptionOptions =
                new AudioInputTranscriptionOptions(transcriptionModel);
    
            VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
                .setInstructions(config.instructions)
                .setVoice(BinaryData.fromObject(new AzureStandardVoice(config.voice)))
                .setModalities(Arrays.asList(InteractionModality.TEXT, InteractionModality.AUDIO))
                .setInputAudioFormat(InputAudioFormat.PCM16)
                .setOutputAudioFormat(OutputAudioFormat.PCM16)
                .setInputAudioSamplingRate(SAMPLE_RATE)
                .setInputAudioNoiseReduction(new AudioNoiseReduction(AudioNoiseReductionType.NEAR_FIELD))
                .setInputAudioEchoCancellation(new AudioEchoCancellation())
                .setInputAudioTranscription(transcriptionOptions)
                .setTurnDetection(turnDetection);
    
            // Add MCP servers to the tools list
            List<VoiceLiveToolDefinition> mcpServers = defineMCPServers();
            options.setTools(mcpServers);
    
            return options;
        }
        // </configure_session>
    
        // <handle_mcp_events>
        /**
         * Handle incoming server events, including MCP-specific events
         * and voice-based approval flow.
         */
        private static void handleServerEvent(SessionUpdate event, AudioProcessor audioProcessor,
                                               SessionState state, VoiceLiveSessionAsyncClient session) {
            ServerEventType eventType = event.getType();
    
            try {
                if (eventType == ServerEventType.SESSION_UPDATED) {
                    System.out.println("✓ Session updated - starting microphone");
                    writeLog("Session updated");
                    audioProcessor.startCapture();
    
                } else if (eventType == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STARTED) {
                    System.out.println("🎤 Listening...");
                    audioProcessor.skipPendingAudio();
    
                    // Cancel any active response — prevents duplicate result playback
                    // when the user interrupts during MCP result speech (matches C#/Python/JS)
                    if (state.responseActive) {
                        session.send(BinaryData.fromString("{\"type\":\"response.cancel\"}"))
                            .subscribeOn(Schedulers.boundedElastic())
                            .subscribe(v -> {}, err -> {});
                    }
    
                    // Clear deferred response flags if no MCP calls are in progress.
                    // Without this, a stale needsResponseCreate from a collision during
                    // the approval flow causes the model to re-speak results after the
                    // user interrupts.
                    if (state.mcpCallInProgress.get() <= 0) {
                        state.needsResponseCreate = false;
                        state.mcpResultsPending = false;
                    }
    
                    // Reset approved-servers-this-turn when user starts a new topic
                    if (state.pendingApproval == null && state.mcpCallInProgress.get() <= 0) {
                        state.approvedServersThisTurn.clear();
                    }
    
                    // If an MCP call is running and no approval is pending, mark as stale
                    if (state.mcpCallInProgress.get() > 0 && state.pendingApproval == null) {
                        state.staleMcpItems.addAll(state.activeMcpItems);
                        System.out.println("[barge-in] Marking " + state.activeMcpItems.size() + " MCP calls as stale");
                        sendSystemMessage(session,
                            "A tool call is still running in the background. The user just spoke. "
                            + "Respond to what the user said. If a tool result arrives later, "
                            + "briefly introduce it as a late result from an earlier request.")
                            .subscribeOn(Schedulers.boundedElastic())
                            .subscribe(v -> {}, err -> {});
                    }
    
                } else if (eventType == ServerEventType.INPUT_AUDIO_BUFFER_SPEECH_STOPPED) {
                    System.out.println("🤔 Processing...");
    
                } else if (eventType == ServerEventType.RESPONSE_CREATED) {
                    state.responseActive = true;
    
                } else if (eventType == ServerEventType.RESPONSE_AUDIO_DELTA) {
                    if (event instanceof SessionUpdateResponseAudioDelta) {
                        SessionUpdateResponseAudioDelta audioEvent = (SessionUpdateResponseAudioDelta) event;
                        byte[] audioData = audioEvent.getDelta();
                        if (audioData != null && audioData.length > 0) {
                            audioProcessor.queueAudio(audioData);
                        }
                    }
    
                } else if (eventType == ServerEventType.RESPONSE_AUDIO_DONE) {
                    System.out.println("🎤 Ready for next input...");
    
                } else if (eventType == ServerEventType.RESPONSE_DONE) {
                    state.responseActive = false;
                    System.out.println("✅ Response complete");
                    writeLog("--- Response complete ---");
    
                    // If an approval prompt needs to be injected, do it now
                    if (state.approvalPromptNeeded && state.pendingApproval != null) {
                        state.approvalPromptNeeded = false;
                        sendApprovalVoicePrompt(state, session);
                    // If MCP results are pending and all calls are now done, create response
                    } else if (state.mcpResultsPending && state.mcpCallInProgress.get() <= 0 && state.pendingApproval == null) {
                        state.mcpResultsPending = false;
                        try {
                            session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                                .subscribeOn(Schedulers.boundedElastic())
                                .subscribe(v -> {}, err -> {});
                        } catch (Exception e) {
                            // best-effort
                        }
                    } else if (state.needsResponseCreate) {
                        // Deferred response.create — retry now that no response is active
                        state.needsResponseCreate = false;
                        try {
                            session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                                .subscribeOn(Schedulers.boundedElastic())
                                .subscribe(v -> {}, err -> {});
                        } catch (Exception e) {
                            // best-effort retry
                        }
                    }
    
                // <voice_approval_transcription>
                } else if (eventType == ServerEventType.CONVERSATION_ITEM_INPUT_AUDIO_TRANSCRIPTION_COMPLETED) {
                    String eventJson = BinaryData.fromObject(event).toString();
                    String transcript = extractJsonField(eventJson, "transcript");
                    System.out.println("👤 You said:\t" + transcript);
                    writeLog("User Input:\t" + transcript);
    
                    // Interpret as an approval answer if we have a pending approval
                    if (state.pendingApproval != null) {
                        resolveVoiceApproval(transcript, state, session);
                    }
                // </voice_approval_transcription>
    
                } else if (eventType == ServerEventType.ERROR) {
                    // Reset response state — errors can terminate a response without RESPONSE_DONE
                    state.responseActive = false;
                    if (event instanceof SessionUpdateError) {
                        String msg = ((SessionUpdateError) event).getError().getMessage();
                        if (msg.contains("no active response")) {
                            // suppress
                        } else if (msg.toLowerCase().contains("interim response")) {
                            // non-fatal
                        } else if (msg.toLowerCase().contains("active response")) {
                            // expected during MCP flow
                        } else {
                            System.out.println("❌ Error: " + msg);
                            writeLog("ERROR: " + msg);
                        }
                    }
    
                // MCP-specific events
                } else if (eventType == ServerEventType.MCP_LIST_TOOLS_COMPLETED) {
                    System.out.println("🔧 MCP tools discovered successfully");
                    writeLog("MCP tools discovered successfully");
    
                } else if (eventType == ServerEventType.MCP_LIST_TOOLS_FAILED) {
                    System.out.println("❌ MCP tool discovery failed");
                    writeLog("ERROR: MCP tool discovery failed");
    
                } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_IN_PROGRESS) {
                    System.out.println("⏳ MCP tool call in progress...");
                    writeLog("MCP call in progress");
                    state.mcpCallInProgress.incrementAndGet();
                    String inProgressJson = BinaryData.fromObject(event).toString();
                    String inProgressItemId = extractJsonField(inProgressJson, "item_id");
                    if (inProgressItemId != null) state.activeMcpItems.add(inProgressItemId);
                    startMcpStallTimer(state, session);
    
                } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_COMPLETED) {
                    String eventJson = BinaryData.fromObject(event).toString();
                    String itemId = extractJsonField(eventJson, "item_id");
                    state.mcpCallInProgress.updateAndGet(v -> Math.max(0, v - 1));
                    if (itemId != null) state.activeMcpItems.remove(itemId);
                    cancelMcpStallTimer(state);
    
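                    // extractJsonField returns "unknown" when item_id is missing; never treat that as a duplicate
                    if (!"unknown".equals(itemId) && state.handledMcpCompletions.contains(itemId)) {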
                        // duplicate — ignore
                    } else {
                        state.handledMcpCompletions.add(itemId);
                        boolean isStale = itemId != null && state.staleMcpItems.remove(itemId);
                        System.out.println("✅ MCP tool call completed (stale=" + isStale + ")");
                        writeLog("MCP call completed: " + itemId + " (stale=" + isStale + ")");
                        state.mcpItemToServer.remove(itemId);
    
                        // Reset approval counter if no more approvals pending
                        if (state.pendingApproval == null && state.approvalQueue.isEmpty()) {
                            state.approvalCallCount.clear();
                        }
    
                        // If the user moved on during this call, tell the model it's a late result.
                        // Chain any late-result context message with the response.create below
                        // to ensure the system message arrives first.
                        Mono<Void> preResponseMono = Mono.empty();
                        if (isStale) {
                            preResponseMono = sendSystemMessage(session,
                                "This tool result is from an earlier request. The user has "
                                + "since moved on. Briefly introduce it as a late result, e.g. "
                                + "'By the way, those results from earlier just came in...' "
                                + "then share the key findings concisely.");
                        }
    
                        // Batch response: only call response.create when ALL MCP calls for this
                        // turn have completed. This prevents partial results and repeated tool calls.
                        if (state.pendingApproval == null && state.approvalQueue.isEmpty()
                                && state.mcpCallInProgress.get() <= 0) {
                            preResponseMono
                                .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                                .subscribeOn(Schedulers.boundedElastic())
                                .subscribe(v -> {}, err -> {
                                    if (err.getMessage() != null
                                        && err.getMessage().toLowerCase().contains("active response")) {
                                        state.needsResponseCreate = true;
                                    }
                                });
                        } else {
                            preResponseMono
                                .subscribeOn(Schedulers.boundedElastic())
                                .subscribe(v -> {}, err -> {});
                            state.mcpResultsPending = true;
                            System.out.println("[mcp] MCP calls still in progress (" + state.mcpCallInProgress.get() + ") or approval pending — deferring response");
                        }
                    }
    
                } else if (eventType == ServerEventType.RESPONSE_MCP_CALL_FAILED) {
                    System.out.println("❌ MCP tool call failed");
                    writeLog("ERROR: MCP tool call failed");
                    String failedJson = BinaryData.fromObject(event).toString();
                    String failedItemId = extractJsonField(failedJson, "item_id");
                    state.mcpCallInProgress.updateAndGet(v -> Math.max(0, v - 1));
                    if (failedItemId != null) {
                        state.activeMcpItems.remove(failedItemId);
                        state.staleMcpItems.remove(failedItemId);
                    }
                    cancelMcpStallTimer(state);
                    try {
                        session.send(BinaryData.fromString("{\"type\":\"response.create\"}"))
                            .subscribeOn(Schedulers.boundedElastic())
                            .subscribe(v -> {}, err -> {});
                    } catch (Exception e) {
                        // best effort
                    }
    
                } else if (eventType == ServerEventType.CONVERSATION_ITEM_CREATED) {
                    handleMCPConversationItem(event, state, session);
                }
            } catch (Exception e) {
                System.err.println("❌ Error handling event: " + e.getMessage());
            }
        }
        // </handle_mcp_events>
    
        // <handle_approval>
        /**
         * Handle MCP conversation items: approval requests, tool call announcements,
         * and item-to-server tracking.
         */
        private static void handleMCPConversationItem(SessionUpdate event, SessionState state,
                                                        VoiceLiveSessionAsyncClient session) {
            String eventJson = BinaryData.fromObject(event).toString();
    
            if (eventJson.contains("mcp_approval_request")) {
                // Extract approval details
                String approvalId = extractJsonField(eventJson, "id");
                String serverLabel = extractJsonField(eventJson, "server_label");
                String functionName = extractJsonField(eventJson, "name");
    
                if ("unknown".equals(approvalId)) {
                    return;
                }
    
                final int MAX_APPROVAL_CALLS_PER_TASK = 3;
                int currentCount = state.approvalCallCount.getOrDefault(serverLabel, 0);
                if (currentCount >= MAX_APPROVAL_CALLS_PER_TASK) {
                    System.out.println("   Auto-denied: " + serverLabel + "/" + functionName
                        + " (max " + MAX_APPROVAL_CALLS_PER_TASK + " calls reached)");
                    try {
                        String denyJson = String.format(
                            "{\"type\":\"conversation.item.create\",\"item\":"
                            + "{\"type\":\"mcp_approval_response\","
                            + "\"approval_request_id\":\"%s\","
                            + "\"approve\":false}}",
                            approvalId);
                        session.send(BinaryData.fromString(denyJson))
                            .subscribeOn(Schedulers.boundedElastic())
                            .subscribe(v -> {}, err ->
                                System.err.println("Failed to send auto-deny: " + err.getMessage()));
                    } catch (Exception e) {
                        System.err.println("Failed to send auto-deny: " + e.getMessage());
                    }
                    return;
                }
    
                // Auto-approve if user already approved this server earlier in the same turn
                if (state.approvedServersThisTurn.contains(serverLabel)) {
                    System.out.println("   Auto-approved: " + serverLabel + "/" + functionName
                        + " (already approved this turn)");
                    try {
                        String approveJson = String.format(
                            "{\"type\":\"conversation.item.create\",\"item\":"
                            + "{\"type\":\"mcp_approval_response\","
                            + "\"approval_request_id\":\"%s\","
                            + "\"approve\":true}}",
                            approvalId);
                        session.send(BinaryData.fromString(approveJson))
                            .subscribeOn(Schedulers.boundedElastic())
                            .subscribe(v -> {}, err ->
                                System.err.println("Failed to send auto-approve: " + err.getMessage()));
                    } catch (Exception e) {
                        System.err.println("Failed to send auto-approve: " + e.getMessage());
                    }
                    return;
                }
    
                // If another approval is already pending, queue this one
                if (state.pendingApproval != null) {
                    state.approvalQueue.add(
                        new SessionState.ApprovalInfo(approvalId, serverLabel, functionName));
                    return;
                }
    
                System.out.println();
                System.out.println("🔐 MCP Approval Request (voice-based):");
                System.out.println("   Server: " + serverLabel + "  Tool: " + functionName);
                writeLog("Approval request: server=" + serverLabel + " tool=" + functionName);
    
                state.pendingApproval =
                    new SessionState.ApprovalInfo(approvalId, serverLabel, functionName);
    
                if (!state.responseActive) {
                    sendApprovalVoicePrompt(state, session);
                } else {
                    state.approvalPromptNeeded = true;
                }
    
            } else if (eventJson.contains("\"type\":\"mcp_call\"")) {
                // Track MCP call items and announce non-approval tool calls
                String itemId = extractJsonField(eventJson, "id");
                String serverLabel = extractJsonField(eventJson, "server_label");
                String functionName = extractJsonField(eventJson, "name");
                System.out.println("šŸ”§ MCP tool call: " + serverLabel + "/" + functionName);
                state.mcpItemToServer.put(itemId, serverLabel + "/" + functionName);
    
                // Announce to the user if this server doesn't require approval
                if (state.pendingApproval == null && !state.approvalServers.contains(serverLabel)) {
                    sendSystemMessage(session,
                        "Briefly tell the user you're looking something up. One short sentence only.")
                        .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe(v -> {}, err -> {});
                }
            }
        }
    
        /**
         * Inject a system message asking the model to verbally request permission.
         */
        private static void sendApprovalVoicePrompt(SessionState state,
                                                      VoiceLiveSessionAsyncClient session) {
            SessionState.ApprovalInfo pending = state.pendingApproval;
            if (pending == null) return;
    
            int callCount = state.approvalCallCount.getOrDefault(pending.serverLabel(), 0);
            state.approvalCallCount.put(pending.serverLabel(), callCount + 1);
    
            String prompt;
            if (callCount == 0) {
                prompt = "You MUST ask the user for explicit permission before proceeding. "
                    + "Say exactly: \"I'd like to search the " + pending.serverLabel()
                    + " service for information. Do you approve? Please say yes or no.\"";
            } else {
                prompt = "You MUST ask the user for permission again. "
                    + "Say exactly: \"I need to do one more search to get complete information. "
                    + "Should I continue? Please say yes or no.\"";
            }
    
            sendSystemMessage(session, prompt)
                .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                .subscribeOn(Schedulers.boundedElastic())
                .subscribe(v -> {}, err ->
                    System.err.println("❌ Failed to send approval voice prompt: " + err.getMessage()));
        }
    
        /**
         * Interpret the user's spoken response as approval or denial.
         */
        private static void resolveVoiceApproval(String transcript, SessionState state,
                                                   VoiceLiveSessionAsyncClient session) {
            SessionState.ApprovalInfo pending = state.pendingApproval;
            if (pending == null) return;
    
            String text = transcript.trim().toLowerCase();
            boolean approved = YES_PATTERN.matcher(text).find();
            boolean denied = NO_PATTERN.matcher(text).find();
    
            if (!approved && !denied) {
                // Ambiguous — ask again at next RESPONSE_DONE
                state.approvalPromptNeeded = true;
                return;
            }
            if (approved && denied) {
                approved = false; // conflicting signals — deny for safety
            }
    
            state.pendingApproval = null;
            if (approved) {
                state.approvedServersThisTurn.add(pending.serverLabel());
            } else {
                state.approvalCallCount.clear();
                state.approvedServersThisTurn.remove(pending.serverLabel());
            }
    
            System.out.println("   Voice approval: " + (approved ? "Approved ✅" : "Denied ❌"));
            writeLog("Approval resolved: " + (approved ? "APPROVED" : "DENIED") + " for " + pending.serverLabel() + "/" + pending.functionName());
    
            // Send approval/denial response via raw JSON.
            // Chain processNextApproval after the send completes to avoid racing.
            String approvalJson = String.format(
                "{\"type\":\"conversation.item.create\",\"item\":"
                + "{\"type\":\"mcp_approval_response\","
                + "\"approval_request_id\":\"%s\","
                + "\"approve\":%s}}",
                pending.approvalId(), approved);
    
            session.send(BinaryData.fromString(approvalJson))
                .subscribeOn(Schedulers.boundedElastic())
                .subscribe(
                    v -> processNextApproval(state, session),
                    error -> {
                        System.err.println("❌ Failed to send approval response: " + error.getMessage());
                        processNextApproval(state, session);
                    }
                );
        }
    
        /**
         * Pop the next queued approval and ask via voice.
         */
        private static void processNextApproval(SessionState state,
                                                  VoiceLiveSessionAsyncClient session) {
            SessionState.ApprovalInfo next = state.approvalQueue.poll();
            if (next == null) return;
    
            // Auto-approve if user already approved this server earlier in the same turn
            if (state.approvedServersThisTurn.contains(next.serverLabel())) {
                System.out.println("   Auto-approved (queued): " + next.serverLabel() + "/" + next.functionName());
                String approveJson = String.format(
                    "{\"type\":\"conversation.item.create\",\"item\":"
                    + "{\"type\":\"mcp_approval_response\","
                    + "\"approval_request_id\":\"%s\","
                    + "\"approve\":true}}",
                    next.approvalId());
                session.send(BinaryData.fromString(approveJson))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(
                        v -> processNextApproval(state, session),
                        err -> {
                            System.err.println("Failed to send queued auto-approve: " + err.getMessage());
                            processNextApproval(state, session);
                        });
                return;
            }
    
            state.pendingApproval = next;
            if (!state.responseActive) {
                sendApprovalVoicePrompt(state, session);
            } else {
                state.approvalPromptNeeded = true;
            }
        }
        // </handle_approval>
    
        // <mcp_stall_detection>
        /**
         * Start a timer that verbally updates the user if an MCP call takes too long.
         */
        private static void startMcpStallTimer(SessionState state,
                                                 VoiceLiveSessionAsyncClient session) {
            cancelMcpStallTimer(state);
            final AtomicInteger stallCount = new AtomicInteger(0);
            state.mcpStallTimer = SCHEDULER.scheduleAtFixedRate(() -> {
                if (state.mcpCallInProgress.get() <= 0) {
                    cancelMcpStallTimer(state);
                    return;
                }
                int count = stallCount.incrementAndGet();
                if (count > 3) {
                    cancelMcpStallTimer(state);
                    return;
                }
                // MCP calls cannot be cancelled — only honest status updates are possible.
                String msg = "The tool call is still running. "
                    + "Briefly reassure the user that you're still waiting for results. "
                    + "One short sentence only.";
                sendSystemMessage(session, msg)
                    .then(session.send(BinaryData.fromString("{\"type\":\"response.create\"}")))
                    .subscribeOn(Schedulers.boundedElastic())
                    .subscribe(v -> {}, err -> {
                        if (err.getMessage() != null
                            && err.getMessage().toLowerCase().contains("active response")) {
                            state.needsResponseCreate = true;
                        }
                    });
            }, 10, 10, TimeUnit.SECONDS);
        }
    
        /**
         * Cancel the MCP stall timer if running.
         */
        private static void cancelMcpStallTimer(SessionState state) {
            ScheduledFuture<?> timer = state.mcpStallTimer;
            if (timer != null && !timer.isDone()) {
                timer.cancel(false);
            }
            state.mcpStallTimer = null;
        }
        // </mcp_stall_detection>
    
        /**
         * Send a system message to the model via raw JSON.
         * Returns a Mono so callers can chain subsequent sends sequentially,
         * avoiding FAIL_NON_SERIALIZED errors from concurrent sends.
         */
        private static Mono<Void> sendSystemMessage(VoiceLiveSessionAsyncClient session, String text) {
            String escaped = text.replace("\\", "\\\\").replace("\"", "\\\"");
            String json = "{\"type\":\"conversation.item.create\",\"item\":"
                + "{\"type\":\"message\",\"role\":\"system\",\"content\":"
                + "[{\"type\":\"input_text\",\"text\":\"" + escaped + "\"}]}}";
            return session.send(BinaryData.fromString(json));
        }
    
        /**
         * Write a line to the conversation log file.
         */
        private static void writeLog(String message) {
            try {
                Path logDir = Paths.get("logs");
                Files.createDirectories(logDir);
                try (PrintWriter writer = new PrintWriter(
                        new FileWriter(logDir.resolve(LOG_FILENAME).toString(), true))) {
                    writer.println(message);
                }
            } catch (IOException e) {
                System.err.println("Failed to write conversation log: " + e.getMessage());
            }
        }
    
        /**
         * Extract a simple string field value from a JSON string.
         */
        private static String extractJsonField(String json, String fieldName) {
            String pattern = "\"" + fieldName + "\":\"";
            int start = json.indexOf(pattern);
            if (start < 0) return "unknown";
            start += pattern.length();
            int end = json.indexOf("\"", start);
            if (end < 0) return "unknown";
            return json.substring(start, end);
        }
    
        private static boolean checkAudioSystem() {
            try {
                AudioFormat format = new AudioFormat(SAMPLE_RATE, SAMPLE_SIZE_BITS, CHANNELS, true, false);
                if (!AudioSystem.isLineSupported(new DataLine.Info(TargetDataLine.class, format))) {
                    System.err.println("❌ No compatible microphone found");
                    return false;
                }
                if (!AudioSystem.isLineSupported(new DataLine.Info(SourceDataLine.class, format))) {
                    System.err.println("❌ No compatible speaker found");
                    return false;
                }
                System.out.println("✓ Audio system check passed");
                return true;
            } catch (Exception e) {
                System.err.println("❌ Audio system check failed: " + e.getMessage());
                return false;
            }
        }
    
        public static void main(String[] args) {
            Config config = Config.load(args);
    
            if (config.endpoint == null) {
                System.err.println("❌ Missing endpoint. Set AZURE_VOICELIVE_ENDPOINT or pass --endpoint.");
                return;
            }
            if (!config.useTokenCredential && config.apiKey == null) {
                System.err.println("❌ No authentication. Set AZURE_VOICELIVE_API_KEY or use --use-token-credential.");
                return;
            }
            if (!checkAudioSystem()) return;
    
            System.out.println("🎙️ Starting Voice Assistant with MCP...");
    
            // Session state for voice-based MCP approval flow
            SessionState state = new SessionState();
            state.approvalServers = Set.of("azure_doc");
    
            try {
                VoiceLiveAsyncClient client;
                if (config.useTokenCredential) {
                    TokenCredential credential = new AzureCliCredentialBuilder().build();
                    client = new VoiceLiveClientBuilder()
                        .endpoint(config.endpoint)
                        .credential(credential)
                        .serviceVersion(VoiceLiveServiceVersion.V2026_01_01_PREVIEW)
                        .buildAsyncClient();
                    System.out.println("🔑 Using Token Credential authentication");
                } else {
                    client = new VoiceLiveClientBuilder()
                        .endpoint(config.endpoint)
                        .credential(new KeyCredential(config.apiKey))
                        .serviceVersion(VoiceLiveServiceVersion.V2026_01_01_PREVIEW)
                        .buildAsyncClient();
                    System.out.println("🔑 Using API Key authentication");
                }
    
                VoiceLiveSessionOptions sessionOptions = createSessionOptions(config);
                AtomicReference<AudioProcessor> audioProcessorRef = new AtomicReference<>();
    
                client.startSession(config.model)
                    .flatMap(session -> {
                        System.out.println("✓ Session started");
    
                        AudioProcessor audioProcessor = new AudioProcessor(session);
                        audioProcessorRef.set(audioProcessor);
    
                        session.receiveEvents()
                            .subscribe(
                                event -> handleServerEvent(event, audioProcessor, state, session),
                                error -> System.err.println("❌ Event error: " + error.getMessage())
                            );
    
                        ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(sessionOptions);
                        session.sendEvent(updateEvent).subscribe();
    
                        audioProcessor.startPlayback();
    
                        System.out.println();
                        System.out.println("=".repeat(70));
                        System.out.println("🎤 VOICE ASSISTANT WITH MCP READY");
                        System.out.println("Try saying:");
                        System.out.println("  • 'Can you summarize the GitHub repo azure-sdk-for-java?'");
                        System.out.println("  • 'Search the Azure documentation for Voice Live API.'");
                        System.out.println("Approve MCP tool calls by voice — say 'yes' or 'no' when asked.");
                        System.out.println("Press Ctrl+C to exit");
                        System.out.println("=".repeat(70));
                        System.out.println();
    
                        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                            System.out.println("\n🛑 Shutting down...");
                            audioProcessor.shutdown();
                            SCHEDULER.shutdownNow();
                        }));
    
                        return Mono.never();
                    })
                    .doFinally(signalType -> {
                        AudioProcessor ap = audioProcessorRef.get();
                        if (ap != null) ap.shutdown();
                        SCHEDULER.shutdownNow();
                    })
                    .block();
    
            } catch (Exception e) {
                System.err.println("❌ Fatal error: " + e.getMessage());
            }
        }
    }
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Build and run the application:

    mvn compile exec:java -Dexec.mainClass="MCPQuickstart" -q
    
  4. Speak into your microphone. Try asking questions like "What tools do you have?" or "Search the Azure documentation for Voice Live API."

    • For the deepwiki server (requireApproval="never"), tool calls execute automatically.
    • For the azure_doc server (requireApproval="always"), the assistant asks for your approval by voice before the tool call runs. Answer "yes" or "no".
  5. Press Ctrl+C to stop the session.
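
The voice approval step in the sample comes down to two small pieces: deciding whether a transcript means yes or no, and building the mcp_approval_response item that is sent back. The following is a minimal sketch of both, not the sample's own implementation. The class name VoiceApprovalSketch, its method names, and the word lists are assumptions made for illustration; the sample's YES_PATTERN and NO_PATTERN may use different expressions.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the Voice Live SDK. */
final class VoiceApprovalSketch {

    enum Decision { APPROVED, DENIED, AMBIGUOUS }

    // Assumed word lists; the sample's YES_PATTERN and NO_PATTERN may differ.
    private static final Pattern YES =
        Pattern.compile("\\b(yes|yeah|yep|sure|okay|ok|approve|go ahead)\\b");
    private static final Pattern NO =
        Pattern.compile("\\b(no|nope|deny|don't|do not|stop|cancel)\\b");

    /** Classify a transcript. Conflicting signals count as a denial. */
    static Decision classify(String transcript) {
        if (transcript == null) {
            return Decision.AMBIGUOUS;
        }
        String text = transcript.trim().toLowerCase(Locale.ROOT);
        boolean yes = YES.matcher(text).find();
        boolean no = NO.matcher(text).find();
        if (!yes && !no) {
            return Decision.AMBIGUOUS; // caller should ask again
        }
        return (yes && !no) ? Decision.APPROVED : Decision.DENIED;
    }

    /** Build the conversation.item.create payload used by the sample. */
    static String approvalResponseJson(String approvalRequestId, boolean approve) {
        // Escape backslashes and quotes so the ID can't break the JSON string.
        String escapedId = approvalRequestId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"type\":\"conversation.item.create\",\"item\":"
            + "{\"type\":\"mcp_approval_response\","
            + "\"approval_request_id\":\"" + escapedId + "\","
            + "\"approve\":" + approve + "}}";
    }

    public static void main(String[] args) {
        System.out.println(classify("Yes, go ahead."));
        System.out.println(classify("no thanks"));
        System.out.println(classify("yes... actually no"));
        System.out.println(classify("what was that?"));
        System.out.println(approvalResponseJson("req_123", true));
    }
}
```

Treating a transcript that contains both a yes word and a no word as a denial mirrors the "deny for safety" choice in resolveVoiceApproval. Keeping the payload in one helper would also avoid repeating the same String.format call in each approval path.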

MCP server configuration reference

Parameter Required Description
serverLabel Yes Display name for the MCP server.
serverUrl Yes URL of the remote MCP endpoint.
allowedTools No List of tool names the model can call. If omitted, all tools are allowed.
requireApproval No "never", "always" (default), or a per-tool dictionary.
headers No Extra HTTP headers to include in MCP requests.
authorization No Authorization token for MCP requests.

For the complete REST API type definition, see MCPTool in the Voice Live API reference.
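
To make the defaults in the table concrete, the following sketch models them in plain Java. It is only an illustration of the documented rules (an omitted allowedTools allows every tool, and an omitted requireApproval means "always"). McpServerConfigSketch is a hypothetical class, not an SDK type, and it doesn't model the per-tool dictionary form of requireApproval.

```java
import java.util.List;

/** Hypothetical model of the table's defaults; not a Voice Live SDK type. */
final class McpServerConfigSketch {
    final String serverLabel;
    final String serverUrl;
    final List<String> allowedTools; // null: no restriction, all tools allowed
    final String requireApproval;    // "always" or "never"

    McpServerConfigSketch(String serverLabel, String serverUrl,
                          List<String> allowedTools, String requireApproval) {
        if (serverLabel == null || serverLabel.isEmpty()) {
            throw new IllegalArgumentException("serverLabel is required");
        }
        if (serverUrl == null || serverUrl.isEmpty()) {
            throw new IllegalArgumentException("serverUrl is required");
        }
        // An omitted requireApproval falls back to the documented default.
        String mode = (requireApproval == null) ? "always" : requireApproval;
        if (!mode.equals("always") && !mode.equals("never")) {
            throw new IllegalArgumentException("unsupported requireApproval: " + mode);
        }
        this.serverLabel = serverLabel;
        this.serverUrl = serverUrl;
        this.allowedTools = allowedTools;
        this.requireApproval = mode;
    }

    /** True when the tool isn't excluded by allowedTools. */
    boolean isToolAllowed(String toolName) {
        return allowedTools == null || allowedTools.contains(toolName);
    }

    /** True when the client must answer an approval request first. */
    boolean needsApproval() {
        return requireApproval.equals("always");
    }

    public static void main(String[] args) {
        McpServerConfigSketch deepwiki = new McpServerConfigSketch(
            "deepwiki", "https://mcp.deepwiki.com/mcp",
            List.of("read_wiki_structure", "ask_question"), "never");
        McpServerConfigSketch azureDoc = new McpServerConfigSketch(
            "azure_doc", "https://learn.microsoft.com/api/mcp", null, null);

        System.out.println(deepwiki.isToolAllowed("ask_question"));
        System.out.println(deepwiki.needsApproval());
        System.out.println(azureDoc.isToolAllowed("any_tool"));
        System.out.println(azureDoc.needsApproval());
    }
}
```

The service applies these rules itself. A client-side model like this is only useful for reasoning about which servers will produce approval requests, for example when populating the approvalServers set in the sample.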

Learn how to connect remote MCP servers to a Voice Live session using the VoiceLive SDK for JavaScript. This article builds on the Quickstart: Create a Voice Live real-time voice agent with MCP server integration.

Reference documentation | Package (npm) | Additional samples on GitHub

Follow the how-to below or get the full sample code.

Note

The JavaScript Voice Live SDK is designed for browser-based applications with built-in WebSocket and Web Audio support. This how-to guide uses Node.js with node-record-lpcm16 and speaker for a console experience.

Prerequisites

  • An Azure subscription. Create one for free.
  • Node.js version 18 or later.
  • SoX installed on your system (required by node-record-lpcm16 for microphone capture).
  • A Microsoft Foundry resource created in one of the supported regions. For more information about region availability, see the Voice Live overview documentation.
  • @azure/ai-voicelive package version 1.0.0 or later (MCP support requires API version 2026-04-10).
  • Assign the Cognitive Services User role to your user account. You can assign roles in the Azure portal under Access control (IAM) > Add role assignment.

Tip

To use Voice Live with MCP, you don't need to deploy an audio model with your Foundry resource. Voice Live is fully managed, and the model is automatically deployed for you. For more information about model availability, see the Voice Live overview documentation.

Prepare the environment

Complete the Voice Live quickstart to set up your environment, configure authentication, and test your first Voice Live conversation.

MCP integration concepts

MCP server definition

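Use an MCP server object with type: "mcp" to declare each remote MCP endpoint. At minimum, provide serverLabel (a display name) and serverUrl (the MCP endpoint URL). Optionally, restrict the available tools with allowedTools and set the approval mode with requireApproval. The JavaScript samples in this article use these camelCase forms of the server_label, server_url, allowed_tools, and require_approval properties.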

Approval modes

Control whether MCP tool calls require user approval before execution:

  • requireApproval: "never": The tool executes automatically when the model invokes it.
  • requireApproval: "always" (default): The client receives an approval request and must respond before the tool runs.

API version requirement

MCP support requires API version 2026-04-10 or later.
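
A client can check the configured API version before it adds MCP servers to the session. The following is a minimal sketch rather than SDK behavior: the supportsMcp helper is hypothetical, and it assumes that version strings start with a YYYY-MM-DD date, optionally followed by a suffix.

```javascript
// Hypothetical helper (not part of @azure/ai-voicelive): checks whether an
// API version string meets the minimum that this article states for MCP.
const MIN_MCP_API_VERSION = "2026-04-10";

function supportsMcp(apiVersion) {
  // Assumption: the version starts with a "YYYY-MM-DD" date.
  const match = /^(\d{4}-\d{2}-\d{2})/.exec(apiVersion ?? "");
  if (!match) return false;
  // Zero-padded ISO-style dates sort correctly as plain strings.
  return match[1] >= MIN_MCP_API_VERSION;
}

console.log(supportsMcp("2025-10-01")); // false
console.log(supportsMcp("2026-04-10")); // true
```

Comparing only the date prefix keeps the check simple. If your versions carry suffixes that change feature availability, extend the check accordingly.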

Define MCP servers

Define the MCP servers that Voice Live can use during the session. Each server is an MCP server object added to the tools list in the session configuration.

The following code defines two MCP servers: one with automatic tool execution and one that requires user approval before running.

/**
 * Define MCP servers that Voice Live can use during the session.
 * Each server is an MCPTool object added to the session tools array.
 */
function defineMCPServers() {
  return [
    {
      type: "mcp",
      serverLabel: "deepwiki",
      serverUrl: "https://mcp.deepwiki.com/mcp",
      allowedTools: ["read_wiki_structure", "ask_question"],
      requireApproval: "never",
    },
    {
      type: "mcp",
      serverLabel: "azure_doc",
      serverUrl: "https://learn.microsoft.com/api/mcp",
      requireApproval: "always",
    },
  ];
}

In this sample:

  • The deepwiki server allows only the read_wiki_structure and ask_question tools, with requireApproval set to "never" for automatic execution.
  • The azure_doc server allows all tools on the endpoint, with requireApproval set to "always" so users can review each tool call before execution.
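
Catching a malformed definition before the session starts can be easier than diagnosing a failed tool discovery later. The following sketch checks each definition against the parameters described in this article. The validateMcpServer function is hypothetical and not part of the SDK, and it doesn't inspect the contents of a per-tool approval dictionary.

```javascript
// Hypothetical validator (not part of the SDK). Field names follow the
// camelCase properties used in defineMCPServers().
function validateMcpServer(server) {
  const problems = [];
  if (server?.type !== "mcp") problems.push('type must be "mcp"');
  if (typeof server?.serverLabel !== "string" || server.serverLabel === "") {
    problems.push("serverLabel is required");
  }
  try {
    // URL is a global in Node.js and browsers; it throws on malformed input.
    new URL(server?.serverUrl);
  } catch {
    problems.push("serverUrl must be a valid URL");
  }
  if (server?.allowedTools !== undefined && !Array.isArray(server.allowedTools)) {
    problems.push("allowedTools must be an array of tool names");
  }
  // An omitted value falls back to the documented default, "always".
  const approval = server?.requireApproval ?? "always";
  const isMode = approval === "always" || approval === "never";
  const isDictionary =
    typeof approval === "object" && approval !== null && !Array.isArray(approval);
  if (!isMode && !isDictionary) {
    problems.push('requireApproval must be "always", "never", or a dictionary');
  }
  return problems;
}

// An empty array means that no problems were found.
console.log(validateMcpServer({
  type: "mcp",
  serverLabel: "deepwiki",
  serverUrl: "https://mcp.deepwiki.com/mcp",
  requireApproval: "never",
}));
```

Returning a list of problems instead of throwing on the first one lets you report every issue in a definition at once.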

Configure the session with MCP tools

Pass the MCP server definitions to the session configuration alongside your voice, modality, and turn-detection settings.

/**
 * Configure the session with MCP servers in the tools list.
 */
async _setupSession() {
  console.log("[session] Configuring session with MCP tools...");

  const mcpServers = defineMCPServers();

  // requireApproval defaults to "always", so treat an omitted value as "always".
  this._approvalServers = new Set(
    mcpServers
      .filter(s => (s.requireApproval ?? "always") === "always")
      .map(s => s.serverLabel)
  );

  await this._session.updateSession({
    model: this.model,
    modalities: ["text", "audio"],
    instructions: this.instructions,
    voice: resolveVoiceConfig(this.voice),
    inputAudioFormat: "pcm16",
    outputAudioFormat: "pcm16",
    turnDetection: {
      type: "server_vad",
      threshold: 0.5,
      prefixPaddingInMs: 300,
      silenceDurationInMs: 500,
    },
    inputAudioEchoCancellation: { type: "server_echo_cancellation" },
    inputAudioNoiseReduction: { type: "azure_deep_noise_suppression" },
    inputAudioTranscription: { model: this.model.toLowerCase().includes("realtime") ? "whisper-1" : "azure-speech" },
    tools: mcpServers,
  });

  console.log("[session] Session configuration with MCP tools sent");
}

In this sample:

  • The session configuration bundles MCP tools with audio format, voice, and turn detection settings.
  • session.updateSession(...) sends the full configuration to Voice Live.
  • Voice Live automatically discovers available tools from each MCP server after the session starts.

Handle MCP events

Process MCP-specific events in the handlers that you pass to session.subscribe(...). The key events cover tool discovery, MCP tool call progress, completion, and failure, and approval requests that arrive as conversation items.

/**
 * Subscribe to session events, including MCP-specific events.
 */
_subscribeToEvents(session) {
  this._subscription = session.subscribe({
    onSessionUpdated: async (event, context) => {
      const s = event.session;
      const model = s?.model;
      const voice = s?.voice;
      console.log(`[session] Session ready: ${context.sessionId}`);
      console.log(
        `  Model: ${typeof model === "string" ? model : model?.toString?.() ?? ""}`,
      );
      console.log(`  Voice: ${voice?.name ?? ""}`);
      writeConversationLog(
        [
          `SessionID: ${context.sessionId}`,
          `Model: ${typeof model === "string" ? model : model?.toString?.() ?? ""}`,
          `Voice Name: ${voice?.name ?? ""}`,
          `Voice Type: ${voice?.type ?? ""}`,
          `Log File: ${conversationLogFile}`,
          "",
        ].join("\n"),
      );
    },

    onConversationItemInputAudioTranscriptionCompleted: async (event) => {
      const transcript = event.transcript ?? "";
      console.log(`👤 You said:\t${transcript}`);
      writeConversationLog(`User Input:\t${transcript}`);
      if (this._pendingApproval !== null) {
        await this._resolveVoiceApproval(transcript, session);
      }
    },

    onResponseTextDone: async (event) => {
      const text = event.text ?? "";
      console.log(`🤖 Assistant text:\t${text}`);
      writeConversationLog(`Assistant Text Response:\t${text}`);
    },

    onResponseAudioTranscriptDone: async (event) => {
      const transcript = event.transcript ?? "";
      console.log(`🤖 Assistant audio transcript:\t${transcript}`);
      writeConversationLog(`Assistant Audio Response:\t${transcript}`);
    },

    onInputAudioBufferSpeechStarted: async () => {
      console.log("🎤 Listening...");
      this._audio.skipPendingAudio();

      // Do NOT reset _approvalCallCount here — the counter should only
      // reset on task completion (in onResponseMcpCallCompleted when no
      // pending/queued approvals remain) or on denial (in _resolveVoiceApproval).
      // Resetting on every speech-start would let the model retry denied calls.

      // Clear ALL deferred response flags on barge-in.
      // This prevents onResponseDone (fired by the cancelled response)
      // from immediately creating a new response that overlaps the user.
      this._needsResponseCreate = false;
      this._mcpResultsPending = false;

      // Reset approved-servers-this-turn when user starts a new topic
      if (this._pendingApproval === null && this._mcpCallInProgress <= 0) {
        this._approvedServersThisTurn.clear();
      }

      if (this._activeResponse && !this._responseApiDone) {
        // Mark barge-in so onResponseDone skips deferred actions
        this._bargeInActive = true;
        try {
          await session.sendEvent({ type: "response.cancel" });
        } catch (err) {
          const msg = err?.message ?? "";
          if (!msg.toLowerCase().includes("no active response")) {
            console.warn("[barge-in] Cancel failed:", msg);
          }
        }
        try {
          await session.sendEvent({ type: "input_audio_buffer.clear" });
        } catch { /* best-effort */ }
      }

      if (this._mcpCallInProgress > 0 && this._pendingApproval === null) {
        this._staleMcpItems = new Set([...this._staleMcpItems, ...this._activeMcpItems]);
        console.log(`[barge-in] Marking ${this._activeMcpItems.size} MCP calls as stale`);
        try {
          await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "A tool call is still running in the background. The user just spoke. Respond to what the user said. If a tool result arrives later, briefly introduce it as a late result from an earlier request." }] });
        } catch {}
      }
    },

    onInputAudioBufferSpeechStopped: async () => {
      console.log("🤔 Processing...");
    },

    onResponseCreated: async () => {
      this._activeResponse = true;
      this._responseApiDone = false;
    },

    onResponseAudioDelta: async (event) => {
      if (event.delta) {
        this._audio.queueAudio(event.delta);
      }
    },

    onResponseAudioDone: async () => {
      console.log("🎤 Ready for next input...");
    },

    onResponseDone: async () => {
      console.log("✅ Response complete");
      writeConversationLog("--- Response complete ---");
      this._activeResponse = false;
      this._responseApiDone = true;

      // If this response.done is the result of a barge-in cancel,
      // skip all deferred actions — the user's new turn will handle things.
      if (this._bargeInActive) {
        this._bargeInActive = false;
        return;
      }

      if (this._approvalPromptNeeded && this._pendingApproval !== null) {
        this._approvalPromptNeeded = false;
        await this._sendApprovalVoicePrompt(session);
      } else if (this._mcpResultsPending && this._mcpCallInProgress <= 0 && this._pendingApproval === null) {
        this._mcpResultsPending = false;
        try { await session.sendEvent({ type: "response.create" }); } catch {}
      } else if (this._needsResponseCreate) {
        this._needsResponseCreate = false;
        try { await session.sendEvent({ type: "response.create" }); } catch {}
      }
    },

    onServerError: async (event) => {
      const msg = event.error?.message ?? "";
      // Reset response state — errors can terminate a response without onResponseDone
      this._activeResponse = false;
      this._responseApiDone = true;
      if (msg.includes("Cancellation failed: no active response")) return;
      if (msg.toLowerCase().includes("interim response")) {
        console.log("[session] Interim response not supported (non-fatal)");
        return;
      }
      if (msg.toLowerCase().includes("active response")) return;
      console.error(`❌ VoiceLive error: ${msg}`);
      writeConversationLog(`ERROR: ${msg}`);
    },

    // MCP-specific event handlers
    onMcpListToolsCompleted: async (event) => {
      console.log(`🔧 MCP tools discovered successfully`);
      writeConversationLog("MCP tools discovered successfully");
    },

    onMcpListToolsFailed: async (event) => {
      console.error(`❌ MCP tool discovery failed`);
      writeConversationLog("ERROR: MCP tool discovery failed");
    },

    onResponseMcpCallInProgress: async (event) => {
      console.log("⏳ MCP tool call in progress...");
      writeConversationLog(`MCP call in progress: ${event.item_id ?? ""}`);
      this._mcpCallInProgress++;
      this._activeMcpItems.add(event.item_id);
      this._startMcpStallTimer(session);
    },

    onResponseMcpCallArgumentsDone: async (event) => {
      const name = event.name ?? "";
      console.log(`📋 MCP tool call arguments ready: ${name}`);
    },

    onResponseMcpCallCompleted: async (event) => {
      const itemId = event.item_id ?? "";
      this._mcpCallInProgress = Math.max(0, this._mcpCallInProgress - 1);
      this._activeMcpItems.delete(itemId);
      this._cancelMcpStallTimer();
      if (this._handledMcpCompletions.has(itemId)) return;
      this._handledMcpCompletions.add(itemId);

      const isStale = this._staleMcpItems.has(itemId);
      this._staleMcpItems.delete(itemId);
      console.log(`✅ MCP tool call completed (stale=${isStale})`);
      writeConversationLog(`MCP call completed: ${itemId} (stale=${isStale})`);

      delete this._mcpItemToServer[itemId];
      if (this._pendingApproval === null && this._approvalQueue.length === 0) {
        this._approvalCallCount = {};
      }

      if (isStale) {
        try {
          await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "This tool result is from an earlier request. The user has since moved on. Briefly introduce it as a late result, e.g. 'By the way, those results from earlier just came in...' then share the key findings concisely." }] });
        } catch {}
      }

      // Batch response: only call response.create when ALL MCP calls for this
      // turn have completed. This prevents partial results and repeated tool calls.
      if (this._mcpCallInProgress <= 0 && this._pendingApproval === null && this._approvalQueue.length === 0) {
        try {
          await session.sendEvent({ type: "response.create" });
        } catch (e) {
          if (e?.message?.toLowerCase().includes("active response")) {
            this._needsResponseCreate = true;
          }
        }
      } else {
        this._mcpResultsPending = true;
        console.log(`[mcp] MCP calls still in progress (${this._mcpCallInProgress}) — deferring response`);
      }
    },

    onResponseMcpCallFailed: async (event) => {
      const itemId = event.item_id ?? "";
      console.error("❌ MCP tool call failed");
      writeConversationLog(`ERROR: MCP call failed: ${itemId}`);
      this._mcpCallInProgress = Math.max(0, this._mcpCallInProgress - 1);
      this._activeMcpItems.delete(itemId);
      this._staleMcpItems.delete(itemId);
      this._cancelMcpStallTimer();
      try { await session.sendEvent({ type: "response.create" }); } catch {}
    },

    onConversationItemCreated: async (event) => {
      const item = event.item;
      if (item?.type === "mcp_call") {
        const sl = item.serverLabel ?? item.server_label ?? "";
        const fn = item.name ?? "";
        this._mcpItemToServer[item.id] = `${sl}/${fn}`;
        console.log(`🔧 MCP tool call: ${sl}/${fn}`);
        writeConversationLog(`MCP tool call: ${sl}/${fn} (id=${item.id})`);
        if (!this._pendingApproval && !this._approvalServers.has(sl)) {
          try {
            await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "Briefly tell the user you're looking something up. One short sentence only." }] });
            await session.sendEvent({ type: "response.create" });
          } catch {}
        }
      }
      if (item?.type === "mcp_approval_request") {
        writeConversationLog(`MCP approval request: ${item.serverLabel ?? item.server_label ?? ""} / ${item.name ?? ""} (id=${item.id ?? ""})`);
        await this._handleApprovalRequest(item, session);
      }
    },
  });
}
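
The handlers above defer response.create until every MCP call for the turn has finished, and they flag calls that the user talked over as stale. The bookkeeping behind that can be sketched in isolation. The McpCallTracker class below is a simplified, hypothetical illustration, not an SDK type: it tracks only item IDs and leaves out the approval state that the sample also checks.

```javascript
// Simplified, hypothetical tracker that mirrors the bookkeeping in the
// event handlers. It is not an SDK type.
class McpCallTracker {
  constructor() {
    this.active = new Set(); // item IDs of calls that are still running
    this.stale = new Set();  // calls that the user talked over (barge-in)
  }

  // Call from the "in progress" handler.
  start(itemId) {
    this.active.add(itemId);
  }

  // Call on barge-in: every running call now belongs to an earlier request.
  markAllStale() {
    for (const id of this.active) this.stale.add(id);
  }

  // Call from the "completed" or "failed" handler.
  // Reports whether the result is late and whether the batch is finished.
  finish(itemId) {
    this.active.delete(itemId);
    const wasStale = this.stale.delete(itemId);
    return { wasStale, batchDone: this.active.size === 0 };
  }
}

const tracker = new McpCallTracker();
tracker.start("call_1");
tracker.start("call_2");
console.log(tracker.finish("call_1")); // { wasStale: false, batchDone: false }
tracker.markAllStale();
console.log(tracker.finish("call_2")); // { wasStale: true, batchDone: true }
```

In the sample, the equivalent of batchDone, combined with no pending or queued approvals, is the condition for sending response.create. The equivalent of wasStale decides whether to inject the late-result system message.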

Handle approval requests

When a server is configured with requireApproval: "always", client code must handle the approval flow. Instead of blocking on readline, the sample injects a system message so that the model asks the user verbally. The sample then parses the user's spoken transcript for intent by using word-boundary regular expressions (\byes\b and \b(no|stop|cancel)\b).

/**
 * Handle MCP approval requests via voice-based approval flow.
 */
async _handleApprovalRequest(item, session) {
  const approvalId = item.id ?? "unknown";
  const serverLabel = item.serverLabel ?? item.server_label ?? "unknown";
  const functionName = item.name ?? "unknown";

  console.log();
  console.log("🔐 MCP Approval Request");
  console.log(`   Server: ${serverLabel}`);
  console.log(`   Tool: ${functionName}`);
  console.log(`   Approval ID: ${approvalId}`);

  const MAX_APPROVAL_CALLS_PER_TASK = 3;
  const currentCount = this._approvalCallCount[serverLabel] ?? 0;
  if (currentCount >= MAX_APPROVAL_CALLS_PER_TASK) {
    console.log(`   Auto-denied: ${serverLabel}/${functionName} (max ${MAX_APPROVAL_CALLS_PER_TASK} calls reached)`);
    try {
      await session.addConversationItem({
        type: "mcp_approval_response",
        approvalRequestId: approvalId,
        approve: false,
      });
    } catch (err) {
      console.warn("Failed to send auto-deny:", err?.message ?? err);
    }
    return;
  }

  // Auto-approve if user already approved this server earlier in the same turn
  if (this._approvedServersThisTurn.has(serverLabel)) {
    console.log(`   Auto-approved: ${serverLabel}/${functionName} (already approved this turn)`);
    try {
      await session.addConversationItem({
        type: "mcp_approval_response",
        approvalRequestId: approvalId,
        approve: true,
      });
    } catch (err) {
      console.warn("Failed to send auto-approve:", err?.message ?? err);
    }
    return;
  }

  if (this._pendingApproval !== null) {
    this._approvalQueue.push({ approvalId, serverLabel, functionName });
    console.log("   (queued — another approval is pending)");
    return;
  }

  this._pendingApproval = { approvalId, serverLabel, functionName };

  if (!this._activeResponse) {
    await this._sendApprovalVoicePrompt(session);
  } else {
    this._approvalPromptNeeded = true;
  }
}

async _sendApprovalVoicePrompt(session) {
  const pending = this._pendingApproval;
  if (!pending) return;

  const server = pending.serverLabel;
  const count = this._approvalCallCount[server] ?? 0;
  this._approvalCallCount[server] = count + 1;

  let prompt;
  if (count === 0) {
    prompt = `You MUST ask the user for explicit permission before proceeding. Say exactly: "I'd like to search the ${server} service for information. Do you approve? Please say yes or no."`;
  } else {
    prompt = `You MUST ask the user for permission again. Say exactly: "I need to do one more search to get complete information. Should I continue? Please say yes or no."`;
  }

  try {
    await session.addConversationItem({
      type: "message",
      role: "system",
      content: [{ type: "input_text", text: prompt }],
    });
    await session.sendEvent({ type: "response.create" });
  } catch (err) {
    console.error("❌ Failed to send approval voice prompt:", err?.message ?? err);
  }
}

In this sample:

  • A system message instructs the model to verbally ask for permission.
  • mcp_approval_response sends the decision back to Voice Live with approve: true or approve: false.
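
The order of checks in _handleApprovalRequest determines what happens to each incoming request. The following sketch restates that order as a pure function so that it can be read and tested on its own. The decideApproval function and the shape of its state argument are hypothetical and chosen for illustration only.

```javascript
// Hypothetical pure function that mirrors the order of checks in
// _handleApprovalRequest. The state field names are illustrative.
const MAX_APPROVAL_CALLS_PER_TASK = 3;

function decideApproval(state, serverLabel) {
  const promptCount = state.promptCounts[serverLabel] ?? 0;
  // 1. Too many prompts for this server in the current task: deny.
  if (promptCount >= MAX_APPROVAL_CALLS_PER_TASK) return "auto-deny";
  // 2. The user already approved this server in the current turn: approve.
  if (state.approvedServers.has(serverLabel)) return "auto-approve";
  // 3. Another approval is waiting for an answer: queue this one.
  if (state.pending !== null) return "queue";
  // 4. Otherwise, ask the user.
  return "prompt";
}

const state = {
  promptCounts: { azure_doc: 1 },
  approvedServers: new Set(),
  pending: null,
};
console.log(decideApproval(state, "azure_doc")); // prompt
```

Checking the cap first means that a server that reached the limit is denied even if the user approved it earlier in the turn.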

Resolve voice-based approval

Parse the user's spoken transcript to determine approval. Use word-boundary regex to avoid false positives from words like "yesterday" or "nobody".

async _resolveVoiceApproval(transcript, session) {
  if (this._pendingApproval === null) return;

  const lower = transcript.toLowerCase();
  let approved = /\byes\b/.test(lower);
  const denied = /\b(no|stop|cancel)\b/.test(lower);

  if (!approved && !denied) {
    // Ambiguous — will re-prompt at next response.done
    this._approvalPromptNeeded = true;
    return;
  }

  if (approved && denied) {
    approved = false; // Conflicting signals — deny for safety
  }

  const { approvalId, serverLabel } = this._pendingApproval;

  console.log(`   Voice response: ${approved ? "Approved ✅" : "Denied ❌"}`);
  writeConversationLog(`Voice approval: ${approved ? "Approved" : "Denied"} for ${serverLabel}`);

  this._pendingApproval = null;

  if (approved) {
    this._approvedServersThisTurn.add(serverLabel);
  } else {
    this._approvalCallCount = {};
    this._approvedServersThisTurn.delete(serverLabel);
  }

  try {
    await session.addConversationItem({
      type: "mcp_approval_response",
      approvalRequestId: approvalId,
      approve: approved,
    });
  } catch (err) {
    console.error("❌ Failed to send approval response:", err?.message ?? err);
  }

  await this._processNextApproval(session);
}

async _processNextApproval(session) {
  if (this._approvalQueue.length === 0) return;

  const next = this._approvalQueue.shift();

  // Auto-approve if user already approved this server earlier in the same turn
  if (this._approvedServersThisTurn.has(next.serverLabel)) {
    console.log(`   Auto-approved (queued): ${next.serverLabel}/${next.functionName}`);
    try {
      await session.addConversationItem({
        type: "mcp_approval_response",
        approvalRequestId: next.approvalId,
        approve: true,
      });
    } catch (err) {
      console.warn("Failed to send queued auto-approve:", err?.message ?? err);
    }
    await this._processNextApproval(session);
    return;
  }

  this._pendingApproval = next;

  if (!this._activeResponse) {
    await this._sendApprovalVoicePrompt(session);
  } else {
    this._approvalPromptNeeded = true;
  }
}

In this sample:

  • The transcript from conversation.item.input_audio_transcription.completed is matched against \byes\b and \b(no|stop|cancel)\b patterns.
  • Subsequent calls to the same server within the same turn are auto-approved to avoid repeated prompts.
  • After a maximum number of approval prompts for a server (3 in this sample, set by MAX_APPROVAL_CALLS_PER_TASK), further calls to that server are auto-denied and the model responds with what it has.
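
To see how the word-boundary patterns behave on tricky transcripts, the matching step can be isolated into a small function. This is a sketch for experimentation: the parseApprovalIntent name and its string return values are hypothetical, while the two patterns are the ones used in _resolveVoiceApproval.

```javascript
// Hypothetical helper that isolates the intent matching used in
// _resolveVoiceApproval. Returns "approve", "deny", or "ambiguous".
function parseApprovalIntent(transcript) {
  const lower = (transcript ?? "").toLowerCase();
  const approved = /\byes\b/.test(lower);
  const denied = /\b(no|stop|cancel)\b/.test(lower);
  if (!approved && !denied) return "ambiguous";
  // A denial word wins, even when "yes" is also present.
  if (denied) return "deny";
  return "approve";
}

console.log(parseApprovalIntent("Yes, go ahead."));          // approve
console.log(parseApprovalIntent("No thanks."));              // deny
console.log(parseApprovalIntent("Yesterday nobody called")); // ambiguous
console.log(parseApprovalIntent("Yes... no, stop."));        // deny
```

Because the patterns match whole words only, variants such as "yeah", "nope", or "cancelled" count as ambiguous and lead to another prompt instead of a decision.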

Detect stalls during MCP tool calls

MCP tool calls can take several seconds. Use a repeating timer to proactively inform the user that the assistant is still waiting for results.

_startMcpStallTimer(session) {
  this._cancelMcpStallTimer();
  let stallCount = 0;
  const MCP_STALL_MAX_NOTIFICATIONS = 3;
  this._mcpStallTimer = setInterval(async () => {
    if (this._mcpCallInProgress <= 0) {
      this._cancelMcpStallTimer();
      return;
    }
    stallCount++;
    if (stallCount > MCP_STALL_MAX_NOTIFICATIONS) {
      this._cancelMcpStallTimer();
      return;
    }
    // MCP calls cannot be cancelled — only honest status updates are possible.
    const msg = "The tool call is still running. Briefly reassure the user that you're still waiting for results. One short sentence only.";
    try {
      await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: msg }] });
      await session.sendEvent({ type: "response.create" });
    } catch (e) {
      if (e?.message?.toLowerCase().includes("active response")) {
        this._needsResponseCreate = true;
      }
    }
  }, 10000);
}

_cancelMcpStallTimer() {
  if (this._mcpStallTimer) {
    clearInterval(this._mcpStallTimer);
    this._mcpStallTimer = null;
  }
}

In this sample:

  • A setInterval timer fires every 10 seconds and injects a system message on each tick, up to 3 times.
  • The timer is cancelled when an MCP call completes or fails. It also stops itself when no MCP call is in progress or after the third notification.
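
The same pattern can be written without a dependency on the session object, which makes the notification cap easy to verify without waiting for real timers. The following is a minimal sketch: createStallNotifier and its callback parameters are hypothetical names, and the callbacks stand in for the session calls that the sample makes.

```javascript
// Hypothetical, SDK-free version of the stall timer. The callbacks stand in
// for the session calls used in the sample.
function createStallNotifier({
  isCallInProgress,       // () => boolean
  notify,                 // (count) => void, for example to inject a system message
  maxNotifications = 3,
  intervalMs = 10000,
}) {
  let timer = null;
  let sent = 0;

  function cancel() {
    if (timer !== null) {
      clearInterval(timer);
      timer = null;
    }
  }

  // One timer tick: stop when idle or when the cap is reached, else notify.
  function tick() {
    if (!isCallInProgress() || sent >= maxNotifications) {
      cancel();
      return false;
    }
    sent++;
    notify(sent);
    return true;
  }

  function start() {
    cancel();
    sent = 0;
    timer = setInterval(tick, intervalMs);
  }

  return { start, cancel, tick };
}
```

Exposing tick separately from start lets a test drive the logic directly. For example, four consecutive tick() calls with a call still in progress produce three notifications, and the fourth call returns false.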

Run the sample

  1. Create the mcp-quickstart.js file with the following code:

    // Copyright (c) Microsoft Corporation. All rights reserved.
    // Licensed under the MIT License.
    
    import "dotenv/config";
    import { VoiceLiveClient } from "@azure/ai-voicelive";
    import { AzureKeyCredential } from "@azure/core-auth";
    import { DefaultAzureCredential } from "@azure/identity";
    import { spawn } from "node:child_process";
    import { existsSync, mkdirSync, appendFileSync } from "node:fs";
    import { join, dirname } from "node:path";
    import { fileURLToPath } from "node:url";
    
    const __dirname = dirname(fileURLToPath(import.meta.url));
    
    const logsDir = join(__dirname, "logs");
    if (!existsSync(logsDir)) mkdirSync(logsDir, { recursive: true });
    
    const timestamp = new Date()
      .toISOString()
      .replace(/[:.]/g, "-")
      .replace("T", "_")
      .slice(0, 19);
    const conversationLogFile = join(logsDir, `conversation_${timestamp}.log`);
    
    function writeConversationLog(message) {
      appendFileSync(conversationLogFile, message + "\n", "utf-8");
    }
    
    function printUsage() {
      console.log("Usage: node mcp-quickstart.js [options]");
      console.log("");
      console.log("Options:");
      console.log("  --api-key <key>             VoiceLive API key");
      console.log("  --endpoint <url>            VoiceLive endpoint URL");
      console.log("  --model <name>              Model to use (default: gpt-realtime)");
      console.log(
        "  --voice <name>              Voice (default: en-US-Ava:DragonHDLatestNeural)",
      );
      console.log("  --instructions <text>       System instructions for the assistant");
      console.log("  --audio-input-device <name> Explicit SoX input device name (Windows)");
      console.log("  --list-audio-devices        List available audio input devices and exit");
      console.log("  --use-token-credential      Use Azure credential instead of API key");
      console.log("  --no-audio                  Connect and configure session without mic/speaker");
      console.log("  -h, --help                  Show this help text");
    }
    
    function parseArguments(argv) {
      const parsed = {
        apiKey: process.env.AZURE_VOICELIVE_API_KEY,
        endpoint: process.env.AZURE_VOICELIVE_ENDPOINT,
        model: process.env.AZURE_VOICELIVE_MODEL ?? "gpt-realtime",
        voice:
          process.env.AZURE_VOICELIVE_VOICE ?? "en-US-Ava:DragonHDLatestNeural",
        instructions:
          process.env.AZURE_VOICELIVE_INSTRUCTIONS ??
          "You are a helpful AI assistant with access to MCP tools. Always respond in English. When a user asks a question, use the appropriate tool once to find information, then summarize the results conversationally. IMPORTANT: Never call the same tool more than once per user question. After receiving a tool result, always respond to the user with what you found — do not search again. Some tools require user approval before they can be used. When you receive a system message asking you to request permission, you MUST clearly ask the user for their explicit approval before proceeding. Always wait for the user to say yes or no. Never skip the approval question or assume permission is granted. If a tool result arrives after the conversation has moved to a different topic, briefly introduce it as a late result before sharing the findings.",
        audioInputDevice: process.env.AUDIO_INPUT_DEVICE,
        listAudioDevices: false,
        useTokenCredential: false,
        noAudio: false,
        help: false,
      };
    
      for (let i = 0; i < argv.length; i++) {
        const arg = argv[i];
        switch (arg) {
          case "--api-key":
            parsed.apiKey = argv[++i];
            break;
          case "--endpoint":
            parsed.endpoint = argv[++i];
            break;
          case "--model":
            parsed.model = argv[++i];
            break;
          case "--voice":
            parsed.voice = argv[++i];
            break;
          case "--instructions":
            parsed.instructions = argv[++i];
            break;
          case "--audio-input-device":
            parsed.audioInputDevice = argv[++i];
            break;
          case "--list-audio-devices":
            parsed.listAudioDevices = true;
            break;
          case "--use-token-credential":
            parsed.useTokenCredential = true;
            break;
          case "--no-audio":
            parsed.noAudio = true;
            break;
          case "--help":
          case "-h":
            parsed.help = true;
            break;
          default:
            if (arg?.startsWith("-")) {
              throw new Error(`Unknown option: ${arg}`);
            }
            break;
        }
      }
    
      return parsed;
    }
    
    /**
     * List available audio input devices on Windows (AudioEndpoint via WMI).
     */
    async function listAudioDevices() {
      if (process.platform !== "win32") {
        console.log("Device listing is currently supported on Windows only.");
        console.log("On macOS/Linux, run: sox -V6 -n -t coreaudio -n trim 0 0  (or similar)");
        return;
      }
    
      const { execSync } = await import("node:child_process");
      try {
        const output = execSync(
          'powershell -NoProfile -Command "Get-CimInstance Win32_PnPEntity | Where-Object { $_.PNPClass -eq \'AudioEndpoint\' } | Select-Object -ExpandProperty Name"',
          { encoding: "utf-8", timeout: 10000 },
        ).trim();
    
        if (!output) {
          console.log("No audio endpoint devices found.");
          return;
        }
    
        console.log("Available audio endpoint devices:");
        console.log("");
        for (const line of output.split(/\r?\n/)) {
          const name = line.trim();
          if (name) console.log(`  ${name}`);
        }
        console.log("");
        console.log("Use the device name (or a unique substring) with --audio-input-device.");
        console.log('Example: node mcp-quickstart.js --audio-input-device "Microphone"');
      } catch (err) {
        console.error("Failed to query audio devices:", err.message);
      }
    }
    
    function resolveVoiceConfig(voiceName) {
      const looksLikeAzureVoice = voiceName.includes("-") || voiceName.includes(":");
      if (looksLikeAzureVoice) {
        return { type: "azure-standard", name: voiceName };
      }
      return { type: "openai", name: voiceName };
    }
    
    class AudioProcessor {
      constructor(enableAudio = true, inputDevice = undefined) {
        this._enableAudio = enableAudio;
        this._inputDevice = inputDevice;
        this._recorder = null;
        this._soxProcess = null;
        this._speaker = null;
        this._skipSeq = 0;
        this._nextSeq = 0;
        this._recordModule = null;
        this._speakerCtor = null;
      }
    
      async _ensureAudioModulesLoaded() {
        if (!this._enableAudio) return;
        if (this._recordModule && this._speakerCtor) return;
    
        try {
          const recordModule = await import("node-record-lpcm16");
          const speakerModule = await import("speaker");
          this._recordModule = recordModule.default;
          this._speakerCtor = speakerModule.default;
        } catch {
          throw new Error(
            "Audio dependencies are unavailable. Install optional packages (node-record-lpcm16, speaker) " +
            "and required native build tools, or run with --no-audio for connectivity-only validation.",
          );
        }
      }
    
      async startCapture(session) {
        if (!this._enableAudio) {
          console.log("[audio] --no-audio enabled: microphone capture skipped");
          return;
        }
        if (this._recorder || this._soxProcess) return;
    
        if (this._inputDevice) {
          console.log(`[audio] Using explicit input device: ${this._inputDevice}`);
    
          const soxArgs = [
            "-q", "-t", "waveaudio", this._inputDevice,
            "-r", "24000", "-c", "1", "-e", "signed-integer", "-b", "16",
            "-t", "raw", "-",
          ];
    
          this._soxProcess = spawn("sox", soxArgs, {
            stdio: ["ignore", "pipe", "pipe"],
          });
    
          this._soxProcess.stdout.on("data", (chunk) => {
            if (session.isConnected) {
              session.sendAudio(new Uint8Array(chunk)).catch(() => {});
            }
          });
    
          this._soxProcess.stderr.on("data", (data) => {
            const msg = data.toString().trim();
            if (msg) console.error(`[audio] sox stderr: ${msg}`);
          });
    
          this._soxProcess.on("error", (error) => {
            console.error(`[audio] SoX process error: ${error?.message ?? error}`);
          });
    
          this._soxProcess.on("close", (code) => {
            if (code !== 0) console.error(`[audio] SoX exited with code ${code}`);
            this._soxProcess = null;
          });
    
          console.log("[audio] Microphone capture started");
          return;
        }
    
        await this._ensureAudioModulesLoaded();
    
        const recorderOptions = {
          sampleRate: 24000,
          channels: 1,
          audioType: "raw",
          recorder: "sox",
          encoding: "signed-integer",
          bitwidth: 16,
        };
    
        this._recorder = this._recordModule.record(recorderOptions);
        const recorderStream = this._recorder.stream();
    
        recorderStream.on("data", (chunk) => {
          if (session.isConnected) {
            session.sendAudio(new Uint8Array(chunk)).catch(() => {});
          }
        });
    
        recorderStream.on("error", (error) => {
          console.error(`[audio] Recorder stream error: ${error?.message ?? error}`);
        });
    
        console.log("[audio] Microphone capture started");
      }
    
      async startPlayback() {
        if (!this._enableAudio) {
          console.log("[audio] --no-audio enabled: speaker playback skipped");
          return;
        }
        if (this._speaker) return;
        await this._resetSpeaker();
        console.log("[audio] Playback ready");
      }
    
      queueAudio(base64Delta) {
        const seq = this._nextSeq++;
        if (seq < this._skipSeq) return;
        const chunk = Buffer.from(base64Delta, "base64");
        if (this._speaker && !this._speaker.destroyed) {
          this._speaker.write(chunk);
        }
      }
    
      skipPendingAudio() {
        if (!this._enableAudio) return;
        this._skipSeq = this._nextSeq++;
        this._resetSpeaker().catch(() => {});
      }
    
      shutdown() {
        if (this._soxProcess) {
          try { this._soxProcess.kill(); } catch { /* no-op */ }
          this._soxProcess = null;
        }
        if (this._recorder) {
          this._recorder.stop();
          this._recorder = null;
        }
        if (this._speaker) {
          this._speaker.end();
          this._speaker = null;
        }
        console.log("[audio] Audio processor shut down");
      }
    
      async _resetSpeaker() {
        await this._ensureAudioModulesLoaded();
        if (this._speaker && !this._speaker.destroyed) {
          // Use destroy() instead of end() to immediately discard buffered audio.
          // end() drains the buffer (plays it out), which causes old MCP response
          // audio to keep playing after barge-in.
          try { this._speaker.destroy(); } catch { /* no-op */ }
        }
        this._speaker = new this._speakerCtor({
          channels: 1,
          bitDepth: 16,
          sampleRate: 24000,
          signed: true,
        });
        this._speaker.on("error", () => {});
      }
    }
    
    // <define_mcp_servers>
    /**
     * Define MCP servers that Voice Live can use during the session.
     * Each server is an MCPTool object added to the session tools array.
     */
    function defineMCPServers() {
      return [
        {
          type: "mcp",
          serverLabel: "deepwiki",
          serverUrl: "https://mcp.deepwiki.com/mcp",
          allowedTools: ["read_wiki_structure", "ask_question"],
          requireApproval: "never",
        },
        {
          type: "mcp",
          serverLabel: "azure_doc",
          serverUrl: "https://learn.microsoft.com/api/mcp",
          requireApproval: "always",
        },
      ];
    }
    // </define_mcp_servers>
    
    class MCPVoiceAssistant {
      constructor(options) {
        this.endpoint = options.endpoint;
        this.credential = options.credential;
        this.model = options.model;
        this.voice = options.voice;
        this.instructions = options.instructions;
        this.audioInputDevice = options.audioInputDevice;
        this.noAudio = options.noAudio;
    
        this._session = null;
        this._subscription = null;
        this._audio = new AudioProcessor(!options.noAudio, options.audioInputDevice);
        this._activeResponse = false;
        this._responseApiDone = false;
        this._pendingApproval = null;
        this._approvalQueue = [];
        this._approvalPromptNeeded = false;
        this._mcpCallInProgress = 0;
        this._handledMcpCompletions = new Set();
        this._needsResponseCreate = false;
        this._approvalCallCount = {};
        this._mcpItemToServer = {};
        this._approvalServers = new Set();
        this._mcpStallTimer = null;
        this._activeMcpItems = new Set();
        this._staleMcpItems = new Set();
        this._mcpResultsPending = false;
        this._approvedServersThisTurn = new Set();
        this._bargeInActive = false;
      }
    
      // <configure_session>
      /**
       * Configure the session with MCP servers in the tools list.
       */
      async _setupSession() {
        console.log("[session] Configuring session with MCP tools...");
    
        const mcpServers = defineMCPServers();
    
        this._approvalServers = new Set(
          mcpServers.filter(s => s.requireApproval === "always").map(s => s.serverLabel)
        );
    
        await this._session.updateSession({
          model: this.model,
          modalities: ["text", "audio"],
          instructions: this.instructions,
          voice: resolveVoiceConfig(this.voice),
          inputAudioFormat: "pcm16",
          outputAudioFormat: "pcm16",
          turnDetection: {
            type: "server_vad",
            threshold: 0.5,
            prefixPaddingInMs: 300,
            silenceDurationInMs: 500,
          },
          inputAudioEchoCancellation: { type: "server_echo_cancellation" },
          inputAudioNoiseReduction: { type: "azure_deep_noise_suppression" },
          inputAudioTranscription: { model: this.model.toLowerCase().includes("realtime") ? "whisper-1" : "azure-speech" },
          tools: mcpServers,
        });
    
        console.log("[session] Session configuration with MCP tools sent");
      }
      // </configure_session>
    
      // <handle_mcp_events>
      /**
       * Subscribe to session events, including MCP-specific events.
       */
      _subscribeToEvents(session) {
        this._subscription = session.subscribe({
          onSessionUpdated: async (event, context) => {
            const s = event.session;
            const model = s?.model;
            const voice = s?.voice;
            console.log(`[session] Session ready: ${context.sessionId}`);
            console.log(
              `  Model: ${typeof model === "string" ? model : model?.toString?.() ?? ""}`,
            );
            console.log(`  Voice: ${voice?.name ?? ""}`);
            writeConversationLog(
              [
                `SessionID: ${context.sessionId}`,
                `Model: ${typeof model === "string" ? model : model?.toString?.() ?? ""}`,
                `Voice Name: ${voice?.name ?? ""}`,
                `Voice Type: ${voice?.type ?? ""}`,
                `Log File: ${conversationLogFile}`,
                "",
              ].join("\n"),
            );
          },
    
          onConversationItemInputAudioTranscriptionCompleted: async (event) => {
            const transcript = event.transcript ?? "";
            console.log(`👤 You said:\t${transcript}`);
            writeConversationLog(`User Input:\t${transcript}`);
            if (this._pendingApproval !== null) {
              await this._resolveVoiceApproval(transcript, session);
            }
          },
    
          onResponseTextDone: async (event) => {
            const text = event.text ?? "";
            console.log(`🤖 Assistant text:\t${text}`);
            writeConversationLog(`Assistant Text Response:\t${text}`);
          },
    
          onResponseAudioTranscriptDone: async (event) => {
            const transcript = event.transcript ?? "";
            console.log(`🤖 Assistant audio transcript:\t${transcript}`);
            writeConversationLog(`Assistant Audio Response:\t${transcript}`);
          },
    
          onInputAudioBufferSpeechStarted: async () => {
            console.log("🎤 Listening...");
            this._audio.skipPendingAudio();
    
            // Do NOT reset _approvalCallCount here — the counter should only
            // reset on task completion (in onResponseMcpCallCompleted when no
            // pending/queued approvals remain) or on denial (in _resolveVoiceApproval).
            // Resetting on every speech-start would let the model retry denied calls.
    
            // Clear ALL deferred response flags on barge-in.
            // This prevents onResponseDone (fired by the cancelled response)
            // from immediately creating a new response that overlaps the user.
            this._needsResponseCreate = false;
            this._mcpResultsPending = false;
    
            // Reset approved-servers-this-turn when user starts a new topic
            if (this._pendingApproval === null && this._mcpCallInProgress <= 0) {
              this._approvedServersThisTurn.clear();
            }
    
            if (this._activeResponse && !this._responseApiDone) {
              // Mark barge-in so onResponseDone skips deferred actions
              this._bargeInActive = true;
              try {
                await session.sendEvent({ type: "response.cancel" });
              } catch (err) {
                const msg = err?.message ?? "";
                if (!msg.toLowerCase().includes("no active response")) {
                  console.warn("[barge-in] Cancel failed:", msg);
                }
              }
              try {
                await session.sendEvent({ type: "input_audio_buffer.clear" });
              } catch { /* best-effort */ }
            }
    
            if (this._mcpCallInProgress > 0 && this._pendingApproval === null) {
              this._staleMcpItems = new Set([...this._staleMcpItems, ...this._activeMcpItems]);
              console.log(`[barge-in] Marking ${this._activeMcpItems.size} MCP calls as stale`);
              try {
                await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "A tool call is still running in the background. The user just spoke. Respond to what the user said. If a tool result arrives later, briefly introduce it as a late result from an earlier request." }] });
              } catch {}
            }
          },
    
          onInputAudioBufferSpeechStopped: async () => {
            console.log("🤔 Processing...");
          },
    
          onResponseCreated: async () => {
            this._activeResponse = true;
            this._responseApiDone = false;
          },
    
          onResponseAudioDelta: async (event) => {
            if (event.delta) {
              this._audio.queueAudio(event.delta);
            }
          },
    
          onResponseAudioDone: async () => {
            console.log("🎤 Ready for next input...");
          },
    
          onResponseDone: async () => {
            console.log("✅ Response complete");
            writeConversationLog("--- Response complete ---");
            this._activeResponse = false;
            this._responseApiDone = true;
    
            // If this response.done is the result of a barge-in cancel,
            // skip all deferred actions — the user's new turn will handle things.
            if (this._bargeInActive) {
              this._bargeInActive = false;
              return;
            }
    
            if (this._approvalPromptNeeded && this._pendingApproval !== null) {
              this._approvalPromptNeeded = false;
              await this._sendApprovalVoicePrompt(session);
            } else if (this._mcpResultsPending && this._mcpCallInProgress <= 0 && this._pendingApproval === null) {
              this._mcpResultsPending = false;
              try { await session.sendEvent({ type: "response.create" }); } catch {}
            } else if (this._needsResponseCreate) {
              this._needsResponseCreate = false;
              try { await session.sendEvent({ type: "response.create" }); } catch {}
            }
          },
    
          onServerError: async (event) => {
            const msg = event.error?.message ?? "";
            // Reset response state — errors can terminate a response without onResponseDone
            this._activeResponse = false;
            this._responseApiDone = true;
            if (msg.includes("Cancellation failed: no active response")) return;
            if (msg.toLowerCase().includes("interim response")) {
              console.log("[session] Interim response not supported (non-fatal)");
              return;
            }
            if (msg.toLowerCase().includes("active response")) return;
            console.error(`❌ VoiceLive error: ${msg}`);
            writeConversationLog(`ERROR: ${msg}`);
          },
    
          // MCP-specific event handlers
          onMcpListToolsCompleted: async (event) => {
            console.log(`🔧 MCP tools discovered successfully`);
            writeConversationLog("MCP tools discovered successfully");
          },
    
          onMcpListToolsFailed: async (event) => {
            console.error(`❌ MCP tool discovery failed`);
            writeConversationLog("ERROR: MCP tool discovery failed");
          },
    
          onResponseMcpCallInProgress: async (event) => {
            console.log("⏳ MCP tool call in progress...");
            writeConversationLog(`MCP call in progress: ${event.item_id ?? ""}`);
            this._mcpCallInProgress++;
            this._activeMcpItems.add(event.item_id);
            this._startMcpStallTimer(session);
          },
    
          onResponseMcpCallArgumentsDone: async (event) => {
            const name = event.name ?? "";
            console.log(`📋 MCP tool call arguments ready: ${name}`);
          },
    
          onResponseMcpCallCompleted: async (event) => {
            const itemId = event.item_id ?? "";
            this._mcpCallInProgress = Math.max(0, this._mcpCallInProgress - 1);
            this._activeMcpItems.delete(itemId);
            this._cancelMcpStallTimer();
            if (this._handledMcpCompletions.has(itemId)) return;
            this._handledMcpCompletions.add(itemId);
    
            const isStale = this._staleMcpItems.has(itemId);
            this._staleMcpItems.delete(itemId);
            console.log(`✅ MCP tool call completed (stale=${isStale})`);
            writeConversationLog(`MCP call completed: ${itemId} (stale=${isStale})`);
    
            delete this._mcpItemToServer[itemId];
            if (this._pendingApproval === null && this._approvalQueue.length === 0) {
              this._approvalCallCount = {};
            }
    
            if (isStale) {
              try {
                await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "This tool result is from an earlier request. The user has since moved on. Briefly introduce it as a late result, e.g. 'By the way, those results from earlier just came in...' then share the key findings concisely." }] });
              } catch {}
            }
    
            // Batch response: only call response.create when ALL MCP calls for this
            // turn have completed. This prevents partial results and repeated tool calls.
            if (this._mcpCallInProgress <= 0 && this._pendingApproval === null && this._approvalQueue.length === 0) {
              try {
                await session.sendEvent({ type: "response.create" });
              } catch (e) {
                if (e?.message?.toLowerCase().includes("active response")) {
                  this._needsResponseCreate = true;
                }
              }
            } else {
              this._mcpResultsPending = true;
              console.log(`[mcp] MCP calls still in progress (${this._mcpCallInProgress}) — deferring response`);
            }
          },
    
          onResponseMcpCallFailed: async (event) => {
            const itemId = event.item_id ?? "";
            console.error("❌ MCP tool call failed");
            writeConversationLog(`ERROR: MCP call failed: ${itemId}`);
            this._mcpCallInProgress = Math.max(0, this._mcpCallInProgress - 1);
            this._activeMcpItems.delete(itemId);
            this._staleMcpItems.delete(itemId);
            this._cancelMcpStallTimer();
            try { await session.sendEvent({ type: "response.create" }); } catch {}
          },
    
          onConversationItemCreated: async (event) => {
            const item = event.item;
            if (item?.type === "mcp_call") {
              const sl = item.serverLabel ?? item.server_label ?? "";
              const fn = item.name ?? "";
              this._mcpItemToServer[item.id] = `${sl}/${fn}`;
              console.log(`🔧 MCP tool call: ${sl}/${fn}`);
              writeConversationLog(`MCP tool call: ${sl}/${fn} (id=${item.id})`);
              if (!this._pendingApproval && !this._approvalServers.has(sl)) {
                try {
                  await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: "Briefly tell the user you're looking something up. One short sentence only." }] });
                  await session.sendEvent({ type: "response.create" });
                } catch {}
              }
            }
            if (item?.type === "mcp_approval_request") {
              writeConversationLog(`MCP approval request: ${item.serverLabel ?? item.server_label ?? ""} / ${item.name ?? ""} (id=${item.id ?? ""})`);
              await this._handleApprovalRequest(item, session);
            }
          },
        });
      }
      // </handle_mcp_events>
    
      // <handle_approval>
      /**
       * Handle MCP approval requests via voice-based approval flow.
       */
      async _handleApprovalRequest(item, session) {
        const approvalId = item.id ?? "unknown";
        const serverLabel = item.serverLabel ?? item.server_label ?? "unknown";
        const functionName = item.name ?? "unknown";
    
        console.log();
        console.log("🔐 MCP Approval Request");
        console.log(`   Server: ${serverLabel}`);
        console.log(`   Tool: ${functionName}`);
        console.log(`   Approval ID: ${approvalId}`);
    
        const MAX_APPROVAL_CALLS_PER_TASK = 3;
        const currentCount = this._approvalCallCount[serverLabel] ?? 0;
        if (currentCount >= MAX_APPROVAL_CALLS_PER_TASK) {
          console.log(`   Auto-denied: ${serverLabel}/${functionName} (max ${MAX_APPROVAL_CALLS_PER_TASK} calls reached)`);
          try {
            await session.addConversationItem({
              type: "mcp_approval_response",
              approvalRequestId: approvalId,
              approve: false,
            });
          } catch (err) {
            console.warn("Failed to send auto-deny:", err?.message ?? err);
          }
          return;
        }
    
        // Auto-approve if user already approved this server earlier in the same turn
        if (this._approvedServersThisTurn.has(serverLabel)) {
          console.log(`   Auto-approved: ${serverLabel}/${functionName} (already approved this turn)`);
          try {
            await session.addConversationItem({
              type: "mcp_approval_response",
              approvalRequestId: approvalId,
              approve: true,
            });
          } catch (err) {
            console.warn("Failed to send auto-approve:", err?.message ?? err);
          }
          return;
        }
    
        if (this._pendingApproval !== null) {
          this._approvalQueue.push({ approvalId, serverLabel, functionName });
          console.log("   (queued — another approval is pending)");
          return;
        }
    
        this._pendingApproval = { approvalId, serverLabel, functionName };
    
        if (!this._activeResponse) {
          await this._sendApprovalVoicePrompt(session);
        } else {
          this._approvalPromptNeeded = true;
        }
      }
    
      async _sendApprovalVoicePrompt(session) {
        const pending = this._pendingApproval;
        if (!pending) return;
    
        const server = pending.serverLabel;
        const count = this._approvalCallCount[server] ?? 0;
        this._approvalCallCount[server] = count + 1;
    
        let prompt;
        if (count === 0) {
          prompt = `You MUST ask the user for explicit permission before proceeding. Say exactly: "I'd like to search the ${server} service for information. Do you approve? Please say yes or no."`;
        } else {
          prompt = `You MUST ask the user for permission again. Say exactly: "I need to do one more search to get complete information. Should I continue? Please say yes or no."`;
        }
    
        try {
          await session.addConversationItem({
            type: "message",
            role: "system",
            content: [{ type: "input_text", text: prompt }],
          });
          await session.sendEvent({ type: "response.create" });
        } catch (err) {
          console.error("❌ Failed to send approval voice prompt:", err?.message ?? err);
        }
      }
    
      // <voice_approval_transcription>
      async _resolveVoiceApproval(transcript, session) {
        if (this._pendingApproval === null) return;
    
        const lower = transcript.toLowerCase();
        let approved = /\byes\b/.test(lower);
        const denied = /\b(no|stop|cancel)\b/.test(lower);
    
        if (!approved && !denied) {
          // Ambiguous — will re-prompt at next response.done
          this._approvalPromptNeeded = true;
          return;
        }
    
        if (approved && denied) {
          approved = false; // Conflicting signals — deny for safety
        }
    
        const { approvalId, serverLabel } = this._pendingApproval;
    
        console.log(`   Voice response: ${approved ? "Approved ✅" : "Denied ❌"}`);
        writeConversationLog(`Voice approval: ${approved ? "Approved" : "Denied"} for ${serverLabel}`);
    
        this._pendingApproval = null;
    
        if (approved) {
          this._approvedServersThisTurn.add(serverLabel);
        } else {
          this._approvalCallCount = {};
          this._approvedServersThisTurn.delete(serverLabel);
        }
    
        try {
          await session.addConversationItem({
            type: "mcp_approval_response",
            approvalRequestId: approvalId,
            approve: approved,
          });
        } catch (err) {
          console.error("❌ Failed to send approval response:", err?.message ?? err);
        }
    
        await this._processNextApproval(session);
      }
    
      async _processNextApproval(session) {
        if (this._approvalQueue.length === 0) return;
    
        const next = this._approvalQueue.shift();
    
        // Auto-approve if user already approved this server earlier in the same turn
        if (this._approvedServersThisTurn.has(next.serverLabel)) {
          console.log(`   Auto-approved (queued): ${next.serverLabel}/${next.functionName}`);
          try {
            await session.addConversationItem({
              type: "mcp_approval_response",
              approvalRequestId: next.approvalId,
              approve: true,
            });
          } catch (err) {
            console.warn("Failed to send queued auto-approve:", err?.message ?? err);
          }
          await this._processNextApproval(session);
          return;
        }
    
        this._pendingApproval = next;
    
        if (!this._activeResponse) {
          await this._sendApprovalVoicePrompt(session);
        } else {
          this._approvalPromptNeeded = true;
        }
      }
      // </voice_approval_transcription>
    
      // </handle_approval>
    
      // <mcp_stall_detection>
      _startMcpStallTimer(session) {
        this._cancelMcpStallTimer();
        let stallCount = 0;
        const MCP_STALL_MAX_NOTIFICATIONS = 3;
        this._mcpStallTimer = setInterval(async () => {
          if (this._mcpCallInProgress <= 0) {
            this._cancelMcpStallTimer();
            return;
          }
          stallCount++;
          if (stallCount > MCP_STALL_MAX_NOTIFICATIONS) {
            this._cancelMcpStallTimer();
            return;
          }
          // MCP calls cannot be cancelled — only honest status updates are possible.
          const msg = "The tool call is still running. Briefly reassure the user that you're still waiting for results. One short sentence only.";
          try {
            await session.addConversationItem({ type: "message", role: "system", content: [{ type: "input_text", text: msg }] });
            await session.sendEvent({ type: "response.create" });
          } catch (e) {
            if (e?.message?.toLowerCase().includes("active response")) {
              this._needsResponseCreate = true;
            }
          }
        }, 10000);
      }
    
      _cancelMcpStallTimer() {
        if (this._mcpStallTimer) {
          clearInterval(this._mcpStallTimer);
          this._mcpStallTimer = null;
        }
      }
      // </mcp_stall_detection>
    
      async start() {
        const client = new VoiceLiveClient(this.endpoint, this.credential, {
          apiVersion: "2026-01-01-preview",
        });
        const session = client.createSession({ model: this.model });
        this._session = session;
    
        console.log(
          `[init] Connecting to VoiceLive with model "${this.model}" at "${this.endpoint}" ...`,
        );
    
        this._subscribeToEvents(session);
    
        await session.connect();
        console.log("[init] Connected to VoiceLive session websocket");
    
        await this._setupSession();
    
        await this._audio.startPlayback();
        await this._audio.startCapture(session);
    
        console.log("\n" + "=".repeat(70));
        console.log("🎤 VOICE ASSISTANT WITH MCP READY");
        console.log("Try saying:");
        console.log('  • "Can you summarize the GitHub repo azure-sdk-for-java?"');
        console.log('  • "Search the Azure documentation for Voice Live API."');
        console.log("You may need to approve some MCP tool calls by voice.");
        console.log("Press Ctrl+C to exit");
        console.log("=".repeat(70) + "\n");
    
        if (this.noAudio) {
          setTimeout(() => {
            process.emit("SIGINT");
          }, 6000);
        }
    
        await new Promise((resolve) => {
          const onSignal = () => resolve();
          process.once("SIGINT", onSignal);
          process.once("SIGTERM", onSignal);
    
          const poll = setInterval(() => {
            if (!session.isConnected) {
              clearInterval(poll);
              resolve();
            }
          }, 500);
        });
    
        await this.shutdown();
      }
    
      async shutdown() {
        this._cancelMcpStallTimer();
    
        if (this._subscription) {
          await this._subscription.close();
          this._subscription = null;
        }
    
        if (this._session) {
          try {
            await this._session.disconnect();
          } catch {
            // ignore disconnect errors during shutdown
          }
    
          this._audio.shutdown();
    
          try {
            await this._session.dispose();
          } catch {
            // ignore dispose errors during shutdown
          }
    
          this._session = null;
        }
      }
    }
    
    async function main() {
      let args;
      try {
        args = parseArguments(process.argv.slice(2));
      } catch (err) {
        console.error(`❌ ${err.message}`);
        printUsage();
        process.exit(1);
      }
    
      if (args.help) {
        printUsage();
        return;
      }
    
      if (args.listAudioDevices) {
        await listAudioDevices();
        return;
      }
    
      if (!args.endpoint) {
        console.error(
          "❌ Missing endpoint. Set AZURE_VOICELIVE_ENDPOINT or pass --endpoint.",
        );
        process.exit(1);
      }
    
      if (!args.apiKey && !args.useTokenCredential) {
        console.error("❌ No authentication provided.");
        console.error(
          "Provide --api-key / AZURE_VOICELIVE_API_KEY or use --use-token-credential.",
        );
        process.exit(1);
      }
    
      const credential = args.useTokenCredential
        ? new DefaultAzureCredential()
        : new AzureKeyCredential(args.apiKey);
    
      console.log("Configuration:");
      console.log(`  AZURE_VOICELIVE_ENDPOINT: ${args.endpoint}`);
      console.log(`  AZURE_VOICELIVE_MODEL: ${args.model}`);
      console.log(`  AZURE_VOICELIVE_VOICE: ${args.voice}`);
      console.log(`  AUDIO_INPUT_DEVICE: ${args.audioInputDevice ?? "(not set)"}`);
      console.log(`  No audio mode: ${args.noAudio ? "enabled" : "disabled"}`);
      console.log(
        `  Authentication: ${args.useTokenCredential ? "DefaultAzureCredential" : "API Key"}`,
      );
      console.log(`  Log file: ${conversationLogFile}`);
    
      const assistant = new MCPVoiceAssistant({
        endpoint: args.endpoint,
        credential,
        model: args.model,
        voice: args.voice,
        instructions: args.instructions,
        audioInputDevice: args.audioInputDevice,
        noAudio: args.noAudio,
      });
    
      try {
        await assistant.start();
      } catch (err) {
        if (err?.code === "ERR_USE_AFTER_CLOSE") return;
        console.error("Fatal error:", err);
        process.exit(1);
      }
    }
    
    console.log("🎙️  Voice Assistant with MCP - Azure VoiceLive SDK");
    console.log("=".repeat(70));
    main().then(
      () => console.log("\n👋 Voice assistant shut down. Goodbye!"),
      (err) => {
        console.error("Unhandled error:", err);
        process.exit(1);
      },
    );
    
  2. Sign in to Azure with the following command:

    az login
    
  3. Run the application:

    node mcp-quickstart.js
    
  4. Speak into your microphone. Try asking questions like "What tools do you have?" or "Search the Azure documentation for Voice Live API."

    • For the deepwiki server (require_approval: "never"), tool calls execute automatically.
    • For the azure_doc server (require_approval: "always"), the assistant asks for your approval by voice before it calls a tool. Say "yes" to approve, or "no", "stop", or "cancel" to deny.
  5. Press Ctrl+C to stop the session.

MCP server configuration reference

| Parameter | Required | Description |
| --- | --- | --- |
| server_label | Yes | Display name for the MCP server. |
| server_url | Yes | URL of the remote MCP endpoint. |
| allowed_tools | No | List of tool names the model can call. If omitted, all tools are allowed. |
| require_approval | No | "never", "always" (default), or a per-tool dictionary. |
| headers | No | Extra HTTP headers to include in MCP requests. |
| authorization | No | Authorization token for MCP requests. |

For the complete REST API type definition, see MCPTool in the Voice Live API reference.
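
As a minimal sketch, the required and optional parameters above can be checked on the client before the session is configured. The helper name validateMcpServer is hypothetical and isn't part of the Voice Live SDK. It uses the camelCase property names from the JavaScript sample (serverLabel, serverUrl, allowedTools, requireApproval), and it assumes the per-tool form of requireApproval is a plain object without checking that object's internal shape.

```javascript
// Hypothetical helper (not part of the Voice Live SDK).
// Returns a list of problems found in one MCP server definition.
function validateMcpServer(server) {
  const problems = [];

  // server_label is required.
  if (typeof server.serverLabel !== "string" || server.serverLabel.length === 0) {
    problems.push("serverLabel is required");
  }

  // server_url is required and must parse as a URL.
  try {
    new URL(server.serverUrl);
  } catch {
    problems.push("serverUrl is required and must be a valid URL");
  }

  // allowed_tools is optional; when present it must be a list of tool names.
  if (server.allowedTools !== undefined) {
    const ok =
      Array.isArray(server.allowedTools) &&
      server.allowedTools.every((name) => typeof name === "string");
    if (!ok) problems.push("allowedTools must be an array of strings");
  }

  // require_approval is optional: "always", "never", or a per-tool object.
  const approval = server.requireApproval;
  const isMode = approval === "always" || approval === "never";
  const isPerTool =
    typeof approval === "object" && approval !== null && !Array.isArray(approval);
  if (approval !== undefined && !isMode && !isPerTool) {
    problems.push('requireApproval must be "always", "never", or an object');
  }

  return problems;
}
```

Running a check like this over each entry before calling updateSession surfaces a missing label or a mistyped URL locally instead of as a tool discovery failure at session start.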

Best practices

Integrating MCP servers into a voice assistant introduces UX challenges that don't exist in text-based or console-based MCP clients. MCP tool calls can take 3–60+ seconds, approval prompts must happen conversationally, and users expect continuous spoken feedback. Plan for these patterns when building a voice-enabled MCP integration.

Voice-native approval

Console-based MCP samples typically use blocking input (such as input() or readline) for approval. In a voice assistant, blocking the audio pipeline freezes the conversation. Instead, handle approvals conversationally:

  • Inject a system message that instructs the model to verbally ask for permission.
  • Parse the user's spoken response for clear intent (yes, no, stop, cancel).
  • Allow barge-in so the user can say "yes" without waiting for the full approval prompt to finish.
  • Use word-boundary matching (such as \byes\b and \bno\b) to avoid false positives from words like "yesterday" or "nobody".
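
The intent parsing described above could be sketched as follows. The parseApprovalIntent function and its keyword lists are hypothetical; a production assistant would likely need a broader vocabulary and language-specific handling.

```javascript
// Word boundaries keep "yesterday" from matching "yes" and "nobody" from matching "no".
const APPROVE_PATTERN = /\byes\b/i;
const DENY_PATTERN = /\b(no|stop|cancel)\b/i;

// Hypothetical helper: classifies a transcript of the user's spoken reply.
function parseApprovalIntent(transcript) {
  const approve = APPROVE_PATTERN.test(transcript);
  const deny = DENY_PATTERN.test(transcript);
  if (approve && !deny) return "approve";
  if (deny && !approve) return "deny";
  return "unclear"; // neither or both: ask the user again
}

console.log(parseApprovalIntent("Yes, go ahead"));
console.log(parseApprovalIntent("I asked about this yesterday"));
```

Treating mixed signals (such as "yes, no wait") as unclear errs toward asking again instead of running a tool the user might not have approved.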

System instructions for the approval flow

The model needs explicit instructions about the approval flow in its system prompt. Without them, it might paraphrase the permission request into a generic "Let me look that up," skipping the actual question. Include language like:

"Some tools require user approval. When you receive a system message asking you to request permission, you MUST clearly ask the user for their explicit approval. Never skip the approval question or assume permission is granted."

Use "Say exactly:" phrasing in per-request system messages to prevent the model from rewording the question.
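
A per-request system message using the "Say exactly:" phrasing might be built like this. The buildApprovalSystemMessage helper and the wording of the question are hypothetical, and the conversation.item.create payload shape is an assumption; check the Voice Live API reference for the exact schema.

```javascript
// Hypothetical helper: builds a system message that tells the model
// to ask the approval question verbatim.
function buildApprovalSystemMessage(serverLabel, toolName) {
  const question =
    `I'd like to use the ${toolName} tool from ${serverLabel}. Do I have your permission?`;
  return {
    type: "conversation.item.create", // assumed event shape
    item: {
      type: "message",
      role: "system",
      content: [{ type: "input_text", text: `Say exactly: "${question}"` }],
    },
  };
}

const message = buildApprovalSystemMessage("azure_doc", "search");
console.log(message.item.content[0].text);
```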

Handle repeated tool calls

MCP servers might require multiple searches to gather complete information. When require_approval is set to "always", each search triggers a separate approval request. Rather than asking the identical question each time:

  • Track the call count per server.
  • Change the prompt wording for subsequent calls (for example, "I need one more search. Should I continue?").
  • Consider auto-denying after a maximum number of approved calls (for example, 3) to prevent infinite loops. The model then responds with the information it has already gathered.
  • Reset the counter when results are delivered or the user denies a request.

For approval-required servers, consider auto-approving subsequent calls to the same server within the same turn to avoid repeated voice prompts for what is logically a single task.
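
The counting pattern in the list above could look something like this sketch. The ApprovalTracker class, its method names, and the prompt wording are all hypothetical.

```javascript
// Hypothetical tracker: counts approved calls per server and decides
// whether to ask, reword the question, or auto-deny.
class ApprovalTracker {
  constructor(maxApprovedCalls = 3) {
    this.maxApprovedCalls = maxApprovedCalls;
    this.approvedCounts = new Map(); // server_label -> approved calls so far
  }

  // Decide how to handle a new approval request from this server.
  next(serverLabel) {
    const count = this.approvedCounts.get(serverLabel) ?? 0;
    if (count >= this.maxApprovedCalls) return { action: "auto_deny" };
    const prompt = count === 0
      ? "May I search for that?"
      : "I need one more search. Should I continue?";
    return { action: "ask", prompt };
  }

  recordApproval(serverLabel) {
    this.approvedCounts.set(serverLabel, (this.approvedCounts.get(serverLabel) ?? 0) + 1);
  }

  // Call when results are delivered or the user denies a request.
  reset(serverLabel) {
    this.approvedCounts.delete(serverLabel);
  }
}

const tracker = new ApprovalTracker(3);
console.log(tracker.next("azure_doc"));
```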

Fill silence during tool calls

MCP tool calls can take several seconds to complete. Without feedback, the user assumes the assistant is unresponsive. Use these complementary layers:

  1. Tool announcements (immediate, client-side): For auto-approved servers, have the assistant say something like "Let me look that up" when the call starts. Skip this for approval-required servers since the approval prompt already communicates that a tool call is happening.
  2. Stall detection (client-side, repeating timer): If a tool call runs longer than expected, proactively tell the user the assistant is still waiting. A 10-second interval with a maximum of 3 notifications works well for medium-latency servers (5–15 seconds). Adjust the interval based on your expected MCP server latency.

Note

MCP calls can't be cancelled. Stall notifications are status updates, not actionable options. Once a call starts, it runs until the server responds or times out.
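
Stall detection with a repeating timer might be sketched as follows. The StallNotifier class is hypothetical; the timers option exists only so the timer functions can be swapped out in tests, and the defaults follow the 10-second interval and 3-notification limit suggested above.

```javascript
// Hypothetical helper: calls notify(n) every intervalMs while a tool call
// is pending, and stops by itself after maxNotifications.
class StallNotifier {
  constructor(notify, { intervalMs = 10000, maxNotifications = 3, timers = globalThis } = {}) {
    this.notify = notify;
    this.intervalMs = intervalMs;
    this.maxNotifications = maxNotifications;
    this.timers = timers;
    this.handle = null;
    this.sent = 0;
  }

  // Call when the MCP tool call starts.
  start() {
    this.stop();
    this.sent = 0;
    this.handle = this.timers.setInterval(() => {
      this.sent += 1;
      this.notify(this.sent);
      if (this.sent >= this.maxNotifications) this.stop();
    }, this.intervalMs);
  }

  // Call when the tool call completes or fails.
  stop() {
    if (this.handle !== null) {
      this.timers.clearInterval(this.handle);
      this.handle = null;
    }
  }
}

// Usage: const notifier = new StallNotifier((n) => { /* ask the model to say it's still waiting */ });
// notifier.start() when the call begins; notifier.stop() when it ends.
```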

Handle barge-in during MCP calls

Users naturally try to interrupt or ask "Are you still there?" during long tool calls. Rather than ignoring this:

  • Inject a system message so the model can acknowledge the user.
  • If the original MCP call completes later, introduce its result as a late result (for example, "By the way, those results from earlier just came in...").
  • Protect against response collisions: when a cancelled response's completion handler runs, skip any deferred processing (pending approval prompts, queued MCP results) so it doesn't overlap with the user's new turn.
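
One way to recognize a late result is to record the user turn in which each call started, as in this sketch. The McpTurnTracker class and the instruction text are hypothetical, and the call IDs stand in for whatever identifier your event handlers receive.

```javascript
// Hypothetical tracker: flags an MCP result as "late" when the user has
// started a new turn since the call began.
class McpTurnTracker {
  constructor() {
    this.currentTurn = 0;
    this.callTurns = new Map(); // call ID -> turn in which the call started
  }

  onUserTurn() {
    this.currentTurn += 1;
  }

  onCallStarted(callId) {
    this.callTurns.set(callId, this.currentTurn);
  }

  // Returns how the model should present the result.
  onCallCompleted(callId) {
    const startedIn = this.callTurns.get(callId);
    this.callTurns.delete(callId);
    const isLate = startedIn !== undefined && startedIn !== this.currentTurn;
    const instruction = isLate
      ? "Introduce these results as arriving late, then summarize them."
      : "Summarize these results for the user.";
    return { isLate, instruction };
  }
}

const turns = new McpTurnTracker();
turns.onCallStarted("call_1");
turns.onUserTurn(); // the user barges in while the call is pending
console.log(turns.onCallCompleted("call_1").isLate);
```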

Choose MCP servers for voice latency

Not all MCP servers are well-suited for voice UX. When selecting MCP servers for a voice assistant:

  • Prefer low-latency servers: search APIs, simple lookups, and cached data sources that respond within 5 seconds work best.
  • Avoid servers that perform heavy computation: large repository analysis, complex document retrieval, or multi-step workflows can take 30–60+ seconds, degrading the voice experience.
  • Plan for non-cancellable calls: MCP calls can't be cancelled from the client. If the user moves on during a slow call, the result arrives out of context and must be introduced as a late result, which can feel disjointed.
  • Consider your use case: if users expect real-time answers, long-running MCP servers frustrate them. If the interaction style is more like a research assistant, asynchronous results might be acceptable.

Troubleshooting

MCP tool discovery fails (mcp_list_tools.failed)

Voice Live contacts each MCP server's tool listing endpoint at session start. If discovery fails, no tools from that server are available during the session.

Cause Resolution
Incorrect server_url Verify the MCP server URL is reachable and includes the correct path (for example, https://mcp.deepwiki.com/mcp).
Server is unreachable Confirm the MCP server is running and accessible from Azure's network. Check firewall rules and DNS resolution.
Authentication failure If the server requires authentication, verify the authorization or headers values are correct and not expired.
Server returns invalid tool schema Check the MCP server's tool listing response conforms to the MCP specification.

MCP tool call fails (response.mcp_call.failed)

A tool call failure means Voice Live successfully discovered the tool but the call didn't complete.

Cause Resolution
Server timeout The MCP server took too long to respond. Optimize the server-side handler or choose a lower-latency server.
Server returned an error Check your MCP server logs. Common issues include missing parameters, invalid input, or downstream service failures.
Network interruption Transient network errors between Voice Live and the MCP server. Retry by prompting the model again.

Tip

When an MCP call fails, trigger response.create so the model can inform the user and continue the conversation. The sample code does this automatically.
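
As a minimal sketch of that recovery step, the handler below reacts to response.mcp_call.failed by requesting a new response. The handleMcpFailure function, the send callback, and the item_id field are assumptions for illustration; the quickstart sample might structure this differently.

```javascript
// Hypothetical handler: `send` is whatever function your client uses to
// send events to the Voice Live session.
function handleMcpFailure(event, send) {
  if (event.type !== "response.mcp_call.failed") return false;
  // item_id is an assumed field name, used here only for logging.
  console.warn("MCP call failed:", event.item_id ?? "(unknown item)");
  send({ type: "response.create" }); // lets the model tell the user and move on
  return true;
}

const sent = [];
handleMcpFailure({ type: "response.mcp_call.failed", item_id: "item_1" }, (e) => sent.push(e));
console.log(sent);
```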

No MCP events received

Cause Resolution
Wrong API version MCP requires api_version="2026-04-10" or later. Earlier API versions silently ignore MCP server configuration.
MCP servers not in session config Verify that MCPServer objects are included in the tools list passed to configure_session or updateSession.
allowed_tools mismatch If allowed_tools is set, only the listed tool names are exposed. Verify the names match exactly what the MCP server advertises.
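
To check for an allowed_tools mismatch, you could compare your configured names against the names reported during tool discovery. The findUnadvertisedTools helper and the tool names below are hypothetical examples.

```javascript
// Hypothetical helper: returns the allowed_tools entries that the
// MCP server did not advertise during discovery.
function findUnadvertisedTools(allowedTools, advertisedTools) {
  const advertised = new Set(advertisedTools);
  return allowedTools.filter((name) => !advertised.has(name));
}

// Example names only; use the names from your own tool discovery results.
const missing = findUnadvertisedTools(
  ["ask_question", "Read_Wiki"],
  ["ask_question", "read_wiki"],
);
console.log(missing); // names are case-sensitive, so "Read_Wiki" is reported
```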

Approval requests not received

Cause Resolution
require_approval set to "never" Tool calls auto-execute without approval. Change to "always" or use a per-tool dictionary if you need approval for specific tools.
Event handler not subscribed Ensure your code listens for mcp_approval_request conversation items in the event loop.
Duplicate handling The approval request arrives as a conversation item creation event, not a standalone event type. Check that your conversation.item.created handler inspects the item type.
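
The item-type check described in the table might look like the following sketch. The toApprovalResponse helper is hypothetical, and the approval_request_id field and the conversation.item.create payload shape are assumptions to verify against the Voice Live API reference.

```javascript
// Hypothetical helper: turns an approval request item into the matching
// approval response, or returns null for any other event.
function toApprovalResponse(event, approve) {
  if (event.type !== "conversation.item.created") return null;
  if (event.item?.type !== "mcp_approval_request") return null;
  return {
    type: "conversation.item.create", // assumed event shape
    item: {
      type: "mcp_approval_response",
      approval_request_id: event.item.id, // assumed field name
      approve,
    },
  };
}

const reply = toApprovalResponse(
  { type: "conversation.item.created", item: { type: "mcp_approval_request", id: "req_1" } },
  true,
);
console.log(reply);
```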

Response collision errors during MCP flow

Voice Live doesn't allow overlapping responses. During MCP flows, response.create calls can collide with an in-progress response.

Cause Resolution
"Cancellation failed: no active response" Non-fatal. This occurs when a cancel is issued but the response already completed. Log and ignore.
"active response" errors A new response.create was attempted while another response is still generating. Track response state (response.created / response.done events) and defer actions until the active response completes.
Interim response errors Some model pipelines don't support interimResponse. If you receive interim response errors, remove the interim response configuration or verify your model supports it.
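
Tracking response state and deferring actions, as suggested for "active response" errors, could be sketched like this. The ResponseGate class is hypothetical, and send stands in for your client's function for sending events to the session.

```javascript
// Hypothetical gate: allows one response at a time and queues further
// response.create requests until response.done arrives.
class ResponseGate {
  constructor(send) {
    this.send = send;
    this.active = false;
    this.pending = [];
  }

  // Feed every server event through this method.
  onServerEvent(event) {
    if (event.type === "response.created") this.active = true;
    if (event.type === "response.done") {
      this.active = false;
      const next = this.pending.shift();
      if (next) this.requestResponse(next);
    }
  }

  // Returns true if sent immediately, false if deferred.
  requestResponse(payload = { type: "response.create" }) {
    if (this.active) {
      this.pending.push(payload);
      return false;
    }
    this.active = true; // optimistic: assume a response starts once this is sent
    this.send(payload);
    return true;
  }
}

const outbox = [];
const gate = new ResponseGate((e) => outbox.push(e));
gate.requestResponse(); // sent immediately
gate.requestResponse(); // deferred until response.done
gate.onServerEvent({ type: "response.done" });
console.log(outbox.length);
```

Marking the gate active as soon as a request is sent closes the window between sending response.create and receiving response.created, when a second request could otherwise slip through.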