🎙️

Voice Setup

Voice

Complete voice configuration in chat, including PTT key, microphone permissions, ElevenLabs TTS, and troubleshooting

Install
assistant skills install voice-setup

compatibility:Designed for Vellum personal assistants
metadata.vellum.includes:["elevenlabs-voice"]

You are helping the user set up and troubleshoot voice features (push-to-talk, text-to-speech) entirely within this conversation. Do NOT direct the user to the Settings page for initial setup - handle everything in-chat using the tools below.

Available Tools

  • voice_config_update - Change any voice setting (PTT key, conversation timeout, TTS voice ID)
  • open_system_settings - Open macOS System Settings to a specific privacy pane
  • navigate_settings_tab - Open the Vellum settings panel to the Voice tab
  • credential_store - Collect API keys securely (for ElevenLabs TTS)

Setup Flow

Walk the user through each section in order. Skip sections they don't need. Ask before proceeding to the next section.

1. Microphone Permission

Check <channel_capabilities> for microphone_permission_granted.

If false or missing:

  1. Explain that macOS requires microphone permission for voice features.
  2. Use open_system_settings with pane: "microphone" to open the right System Settings pane.
  3. Tell the user: "I've opened System Settings to the Microphone section. Please toggle Vellum Assistant on, then come back here."
  4. After they confirm, verify by checking capabilities on the next turn.

If true: Tell them microphone is already granted and move on.

2. Push-to-Talk Activation Key

Present common PTT key options:

  • Right Option - Default, good general choice
  • Fn - Dedicated key on most Mac keyboards
  • Right Command - Easy to reach
  • Right Control - Familiar from gaming

Ask which key they prefer, then use voice_config_update with setting: "activation_key" and the chosen value.

Common issues to mention:

  • If they pick a key that conflicts with their emoji picker (Fn or Globe on newer Macs), warn them and suggest an alternative.
  • If they use a terminal app heavily, warn that some keys may be captured by the terminal.

3. Text-to-Speech / ElevenLabs (Optional)

Ask if they want high-quality text-to-speech voices via ElevenLabs (optional - standard TTS works without it).

If yes, the included ElevenLabs Voice skill (automatically appended below via includes) provides the full setup flow: curated voice list, API key collection, advanced voice selection, and tuning parameters. Follow the instructions there.

Note: The config key services.tts.providers.elevenlabs.voiceId controls the voice for both in-app TTS and phone calls. If the user sets up phone calls later, they will automatically use the same voice for a consistent experience.

4. Verification

After setup is complete:

  1. Summarize what was configured.
  2. Suggest they test by pressing their PTT key and speaking.
  3. Offer to open the Voice settings tab if they want to review: use navigate_settings_tab with tab: "Voice".

Troubleshooting Decision Trees

When the user reports a problem, follow the appropriate decision tree:

"PTT isn't working" / "Can't record"

  1. Microphone permission - Check microphone_permission_granted in capabilities. If false, guide through granting it.
  2. Key check - Ask what key they're using. Confirm it matches their configured PTT key.
  3. Emoji picker conflict - On newer Macs, Fn/Globe opens the emoji picker. If they're using Fn, suggest switching to Right Option or Right Command.
  4. Speech Recognition permission - Some voice features need this. Use open_system_settings with pane: "speech_recognition".
  5. App focus - PTT may not work when Vellum is not the frontmost app or if another app has captured the key.

"Recording but no text" / "Transcription not working"

  1. Speech Recognition permission - Must be granted for transcription.
  2. Microphone input - Ask if they see the recording indicator. If yes, the mic works but transcription is failing.
  3. Locale/language - Speech recognition works best with the system language. Ask if they're speaking in a different language.
  4. Background noise - Excessive noise can prevent transcription. Suggest a quieter environment or a closer microphone.

"Changed a setting but it didn't work"

  1. Event broadcast - The setting should take effect immediately. If it didn't, suggest restarting the assistant.
  2. Verify - Open the Voice settings tab with navigate_settings_tab to confirm the setting was persisted.

Deep Debugging

For persistent issues, suggest checking system logs:

log stream --predicate 'subsystem == "com.vellum.assistant"' --level debug

Key log categories:

  • voice - PTT activation, recording state
  • speech - Speech recognition results

Rules

  • Always handle setup conversationally in-chat. Do NOT tell the user to go to Settings for initial configuration.
  • Use navigate_settings_tab only for review/verification after in-chat setup, not as the primary setup method.
  • Be concise. Don't explain every option exhaustively - present the most common choices and let the user ask for more.
  • If a permission is denied, acknowledge it gracefully and explain what features won't work without it.
CreatorVellum
LicenseMIT
Updated1 month ago
SecurityVerified
View on GitHub

The Personal AI you were promised

GET STARTED