Phantom — AI Voice Agent for Chrome

Capabilities

Everything you'd want a browser assistant to do

Voice-first, vision-enabled, and privacy-aware. Phantom uses Gemini to understand your screen and act on it.

Voice & text chat

Natural conversation via Gemini Live. Speak or type — Phantom responds with real voice, not robot TTS.

Screen vision

Phantom sees your screen in real time. It can describe what's on the page, find elements, and react to changes as they happen.

Click, scroll, type

18 browser tools — DOM interaction via selectors plus AI-powered coordinate clicking for canvas and complex UIs.
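
To make the two click styles concrete, here is a minimal sketch of how a tool call might be routed either to a selector lookup or to a coordinate hit test. The tool names (`click_selector`, `click_at`, `scroll_by`) and the `PageLike` interface are hypothetical and are not Phantom's actual tool list; in a content script the three methods would presumably be backed by `document.querySelector`, `document.elementFromPoint`, and `window.scrollBy`.

```typescript
// Hypothetical tool-call shapes, for illustration only.
type ToolCall =
  | { name: "click_selector"; selector: string }
  | { name: "click_at"; x: number; y: number }
  | { name: "scroll_by"; dy: number };

// Minimal page abstraction so the sketch runs without a browser.
interface ClickTarget {
  click(): void;
}
interface PageLike {
  querySelector(selector: string): ClickTarget | null;
  elementFromPoint(x: number, y: number): ClickTarget | null;
  scrollBy(dx: number, dy: number): void;
}

// Routes one tool call and returns a short status string for the model.
function runTool(page: PageLike, call: ToolCall): string {
  switch (call.name) {
    case "click_selector": {
      // DOM path: works when the target is a real element with a stable selector.
      const el = page.querySelector(call.selector);
      if (!el) return `no element matches ${call.selector}`;
      el.click();
      return "clicked";
    }
    case "click_at": {
      // Coordinate path: useful for canvas apps where no selector exists.
      const el = page.elementFromPoint(call.x, call.y);
      if (!el) return `nothing at (${call.x}, ${call.y})`;
      el.click();
      return "clicked";
    }
    case "scroll_by":
      page.scrollBy(0, call.dy);
      return "scrolled";
  }
}
```

Returning a status string rather than throwing lets the model see a failed lookup and try the other click style.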

Memory

Remembers who you are and what you've done across sessions. Local embeddings for semantic recall — nothing leaves your browser.
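
As a rough illustration of semantic recall over locally stored embeddings, the sketch below ranks saved memories by cosine similarity to a query vector. The `MemoryItem` shape and the `recall` function are assumptions made for this example; the page does not say which embedding model or storage format Phantom uses.

```typescript
// Hypothetical memory record: text plus a precomputed embedding vector.
interface MemoryItem {
  text: string;
  embedding: number[];
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((av, i) => {
    const bv = b[i] ?? 0;
    dot += av * bv;
    normA += av * av;
    normB += bv * bv;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k stored memories most similar to the query embedding.
function recall(query: number[], items: MemoryItem[], k: number): MemoryItem[] {
  return items
    .map((item) => ({ item, score: cosine(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.item);
}
```

Because the scoring is plain arithmetic over arrays, a lookup like this can run entirely in the extension without a network call.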

Content actions

Highlight any text on a page and get an AI summary, rewrite, explanation, or translation in a popup — powered by Flash Lite.

Privacy shield

Automatically blurs passwords, credit cards, API keys, and PII before any screenshot reaches the AI. Your secrets stay yours.

Get started

Up and running in 30 seconds

Install the extension

Download the zip, unpack it, and load it in Chrome's developer mode.

Add your API key

Grab a free Gemini API key from Google AI Studio and paste it in settings.

Pick a persona

Choose from 9 personalities — each with a unique voice, look, and attitude.

Start talking

Click the microphone and tell Phantom what to do. It handles the rest.

Privacy

Your data stays on your device

Phantom is designed with privacy as a default, not an afterthought.

Privacy Shield

Sensitive content is blurred before any screenshot is sent to the AI — passwords, credit cards, API keys, SSNs, and more.

Memory and embeddings stored locally in your browser — never sent to a server
Regex + DOM scanning detects 9 categories of PII automatically
Session summaries generated server-side but stored only on your device
No accounts, no tracking, no telemetry
Open source — audit every line of code
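
For a sense of what the regex half of that scanning could look like, here is a minimal sketch that finds three kinds of sensitive text and masks them. The three categories, the patterns, and the `scan` and `redact` names are assumptions for illustration; they are not Phantom's actual nine categories or rules, and the real shield blurs regions of a screenshot rather than editing text.

```typescript
// Illustrative categories only; not the extension's real list.
type Category = "email" | "ssn" | "credit_card";

interface Finding {
  category: Category;
  match: string;
  index: number;
}

const PATTERNS: Array<[Category, RegExp]> = [
  ["email", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/],
  ["ssn", /\b\d{3}-\d{2}-\d{4}\b/],
  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  ["credit_card", /\b\d(?:[ -]?\d){12,18}\b/],
];

// Luhn checksum, used to discard long digit runs that are not card numbers.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let doubleIt = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Returns every match, grouped in pattern order.
function scan(text: string): Finding[] {
  const findings: Finding[] = [];
  PATTERNS.forEach(([category, pattern]) => {
    const re = new RegExp(pattern.source, "g"); // fresh regex, so no shared lastIndex
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      const match = m[0] ?? "";
      if (category === "credit_card" && !luhnValid(match.replace(/\D/g, ""))) continue;
      findings.push({ category, match, index: m.index });
    }
  });
  return findings;
}

// Masks each finding with asterisks of the same length.
function redact(text: string): string {
  const chars = text.split("");
  scan(text).forEach((f) => {
    for (let i = f.index; i < f.index + f.match.length; i++) chars[i] = "*";
  });
  return chars.join("");
}
```

The Luhn check is one way to cut false positives: an order number that happens to be 13 digits long is left alone unless its checksum also passes.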

Built with Google

Technology Stack

Phantom is built almost entirely on Google technologies, from the AI model to the deployment platform.

Technology	How we use it
Gemini 2.5 Flash Native Audio	Real-time voice conversations via the Live API WebSocket protocol
Gemini Live API	Bidirectional audio streaming, function calling, session resumption, context compression
Affective Dialog	Model reads tone and emotion from the user's voice for natural responses
Proactive Audio	Model intelligently decides when to respond vs. stay silent
Google Search Grounding	Model can search the web for current, factual information
@google/genai SDK	Server-side session management for the Gemini Live connection
Google Cloud Run	Hosts the auto-scaling WebSocket proxy server
Chrome Extension APIs	Side Panel, Scripting, Tabs, Tab Capture, Storage, Commands (MV3)
WebGL	GPU-accelerated page effects and real-time audio wave visualizer
Google Fonts	Google Sans typography across the extension and website

Talk to your browser.
It listens.

Ready to try Phantom?
