Phantom is a voice-powered AI agent that lives in Chrome. Ask it to click, scroll, read, summarize, and navigate — hands-free.
Voice-first, vision-enabled, and privacy-aware. Phantom uses Gemini to understand your screen and act on it.
Natural conversation via Gemini Live. Speak or type — Phantom responds with real voice, not robot TTS.
Phantom sees your screen in real time. It can describe what's on the page, find elements, and react to changes as they happen.
18 browser tools — DOM interaction via selectors plus AI-powered coordinate clicking for canvas and complex UIs.
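As a rough illustration of how the two click paths could sit behind one tool, here is a minimal sketch. The `ClickTarget` shape, the `PageLike` interface, and `performClick` are hypothetical names invented for this example; Phantom's actual tool schema is not shown here.

```typescript
// Hypothetical target shape: either a CSS selector or viewport coordinates.
type ClickTarget =
  | { kind: "selector"; selector: string }
  | { kind: "point"; x: number; y: number };

// Minimal page abstraction (an assumption) so the sketch runs outside a browser.
// In an extension, a content script wrapping `document` could implement it.
interface PageLike {
  clickSelector(selector: string): boolean; // true if an element matched and was clicked
  clickAt(x: number, y: number): boolean;   // true if the point could be clicked
}

interface ClickResult {
  ok: boolean;
  via: "selector" | "point";
  detail: string;
}

// Route a click request to the selector path or the coordinate path.
function performClick(page: PageLike, target: ClickTarget): ClickResult {
  if (target.kind === "selector") {
    const ok = page.clickSelector(target.selector);
    return {
      ok,
      via: "selector",
      detail: ok ? `clicked ${target.selector}` : `no match for ${target.selector}`,
    };
  }
  const ok = page.clickAt(target.x, target.y);
  return {
    ok,
    via: "point",
    detail: ok
      ? `clicked (${target.x}, ${target.y})`
      : `could not click (${target.x}, ${target.y})`,
  };
}
```

The selector path suits ordinary DOM elements. The coordinate path is the fallback for canvas apps and other UIs where no useful selector exists.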
Remembers who you are and what you've done across sessions. Local embeddings for semantic recall — nothing leaves your browser.
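Semantic recall over local embeddings usually comes down to ranking stored vectors by similarity to a query vector. The sketch below assumes memories are already embedded as plain number arrays. The `Memory` type and `recall` function are hypothetical, and the embedding model Phantom uses is not specified here.

```typescript
// Hypothetical stored memory: the text plus its precomputed embedding.
interface Memory {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK memories most similar to the query embedding.
function recall(query: number[], memories: Memory[], topK = 3): Memory[] {
  return memories
    .map((memory) => ({ memory, score: cosineSimilarity(query, memory.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.memory);
}
```

Because both the vectors and the ranking live in the extension, a lookup like this needs no network call.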
Highlight any text on a page and get an AI summary, rewrite, explanation, or translation in a popup, powered by Gemini Flash-Lite.
Automatically blurs passwords, credit cards, API keys, and PII before any screenshot reaches the AI. Your secrets stay yours.
Download the zip, unpack it, then turn on Developer mode at chrome://extensions and choose "Load unpacked".
Grab a free Gemini API key from Google AI Studio and paste it in settings.
Choose from 9 personalities — each with a unique voice, look, and attitude.
Click the microphone and tell Phantom what to do. It handles the rest.
Phantom is designed with privacy as a default, not an afterthought.
Sensitive content is blurred before any screenshot is sent to the AI — passwords, credit cards, API keys, SSNs, and more.
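Before anything can be blurred, the extension has to decide which text on the page is sensitive. The sketch below shows one plausible way to classify a text snippet. The `classifySensitive` function and its three patterns are assumptions for illustration, not Phantom's actual rules. Password fields would more likely be found by input type than by pattern, so they are left out.

```typescript
type SensitiveKind = "credit_card" | "ssn" | "api_key";

// Luhn checksum, used to tell real card numbers from arbitrary digit runs.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return digits.length > 0 && sum % 10 === 0;
}

// Classify a snippet of page text, or return null if it looks harmless.
// The patterns are illustrative assumptions and far from exhaustive.
function classifySensitive(text: string): SensitiveKind | null {
  const trimmed = text.trim();
  const digitsOnly = trimmed.replace(/[\s-]/g, "");
  if (/^\d{13,19}$/.test(digitsOnly) && passesLuhn(digitsOnly)) return "credit_card";
  if (/^\d{3}-\d{2}-\d{4}$/.test(trimmed)) return "ssn"; // US SSN layout
  if (/^AIza[0-9A-Za-z_-]{35}$/.test(trimmed)) return "api_key"; // Google-style key shape
  return null;
}
```

Elements whose text gets a non-null result would then be blurred before the screenshot is taken, so the unblurred pixels never reach the model.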
From voice input to browser action — a real-time pipeline powered by Gemini.
Phantom is built entirely on Google technologies — from the AI model to the deployment platform.
| Technology | How we use it |
|---|---|
| Gemini 2.5 Flash Native Audio | Real-time voice conversations via the Live API WebSocket protocol |
| Gemini Live API | Bidirectional audio streaming, function calling, session resumption, context compression |
| Affective Dialog | Model reads tone and emotion from the user's voice for natural responses |
| Proactive Audio | Model decides when to respond and when to stay silent |
| Google Search Grounding | Model can search the web for current, factual information |
| @google/genai SDK | Server-side session management for the Gemini Live connection |
| Google Cloud Run | Hosts the auto-scaling WebSocket proxy server |
| Chrome Extension APIs | Side Panel, Scripting, Tabs, Tab Capture, Storage, Commands (MV3) |
| WebGL | GPU-accelerated page effects and real-time audio wave visualizer |
| Google Fonts | Google Sans typography across the extension and website |
Free, open source, and set up in 30 seconds. All you need is Chrome and a Gemini API key.