Built with Gemini

Talk to your browser.
It listens.

Phantom is a voice-powered AI agent that lives in Chrome. Ask it to click, scroll, read, summarize, and navigate — hands-free.

Capabilities

Everything you'd want a browser assistant to do

Voice-first, vision-enabled, and privacy-aware. Phantom uses Gemini to understand your screen and act on it.

Voice & text chat

Natural conversation via Gemini Live. Speak or type, and Phantom answers in a natural-sounding voice generated natively by the model, not robotic text-to-speech.

Screen vision

Phantom sees your screen in real time. It can describe what's on the page, find elements, and react to changes as they happen.

Click, scroll, type

18 browser tools — DOM interaction via selectors plus AI-powered coordinate clicking for canvas and complex UIs.
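
As a rough illustration of the two targeting modes described above, the sketch below resolves a click either by CSS selector or by viewport coordinates. The `PageLike` and `ClickableElement` interfaces and the `clickTarget` function are hypothetical names introduced for this example, not Phantom's actual tool API. In a content script they could be backed by a thin adapter around `document.querySelector` and `document.elementFromPoint`.

```typescript
// Hypothetical minimal view of a page. A content script could back this with
// document.querySelector and document.elementFromPoint.
interface ClickableElement {
  // `at` carries viewport coordinates for targets such as <canvas>,
  // where the position inside the element matters.
  click(at?: { x: number; y: number }): void;
}

interface PageLike {
  querySelector(selector: string): ClickableElement | null;
  elementFromPoint(x: number, y: number): ClickableElement | null;
}

type ClickTarget =
  | { kind: "selector"; selector: string }
  | { kind: "point"; x: number; y: number };

// Returns true when an element was found and clicked, false otherwise.
function clickTarget(page: PageLike, target: ClickTarget): boolean {
  if (target.kind === "selector") {
    const el = page.querySelector(target.selector);
    if (el === null) return false;
    el.click();
    return true;
  }
  const el = page.elementFromPoint(target.x, target.y);
  if (el === null) return false;
  el.click({ x: target.x, y: target.y });
  return true;
}
```

Returning a boolean instead of throwing lets the caller report "nothing found there" back to the model so it can try another selector or coordinate.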

Memory

Remembers who you are and what you've done across sessions. Local embeddings for semantic recall — nothing leaves your browser.
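
Semantic recall over local embeddings usually comes down to comparing a query vector against stored vectors. The sketch below ranks memories by cosine similarity. The `MemoryEntry` shape and the `recall` function are assumptions made for illustration; how Phantom actually produces and stores its embeddings is not described here.

```typescript
// Hypothetical shape of one stored memory: the text plus its embedding vector.
interface MemoryEntry {
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the topK memories most similar to the query embedding, best first.
function recall(query: number[], memories: MemoryEntry[], topK: number): MemoryEntry[] {
  return memories
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.entry);
}
```

Because both the vectors and the comparison live in the browser, a lookup like this needs no network request.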

Content actions

Highlight any text on a page and get an AI summary, rewrite, explanation, or translation in a popup, powered by Gemini Flash Lite.

Privacy shield

Automatically blurs passwords, credit cards, API keys, and PII before any screenshot reaches the AI. Your secrets stay yours.
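
Before anything can be blurred, it has to be recognized as sensitive. The sketch below covers only that detection step for a piece of on-page text, using a Luhn checksum for card numbers and simple patterns for SSNs and API keys. The `classifySensitive` function and its patterns are illustrative assumptions, not Phantom's actual rules; the `AIza` prefix matches the common format of Google API keys and stands in for whatever key formats a real shield would cover.

```typescript
type SensitiveKind = "credit-card" | "ssn" | "api-key";

// Luhn checksum: doubles every second digit from the right and checks
// that the total is a multiple of 10. Expects a string of digits only.
function passesLuhn(digits: string): boolean {
  if (digits.length === 0) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // "0" is char code 48
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Returns the kind of sensitive value the text looks like, or null.
function classifySensitive(text: string): SensitiveKind | null {
  const trimmed = text.trim();
  const digitsOnly = trimmed.replace(/[\s-]/g, "");
  if (/^\d{13,19}$/.test(digitsOnly) && passesLuhn(digitsOnly)) {
    return "credit-card";
  }
  if (/^\d{3}-\d{2}-\d{4}$/.test(trimmed)) {
    return "ssn";
  }
  // Illustrative pattern only: Google-style API key.
  if (/^AIza[0-9A-Za-z_-]{35}$/.test(trimmed)) {
    return "api-key";
  }
  return null;
}
```

A shield built this way would run the check over text nodes and input values, then blur the bounding boxes of any matches before the screenshot is captured. Password fields need no pattern at all, since they can be identified by their input type.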

Up and running in 30 seconds

1. Install the extension

Download the zip, unpack it, and load the unpacked folder from chrome://extensions with Developer mode turned on.

2. Add your API key

Grab a free Gemini API key from Google AI Studio and paste it in settings.

3. Pick a persona

Choose from 9 personalities, each with a unique voice, look, and attitude.

4. Start talking

Click the microphone and tell Phantom what to do. It handles the rest.

Privacy

Your data stays on your device

Phantom is designed with privacy as a default, not an afterthought.

Privacy Shield

Sensitive content is blurred before any screenshot is sent to the AI — passwords, credit cards, API keys, SSNs, and more.

How it all fits together

From voice input to browser action — a real-time pipeline powered by Gemini.
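
The last hop of that pipeline, turning a function call from the model into a browser action, can be sketched as a small registry that maps tool names to handlers. `FunctionCall`, `ToolRegistry`, and the `scroll` tool used in the usage note are hypothetical names chosen for this example, and handlers are synchronous only to keep the sketch short; real browser tools would most likely return promises.

```typescript
// Hypothetical shape of a function call emitted by the model.
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// What gets reported back to the model after a tool runs.
interface ToolResult {
  name: string;
  ok: boolean;
  result: string;
}

type ToolHandler = (args: Record<string, unknown>) => string;

class ToolRegistry {
  private handlers = new Map<string, ToolHandler>();

  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  // Looks up the handler by name and runs it. Unknown tools and thrown
  // errors are turned into a failed result instead of crashing the session.
  dispatch(call: FunctionCall): ToolResult {
    const handler = this.handlers.get(call.name);
    if (handler === undefined) {
      return { name: call.name, ok: false, result: `unknown tool: ${call.name}` };
    }
    try {
      return { name: call.name, ok: true, result: handler(call.args) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return { name: call.name, ok: false, result: message };
    }
  }
}
```

For example, after `registry.register("scroll", (args) => "scrolled " + String(args["pixels"]))`, a call of `{ name: "scroll", args: { pixels: 400 } }` would be dispatched to that handler and its return value sent back to the model as the tool response.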

[Figure: System Architecture]

Built with Google

Technology Stack

Phantom is built almost entirely on Google technologies, from the AI model to the deployment platform.

| Technology | How we use it |
| --- | --- |
| Gemini 2.5 Flash Native Audio | Real-time voice conversations via the Live API WebSocket protocol |
| Gemini Live API | Bidirectional audio streaming, function calling, session resumption, context compression |
| Affective Dialog | Model reads tone and emotion from the user's voice for natural responses |
| Proactive Audio | Model intelligently decides when to respond vs. stay silent |
| Google Search Grounding | Model can search the web for current, factual information |
| @google/genai SDK | Server-side session management for the Gemini Live connection |
| Google Cloud Run | Hosts the auto-scaling WebSocket proxy server |
| Chrome Extension APIs | Side Panel, Scripting, Tabs, Tab Capture, Storage, Commands (MV3) |
| WebGL | GPU-accelerated page effects and real-time audio wave visualizer |
| Google Fonts | Google Sans typography across the extension and website |
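
One practical detail behind bidirectional audio streaming: the Web Audio API hands microphone samples to the page as 32-bit floats in the range -1 to 1, while streaming speech endpoints commonly expect 16-bit PCM. The sketch below shows that conversion, assuming 16-bit PCM is the format the connection requires. The `floatTo16BitPCM` name is introduced here for illustration, and resampling to the target sample rate is left out.

```typescript
// Converts Web Audio float samples (-1..1) to signed 16-bit PCM.
// Out-of-range input is clamped rather than allowed to wrap around.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    out[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return out;
}
```

The resulting `Int16Array` can be sent over the WebSocket as raw bytes through its `buffer` property, or base64-encoded first if the protocol carries audio inside JSON messages.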

Ready to try Phantom?

Free, open source, and set up in 30 seconds. All you need is Chrome and a Gemini API key.