MCP Screenshot Tool: Capture Any Mac App for Your AI

Give Claude Code and Cursor eyes on any native Mac app, privately. An on-device MCP screenshot tool with region crops, blur redaction, and a spatial manifest.

Your AI coding agent can read every file in your repo, but it cannot see your screen. Wire in an MCP screenshot tool and that changes: the agent captures what it needs, when it needs it, and reasons over the result. Here is what an MCP screenshot tool is, why "any Mac app" and "on-device" matter, and how to set one up for Claude Code or Cursor.

What is an MCP screenshot tool?

MCP (Model Context Protocol) is an open protocol that lets an AI agent call external tools directly. An MCP screenshot tool exposes screen-capture functions over MCP, so instead of you screenshotting and pasting on every iteration, the agent can:

  • list your open windows and capture the right one,
  • zoom into a sub-region to read small text,
  • run OCR to pull an exact error string,
  • and do it again next loop — no courier required.
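To make "the agent calls the tool itself" concrete, here is a minimal sketch of the message an MCP client sends when it invokes a tool. MCP messages are JSON-RPC 2.0, and tool invocations use the `tools/call` method. The tool name `capture_window` is taken from the FAQ below; the `window_title` argument is a hypothetical name for illustration, since the tool's real input schema is not shown in this post.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "window_title" is an assumed argument name, not a documented one.
request = build_tool_call(1, "capture_window", {"window_title": "checkout.app"})
print(json.dumps(request, indent=2))
```

In practice your agent builds and sends this for you; the sketch only shows what replaces the manual screenshot-and-paste step.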

(For the deeper "why agents are blind" background, see How to Give Your AI Coding Agent Visual Context.)

The catch is that not all screenshot-for-AI tools are equal. Three things separate a toy from one you'll actually keep in your loop.

1. It has to capture any Mac app — not just the browser

A lot of "annotate for AI" tools are browser extensions. They read the page's DOM, which is fine until your bug lives somewhere the browser can't see: a simulator, Xcode, a native app, a generated image, a Spine or Live2D model, a video frame. If the tool can only screenshot a web page, your agent is still blind to most of your screen.

An OS-level MCP screenshot tool captures any window on macOS. Screentack uses macOS's native capture APIs, so "show the agent what's wrong" works the same whether the problem is in a browser tab or a native app you built.

2. It has to be private — on-device, with redaction

You're about to send a picture of your screen to an AI. That screen often has other things on it: tokens in a terminal, a customer's name, an unrelated message thread.

Two properties make that safe:

  • On-device capture. Screentack captures and processes everything locally, and nothing is uploaded to a vendor cloud. The image goes directly to your agent rather than through a capture vendor's servers.
  • Blur everything you didn't pin. Screentack's redaction mode Gaussian-blurs the whole screen except the regions you marked, before the image is shared. The bug stays sharp; secrets and unrelated windows blur out. (Crops and OCR still run on the sharp originals locally — only what leaves your machine is redacted.)

Local by default, redacted on the way out — that's what makes "screenshot my screen for AI" something you can do without auditing everything else that's visible first.
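The "blur everything you didn't pin" idea can be sketched on a tiny grayscale grid. This is an illustrative stand-in, not Screentack's implementation: the 3x3 Gaussian kernel, the `(x, y, width, height)` rectangle format, and both function names are assumptions made for the example. A real tool would use a much wider blur on full-color frames.

```python
def gaussian_blur_3x3(image):
    """Blur a grayscale image (list of rows of ints) with a 3x3 Gaussian kernel."""
    kernel = [(1, 2, 1), (2, 4, 2), (1, 2, 1)]  # weights sum to 16
    height, width = len(image), len(image[0])
    blurred = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    # Clamp coordinates so edge pixels reuse their nearest neighbor.
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    total += kernel[ky][kx] * image[sy][sx]
            blurred[y][x] = total // 16
    return blurred


def redact_except(image, pinned_regions):
    """Blur the whole image, then restore each pinned (x, y, w, h) rectangle."""
    result = gaussian_blur_3x3(image)
    for rx, ry, rw, rh in pinned_regions:
        for y in range(ry, min(ry + rh, len(image))):
            for x in range(rx, min(rx + rw, len(image[0]))):
                result[y][x] = image[y][x]  # copy the sharp original back
    return result


# 6x6 frame: a pinned detail at (1, 1) and an unrelated "secret" at (4, 4).
frame = [[0] * 6 for _ in range(6)]
frame[1][1] = 255
frame[4][4] = 255
redacted = redact_except(frame, [(0, 0, 3, 3)])
```

The pinned pixel keeps its original value while the pixel outside the pinned rectangle is smeared into its neighbors. Blurring first and copying the sharp regions back keeps the pinned area pixel-exact, which matters when the agent needs to read small text inside it.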

3. It has to send structured context, not just an image

Vision models are good, but they're far more reliable when the spatial facts are also written down. A raw pasted image makes the model hunt; a manifest hands it a map.

Screentack returns, in one shot: the full frame (layout preserved), a tight crop of every region you marked (fine detail legible), and a spatial manifest — each region's label, normalized position, the window it came from, and per-display coordinates:

## Capture — checkout.app  ·  Display 1 (2560x1600 @2x)
region-1  "wrong accent color"   x:18%  y:32%   crop: region-1.png
region-2  "misaligned button"    x:71%  y:64%   crop: region-2.png

Now the agent reasons over region-2: "misaligned button" at x:71% y:64%, not "the blue thing near the top." That's the difference between one clean fix and a five-message back-and-forth.
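As a rough illustration of why the written-down manifest helps, the region lines above are trivially machine-readable. The sketch below parses them and converts the percentages to pixel positions. The line format is inferred only from the sample shown here, and the `Region` class, `parse_regions`, and `to_pixels` are hypothetical names. Whether the percentages refer to a region's center or its corner, and whether they map to points or backing pixels on a @2x display, is not stated in this post, so treat the conversion as an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    region_id: str
    label: str
    x_pct: float
    y_pct: float
    crop: str


# Matches lines like: region-2  "misaligned button"  x:71%  y:64%  crop: region-2.png
REGION_LINE = re.compile(
    r'^(?P<id>\S+)\s+"(?P<label>[^"]*)"\s+x:(?P<x>\d+(?:\.\d+)?)%\s+'
    r'y:(?P<y>\d+(?:\.\d+)?)%\s+crop:\s*(?P<crop>\S+)$'
)


def parse_regions(manifest):
    """Return a Region for every line that matches; other lines are skipped."""
    regions = []
    for line in manifest.splitlines():
        m = REGION_LINE.match(line.strip())
        if m:
            regions.append(
                Region(m["id"], m["label"], float(m["x"]), float(m["y"]), m["crop"])
            )
    return regions


def to_pixels(region, width, height):
    """Scale the normalized position to a frame of the given size."""
    return round(region.x_pct / 100 * width), round(region.y_pct / 100 * height)


manifest = '''
region-1  "wrong accent color"   x:18%  y:32%   crop: region-1.png
region-2  "misaligned button"    x:71%  y:64%   crop: region-2.png
'''
regions = parse_regions(manifest)
print(regions[1].label, to_pixels(regions[1], 2560, 1600))
```

The agent does not need to run code like this; the point is that every fact it would otherwise have to estimate from pixels is already available as text.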

Set it up for Claude Code or Cursor

Screentack hosts its MCP server locally. Register it with Claude Code in a single command, replacing PORT with the port the local server is listening on:

claude mcp add --transport http screentack http://127.0.0.1:PORT/mcp
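The command above covers Claude Code. Cursor typically reads MCP servers from a JSON file (`.cursor/mcp.json` in a project, or `~/.cursor/mcp.json` globally) under an `mcpServers` key. The sketch below generates such an entry. It assumes that file layout and that Cursor accepts a `url` field for HTTP servers, so check Cursor's MCP documentation for your version before relying on it. `cursor_mcp_config` is a hypothetical helper name.

```python
import json

# PORT is the same unfilled placeholder used in the command above.
ENDPOINT = "http://127.0.0.1:PORT/mcp"


def cursor_mcp_config(name, url):
    """Build the assumed Cursor MCP config structure for one HTTP server."""
    return {"mcpServers": {name: {"url": url}}}


config = cursor_mcp_config("screentack", ENDPOINT)
print(json.dumps(config, indent=2))
```

Paste the printed JSON into the config file (merging with any existing `mcpServers` entries) and substitute the real port.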

Once it's wired in, your agent has 12 MCP tools — capture a window, zoom into a region, run OCR, record a window's changes — and can call them itself, mid-task, without you screenshotting anything.

One-time purchase

Screentack is a one-time purchase — buy it once, it's yours. It's macOS-native, private and on-device by default, and built for exactly this loop: stop narrating your screen to a blind assistant; hand it eyes, a map, and the privacy to use them.

Want your agent to see your whole Mac — privately? Join the early-access waitlist.

Frequently asked questions

What is an MCP screenshot tool?

An MCP screenshot tool is a screen-capture utility that exposes its capture functions to an AI agent over the Model Context Protocol (MCP). Instead of you screenshotting and pasting on every iteration, the agent calls tools like capture_window or capture_region itself, gets the image plus structured metadata back, and reasons over it directly.

Can it capture native Mac apps, not just the browser?

Yes. Browser-extension annotation tools can only see a web page's DOM. An OS-level MCP screenshot tool like Screentack captures any window on macOS — Xcode, a simulator, a native app, a generated image — so your agent is not limited to the browser.

Is sending screenshots to AI private?

It can be. Screentack captures and processes everything on-device — nothing is uploaded to a cloud. Its redaction mode blurs everything outside the regions you pinned before the image is shared, so secrets and unrelated windows stay private.

How do I add a screenshot MCP server to Claude Code or Cursor?

Point your agent at the local MCP endpoint. For Claude Code, run: claude mcp add --transport http screentack http://127.0.0.1:PORT/mcp (replace PORT with the port the local server is listening on). Once registered, the agent can capture windows, zoom into regions, and run OCR on its own.