OpenAI Explains URL-Based Data Exfiltration Mitigations in New Paper
Published 2026-02-05 · embracethered.com

Last week I saw this paper from OpenAI called “Preventing URL-Based Data Exfiltration in Language-Model Agents”, which goes into detail on new mitigations they’ve added.

OpenAI Paper Abstract

This is a great read. I like this transparency.

Initial Disclosure in 2023

Nearly three years ago I reported the zero-click data exfiltration exploit to OpenAI. Back in early 2023 OpenAI did not have a bug bounty program, so communication was via email, and unfortunately there was little traction or appetite to fix the problem in ChatGPT. I also reported the same issue to Microsoft as Bing Chat was impacted, and Microsoft applied a fix (via a Content-Security-Policy header) in May 2023 to generally prevent loading of images.

Since then, dozens of such vulnerabilities have been found across the AI ecosystem, making this one of the most common AI application security vulnerabilities.

Tackling the Problem

In December 2023 OpenAI added initial mitigations via a url_safe feature. But it was unclear exactly what the decision process was that made a URL safe. I reverse engineered a few things over time and found bypasses, like those presented at BlackHat Europe 2024. Additionally, there were places where OpenAI had not added the mitigation at all, like in the macOS and iOS apps.

Interestingly, back then I wrote:

There is some internal decision making happening when an image is considered safe and when not, maybe ChatGPT queries the Bing index to see if an image is valid and pre-existing or have other tracking capabilities and/or other checks.

Around August 2025, additional mitigations were added to the url_safe feature, which mitigated the attacks shown at BlackHat and during the Month of AI Bugs.

The latest paper from OpenAI discusses these new mitigation steps in detail.

What Is the New Mitigation?

First, this is not a solved problem, but the complexity required for a successful attack has increased.

At a high level, OpenAI operates a web crawler, and any URL the crawler encounters is added to an allow-list, meaning ChatGPT will navigate to that URL without issue. However, if ChatGPT tries to dynamically construct a new URL, it is treated as unsafe.

So it's not using the Bing index (as I had speculated in 2023); OpenAI now has its own web index, and they also use it for URL validation.

Also, if a URL is directly entered as part of a user message, that URL is considered safe for the chat session. At least that has been my understanding with ChatGPT for a while.
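The decision logic, as the paper describes it at a high level, can be sketched as follows. This is a minimal illustration under my assumptions; the function and parameter names are hypothetical, and OpenAI's actual implementation is not public.

```python
# Sketch of an allow-list based URL safety check, per the paper's description.
# Names (is_url_safe, crawler_index, user_urls) are illustrative assumptions.

def is_url_safe(url: str, crawler_index: set[str], user_urls: set[str]) -> bool:
    """A URL is fetched/rendered only if the crawler has previously seen it,
    or if the user typed it directly in this chat session."""
    if url in user_urls:
        return True               # user-entered URLs are trusted for the session
    return url in crawler_index   # dynamically constructed URLs fail this check

# Example: an attacker-constructed URL embedding secret data is rejected
index = {"https://example.com/blog/post-1"}
user_typed = {"https://docs.example.com/manual"}

print(is_url_safe("https://example.com/blog/post-1", index, user_typed))   # indexed
print(is_url_safe("https://docs.example.com/manual", index, user_typed))   # user-typed
print(is_url_safe("https://evil.example/?q=SECRET_DATA", index, user_typed))  # dynamic
```

The key property is the exact-match lookup: a prompt-injected payload cannot smuggle data into a query string or path, because that exact URL has never been crawled.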

How Can It Be Bypassed?

Despite these mitigations, certain bypasses remain viable.

One such bypass trick that I described in 2023 is still applicable, and OpenAI also highlights it in the paper.

The idea is to use individual URLs for each letter.

Now, those URLs would also have to be in the search index. If someone has a larger website or blog, there are probably already 36 URLs indexed that the LLM can use to map A-Z and 0-9. 😈 Someone could also go a lot further, expanding the vocabulary to thousands of URLs and teaching the model the mapping via indirect prompt injection.

Leakage of bits of information still seems straightforward, and that remains a risk for certain scenarios.
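To make the bypass concrete, here is a rough sketch of the per-character mapping. The domain and path scheme are hypothetical; the point is that every URL is a plain, pre-existing page that would pass an exact-match allow-list, and the sequence of fetches is itself the covert channel.

```python
# Hedged sketch of the per-character bypass described above.
# blog.example.com and the /tag/<char> scheme are made-up placeholders.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def encode_secret(secret: str) -> list[str]:
    """Map each character of the secret to a fixed, already-indexed URL."""
    return [f"https://blog.example.com/tag/{c}"
            for c in secret.lower() if c in ALPHABET]

# The order (and timing) of the requests leaks the secret, one character
# per fetch -- no dynamically constructed query string is needed.
urls = encode_secret("key42")
print(urls)
```

An attacker observing the server logs simply reads the secret back off the request sequence, which is why rate-limiting or deduplicating visits (discussed below) raises the bar.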

You can see a basic demo of something similar in my blog post from 2023 which highlights the idea, but with limitations.

Additional Mitigation Ideas

One challenge for attackers that I highlighted back in 2023 was that the browser caches requests and requests might arrive out of order. Both are mitigating factors, but the fact that the AI does connect is still a leakage channel that can be abused.

Two ideas immediately come to mind, and they are both related. The idea is to prevent the agent from visiting the same full URL multiple times in one session. This makes it even more difficult for an attacker.

This could be achieved by:

  1. Caching the response for some time (e.g., even just a few minutes would probably suffice)
  2. Preventing the agent from visiting the same URL multiple times within one session

This would still allow limited data exfiltration, but tricks like repeatedly using fixed URLs to map characters would face additional challenges. As an aside, I have not recently tested whether some of these mitigations are already in place.
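The second idea can be sketched as a small session-level guard. This is my own illustration of the proposal, not anything OpenAI has described; the class and method names are made up.

```python
# Sketch of the proposed mitigation: refuse to fetch the same full URL
# twice within one session (a cached response could be served instead).

class SessionUrlGuard:
    def __init__(self) -> None:
        self._visited: set = set()

    def allow_fetch(self, url: str) -> bool:
        """Return True only on the first visit to a given URL this session."""
        if url in self._visited:
            return False          # repeat visit: block, or serve from cache
        self._visited.add(url)
        return True

guard = SessionUrlGuard()
print(guard.allow_fetch("https://blog.example.com/tag/a"))  # first visit
print(guard.allow_fetch("https://blog.example.com/tag/a"))  # repeat blocked
```

Since secrets almost always contain repeated characters, blocking repeat visits directly undermines the fixed-URL character-mapping trick from the previous section.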

Further advances in research and feature additions or changes might highlight additional exploit techniques in the future.

Final Risk - Thorough Adoption of the Mitigation

The final real-world risk is that not every developer at OpenAI might know about this mitigation and adopt it accordingly. This is a typical engineering challenge for a security organization, and one I have personally encountered often in the past. Just because a good mitigation exists doesn't mean it is actually used everywhere.

Also, this mitigation addresses the data exfiltration channel only; it makes no claims about indirect prompt injection itself.

It’s great to see these mitigations being added, and also a paper about it being published by OpenAI. Even though it’s not a silver bullet to fix all leaks, it is a significant improvement.

Happy Hacking.
