Building An Offensive Security AI Agent - Part 2
This article describes an approach to building an offensive security AI agent that intelligently discovers and validates API endpoints and analyzes their request requirements. By combining an LLM with a set of tools, the agent extracts API information from JS files and generates valid HTTP requests to verify them. The results show the agent can accurately identify complex API structures and their requirements. 2025-09-24 Author: infosecwriteups.com

Building an offensive security agent that discovers API endpoints and their requirements.

OTR


Offensive Security AI Agent

In my previous post, I discussed building a proof-of-concept agent that performs basic pentester functionality. Specifically, I wanted to see if I could build an agent that could find API endpoints from a JS file, and then determine whether those endpoints returned sensitive data.

In this post, I will go over how I improved my agent to be slightly more intelligent. For this iteration, I will focus specifically on searching and validating API endpoints. I removed the functionality for determining if an endpoint has sensitive data since I’ve already proven that capability. In the future, I will add this all together, but for now, I want to see how intelligent AI can be at finding API endpoints.

Updated Server & APIs

I wanted to see how well an LLM could find API endpoints with different requirements, which is common in how web applications work today: APIs often use different methods, authentication headers, and parameters. To replicate this, I created the following endpoints:

  • /profile — GET/POST (unauthenticated)
  • /admin — GET/POST (authenticated — requires header and secret)
  • /users — GET (unauthenticated — requires “email” parameter)

Unlike the last agent's target, this combines different HTTP methods, required parameters, and headers.

from flask import jsonify, request

# `app`, `API_PREFIX`, `SUCCESS`, `json_success`, and `require_header`
# are defined elsewhere in server.py.

@app.route(f"{API_PREFIX}/profile", methods=["GET"])
def profile_get():
    return json_success()

@app.route(f"{API_PREFIX}/profile", methods=["POST"])
def profile_post():
    return json_success()

@app.route(f"{API_PREFIX}/admin", methods=["GET"])
@require_header("X-Admin-Secret")
def admin_get():
    return json_success()

@app.route(f"{API_PREFIX}/admin", methods=["POST"])
@require_header("X-Admin-Secret")
def admin_post():
    return json_success()

@app.route(f"{API_PREFIX}/users", methods=["GET"])
def users_get():
    email = request.args.get("email")
    if not email:
        return jsonify({"msg": "missing email parameter"}), 400
    return jsonify(SUCCESS)
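The require_header decorator referenced above isn't shown in the snippet. Here is a framework-agnostic sketch of the idea; the real version in the repo wraps Flask view functions and reads the incoming request's headers, and the header name and secret value below are hypothetical:

```python
from functools import wraps

ADMIN_SECRET = "s3cret"  # hypothetical value; the real secret lives server-side

def require_header(header_name):
    """Reject the call unless the named header carries the expected secret."""
    def decorator(view):
        @wraps(view)
        def wrapper(headers, *args, **kwargs):
            if headers.get(header_name) != ADMIN_SECRET:
                return {"msg": "forbidden"}, 403
            return view(headers, *args, **kwargs)
        return wrapper
    return decorator

@require_header("X-Admin-Secret")
def admin_get(headers):
    return {"msg": "success"}, 200

print(admin_get({"X-Admin-Secret": "s3cret"}))  # ({'msg': 'success'}, 200)
print(admin_get({})[1])  # 403
```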

Agent Tools

After creating my test application, I needed to create some new tools for my ReAct agent to use. Before creating the tools, I had to think about the workflow an agent should take:

  1. Given a url, find all JS resources on the page
  2. For all JS resources on the page, find API endpoints
  3. Of all those API endpoints, determine if there are any requirements necessary to make a successful request
  4. Validate your assumptions by making an HTTP request. If you do not get a valid request, reassess the JS again. Continue this process until a valid request is made.

Given this workflow, I created the following tools:

  • find_script_sources_tool — Finds JS on web pages
  • find_request_requirements_tool — Searches the JS for an endpoint's request requirements
  • verify_endpoint_tool — Verifies the endpoint by generating curl requests

I also modified the preexisting find_endpoints_tool tool to use an LLM instead of a regex. While a regex in my previous agent was fine for a PoC, regexes can often be inaccurate since APIs can be defined in many different ways. My previous regex would have failed in this example because the string /api/v1 was abstracted away into a variable to remove redundancy. A human, or an LLM, would be able to notice such nuances and pick up on this.
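To see why, here is a hypothetical snippet in the style of minified JS, where the prefix lives in a variable; a naive literal-path regex finds nothing:

```python
import re

# Hypothetical minified-style JS where the API prefix is held in a variable.
js = 'const P="/api/v1";fetch(P+"/users?email="+e);fetch(P+"/admin")'

# A naive regex that looks for full literal paths like "/api/v1/users".
pattern = re.compile(r'["\'](/api/v1/[\w-]+)["\']')
print(pattern.findall(js))  # → [] (the concatenation hides the full path)
```

An LLM reading the same code can follow the concatenation and report /api/v1/users and /api/v1/admin.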

find_script_sources_tool

This tool uses Python's built-in HTMLParser to find JavaScript sources on an HTML page. No AI is used here, although in the future we may also want a tool that detects embedded JS using an LLM.

from html.parser import HTMLParser
import requests

class ScriptSrcParser(HTMLParser):
    """
    Script source parser for HTML documents.
    """
    def __init__(self):
        super().__init__()
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for attr, value in attrs:
                if attr == "src":
                    self.script_srcs.append(value)

def find_script_sources_tool(url: str) -> list[str]:
    """
    Find all script sources in the HTML page.

    Args:
        url (str): The URL of the HTML page to analyze.
    """
    log(f"Fetching HTML page from {url} to find script sources")
    resp = requests.get(url)
    if resp.status_code == 200:
        parser = ScriptSrcParser()
        parser.feed(resp.text)
        log(f"Found script sources: {parser.script_srcs}")
        return parser.script_srcs
    return []
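As a quick sanity check, the parser can be exercised against a static HTML string (the class is repeated here so the snippet is self-contained). Note that the inline script block with no src attribute is skipped, which is exactly the gap an embedded-JS detection tool would fill:

```python
from html.parser import HTMLParser

class ScriptSrcParser(HTMLParser):
    """Collects the src attribute of every <script> tag."""
    def __init__(self):
        super().__init__()
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for attr, value in attrs:
                if attr == "src":
                    self.script_srcs.append(value)

html = ('<html><head><script src="/static/app.js"></script></head>'
        '<body><script>inline()</script></body></html>')
parser = ScriptSrcParser()
parser.feed(html)
print(parser.script_srcs)  # → ['/static/app.js']
```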

find_endpoints_tool

This tool is the modified version from the previous agent. It uses OpenAI's GPT-5 model to search for API endpoints. I instruct it to be aware of dynamic URL construction, which is typical when JS is minified or generated by frameworks.

from langchain_openai import ChatOpenAI
import requests

def find_endpoints_tool(url: str):
    """
    Find all API endpoints in the JavaScript code.

    Args:
        url (str): The URL of the JavaScript file to analyze.
    """
    log(f"Fetching JavaScript file from {url} to find endpoints.")
    resp = requests.get(url)
    if resp.status_code == 200:
        log("Finding endpoints from JS file")
        model = ChatOpenAI(model="gpt-5", temperature=0)
        prompt = f"""
You are a security researcher analyzing JavaScript code for API endpoints.
Your task is to identify all API endpoints in the following JavaScript code.

Consider the following points:
1. Look for common patterns in API endpoint definitions, such as `fetch` calls or `XMLHttpRequest` usage.
2. Pay attention to any dynamic URL construction that may not be immediately obvious.

JavaScript code:
```
{resp.text}
```

ALWAYS return the endpoints as a list of strings (DO NOT use markdown formatting)."""
        res = model.invoke(prompt).content
        log(f"Found endpoints: {res}")
        return res
    return []

find_request_requirements_tool

This tool prompts OpenAI's GPT-5 model to inspect the JS file and identify any requirements necessary to make a valid request, then returns a JSON object listing the endpoint, the HTTP method used, and the requirements the model believes apply.

def find_request_requirements_tool(endpoint: str, url: str):
    """
    Discover the request requirements for an API endpoint in the JavaScript code.

    Args:
        endpoint (str): The API endpoint to analyze.
        url (str): The URL of the JavaScript file.
    """
    log(f"Fetching JavaScript file from {url} to analyze endpoint {endpoint} for requirements")
    resp = requests.get(url)

    if resp.status_code == 200:
        model = ChatOpenAI(model="gpt-5", temperature=0)
        prompt = f"""
You are a security researcher analyzing JavaScript code for any requirements necessary to make a successful call to the {endpoint} API url.
Your task is to identify all required parameters, headers, and other relevant information that is necessary to make a successful request.

JavaScript code:
```
{resp.text}
```

Response should be in a JSON object format:
```
{{
    "endpoint": "<API endpoint>",
    "method": "<HTTP method>",
    "reason": "<justification for why this is necessary such as headers or parameters>"
}}
```

ALWAYS return the endpoints as a JSON object (DO NOT use markdown formatting).
"""
        res = model.invoke(prompt).content
        log(f"Found requirements: {res}")
        return res
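The model's reply comes back as a raw string, so downstream code has to parse it. Here is a minimal sketch of that step, using a hypothetical reply for the /admin endpoint shaped like the JSON format requested in the prompt:

```python
import json

# Hypothetical model output; the field names follow the JSON shape
# requested in the prompt above.
raw = ('{"endpoint": "/api/v1/admin", "method": "GET", '
       '"reason": "requires X-Admin-Secret header"}')

def parse_requirements(raw: str) -> dict:
    """Parse the model's JSON reply, failing loudly if keys are missing."""
    data = json.loads(raw)
    for key in ("endpoint", "method", "reason"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    return data

req = parse_requirements(raw)
print(req["method"])  # → GET
```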

verify_endpoint_tool

Finally, this tool takes the request details produced by the find_request_requirements_tool tool, generates a curl command from that information, executes it, and returns the output. The ReAct agent then decides whether the request succeeded based on the prompt I gave it: a 200 or 201 response, which is common for APIs.

Note: For the purpose of this exercise, the tool generates and executes a curl command directly. This is dangerous; in anything beyond a demo, you should add safeguards to ensure the LLM does not go rogue and run destructive commands.

import subprocess

def verify_endpoint_tool(request_details: str) -> str:
    """
    Verify if the endpoint is valid by making a test request.

    Args:
        request_details (str): The request details in JSON format.
    """
    model = ChatOpenAI(model="gpt-5", temperature=0)
    prompt = f"""
You are a security researcher searching for valid API endpoints.
Your task is to identify which endpoints are valid by generating a test request using the curl command.
You must take into account any context in the request details such as required headers, parameters, cookies, or any other relevant fields.
ALWAYS use the following flags:
- -s silent
- -L follow redirects
- -i include headers in output

REQUEST DETAILS:
```
{request_details}
```
ONLY generate a curl command that can be used to test the endpoint. DO NOT use markdown or include any other explanation.
"""

    res = model.invoke(prompt).content
    log(f"Generated curl command: {res}")
    try:
        # Dangerous, but this is for research/demo purposes - use a validator
        completed = subprocess.run(res, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output = completed.stdout.decode() + completed.stderr.decode()
        log(f"Curl command output: {output}")
        return output
    except Exception as e:
        log(f"Error executing curl command: {e}")
        return ""
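One way to add the validator the comment above alludes to is a rough allowlist check before executing anything. This is a hypothetical sketch, not the repo's implementation:

```python
import shlex

def looks_like_safe_curl(command: str) -> bool:
    """Rough guard: only allow commands that invoke curl directly,
    with no shell metacharacters that could chain extra commands."""
    if any(token in command for token in (";", "|", "&", "`", "$(")):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False
    return bool(argv) and argv[0] == "curl"

print(looks_like_safe_curl("curl -s -L -i http://127.0.0.1:5000/api/v1/profile"))  # True
print(looks_like_safe_curl("curl http://x; rm -rf /"))  # False
```

A stricter version would also run the command with shell=False, passing the shlex.split argv directly to subprocess.run so shell metacharacters are never interpreted at all.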

If you’re interested in the full code, you can follow along in my GitHub repo here: https://github.com/offftherecord/offsec-agents.

Execution

Now that I had built the test server and given my agent the new tools, it was time to run it and review the results.

First, I’ll start the test API server.


Running server.py API server

Next, I’ll run my agent. Once running, the debug messages help show what the agent is doing.


Running the agent.py file

To my surprise, it found all of the endpoints on the first try! I ran it a few more times to see if the results would be consistent, and it found all of them every time. Not only did it find the headers, methods, and parameters, it also reported the Content-Type header and request body to use. Some APIs will fail if the Content-Type header is missing or the body is not JSON, so it was nice to see this determined as well.

Below is the output of the agent, minus the debug statements.


Highlighted agent output

If we switch back to the server, we can see the HTTP requests that were made by the agent.


Server HTTP debug messages

Below is a video of the agent running to completion. You can see the entire output, which helps give us an idea of the type of information the agent is receiving and acting on.

Video of agent running

Conclusion

I continue to be impressed with the ease of using LangGraph and a ReAct agent to achieve common pentesting goals. While this iteration only performed “recon” for API endpoints, I can see this agent being used in larger workflows or in a more complicated agent-to-agent architecture, which I plan on eventually exploring.

Prompts continue to be an important piece when working with AI. I did not use ReAct prompt techniques, but I did attempt to be very verbose with my prompts for all tools. I also noticed improvement in logic and reasoning when using GPT-5. I have not specifically found a need to use ReAct prompting, but I may need to test that in the future.

For my upcoming iteration, I want to really push this agent to the limit with even more complicated API structures and requirements. I will be introducing different JS frameworks and using popular tools that “compile” or minify JS. This is typically where a human pentester spends the most time reverse engineering.

Thanks for reading this! If you want to show some love, please take a moment and clap and follow. This motivates me to continue to share my research! If you want to hear more from me, you can also follow me on Twitter!

