In my previous post, I discussed building a proof-of-concept agent that performs basic pentester functionality. Specifically, I wanted to see if I could build an agent that could find API endpoints from a JS file, and then determine whether those endpoints returned sensitive data.
In this post, I will go over how I improved my agent to be slightly more intelligent. For this iteration, I will focus specifically on searching and validating API endpoints. I removed the functionality for determining if an endpoint has sensitive data since I’ve already proven that capability. In the future, I will add this all together, but for now, I want to see how intelligent AI can be at finding API endpoints.
I wanted to see how well an LLM could find API endpoints that have different requirements. This reflects how web applications work today: APIs often use different methods, authentication headers, and parameters. To replicate this, I created the following endpoints:
Unlike the last agent's test app, this one combines different HTTP methods, required parameters, and headers.
@app.route(f"{API_PREFIX}/profile", methods=["GET"])
def profile_get():
return json_success()@app.route(f"{API_PREFIX}/profile", methods=["POST"])
def profile_post():
return json_success()
@app.route(f"{API_PREFIX}/admin", methods=["GET"])
@require_header("X-Admin-Secret")
def admin_get():
return json_success()
@app.route(f"{API_PREFIX}/admin", methods=["POST"])
@require_header("X-Admin-Secret")
def admin_post():
return json_success()
@app.route(f"{API_PREFIX}/users", methods=["GET"])
def users_get():
email = request.args.get("email")
if not email:
return jsonify({"msg": "missing email parameter"}), 400
return jsonify(SUCCESS)
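The json_success helper and require_header decorator used above aren't shown in this snippet. The names come from the snippet, but the bodies below are my own guess at what minimal versions would look like:

from functools import wraps
from flask import jsonify, request

SUCCESS = {"msg": "success"}

def json_success():
    # Shared success response used by every endpoint
    return jsonify(SUCCESS)

def require_header(header_name):
    # Reject the request unless the given header is present
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not request.headers.get(header_name):
                return jsonify({"msg": f"missing {header_name} header"}), 401
            return func(*args, **kwargs)
        return wrapper
    return decorator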
After building my test application, I needed to create some new tools for my ReAct agent to use. Before writing them, I had to think about the workflow an agent should take.
Given this workflow, I created the following tools:
I also modified the preexisting find_endpoints_tool tool to use an LLM instead of a regex. While a regex was fine for the PoC in my previous agent, regexes can be inaccurate since APIs can be defined in any number of ways. My previous regex would have failed in this example because the string /api/v1 was abstracted away into a variable to remove redundancy. A human, or an LLM, would be able to notice such nuances and pick up on this.
find_script_sources_tool
This tool uses Python's built-in HTMLParser to find JavaScript sources on an HTML page. There is no AI used here, although in the future we may also want another tool that detects embedded JS using an LLM (a rough sketch of that idea follows the code below).
class ScriptSrcParser(HTMLParser):
    """
    Script source parser for HTML documents.
    """
    def __init__(self):
        super().__init__()
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            for attr, value in attrs:
                if attr == "src":
                    self.script_srcs.append(value)
def find_script_sources_tool(url: str) -> list[str]:
    """
    Find all script sources in the HTML page.

    Args:
        url (str): The URL of the HTML page to analyze.
    """
    log(f"Fetching HTML page from {url} to find script sources")
    resp = requests.get(url)
    if resp.status_code == 200:
        parser = ScriptSrcParser()
        parser.feed(resp.text)
        log(f"Found script sources: {parser.script_srcs}")
        return parser.script_srcs
    return []
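As a quick aside on the embedded-JS idea mentioned above, the same HTMLParser approach could be extended to collect inline script bodies as well. This is only a sketch and not part of the current agent:

from html.parser import HTMLParser

class InlineScriptParser(HTMLParser):
    """
    Collect the bodies of <script> tags that have no src attribute.
    """
    def __init__(self):
        super().__init__()
        self.inline_scripts = []
        self._in_inline_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and not any(attr == "src" for attr, _ in attrs):
            self._in_inline_script = True

    def handle_data(self, data):
        if self._in_inline_script and data.strip():
            self.inline_scripts.append(data)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_inline_script = False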
find_endpoints_tool
This tool is the modified version from the previous agent. It uses OpenAI's GPT-5 model to search for API endpoints. I instruct it to be aware of dynamic URL construction, which is typical when JS is minified or generated by frameworks.
def find_endpoints_tool(url: str):
    """
    Find all API endpoints in the JavaScript code.

    Args:
        url (str): The URL of the JavaScript file to analyze.
    """
    log(f"Fetching JavaScript file from {url} to find endpoints.")
    resp = requests.get(url)
    if resp.status_code == 200:
        log("Finding endpoints from JS file")
        model = ChatOpenAI(model="gpt-5", temperature=0)
        prompt = f"""
You are a security researcher analyzing javascript code for API endpoints.
Your task is to identify all API endpoints in the following javascript code.
Consider the following points:
1. Look for common patterns in API endpoint definitions, such as `fetch` calls or `XMLHttpRequest` usage.
2. Pay attention to any dynamic URL construction that may not be immediately obvious.
JavaScript code:
```
{resp.text}
```
ALWAYS return the endpoints as a list of strings (DO NOT use markdown formatting).
"""
        res = model.invoke(prompt).content
        log(f"Found endpoints: {res}")
        return res
find_request_requirements_tool
This tool uses OpenAI's GPT-5 model and prompts it to inspect the JS file for any requirements necessary to make a valid request. It then returns a JSON object listing the endpoint, the HTTP method, and the requirements it believes are needed.
def find_request_requirements_tool(endpoint: str, url: str):
    """
    Discover the request requirements for each API endpoint in the JavaScript code.

    Args:
        endpoint (str): The API endpoint to analyze.
        url (str): The URL of the JavaScript file.
    """
    log(f"Fetching JavaScript file from {url} to analyze endpoint {endpoint} for requirements")
    resp = requests.get(url)
    if resp.status_code == 200:
        model = ChatOpenAI(model="gpt-5", temperature=0)
        prompt = f"""
You are a security researcher analyzing javascript code for any requirements necessary to make a successful call to the {endpoint} API url.
Your task is to identify all required parameters, headers, and other relevant information that is necessary to make a successful request.
Javascript Code:
```
{resp.text}
```
Response should be in a JSON object format:
```
{{
    "endpoint": "<API endpoint>",
    "method": "<HTTP method>",
    "reason": "<justification for why this is necessary such as headers or parameters>"
}}
```
ALWAYS return the endpoints as a JSON object (DO NOT use markdown formatting).
"""
        res = model.invoke(prompt).content
        log(f"Found requirements: {res}")
        return res
verify_endpoint_tool
Finally, this tool takes the request details produced by the find_request_requirements_tool tool and generates a curl request from that information. It then executes the command and returns the output. The ReAct agent determines whether the request was a success based on the prompt I gave it: a 200 or 201 status code, which is common for APIs.
Note: For the purpose of this exercise, it generates and executes a curl command directly. This is dangerous; in a real tool you would want safeguards to ensure the LLM does not go rogue and run destructive commands.
def verify_endpoint_tool(request_details: str) -> str:
    """
    Verify if the endpoint is valid by making a test request.

    Args:
        request_details (str): The request details in JSON format.
    """
    model = ChatOpenAI(model="gpt-5", temperature=0)
    prompt = f"""
You are a security researcher searching for valid API endpoints.
Your task is to identify which endpoints are valid by generating a test request using the curl command.
You must take into account any context in the request details such as required headers, parameters, cookies, or any other relevant fields.
ALWAYS use the following flags:
- -s silent
- -L follow redirects
- -i include headers in output
REQUEST DETAILS:
```
{request_details}
```
ONLY generate a curl command that can be used to test the endpoint. DO NOT use markdown or include any other explanation.
"""
    res = model.invoke(prompt).content
    log(f"Generated curl command: {res}")
    try:
        # Dangerous but this is for research/demo purposes - use a validator
        completed = subprocess.run(res, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output = completed.stdout.decode() + completed.stderr.decode()
        log(f"Curl command output: {output}")
        return output
    except Exception as e:
        log(f"Error executing curl command: {e}")
        return ""
If you're interested in the full code, you can follow along in my GitHub repo here: https://github.com/offftherecord/offsec-agents.
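For context, here is roughly how the tools get handed to the agent. This is a minimal sketch assuming LangGraph's prebuilt create_react_agent; the exact setup lives in the repo, the success-criteria prompt described earlier is omitted, and the localhost URL is just an example:

from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# The four tools defined above
tools = [
    find_script_sources_tool,
    find_endpoints_tool,
    find_request_requirements_tool,
    verify_endpoint_tool,
]

agent = create_react_agent(ChatOpenAI(model="gpt-5", temperature=0), tools)

result = agent.invoke({
    "messages": [("user", "Find and verify all API endpoints used by http://127.0.0.1:5000")]
})
print(result["messages"][-1].content)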
Now that I had built the test server and provided the new tools to my agent, it was time to run it and check the results.
First, I'll start the test server API.
Next, I'll run my agent. Once it's running, we can see my debug messages, which help show what the agent is doing.
To my surprise, it found all of the endpoints on the first shot! I ran it a few more times to see if the results would be consistent, and it found all of them every time. Not only did it find the headers, methods, and parameters, it also reported the Content-Type and request body to use. Some APIs will fail if the Content-Type header is not set and the body is not JSON, so it was nice to see this determined as well.
Below is the output of the agent, minus the debug statements.
If we switch back to the server, we can see the HTTP requests that were made by the agent.
Below is a video of the agent running to completion. You can see the entire output, which helps give us an idea of the type of information the agent is receiving and acting on.
I continue to be impressed with the ease of using LangGraph and a ReAct agent to achieve common pentesting goals. While this agent was doing "recon" for API endpoints, I can see it being used in larger workflows or in more complicated agent-to-agent architectures, which I plan on eventually exploring.
Prompts continue to be an important piece when working with AI. I did not use ReAct prompting techniques, but I did try to be very verbose with my prompts for all of the tools. I also noticed improved logic and reasoning when using GPT-5. So far I have not found a specific need for ReAct prompting, but I may test that in the future.
For the next iteration, I want to really push this agent to the limit with even more complicated API structures and requirements. I will introduce different JS frameworks and use popular tools that "compile" or minify JS. This is typically where a human pentester spends the most time reverse engineering.
Thanks for reading this! If you want to show some love, please take a moment and clap and follow. This motivates me to continue to share my research! If you want to hear more from me, you can also follow me on Twitter!