TL;DR: Julius v0.2.0 nearly doubles LLM fingerprinting probe coverage from 33 to 63, adding detection for cloud-managed AI services (AWS Bedrock, Azure OpenAI, Vertex AI), high-performance inference servers (SGLang, TensorRT-LLM, Triton), AI gateways (Portkey, Helicone, Bifrost), and self-hosted RAG platforms (PrivateGPT, RAGFlow, Quivr). This release also hardens the scanner itself with response size limiting and TLS configuration for enterprise environments. Update Julius and scan your network — you almost certainly have AI infrastructure you don’t know about.
When we shipped the v0.1.1 update back in February, Julius could detect 33 LLM services. That covered the self-hosted basics (Ollama, vLLM, llama.cpp) and a growing list of orchestration tools. But the gap was obvious: we had almost no coverage for cloud-managed AI services, production inference servers, or the AI gateway layer that sits between applications and models.
That gap is now closed. Julius v0.2.0 ships with 63 probes, adding 30 new detections in a single release. More importantly, the types of infrastructure we now detect reflect where enterprise AI deployments are actually heading: cloud-managed endpoints, high-throughput inference engines, and the growing ecosystem of proxies and gateways that route traffic between them.
Cloud-managed AI services are the biggest category and the one we've been asked about most. Organizations deploying AI through their cloud provider often assume these endpoints are inherently private. They're not: misconfigured API gateways, exposed proxy layers, and overly permissive network policies can put them on the open internet.
AWS Bedrock, for example, is detected via its /foundation-models and /model/{modelId}/converse paths.

Next come the workhorses of production AI: high-performance inference engines that teams deploy for throughput, latency, or cost reasons. They tend to run with default configurations and minimal authentication.
SGLang, for instance, is identified by its /server_info endpoint, which exposes mem_fraction_static and disaggregation_mode fields.

The gateway layer is where organizations route, observe, and control traffic between their applications and LLM providers. An exposed gateway often means access to every model and API key behind it.
Self-hosted RAG platforms are where things get particularly sensitive. These systems are purpose-built to ingest and query internal documents — contracts, HR policies, financial data, source code. An exposed RAG endpoint is, by definition, an exposed document store.
PrivateGPT's /v1/ingest/list endpoint, for example, returns data even with zero ingested documents, and auth is disabled by default.

The OpenClaw story from our last update highlighted what happens when AI agent platforms get exposed: leaked API keys, filesystem access, and user impersonation. With this release, we're seeing the same pattern play out with RAG platforms, except the stakes are different. Instead of agent credentials, you're looking at the documents themselves.
PrivateGPT is a good example. The entire value proposition is “keep your documents private by running everything locally.” The irony is that PrivateGPT’s API defaults to no authentication. Its /v1/ingest/list endpoint is a simple GET that returns every ingested document’s metadata, including filenames and chunk counts. The model field is hardcoded to "private-gpt", which makes detection trivial and false positives near-zero.
RAGFlow follows a similar pattern. Its /v1/system/healthz endpoint is unauthenticated and returns a JSON health check with a doc_engine field that’s unique to RAGFlow — it tracks the status of the Elasticsearch or Infinity backend that powers document retrieval. Even when RAGFlow is partially broken (HTTP 500), the health endpoint still responds with the same structure, making detection reliable in any state.
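Those hardcoded response fields are what make detection reliable. As an illustrative sketch (the `detect` function here is hypothetical, not Julius's actual probe engine), a matcher keyed on these two fingerprints might look like:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// detect returns a service name when a JSON response body contains one
// of the distinctive fields described above. Hypothetical sketch; the
// real probes are declarative, but the matching idea is the same.
func detect(body string) string {
	var fields map[string]any
	if err := json.Unmarshal([]byte(body), &fields); err != nil {
		return ""
	}
	// PrivateGPT hardcodes its model field to "private-gpt".
	if m, ok := fields["model"].(string); ok && m == "private-gpt" {
		return "PrivateGPT"
	}
	// RAGFlow's /v1/system/healthz is the only service reporting a
	// doc_engine field (the Elasticsearch or Infinity backend status).
	if _, ok := fields["doc_engine"]; ok {
		return "RAGFlow"
	}
	return ""
}

func main() {
	fmt.Println(detect(`{"data": [], "model": "private-gpt"}`))           // PrivateGPT
	fmt.Println(detect(`{"status": "green", "doc_engine": "elasticsearch"}`)) // RAGFlow
}
```

Julius's actual probes are simple YAML files rather than code, but the underlying logic reduces to this kind of distinctive-field check, which is why false positives stay near zero.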
The problem isn’t that these tools are insecure by design. It’s that they’re easy to deploy, they serve an obvious need (“let me ask questions about our internal docs”), and teams spin them up without involving security. By the time anyone notices, the system has been indexing sensitive documents on an endpoint with no auth, no network restriction, and no monitoring.
This is shadow IT for the AI era, and it’s why discovery tooling matters.
Beyond new probes, v0.2.0 includes changes to the scanner itself:
Breaking API change: scanner.NewScanner() now requires two additional parameters — maxResponseSize and tlsConfig. If you’re using Julius as a library, see the migration guide in the changelog.
New CLI flags:
- --max-response-size — limits response body size (default 10MB) to prevent memory exhaustion from large or malicious responses
- --insecure — skips TLS certificate verification for testing environments
- --ca-cert — specifies a custom CA certificate file for enterprise PKI environments

Probe quality fixes:
"families" field in /api/tags responsesheader.contains rules that silently failed on HTTP/2 connections — this affected 5 cloud probes (AWS Bedrock, Cloudflare AI Gateway, Fireworks AI, Modal, OmniRoute)If you’re running Julius as part of your attack surface discovery workflow, update to v0.2.0:
$ go install github.com/praetorian-inc/julius/cmd/julius@latest
$ julius probe
For enterprise environments with internal CAs:
$ julius probe --ca-cert /path/to/ca.pem
All 63 probes are embedded in the binary. No external config, no probe downloads, no API keys.
The coverage now spans the full AI infrastructure stack: from cloud-managed inference (Bedrock, Azure OpenAI, Vertex AI) through self-hosted serving (SGLang, TensorRT-LLM, Triton) to the RAG and orchestration layer (PrivateGPT, RAGFlow, Langflow). If an organization is running AI infrastructure, Julius should find it.
We’re continuing to expand probe coverage as new tools emerge. If there’s a service you’re seeing in the wild that Julius doesn’t cover, open an issue or submit a PR. Probes are simple YAML files — you can test locally with julius validate ./probes before submitting.
What’s the difference between Julius and model fingerprinting tools? Model fingerprinting identifies which LLM generated a piece of text. Julius identifies the server infrastructure: what software is running on the endpoint. Think of it as service detection for AI, similar to what Nmap does for traditional services.
Does Julius send anything malicious? No. Julius sends standard HTTP requests (GET/POST to known paths) and analyzes the responses. It doesn't exploit vulnerabilities, submit prompts, or modify anything on the target. It's non-intrusive fingerprinting.
How do probes get validated before release? Every probe is tested against live instances of the target service and cross-tested against other LLM services to confirm zero false positives. This release also fixed several cross-probe false positives from v0.1.x.
Can I add detection for a service Julius doesn’t support yet? Yes. Probes are defined in simple YAML files. The contributing guide walks through the format, and you can test locally with julius validate ./probes before submitting a PR.
Why is there a breaking API change? The NewScanner() signature now requires maxResponseSize and tlsConfig parameters. This was necessary to add response size limiting (preventing OOM from malicious servers) and TLS configuration for enterprise environments. If you’re only using the CLI, nothing changes.
The post Julius v0.2.0: From 33 to 63 Probes — Now Detecting Cloud AI, Enterprise Inference, and RAG Pipelines appeared first on Praetorian.
This is a Security Bloggers Network syndicated blog from Offensive Security Blog: Latest Trends in Hacking | Praetorian authored by Michelle Rhodes. Read the original post at: https://www.praetorian.com/blog/julius-v020-cloud-ai-rag-detection/