I have been working on SovereignShield, a security layer that sits between user input and your LLM. Instead of using another model to judge if input is safe, it uses pure pattern matching against a structured ruleset. Fully deterministic: same input, same result, every time. Sub-millisecond latency.
Why I built it: Every AI agent I have seen trusts user input by default. LLM-based safety filters are probabilistic, so they can be bypassed with creative encoding, context manipulation, or simply enough retries. I wanted something that gives a hard yes/no from deterministic rules, not a probabilistic judgment.
What it blocks:
Prompt injection and jailbreaks
Encoded payloads (base64, hex, unicode obfuscation)
Shell execution (os.system, subprocess, rm -rf)
Credential exfiltration via URL parameters
SQL injection, XSS, path traversal, reverse shells
50+ attack categories total
Architecture: four layers run in sequence: InputFilter (pattern matching), Firewall (rate limiting), CoreSafety (action-level blocking), and Conscience (ethical gate). Every security verdict is returned as a frozen dataclass inside a read-only namespace, so downstream code cannot mutate or override a BLOCK decision at runtime.
Self-improving: an adaptive engine lets you report new attacks via the API. The system extracts detection keywords, sandbox-tests them against your historical scan data for false positives, and auto-deploys only the rules that pass validation.
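The validate-before-deploy step might look roughly like this: a candidate keyword is deployed only if it matches nothing in the corpus of inputs previously scanned as safe. The function and threshold below are assumptions for illustration, not the production pipeline.

```python
import re

def validate_rule(candidate: str, safe_history: list[str],
                  max_false_positives: int = 0) -> bool:
    """Deploy a reported keyword only if it stays under the
    false-positive budget against historical benign traffic."""
    pattern = re.compile(re.escape(candidate), re.IGNORECASE)
    false_positives = sum(1 for text in safe_history if pattern.search(text))
    return false_positives <= max_false_positives

# Hypothetical historical scans that were allowed through.
safe_history = ["what is the weather", "summarize this pdf"]

validate_rule("ignore previous instructions", safe_history)  # deployable
validate_rule("the", safe_history)  # rejected: matches benign history
```

Gating deployment on historical traffic is what keeps an auto-learning blocklist from drifting into blocking ordinary prompts.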
Integration:
pip install sovereign-shield-client
from sovereign_shield_client import SovereignShield
shield = SovereignShield(api_key="ss_your_key")
safe = shield.scan(user_input)
Free tier: 1,000 scans/month (no credit card). Pro: 100,000 scans/month for $8/mo.
Site: https://sovereign-shield.net
GitHub (BSL 1.1): https://github.com/mattijsmoens/sovereign-shield