LLAMATOR – Red Team Framework for Testing LLM Security
LLAMATOR is a Python framework for evaluating the security of large language model systems. It supports repeatable attack campaigns and multi-role testing (attacker, target, judge), and provides a preset attack library, interface adapters, and standardized reporting.

LLAMATOR is a Python framework that helps offensive security teams evaluate the real-world security of large language model systems. It focuses on repeatable campaigns rather than one-off prompts. You define three roles for each test run: an attack model that crafts adversarial inputs, a tested model that represents the target application, and a judge model that scores responses. LLAMATOR includes adapters for OpenAI-compatible REST endpoints and popular orchestration stacks, allowing teams to test the same interfaces used in production. It also produces artefacts for stakeholders, including structured logs and documents suitable for audit and remediation tracking.


Features

  • Preset attack library. Curated single-stage and multi-stage tests that exercise common LLM risks such as prompt injection, system prompt leakage, unsafe tool invocation, and retrieval contamination.
  • Multi-client adapters. Clients for OpenAI-compatible REST APIs and LangChain-style integrations. The pattern supports drop-in use with local servers and commercial APIs.
  • Three-model test harness. Separate roles for attacker, target, and judge so that outcomes are scored in a structured way (a configuration sketch follows this list).
  • Artefacts and reports. Options to log to CSV or Excel and to generate a DOCX-style report so findings can be consumed by leadership and tracked to closure.
  • Examples and notebooks. Quick starts that demonstrate local testing with desktop servers as well as code-driven campaigns.
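
The three-role harness and the OpenAI-compatible adapters come together in a short setup block. The sketch below is illustrative only: the ClientOpenAI class, its api_key, base_url, model, temperature, and system_prompts parameters, and the local endpoint URL are assumptions about the adapter interface, so confirm the exact names against the project README before use.

import llamator

# Illustrative sketch only: class and parameter names are assumptions about the
# adapter interface; check the LLAMATOR README for the exact signatures.

# Attack model: crafts adversarial inputs.
attack_model = llamator.ClientOpenAI(
    api_key="lm-studio",                  # placeholder key for a local server
    base_url="http://localhost:1234/v1",  # any OpenAI-compatible endpoint
    model="attacker-model",               # placeholder model name
    temperature=0.8,
)

# Tested model: stands in for the target application under assessment.
tested_model = llamator.ClientOpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
    model="target-model",
    temperature=0.1,
    system_prompts=["You are a customer support assistant."],
)

# Judge model: scores each exchange against the attack's success criteria.
judge_model = llamator.ClientOpenAI(
    api_key="lm-studio",
    base_url="http://localhost:1234/v1",
    model="judge-model",
    temperature=0.0,
)

Because every role is simply a client pointed at an endpoint, the same setup can target a local desktop server during development and the production-facing API during a formal campaign.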

Installation

Install the published package from PyPI.

pip install llamator==3.3.0

Pinning to a specific version keeps tests reproducible across team members.

Usage

LLAMATOR has several helper functions and preset configurations.

print_test_preset

Prints an example configuration for the given preset to the console.

Available presets: all, eng, llm, owasp:llm01, owasp:llm07, owasp:llm09, rus, vlm

Usage:

from llamator import print_test_preset

# Print configuration for all available tests

print_test_preset("all")

Attack Scenario

Target. A customer support chatbot with Retrieval Augmented Generation that answers order and account questions from a private knowledge base. The team mirrors production locally against a compatible REST server to avoid testing in the live environment.

Approach. The red team defines the three roles and selects attack presets that map to standard failure modes in RAG systems. The set includes prompt injection that attempts data exfiltration, system prompt disclosure to extract hidden instructions, and base64 payloads intended to contaminate retrieval. Logging and reports are enabled to capture every exchange.
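
The sketch below shows how such a campaign might be wired together. The start_testing entry point, the config keys for logging and report generation, and the individual test identifiers are assumptions modelled on the presets printed by print_test_preset; the clients are the ones configured in the earlier sketch. Treat it as a starting point and align the names with the repository examples.

import llamator

# Illustrative campaign sketch; the entry point, config keys, and test names
# are assumptions, so align them with the LLAMATOR repository examples.

config = {
    "enable_logging": True,            # keep a per-exchange log of the campaign
    "enable_reports": True,            # emit the DOCX-style report for stakeholders
    "artifacts_path": "./artifacts",   # CSV/Excel logs and reports are written here
    "report_language": "en",
}

# Attacks mapped to common RAG failure modes; identifiers are illustrative.
tests = [
    ("prompt_injection", {"num_attempts": 3}),
    ("system_prompt_leakage", {"num_attempts": 3}),
    ("base64_injection", {"num_attempts": 3}),
]

llamator.start_testing(
    attack_model=attack_model,   # crafts the adversarial support tickets
    tested_model=tested_model,   # the RAG chatbot mirrored locally
    judge_model=judge_model,     # scores leakage and policy bypasses
    config=config,
    basic_tests=tests,
)

Because the preset selection and configuration live in code, the identical campaign can be re-run after changes to prompts or retrieval, which is what makes the resulting artefacts useful for remediation tracking.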

Outcome. Several injections bypass content policy when the target model is presented with blended support tickets that contain crafted instructions. The judge model flags responses that leak system prompt fragments and returns a score with evidence. The team hands off the DOCX report and CSV log that link each failure to system prompts and retrieval parameters, with recommendations for stronger instruction hierarchies, stricter tool invocation rules, and sanitisation of retrieved context.

Red Team Relevance

LLM security testing needs repeatability. LLAMATOR’s three-role harness, preset attack library, and reporting close the gap between clever prompt screenshots and a defensible testing program. Client adapters allow teams to exercise the same endpoints, chains, and agents that power production. Artefacts in standard formats make it straightforward to track remediation and to re-run identical campaigns after changes to prompts, safety filters, or retrieval logic.

Conclusion

LLAMATOR provides red teams with a structured approach to testing chatbots, agents, RAG pipelines, and related systems. It installs from PyPI, integrates with OpenAI-style endpoints, ships curated attacks, and produces evidence suitable for audits. If you are building a formal adversarial evaluation program for LLM applications, LLAMATOR is a practical starting point.

You can read more or download LLAMATOR here: https://github.com/LLAMATOR-Core/llamator


Source: https://www.darknet.org.uk/2025/09/llamator-red-team-framework-for-testing-llm-security/