LLM08: Vector & Embedding Weaknesses – FireTail Blog

Nov 07, 2025 – In 2025, with the rise of AI, we've seen a parallel rise in cyber risks. The OWASP Top 10 for LLM Applications helps us categorize and understand the biggest risks in today's landscape. In previous blogs, we've covered risks 1-7. Today, we're covering #8: Vector and Embedding Weaknesses.

Vector and embedding weaknesses primarily affect programs that use Retrieval Augmented Generation, or RAG, with LLMs. RAG uses vector databases and embeddings to combine pre-trained LLMs with external information sources. But when those vectors are not secured, the entire system is put at risk.

Some common examples of this risk include:

- Unauthorized access: misconfigured vectors and embeddings can lead to data breaches.
- Cross-context information leaks: when multiple users share the same vector database, there is a risk of context leaking between users or queries.
- Federation knowledge conflict: this occurs when data from multiple sources contradicts itself (for instance, old information the LLM was trained on conflicts with newer data from RAG, or two RAG sources give different values for the same data point).
- Embedding inversion attacks: attackers can invert or access embeddings via prompt injection or manipulation to retrieve sensitive information.
- Data poisoning attacks: as with other vulnerabilities we've discussed, bad actors can poison data to produce undesired outputs.
- Behavior alteration: the model may behave differently than it was trained to because of new information obtained through RAG.

Mitigation techniques include:

- Secure permissions and access control: security teams should implement tight, permission-aware controls on vector and embedding stores, and partition datasets within the vector database to prevent cross-context information leaks.
- Data validation and source authentication: teams should enforce robust data validation pipelines and audit them regularly to verify the integrity of knowledge sources, so the LLM accepts data only from trusted sources.
- Review data for combination and classification: especially when combining data from multiple sources, it is critical that teams thoroughly review and classify data to prevent mismatch errors.
- Monitoring and logging: maintain detailed logs of activity across the landscape so teams can respond to incidents swiftly.

Hopefully, many of these practices are already part of your AI security posture, or even of your broader data security and governance program. But keeping up with AI security in a constantly evolving environment is a task that grows more difficult by the day. FireTail helps you simplify these steps by cutting out the middleman. Want to learn how it works? Schedule a free, 30-minute demo with us today!
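The first mitigation above, partitioning a vector store per user or tenant, can be sketched in a few lines. This is a minimal illustration, not any particular vendor's API: the `PartitionedVectorStore` class and its method names are hypothetical, and real deployments would use a production vector database with built-in namespace or metadata filtering.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class PartitionedVectorStore:
    """Keeps each tenant's embeddings in a separate partition so one
    user's query can never retrieve another user's documents."""

    def __init__(self):
        self._partitions = {}  # tenant_id -> list of (embedding, text)

    def add(self, tenant_id, embedding, text):
        self._partitions.setdefault(tenant_id, []).append((embedding, text))

    def search(self, tenant_id, query_embedding, top_k=3):
        # Only the caller's own partition is searched, so cross-context
        # leakage between tenants is prevented by construction.
        candidates = self._partitions.get(tenant_id, [])
        scored = sorted(candidates,
                        key=lambda item: cosine(item[0], query_embedding),
                        reverse=True)
        return [text for _, text in scored[:top_k]]
```

The key design choice is that isolation is structural rather than a filter applied after retrieval: a query scoped to one tenant simply cannot see another tenant's embeddings.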
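The source-authentication step can likewise be sketched as a gate in front of the ingestion pipeline. The allowlist and checksum registry here are illustrative stand-ins for a real trust store; the hostnames are invented for the example.

```python
import hashlib

# Hypothetical allowlist of knowledge sources the pipeline trusts.
TRUSTED_SOURCES = {"docs.internal.example", "kb.example.com"}

def sha256_of(content):
    """Hex digest of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()

def validate_document(source_host, content, expected_sha256=None):
    """Accept a document for RAG ingestion only if it comes from an
    allow-listed source and, when a checksum is known, matches the
    expected digest (catching tampering in transit or at rest)."""
    if source_host not in TRUSTED_SOURCES:
        return False
    if expected_sha256 is not None and sha256_of(content) != expected_sha256:
        return False
    return True
```

Rejected documents should be logged rather than silently dropped, since a burst of rejections can itself be an early signal of a poisoning attempt.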
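Finally, the monitoring-and-logging mitigation is easiest to act on when each retrieval emits one structured record. A minimal sketch, with invented field names; note it deliberately logs the query's length rather than its text, to avoid persisting sensitive prompts in the audit trail.

```python
import json
import time

def audit_record(user_id, query, doc_ids):
    """Build one structured audit record per RAG retrieval; shipping
    these to a log pipeline makes anomalous access patterns visible."""
    record = {
        "ts": time.time(),
        "event": "rag_retrieval",
        "user": user_id,
        "query_chars": len(query),  # length only, not the raw query text
        "doc_ids": list(doc_ids),
    }
    print(json.dumps(record))  # stand-in for a real log sink
    return record
```

A spike in `doc_ids` volume for a single user, or retrievals against partitions that user rarely touches, are the kinds of anomalies this record makes easy to alert on.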

*** This is a Security Bloggers Network syndicated blog from FireTail - AI and API Security Blog authored by FireTail - AI and API Security Blog. Read the original post at: https://www.firetail.ai/blog/llm08-vector-embedding-weaknesses


