trickest/wordlists: Real-world infosec wordlists, updated regularly

Current Wordlists

Technologies

These wordlists are generated from the source code of the CMSes, servers, and frameworks listed in technology-repositories.json. They currently cover:

WordPress
Joomla
Drupal
Magento
Ghost
Tomcat

Each wordlist comes in two versions:

Base (for example, tomcat.txt): lists the full path of every file in the repository

webapps/examples/WEB-INF/classes/websocket/echo/servers.json

All levels (for example, tomcat-all-levels.txt): includes every directory level of the paths in the base wordlist. If you have tried dsieve, this is going to look familiar! This wordlist is larger than the base wordlist, but it accounts for cases where the repository's directory structure doesn't map exactly onto the target.

webapps/examples/WEB-INF/classes/websocket/echo/servers.json
examples/WEB-INF/classes/websocket/echo/servers.json
WEB-INF/classes/websocket/echo/servers.json
websocket/echo/servers.json
echo/servers.json
servers.json
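The expansion from a base path to its directory levels can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual implementation: the function names `all_levels` and `expand_wordlist` are hypothetical, and the sketch assumes that "all levels" means every suffix obtained by dropping leading directories one at a time. Under that assumption it also emits classes/websocket/echo/servers.json, which does not appear in the example above.

```python
def all_levels(path: str) -> list[str]:
    """Return the path plus every suffix made by dropping leading directories."""
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[i:]) for i in range(len(parts))]


def expand_wordlist(base_paths):
    """Expand every base path, de-duplicating while keeping first-seen order."""
    seen = set()
    expanded = []
    for path in base_paths:
        for level in all_levels(path.strip()):
            if level not in seen:
                seen.add(level)
                expanded.append(level)
    return expanded


if __name__ == "__main__":
    for entry in all_levels(
        "webapps/examples/WEB-INF/classes/websocket/echo/servers.json"
    ):
        print(entry)
```

De-duplication matters here because many files share the same trailing segments (for example, several directories may each contain an index.html), which is also why the all-levels list grows more slowly than a naive count would suggest.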

Robots

Inspired by Daniel Miessler's RobotsDisallowed project, these wordlists contain the robots.txt Allow and Disallow paths in the top 100, top 1000, and top 10000 websites according to Domcop's Open PageRank dataset.

Inventory Subdomains

This wordlist contains the subdomains found for each target on the Inventory project. It consists of 1.4 million words generated from the subdomains of over 50 public bug bounty programs.

Cloud Subdomains

This wordlist contains the subdomains found through enumerating cloud assets. It consists of 940k words generated from the subdomains extracted from the Common Names and Subject Alternative Names of over 7 million SSL certificates.
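This README does not spell out how words are derived from subdomains. One plausible reading, sketched below in Python, is that each hostname is split into its labels and the labels to the left of the root domain become words. The function name `subdomain_words` and the requirement that the caller supplies the root domain are assumptions made for this illustration, not a description of the actual workflow.

```python
def subdomain_words(hostname: str, root_domain: str) -> list[str]:
    """Return the labels of hostname that sit to the left of root_domain.

    Assumption: the caller already knows the root domain, so no public
    suffix list lookup is attempted here.
    """
    hostname = hostname.lower().rstrip(".")
    root_domain = root_domain.lower().rstrip(".")
    # Certificate names are often wildcards such as *.dev.example.com.
    if hostname.startswith("*."):
        hostname = hostname[2:]
    if hostname == root_domain or not hostname.endswith("." + root_domain):
        return []
    prefix = hostname[: -(len(root_domain) + 1)]
    return [label for label in prefix.split(".") if label]


if __name__ == "__main__":
    print(subdomain_words("api.staging.example.com", "example.com"))
```

Feeding every collected hostname through a function like this and de-duplicating the results would yield a flat list of candidate words.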

And more wordlists to come!

How it Works

Technologies

A Trickest workflow clones the repositories in technology-repositories.json, lists the paths of all their files, removes non-interesting files, generates combinations, and pushes the wordlists to this repository.
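The path-listing and filtering steps can be approximated with the Python standard library. This is a minimal sketch under stated assumptions: the repository is already cloned to a local directory, and "non-interesting" is taken to mean version-control metadata plus a few file extensions. The `SKIPPED_EXTENSIONS` set and the function name `list_repo_paths` are hypothetical, and the real workflow's filter criteria are not documented here.

```python
import os

# Assumption: these extensions stand in for "non-interesting" files.
SKIPPED_EXTENSIONS = {".png", ".jpg", ".gif", ".md"}


def list_repo_paths(repo_root, skipped_extensions=SKIPPED_EXTENSIONS):
    """List repository-relative file paths, skipping filtered files and .git."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune .git in place so os.walk does not descend into it.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension in skipped_extensions:
                continue
            relative = os.path.relpath(os.path.join(dirpath, name), repo_root)
            # Normalize to forward slashes so entries look like URL paths.
            paths.append(relative.replace(os.sep, "/"))
    return sorted(paths)


if __name__ == "__main__":
    for path in list_repo_paths("."):
        print(path)
```

The sorted output corresponds to a base wordlist; the all-levels variant would then be produced from it in a separate step.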

Robots

Another Trickest workflow gets the top 100, 1000, and 10000 websites from Domcop's Open PageRank dataset, uses meg to fetch their robots.txt files (Thanks, @tomnomnom!), removes irrelevant entries, cleans up the paths, and pushes the wordlists to this repository.
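To illustrate the extraction and clean-up steps, here is a small Python sketch that pulls Allow and Disallow paths out of the text of a robots.txt file. It does not fetch anything and it is not the workflow's code. The function name `extract_robots_paths` is hypothetical, and the clean-up rules (drop empty values, the bare root path, and wildcard patterns, then strip leading slashes) are assumptions about what "irrelevant" means.

```python
def extract_robots_paths(robots_txt: str) -> list[str]:
    """Collect unique Allow/Disallow paths from robots.txt content."""
    paths = []
    seen = set()
    for raw_line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() not in ("allow", "disallow"):
            continue
        path = value.strip()
        # Assumed clean-up: wildcard patterns are not usable as literal paths.
        if "*" in path:
            continue
        path = path.lstrip("/")
        # Empty values and the bare root "/" carry no wordlist entry.
        if not path or path in seen:
            continue
        seen.add(path)
        paths.append(path)
    return paths


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /admin/\nAllow: /public/index.html\n"
    for path in extract_robots_paths(sample):
        print(path)
```

Running the same function over the robots.txt files of each site tier and merging the results would give one list per tier.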

Contribution

All contributions/suggestions/questions are welcome! Feel free to create a new ticket via GitHub issues, tweet at us @trick3st, or join the conversation on Discord.

Build your own workflows!

We believe in the value of tinkering. Sign up for a demo on trickest.com to customize this workflow to your use case, get access to many more workflows, or build your own from scratch!