Wordlist Engineering & Manipulation

Wordlists are not static artifacts they are _constructed_, _reduced_, and _reshaped_ to match observed human password behavior. This page covers common utilities used to **generate**, **normalize**, **combine**, and **filter** wordlists for downstream use in cracking, analysis, and pattern research. It is worth mentioning here, and everywhere else that I can on the site, but its important to know that: If you are new or doing this for the first time, you are probably going to make mistakes here. **Double check your process and understand what you are trying to create** before you hit the 'generate' button. Learning the hard way here results in protracted generation cycles that **will eat up large chunks of your storage**, probably in TeraBytes. --- ## John the Ripper - Wordlist Generation John can be used as a **candidate generator**, not just a cracker. Using `--stdout`, John emits transformed candidates without attacking hashes. ### Generate candidates that meet complexity filters ```bash john --wordlist=dict.txt --stdout --external:[filter_name] > outfile.txt ``` **Use case** - Enforce enterprise-specific password policies - Pre-generate compliant candidates for Hashcat - Research how policy constraints affect entropy --- ## Stemming (Base-Word Extraction) **Stemming** strips passwords down to their linguistic or structural core (“base word”). This is useful for: - Discovering semantic reuse - Building smarter combinator dictionaries - Separating _meaning_ from _mutation_ ### Extract lowercase alphabetic strings ```bash sed 's/[^a-z]//g' passwords.txt > lowercase.txt ``` ### Extract uppercase alphabetic strings ```bash sed 's/[^A-Z]//g' passwords.txt > uppercase.txt ``` ### Extract all alphabetic strings (any case) ```bash sed 's/[^a-zA-Z]//g' passwords.txt > alpha.txt ``` ### Extract digits only ```bash sed 's/[^0-9]//g' passwords.txt > digits.txt ``` **Research value** - Identify base words reused across multiple mutations - Quantify how often numbers vs words carry meaning - Feed cleaned bases into combinator or hybrid attacks --- ## Combinator Combines multiple dictionaries by appending each word to every other word. ```bash combinator.bin dict1.txt dict2.txt > combined.txt combinator3.bin dict1.txt dict2.txt dict3.txt > combined.txt ``` **Use case** - FirstName + Year - Word + Suffix - OrganizationName + Role **Strength** - Explodes _semantic combinations_ efficiently - Often outperforms brute force when meaning is present --- ## CUTB - Substring Extraction Cuts fixed-length substrings from dictionary entries. ```bash cutb.bin offset length < infile.txt > outfile.txt ``` ### Example: extract first 4 characters ```bash cutb.bin 0 4 < dict.txt > stems.txt ``` **Use case** - Extract prefixes - Study truncation habits - Generate short base words from long candidates --- ## RLI - De-duplication Removes duplicates by comparing one file against others. ```bash rli dict1.txt outfile.txt dict2.txt ``` **Use case** - Prevent redundant candidates - Clean merged dictionaries - Reduce attack keyspace without losing coverage --- ## REQ - Character Class Filtering Filters candidates by required character classes. |Flag|Class| |--:|---| |1|Lowercase| |2|Uppercase| |4|Digits| |8|Special| ### Example: lowercase + uppercase only (1 + 2 = 3) ```bash req.bin 3 < dict.txt ``` **Use case** - Enforce policy compliance - Separate human-chosen passwords from generated noise --- ## COMBIPOW Generates _unique combinations_ of a dictionary ⚠️ Dictionary must be **≤ 64 lines** ```bash combipow.bin dict.txt combipow.bin -1 dict.txt ``` **Use case** - Small, high-signal keyword sets - Passwords built from multiple short tokens --- ## EXPANDER Splits dictionary entries into individual characters (up to 4-char chains). ```bash expander.bin < dict.txt ``` **Use case** - Character-level frequency analysis - Rule generation - Micro-pattern research --- ## LEN - Length Filtering Filters candidates by length. ```bash len.bin <min> <max> < dict.txt ``` ### Example: allow 5–10 characters ```bash len.bin 5 10 < dict.txt ``` **Use case** - Align wordlists to observed length distributions - Reduce waste in brute-force-adjacent attacks --- ## MORPH Automatically generates insertion rules based on frequent character chains. ```bash morph.bin dict.txt depth width pos_min pos_max ``` **Use case** - Discover common mutation patterns - Generate targeted rule sets from real data --- ## PERMUTE Generates permutations using the _Countdown QuickPerm Algorithm_. ```bash permute.bin < dict.txt ``` **Use case** - Passwords built from rearranged tokens - Low-entropy but non-obvious constructions --- ## CRUNCH - Wordlist Generator Generates all combinations from a defined charset. ```bash crunch <min> <max> <charset> -o outfile.txt ``` ### Example: 8-char hex strings ```bash crunch 8 8 0123456789ABCDEF -o hex8.txt ``` **Use case** - PINs, hex tokens, constrained formats - Research into purely synthetic password spaces --- ## Key Takeaway (Hashtopia Principle) > **Wordlists are hypotheses.** > The better your hypothesis about human behavior, the smaller, and more effective, your candidate space becomes. This tooling exists to **shape probability**, not to exhaust possibility :) --- #sudad #wordlists [[Wordlists and Dictionaries]] [[Home]]