# Rule Harvesting Methodology
**Raking** is an iterative analysis technique used to **discover which rules actually work** against a given dataset by observing real cracking outcomes instead of guessing in advance.
Rather than treating rules as a static asset, raking treats them as **experimental variables** that can be measured, refined, and re-used.
This methodology is most commonly applied with **Hashcat**, but the conceptual approach is portable to other cracking frameworks.
---
## **What Raking Is (Conceptually)**
At its core, raking answers a simple question:
> _Which base words + transformations are actually producing results in this dataset?_
Raking does this by:
- Running **large numbers of automatically generated rules**
- Capturing **successful rule applications**
- Extracting:
    - Base words
    - Rules that worked
    - Final transformed passwords
- Feeding those results back into future attacks
Over time, this creates **dataset-specific intelligence** instead of relying on generic rule packs.
---
## **Why Raking Works**
Traditional rule attacks assume:
- You know which rules matter
- You know which base words matter
- You know the right order to apply them
Raking flips this around:
- Let the attack run
- Observe what succeeds
- Promote _empirically proven_ rules and words
This is especially effective against:
- Large, noisy datasets
- Mixed password policies
- Environments with user-specific behavior
- Long-running cracking campaigns
---
## **High-Level Workflow**
1. **Run a large automated rule attack**
2. **Log every successful transformation**
3. **Extract components from debug output**
4. **Analyze frequency and effectiveness**
5. **Re-apply the best performers**
Each loop improves signal quality and reduces wasted keyspace.
---
## **Step-by-Step Raking Process**
### **Step 1: Initial Raking Pass**
For fast hash types, run a broad initial pass with:
- Automatic rule generation (-g)
- Debug logging enabled
```
hashcat -a 0 -m <hash_type> -w 3 hash.txt wordlists/* \
-g 100000 \
--debug-mode=4 \
--debug-file=nodename.debug
```
**What this does:**
- Runs across every matching wordlist in a single invocation
- Generates rules on the fly
- Logs every successful guess with full context
This step is intentionally **broad and noisy**.
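With --debug-mode=4, each successful guess is logged as a colon-separated triple: base word, matching rule, final plaintext. This format is what the cut extractions in the following steps rely on. A quick illustration (the word, rule, and resulting password below are invented sample data, not real output):

```
# Each debug line has the form  base_word:rule:cracked_plaintext
# (hypothetical sample entry; a real file contains thousands of lines)
printf 'summer:c $2 $0 $2 $4:Summer2024\n' > nodename.debug
cut -d: -f2 nodename.debug    # -> c $2 $0 $2 $4
```

The rule shown capitalizes the word and appends digits, which is exactly the kind of transformation the later extraction steps harvest.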
---
### **Step 2: Extract Base Words**
```
cut -d: -f1 < nodename.debug > nodename.base
```
**Result:**
- A list of dictionary inputs that actually mattered
- Often much smaller than the original wordlists
- Strong candidates for:
    - Re-use
    - Targeted dictionaries
    - PRINCE or combinator inputs
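The raw base-word list usually contains heavy repetition, and that repetition is itself a signal. A small sketch for deduplicating while keeping a frequency ranking (file names follow the nodename convention above; the sample words are invented for illustration):

```
# Sample base-word extract (illustrative; normally produced by the cut above)
printf 'summer\nsummer\nwinter\nsummer\nwinter\nadmin\n' > nodename.base

# Rank base words by how often they produced a crack, most productive first
sort nodename.base | uniq -c | sort -rn > nodename.base.ranked

# Deduplicated list for re-use as a dictionary
sort -u nodename.base > nodename.base.uniq
```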
---
### **Step 3: Extract Successful Rules**
```
cut -d: -f2 < nodename.debug > nodename.rule
```
**Result:**
- A dataset-specific rule list
- Reflects _real_ user behavior
- Can be:
    - Deduplicated
    - Frequency-ranked
    - Trimmed to top-N performers
This is how **generated2.rule** was originally built.
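One detail worth noting when ranking: rule strings can contain spaces (a rule is a sequence of functions such as c $1), so strip the uniq -c count with sed rather than splitting on whitespace. A sketch (the sample rules and the top-100 cutoff are arbitrary choices):

```
# Sample rule extract (illustrative; normally produced by the cut above)
printf 'c $1\nc $1\n$1 $2 $3\nc $1\nu\n' > nodename.rule

# Frequency-rank, then keep the top-N rule strings with counts removed
sort nodename.rule | uniq -c | sort -rn \
  | sed 's/^ *[0-9]* //' \
  | head -n 100 > nodename.top.rule
```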
---
### **Step 4: Extract Final Passwords**
```
cut -d: -f3- < nodename.debug > nodename.final
```
**Result:**
- Ground-truth cracked passwords
- Useful for:
    - Pattern analysis
    - Mask derivation
    - Markov training
    - PCFG or ML research
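Mask derivation from the cracked plaintexts can be sketched as a rough character-class substitution (an assumption-laden sketch: it ignores passwords that already contain a literal ?, and the pass order matters so that replacement text is never re-matched by a later class):

```
# Sample cracked plaintexts (illustrative)
printf 'Summer2024\nP@ssw0rd\n' > nodename.final

# Map each character to its hashcat charset (?l ?u ?d, everything else ?s),
# then rank the resulting masks by frequency
sed -e 's/[a-z]/?l/g' -e 's/[A-Z]/?u/g' -e 's/[0-9]/?d/g' \
    -e 's/[^?lud]/?s/g' nodename.final \
  | sort | uniq -c | sort -rn > nodename.masks
```

The top entries of nodename.masks are natural candidates for follow-up mask attacks.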
---
## **Iterative Refinement Loop**
After the first raking pass:
1. Re-run attacks using:
    - nodename.base as a new dictionary
    - nodename.rule as a curated rule set
2. Generate a **new debug file**
3. Compare effectiveness across iterations
4. Retain only high-yield components
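The refinement pass might look like the following (a sketch mirroring the Step 1 invocation; the hash type is a placeholder to fill in, and the iter2 debug file name is an arbitrary convention for keeping iterations separate):

```
hashcat -a 0 -m <hash_type> -w 3 hash.txt nodename.base \
    -r nodename.rule \
    --debug-mode=4 \
    --debug-file=nodename.iter2.debug
```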
Over time, the attack becomes:
- Smaller
- Faster
- More precise
- More explainable
---
## **Measuring Effectiveness**
Common metrics to track:
- Rule frequency of success
- Base word reuse rate
- Cracks per million guesses
- Marginal gains per iteration
This allows you to:
- Drop unproductive rules
- Promote dominant transformations
- Stop runs earlier with confidence
---
## **When to Use Raking**
Raking is most valuable when:
- You have **time**, not just speed
- You are attacking **large datasets**
- You want **long-term improvement**
- You are building reusable assets
It is less useful for:
- One-off quick wins
- Very small hash lists
- Extremely slow hash types (unless scoped carefully)
---
## **Key Takeaways**
- Raking turns cracking into **measurement**, not guesswork
- Rules should be _discovered_, not assumed
- Debug output is an **intelligence source**
- The best rules are **dataset-specific**
- Iteration matters more than brute force
---
[[Advanced Compositional Attacks]]
[[Home]]