Probabalistic Context Free Grammer (PCFG)

A **Probabilistic Context-Free Grammar (PCFG)** is a statistical model for password structure. It represents how humans typically build passwords bycombining words, digits, symbols, substitutions, etc. and assigns each structure a probability based on observed data. This makes PCFGs a useful tool to model **real-world password behavior** and to reason about password guessability and risk at scale. --- ## What Is a PCFG? A PCFG extends a traditional context-free grammar (CFG) by adding **probabilities** to production rules. - A CFG defines how strings (e.g. passwords) can be composed: combining non-terminal elements, terminal symbols (letters, digits, symbols), etc. - A PCFG adds weights (probabilities) reflecting how often each rule occurs in actual password data. This allows the system to generate or analyze candidate passwords in **order of decreasing likelihood**, based on how humans generally create passwords. --- ## Why PCFGs Are Useful for Password Analysis - **Model human password patterns:** PCFGs capture how real users build passwords, base words, substitutions, digits, symbols, common affixes. - **Generate high-probability guesses first:** When used for guessing or analysis, PCFGs try likely patterns before rarer ones, maximizing efficiency. - - **Estimate effective risk:** PCFG-based analysis reflects real guessability more accurately than naive randomness or brute-force expectations, because it models structure and human behavior. - **Adapt over time and data:** Training on updated datasets lets the grammar reflect evolving password-creation habits, including new patterns, substitutions, or cultural trends. --- ## How a PCFG for Passwords Works 1. **Training phase** - collect a large sample of human-generated passwords (e.g. from breach datasets, anonymized) 2. **Analyze structural patterns** - break each password into components: alphabetic words, digits, symbols, substitutions, affixes, etc. 3. **Generate grammar rules** - define production rules for components (e.g. Word → AlphabeticString, DigitSuffix → DigitSequence, SymbolSuffix → SymbolSequence) 4. **Assign probabilities** - compute how frequently each rule / component appears within the training data 5. **Use grammar to generate or rank guesses** - when generating guesses (for analysis or testing), PCFG produces candidate passwords in descending likelihood order, highest probability first Using probabilistic ranking dramatically improves efficiency compared to brute force or naive dictionary-mangling methods. --- ## What PCFG Models Capture And What They Don’t ### What They Capture - Common structural patterns (words + digits + symbols) - Typical human substitutions (e.g. “o → 0”, “a → @”, common symbol usage) - Likely positions for digits/symbols (suffixes, prefixes) - Frequency distributions of base words, affixes, common roots - Password families and reuse-influenced variants ### ⚠️ What They Do Not Capture Reliably - Truly random passwords with no structure - Long, high-entropy passphrases unrelated to observed patterns - Behavior changes beyond the training distribution (e.g. novel cultural references, new languages, ad-hoc constructions) - External constraints (lockout policies, rate limits, user context) - Non-visible patterns (usage of password managers, multi-factor authentication, salted/hash-stored passwords) PCFG is a **model of structure and probability**, not a guarantee. It reflects likely behavior, not all possible behavior. --- ## Ethical & Defensive View: Why PCFG in Hashtopia Is for Awareness PCFGs are not taught or recommended here for illicit cracking. Instead, PCFGs help defenders and researchers: - **Understand how predictable human password behavior is** - **Estimate likely risk across a population** - **Design better password policies and defenses** (e.g. encouraging passphrases, blocking common patterns, educating users) - **Interpret leaked-password data** without exposing individuals’ privacy, focusing on structural patterns, not identities Used responsibly, PCFGs shift the focus from “can this password be cracked?” to “how likely is this password to be guessed, given what humans actually choose?” --- ## How PCFG Fits Into Hashtopia's Conceptual Model PCFG builds on and connects multiple core concepts: - **[[Password Structure and Composition]]** - PCFG formalizes structure as grammar rules - **[[Entropy and Guessability]]** - PCFG helps approximate real-world guessability over theoretical entropy models - **Password Reuse & Lifecycle** - PCFG captures reuse patterns, variants, and transformation behavior - **Scale & Risk Amplification** - PCFG reveals how small structural habits dominate breaches at population scale PCFG is one of the **most direct tools** for bridging human behavior, security theory, and empirical risk analysis. --- ## Key Takeaway Passwords are not random, they follow human-shaped patterns. A PCFG captures those patterns, estimates how common they are, and offers a realistic model of which passwords are most likely to fail under attack or leak pressure. Security improves when we study structure, not just string space. PCFG is a powerful conceptual and analytical tool when used defensively, ethically, and with respect. ## Resources Password Cracking Using Probabilistic Context-Free Grammars https://sites.google.com/site/reusablesec/Home/password-crackingtools/probablistic_cracker Next Gen [[PCFG - Pretty Cool Fuzzy Guesser]] https://github.com/lakiw/pcfg_cracker [[1. Concepts|Concepts]] [[Home]] #concepts #advanced #research #education