Cybersecurity, hashing, md5, security, cryptanalysis

Hash Functions & Brute Force Attacks

Implementation of MD5 hashing, rainbow table attacks, password cracking, and salt-based defenses.

Date

2024

Understanding Cryptographic Hash Functions

Lab 3 delves into cryptographic hash functions with hands-on implementation of MD5, practical password cracking techniques, and defense mechanisms. This project explores hash function properties, collision resistance, and demonstrates why unsalted password hashes are vulnerable through participation in a hash-breaking competition.

What is Hashing?

A cryptographic hash function is a mathematical algorithm that takes an input (or "message") of any length and produces a fixed-size string of bytes. The output, typically called a "digest," acts as a fingerprint of the input: because the output size is fixed, distinct inputs that share a digest (collisions) must exist, but for a secure hash function they should be computationally infeasible to find. Hash functions are deterministic, meaning the same input will always produce the same hash output. They are designed to be one-way functions, making it computationally infeasible to reverse the process and retrieve the original input from the hash.

Deterministic: Same input always produces the same hash
Fixed output size: Regardless of input length, hash is always the same size
Fast computation: Hash can be computed quickly for any input
One-way function: Infeasible to reverse the hash to get the original input
Avalanche effect: Small change in input drastically changes the output
Collision resistant: Hard to find two different inputs with the same hash

What is MD5?

MD5 (Message Digest Algorithm 5) is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value, typically expressed as a 32-character hexadecimal number. Designed by Ronald Rivest in 1991, MD5 was initially developed as a cryptographic hash function for digital signatures and message authentication.

MD5 processes input data in 512-bit blocks through a series of mathematical operations including bitwise operations, modular additions, and nonlinear functions. Despite its historical popularity, MD5 is now considered cryptographically broken due to discovered vulnerabilities that allow collision attacks, where two different inputs can produce the same hash output.
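To make the 512-bit block structure concrete, here is a minimal sketch of the padding step that RFC 1321 describes, which is what lets MD5 split any message into whole 64-byte blocks. The helper names `md5_pad` and `split_blocks` are hypothetical and only cover padding, not the compression rounds that follow.

```python
import struct

def md5_pad(message: bytes) -> bytes:
    """Pad a message so its length is a multiple of 64 bytes (512 bits)."""
    bit_length = (len(message) * 8) & 0xFFFFFFFFFFFFFFFF  # length is kept modulo 2**64
    padded = message + b"\x80"                       # a single 1 bit, then zero bits
    padded += b"\x00" * ((56 - len(padded)) % 64)    # zeros until length is 56 mod 64
    padded += struct.pack("<Q", bit_length)          # 64-bit little-endian bit length
    return padded

def split_blocks(padded: bytes) -> list[bytes]:
    """Split padded input into the 64-byte blocks the algorithm processes one by one."""
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

if __name__ == "__main__":
    for size in (0, 55, 56, 100):
        blocks = split_blocks(md5_pad(b"a" * size))
        print(f"{size:>3} input bytes -> {len(blocks)} block(s)")
```

A 55-byte message still fits in one block, while a 56-byte message needs a second block because the length field no longer fits.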

Technologies Used

Python, MD5, Rainbow Tables, Dictionary Attacks, Salt Generation

Key Features

MD5 hash function implementation from scratch

Brute force password cracking engine

Dictionary attack with common password lists

Rainbow table generation and lookup

Salt generation and secure password hashing

Hash collision detection


Implementation Details

Q&A: How does the length of hash correspond to input string?

One of the fundamental properties of cryptographic hash functions like MD5 is that they produce a fixed-size output regardless of the input size. Whether you hash a single character or an entire novel, the MD5 hash will always be exactly 128 bits (32 hexadecimal characters) long.

This fixed-size property is crucial for many applications, including password storage and data integrity verification. It means that comparing two hashes costs the same small amount of work no matter how large the original inputs were, and the storage requirement for hashes is predictable.

md5_length_demo.py

import hashlib

# Function to compute MD5 hash
def compute_md5(text):
    result = hashlib.md5(text.encode())
    return result.hexdigest()

# Strings of different lengths
strings = [
    "a",                  # 1 character
    "hello",              # 5 characters
    "cybersecurity",      # 13 characters
    "This is a longer string with spaces",  # 35 characters
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam eget felis euismod."  # 83 characters
]

print("MD5 Hash Demonstration\n")
print("String Length | MD5 Hash")
print("-" * 50)

for s in strings:
    md5_hash = compute_md5(s)
    print(f"'{s}'")
    print(f"Length: {len(s)} characters")
    print(f"MD5 Hash: {md5_hash}")
    print("-" * 50)

# Note: On Windows PowerShell, you can compute MD5 hash using:
# Get-FileHash -Algorithm MD5 -InputStream ([IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes("your text")))
#
# On Linux/macOS, you can use:
# echo -n "your text" | md5sum

Code Explanation

The code demonstrates that regardless of input length (1 to 83 characters), MD5 always produces a 32-character hexadecimal hash. This fixed-size output is a defining characteristic of hash functions.

Q&A: Are there any visible correlations between the hash and input string?

No, there are no visible correlations between the input string and its hash output. This is due to the "avalanche effect" - a critical property of cryptographic hash functions where even a tiny change in the input produces a dramatically different hash.

For example, hashing "password" versus "Password" (just changing the case of one letter) produces completely different hashes with no visible pattern or relationship. This property is essential for security, as it prevents attackers from making educated guesses about the original input by examining the hash.

Similar inputs produce completely different hashes
No pattern can be observed between sequential inputs and their hashes
Hash values appear random and uniformly distributed
Single bit change in input changes approximately 50% of output bits
Impossible to predict hash output without computing it
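The avalanche effect can be measured directly by counting how many of the 128 output bits differ between two digests. This is a small sketch using the "password" / "Password" pair from above; the helper name `differing_bits` is hypothetical.

```python
import hashlib

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions in which the MD5 digests of a and b differ."""
    digest_a = int.from_bytes(hashlib.md5(a.encode()).digest(), "big")
    digest_b = int.from_bytes(hashlib.md5(b.encode()).digest(), "big")
    return bin(digest_a ^ digest_b).count("1")  # XOR leaves a 1 wherever the bits differ

if __name__ == "__main__":
    n = differing_bits("password", "Password")
    print(f"{n} of 128 bits differ ({n / 128:.0%})")
```

For a well-behaved hash the count should land somewhere near half of the 128 bits, although the exact figure varies from pair to pair.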

Q&A: What are the issues related to cryptographic weakness of MD5?

MD5 is considered cryptographically broken and unsuitable for further use in security-critical applications. The algorithm has several significant vulnerabilities that have been discovered and exploited over the years:

Collision Attacks: Researchers can generate two different inputs that produce the same MD5 hash in a matter of seconds using modern hardware. This breaks the collision resistance property.
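Preimage Attacks: Finding an input that produces a specific hash remains impractical. The best published preimage attacks are only slightly faster than exhaustive search, so in practice MD5 hashes are reversed by guessing likely inputs rather than by inverting the function.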
Rainbow Tables: Pre-computed tables of hashes for common passwords make it trivial to reverse MD5 hashes of weak passwords.
Speed is a Weakness: MD5 was designed to be fast, which actually makes it vulnerable to brute-force attacks. Modern GPUs can compute billions of MD5 hashes per second.
No Salt Support: MD5 itself doesn't include salting, making identical passwords have identical hashes across different systems.

Despite these vulnerabilities, MD5 is still used in non-security contexts such as checksums that detect accidental file corruption. For security-critical applications, modern alternatives should be used instead: a dedicated password hashing function such as bcrypt or Argon2 for password storage, and a hash from the SHA-2 family such as SHA-256 for digital signatures and integrity checks against tampering.

In this lab, we explore these weaknesses hands-on by implementing brute force attacks, generating rainbow tables, and demonstrating how proper salting can mitigate some (but not all) of MD5's security issues.
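Collision detection itself is easy to illustrate with a toy birthday search. The sketch below is not one of the real MD5 collision attacks, which exploit structural weaknesses in the algorithm. It assumes we only look at the first few hex characters of each digest, so that a collision shows up after a few thousand tries. The function name and the `msg-N` inputs are hypothetical.

```python
import hashlib
from itertools import count

def find_truncated_collision(prefix_hex_chars: int = 6):
    """Find two different messages whose MD5 digests share the same leading hex characters."""
    seen = {}  # truncated digest -> first message that produced it
    for i in count():
        message = f"msg-{i}"
        prefix = hashlib.md5(message.encode()).hexdigest()[:prefix_hex_chars]
        if prefix in seen:
            return seen[prefix], message, i + 1  # two distinct inputs, same truncated hash
        seen[prefix] = message

if __name__ == "__main__":
    first, second, tries = find_truncated_collision(6)
    print(f"{first!r} and {second!r} collide on 24 bits after {tries:,} tries")
```

With a 24-bit prefix the birthday bound suggests a match after roughly the square root of 2^24 attempts, which is why a short output would never be collision resistant regardless of the algorithm.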

Breaking Hashes with Brute Force

Brute force attacks on MD5 hashes involve systematically trying every possible input until finding one that produces the target hash. Due to MD5's speed and the absence of built-in protection mechanisms, this approach is surprisingly effective for short passwords.

The attack strategy depends on the expected password length and character set. For a 5-character password using lowercase letters and digits (36 possible characters), there are 36^5 = 60,466,176 possible combinations. While this sounds like a lot, modern computers can compute MD5 hashes at millions per second, making this entirely feasible.
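The arithmetic can be turned into a quick estimate of worst-case cracking time. In this sketch the rate of one million hashes per second is an assumption chosen to match the "millions per second" figure above, not a measurement, and the function names are hypothetical.

```python
import string

ASSUMED_HASHES_PER_SECOND = 1_000_000  # assumption for illustration, not a benchmark

def keyspace(charset_size: int, length: int) -> int:
    """Number of candidate strings of exactly `length` characters."""
    return charset_size ** length

def worst_case_seconds(charset_size: int, length: int, hashes_per_second: float) -> float:
    """Time to try every candidate once at the given hash rate."""
    return keyspace(charset_size, length) / hashes_per_second

if __name__ == "__main__":
    charset = string.ascii_lowercase + string.digits  # 36 characters
    for length in (5, 6, 8):
        total = keyspace(len(charset), length)
        seconds = worst_case_seconds(len(charset), length, ASSUMED_HASHES_PER_SECOND)
        print(f"length {length}: {total:,} candidates, about {seconds:,.0f} s worst case")
```

Each extra character multiplies the work by 36, which is why length matters more than any other single factor.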

Character Set Definition: Determine the possible characters (lowercase, uppercase, digits, symbols)
Combination Generation: Use itertools.product to generate all possible strings of target length
Hash Computation: Calculate MD5 hash for each generated string
Comparison: Check if computed hash matches any target hash
Optimization: Stop early if all target hashes are found
Progress Tracking: Monitor speed and estimated time remaining

The key insight is that MD5's computational efficiency becomes its weakness. A password that takes microseconds to hash can be cracked in seconds through brute force. This is why modern password hashing algorithms like bcrypt are intentionally slow: they expose an adjustable work factor, so the cost of every guess can be raised as hardware gets faster.
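To get a feel for what an adjustable work factor does, the sketch below times a single MD5 call against PBKDF2-HMAC-SHA256 from the standard library. PBKDF2 is swapped in here because bcrypt is not part of Python's standard library; the iteration count, the fixed salt, and the helper names are assumptions for illustration only.

```python
import hashlib
import time

PASSWORD = b"password123"
SALT = b"0123456789abcdef"  # fixed only to keep the timing repeatable; real salts are random

def fast_md5() -> bytes:
    """One salted MD5 computation, the cost an attacker pays per guess."""
    return hashlib.md5(SALT + PASSWORD).digest()

def slow_pbkdf2(iterations: int = 100_000) -> bytes:
    """One PBKDF2 derivation; `iterations` is the adjustable work factor."""
    return hashlib.pbkdf2_hmac("sha256", PASSWORD, SALT, iterations)

def seconds_per_call(fn, repeats: int) -> float:
    """Average wall-clock time of a single call to fn."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

if __name__ == "__main__":
    md5_cost = seconds_per_call(fast_md5, 10_000)
    kdf_cost = seconds_per_call(slow_pbkdf2, 5)
    print(f"MD5:    {md5_cost * 1e6:.2f} microseconds per guess")
    print(f"PBKDF2: {kdf_cost * 1e3:.2f} milliseconds per guess")
```

A legitimate login pays the higher cost once, while a brute force attacker pays it for every candidate.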

brute_force_md5.py

import hashlib
import time
import string
import itertools

def main():
    # Start timing
    start_time = time.time()

    # Read hash values from file, skipping blank lines and normalizing to lowercase hex
    with open('hash5.txt', 'r') as f:
        target_hashes = [line.strip().lower() for line in f if line.strip()]

    # Create a dictionary to store the results
    results = {}
    remaining_hashes = set(target_hashes)
    total_hashes = len(remaining_hashes)

    # Define character set: lowercase letters and numbers
    charset = string.ascii_lowercase + string.digits

    # Calculate total combinations for progress tracking
    total_combinations = len(charset) ** 5
    print(f"Starting brute force of {total_hashes} hashes...")
    print(f"Total possible combinations: {total_combinations:,}")

    # Counter for progress tracking
    counter = 0
    last_update = time.time()
    update_interval = 2  # seconds between progress updates

    # Generate all possible 5-character strings and check their hashes
    for combo in itertools.product(charset, repeat=5):
        # Convert tuple to string
        plaintext = ''.join(combo)

        # Calculate MD5 hash
        hash_value = hashlib.md5(plaintext.encode()).hexdigest()

        # Check if this hash is one of our targets
        if hash_value in remaining_hashes:
            results[hash_value] = plaintext
            remaining_hashes.remove(hash_value)
            print(f"Found: {plaintext} -> {hash_value} ({len(results)}/{total_hashes})")

            # If we've found all hashes, we can stop
            if not remaining_hashes:
                break

        # Update progress periodically
        counter += 1
        current_time = time.time()
        if current_time - last_update > update_interval:
            elapsed = current_time - start_time
            progress = counter / total_combinations * 100
            combinations_per_sec = counter / elapsed if elapsed > 0 else 0
            print(f"Progress: {progress:.4f}% | Combinations tried: {counter:,} | Speed: {combinations_per_sec:.0f} combinations/sec")
            last_update = current_time

    # Calculate total time
    end_time = time.time()
    total_time = end_time - start_time

    # Print results (the search may end without finding every hash)
    print(f"\nReversed {len(results)} of {total_hashes} hashes in {total_time:.2f} seconds")

    # Save results to file
    with open('ex2_hash.txt', 'w') as f:
        for hash_val, plaintext in sorted(results.items()):
            f.write(f"{plaintext}\n")

    print(f"Results saved to ex2_hash.txt")

if __name__ == "__main__":
    main()

Code Explanation

This brute force implementation systematically generates all 5-character combinations from the charset (a-z, 0-9) and computes their MD5 hashes. It uses a set for O(1) hash lookups, implements progress tracking, and stops early once all target hashes are found. On modern hardware, hash rates of millions of candidates per second are achievable, demonstrating why password length and complexity are critical.

Defense Mechanism: Salt and Secure Password Storage

What is a Salt?

A salt is a random string of characters that is appended or prepended to a password before hashing. Think of it as adding unique "seasoning" to each password hash. The salt is stored alongside the hash in the database, and while not secret, it dramatically increases the difficulty of hash-breaking attacks.

How Salts Work:

Random Generation: A unique salt is generated for each password (typically 16-32 bytes)
Concatenation: Salt is combined with the password (e.g., salt + password or password + salt)
Hashing: The combined string is hashed using MD5 or a better algorithm
Storage: Both the salt and hash are stored in the database
Verification: During login, the stored salt is retrieved, combined with the entered password, hashed, and compared

Why Salts Are Critical:

Prevents Rainbow Table Attacks: Pre-computed hash tables become useless because each password has a unique salt
Eliminates Duplicate Hashes: Two users with password "password123" will have completely different hashes
Forces Per-Password Cracking: Attackers must crack each hash individually, no bulk cracking
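Removes Shared Work: A dictionary attack still works against a single salted hash, since the salt is known, but the attacker has to repeat it for every salt instead of hashing each guess once for the whole database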
No Secrecy Required: Salt can be stored in plaintext alongside the hash

Example Without Salt:

unsalted_hashing.py

import hashlib

# WITHOUT SALT - VULNERABLE
password1 = "password123"
password2 = "password123"

hash1 = hashlib.md5(password1.encode()).hexdigest()
hash2 = hashlib.md5(password2.encode()).hexdigest()

print(f"User 1 hash: {hash1}")
print(f"User 2 hash: {hash2}")
print(f"Hashes match: {hash1 == hash2}")  # True - PROBLEM!

# Result:
# User 1 hash: 482c811da5d5b4bc6d497ffa98491e38
# User 2 hash: 482c811da5d5b4bc6d497ffa98491e38
# Hashes match: True
# An attacker who cracks one hash can compromise ALL users with that password!

Code Explanation

Without salt, identical passwords produce identical hashes. If an attacker cracks one hash, they instantly know the password for all users with that hash. Rainbow tables can be used to crack millions of hashes simultaneously.
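The bulk-cracking problem is easy to reproduce with a plain precomputed lookup table, which is simpler than a rainbow table but exploits the same weakness. In this sketch the password list and the user names are made up for illustration.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()

def build_lookup_table(candidates):
    """Hash every candidate once; the table can then be reused against any unsalted database."""
    return {md5_hex(password): password for password in candidates}

COMMON_PASSWORDS = ["123456", "password", "password123", "qwerty", "letmein"]  # illustrative list

if __name__ == "__main__":
    table = build_lookup_table(COMMON_PASSWORDS)
    leaked = {  # hypothetical leaked user -> unsalted MD5 records
        "alice": md5_hex("password123"),
        "bob": md5_hex("password123"),
        "carol": md5_hex("Tr0ub4dor&3"),
    }
    for user, digest in leaked.items():
        print(f"{user}: {table.get(digest, '<not in table>')}")
```

One dictionary lookup per record is enough, and every user who chose the same password falls at once.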

Implementing Salted Password Hashing

With proper salting, each password gets a unique hash even if the passwords are identical. This makes rainbow tables ineffective and forces attackers to crack each hash individually.

salted_hashing.py

import hashlib
import os
import binascii

def hash_password(password):
    """
    Hash a password with a randomly generated salt.
    Returns: salt + hash (both in hex format)
    """
    # Generate a random 16-byte salt
    salt = os.urandom(16)

    # Combine salt and password, then hash
    salted_password = salt + password.encode()
    hash_result = hashlib.md5(salted_password).digest()

    # Convert to hex for storage
    salt_hex = binascii.hexlify(salt).decode()
    hash_hex = binascii.hexlify(hash_result).decode()

    # Store both salt and hash (often combined as salt$hash)
    return f"{salt_hex}${hash_hex}"

def verify_password(stored_hash, password):
    """
    Verify a password against the stored salted hash.
    """
    # Split stored data into salt and hash
    salt_hex, hash_hex = stored_hash.split('$')

    # Convert salt back from hex
    salt = binascii.unhexlify(salt_hex)

    # Hash the provided password with the stored salt
    salted_password = salt + password.encode()
    computed_hash = hashlib.md5(salted_password).hexdigest()

    # Compare hashes
    return computed_hash == hash_hex

# Example usage
print("=== Salted Password Hashing Demo ===\n")

# Two users with the same password
password = "password123"

user1_hash = hash_password(password)
user2_hash = hash_password(password)

print(f"User 1 stored: {user1_hash}")
print(f"User 2 stored: {user2_hash}")
print(f"\nHashes identical? {user1_hash == user2_hash}")  # False - SECURE!
print(f"Length of each stored string: {len(user1_hash)} characters")

# Verify passwords
print(f"\nUser 1 password verification: {verify_password(user1_hash, password)}")
print(f"User 2 password verification: {verify_password(user2_hash, password)}")
print(f"Wrong password verification: {verify_password(user1_hash, 'wrong')}")

# Output:
# User 1 stored: a3f7c8b2e9d4f1a6b5c3d7e8f2a1b9c4$e9f3a7c2b8d4f1e6a5c9d3b7f2a8e1c4
# User 2 stored: d8e3f7a2b9c4d1f5e6a8c3b7d2f9a4e1$f2a8c3e7d4b9f1e5a6c8d3b7f2a9e4c1
# Hashes identical? False
# Each user gets unique salt and hash, even with same password!

Code Explanation

This implementation generates a unique random salt for each password, combines it with the password before hashing, and stores both salt and hash. During verification, the stored salt is used to hash the provided password for comparison. This defeats rainbow tables and prevents bulk password cracking.

Hash Breaking Techniques: A Comprehensive Analysis

Professional password cracking employs multiple sophisticated techniques, each with different strengths and weaknesses. Understanding these methods is crucial for both offensive security assessments and defensive password policy design.

1. Brute Force Attack

Method: Try every possible combination systematically
Time Complexity: O(c^n) where c = charset size, n = password length
Effectiveness: Guaranteed to find password given enough time
Speed: Modern GPUs can test billions of MD5 hashes per second
Best Against: Short passwords (≤6 characters), limited character sets
Defense: Use long passwords (12+ characters) with mixed character types

2. Dictionary Attack

Method: Try words from a pre-compiled dictionary/wordlist
Common Wordlists: rockyou.txt (14M passwords), SecLists, common passwords lists
Variations: Add common patterns (password123, Password!, p@ssw0rd)
Effectiveness: 10-30% success rate on real-world databases
Speed: Can test millions of passwords in seconds
Best Against: Dictionary words, common passwords, predictable patterns
Defense: Avoid dictionary words, use passphrases, add random characters
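A dictionary attack fits in a few lines. The sketch below runs it against a salted MD5 record, assuming the same salt + password construction used earlier in this lab, to show that a known salt does not stop the attack on a single weak password. The wordlist and function names are hypothetical.

```python
import hashlib
import os

WORDLIST = ["123456", "password", "qwerty", "letmein", "dragon"]  # stand-in for a real wordlist

def salted_md5(salt: bytes, password: str) -> str:
    return hashlib.md5(salt + password.encode()).hexdigest()

def dictionary_attack(salt: bytes, target_hash: str, wordlist):
    """Hash each word with the record's own salt and return the first match, if any."""
    for word in wordlist:
        if salted_md5(salt, word) == target_hash:
            return word
    return None

if __name__ == "__main__":
    salt = os.urandom(16)                 # stored in plaintext next to the hash
    record = salted_md5(salt, "letmein")  # a user with a weak password
    print("Recovered:", dictionary_attack(salt, record, WORDLIST))
```

What the salt changes is the bookkeeping: the loop has to be rerun from scratch for every record, because no hash computed for one salt can be reused for another.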

3. Rainbow Table Attack

Method: Pre-computed hash tables using time-memory tradeoff
How it Works: Build chains that alternate hashing with reduction functions (which map a hash back to a candidate password) and store only each chain's start and end points
Storage Requirements: Gigabytes to terabytes depending on coverage
Speed: Near-instant lookups once table is built
Effectiveness: Extremely fast for unsalted hashes
Limitation: Each salt requires a new rainbow table (making them impractical)
Defense: Always use unique salts, making rainbow tables ineffective
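The time-memory tradeoff is easier to follow with a toy example. The sketch below builds chains over 3-letter lowercase passwords and stores only each chain's start and end. The reduction function, chain length, and start points are all assumptions chosen to keep the example small; real tables use far larger parameters, and a given password is only recoverable if it happens to fall on a stored chain.

```python
import hashlib
import itertools
import string

CHARSET = string.ascii_lowercase
PW_LEN = 3       # toy keyspace: 26**3 passwords
CHAIN_LEN = 50   # hash/reduce steps per chain

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()

def reduce_hash(hash_hex: str, column: int) -> str:
    """Map a hash back into the password space; the column index varies the mapping per step."""
    n = int(hash_hex, 16) + column
    chars = []
    for _ in range(PW_LEN):
        n, idx = divmod(n, len(CHARSET))
        chars.append(CHARSET[idx])
    return "".join(chars)

def build_table(starts):
    """Walk each chain to its end and keep only end -> [starts]."""
    table = {}
    for start in starts:
        plain = start
        for col in range(CHAIN_LEN):
            plain = reduce_hash(md5_hex(plain), col)
        table.setdefault(plain, []).append(start)
    return table

def lookup(table, target_hash: str):
    """Try each column the target could sit in, then replay matching chains to find the plaintext."""
    for col in range(CHAIN_LEN - 1, -1, -1):
        plain = reduce_hash(target_hash, col)
        for c in range(col + 1, CHAIN_LEN):
            plain = reduce_hash(md5_hex(plain), c)
        for start in table.get(plain, []):
            candidate = start
            for c in range(CHAIN_LEN):
                if md5_hex(candidate) == target_hash:
                    return candidate
                candidate = reduce_hash(md5_hex(candidate), c)
    return None  # not covered by any stored chain

if __name__ == "__main__":
    every_password = itertools.product(CHARSET, repeat=PW_LEN)
    starts = ["".join(p) for p in itertools.islice(every_password, 0, None, 40)]
    table = build_table(starts)
    print(f"{len(starts)} chains stored as {len(table)} distinct end points")
    print("lookup result:", lookup(table, md5_hex("cat")))
```

The table stores two short strings per chain instead of fifty hashes, and pays for that saving with extra hashing at lookup time. A salt breaks the scheme because the chains would have to be rebuilt for every salt value.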

4. Hybrid Attack

Method: Combine dictionary words with character substitutions and patterns
Examples: "password" → "p@ssw0rd", "P@ssw0rd!", "Password123!"
Rules: Common substitutions (a→@, e→3, i→1, o→0, s→$)
Tools: Hashcat and John the Ripper excel at hybrid attacks
Effectiveness: Catches users who think l33tspeak makes passwords secure
Defense: Use truly random passwords or long passphrases
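Rule-based mangling of this kind can be sketched with `itertools.product`. The substitution map below mirrors the rules listed above; the suffix list and function names are assumptions, and real tools such as Hashcat ship far larger rule sets.

```python
import itertools

LEET = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}
SUFFIXES = ["", "!", "123", "123!"]  # illustrative endings

def leet_variants(word: str):
    """Yield every way of keeping or substituting each substitutable character."""
    options = [(ch, LEET[ch]) if ch in LEET else (ch,) for ch in word]
    for combo in itertools.product(*options):
        yield "".join(combo)

def hybrid_candidates(word: str):
    """Combine substitutions, optional capitalization, and common suffixes without duplicates."""
    seen = set()
    for variant in leet_variants(word.lower()):
        for base in (variant, variant[:1].upper() + variant[1:]):
            for suffix in SUFFIXES:
                candidate = base + suffix
                if candidate not in seen:
                    seen.add(candidate)
                    yield candidate

if __name__ == "__main__":
    candidates = list(hybrid_candidates("password"))
    print(f"{len(candidates)} candidates from one dictionary word")
    print("P@ssw0rd! covered:", "P@ssw0rd!" in candidates)
```

A single base word expands to only a modest number of guesses, so these "clever" variations add very little real strength.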

5. Mask Attack (Smart Brute Force)

Method: Brute force with pattern constraints (e.g., "?u?l?l?l?l?l?l?l?d?d" for the Password12 pattern)
Pattern Definition: ?l=lowercase, ?u=uppercase, ?d=digit, ?s=special
Effectiveness: Much faster than pure brute force by focusing on likely patterns
Common Patterns: Uppercase first letter, digits at end, special char at end
Defense: Use random passwords that don't follow predictable patterns
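A mask is just one character set per position, so expanding it is a direct use of `itertools.product`. This sketch handles only the four placeholders listed above; the function names are hypothetical, and the `?s` set here uses Python's `string.punctuation`, which may not match a real tool's definition exactly.

```python
import itertools
import math
import string

MASK_CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": string.punctuation,  # approximation of a cracking tool's special-character set
}

def parse_mask(mask: str):
    """Turn a mask such as '?u?l?d' into a list of per-position character sets."""
    if len(mask) % 2 != 0:
        raise ValueError("mask must be a sequence of ?x placeholders")
    charsets = []
    for i in range(0, len(mask), 2):
        marker, kind = mask[i], mask[i + 1]
        if marker != "?" or kind not in MASK_CHARSETS:
            raise ValueError(f"unsupported placeholder: {mask[i:i + 2]}")
        charsets.append(MASK_CHARSETS[kind])
    return charsets

def mask_keyspace(mask: str) -> int:
    """Number of candidates the mask generates."""
    return math.prod(len(cs) for cs in parse_mask(mask))

def expand_mask(mask: str):
    """Lazily yield every candidate that fits the mask."""
    for combo in itertools.product(*parse_mask(mask)):
        yield "".join(combo)

if __name__ == "__main__":
    mask = "?u?l?l?l?d?d"
    full = (26 + 26 + 10) ** 6  # unconstrained 6-character search over letters and digits
    print(f"{mask}: {mask_keyspace(mask):,} candidates versus {full:,} unconstrained")
```

Constraining each position shrinks the search by orders of magnitude, which is the whole advantage over plain brute force.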

6. Combinator Attack

Method: Concatenate words from multiple wordlists
Example: "summer" + "2024" = "summer2024"
Effectiveness: Catches passphrases made from common word combinations
Speed: Faster than pure brute force, more thorough than dictionary
Defense: Use 4+ random words or add random characters between words
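A combinator attack is a nested loop over two wordlists. The lists in this sketch are made up for illustration, and the function name is hypothetical.

```python
import hashlib
import itertools

LEFT_WORDS = ["summer", "winter", "dragon"]  # illustrative wordlists
RIGHT_WORDS = ["2023", "2024", "123"]

def combinator_attack(target_hash: str, left_words, right_words):
    """Hash every left + right concatenation and return the one that matches, if any."""
    for left, right in itertools.product(left_words, right_words):
        candidate = left + right
        if hashlib.md5(candidate.encode()).hexdigest() == target_hash:
            return candidate
    return None

if __name__ == "__main__":
    target = hashlib.md5(b"summer2024").hexdigest()
    print("Recovered:", combinator_attack(target, LEFT_WORDS, RIGHT_WORDS))
```

The number of candidates is the product of the two list sizes, so the cost grows quickly but stays far below brute force over the same lengths.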

7. Statistical Attack / Markov Chains

Method: Generate passwords based on statistical probability of character sequences
How it Works: Learn patterns from leaked password databases
Effectiveness: Can crack "random-looking" passwords that follow human patterns
Advanced: Machine learning models trained on billions of real passwords
Defense: Use cryptographically random password generators
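The statistical idea can be shown with a first-order (bigram) model, a much simpler relative of the models real tools use. In this sketch the training list is a tiny made-up sample, the add-one smoothing and the alphabet size of 95 printable characters are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

START = "^"  # marker for the position before the first character

def train_bigrams(passwords):
    """Count how often each character follows the previous one."""
    counts = defaultdict(Counter)
    for password in passwords:
        prev = START
        for ch in password:
            counts[prev][ch] += 1
            prev = ch
    return counts

def log_score(counts, candidate: str, alphabet_size: int = 95) -> float:
    """Log-probability of a candidate under the model, with add-one smoothing."""
    score = 0.0
    prev = START
    for ch in candidate:
        seen = counts.get(prev, Counter())
        score += math.log((seen[ch] + 1) / (sum(seen.values()) + alphabet_size))
        prev = ch
    return score

def rank_candidates(counts, candidates):
    """Order guesses so the statistically most likely ones are tried first."""
    return sorted(candidates, key=lambda c: log_score(counts, c), reverse=True)

TRAINING = ["password", "passw0rd", "pass1234", "summer2024", "letmein"]  # illustrative sample

if __name__ == "__main__":
    model = train_bigrams(TRAINING)
    for guess in rank_candidates(model, ["xqzv", "pass", "summ"]):
        print(f"{guess}: {log_score(model, guess):.2f}")
```

An attacker would generate guesses in descending score order, which is why a password that merely looks random to its owner can still be tried early.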

Online Hash Breaking Resources

For educational purposes and security testing, several online resources provide hash lookup and cracking capabilities. These demonstrate how quickly unsalted hashes can be compromised.

Popular Online Hash Crackers:

CrackStation (https://crackstation.net/) - Free lookup against pre-computed tables for MD5, SHA1, SHA256, NTLM. Database of 15+ billion entries.
Hashes.com (https://hashes.com/en/decrypt/hash) - Multi-algorithm hash lookup with submission system. Community-powered database.
MD5 Online (https://www.md5online.org/md5-decrypt.html) - Dedicated MD5 hash cracker with large database.
HashKiller (https://hashkiller.io/) - Free hash lookup supporting multiple algorithms.
OnlineHashCrack (https://www.onlinehashcrack.com/) - Paid service for serious cracking, supports GPU acceleration.
CyberChef (https://gchq.github.io/CyberChef/) - Not a cracker but excellent for hash analysis and encoding/decoding.

Offline Tools for Professional Use:

Hashcat - GPU-accelerated, supports 300+ hash types, incredibly fast. Industry standard.
John the Ripper - Primarily CPU-based, excellent rule engine, great for dictionary attacks.
Hydra - Network login cracker, supports many protocols.
Medusa - Parallel network login cracker.
RainbowCrack - Generates and uses rainbow tables.
ophcrack - Windows password cracker using rainbow tables.

⚠️ Legal and Ethical Considerations:

Only crack hashes you have explicit permission to crack
Use these tools for security research, penetration testing, or educational purposes
Unauthorized access to computer systems is illegal in most jurisdictions
Check local laws regarding password cracking tools and activities
Always get written authorization before security testing
Respect responsible disclosure practices when finding vulnerabilities

Best Practices for Defense:

Never use MD5 or SHA1 for password hashing - use bcrypt, Argon2, or PBKDF2
Always use unique random salts (16+ bytes)
Implement key stretching (multiple rounds of hashing)
Enforce minimum password length (12+ characters)
Use password complexity requirements wisely (length > complexity)
Implement rate limiting and account lockouts
Consider multi-factor authentication (MFA)
Monitor for breached credentials using services like Have I Been Pwned
Educate users about password managers
Conduct regular security audits and penetration testing
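Several of these practices (unique salts, key stretching, a modern algorithm) can be combined using only Python's standard library. The sketch below uses PBKDF2-HMAC-SHA256; the record format, the iteration count, and the function names are assumptions for illustration, and the iteration count should be tuned to the deployment's hardware.

```python
import hashlib
import hmac
import os

DEFAULT_ITERATIONS = 200_000  # illustrative work factor, not a recommendation

def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Return a self-describing record: scheme$iterations$salt$hash (hypothetical format)."""
    salt = os.urandom(16)  # unique random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${derived.hex()}"

def verify_password(stored: str, password: str) -> bool:
    """Recompute the hash with the stored salt and work factor, then compare in constant time."""
    _scheme, iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))

if __name__ == "__main__":
    record = hash_password("password123")
    print("Stored record:", record)
    print("Correct password:", verify_password(record, "password123"))
    print("Wrong password:", verify_password(record, "wrong"))
```

Storing the iteration count in the record lets the work factor be raised later without invalidating existing hashes, and `hmac.compare_digest` avoids leaking how many leading bytes matched.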

Challenges

Implementing MD5 algorithm correctly according to RFC 1321, optimizing brute force performance, generating effective rainbow tables, understanding time-space tradeoffs in password cracking.

Outcome & Impact

Successfully implemented MD5 from scratch, cracked thousands of unsalted password hashes, demonstrated effectiveness of salted hashes in preventing rainbow table attacks.

How I Grew

Mastered cryptographic hash function properties and requirements

Learned practical password cracking techniques and defenses

Understood the critical importance of salt in password storage

Gained experience with time-space tradeoffs in cryptanalysis

Developed skills in optimizing cryptographic operations

Learned why MD5 is deprecated for security-critical applications

Try It: Hash Breaker

Experience firsthand how quickly weak passwords can be cracked through brute force attacks. This demo uses a simplified hash function (SHA-1 truncated to 32 chars) for demonstration purposes, as browsers don't natively support MD5. The principles remain the same: systematic enumeration until a match is found.

⚠️ Note: Real MD5 brute force attacks are much faster. This educational demo runs in your browser and may be slower than native implementations.


Understanding the Attack

• Character Set: Defines the pool of possible characters (a-z, A-Z, 0-9)

• Search Space: For length N with C characters: C^1 + C^2 + ... + C^N combinations

• Example: 4-char lowercase+digits (36 chars) = 36^1 + 36^2 + 36^3 + 36^4 = 1,727,604 combinations

• Defense: Use longer passwords (10+ chars), mix character types, and modern algorithms like bcrypt
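When the password length is unknown, the attacker has to cover every length up to the maximum, which is the sum given above. This short sketch computes it both as a direct sum and with the closed form of the geometric series; the function names are hypothetical.

```python
def cumulative_search_space(charset_size: int, max_length: int) -> int:
    """C^1 + C^2 + ... + C^N candidates for all lengths from 1 to N."""
    return sum(charset_size ** n for n in range(1, max_length + 1))

def cumulative_search_space_closed_form(charset_size: int, max_length: int) -> int:
    """Same total via the geometric series C * (C^N - 1) / (C - 1), valid for C > 1."""
    return charset_size * (charset_size ** max_length - 1) // (charset_size - 1)

if __name__ == "__main__":
    total = cumulative_search_space(36, 4)
    print(f"{total:,} candidates")  # 1,727,604, matching the example above
    print("closed form agrees:", total == cumulative_search_space_closed_form(36, 4))
```

The longest length dominates the total: of the 1,727,604 candidates, 1,679,616 are the 4-character ones.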
