Performance Benchmarks¶
Latest Benchmark Data: Updated February 2026 with measurements on both local SSD and network storage.
⚠️ DISCLAIMER: Benchmark results vary based on hardware, filesystem, and directory structure. Always run your own benchmarks on your target systems for accurate performance data specific to your use case.
Benchmark Methodology¶
Cache Management & Fair Testing¶
All benchmarks use controlled cache clearing to represent real-world performance:
- --clear-cache: Clears filesystem cache before each backend (prevents warm-cache bias)
- --warmup: Single warmup iteration before timed runs (eliminates cold-start artifacts)
- --shuffle: Randomizes backend execution order (prevents first-run advantage)
- Iterations: Multiple runs per backend, results reported as median
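The warmup, shuffle, and median steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code in benchmarks/benchmark.py: the function name `run_benchmark` and its parameters are hypothetical, and the cache-clearing step is left out because it needs elevated privileges.

```python
import random
import statistics
import time
from typing import Callable, Dict, Optional


def run_benchmark(
    backends: Dict[str, Callable[[], object]],
    iterations: int = 3,
    warmup: bool = True,
    shuffle: bool = True,
    seed: Optional[int] = None,
) -> Dict[str, float]:
    """Return the median wall-clock time in seconds for each backend.

    `backends` maps a label to a zero-argument callable that performs one scan.
    """
    rng = random.Random(seed)
    names = list(backends)
    if shuffle:
        rng.shuffle(names)  # avoid a systematic first-run advantage

    medians: Dict[str, float] = {}
    for name in names:
        scan = backends[name]
        if warmup:
            scan()  # one untimed run before the timed iterations
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            scan()
            timings.append(time.perf_counter() - start)
        # The median is less sensitive to a single slow outlier than the mean.
        medians[name] = statistics.median(timings)
    return medians
```

Each callable would wrap one backend's scan of the same directory, so every backend sees identical input.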
Test Datasets¶
| Dataset | Files | Storage | Hardware |
|---|---|---|---|
| XXLarge | 1,000,000 | Local NVMe SSD | MacBook Air M4 (16GB RAM) |
| XLarge | 200,000 | Network NFS | Linux VM: Intel Xeon Gold 6442Y (4 cores), 24GB RAM |
Performance Results¶
Latest Results (XXLarge Dataset, Local SSD, M4 MacBook Air 16GB)¶
1,000,000 files - XXLarge dataset on Apple Silicon - Median of 5 trials
Hardware: MacBook Air M4 (16GB RAM), Local NVMe SSD
Test Command:
uv run python benchmarks/benchmark.py \
--path /tmp/tmp_yk/bench-test \
--dataset-size xxlarge \
--backend profiling \
-n 5 --clear-cache --warmup --shuffle
Profiling Backends (Full Metadata Collection)¶
Backend │ Median Time │ Files/sec │ Relative Speed
─────────────┼─────────────┼────────────┼───────────────
Rust │ 7.335s │ 136,331 │ 1.00x (baseline)
Async │ 11.495s │ 86,994 │ 0.64x
Rust-seq │ 22.475s │ 44,495 │ 0.33x
fd │ 33.329s │ 30,004 │ 0.22x
Python │ 35.512s │ 28,160 │ 0.21x
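The Files/sec and Relative Speed columns follow from the median times. The sketch below recomputes them from the rounded medians in the table above, assuming Files/sec is total files divided by median time and Relative Speed is the baseline time divided by the backend time. The helper names are illustrative, and recomputed Files/sec values can differ from the table by a file or two, presumably because the table was built from unrounded medians.

```python
def files_per_sec(total_files: int, median_seconds: float) -> float:
    """Throughput: files processed per second of median wall-clock time."""
    return total_files / median_seconds


def relative_speed(baseline_seconds: float, backend_seconds: float) -> float:
    """Speed relative to the baseline (1.00 means as fast as the baseline)."""
    return baseline_seconds / backend_seconds


# Rounded median times copied from the profiling table above (1,000,000 files).
medians = {
    "Rust": 7.335,
    "Async": 11.495,
    "Rust-seq": 22.475,
    "fd": 33.329,
    "Python": 35.512,
}
baseline = min(medians.values())  # the fastest backend is the baseline

for name, seconds in medians.items():
    rate = files_per_sec(1_000_000, seconds)
    speed = relative_speed(baseline, seconds)
    print(f"{name:<9} {rate:>10,.0f} files/sec  {speed:.2f}x")
```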
Traversal Backends (Discovery Only)¶
Backend │ Median Time │ Files/sec │ Relative Speed
─────────────┼─────────────┼────────────┼───────────────
os.walk │ 0.565s │ 1,771,045 │ 1.00x (baseline)
rust-fast │ 4.385s │ 228,035 │ 0.13x
async-fast │ 10.839s │ 92,262 │ 0.05x
pathlib │ 19.305s │ 51,799 │ 0.03x
Key Observations (Local SSD):
- Rust is the fastest profiling backend at 1M files (7.3s vs 35.5s for Python, roughly 4.8x faster)
- os.walk is fastest for pure traversal (0.565s), confirming that Python's standard library is efficient for discovery
- Parallelism pays off on local SSD: parallel Rust is about 3x faster than sequential Rust-seq (7.3s vs 22.5s)
- Async reaches 0.64x of Rust's speed on local SSD, compared with 0.82x on network storage
- The fast-path variants of the parallel backends are much slower than os.walk for discovery (rust-fast 4.385s and async-fast 10.839s vs 0.565s)
Earlier Results (XLarge Dataset, Network Storage, Profiling)¶
200,000 files - Full metadata collection - Median of 3 trials
Test Command:
uv run python benchmarks/benchmark.py \
--path /XYZdata/tmp_yk/bench-test \
--dataset-size xlarge \
--backend profiling \
-n 3 --clear-cache --warmup --shuffle
Backend │ Median Time │ Files/sec │ Relative Speed
─────────────┼─────────────┼────────────┼───────────────
Rust │ 2.331s │ 85,807 │ 1.00x (baseline)
Async │ 2.842s │ 70,384 │ 0.82x
Rust-seq │ 8.584s │ 23,300 │ 0.27x
fd │ 14.386s │ 13,902 │ 0.16x
Python │ 15.146s │ 13,205 │ 0.15x
Key Observations:
- Rust parallel remains fastest for network storage
- Async backend (tokio) shows good performance on NFS with high-latency operations
- Sequential backends show their limitations at scale
- Results come from an actual network filesystem (/XYZdata) and reflect real-world performance
Traversal Performance (XLarge Dataset, Fast-Path Only)¶
200,000 files - File path discovery without metadata collection - Median of 3 trials
Test Command:
uv run python benchmarks/benchmark.py \
--path /XYZdata/tmp_yk/bench-test \
--dataset-size xlarge \
--backend traversal \
-n 3 --clear-cache --warmup --shuffle
Backend │ Median Time │ Files/sec │ Relative Speed
─────────────┼─────────────┼────────────┼───────────────
os.walk │ 0.412s │ 485,171 │ 1.00x (baseline)
rust-fast │ 0.640s │ 312,698 │ 0.64x
async-fast │ 2.742s │ 72,931 │ 0.15x
pathlib │ 9.933s │ 20,136 │ 0.04x
Key Observations:
- Rust fast-path is slower than pure os.walk for simple path discovery (0.64x)
- Python os.walk is very efficient for discovery-only workloads
- Async-fast introduces significant overhead when metadata collection is skipped
- Pathlib's recursive glob remains slowest at scale
- Fast-path variants show less advantage than profiling backends due to lower computational overhead
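The two pure-Python traversal baselines can be compared with the standard library alone. The sketch below is not the benchmark script's implementation: the helper names are illustrative, and the tiny temporary tree is only there to make the example runnable. Point `root` at a real directory to get meaningful timings.

```python
import os
import tempfile
import time
from pathlib import Path


def count_with_os_walk(root: str) -> int:
    """Count files by walking the tree with os.walk."""
    return sum(len(files) for _, _, files in os.walk(root))


def count_with_pathlib(root: str) -> int:
    """Count files with pathlib's recursive glob."""
    return sum(1 for p in Path(root).rglob("*") if p.is_file())


def timed(func, root: str):
    """Return (file_count, elapsed_seconds) for one traversal."""
    start = time.perf_counter()
    count = func(root)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Build a small tree: 20 directories with 50 empty files each.
        for d in range(20):
            sub = Path(root, f"dir_{d}")
            sub.mkdir()
            for f in range(50):
                (sub / f"file_{f}.txt").touch()

        for func in (count_with_os_walk, count_with_pathlib):
            count, elapsed = timed(func, root)
            print(f"{func.__name__}: {count} files in {elapsed:.4f}s")
```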
Key Insights¶
- 🦀 Rust excels at metadata collection - Fastest profiling backend on both storage types tested (local NVMe SSD and NFS)
- ⚡ Async is a strong alternative - Second-fastest profiling backend on both local and network storage
- 🐍 Python dominates pure discovery - os.walk is the fastest traversal-only backend on both storage types tested
- 🎯 Backend selection depends on task - Use Rust for metadata collection, os.walk for discovery-only
Backend Groups for Fair Comparisons¶
The benchmark tool separates backends into groups to ensure fair performance comparisons:
Profiling Backends¶
These backends perform full metadata collection (permissions, sizes, timestamps, etc.):
- Rust - Rayon parallel scanner, most optimized for metadata collection
- Rust-seq - Sequential Rust baseline for comparison
- Async - Tokio async scanner, excellent for high-latency network operations
- fd - External tool, traversal optimized but good reference point
- Python - Pure Python with os.walk, always available baseline
Use profiling backends to benchmark complete directory scanning with all metadata:
python benchmarks/benchmark.py /path -n 3 --backend profiling
Traversal Backends¶
These backends only discover file paths (fast-path mode - no metadata collection):
- os.walk - Python standard library baseline
- pathlib - Python pathlib.Path.rglob
- rust-fast - Rust with fast_path_only=True for pure discovery
- async-fast - Async with fast_path_only=True for pure discovery
Use traversal backends to measure raw discovery performance:
python benchmarks/benchmark.py /path -n 3 --backend traversal
This separation is important because:
- Fair comparisons - Profiling and traversal measure different capabilities
- Realistic expectations - Fast-path shows what's possible with just path discovery
- Use case matching - Choose comparison group matching your actual use case
- Methodology clarity - Eliminates confusion about what's being measured
Network Storage Notes¶
The XLarge results above come from network-mounted storage (/XYZdata) accessed over NFS:
- Rust parallel excels even on network storage, thanks to bounded concurrency and a work-queue pattern
- The Async backend is a competitive alternative with lower CPU overhead for high-latency operations
- fd trails both Rust and Async in the NFS profiling results above (14.386s, 0.16x of Rust)
Network I/O patterns are different from local storage—your results may vary based on NFS server, latency, and network configuration.
Benchmarking Best Practices¶
Accurate Performance Testing¶
import subprocess
import time

# DirectoryProfilerConfig is used below; it is assumed to be importable
# from the same module as DirectoryProfiler.
from filoma.directories import DirectoryProfiler, DirectoryProfilerConfig


def clear_filesystem_cache():
    """Clear the OS filesystem cache for realistic benchmarks (Linux only, needs sudo)."""
    subprocess.run(['sync'], check=True)  # Flush dirty pages to disk first
    subprocess.run(['sudo', 'tee', '/proc/sys/vm/drop_caches'],
                   input='3\n', text=True, stdout=subprocess.DEVNULL, check=True)
    time.sleep(1)  # Let cache clear settle


def benchmark_backend(backend_name, path, iterations=3):
    """Benchmark a specific backend with a cold cache before every iteration."""
    profiler = DirectoryProfiler(
        DirectoryProfilerConfig(search_backend=backend_name, show_progress=False)
    )

    # Check if the specific backend is available
    available = ((backend_name == "rust" and profiler.is_rust_available()) or
                 (backend_name == "fd" and profiler.is_fd_available()) or
                 (backend_name == "python"))  # Python always available
    if not available:
        return None

    times = []
    for _ in range(iterations):
        clear_filesystem_cache()
        start = time.perf_counter()  # Monotonic, high-resolution timer
        result = profiler.probe(path)
        times.append(time.perf_counter() - start)

    avg_time = sum(times) / len(times)
    total_files = result['summary']['total_files']
    return {
        'backend': backend_name,
        'avg_time': avg_time,
        'files_per_sec': total_files / avg_time,
        'total_files': total_files,
    }


# Example usage
results = []
for backend in ['rust', 'fd', 'python']:
    result = benchmark_backend(backend, '/test/directory')
    if result:
        results.append(result)
        print(f"{backend}: {result['avg_time']:.3f}s ({result['files_per_sec']:.0f} files/sec)")

# Find fastest
if results:
    fastest = min(results, key=lambda x: x['avg_time'])
    print(f"\n🏆 Fastest: {fastest['backend']}")
Performance Tips¶
- Disable progress bars for benchmarking: show_progress=False
- Use fast path only for discovery benchmarks: fast_path_only=True
- Clear filesystem cache between runs for realistic results
- Run multiple iterations and average the results
- Test on your target storage - results vary by filesystem type
Warm vs Cold Cache Comparison¶
# Cold cache (realistic)
clear_filesystem_cache()
start = time.time()
result = profiler.probe("/test/directory")
cold_time = time.time() - start
# Warm cache (for comparison only)
start = time.time()
result = profiler.probe("/test/directory")
warm_time = time.time() - start
print(f"Cold cache: {cold_time:.3f}s (realistic)")
print(f"Warm cache: {warm_time:.3f}s (cached, {cold_time/warm_time:.1f}x slower when cold)")
⚠️ Important: Always use cold cache for realistic benchmarks. Warm cache results can be 2-8x faster but don't represent real-world performance for first-time directory access.
Backend Selection Recommendations¶
| Use Case | Recommended Backend | Why |
|---|---|---|
| Large directories | Auto (Rust if available) | Best overall performance |
| Network filesystems | fd | Optimized for network I/O |
| CI/CD environments | Auto | Reliable with graceful fallbacks |
| Maximum compatibility | python | Always works, no dependencies |
| DataFrame analysis | Auto (Rust if available) | Fastest DataFrame building |
| Pattern matching | fd | Advanced regex/glob support |
Your Results May Vary¶
Performance depends on:
- Storage type - NVMe SSD > SATA SSD > HDD
- Filesystem - ext4, NTFS, APFS, NFS all behave differently
- Directory structure - Deep vs wide, file size distribution
- System load - CPU, memory, I/O contention
- Network latency - Critical for NFS/network storage
Run your own benchmarks on your target systems for accurate performance data.
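To see how directory structure affects your own numbers, it can help to generate synthetic trees of different shapes on the target storage. The sketch below is one possible way to do that with the standard library. `make_tree` is a hypothetical helper, not part of filoma or the benchmark script, and how the benchmark's own datasets are generated is not shown here.

```python
import tempfile
from pathlib import Path


def make_tree(root: Path, depth: int, dirs_per_level: int, files_per_dir: int) -> int:
    """Create a nested tree of empty files under `root`.

    Every directory (including `root`) gets `files_per_dir` files; each level
    above the deepest one gets `dirs_per_level` subdirectories.
    Returns the number of files created.
    """
    created = 0
    for i in range(files_per_dir):
        (root / f"file_{i}.dat").touch()
        created += 1
    if depth > 0:
        for d in range(dirs_per_level):
            child = root / f"dir_{d}"
            child.mkdir()
            created += make_tree(child, depth - 1, dirs_per_level, files_per_dir)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wide = Path(tmp, "wide")
        deep = Path(tmp, "deep")
        wide.mkdir()
        deep.mkdir()
        # Wide: one level with many sibling directories.
        n_wide = make_tree(wide, depth=1, dirs_per_level=50, files_per_dir=20)
        # Deep: a single chain of nested directories.
        n_deep = make_tree(deep, depth=10, dirs_per_level=1, files_per_dir=100)
        print(f"wide tree: {n_wide} files, deep tree: {n_deep} files")
```

Create the trees on the storage you intend to measure rather than in the system temp directory, since the temp directory may sit on a different filesystem.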