Property-Based and Load/Stress Testing
This document describes the property-based testing and load/stress testing infrastructure added to the Rhiza project.
Overview
Rhiza now includes two additional types of testing:
- Property-Based Testing (using Hypothesis) - Tests that verify properties hold across a wide range of generated inputs
- Load/Stress Testing (using pytest-benchmark) - Tests that measure performance and verify stability under load
README Code Block Testing
The test file .rhiza/tests/sync/test_readme_validation.py automatically executes every python
code block in README.md and checks that its output matches the adjacent ```result block. It
also syntax-checks every bash block using bash -n.
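The pairing logic can be sketched as follows. This is a simplified illustration of the idea, not the actual implementation in test_readme_validation.py; the function name `check_readme` and the exact regex are assumptions:

```python
# Simplified sketch of README block validation (illustrative, not the real
# test code): pair each python fence with the result fence that follows it
# and compare the block's stdout against the expected output.
import re
import subprocess
import sys

FENCE = re.compile(r"```(\w+)([^\n]*)\n(.*?)```", re.DOTALL)

def check_readme(text: str) -> list[tuple[str, str]]:
    """Return (expected, actual) output pairs for python blocks with a result block."""
    blocks = FENCE.findall(text)
    pairs = []
    for i, (lang, flags, body) in enumerate(blocks):
        if lang != "python" or "+RHIZA_SKIP" in flags:
            continue
        # Run the block in a fresh interpreter and capture its stdout.
        actual = subprocess.run(
            [sys.executable, "-c", body], capture_output=True, text=True
        ).stdout.strip()
        # If the next fence is a result block, record the comparison pair.
        if i + 1 < len(blocks) and blocks[i + 1][0] == "result":
            pairs.append((blocks[i + 1][2].strip(), actual))
    return pairs
```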
Skipping individual blocks with +RHIZA_SKIP
To exclude a specific code block from being executed or syntax-checked, append +RHIZA_SKIP to
the opening fence line:
```python +RHIZA_SKIP
# This block will NOT be executed or syntax-checked by the readme tests.
# Use it for illustrative examples, environment-specific code, or incomplete snippets.
from my_env import some_function
some_function()
```
```bash +RHIZA_SKIP
# This bash block will NOT be syntax-checked.
run-something --only-on-ci
```
Markdown renderers (including GitHub) treat the first word on a fence line as the language and ignore the rest, so the block still renders as a normal syntax-highlighted code block. All blocks that do not carry +RHIZA_SKIP continue to be validated as before.
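Detecting the marker only requires inspecting the opening fence line. A minimal sketch (the helper name `fence_info` is illustrative, not part of the test suite):

```python
# Parse an opening fence line into (language, skipped). The +RHIZA_SKIP
# marker is any later word on the line, mirroring the fences shown above.
import re

def fence_info(line: str):
    """Return (language, skipped) for an opening fence line, else None."""
    m = re.match(r"^```(\w+)\b(.*)$", line)
    if not m:
        return None
    lang, rest = m.group(1), m.group(2)
    return lang, "+RHIZA_SKIP" in rest
```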
Property-Based Testing
Property-based tests use the Hypothesis library to automatically generate test cases that verify certain properties always hold true.
Locations
In a standard Rhiza project, there are two relevant locations for property-based tests:
- Project property-based tests live in
tests/property/. These are part of your normal test suite and are discovered by pytest via pytest.ini (testpaths = tests).
- Rhiza's own template/internal property-based tests (if present) live in .rhiza/tests/property/. These are not part of your project's main test suite by default.
Running Property-Based Tests
In a typical setup, the Make targets map to these suites as follows:
- make test: runs pytest over everything under tests/ (including tests/property/).
- make rhiza-test: runs Rhiza's internal tests under .rhiza/tests/ (including .rhiza/tests/property/ if any exist).
You can also invoke the corresponding pytest commands directly:
# Run all project property-based tests (what make test covers)
pytest tests/property/ -v
# Run Rhiza's internal/template property-based tests (if you have any in .rhiza)
pytest .rhiza/tests/property/ -v
# Run project property-based tests with more examples (increase coverage)
pytest tests/property/ -v --hypothesis-max-examples=1000
# Run project property-based tests with verbose Hypothesis output
pytest tests/property/ -v --hypothesis-verbosity=verbose
Example Tests
The following property-based tests are included as examples:
Generic Property Tests
- test_sort_correctness_using_properties: Verifies that sorted() correctly orders lists and preserves all elements including duplicates
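A property of this shape can be sketched with Hypothesis as follows. The test below is illustrative (its name differs from the actual Rhiza test), showing the two invariants: ordering and element preservation:

```python
# Illustrative property-based test for sorted(): Hypothesis generates many
# integer lists; we assert ordering and that the multiset of elements
# (including duplicates) is preserved.
from collections import Counter
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorted_orders_and_preserves_elements(xs):
    out = sorted(xs)
    # Ordering: each element is <= its successor.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Permutation: the multiset of elements is unchanged.
    assert Counter(out) == Counter(xs)
```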
Load/Stress Testing
Load and stress tests use pytest-benchmark to measure performance and verify system stability under load.
Location
Benchmark and stress tests are located in tests/benchmarks/.
Running Benchmark Tests
# Run all benchmarks
make benchmark
# Or with pytest directly
pytest tests/benchmarks/ -v
# Run benchmarks and generate histogram
pytest tests/benchmarks/ --benchmark-histogram=_tests/benchmarks/histogram
# Run benchmarks and save results
pytest tests/benchmarks/ --benchmark-json=_tests/benchmarks/results.json
# Skip benchmarks (for CI)
pytest tests/benchmarks/ --benchmark-skip
# Run only stress tests (note: these don't run with make benchmark by default)
pytest tests/benchmarks/ -m stress -v
# Skip stress tests (run only performance benchmarks)
pytest tests/benchmarks/ -m "not stress" -v
Note: The make benchmark target runs with --benchmark-only, which means stress tests (that don't use the benchmark fixture) will be skipped. To run stress tests explicitly, use pytest tests/benchmarks/ -m stress -v.
Benchmark Test Categories
1. Makefile Performance
Tests that measure the performance of common Makefile operations:
- test_help_target_performance - Measures help target execution time
- test_print_variable_performance - Measures variable printing performance
- test_dry_run_install_performance - Measures dry-run install performance
- test_makefile_parsing_overhead - Measures Makefile parsing overhead
2. File System Operations
Tests that benchmark file system operations:
- test_directory_traversal_performance - Measures directory traversal speed
- test_file_reading_performance - Measures file reading performance
- test_multiple_file_checks_performance - Measures file existence checking
3. Subprocess Overhead
Tests that measure subprocess creation overhead:
- test_subprocess_creation_overhead - Measures subprocess creation time
- test_git_command_performance - Measures git command execution time
4. Stress Scenarios
Tests that verify stability under load (marked with @pytest.mark.stress):
- test_repeated_help_invocations - Stress tests repeated help invocations (100 iterations)
- test_concurrent_print_variable_stress - Tests concurrent Makefile invocations (deterministic)
- test_file_system_stress - Tests rapid file creation/deletion (100 iterations)
Note: Stress tests can be slow and are marked with the stress marker. They don't use the benchmark fixture, so they won't run with make benchmark (which uses --benchmark-only). Use pytest tests/benchmarks/ -m stress -v to run them explicitly.
Understanding Benchmark Results
Benchmark output includes:
- Min/Max: minimum and maximum execution times
- Mean: average execution time
- StdDev: standard deviation (consistency)
- Median: median execution time
- IQR: interquartile range
- Outliers: number of outlier measurements
- OPS: operations per second (1/Mean)
Example output:
--------------------------------------------------- benchmark: 1 tests ---------------------------------------------------
Name (time in ms) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------
test_help_target_performance 16.5255 18.0592 16.9294 0.3194 16.8354 0.4791 15;1 59.0689 55 1
--------------------------------------------------------------------------------------------------------------------------
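As a sanity check, OPS is just the reciprocal of the mean. Using the Mean from the example row above:

```python
# OPS = 1 / Mean: reproduce the OPS column from the example row.
mean_ms = 16.9294               # Mean from the table, in milliseconds
ops = 1 / (mean_ms / 1000.0)    # convert ms -> s, then take the reciprocal
# ops is approximately 59.07, matching the OPS column.
```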
Integration with CI/CD
GitHub Actions Integration
The benchmark tests are integrated with GitHub Actions via .github/workflows/rhiza_benchmarks.yml:
- Runs benchmarks on every push to main and pull requests
- Stores historical benchmark data in the gh-pages branch
- Alerts on performance regressions > 150%
- Posts warnings to PRs for performance degradation
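Such a workflow is commonly wired up with the benchmark-action/github-action-benchmark action. The snippet below is a hedged sketch of that pattern, not the contents of rhiza_benchmarks.yml; consult the actual workflow file for the real configuration:

```yaml
# Illustrative benchmark workflow sketch (not the real rhiza_benchmarks.yml).
on:
  push:
    branches: [main]
  pull_request:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/benchmarks/ --benchmark-json=results.json
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: pytest
          output-file-path: results.json
          gh-pages-branch: gh-pages      # historical data storage
          alert-threshold: "150%"        # regression alert threshold
          comment-on-alert: true         # post warnings to PRs
```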
Running in CI
The property-based tests run as part of the regular test suite (make test). Benchmarks can be run separately (make benchmark) or as part of validation.
Best Practices
Writing Property-Based Tests
- Focus on invariants: Test properties that should always hold true
- Use appropriate strategies: Choose Hypothesis strategies that generate realistic inputs
- Keep tests fast: Property tests run multiple times, so keep them quick
- Test edge cases: Use the @example decorator to test specific known edge cases
Example:
from hypothesis import given, strategies as st, example
@given(version=st.from_regex(r"^\d+\.\d+\.\d+$", fullmatch=True))
@example(version="0.0.0") # Test specific edge case
def test_version_parsing(version):
    parts = version.split(".")
    assert len(parts) == 3
    assert all(p.isdigit() for p in parts)
Writing Benchmark Tests
- Benchmark real operations: Test actual operations users will perform
- Use fixtures wisely: Use module or session-scoped fixtures for expensive setup
- Test multiple scenarios: Benchmark best case, average case, and worst case
- Monitor trends: Track benchmark results over time to detect regressions
Example:
def test_operation_performance(benchmark):
    def run_operation():
        # Operation to benchmark
        return perform_operation()
    result = benchmark(run_operation)
    assert result is not None
Writing Stress Tests
- Test realistic scenarios: Simulate real-world usage patterns
- Set reasonable thresholds: Allow small failure rates for resource contention
- Test concurrency: Use ThreadPoolExecutor or ProcessPoolExecutor for concurrent tests
- Monitor resource usage: Consider memory, CPU, and I/O in addition to time
Example:
def test_concurrent_operations(root):
    import concurrent.futures

    def operation():
        # Operation to stress test
        return perform_operation()

    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(operation) for _ in range(100)]
        results = [f.result() for f in concurrent.futures.as_completed(futures)]
    success_rate = sum(results) / len(results)
    assert success_rate == 1.0  # Rhiza template stress tests require 100% success
Dependencies
The following dependencies are required for property-based and load/stress testing:
# Property-based testing
hypothesis>=6.150.0
# Benchmarking and performance testing
pytest-benchmark>=5.2.3
pygal>=3.1.0
These are automatically installed when running make install or by installing from .rhiza/requirements/tests.txt.
Troubleshooting
Hypothesis Hangs or Times Out
If Hypothesis tests hang, you can:
1. Reduce the number of examples: pytest --hypothesis-max-examples=10
2. Set a deadline: pytest --hypothesis-deadline=1000
3. Use the CI profile: pytest --hypothesis-profile=ci
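A profile has to be registered (typically in conftest.py) before --hypothesis-profile can select it. A minimal sketch, assuming the profile names and values shown here (they are illustrative, not Rhiza's actual settings):

```python
# conftest.py sketch: register Hypothesis profiles for the
# --hypothesis-profile flag to select. Names and values are illustrative.
from hypothesis import settings

# Fast CI profile: few examples, generous 1000 ms per-example deadline.
settings.register_profile("ci", max_examples=10, deadline=1000)

# Thorough profile for local or nightly runs.
settings.register_profile("thorough", max_examples=1000)
```

Selecting a profile at run time then becomes `pytest --hypothesis-profile=ci`.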
Benchmarks Vary Too Much
If benchmark results have high variance:
1. Close other applications to reduce system load
2. Increase the number of rounds: pytest --benchmark-min-rounds=10
3. Run on a consistent environment (CI is preferred for accurate benchmarks)
Stress Tests Fail
If stress tests fail occasionally:
1. Check system resources (memory, CPU)
2. Increase the acceptable failure rate if resource contention is expected
3. Reduce the iteration count for local development