Performance Optimization Guide
Optimize MechanicsDSL simulations for maximum performance.
Numba JIT Acceleration
Use the Numba-accelerated solver for roughly 5-10x speedups over the SciPy solver:
from mechanics_dsl.solver_numba import NumbaSimulator
import sympy as sp
# Define equations
theta = sp.Symbol('theta')
g, l = sp.Symbol('g'), sp.Symbol('l')
accelerations = {'theta_ddot': -g/l * sp.sin(theta)}
# Create Numba simulator
sim = NumbaSimulator()
sim.set_parameters({'g': 9.81, 'l': 1.0})
sim.set_initial_conditions({'theta': 0.3, 'theta_dot': 0.0})
sim.compile_equations(accelerations, ['theta'])
# Run simulation (5-10x faster than SciPy)
solution = sim.simulate_numba(
    t_span=(0, 100),
    num_points=10000,
    method='rk4'  # 'euler', 'rk4', or 'rk45'
)
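To show what JIT compilation buys, the sketch below uses Numba's `@njit` decorator to compile a fixed-step RK4 loop for the same pendulum. It is a standalone illustration, not NumbaSimulator's actual implementation. The function name `pendulum_rk4` is hypothetical, and the block falls back to plain Python if Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed.
    def njit(func):
        return func


@njit
def pendulum_rk4(theta0, omega0, g, l, t_end, num_points):
    """Fixed-step RK4 for theta'' = -(g/l) * sin(theta).

    Hypothetical helper for illustration; returns (t, theta, omega) arrays.
    """
    t = np.linspace(0.0, t_end, num_points)
    theta = np.empty(num_points)
    omega = np.empty(num_points)
    theta[0] = theta0
    omega[0] = omega0
    k = g / l
    for i in range(num_points - 1):
        h = t[i + 1] - t[i]
        th = theta[i]
        om = omega[i]
        # Four slope evaluations of the state (theta, omega)
        k1_th = om
        k1_om = -k * math.sin(th)
        k2_th = om + 0.5 * h * k1_om
        k2_om = -k * math.sin(th + 0.5 * h * k1_th)
        k3_th = om + 0.5 * h * k2_om
        k3_om = -k * math.sin(th + 0.5 * h * k2_th)
        k4_th = om + h * k3_om
        k4_om = -k * math.sin(th + h * k3_th)
        theta[i + 1] = th + h / 6.0 * (k1_th + 2.0 * k2_th + 2.0 * k3_th + k4_th)
        omega[i + 1] = om + h / 6.0 * (k1_om + 2.0 * k2_om + 2.0 * k3_om + k4_om)
    return t, theta, omega


# The first call triggers compilation; later calls with the same argument types reuse it.
t, theta, omega = pendulum_rk4(0.3, 0.0, 9.81, 1.0, 100.0, 10000)
```

Because the whole loop is compiled, Python's per-iteration interpreter overhead disappears. This is where the speedup of a JIT solver over a pure-Python right-hand side comes from.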
Available Methods
euler: Simple Euler (fastest, least accurate)
rk4: 4th-order Runge-Kutta (recommended)
rk45: Adaptive Dormand-Prince (most accurate)
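You can check the accuracy ordering with a quick experiment. Integrate the pendulum from the example with forward Euler and with RK4 at the same step size, then compare how far the total energy drifts (it should stay constant). This is a plain-Python sketch with hypothetical helper names, not the solver's internal code, and it does not cover the adaptive rk45 method.

```python
import math

G_OVER_L = 9.81 / 1.0  # g/l from the pendulum example above


def accel(theta):
    """Angular acceleration theta'' = -(g/l) * sin(theta)."""
    return -G_OVER_L * math.sin(theta)


def euler_step(theta, omega, h):
    # Forward Euler: both updates use the state at the start of the step.
    return theta + h * omega, omega + h * accel(theta)


def rk4_step(theta, omega, h):
    # 4th-order Runge-Kutta on the state (theta, omega).
    k1t, k1w = omega, accel(theta)
    k2t, k2w = omega + 0.5 * h * k1w, accel(theta + 0.5 * h * k1t)
    k3t, k3w = omega + 0.5 * h * k2w, accel(theta + 0.5 * h * k2t)
    k4t, k4w = omega + h * k3w, accel(theta + h * k3t)
    return (
        theta + h / 6 * (k1t + 2 * k2t + 2 * k3t + k4t),
        omega + h / 6 * (k1w + 2 * k2w + 2 * k3w + k4w),
    )


def energy(theta, omega):
    """Energy per unit m*l**2; constant for the exact solution."""
    return 0.5 * omega**2 + G_OVER_L * (1 - math.cos(theta))


def relative_energy_drift(step, theta=0.3, omega=0.0, h=0.01, n=10_000):
    """Integrate n steps with `step` and return |E_end - E_start| / E_start."""
    e0 = energy(theta, omega)
    for _ in range(n):
        theta, omega = step(theta, omega, h)
    return abs(energy(theta, omega) - e0) / e0


euler_drift = relative_energy_drift(euler_step)
rk4_drift = relative_energy_drift(rk4_step)
print(f"euler: {euler_drift:.2e}  rk4: {rk4_drift:.2e}")
```

Forward Euler steadily pumps energy into the oscillator, so its drift comes out orders of magnitude larger than RK4's at the same step size. RK4 costs four right-hand-side evaluations per step instead of one.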
GPU Acceleration with CUDA
For massive parallelism on NVIDIA GPUs:
Generate CUDA code:
from mechanics_dsl.codegen import CudaGenerator
gen = CudaGenerator(...)
gen.generate("cuda_output/")
Build with CMake (requires the CUDA toolkit, which provides nvcc):
cd cuda_output
mkdir build && cd build
cmake ..
make
Run on GPU:
./simulation_cuda
CPU Fallback
If no NVIDIA GPU is available, use the CPU version:
./simulation_cpu
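A launcher script can choose between the two binaries automatically. The sketch below treats a successful `nvidia-smi -L` listing as evidence of a usable NVIDIA GPU. That heuristic, the helper names, and the assumption that both executables sit in `cuda_output/build/` are illustrative, not part of the generated project.

```python
import shutil
import subprocess


def has_nvidia_gpu() -> bool:
    """Heuristic: nvidia-smi is on PATH and lists at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        result = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


def pick_binary(build_dir="cuda_output/build") -> str:
    """Path of the CUDA binary if a GPU is detected, else the CPU fallback (assumed layout)."""
    name = "simulation_cuda" if has_nvidia_gpu() else "simulation_cpu"
    return f"{build_dir}/{name}"
```

Usage: `subprocess.run([pick_binary()], check=True)` after the build step has produced both executables.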
Multi-Core Parallelism with OpenMP
For multi-core CPU simulation:
from mechanics_dsl.codegen import OpenMPGenerator
gen = OpenMPGenerator(
...,
num_threads=8 # 0 = auto-detect
)
gen.generate("simulation_openmp.cpp")
Compile with:
g++ -fopenmp -O3 -march=native -o simulation simulation_openmp.cpp
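With `num_threads=0` (auto-detect), you can usually set the thread count at launch time through the standard `OMP_NUM_THREADS` environment variable. The sketch below compiles with the command shown above and runs the binary with that variable set. Whether the generated program honours the variable is an assumption, and the helper names are hypothetical.

```python
import os
import subprocess


def openmp_env(num_threads: int) -> dict:
    """Copy of the current environment with OMP_NUM_THREADS set.

    num_threads <= 0 removes the variable so the OpenMP runtime picks its default.
    """
    env = dict(os.environ)
    if num_threads > 0:
        env["OMP_NUM_THREADS"] = str(num_threads)
    else:
        env.pop("OMP_NUM_THREADS", None)
    return env


def compile_and_run(source="simulation_openmp.cpp", exe="./simulation", num_threads=0):
    """Hypothetical helper: build with the g++ flags above, then run the binary."""
    subprocess.run(
        ["g++", "-fopenmp", "-O3", "-march=native", "-o", exe, source],
        check=True,
    )
    subprocess.run([exe], check=True, env=openmp_env(num_threads))
```

Usage: call `compile_and_run(num_threads=8)` from the directory that contains the generated source.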
Memory Optimization
For large particle simulations:
Use float32 instead of float64 where precision allows
Use a Structure of Arrays (SoA) layout for cache efficiency
Use spatial hashing for O(n) neighbor search (when particle density is bounded)
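These techniques combine naturally, as the NumPy sketch below shows. It stores 2D particle positions as separate float32 arrays (SoA). It then finds all pairs within a cutoff radius using a uniform-grid spatial hash whose cell size equals the radius, so each particle only checks its own cell and the eight surrounding ones. The function names are hypothetical and this is not MechanicsDSL's implementation. The O(n) behaviour assumes bounded density, so that each cell holds a roughly constant number of particles.

```python
from collections import defaultdict

import numpy as np

# Structure of Arrays: one contiguous float32 array per field
# (instead of a list of per-particle objects), half the memory of float64.
n = 1000
rng = np.random.default_rng(0)
pos_x = rng.random(n, dtype=np.float32)
pos_y = rng.random(n, dtype=np.float32)


def build_grid(x, y, cell):
    """Hash each particle index into the integer grid cell that contains it."""
    cx = np.floor(x / cell).astype(np.int64).tolist()
    cy = np.floor(y / cell).astype(np.int64).tolist()
    grid = defaultdict(list)
    for i in range(len(cx)):
        grid[(cx[i], cy[i])].append(i)
    return grid, cx, cy


def neighbor_pairs(x, y, radius):
    """All index pairs (i, j), i < j, whose float32 squared distance is below radius**2."""
    grid, cx, cy = build_grid(x, y, radius)
    r2 = np.float32(radius) * np.float32(radius)
    pairs = []
    for i in range(len(cx)):
        # With cell size == radius, a neighbor can only be in this cell or an adjacent one.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx[i] + dx, cy[i] + dy), ()):
                    if j <= i:
                        continue
                    ddx = x[i] - x[j]
                    ddy = y[i] - y[j]
                    if ddx * ddx + ddy * ddy < r2:
                        pairs.append((i, j))
    return pairs


pairs = neighbor_pairs(pos_x, pos_y, 0.05)
print(f"{len(pairs)} pairs; positions use {pos_x.nbytes + pos_y.nbytes} bytes")
```

A brute-force check compares every pair, which is O(n^2). The grid version only touches nearby cells, so its cost grows with n times the typical neighbor count.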
Benchmarking
Run the included benchmark:
cd benchmarks
python numba_performance.py
Example output (absolute timings vary with hardware):
| Points | Numba | SciPy | Speedup |
|---|---|---|---|
| 1,000 | 5 ms | 45 ms | 9x |
| 10,000 | 50 ms | 450 ms | 9x |
| 100,000 | 500 ms | 4500 ms | 9x |
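To reproduce a comparison like this for your own models, time each solver on identical inputs and report the median of several runs. The sketch below is a generic harness, not the included `numba_performance.py`. It runs warm-up calls first so that one-off costs such as Numba's JIT compilation are not counted. Speedup is baseline time divided by candidate time, so 45 ms / 5 ms gives 9x.

```python
import statistics
import time


def benchmark(fn, *args, repeats=5, warmup=1, **kwargs):
    """Median wall-clock time of fn(*args, **kwargs) in seconds.

    Warm-up calls are excluded from timing (the first call of a Numba
    function includes JIT compilation).
    """
    for _ in range(warmup):
        fn(*args, **kwargs)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


def slow_sum_of_squares(n):
    # Stand-in workload; replace with a SciPy run and a Numba run of the same model.
    total = 0
    for i in range(n):
        total += i * i
    return total


median_s = benchmark(slow_sum_of_squares, 100_000)
print(f"median: {median_s * 1e3:.2f} ms")
```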
Best Practices
Start with Python for debugging
Profile first to identify bottlenecks
Use Numba for quick wins (minimal code changes)
Generate C++/CUDA for production
Batch simulations with OpenMP for parameter sweeps
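For the "profile first" step, Python's built-in `cProfile` module shows where time goes before you decide what to accelerate. The helper below is a minimal sketch (its name and the stand-in workload are hypothetical). It runs one call under the profiler and returns the top entries sorted by cumulative time.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, top=10, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report), where report lists the `top` entries
    sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


def rhs(theta):
    # Stand-in for an expensive right-hand-side evaluation.
    return sum(i * theta for i in range(1000))


def simulate(steps):
    return [rhs(0.3) for _ in range(steps)]


result, report = profile_call(simulate, 50)
print(report)
```

If most of the cumulative time sits in the right-hand-side evaluation, that is the part worth moving to Numba or generated C++/CUDA.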