Performance Optimization Guide

This guide covers ways to speed up MechanicsDSL simulations: Numba JIT compilation, CUDA and OpenMP code generation, memory layout choices, and benchmarking.

Numba JIT Acceleration

The Numba-accelerated solver typically runs 5-10x faster than the SciPy-based solver:

from mechanics_dsl.solver_numba import NumbaSimulator
import sympy as sp

# Define equations
theta = sp.Symbol('theta')
g, l = sp.Symbol('g'), sp.Symbol('l')

accelerations = {'theta_ddot': -g/l * sp.sin(theta)}

# Create Numba simulator
sim = NumbaSimulator()
sim.set_parameters({'g': 9.81, 'l': 1.0})
sim.set_initial_conditions({'theta': 0.3, 'theta_dot': 0.0})
sim.compile_equations(accelerations, ['theta'])

# Run simulation (5-10x faster than SciPy)
solution = sim.simulate_numba(
    t_span=(0, 100),
    num_points=10000,
    method='rk4'  # 'euler', 'rk4', or 'rk45'
)

Available Methods

  • euler - Simple Euler (fastest, least accurate)

  • rk4 - 4th order Runge-Kutta (recommended)

  • rk45 - Adaptive Dormand-Prince (most accurate)
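The accuracy gap between the fixed-step methods can be illustrated without the library. The following is a minimal plain-Python sketch, not MechanicsDSL's own integrator code: it steps the same pendulum (g = 9.81, l = 1.0, theta = 0.3) with hand-written Euler and RK4 updates and compares how far the total energy drifts. The helper names (`deriv`, `euler_step`, `rk4_step`, `energy_drift`) are hypothetical and exist only for this illustration.

```python
import math

G, L = 9.81, 1.0  # same parameter values as the pendulum example

def deriv(theta, omega):
    """Pendulum right-hand side: returns (theta_dot, omega_dot)."""
    return omega, -G / L * math.sin(theta)

def euler_step(theta, omega, dt):
    """One explicit Euler step (1st order)."""
    dth, dom = deriv(theta, omega)
    return theta + dt * dth, omega + dt * dom

def rk4_step(theta, omega, dt):
    """One classical Runge-Kutta step (4th order)."""
    k1t, k1o = deriv(theta, omega)
    k2t, k2o = deriv(theta + 0.5 * dt * k1t, omega + 0.5 * dt * k1o)
    k3t, k3o = deriv(theta + 0.5 * dt * k2t, omega + 0.5 * dt * k2o)
    k4t, k4o = deriv(theta + dt * k3t, omega + dt * k3o)
    theta += dt / 6.0 * (k1t + 2 * k2t + 2 * k3t + k4t)
    omega += dt / 6.0 * (k1o + 2 * k2o + 2 * k3o + k4o)
    return theta, omega

def energy(theta, omega):
    """Total energy per unit mass; conserved by the exact solution."""
    return 0.5 * (L * omega) ** 2 + G * L * (1.0 - math.cos(theta))

def energy_drift(step, t_end=10.0, n_steps=1000):
    """Relative energy error after integrating to t_end with a fixed step."""
    theta, omega = 0.3, 0.0
    e0 = energy(theta, omega)
    dt = t_end / n_steps
    for _ in range(n_steps):
        theta, omega = step(theta, omega, dt)
    return abs(energy(theta, omega) - e0) / e0

euler_drift = energy_drift(euler_step)
rk4_drift = energy_drift(rk4_step)
print(f"euler: {euler_drift:.2e}, rk4: {rk4_drift:.2e}")
```

With the same step size, Euler's energy error is many orders of magnitude larger than RK4's, while RK4 costs four right-hand-side evaluations per step instead of one. That trade-off is why rk4 is a sensible default when a fixed step is acceptable.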

GPU Acceleration with CUDA

For massive parallelism on NVIDIA GPUs:

  1. Generate CUDA code:

from mechanics_dsl.codegen import CudaGenerator
gen = CudaGenerator(...)
gen.generate("cuda_output/")

  2. Compile with nvcc:

cd cuda_output
mkdir build && cd build
cmake ..
make

  3. Run on GPU:

./simulation_cuda

CPU Fallback

If no NVIDIA GPU is available, use the CPU version:

./simulation_cpu
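A launcher script can choose between the two binaries automatically. The sketch below is one possible approach and rests on an assumption: it treats the presence of the `nvidia-smi` tool on the PATH as a rough proxy for a usable NVIDIA GPU, which is a heuristic rather than a guarantee. `pick_binary` is a hypothetical helper, not part of MechanicsDSL.

```python
import shutil

def pick_binary(cuda_binary="./simulation_cuda", cpu_binary="./simulation_cpu"):
    """Return the CUDA binary if NVIDIA driver tooling is on PATH, else the CPU one.

    Assumption: finding `nvidia-smi` is only a heuristic for GPU availability.
    """
    has_gpu_tooling = shutil.which("nvidia-smi") is not None
    return cuda_binary if has_gpu_tooling else cpu_binary

chosen = pick_binary()
print(chosen)
```

The returned path can then be passed to `subprocess.run([chosen], check=True)` from the build directory.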

Multi-Core Parallelism with OpenMP

To generate a multi-threaded C++ simulation for multi-core CPUs:

from mechanics_dsl.codegen import OpenMPGenerator

gen = OpenMPGenerator(
    ...,
    num_threads=8  # 0 = auto-detect
)
gen.generate("simulation_openmp.cpp")

Compile with:

g++ -fopenmp -O3 -march=native -o simulation simulation_openmp.cpp

Memory Optimization

For large particle simulations:

  1. Use float32 instead of float64 where precision allows (half the memory per value)

  2. Use a Structure of Arrays (SoA) layout for cache efficiency

  3. Use spatial hashing for O(n) neighbor search instead of O(n^2) all-pairs checks
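The three points above can be sketched together in NumPy. This is an illustrative example, not the data layout MechanicsDSL generates: coordinates are stored as one float32 array per axis (SoA), and a uniform grid keyed by integer cell indices limits each neighbor query to the surrounding 3x3 block of cells. `build_grid` and `neighbors` are hypothetical names, and the cell size is assumed to equal the search radius.

```python
from collections import defaultdict
import numpy as np

def build_grid(x, y, cell):
    """Map each (ix, iy) cell to the indices of the particles inside it. O(n)."""
    ix = np.floor(x / cell).astype(np.int64)
    iy = np.floor(y / cell).astype(np.int64)
    grid = defaultdict(list)
    for i, key in enumerate(zip(ix.tolist(), iy.tolist())):
        grid[key].append(i)
    return grid, ix, iy

def neighbors(i, x, y, grid, ix, iy, radius):
    """Indices within `radius` of particle i, checking only the 3x3 cell block."""
    xi, yi = float(x[i]), float(y[i])
    cx, cy = int(ix[i]), int(iy[i])
    found = []
    for ox in (-1, 0, 1):
        for oy in (-1, 0, 1):
            for j in grid.get((cx + ox, cy + oy), ()):
                dx, dy = float(x[j]) - xi, float(y[j]) - yi
                if j != i and dx * dx + dy * dy <= radius * radius:
                    found.append(j)
    return sorted(found)

rng = np.random.default_rng(0)
n, radius = 2000, 0.05

# Structure of Arrays: one contiguous float32 array per coordinate
x = rng.random(n, dtype=np.float32)
y = rng.random(n, dtype=np.float32)
print(x.nbytes, x.astype(np.float64).nbytes)  # 8000 bytes vs 16000 bytes

grid, ix, iy = build_grid(x, y, cell=radius)
hashed = neighbors(0, x, y, grid, ix, iy, radius)
print(len(hashed), "neighbors of particle 0")
```

Building the grid takes one pass over the particles, and each query touches only nearby cells, so total work grows roughly linearly with n when the particle density per cell stays bounded.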

Benchmarking

Run the included benchmark:

cd benchmarks
python numba_performance.py

Expected output:

Points     Numba     SciPy      Speedup
1,000      5 ms      45 ms      9x
10,000     50 ms     450 ms     9x
100,000    500 ms    4500 ms    9x
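The benchmark script itself is not reproduced here. As a rough sketch of how comparable timings might be collected for your own models, the hypothetical helpers below take the best wall-clock time over several runs and compute a speedup ratio. The workload is a stand-in and should be replaced with calls to the two solvers being compared.

```python
import time

def best_of(func, repeats=5):
    """Return the fastest wall-clock time (in ms) over `repeats` calls of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, (time.perf_counter() - start) * 1e3)
    return best

def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

# Stand-in workload; replace with the solver calls under test.
elapsed_ms = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"{elapsed_ms:.2f} ms")

# Figures quoted above for 1,000 points: SciPy 45 ms, Numba 5 ms
print(speedup(45.0, 5.0))  # 9.0
```

When timing JIT-compiled code, make one untimed warm-up call first so that compilation cost is not counted as run time.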

Best Practices

  1. Start with Python for debugging

  2. Profile first to identify bottlenecks

  3. Use Numba for quick wins (no code changes)

  4. Generate C++/CUDA for production

  5. Batch simulations with OpenMP for parameter sweeps
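For the "profile first" step, Python's standard `cProfile` module is usually enough to find the hot spot before reaching for Numba or generated code. The sketch below profiles a stand-in workload; `run_simulation` and `slow_rhs` are placeholders invented for this example and should be swapped for the real simulation entry point.

```python
import cProfile
import io
import pstats

def slow_rhs(n):
    """Stand-in hot spot: a deliberately slow pure-Python loop."""
    total = 0.0
    for i in range(n):
        total += (i % 7) * 0.5
    return total

def run_simulation():
    """Stand-in for a simulation call; replace with the real entry point."""
    return sum(slow_rhs(20_000) for _ in range(20))

profiler = cProfile.Profile()
profiler.enable()
run_simulation()
profiler.disable()

# Collect the ten most expensive entries by cumulative time
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Functions that dominate the cumulative-time column are the ones worth accelerating; if the integrator's right-hand side is at the top, the Numba solver or generated C++/CUDA code is likely to pay off.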