Performance Optimization

Tips and techniques for maximizing simulation performance.

Profiling Your Simulation

Before optimizing, identify bottlenecks:

from mechanics_dsl import PhysicsCompiler
from mechanics_dsl.utils.profiling import profile_simulation

compiler = PhysicsCompiler()
compiler.compile_dsl(source)  # source: string holding your DSL program

# Profile the simulation
with profile_simulation() as prof:
    solution = compiler.simulate(t_span=(0, 10))

# Print timing breakdown
prof.print_stats()

Typical output:

MechanicsDSL Profiling Report
=============================
Compilation:     0.234 s (12.3%)
Simulation:      1.567 s (82.4%)
  - RHS evals:   1.234 s (64.9%)
  - Integration: 0.333 s (17.5%)
Visualization:   0.100 s (5.3%)
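To see where a figure like the "RHS evals" share comes from, the measurement can be sketched with the standard library alone. The names TimedRHS and euler_integrate below are hypothetical helpers written for this illustration; they are not part of mechanics_dsl, and the fixed-step Euler loop is only a stand-in for a real integrator.

```python
import math
import time


class TimedRHS:
    """Wrap a right-hand-side function; record call count and time spent in it."""

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self.seconds = 0.0

    def __call__(self, t, y):
        start = time.perf_counter()
        try:
            return self.func(t, y)
        finally:
            self.calls += 1
            self.seconds += time.perf_counter() - start


def pendulum_rhs(t, y):
    """Simple pendulum with g/l = 1; y = (theta, omega)."""
    theta, omega = y
    return (omega, -math.sin(theta))


def euler_integrate(rhs, y0, t_end, n_steps):
    """Fixed-step explicit Euler, used only as a stand-in integrator."""
    dt = t_end / n_steps
    t, y = 0.0, tuple(y0)
    for _ in range(n_steps):
        dydt = rhs(t, y)
        y = tuple(yi + dt * di for yi, di in zip(y, dydt))
        t += dt
    return y


timed = TimedRHS(pendulum_rhs)
wall_start = time.perf_counter()
final_state = euler_integrate(timed, (0.5, 0.0), t_end=10.0, n_steps=10_000)
wall = time.perf_counter() - wall_start

rhs_share = 100.0 * timed.seconds / wall
print(f"RHS evals: {timed.calls} calls, {timed.seconds:.4f} s "
      f"({rhs_share:.1f}% of {wall:.4f} s)")
```

If the RHS share dominates, as in the report above, simplification and code generation are the places to look; if the integrator overhead dominates, solver and tolerance choices matter more.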

Solver Selection

Choose the right integrator for your problem:

Solver   Best For             Notes
------   -------------------  ----------------------
RK45     General purpose      Default, adaptive step
DOP853   High accuracy needs  8th order, fewer evals
LSODA    Unknown stiffness    Auto-switches methods
BDF      Stiff systems        Implicit, stable
Radau    Very stiff systems   Implicit, high order

Set solver in DSL:

\solve{DOP853}

Or in Python:

solution = compiler.simulate(t_span=(0, 10), method='DOP853')
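The solver names above match the method names of SciPy's solve_ivp, so the effect of the choice can be explored with SciPy directly. The following is a minimal sketch that assumes SciPy and NumPy are installed; it calls solve_ivp on a hand-written stiff test equation rather than going through mechanics_dsl, and the test problem itself is an illustrative choice.

```python
import numpy as np
from scipy.integrate import solve_ivp


def stiff_rhs(t, y):
    """Stiff linear test problem: fast relaxation (rate 1000) onto a slow cosine."""
    return [-1000.0 * (y[0] - np.cos(t))]


nfev = {}
for method in ("RK45", "DOP853", "LSODA", "BDF", "Radau"):
    result = solve_ivp(stiff_rhs, (0.0, 10.0), [0.0], method=method,
                       rtol=1e-6, atol=1e-9)
    assert result.success, f"{method} failed: {result.message}"
    nfev[method] = result.nfev

# Fewest right-hand-side evaluations first.
for method, count in sorted(nfev.items(), key=lambda item: item[1]):
    print(f"{method:>7}: {count} RHS evaluations")
```

On a stiff problem like this one the explicit methods are limited by stability rather than accuracy, so the implicit and auto-switching solvers need far fewer evaluations. On a non-stiff problem the ranking typically reverses.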

Tolerance Tuning

Balance accuracy vs speed with tolerances:

# Faster but less accurate
solution = compiler.simulate(rtol=1e-3, atol=1e-6)

# Slower but very accurate
solution = compiler.simulate(rtol=1e-12, atol=1e-14)

Rule of thumb:

  • Visualization only: rtol=1e-3 is fine

  • Conservation checks: rtol=1e-6 to 1e-9

  • Research quality: rtol=1e-10 or tighter
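The cost of each tolerance level can be measured on a problem with a known solution. This sketch assumes SciPy and NumPy and uses solve_ivp directly on a harmonic oscillator; scaling atol with rtol is a choice made here for illustration, not a MechanicsDSL default.

```python
import numpy as np
from scipy.integrate import solve_ivp


def oscillator_rhs(t, y):
    """Harmonic oscillator x'' = -x; exact x(t) = cos(t) for x(0) = 1, v(0) = 0."""
    return [y[1], -y[0]]


t_end = 100.0
report = {}
for rtol in (1e-3, 1e-6, 1e-9):
    result = solve_ivp(oscillator_rhs, (0.0, t_end), [1.0, 0.0],
                       method="RK45", rtol=rtol, atol=rtol * 1e-3)
    error = abs(result.y[0, -1] - np.cos(t_end))
    report[rtol] = (result.nfev, error)
    print(f"rtol={rtol:.0e}: {result.nfev:6d} RHS evals, |x error| = {error:.2e}")
```

Note that rtol bounds the local error per step, so the accumulated error at the end of a long run is usually larger than rtol itself.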

Symbolic Simplification

Complicated Lagrangians produce large equations of motion. Simplify them at compile time:

compiler = PhysicsCompiler(simplify=True)  # Default

# For very complex systems, try aggressive simplification
compiler = PhysicsCompiler(simplify='aggressive')

This calls SymPy’s simplification routines, which can lengthen compilation but usually yield smaller expressions that evaluate faster at runtime.
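The effect of simplification can be seen on a small expression using SymPy alone. This is a sketch that assumes SymPy is installed; it does not show what PhysicsCompiler does internally, only how an operation count shrinks when a trigonometric identity is applied.

```python
import sympy as sp

theta, omega = sp.symbols("theta omega", real=True)
m, l = sp.symbols("m l", positive=True)

# Kinetic energy of a pendulum bob from Cartesian velocities:
# x = l*sin(theta), y = -l*cos(theta), with d(theta)/dt = omega.
x_dot = l * sp.cos(theta) * omega
y_dot = l * sp.sin(theta) * omega
kinetic_raw = sp.Rational(1, 2) * m * (x_dot**2 + y_dot**2)

kinetic_simplified = sp.simplify(kinetic_raw)

print("raw:       ", kinetic_raw, "->", sp.count_ops(kinetic_raw), "ops")
print("simplified:", kinetic_simplified, "->", sp.count_ops(kinetic_simplified), "ops")

# Both forms give the same value; the simplified one does less work per call.
f_raw = sp.lambdify((theta, omega, m, l), kinetic_raw, "math")
f_simplified = sp.lambdify((theta, omega, m, l), kinetic_simplified, "math")
print(f_raw(0.3, 1.2, 2.0, 0.5), f_simplified(0.3, 1.2, 2.0, 0.5))
```

Since the right-hand side is evaluated many thousands of times per run, removing operations from it once at compile time tends to pay for itself.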

Caching

Enable equation caching to avoid recompilation:

from mechanics_dsl.utils.caching import enable_cache

enable_cache(max_size=100)  # Cache last 100 compilations

# First call compiles
compiler.compile_dsl(source)  # ~0.5s

# Subsequent calls use cache
compiler.compile_dsl(source)  # ~0.01s
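The idea behind such a cache can be sketched with functools.lru_cache from the standard library. The function compile_cached is a hypothetical stand-in for an expensive compile step, and the source string is a placeholder; neither reflects how enable_cache is actually implemented.

```python
import functools
import time


@functools.lru_cache(maxsize=100)
def compile_cached(source: str) -> tuple:
    """Hypothetical stand-in for an expensive compile step, keyed on the source text."""
    time.sleep(0.05)  # pretend to derive the equations of motion
    return ("compiled", len(source))  # placeholder result


source = "placeholder DSL source text"

start = time.perf_counter()
compile_cached(source)  # miss: runs the slow path
first = time.perf_counter() - start

start = time.perf_counter()
compile_cached(source)  # hit: returned from the cache
second = time.perf_counter() - start

print(f"first call:  {first:.4f} s")
print(f"second call: {second:.6f} s")
print(compile_cached.cache_info())
```

A cache keyed on the source text only helps when the text is identical between calls, so changing a parameter value inside the source counts as a new compilation.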

C++ Code Generation

For maximum performance, generate native code:

# Generate and compile C++
compiler.compile_to_cpp("simulation.cpp", compile_binary=True)

Typical speedup: 10-100x over the Python solver, depending on the system.

See codegen/cpp for details.

Parallelization

For N-body or SPH simulations, use OpenMP:

compiler.compile_to_cpp("simulation.cpp", target="openmp")

Run with multiple threads:

export OMP_NUM_THREADS=8
./simulation

Memory Efficiency

For long simulations, avoid storing every time point:

# Store fewer points
solution = compiler.simulate(
    t_span=(0, 1000),
    num_points=1000  # Instead of default 10000
)
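A rough estimate of the stored trajectory size shows when num_points starts to matter. The helper trajectory_bytes is hypothetical and assumes the states are kept as a dense float64 array with one time value per point; the actual solution object may hold more than this.

```python
def trajectory_bytes(num_points: int, n_state: int, bytes_per_value: int = 8) -> int:
    """Bytes for the stored states plus one time value per point (float64 by default)."""
    return num_points * (n_state + 1) * bytes_per_value


# Double pendulum: 4 state variables (two angles, two angular velocities).
for points in (1_000, 10_000, 10_000_000):
    size_mib = trajectory_bytes(points, n_state=4) / 2**20
    print(f"{points:>10,} points -> {size_mib:10.3f} MiB")
```

For small systems the default is harmless; the estimate grows linearly in both the number of points and the number of state variables, so particle systems reach the limit much sooner.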

Or use dense output for interpolation:

solution = compiler.simulate(t_span=(0, 1000), dense_output=True)

# Evaluate at any time
state_at_50 = solution.sol(50.0)
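The sol attribute used above has the same shape as the dense-output interpolant returned by SciPy's solve_ivp. The sketch below assumes SciPy and NumPy and calls solve_ivp directly, so it illustrates the interpolation mechanism rather than the MechanicsDSL wrapper.

```python
import numpy as np
from scipy.integrate import solve_ivp


def oscillator_rhs(t, y):
    """Harmonic oscillator x'' = -x; exact x(t) = cos(t) for x(0) = 1, v(0) = 0."""
    return [y[1], -y[0]]


result = solve_ivp(oscillator_rhs, (0.0, 100.0), [1.0, 0.0],
                   method="DOP853", rtol=1e-10, atol=1e-12,
                   dense_output=True)

state_at_50 = result.sol(50.0)                # shape (2,): [x, v] at t = 50
window = result.sol(np.linspace(40, 60, 5))   # shape (2, 5): states at five times

print("x(50) =", state_at_50[0], " exact:", np.cos(50.0))
print("window shape:", window.shape)
```

Dense output is not free: the interpolant keeps data for every accepted step, so it trades a fixed output grid for storage that grows with the number of steps taken.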

Common Performance Issues

Issue: Simulation slows down over time

  • Check for energy divergence (numerical instability)

  • Try a smaller time step (or tighter tolerances with an adaptive solver), or switch to an implicit solver such as BDF or Radau
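An energy check of the kind suggested above can be sketched for a simple pendulum. The example assumes SciPy and NumPy and integrates with solve_ivp directly; the functions pendulum_energy and max_relative_drift are illustrative helpers, not MechanicsDSL API.

```python
import numpy as np
from scipy.integrate import solve_ivp


def pendulum_rhs(t, y):
    """Simple pendulum with g/l = 1; y = [theta, omega]."""
    return [y[1], -np.sin(y[0])]


def pendulum_energy(y):
    """Energy per unit m*l**2 for g/l = 1; y has shape (2, n)."""
    theta, omega = y
    return 0.5 * omega**2 - np.cos(theta)


def max_relative_drift(rtol):
    """Largest relative deviation of the energy from its initial value."""
    result = solve_ivp(pendulum_rhs, (0.0, 200.0), [1.0, 0.0],
                       method="RK45", rtol=rtol, atol=rtol * 1e-3)
    energy = pendulum_energy(result.y)
    return np.max(np.abs(energy - energy[0])) / abs(energy[0])


for rtol in (1e-3, 1e-8):
    print(f"rtol={rtol:.0e}: max relative energy drift = {max_relative_drift(rtol):.2e}")
```

A drift that keeps growing at loose tolerances but stays small at tight ones points to integration error rather than a modelling mistake.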

Issue: Compilation takes too long

  • Simplify Lagrangian if possible

  • Enable caching

  • Pre-compile equations and reuse

Issue: Memory usage grows

  • Reduce num_points

  • Use streaming output for long simulations

  • Process results in chunks

Benchmarks

Reference performance on Intel i7-10700K:

System                Points       Python   C++
--------------------  -----------  -------  ------
Simple pendulum       10,000       50 ms    1 ms
Double pendulum       10,000       210 ms   8 ms
3-body problem        10,000       530 ms   15 ms
Figure-8 orbit        10,000       1.2 s    35 ms
SPH (1000 particles)  2000 frames  N/A      4.2 s