Compiler Architecture

Technical overview of the MechanicsDSL compiler pipeline.

Pipeline Overview

┌─────────────────────────────────────────────────────────┐
│                    DSL Source Code                      │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  TOKENIZER (lexer.py)                                   │
│  - Break source into tokens                             │
│  - Handle LaTeX commands, numbers, operators            │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  PARSER (parser.py)                                     │
│  - Build Abstract Syntax Tree (AST)                     │
│  - Validate syntax                                      │
│  - Error recovery                                       │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  SEMANTIC ANALYZER (semantic.py)                        │
│  - Type checking                                        │
│  - Unit inference                                       │
│  - Symbol resolution                                    │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  SYMBOLIC ENGINE (symbolic.py)                          │
│  - Convert to SymPy expressions                         │
│  - Derive equations of motion                           │
│  - Simplify equations                                   │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  CODE GENERATOR                                         │
│  ├── NumPy (runtime)                                    │
│  ├── C++ (compile-time)                                 │
│  ├── WebAssembly                                        │
│  └── CUDA (planned)                                     │
└─────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────┐
│  SOLVER / EXECUTION                                     │
│  - Numerical integration                                │
│  - Result collection                                    │
└─────────────────────────────────────────────────────────┘

Tokenizer

The tokenizer (core/lexer.py) handles LaTeX-style input:

Token Types:

  • COMMAND: \system, \defvar, \lagrangian

  • LBRACE, RBRACE: {, }

  • NUMBER: 1.0, 9.81, -3.14

  • IDENTIFIER: theta, m, g

  • OPERATOR: +, -, *, /, ^

  • FUNCTION: \sin, \cos, \sqrt

  • DOT: \dot{x} (time derivative)

  • COMMENT: % ...

Example tokenization:

Input: \lagrangian{\frac{1}{2} m \dot{x}^2}

Tokens:
COMMAND('lagrangian')
LBRACE
FUNCTION('frac')
LBRACE
NUMBER(1)
RBRACE
LBRACE
NUMBER(2)
RBRACE
IDENTIFIER('m')
DOT
LBRACE
IDENTIFIER('x')
RBRACE
OPERATOR('^')
NUMBER(2)
RBRACE
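
The token stream above can be reproduced with a small regex-driven scanner. This is a minimal sketch, not the contents of core/lexer.py: the `Token` type, the `FUNCTIONS` set, and the `tokenize` function are assumptions made for illustration, and a leading minus sign is treated as an OPERATOR rather than folded into the NUMBER.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical set of commands reported as FUNCTION instead of COMMAND.
FUNCTIONS = {"sin", "cos", "sqrt", "frac"}

# Order matters: DOT must be tried before the generic COMMAND pattern.
TOKEN_SPEC = [
    ("COMMENT", r"%[^\n]*"),
    ("DOT", r"\\dot(?![A-Za-z])"),
    ("COMMAND", r"\\[A-Za-z]+"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR", r"[+\-*/^]"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Token]:
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"Invalid character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments produce no tokens
        if kind == "COMMAND":
            text = text[1:]  # drop the leading backslash
            if text in FUNCTIONS:
                kind = "FUNCTION"
        tokens.append(Token(kind, text))
    return tokens


tokens = tokenize(r"\lagrangian{\frac{1}{2} m \dot{x}^2}")
```

Trying DOT before COMMAND is what keeps \dot from being reported as a generic command; the negative lookahead stops it from matching a longer command name that merely starts with "dot".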

Parser

The parser (core/parser.py) builds an AST using recursive descent:

AST Node Types:

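from __future__ import annotations  # allows ExprNode to be referenced before it is defined

from dataclasses import dataclass
from typing import List

@dataclass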
class SystemNode:
    name: str

@dataclass
class DefvarNode:
    name: str
    type: str
    unit: str

@dataclass
class ParameterNode:
    name: str
    value: float
    unit: str

@dataclass
class LagrangianNode:
    expression: ExprNode

@dataclass
class ExprNode:
    op: str  # 'add', 'mul', 'pow', 'func', 'var', 'num'
    args: List[ExprNode]
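
To illustrate the recursive-descent approach, here is a minimal sketch of an expression parser with one method per precedence level. It is not the code in core/parser.py: the `Expr` class mirrors ExprNode but adds a hypothetical `value` field for leaves, tokens are assumed to be plain (kind, text) tuples, and only +, *, ^ and braces are handled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Expr:
    # Mirrors ExprNode; `value` is a hypothetical slot for leaf nodes.
    op: str
    args: List["Expr"] = field(default_factory=list)
    value: Optional[str] = None


class ExprParser:
    def __init__(self, tokens: List[Tuple[str, str]]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Tuple[str, str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("EOF", "")

    def advance(self) -> Tuple[str, str]:
        token = self.peek()
        self.pos += 1
        return token

    def parse_sum(self) -> Expr:
        # Lowest precedence: a + b + ...
        node = self.parse_product()
        while self.peek() == ("OPERATOR", "+"):
            self.advance()
            node = Expr("add", [node, self.parse_product()])
        return node

    def parse_product(self) -> Expr:
        node = self.parse_power()
        while self.peek() == ("OPERATOR", "*"):
            self.advance()
            node = Expr("mul", [node, self.parse_power()])
        return node

    def parse_power(self) -> Expr:
        base = self.parse_atom()
        if self.peek() == ("OPERATOR", "^"):
            self.advance()
            return Expr("pow", [base, self.parse_power()])  # right-associative
        return base

    def parse_atom(self) -> Expr:
        kind, text = self.advance()
        if kind == "NUMBER":
            return Expr("num", value=text)
        if kind == "IDENTIFIER":
            return Expr("var", value=text)
        if kind == "LBRACE":
            node = self.parse_sum()
            if self.advance()[0] != "RBRACE":
                raise SyntaxError("unbalanced braces")
            return node
        raise SyntaxError(f"unexpected token {kind} {text!r}")


# m * x ^ 2 + 1
tree = ExprParser([
    ("IDENTIFIER", "m"), ("OPERATOR", "*"), ("IDENTIFIER", "x"),
    ("OPERATOR", "^"), ("NUMBER", "2"), ("OPERATOR", "+"), ("NUMBER", "1"),
]).parse_sum()
```

Each precedence level calls the next tighter-binding level, so the tree shape encodes operator precedence without any explicit precedence table.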

Semantic Analysis

The semantic analyzer (core/semantic.py) performs:

  1. Symbol Table Construction:

    • Variables with their types

    • Parameters with values

    • Defined operators

  2. Type Checking:

    • Verify all variables are defined

    • Check unit consistency (warning only)

  3. Transformation:

    • Resolve coordinate transforms

    • Expand custom operators
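
As a rough illustration of steps 1 and 2, the sketch below builds a symbol table as a plain dictionary and reports names that an expression uses but never defines. The table layout and the `find_undefined` helper are hypothetical; the real analyzer in core/semantic.py may store this information differently.

```python
from typing import Dict, Iterable, List


def find_undefined(used_names: Iterable[str], symbol_table: Dict[str, dict]) -> List[str]:
    """Return names referenced in an expression but absent from the symbol table."""
    return sorted({name for name in used_names if name not in symbol_table})


# Hypothetical symbol table for a pendulum system.
symbol_table = {
    "theta": {"kind": "variable", "type": "Angle", "unit": "rad"},
    "m": {"kind": "parameter", "value": 1.0, "unit": "kg"},
    "g": {"kind": "parameter", "value": 9.81, "unit": "m/s^2"},
}

# Names collected from the Lagrangian expression; "l" was never declared.
undefined = find_undefined(["m", "g", "l", "theta"], symbol_table)
```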

Symbolic Engine

The symbolic engine (core/symbolic.py) uses SymPy:

Euler-Lagrange Derivation:

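import sympy as sp

def derive_euler_lagrange(L, q, q_dot, q_ddot, coords, velocities, accelerations):
    """
    Compute: d/dt(∂L/∂q̇) - ∂L/∂q = 0
    Solve for q̈

    coords, velocities, accelerations: matching lists of all generalized
    coordinates and their first and second time derivatives (as symbols).
    Assumes L has no explicit time dependence.
    """
    dL_dqdot = sp.diff(L, q_dot)  # ∂L/∂q̇
    dL_dq = sp.diff(L, q)          # ∂L/∂q

    # Time derivative using chain rule:
    # d/dt f(q, q̇) = Σ ∂f/∂qᵢ · q̇ᵢ + Σ ∂f/∂q̇ᵢ · q̈ᵢ
    d_dt_dL_dqdot = sum(
        sp.diff(dL_dqdot, var) * var_dot
        + sp.diff(dL_dqdot, var_dot) * var_ddot
        for var, var_dot, var_ddot in zip(coords, velocities, accelerations)
    )

    # Solve for acceleration
    equation = d_dt_dL_dqdot - dL_dq
    q_ddot_solution = sp.solve(equation, q_ddot)[0]

    return q_ddot_solution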
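
As a worked check of the Euler-Lagrange derivation, the sketch below derives the equation of motion of a simple pendulum. It uses a different technique from the listing above: the coordinate is a SymPy function of time, so SymPy takes the total time derivative directly instead of applying the chain rule by hand. The symbol names are chosen for illustration only.

```python
import sympy as sp

t = sp.symbols("t")
m, g, l = sp.symbols("m g l", positive=True)

theta = sp.Function("theta")(t)
theta_dot = theta.diff(t)
theta_ddot = theta.diff(t, 2)

# L = T - V for a pendulum of length l, with theta measured from the downward vertical
L = sp.Rational(1, 2) * m * l**2 * theta_dot**2 + m * g * l * sp.cos(theta)

# d/dt(∂L/∂θ̇) - ∂L/∂θ = 0
equation = sp.diff(sp.diff(L, theta_dot), t) - sp.diff(L, theta)

# Solve for the angular acceleration
solution = sp.solve(equation, theta_ddot)[0]
```

The result is the familiar pendulum equation, θ̈ = -(g/l) sin θ.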

Solving Strategy (“Search & Destroy”):

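For coupled systems, where one coordinate's acceleration appears in another coordinate's equation, we fall back from per-coordinate solving to a linear solve: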

  1. Attempt direct solve for each q_ddot

  2. If coupled, build matrix equation

  3. Solve linear system for accelerations
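
Steps 2 and 3 rely on the fact that Euler-Lagrange equations are linear in the accelerations, so they can be written as A·q̈ = b. The following sketch shows one way to do this with SymPy's `linear_eq_to_matrix`, using the standard cart-pendulum equations as an example. It illustrates the idea and is not the engine's actual implementation; all symbol names are assumptions.

```python
import sympy as sp

M, m, l, g = sp.symbols("M m l g", positive=True)
theta, theta_dot = sp.symbols("theta theta_dot")
x_ddot, theta_ddot = sp.symbols("x_ddot theta_ddot")

# Cart (mass M) with a hanging pendulum (mass m, length l).
# Both equations contain both accelerations, so neither can be solved alone.
eq_x = (M + m) * x_ddot + m * l * theta_ddot * sp.cos(theta) - m * l * theta_dot**2 * sp.sin(theta)
eq_theta = m * l * x_ddot * sp.cos(theta) + m * l**2 * theta_ddot + m * g * l * sp.sin(theta)

# Rewrite "eq = 0" as A * [x_ddot, theta_ddot]^T = b
A, b = sp.linear_eq_to_matrix([eq_x, eq_theta], [x_ddot, theta_ddot])

# Solve the linear system for the accelerations
solution = A.LUsolve(b)
```

Here `A` plays the role of the mass matrix and `b` collects every term that does not involve an acceleration.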

Code Generation

Code generators (codegen/) translate SymPy to target languages:

Common Interface:

from abc import ABC, abstractmethod

class CodeGenerator(ABC):
    @abstractmethod
    def generate_derivatives(self, equations) -> str:
        """Generate the ODE right-hand side function."""
        pass

    @abstractmethod
    def generate_integrator(self) -> str:
        """Generate time stepping code."""
        pass

SymPy Code Printers:

  • sympy.printing.ccode for C++ (emits C expressions; sympy.printing.cxxcode is the C++-specific printer)

  • sympy.printing.numpy.NumPyPrinter for Python

  • Custom printer for WASM
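
To give a sense of what these printers produce, the sketch below turns one derived acceleration into a C expression string and into a NumPy-backed Python function. It calls SymPy's public `ccode` and `lambdify` directly; how the generators in codegen/ wrap these calls is not shown here, and the pendulum expression is only an example.

```python
import numpy as np
import sympy as sp

theta, g, l = sp.symbols("theta g l")
theta_ddot = -g / l * sp.sin(theta)  # pendulum acceleration

# C expression as a string, ready to paste into a generated function body
c_source = sp.ccode(theta_ddot)

# Python callable that evaluates the expression with NumPy functions
rhs = sp.lambdify((theta, g, l), theta_ddot, modules="numpy")

# Works on scalars and on whole arrays of angles
angles = np.array([0.0, 0.5, 1.0])
accelerations = rhs(angles, 9.81, 1.0)
```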

Solver Integration

The solver (core/solver.py) wraps SciPy:

from scipy.integrate import solve_ivp

class Simulator:
    def simulate(self, t_span, y0, method='RK45', **kwargs):
        def derivatives(t, y):
            return self.compiled_eqns(t, y, self.params)

        solution = solve_ivp(
            derivatives, t_span, y0,
            method=method,
            dense_output=True,
            **kwargs
        )

        return self.format_solution(solution)
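
The same `solve_ivp` call can be tried in isolation. In this sketch a hand-written harmonic oscillator stands in for the generated right-hand side; the function `compiled_eqns`, the `params` dictionary, and the tolerance values are all assumptions for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp


def compiled_eqns(t, y, params):
    """Stand-in for generated code: x' = v, v' = -omega^2 * x."""
    x, v = y
    return [v, -params["omega"] ** 2 * x]


params = {"omega": 1.0}

solution = solve_ivp(
    lambda t, y: compiled_eqns(t, y, params),
    (0.0, 2 * np.pi),   # one full period for omega = 1
    [1.0, 0.0],         # x(0) = 1, v(0) = 0
    method="RK45",
    dense_output=True,  # enables solution.sol(t) interpolation
    rtol=1e-8,
    atol=1e-10,
)

x_final = solution.y[0, -1]            # should be close to x(0)
x_half_period = solution.sol(np.pi)[0]  # should be close to -1
```

Because `dense_output=True`, the result can be evaluated at arbitrary times through `solution.sol` rather than only at the integrator's own steps.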

Error Handling

Errors are categorized:

  • Lexer errors: Invalid characters, unclosed strings

  • Parser errors: Syntax errors, unbalanced braces

  • Semantic errors: Undefined variables, type mismatches

  • Symbolic errors: Cannot solve for accelerations

  • Runtime errors: Numerical instability, NaN values

Each error type provides:

  • Line/column location

  • Context (surrounding code)

  • Suggested fix (when possible)
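
One way to carry location, context, and a suggested fix together is a single exception class shared by all stages. The `DSLError` class below is a hypothetical sketch, not the project's actual error hierarchy.

```python
from typing import Optional


class DSLError(Exception):
    """Compiler error carrying a source location, context, and optional suggestion."""

    def __init__(self, category: str, message: str, line: int, column: int,
                 context: str = "", suggestion: Optional[str] = None):
        super().__init__(message)
        self.category = category
        self.message = message
        self.line = line
        self.column = column
        self.context = context
        self.suggestion = suggestion

    def __str__(self) -> str:
        text = f"{self.category} error at line {self.line}, column {self.column}: {self.message}"
        if self.context:
            # Show the offending line with a caret under the reported column
            pointer = " " * (self.column - 1) + "^"
            text += f"\n    {self.context}\n    {pointer}"
        if self.suggestion:
            text += f"\nSuggestion: {self.suggestion}"
        return text


error = DSLError(
    "Parser", "unbalanced braces", line=3, column=18,
    context=r"\lagrangian{m x^2",
    suggestion="add a closing '}'",
)
report = str(error)
```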

Performance Considerations

Compilation Caching:

The parsed AST and the derived equations are cached in an LRU cache.
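
A minimal sketch of this kind of caching, assuming the cache is keyed on the DSL source text and built on Python's `functools.lru_cache`. The `compile_source` function is a hypothetical stand-in for the parse and derive steps, and the cache size is arbitrary.

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def compile_source(source: str) -> tuple:
    """Stand-in for tokenize + parse + derive; the result must be hashable-keyed by source."""
    # The real pipeline would run the expensive stages here.
    return tuple(source.split())


compile_source(r"\system{pendulum}")
compile_source(r"\system{pendulum}")  # identical source: served from the cache

info = compile_source.cache_info()
```

Keying on the source string means any edit to the DSL text, even a whitespace change, triggers a fresh compilation.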

Lazy Evaluation:

Equations are derived only when a simulation starts.

NumPy Vectorization:

Generated Python code uses vectorized NumPy operations instead of Python-level loops.

SymPy Optimization:

  • Common subexpression elimination

  • Constant folding

  • Trigonometric simplification
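
The first and third items correspond to SymPy's `cse` and `trigsimp` functions. The sketch below applies them to small example expressions; which passes the compiler runs, and in what order, is not specified here.

```python
import sympy as sp

x, y = sp.symbols("x y")

# (x + y) appears three times; cse pulls it out into a temporary
expr = sp.sin(x + y) ** 2 + sp.cos(x + y) * (x + y)
replacements, reduced = sp.cse(expr)

# Rebuild the original expression from the temporaries to check equivalence
rebuilt = reduced[0]
for symbol, subexpr in reversed(replacements):
    rebuilt = rebuilt.subs(symbol, subexpr)

# Trigonometric simplification: sin^2 + cos^2 collapses to 1
simplified = sp.trigsimp(sp.sin(x) ** 2 + sp.cos(x) ** 2)
```

In generated code, each entry of `replacements` becomes a local variable computed once, and `reduced` is the final expression written in terms of those variables.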