Compiler Architecture
Technical overview of the MechanicsDSL compiler pipeline.
Pipeline Overview
┌─────────────────────────────────────────────────────────┐
│ DSL Source Code │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ TOKENIZER (lexer.py) │
│ - Break source into tokens │
│ - Handle LaTeX commands, numbers, operators │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ PARSER (parser.py) │
│ - Build Abstract Syntax Tree (AST) │
│ - Validate syntax │
│ - Error recovery │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ SEMANTIC ANALYZER (semantic.py) │
│ - Type checking │
│ - Unit inference │
│ - Symbol resolution │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ SYMBOLIC ENGINE (symbolic.py) │
│ - Convert to SymPy expressions │
│ - Derive equations of motion │
│ - Simplify equations │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ CODE GENERATOR │
│ ├── NumPy (runtime) │
│ ├── C++ (compile-time) │
│ ├── WebAssembly │
│ └── CUDA (planned) │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ SOLVER / EXECUTION │
│ - Numerical integration │
│ - Result collection │
└─────────────────────────────────────────────────────────┘
Tokenizer
The tokenizer (core/lexer.py) handles LaTeX-style input:
Token Types:
COMMAND:\system,\defvar,\lagrangianLBRACE,RBRACE:{,}NUMBER:1.0,9.81,-3.14IDENTIFIER:theta,m,gOPERATOR:+,-,*,/,^FUNCTION:\sin,\cos,\sqrtDOT:\dot{x}(time derivative)COMMENT:% ...
Example tokenization:
Input: \lagrangian{\frac{1}{2} m \dot{x}^2}
Tokens:
COMMAND('lagrangian')
LBRACE
FUNCTION('frac')
LBRACE
NUMBER(1)
RBRACE
LBRACE
NUMBER(2)
RBRACE
IDENTIFIER('m')
DOT
LBRACE
IDENTIFIER('x')
RBRACE
OPERATOR('^')
NUMBER(2)
RBRACE
Parser
The parser (core/parser.py) builds an AST using recursive descent:
AST Node Types:
@dataclass
class SystemNode:
name: str
@dataclass
class DefvarNode:
name: str
type: str
unit: str
@dataclass
class ParameterNode:
name: str
value: float
unit: str
@dataclass
class LagrangianNode:
expression: ExprNode
@dataclass
class ExprNode:
op: str # 'add', 'mul', 'pow', 'func', 'var', 'num'
args: List[ExprNode]
Semantic Analysis
The semantic analyzer (core/semantic.py) performs:
Symbol Table Construction:
Variables with their types
Parameters with values
Defined operators
Type Checking:
Verify all variables are defined
Check unit consistency (warning only)
Transformation:
Resolve coordinate transforms
Expand custom operators
Symbolic Engine
The symbolic engine (core/symbolic.py) uses SymPy:
Euler-Lagrange Derivation:
def derive_euler_lagrange(L, q, q_dot):
"""
Compute: d/dt(∂L/∂q̇) - ∂L/∂q = 0
Solve for q̈
"""
dL_dqdot = sp.diff(L, q_dot) # ∂L/∂q̇
dL_dq = sp.diff(L, q) # ∂L/∂q
# Time derivative using chain rule
d_dt_dL_dqdot = sum(
sp.diff(dL_dqdot, var) * var_dot
for var, var_dot in zip(coords, velocities)
) + sp.diff(dL_dqdot, q_ddot) * ???
# Solve for acceleration
equation = d_dt_dL_dqdot - dL_dq
q_ddot_solution = sp.solve(equation, q_ddot)[0]
return q_ddot_solution
Solving Strategy (“Search & Destroy”):
For coupled systems, we use iterative substitution:
Attempt direct solve for each
q_ddotIf coupled, build matrix equation
Solve linear system for accelerations
Code Generation
Code generators (codegen/) translate SymPy to target languages:
Common Interface:
class CodeGenerator(ABC):
@abstractmethod
def generate_derivatives(self, equations) -> str:
"""Generate the ODE right-hand side function."""
pass
@abstractmethod
def generate_integrator(self) -> str:
"""Generate time stepping code."""
pass
SymPy Code Printers:
sympy.printing.ccodefor C++sympy.printing.NumPyPrinterfor PythonCustom printer for WASM
Solver Integration
The solver (core/solver.py) wraps SciPy:
class Simulator:
def simulate(self, t_span, y0, method='RK45', **kwargs):
def derivatives(t, y):
return self.compiled_eqns(t, y, self.params)
solution = solve_ivp(
derivatives, t_span, y0,
method=method,
dense_output=True,
**kwargs
)
return self.format_solution(solution)
Error Handling
Errors are categorized:
Lexer errors: Invalid characters, unclosed strings
Parser errors: Syntax errors, unbalanced braces
Semantic errors: Undefined variables, type mismatches
Symbolic errors: Cannot solve for accelerations
Runtime errors: Numerical instability, NaN values
Each error type provides:
Line/column location
Context (surrounding code)
Suggested fix (when possible)
Performance Considerations
Compilation Caching:
Parsed AST and derived equations are cached using LRU cache.
Lazy Evaluation:
Equations are only derived when simulation starts.
NumPy Vectorization:
Generated Python uses NumPy operations for speed.
SymPy Optimization:
Common subexpression elimination
Constant folding
Trigonometric simplification