Symbolic refinement
Algebraic simplification, structural normalization, and the e-graph rewriter — reducing every candidate to its canonical form before it enters the gene pool.
The pipeline that runs every generation
In a standard genetic-programming setup, candidates produced by mutation and crossover are dropped straight into the population. Selection pressure is the only mechanism by which structure improves. Ufinq inverts that: every candidate passes through a refinement pipeline before it is allowed into the gene pool. The pipeline runs on every individual, every generation. It is not optional and not occasional.
The pipeline has five stages. The middle three — algebraic simplification, structural normalization, and constant optimization — fall under symbolic refinement. The two book-end stages, verification and validation, are operational guards. Together they ensure that what enters the gene pool is already a local optimum: structurally clean, numerically tuned, and provably sound.
Three threads of work
Algebraic simplification
- Rule DSL — a declarative rule language for expressing local simplifications: identities, absorbing elements, commutative rewrites, trigonometric collapses, logical reductions. Hundreds of rules covering arithmetic, comparison, conditional, and set-theoretic operations.
- Commutative variation — a single rule covers all argument orderings for commutative functions; operand-swapped variants are generated automatically by the rule engine, eliminating mirror-image rule duplication.
- Constant folding — value nodes are folded by direct evaluation before any rule fires; the rest of the pipeline only sees expressions that have no further constant arithmetic to do.
- Open questions — automatic rule discovery from canonical-form collisions; rule-pack scheduling under per-domain workload distributions; interaction effects with the e-graph rewriter under equality saturation.
Structural normalization
- E-graph rewriter — an equality-saturation-based rewriter that finds simplifications no fixed rule order would catch. Equivalent sub-expressions are unified into equivalence classes; cost-driven extraction picks the canonical representative.
- Canonical form — every refined candidate is reduced to a unique canonical form. This is what makes structural diversity gates meaningful: two distinct slots in the gene pool represent two distinct hypotheses, not two random spellings of the same hypothesis.
- Open questions — extraction-cost design under multi-objective pressure (small expressions are not always preferred); commutative-aware matching in the e-graph; bounded saturation when the e-graph blows up.
Verification and validation
- Structural verification — checks that the candidate satisfies invariants on the computation graph: type compatibility, arity, domain constraints, finite-cost evaluation. Malformed candidates are rejected before they consume any further compute.
- Population validation — the refined candidate must demonstrate value to the population — must not be a duplicate, must satisfy diversity constraints, and must produce a finite score on the training data.
- Open questions — early-exit prediction (skipping the rest of the pipeline when a candidate is provably going to fail validation); refinement-budget scheduling under heterogeneous candidate cost.
Three consequences of refining every individual
Related material
A short-form note on the consequences of this design lives at Every Ufinq individual is already a local optimum. The refinement pipeline as a whole is described under Technology § 6, with a five-stage diagram (Figure 3).
No paper is yet specifically about symbolic refinement; the pipeline is described in passing in the AHISTER paper (in preparation). A dedicated technical report on the e-graph rewriter is on the candidate list.
Continue reading
Other research areas are listed under Research. Published work lives on Papers; short-form notes on Notes.
