Genetic operators
How candidates are generated and recombined — the mutation chain, crossover families, and the structural invariants they must preserve.
What generates candidates, what recombines them, and what they have to preserve
Two families of operators move every population forward. Mutation takes a single candidate and produces a structurally adjacent one. Crossover takes two candidates and produces an offspring that combines material from both. In symbolic regression both kinds of operator have to walk the same line: enough structural change to find new hypotheses, not so much that the offspring is indistinguishable from a fresh random tree.
The Ufinq operator surface is built from the same primitives as classical tree-based GP, but with three differences. Every operator is part of a named family with declared semantics, so the population's mix can be reasoned about and tuned. Every replacement-style operator is parameterized by a pluggable subtree-selection strategy, so bloat control is an orthogonal modifier rather than a baked-in constant. And every operator's success rate is tracked per-branch by an adaptive bandit, so over time the population draws from the operators that are actually working.
Four threads of work
Mutation chain
- Node-kind families — mutations are organized by what they edit: arithmetic, comparison, boolean, function, variable, constant, binding, component, sequence, mapping, model. Each family has its own canonical invariants and its own set of legal local edits.
- Random-node mutation, made structural — a random-node mutation picks a node by random walk and replaces it through
rootNode.replace, which preserves type compatibility and arity. The earlier flat-node-list approach has been retired; the framework now goes through the same structural API every other operator uses. - Recomposition mutations — beyond local edits, a mutation can recompose a sub-component (a binding, a function call, a sequence element) by swapping in a freshly composed alternative. This is the mutation counterpart of the recomposition crossover family.
- Open questions — adaptive mutation rates per node-kind family; mutation operators that respect numerically-tuned constants instead of re-randomizing them; correlation between local-edit "size" and offspring quality.
Crossover taxonomy
- Seven canonical families — Addition, Removal, Replacement, Combination, Blending, Splice, Semantic. Every crossover declares its family on the
CrossoverFunctioninterface so the bandit can do hierarchical selection over family × operator instead of one flat distribution. - Subtree-selection strategy as a seam — the replacement family is parameterized by a
SubtreeSelectionStrategyrather than callingrandomNodedirectly. Uniform-random is the default; size-fair, depth-fair, and homologous-aligned strategies plug in without changing operator code. - Brood selection — a wrapper that runs the underlying crossover k times and keeps the best by a cheap proxy (complexity, structural diversity, simplification feasibility). True-fitness brood is deferred behind the bandit until the cost trade-off is understood.
- Headless-chicken baseline — mating with a freshly composed random tree. Required as a known-baseline action for the bandit; every other operator must beat it on at least one of fitness gain or diversity contribution to justify its slot.
- Open questions — geometric-semantic crossover under canonicalized populations; tree-aligned one-point crossover (Poli & Langdon) interaction with the e-graph rewriter; soft internal brood vs outer brood wrapper trade-off.
Adaptive operator selection
- Per-branch bandits — each branch maintains its own bandit state over the operators available to it. Branch isolation is absolute, so a branch's selection learns from its own outcomes only — there is no cross-branch operator-weight aggregation.
- Hierarchical selection — the bandit is hierarchical over family × operator (and, eventually, over hyperparameter values). Hot families do not shadow cold operators within other families; the distribution mixes at the right granularity.
- Reward against parent best — the reward signal is offspring quality versus parent best, not versus population mean. This makes the signal informative even when the population is converging.
- Tracker-backed instrumentation — the per-operator outcome stream is captured by the
Trackerinfrastructure (sibling toTracer), so the same data that drives selection drives offline ablation analysis. - Open questions — operator-hyperparameter meta-bandit; reward-design under multi-objective selection; bandit warm-up under cold-start.
Bloat control as an orthogonal modifier
- Subtree-selection strategies — bloat control lives in the selection strategy, not in each operator. A single
SizeFairSubtreeSelectionStrategyapplies to every replacement-family crossover and every random-node mutation that uses the strategy seam. - Strategies in scope — uniform (current default), size-fair (Langdon 2000), depth-fair, and homologous-aligned (Poli & Langdon 1998). Each is a small policy class, not a fork of every operator.
- Validation per operator + per strategy — diversity contribution, locality metric, and SRBench delta are all measured per (operator, strategy) pair. Operators that don't earn their slot get cut; same applies to strategies.
- Open questions — strategy choice as a per-branch bandit arm of its own; strategy interaction with the canonicalized populations produced by symbolic refinement; constraint-aware strategy variants.
Three consequences of this design
Related material
The crossover-coverage program is the reference design for the family taxonomy and the bandit architecture; the program is in active progress (multiple layers shipped, the rest in the queue). The mutation chain expansion landed earlier as a single shipped change.
Adjacent areas: Symbolic refinement § 2.2 describes the canonicalization that makes structural diversity-based bandit rewards meaningful; Search strategies § 2.6 describes the dual diversity gates that decide whether an offspring survives into the gene pool. The system view of how operators fit into a single branch is described under Technology § 1.
Continue reading
Other research areas are listed under Research. Published work lives on Papers; short-form notes on Notes.
