§ 2.5 · Research area

Genetic operators

How candidates are generated and recombined — the mutation chain, crossover families, and the structural invariants they must preserve.

§ 1 Why operators

What generates candidates, what recombines them, and what they have to preserve

Two families of operators move every population forward. Mutation takes a single candidate and produces a structurally adjacent one. Crossover takes two candidates and produces an offspring that combines material from both. In symbolic regression both kinds of operator have to walk the same line: enough structural change to find new hypotheses, not so much that the offspring is indistinguishable from a fresh random tree.

The Ufinq operator surface is built from the same primitives as classical tree-based GP, but with three differences. Every operator is part of a named family with declared semantics, so the population's mix can be reasoned about and tuned. Every replacement-style operator is parameterized by a pluggable subtree-selection strategy, so bloat control is an orthogonal modifier rather than a baked-in constant. And every operator's success rate is tracked per-branch by an adaptive bandit, so over time the population draws from the operators that are actually working.

§ 2 Sub-areas

Four threads of work

§ 2.1

Mutation chain

Node-kind families — mutations are organized by what they edit: arithmetic, comparison, boolean, function, variable, constant, binding, component, sequence, mapping, model. Each family has its own canonical invariants and its own set of legal local edits.
Random-node mutation, made structural — a random-node mutation picks a node by random walk and replaces it through rootNode.replace, which preserves type compatibility and arity. The earlier flat-node-list approach has been retired; the framework now goes through the same structural API every other operator uses.
Recomposition mutations — beyond local edits, a mutation can recompose a sub-component (a binding, a function call, a sequence element) by swapping in a freshly composed alternative. This is the mutation counterpart of the recomposition crossover family.
Open questions — adaptive mutation rates per node-kind family; mutation operators that respect numerically-tuned constants instead of re-randomizing them; correlation between local-edit "size" and offspring quality.

§ 2.2

Crossover taxonomy

Seven canonical families — Addition, Removal, Replacement, Combination, Blending, Splice, Semantic. Every crossover declares its family on the CrossoverFunction interface so the bandit can do hierarchical selection over family × operator instead of one flat distribution.
Subtree-selection strategy as a seam — the replacement family is parameterized by a SubtreeSelectionStrategy rather than calling randomNode directly. Uniform-random is the default; size-fair, depth-fair, and homologous-aligned strategies plug in without changing operator code.
Brood selection — a wrapper that runs the underlying crossover k times and keeps the best by a cheap proxy (complexity, structural diversity, simplification feasibility). True-fitness brood is deferred behind the bandit until the cost trade-off is understood.
Headless-chicken baseline — mating with a freshly composed random tree. Required as a known-baseline action for the bandit; every other operator must beat it on at least one of fitness gain or diversity contribution to justify its slot.
Open questions — geometric-semantic crossover under canonicalized populations; tree-aligned one-point crossover (Poli & Langdon) interaction with the e-graph rewriter; soft internal brood vs outer brood wrapper trade-off.

§ 2.3

Adaptive operator selection

Per-branch bandits — each branch maintains its own bandit state over the operators available to it. Branch isolation is absolute, so a branch's selection learns from its own outcomes only — there is no cross-branch operator-weight aggregation.
Hierarchical selection — the bandit is hierarchical over family × operator (and, eventually, over hyperparameter values). Hot families do not shadow cold operators within other families; the distribution mixes at the right granularity.
Reward against parent best — the reward signal is offspring quality versus parent best, not versus population mean. This makes the signal informative even when the population is converging.
Tracker-backed instrumentation — the per-operator outcome stream is captured by the Tracker infrastructure (sibling to Tracer), so the same data that drives selection drives offline ablation analysis.
Open questions — operator-hyperparameter meta-bandit; reward-design under multi-objective selection; bandit warm-up under cold-start.

§ 2.4

Bloat control as an orthogonal modifier

Subtree-selection strategies — bloat control lives in the selection strategy, not in each operator. A single SizeFairSubtreeSelectionStrategy applies to every replacement-family crossover and every random-node mutation that uses the strategy seam.
Strategies in scope — uniform (current default), size-fair (Langdon 2000), depth-fair, and homologous-aligned (Poli & Langdon 1998). Each is a small policy class, not a fork of every operator.
Validation per operator + per strategy — diversity contribution, locality metric, and SRBench delta are all measured per (operator, strategy) pair. Operators that don't earn their slot get cut; same applies to strategies.
Open questions — strategy choice as a per-branch bandit arm of its own; strategy interaction with the canonicalized populations produced by symbolic refinement; constraint-aware strategy variants.

§ 3 Why it matters

Three consequences of this design

Coverage with disciplineOperators are added by family with declared semantics, not by hunch. Coverage of the canonical taxonomy is a goal, but only operators that beat headless-chicken on a measurable axis stay in the lineup.

Bloat is decoupledSubtree-selection strategy is an orthogonal modifier. Adding a new bloat-control idea is a new strategy class, not a fork of every operator that has to be kept in sync forever.

The mix tunes itselfPer-branch bandits learn which operators are paying off in this run, on this dataset, against this population. The configured operator list is the upper bound on what is reachable; what actually fires is selected.

§ 4 Notes & references

Related material

The crossover-coverage program is the reference design for the family taxonomy and the bandit architecture; the program is in active progress (multiple layers shipped, the rest in the queue). The mutation chain expansion landed earlier as a single shipped change.

Adjacent areas: Symbolic refinement § 2.2 describes the canonicalization that makes structural diversity-based bandit rewards meaningful; Search strategies § 2.6 describes the dual diversity gates that decide whether an offspring survives into the gene pool. The system view of how operators fit into a single branch is described under Technology § 1.

Read

Continue reading

Other research areas are listed under Research. Published work lives on Papers; short-form notes on Notes.

Back to Research areas