§ 2.3  ·  Research area

Constant optimization

CMA-ES, L-BFGS, Nelder-Mead, Coordinate Descent — numeric methods specialized for the constraints of symbolic regression candidates.

§ 1   Why constant optimization

Where the numeric tuning lives

A symbolic regression candidate is two things at once: a structure (which functions, which arguments, what shape) and a set of numeric constants (multipliers, offsets, exponents, decision thresholds). The structure is moved by mutation and crossover. The constants are moved by an optimizer. Without that optimizer, every structurally promising candidate would carry whatever random constants it was born with, and selection would have to hope they were close enough.

In Ufinq the optimizer is part of the refinement pipeline that runs on every candidate, every generation. Its job is to leave each candidate's constants tuned to the data before the candidate enters the gene pool. The research question is not whether to optimize but which optimizer: symbolic regression candidates have unusual loss landscapes, and the methods that work for parameter-vector ML do not always survive contact with them.

§ 2   Sub-areas

Four threads of work

§ 2.1

Four numeric optimizers, four convergence shapes

  • CMA-ES — derivative-free, evolutionary; robust on rugged landscapes and bad initializations. Pays for that robustness in iteration count, so the cost-per-candidate is high.
  • L-BFGS — quasi-Newton, gradient-based; near-quadratic convergence near a local minimum, very fast per iteration. Fragile when the candidate has discontinuous regions or when the gradient is poorly scaled — both common in symbolic candidates.
  • Nelder-Mead — simplex-based, derivative-free; cheap per iteration, but slow to converge on ill-conditioned problems and scales poorly in dimension. Useful as a refinement step on top of something coarser.
  • Coordinate descent — one-dimensional line searches in turn; extremely cheap, particularly suited to candidates whose constants decouple in the loss. Bad on coupled landscapes.
  • Open questions — automatic selection per candidate based on structural features; calibration against new optimizer families (TRBDF2, dogleg trust-region); cost-aware termination policy under a fixed per-generation budget.
§ 2.2

Chained optimization and stage selection

  • Chained pipelinesChainedOptimizationFunction runs a sequence of optimizers on the same candidate, each starting from the previous one's output. A typical chain runs a coarse method first to escape a bad initialization, then a local method to converge sharply.
  • Per-candidate skip — the optimizer can decline to run on a candidate when the structure makes it ineffective (no constants, constants that the loss does not depend on, candidates that already evaluate to non-finite). The skip is structural, not heuristic.
  • Stage cost vs gain — every stage in the chain has a measurable cost and a measurable gain. The chain definition is what decides which optimizers to run, in which order, with which termination criteria — a research surface in its own right rather than a tuning constant.
  • Open questions — adaptive chain composition per branch; chain-output reuse across generations when structure is unchanged; chain selection conditioned on the dataset's noise model.
§ 2.3

Levenberg-Marquardt and Gauss-Newton, deferred

  • What's tempting — LM/GN exploits the least-squares structure of the loss explicitly. On well-conditioned candidates it is the obvious choice — superlinear convergence, established theory, mature implementations.
  • What's wrong — Gauss-Newton degrades on the bad-structure candidates that dominate symbolic-regression populations. The Jacobian is ill-conditioned or singular, the residual landscape has non-smooth ridges, and the method either stalls or chases noise. LM's damping helps but does not fix the underlying issue.
  • What would unblock — a selective-LM variant that runs only on candidates whose structure passes a curvature test, with a principled fallback for the rest. The decision to defer LM stands until that variant shows a measurable SRBench delta over the current chain.
  • Open questions — the right curvature test (cheap, structure- only, predictive of LM stability); Trust-region variants as a middle ground; conditional-LM-as-final-stage of an existing chain.
§ 2.4

Structural optimization, adjacent

  • Pruning — a structural-optimization pass that removes nodes whose contribution to the loss is below a threshold. Distinct from constant optimization because it changes the structure, but interacts with it: pruned candidates often need a re-optimization pass on the smaller remaining structure.
  • Function substitution — swapping a function node for a simpler equivalent when constant optimization makes the simpler form provably superior. The kind of move a fixed simplification rule cannot make because the equivalence depends on the optimized constants.
  • Why it lives next to constant optimization — both run inside the same refinement pipeline stage. Both shrink candidates. They are separate research threads only because the methods differ; the surrounding machinery and the cost story are shared.
  • Open questions — interleaving policy with constant optimization (when does pruning happen, before or after, or both); integration with the e-graph rewriter (which finds equivalences but does not pick a representative under a numerical objective).
§ 3   Why it matters

Three consequences of this design

No structure-vs-constants tradeSelection sees candidates whose constants are already tuned. A bad-structure candidate that would have looked good with the right constants gets the right constants and proves it one way or the other; selection's signal is structural, not noisy.
Optimizer choice is data, not codeThe optimizer chain is configurable per run. Adding a new optimizer is implementing one interface and dropping it into a chain — not editing every place that does numerical work.
Methods that fail get caughtA method is added when an SRBench delta says it earns its slot, and rejected when the delta says it doesn't. LM is the standing example: theoretically obvious, empirically a wash on the candidate distribution, deferred until that changes.
§ 4   Notes & references

Related material

The pipeline view is described under Technology § 6; constant optimization is the third stage in the five-stage diagram (Figure 3). The note Every Ufinq individual is already a local optimum describes the design contract that the optimizer is part of.

Adjacent areas: Symbolic refinement § 2.2 describes the canonicalization that runs alongside constant optimization; Approximate evaluation § 2.4 describes the variable-cost-evaluation story that constant optimization dominates for many candidates.

Read

Continue reading

Other research areas are listed under Research. Published work lives on Papers; short-form notes on Notes.

Back to Research areas