§ 2.7  ·  Research area

Compositional search

Building large models by composing smaller ones — divide-and-conquer in symbolic regression, sub-model assembly, and the seams between them.

§ 1   Why compositional search

Models are made of smaller models, and the search has to know that

A symbolic regression candidate that fits real-world data is rarely one monolithic expression. It is usually a composition: a forest of weak learners, a linear transformation over engineered features, a small neural-style structure, or a hand-shaped combination of named submodels. Treating the search as if every candidate were a single tree of arithmetic discards structure that the search could be using as a prior.

Ufinq treats composition as a first-class concept. A CompositionFunction is the object that draws a candidate from the universal function space. It is not a procedural detail but a named family that the search chooses among. The same primitives let a candidate be assembled from named submodels in a model library, or be recombined with another candidate at the level of compositions rather than individual nodes. The research question is what to compose, with what, and when.
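
As a rough illustration of a composition family as a named, drawable object, the following Python sketch defines a hypothetical CompositionFunction protocol with a single draw method and one member, a depth-limited random composition. The method name, the tuple encoding of trees, the operator set, and the 0.3 leaf probability are all assumptions made for this example; the text above does not specify Ufinq's actual interface.

```python
import random
from typing import Protocol, Union

# Assumed encoding: a tree is ("op", left, right) or a leaf string such as "x".
Tree = Union[tuple, str]
BINARY_OPS = ("add", "sub", "mul")
LEAVES = ("x", "1")


class CompositionFunction(Protocol):
    """Hypothetical interface: a named family that can draw a candidate."""

    name: str

    def draw(self, rng: random.Random) -> Tree: ...


class RandomComposition:
    """Baseline family: an arbitrary tree under one structural constraint."""

    name = "random"

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth

    def draw(self, rng: random.Random, depth: int = 0) -> Tree:
        # Stop at the depth limit, or early with an assumed probability of 0.3.
        if depth >= self.max_depth or rng.random() < 0.3:
            return rng.choice(LEAVES)
        op = rng.choice(BINARY_OPS)
        return (op, self.draw(rng, depth + 1), self.draw(rng, depth + 1))


def depth(tree: Tree) -> int:
    """Number of operator levels above the deepest leaf."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


# The search would pick among families by name; only one is registered here.
families = {f.name: f for f in (RandomComposition(max_depth=3),)}
candidate = families["random"].draw(random.Random(7))
```

Keeping the family behind a name is what would let a strategy compare families against each other instead of hard-coding one tree generator.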

§ 2   Sub-areas

Four threads of work

§ 2.1

Composition templates

  • Random composition — the baseline; draws an arbitrary tree from the function space under structural constraints. Useful as a null hypothesis the others have to beat.
  • Forest composition — an ensemble of independently composed sub-trees combined by a fixed aggregator. The search optimizes the ensemble jointly, not each tree in isolation.
  • Linear-transformation composition — a learned linear transform over a set of feature expressions. Captures the case where the right answer is "a small number of nonlinear features, then linear regression".
  • Neuronal and support-vector compositions — templates inspired by neural networks and SVMs respectively. The structure is fixed; the search tunes the inner expressions and the gating.
  • Model composition — a candidate that wraps a named Model as one of its components. The wrapper is evolved; the wrapped model is treated as an opaque function.
  • Open questions — automatic template selection per dataset; template-conditional search-space priors; mixed-template populations (different templates competing under one strategy).
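
To make two of the templates above concrete, here is a minimal Python sketch of a forest composition and a linear-transformation composition scored on the same toy dataset. The class names, the mean aggregator, and the hand-set weights (standing in for a learned transform) are assumptions for illustration, not Ufinq's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Assumed stand-in for an evolved inner expression.
Expr = Callable[[Sequence[float]], float]


@dataclass(frozen=True)
class ForestComposition:
    """Sub-trees combined by a fixed aggregator (assumed here: the mean)."""

    trees: Tuple[Expr, ...]

    def __call__(self, x: Sequence[float]) -> float:
        return sum(tree(x) for tree in self.trees) / len(self.trees)


@dataclass(frozen=True)
class LinearTransformComposition:
    """Weights and bias applied over a set of feature expressions."""

    features: Tuple[Expr, ...]
    weights: Tuple[float, ...]
    bias: float = 0.0

    def __call__(self, x: Sequence[float]) -> float:
        weighted = sum(w * f(x) for w, f in zip(self.weights, self.features))
        return self.bias + weighted


def mse(model: Expr, rows, targets) -> float:
    """Error of the whole composition, not of any single inner expression."""
    return sum((model(r) - y) ** 2 for r, y in zip(rows, targets)) / len(rows)


# Toy target: y = 3 * x0^2 + 2 * x1 on a small grid.
rows = [(float(a), float(b)) for a in range(-2, 3) for b in range(-2, 3)]
targets = [3 * a * a + 2 * b for a, b in rows]

inner = (lambda x: x[0] ** 2, lambda x: x[1])
forest = ForestComposition(trees=inner)
linear = LinearTransformComposition(features=inner, weights=(3.0, 2.0))

scores = {
    "forest": mse(forest, rows, targets),
    "linear": mse(linear, rows, targets),
}
best_template = min(scores, key=scores.get)
```

Both templates share the same inner expressions, so the difference in score comes only from how the template combines them, which is the sense in which a template acts as a prior.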

§ 2.2

Sub-model assembly from a library

  • Modelbase — a content-addressable store of named Models. Models that have been validated on past problems live here and can be referenced by future searches as building blocks.
  • Model discovery — given a position in a candidate and a Modelbase, propose stored models that fit that position. Cheap-to-evaluate similarity surrogates (signature-based, behavior-based) keep the discovery pass fast enough to run in the inner loop.
  • Model substitution — splice a discovered submodel into a candidate at the position where it fits, replacing whatever random subexpression was there. The substitution is a structural edit, gated by the same refinement pipeline as anything else.
  • Open questions — ranking policy when multiple submodels fit; substitution-cost accounting (a substituted submodel can be arbitrarily large); preventing degenerate runs that converge on "use the named model directly with a thin wrapper".
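
The store, discovery, and substitution steps above can be sketched in a few lines of Python. Everything named here is hypothetical: the SHA-256 key over a canonical source string, the fixed probe inputs used as a behavior-based signature, the squared-distance ranking, and the nested-tuple tree with path addressing are assumptions chosen to keep the example small.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Assumed probe inputs for a behavior-based signature.
PROBES = (-1.0, -0.5, 0.0, 0.5, 1.0)


@dataclass(frozen=True)
class Model:
    name: str
    source: str                      # canonical text, used for addressing
    fn: Callable[[float], float]

    def signature(self) -> Tuple[float, ...]:
        return tuple(self.fn(p) for p in PROBES)


@dataclass
class Modelbase:
    store: Dict[str, Model] = field(default_factory=dict)

    def put(self, model: Model) -> str:
        """Content addressing: the key is a hash of the model's source."""
        key = hashlib.sha256(model.source.encode("utf-8")).hexdigest()
        self.store[key] = model
        return key

    def discover(self, wanted, top_k: int = 1):
        """Rank stored models by squared distance to the wanted behavior."""
        def distance(item):
            return sum((a - b) ** 2 for a, b in zip(item[1].signature(), wanted))
        return sorted(self.store.items(), key=distance)[:top_k]


def substitute(tree: tuple, path: Tuple[int, ...], replacement):
    """Return a copy of tree with the node at path replaced."""
    if not path:
        return replacement
    i = path[0]
    return tree[:i] + (substitute(tree[i], path[1:], replacement),) + tree[i + 1:]


base = Modelbase()
base.put(Model("square", "x*x", lambda x: x * x))
base.put(Model("abs", "abs(x)", abs))

# Behavior observed at the position to fill (noisy, roughly quadratic).
wanted = (1.1, 0.2, 0.0, 0.3, 0.9)
key, match = base.discover(wanted)[0]

candidate = ("add", ("mul", "x", "x"), ("sin", "x"))
spliced = substitute(candidate, (2,), ("model", key))
```

The spliced node stores only the key, so the wrapped model stays opaque to the tree. A real system would still have to decide how such a node is charged in the complexity budget, which is one of the open questions listed above.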

§ 2.3

Combination crossovers

  • Function-mapping combination — recombines two parents by matching their function-mapping structure (which inputs feed which named output) and crossing at the mapping rather than the tree level. Preserves intent across complex compositions where sub-tree splice would lose semantic alignment.
  • Model-mapping combination — the analogue for Model-shaped candidates: parents that wrap submodels recombine those wrappings rather than the inner submodel structure.
  • Family discipline — both operators are part of the Combination crossover family in the broader taxonomy; the adaptive operator bandit weighs them against the other families based on per-branch outcome data.
  • Open questions — combination crossover that can move material across template kinds (forest ↔ linear); preserving the inner expressions' refinement state across recombination; interaction with sub-model substitution within the same generation.
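
One way to picture crossing at the mapping rather than the tree level is the Python sketch below. It assumes a parent is a dictionary from named output to a record holding the set of inputs that feed it plus an opaque inner expression, and it only exchanges material where both parents agree on that input set. The data layout and the alignment rule are illustrative assumptions, not the operator's actual definition.

```python
import random
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Mapping:
    inputs: FrozenSet[str]   # which inputs feed this named output
    expr: str                # inner expression, opaque to the crossover


def function_mapping_crossover(
    a: Dict[str, Mapping], b: Dict[str, Mapping], rng: random.Random
) -> Dict[str, Mapping]:
    """Build a child output by output, crossing only aligned mappings."""
    child = {}
    for output, from_a in a.items():
        from_b = b.get(output)
        aligned = from_b is not None and from_b.inputs == from_a.inputs
        # Aligned: take the whole mapping from either parent.
        # Not aligned: keep parent a's mapping so the wiring stays valid.
        child[output] = rng.choice((from_a, from_b)) if aligned else from_a
    return child


parent_a = {
    "y1": Mapping(frozenset({"x0", "x1"}), "x0 * x1"),
    "y2": Mapping(frozenset({"x2"}), "sin(x2)"),
}
parent_b = {
    "y1": Mapping(frozenset({"x0", "x1"}), "x0 + x1"),
    "y2": Mapping(frozenset({"x0"}), "exp(x0)"),
}

child = function_mapping_crossover(parent_a, parent_b, random.Random(0))
```

In this example y1 may come from either parent because both map the same inputs to it, while y2 always comes from parent_a because the parents disagree on its inputs. No inner expression is ever cut, which is the property a sub-tree splice would not guarantee.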

§ 2.4

Divide-and-conquer for hard targets

  • Sub-problem decomposition — when a regression target decomposes naturally (multi-output, piecewise behavior, regime shifts), the search can solve sub-problems independently and assemble. The decomposition is a first-class object in the search space; the assembly happens through the composition templates.
  • Multi-output models — a single Model can have multiple named outputs, evolved jointly with shared structure. The search balances per-output accuracy against shared-structure regularity.
  • Promotion of sub-results — a sub-search's best output can be promoted into the Modelbase mid-run, becoming available as a substitution target for the rest of the run. This is the bridge between sub-model assembly and divide-and-conquer.
  • Open questions — automatic decomposition when the user does not pre-declare it; promotion-threshold policy; preventing sub-result promotion from collapsing diversity at the outer level.
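
The decompose, solve, promote, and assemble loop described above can be sketched on a toy regime-shift target. In this Python example the split point is pre-declared, the "sub-search" is only a pick from a small fixed pool, and the Modelbase is a plain dictionary; all of these, along with the promotion threshold, are simplifying assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]

# Assumed candidate pool standing in for what a real sub-search would evolve.
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "linear": lambda x: 2.0 * x,
    "square": lambda x: x * x,
    "const": lambda x: 1.0,
}


def mse(fn: Callable[[float], float], points: List[Point]) -> float:
    return sum((fn(x) - y) ** 2 for x, y in points) / len(points)


def sub_search(points: List[Point]) -> Tuple[str, float]:
    """Stand-in for a sub-search: best candidate from the pool and its error."""
    name = min(CANDIDATES, key=lambda n: mse(CANDIDATES[n], points))
    return name, mse(CANDIDATES[name], points)


def solve_piecewise(data, split: float, promote_below: float, modelbase: dict):
    """Solve each regime independently, promote good results, assemble."""
    regimes = {
        "low": [p for p in data if p[0] < split],
        "high": [p for p in data if p[0] >= split],
    }
    chosen = {}
    for regime, points in regimes.items():
        name, err = sub_search(points)
        chosen[regime] = name
        if err < promote_below:
            # Promotion: the sub-result becomes reusable by later searches.
            modelbase[name] = CANDIDATES[name]

    def assembled(x: float) -> float:
        return CANDIDATES[chosen["low" if x < split else "high"]](x)

    return assembled, chosen


def target(x: float) -> float:
    """Toy regime shift: linear below zero, quadratic from zero upward."""
    return 2.0 * x if x < 0 else x * x


data = [(i / 2, target(i / 2)) for i in range(-6, 7)]
modelbase: dict = {}
assembled, chosen = solve_piecewise(data, 0.0, 1e-9, modelbase)
total_error = mse(assembled, data)
```

No single entry in the pool fits the whole range, but the assembled piecewise model does. A low promotion threshold, as used here, promotes only near-exact sub-results; how to set it in general is one of the open questions above.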

§ 3   Why it matters

Three consequences of this design

Templates carry priors
A composition template is a search-space prior: it says this is the kind of model to look for first. Trying many templates against one dataset is a cheap way to find out which family the data is asking for.

Past work is reusable
Models that worked on past problems are building blocks for new searches via the Modelbase. The system remembers what was hard-won and offers it back rather than rediscovering each time from scratch.

Hard problems decompose
Targets that are out of reach for a single-tree search become reachable when the search can spawn sub-problems and assemble. Compositional search is the surface through which divide-and-conquer is expressed.

§ 4   Notes & references

Related material

The system view of how composition fits into a single branch's lifecycle is described under Technology § 1. The function space the templates draw from is described under Technology § 4.

Adjacent areas: Genetic operators § 2.5 describes the broader operator taxonomy that the combination crossovers fit into; Distributed orchestration § 2.1 describes how a divide-and-conquer sub-search is spawned and aggregated across the cluster.

Read

Continue reading

Other research areas are listed under Research. Published work lives on Papers; short-form notes on Notes.
