Machine Learning and Data-Driven Model Discovery



Motivation

A central ambition of modern applied mathematics and mathematical mechanics is to discover governing laws directly from data — to extract compact, interpretable mathematical structures from observed time series without imposing a pre-specified model form. This is particularly compelling for biological systems, where the underlying functional principles are often only partially known, and where the available data are noisy, sparse, and difficult to collect.

A concrete and biologically meaningful example is plant gravitropism: the slow reorientation of a plant shoot toward the vertical in response to gravity. Phenomenological models exist, but the precise functional form of the cost — or energy — that the plant is effectively minimizing through its growth and cellular reorganization has not been derived from first principles. It must be inferred from observation.

This research line addresses this challenge by combining gradient-flow mechanics with symbolic regression via genetic programming, to recover latent cost functions from time-series data. The work is carried out within the PRIN 2022 project DISCOVER (Data-drIven diSCOvery of latent Variable-modElled Relations), funded by the Italian Ministry of University and Research (€277k, 2024–2026), for which I serve as Principal Investigator.


Details

The gradient-flow framework and plant gravitropism

The starting point is a coupled fast–slow mechanical model for the orientation of a plant shoot growing in a tilted vase. The plant shoot is modeled as a rigid rod whose actual orientation θ is set quasi-instantaneously by mechanical equilibrium:

\[\theta - \varepsilon\,\sin(\theta + \alpha) = \theta_0,\]

where α is the tilt of the vase, θ₀ is the plant’s preferred orientation (an internal, slowly evolving variable), and ε = LW/K is the ratio of gravitational to elastic restoring moment.
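As an illustration of how the equilibrium relation can be inverted numerically, the sketch below solves it for θ with Newton's method. The function name `solve_orientation`, the tolerances, and the sample values are hypothetical and not taken from the paper. The sketch assumes ε is well below 1, so that the left-hand side is strictly increasing in θ and the solution is unique.

```python
import math

def solve_orientation(theta0: float, alpha: float, eps: float,
                      tol: float = 1e-12, max_iter: int = 50) -> float:
    """Solve theta - eps*sin(theta + alpha) = theta0 for theta (Newton's method).

    Assumes eps < 1, so the residual is strictly increasing in theta.
    """
    theta = theta0  # exact solution when eps == 0
    for _ in range(max_iter):
        residual = theta - eps * math.sin(theta + alpha) - theta0
        slope = 1.0 - eps * math.cos(theta + alpha)  # >= 1 - eps > 0
        step = residual / slope
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Illustrative values only (not from the paper).
theta = solve_orientation(theta0=0.1, alpha=0.3, eps=0.2)
print("theta =", theta)
print("residual =", theta - 0.2 * math.sin(theta + 0.3) - 0.1)
```

The equation has the same structure as Kepler's equation, so other standard root-finders would serve equally well here.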

The slow dynamics of θ₀ are governed by a gradient-flow remodeling law:

\[\tau\,\dot{\theta}_0 = -\frac{\partial F(\theta, \alpha)}{\partial \theta},\]

where F(θ, α) is an unknown scalar cost function. Mechanically, the plant adjusts its preferred orientation as if attempting to minimize F with respect to θ, without accounting for the mechanical forces acting on it — it senses orientation, not force. The canonical gravitropic cost, F_α(θ) = −cos(θ + α), reproduces the classical sine law of gravitropism and has its minimum at θ = −α, corresponding to vertical posture.
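The coupled fast-slow structure can be sketched as a small forward simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: it takes the canonical cost F_α(θ) = −cos(θ + α), a constant tilt α, explicit Euler time stepping, and illustrative values for ε, τ, and the step size. The function names are hypothetical.

```python
import math

def equilibrium_theta(theta0, alpha, eps, iters=100):
    """Fixed-point iteration for theta - eps*sin(theta + alpha) = theta0.

    The map is a contraction when eps < 1 (assumed here).
    """
    theta = theta0
    for _ in range(iters):
        theta = theta0 + eps * math.sin(theta + alpha)
    return theta

def simulate(theta0_init, alpha, eps=0.2, tau=1.0, dt=0.01, n_steps=2000):
    """Explicit Euler integration of tau * dtheta0/dt = -dF/dtheta,
    with the canonical cost F = -cos(theta + alpha)."""
    theta0 = theta0_init
    history = []
    for _ in range(n_steps):
        theta = equilibrium_theta(theta0, alpha, eps)  # fast variable
        history.append(theta)
        dF_dtheta = math.sin(theta + alpha)            # derivative of -cos(theta + alpha)
        theta0 += dt * (-dF_dtheta / tau)              # slow remodeling step
    return history

trajectory = simulate(theta0_init=0.0, alpha=0.5)
print("final theta:", trajectory[-1], "target -alpha:", -0.5)
```

With these settings the simulated orientation relaxes toward θ = −α, the minimum of the canonical cost.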

The key scientific question is: can this cost function be recovered from time-series observations of θ(t) and α(t) alone, without any prior knowledge of its functional form?

Symbolic regression via genetic programming

To answer this question, the paper develops a genetic-programming (GP) framework in which candidate cost functions are represented as symbolic expression trees built from elementary operations (addition, multiplication, exponentiation, sin, cos, exp, log) and the variables θ and α. The algorithm evolves a population of candidate expressions through selection, crossover, and mutation, scoring each candidate by the trajectory-prediction error it induces.
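To make the representation concrete, the following sketch encodes candidate cost functions as nested tuples and implements evaluation, a node count, and a simple subtree mutation. It is illustrative only: it uses a subset of the operations listed above (addition, multiplication, sin, cos), and the encoding, function names, and mutation probability are assumptions rather than the paper's actual design.

```python
import math
import random

UNARY = {"sin": math.sin, "cos": math.cos}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def evaluate(tree, theta, alpha):
    """Evaluate an expression tree at given values of theta and alpha."""
    if tree == "theta":
        return theta
    if tree == "alpha":
        return alpha
    if isinstance(tree, (int, float)):
        return float(tree)
    op, *args = tree
    vals = [evaluate(a, theta, alpha) for a in args]
    return UNARY[op](vals[0]) if op in UNARY else BINARY[op](*vals)

def size(tree):
    """Number of nodes, usable as a simple complexity measure."""
    if isinstance(tree, tuple):
        return 1 + sum(size(a) for a in tree[1:])
    return 1

def random_tree(rng, depth):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["theta", "alpha"])
    op = rng.choice(list(UNARY) + list(BINARY))
    arity = 1 if op in UNARY else 2
    return (op,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))

def mutate(tree, rng, p=0.2):
    """Subtree mutation: each node is replaced by a fresh subtree with probability p."""
    if rng.random() < p:
        return random_tree(rng, depth=2)
    if isinstance(tree, tuple):
        return (tree[0],) + tuple(mutate(a, rng, p) for a in tree[1:])
    return tree

# The canonical gravitropic cost -cos(theta + alpha) as a tree.
canonical = ("mul", -1.0, ("cos", ("add", "theta", "alpha")))
print(evaluate(canonical, 0.2, -0.2), size(canonical))
print(mutate(canonical, random.Random(0)))
```

Crossover would follow the same pattern, swapping randomly chosen subtrees between two parents.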

A key technical ingredient is a coefficient tuning step that runs at every generation. Each candidate expression is algebraically expanded into an additive decomposition:

\[F(\theta, \alpha) = \sum_{m=1}^{M} s_m\, T_m(\theta, \alpha),\]

and the scalar coefficients sₘ are optimized by ridge-regularized least squares on a pool of data snapshots, using the measured remodeling rates as regression targets. Because the remodeling law is linear in the coefficients sₘ, this fit is a linear problem. The hybrid approach separates structural search (handled by GP) from parameter estimation (handled by linear algebra), which substantially reduces the effective search space.
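The coefficient fit can be sketched as follows. Inserting the additive decomposition into the remodeling law gives τ θ̇₀ = −Σ sₘ ∂Tₘ/∂θ, so each snapshot contributes one row of features −∂Tₘ/∂θ and one target τ θ̇₀. The code below is a minimal standard-library sketch of that idea, not the paper's implementation. The two-term library, the snapshot grid, the noiseless targets, and the regularization weight `lam` are all illustrative assumptions.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(features, targets, lam=1e-8):
    """Solve (A^T A + lam*I) s = A^T y for the coefficient vector s."""
    m = len(features[0])
    gram = [[sum(row[i] * row[j] for row in features) + (lam if i == j else 0.0)
             for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * y for row, y in zip(features, targets)) for i in range(m)]
    return solve_linear(gram, rhs)

# Hypothetical library: T1 = cos(theta + alpha), T2 = theta**2.
# Feature rows hold -dT_m/dtheta, i.e. [sin(theta + alpha), -2*theta].
snapshots = [(0.1 * i - 0.5, 0.2 * j - 0.4) for i in range(11) for j in range(5)]
features = [[math.sin(th + al), -2.0 * th] for th, al in snapshots]
# Noiseless synthetic rates from the canonical cost F = -cos(theta + alpha), tau = 1.
targets = [-math.sin(th + al) for th, al in snapshots]

coeffs = ridge_fit(features, targets)
print("fitted coefficients:", coeffs)
```

On this noiseless example the fit should assign a coefficient close to −1 to the cosine term and close to 0 to the spurious quadratic term. Terms that do not depend on θ have zero gradient and therefore cannot be identified by this kind of regression.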

Fitness is measured by the mean normalized root-mean-square error (nRMSE) of the predicted trajectories on a held-out test set, penalized by a term proportional to expression-tree complexity to favor compact, interpretable laws.
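A fitness of this kind could look like the sketch below. The normalization of the RMSE (here, by the range of the observed trajectory), the use of node count as the complexity measure, and the penalty weight are assumptions made for illustration. The source does not specify them.

```python
import math

def nrmse(predicted, observed):
    """RMSE normalized by the range of the observed signal (assumed convention)."""
    span = max(observed) - min(observed)
    if span == 0:
        raise ValueError("observed trajectory is constant; range normalization undefined")
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    return math.sqrt(mse) / span

def fitness(predicted_trajs, observed_trajs, n_nodes, penalty=1e-3):
    """Mean nRMSE over test trajectories plus a complexity penalty (lower is better)."""
    errors = [nrmse(p, o) for p, o in zip(predicted_trajs, observed_trajs)]
    return sum(errors) / len(errors) + penalty * n_nodes

# Toy trajectories, purely illustrative.
observed = [[0.0, 1.0, 2.0]]
predicted = [[1.0, 2.0, 3.0]]
print(fitness(predicted, observed, n_nodes=6, penalty=0.01))
```

The penalty weight sets the trade-off between accuracy and compactness: a larger value favors shorter expressions at the cost of a worse trajectory fit.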

Results and robustness

The framework is validated through Monte Carlo simulations on synthetic datasets generated by the forward model under randomized stimuli, with both process noise (Ornstein–Uhlenbeck torque perturbations) and measurement noise (additive Gaussian errors on the angle signal).
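The two noise sources can be generated along the following lines. This is a sketch under stated assumptions: the Ornstein–Uhlenbeck process is discretized with the Euler–Maruyama scheme and parameterized so that `sigma_p` is its stationary standard deviation, and the correlation time, step size, and function names are hypothetical. How the perturbation enters the torque balance is not shown.

```python
import math
import random

def ou_path(n_steps, dt, sigma_p, corr_time, rng):
    """Euler-Maruyama sample path of an Ornstein-Uhlenbeck process started at 0.

    Parameterized so that the stationary standard deviation is sigma_p (assumed).
    """
    x = 0.0
    path = []
    for _ in range(n_steps):
        path.append(x)
        drift = -x / corr_time * dt
        diffusion = sigma_p * math.sqrt(2.0 * dt / corr_time) * rng.gauss(0.0, 1.0)
        x += drift + diffusion
    return path

def add_measurement_noise(signal, sigma_m, rng):
    """Add independent zero-mean Gaussian errors of standard deviation sigma_m."""
    return [s + rng.gauss(0.0, sigma_m) for s in signal]

rng = random.Random(42)  # fixed seed so the example is reproducible
torque_noise = ou_path(n_steps=1000, dt=0.01, sigma_p=0.05, corr_time=1.0, rng=rng)
clean_angle = [0.5 * math.exp(-0.01 * k) for k in range(1000)]  # placeholder signal
noisy_angle = add_measurement_noise(clean_angle, sigma_m=0.02, rng=rng)
print(len(torque_noise), len(noisy_angle))
```

Setting `sigma_p` or `sigma_m` to zero switches the corresponding noise source off, which reproduces the noise-free corner of the configuration grid.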

Across 30 independent trials per noise configuration, covering five measurement-noise levels (σ_m ∈ {0, 0.01, 0.02, 0.05, 0.10}) and three process-noise levels (σ_p ∈ {0, 0.05, 0.10}), the algorithm successfully decouples the deterministic governing law from the stochastic background: it recovers the physical signal rather than fitting the noise.

Context and outlook

This work sits at the intersection of gradient-flow mechanics, inverse problems, and machine learning. The methodology is not specific to plant gravitropism: any biological or physical system whose observable dynamics arise from a latent cost function of gradient-flow type is a potential target. Future directions include applications to experimental time-lapse imaging data of real plant shoots, extension to three-dimensional dynamics (circumnutation), and recovery of constitutive laws in soft materials (viscoelastic and poroelastic systems) — the broader scientific mission of the DISCOVER project.


Papers


Grant

This research is supported by the Italian Ministry of University and Research (MUR) through the PRIN 2022 grant DISCOVER (Data-drIven diSCOvery of latent Variable-modElled Relations), prot. 2022XKBCZB, total budget €277k, period 2024–2026. Principal Investigator: Giuseppe Tomassetti (Roma Tre University).