May 10, 2026
Our paper “A neural operator framework for solving inverse scattering problems”, joint with Victor Chenu and Houssem Haddar, has been available on arXiv for a couple of months.
This post is a short attempt to explain what we did, but also why I find this direction interesting. The paper is about inverse acoustic scattering, neural operators, and the linear sampling method. More broadly, it is about a philosophy of scientific machine learning that I care about: machine learning should not replace reliable numerical algorithms when good mathematical methods already exist. It should help them.
There are two extreme ways of thinking about numerical algorithms in the age of machine learning.
The first extreme is to keep using standard methods exactly as before. This is the safe option. The algorithms are interpretable, connected to the theory, and often come with rigorous mathematical justification. In inverse scattering, the linear sampling method belongs to this world. It is a beautiful and robust method, but it can be computationally expensive in practice.
The second extreme is to replace the whole inverse problem by a neural network. The input is the measured data, the output is an image of the obstacle, and the intermediate mathematics disappears. Once trained, such a model can be very fast. But the training itself may be expensive, the method may require a lot of data, and it is often difficult to say precisely what has been preserved from the original inverse problem.
Our position is in between these two extremes.
We do not want to replace the linear sampling method. We want to keep it. But we also do not want to ignore the fact that some parts of the numerical pipeline are expensive and repetitive. The question is therefore: can we use a neural network only for one carefully chosen part of the algorithm?
The guiding principle is simple: learn only where learning is useful, and only where imperfection is acceptable.
The setting is acoustic scattering. One sends waves toward an unknown object and measures the waves scattered back. The direct problem is: given the obstacle, compute the scattered wave. The inverse problem is the opposite: given measurements of the scattered wave, recover information about the obstacle.
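In symbols, the standard time-harmonic setting looks roughly as follows (conventions vary slightly between references, and the paper's precise formulation may differ). The total field \(u = u^i + u^s\) satisfies the Helmholtz equation outside the obstacle, the scattered field \(u^s\) is radiating, and what one records are far-field patterns:
\[ \Delta u + k^2 u = 0 \quad \text{outside the scatterer}, \qquad u^s(x) = \frac{e^{ik|x|}}{\sqrt{|x|}} \left( u_\infty(\hat x) + O\!\left(\tfrac{1}{|x|}\right) \right) \ \text{in 2D}, \]
where \(k\) is the wavenumber and \(u_\infty\) the far-field pattern. Collecting \(u_\infty\) for many incident and observation directions gives the far-field matrix from which the obstacle must be recovered.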
This is difficult for two reasons. First, the inverse problem is nonlinear. The relation between the shape of the obstacle and the measurements is not simple. Second, it is severely ill-posed. Small perturbations in the data, such as measurement noise, can lead to large changes in the reconstruction.
This is why classical inverse problem methods are valuable. They are not just numerical recipes; they are built around the structure of the equations. They encode mathematical information about the problem.
But they also have bottlenecks. In the linear sampling method, one of the main practical issues is regularization.
The linear sampling method is a classical method for reconstructing obstacles from far-field measurements.
Very roughly, the method probes the domain point by point. For each sampling point \(z\), one solves a regularized linear problem involving the far-field operator. The result is an indicator function: its values tell us whether \(z\) is likely to be inside or outside the scatterer.
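In formulas, and with the usual conventions (which may differ in detail from the paper's notation), the linear problem at a sampling point \(z\) is the far-field equation
\[ (F g_z)(\hat x) = \Phi_\infty(\hat x, z), \]
where \(F\) is the far-field operator assembled from the data and \(\Phi_\infty(\cdot, z)\) is the far-field pattern of a point source placed at \(z\). A common choice of indicator is then \(z \mapsto 1/\|g_z\|\): the norm of the regularized solution stays moderate when \(z\) lies inside the scatterer and blows up when \(z\) lies outside.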
This is attractive because the method does not require a parametrization of the obstacle. One does not need to assume in advance that the scatterer is a disk, an ellipse, or a polygon. The method produces a qualitative reconstruction directly from the data.
However, the regularization parameter matters. In the standard approach, one often uses Tikhonov regularization together with Morozov’s discrepancy principle. This is robust and mathematically meaningful, but it requires knowledge of the noise level and can be computationally expensive.
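Written out, the standard recipe (in one common form; the paper's exact variant may differ) solves, for each sampling point \(z\),
\[ g_z^{\alpha} = \arg\min_{g}\; \|F^\delta g - \Phi_\infty(\cdot, z)\|^2 + \alpha \|g\|^2, \qquad \text{with } \alpha = \alpha(z) \text{ chosen so that } \|F^\delta g_z^{\alpha} - \Phi_\infty(\cdot, z)\| = \delta\, \|g_z^{\alpha}\|, \]
where \(\delta\) is the noise level. Morozov's principle therefore requires knowing \(\delta\) and running a one-dimensional search over \(\alpha\) at every sampling point, which is precisely the repetitive cost one would like to avoid.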
This is the point where we insert learning.
The key idea of the paper is to use a neural operator to help regularize the linear sampling method.
This is not the same as training a neural network to solve the full inverse problem. The neural network does not produce the final reconstruction by itself. Instead, it produces information that is used inside the classical method.
This distinction is important.
Regularization does not need to be perfect. Its role is to guide the inversion process, not to solve the entire problem alone. If the true obstacle is a kite or an ellipse, the learned regularization does not need to be a perfect kite or a perfect ellipse. It only needs to provide useful spatial information about where the obstacle is likely to be.
This is why the approach can be frugal. We train using simple shapes, in particular circles. Circles are easy to generate, the corresponding scattering data can be computed analytically, and the training set can remain small and high-quality. The network learns a useful regularization pattern, not a complete inverse solver.
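To make the phrase “computed analytically” concrete, here is a small sketch of how a far-field matrix for a sound-soft disk can be assembled from the classical Bessel/Hankel series. This is my own illustration, not code from the paper: the Dirichlet boundary condition, the far-field normalization, and the noise model are assumptions and may differ from the setup actually used.

```python
import numpy as np
from scipy.special import jv, hankel1

def farfield_matrix_disk(k, radius, n_dirs=32, n_modes=30):
    """Far-field matrix of a sound-soft disk of the given radius centered at the
    origin, assembled from the classical Bessel/Hankel series.
    Rows: observation directions, columns: incident directions."""
    theta = 2 * np.pi * np.arange(n_dirs) / n_dirs        # observation angles
    phi = theta                                            # incident angles (same grid)
    n = np.arange(-n_modes, n_modes + 1)
    # Series coefficients for the Dirichlet (sound-soft) disk.
    coeff = -jv(n, k * radius) / hankel1(n, k * radius)
    # Normalization for the 2D convention u^s ~ e^{ikr}/sqrt(r) * u_inf (may differ elsewhere).
    gamma = np.sqrt(2 / (np.pi * k)) * np.exp(-1j * np.pi / 4)
    angle_diff = theta[:, None] - phi[None, :]             # (n_dirs, n_dirs)
    # u_inf(theta; phi) = gamma * sum_n coeff_n * exp(i * n * (theta - phi))
    return gamma * (np.exp(1j * angle_diff[..., None] * n) @ coeff)

# Example training sample: a disk of radius 0.5 at wavenumber k = 2*pi, plus 2% noise.
F = farfield_matrix_disk(k=2 * np.pi, radius=0.5)
noise = np.random.randn(*F.shape) + 1j * np.random.randn(*F.shape)
F_noisy = F + 0.02 * np.linalg.norm(F) * noise / np.linalg.norm(noise)
```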
Then this learned regularization is injected into the linear sampling method. The final reconstruction still comes from the classical algorithm. The neural network helps with one part of the pipeline, but the mathematical backbone remains the linear sampling method.
This is perhaps the main message of the paper:
We train the network not to solve the inverse problem, but to help regularize a method that already knows how to solve it.
More concretely, the input to the learned model is the noisy far-field matrix. From it, we learn two quantities.
First, a neural operator produces a preliminary indicator function. This indicator gives a rough estimate of the location and shape of the obstacle. Again, it does not need to be perfect. It is only used as a guide.
Second, another neural network estimates the noise level in the data. This is important because classical regularization strategies often require information about the noise level.
The two learned outputs are then combined to define a regularization function for the linear sampling method. In simplified notation, one may think of it as
\[ \alpha_\theta[F^\delta](z) = \delta_\theta[F^\delta]\, I_\theta[F^\delta](z), \]
where \(F^\delta\) is the noisy far-field matrix, \(I_\theta\) is the learned indicator, and \(\delta_\theta\) is the learned noise estimate.
This regularization function \(\alpha_\theta\) is then used in the linear sampling method. The final reconstruction is therefore not just the output of a neural network. It is the output of a classical inverse scattering method whose regularization has been informed by learning.
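As a rough sketch of how such a spatially varying regularization could enter the final solve, here is a minimal Tikhonov-based version of the sampling loop. It is an illustration, not the paper's implementation; `alpha_theta` and `rhs` are placeholders for the learned regularization function and the discretized point-source right-hand sides.

```python
import numpy as np

def lsm_indicator(F_noisy, sampling_points, alpha_theta, rhs):
    """Linear sampling indicator with a spatially varying Tikhonov parameter.

    F_noisy:         (n, n) noisy far-field matrix
    sampling_points: (m, 2) array of probing points z
    alpha_theta:     callable z -> regularization value (here, the learned alpha_theta)
    rhs:             callable z -> (n,) discretized point-source far field Phi_inf(., z)
    """
    # Tikhonov via the normal equations: (alpha I + F* F) g = F* Phi_inf(., z).
    FhF = F_noisy.conj().T @ F_noisy
    eye = np.eye(F_noisy.shape[0])
    indicator = np.empty(len(sampling_points))
    for i, z in enumerate(sampling_points):
        b = F_noisy.conj().T @ rhs(z)
        g = np.linalg.solve(alpha_theta(z) * eye + FhF, b)
        indicator[i] = 1.0 / np.linalg.norm(g)  # tends to be larger inside the scatterer
    return indicator
```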
The neural operator we use is a DeepONet with a fixed radial-basis-function trunk.
The branch network reads the far-field matrix and produces coefficients. The trunk gives a fixed spatial basis, made of radial basis functions, in which the indicator function is represented.
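A minimal sketch of such a branch-plus-fixed-trunk design, in PyTorch, might look as follows. The sizes, activations, and normalizations here are hypothetical and chosen for illustration; the actual architecture in the paper may differ.

```python
import torch
import torch.nn as nn

class RBFDeepONet(nn.Module):
    """Branch network on the far-field matrix + fixed radial-basis-function trunk."""

    def __init__(self, n_dirs, centers, width):
        super().__init__()
        # Trunk: fixed Gaussian RBFs centered at a grid of points (no trainable parameters).
        self.register_buffer("centers", centers)           # (p, 2) tensor of RBF centers
        self.width = width
        # Branch: maps the flattened, real/imag-stacked far-field matrix to p coefficients.
        self.branch = nn.Sequential(
            nn.Linear(2 * n_dirs * n_dirs, 256), nn.Tanh(),
            nn.Linear(256, 256), nn.Tanh(),
            nn.Linear(256, centers.shape[0]),
        )

    def trunk(self, z):
        # z: (m, 2) sampling points -> (m, p) fixed RBF features.
        d2 = ((z[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return torch.exp(-d2 / (2 * self.width ** 2))

    def forward(self, F_noisy, z):
        # F_noisy: (batch, n_dirs, n_dirs) complex; z: (m, 2) sampling points.
        x = torch.cat([F_noisy.real, F_noisy.imag], dim=-1).flatten(1)
        coeffs = self.branch(x)                             # (batch, p)
        basis = self.trunk(z)                               # (m, p)
        return coeffs @ basis.T                             # (batch, m) indicator values
```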
This design is intentionally simple. We are not trying to build the largest possible architecture. We want a model that is expressive enough to be useful, but structured enough to remain understandable.
A nice feature of the construction is that the network is essentially determined by the wavelength. Once the wavelength is fixed, we choose a spatial discretization scale, for instance \(h=\lambda/2\) or \(h=\lambda/4\). This then determines much of the architecture and training setup: the number of radial basis functions, the number of training samples, and the quadrature size.
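As a purely illustrative example of this scaling (the numbers here are hypothetical, not taken from the paper): with wavelength \(\lambda = 1\), a square sampling region of side \(4\lambda\), and \(h = \lambda/2\), one obtains
\[ \left( \frac{4\lambda}{h} + 1 \right)^2 = 9^2 = 81 \]
RBF centers in the trunk; refining to \(h = \lambda/4\) raises this to \(17^2 = 289\), and the number of training samples and the quadrature size scale along with it.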
This is quite different from the usual trial-and-error approach to neural network design. We are not tuning a large architecture by hand until it works. The geometry and physics of the scattering problem determine the scale at which the network should operate.
There remains essentially one important architectural parameter: the width of the radial basis functions in the trunk. This controls how much neighboring basis functions overlap. If the width is too small, the basis is too localized; if it is too large, the representation becomes too smooth.
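Concretely, one can picture Gaussian trunk functions (the Gaussian profile is an assumption for illustration; other radial profiles are possible):
\[ \varphi_j(z) = \exp\!\left( -\frac{\|z - z_j\|^2}{2\sigma^2} \right), \]
where the centers \(z_j\) sit on the grid fixed by \(h\) and \(\sigma\) is the width in question. The ratio \(\sigma/h\) governs how strongly neighboring basis functions overlap.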
In the paper, we use an NTK analysis to guide this choice.
The neural tangent kernel, or NTK, is one way of analyzing the training dynamics of neural networks. Very roughly, it describes how the network changes during training in a regime where the dynamics can be approximated by a kernel method.
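For a scalar network \(f_\theta\) with parameters \(\theta\), the empirical NTK is the kernel
\[ \Theta(x, x') = \nabla_\theta f_\theta(x) \cdot \nabla_\theta f_\theta(x'), \]
and in the wide-network regime, gradient descent on a squared loss behaves approximately like kernel regression with \(\Theta\). The spectrum of this kernel then indicates which features of the target are learned quickly and which are learned slowly.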
For us, the NTK is not used as an abstract theoretical decoration. It has a practical purpose: it helps understand the effect of the radial-basis-function width in the trunk.
This is exactly the kind of analysis I would like to see more often in scientific machine learning. If a neural component is inserted into a numerical method, its design should not be arbitrary. The architecture should be constrained by the mathematical problem, and the remaining parameters should be chosen using analysis whenever possible.
The second learned ingredient is the noise level.
Instead of feeding the full far-field matrix to the noise-estimation network, we feed it the singular values of that matrix. This is natural for two reasons.
First, the singular values contain useful information about the noise level. In ill-posed inverse problems, noise often appears as a plateau in the singular spectrum. Second, they are invariant under translations of the obstacle. This is a very useful feature: the noise-estimation network can be trained only on disks of varying radius centered at the origin, and still generalize to translated obstacles.
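A minimal sketch of this preprocessing step is below. The log rescaling is my own choice for readability of the noise plateau, and `noise_net` is a hypothetical placeholder for the small noise-estimation network; the paper may normalize or truncate the spectrum differently.

```python
import numpy as np

def spectral_features(F_noisy):
    """Singular values of the noisy far-field matrix, used as input features
    for the noise-estimation network. Singular values are invariant under
    translations of the obstacle, so training on centered disks can still
    generalize to translated scatterers."""
    s = np.linalg.svd(F_noisy, compute_uv=False)   # sorted, nonnegative
    return np.log(s + 1e-16)                       # log scale makes the noise plateau visible

# features = spectral_features(F_noisy)
# delta_hat = noise_net(features)   # hypothetical small MLP returning the estimated noise level
```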
Again, the point is not to use a neural network blindly. The input representation is chosen because it matches the structure of the inverse problem.
The numerical experiments are in two dimensions.
The main observation is that the learned regularization can significantly accelerate the linear sampling method workflow while producing accurate reconstructions. After training, the neural networks provide the regularization information quickly, avoiding the repeated parameter search associated with Morozov’s discrepancy principle.
For single obstacles, the method gives good reconstructions. For multiple obstacles, the learned indicator alone can be less precise, which is not surprising since the training data are deliberately simple. But once this imperfect indicator is used inside the linear sampling method, the final reconstruction improves substantially.
This is an important point. The neural network output by itself is not the final product. It is allowed to be imperfect because the classical method corrects and refines it.
That is the advantage of the hybrid approach.
There is also a practical aspect that matters a lot to me.
I am not especially interested in methods that require enormous datasets, huge networks, and specialized hardware just to reproduce a result. Of course, large models have their place. But in computational mathematics, there is a lot of value in methods that are small enough to understand, fast enough to experiment with, and simple enough to modify.
This is close in spirit to Nick Trefethen’s idea of “ten digit algorithms”: numerical programs that aim for “ten digits, five seconds, and just one page.” The exact numbers are not the point. The point is the attitude. A good computational idea should be accurate, fast enough to be explored interactively, and simple enough to be read, modified, and understood.
I like to think of this as interactive scientific machine learning: models that are accurate enough to be scientifically useful, fast enough to be experimented with in real time, simple enough to be understood and modified by the person using them, and small enough to remain, at least potentially, amenable to mathematical analysis.
This last point is important. Small models are not automatically understandable, and large models are not automatically mysterious. But if the learned component is modest in size and has a clear role inside a numerical pipeline, then one has a better chance of analyzing it. One can ask what it represents, how it trains, how errors propagate, and how it interacts with the rest of the algorithm.
Trefethen emphasizes that when a program runs in a few seconds, one can adjust it, improve it, and experiment with it almost effortlessly. Scientific exploration becomes interactive. The one-page constraint plays a similar role: a short program can be studied, communicated, and refined. This is a philosophy of computation that I find very appealing. It keeps humans in the loop.
I think scientific machine learning should learn something from this. A good hybrid method should not require a gigantic model and a massive dataset whenever the underlying numerical problem already contains structure. Ideally, a student should be able to run experiments on a laptop, change a parameter, see what happens, and develop intuition.
This is not just a matter of convenience. It changes the way research is done. Fast and lightweight tools allow interactive scientific exploration. You can test an idea, see it fail, adjust it, and try again.
In this sense, interactivity is not only a computational convenience. It is part of the scientific method.
What I like about this project is that it sits in a middle ground.
It is not classical inverse problems as usual, because learning genuinely enters the algorithm. But it is not black-box AI either, because the learned components are embedded into a mathematical method with a clear purpose.
The neural network is not asked to solve the whole problem. It is asked to do something more modest, and more useful: provide a regularization function for the linear sampling method.
Thanks to Victor and Houssem for a very enjoyable collaboration. I am especially happy that this project connects naturally with the broader research direction I hope to develop in the coming years: making hybrid scientific machine learning more predictable, more interactive, and more useful for computational science.