I want to train a very basic feed-forward neural network to learn a nonlinear operator, that is, I want it to learn a function of functions.
For the sake of simplicity, let's say this operator maps a function f: R¹ → R¹ to another function g: R¹ → R¹. The nonlinear operator is defined as y(s) = N(f)(s) := ∫₋₁¹ [ f(x) * s + f'(x) * sin(π * s²) * cos(x) ] dx, where s denotes the grid points of the output function y (for simplicity I'm assuming s and x have the same size and the same nodes).
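To make the discretized version concrete, here is one way the operator could be written on a grid. This is a sketch under assumptions: it reads everything before dx as the integrand, and the symbols w_i (quadrature weights), D (a differentiation matrix, so that (D f)_i ≈ f'(x_i)), and n = 31 (number of nodes) are notation introduced here for illustration.

```latex
y(s_j) \;\approx\; \sum_{i=1}^{n} w_i \left[ f(x_i)\, s_j + (D f)_i \,\sin\!\left(\pi s_j^{2}\right) \cos(x_i) \right],
\qquad j = 1, \dots, n .
```

Each sampled input function is then the vector (f(x_1), ..., f(x_n)) and each target is the vector (y(s_1), ..., y(s_n)).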
So for my input functions, I just generate 31 random points, and then have my operator act on them to produce the corresponding "output functions", which serve as target functions. I'll have, say, 1000 samples of those input-output pairs (i.e. 1000 pairs of 31D vectors, or two 31 x 1000 arrays, one storing my input functions and the other storing my output functions).
I'm willing to post more code, e.g. my Gauss-Lobatto-Legendre discretization nodes, weights, and differentiation matrix for the test operator that I want my feed-forward neural network to learn. The issue is that going into that would run well into the length limit of this post (and is probably irrelevant to my main question here).
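For readers who want something runnable in the meantime, here is a minimal sketch of how the two 31 x 1000 arrays could be generated. It is not my actual discretization: the uniform grid, trapezoid weights, and finite-difference matrix below are crude stand-ins for the Gauss-Lobatto-Legendre nodes, weights, and differentiation matrix, and `apply_operator` is a hypothetical helper name.

```julia
using Random

# Hypothetical helper: x = nodes, w = quadrature weights, D = differentiation matrix.
# The output grid s is taken equal to the input grid x, as assumed in the post.
function apply_operator(f, x, w, D)
    s = x
    df = D * f                              # approximates f'(x_i)
    int_f = sum(w .* f)                     # approximates ∫ f(x) dx
    int_dfcos = sum(w .* df .* cos.(x))     # approximates ∫ f'(x) cos(x) dx
    return int_f .* s .+ int_dfcos .* sin.(π .* s .^ 2)
end

n, n_samples = 31, 1000

# Stand-in discretization (NOT Gauss-Lobatto-Legendre): uniform grid on [-1, 1],
# trapezoid weights, and a simple finite-difference differentiation matrix.
x = collect(range(-1.0, stop=1.0, length=n))
h = x[2] - x[1]
w = fill(h, n)
w[1] = h / 2
w[end] = h / 2
D = zeros(n, n)
for i in 2:n-1
    D[i, i-1] = -1 / (2h)
    D[i, i+1] = 1 / (2h)
end
D[1, 1] = -1 / h
D[1, 2] = 1 / h
D[n, n-1] = -1 / h
D[n, n] = 1 / h

rng = MersenneTwister(42)
F = randn(rng, n, n_samples)   # each column is one sampled input function
G = reduce(hcat, [apply_operator(F[:, k], x, w, D) for k in 1:n_samples])  # targets
```

Swapping in the real nodes, weights, and differentiation matrix only changes `x`, `w`, and `D`; the shapes of `F` and `G` stay 31 x 1000.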
My main question:
In the past, e.g. when I tried following the "Fitting a Polynomial using MLP" Lux tutorial, examples like these would build out their models like so:
model = Chain(Dense(1 => 16, relu), Dense(16 => 1))
i.e. the first layer's weight seems to be an n_hidden × 1 column vector (16 × 1 here): it takes 1 point and maps it to 16 hidden values.
However, thinking about what the first layer should look like mathematically, L₁ = W v_in + b, I changed the first layer to take in all of my points at once, i.e. my first weight matrix is 64 x 31.
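To spell out the shape bookkeeping behind L₁ = W v_in + b, here is a small sketch in plain Julia (no Lux involved). The variable names are made up for illustration, and the random matrices only stand in for trained weights.

```julia
using Random

rng = MersenneTwister(0)

# Pointwise case, as in the tutorial: 1 input feature, 16 hidden units.
W_point = randn(rng, Float32, 16, 1)     # 16 x 1 weight matrix
b_point = zeros(Float32, 16)
x_batch = randn(rng, Float32, 1, 5)      # 5 samples, one scalar each (1 x 5)
L1_point = W_point * x_batch .+ b_point  # 16 x 5

# Whole-function case: all 31 grid values form one input vector.
W_vec = randn(rng, Float32, 64, 31)      # 64 x 31 weight matrix
b_vec = zeros(Float32, 64)
F = randn(rng, Float32, 31, 1000)        # 1000 sampled functions, one per column
L1_vec = W_vec * F .+ b_vec              # 64 x 1000
```

In both cases the number of rows of the input equals the number of columns of W, and the batch runs along the second dimension.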
using Lux

# Define the neural network model
Nx = Ny = 30  # 31 grid points for both the input and the output functions
model = Lux.Chain(
    Dense(Nx+1, 64, leakyrelu),
    Dense(64, 64, leakyrelu),
    Dense(64, Ny+1)
)
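As a quick sanity check on the shapes, here is a hedged sketch of a forward pass through an untrained copy of this model. It assumes a recent Lux version, the random batch `F` is placeholder data rather than my real input functions, and the `=>` form of `Dense` is used only because it matches the tutorial's style.

```julia
using Lux, Random

rng = Random.default_rng()
Nx = Ny = 30

model = Chain(
    Dense(Nx + 1 => 64, leakyrelu),
    Dense(64 => 64, leakyrelu),
    Dense(64 => Ny + 1),
)

# Lux keeps parameters and state outside the model object.
ps, st = Lux.setup(rng, model)

F = randn(rng, Float32, Nx + 1, 1000)   # placeholder batch: one function per column
G_pred, _ = model(F, ps, st)            # forward pass returns (output, new state)

size(ps.layer_1.weight)   # first weight matrix: (64, 31)
size(G_pred)              # predictions: (31, 1000)
```

Each column of `G_pred` is the network's prediction for the corresponding column of `F`.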
to be continued below: