
Merge pull request #39 from Abhishek-1Bhatt/deeponet
Deeponet
foldfelis authored Feb 27, 2022
2 parents f47637f + d3e58f1 commit 82b0465
Showing 12 changed files with 334 additions and 2 deletions.
35 changes: 35 additions & 0 deletions README.md
@@ -36,11 +36,15 @@ It performs Fourier transformation across infinite-dimensional function spaces a
With only one time step of information for learning, it can predict the following few steps with low loss
by linking the operators into a Markov chain.

**DeepONet operator** (Deep Operator Network) learns a neural operator with the help of two sub-networks, referred to as the branch and the trunk network. The branch network is fed the initial-condition data, whereas the trunk is fed the locations where the target (output) is evaluated from the corresponding initial conditions. It is important that the output sizes of the branch and trunk subnets are the same so that a dot product can be performed between them.
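
A minimal sketch of this contraction with plain arrays (the sizes are purely illustrative):

```julia
# Sketch only: both subnets end in p features, so their outputs contract over that dimension.
p = 72
branch_out = rand(Float32, p, 100)  # p features for each of 100 input-function samples
trunk_out  = rand(Float32, p, 50)   # p features for each of 50 query locations
u = branch_out' * trunk_out         # 100 × 50 predictions: uⱼₖ = Σᵢ bᵢⱼ tᵢₖ
```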

Currently, the `FourierOperator` layer is provided in this work.
As for models, `FourierNeuralOperator` and `MarkovNeuralOperator` are provided. Please take a glance at them [here](src/model.jl).

## Usage

### Fourier Neural Operator

```julia
model = Chain(
# lift (d + 1)-dimensional vector field to n-dimensional vector field
@@ -76,6 +80,32 @@ opt = Flux.Optimiser(WeightDecay(1f-4), Flux.ADAM(1f-3))
Flux.@epochs 50 Flux.train!(loss, params(model), data, opt)
```

### DeepONet

```julia
# Tuple of Ints for the branch-net architecture and then for the trunk net, followed by the activations for branch and trunk respectively
model = DeepONet((32,64,72), (24,64,72), σ, tanh)
```
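
A quick forward-pass sketch for this constructor (the sizes below are made up for illustration): the branch takes 32 sensor values per sample, the trunk takes 24 coordinates per query location, and the result is one prediction per (sample, location) pair.

```julia
xtrain = rand(Float32, 32, 16)  # branch input: [sensor values × batch of input functions]
grid   = rand(Float32, 24, 10)  # trunk input:  [coordinates × query locations]
model(xtrain, grid)             # 16 × 10 array: one prediction per (sample, location)
```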
Or specify the branch and trunk as separate `Chain`s from Flux and pass them to `DeepONet`:

```julia
branch = Chain(Dense(32,64,σ), Dense(64,72,σ))
trunk = Chain(Dense(24,64,tanh), Dense(64,72,tanh))
model = DeepONet(branch,trunk)
```
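
Because the subnets are ordinary Flux models, they are not limited to `Dense` layers; for example, a convolutional branch works as long as both nets end in the same output width. A rough sketch (layer sizes are illustrative, not part of the package):

```julia
branch = Chain(
    x -> reshape(x, 32, 1, :),     # (sensors, channels, batch) layout for Conv
    Conv((3,), 1 => 4, σ; pad=1),  # 32 × 4 × batch
    Flux.flatten,                  # 128 × batch
    Dense(128, 72),                # end in the shared output width
)
trunk = Chain(Dense(24, 64, tanh), Dense(64, 72, tanh))
model = DeepONet(branch, trunk)
```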

You can again specify loss, optimization and training parameters just as you would for a simple neural network with Flux.

```julia
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(model(xtrain,sensor),ytrain)
evalcb() = @show(loss(xval,yval,grid))

learning_rate = 0.001
opt = ADAM(learning_rate)
parameters = params(model)
Flux.@epochs 400 Flux.train!(loss, parameters, [(xtrain,ytrain,grid)], opt, cb = evalcb)
```
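
After training, the model can be queried on held-out data at the same sensor grid. A minimal evaluation sketch, assuming `xval`, `yval`, and `grid` as used in the snippet above:

```julia
ỹ = model(xval, grid)                        # predictions for the validation inputs
mean_abs_error = sum(abs.(yval .- ỹ)) / length(ỹ)
```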

## Examples

PDE training examples are provided in the `example` folder.
@@ -84,6 +114,10 @@ PDE training examples are provided in `example` folder.

[Burgers' equation](example/Burgers)

### DeepONet implementation for solving Burgers' equation

[Burgers' equation](example/Burgers/src/Burgers_deeponet.jl)

### Two-dimensional Fourier Neural Operator

[Double Pendulum](example/DoublePendulum)
@@ -113,3 +147,4 @@ PDE training examples are provided in `example` folder.
- [Neural Operator: Graph Kernel Network for Partial Differential Equations](https://arxiv.org/abs/2003.03485)
- [zongyi-li/graph-pde](https://github.com/zongyi-li/graph-pde)
- [Markov Neural Operators for Learning Chaotic Systems](https://arxiv.org/abs/2106.06898)
- [DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators](https://arxiv.org/abs/1910.03193)
34 changes: 33 additions & 1 deletion docs/src/index.md
@@ -10,7 +10,7 @@ Documentation for [NeuralOperators](https://github.com/foldfelis/NeuralOperators.jl).
|:----------------:|:--------------:|
| ![](https://github.com/foldfelis/NeuralOperators.jl/blob/master/example/FlowOverCircle/gallery/ans.gif?raw=true) | ![](https://github.com/foldfelis/NeuralOperators.jl/blob/master/example/FlowOverCircle/gallery/inferenced.gif?raw=true) |

The demonstration showing above is Navier-Stokes equation learned by the `MarkovNeuralOperator` with only one time step information.
The demonstration shown above is Navier-Stokes equation learned by the `MarkovNeuralOperator` with only one time step information.
Example can be found in [`example/FlowOverCircle`](https://github.com/foldfelis/NeuralOperators.jl/tree/master/example/FlowOverCircle).

## Abstract
@@ -30,6 +30,8 @@ It performs Fourier transformation across infinite-dimensional function spaces a
With only one time step of information for learning, it can predict the following few steps with low loss
by linking the operators into a Markov chain.

**DeepONet operator** (Deep Operator Network) learns a neural operator with the help of two sub-networks, referred to as the branch and the trunk network. The branch network is fed the initial-condition data, whereas the trunk is fed the locations where the target (output) is evaluated from the corresponding initial conditions. It is important that the output sizes of the branch and trunk subnets are the same so that a dot product can be performed between them.

Currently, the `FourierOperator` layer is provided in this work.
As for models, `FourierNeuralOperator` and `MarkovNeuralOperator` are provided.
Please take a glance at them [here](apis.html#Models).
@@ -44,6 +46,8 @@ pkg> add NeuralOperators

## Usage

### Fourier Neural Operator

```julia
model = Chain(
# lift (d + 1)-dimensional vector field to n-dimensional vector field
@@ -78,3 +82,31 @@ loss(𝐱, 𝐲) = sum(abs2, 𝐲 .- model(𝐱)) / size(𝐱)[end]
opt = Flux.Optimiser(WeightDecay(1f-4), Flux.ADAM(1f-3))
Flux.@epochs 50 Flux.train!(loss, params(model), data, opt)
```

### DeepONet

```julia
# Tuple of Ints for the branch-net architecture and then for the trunk net, followed by the activations for branch and trunk respectively
model = DeepONet((32,64,72), (24,64,72), σ, tanh)
```
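
The keyword arguments of this constructor control how the generated `Dense` layers are initialised and whether they carry biases (see the `DeepONet` docstring), e.g.:

```julia
model = DeepONet((32, 64, 72), (24, 64, 72), σ, tanh;
                 init_branch=Flux.glorot_normal, bias_trunk=false)
```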

Or specify the branch and trunk as separate `Chain`s from Flux and pass them to `DeepONet`:

```julia
branch = Chain(Dense(32,64,σ), Dense(64,72,σ))
trunk = Chain(Dense(24,64,tanh), Dense(64,72,tanh))
model = DeepONet(branch,trunk)
```

You can again specify loss, optimization and training parameters just as you would for a simple neural network with Flux.

```julia
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(model(xtrain,sensor),ytrain)
evalcb() = @show(loss(xval,yval,grid))

learning_rate = 0.001
opt = ADAM(learning_rate)
parameters = params(model)
Flux.@epochs 400 Flux.train!(loss, parameters, [(xtrain,ytrain,grid)], opt, cb = evalcb)
```
A more complete example using the DeepONet architecture to solve Burgers' equation can be found in the [examples](../../example/Burgers/src/Burgers_deeponet.jl).
3 changes: 2 additions & 1 deletion example/Burgers/src/Burgers.jl
@@ -5,6 +5,7 @@ using Flux
using CUDA

include("data.jl")
include("Burgers_deeponet.jl")

__init__() = register_burgers()

@@ -30,7 +31,7 @@ function train()
Dense(128, 1),
flatten
) |> device

loss(𝐱, 𝐲) = sum(abs2, 𝐲 .- m(𝐱)) / size(𝐱)[end]

loader_train, loader_test = get_dataloader()
32 changes: 32 additions & 0 deletions example/Burgers/src/Burgers_deeponet.jl
@@ -0,0 +1,32 @@
function train_don()
if has_cuda()
@info "CUDA is on"
device = gpu
CUDA.allowscalar(false)
else
device = cpu
end

x, y = get_data_don(n=300)
xtrain = x[1:280, :]' |> device
xval = x[end-19:end, :]' |> device

ytrain = y[1:280, :] |> device
yval = y[end-19:end, :] |> device

grid = collect(range(0, 1, length=1024))' |> device

learning_rate = 0.001
opt = ADAM(learning_rate)

m = DeepONet((1024,1024,1024),(1,1024,1024),gelu,gelu)
loss(xtrain,ytrain,sensor) = Flux.Losses.mse(m(xtrain,sensor),ytrain)
evalcb() = @show(loss(xval,yval,grid))

Flux.@epochs 400 Flux.train!(loss, params(m), [(xtrain,ytrain,grid)], opt, cb = evalcb)
ỹ = m(xval, grid)

diffvec = vec(abs.((yval .- ỹ)))
mean_diff = sum(diffvec)/length(diffvec)
return mean_diff
end
9 changes: 9 additions & 0 deletions example/Burgers/src/data.jl
@@ -27,6 +27,15 @@ function get_data(; n=2048, Δsamples=2^3, grid_size=div(2^13, Δsamples), T=Flo
return x_loc_data, y_data
end

function get_data_don(; n=2048, Δsamples=2^3, grid_size=div(2^13, Δsamples))
file = matopen(joinpath(datadep"Burgers", "burgers_data_R10.mat"))
x_data = collect(read(file, "a")[1:n, 1:Δsamples:end])
y_data = collect(read(file, "u")[1:n, 1:Δsamples:end])
close(file)

return x_data, y_data
end

function get_dataloader(; n_train=1800, n_test=200, batchsize=100)
𝐱, 𝐲 = get_data(n=2048)

5 changes: 5 additions & 0 deletions example/Burgers/test/deeponet.jl
@@ -0,0 +1,5 @@
@testset "DeepONet Training Accuracy" begin
ϵ = Burgers.train_don()

@test ϵ < 0.4
end
1 change: 1 addition & 0 deletions example/Burgers/test/runtests.jl
@@ -3,4 +3,5 @@ using Test

@testset "Burgers" begin
include("data.jl")
include("deeponet.jl")
end
132 changes: 132 additions & 0 deletions src/DeepONet.jl
@@ -0,0 +1,132 @@
"""
`DeepONet(architecture_branch::Tuple, architecture_trunk::Tuple,
act_branch = identity, act_trunk = identity;
init_branch = Flux.glorot_uniform,
init_trunk = Flux.glorot_uniform,
bias_branch=true, bias_trunk=true)`
`DeepONet(branch_net::Flux.Chain, trunk_net::Flux.Chain)`
Create an (unstacked) DeepONet architecture as proposed by Lu et al.
arXiv:1910.03193
The model works as follows:
x --- branch --
               |
                -⊠--u-
               |
y --- trunk ---
Where `x` represents the input function, discretely evaluated at its respective sensors.
So the input is of shape [m] for one instance or [m x b] for a training set.
`y` are the probing locations for the operator to be trained. It has shape [N x n] for
N different variables in the PDE (i.e. spatial and temporal coordinates) with n distinct evaluation points each.
`u` is the solution of the queried instance of the PDE, given by the specific choice of parameters.
Both inputs `x` and `y` are multiplied together via dot product Σᵢ bᵢⱼ tᵢₖ.
You can set up this architecture in two ways:
1. By specifying the architecture and all its parameters as given above. This always creates
`Dense` layers for the branch and trunk net and corresponds to the DeepONet proposed by Lu et al.
2. By passing two architectures in the form of two Chain structs directly. Do this if you want more
flexibility and e.g. use an RNN or CNN instead of simple `Dense` layers.
Strictly speaking, DeepONet does not require either the branch or the trunk net to be a simple
DNN. Usually, though, this is the case, which is why it is treated as the default here.
# Example
Consider a transient 1D advection problem ∂ₜu + u ⋅ ∇u = 0, with an IC u(x,0) = g(x).
We are given several (b = 200) instances of the IC, discretized at 50 points each and want
to query the solution for 100 different locations and times in [0;1].
That makes the branch input of shape [50 x 200] and the trunk input of shape [2 x 100]. So the
input size for the branch net is 50 and 2 for the trunk net.
# Usage
```julia
julia> model = DeepONet((32,64,72), (24,64,72))
DeepONet with
branch net: (Chain(Dense(32, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(24, 64), Dense(64, 72)))
julia> model = DeepONet((32,64,72), (24,64,72), σ, tanh; init_branch=Flux.glorot_normal, bias_trunk=false)
DeepONet with
branch net: (Chain(Dense(32, 64, σ), Dense(64, 72, σ)))
Trunk net: (Chain(Dense(24, 64, tanh; bias=false), Dense(64, 72, tanh; bias=false)))
julia> branch = Chain(Dense(2,128),Dense(128,64),Dense(64,72))
Chain(
Dense(2, 128), # 384 parameters
Dense(128, 64), # 8_256 parameters
Dense(64, 72), # 4_680 parameters
) # Total: 6 arrays, 13_320 parameters, 52.406 KiB.
julia> trunk = Chain(Dense(1,24),Dense(24,72))
Chain(
Dense(1, 24), # 48 parameters
Dense(24, 72), # 1_800 parameters
) # Total: 4 arrays, 1_848 parameters, 7.469 KiB.
julia> model = DeepONet(branch,trunk)
DeepONet with
branch net: (Chain(Dense(2, 128), Dense(128, 64), Dense(64, 72)))
Trunk net: (Chain(Dense(1, 24), Dense(24, 72)))
```
"""
struct DeepONet
branch_net::Flux.Chain
trunk_net::Flux.Chain
end

# Declare the function that assigns Weights and biases to the layer
function DeepONet(architecture_branch::Tuple, architecture_trunk::Tuple,
act_branch = identity, act_trunk = identity;
init_branch = Flux.glorot_uniform,
init_trunk = Flux.glorot_uniform,
bias_branch=true, bias_trunk=true)

@assert architecture_branch[end] == architecture_trunk[end] "Branch and Trunk net must share the same amount of nodes in the last layer. Otherwise Σᵢ bᵢⱼ tᵢₖ won't work."

# To construct the subnets we use the helper function in subnets.jl
# Initialize the branch net
branch_net = construct_subnet(architecture_branch, act_branch;
init=init_branch, bias=bias_branch)
# Initialize the trunk net
trunk_net = construct_subnet(architecture_trunk, act_trunk;
init=init_trunk, bias=bias_trunk)

return DeepONet(branch_net, trunk_net)
end

Flux.@functor DeepONet

#= The actual layer that does stuff
x is the input function, evaluated at m locations (or m x b in case of batches)
y is the array of sensors, i.e. the variables of the output function
with shape (N x n) - N different variables with each n evaluation points =#
function (a::DeepONet)(x::AbstractArray, y::AbstractVecOrMat)
# Assign the parameters
branch, trunk = a.branch_net, a.trunk_net

#= Dot product needs a dim to contract
However, we perform the transformations by the NNs always in the first dim
so we need to adjust (i.e. transpose) one of the inputs,
which we do on the branch input here =#
return branch(x)' * trunk(y)
end

# Sensors stay the same and shouldn't be batched
(a::DeepONet)(x::AbstractArray, y::AbstractArray) =
throw(ArgumentError("Sensor locations fed to trunk net can't be batched."))

# Print nicely
function Base.show(io::IO, l::DeepONet)
print(io, "DeepONet with\nbranch net: (",l.branch_net)
print(io, ")\n")
print(io, "Trunk net: (", l.trunk_net)
print(io, ")\n")
end
4 changes: 4 additions & 0 deletions src/NeuralOperators.jl
@@ -8,6 +8,10 @@ module NeuralOperators
using Zygote
using ChainRulesCore

export DeepONet

include("fourier.jl")
include("model.jl")
include("DeepONet.jl")
include("subnets.jl")
end
39 changes: 39 additions & 0 deletions src/subnets.jl
@@ -0,0 +1,39 @@
"""
Construct a Chain of `Dense` layers from a given tuple of integers.
Input:
A tuple (m, n, o, p) of integers that describe the widths of the successive `Dense` layers to construct
Output:
A Flux `Chain` of `Dense` layers, one for each consecutive pair of tuple elements, with the widths given by those elements
# Example
```julia
julia> model = NeuralOperators.construct_subnet((2,128,64,32,1))
Chain(
Dense(2, 128), # 384 parameters
Dense(128, 64), # 8_256 parameters
Dense(64, 32), # 2_080 parameters
Dense(32, 1), # 33 parameters
) # Total: 8 arrays, 10_753 parameters, 42.504 KiB.
julia> model([2,1])
1-element Vector{Float32}:
-0.7630446
```
"""
function construct_subnet(architecture::Tuple, σ = identity;
init=Flux.glorot_uniform, bias=true)
# First, create an array that contains all Dense layers independently
# Given n-element architecture constructs n-1 layers
layers = Array{Flux.Dense}(undef, length(architecture)-1)
@inbounds for i ∈ 2:length(architecture)
layers[i-1] = Flux.Dense(architecture[i-1], architecture[i], σ;
init=init, bias=bias)
end

# Concatenate the layers into a string, parse it as Flux `Chain` constructor syntax,
# and evaluate it to build the chained model
return Meta.parse("Chain("*join(layers,",")*")") |> eval
end