A new version of ATHENA has been released – version 1.2.4!
Since my last post regarding ATHENA back in October 2023, there have been quite a few changes. As a brief recap: ATHENA is a Fortran-based feed-forward neural network library with a focus on 3D convolutional layers.
Here is a brief summary of the new features since my last post (1.1.0):
fpm compilation compatibility
Batch normalisation layers added
Adam, AdaGrad, RMSProp optimisers added
Learning rate decay methods added
Unit tests added (currently 75% line coverage)
Overhaul of the example programs and addition of new examples
Lots of bug fixing
Repository migrated from GitLab to GitHub
The first thing to note is that the ATHENA library has been moved from its old hosting (on the Exeter Uni GitLab) to GitHub to improve community engagement. The migrated repository should have everything the old one did (including the wiki), other than release tags for versions prior to 1.2.0. Here is the new link:
The library can be installed using CMake or fpm. fpm support currently only covers local installation; the library is not yet on the fpm registry (I will be looking into this next, though it may take a while, as the library may not align with some of fpm's current standards).
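In the meantime, if you want to pull it into your own fpm project, a local path dependency in your fpm.toml should work. This is just a sketch: it assumes the package name is athena and that you have a local clone of the repository next to your project, so adjust the path (and name) to match your setup.

[dependencies]
# hypothetical local-path dependency; point this at wherever you cloned ATHENA
athena = { path = "../athena" }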
A recap of the layers currently available in ATHENA (a minimal usage sketch follows this list):
1D, 3D, and 4D input layers
2D and 3D convolutional layers
2D and 3D dropblock layers
Dropout layers
2D and 3D flatten layers (automated)
2D and 3D average pooling layers
2D and 3D maxpool layers
2D and 3D batch normalisation layers
Fully-connected (dense) layers
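As a quick illustration of how these fit together, layers are added to a network_type object one at a time and the network is then compiled. Here is a minimal sketch using only the input, dense, and compile routines (see the wiki and example programs for the full set of layer constructors and their arguments):

program tiny_network
   use athena
   implicit none
   type(network_type) :: network

   ! a single-feature 1D input, one hidden dense layer, and a linear output
   call network%add(input1d_layer_type(input_shape=[1]))
   call network%add(full_layer_type(num_outputs=8, activation_function="relu"))
   call network%add(full_layer_type(num_outputs=1, activation_function="linear"))

   ! compile with the base optimiser and a mean-squared-error loss
   call network%compile( &
        optimiser = base_optimiser_type(learning_rate=0.01E0), &
        loss_method = "mse", metrics = ["loss"])
end program tiny_network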
The ATHENA project has drawn some inspiration from the neural-fortran project. I recommend people also check out that project for neural-network functionality in modern Fortran.
I hope this is useful to others as well as myself. Feel free to do whatever you want with it. I intend to keep maintaining and developing it. Speed may still be a limiting factor (it has not hindered my own projects, but I imagine there is plenty of room for improvement).
Feel free to ask any questions or make recommendations. If anyone is interested in contributing to this project, please get in touch; help and guidance on where to take it would be greatly appreciated.
Hi @Arjen. Sorry about that. You are correct, thanks for pointing that out. I have now removed the reference to doc/ in the README. There is currently no doc/ directory as I have not yet made a manual.pdf (I hope to at some point though, just not sure when).
The GitHub wiki is currently how I am documenting the code. Hopefully this offers sufficient guidance on how the code all works/can be used (along with the example programs). Please let me know if it is not clear though and I will be happy to improve it.
Very nice. I wish your project had been available last year when I was trying to generate a regression model for some very large-scale CFD and experimental data sets. I was forced to use Python/Keras (which was surprisingly fast and very easy to implement, even for the rather large training data sets I was using). One thing that made a big difference in overall convergence was adding a batch normalization layer as a preprocessor for the training data, before adding the hidden layers. Can your code do something like this?
# Keras imports (the configuration flags, hyperparameters, and the custom
# rmse/r2/yr2 metric functions are defined elsewhere in my script)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout
from tensorflow.keras import regularizers

model = Sequential()
# optional batch normalization of the inputs before the hidden layers
if ibatchnorm > 0:
    model.add(BatchNormalization())
for i in range(nhidden):
    if iregl2 > 0:
        model.add(Dense(nneurons, activation=actfun,
                        kernel_regularizer=regularizers.L2(regval)))
    else:
        model.add(Dense(nneurons, activation=actfun))
    if ibatchnorm > 0:
        model.add(BatchNormalization())
    if idropout > 0:
        model.add(Dropout(0.1))
model.add(Dense(1, activation='linear'))
model.compile(loss=lossfun, optimizer=optim,
              metrics=[rmse, r2, yr2, 'mse', 'mae', 'mape'])
Yeah, I’ve found Python/Keras and PyTorch to be very fast as-is. I’d expect a well-implemented Fortran version to be faster, but that requires someone with more knowledge of parallelisation (and code efficiency) than my current self.
ATHENA can’t currently do this, as I haven’t yet implemented a 1D batch normalisation layer (only 2D and 3D so far). But give me a day and it should hopefully be ready in a development branch!
This is great, thanks for posting the update here. I’m also happy you moved it to GitHub for better visibility. I’d like to revisit our conversation soon; other projects took priority for now so I had to put it on hold. Neural-fortran has a WIP PR for a batchnorm layer started by last year’s GSoC student. When I resume that work I’ll certainly consult Athena’s implementation.
On a related topic, we recently proposed an ML for Fortran workshop to Supercomputing 2024 (and even reference Athena among others in the proposal). If we’re lucky to get selected I’d love to see an Athena submission to the workshop.
Hi again @rwmsu. I’ve developed a batchnorm1d layer derived type now. I’ve built a unit test for it (and after some editing, it worked), but have not fully tested it myself (it shouldn’t be broken as it is basically the same as batchnorm2d). If you are using it and it doesn’t work as expected, please let me know and I’ll see what I can do to fix it.
This new layer type, “batchnorm1d_layer_type”, is now in the “development” branch. It works the same as “batchnorm2d_layer_type” (num_features and num_channels are optional input arguments; they specify the same thing, so at most one of them can be provided).
Here is an example program that should hopefully match your example code as closely as possible. Keep in mind that regularisers can currently only be applied uniformly across a network and cannot be specified per layer. I hope this helps.
program main
   use athena
   implicit none

   integer, parameter :: ibatchnorm = 1
   integer, parameter :: iregl2 = 1
   integer, parameter :: idropout = 1
   integer, parameter :: nhidden = 3
   integer, parameter :: nneurons = 4
   real, parameter :: regval = 0.01
   integer :: i
   character(10) :: actfun = "relu"
   type(network_type) :: network

   call network%add(input1d_layer_type(input_shape=[1]))
   if(ibatchnorm .gt. 0) call network%add(batchnorm1d_layer_type())

   do i = 1, nhidden
      call network%add(full_layer_type( &
           num_outputs = nneurons, &
           activation_function = actfun))
      if(ibatchnorm .gt. 0) call network%add(batchnorm1d_layer_type())
      if(idropout .gt. 0) call network%add(dropout_layer_type(num_masks=1, rate=0.1E0))
   end do

   call network%add(full_layer_type( &
        num_outputs = 1, &
        activation_function = "linear"))

   if(iregl2 .gt. 0)then
      call network%compile( &
           optimiser = base_optimiser_type(learning_rate=1.E0, regulariser = l2_regulariser_type(regval)), &
           loss_method="mse", metrics=["loss"])
   else
      call network%compile( &
           optimiser = base_optimiser_type(learning_rate=1.E0), &
           loss_method="mse", metrics=["loss"])
   end if

end program main
Hi @milancurcic, thanks very much! Yes, I moved it to GitHub whilst submitting it to JOSS to comply with their rules, but have since come to appreciate the much better accessibility and community it offers.
I’d love to revisit our conversations also. Whenever you have time, just send me a message.
Oooh, the workshop sounds interesting, I hope you get selected! And thank you for referencing Athena in it. Yeah, I’d definitely be interested in submitting something for Athena!
Wow, that’s amazing. Thanks, but I have retired since I was working on this last year and no longer have access to the training data (it’s from a U.S. DoD-funded project that’s covered by ITAR/export control laws), so I can’t try rerunning my cases with your code. However, I might try to generate some simple analogs of the data sets I was using, just to see if your code works. I’ll keep you posted.
Hopefully you’ll get a chance to test it with some analogs. But if not, it’s still handy to have batchnorm1d in the code, so your comment was a good motivation!
I had a look at one of the examples, “simple”. I see that you use the random_seed subroutine, but it looks as if you are trying to set the size of the seed. That is not how it works: “size = seed_size” will merely return the size of the actual seed used by the implementation; it does not change it.
I have now fixed this in the development branch. This needed addressing in a few of the examples, tests, and in the random_setup procedure provided within the ATHENA library.
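For anyone who runs into the same issue, the correct pattern with the standard random_seed intrinsic is to query the seed size first and then supply a seed array of that length via put=. Here is a minimal sketch of the general pattern (not the exact code in random_setup):

program seed_example
   implicit none
   integer :: n, i
   integer, allocatable :: seed(:)

   ! size= only returns the implementation-defined seed length
   call random_seed(size=n)
   allocate(seed(n))

   ! build a seed array of that length (any reproducible values will do) ...
   seed = [(i + 1234, i = 1, n)]

   ! ... and hand it to the generator with put= to actually set the seed
   call random_seed(put=seed)
end program seed_example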
Unfortunately, we were not selected for Supercomputing '24. Based on reviews, the proposal was deemed to be of decent quality and significance, but of insufficient or inappropriate scope and relevance (too niche) for Supercomputing. We will sit on the feedback and discuss what to do next. Onward.
It has indeed, after some fantastic (and much appreciated) work by the reviewers!
My next step will be to include message passing/graph layers (already in a development branch; it just needs tidying up and testing). I will then make it so that people can extend the library with new layer types without any need to edit the original library.