New release of ATHENA - ATHENA 1.2.4

Dear all,

A new version of ATHENA has been released – version 1.2.4!

Since my last post regarding ATHENA back in October 2023, there have been quite a few changes. For a brief recap, ATHENA is a Fortran-based feed-forward neural network library, with a focus on 3D convolutional layers.

Here is a brief summary of the new features since my last post (1.1.0):

  • fpm compilation compatibility
  • Batch normalisation layers added
  • Adam, AdaGrad, RMSProp optimisers added
  • Learning rate decay methods added
  • Unit tests added (currently 75% line coverage)
  • Overhaul of example programs and new additional examples
  • Lots of bug fixing
  • Repository migrated from GitLab to GitHub

The first thing to note is that the ATHENA library has been moved from its old hosting (on the Exeter Uni GitLab) to GitHub to improve community engagement. The migrated repository should have everything the old one did (including the wiki), other than release tags for versions prior to 1.2.0. Here is the new link:

The library can be installed using cmake or fpm. Currently it only handles local fpm installation and is not yet on the fpm registry (I will be looking into this next, but it might take a while as I can imagine that the library may not align with some of the current standards of fpm).

A recap of the available layers in ATHENA:

  • 1D, 3D, and 4D input layers
  • 2D and 3D convolutional layers
  • 2D and 3D dropblock layers
  • Dropout layers
  • 2D and 3D flatten layers (automated)
  • 2D and 3D average pooling layers
  • 2D and 3D maxpool layers
  • 2D and 3D batch normalisation layers
  • Fully-connected (dense) layers

The ATHENA project has drawn some inspiration from the neural-fortran project. I recommend people also check out that project for neural-network functionality in modern Fortran.

I hope that this is useful for someone as well as myself. Feel free to do whatever you want with it. I hope to continue to maintain this and develop it further. Speed may still be a limiting factor (it has not hindered my projects, but I can imagine there is still plenty of room for improvement).

Feel free to ask any questions or make recommendations. If anyone has any interest in contributing to this project, please get in touch, as help and further guidance on where to take this would be greatly appreciated.

Kind regards,
Ned

11 Likes

I had a look at your project, just trying to get some idea about it, mind you, but I do not see the documentation directory you mentioned.

Hi @Arjen. Sorry about that. You are correct, thanks for pointing that out. I have now removed the reference to doc/ in the README. There is currently no doc/ directory as I have not yet made a manual.pdf (I hope to at some point though, just not sure when).

The GitHub wiki is currently how I am documenting the code. Hopefully this (along with the example programs) offers sufficient guidance on how the code works and can be used. Please let me know if it is not clear though and I will be happy to improve it.

Ah, I had not looked at the Wiki. I will have a look there :slight_smile:

1 Like

Very nice. Wish your project had been available last year when I was trying to generate a regression model for some very large scale CFD and experimental data sets. I was forced to use Python/keras (which was surprisingly fast and very easy to implement even for the rather large training data sets I was using). One thing that made a big difference in overall convergence was adding a batch normalization layer as a preprocessor for the training data before I added the hidden layers. Can your code do something like this?

  # imports assumed for this snippet (Keras); the custom metrics rmse, r2, yr2
  # and the hyperparameters (ibatchnorm, nhidden, iregl2, ...) are defined elsewhere
  from keras.models import Sequential
  from keras.layers import Dense, BatchNormalization, Dropout
  from keras import regularizers

  model = Sequential()
  # optional batch normalization of the inputs before any hidden layers
  if ibatchnorm > 0:
    model.add(BatchNormalization())
  for i in range(nhidden):
    # hidden layers, with optional L2 kernel regularization
    if iregl2 > 0:
      model.add(Dense(nneurons, activation=actfun, kernel_regularizer=regularizers.L2(regval)))
    else:
      model.add(Dense(nneurons, activation=actfun))

    # optional batch normalization and dropout after each hidden layer
    if ibatchnorm > 0:
      model.add(BatchNormalization())
    if idropout > 0:
      model.add(Dropout(0.1))
  model.add(Dense(1, activation='linear'))

  model.compile(loss=lossfun, optimizer=optim, metrics=[rmse, r2, yr2, 'mse', 'mae', 'mape'])

Thank you. :smiley:

Yeah, I’ve found Python/keras and pytorch to be very fast as-is. I’d expect Fortran to be faster if well implemented, but that requires someone with more knowledge of parallelisation (and code efficiency) than my current self.

ATHENA can’t currently do this as I haven’t included the idea of a 1D batch normalisation layer (only 2D and 3D batch normalisation so far). But give me a day and it should hopefully be ready in a development branch!

2 Likes

This is great, thanks for posting the update here. I’m also happy you moved it to GitHub for better visibility. I’d like to revisit our conversation soon; other projects took priority for now so I had to put it on hold. Neural-fortran has a WIP PR for a batchnorm layer started by last year’s GSoC student. When I resume that work I’ll certainly consult Athena’s implementation.

On a related topic, we recently proposed an ML for Fortran workshop to Supercomputing 2024 (and even reference Athena among others in the proposal). If we’re lucky enough to get selected, I’d love to see an Athena submission to the workshop.

4 Likes

Hi again @rwmsu. I’ve developed a batchnorm1d layer derived type now. I’ve built a unit test for it (and after some editing, it worked), but have not fully tested it myself (it shouldn’t be broken as it is basically the same as batchnorm2d). If you are using it and it doesn’t work as expected, please let me know and I’ll see what I can do to fix it.

This new layer type “batchnorm1d_layer_type” is now in the “development” branch. It works the same as the “batchnorm2d_layer_type” (num_features and num_channels are optional input arguments; they both specify the same thing, so only one of them can ever be specified).
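
For example, either of the following adds the new layer with an explicit number of features (purely illustrative; the value 32 is arbitrary and, as above, only one of the two arguments may be given):

  call network%add(batchnorm1d_layer_type(num_features = 32))
  ! or, equivalently:
  call network%add(batchnorm1d_layer_type(num_channels = 32))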

Here is an example program that should, hopefully, match your example code as closely as possible. Keep in mind that regularisers currently can only be applied uniformly across a network and cannot be specified differently for each layer. I hope this helps. :slight_smile:

program main
  use athena

  implicit none

  integer, parameter :: ibatchnorm = 1
  integer, parameter :: iregl2 = 1
  integer, parameter :: idropout = 1
  integer, parameter :: nhidden = 3
  integer, parameter :: nneurons = 4
  real, parameter :: regval = 0.01

  integer :: i

  character(10) :: actfun = "relu"

  type(network_type) :: network


  ! input layer, optionally followed by batch normalisation
  call network%add(input1d_layer_type(input_shape=[1]))
  if(ibatchnorm .gt. 0) call network%add(batchnorm1d_layer_type())

  ! hidden layers, each optionally followed by batch normalisation and dropout
  do i = 1, nhidden
     call network%add(full_layer_type( &
           num_outputs = nneurons, &
           activation_function = actfun))

     if(ibatchnorm .gt. 0) call network%add(batchnorm1d_layer_type())
     if(idropout .gt. 0) call network%add(dropout_layer_type(num_masks=1, rate=0.1E0))
  end do

  ! single linear output neuron
  call network%add(full_layer_type( &
        num_outputs = 1, &
        activation_function = "linear"))

  ! compile with or without L2 regularisation (applied uniformly across the network)
  if(iregl2 .gt. 0)then
     call network%compile( &
          optimiser = base_optimiser_type(learning_rate=1.E0, regulariser = l2_regulariser_type(regval)), &
          loss_method="mse", metrics=["loss"])
  else
     call network%compile( &
          optimiser = base_optimiser_type(learning_rate=1.E0), &
          loss_method="mse", metrics=["loss"])
  end if

end program main

Hi @milancurcic, thanks very much! :smiley: Yes, I moved it to GitHub whilst submitting it to the JOSS journal to comply with their rules, but have since come to appreciate the much better accessibility and community it offers.

I’d love to revisit our conversations also. Whenever you have time, just send me a message. :slight_smile:

Oooh, the workshop sounds interesting, I hope you get selected! And thank you for referencing Athena in it. Yeah, I’d definitely be interested in submitting something for Athena!

Wow, that’s amazing. Thanks, but I have retired since I was working on this last year and no longer have access to the training data (it’s from a U.S. DoD-funded project that’s covered by ITAR/export control laws), so I can’t try rerunning my cases with your code. However, I might try to generate some simple analogs for the data sets I was using just to see if your code works. I’ll keep you posted.

Hopefully you’ll get a chance to test it with some analogs. But if not, it’s still handy to have batchnorm1d in the code, so your comment was a good motivation! :smiley:

I had a look at one of the examples - “simple”. I see that you use the random_seed subroutine, but it looks as if you are trying to set the size of the seed. That is not how it works: “size = seed_size” will merely return the size of the actual seed used by the implementation; it does not change it.

Thanks for pointing this out, @Arjen. :slight_smile:

I have now fixed this in the development branch. This needed addressing in a few of the examples, tests, and in the random_setup procedure provided within the ATHENA library.
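
For anyone following along, the corrected usage follows the standard Fortran pattern, roughly like this (a minimal self-contained sketch, not the exact random_setup code):

program seed_example
  implicit none
  integer :: n, i
  integer, allocatable :: seed(:)

  ! query the length of the seed used by the compiler; this does NOT set it
  call random_seed(size = n)
  allocate(seed(n))

  ! fill the array with reproducible values and actually set the seed
  seed = [(i + 37, i = 1, n)]
  call random_seed(put = seed)
end program seed_example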

Unfortunately, we were not selected for Supercomputing '24. Based on reviews, the proposal was deemed to be of decent quality and significance, but of insufficient or inappropriate scope and relevance (too niche) for Supercomputing. We will sit on the feedback and discuss what to do next. Onward.

Aw, that’s a shame. I’m sorry to hear that. :frowning: Thanks for keeping me updated on it. Let me know if I can be of any help.

A paper has been published about the package:

5 Likes

It has indeed, after some fantastic (and much appreciated) work by the reviewers!

My next step will be to include message passing/graph layers (these are already in a development branch and just need tidying up and testing). I will then make it so that people can extend the library with new layer types without any need to edit the original library.
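
The rough idea is standard Fortran type extension. As a purely illustrative sketch (the type and procedure names here are made up for the example and are not ATHENA’s actual API), a user-defined layer would look something like this:

module custom_layer_sketch
  implicit none

  ! hypothetical abstract base type, standing in for whatever the library will expose
  type, abstract :: base_layer_type
  contains
    procedure(forward_interface), deferred :: forward
  end type base_layer_type

  abstract interface
    subroutine forward_interface(this, input, output)
      import :: base_layer_type
      class(base_layer_type), intent(inout) :: this
      real, intent(in)  :: input(:)
      real, intent(out) :: output(:)
    end subroutine forward_interface
  end interface

  ! a user-defined layer that extends the base type without editing the library itself
  type, extends(base_layer_type) :: scale_layer_type
    real :: factor = 2.0
  contains
    procedure :: forward => scale_forward
  end type scale_layer_type

contains

  subroutine scale_forward(this, input, output)
    class(scale_layer_type), intent(inout) :: this
    real, intent(in)  :: input(:)
    real, intent(out) :: output(:)
    output = this%factor * input
  end subroutine scale_forward

end module custom_layer_sketch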

3 Likes