GSoC Optimizers: Weekly Progress Reports

GSoC 2023 Logging :bookmark_tabs:

Community Bonding Period :rocket:

Week 1 (11 May - 18 May)

  • Researched optimizers and made a list of those I plan to implement:
    • Momentum Optimizer
    • Root Mean Square Propagation (RMSprop)
    • Adaptive Moment Estimation (Adam)
    • Adagrad
    • Adadelta
  • Didn’t do much work due to ongoing final exams.
  • Next week’s goals:
    • Write an example that fits a quadratic function
    • Start the basic optimizer implementation

Week 2 (18 May - 25 May)

  • Added an example program that fits a quadratic function.
  • Created a draft PR.

Week 3 (25 May - 1 June)

  • Updated the code to define x and y as 1-D array datasets.
  • Implemented a separate subroutine for each optimizer:
    • Batch gradient descent
    • Mini-batch gradient descent
    • Stochastic gradient descent
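The three variants differ only in how much of the dataset each update sees. As a reference for the idea (a hypothetical NumPy sketch, not the Fortran subroutines from the PR), here is a quadratic fit where `batch_size` selects the variant:

```python
import numpy as np

def gradient(params, x, y):
    # MSE gradient for the model y_hat = a*x**2 + b*x + c
    a, b, c = params
    err = a * x**2 + b * x + c - y
    return np.array([2 * np.mean(err * x**2),
                     2 * np.mean(err * x),
                     2 * np.mean(err)])

def fit(x, y, lr=0.01, epochs=1000, batch_size=None, seed=0):
    # batch_size=None -> batch GD; 1 -> stochastic GD; otherwise mini-batch GD
    params = np.zeros(3)
    n = len(x)
    bs = n if batch_size is None else batch_size
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        idx = rng.permutation(n)          # reshuffle every epoch
        for start in range(0, n, bs):
            j = idx[start:start + bs]
            params = params - lr * gradient(params, x[j], y[j])
    return params
```

All three variants use the same gradient; only the sampling of rows changes, which is why a single `fit` routine with a `batch_size` switch covers them.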

Week 4 (1 June - 8 June)

  • Updated and refactored the optimizer subroutines.
  • Compared the performance of each optimizer over 1000 epochs:
    • Stochastic gradient descent MSE: 0.000778
    • Batch gradient descent MSE: 0.054431
    • Mini-batch gradient descent MSE: 0.006031

Week 5 (8 June - 15 June)

  • Discussed with my mentor how to access layer parameters (weights, biases, and gradients).

  • Implemented the RMSProp optimizer subroutine, which was then refactored.

    PR Link: #144
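The RMSprop rule itself is compact: keep a decaying average of squared gradients and divide each step by its square root. A minimal NumPy sketch (names and defaults are illustrative, not the neural-fortran API):

```python
import numpy as np

def rmsprop_update(param, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    # Running average of squared gradients scales each parameter's step,
    # so directions with consistently large gradients take smaller steps.
    cache = decay * cache + (1 - decay) * grad**2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache
```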


Week 6 (15 June - 22 June)

  • Studied the changes made to the new RMSprop subroutine and the SGD optimizer stub.

  • Implemented momentum and Nesterov modifications in the SGD optimizer.

  • As the project advances, I'm learning something new from @milancurcic every day, not just about fortran-lang but also about general coding practices.

    PR Link: #148
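The two modifications share one velocity buffer; Nesterov only changes where the gradient is evaluated. A plain-Python sketch of the idea (hypothetical names, not the Fortran subroutine):

```python
def sgd_update(param, grad_fn, velocity, lr=0.01, momentum=0.9, nesterov=False):
    # grad_fn returns the gradient of the loss at a given parameter value.
    if nesterov:
        # Nesterov: evaluate the gradient at the "look-ahead" position.
        grad = grad_fn(param + momentum * velocity)
    else:
        # Classical momentum: gradient at the current position.
        grad = grad_fn(param)
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity
```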


Week 7 (22 June - 29 June)

  • Refactored the momentum and Nesterov modifications in the SGD optimizer.
  • Added a concrete implementation of RMSProp in the optimizers module.
  • Implemented the Adam optimizer in the quadratic example.

Week 8 (29 June - 6 July)

  • Refactored the subroutines in the quadratic_fit example to accept xtest and ytest.
  • Added code to report RMSE every 10% of num_epochs.
  • Plumbed the SGD and RMSprop optimizers at the network % update level.
  • Added a draft implementation of the get_gradients() method.
  • Improved the Adam optimizer implementation in the quadratic example.
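The reporting cadence is simple to express: check the epoch counter against a tenth of num_epochs. A sketch with a hypothetical one-parameter model (the model and names are made up for illustration, not the quadratic_fit code):

```python
import numpy as np

def train(x, y, xtest, ytest, num_epochs=100):
    # Hypothetical model y_hat = w * x, just to show the reporting cadence.
    w = 0.0
    report = []
    interval = max(1, num_epochs // 10)       # every 10% of num_epochs
    for epoch in range(1, num_epochs + 1):
        w -= 0.1 * 2 * np.mean((w * x - y) * x)   # one full-batch GD step
        if epoch % interval == 0:
            rmse = np.sqrt(np.mean((w * xtest - ytest) ** 2))
            report.append((epoch, rmse))
    return w, report
```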

Week 9 (6 July - 13 July)

  • Added test suite for optimizers module.

  • Implemented code for convergence tests.

    PR: #148


You and your GSoC mentors could consider whether to implement the algorithms in a recent preprint:

Provably Faster Gradient Descent via Long Steps
by Benjamin Grimmer

This work establishes provably faster convergence rates for gradient descent via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps potentially violating descent by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster O(1/(T log T)) rate for gradient descent is also motivated along with simple numerical validation.
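Grimmer's actual stepsize policies are more intricate than this, but the core phenomenon — occasional long steps that temporarily increase the objective yet still converge overall — can be seen on a toy ill-conditioned quadratic. The matrix and schedule below are made up purely for illustration:

```python
import numpy as np

# Toy objective f(x) = 0.5 * x @ A @ x with an ill-conditioned diagonal A.
A = np.diag([1.0, 8.0])

def f(x):
    return 0.5 * x @ A @ x

def run(schedule, n_cycles, x0):
    # Repeat a periodic stepsize schedule; gradient of f is A @ x.
    x = np.array(x0, dtype=float)
    history = [f(x)]
    for _ in range(n_cycles):
        for h in schedule:
            x = x - h * (A @ x)
            history.append(f(x))
    return x, history

# Seven short "safe" steps, then one long step per cycle. The long step
# overshoots and raises f, but each full cycle still contracts the error.
x, hist = run([0.1] * 7 + [2.5], 30, [1.0, 1.0])
```

Running this, `hist` is not monotonically decreasing (each long step raises the objective), yet the final value is driven to essentially zero, which is the qualitative behavior the paper analyzes.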


Week 10 (13 July - 20 July)

  • Added an implementation of the Adam optimizer.

  • Added a convergence test for the Adam optimizer module.

    PR: #150
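For reference, Adam combines momentum-style first-moment and RMSprop-style second-moment estimates, each bias-corrected. A minimal NumPy sketch of the update from Kingma & Ba (2015) (illustrative names, not the neural-fortran API):

```python
import numpy as np

def adam_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```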


Week 11 (20 July - 27 July)

  • Added the weight decay modification (AdamW) to the Adam optimizer.

  • Added an implementation of the Adagrad optimizer.

    PR: #154
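The AdamW change is small but important: weight decay is applied directly to the weights rather than folded into the gradient, so it is not rescaled by the adaptive denominator. A sketch of the decoupled rule from Loshchilov & Hutter (illustrative names, not the project's code):

```python
import numpy as np

def adamw_update(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    # Decoupled weight decay: shrinks the weights directly instead of
    # adding weight_decay * param to the gradient as in L2-regularized Adam.
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v
```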


Week 12 (27 July - 3 August)

  • Corrected the implementation of the Adagrad optimizer.
  • Added a convergence test for the Adagrad optimizer module.
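Unlike RMSprop's decaying average, Adagrad accumulates all past squared gradients, so the effective stepsize only ever shrinks. A minimal sketch for reference (illustrative names, not the Fortran code):

```python
import numpy as np

def adagrad_update(param, grad, accum, lr=0.01, eps=1e-8):
    # Accumulate every past squared gradient; the denominator grows
    # monotonically, so each parameter's effective stepsize only decays.
    accum = accum + grad**2
    param = param - lr * grad / (np.sqrt(accum) + eps)
    return param, accum
```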

Week 13 (3 August - 10 August)

  • Added the structure of the batch normalization layer.

  • Refactored the forward and backward algorithms based on my interpretation of Ioffe and Szegedy (2015).

    PR: #157
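The training-time forward pass from Ioffe and Szegedy (2015) normalizes each feature over the batch and then applies a learned scale and shift. A NumPy sketch of that step (illustrative, not the layer's Fortran implementation; the backward pass and running statistics for inference are omitted):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x has shape (batch, features). Normalize each feature over the batch,
    # then scale by gamma and shift by beta.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

With gamma = 1 and beta = 0, the output of each feature has approximately zero mean and unit variance over the batch, which is the property the layer's tests can check.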


Grimmer’s work is described in Quanta magazine: Risky Giant Steps Can Solve Optimization Problems Faster.

Week 14 (10 August - 17 August)

  • Added a draft test suite for the batch normalization layer.