How about starting to add difficulty labels to the issue trackers at fortran-lang? Maybe having easy, medium and difficult labels on almost all issues instead of the few good-first-issue and help-wanted labels makes it easier to pick a good starting point.
Personally, I find these two projects less suitable for GSoC students. While very important for the future growth of the projects, I think they demand less Fortran knowledge and more skills in GitHub, web scripting, software development practices, etc.
At least from an outside perspective, I always thought GSoC projects were supposed to engage students with an applied problem and help them become proficient in a programming language. OS integration, subprocesses, and fpm compile flags belong in this area.
Of course I might be wrong, and we - the Fortran community - could benefit a lot from a student proficient in versioning, or with previous experience in registries, databases and related projects.
Infrastructure/Automation : These projects are the code that your organization uses to get its development work done; for example, projects that improve the automation of releases, regression tests and automated builds. This is a category in which a GSoC student can be really helpful, doing work that the development team has been putting off while they focus on core development.
Whether they’re interesting or rewarding for a specific student is a different question. But I know people who are very much into this kind of stuff.
Many thanks for all your work populating the ideas page so far @awvwgk, it is looking really promising!
I think it would be good to assign such difficulty labels to the issue trackers; do you have an idea of some criteria for categorising the issues? Perhaps based on size of changes required, familiarity with the codebase and general implementation complexity?
Another thing to consider is that any project ideas that require specification decisions will need to be highlighted as such (we already have a specification tag in fpm, thanks @awvwgk). Importantly we can’t allow specification decisions to bottleneck a student coding project and hence should aim to complete these decisions before start of code (June 7). This is well-suited for students to engage with the community during the ‘Bonding period’, however addressing specification earlier would be preferable IMO.
Implementation complexity sounds like a good starting point, and a roughly estimated workload in time and coding might be a useful criterion as well. This assessment is preferably done by project maintainers or prior contributors familiar with the code base.
I would suggest the following rough guidance:
- easy: anything a project maintainer estimates at less than a whole day of work
- difficult: anything even the project maintainer can't put a time estimate on
- medium: (almost) everything else
I already created some difficulty labels in all the repos, but haven’t started labeling yet, so everybody feel free to start labeling as you visit your favourite issues.
Application Instructions: this is part of the first bullet point and should probably be a separate wiki page, I think. It should contain instructions on what students have to do to be considered: introduce themselves on Discourse (we will discuss and help with their application), draft the application, discuss it with us, and submit it. We agreed on a patch requirement, so they should also submit a patch.
Label more issues with “easy to fix” labels.
Ondrej will contact other orgs to see if they would vouch for us
We will meet next Tuesday in the same time slot to finalize the application and submit
This is very last minute, but I have a Python project called Flint which I have been working on from time to time, and which may be relevant to GSoC.
It was an attempt at a Fortran linter and general code analysis tool, but at the moment it is just a tokenizer of the source code. It also creates a Project object which organizes its modules, variables, subprograms, metadata, etc. into a tree-like data structure.
The idea was to create something that we could navigate and inspect inside of Python, and then use this information to build analysis tools.
As for the current state of the project:
Tokenization is very strong, including string handling, line continuation, and comments. For example, tokens inside strings or comments are handled correctly.
Whitespace tokens are also created; they can (at least in principle) be analyzed and checked against a style guide, and eventually discarded.
Many preprocessing statements are handled, although there are still issues with #if blocks (and others, most likely).
Statements are properly identified (module, subroutine, declaration, expression, etc.) and much of the data is slotted into its appropriate sub-object.
```
S: subroutine meshgrid ( x , y , x_t , y_t )
D: real , dimension ( : ) , intent ( in ) :: x
D: real , dimension ( : ) , intent ( in ) :: y
D: real , dimension ( size ( x , 1 ) , size ( y , 1 ) ) , intent ( inout ) :: x_t
D: real , dimension ( size ( x , 1 ) , size ( y , 1 ) ) , intent ( inout ) :: y_t
D: integer :: ni , nj , i , j
E: ni = size ( x , 1 )
E: nj = size ( y , 1 )
C: do j = 1 , nj
C: do i = 1 , ni
e: x_t ( i , j ) = x ( i )
C: end do
C: end do
C: do j = 1 , nj
C: do i = 1 , ni
e: y_t ( i , j ) = y ( j )
C: end do
C: end do
S: end subroutine meshgrid
```
(S: subroutine, D: declaration, E: expression, C: control block, etc…)
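The kind of statement classification shown above can be sketched with a few regular expressions. This is a toy Python illustration of the idea only, not Flint's actual implementation; the function name and category rules are assumptions:

```python
import re

# Toy classifier mapping a Fortran statement to the single-letter
# categories used above: S (subprogram boundary), D (declaration),
# C (control construct), E (other executable statement).
def classify(stmt: str) -> str:
    s = stmt.strip().lower()
    if re.match(r'(end\s+)?(subroutine|function|module|program)\b', s):
        return 'S'
    if re.match(r'(real|integer|logical|character|complex|type)\b', s):
        return 'D'
    if re.match(r'(do|if|else|end\s+do|end\s+if|select)\b', s):
        return 'C'
    return 'E'  # fall through: assignments and other executable statements

for line in ["subroutine meshgrid ( x , y )", "real :: x",
             "do j = 1 , nj", "ni = size ( x , 1 )"]:
    print(classify(line), line)
```

A real classifier has to run on the token stream rather than raw text (so that keywords inside strings or comments are ignored), which is exactly why the tokenization step described above comes first.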
Elements have “docstrings”, where any comments starting with !!, !<, !> are caught, preserved, bundled, and assigned to subroutines, variables, etc.
The statement iterator builds the “pre-docstring” and “post-docstring” for each line, and bundles multiline docstrings together.
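The docstring convention described above can be illustrated with a small sketch. This is a hypothetical helper, not Flint's API: comments beginning with `!!`, `!<`, or `!>` are collected, and consecutive docstring lines are bundled into one block.

```python
import re

# Matches Fortran "docstring" comments: !!, !<, !>
DOC = re.compile(r'!\s*[!<>]\s?(.*)')

def gather_docstrings(lines):
    """Collect docstring comments, bundling consecutive lines together.
    A sketch of the idea only; Flint attaches these to parsed elements."""
    blocks, current = [], []
    for line in lines:
        m = DOC.search(line)
        if m:
            current.append(m.group(1).rstrip())
        elif current:
            blocks.append('\n'.join(current))
            current = []
    if current:
        blocks.append('\n'.join(current))
    return blocks

src = [
    "!! Compute a 2-D mesh from 1-D axes.",
    "!! Arrays x and y are the input axes.",
    "subroutine meshgrid(x, y, x_t, y_t)",
    "  real, intent(in) :: x(:)   !< input x-axis",
]
print(gather_docstrings(src))
```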
What is missing is stronger development of the “frontend” code: style guide compliance, indent checking, variable and function usage stats, subroutine dependencies, parentheses checking (for floating point repro), keyword usage, etc.
Admittedly some of this can be done with compilers, but anything involving, say, whitespace or case sensitivity, is unlikely to be supported.
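As a quick demonstration of why parenthesization checking matters for bit reproducibility (plain Python floats here, i.e. IEEE-754 doubles): floating point addition is not associative, so a compiler that reassociates an unparenthesized sum can change the result in the low-order bits.

```python
# Floating point addition is not associative: reassociating a sum
# can flip low-order bits, which breaks bit-for-bit reproducibility.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # left-to-right grouping
right = a + (b + c)  # a reassociated grouping

print(left == right)  # False: the two groupings differ in the last bit
print(left, right)
```

Explicit parentheses pin down the evaluation order, which is why a linter rule enforcing them can help keep results reproducible across compilers and optimization levels.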
This project sat unfinished for a few years, but I have come back to it because my current employer is interested in using it for two purposes:
More aggressive rules for floating point expressions (to retain bit reproducibility)
Automated API documentation via docstrings.
The latter is because we have started to feel the limits of Doxygen, which was very obviously designed around C++, and because we prefer to convert the output to something which Sphinx can render, which has become harder as Sphinx and various extensions evolve.
At the moment, flint can gather docstrings from annotated source and produce documentation from them.
Although far from complete and only sparsely documented, the tool has successfully managed to parse some very big (>100k line) projects, and I think it could be a useful starting point for some of the mentioned proposals, such as documentation or standards conformance.
And given how last minute (and half-baked) it is to bring this up, no worries if it’s too late to consider.
Welcome to the Discourse @marshallward and thanks for sharing your flint project, it looks very promising as a useful tool for the community! I think it is a good candidate for GSoC to push it forward.
I notice it doesn’t have a LICENSE file currently — is the project open source? If yes, then I don’t see any reason why we cannot list it on our GSoC ideas page; are able to provide one or more flint project ideas for the ideas page? Please do also join us for our Fortran-Lang GSoC meeting today to discuss if you’re able.