Workflow for project with fpm (fpm build/install vs run with input files)

Hi,

I am making an existing project work with fpm and wonder what the usual workflow looks like during further development of that project. fpm build is fine, but how can you try out your changes? As far as I can see, fpm run is supposed to run in the root directory, but what if the program needs input files and creates output files? Those shouldn’t clutter the project directories. One solution would be to always use fpm install, but then you would overwrite your latest known-working version with a potentially buggy one. How do you usually do this?

I would like to keep automatic testing as a separate topic.

(Meta: Is this the correct place to ask such questions or is there an fpm-specific one? I was hesitant to open an issue for this because it’s not a specific proposal to change anything.)

Thanks @Sideboard for the question. Yes, this is a good place to ask. Opening up an issue would also work, but for a general discussion I think a Forum is better.

Yes, I struggled with the input files too, in the lfortran examples/mvp_demo project on GitLab. As you can see, I put input.txt in the root directory. That’s not ideal.

You can do fpm run from any directory, but it seems the program still runs relative to the root directory.

I can see two ways out:

  1. Run the program by hand as ./build/lfortran_00000000811C9DC5/app/mvp_demo then you can run it from any directory you want.

  2. Use an argument to the executable to specify where the input file is. Something like fpm run mvp_demo --compiler=lfortran -- path/to/input.txt

What other ways are there for a good workflow of a computational code with input files?

Where would be a good place to put input files in an fpm repository?

The easiest way at the moment is to pass the name of the input file as command line argument or via an environment variable. I think this is also a better design since it makes the program more flexible.
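A minimal sketch of that idea from the calling side, in case it helps. The function name resolve_input and the environment variable MVP_INPUT are made up for illustration. The program itself still has to read its first argument (in Fortran, for example, with get_command_argument).

```shell
# Hypothetical helper: pick the input file for a run.
# Priority: first argument, then $MVP_INPUT (an assumed variable name),
# then the conventional input.txt in the current directory.
resolve_input() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"
    elif [ -n "${MVP_INPUT:-}" ]; then
        printf '%s\n' "$MVP_INPUT"
    else
        printf '%s\n' "input.txt"
    fi
}

# Usage (commented out so the block can be sourced without running fpm):
# fpm run mvp_demo -- "$(resolve_input "$@")"
```

Keeping the fallback to input.txt means existing habits keep working while the path becomes configurable.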

But the general issue remains: I’m aware of several widely used projects that assume that an input file with a certain name will be available in the current working directory and, more importantly, generate a bunch of output files in the working directory as well. Especially for the latter case it would be nice to have the possibility in fpm to set a working directory on the command line.

The related issue is here:

Fpm does support changing the directory with -C/--directory, but we are no better than other projects here and also assume that our input file, fpm.toml, is in our current working directory.

That is a bit cumbersome due to the hash within the path. What is the hash made of (and is this documented)? If the hash changes fast, you cannot use an old call from the terminal history.

Accessing anything directly from the build directory is always highly discouraged. Instead use fpm to give you the path to the executable:

pushd inputs && $(fpm run --runner which | grep '^/') && popd

By using which or realpath you will get the absolute path to the binary which remains usable when changing the directory. The pipe through grep is meant to filter only the final executable path and remove other output fpm might generate. This also ensures that the binary is actually updated in case you change your project source.

This solution is of course not perfect and a tiny bit hacky.
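The same trick can be wrapped in two small shell functions, sketched here under a few assumptions. The names project_exe and run_in are hypothetical, realpath is used as the runner (one of the two commands mentioned above), and the path is resolved from inside the project directory so that fpm finds fpm.toml before the working directory changes.

```shell
# Print the absolute path of the project's executable.
# $1: fpm project directory (defaults to the current directory).
project_exe() {
    (
        cd "${1:-.}" || exit 1
        # realpath turns the relative path that fpm hands to the runner into
        # an absolute one; grep keeps only lines that look like absolute paths.
        fpm run --runner realpath | grep '^/' | tail -n 1
    )
}

# Run the project's executable with a data directory as working directory.
# $1: fpm project directory, $2: data directory, rest: program arguments.
run_in() {
    project=$1
    workdir=$2
    shift 2
    exe=$(project_exe "$project")
    [ -n "$exe" ] || return 1
    ( cd "$workdir" && "$exe" "$@" )
}

# Usage (commented out so the block can be sourced without running fpm):
# run_in ~/src/my_project ~/data/case1
```

Because fpm is asked for the path on every call, the binary is rebuilt first if the sources changed, which is the main advantage over typing the build path by hand.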

And I do not understand how it works. Or rather, I have the feeling it does not work outside the fpm project, does it? The main idea for calling the binary directly would be to call it where the data is.

The fpm run command allows you to provide a runner argument. This is mainly meant for starting the program in a debugger or using mpiexec, but in principle it can be anything. The command which can be used to return an absolute path to an executable when provided with a name found in the PATH or when given a path to any executable. We will use which as the runner to turn the relative path provided by fpm into an absolute path.

The command

fpm run --runner which | grep '^/'

will always return the absolute path to the binary of the project. You can add all fpm options there as well, including the -C/--directory argument to change to the fpm project you want to run.

It works, it is just not pretty because of all the shell hacks. I’ll put a proper solution for this in fpm on my TODO list.

Works outside of an fpm project as well. Here is a quick demo:

[asciicast recording]

But that’s what I meant. The -C option was missing in the first example and without it fpm wouldn’t know where to look for the project if called from outside.

Thanks for adding it as a todo.

Here is how it behaves on my computer:

$ fpm run mvp_demo --compiler=lfortran --runner             
build/lfortran_00000000811C9DC5/app/mvp_demo
$ fpm run mvp_demo --compiler=lfortran --runner | grep '^/'
$

So the grep does not work. But without the grep it works.

I agree I don’t like accessing the file directly via the build directory + hash. I also don’t like using this command, it seems hackish.

I think one good option is to improve the fpm driver to allow calling the application from any directory.

The other path is to just always install the program and use it from the installation directory.

@Sideboard if you can help us design fpm to work well, that would be awesome. Your use case is extremely common, that’s how I like to work also.

I just have an alternate directory (called “pdq”) that I would guess you would call a scratch bin directory. When I want to try the command out “normally” I just do

fpm install --prefix ~/.local/pdq

and then I can use the program sans fpm. I use aliases that put me in and out of a shell with “pdq” in my path as well.

If you just delete files out of pdq when done and also have the same command (i.e. your “production” version) in your path, you might have to enter “rehash” or “hash -r”.

Going one step further, I made a script called “tbash”. When I give it a name, it starts a new shell with pdq first in the path and some shell functions, so I can just enter commands like “build”, “run”, “tst”, and since it knows the name I gave tbash it knows where to go first.

I have been playing with making that a plug-in but haven’t had the time to finish it; where it is a mini-shell written in Fortran that traps a few commands like build and run and test and lets anything it does not recognize just pass to the shell.

Another one I was playing with was a modified “fpm run” with a --shell switch. Basically it does the same thing as run does now, except instead of running a program it starts a shell with the directory containing the executable(s) in the path. I only made it work on ULS systems and it always starts bash, but that is not an insurmountable problem. Except for sometimes forgetting to type “exit” when done and ending up seven levels deep in spawned shells, that actually works pretty well.

@urbanjost Have you tried setting up environment modules (Tcl modules or lmod) for this purpose? I found it really useful to have an environment management system on my local machine which allows me to revert changes I made to my environment (i.e. loading and unloading a directory with development binaries).

So fpm run --shell is like starting an environment with the compiled programs in $PATH? Sort of like conda activate my_env?

One can even imagine fpm somehow putting the compiled programs in your $PATH while you are developing. Then you can just run them directly.

I generally do not like fiddling with my environment, so another option is along the lines of your fpm install --prefix ~/.local/pdq. Possibly something like

$ fpm install --prefix `pwd`/bin

essentially creating a bin directory in the root directory and adding the binaries there. It could also hold symlinks to the last compiled files from the build directory. So you can conveniently call them using bin/my_app. When I am done, just “git clean -dfx” and everything gets cleaned up. No environment changes.

Another idea: Have additional symlinks in the build directory:

  • build/bin/my_app (equivalent to the above bin idea, just nested under build)
  • build/latest/app/my_app (latest build, any compiler)
  • build/gfortran/latest/app/my_app (latest gfortran build)
  • build/lfortran/latest/app/my_app (latest lfortran build)

That way these links are stable, and you can use them (no hash). They are still quite long, except the first one. Even the first one is quite long, but better than the current situation.
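Until something like this exists in fpm, a user-side approximation on a Unix-like system could look like the sketch below. The function name link_latest is hypothetical, the build/<compiler>_<hash>/app/<name> layout is taken from the path shown earlier in this thread, and "latest" is simply the newest modification time. It parses ls output, so it assumes paths without newlines.

```shell
# Link build/bin/<name> to the most recently modified build/*/app/<name>.
# $1: executable name, $2: project root (defaults to the current directory).
link_latest() {
    name=$1
    root=${2:-.}
    # ls -t sorts newest first; the glob matches build/<compiler>_<hash>/app/<name>.
    latest=$(ls -t "$root"/build/*/app/"$name" 2>/dev/null | head -n 1)
    [ -n "$latest" ] || return 1
    mkdir -p "$root/build/bin"
    # Use an absolute target so the link resolves from any directory.
    target=$(cd "$(dirname "$latest")" && pwd)/$name
    ln -sf "$target" "$root/build/bin/$name"
}

# Usage (commented out so the block can be sourced without side effects):
# link_latest my_app && build/bin/my_app
```

Note that such a link only reflects the last build that actually ran; it does not trigger a rebuild when the sources change.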

Symlinks are not cross-platform compatible and will break on Windows.

Then we can use symlinks on platforms that support them and a copy on Windows. Windows I think has symlinks, you just have to deal with them differently. I don’t know if they break. If they are usable, we can use them, if not, we’ll do a copy.

We can design symlinking carefully to support Windows as well, but I think our energy is better spent on improving the -C/--directory behaviour in fpm.

I believe fpm should provide the possibility to access the executables without requiring the user to directly interact with the build directory. Any direct interaction with the build directory introduces the possibility of having an outdated binary which does not represent the current project source. Also, fpm must not write any build artifacts outside of the build directory (unless it’s installing them).

I did. I have a long history with modules. Because each build with different flags uses a different directory, I didn’t come up with anything much other than a module to load and unload the pdq. If you are using modules that is great, but as a general solution, requiring modules and writing files, even if just a line or two in lua or tcl/tk, seemed like too much of a complication. Depending on which module system and shell you use, it is also easy to set up useful aliases that way, so you can just enter “build” or “run”. (I was reminded the hard way for a moment not to alias “test” when I tried it with modules. I deleted the file named “/bin/[” because it looked like junk on the first machine I tried Unix on a long time ago. You would think that would have stopped me from ever messing with “test” again :>). So for people comfortable with modules: yes, that is great. But although modules are useful on a home machine (I too have them on my home machines), I do not think they are ubiquitous enough to use as a general solution. The idea of packaging up an “fpm” module is very appealing for power users, though. I think I would go with tcl/tk because it can be used by lmod as well, even though lmod prefers lua. So I am thinking down other roads, but as a special case you are right: modules work well.

The --shell switch has a very similar effect to the conda command in its simplest form. In that form it just spawns a shell with the current build’s directory in the path first. You can then go anywhere and type commands, with the only difference being that your binaries from the fpm build are in your path. To “deactivate” you just type exit.

I have had enough problems with links, especially in file systems that are cross-mounted to different platforms with different OSes, that I habitually avoid them! I have a hunch they would be prone to dusty-corner issues like that and would not work too well with builds using different platforms, which there are already issues with (gfortran might not be the same version or might build for a different architecture in the same directory, or I might be switching between different compiler versions (maybe with modules!) in the same window, for example).

In the more complicated option, --shell actually calls a Fortran program that is a little shell. It uses M_history for a platform-independent command history, traps command lines beginning with “build”, “test” and “run”, “remembers” the last set of compiler flags and other fun stuff, and lets anything else through to the shell. I like it, but since every command is a subshell it has limitations, and it has to call POSIX routines to do its own cd(1) and set environment variables. So it is fun but in no way ready to go into the wild; it might be a plugin some day, but I would forget about it for now.

Maybe instead of the somewhat awkward (particularly when doing a non-default set of -flag and -profile options)

fpm run CMD --FPM_OPTION(S) -- CMD_OPTIONS

fpm could come with a command somewhat like the VMS @ command where you could just enter

@ cmd -options

and “@” would be a command that comes with fpm that looks for the cmd with the latest build time and calls it, requoting the options; or something along those lines. Even the fpm command itself could do that, although that is a relatively small change from the current “run” command. I am not sure if that would be significantly better, but maybe

fpm :cmd --options

where you could leave off the : if the command is not called {test,run,build,install,…}. Wonder if the OP would find that any better? If it automatically looked for the latest built binary it would at least save having to respecify the build options on the “run” command. It is not in standard Fortran to find the last build all that easily, but the name of the last build directory could go in the build/cache.toml file to do it in a more system-independent manner.

I generally recommend against hard-coding file names/paths, but as a long-standing practice for many existing projects I’ve encountered it plenty of times and wished there was a solution. I didn’t realize we had added the -C option. I think adding a -W/--working-dir option for run and test would be a good solution. Setting a default in the fpm.toml file might be a good idea too, since it’s something you’d like to use over and over again, and probably won’t change very often.

I like the name better than --directory, but isn’t that what --directory does now? Being able to set a default also seems worth pursuing.

Combining some of the concepts above, I think a shell command such as

fpm shell [--install DIRECTORY_NAME]

where

  1. the directory defaults to build/bin in the current fpm parent directory
  2. the directory is put in the command path in a new shell
  3. a variable like FPM_DEVELOPMENT_INSTALL is set to the directory name
  4. when the FPM_DEVELOPMENT_INSTALL directory name is present any binary built is copied to the directory

would work. If FPM_DEVELOPMENT_INSTALL is not set, fpm would perform as it does now. That would mean anything I last built regardless of the options would go into that directory and immediately be in my path for testing, which I could then proceed to do anywhere? We could restrict the directory to having to be in the build directory, but by allowing it to be anywhere it could support something like the “pdq” directory described above that would allow multiple programs from multiple projects to be used instead of just the ones in the current directory?
You would have to opt in by doing the “fpm shell” command, somewhat like the conda(1) command mentioned above. Entering “exit” would revert you back to a normal shell. The way shell variables work that could be nestable too.
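To make the four steps concrete, here is a rough user-side sketch of how such a command might behave. It is not an existing fpm feature: the function names dev_shell and dev_copy are hypothetical, and the copy step stands in for what fpm itself would do after each build.

```shell
# Start a shell (or run a command) with a development install directory
# first in PATH.
# $1: install directory (defaults to build/bin, as in step 1); rest: optional command.
dev_shell() {
    dir=${1:-build/bin}
    if [ $# -gt 0 ]; then shift; fi
    mkdir -p "$dir" || return 1
    dir=$(cd "$dir" && pwd)    # absolute, so it stays valid after a later cd
    (
        # Steps 2 and 3: put the directory in the path and record its name.
        FPM_DEVELOPMENT_INSTALL=$dir
        PATH=$dir:$PATH
        export FPM_DEVELOPMENT_INSTALL PATH
        if [ $# -gt 0 ]; then "$@"; else "${SHELL:-sh}"; fi
    )
}

# Step 4: copy a freshly built binary, but only when the variable is set.
# $1: path to the built executable.
dev_copy() {
    [ -n "${FPM_DEVELOPMENT_INSTALL:-}" ] || return 0    # opt-in only
    cp "$1" "$FPM_DEVELOPMENT_INSTALL/"
}
```

Since everything happens in a child process, typing “exit” restores the previous environment, and nesting works because each level just prepends to the PATH it inherited.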

Except for the issue of not having a portable, pure way to get the full pathname of the directory for the current project, that would be straightforward using just the existing code. Mostly it would mean forking a shell with $PATH altered and doing what the “install” command does into FPM_DEVELOPMENT_INSTALL after each executable is built.

As far as the full pathname issue goes, almost all the compilers support a PWD() extension. I am also pretty sure gfortran returns a full path on an INQUIRE() of $arg(0) (but most other compilers do not). Another alternative is that I have POSIX routines that call the C routines for ULS systems via realpath(3c); it could even be done by reading the output of shell commands. So that is solvable, even if it unfortunately requires something non-standard or calling system commands.

After rereading everyone’s points, I think that is easy to use and implementable, and combines the ideas reasonably well.
