It depends very much on what you are doing in your algorithm, so in short, I would advise you to do your own benchmarks varying the number of bits starting from a small amount until one that it isn’t doable with both methods (you don’t need the results to have physical meaning while doing this).
If your algorithm rely on/exploits batch operations (like XOR, FLIP/NOT, AND…) over the bits I don’t see the reason why it would be any less efficient, in fact it will be faster as there are fewer elements (of the array) to be processed at once by your CPU (meaning more bits processed each step).
Otherwise, the bit extraction time will be relevant only if there is a function calling overhead involved to each bit, because the number of spins would remain the same.
I don’t see why it would be any slower as it is a cheap operation after all, plus neighbor’s bits would be a cache hit (in case you are doing spin “frustration”, “relaxation” methods).
But again, I suggest you benchmark if you’re still in doubt.
PS: Also be careful with deallocations you may be forgetting, even though fortran deallocates some of the arrays, as the code is crashing, I assume there is some kind of memory that should be freed manually (but it is not) and the SO needs to kill the process, it is worth to look into tools like valgrind or using flags like “-fsanitize=address” in GCC.