I have recently upgraded my PC; well, built a new one. I then installed Windows 11 and mingw64, using GCC 12.2.0.
I ran the same code that I have previously run under my old Windows installation, on macOS and on Linux (Fedora), and the resulting file is double the size.
The curious thing is that when I open the file up in a text editor, it is exactly the same as before. Obviously, there is a new set of characters appearing but I cannot seem to find them at all.
I have now run two versions of my code, in case changes I made had caused the problem, but no. I’m unsure whether I had MinGW-w64 as my GCC installation before, so I am assuming it is something about this version of GCC?
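Since a text editor hides the difference, one way to look for the extra characters is to dump the first raw bytes of the file. This is only a minimal sketch: the helper names `describe_encoding` and `peek` are made up for illustration, and the check covers only the common UTF-8 and UTF-16 byte order marks (it does not distinguish UTF-32).

```python
from pathlib import Path


def describe_encoding(raw: bytes) -> str:
    """Guess a text encoding from the leading bytes (BOM) of a file."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if raw.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if raw.startswith(b"\xfe\xff"):
        return "utf-16-be"
    # No BOM: a large share of NUL bytes in plain text still hints at UTF-16.
    if raw and raw.count(b"\x00") >= len(raw) // 3:
        return "utf-16 (no BOM, guessed)"
    return "single-byte or utf-8 (no BOM)"


def peek(path: str, n: int = 32) -> str:
    """Return the first n bytes of a file as hex, plus the guessed encoding."""
    raw = Path(path).read_bytes()[:n]
    return f"{raw.hex(' ')}  ->  {describe_encoding(raw)}"
```

Calling `print(peek("output.dat"))` (with a placeholder file name) on a UTF-16 file would show `ff fe` at the start and a `00` after every ASCII character.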
Windows used to have UTF-16 encoding for text files, so the file size was roughly twice the number of characters. From what I checked on my wife’s laptop with W11, it now seems to be UTF-8, which should make the files smaller, the opposite of what you see. I don’t know or use Windows, so I have no expertise, but maybe it is somehow configurable.
My experience is that Windows (up to Win 10) uses UTF-16 for file names (2-byte coding), but UTF-8 for text files (i.e. 1-byte coding with 7-bit ASCII text plus sign bit extension).
I have never selected a 2-byte character kind in Fortran so am not aware of either UTF-8 or UTF-16 extension to the ASCII character set I use in Windows.
I have never observed text files containing a 2-byte encoding, while I have received and reviewed a variety of files from third parties. I regularly receive files with UTF-16 file names, mainly .pdf files.
Changing the default character KIND in GCC 12.2.0 would be a significant issue and I would expect a flurry of complaints.
You’re probably right, my fault. I’ve probably confused file names with text content. I wonder when they introduced UTF-8, as the oldest versions had single-byte encodings, all those Windows-LatinN, leading to all sorts of confusion.
msz59 was not too far from the truth; it’s to do with using PowerShell (which I wasn’t initially aware I was using).
PowerShell does not use ASCII (UTF-8) but Unicode (UTF-16) when creating files; hence the doubling of the file size!
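The doubling is easy to check with a quick calculation. This is a rough sketch, assuming ASCII-only output and a 2-byte byte order mark at the start of the UTF-16 file; the helper name `encoded_sizes` is made up for illustration.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 size, UTF-16 size including a 2-byte BOM) in bytes."""
    utf8_size = len(text.encode("utf-8"))
    # Little-endian UTF-16 preceded by its BOM (FF FE).
    utf16_size = len(b"\xff\xfe" + text.encode("utf-16-le"))
    return utf8_size, utf16_size


# 100 lines of 12 ASCII characters each (newline included).
print(encoded_sizes("1.0 2.0 3.0\n" * 100))  # (1200, 2402)
```

For ASCII-only text, every character takes one byte in UTF-8 and two in UTF-16, so the file comes out at twice the size plus the two BOM bytes.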
At least I am not suffering from additional characters, etc. Unfortunately, Python appears to read the file as 8-bit text and does not recognise the change in character set. I have a fix.
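For anyone hitting the same thing, one possible way to make Python cope with either encoding is to check for a byte order mark before decoding. This is only a sketch and not necessarily the fix mentioned above; the function names `decode_bytes` and `read_text_any` are made up, and it assumes that files without a BOM are UTF-8 (or plain ASCII).

```python
import codecs


def decode_bytes(raw: bytes) -> str:
    """Decode raw file contents, using the BOM (if any) to pick the codec."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the UTF-8 BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM, picks the byte order, and strips it.
        return raw.decode("utf-16")
    # Assumption: no BOM means UTF-8 (which also covers plain ASCII).
    return raw.decode("utf-8")


def read_text_any(path: str) -> str:
    """Read a text file that may be UTF-8 or BOM-marked UTF-16."""
    with open(path, "rb") as fh:
        return decode_bytes(fh.read())
```

Because the file is read in binary mode, Windows line endings stay as `\r\n`; calling `.splitlines()` on the result handles both styles. If the file is known to be UTF-16 in advance, `open(path, encoding="utf-16")` is simpler.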