Thanks for your encouraging comments @FortranFan. It’s not too bad. I wrote the token-to-string decoder here:
It can probably still be simplified (it even does simplified UTF-8 decoding!). The encoder will be harder. Essentially we need to translate this little Python file: https://github.com/certik/fastGPT/blob/01eb84b015d89a567245da0445c0abb7d53a8500/encode_input.py, which contains a regex, but I am hoping we can hand-code it. We’ll have to write lots of tests to ensure we didn’t make a mistake, but it shouldn’t be hard. I was focusing on performance first.
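
For anyone following along, here is a rough Python sketch of what a token-to-string decoder has to do, assuming the usual GPT-2 byte-level BPE scheme (each vocabulary entry is a string of stand-in characters, one per byte). It is not the Fortran code; the names `BYTE_DECODER`, `decode` and `toy_vocab` are made up for illustration, and the four-entry vocabulary is a toy, not the real one.

```python
def bytes_to_unicode():
    """Map each byte value 0..255 to a printable stand-in character
    (assumed here to follow the GPT-2 byte-level BPE convention)."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # All remaining bytes (space, control characters, ...) are shifted to 256+n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode(tokens, vocab):
    """Turn token ids into text: look up vocab strings, map every
    stand-in character back to its byte, then decode the bytes as UTF-8."""
    text = "".join(vocab[t] for t in tokens)
    raw = bytes(BYTE_DECODER[ch] for ch in text)
    return raw.decode("utf-8", errors="replace")


# Hypothetical toy vocabulary: "Ġ" stands for a space, "Ã©" for the two bytes of "é".
toy_vocab = {0: "Hello", 1: "Ġworld", 2: "!", 3: "Ã©"}
print(decode([0, 1, 2], toy_vocab))  # prints "Hello world!"
```

The only non-obvious part is the last step: a multi-byte character can be split across several stand-in characters (or even across tokens), so the bytes have to be collected first and decoded as UTF-8 afterwards.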