ChatGPT for Programming Numerical Methods

A follow-up paper is Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

Overall, Copilot generates codes that are more reliable but less optimized, whereas codes generated by Llama-2 are less reliable but more optimized when correct.

The Llama-2 inference code was originally written in C, but there is a Fortran version.

(I am not sure where to post this, so just picking an old thread — please let me know if a different thread is better for posting.)

I tried asking Google's Gemini the following question:

How to effectively use GPU in a platform-agnostic way with Fortran

The reply was as follows. I expected it would also mention OpenMP, but it did not. (Perhaps the training data for these models is not recent enough to reflect the latest developments; other prompts or services might also give different replies.)

Full reply from Gemini

Effective GPU Usage in Fortran: A Platform-Agnostic Approach

Understanding GPU Programming and Fortran

While Fortran is primarily a sequential language, its capabilities can be extended to leverage the parallel processing power of GPUs. To achieve this, we typically rely on libraries or extensions that provide abstractions for GPU programming. These abstractions help to isolate the platform-specific details, making the code more portable.

Popular Libraries and Extensions

  1. OpenACC: A directive-based approach that adds extensions to Fortran to offload computations to GPUs. It provides a high-level interface, making it relatively easy to parallelize code.
  2. OpenCL: A more general-purpose API for heterogeneous computing, including GPUs. It offers fine-grained control over device programming but requires more complex code.
  3. CUDA Fortran: A Fortran-specific extension developed by NVIDIA for their CUDA platform. It provides a similar interface to CUDA C and offers high performance on NVIDIA GPUs.

Key Considerations for Effective GPU Usage

  1. Data Transfer: Minimize data transfers between CPU and GPU, as these can be performance bottlenecks. Ensure that data is efficiently copied to the GPU before computations and back to the CPU when needed.
  2. Kernel Optimization: Write efficient GPU kernels that maximize the utilization of GPU resources. Consider factors like thread block size, grid size, and memory access patterns.
  3. Memory Management: Manage GPU memory carefully to avoid memory leaks and ensure optimal performance. Use techniques like memory pooling and explicit deallocation.
  4. Error Handling: Implement robust error handling to catch GPU-related errors and prevent unexpected behavior.
  5. Profiling: Use profiling tools to identify performance bottlenecks and optimize your code accordingly.

Example: Using OpenACC

use openacc
implicit none

real, dimension(:, :) :: a, b, c
integer :: i, j, n

! Allocate data on the GPU
call acc_init_device()
call acc_malloc(a, n*n)
call acc_malloc(b, n*n)
call acc_malloc(c, n*n)

! Copy data to the GPU
call acc_copy(a, a_device, n*n)
call acc_copy(b, b_device, n*n)

! Offload the computation to the GPU
!$acc parallel loop present(a, b, c)
do i = 1, n
  do j = 1, n
    c(i, j) = a(i, j) + b(i, j)
  enddo
enddo
!$acc end parallel loop

! Copy the result back to the CPU
call acc_copy(c_device, c, n*n)

! Deallocate GPU memory
call acc_free(a_device)
call acc_free(b_device)
call acc_free(c_device)
call acc_exit_device()

Choosing the Right Library

The best choice of library depends on your specific requirements, such as the level of control needed, the supported platforms, and the complexity of your code. For many applications, OpenACC offers a good balance between ease of use and performance. However, if you require fine-grained control or need to target specific platforms, OpenCL or CUDA Fortran may be more suitable.

By following these guidelines and carefully selecting the appropriate library, you can effectively leverage the power of GPUs to accelerate your Fortran applications.


It seems that the example code has some bugs/issues (e.g., a_device etc. are not declared, and possibly they are what should be allocated with acc_malloc()? The order of the loop indices i, j also seems bad for efficiency, since Fortran is column-major and the inner loop should run over the first index…)

I’d expect any code generated by LLMs to be very fragile at best, and seriously wrong at worst, especially in the context of numerical methods. The reasons were discussed in this thread:

Specifically, this post by AI scholar Gary Marcus illustrates the fundamental limitations of stochastic prediction as an approach to knowledge.

That, in a nutshell, is why we should never trust pure LLMs; even under carefully controlled circumstances with massive amounts of directly relevant data, they still never really get even the most basic linear functions.

3 Likes

It’s a funny coincidence, but I managed to fool myself today by using ChatGPT in the context of interpolation and quadrature on a triangular mesh. I was trying to integrate a function defined over a scattered set of nodes where both the values and derivatives at the nodes are known. First I constructed a Delaunay triangulation of the points. Next I sought a quadrature rule for a general triangle. Finally I was missing the interpolation, so I gave ChatGPT the following prompt:

What is the formula for the barycentric interpolant using node values and first derivatives at the corners?

After an “elaborate” derivation process it first served me this formula:

Naive as I was, :man_facepalming:, I copied this into my program, and wrote a unit test to integrate the function f(x,y) = x y over the domain [0,1]^2 using a second-order triangle quadrature rule (meaning it should be exact for this specific polynomial). When the test didn’t pass, I checked painstakingly for coding mistakes, and even switched to using Kahan summation, thinking it might be round-off error from the sum.
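The exactness check described above can be sketched in Python. This is only an illustration, not the original test: the edge-midpoint rule is one standard second-order triangle quadrature, and splitting the unit square into two triangles stands in for the Delaunay mesh.

```python
def tri_area(p, q, r):
    # Unsigned triangle area via the cross product
    return abs((q[0]-p[0])*(r[1]-p[1]) - (r[0]-p[0])*(q[1]-p[1])) / 2.0

def quad_deg2(f, p, q, r):
    # Second-order rule: evaluate f at the three edge midpoints,
    # each with weight area/3; exact for polynomials up to degree 2
    mids = [((p[0]+q[0])/2, (p[1]+q[1])/2),
            ((q[0]+r[0])/2, (q[1]+r[1])/2),
            ((r[0]+p[0])/2, (r[1]+p[1])/2)]
    return tri_area(p, q, r) / 3.0 * sum(f(x, y) for x, y in mids)

f = lambda x, y: x * y          # exact integral over [0,1]^2 is 1/4
tris = [((0, 0), (1, 0), (1, 1)),   # unit square as two triangles
        ((0, 0), (1, 1), (0, 1))]
total = sum(quad_deg2(f, *t) for t in tris)
print(total)  # 0.25, since the rule is exact for this quadratic
```

With a correct interpolant in place of the exact f, this is the kind of unit test that would have exposed the formula immediately.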

Any FEM practitioner will notice the interpolant formula above is only linear w.r.t. the barycentric coordinates. It is just using “perturbed” function values. It is not consistent at all, and in general it does not appear possible to define a consistent FEM basis using just the values and derivatives at the vertices (the cubic triangular elements need 10 coefficients, but only 9 are available).

For fun, I just asked GPT the same prompt in the same session, and now it returned a different, and still erroneous formula.

So this case falls under “seriously wrong”. It was a stark reminder of Richard Feynman’s warning: “you must not fool yourself, and you are the easiest person to fool”.

5 Likes

There are many books available that give a much better FEM Fortran framework.
The problem with ChatGPT is that it essentially merges the results of a literature search, without guaranteeing correctness or consistency.

I have been writing a (the same!) FEM program for 50 years. Many of the components are specialties in themselves, such as generating a model from a 3D modelling system, choosing appropriate elements, and handling time-varying response or varying geometry.
I don’t think the structure of a FEM program is too complex, but the detail of many of these specialty areas is often not in the public domain, i.e., not available to ChatGPT.

My FEM program does the following:
  1. Generate a mesh model of nodes and elements
  2. Allocate equations suitable for the equation solver
  3. Generate the set of equations
  4. Reduce the equations
  5. Generate loading conditions
  6. Solve for displacement conditions
  7. Repeat some of the above steps
  8. Process and view the results

For some components that are less relevant, workable solutions are much easier.

Perhaps one day ChatGPT may be able to encompass all this, but one of the key parts of a FEM solution is knowing and managing errors of modelling approximations, an area where ChatGPT fails badly.

1 Like

I hope that LLMs can at least show the underlying references they draw on as training data, so that one could consult those references for more details. It would also be very useful if LLMs could somehow indicate the reliability of each reply (e.g., from “not very certain” to “very confident”), much as statistical regression gives a degree of uncertainty together with the estimated “answer” itself. But estimating such uncertainty for this kind of complicated question might be a very difficult problem in itself…

Perplexity is an LLM-based service that cites references, and my colleagues and I often use it. Regarding uncertainty: since ChatGPT is non-deterministic, one way to get an upper bound on reliability is to ask the same question repeatedly; one cannot trust a ChatGPT answer if it does not agree with itself. I ran such an experiment in June 2023. Asked to write a Fortran program to estimate Euler’s number using a Taylor series, out of 10 tries it got the correct 2.718 four times, got 2.586 twice, and produced a non-compiling program four times. At the time it was not trustworthy for that task.

1 Like

Thanks very much for the info about Perplexity! I have asked it the same question, and it seems to give more info (e.g., it also mentions OpenMP) with “Sources” and “Related” links. Although one can use a search engine like Google for getting related links, I feel it is useful to be able to read such a “summary” of the most related pages.

Re the uncertainty of the reply, I think some questions have tons of underlying data and some not much, so there could be some means for estimating the uncertainty. But, I guess it may not be very straightforward if the algorithms used are very complicated…

I also think math will be very hard for this kind of “AI” (which is, in a sense, an elaborate interpolation of available text data?). An interesting question is then how humans understand the logic of math and apply it to new problems. (To learn more, I will read the above article by Gary Marcus.)

The readers of this thread may find the following newly released preprint by Italian physicist Sauro Succi of interest:

1 Like

A paper about this work has now been published:

Large language model evaluation for high-performance computing software development.
by William F. Godoy, Pedro Valero-Lara, Keita Teranishi, Prasanna Balaprakash, and Jeffrey S. Vetter
Concurrency and Computation: Practice and Experience
04 September 2024

We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding language keywords, while Julia prompts perform acceptably well for its Threads and CUDA.jl programming models.

It can be requested at ResearchGate.

How did you ask for the second, third, … attempt? My impression is that ChatGPT will never give the same (or very similar) answer if it “thinks” the user considers the previous one wrong or not good enough. Well, if you point it to the bad code, it may correct it.

Also, non-compilable code happens to humans too, at least to me. I would expect some future version (5.0?) to try to compile the code by itself and correct language errors before returning the answer, not to mention checking the result in tasks like yours; ChatGPT surely knows the true value of e.

Attempts should be separated in newly created threads to avoid contamination of the context. I’m assuming that @Beliavsky did exactly that.

Agreed on non-compilable code. Trying to see whether the model can “one-shot” the problem is a valid benchmark, but I’m not sure if that’s a practical goal, rather than an academically-minded one. I have regularly wrestled with LLMs to produce valid code; sometimes it doesn’t work out at all.

On the paper @Beliavsky shared. Seeing their results generated by use of GPT3.0 fills me with dread. How did this even get published in Sept’24? Actually, I understand “how”. “Why” would be a better question.

LLMs have seen progress by leaps and bounds, especially compared to GPT3.0, and especially in coding. GPT3.5 was the breakthrough, and it still couldn’t hold a candle to GPT4.0. GPT4.0 Turbo was somewhat better once again, and then Anthropic’s Claude 3.0 Opus came along to tie. We are now at a point where Claude 3.5 Sonnet and OpenAI’s o1-mini are (debatably) the SOTA models for code generation. The difference with GPT3.0 is… astounding.

Case in point: the o1-mini attempt at the Euler experiment compiles and prints `Euler's number (approximated): 2.7182818284590455` in both cases. Claude 3.5 Sonnet’s attempt (2.6 KB attachment) was pretty good too, compiling to:

Euler's number (e) ≈         2.718281828459046
Number of terms used:    18
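For reference, the Taylor-series approach these models implemented (e = Σ 1/n!) can be sketched in Python; the exact term count depends on the stopping tolerance, so it need not match the Fortran runs:

```python
import math

def euler_taylor(tol=1e-16):
    # e = sum over n >= 0 of 1/n!; accumulate terms until they
    # fall below the tolerance
    total, term, n = 1.0, 1.0, 0
    while term >= tol:
        n += 1
        term /= n          # term is now 1/n!
        total += term
    return total, n

e_est, terms = euler_taylor()
print(e_est)   # agrees with math.e to machine precision
```

The factorial never needs to be formed explicitly; each term is obtained from the previous one by a single division, which avoids integer overflow, a bug the non-compiling attempts might well have contained.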
3 Likes

Peter Norvig compared the responses of various LLMs to the prompt

Write a Python program to list all the ways in which three distinct positive integers have a product of 108.

I found that ChatGPT-4o is good at doing this in Fortran. It implements the simplest algorithm, finds a faster one when asked, and correctly uses 64-bit integers so that the program can factor larger integers when asked.
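The “simplest algorithm” for Norvig’s prompt can be sketched in Python; this is just my illustration of the brute-force approach, not ChatGPT-4o’s Fortran version:

```python
from itertools import combinations

def distinct_triples(product):
    # All a < b < c with a*b*c == product, brute force over the
    # sorted divisors of product
    divs = [d for d in range(1, product + 1) if product % d == 0]
    return [(a, b, c) for a, b, c in combinations(divs, 3)
            if a * b * c == product]

triples = distinct_triples(108)
for t in triples:
    print(t)
print(len(triples))  # 8 distinct triples for 108
```

A faster variant, the kind ChatGPT-4o reportedly produced when asked, would enumerate only divisor pairs of product/a for each divisor a.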

1 Like

ChatGPT o1-preview “thinks” up to a minute when prompted and can solve more difficult programming problems. If there is a Fortran programming task you would like to try it on, please post here or message me, and I will put the resulting code on GitHub and link to it here.

I have recently read that one and it was interesting:

Riva, Giuseppe, Brenda Wiederhold, and Sauro Succi. “Zero Sales Resistance: The Dark Side of Big Data and Artificial Intelligence.” Cyberpsychology, Behavior, and Social Networking 25 (March 1, 2022): 169–73. https://doi.org/10.1089/cyber.2022.0035.

But I must admit that I am not sure how to translate the ZSR concept they cite throughout into my mother tongue exactly. I guess I should read it as “Zero (Sales Resistance)” rather than “(Zero Sales) Resistance”. So does it refer to the fact that we sometimes don’t resist marketing? And should I understand “sales” in a general sense (selling something) or as “seasonal sales”?

Reading this interesting AI-related thread made me think about a task that could be solved only by reading a good amount of background knowledge about the topic, and that could not be solved just by looking up lots of stored results that had been judged as correct/acceptable in the past.

Please write for me a working program in <…Fortran, Python, etc…> to read three integers a, b, c and test whether or not these integers form a Pythagorean triplet. If they do, calculate and output the value of the smallest angle in the triangle in degrees.
A good review article on the topic is available; see also the article by Weisstein.

Here is a simple solution in Fortran:

pyth.f90 (1021 Bytes)
Here are two data files containing the triplets listed in the Wikipedia article, for use as input to the above program.
d100.txt (169 Bytes)
D300.TXT (468 Bytes)
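Since the attached pyth.f90 is not shown inline, here is a rough sketch of the task in Python (the function name is my own invention, and input handling is omitted):

```python
import math

def smallest_angle_if_pythagorean(a, b, c):
    # Returns the smallest angle in degrees if (a, b, c) is a
    # Pythagorean triplet (in any order), otherwise None
    x, y, z = sorted((a, b, c))
    if x > 0 and x * x + y * y == z * z:
        # The smallest angle is opposite the shortest side,
        # so sin(angle) = shortest side / hypotenuse
        return math.degrees(math.asin(x / z))
    return None

print(smallest_angle_if_pythagorean(3, 4, 5))   # about 36.87 degrees
print(smallest_angle_if_pythagorean(3, 4, 6))   # None
```

Note that the triplet test itself uses only integer arithmetic, so it is exact; floating point enters only in the final angle computation.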