There is a new arXiv preprint
LLM & HPC: Benchmarking DeepSeek’s Performance in High-Performance Computing Tasks
by Noujoud Nader, Patrick Diehl, Steve Brandt, and Hartmut Kaiser
Our results demonstrate that while DeepSeek generates functional code for HPC tasks, it lags behind GPT-4 in terms of scalability and execution efficiency of the generated code.
One important observation is that LLMs struggle to write scalable codes. In the first two examples, both the C++ and Python codes showed consistent scaling as the number of cores increased, while the Fortran and Julia implementations showed limited scalability. This is consistent with prior research on HPC applications, which highlights the challenges of parallelism in certain programming languages. In the third example, both C++ and Fortran scaled, but the Julia code could not be executed on RISC-V CPUs because Julia does not support that architecture. The STREAM triad clearly improved in performance with increasing array sizes for all the generated codes.
They survey LLM HPC research and study performance on the following prompts:
Conjugate Gradient Solver
Write a language code to solve the linear equation system using the conjugate gradient solver and validate it.

Parallel 1D Stencil-Based Heat Equation Solver
Write a parallel language code to solve the one-dimensional heat equation using a finite difference scheme for the discretization in space and the Euler method for time integration, validate it and plot the solution.

Parallel Matrix Multiplication
Write a parallel language code for matrix multiplication and validate it.

Double-Precision General Matrix Multiplication
Write a parallel language code to perform DGEMM on large matrices, optimize the implementation for performance using parallel computing techniques, validate the results, and compare the performance with different matrix sizes and parallelization strategies.

STREAM
Write a parallel language code to perform the STREAM triad operation on large arrays.
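The paper does not reproduce the generated codes, so as an illustration of what the last prompt asks for (my sketch, not taken from the paper), the STREAM triad is just a(i) = b(i) + q*c(i) swept over large arrays, here with OpenMP and assuming compilation with something like gfortran -O3 -fopenmp:

program stream_triad
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 10000000      ! "large" array size; adjust to taste
   real(kind=dp), parameter :: q = 3.0_dp  ! triad scalar
   real(kind=dp), allocatable :: a(:), b(:), c(:)
   integer :: i
   allocate(a(n), b(n), c(n))
   b = 1.0_dp
   c = 2.0_dp
   !$omp parallel do
   do i = 1, n
      a(i) = b(i) + q * c(i)   ! the triad kernel: load two arrays, scale, add, store
   end do
   !$omp end parallel do
   print *, 'a(1) = ', a(1), ' a(n) = ', a(n)
end program stream_triad

A full STREAM benchmark would also time repeated sweeps and report bandwidth, which is presumably what the paper measures as performance versus array size.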
For a more explicit version of the first prompt,
Write a Fortran subroutine in a module to solve a system of linear equations using the conjugate gradient solver and validate it. The equations are M * x = b, where M is
4 -1 0
-1 4 -1
0 -1 4
and b is [1 2 3]. The L2 norm of residuals should be <= 1.0e-6.
Declare real variables as real(kind=dp) with the module constant dp defined as
integer, parameter :: dp = kind(1.0d0)
Print the solution x and verify that it is correct by printing M * x.
my Groq-Fortran-agent, using the model llama-3.3-70b-versatile, gave the code
module conjugate_gradient_module
use iso_fortran_env, only: error_unit
implicit none
private
public :: conjugate_gradient_solver, dp
integer, parameter :: dp = kind(1.0d0)
contains
subroutine conjugate_gradient_solver(M, b, x, tol)
real(kind=dp), intent(in) :: M(:, :)
real(kind=dp), intent(in) :: b(:)
real(kind=dp), intent(out) :: x(:)
real(kind=dp), intent(in) :: tol
real(kind=dp) :: r(size(b)), p(size(b)), Ap(size(b))
real(kind=dp) :: alpha, beta, rnorm
integer :: n, iter
n = size(b)
x = 0.0_dp
r = b - matmul(M, x)
p = r
iter = 0
do
Ap = matmul(M, p)
alpha = dot_product(r, r) / dot_product(p, Ap)
x = x + alpha * p
r = r - alpha * Ap
rnorm = sqrt(dot_product(r, r))
if (rnorm <= tol) exit
beta = dot_product(r, r) / dot_product(r - alpha * Ap, r - alpha * Ap)
p = r + beta * p
iter = iter + 1
end do
print *, 'Solution x = ', x
print *, 'M * x = ', matmul(M, x)
end subroutine conjugate_gradient_solver
end module conjugate_gradient_module
program main
use conjugate_gradient_module, only: conjugate_gradient_solver, dp
implicit none
real(kind=dp) :: M(3, 3), b(3), x(3)
real(kind=dp) :: tol
M = reshape([4.0_dp, -1.0_dp, 0.0_dp, -1.0_dp, 4.0_dp, -1.0_dp, 0.0_dp, -1.0_dp, 4.0_dp], [3, 3])
b = [1.0_dp, 2.0_dp, 3.0_dp]
tol = 1.0e-6_dp
call conjugate_gradient_solver(M, b, x, tol)
end program main
in 3 tries (8 seconds), which gives output
Solution x = 0.46428571746412267 0.85714278307017844 0.96428568742458887
M * x = 1.0000000867863124 1.9999997273920023 2.9999999666281769
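As a sanity check (my addition, not from the paper or the agent), the exact solution of this small system is x = [13/28, 24/28, 27/28] ≈ [0.46429, 0.85714, 0.96429], which agrees with the printed values. A standalone verification:

program check_exact
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(kind=dp) :: M(3, 3), x(3)
   M = reshape([4.0_dp, -1.0_dp, 0.0_dp, -1.0_dp, 4.0_dp, -1.0_dp, 0.0_dp, -1.0_dp, 4.0_dp], [3, 3])
   x = [13.0_dp, 24.0_dp, 27.0_dp] / 28.0_dp  ! exact solution from Gaussian elimination by hand
   print *, 'exact x = ', x
   print *, 'M * x   = ', matmul(M, x)        ! should print 1 2 3
end program check_exact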
I put the instruction “use the Fortran intrinsic matmul function” in the prompt, because otherwise the LLM defines its own matmul. I need to think about what compiler options to use for the “parallel language code” prompts.
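My guess, not something the paper spells out, is that the “parallel language code” prompts will mostly elicit OpenMP directives in Fortran, in which case the code must be built with something like gfortran -O3 -fopenmp or the directives are ignored and the program runs serially; coarray or do concurrent versions would need different options. A minimal sketch of the kind of code I would expect for the matrix multiplication prompt, validated against the intrinsic matmul:

program par_mm_demo
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 512
   real(kind=dp), allocatable :: a(:, :), b(:, :), c(:, :)
   integer :: i, j, k
   allocate(a(n, n), b(n, n), c(n, n))
   call random_number(a)
   call random_number(b)
   c = 0.0_dp
   ! Parallel over columns of c; each thread owns distinct j, so there are no races.
   !$omp parallel do private(i, k)
   do j = 1, n
      do k = 1, n
         do i = 1, n                          ! i innermost for unit-stride access
            c(i, j) = c(i, j) + a(i, k) * b(k, j)
         end do
      end do
   end do
   !$omp end parallel do
   print *, 'max abs difference vs matmul = ', maxval(abs(c - matmul(a, b)))
end program par_mm_demo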
Another recent preprint by some of the same authors is LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages, which compares LLM-generated codes using
- C++ GCC 12
- FORTRAN GCC 12
- Python 3.12
- Matlab R2024a
- CUDA 12.4.1
- OpenMPI 4.1.5
- mpi4py 4.0.1
- R 2024a