R to Fortran transpiler -- quickr

quick() can accelerate any R function, with some restrictions:

  • Function arguments must have their types and shapes declared using declare().
  • Only atomic vectors, matrices, and array are currently supported: integer, double, logical, and complex.
  • The return value must be an atomic array (e.g., not a list)
  • Named variables must have consistent shapes throughout their lifetimes.
  • NA values are not supported.
  • Only a subset of R’s vocabulary is currently supported.

The author finds that such R function can be sped up 50-200 times by translating them to Fortran. Here is an example of what the transpiled code looks like. Running the R script

library(quickr)

# R function to be transpiled
slow_viterbi <- function(observations, states, initial_probs, transition_probs, emission_probs) {
    declare(
      type(observations = integer(num_steps)),
      type(states = integer(num_states)),
      type(initial_probs = double(num_states)),
      type(transition_probs = double(num_states, num_states)),
      type(emission_probs = double(num_states, num_obs)),
    )

    trellis <- matrix(0, nrow = length(states), ncol = length(observations))
    backpointer <- matrix(0L, nrow = length(states), ncol = length(observations))
    trellis[, 1] <- initial_probs * emission_probs[, observations[1]]

    for (step in 2:length(observations)) {
      for (current_state in 1:length(states)) {
        probabilities <- trellis[, step - 1] * transition_probs[, current_state]
        trellis[current_state, step] <- max(probabilities) * emission_probs[current_state, observations[step]]
        backpointer[current_state, step] <- which.max(probabilities)
      }
    }

    path <- integer(length(observations))
    path[length(observations)] <- which.max(trellis[, length(observations)])
    for (step in seq(length(observations) - 1, 1)) {
      path[step] <- backpointer[path[step + 1], step + 1]
    }

    out <- states[path]
    out
}

cat(quickr:::r2f((slow_viterbi)), "\n") # print the Fortran code

gives a Fortran subroutine

subroutine anonymous_function(observations, states, initial_probs, transition_probs, emission_probs, out, emission_probs__dim_2_, &
observations__len_, states__len_) bind(c)
  use iso_c_binding, only: c_double, c_int, c_ptrdiff_t
  implicit none

  ! manifest start
  ! sizes
  integer(c_ptrdiff_t), intent(in), value :: observations__len_
  integer(c_ptrdiff_t), intent(in), value :: states__len_
  integer(c_int), intent(in), value :: emission_probs__dim_2_

  ! args
  integer(c_int), intent(in) :: observations(observations__len_)
  integer(c_int), intent(in) :: states(states__len_)
  real(c_double), intent(in) :: initial_probs(states__len_)
  real(c_double), intent(in) :: transition_probs(states__len_, states__len_)
  real(c_double), intent(in) :: emission_probs(states__len_, emission_probs__dim_2_)
  integer(c_int), intent(out) :: out(observations__len_)

  ! locals
  integer(c_int) :: current_state
  integer(c_int) :: step
  integer(c_int) :: backpointer(states__len_, observations__len_)
  real(c_double) :: trellis(states__len_, observations__len_)
  integer(c_int) :: path(observations__len_)
  real(c_double) :: probabilities(states__len_)
  ! manifest end


  trellis = 0.0_c_double
  backpointer = 0_c_int
  trellis(:, 1_c_int) = (initial_probs * emission_probs(:, observations(1_c_int)))
  do step = 2_c_int, size(observations), sign(1, size(observations)-2_c_int)
    do current_state = 1_c_int, size(states), sign(1, size(states)-1_c_int)
      probabilities = (trellis(:, (step - 1_c_int)) * transition_probs(:, current_state))
      trellis(current_state, step) = (maxval(probabilities) * emission_probs(current_state, observations(step)))
      backpointer(current_state, step) = maxloc(probabilities, 1)
    end do
  end do
  path = 0
  path(size(observations)) = maxloc(trellis(:, size(observations)), 1)
  do step = (size(observations) - 1_c_int), 1_c_int, sign(1, 1_c_int-(size(observations) - 1_c_int))
    path(step) = backpointer(path((step + 1_c_int)), (step + 1_c_int))
  end do
  out = states(path)
end subroutine

that the author says is about 50 times faster. There is a vast amount of R code out there, with base R having much functionality, and 22,000 packages on CRAN. It would be great if a large fraction could be automatically translated to Fortran, although R functions typically do not have a declare block as the function above does. R and Fortran have some similarities, both supporting 1-based multidimensional arrays with operations on whole arrays and array sections.

Another package of the author is

This R package provides .ModernFortran(), an interface similar to .Fortran() but for Fortran 2018.

The 2018 Fortran language standard expanded support for interfacing C and Fortran. One of the additions is the introduction of C descriptors, a data structure for passing arrays between C and Fortran. This package takes advantage of that.

In contrast with .Fortran, R arrays are not passed as naked pointers, but as C descriptors that contain information about rank, shape, element size, type, and memory stride of the array that the Fortran routine can access directly. This means that additional arguments for passing the size or rank of arrays are no longer needed, which should lead to cleaner, simpler Fortran code. Additionally, logical and raw types are now supported directly.

5 Likes