Implementing LC-3 in Fortran

I’m writing a virtual machine for the Little Computer 3, an educational computer from the textbook of Patt & Patel (2003) [1]. It’s a great way to learn about computer architecture, hence I decided to make a tutorial-like post if any of you are interested in joining along. Feel free to ask questions or brainstorm the best way to implement the virtual machine.

What is LC-3?

LC-3 is a register machine with a word size of 16-bits and a 2^16-location address space (131 KB of memory). It has eight general purpose registers named R0-R7, a program counter, and three status registers.

The full instruction set architecture (ISA) description can be found in [2]. You can find additional resources with your favorite search engine.

Memory

The “virtualized” memory in Fortran becomes:

integer(int16) :: mem(0:65535) ! memory

integer(int16) :: R(0:7) ! registers
integer :: PC            ! program counter
logical :: N, Z, P       ! status registers

Some of the decisions to make here are:

  • How to represent the memory? Should it be an array of int16 or an array of character(len=2)?
  • What about the registers? The int16 has the benefit the arithmetic operations and bit manipulation functions are immediately available. On the other hand, integer wraparound is not guaranteed, so a compiler option like gcc’s -fwrapv might be needed.
  • How to represent the status registers for conditional branching? As three logical variables (1-bit of information each) or simply as an integer mask?

(For simplicity, let’s assume we are on a two-complement machine, and the byte size is 8-bits. If you want to make it more challenging, feel free to lift this assumption.)

Originally, the program counter is supposed to be stored in (special-purpose) 16-bit register. Since Fortran only has signed integers, using int16 would complicate memory addressing, so we may as well store it in an integer type with larger range.

Instruction Set

The instruction set is very lean with only 15 instructions. There are no dedicated subtract, multiply, or divide instructions. Instead a subtraction can be performed by negating the operand and addition. Multiplication can be performed by a sequence of additions. For floating point it is necessary to write a floating-point library.

Each instruction is composed of an opcode (what the computer should do) and its operands. Some of the instructions support different addressing modes (where the data is located, registers, immediate data, or memory).

The LC3 Instruction Set [1]:

The instructions can be categorized into three groups:

  • Operate instructions (ADD, AND, NOT, LEA*); these process data
  • Data movement instructions (LD,LDR,LDI,ST,STR,STI); used to load and store data, moving it between registers and memory
  • Control instructions (BR, JMP, JSR, TRAP, RTI); used to change the sequence of operations

The (virtual) arithmethic-logic unit takes an opcode (the top 4 bits) and operands, and performs the operation. In Fortran this becomes one of the following:

  • a select case construct
  • a table of procedure pointers
  • a computed go to statement
instr = mem(PC)               ! Fetch instruction from memory
opcode = ibits(instr,12,4)    ! Extract opcode

select case(opcode)
case(ADD)
... perform addition ...
case(AND)
... perform bitwise-AND ...
case(...)
...
end select
type :: instr
  procedure(lc3_instr), pointer, nopass :: p
end type

type(instr), parameter :: ops(16) = [ instr(ADD), instr(AND), ...]

call ops(opcode)%p()
    go to (10, 20, 30, ...) opcode
 10 continue  ! ADD
    ...
 20  continue ! AND
    ...
 30 continue  ! ...
    ...

The select case construct is the easiest to start with. The other two options could be useful in an advanced setting where we implement software pipelining (a technique for implementing instruction-level parallelism) so that the execution of current opcode and decoding of the next instruction overlap.

After jumping to the right opcode branch we simply emulate the instruction using the built-in Fortran statements, operators, subroutines, and intrinsic functions. The definitions of the instructions can be found in [2]. Note that two instruction opcodes can be ignored (RTI - return from interrupt, and the reserved opcode 1101) for the start.

Fetch-execute cycle

The “heart” of the virtual machine is the instruction cycle (also known as fetch-decode-execute or simply fetch-execute cycle). After booting-up (in our case launching the virtual machine program), the computer enters the central processing cycle:

do 
   ! 1) Fetch instruction from memory
   !    ...

   ! 2) Decode operation
   !    ...

   ! 3) Execute the instruction (ALU)
   !    ...

end do

For practical reasons the steps 2 and 3 can be overlapped in the sense that we do decoding within the select case statement.

Putting it all together

The skeleton of the virtual machine is now,

program lc3
use, intrinsic :: iso_fortran_env, only: int16
implicit none

integer(int16) :: mem(0:65535) ! memory
integer(int16) :: R(0:7) ! registers
integer :: PC ! program counter
logical :: N, Z, P ! status registers

! TODO: Boot the virtual machine
!       1) Read user program into memory
!       2) Initialize program counter and other registers

do
  
!  TODO: Fetch instruction and decode
   instr = ... fetch instruction from memory ...
   opcode = ... extract opcode (upper 4 bits) ...
   PC = PC + 1 ! Adjust program counter

!  TODO: Implement instructions (and decoding)
   select case(opcode)
   case(...)
   ...
   end select

end do

end program

The program is to be invoked from the shell as:

$ lc3 path/to/object-file

where the object file contains the instructions stored in big-endian (!) format.

Besides the textbook [1], you can help yourself with the following tutorials:

Writing programs for LC-3

Assembly

In principle you could assemble programs manually, but this is very laborious. Software tools (assemblers and a C-compiler) can be found on the webpage [3] accompanying the Patt & Patel (2003) textbook. They are a somewhat outdated, but overhauled versions can be found on GitHub, for example lc3tools.

An alternative I’ve found is a Rust program called customasm which allows you to define your own assembly language. There is also an online version of it: customasm/web. The syntax of customasm is described here.

I have done my best to define the LC-3 instruction set in customasm here: LC-3 for customasm, but I don’t guarantee it is without errors. You can copy-paste it into the customasm web app, or save it in a file, which is then included in your assembly program, for example:

#include "LC-3.asm"

; Program to multiply an integer by the constant 6.
; Before execution, an integer must be stored in NUMBER
    LD  R1,SIX
    LD  R2,NUMBER
    AND R3,R3,0

; The inner loop
; 
AGAIN: 
    ADD R3,R3,R2
    ADD R1,R1,-1
    BRp AGAIN
;
    HALT
;

NUMBER:
    #res 1      ; Reserve one word for external input
SIX: 
    #d 0x0006   ; SIX = 6 

Customasm can output the instructions and data in different formats. Here is the annotated binary format,

 outp | addr | data (base 2)

  0:0 | 3000 | 00100010 00000111 ; LD  R1,SIX
  2:0 | 3001 | 00100100 00000101 ; LD  R2,NUMBER
  4:0 | 3002 | 01010110 11100000 ; AND R3,R3,0
  6:0 | 3003 |                   ; AGAIN:
  6:0 | 3003 | 00010110 11000010 ; ADD R3,R3,R2
  8:0 | 3004 | 00010010 01111111 ; ADD R1,R1,-1
  a:0 | 3005 | 00000011 11111101 ; BRp AGAIN
  c:0 | 3006 | 11110000 00100101 ; HALT
  e:0 | 3007 |                   ; NUMBER:
 10:0 | 3008 |                   ; SIX:
 10:0 | 3008 | 00000000 00000110 ; 0x0006

As a hex string this would become:

2207240556e016c2127f03fdf02500000006

(Each hex number corresponds to 4-bit binary number, so 4 hex numbers are one instruction.)

Higher-level programming languages

A C to LC-3 compiler named lcc is available as part of the student resources [3] that accompany the textbook. A few different copies of it can be found on GitHub too. I managed to get this one working: GitHub - haplesshero13/lcc-lc3: LC3 C Compiler for the Little Computer v3, but a few people commented elsewhere it can break easily.

In principle one could use f2c, to transpile a Fortran program to C and then compile it for LC-3. But I doubt this will work straight out of the box. It’s very likely that parts of f2c or lcc will need to be adapted first.

An alternative path would be to write an experimental code generation backend for LFortran (or any other compiler).


References

[1] Patt, Yale N.; Patel, Sanjay (2004). Introduction to Computing Systems: From Bits and Gates to C and Beyond. New York, NY: McGraw-Hill Higher Education. (A third edition was released in 2020, including C++)
[2] The LC-3 ISA, Excerpt from the Patt & Patel (2004) book, https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf (PDF, 349 KB)
[3] Student Resources: Introduction to Computing Systems

19 Likes

Looks like a very nice summer project.

In fact, I might give it a try myself…

1 Like

You can define memory and register to be int32 and then forget the higher bits, at the end it is just an emulator.

Or you can define your memory as:

integer(int16) :: memory(-128:127)

And then hope that your computer uses 2-complements to implement negative integer.
Nice project!

It would be nice if you try to implement the fetch decode and execute using parallel facility in Fortran, like coarrays etc.

That would be supernice!

I suppose you could do that, but it would be kind of inefficient to waste half the memory. In principle we could also pack two words into an int32 and then introduce functions for reading and writing individual words:

integer(int32) :: mem(32768)

! ...

function memread(i)
  integer, intent(in) :: i ! An address between z'0000' and z'FFFF' (inclusive)
  integer(word) :: memread ! Output of range -32768 to 32767
  integer :: j

  ! | 0 | 1 | 2 | 3 | 4 | 5 | ... | - Memory address
  ! |   1   |   2   |   3   | ... | - Fortran array index

  j = i / 2 + 1
  if (mod(i,2) == 0) then
    memread = low(mem(j))
  else 
    memread = high(mem(j))
  end if
end function

In a Fortran implementation which doesn’t support int16 it may be the only way to do it, besides using character(len=_) for storage.

I can’t really imagine what the parallel fetch-decode-execute distributed across coarray images would do? Maybe you can elaborate a bit? While it may be interesting in a meta-computational way, i.e. a virtual computer made of many images, I fear the latency of communication would make it unusable.

The type of parallelism more suitable for the virtual machine is instruction-level parallelism of the CPU. The compiler can also help here by reordering expressions when possible.