Fast.ai - Mojo may be the biggest programming language advance in decades

I have no experience with Mojo so far (but it seems interesting), and am wondering…

  • Is the line `from algorithm import parallelize, vectorize` part of Mojo itself? (i.e., does it also support threading in a built-in manner?)

  • Does a variable like `var x = SIMD[float_type, simd_width](0)` hint to the compiler that SIMD can be used? (i.e., does the compiler treat such variables as candidates for SIMD instructions where possible?) A generic sketch of what I understand such a vector type to mean follows below.
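To make the second question concrete, my understanding is that this is an explicit, width-parameterised vector type rather than a hint. This is not Mojo, but roughly the same idea illustrated with Rust's nightly-only portable SIMD (the lane count of 4 is just an example):

```rust
#![feature(portable_simd)] // nightly-only; just to illustrate the concept
use std::simd::Simd;

fn main() {
    // A 4-lane f64 vector, conceptually similar to SIMD[float_type, simd_width](0):
    // one value per hardware lane, and arithmetic applies to all lanes at once.
    let zeros = Simd::<f64, 4>::splat(0.0);
    let xs = Simd::<f64, 4>::from_array([1.0, 2.0, 3.0, 4.0]);
    let ys = xs * xs + zeros; // maps to vector instructions where the target supports them
    println!("{:?}", ys.to_array());
}
```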

Other questions (not related to Mojo) are…

  • Does the machine used on the above page (MacBook Air M1, 8-core) have both so-called "P-cores" and "E-cores"? If so, are only the P-cores effective for parallelization? (The Erlang code seems to scale only by a factor below 4, so I wonder whether this reflects the number of P-cores.)

  • Is the Fortran code compiled with an option that parallelizes `do concurrent`? (I guess otherwise it runs serially, which might explain the ~4× time difference between Mojo and Fortran.)

FWIW, I have looked at the benchmark page on the following site, and it seems the Fortran code uses OpenMP for threading. Other languages also seem to use threading, but overall, the "fastest" codes tend to become less and less readable…

Yes, parallelism is built in, though the language is in flux, so I would not be surprised if the details look different in the future; it's not a standard, so the specification is not locked down.

There are many problems with easy parallelism, and Mandelbrot is one of them. Pixels can be processed independently; the only difficulty is chunking them into workloads that are worth processing in parallel. And some pixels are harder than others, e.g. pixels inside the set, which run the full iteration count.
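For reference, the per-pixel escape-time kernel looks roughly like this (a Rust sketch rather than the Mojo version from the post; the bailout radius of 2 and the iteration budget are the usual conventions). A point inside the set never escapes, so it always costs the full budget, while a point far outside bails out after a couple of iterations:

```rust
/// Escape-time iteration for a single Mandelbrot pixel at c = (cx, cy).
fn escape_time(cx: f64, cy: f64, max_iter: u32) -> u32 {
    let (mut x, mut y) = (0.0_f64, 0.0_f64);
    for i in 0..max_iter {
        // Bail out once |z| > 2: the point has escaped and is outside the set.
        if x * x + y * y > 4.0 {
            return i; // cheap pixel
        }
        let xt = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = xt;
    }
    max_iter // never escaped: an in-set pixel, the most expensive case
}

fn main() {
    println!("{}", escape_time(0.0, 0.0, 1000)); // inside the set: full 1000 iterations
    println!("{}", escape_time(2.0, 2.0, 1000)); // far outside: escapes almost immediately
}
```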

Mojo can parallelise for-loops easily; Rust, Python, and Erlang can too, not built into the language but via external libraries, e.g. the rayon crate in the case of Rust. Rust is great because it checks that there are no race conditions.
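To illustrate the rayon point, here is a rough sketch of the parallel row loop (the image size, viewport, and iteration count are made up, and escape_time is the kernel from the sketch above, repeated so this compiles on its own). The key part is into_par_iter(): rayon's work-stealing thread pool spreads the rows across cores, which also smooths out the uneven cost of in-set versus out-of-set pixels, and the borrow checker rejects any version where the rows could race:

```rust
// Cargo.toml: rayon = "1" (any recent 1.x should work)
use rayon::prelude::*;

// Same escape-time kernel as in the earlier sketch.
fn escape_time(cx: f64, cy: f64, max_iter: u32) -> u32 {
    let (mut x, mut y) = (0.0_f64, 0.0_f64);
    for i in 0..max_iter {
        if x * x + y * y > 4.0 {
            return i;
        }
        let xt = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = xt;
    }
    max_iter
}

fn main() {
    // Made-up image size and viewport; only the parallel structure matters here.
    let (width, height, max_iter) = (960usize, 540usize, 1000u32);
    let (x_min, x_max, y_min, y_max) = (-2.0, 0.6, -1.1, 1.1);

    // Each row is an independent unit of work; into_par_iter() hands whole rows
    // to rayon's thread pool, and work stealing rebalances the uneven rows.
    let rows: Vec<Vec<u32>> = (0..height)
        .into_par_iter()
        .map(|row| {
            let cy = y_min + (y_max - y_min) * row as f64 / height as f64;
            (0..width)
                .map(|col| {
                    let cx = x_min + (x_max - x_min) * col as f64 / width as f64;
                    escape_time(cx, cy, max_iter)
                })
                .collect()
        })
        .collect();

    println!("computed {} rows", rows.len());
}
```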

The Fortran Mandelbrot I made is single core. Based on my Rust experience I initially thought `do concurrent` would be enough to do the same in Fortran. It isn't, of course; it requires a little more effort to set up with MPI, which I didn't try on my laptop.

For Erlang I think you might be able to do better than 4x (with 8 cores) if you experiment with how you set it up. Erlang is very flexible when it comes to parallelisation, but not so great at numerical tasks like Mandelbrot. It is similar to Python in this regard, and really should be extended with "C nodes", for instance, to do well here.

Agree with you on readability.
