I am finishing an initial pass right now, and I plan to release something in a couple of days.
I abstracted the interface into something akin to OpenGL. This avoids passing pointers around and keeps the pointers inside Objective-C land, which preserves the reference counts.
There are a lot of kernel types, and I don’t want to support a separate call entry for every one of them, so I am abstracting kernel generation behind a target that indicates which type of MPS kernel it is.
This way you get a generic interface driven by enums and named objects, which simplifies all the pointer handling.
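As a rough illustration of the named-object idea (not the actual implementation), here is a minimal sketch of a table that hands out unsigned names and keeps the pointers on the inside. Everything in it is hypothetical: `ObjectSlot`, `MAX_OBJECTS`, `genObject`, `lookupObject` and `deleteObject` are made-up names, and the real code would hold Objective-C objects rather than raw `void *` pointers.

```c
#include <stddef.h>

/* Hypothetical fixed-size table; name 0 is reserved to mean "invalid". */
#define MAX_OBJECTS 64

typedef struct {
    void  *ptr;    /* backing object (an Objective-C object in the real code) */
    size_t size;   /* size of the object in bytes */
    int    in_use; /* non-zero while the name is live */
} ObjectSlot;

static ObjectSlot g_slots[MAX_OBJECTS];

/* Store a pointer and return its name, or 0 if the table is full. */
unsigned genObject(void *ptr, size_t size)
{
    for (unsigned name = 1; name < MAX_OBJECTS; ++name) {
        if (!g_slots[name].in_use) {
            g_slots[name].ptr = ptr;
            g_slots[name].size = size;
            g_slots[name].in_use = 1;
            return name;
        }
    }
    return 0;
}

/* Resolve a name back to its pointer; returns NULL for unknown names. */
void *lookupObject(unsigned name)
{
    if (name == 0 || name >= MAX_OBJECTS || !g_slots[name].in_use)
        return NULL;
    return g_slots[name].ptr;
}

/* Release a name so that it can be handed out again. */
void deleteObject(unsigned name)
{
    if (name != 0 && name < MAX_OBJECTS) {
        g_slots[name].ptr = NULL;
        g_slots[name].size = 0;
        g_slots[name].in_use = 0;
    }
}
```

The caller only ever sees the unsigned name, so ownership of the underlying object never leaves the table.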
Here is a quick look at the C header for the interface. Right now I support mpsMatrixMult / mpsMatrixSum and Metal shaders. It’s not complete, as I have just switched over to kernel types.
typedef enum {
    kInvalid = 0,
    kFloat32,
    kFloat16,
    kInt64,
    kInt32,
    kInt16,
    kInt8,
    kUInt64,
    kUInt32,
    kUInt16,
    kUInt8,
    kBool,
    kComplexFloat32,
    kComplexFloat16,
    kData,
    kMaxMPSType
} DataType;
typedef enum {
    kShader = 0,
    kMatrixMult,
    kMatrixSum
} KernelTarget;
typedef enum {
    kAlpha,
    kBeta,
    kTranspose_A,
    kTranspose_B,
    kMaxOption
} MatrixOption;
// enable/disable functions; MPS kernels can take optional values
void mpsEnable(KernelTarget target, MatrixOption option);
void mpsDisable(KernelTarget target, MatrixOption option);
// set an option's value for an MPS kernel
void mpsSetOptionValuef(KernelTarget target, MatrixOption option, float value);
void mpsSetOptionValued(KernelTarget target, MatrixOption option, double value);
// gen type functions
unsigned mpsGenBuffer(size_t size, void *data);
unsigned mpsGenVector(unsigned len, void *data, DataType type);
unsigned mpsGenMatrix(unsigned rows, unsigned cols, void *data, DataType type);
// gen kernel functions
unsigned mpsGenMatrixMultKernel(unsigned rows, unsigned cols);
unsigned mpsGenMatrixSumKernel(unsigned rows, unsigned cols);
// data functions
void mpsSubBuffer(unsigned name, void *src, size_t size, size_t offset);
void mpsSubBufferVector(unsigned name, void *src, size_t len, size_t offset, DataType type);
void mpsSubBufferMat(unsigned name, void *src, unsigned row, unsigned col, unsigned width, unsigned height, DataType type);
// synchronize data from GPU to CPU after a GPU operation modifies the contents
void mpsSyncBuffer(unsigned name);
void mpsSyncVector(unsigned name);
void mpsSyncMatrix(unsigned name);
// copy Metal's copy of the data to a dst pointer, in the same format it was submitted in
void mpsGetBuffer(unsigned name, void *dst);
void mpsGetVector(unsigned name, void *dst);
void mpsGetMatrix(unsigned name, void *dst);
// command buffer flush routines
void mpsFlush(void);
void mpsFinish(void);
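To make the enable / set-value pattern in the header a bit more concrete, here is a minimal sketch of how per-target option state could be tracked, in the spirit of the OpenGL state machine. It is only a guess at one possible design, not the code behind `mpsEnable` or `mpsSetOptionValued`: `OptionState`, `kMaxTarget`, `optEnable`, `optDisable`, `optSetValue` and `optEffective` are all hypothetical names, and the fallback used when an option is disabled is an assumption.

```c
#include <stdbool.h>

/* Enums repeated from the header; kMaxTarget is a hypothetical addition
   used only to size the state table. */
typedef enum { kShader = 0, kMatrixMult, kMatrixSum, kMaxTarget } KernelTarget;
typedef enum { kAlpha, kBeta, kTranspose_A, kTranspose_B, kMaxOption } MatrixOption;

typedef struct {
    bool   enabled; /* set by optEnable, cleared by optDisable */
    double value;   /* last value stored for this option */
} OptionState;

/* One slot per (target, option) pair; zero-initialized, so all disabled. */
static OptionState g_options[kMaxTarget][kMaxOption];

void optEnable(KernelTarget target, MatrixOption option)
{
    g_options[target][option].enabled = true;
}

void optDisable(KernelTarget target, MatrixOption option)
{
    g_options[target][option].enabled = false;
}

void optSetValue(KernelTarget target, MatrixOption option, double value)
{
    g_options[target][option].value = value;
}

/* Value a kernel would use: the stored value when the option is enabled,
   otherwise the caller-supplied fallback (assumed behavior). */
double optEffective(KernelTarget target, MatrixOption option, double fallback)
{
    const OptionState *state = &g_options[target][option];
    return state->enabled ? state->value : fallback;
}
```

With this layout, a value set on a disabled option is remembered but ignored until the option is enabled, and each kernel target keeps its own independent state.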
This is a simple test of MatrixMult. Again, it’s missing the kernel target support, but it’s getting there.
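// create named matrix objects from the host data
unsigned matA = mpsGenMatrix(rows, cols, dataA, kFloat32);
unsigned matB = mpsGenMatrix(rows, cols, dataB, kFloat32);
unsigned result = mpsGenMatrix(rows, cols, dataResult, kFloat32);
// enable the alpha option and set it to 0.5
mpsEnable(kMatrixMult, kAlpha);
mpsSetOptionValued(kMatrixMult, kAlpha, 0.5);
// multiply, then sync the result from GPU to CPU
mpsMatrixMult(matA, matB, result);
mpsSyncMatrix(result);
// flush the command buffer, then copy the result back out
mpsFinish();
mpsGetMatrix(result, dataResult);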
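One way to sanity-check the output of a test like this is to compare it against a plain CPU reference. The sketch below assumes square, row-major float matrices and assumes that kAlpha scales the product the way alpha does in a standard GEMM (C = alpha * A * B); neither assumption is confirmed by the header. `refMatrixMult` and `matricesClose` are hypothetical helper names.

```c
#include <stddef.h>

/* Reference C = alpha * A * B for square n x n row-major float matrices.
   Sums are accumulated in double to reduce rounding error. */
void refMatrixMult(const float *a, const float *b, float *c,
                   size_t n, float alpha)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += (double)a[i * n + k] * (double)b[k * n + j];
            c[i * n + j] = (float)(alpha * sum);
        }
    }
}

/* Returns 1 when every element of got is within tol of want, else 0. */
int matricesClose(const float *got, const float *want,
                  size_t count, float tol)
{
    for (size_t i = 0; i < count; ++i) {
        float diff = got[i] - want[i];
        if (diff < 0.0f)
            diff = -diff;
        if (diff > tol)
            return 0;
    }
    return 1;
}
```

After mpsGetMatrix fills dataResult, running refMatrixMult on dataA and dataB with alpha = 0.5 and passing both results to matricesClose with a small tolerance would flag a mismatch between the GPU and CPU paths.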
I will check it in to GitHub for review if people are interested in expanding on the work.
Walt