Linear Algebra

gemm

template<typename AlphaType, typename OCMShapeA, typename OCMShapeB, typename OCMShapeC, typename BetaType>
void chimera::linalg::gemm(AlphaType alpha, OCMShapeA &ocmA, OCMShapeB &ocmB, BetaType beta, OCMShapeC &ocmC, OCMShapeC &ocmOut)

Calculate general matrix multiplication (gemm) out = alpha*A*B + beta*C https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3.

Template Parameters
  • AlphaType: Type for the scalar alpha

  • OCMShapeA: Type for the matrix A (In OCM)

  • OCMShapeB: Type for the matrix B (In OCM)

  • OCMShapeC: Type for the matrix C (In OCM)

  • BetaType: Type for the scalar beta

Parameters
  • alpha: scalar alpha

  • ocmA: matrix A

  • ocmB: matrix B

  • beta: scalar beta

  • ocmC: matrix C

  • ocmOut: output matrix

solve

template<typename ocmAType, typename ocmBType, typename ocmXType>
void chimera::linalg::solve(ocmAType &ocmA, ocmBType &ocmB, ocmXType &ocmX, bool useCholesky = false)

Solves multiple independent matrix equations of the form Ax=b where A is non-singular small square matrix (matrixSize <= 20)

*  +---------------+
*  |  ocmA Tensor  |
*  +---------------+
*                      A matrices for each problem are stacked in x direction
*                      ----------->
*                   |  +------------------------------+-------------------------------+
*                   |  |                              |                               |
*                   |  |                              |                               |
*   matrix rows in  |  |<---------------------------->|<----------------------------->|
*     y direction   |  |       # problems in Roi      |       # problems in Roi       |
*                   |  |     (numCores x numCores)    |     (numCores x numCores)     |
*                   v  |                              |                               |
*                      +------------------------------+-------------------------------+
*
*                    matrix columns are in z direction (into page) with alternating
*                    real (even indicies) and imaginary (odd indicies) components.
*                    Thus, z dimension is of length 2 * matrixSize.
*
*
*  +---------------+
*  |  ocmB / ocmX  |
*  +---------------+
*                      b / x vectors for each problem are stacked in x direction
*                      ----------->
*                   |  +------------------------------+-------------------------------+
*                   |  |                              |                               |
*                   |  |                              |                               |
*  vector elements  |  |<---------------------------->|<----------------------------->|
*   in y direction  |  |       # problems in Roi      |       # problems in Roi       |
*                   |  |     (numCores x numCores)    |     (numCores x numCores)     |
*                   v  |                              |                               |
*                      +------------------------------+-------------------------------+
*
*                    real (index = 0) and imaginary (index = 1) components in z direction (into page).
*
*
*
*
*  For example, suppose we have 100 Ax=b problems to solve where each A is a 4x4 complex matrix. We need
*  to store all 100 matricies into one tensor. First let's consider the first matrix A1. Suppose A1 looks
*  like this:
*
*       ┌──────┬──────┬──────┬──────┐
*       │ 1+17i│ 2+18i│ 3+19i│ 4+20i│
*       ├──────┼──────┼──────┼──────┤
*       │ 5+21i│ 6+22i│ 7+23i│ 8+24i│
*  A1 = ├──────┼──────┼──────┼──────┤
*       │ 9+25i│10+26i│11+27i│12+28i│
*       ├──────┼──────┼──────┼──────┤
*       │13+29i│14+30i│15+31i│16+32i│
*       └──────┴──────┴──────┴──────┘
*
*  We first have to handle the real and complex parts. We transform A1 into a 4 x 8 matrix that has double
*  the number of columns. The real and imaginary parts of each original column are now their own individual
*  columns like so:
*
*      ┌────►z
*      │  Re Im Re Im Re Im Re Im
*      │ ┌──┬──┬──┬──┬──┬──┬──┬──┐
*      ▼ │ 1│17│ 2│18│ 3│19│ 4│20│
*      y ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │ 5│21│ 6│22│ 7│23│ 8│24│
*  A1 =  ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │ 9│25│11│26│12│27│13│28│
*        ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │13│29│14│30│15│31│16│32│
*        └──┴──┴──┴──┴──┴──┴──┴──┘
*
*  The rows and columns of A1 will span the y and z directions of the final tensor that we will construct.
*
*  Now, we do this transformation to each matrix A1 to A100 and then stack them in the x direction. Thus,
*  the final tensor will have the dimensions (z, y, x) = (8, 4, 100).
*
*  Note: This implementation assumes input matrices are non-singular and does not check this assertion nor
*  gives notice to user. A notice or flag will be added in future.
*
*

Parameters
  • ocmA: A matrices in Ax=b. Input tensor with dimensions 1 x (2[=real+imaginary]*matrixSize) x matrixSize x numProblems

  • ocmB: b vectors in Ax=b. Input tensor with dimensions 1 x 2[=real+imaginary] x matrixSize x numProblems

  • ocmX: x vectors in Ax=b. Output tensor with dimensions 1 x 2[=real+imaginary] x matrixSize x numProblems

  • useCholesky: boolean value to determine which solver method to use:

    • (1) false: this uses gaussian elimination to solve Ax = b. This works for general square complex matrices A.

    • (2) true : this does a cholesky decomposition of A to form A = LL^H and then uses this decomposition to solve Ax=b. This only works for hermitian positive-definite matricies. Compute cycles are halved compared to gaussian elimination method when applicable.

matrixMulMatrix

template<typename OCMShapeA, typename OCMShapeB, typename OCMShapeC, isOcmTensor<OCMShapeA> = 0, isOcmTensor<OCMShapeB> = 0, isOcmTensor<OCMShapeC> = 0>
void chimera::linalg::matrixMulMatrix(OCMShapeA &matA, OCMShapeB &matB, OCMShapeC &matC)

Calculate matrix multiplication from matrix A and B, store the result in matrix C, C=A*B.

For matrix B, a rectangle area is loaded into the array. The width of the rectangle is numArrayCores so that each core holds data from a column.

Template Parameters
  • OCMShapeA: Type for the matrix A (In OCM)

  • OCMShapeB: Type for the matrix B (In OCM)

  • OCMShapeC: Type for the matrix C (In OCM)

Parameters
  • matA: the first matrix, A

  • matB: the second matrix, B

  • matC: the result matrix, C

*
*     B                     qB[0]                qB[1]
* ┌──────────────────────┐      ┌──────────────────┐ ┌──────────────────┐
* │     │ rowsOfBToLoad  │      │  B00 B01 B02 B03 │ │ B10 B11 B12 B13  │
* │-───-┘                │      │                  │ │                  │
* │numArrayCores         │      │  B04 B05 B06 B07 │ │ B14 B15 B16 B17  │
* │                      │  load│                  │ │                  │
* │                      │ ---->│  ......          │ │ ......           │
* │                      │      │                  │ │                  │
* └──────────────────────┘      └──────────────────┘ └──────────────────┘
*                                  * A00                *A01
*

Load the corresponding positions in matrix A and broadcast them, multipy a row of A with columns of B, Partial sum for an element in matrix C: => C00 = B00*A00 + B10*A01 + …. can be summed by utilizing data from the same core. Partial sums are stored in qC, them synced to the OCM. After finished one row of A, broadcast different rows of A, to utilize data from the same matrix B load.

scalarMulMatrix

template<typename AlphaType, typename OCMShapeM>
void chimera::linalg::scalarMulMatrix(AlphaType alpha, OCMShapeM &mat)

Multipy matrix mat with a scalar, store the result in the original matrix, mat*=alpha.

Template Parameters
  • OCMShapeM: Type for the matrix mat (In OCM)

Parameters
  • alpha: the scalar

  • mat: the matrix

cholesky

template<typename qLType, std::int32_t numRows, std::int32_t numCols, std::int32_t numComponents>
void chimera::linalg::cholesky(qVar_t<qLType> (&qL)[numCols][numComponents][numRows])

cholesky factorization of matrix A as A = LL^H. computation is done in place.

Parameters
  • qL: input is matrix A and output is L. This is done in place.

solveCholesky

template<typename qLType, typename qBType, typename qXType, std::int32_t matrixSize, std::int32_t numCols, std::int32_t numComponents, std::int32_t numComponentsB, std::int32_t numElemsB, std::int32_t numComponentsX, std::int32_t numElemsX>
void chimera::linalg::solveCholesky(qVar_t<qLType> (&qL)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB], qVar_t<qXType> (&qX)[numComponentsX][numElemsX])

Solves Ax = b where A=LL^H.

Parameters
  • qL: matrix L

  • qB: vector b

  • qX: solution to Ux=b

gaussianElimination

template<typename qUType, typename qBType, std::int32_t matrixSize, std::int32_t numCols, std::int32_t numComponents, std::int32_t numComponentsB, std::int32_t numElemsB>
void chimera::linalg::gaussianElimination(qVar_t<qUType> (&qU)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB])

gaussian elimination

Parameters
  • qU: matrix U

  • qB: vector b

backSubstitution

template<typename qUType, typename qBType, typename qXType, std::int32_t matrixSize, std::int32_t numCols, std::int32_t numComponents, std::int32_t numComponentsB, std::int32_t numElemsB, std::int32_t numComponentsX, std::int32_t numElemsX>
void chimera::linalg::backSubstitution(qVar_t<qUType> (&qU)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB], qVar_t<qXType> (&qX)[numComponentsX][numElemsX])

Solves multiple upper triangular matrix equations of the form Ux=b.

Parameters
  • qU: matrix U

  • qB: vector b

  • qX: solution to Ux=b