# Linear Algebra¶

## `gemm`¶

template<typename `AlphaType`, typename `OCMShapeA`, typename `OCMShapeB`, typename `OCMShapeC`, typename `BetaType`>
void `chimera::linalg::``gemm`(AlphaType alpha, OCMShapeA &ocmA, OCMShapeB &ocmB, BetaType beta, OCMShapeC &ocmC, OCMShapeC &ocmOut)

Calculate general matrix multiplication (gemm) out = alpha*A*B + beta*C https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms#Level_3.

Template Parameters
• `AlphaType`: Type for the scalar alpha

• `OCMShapeA`: Type for the matrix A (In OCM)

• `OCMShapeB`: Type for the matrix B (In OCM)

• `OCMShapeC`: Type for the matrix C (In OCM)

• `BetaType`: Type for the scalar beta

Parameters
• `alpha`: scalar alpha

• `ocmA`: matrix A

• `ocmB`: matrix B

• `beta`: scalar beta

• `ocmC`: matrix C

• `ocmOut`: output matrix

## `solve`¶

template<typename `ocmAType`, typename `ocmBType`, typename `ocmXType`>
void `chimera::linalg::``solve`(ocmAType &ocmA, ocmBType &ocmB, ocmXType &ocmX, bool useCholesky = false)

Solves multiple independent matrix equations of the form Ax=b where A is non-singular small square matrix (matrixSize <= 20)

```*  +---------------+
*  |  ocmA Tensor  |
*  +---------------+
*                      A matrices for each problem are stacked in x direction
*                      ----------->
*                   |  +------------------------------+-------------------------------+
*                   |  |                              |                               |
*                   |  |                              |                               |
*   matrix rows in  |  |<---------------------------->|<----------------------------->|
*     y direction   |  |       # problems in Roi      |       # problems in Roi       |
*                   |  |     (numCores x numCores)    |     (numCores x numCores)     |
*                   v  |                              |                               |
*                      +------------------------------+-------------------------------+
*
*                    matrix columns are in z direction (into page) with alternating
*                    real (even indicies) and imaginary (odd indicies) components.
*                    Thus, z dimension is of length 2 * matrixSize.
*
*
*  +---------------+
*  |  ocmB / ocmX  |
*  +---------------+
*                      b / x vectors for each problem are stacked in x direction
*                      ----------->
*                   |  +------------------------------+-------------------------------+
*                   |  |                              |                               |
*                   |  |                              |                               |
*  vector elements  |  |<---------------------------->|<----------------------------->|
*   in y direction  |  |       # problems in Roi      |       # problems in Roi       |
*                   |  |     (numCores x numCores)    |     (numCores x numCores)     |
*                   v  |                              |                               |
*                      +------------------------------+-------------------------------+
*
*                    real (index = 0) and imaginary (index = 1) components in z direction (into page).
*
*
*
*
*  For example, suppose we have 100 Ax=b problems to solve where each A is a 4x4 complex matrix. We need
*  to store all 100 matricies into one tensor. First let's consider the first matrix A1. Suppose A1 looks
*  like this:
*
*       ┌──────┬──────┬──────┬──────┐
*       │ 1+17i│ 2+18i│ 3+19i│ 4+20i│
*       ├──────┼──────┼──────┼──────┤
*       │ 5+21i│ 6+22i│ 7+23i│ 8+24i│
*  A1 = ├──────┼──────┼──────┼──────┤
*       │ 9+25i│10+26i│11+27i│12+28i│
*       ├──────┼──────┼──────┼──────┤
*       │13+29i│14+30i│15+31i│16+32i│
*       └──────┴──────┴──────┴──────┘
*
*  We first have to handle the real and complex parts. We transform A1 into a 4 x 8 matrix that has double
*  the number of columns. The real and imaginary parts of each original column are now their own individual
*  columns like so:
*
*      ┌────►z
*      │  Re Im Re Im Re Im Re Im
*      │ ┌──┬──┬──┬──┬──┬──┬──┬──┐
*      ▼ │ 1│17│ 2│18│ 3│19│ 4│20│
*      y ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │ 5│21│ 6│22│ 7│23│ 8│24│
*  A1 =  ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │ 9│25│11│26│12│27│13│28│
*        ├──┼──┼──┼──┼──┼──┼──┼──┤
*        │13│29│14│30│15│31│16│32│
*        └──┴──┴──┴──┴──┴──┴──┴──┘
*
*  The rows and columns of A1 will span the y and z directions of the final tensor that we will construct.
*
*  Now, we do this transformation to each matrix A1 to A100 and then stack them in the x direction. Thus,
*  the final tensor will have the dimensions (z, y, x) = (8, 4, 100).
*
*  Note: This implementation assumes input matrices are non-singular and does not check this assertion nor
*  gives notice to user. A notice or flag will be added in future.
*
*
```

Parameters
• `ocmA`: A matrices in Ax=b. Input tensor with dimensions 1 x (2[=real+imaginary]*matrixSize) x matrixSize x numProblems

• `ocmB`: b vectors in Ax=b. Input tensor with dimensions 1 x 2[=real+imaginary] x matrixSize x numProblems

• `ocmX`: x vectors in Ax=b. Output tensor with dimensions 1 x 2[=real+imaginary] x matrixSize x numProblems

• `useCholesky`: boolean value to determine which solver method to use:

• (1) false: this uses gaussian elimination to solve Ax = b. This works for general square complex matrices A.

• (2) true : this does a cholesky decomposition of A to form A = LL^H and then uses this decomposition to solve Ax=b. This only works for hermitian positive-definite matricies. Compute cycles are halved compared to gaussian elimination method when applicable.

## `matrixMulMatrix`¶

template<typename `OCMShapeA`, typename `OCMShapeB`, typename `OCMShapeC`, isOcmTensor<OCMShapeA> = 0, isOcmTensor<OCMShapeB> = 0, isOcmTensor<OCMShapeC> = 0>
void `chimera::linalg::``matrixMulMatrix`(OCMShapeA &matA, OCMShapeB &matB, OCMShapeC &matC)

Calculate matrix multiplication from matrix A and B, store the result in matrix C, C=A*B.

For matrix B, a rectangle area is loaded into the array. The width of the rectangle is numArrayCores so that each core holds data from a column.

Template Parameters
• `OCMShapeA`: Type for the matrix A (In OCM)

• `OCMShapeB`: Type for the matrix B (In OCM)

• `OCMShapeC`: Type for the matrix C (In OCM)

Parameters
• `matA`: the first matrix, A

• `matB`: the second matrix, B

• `matC`: the result matrix, C

```*
*     B                     qB[0]                qB[1]
* ┌──────────────────────┐      ┌──────────────────┐ ┌──────────────────┐
* │     │ rowsOfBToLoad  │      │  B00 B01 B02 B03 │ │ B10 B11 B12 B13  │
* │-───-┘                │      │                  │ │                  │
* │numArrayCores         │      │  B04 B05 B06 B07 │ │ B14 B15 B16 B17  │
* │                      │  load│                  │ │                  │
* │                      │ ---->│  ......          │ │ ......           │
* │                      │      │                  │ │                  │
* └──────────────────────┘      └──────────────────┘ └──────────────────┘
*                                  * A00                *A01
*
```

Load the corresponding positions in matrix A and broadcast them, multipy a row of A with columns of B, Partial sum for an element in matrix C: => C00 = B00*A00 + B10*A01 + …. can be summed by utilizing data from the same core. Partial sums are stored in qC, them synced to the OCM. After finished one row of A, broadcast different rows of A, to utilize data from the same matrix B load.

## `scalarMulMatrix`¶

template<typename `AlphaType`, typename `OCMShapeM`>
void `chimera::linalg::``scalarMulMatrix`(AlphaType alpha, OCMShapeM &mat)

Multipy matrix mat with a scalar, store the result in the original matrix, mat*=alpha.

Template Parameters
• `OCMShapeM`: Type for the matrix mat (In OCM)

Parameters
• `alpha`: the scalar

• `mat`: the matrix

## `cholesky`¶

template<typename `qLType`, std::int32_t `numRows`, std::int32_t `numCols`, std::int32_t `numComponents`>
void `chimera::linalg::``cholesky`(qVar_t<qLType> (&qL)[numCols][numComponents][numRows])

cholesky factorization of matrix A as A = LL^H. computation is done in place.

Parameters
• `qL`: input is matrix A and output is L. This is done in place.

## `solveCholesky`¶

template<typename `qLType`, typename `qBType`, typename `qXType`, std::int32_t `matrixSize`, std::int32_t `numCols`, std::int32_t `numComponents`, std::int32_t `numComponentsB`, std::int32_t `numElemsB`, std::int32_t `numComponentsX`, std::int32_t `numElemsX`>
void `chimera::linalg::``solveCholesky`(qVar_t<qLType> (&qL)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB], qVar_t<qXType> (&qX)[numComponentsX][numElemsX])

Solves Ax = b where A=LL^H.

Parameters
• `qL`: matrix L

• `qB`: vector b

• `qX`: solution to Ux=b

## `gaussianElimination`¶

template<typename `qUType`, typename `qBType`, std::int32_t `matrixSize`, std::int32_t `numCols`, std::int32_t `numComponents`, std::int32_t `numComponentsB`, std::int32_t `numElemsB`>
void `chimera::linalg::``gaussianElimination`(qVar_t<qUType> (&qU)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB])

gaussian elimination

Parameters
• `qU`: matrix U

• `qB`: vector b

## `backSubstitution`¶

template<typename `qUType`, typename `qBType`, typename `qXType`, std::int32_t `matrixSize`, std::int32_t `numCols`, std::int32_t `numComponents`, std::int32_t `numComponentsB`, std::int32_t `numElemsB`, std::int32_t `numComponentsX`, std::int32_t `numElemsX`>
void `chimera::linalg::``backSubstitution`(qVar_t<qUType> (&qU)[numCols][numComponents][matrixSize], qVar_t<qBType> (&qB)[numComponentsB][numElemsB], qVar_t<qXType> (&qX)[numComponentsX][numElemsX])

Solves multiple upper triangular matrix equations of the form Ux=b.

Parameters
• `qU`: matrix U

• `qB`: vector b

• `qX`: solution to Ux=b