# Signal Processing

## calculateHammingWeight

template<bool scale = false, typename OcmHammingWeightTensor>
INLINE void chimera::signal::calculateHammingWeight(OcmHammingWeightTensor &ocmHammingWeight)

Calculates the Hamming weights that can be used for Hamming window calculation.

Template Parameters
• scale: whether to scale the Hamming weights written to the output.

• OcmHammingWeightTensor: Type for the matrix ocmHammingWeight (In OCM)

Parameters
• ocmHammingWeight: the output OCM tensor that will store the Hamming weights.
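As a point of reference, the sketch below computes Hamming weights on the host in double precision. It assumes the common symmetric definition $w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$. This document does not specify the exact variant, the fixed-point output format, or the effect of `scale` in calculateHammingWeight. For that reason `hammingWeights` is a hypothetical helper and `scale` is not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not part of chimera::signal).
// Assumes the symmetric Hamming window:
//   w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)),  n = 0, ..., N - 1
std::vector<double> hammingWeights(std::size_t frameSize) {
    assert(frameSize > 0);
    std::vector<double> w(frameSize, 1.0);  // a 1-point window is just {1.0}
    if (frameSize == 1) {
        return w;
    }
    const double pi = std::acos(-1.0);
    const double denom = static_cast<double>(frameSize - 1);
    for (std::size_t n = 0; n < frameSize; ++n) {
        w[n] = 0.54 - 0.46 * std::cos(2.0 * pi * static_cast<double>(n) / denom);
    }
    return w;
}
```

The symmetric form makes the first and last weights equal (0.08) and, for an odd length, puts a peak of 1.0 at the centre sample.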

## delayAndSumFrame

template<typename OcmFrameShape, typename qTdoaType, typename qOutType, std::int32_t numChannels, std::int32_t dataTiles>
INLINE void chimera::signal::delayAndSumFrame(OcmFrameShape &ocmFrame, qVar_t<qTdoaType> (&qTdoa)[numChannels], qVar_t<qOutType> (&qOut)[dataTiles])

Calculates delay-and-sum for a single frame of audio.

Parameters
• ocmFrame: OCM tensor with dimensions 1 x 1 x numChannels x frameSize. It stores audio data across channels (microphones) over time.

• qTdoa: 1D qVar variable of length numChannels storing the index delay to apply to each channel. Each core holds the same value for a given index. Note: index_delay = sample_rate * time_delay.

• qOut: 1D qVar variable of length dataTiles that stores the output signal, where dataTiles = ceil(frameSize / core_array::numArrayCores).
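The behaviour described for this overload can be checked against a plain host-side reference. The sketch below is a hypothetical helper, not the chimera implementation. It works on `std::vector<double>` instead of OCM tensors and qVar_t tiles. It also assumes that a positive index delay shifts a channel later in time (to the right), and that zeros fill samples shifted in from before the frame start, as the diagrams for the two-hop overload depict.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for delay-and-sum on one frame.
// frame[c][n] is sample n of channel c; delays[c] is the integer index delay
// (sample_rate * time_delay, already rounded) applied to channel c.
// Assumption: out[n] = mean over c of frame[c][n - delays[c]],
// with out-of-range samples treated as zero.
std::vector<double> delayAndSumReference(const std::vector<std::vector<double>>& frame,
                                         const std::vector<int>& delays) {
    assert(!frame.empty() && frame.size() == delays.size());
    const std::size_t numChannels = frame.size();
    const int frameSize = static_cast<int>(frame[0].size());
    std::vector<double> out(static_cast<std::size_t>(frameSize), 0.0);
    for (std::size_t c = 0; c < numChannels; ++c) {
        assert(static_cast<int>(frame[c].size()) == frameSize);
        for (int n = 0; n < frameSize; ++n) {
            const int src = n - delays[c];  // index in the unshifted channel
            if (src >= 0 && src < frameSize) {
                out[n] += frame[c][src];
            }  // otherwise the shifted-in sample is zero
        }
    }
    for (double& v : out) {
        v /= static_cast<double>(numChannels);  // mean across channels
    }
    return out;
}
```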

template<typename OcmHopShape, typename qTdoaType, typename qOutType, std::int32_t numChannels, std::int32_t dataTiles>
INLINE void chimera::signal::delayAndSumFrame(OcmHopShape &ocmHopA, OcmHopShape &ocmHopB, qVar_t<qTdoaType> (&qTdoa)[numChannels], qVar_t<qOutType> (&qOut)[dataTiles])

Calculates delay-and-sum for a single frame of audio supplied as two consecutive hops, ocmHopA followed by ocmHopB.

To help describe the algorithm, imagine concatenating the two input OCM tensors ocmHopA and ocmHopB as follows:

          ocmHopA         ocmHopB
     ┌───────────────┬───────────────┐
ch 1 │       A       │       B       │
     ├───────────────┼───────────────┤
ch 2 │       A       │       B       │
     ├───────────────┼───────────────┤
ch 3 │       A       │       B       │
     └───────────────┴───────────────┘
          hopSize         hopSize
      ◄─────────────► ◄─────────────►

Parameters
• ocmHopA: OCM tensor with dimensions 1 x 1 x numChannels x hopSize. It stores audio data across channels (microphones) over time. hopSize = frameSize / 2.

• ocmHopB: OCM tensor with dimensions 1 x 1 x numChannels x hopSize. It stores audio data across channels (microphones) over time. hopSize = frameSize / 2.

• qTdoa: 1D qVar variable of length numChannels storing the index delay to apply to each channel. Each core holds the same value for a given index. Note: index_delay = sample_rate * time_delay.

• qOut: 1D qVar variable of length dataTiles that stores the output signal, where dataTiles = ceil(frameSize / core_array::numArrayCores).

Apply index delay to each channel:

     ┌───────────────┬───────────────┐
ch 1 │       A       │       B       │
     └───┬───────────┴───┬───────────┴───┐
ch 2     │       A       │       B       │
         └─────┬─────────┴─────┬─────────┴─────┐
ch 3           │       A       │       B       │
               └───────────────┴───────────────┘


Next, the mean is taken across channels at each index to produce an output array of shape (1, 2*hopSize). Zeros fill the regions left empty by the shift, and samples shifted past the end of the frame fall outside the output.

     │                               │
     ├───────────────┬───────────────┤
ch 1 │       A       │       B       │
     ├───┬───────────┴───┬───────────┼───┐
ch 2 │ 0 │       A       │       B   │   │
     │   └─────┬─────────┴─────┬─────┼───┴─────┐
ch 3 │    0    │       A       │     │ B       │
     │         └───┬───────────┴─────┼─────────┘
     │             │                 │
     │             │element-wise mean│
     │             ▼                 │
     ├───────────────────────────────┤
     │            output             │
     ├───────────────────────────────┤
     │                               │
     │                               │


The mean is computed per tile, as shown below, and the result is stored in the qVar_t variable qOut. Depending on its delay, each channel's data for a tile falls into one of three cases: it lies entirely in ocmHopA, entirely in ocmHopB, or spans both tensors.

     │     │     │     │     │     │     │
     ├─────┼─────┼───┬─┼─────┼─────┼─┐   │
ch 1 │     │     │   │ │     │     │ │   │
     ├───┬─┼─────┼───┴─┼─┬───┼─────┼─┴───┤
ch 2 │   │ │     │     │ │   │     │     │
     │   └─┼───┬─┼─────┼─┴───┼─┬───┼─────┼─────┐
ch 3 │     │   │ │     │     │ │   │     │     │
     │     │   └─┼─────┼─────┼─┴───┼─────┼─────┘
     │     │     │     │     │     │     │
     │     │     │     │     │     │     │
     │     │     │     │     │     │     │
     │     │     │     │     │     │     │
     ├─────┼─────┼─────┼─────┼─────┼─┐   │
     │     │     │     │     │     │ │   │
     ├─────┼─────┼─────┼─────┼─────┼─┘   │
     │     │     │     │     │     │     │
      ◄───►
      tile
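The three cases above (source data entirely in ocmHopA, entirely in ocmHopB, or straddling both) reduce to an index lookup into the virtual concatenation of the two hops. The hypothetical host-side sketch below makes that lookup explicit. It uses `std::vector<double>` in place of OCM tensors, assumes the zero-fill behaviour shown in the diagrams, and resolves the case per sample rather than per tile.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Hop = std::vector<std::vector<double>>;  // hop[c][n]: channel c, sample n

// Reads sample idx of channel c from the virtual frame [hopA | hopB]
// without materializing the concatenation. Returns 0 outside the frame.
double frameSample(const Hop& hopA, const Hop& hopB, std::size_t c, int idx) {
    const int hopSize = static_cast<int>(hopA[c].size());
    if (idx < 0 || idx >= 2 * hopSize) {
        return 0.0;                    // shifted in from outside the frame
    }
    if (idx < hopSize) {
        return hopA[c][idx];           // data lies in ocmHopA
    }
    return hopB[c][idx - hopSize];     // data lies in ocmHopB
}

// Hypothetical reference: delay-and-sum over a frame given as two hops.
// The output length is 2 * hopSize (= frameSize).
std::vector<double> delayAndSumTwoHops(const Hop& hopA, const Hop& hopB,
                                       const std::vector<int>& delays) {
    assert(!hopA.empty() && hopA.size() == hopB.size() && hopA.size() == delays.size());
    const std::size_t numChannels = hopA.size();
    const int frameSize = 2 * static_cast<int>(hopA[0].size());
    std::vector<double> out(static_cast<std::size_t>(frameSize), 0.0);
    for (std::size_t c = 0; c < numChannels; ++c) {
        for (int n = 0; n < frameSize; ++n) {
            out[n] += frameSample(hopA, hopB, c, n - delays[c]);
        }
    }
    for (double& v : out) {
        v /= static_cast<double>(numChannels);  // element-wise mean
    }
    return out;
}
```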


## fft1d

fft1d implements the $N$-point Fast Fourier Transform (FFT) algorithm for calculating the discrete Fourier transform (DFT) of a 1D input signal. There are many variants of the DFT. In this library, we define the DFT as:

$$A_k = \sum_{n=0}^{N-1}a_n\exp\left(-2\pi i \frac{nk}{N}\right)$$

where $k = 0, \dots, N - 1$ and $a_n$ is the $n$-th element of the signal $a$ of length $N$.

fft1d can also execute the inverse Fast Fourier Transform (IFFT) algorithm for calculating the inverse discrete Fourier transform (IDFT). This is enabled by setting the parameter inverse to true (available on the OCM overloads). The IDFT is defined as

$$a_n = \frac{1}{N}\sum_{k=0}^{N-1}A_k\exp\left(2\pi i \frac{nk}{N}\right)$$

where $n = 0, \dots, N - 1$.
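The sketch below is a direct $O(N^2)$ evaluation in C++ using `std::complex<double>`, useful for checking the sign and scaling conventions above. The forward transform uses $e^{-2\pi i nk/N}$ with no scaling; the inverse uses $e^{+2\pi i nk/N}$ with a $1/N$ factor. It is a hypothetical host-side helper for validating results, not part of chimera::signal.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Direct evaluation of the DFT/IDFT definitions used by fft1d.
// Forward: A_k = sum_n a_n * exp(-2*pi*i*n*k/N)
// Inverse: a_n = (1/N) * sum_k A_k * exp(+2*pi*i*n*k/N)
std::vector<cd> dftReference(const std::vector<cd>& in, bool inverse = false) {
    const std::size_t N = in.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    std::vector<cd> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        cd acc = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            // reduce n*k modulo N to keep the angle small and accurate
            const double angle = sign * 2.0 * pi * static_cast<double>((n * k) % N) / static_cast<double>(N);
            acc += in[n] * std::polar(1.0, angle);
        }
        out[k] = inverse ? acc / static_cast<double>(N) : acc;
    }
    return out;
}
```

For example, the input {1, 2, 3, 4} transforms to {10, -2 + 2i, -2, -2 - 2i}, and applying the inverse recovers the input.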

Three overloads of fft1d are documented below, each expecting different input. In summary:

1. fft1d operates on DDR input and output tensors.

2. fft1d operates on an OCM input tensor, with precomputed weights passed as a qVar_t variable.

3. fft1d operates on an OCM input tensor and computes the weights internally.

Each version is documented in detail below.

template<bool complexInput = true, typename DdrInOutSignalTensorShape, typename qvarElemType = typename DdrInOutSignalTensorShape::elemType>
INLINE void chimera::signal::fft1d(DdrInOutSignalTensorShape &ddrIn, DdrInOutSignalTensorShape &ddrOut, MemAllocator &ocmMemAlloc)

Performs a 1D N-point FFT on a given DDR signal. N must be a power of two. The N-point FFT is mapped using the Cooley–Tukey FFT algorithm, similar to the explanation at https://www.cs.cmu.edu/afs/andrew/scs/cs/15-463/2001/pub/www/notes/fourier/fourier.pdf. The diagram below shows the weight matrices for the first two stages of an 8-point FFT.

In the diagram below, Wn = e^(-2*pi*i*n/N).
+--------------------------+   +--------------------------+    +-----+
| 1   W0                   |   | 1 W0                     |    |  a0 |
|                          |   |                          |    |     |
|    1   W2                |   | 1 W4                     |    |  a4 |
|                          |   |                          |    |     |
| 1   W4                   |   |      1 W0                |    |  a2 |
|                          |   |                          |    |     |
|    1   W6                |   |      1 W4                |    |  a6 |
|                          |   |                          |    |     |
|             1   W0       |   |           1 W0           |    |  a1 |
|                          |   |                          |    |     |
|                1   W2    |   |           1 W4           |    |  a5 |
|                          |   |                          |    |     |
|             1   W4       |   |                1 W0      |    |  a3 |
|                          |   |                          |    |     |
|                1   W6    |   |                1 W4      |    |  a7 |
|                          |   |                          |    |     |
+--------------------------+   +--------------------------+    +-----+
      Weight stage 1                   Weight stage 0          input
  W0, W2, W4, W6, W0....         W0, W4, W0, W4, W0....       a0, a4, a2 ....

Each weight matrix is sparse, so the weights are stored in packed form. There are log2(inputPoints) weight stages in total, where inputPoints = len(input).

The diagram below shows how weights and inputs interact in stage 0:
+-----------+        +-----------+        +-----------+       +-----------+
|           |        |           |        |           |       |           |
|     a0    | <----> |     a4    |        |     a2    | <---> |    a6     |
| a0 + W0*a4|        | a0 + W4*a4|        | a2 + W0*a6|       | a2 + W4*a6|
|           |        |           |        |           |       |           |
+-----------+        +-----------+        +-----------+       +-----------+

The diagram below shows how weights and inputs interact in stage 1:
+-----------+        +-----------+        +-----------+       +-----------+
|           | <-------------------------> |           |       |           |
|    a0     |        |    a1     |        |    a2     |       |    a3     |
| a0 + W0*a2|        | a1 + W2*a3|        | a0 + W4*a2|       | a1 + W6*a3|
|           |        |           | <------------------------> |           |
+-----------+        +-----------+        +-----------+       +-----------+

There are log2(inputPoints) stages in total, and in stage s the neighbor data is 2^s positions away. For example, on an 8x8 array, the neighbor data lies within the same row for stages 0-2, within the same column for stages 3-5, and within the same array core for any later stage, since the data wraps around.

For the inverse FFT, the imaginary part is negated before and after the forward FFT, and both the real and imaginary parts are divided by inputPoints before output.

Reference: https://www.dsprelated.com/showarticle/800.php (method 4)


Return

void

Template Parameters
• complexInput: whether the input is complex or real

• DdrInOutSignalTensorShape: (deduced) DDR input/output tensor shape. Expected to be of the form <1, complexCount, 1, inputPoints>, where complexCount = 1 if the signal is real only or 2 if it is complex.

• qvarElemType: (defaults to DdrInOutSignalTensorShape::elemType)

Parameters
• ddrIn: input DDR tensor.

• ddrOut: output DDR tensor that receives the final result.

• ocmMemAlloc: OCM allocator used internally by the FFT to allocate OCM memory.
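The stage structure described above for this overload can be sketched on the host with an iterative radix-2 Cooley–Tukey FFT. That structure consists of bit-reversed input order, log2(N) butterfly stages whose partners are 2^s apart, and an inverse computed by negating the imaginary part before and after a forward pass and then dividing by N. This is a hypothetical double-precision `std::complex` reference, not the fixed-point core-array implementation. Its twiddle factors are computed on the fly rather than loaded from a packed weight tensor.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Reorders data into bit-reversed index order
// (a0, a4, a2, a6, a1, a5, a3, a7 for N = 8).
void bitReverse(std::vector<cd>& a) {
    const std::size_t N = a.size();
    for (std::size_t i = 1, j = 0; i < N; ++i) {
        std::size_t bit = N >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(a[i], a[j]);
        }
    }
}

// Iterative radix-2 FFT, in place; N must be a power of two.
void fftRadix2(std::vector<cd>& a, bool inverse = false) {
    const std::size_t N = a.size();
    assert(N > 0 && (N & (N - 1)) == 0);
    const double pi = std::acos(-1.0);
    if (inverse) {
        for (cd& v : a) v = std::conj(v);  // negate the imaginary part first
    }
    bitReverse(a);
    // Stage s pairs elements that are half = 2^s apart.
    for (std::size_t half = 1; half < N; half <<= 1) {
        const std::size_t len = half << 1;
        for (std::size_t start = 0; start < N; start += len) {
            for (std::size_t j = 0; j < half; ++j) {
                // twiddle W^(j * N / len), with W = exp(-2*pi*i/N)
                const cd w = std::polar(1.0, -2.0 * pi * static_cast<double>(j) / static_cast<double>(len));
                const cd u = a[start + j];
                const cd t = w * a[start + j + half];
                a[start + j] = u + t;
                // u - t equals u + W^(N/2) * t, i.e. the "a0 + W4*a4" form for N = 8
                a[start + j + half] = u - t;
            }
        }
    }
    if (inverse) {
        for (cd& v : a) v = std::conj(v) / static_cast<double>(N);  // negate again, scale by 1/N
    }
}
```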

template<bool complexInput = true, typename OcmShape, typename T, FracRepType numFracBits, typename FxType = FixedPoint<T, numFracBits>>
INLINE void chimera::signal::fft1d(OcmShape &ocmIn, qVar_t<FixedPoint<T, numFracBits>> qWeights[], qVar_t<FixedPoint<T, numFracBits>> qOutput[], bool inverse = false)

Computes a 1D FFT with precomputed weights. See the documentation of the DDR overload of fft1d for details of the core FFT algorithm.

Return

void

Template Parameters
• complexInput: whether the input is complex or real

• OcmShape: (deduced) Expected to be of the form <1, complexCount, 1, inputPoints>. ComplexCount = 1 if only real or 2 if complex

• T: (deduced) fixed point data type used by qVar_t input variables

• numFracBits: (deduced) number of fractional bits of fixed point data type used by qVar_t input variables

• FxType: defaults to FixedPoint<T, numFracBits>

Parameters
• ocmIn: input ocm data tensor storing 1D signal.

• qWeights: qVar variable storing the FFT weights. Its length is 2 * log2(frameSize) * ceil(frameSize / core_array::numArrayCores).

• qOutput: qVar variable of length dataTiles, where dataTiles = ceil(frameSize / core_array::numArrayCores), that stores the output.

• inverse: boolean value to pick forward (false) or inverse (true) fft.
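The buffer sizes stated above for qWeights and qOutput follow directly from frameSize and the number of array cores. The hypothetical helpers below only mirror that arithmetic. `numArrayCores` is passed explicitly as a stand-in for core_array::numArrayCores (for example, 64 on an 8x8 array).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers mirroring the size formulas in the parameter docs.

// dataTiles = ceil(frameSize / numArrayCores)
constexpr std::int32_t dataTilesFor(std::int32_t frameSize, std::int32_t numArrayCores) {
    return (frameSize + numArrayCores - 1) / numArrayCores;
}

// log2 of a power of two (frameSize must be a power of two for the FFT)
constexpr std::int32_t log2Exact(std::int32_t n) {
    return n > 1 ? 1 + log2Exact(n / 2) : 0;
}

// qWeights length = 2 * log2(frameSize) * ceil(frameSize / numArrayCores)
constexpr std::int32_t fftWeightsLength(std::int32_t frameSize, std::int32_t numArrayCores) {
    return 2 * log2Exact(frameSize) * dataTilesFor(frameSize, numArrayCores);
}

// Example: a 512-point frame on a 64-core array.
static_assert(dataTilesFor(512, 64) == 8, "512 / 64 = 8 tiles");
static_assert(fftWeightsLength(512, 64) == 144, "2 * 9 * 8 = 144");
```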

template<bool complexInput = true, typename OcmShape, typename T, FracRepType numFracBits, typename FxType = FixedPoint<T, numFracBits>>
INLINE void chimera::signal::fft1d(OcmShape &ocmIn, qVar_t<FixedPoint<T, numFracBits>> qOutput[], bool inverse = false)

Computes a 1D FFT, calculating the weights internally. See the documentation of the DDR overload of fft1d above for details of the core FFT algorithm.

Return

void

Template Parameters
• complexInput: whether the input is complex or real

• OcmShape: (deduced) Expected to be of the form <1, complexCount, 1, inputPoints>. ComplexCount = 1 if only real or 2 if complex

• T: (deduced) fixed point data type used by qOutput.

• numFracBits: (deduced) number of fractional bits of fixed point data type used by qOutput.

• FxType: defaults to FixedPoint<T, numFracBits>

Parameters
• ocmIn: input ocm data tensor storing 1D signal.

• qOutput: The qVar variable of length dataTiles where dataTiles = ceil(frameSize /core_array::numArrayCores) which will store the output.

• inverse: boolean value to pick forward (false) or inverse (true) fft.

## rfft1d

rfft1d implements the $N$-point Fast Fourier Transform (FFT) algorithm for calculating the discrete Fourier transform (DFT) of multiple real-valued 1D input signals. It uses the optimization whereby the FFTs of two real sequences are obtained from a single complex FFT. The math behind this two-for-one trick is as follows:

$$\begin{split}\mathbf{Input}&\; x[n], y[n] \in \mathbb{R},\; n = 0, \dots,N-1 \\ \mathbf{Output}&\; X[k] = DFT_k^N\{x\},\; Y[k] = DFT_k^N\{y\} \\ &1.\; z[n] = x[n] + jy[n] \\ &2.\; Z[k] = DFT_k^N\{z\} \\ &3.\; X[k] = \frac{Z[k]+Z^*[N-k]}{2} \\ &4.\; Y[k] = -j \frac{Z[k]-Z^*[N-k]}{2}\end{split}$$

Since $X$ and $Y$ are DFTs of real-valued sequences, $X[k] = X^*[N-k]$ (and likewise for $Y$). Both therefore need to be computed only for $k = 0, \dots, N/2$ (inclusive). Indices are taken modulo $N$, so $Z[N] = Z[0]$ when $k = 0$. The optimization is applied to input channels in pairs; if the number of channels is odd, the remaining channel does not use it.
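The two-for-one steps can be checked numerically with the hypothetical sketch below. It packs two real sequences into $z[n] = x[n] + j\,y[n]$ and takes one DFT, using a direct $O(N^2)$ DFT for brevity where rfft1d would use an FFT. It then separates $X$ and $Y$ for $k = 0, \dots, N/2$ using steps 3 and 4, taking the index $N-k$ modulo $N$.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Direct DFT with the forward convention exp(-2*pi*i*n*k/N).
static std::vector<cd> dft(const std::vector<cd>& in) {
    const std::size_t N = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            out[k] += in[n] * std::polar(1.0, -2.0 * pi * static_cast<double>((n * k) % N) / static_cast<double>(N));
        }
    }
    return out;
}

// Two real sequences, one complex DFT. Returns (X, Y) for k = 0..N/2.
std::pair<std::vector<cd>, std::vector<cd>> twoRealDfts(const std::vector<double>& x,
                                                        const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const std::size_t N = x.size();
    std::vector<cd> z(N);
    for (std::size_t n = 0; n < N; ++n) {
        z[n] = cd(x[n], y[n]);                 // step 1: z = x + j*y
    }
    const std::vector<cd> Z = dft(z);          // step 2
    const cd j(0.0, 1.0);
    std::vector<cd> X(N / 2 + 1), Y(N / 2 + 1);
    for (std::size_t k = 0; k <= N / 2; ++k) {
        const cd zc = std::conj(Z[(N - k) % N]);  // Z^*[N-k], with Z[N] == Z[0]
        X[k] = (Z[k] + zc) / 2.0;                 // step 3
        Y[k] = -j * (Z[k] - zc) / 2.0;            // step 4
    }
    return {X, Y};
}
```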

template<typename OcmInputShape, typename OcmFftShape, typename T, FracRepType numFracBits, typename FxType = FixedPoint<T, numFracBits>>
INLINE void chimera::signal::rfft1d(MemAllocator &ocmMem, OcmInputShape &ocmIn, qVar_t<FixedPoint<T, numFracBits>> qWeights[], OcmFftShape &ocmOut)

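Computes the real FFT of every input channel, processing channels in pairs as described above. For each pair, after the complex FFT of $z[n] = x[n] + jy[n]$:

1. $X[k] = \frac{Z[k]+Z^*[N-k]}{2}$

2. $Y[k] = -j \frac{Z[k]-Z^*[N-k]}{2}$

X and Y are only calculated for points $0, \dots, N/2$ because the spectrum of a purely real sequence is conjugate-symmetric.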

Return

void

Template Parameters
• OcmInputShape: (deduced) Expected to be of the form <1, 1, numChannels, inputPoints>

• OcmFftShape: (deduced) Expected to be of the form <1, 2, numChannels, numRfftFrequencies>

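• T: (deduced) fixed point data type used by qWeights.

• numFracBits: (deduced) number of fractional bits of the fixed point data type used by qWeights.

• FxType: defaults to FixedPoint<T, numFracBits>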

Parameters
• ocmMem: OCM memory allocator used internally for RAU.

• ocmIn: input OCM tensor storing the 1D signal of each channel.

• qWeights: qVar variable storing the FFT weights. Its length is 2 * log2(frameSize) * ceil(frameSize / core_array::numArrayCores).

• ocmOut: the real-FFT output for all channels.

## fft2d

fft2d implements a 2D version of fft1d. The 2D DFT is defined as:

$$A_{kl} = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1}a_{mn}\exp\left(-2\pi i \left(\frac{mk}{M} + \frac{nl}{N}\right)\right)$$

where $k = 0, \dots, M - 1$, $l = 0, \dots, N - 1$, and $a_{mn}$ is the $(m,n)$ element of the signal $a$ with shape $(M, N)$.

fft2d does not currently implement the inverse transform.

template<typename DdrInOutSignalTensorShape, typename DdrWeightsTensorShape, typename qvarElemType = typename DdrWeightsTensorShape::elemType>
INLINE void chimera::signal::fft2d(DdrInOutSignalTensorShape &ddrIn, DdrWeightsTensorShape &ddrWeights, DdrInOutSignalTensorShape &ddrOut, MemAllocator &ocmMemAlloc)

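Performs a 2D N-point FFT on a given 2D DDR signal. A 1D FFT is applied to each row, and each output is stored at its bit-reversed row location. The result is then transposed and a second 1D row FFT is applied, which is in effect a column FFT of the original data, so the final result is transposed.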

      1D FFT of each row                  Store the output at                   Transpose                            1D FFT of each row; in effect a
                                          bit-reversed row location                                                  1D column FFT, since the data was transposed
+------------------------------+          +-----------------------------+       +----------------------------+       +----------------------------+
+------------------------------+          |                             |       |                            |       +----------------------------+
|    a0  a1  a2  a3 ...        +-----+ +->+    cc0  cc1  cc2  cc3  ...  |       |  cc0  bb0 ee0  aa0         |       |  cc0  bb0 ee0  aa0         |
+------------------------------+     | |  |                             |       |                            |       +----------------------------+
|    b0  b1  b2  b3 ...        +--------->+    bb0  bb1  bb2  bb3  ...  |       |  cc1  bb1 ee1  aa1         |       |  cc1  bb1 ee1  aa1         |
+------------------------------+     | |  |                             | +---> |                     ...    | +---> +----------------------------+   +--->  Result (transposed)
|    c0  c1  c2  c3 ...        +-------+  |    ee0  ee1  ee2  ee3  ...  |       |  cc2  bb2 ee2  aa2         |       |  cc2  bb2 ee2  aa2         |
+------------------------------+     |    |                             |       |                            |       +----------------------------+
|    d0  d1  d2  d3 ...        +--+  +--->+    aa0  aa1  aa2  aa3  ...  |       |  cc3  bb3 ee3  aa3         |       |  cc3  bb3 ee3  aa3         |
+------------------------------+  |       |                             |       |                            |       +----------------------------+
|                              |  |       |            ...              |       |  ...  ...  ...             |       |  ...  ...  ...             |
|                              |  +------>+    dd0  dd1  dd2  dd3  ...  |       |                            |       |                            |
|                              |          |                             |       |                            |       |                            |
+------------------------------+          +-----------------------------+       +----------------------------+       +----------------------------+

Return

void

Template Parameters
• DdrInOutSignalTensorShape: (deduced) DDR input/output tensor shape. Expected to be of the form <1, complexCount, inputPoints, inputPoints>, where complexCount = 1 if the signal is real only or 2 if it is complex.

• DdrWeightsTensorShape: (deduced) DDR weight tensor shape. Expected to be of the form <1, complexCount, log2(inputPoints), inputPoints>, where complexCount = 1 if the weights are real only or 2 if they are complex.

• qvarElemType: (defaults to DdrWeightsTensorShape::elemType)

Parameters
• ddrIn: input DDR tensor.

• ddrWeights: DDR tensor holding the FFT weights.

• ddrOut: output DDR tensor that receives the final result.

• ocmMemAlloc: OCM allocator used internally by the FFT to allocate OCM memory.
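The row-column scheme in the fft2d diagram relies on the 2D DFT being separable. Applying a 1D DFT to every row and then to every column gives the same result as the double sum in the definition. The hypothetical sketch below checks this against a direct 2D DFT. It mirrors the row FFT, transpose, row FFT sequence (so the output is transposed, matching the "Result (transposed)" label) but omits the bit-reversed row placement, a data-layout detail of the device implementation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
using Matrix = std::vector<std::vector<cd>>;

// Direct 1D DFT with the forward convention exp(-2*pi*i*n*k/N).
static std::vector<cd> dft1(const std::vector<cd>& in) {
    const std::size_t N = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(N);
    for (std::size_t k = 0; k < N; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            out[k] += in[n] * std::polar(1.0, -2.0 * pi * static_cast<double>((n * k) % N) / static_cast<double>(N));
        }
    }
    return out;
}

// Direct evaluation of the 2D DFT definition (double sum over m and n).
Matrix dft2Direct(const Matrix& a) {
    const std::size_t M = a.size(), N = a[0].size();
    const double pi = std::acos(-1.0);
    Matrix out(M, std::vector<cd>(N));
    for (std::size_t k = 0; k < M; ++k) {
        for (std::size_t l = 0; l < N; ++l) {
            for (std::size_t m = 0; m < M; ++m) {
                for (std::size_t n = 0; n < N; ++n) {
                    const double phase = static_cast<double>((m * k) % M) / static_cast<double>(M) +
                                         static_cast<double>((n * l) % N) / static_cast<double>(N);
                    out[k][l] += a[m][n] * std::polar(1.0, -2.0 * pi * phase);
                }
            }
        }
    }
    return out;
}

// Row-column method: 1D DFT of each row, transpose, 1D DFT of each row again.
// The output is transposed: out[l][k] == A_kl.
Matrix dft2RowColumnTransposed(const Matrix& a) {
    const std::size_t M = a.size(), N = a[0].size();
    Matrix t(N, std::vector<cd>(M));
    for (std::size_t m = 0; m < M; ++m) {
        const std::vector<cd> row = dft1(a[m]);
        for (std::size_t n = 0; n < N; ++n) {
            t[n][m] = row[n];  // transpose while storing
        }
    }
    for (std::size_t n = 0; n < N; ++n) {
        t[n] = dft1(t[n]);  // row DFT of the transposed data == column DFT
    }
    return t;
}
```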

## hamming

template<typename OcmFrameTensor, typename OcmHammingWeightTensor>
INLINE void chimera::signal::hamming(OcmFrameTensor &ocmFrameData, OcmHammingWeightTensor &ocmHammingWeight)

Applies a Hamming window over a multi-channel signal using precomputed Hamming weights.

Template Parameters
• OcmFrameTensor: Type for the matrix ocmFrameData (In OCM)

• OcmHammingWeightTensor: Type for the matrix ocmHammingWeight (In OCM)

Parameters
• ocmFrameData: the input frame from all channels.

• ocmHammingWeight: the OCM tensor holding the precomputed Hamming weights, e.g. as produced by calculateHammingWeight.

template<typename OcmFrameTensor>
INLINE void chimera::signal::hamming(OcmFrameTensor &ocmFrameData)

Applies a Hamming window over a multi-channel signal.

Template Parameters
• OcmFrameTensor: Type for the matrix ocmFrameData (In OCM)

Parameters
• ocmFrameData: the input frame from all channels.

template<typename OcmHammingWeightTensor, typename dataType, FracRepType numFracBits, std::int32_t dataTiles>
INLINE void chimera::signal::hamming(qVar_t<FixedPoint<dataType, numFracBits>> (&qData)[dataTiles], OcmHammingWeightTensor &ocmHammingWeight)

Applies a Hamming window over a single-channel signal using precomputed Hamming weights.

Template Parameters
• OcmHammingWeightTensor: Type for the matrix ocmHammingWeight (In OCM)

Parameters
• qData: 1D qVar variable of length dataTiles, where dataTiles = ceil(frameSize / core_array::numArrayCores).

• ocmHammingWeight: the OCM tensor holding the precomputed Hamming weights, e.g. as produced by calculateHammingWeight.
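Applying a window is an element-wise multiply of each channel by the same weight vector. The hypothetical host-side sketch below shows what the multi-channel overloads compute. It uses `std::vector<double>` in place of OCM tensors and assumes the window is applied in place, since the overloads take the frame by non-const reference and have no separate output parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: multiplies every channel frame[c][n] by weights[n], in place.
void applyWindow(std::vector<std::vector<double>>& frame, const std::vector<double>& weights) {
    for (auto& channel : frame) {
        assert(channel.size() == weights.size());
        for (std::size_t n = 0; n < channel.size(); ++n) {
            channel[n] *= weights[n];
        }
    }
}
```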

## mvdrFrame

template<typename OcmFrameTensor, typename OcmCovarsTensor, typename OcmTdoaTileTensor, typename dataType, FracRepType numFracBits, std::int32_t dataTiles>
INLINE void chimera::signal::mvdrFrame(OcmTdoaTileTensor &ocmTdoa, std::int32_t frameCount, MemAllocator &ocmMem, OcmFrameTensor &ocmData, OcmCovarsTensor &ocmCovars, qVar_t<FixedPoint<dataType, numFracBits>> (&qOutFrame)[dataTiles])

Performs one beamforming step of the MVDR (minimum variance distortionless response) beamformer on a single frame, including the steering vector calculation.

Template Parameters
• OcmFrameTensor: Type for the matrix ocmData (In OCM)

• OcmCovarsTensor: Type for the matrix ocmCovars (In OCM)

• OcmTdoaTileTensor: Type for the matrix ocmTdoa (In OCM)

Parameters
• ocmTdoa: The time delay of arrival for each channel, expressed as an index delay.

• frameCount: The 1-indexed ID of the frame being processed. Required for the covariance calculation.

• ocmMem: OCM allocator.

• ocmData: The input for all channels. The Hamming window is assumed to have been applied already.

• ocmCovars: Carries the covariance tensor between frames, since the covariance is accumulated over time.

• qOutFrame: The qVar variable of length dataTiles where dataTiles = ceil(frameSize /core_array::numArrayCores) which will store the output.
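For orientation, the textbook MVDR weights for one frequency bin are $\mathbf{w} = \frac{\mathbf{R}^{-1}\mathbf{d}}{\mathbf{d}^H\mathbf{R}^{-1}\mathbf{d}}$, where $\mathbf{R}$ is the spatial covariance and $\mathbf{d}$ the steering vector. The hypothetical sketch below shows that formula together with a running-mean covariance update driven by a 1-indexed frame counter. The running mean is one plausible reading of why frameCount is needed. This document does not describe the actual accumulation rule, steering-vector construction, or fixed-point details of mvdrFrame.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;
using CMatrix = std::vector<std::vector<cd>>;

// Hypothetical running-mean covariance update for one frequency bin.
// frameCount is 1-indexed: after frame t, R = ((t - 1) * R + x x^H) / t.
void updateCovariance(CMatrix& R, const std::vector<cd>& x, int frameCount) {
    assert(frameCount >= 1 && R.size() == x.size());
    const double t = static_cast<double>(frameCount);
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t k = 0; k < x.size(); ++k) {
            R[i][k] = ((t - 1.0) * R[i][k] + x[i] * std::conj(x[k])) / t;
        }
    }
}

// Solves R v = d by Gaussian elimination with partial pivoting (R and d are copies).
std::vector<cd> solveLinear(CMatrix R, std::vector<cd> d) {
    const std::size_t n = d.size();
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r) {
            if (std::abs(R[r][col]) > std::abs(R[pivot][col])) {
                pivot = r;
            }
        }
        if (pivot != col) {
            std::swap(R[col], R[pivot]);
            std::swap(d[col], d[pivot]);
        }
        for (std::size_t r = col + 1; r < n; ++r) {
            const cd f = R[r][col] / R[col][col];
            for (std::size_t c = col; c < n; ++c) {
                R[r][c] -= f * R[col][c];
            }
            d[r] -= f * d[col];
        }
    }
    std::vector<cd> v(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        cd acc = d[i];
        for (std::size_t c = i + 1; c < n; ++c) {
            acc -= R[i][c] * v[c];
        }
        v[i] = acc / R[i][i];
    }
    return v;
}

// Textbook MVDR weights: w = R^{-1} d / (d^H R^{-1} d).
std::vector<cd> mvdrWeights(const CMatrix& R, const std::vector<cd>& d) {
    std::vector<cd> v = solveLinear(R, d);
    cd denom = 0.0;
    for (std::size_t i = 0; i < d.size(); ++i) {
        denom += std::conj(d[i]) * v[i];
    }
    for (cd& vi : v) {
        vi /= denom;
    }
    return v;
}
```

By construction the weights satisfy the distortionless constraint $\mathbf{w}^H\mathbf{d} = 1$, which makes a convenient sanity check.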