Data Types

Tensors

Tensors are 4D arrays of homogeneous data. EPU hardware is designed around (and optimized for) tensor operations: efficiently flowing tensor data between OCM (on-chip memory) and the core, and performing arithmetic operations in parallel across tensor elements.

Tensor data can also flow between OCM and external DDR memory, although far less efficiently.

Instantiating Tensors

The locations of tensors are defined at compile time using the following data types for storing in DDR and OCM respectively:

DDR tensors

  DdrTensor<std::int32_t, 1, 2, 12, 15> ddrTensor;
  //          data type   Batch, Channels, Rows, Cols

Note

DDR tensors are passed in by the host as pointers, so they need to be “recreated” inside the EPU kernel to restore their shape information. The lines below, which would be placed inside of the EPU_ENTRY { }, show how. Please see Creating a Simple Kernel for a complete example.

DdrInOutShape ddrInp(ddrInpPtr);
DdrInOutShape ddrOut(ddrOutPtr);

OCM tensors

  OcmTensor<std::int32_t, 1, 2, 12, 15> ocmTensor;
  //          data type   Batch, Channels, Rows, Cols

OCM Tensor Allocation

The MemAllocator in quadric C++ uses two pointers to track the current memory allocation point and the previous memory allocation point in OCM. Memory allocation should be performed on a LIFO basis. All OCM allocation within quadric C++ is controlled by the developer.

Tensors in the OCM should be allocated before they are used and freed afterwards, as in the following example:

  1. Defining an OcmTensor type:

    typedef OcmTensor<std::int32_t, 1, 1, 8, 8> OcmInOutShape;
    
  2. Instantiating the memory allocator:

    MemAllocator ocmMem;
    

    Note

    The tensor above has already been populated. A full example of populating tensors can be found in the Creating a Simple Kernel tutorial.

  3. Allocating an instance of the tensor defined above:

      OcmInOutShape ocmInp;
      OcmInOutShape ocmOut;
      // Reserve OCM for both tensors using the MemAllocator instantiated in step 2
      ocmMem.allocate(ocmInp);
      ocmMem.allocate(ocmOut);
    
  4. Finally, freeing the tensor when done:

    ocmMem.free(ocmInp);
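
The LIFO discipline described above can be pictured with a small host-side model. This is a sketch in standard C++ under stated assumptions, not the SDK's MemAllocator: the class name ToyLifoAllocator, its methods, and the use of byte offsets are all hypothetical and chosen only for illustration. The model tracks a current allocation point and the start of each live block, and refuses to free anything other than the most recent allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a LIFO (stack) allocator; not the SDK's MemAllocator.
class ToyLifoAllocator {
public:
  explicit ToyLifoAllocator(std::size_t capacityBytes) : capacity_(capacityBytes) {}

  // Reserve `bytes` at the current allocation point and return the block's start offset.
  std::size_t allocate(std::size_t bytes) {
    assert(current_ + bytes <= capacity_ && "out of modeled memory");
    starts_.push_back(current_);
    current_ += bytes;
    return starts_.back();
  }

  // Release a block. Only the most recently allocated block may be freed (LIFO).
  bool free(std::size_t startOffset) {
    if (starts_.empty() || starts_.back() != startOffset) {
      return false;  // an out-of-order free is rejected in this model
    }
    current_ = starts_.back();  // roll the allocation point back to the previous one
    starts_.pop_back();
    return true;
  }

  std::size_t used() const { return current_; }

private:
  std::size_t capacity_;
  std::size_t current_ = 0;          // current allocation point
  std::vector<std::size_t> starts_;  // start offsets of live blocks, newest last
};
```

In this model, two 256-byte blocks (the size of a 1x1x8x8 tensor of std::int32_t) allocated in the order input, output can only be released in the order output, input; attempting to free the input block first returns false.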
    

Using Tensors

Tensors are accessed and used by moving them around and performing compute on them. We move them around using various Data Access Patterns and perform compute operations on them with the Math Library.

qVar_t

The qVar_t type is a template wrapper that maps directly to core array memory. Each qVar_t instance represents a tile of data. It provides convenient syntax for assigning values to array cores, as well as type safety mechanisms to prevent copies from core types back to scalar types. It’s a SIMD (single instruction, multiple data) approach that simplifies the code required to assign scalar values to multiple array elements in parallel.

Using the qVar_t Type

Since a variable of type qVar_t represents the entire core array, assigning a value populates every element.

You cannot directly address individual cores using indices, but you can assign unique values conditionally, through predication, to any element in the array. See Accessing Individual Cores below.

Below, see an example qVar_t:

qVar_t<std::int32_t> qI32 = 525; //declare qI32 as qVar_t type, assign value of 525
debugPrint(qI32);

This example assigns the scalar value of 525 to a core variable, resulting in that value being stored in every core.

8x8 Output:

[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
--------------------------------------------------------------------------------------------------------------------
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
--------------------------------------------------------------------------------------------------------------------
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]
[525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]  [525]  [525]  [525]  [525]   | [525]  [525]  [525]  [525]

Note

Certain compiler optimizations may result in the conversion of the above qVar_t to a scalar, in which case the output would appear as such:

Constant: 525

Default Type

When the template type is unspecified for a qVar_t, it defaults to std::int32_t, so qVar_t<> qI32 is equivalent to qVar_t<std::int32_t> qI32.

Core <-> Core

Note

The qRow<> and qCol<> variables shown below are special variables that give the row and column indices within the array. They are used here to introduce divergent data between cores.

Add the current row index to the current column index and store the result in a new variable:

qVar_t<std::int32_t> row = qRow<>;
qVar_t<std::int32_t> col = qCol<>;

qVar_t<std::int32_t> qVal = row + col;
debugPrint(qVal);

8x8 Output:

[ -8]  [ -7]  [ -6]  [ -5]   | [ -4]  [ -3]  [ -2]  [ -1]  [  0]  [  1]  [  2]  [  3]   | [  4]  [  5]  [  6]  [  7]
[ -7]  [ -6]  [ -5]  [ -4]   | [ -3]  [ -2]  [ -1]  [  0]  [  1]  [  2]  [  3]  [  4]   | [  5]  [  6]  [  7]  [  8]
[ -6]  [ -5]  [ -4]  [ -3]   | [ -2]  [ -1]  [  0]  [  1]  [  2]  [  3]  [  4]  [  5]   | [  6]  [  7]  [  8]  [  9]
[ -5]  [ -4]  [ -3]  [ -2]   | [ -1]  [  0]  [  1]  [  2]  [  3]  [  4]  [  5]  [  6]   | [  7]  [  8]  [  9]  [ 10]
------------------------------------------------------------------------------------------------
[ -4]  [ -3]  [ -2]  [ -1]   | [  0]  [  1]  [  2]  [  3]  [  4]  [  5]  [  6]  [  7]   | [  8]  [  9]  [ 10]  [ 11]
[ -3]  [ -2]  [ -1]  [  0]   | [  1]  [  2]  [  3]  [  4]  [  5]  [  6]  [  7]  [  8]   | [  9]  [ 10]  [ 11]  [ 12]
[ -2]  [ -1]  [  0]  [  1]   | [  2]  [  3]  [  4]  [  5]  [  6]  [  7]  [  8]  [  9]   | [ 10]  [ 11]  [ 12]  [ 13]
[ -1]  [  0]  [  1]  [  2]   | [  3]  [  4]  [  5]  [  6]  [  7]  [  8]  [  9]  [ 10]   | [ 11]  [ 12]  [ 13]  [ 14]
[  0]  [  1]  [  2]  [  3]   | [  4]  [  5]  [  6]  [  7]  [  8]  [  9]  [ 10]  [ 11]   | [ 12]  [ 13]  [ 14]  [ 15]
[  1]  [  2]  [  3]  [  4]   | [  5]  [  6]  [  7]  [  8]  [  9]  [ 10]  [ 11]  [ 12]   | [ 13]  [ 14]  [ 15]  [ 16]
[  2]  [  3]  [  4]  [  5]   | [  6]  [  7]  [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]   | [ 14]  [ 15]  [ 16]  [ 17]
[  3]  [  4]  [  5]  [  6]   | [  7]  [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]   | [ 15]  [ 16]  [ 17]  [ 18]
------------------------------------------------------------------------------------------------
[  4]  [  5]  [  6]  [  7]   | [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]   | [ 16]  [ 17]  [ 18]  [ 19]
[  5]  [  6]  [  7]  [  8]   | [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]   | [ 17]  [ 18]  [ 19]  [ 20]
[  6]  [  7]  [  8]  [  9]   | [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]  [ 17]   | [ 18]  [ 19]  [ 20]  [ 21]
[  7]  [  8]  [  9]  [ 10]   | [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]  [ 17]  [ 18]   | [ 19]  [ 20]  [ 21]  [ 22]

Accessing Individual Cores

From the developer’s perspective, a qVar_t type can be treated as a single variable with regard to assignment and conditional testing of special variables.

Although there is no index-level access to individual cores, predication can be used to assign a unique value to a given core based on its location in the array. Use qRow<> and qCol<> (or values derived from them) in if statements to isolate specific cores in the array, as shown here:

qVar_t<std::int32_t> predicatedAssignment = 42; // Initialize all cores

if (qVal > 7) {
  predicatedAssignment = qVal;          // qVal = qRow<> + qCol<>
}

debugPrint(predicatedAssignment);

The code within the if block will only execute on cores where the if condition is true.

8x8 Output:

[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [  8]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [  8]  [  9]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [  8]  [  9]  [ 10]
------------------------------------------------------------------------------------------------
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]   | [  8]  [  9]  [ 10]  [ 11]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [  8]   | [  9]  [ 10]  [ 11]  [ 12]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [  8]  [  9]   | [ 10]  [ 11]  [ 12]  [ 13]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [ 42]  [  8]  [  9]  [ 10]   | [ 11]  [ 12]  [ 13]  [ 14]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [ 42]  [  8]  [  9]  [ 10]  [ 11]   | [ 12]  [ 13]  [ 14]  [ 15]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [ 42]  [  8]  [  9]  [ 10]  [ 11]  [ 12]   | [ 13]  [ 14]  [ 15]  [ 16]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [ 42]  [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]   | [ 14]  [ 15]  [ 16]  [ 17]
[ 42]  [ 42]  [ 42]  [ 42]   | [ 42]  [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]   | [ 15]  [ 16]  [ 17]  [ 18]
------------------------------------------------------------------------------------------------
[ 42]  [ 42]  [ 42]  [ 42]   | [  8]  [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]   | [ 16]  [ 17]  [ 18]  [ 19]
[ 42]  [ 42]  [ 42]  [  8]   | [  9]  [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]   | [ 17]  [ 18]  [ 19]  [ 20]
[ 42]  [ 42]  [  8]  [  9]   | [ 10]  [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]  [ 17]   | [ 18]  [ 19]  [ 20]  [ 21]
[ 42]  [  8]  [  9]  [ 10]   | [ 11]  [ 12]  [ 13]  [ 14]  [ 15]  [ 16]  [ 17]  [ 18]   | [ 19]  [ 20]  [ 21]  [ 22]

More general predication can also be used to enable behavior only on cores that meet specific requirements, in the same way that scalar predication is done.

Using qVar_t variables with predication can result in parallel ifs, which have very specific behavior, as described in the Language Concepts section.
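
To make the per-core effect of the predicated assignment above concrete, here is a host-side sketch in standard C++. It is only a model: the function predicatedAssign and its parameters are hypothetical, the loop visits cores one at a time whereas the array evaluates the condition on every core at once, and the index range of -4 to 11 used in the note below is an assumption read off the printed output, not a documented property.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side model: one vector element per core, stored in row-major order.
// firstIndex/lastIndex give the inclusive range of row and column indices to model.
std::vector<std::int32_t> predicatedAssign(int firstIndex, int lastIndex, std::int32_t threshold) {
  std::vector<std::int32_t> result;
  for (int row = firstIndex; row <= lastIndex; ++row) {
    for (int col = firstIndex; col <= lastIndex; ++col) {
      const std::int32_t qVal = row + col;  // stands in for qRow<> + qCol<>
      std::int32_t value = 42;              // unconditional initialization on every core
      if (qVal > threshold) {
        value = qVal;                       // takes effect only where the predicate holds
      }
      result.push_back(value);
    }
  }
  return result;
}
```

Calling predicatedAssign(-4, 11, 7) yields 256 values; the first row stays at 42 throughout, and the last element of the second row is 8, matching the pattern in the output above.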

FixedPoint

The quadric SDK implements a FixedPoint type for use with numbers that have a fractional component. It’s essentially a leaner alternative to the standard float type, which is not supported at runtime.

SDK Support for C++ Types

The quadric C++ API supports most C++ scalar types up to 32 bits. Only float is unsupported. In place of float, the API provides the FixedPoint type, which implements a more architecturally efficient way to represent numbers with fractional components.

Using the FixedPoint Type

Understanding FixedPoint types is fundamental to using the SDK. FixedPoint is implemented as a template wrapper around integer types to represent numbers with fractional parts while minimizing the kind of runtime performance degradation associated with the overhead of float.

FixedPoint<T, N>: Where T is a signed integer type and N is the number of bits dedicated to representing the fractional portion, with N >= 0 and < bit size of underlying type. In Q number parlance, this represents the .N component (Wikipedia - Q number format).

N is of type FracRepType, which is guaranteed to be an unsigned type large enough to fit the bit count of any compatible integer type.
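
The relationship between a real number and its raw Q.N representation can be sketched in standard C++. The helpers toRawQ and fromRawQ below are hypothetical and are not SDK functions; they assume a 32-bit underlying type and round to the nearest representable step, which may differ from how the SDK converts literals.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helpers, not SDK functions: model a Q.N value stored in a 32-bit integer.

// Encode a real number as a raw Q.N integer: raw = real * 2^N, rounded to nearest.
std::int32_t toRawQ(double real, unsigned fracBits) {
  assert(fracBits < 32);
  return static_cast<std::int32_t>(std::lround(std::ldexp(real, static_cast<int>(fracBits))));
}

// Decode a raw Q.N integer back to a real number: real = raw / 2^N.
double fromRawQ(std::int32_t raw, unsigned fracBits) {
  assert(fracBits < 32);
  return std::ldexp(static_cast<double>(raw), -static_cast<int>(fracBits));
}
```

For example, toRawQ(25.0, 2) gives 100 and toRawQ(25.5, 3) gives 204, while fromRawQ(800, 5) gives 25.0. More fractional bits give finer resolution at the cost of integer range.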

Aliases for 16/32 bit types

The SDK library also provides two convenience aliases, one for 32-bit and one for 16-bit underlying types.

The aliases are defined as follows:

template <FracRepType fractionalBits>
using FixedPoint32 = FixedPoint<std::int32_t, fractionalBits>;

template <FracRepType fractionalBits>
using FixedPoint16 = FixedPoint<std::int16_t, fractionalBits>;

Note

The simulator debugPrint() functions have overloads for FixedPoint types, which will be used in the following examples. When printing a FixedPoint type with a debugPrint(), it will show the floating point representation. The .value member of the FixedPoint type allows access to the raw integer representation without the wrapper behavior.

Initialization

You can initialize FixedPoint types from integers, floating point literals (constexpr only), or other FixedPoint variables. All conversions occur automatically based on the types.

Initialize from integer

  std::int32_t    i32    = 25;
  FixedPoint32<2> fx32_2 = i32;  // automatic conversion to Q.2, underlying value: 25 << 2 = 100
  debugPrint(i32);
  debugPrint(fx32_2);
  debugPrint(fx32_2.value);

Debug Output

[ PRINT 7 ] Constant: 25
[ PRINT 8 ] Constant: 25
[ PRINT 9 ] Constant: 100

Initialize from FixedPoint

  FixedPoint32<5> fx32_5 = fx32_2;  // automatic conversion from Q.2 to Q.5
  debugPrint(fx32_5);
  debugPrint(fx32_5.value);

  // fx32_3 holds 25.5 in Q.3; it is defined in the constexpr example below
  FixedPoint32<1> fx32_1 = fx32_3;  // automatic conversion from Q.3 to Q.1
  debugPrint(fx32_1);
  debugPrint(fx32_1.value);

Debug Output

[ PRINT 14 ] IMD register r10 = 25
[ PRINT 15 ] IMD register r10 = 800
[ PRINT 17 ] IMD register r10 = 25.5
[ PRINT 18 ] IMD register r10 = 51

Initialize from floating point (constexpr only)

Even though float types are not supported at runtime, constexpr FixedPoint variables can be initialized from a floating point literal.

  constexpr FixedPoint32<3> fx32_3 = 25.5;  // automatic conversion (compile time only), from float to Q.3 (204)
  debugPrint(fx32_3);
  debugPrint(fx32_3.value);

Debug Output

[ PRINT 11 ] Constant: 25.5
[ PRINT 12 ] Constant: 204

Raw Value Access

As mentioned above, the raw integer value of a FixedPoint type can be accessed via the .value member.

The example below contrasts converting to an integer, which yields the whole number component, with reading .value, which yields the raw representation:

  std::int32_t i32_copy = fx32_5;
  std::int32_t i32_5    = fx32_5.value;

  debugPrint(i32_copy);
  debugPrint(i32_5);

Debug Output

[ PRINT 20 ] IMD register r11 = 25
[ PRINT 21 ] IMD register r10 = 800

This can be useful if you need to do raw manipulation of the data without the automatic conversions, and it is safer than a reinterpret_cast.

Binary/Dyadic Operations

The FixedPoint type supports the basic C++ arithmetic operations (+,-,*,/) as well as bit shifts and comparisons, all with automatic conversion between types. When operations occur between dissimilar types, the resulting type will be based on the type from the left side of the binary operation.

Below, see an example of binary computation:

  // fx32_3 is converted to FixedPoint32<5> format prior to operation. The resulting type is also FixedPoint32<5>
  auto fx32_5add = fx32_5 + fx32_3;

  // This subtraction converts fx32_3 to FixedPoint32<5> for the operation; the resulting value is then
  // converted from FixedPoint32<5> to FixedPoint32<3> for assignment.
  FixedPoint32<3> fx32_3sub = fx32_5 - fx32_3;

Integers are treated as FixedPoint{int size}<0> types for the purpose of arithmetic operations. When doing binary operations between FixedPoint types and integers, the FixedPoint type's resolution is used for the result instead of the integer's, regardless of operand order.
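
The format alignment described in this section can be modeled on raw integers in standard C++. The helpers convertRaw, addRaw, and subRaw below are hypothetical and not part of the SDK; they assume a 32-bit underlying type, truncate toward zero when dropping fractional bits, and do not handle overflow of the 32-bit range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an SDK function: re-express a raw Q.fromBits value in Q.toBits.
// Uses multiply/divide on a 64-bit copy so negative values are handled portably.
std::int32_t convertRaw(std::int32_t raw, unsigned fromBits, unsigned toBits) {
  assert(fromBits < 32 && toBits < 32);
  const std::int64_t wide = raw;
  if (toBits >= fromBits) {
    return static_cast<std::int32_t>(wide * (std::int64_t{1} << (toBits - fromBits)));
  }
  return static_cast<std::int32_t>(wide / (std::int64_t{1} << (fromBits - toBits)));
}

// Add: the right operand is first converted to the left operand's format,
// and the result keeps the left operand's format.
std::int32_t addRaw(std::int32_t lhs, unsigned lhsBits, std::int32_t rhs, unsigned rhsBits) {
  return lhs + convertRaw(rhs, rhsBits, lhsBits);
}

// Subtract: same alignment rule as addRaw.
std::int32_t subRaw(std::int32_t lhs, unsigned lhsBits, std::int32_t rhs, unsigned rhsBits) {
  return lhs - convertRaw(rhs, rhsBits, lhsBits);
}
```

With the values used earlier (25 in Q.5 is raw 800, and 25.5 in Q.3 is raw 204), addRaw(800, 5, 204, 3) gives raw 1616, which is 50.5 in Q.5. subRaw(800, 5, 204, 3) gives raw -16, which is -0.5 in Q.5, and convertRaw(-16, 5, 3) gives raw -4, which is still -0.5 in Q.3.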

Caveats

When working with FixedPoint types, it’s important to understand the underlying conversions implied by particular operations. Some non-obvious conversions happen automatically, and can lead to truncation of data or ambiguous order of operation if defensive coding recommendations are not observed.

FixedPoint Representation Conversions

The risk of data loss through truncation should be considered when using FixedPoint types. When converting from one representation to another, leading or trailing bits can be lost (depending on the direction of the conversion).

For multiplication and division, the FixedPoint type uses special hardware features to avoid data loss during the calculation. Addition and subtraction operations do not have this advantage: the right side operand is converted to the same format as the left operand before the calculation, so any bits that do not fit the left operand's format are lost in that conversion.
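
As a rough software analogy for why multiplication needs extra care, the sketch below multiplies two raw fixed-point values through a 64-bit intermediate before rescaling. This is not a description of the hardware mechanism, which the text does not detail; the function mulRaw is hypothetical, assumes 32-bit underlying types, and truncates toward zero when rescaling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software analogy, not the hardware mechanism.
// Multiplying a Q.a raw value by a Q.b raw value yields a Q.(a+b) product, which can
// exceed 32 bits. Keeping the product in 64 bits preserves every bit until it is
// rescaled back to the left operand's Q.a format.
std::int32_t mulRaw(std::int32_t lhs, std::int32_t rhs, unsigned rhsBits) {
  assert(rhsBits < 32);
  const std::int64_t product = static_cast<std::int64_t>(lhs) * rhs;         // Q.(a + rhsBits)
  return static_cast<std::int32_t>(product / (std::int64_t{1} << rhsBits));  // back to Q.a
}
```

For example, mulRaw(800, 204, 3) multiplies 25 in Q.5 by 25.5 in Q.3 and gives raw 20400, which is 637.5 in Q.5. Multiplying 16.0 by 16.0 in Q.16 (raw 1 << 20 each) produces an intermediate of 2^40, which would not fit in 32 bits, yet the rescaled result (raw 1 << 24, or 256.0) does.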

Downcast/Upcast Conversions

Downcasts and upcasts (from 32 to 16 bit, for example) between FixedPoint types with dissimilar fractional representations combine two operations into one: a change of width and a change of fractional format. The FixedPoint type does not currently guarantee the order of those operations.

Below, see an example of a casting caveat:

To ensure data integrity for such conversions, split the conversion into separate operations:

Note that the result of the first operation is incorrect due to shifting 16 bits on a 16 bit type, instead of shifting 16 bits on a 32 bit type and then downcasting.

Since FixedPoint types are simply template wrappers around integer types, the result in the second example does not actually incur any extra copies. It allows the compiler to enforce order of operations (shift first, then downcast) with zero runtime overhead. In cases where this kind of ambiguity may occur, it is recommended that you use temporary conversion types.
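
The ordering issue can be illustrated with plain integers in standard C++. This is a sketch of the general hazard only, not SDK code: the function names are hypothetical, and the scenario (a 32-bit Q.16 raw value converted to a 16-bit Q.0 raw value) is an assumed example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plain-integer model: convert a 32-bit Q.16 raw value to a 16-bit Q.0 raw value.

// Intended order: rescale while the value is still 32 bits wide, then narrow.
std::int16_t shiftThenNarrow(std::int32_t rawQ16) {
  const std::int32_t rescaled = rawQ16 / 65536;  // drop the 16 fractional bits in 32-bit space
  return static_cast<std::int16_t>(rescaled);
}

// Problematic order: narrow first, then rescale what is left.
std::int16_t narrowThenShift(std::int32_t rawQ16) {
  // Narrowing keeps only the low 16 bits, which here are the fractional bits.
  const std::int16_t narrowed = static_cast<std::int16_t>(rawQ16 & 0xFFFF);
  // Rescaling a 16-bit value by 2^16 removes every remaining bit.
  return static_cast<std::int16_t>(narrowed / 65536);
}
```

For 25 in Q.16 (raw 1638400), shiftThenNarrow returns 25 while narrowThenShift returns 0. Splitting the conversion into an explicit same-width format change followed by a width change, as recommended above, pins down the first ordering.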