Implications of Performance
There is no good rule to pick the optimal target clock frequency.
For this course: start with a clock period of 10ns
HLS will attempt to optimise both the clock-cycle count and the clock speed.
Operational Chaining
The high-level synthesiser will look at the operations in a function and attempt to reduce the number of clock cycles required, possibly at the cost of a lower clock frequency.
As we lengthen the clock period (i.e. slow the clock frequency), we can chain more operations together and pack more functionality into each cycle.
Code Hoisting
A code optimisation that moves computation shared by several paths (e.g. both branches of a conditional) out of those paths, removing redundant logic from uncommon code paths.
Loop Fission
Split a loop into multiple loops. Each resulting loop can then be treated and optimised independently (and the loops can run in parallel).
Loop Unrolling
By default, HLS synthesises loops sequentially, creating a datapath that executes once per iteration of the loop. Unrolling replicates the loop body [within the same loop] and splits the work across the copies.
We can insert the directive #pragma HLS unroll factor=2 to automate this.
If we don't specify a factor argument, the loop will be unrolled completely. This maximises the hardware resource usage (and can take a long time to synthesise). For complete unrolling, the bounds of the loop need to be statically defined (i.e. known at compile time).
If the unroll factor does not evenly divide the loop bound (or the bound is variable), HLS must insert an exit check so the extra copies of the body are not executed on the final trip.
Loop Pipelining
Overlapping of executions (where possible).
We can use the directive #pragma HLS pipeline II=2, which will attempt to achieve an II of 2. If we don't specify the II argument, HLS will attempt to minimise the II (targeting II=1).
Loop Performance Metrics
- Iteration latency - number of cycles it takes to perform one iteration of the loop body
- Loop latency - number of cycles to complete the entire execution of the loop, plus one cycle to determine that the loop is finished / for a writeback
- Vivado HLS reports the loop latency prior to the writeback
- Initiation interval (II) - number of cycles before the next iteration of the loop can start
- A higher II value can potentially increase the maximum operating frequency (fMAX) without a decrease in throughput
Bit-width Optimisation
See here
Loop Interchange / Pipeline-interleaved Processing
Swapping the order of nested loop variables to reduce repeated lookups and improve memory access patterns.
See here
Function Pipelining
When pipelining a function, all loops contained in the function are unrolled; this is a requirement for pipelining.
Pipelining loops gives you an easy way to control resources, with the option of partially unrolling the design to meet performance.
False Dependencies
For operations on block RAM (which is dual-ported), we can read at one address while writing at another in the same cycle, provided the two addresses x0 and x1 are independent; if the tool cannot prove this, it must alternate between reads and writes to complete the operation.
What if they are not actually independent? For instance, we might know that the source of data never produces two consecutive pieces of data that actually have the same bin. What do we do now? If we could give this extra information to the HLS tool, then it would be able to read at location x1
while writing at location x0
because it could guarantee that they are different addresses. In Vivado® HLS, this is done using the dependence directive.
To overcome this deficiency, you can use the DEPENDENCE directive to provide Vivado HLS with additional information about the dependencies.
Inter: Specifies that the dependency is between different iterations of the same loop.
If this is specified as FALSE, it allows Vivado HLS to perform operations in parallel if the loop is pipelined, unrolled, or partially unrolled, and prevents such concurrent operation when specified as TRUE.
Intra: Specifies a dependence within the same iteration of a loop, for example an array being accessed at the start and at the end of the same iteration.
When intra dependencies are specified as FALSE, Vivado HLS may move operations freely within the loop, increasing their mobility and potentially improving performance or area. When the dependency is specified as TRUE, the operations must be performed in the order specified.