Read Identifying Performance Bottlenecks


Note about Pipelining

Tutorial page 120

Even though we pipelined the inner loop of the nest, the write operation could not be flattened into the inner loop.
We need to pipeline the outer loop instead and unroll the inner loop.

  • This will increase area usage, since unrolling replicates the inner-loop logic
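
A minimal sketch of the idea, not the tutorial's actual code: the function name row_sums and the ROWS/COLS sizes are made up for illustration. The write to out[r] sits outside the inner loop, so the nest is imperfect and cannot be flattened; the pipeline pragma therefore goes on the outer loop and the inner loop is unrolled.

```cpp
#include <cstdint>

// Hypothetical sizes, chosen only for illustration.
constexpr int ROWS = 8;
constexpr int COLS = 8;

// Sums each row of `in` and writes one result per row to `out`.
// The write to out[r] happens after the inner loop finishes, which is
// what makes this an imperfect loop nest.
void row_sums(const int16_t in[ROWS][COLS], int32_t out[ROWS]) {
    // Outer loop: pipelined, so a new row can start each interval.
    for (int r = 0; r < ROWS; ++r) {
#pragma HLS PIPELINE II=1
        int32_t acc = 0;
        // Inner loop: fully unrolled into parallel adders (costs area).
        for (int c = 0; c < COLS; ++c) {
#pragma HLS UNROLL
            acc += in[r][c];
        }
        out[r] = acc;  // the write that could not be flattened into the inner loop
    }
}
```

A regular C++ compiler ignores the HLS pragmas, so the function can still be checked in a plain testbench. Whether the tool actually reaches the requested interval depends on the design; for example, the input array may need partitioning so that all reads for one row can happen in the same cycle.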

Final Results

We now have an interval of under 125 cycles, though the design uses 8x more FFs and 2x more LUTs.