• Represent a periodic signal as a sum of sinusoids
    • Transforms a fixed number of signal samples (i.e. discrete) in the time domain to a discrete signal in the frequency domain
    • Allows for advanced filtering and modulation techniques, as well as fast large integer and polynomial multiplication
  • Essentially a matrix-vector multiplication, where the matrix is a fixed set of coefficients

  • N real values, signal represented by g[] (time domain)
  • N/2 + 1 complex values, signal represented by G[] (frequency domain)
    • G = S·g
    • S is the matrix which our discrete time signals are multiplied by
      • // Columns represent fractional rotations //

Visualisation (8 point DFT)

  • Symmetry e.g. s[i][j] = s[j][i]

Matrix Multiplication Implementation

Read more here


Baseline

Notes

  • N = 256 means that SIZE = 256 * 256 = a big value
  • This implementation is not ideal for HLS
    • cos and sin are slow!
      • Maybe we could use CORDIC 🤔

Schedule

Architecture

Optimisation

  • Floating point operations is expensive and requires many pipeline stages
  • Pipelining reduces the impacts of these high-latency operations
  • However, the loop carry dependency of temp_real and temp_imag limits the achievable II when pipelining the inner loop

Optimisation - Loop Interchange / Pipeline-interleaved Processing

e.g. swap i and j

Accumulate independent values

Optimisation - Lookups for Sin and Cos

Optimisation - Interface