Rationale

Large data - more computations to sort - slow.

FPGAs have potential for tailored design and parallelism


Hardware Basics of Sorting


Types of Sorts

Insertion sort

Time Complexity: O(n^2)

Issue - the architecture would not allow fast and large implementations with many comparators because the input signal has to be broadcasted to all comparator instances

not good for large sets

Bucket Sort

Time Complexity: O(n^2)

Divide and conquer, based on a pivot point

Merge Sort

Time Complexity: O(n log(n))

FIFO-based Merge Sort

We can cascade several sorters

Each cascade will double the required memory resources

Merge Sort Trees

Backpressure flow control in the network


Sorting Networks

FPGA designs can obtain a time complexity of O(log n * log n)

Common architectures include the bitonic merge sort network, and even-odd networks.
For large sets... still an issue..


Optimised Merge-Sort Micro-Architecture

  • Bitonic Sorter stage 1 - Optimal sorting of 4 elements
  • Bitonic Half Cleaner - Switching of 2 elements by value
  • Bitonic Sorter stage 2 - Optimal sorting of 4 elements

n-tuple sorting

Imagine we have data of arbitrary size n

An n-tuple has N= floor(bandwidth / sizeof(element)) elements


Macro Architecture


Performance