Lol well this was all wrong xd but das okay

Background

A good algorithm

  • Values are only compared where needed
  • Quick speed / low execution counts
  • Deterministic output
  • Sort stability

Sorting algorithms

  • Bubble sort (unoptimised)
  • Bubble sort (optimised)
  • Insertion sort
  • Selection sort
  • Shell sort (Powers of 4)
  • Shell sort (Sedgewick)
  • Pseudo-Bogo Sort

Complexity

SortBest CaseAverage CaseWorst Case
BubbleO(n)O(n^2)O(n^2)
InsertionO(n)O(n^2)O(n^2)
SelectionO(n^2)O(n^2)O(n^2)
ShellO(n logn)O(n logn^2)O(n logn^2)
Shell (Sedgewick)O(n logn)O(n logn^2)O(n^43)
Bogo---

Sorting Stability

A sorting implementation is considered stable if values maintain ordering when sorted by another key.

For example

ItemKey AKey B
a2001
d3002
b2001
c2001

Would return

ItemKey AKey B
a2001
b2001
c2001
d3002

Rather than some variant where the a -> b -> c ordering is not preserved


StableUnstable
1 C
1 A
1 B
2
3
4
5


1 A
1 B
1 C
2
3
4
5

or some variation thereof

A stable sort would keep the ordering (for key 1) CAB,
whereas an unstable sort might produce ABC, ACB, BAC, BCA, or CBA

Design

The program execution time of sortA and sortB can be used to analyse the type of algorithm implemented. For example, a bubble sort in its best case (already sorted) scenario should occupy O(n) time

Methodology

Linear Time
Traversing a sorted array of 2n elements should take twice the time of a sorted array of n elements  
Bubble vs Insertion
Both Bubble Sort and Insertion Sort share the same best, average and worst case execution times. However one input case where Insertion Sort is faster than Bubble Sort is {1,3,3,3,3,3,…,3,2}. Where Insertion Sort will take O(n) time while Bubble Sort would take O(n^2) time.
We can craft an input with: python3 -c "print('\n'.join(map(str,[1,*([3]*(1000000-2)),2])))"
 
Checking for stability
We can craft an input to check for stablity

1
2
3
4
5
6
7
1 C
2
1 A
3
1 B
4
5

Checking for bogo-ness
For a sort of the same array of numbers taken n times, if the differences between timings are considerably different, or if the sort never finishes within a reasonable time - then it is highly liking that the sort is a bogo sort.

Lots of data … 'lazily'

I can’t be bothered to learn bash scripting so I’ll just use Python

This script generates a comma-separated list of n_items, time.
This could be used for finding patterns, or even for graphing/plotting purposes

Usage: ./data.py [items] [A/D/R]

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
#!/bin/python3
import sys

import subprocess
from random import shuffle

# Returns the mean of a list of items (to 3 decimal places)
def average(items):
  sum = 0
  for item in items:
    sum += item
  return round(sum / len(items), 3)

for n in range(2, int(sys.argv[1]), 10000):
  print(n, file=sys.stderr, end=" ")
  if sys.argv[2] == "D": nums = range(n, -1, -1)   # Descending
  else: nums = range(n)                            # Ascending
  
  nums = list(nums)
  if sys.argv[2] == "R": shuffle(nums)             # Random
  string = "\n".join(map(str,nums)).encode()

  times = []
  for x in range(10):
    with subprocess.Popen(["/usr/bin/time", "-f%U", "./sortA"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as process:
      times.append(float(process.communicate(string)[1][1:-2]))

  avg = average(times)
  print(str(n) + ",", avg)
  print(avg, file=sys.stderr, end="")

Tools

gen

Although I didn’t use this because my data generation script generated the numbers for me already

./gen <n-values> (A|D|R) [seed]

./gen is a program that generates numbers either in ascending, descending or random order
(n-values is actually n+1 values but sure)

GNU time

Version: 1.7
/usr/bin/time --format="%U"

The time program measures the execution time of a program.
We are only interested in the “user execution time” of our programs, so we can add the flag --format="%U%"

Experimentation

All tests would be conducted 10 times, and the average of these timings would be taken to mitigate time inconsistencies caused by environmental factors (computer load, other users, things gone wrong)

The tests would also be run on the same UNSW CSE computer (wagner), under the same user account

Results

sortA

NumbersAvg12345678910
Ascending
500000 A0.1270.120.120.130.130.130.130.110.130.140.13
1000000 A0.2620.260.260.250.280.260.270.260.260.270.25
Descending
500000 D0.1790.170.180.180.190.180.180.180.180.170.18
1000000 D0.3950.410.390.390.400.360.420.380.390.400.41
Random
500000 R0.3830.380.380.390.390.380.380.390.370.380.39
1000000 R0.8080.820.810.810.800.820.800.820.810.800.79
echo -e "1 C\n2\n1 A\n3\n1 B\n4\n5" | ./sortA
1 C
1 A
1 B
2
3
4
5

Further Timing

# ItemsAscendingDescendingRandom
20.0000.0000.000
100020.0000.0000.000
200020.0000.0000.010
300020.0000.0000.010
400020.0000.0100.020
500020.0100.0100.028
600020.0090.0190.030
700020.0100.0190.040
800020.0150.0200.048
900020.0190.0200.057
1000020.0200.0290.059
1100020.0200.0290.068
1200020.0260.0380.074
1300020.0240.0400.079
1400020.0260.0400.087
1500020.0320.0450.098
1600020.0370.0490.103
1700020.0360.0560.114
1800020.0380.0600.119
1900020.0400.0590.129
2000020.0450.0640.135
2100020.0450.0660.142
2200020.0480.0680.148
2300020.0500.0780.160
2400020.0550.0770.164
2500020.0550.0810.174
2600020.0590.0890.180
2700020.0620.0900.185
2800020.0660.0900.198
2900020.0650.0960.209
3000020.0720.1000.209
3100020.0710.1060.218
3200020.0720.1100.229
3300020.0790.1190.238
3400020.0810.1160.243
3500020.0810.1190.252
3600020.0860.1210.261
3700020.0880.1250.267
3800020.0910.1340.274
3900020.0950.1320.288
4000020.0980.1400.297
4100020.0960.1420.297
4200020.0930.1490.310
4300020.1020.1490.315
4400020.1020.1550.330
4500020.1090.1580.332
4600020.1080.1610.337
4700020.1120.1680.347
4800020.1120.1720.357
4900020.1170.1710.363
5000020.1210.1780.381
5100020.1210.1840.383
5200020.1280.1900.392
5300020.1260.1950.399
5400020.1270.1970.405
5500020.1360.1890.419
5600020.1320.2060.423
5700020.1370.2010.435
5800020.1390.2050.441
5900020.1410.2110.449
6000020.1480.2210.455
6100020.1490.2270.460
6200020.1510.2270.468
6300020.1550.2280.481
6400020.1530.2390.494
6500020.1580.2380.496
6600020.1570.2440.508
6700020.1590.2460.514
6800020.1600.2480.528
6900020.1690.2530.528
7000020.1660.2660.541
7100020.1700.2730.546
7200020.1730.2750.554
7300020.1800.2850.562
7400020.1810.2810.579
7500020.1860.2840.575
7600020.1830.2870.598
7700020.1850.2870.604
7800020.1930.2980.629
7900020.1910.3000.627
8000020.1920.3090.622
8100020.1960.3070.633
8200020.2000.3170.648
8300020.2060.3090.654
8400020.1990.3280.661
8500020.2030.3240.671
8600020.2120.3370.680
8700020.2080.3360.703
8800020.2160.3360.715
8900020.2190.3560.722
9000020.2210.3480.720
9100020.2160.3520.729
9200020.2240.3740.738
9300020.2240.3730.748
9400020.2260.3690.746
9500020.2340.3610.757
9600020.2330.3710.770
9700020.2310.3750.786
9800020.2400.3860.781
9900020.2420.3990.794

sortB

NumbersAverage12345678910
Ascending
25000 A0.960.960.960.960.960.960.960.960.960.960.96
50000 A3.873.873.873.873.873.873.873.873.873.87
Descending
25000 D4.4524.504.404.424.554.444.404.434.514.434.44
50000 D17.92717.7418.1417.7417.7817.8618.8417.5218.0617.8017.79
Random
25000 R3.6583.703.653.613.573.593.703.683.723.743.62
50000 R14.95414.9014.8914.8014.8614.7415.4315.0015.2015.0214.70

Further Timing

# ItemsAscendingDescendingRandom
20.0000.0000.000
10020.0000.0000.000
20020.0000.0210.020
30020.0100.0610.041
40020.0200.1130.084
50020.0390.1750.135
60020.0500.2590.202
70020.0700.3490.281
80020.1000.4510.365
90020.1200.5730.468
100020.1500.7200.588
110020.1800.8590.729
120020.2201.0080.859
130020.2601.1891.014
140020.3001.4031.180
150020.3401.6091.361
160020.3901.8271.521
170020.4402.0601.732
180020.5002.3311.950
190020.5602.5922.201
200020.6202.8572.444
210020.6803.1392.652
220020.7463.4602.945
230020.8193.7973.182
240020.8904.1013.493
250020.9604.4433.737
260021.0404.8044.111
270021.1205.2094.416
280021.2105.5484.757
290021.3005.9755.114
300021.3906.4035.435
310021.4806.8285.816
echo -e "1 C\n2\n1 A\n3\n1 B\n4\n5" | ./sortB
1 C
1 A
1 B
2
3
4
5

Analysis

sortA

NumbersAvg
Ascending
500000 A0.127
1000000 A0.262
Descending
500000 D0.179
1000000 D0.395
Random
500000 R0.383
1000000 R0.808

The sort timing results suggested the implementation of a Bubble or Insertion Sort for sortA’s algorithm.

In the best case scenario where all inputs are already sorted (ascending inputs) -
sortA produced timings whose input of 1000000 numbers resulted in a timing of 0.262s, which was twice that of half as many inputs (500000) at 0.127s.
This suggests that the sort algorithm was either a Bubble Sort or Insertion Sort, as both of these algorithms exhibit a best case time of O(n).

To determine whether this algorithm was a Bubble or Insertion sort, the command was run: python3 -c "print('\n'.join(map(str,[1,*([3]*(1000000-2)),2])))" | time --format="%U" ./sortA > /dev/null
This returned a timing in the 0.2s region which is closer to O(n) time than O(n^2), hence it can be concluded that sortA is an Insertion Sort

The stability sort passed, as the program output the correct order CAB BUT DIS WRONG, shell sorts are not stable by nature

sortB

NumbersTime (s)
Ascending
25000 A0.96
50000 A3.87
Descending
25000 D4.452
50000 D17.927
Random
25000 R3.658
50000 R14.954

It is evident that sortB is neither an Insertion Sort nor a Bubble Sort, as a doubling of the number of items does not double proportionally. It is also not a selection sort, the timings do not increase exponentially

sortB’s sorting computation time graph displayed a non-linear trend, where the linear increase of the amount of items to sort increased either logarithmically or exponentially.

On assumption of a Shell Sort, upon dividing the timings by the number of items, a trend can be seen where the average and worst cases have values are the square of the best case timings. This is consistent with the Shell Sort’s complexity.

The stability sort passed, as the program output the correct order CAB

Conclusion

sortA is an Insertion Sort.
sortB is a Shell Sort.

EDIT: I dun guf.
sortA was a shell sort - unstable. –> not enough input
sortB was a bubble sort - it has an early exit boi (best case)