This assignment can be done in teams of 1 or 2 students for full marks, or 3 students at 80% of the grade.
All teams, including teams of 1, must register their team as a D2L assignment group
in order to submit the assignment.
The primary purpose of this assignment is to deepen your understanding of asymptotic costs
by looking at the specific performance of specific algorithms with known asymptotic costs.
You will do this by profiling two well known algorithms running on large data sets and
plotting the results.
This assignment should be developed in Java.
You can develop it on the platform of your choice using your prefered development tools.
However we will be testing your programs on the UNIX moons and they must therefore work on that
environment as well.
The UNIX java compiler is called javac and java programs are launched with the java
command. See man java or man javac for more details.
Step 1: Algorithm Development
- Select one average case O(n2) sorting algorithm
(e.g. bubble sort, insertion sort, selection sort)
and one average case O(n logn) sorting algorithm (e.g. mergesort, quicksort, heapsort).
You can find other sorts at
Wikipedia's sorting page.
- Create a java class called Sort which contains two static functions:
public static void slowsort(int array) //This is your O(n2) algorithm
public static void fastsort(int array) //This is your O(n logn) algorithm
Both of these will sort the array array
The code for your sorting functions does not have to be original.
All of these algorithms are well known and you can find good implementations on the web.
You can also find bad implementations so you should make sure that your code is correct.
Also, please be sure to pick implementations with sort simple arrays
and other more complicated structures like collections.
In a separate class called Assignment
write a main test program which does the following:
- Sets a constant N to be of a particular size.
- Creates an array of that size (N) and fills it with numbers 1 to 10N randomly distributed
- Creates an identical copy of the array. Note that it is simpler
to just create the two arrays at the same time.
- Prints the array (you can use
to do so.)
- Sorts the first array with fastsort and prints the result
- Sorts the second array with slowsort and prints the result
- Once you are completely sure that your two sorting algorithms are completely functional and correct,
comment out all your printing. You are now ready to start your profiling.
- You should verify that your sort functions are working properly with arrays that are large enough. Sorting 10 randomly generated arrays of size n=100 should be good enough.
- While you are debugging your sorts, you can make sure that the arrays being sorted always contain the
same element by seeding the random number generator yourself.
Step 2: Profiling
You will now profile your two functions using the
Here are the steps to follow for the profiling:
- Your main program should contain the following constants:
final static int N = 100; // Size of the array
final static int MAXVALUE = 10*N; // maximum value in the array
final static int AVERAGEOVER = 1000000 / N; //Number of iterations over which to average performance
and code that calculates the average time to sort an array of size N with
each algorithm, as follows:
// repeat AVERAGEOVER times:
// randomly generate two identical arrays of size N as before
// time how long it takes for your fastsort to sort its array - keep cummulative time
// time how long it takes for your slowsort to sort its array - keep cummulative time
// Then divide each cummulative time by AVERAGEOVER to get the average time of each sort.
- You are now ready to do your profiling and collect data. The purpose of this assignment
is to experience first-hand O(n log n) and O(n2) behaviours, so you will need to
sample these behaviours at a few meaningful points. Since your fast algorithm behaves in a
partly logarithmic way, you will need to perform your sampling in a vaguely exponential way:
you will do it for N = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, etc. up to at least 1,000,000
and more if your system can handle it.
Staying on the same system run your program once for each of these values of N,
and store the results in this
times summary spreadsheet.
As you will have hopefully noticed, the code above is designed to support the collection of
the average performance of the two algorithms for a particular value of N by calling each
algorithm repeatedly on fresh data.
For this to be a real average
you will need to make sure that you do not reset your random number generator during the
program. In other words, your random number generator should only be seeded once
at the beginning of the program and not afterwards. This way the arrays to be sorted will be
different at each iteration.
Step 3: Analysis
In this last step you will be using Matlab to graph the data you collected along with two functions, a multiple of n2, and a multiple of n logn.
Matlab is available in the Computer Science labs, but you can also download it for free from the
Electrical Engineering Department.
- Save your
times summary spreadsheet.
in tab-delimited text format.
- Use the Import Data button of Matlab to import the data from your tab-delimited summary into Matlab as three separate vectors called n, slowsort, fastsort. To do this, select the Column Vectors option in the Import window and rename the three columns n, slowsort, fastsort and your Import Selection should be Import Data.
- Enter the following three lines in Matlab:
>> hold on
>> legend('fastsort','n logn','slowsort','n^2')
More details can be obtained in the matlab help on loglog.
- loglog produces 2-D plots on a log-log scale (x and y axes are both logarithmic)
- loglog(X,Y,LineSpec) plots all Yn versus Xn pairs. LineSpec specifies a color and type of line. r=red, b=blue, x and o are point shapes and - is a continuous line.
- n.expression indicates that the expression will be applied to each element of the vector n.
- loglog(X1,Y1,LineSpec1,X2,Y2,LineSpec2,...,Xm,Ym,LineSpecm) plots m functions in the same figure.
- Hold on indicates that the figure should not be cleared before applying the next command
- Legend adds a legend for all the functions being plotted. The titles are in the same order as the loglog parameters
- Find coefficients for the n logn and n2 functions so that their lines in the plot are much closer to your slowsort and fastsort points while remaining upper bounds. I.e. manipulate the expressions n.*log(n) and n.^2 to bring the plot lines much closer to fastsort and slowsort respectively.
- Once you have found good coefficients for these two functions,
produce a clean plot with correct legends showing the exact n logn and n2 functions,
and print out your plot on pdf.
- Answer the questions on the second page of the marking sheet
You can submit this assignment in teams of 1 or 2 students.
There should be only one submission per team.
Duplicate submissions which have not been identified as teamwork will be treated
as plagiarism and refered to the Office of Academic Integrity.
This assignment is handed in partly electronically and partly on paper.
Both methods are described in the
Submission Information page.
- Zip up your two java files in a file called a1.zip. and
submit it electronically in the D2L "Assignment" submission folder belonging to your team.
The submission folder have a max of three students per team and are set up for
In other words, you and your team just pick one of the free folders (there are 180 altogether) and make it your own.
- Hand in manually a package consisting of printouts of
- The filled out marking sheet and questions
- your summary spreadsheet
- your matlab plot