CPS616

W2017 Assignment

Ryerson University

Worth: 10%

Profiling Efficiencies

Due: Friday March 17


Team size

This assignment can be done in teams of 1 or 2 students for full marks, or 3 students at 80% of the grade. All teams, including teams of 1, must register their team as a D2L assignment group in order to submit the assignment.

Purpose

The primary purpose of this assignment is to deepen your understanding of asymptotic costs by looking at the specific performance of specific algorithms with known asymptotic costs. You will do this by profiling two well known algorithms running on large data sets and plotting the results.

Programming Environment

This assignment should be developed in Java. You can develop it on the platform of your choice using your prefered development tools. However we will be testing your programs on the UNIX moons and they must therefore work on that environment as well. The UNIX java compiler is called javac and java programs are launched with the java command. See man java or man javac for more details.

Assignment Description

Step 1: Algorithm Development

  1. Select one average case O(n2) sorting algorithm (e.g. bubble sort, insertion sort, selection sort) and one average case O(n logn) sorting algorithm (e.g. mergesort, quicksort, heapsort). You can find other sorts at Wikipedia's sorting page.

  2. Create a java class called Sort which contains two static functions:
    public static void slowsort(int array[]) //This is your O(n2) algorithm
    public static void fastsort(int array[]) //This is your O(n logn) algorithm

    Both of these will sort the array array The code for your sorting functions does not have to be original. All of these algorithms are well known and you can find good implementations on the web. You can also find bad implementations so you should make sure that your code is correct. Also, please be sure to pick implementations with sort simple arrays and other more complicated structures like collections.

  3. In a separate class called Assignment write a main test program which does the following:
    1. Sets a constant N to be of a particular size.
    2. Creates an array of that size (N) and fills it with numbers 1 to 10N randomly distributed (See the Random class)
    3. Creates an identical copy of the array. Note that it is simpler to just create the two arrays at the same time.
    4. Prints the array (you can use Arrays.toString to do so.)
    5. Sorts the first array with fastsort and prints the result
    6. Sorts the second array with slowsort and prints the result

  4. Once you are completely sure that your two sorting algorithms are completely functional and correct, comment out all your printing. You are now ready to start your profiling.
Advice
  • You should verify that your sort functions are working properly with arrays that are large enough. Sorting 10 randomly generated arrays of size n=100 should be good enough.
  • While you are debugging your sorts, you can make sure that the arrays being sorted always contain the same element by seeding the random number generator yourself.

Step 2: Profiling

You will now profile your two functions using the System.nanotime method.

Here are the steps to follow for the profiling:

  1. Your main program should contain the following constants:
    	final static int N = 100;			// Size of the array
    	final static int MAXVALUE = 10*N;	// maximum value in the array
    	final static int AVERAGEOVER = 1000000 / N;	//Number of iterations over which to average performance
    
    and code that calculates the average time to sort an array of size N with each algorithm, as follows:
    	// repeat AVERAGEOVER times:
    	//     randomly generate two identical arrays of size N as before
    	//     time how long it takes for your fastsort to sort its array - keep cummulative time
    	//     time how long it takes for your slowsort to sort its array - keep cummulative time
    	// Then divide each cummulative time by AVERAGEOVER to get the average time of each sort.
    

  2. You are now ready to do your profiling and collect data. The purpose of this assignment is to experience first-hand O(n log n) and O(n2) behaviours, so you will need to sample these behaviours at a few meaningful points. Since your fast algorithm behaves in a partly logarithmic way, you will need to perform your sampling in a vaguely exponential way: you will do it for N = 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, etc. up to at least 1,000,000 and more if your system can handle it. Staying on the same system run your program once for each of these values of N, and store the results in this times summary spreadsheet.

    As you will have hopefully noticed, the code above is designed to support the collection of the average performance of the two algorithms for a particular value of N by calling each algorithm repeatedly on fresh data. For this to be a real average you will need to make sure that you do not reset your random number generator during the program. In other words, your random number generator should only be seeded once at the beginning of the program and not afterwards. This way the arrays to be sorted will be different at each iteration.

Step 3: Analysis

In this last step you will be using Matlab to graph the data you collected along with two functions, a multiple of n2, and a multiple of n logn. Matlab is available in the Computer Science labs, but you can also download it for free from the Electrical Engineering Department.

  1. Save your times summary spreadsheet. in tab-delimited text format.

  2. Use the Import Data button of Matlab to import the data from your tab-delimited summary into Matlab as three separate vectors called n, slowsort, fastsort. To do this, select the Column Vectors option in the Import window and rename the three columns n, slowsort, fastsort and your Import Selection should be Import Data.

  3. Enter the following three lines in Matlab:
    >> loglog(n,fastsort,'rx',n,n.*log(n),'r-',n,slowsort,'bo',n,n.^2,'b-')
    >> hold on
    >> legend('fastsort','n logn','slowsort','n^2')
    
    Explanation:
    • loglog produces 2-D plots on a log-log scale (x and y axes are both logarithmic)
    • loglog(X,Y,LineSpec) plots all Yn versus Xn pairs. LineSpec specifies a color and type of line. r=red, b=blue, x and o are point shapes and - is a continuous line.
    • n.expression indicates that the expression will be applied to each element of the vector n.
    • loglog(X1,Y1,LineSpec1,X2,Y2,LineSpec2,...,Xm,Ym,LineSpecm) plots m functions in the same figure.
    • Hold on indicates that the figure should not be cleared before applying the next command
    • Legend adds a legend for all the functions being plotted. The titles are in the same order as the loglog parameters
    More details can be obtained in the matlab help on loglog.

  4. Find coefficients for the n logn and n2 functions so that their lines in the plot are much closer to your slowsort and fastsort points while remaining upper bounds. I.e. manipulate the expressions n.*log(n) and n.^2 to bring the plot lines much closer to fastsort and slowsort respectively.

  5. Once you have found good coefficients for these two functions, produce a clean plot with correct legends showing the exact n logn and n2 functions, and print out your plot on pdf.

  6. Answer the questions on the second page of the marking sheet (docx or pdf)

Additional References

Hand In

You can submit this assignment in teams of 1 or 2 students. There should be only one submission per team. Duplicate submissions which have not been identified as teamwork will be treated as plagiarism and refered to the Office of Academic Integrity.

This assignment is handed in partly electronically and partly on paper. Both methods are described in the Submission Information page.

  1. Zip up your two java files in a file called a1.zip. and submit it electronically in the D2L "Assignment" submission folder belonging to your team. The submission folder have a max of three students per team and are set up for self-enrollment. In other words, you and your team just pick one of the free folders (there are 180 altogether) and make it your own.

  2. Hand in manually a package consisting of printouts of
    • The filled out marking sheet and questions (doc or pdf)
    • your summary spreadsheet
    • your matlab plot


This page is maintained by Sophie Quigley (cps616@scs.ryerson.ca)
Last modified Tuesday, 14-Mar-2017 22:32:29 EDT