|
|
COMP3300 Assignment 1 2004 |
Updated on |
||
| COURSE PROFILE
ASSIGNMENTS CONTACT EXAM |
Whats New:
Assignment 1 in 2005 will cover different ground but this will provide a useful starting point for understanding C coding. A solution is available. The solution corresponds to the 90% correct case since the 100% correct case is significantly more complex to understand. This 2004 assignment is supplied as an additional exercise: the 2005 assignment will cover similar ground but a different problem. |
|
||
Assignment 1
The purpose of this assignment is to show your understanding of threads and of C programming.You are required to use existing code as a base for developing a multithreaded mulifile sort. You have the option of developing a simplified version which sorts exactly 4 files, if you find the general case of an arbitrary number of files too hard.
Multifile here means that you should sort multiple files into one overall sorted result, and you should use the given
printBuffer function to produce the output.The notion is that because file reading is so slow, it you are sorting multiple files, it would be nice to start sorting as soon as the smallest file has been read, and to overlap as much work as possible with waiting for file reads.
The provided code shows how to read 1 file, sort it and print it, using threads. You need to work out how to merge multiple sorted files (in memory-based data structures), using threads.
The files you need to get started are in the Files directory. You can also fetch them as a gzipped tar file; to untar it (cd first to the directory where you want to develop your solution):
tar xfzv comp3300-assign1-1.0-2004.tgz
Please do not forget the importance of ethical conduct. If in doubt about
whether something constitutes plagiarism, please ask.Detailed Requirements
- Your program should use the same files as are supplied, and you should only modify the files you are told to modify.
- A command line is supplied to compile the program. Whatever development environment you use, your program must compile with the given command line on the student UNIX machines (Solaris). The provided code has been tested on Linux and Solaris, but your solution must run on Solaris.
- Your program should run with the following command:
./launch_parallel_sort data
where the file namedatacan be replaced by an arbitrary number of files. - The input data to your program should be any number of text files named on the command line.
- The output of your program should only consist of the output
produced by calling
printBufferonce as in the supplied code; the output should be the contents of all the files named on the command line, individually sorted then merged (i.e., the original data sorted into one data structure). - Your program should be written as follows (alternative organizations will be accepted if they are justified as simpler or more efficient, but the basic principle of doing work as soon as the data is available by using multiple threads must apply):
- it should have one thread for each file read (using the supplied code for reading a file)
- it should have one thread for each sort (using the supplied sorting code)
- it should have one or more threads for merging sorted results: each of these threads should merge two sorted results into one; and merges may be cascaded to handle arbitrary numbers of files
- You need to write code to do merging and code to combine reading, sorting and merging, using the given thread primitives.
- You should only use the following threading primitives, and not other parts of Pthreads or other libraries or APIs:
- init_threads
- launch_thread
- join_thread
- You are supplied with 4 data files for testing, but your final code will be tested with other files.
- You should only modify the following files; all other files will
be used unaltered when we test your code:
launch_parallel_sort.c
parallel-sort.c
if you modify any other files or submit files with different names to these your assignment will be graded as "doesn't compile". - Your program should conform to the following formatting requirements:
- All code at the same level should be left-aligned; an increase
of level by introducing "
{" should result in an indentation of 2 - 4 spaces; opening a new "{" block should usually be done at the end of the previous line and the closing "}" should be on a line of its own unless readbility dictates otherwise (for example, it is permissiable to write "} else {"); if in doubt ask - Type names should start with an initial capital; all other names with an initial lowercase letter. For readability you may use underscores or capitals within a name. (Violations of these guidelines in the original code should not be altered.)
- Any types or variables specific to one compilable ("
.c") file should appear at the top of the file after the "#includes".
- There will be a bonus for early completion: any assignment handed in 24 hours or more before the due date, which achieves at least 50% of available marks, will receive a 5% bonus. The maximum available for the assignment therefore is 105% of available marks. Recall that if you fail to achieve 50% overall for assignments, your exam alone will be used to determine your final grade, but with a maximum grade of 5.
Mark Scheme
- If your program fails to compile, it will receive a maximum mark of 40%.
- If your program fails to run to completion or has errors in output, it will receive a maximum mark of 50% (after other aspects of the mark scheme are taken into account).
- If your program works for a minimal case of exactly 4 files, it will receive a maximum mark of 65%.
- If your program fails to exploit obvious opportunities for parallelism it will receive a maximum mark of 70%.
- If your program misses less obvious opportunities for parallelism it will receive a maximum mark of 95%.
- marks will be deducted for each of the following, with the maximum
set by each of the above criteria:
- inadequate documentation: code must be explained at the level of each function, as well as high-level design decisions
- deviation from specification (less than the extreme case of failure to compile)
- clear approach -- over and above the documentation, it must be clear what your solution does
- failure to exploit maximum parallelism
