HW 1: external sorting
The problem to solve
Code a program that sorts the file data.txt
into data.out
(in current directory). These are plain text files with a key+value pair of 64-bit unsigned integers on each line. The output lines should be ordered by ascending keys with ties broken by ascending values.
You will test the program on an input file generated by /afs/ms/u/k/koucky/ds1/pbl1-gen-data
. The machines for this are in the Mala Strana laboratory u-pl0
to u-pl15
, i.e. those located on the outer arc of the UNIX part.
The file for measurements is approximately 45GB large, and the time limit to pass the assignment is 60 minutes. Only use the /tmp/
directory for these big files and any temporary files, as the home folders are remote (slower and smaller).
Be considerate
Clean the data files after you have finished, so others can measure after you. Note that the machines can be used remotely via ssh
. Be considerate of other people and do not use a machine if someone else is using it at the moment (commands who
and top
). You can use another of the machines or wait. Report bad behaviour by e-mail, please.
Submission
Deadline: October 23 for the source code, October 30 for the measurement. Students in English programme: the deadline for both parts is extended until November 6, so you can catch up on basic sorting methods.
Submit by e-mail to and a copy to my address on matfyz.cz. (Note for students of the Czech programme: instead of the copy you submit to CodEx.)
Example
Input
96313 3992064775605989878
6024613951808161023 4071849752967919252
11417427497858712892 9671181978208670736
3894 2867422479361385327
96313 3992064775605758114
6437842131676518652 6510455770151352973
Output
3894 2867422479361385327
96313 3992064775605758114
96313 3992064775605989878
6024613951808161023 4071849752967919252
6437842131676518652 6510455770151352973
11417427497858712892 9671181978208670736
Generating the input
Example: the command
/afs/ms/u/k/koucky/ds1/pbl1-gen-data XX >/tmp/data.txt
generates the input for evaluation, where XX
is the last two digits from your personal number (UKČO) found in SIS. The command
/afs/ms/u/k/koucky/ds1/pbl1-gen-data - XX >/tmp/data.txt
generates a tenfold smaller input that you can use for debugging.
E-mail with measurements
Use the following template:
Name: Michal Koucky
Machine: u-pl7
CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
RAM: 6GB
Time (by the 'time' utility): real 43m41.704s, user....
Date and time of measurment: DD-MM-YY HH:MM:SS
XX (for gen-data): 37
Short description of algorithm: I used foo-sort
applied on bar-transformation from baz-sort.
Pseudonym for web-site: I don't want to be listed.