HW 1: external sorting

The problem to solve

Code a program that sorts the file data.txt into data.out (in current directory). These are plain text files with a key+value pair of 64-bit unsigned integers on each line. The output lines should be ordered by ascending keys with ties broken by ascending values.

You will test the program on an input file generated by /afs/ms/u/k/koucky/ds1/pbl1-gen-data. The machines for this are in the Mala Strana laboratory u-pl0 to u-pl15, i.e. those located on the outer arc of the UNIX part.

The file for measurements is approximately 45GB large, and the time limit to pass the assignment is 60 minutes. Only use the /tmp/ directory for these big files and any temporary files, as the home folders are remote (slower and smaller).

Be considerate

Clean the data files after you have finished, so others can measure after you. Note that the machines can be used remotely via ssh. Be considerate of other people and do not use a machine if someone else is using it at the moment (commands who and top). You can use another of the machines or wait. Report bad behaviour by e-mail, please.

Submission

Deadline: October 23 for the source code, October 30 for the measurement. Students in English programme: the deadline for both parts is extended until November 6, so you can catch up on basic sorting methods.

Submit by e-mail to and a copy to my address on matfyz.cz. (Note for students of the Czech programme: instead of the copy you submit to CodEx.)

Example

Input

96313 3992064775605989878
6024613951808161023 4071849752967919252
11417427497858712892 9671181978208670736
3894 2867422479361385327
96313 3992064775605758114
6437842131676518652 6510455770151352973

Output

3894 2867422479361385327
96313 3992064775605758114
96313 3992064775605989878
6024613951808161023 4071849752967919252
6437842131676518652 6510455770151352973
11417427497858712892 9671181978208670736

Generating the input

Example: the command

/afs/ms/u/k/koucky/ds1/pbl1-gen-data XX >/tmp/data.txt

generates the input for evaluation, where XX is the last two digits from your personal number (UKČO) found in SIS. The command

/afs/ms/u/k/koucky/ds1/pbl1-gen-data - XX >/tmp/data.txt

generates a tenfold smaller input that you can use for debugging.

E-mail with measurements

Use the following template:

Name: Michal Koucky
Machine: u-pl7
CPU: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
RAM: 6GB
Time (by the 'time' utility): real 43m41.704s, user....
Date and time of measurment: DD-MM-YY HH:MM:SS
XX (for gen-data): 37
Short description of algorithm: I used foo-sort
 applied on bar-transformation from baz-sort.
Pseudonym for web-site: I don't want to be listed.