This project implements an external sorting algorithm using large buffers to efficiently handle datasets that exceed available RAM. It simulates real-world scenarios where data must be sorted while optimizing disk I/O operations. The program generates records, sorts them in phases using buffer-based merging, and analyzes performance metrics, comparing theoretical and practical results.
The experiment was conducted to examine how changes in specific parameters affect the number of I/O operations, and to compare the theoretical I/O operation counts with the practical ones. I performed tests with significantly varying record counts, and analyzed the differences between the theoretical and practical I/O operations using detailed charts.
More about the experiment and its results here.
The program's behavior can be customized using two separate arguments:
Specifies how the input data is provided.
-
Options:
"generate"
: The program generates records based on their predefined count."file"
: Data is read from a file."keyboard"
: Data is entered manually via the keyboard.
-
Default Behavior:
If no argument is provided, the program will generate records automatically.
The data is stored in the filedata.txt
by default.
Defines how the runs are presented after each phase of the algorithm.
-
Options:
"short"
: Displays the runs in a shortened version."full"
: Presents the runs in full detail.
-
Default Behavior:
If no argument is provided, the runs are not displayed.
Each record is provided in a text file, with each record in a separate line. Record is formatted as follows:
- The record contains 3 numbers selected from the set
[2, 3, 3.5, 4, 4.5, 5]
, each separated by a space. - The ordering of the records is based on the average score computed from these 3 values.
Example record:
2 3.5 4