Testing and optimizing different verions of hash functions
Studying task, which includes two parts:
- Comparing different hash functions
- Using different optimizing methods to increase performance of work with hash table
In this task list structure used, that is made by me previously. List structure here
Main propeties of good hash function:
- determinism
- high counting speed
- all values are equiprobable
- one-wayness
- by changing one bit in the key, the hash differs significantly ( hash(key) is not continuous function)
To compare different hash functions I used hash table with List structures as elements of it. If collisions occures, element ads to list, so by examing size of each list we can assume how often hash collisions are happening during experiment
For compare I used hamlet implementation to split input text file - Oxford Dictionary into words. In the cycle firstly added words to hash table, then I recorded sizes of lists in hash table and used simple python script to make histograms for each hash function I tested
Size of hash table were fixed during tests and it was equal 4096
By the size of lists in hash table we can get information how frequently hash collisions are happening
Data, used for making plots are here: /text_files/res.txt
- Hash function, that always returns 1
- Hash function, that returns ASCII code of first symbols of string
- Hash function, that returns size of string
- Hash function, that returns sum of ASCII codes of string elements
- Rol hash function ( code of function will be shown later)
- Crc32 hash function
Firstly we will look at plots and data, that I got from tests. Secondly we will process results.
Situation with colisions with this hash function is so bad, that we can not see one big column in histogram. Here is a screenshot of data, that were used for building plot:
As we can see, this hash function is pretty bad. Collisions in 100% of cases. This function does not have properties of good hash function
This function is better, but still can not be used for proper use of hash table, collisions are happening too frequently. Values of hash function are not equiprobable. This is not a good hash function too.
Here are data for plot:
As we can see, there are some troubles with making plot. Let's look at data:
This hash function has similar problems like previous one.
Dispersion of values of this hash function is better than of previous one, but still this function can not be called good hash function
Here is code of this functions:
And here is plot:
We can see, that this hash function is much better. All values in range are used, although there are some lists filles more than other. It meas that collisions are still happening pretty frequently with some key values
This hash functions is even better. Less high values of list sizes means that values in range of hash function are equiprobable, as it has to be with ideal hash function.
Dispersion values for different hash functions:
Hash Func | Dispersion |
---|---|
one | 1202909 |
first ascii | 35906 |
sizeof | 131952 |
ascii sum | 5077 |
rol | 87 |
crc32 | 17 |
We can see, that the best results are shown by crc32 hash function.
Second part of task - optimizing performance of some tests for hash table
I made test, that taked as input array of structures, containing char* pointer tp word from input text and unsigned int lenght of this string. Then it calculates hash value of tokens and ads them to hash table if there were no such words before. Then it 128 times executes hash table search, that includes hash value calculating and list search. Then for each word from array of tokens it searches it once again and deleted from hash table. Aso I added timer, that measures time of this stress test especially. During executing I don't have to calculate lenght of word again, cause it is already known after parsing text into tokens.
For measuring time of executing whole program, I used 'time' linux command. Before any optimizations, I measured time of perfoming programm with different optimization flags of gcc compiler. Here there are:
Number \ Flags | No flags | No flags | -O1 | -O1 | -O2 | -O2 | -O3 | -O3 |
---|---|---|---|---|---|---|---|---|
Period | Total | Test | Total | Test | Total | Test | Total | Test |
1 | 5.52 | 5.43 | 2.99 | 2.93 | 2.95 | 2.90 | 2.90 | 2.85 |
2 | 5.47 | 5.39 | 3.00 | 2.95 | 2.97 | 2.91 | 2.89 | 2.84 |
3 | 5.52 | 5.43 | 3.01 | 2.95 | 2.94 | 2.89 | 2.93 | 2.84 |
4 | 5.52 | 5.44 | 2.99 | 2.94 | 2.94 | 2.89 | 2.92 | 2.83 |
5 | 5.60 | 5.51 | 3.00 | 2.94 | 2.95 | 2.89 | 2.90 | 2.84 |
Average | 5.53 | 5.44 | 3.00 | 2.94 | 2.95 | 2.89 | 2.91 | 2.84 |
Input used for test - Oxford Dictionary, here it is
Also important thing to note that I tried to optimize only 'stress test function', initial processes like parsing input text and etc was not examined and changed with making optimizations.
I used callgrind tool from valgrind to collect profiling info and kcachegrind to examine it. This is profiler's output for the very first version without any optimizations:
Most frquently called functions are _list_search
, my_hash
(crc32 hash algorithm) and _strcmp_avx2
. Their execution time takes up most of program execution time. These functions will be optimized firstly.
In pictures you can see both C and assembly versions of my_hash
function.
Let's replace hash functions with its assembly version and measure performance time
1 | 2 | 3 | 4 | 5 | AVERAGE | |
---|---|---|---|---|---|---|
TOTAL | 4.76 | 4.89 | 4.79 | 4.71 | 4.75 | 4.78 |
TEST | 4.66 | 4.76 | 4.66 | 4.62 | 4.63 | 4.67 |
Performance increased by about 16%. Good result, let's move on
As next step I rewrote _list_search
functions in assebmly anr replaced with it its C version.
In pictures you can see assembly version of list search. Original function
here
Important to note, that I made some optimizations in assembly version of lsit search. I decreased number of memory accesses and removed call of standart function strlen
.
Here are results:
1 | 2 | 3 | 4 | 5 | AVERAGE | |
---|---|---|---|---|---|---|
TOTAL | 2,79 | 2,67 | 2,63 | 2,66 | 2,62 | 2,67 |
TEST | 2,67 | 2,55 | 2,54 | 2,54 | 2,54 | 2,57 |
Comapring to version with no optimization flags, we increased performance by 207% and have already exceeded -O3 optimization.
At 4th step of my optimizations I decided to use intel instruction crc32
, made specially for calculating crc32 hash.
New version of
Test results:
1 | 2 | 3 | 4 | 5 | AVERAGE | |
---|---|---|---|---|---|---|
TOTAL | 2,28 | 2,27 | 2,27 | 2,24 | 2,27 | 2,27 |
TEST | 2,18 | 2,15 | 2,14 | 2,15 | 2,15 | 2,15 |
Now I've already increased perfomance about by 2,5 times, comparing to -O3 - by 30%
I used callgrind once again to analyze performance of optimized version.
As we can see in picture, _hash_table_search
, which includes calls of crc32 hash (third place) and list_search (first place), being called a lot of times. We know, that function call prevents any optimizations by compiler and bears the overhead (prologue and epilogue of function call made for creating call frame) Let's optimize program by removing calls of this function, instread of call I will place code of this function where it is called.
Here are results:
1 | 2 | 3 | 4 | 5 | AVERAGE | |
---|---|---|---|---|---|---|
TOTAL | 2,04 | 2,02 | 2,06 | 2,02 | 2,05 | 2,04 |
TEST | 1,93 | 1,93 | 1,93 | 1,93 | 1,93 | 1,93 |
Comparing with previous version, this optimisation increased performance by almost 11% Good results, let's keep this optimisation.
As next step I used fast version of stcmp in assemnly version of _list_search
, using SSE vector instructions.
As reference while creating fast version of strcmp I used improved fast version of strcmp by Alex Fog from his website.
Here is code:
Results:
1 | 2 | 3 | 4 | 5 | AVERAGE | |
---|---|---|---|---|---|---|
TOTAL | 1,97 | 1,99 | 1,98 | 1,98 | 1,99 | 1,98 |
TEST | 1,9 | 1,9 | 1,89 | 1,89 | 1,9 | 1,90 |
This optimization barely brings 2% improvement in performance. This brings me to end of my optimization process. Not to decrease portability and understanding of code.
Number \ Version | My opt | My opt | -O3 | -O3 |
---|---|---|---|---|
Period | Total | Test | Total | Test |
1 | 1.97 | 1.90 | 2.90 | 2.85 |
2 | 1.99 | 1.90 | 2.89 | 2.84 |
3 | 1.98 | 1.89 | 2.93 | 2.84 |
4 | 1.99 | 1.89 | 2.92 | 2.83 |
5 | 1.98 | 1.90 | 2.90 | 2.84 |
Average | 1.98 | 1.90 | 2.91 | 2.84 |
Increase in efficiency - 2,86 times - almost 300% comparing to No_flags version and about 1,5 times comapring to -O3 optimization
Our teacher suggested to calculate 'efficiency coefficient' of out optimizations. Here it is:
Coeff = (increase * 1000) / (number of assebmly instruction written) = 68%