I think the following paper by Scott McFarling provides the detailed answer:
http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-TN-36.pdf
Let me use your code to explain.
unsigned char tab[1<<TABLE_BITS];
is the Pattern History Table. Each entry in the tab keeps a 2-bit saturating counter. The direction of the conditional branch is finally determined by the MSB of the counter:
u.direction_prediction (tab[u.index] >> 1);
The reason why we use a counter of two or more bits instead of just one bit is to make the pattern less sensitive to reduce misprediction. For example,
for( int i = 0; i < m; i++ )
{
for( int j = 0; j < n; j++ )
{
...
}
}
when the inner loop is executed for the next time, one bit counter will mispredict the branch while 2-bit counter can prevent it.
The next is how to find the correct pattern in the Pattern History Table.
The naive way is to use branch address as index. But it ignores the correlation between different branches. That is why Global Branch History is introduced (For more details, please refer to http://www.eecg.utoronto.ca/~moshovos/ACA06/readings/two-level-bpred.pdf).
In your code,
unsigned int history;
is the Branch History Register which stores the Global Branch History.
Then some guys have found that combining Global Branch History and Branch Address as index can lead to more accurate prediction than just using one of them. The reason is that both Global Branch History and Branch Address affect the branch pattern.
If ignoring one of them, different branch pattern may be hashed to the same position of the Pattern History Table, thus causing collision problem.
Before Gshare is proposed, there is a solution called Gselect, which uses concatenation of Global Branch History and Branch Address as index of Pattern History Table.
The solution provided by Gshare is the hash function of
index = branch_addr XOR branch_history
This is what exactly what the following code means:
u.index = (history << (TABLE_BITS - HISTORY_LENGTH)) ^ (b.address & ((1<<TABLE_BITS)-1));
Scott McFarling s paper provides a good example to show how Gshare works better than Gselect:
- Branch Address=1111_1111 Global Branch History=0000_0000
- Branch Address=1111_1111 Global Branch History=1000_0000
Assume that we use the following Gselect strategy to prevent bias:
index = { {branch_addr[7:4]}, {branch_history[3:0]} }
Then Gselect will produce 1111_0000 for both cases while Gshare can distinguish the different patterns.
As far as I know, Gshare turns out to be the best solution by far to remove the collision.