The lookup engine receives a 32-bit slice out of the hash and checks whether it is present in the set of interesting values. Since the size of the interesting hash set is on the order of hundreds, a direct CAM-style implementation (comparing all values in the set at once) is not feasible.
Instead, a tree-based structure was chosen. The input word is split into groups of bits (10, 1, 2, 2, 17). Those bits index subsequent levels of the tree, the values read from previous levels acting as base pointers for those lookups.
Since a direct implementation leads to a significant amount of wasted space, blocks of pointers containing only one "hit" are compressed into single entries. The parent entry in that case contains a select value for the next block, which allows it to specify whether a hit happened or not. Compressed Level4 entries are not even allocated - the mere existence of a Level3 pointer implies a compressed Level4 entry would exist, so it would carry no information.
Level4 structure is different than the previous levels. Each entry is a 2-way associative cache with 16 cachelines. Pairs of cachelines (which are associative between them) are indexed by the 3 LSBs of the address.
Another possible implementation would be a binary tree. It'd consume both ports on each of 5 blocks, though, so a write would have to block the lookup engine.
(c) 2007 Stanislaw Skowronek
Contact me: nsa unaligned org (figure out where to put the @ and .)