How to Implement SkipList

November 21, 2021 · 705 words · 4 min · DataStructure SkipList

Some time ago, I decided to implement a simple LSM storage engine model. As part of that, I implemented a basic SkipList and BloomFilter with BitSet. However, due to work demands and after-hours laziness, the project was put on hold. Now that I’m thinking about it again, I realize I’ve forgotten some of the details, so I’m writing it down for future reference.

What is SkipList?

SkipList is an ordered data structure that can be seen as an alternative to balanced trees. It essentially uses sparse indexing to accelerate searches in a linked list structure. It combines both data and index into a single structure, allowing efficient insertions and deletions.

Balanced trees, such as BST and Red-Black Trees, solve the problem of tree imbalance but introduce rotation, coloring, and other complexities. In concurrent scenarios, these can lead to larger lock granularity and affect performance. Compared to balanced trees, SkipList avoids these problems.

Implementation

SkipLists are usually implemented on top of ordered linked lists. The key challenge with ordered linked lists is figuring out how to insert a new node while maintaining order.

For arrays, binary search can be used to quickly locate the insert position, and then elements are moved to make room. For linked lists, the overhead of moving elements does not exist, but they do not support jumping directly to a position, making it challenging to locate where to insert.

The essence of SkipLists is to maintain a linked list of nodes with multiple layers of sparse indices that can be used to efficiently locate nodes.

In the base level, all nodes are present. On the next level, approximately every other node is present, and so on. This approach reduces the average time complexity for search, insertion, and deletion.

Using Randomization

To efficiently maintain these index nodes, randomization is used to decide whether a newly added node should be part of the index.

For a SkipList of maximum level X, each time a new node is added, we use a random approach to determine whether the node should be indexed at each level, with a probability of 1/(2^n) for each level. This results in a roughly equal distribution similar to dividing nodes into even partitions.

Data Structures

type SkipListNode struct {
    data *codec.Entry
    nextPtrs []*SkipListNode
}

type SkipList struct {
    header, tail *SkipListNode
    level        int
    size         int
    rwmtx        *sync.RWMutex
    maxSize      int
}

Operations

The two most notable operations are search and insertion.

Search

The key step here is to use the sparse indices in the SkipList, moving from the top level downwards to efficiently locate the required position.

// findPreNode finds the node before the node with the given key
func (sl *SkipList) findPreNode(key []byte) (*SkipListNode, bool) {
    // Start from the highest level
    h := sl.header
    for i := sl.level - 1; i >= 0; i-- {
        for h.nextPtrs[i] != nil && bytes.Compare(h.nextPtrs[i].data.Key, key) != 1 {
            if bytes.Equal(h.nextPtrs[i].data.Key, key) {
                return h, true
            }
            h = h.nextPtrs[i]
        }
    }
    return nil, false
}

Insertion

First, locate the position to insert, then perform the insertion, and finally add indices as determined by randomization.

func (sl *SkipList) randomLevel() int {
    ans := 1
    rand.Seed(time.Now().Unix())
    for rand.Intn(2) == 0 && ans <= defaultMaxLevel {
        ans++
    }
    return ans
}

func (sl *SkipList) Insert(data *codec.Entry) *SkipListNode {
    sl.rwmtx.Lock()
    defer sl.rwmtx.Unlock()

    h := sl.header
    updateNode := make([]*SkipListNode, defaultMaxLevel) // stores the node before newNode
    // Search from the top level
    for i := sl.level - 1; i >= 0; i-- {
        // Loop while the current nextPtrs is not empty and data is smaller than the inserted one
        for h.nextPtrs[i] != nil && h.nextPtrs[i].data.Less(data) {
            h = h.nextPtrs[i]
        }
        updateNode[i] = h
    }
    // Choose the level to insert
    lvl := sl.randomLevel()
    if lvl > sl.level {
        // Insert into higher levels, we need to create header -> tail
        for i := sl.level; i < lvl; i++ {
            updateNode[i] = sl.header
        }
        sl.level = lvl
    }
    // Insert after the updated node
    n := NewSkipListNode(lvl, data)
    for i := 0; i < lvl; i++ {
        n.nextPtrs[i] = updateNode[i].nextPtrs[i]
        updateNode[i].nextPtrs[i] = n
    }
    sl.size++
    return n
}

References

Arch + DWM Setup Attempt Kylin Overview