How to Implement SkipList
November 21, 2021 · 705 words · 4 min · DataStructure SkipList
Some time ago, I decided to implement a simple LSM storage engine model. As part of that, I implemented a basic SkipList and BloomFilter with BitSet. However, due to work demands and after-hours laziness, the project was put on hold. Now that I’m thinking about it again, I realize I’ve forgotten some of the details, so I’m writing it down for future reference.
What is SkipList?
SkipList is an ordered data structure that can be seen as an alternative to balanced trees. It essentially uses sparse indexing to accelerate searches in a linked list structure. It combines both data and index into a single structure, allowing efficient insertions and deletions.
Balanced trees, such as BST and Red-Black Trees, solve the problem of tree imbalance but introduce rotation, coloring, and other complexities. In concurrent scenarios, these can lead to larger lock granularity and affect performance. Compared to balanced trees, SkipList avoids these problems.
Implementation
SkipLists are usually implemented on top of ordered linked lists. The key challenge with ordered linked lists is figuring out how to insert a new node while maintaining order.
For arrays, binary search can be used to quickly locate the insert position, and then elements are moved to make room. For linked lists, the overhead of moving elements does not exist, but they do not support jumping directly to a position, making it challenging to locate where to insert.
The essence of SkipLists is to maintain a linked list of nodes with multiple layers of sparse indices that can be used to efficiently locate nodes.
In the base level, all nodes are present. On the next level, approximately every other node is present, and so on. This approach reduces the average time complexity for search, insertion, and deletion.
Using Randomization
To efficiently maintain these index nodes, randomization is used to decide whether a newly added node should be part of the index.
For a SkipList of maximum level
X
, each time a new node is added, we use a random approach to determine whether the node should be indexed at each level, with a probability of1/(2^n)
for each level. This results in a roughly equal distribution similar to dividing nodes into even partitions.
Data Structures
type SkipListNode struct {
data *codec.Entry
nextPtrs []*SkipListNode
}
type SkipList struct {
header, tail *SkipListNode
level int
size int
rwmtx *sync.RWMutex
maxSize int
}
Operations
The two most notable operations are search and insertion.
Search
The key step here is to use the sparse indices in the SkipList, moving from the top level downwards to efficiently locate the required position.
// findPreNode finds the node before the node with the given key
func (sl *SkipList) findPreNode(key []byte) (*SkipListNode, bool) {
// Start from the highest level
h := sl.header
for i := sl.level - 1; i >= 0; i-- {
for h.nextPtrs[i] != nil && bytes.Compare(h.nextPtrs[i].data.Key, key) != 1 {
if bytes.Equal(h.nextPtrs[i].data.Key, key) {
return h, true
}
h = h.nextPtrs[i]
}
}
return nil, false
}
Insertion
First, locate the position to insert, then perform the insertion, and finally add indices as determined by randomization.
func (sl *SkipList) randomLevel() int {
ans := 1
rand.Seed(time.Now().Unix())
for rand.Intn(2) == 0 && ans <= defaultMaxLevel {
ans++
}
return ans
}
func (sl *SkipList) Insert(data *codec.Entry) *SkipListNode {
sl.rwmtx.Lock()
defer sl.rwmtx.Unlock()
h := sl.header
updateNode := make([]*SkipListNode, defaultMaxLevel) // stores the node before newNode
// Search from the top level
for i := sl.level - 1; i >= 0; i-- {
// Loop while the current nextPtrs is not empty and data is smaller than the inserted one
for h.nextPtrs[i] != nil && h.nextPtrs[i].data.Less(data) {
h = h.nextPtrs[i]
}
updateNode[i] = h
}
// Choose the level to insert
lvl := sl.randomLevel()
if lvl > sl.level {
// Insert into higher levels, we need to create header -> tail
for i := sl.level; i < lvl; i++ {
updateNode[i] = sl.header
}
sl.level = lvl
}
// Insert after the updated node
n := NewSkipListNode(lvl, data)
for i := 0; i < lvl; i++ {
n.nextPtrs[i] = updateNode[i].nextPtrs[i]
updateNode[i].nextPtrs[i] = n
}
sl.size++
return n
}