Apache-Iceberg Quick Investigation

October 5, 2022 · 1208 words · 6 min · Lake House Storage Big Data

Apache Iceberg is a table format for large-scale analytic datasets: a specification for organizing data files and metadata files, and a schema abstraction layer between storage and compute. It was developed and open-sourced by Netflix to improve scalability, reliability, and usability. Background: issues encountered when migrating Hive to the cloud. (1) Hive depends on list and rename semantics, which makes it impossible to replace HDFS with cheaper object storage (OSS). (2) Scalability: schema information in Hive is stored centrally in the metastore, which can become a performance bottleneck.

LevelDB Write

May 10, 2022 · 712 words · 4 min · LSM LevelDB

This is the second chapter of my notes on reading the LevelDB source code, focusing on the write flow of LevelDB. This article is not a step-by-step source code tutorial, but rather a learning note that records my questions and thoughts. Main Process: The main write logic of LevelDB is relatively simple. First, the write operation is encapsulated into a WriteBatch, and then the batch is executed. Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) { WriteBatch batch; batch.

MIT6.824-RaftKV

April 15, 2022 · 1039 words · 5 min · Raft Distributed System Consensus MIT6.824

Earlier, I read the code of Casbin-Mesh because I wanted to apply for GSoC. Casbin-Mesh is a distributed Casbin application based on Raft. The RaftKV lab in MIT 6.824 is quite similar, so I took the opportunity to write this post. Lab Overview: Lab 03 involves building a distributed KV service based on Raft. We need to implement both the server and the client for this service. The structure of RaftKV and the interaction between its modules are shown below:

LevelDB Startup

April 9, 2022 · 1312 words · 7 min · LSM LevelDB

This is the first chapter of my notes on reading the LevelDB source code, focusing on the startup process of LevelDB. This article is not a step-by-step source code tutorial, but rather a learning note that records my questions and thoughts. A code repository with annotations will be shared on GitHub later for those interested in studying it. Prerequisites: Database Files. For now, I won’t delve into the encoding and naming details of these files (as I haven’t reached that part yet).

MIT6.824-Raft

February 21, 2022 · 953 words · 5 min · Paper Reading Consensus Distributed System MIT6.824

I finally managed to complete Lab 02 during this winter break, after it had been on hold for quite some time. I was stuck on one of the cases in Test 2B for a while. Over the break I studied implementations by more experienced developers and completed all the tasks, so I decided to document them briefly. Algorithm Overview: The basis of consensus algorithms is the replicated state machine: if every replica executes the same deterministic commands in the same order, all replicas eventually reach the same state.
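
The replicated state machine idea can be illustrated with a minimal C++ sketch. It is not the lab's code (the lab itself is written in Go): the `Command`, `KVStateMachine`, and `replay` names are hypothetical, and the "state machine" is just an in-memory key-value map. It only shows that replaying the same log in the same order on independent replicas yields identical state, which is the property Raft's log replication is meant to preserve.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical command type for illustration: a deterministic "set key to value".
struct Command {
    std::string key;
    std::string value;
};

// A toy deterministic state machine: applying a command overwrites the key.
class KVStateMachine {
public:
    void apply(const Command& cmd) { state_[cmd.key] = cmd.value; }
    const std::map<std::string, std::string>& state() const { return state_; }

private:
    std::map<std::string, std::string> state_;
};

// Replays a log, in order, on a fresh replica and returns its final state.
std::map<std::string, std::string> replay(const std::vector<Command>& log) {
    KVStateMachine replica;
    for (const auto& cmd : log) {
        replica.apply(cmd);
    }
    return replica.state();
}
```

Two replicas that call `replay` with the same log end up with equal maps, while a replica that applies the same commands in a different order may not, which is why the consensus layer has to agree on the order of log entries and not just on their contents.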