MIT6.824 Bigtable

September 16, 2021 · 1908 words · 9 min · Paper Reading MIT6.824 DFS Distributed System

I recently found a translated version of the Bigtable paper online and saved it, but hadn’t gotten around to reading it. Lately, I’ve noticed that Bigtable shares many design similarities with a current project in our group, so I took some time over the weekend to read through it. This is the last of Google’s three foundational distributed system papers, and although it wasn’t originally part of the MIT6.824 reading list, I’ve categorized it here for consistency.

MIT6.824 GFS

September 9, 2021 · 1121 words · 6 min · GFS MIT6.824 Paper Reading

This article introduces the Google File System (GFS) paper published in 2003, which proposed a distributed file system designed to store large volumes of data reliably, meeting Google’s data storage needs. This write-up reflects on the design goals, trade-offs, and architectural choices of GFS.

Introduction

GFS is a distributed file system developed by Google to meet the needs of data-intensive applications, using commodity hardware to provide a scalable and fault-tolerant solution.

Epoll and IO Multiplexing

August 15, 2021 · 834 words · 4 min · OS Linux Network IO

Let’s start with epoll. epoll is an I/O event notification mechanism in the Linux kernel, designed to replace select and poll. It handles large numbers of file descriptors efficiently and can monitor as many as the system’s open-file limit allows.

Usage

API

epoll has three primary system calls:

/** epoll_create
 * Creates an epoll instance and returns a file descriptor for it.
 * Needs to be closed afterward, as epfd also consumes the system's fd resources.
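As a rough illustration of the create / register / wait cycle behind those three calls (epoll_create, epoll_ctl, epoll_wait), here is a minimal sketch using Python's `select.epoll` wrapper, which is only available on Linux. The pipe and the helper name `wait_for_readable` are assumptions made for the demo and are not taken from the post.

```python
import os
import select


def wait_for_readable(timeout=1.0):
    """Write to a pipe and use epoll to detect that its read end is ready.

    Hypothetical helper for illustration; Linux only.
    """
    read_fd, write_fd = os.pipe()
    ep = select.epoll()  # wraps epoll_create: returns an epoll instance
    try:
        # Wraps epoll_ctl(EPOLL_CTL_ADD): watch read_fd for readability.
        ep.register(read_fd, select.EPOLLIN)

        os.write(write_fd, b"ping")

        # Wraps epoll_wait: blocks until an event arrives or the timeout expires.
        events = ep.poll(timeout)
        ready = [fd for fd, mask in events if mask & select.EPOLLIN]

        return os.read(read_fd, 4) if read_fd in ready else b""
    finally:
        # The epoll instance is itself a file descriptor and must be closed.
        ep.close()
        os.close(read_fd)
        os.close(write_fd)
```

Calling `wait_for_readable()` returns the bytes read once epoll reports the pipe as readable. The `finally` block mirrors the note above that the epoll descriptor also consumes an fd and has to be released.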

Linux Cgroups Overview

June 8, 2021 · 545 words · 3 min · Linux Docker Container

Linux Cgroups (Control Groups) provide the ability to limit, control, and monitor the resources used by a group of processes and their future child processes. These resources include CPU, memory, storage, and network. With Cgroups, it’s easy to limit a process’s resource usage and monitor its metrics in real time.

Three Components of Cgroups

cgroup

A mechanism for managing groups of processes. A cgroup contains a set of processes, and parameters for the various Linux subsystems can be configured on it, which associates that set of processes with a set of subsystem parameters.
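One way to see this process-to-cgroup association on a running system is the `/proc/<pid>/cgroup` file, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`. The post does not mention this file, so treat the following as a supplementary sketch; the function name `parse_proc_cgroup` and the sample text are hypothetical.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup into a list of dicts.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    On cgroup v2 the controller list is empty, e.g. "0::/user.slice".
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice, since the cgroup path may itself contain ":".
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": controllers.split(",") if controllers else [],
            "path": path,
        })
    return entries


# Made-up sample input, not output captured from a real machine.
SAMPLE = "12:cpu,cpuacct:/docker/abc\n0::/user.slice\n"
```

On a Linux machine, passing the contents of `/proc/self/cgroup` to `parse_proc_cgroup` lists which cgroup the current process belongs to in each hierarchy.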

Distributed Transactions

May 20, 2021 · 1097 words · 6 min · Architecture Microservice Transaction

Transactions and Distributed Transactions

Transactions

A transaction is a logical unit of work in a database, composed of a finite sequence of database operations. The database must ensure that the transaction is atomic: if it succeeds, every operation in it has been fully executed; if it fails, all SQL operations already executed are rolled back. A single-node database transaction has four main properties:

Atomicity: The transaction is executed as a whole.
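The all-or-nothing behavior described here can be sketched with Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are assumptions made up for the demo; the post itself does not prescribe a schema or a database.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # The debit has already run; raising undoes it via rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer that would overdraw the source account returns `False` and leaves both balances untouched, even though the debit statement ran before the failure was detected.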