MIT6.824 Chain Replication

February 8, 2023 · 463 words · 3 min · Distributed System MIT6.824 ChainReplication

This post provides a brief overview of the Chain Replication (CR) paper, which introduces a simple but effective algorithm for providing linearizable consistency in storage services; for the detailed design, it is best to refer directly to the original paper. In short, CR is a replicated state machine algorithm designed for storage services that require linearizability: it arranges replicas in a chain to improve throughput and relies on the multiple replicas to keep the service available.
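The chain mechanism the excerpt summarizes can be sketched as a toy in-memory chain (class and method names here are hypothetical, not from the paper): writes enter at the head and propagate down the chain, while reads are served by the tail, which only ever sees fully propagated updates.

```python
# Minimal sketch of Chain Replication, assuming a head -> middle -> tail chain.

class ChainNode:
    def __init__(self, successor=None):
        self.store = {}
        self.successor = successor

    def write(self, key, value):
        self.store[key] = value        # apply locally
        if self.successor:             # then forward down the chain
            self.successor.write(key, value)
        # the update reaching the tail acts as the acknowledgement point

    def read(self, key):
        return self.store.get(key)

tail = ChainNode()
middle = ChainNode(successor=tail)
head = ChainNode(successor=middle)

head.write("x", 1)   # clients send writes to the head
tail.read("x")       # clients send reads to the tail
```

Because the tail only returns values that have traversed the whole chain, every read observes a committed write, which is where the linearizability argument comes from.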

MIT6.824-ZooKeeper

January 3, 2023 · 399 words · 2 min · Distributed System MIT6.824 ZooKeeper

This article mainly discusses the design and practical considerations of the ZooKeeper system, such as its wait-free and lock mechanisms, consistency choices, the APIs the system provides, and specific semantic decisions; these trade-offs are the paper's most insightful aspects. ZooKeeper positions itself as a wait-free, high-performance coordination service for distributed applications: it supports their coordination needs by providing coordination primitives (specific APIs and a data model). Two key phrases capture this positioning: high performance and distributed-application coordination service.
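The lock mechanism mentioned above is typically built from ZooKeeper's ephemeral sequential znodes. The sketch below is a hypothetical in-memory simulation (not the real client API): each client creates a sequential node under a lock path, the lowest sequence number owns the lock, and each waiter watches only its immediate predecessor, avoiding a thundering herd.

```python
# Toy simulation of ZooKeeper's lock recipe; FakeZk and its methods are
# invented names standing in for create/getChildren/delete on ephemeral
# sequential znodes.
import itertools

class FakeZk:
    def __init__(self):
        self.counter = itertools.count()
        self.children = []               # live znode names under /lock

    def create_sequential(self, prefix="lock-"):
        name = f"{prefix}{next(self.counter):010d}"
        self.children.append(name)
        return name

    def holds_lock(self, name):
        return min(self.children) == name    # lowest sequence owns the lock

    def predecessor(self, name):
        below = sorted(n for n in self.children if n < name)
        return below[-1] if below else None  # the one znode to watch

    def delete(self, name):
        self.children.remove(name)           # ephemeral node disappears

zk = FakeZk()
a = zk.create_sequential()   # first client: owns the lock
b = zk.create_sequential()   # second client: watches a, not the whole set
zk.delete(a)                 # a releases (or its session expires)
```

After `a` goes away, `b` becomes the lowest node and acquires the lock without any other client being woken up.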

Flink-Iceberg-Connector Write Process

October 10, 2022 · 1056 words · 5 min · Big Data Lake House Stream Compute Storage

The Iceberg community provides an official Flink Connector, and this post's source-code analysis is based on it. Flink writes data through the chain RowData -> distributeStream -> WriterStream -> CommitterStream. Before data is committed it is stored as intermediate files, which become visible to the system only after the commit (which writes the manifest, snapshot, and metadata files). private <T> DataStreamSink<T> chainIcebergOperators() { Preconditions.checkArgument(inputCreator != null, "Please use forRowData() or forMapperOutputType() to initialize the input DataStream.
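The writer/committer split described above can be modeled as a toy snapshot table (all names here are illustrative, not Iceberg's API): writers stage data files, and only the committer's new snapshot makes them readable, mirroring how the manifest/snapshot/metadata commit flips visibility.

```python
# Toy model of write-then-commit visibility in an Iceberg-style table.

class Table:
    def __init__(self):
        self.snapshots = [set()]   # committed snapshots (sets of data files)
        self.staged = set()        # files written but not yet committed

    def write(self, data_file):
        self.staged.add(data_file)          # writer emits an intermediate file

    def commit(self):
        # committer publishes a new snapshot; only now do staged files appear
        self.snapshots.append(self.snapshots[-1] | self.staged)
        self.staged = set()

    def visible_files(self):
        return self.snapshots[-1]           # readers see the latest snapshot

t = Table()
t.write("data-00001.parquet")   # invisible: not committed yet
t.commit()                      # visible from this point on
```

The point of the split is atomicity: a reader either sees the old snapshot or the new one, never a half-written set of files.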

Apache-ORC Quick Investigation

October 5, 2022 · 565 words · 3 min · Column Store Big Data Storage

Iceberg supports both the ORC and Parquet columnar formats. Compared to Parquet, ORC offers advantages in query performance and ACID support. Given the data lakehouse's future requirements for query performance and ACID compliance, we are researching ORC to support an upcoming demo involving Flink, Iceberg, and ORC. The research focuses on ORC's file encoding, file organization, and indexing support. An ORC file can be divided into three main sections; the Header identifies the file type.
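The file organization the excerpt begins to describe can be summarized, per the ORC specification, as a 3-byte "ORC" magic header, a body of stripes (each holding index data, row data, and a stripe footer), and a tail containing the file footer and postscript. A simplified structural model:

```python
# Simplified structural model of an ORC file (a sketch of the layout,
# not a real reader/writer).
from dataclasses import dataclass, field

@dataclass
class Stripe:
    index_data: bytes   # per-column statistics/indexes for row-group skipping
    row_data: bytes     # the column-encoded values themselves
    footer: bytes       # stream locations and column encodings

@dataclass
class OrcFile:
    magic: bytes = b"ORC"                      # header: identifies file type
    stripes: list = field(default_factory=list)
    file_footer: bytes = b""                   # schema, stripe list, stats
    postscript: bytes = b""                    # compression, footer length

f = OrcFile(stripes=[Stripe(b"idx", b"rows", b"sf")])
```

Readers work backwards from the end of the file: the last byte gives the postscript length, the postscript locates the footer, and the footer locates the stripes, which is what makes ORC's indexing cheap to consult.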

Apache-Iceberg Quick Investigation

October 5, 2022 · 1208 words · 6 min · Lake House Storage Big Data

Iceberg is a table format for large-scale analysis of datasets: a specification for organizing data files and metadata files, and a schema-semantics abstraction between storage and computation, developed and open-sourced by Netflix to improve scalability, reliability, and usability. The post starts from issues encountered when migrating Hive to the cloud: Hive's dependency on List and Rename semantics makes it impossible to replace HDFS with cheaper OSS, and its schema information is centrally stored in the metastore, which can become a performance bottleneck.
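The List-semantics problem above is exactly what Iceberg's metadata hierarchy avoids: every data file is recorded explicitly (table metadata -> snapshot -> manifests -> data files), so scan planning never lists a directory on object storage. A rough sketch with made-up dictionary shapes (not the real metadata schema):

```python
# Toy model: Iceberg-style scan planning walks metadata only,
# never issuing a directory List against the object store.

manifest = {"data_files": ["s3://bucket/t/data/f1.parquet",
                           "s3://bucket/t/data/f2.parquet"]}
snapshot = {"manifests": [manifest]}
table_metadata = {"current_snapshot": snapshot, "schema": {"id": "long"}}

def plan_files(metadata):
    # every file to scan is named in the metadata tree itself
    snap = metadata["current_snapshot"]
    return [f for m in snap["manifests"] for f in m["data_files"]]

files = plan_files(table_metadata)
```

Because the file list lives in metadata, swapping HDFS for an object store like OSS costs nothing at planning time, and schema information travels with the table rather than a central metastore.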