Lakehouse

Flink-Iceberg-Connector Write Process

October 10, 2022 · 1078 words · 6 min · Big Data Lakehouse Stream Compute Storage

The Iceberg community provides an official Flink connector, and the source code analysis in this chapter is based on it.

Overview of the Write Submission Process

Flink writes data through the pipeline RowData -> distributeStream -> WriterStream -> CommitterStream. Before a commit, data is stored in intermediate files. These files become visible to the system only after the commit, which writes the manifest, snapshot, and metadata files.
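The write-then-commit idea can be illustrated with a toy model. This is only a sketch, not the connector's code: `ToyTable`, `ToyWriter`, `ToyCommitter`, and `run_checkpoint` are hypothetical names, files are represented by their names only, and a "snapshot" is just a tuple of file names. It shows that files produced by the writers stay invisible until the committer publishes a new snapshot.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table: readers only see committed snapshots."""
    snapshots: list = field(default_factory=list)  # each snapshot: tuple of file names

    def visible_files(self):
        # Readers resolve the latest snapshot; uncommitted files are never listed.
        return list(self.snapshots[-1]) if self.snapshots else []


class ToyWriter:
    """Writer stage: buffers rows and flushes them into an intermediate file."""

    def __init__(self, writer_id):
        self.writer_id = writer_id
        self.rows = []

    def write(self, row):
        self.rows.append(row)

    def flush(self, checkpoint_id):
        # Close the current file and hand its name downstream (no real I/O here).
        if not self.rows:
            return []
        self.rows = []
        return [f"data-{self.writer_id}-{checkpoint_id}.parquet"]


class ToyCommitter:
    """Committer stage: collects pending files and publishes them as one snapshot."""

    def __init__(self, table):
        self.table = table
        self.pending = []

    def collect(self, files):
        self.pending.extend(files)

    def commit(self):
        if not self.pending:
            return
        # The new snapshot is the previously visible files plus the pending ones.
        self.table.snapshots.append(tuple(self.table.visible_files() + self.pending))
        self.pending = []


def run_checkpoint(rows, table, checkpoint_id=1, parallelism=2):
    """Distribute rows to writers, flush, then commit; returns (before, after) views."""
    writers = [ToyWriter(i) for i in range(parallelism)]
    committer = ToyCommitter(table)
    for i, row in enumerate(rows):
        writers[i % parallelism].write(row)  # distribute stage (round-robin here)
    for writer in writers:
        committer.collect(writer.flush(checkpoint_id))
    before = table.visible_files()  # files exist but are not yet visible
    committer.commit()
    return before, table.visible_files()
```

Calling `run_checkpoint(["a", "b", "c"], ToyTable())` returns an empty list for the view before the commit and the two flushed file names for the view after it. The real connector distributes, writes, and commits with far more machinery; the sketch only captures the visibility boundary.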

Apache-Iceberg Quick Investigation

October 5, 2022 · 1216 words · 6 min · Lakehouse Storage Big Data

A table format for analytics on large-scale datasets.
A specification for organizing data files and metadata files.
An abstraction of schema semantics that sits between storage and compute.
Developed and open-sourced by Netflix to enhance scalability, reliability, and usability.

Background

Issues encountered when migrating Hive to the cloud: