An alternative tuple-storage engine for Casbin Mesh / Casbin — GSOC 2022 Proposal

About me

Basic Infomation

Open Source Experience

I used to make contribution on those open source projects:

  • MaxtrixOne : Hyperconverged cloud-edge native database

  • flame-go: cache : a middleware that provides the cache management for Flamego

  • flame-go: session : a middleware that provides the session management for Flamego

  • casnode : An open-source forum (BBS) software developed by Go and React

  • Toys : Toys written by myself.

Other Information

  • Currently, I am learning mit6.824 and cmu15-445 courses and have finished MapReduce and Raft Lab. I have basic concepts of page layout, indexing (hash index, B+ tree index), and multi-version concurrency control.

  • Used to work in the Business Department and the Distributed Storage Department in Bytebance as an internship.

Problem Description

Currently, Casbin uses golang built-in map structure to maintain policies in the main memory and persist the policies via adapter abstraction.

If policies data grows, however, the growing cost of main memory resources and bad performance make the memory management strategy not tolerable anymore. We need to find a better way to manage the casbin in-memory data when data grows.

Implementation Plan

Breif Design

From my point of view, our main goal is to reduce the cost of memory as well as keep good performance handling policies read and write requests.

In order to achieve those key goals, we can introduce an experimental tuple storage to get charge of storing those policies, turning the policies management strategy from memory-oriented to disk-oriented. We can even make a better abstraction of the storage layer so that we can use different engines (row, column) for the different workloads.

In general, we can take the following parts into consideration:

  • API for upper layer

    Design API for the upper level.

    The API is the key design if we want to make the storage engine a plugin.

    Derived from adapter will be fine.

  • workload optimizer

    Try to optimize those workloads to improve performance.

    Key design:

    • Estimation of requests cost

    • Strategy to reconstruct data access path.

  • Buffer Pool management

    In-memory data structure management.

    Key design: replace strategy

  • Indexing

    Index to accelerate read and write requests. Key design is what index we should use, and how to build indexes for policies to improve performance.

    Considering the casbin’s common workload, we can provide b+tree and hash structure for indexing.

  • Data Storage Structures

    Key design:

    • File organization. B+tree sequence file organization.

    • Page and tuple layout. Our policies data is actually varchar so our tuples are variable-length records. And we can use slotted-page structure to organize records.

    • encoding. Row-based or Column-based.

  • Transaction if necessary

    we can use mvcc to improve performance.

Reference & Resource

Codebase

  • Bustdb codebase

  • MIT-6.830 Simpledb Codebase

  • Risinglight DB

  • Badger DB

Paper

  • An Empirical Evaluation of In-Memory Multi-Version Concurrency Control

  • Column-Stores vs. Row-Stores: How Different Are They Really?

Other

  • Database System Concepts

  • CMU-15445 DB Course

  • CMU-15721 DB Course

Timeline

Before the official coding time

May 1 - May 23

  • Learn more about Casbin source code and Casbin Community and try to solve some basic issues on the codebase.

  • Have a discussion with the mentor to determine what feature we need to add and make a basic design overview of the project.

  • Do research about how to implement our project best and write a detailed design document about it.

May 24 - June 14

  • Carefully design and write the basic framework of our whole project.

  • Write UT for framework code.

Official coding period starts

June 15 - June 28

  • Write code about data encoding and page tuple layout (disk manager)

  • Write UT for data storage layer

June 29 - July 13

  • Implement a module about buffer pool and Index management.

  • Write UT for the buffer pool and indexing.

July 14 - July 28

  • Implement the API and workload optimizer for the upper layer.

  • Optimized for workload from the upper level.

July 29 - August 5

  • Implement Transaction Module

  • Write UT for transaction part

August 6 - August 13

  • Polish our project

  • CI integration and document work. Polish the document finished

Extra Time

  • A buffer kept for any unpredictable delay

Deliverables

A Casbin built-in embedded disk-oriented tuple storage engine.

The Engine should contain :

  • Carefully designed API for upper Casbin internal module.

  • Storage management. Include file organization and page layout.

  • Buffer pool management.

  • Index management.

  • An workload optimizer for the upper layer.

  • Transaction part if necessary.