An alternative tuple-storage engine for Casbin Mesh / Casbin — GSOC 2022 Proposal

About me

Basic Infomation

First / Last Name: Xie Kai
Email: chenlan2333@gmail.com
QQ : 1633849228
School/University: Beijing University of Posts and Telecommunications
Graduation Date: July, 2022
Major/Focus: Software Engineering
Location: Beijing, China
Timezone: China Standard Time (CST), UTC +8
Github Profile: https://github.com/noneback
Personal Blog: http://noneback.github.io

Open Source Experience

I used to make contribution on those open source projects:

MaxtrixOne : Hyperconverged cloud-edge native database
flame-go: cache : a middleware that provides the cache management for Flamego
flame-go: session : a middleware that provides the session management for Flamego
casnode : An open-source forum (BBS) software developed by Go and React
Toys : Toys written by myself.

Other Information

Currently, I am learning mit6.824 and cmu15-445 courses and have finished MapReduce and Raft Lab. I have basic concepts of page layout, indexing (hash index, B+ tree index), and multi-version concurrency control.
Used to work in the Business Department and the Distributed Storage Department in Bytebance as an internship.

Problem Description

Currently, Casbin uses golang built-in map structure to maintain policies in the main memory and persist the policies via adapter abstraction.

If policies data grows, however, the growing cost of main memory resources and bad performance make the memory management strategy not tolerable anymore. We need to find a better way to manage the casbin in-memory data when data grows.

Implementation Plan

Breif Design

From my point of view, our main goal is to reduce the cost of memory as well as keep good performance handling policies read and write requests.

In order to achieve those key goals, we can introduce an experimental tuple storage to get charge of storing those policies, turning the policies management strategy from memory-oriented to disk-oriented. We can even make a better abstraction of the storage layer so that we can use different engines (row, column) for the different workloads.

In general, we can take the following parts into consideration:

API for upper layer

Design API for the upper level.

The API is the key design if we want to make the storage engine a plugin.

Derived from adapter will be fine.
workload optimizer
Try to optimize those workloads to improve performance.

Key design:
- Estimation of requests cost
- Strategy to reconstruct data access path.
Buffer Pool management

In-memory data structure management.

Key design: replace strategy
Indexing

Index to accelerate read and write requests. Key design is what index we should use, and how to build indexes for policies to improve performance.

Considering the casbin’s common workload, we can provide b+tree and hash structure for indexing.
Data Storage Structures
Key design:
- File organization. B+tree sequence file organization.
- Page and tuple layout. Our policies data is actually varchar so our tuples are variable-length records. And we can use slotted-page structure to organize records.
- encoding. Row-based or Column-based.
Transaction if necessary

we can use mvcc to improve performance.

Reference & Resource

Codebase

Bustdb codebase
MIT-6.830 Simpledb Codebase
Risinglight DB
Badger DB

Paper

An Empirical Evaluation of In-Memory Multi-Version Concurrency Control
Column-Stores vs. Row-Stores: How Different Are They Really?

Other

Database System Concepts
CMU-15445 DB Course
CMU-15721 DB Course

Timeline

Before the official coding time

May 1 - May 23

Learn more about Casbin source code and Casbin Community and try to solve some basic issues on the codebase.
Have a discussion with the mentor to determine what feature we need to add and make a basic design overview of the project.
Do research about how to implement our project best and write a detailed design document about it.

May 24 - June 14

Carefully design and write the basic framework of our whole project.
Write UT for framework code.

Official coding period starts

June 15 - June 28

Write code about data encoding and page tuple layout (disk manager)
Write UT for data storage layer

June 29 - July 13

Implement a module about buffer pool and Index management.
Write UT for the buffer pool and indexing.

July 14 - July 28

Implement the API and workload optimizer for the upper layer.
Optimized for workload from the upper level.

July 29 - August 5

Implement Transaction Module
Write UT for transaction part

August 6 - August 13

Polish our project
CI integration and document work. Polish the document finished

Extra Time

A buffer kept for any unpredictable delay

Deliverables

A Casbin built-in embedded disk-oriented tuple storage engine.

The Engine should contain ：

Carefully designed API for upper Casbin internal module.
Storage management. Include file organization and page layout.
Buffer pool management.
Index management.
An workload optimizer for the upper layer.
Transaction part if necessary.