CPU False Sharing

May 2, 2021 · 566 words · 3 min · CPU Cache

The motivation for this post comes from an interview question I was asked: What is CPU false sharing? CPU Cache: Let’s start by discussing the CPU cache. The CPU cache is a storage layer introduced to bridge the speed gap between the CPU and main memory. In the pyramid-shaped storage hierarchy, it sits just below the CPU registers. Its capacity is much smaller than that of main memory, but it can operate at close to the processor’s clock speed.

MySQL Index Overview

March 21, 2021 · 516 words · 3 min · DB MySQL

Database indexes are sorted data structures in a DBMS that help query and update data quickly. Data structures commonly used to build indexes include B-trees, B+ trees, and hash tables. MySQL uses B+ trees to build indexes. The reason for this choice is that in a B+ tree only leaf nodes store data, while non-leaf nodes store only index keys, so each node can hold more keys, which keeps the tree shallow and reduces the number of disk reads per lookup.
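
As a rough illustration of why holding more keys per node matters, the sketch below estimates how many rows a B+ tree of a given height can address. The numbers are assumptions for illustration and do not come from the post: a 16 KB page (InnoDB's default page size), an 8-byte key plus a 6-byte child pointer in non-leaf nodes, and rows of about 1 KB in leaf nodes. The function name `estimate_rows` is hypothetical, and the estimate ignores page headers and fill factor.

```python
PAGE_SIZE = 16 * 1024   # bytes per node/page (assumed; InnoDB's default)
KEY_SIZE = 8            # bytes per index key, e.g. a BIGINT (assumed)
POINTER_SIZE = 6        # bytes per child pointer (assumed)
ROW_SIZE = 1024         # bytes per row stored in a leaf node (assumed)


def estimate_rows(height: int) -> int:
    """Estimate how many rows a B+ tree with `height` levels can address."""
    if height < 1:
        raise ValueError("height must be at least 1")
    # Non-leaf nodes hold only (key, pointer) pairs, so many fit in one page.
    fan_out = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)
    # Leaf nodes hold full rows, so far fewer entries fit in one page.
    rows_per_leaf = PAGE_SIZE // ROW_SIZE
    # height - 1 levels of non-leaf nodes, followed by one level of leaves.
    return fan_out ** (height - 1) * rows_per_leaf


if __name__ == "__main__":
    for h in (1, 2, 3):
        print(f"height {h}: about {estimate_rows(h):,} rows")
```

Under these assumptions a tree only three levels deep already addresses tens of millions of rows, which is the practical payoff of keeping row data out of the non-leaf nodes.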

HTTPS Introduction

February 21, 2021 · 564 words · 3 min · Network HTTPS HTTP

HTTPS (HTTP over SSL/TLS) was introduced to address the security weaknesses of HTTP, such as eavesdropping and identity spoofing. It uses SSL or TLS to encrypt communication between the client and the server. Problems with HTTP: (1) Communication uses plain text, making it susceptible to eavesdropping. (2) The identity of the communicating party cannot be verified, making it vulnerable to spoofing (e.g., Denial of Service attacks). (3) Message integrity cannot be guaranteed, making it possible for messages to be altered (e.

MIT6.824-MapReduce

January 22, 2021 · 1541 words · 8 min · MIT6.824 Distributed System Paper Reading

The third year of university has been quite intense, leaving me with little time to continue my studies on 6.824, so my progress stalled at Lab 1. With a bit more free time during the winter break, I decided to continue. Each paper or experiment will be recorded in this article. This is the first chapter of my Distributed System study notes. About the Paper: The core content of the paper is the MapReduce distributed computing model and an approach to implementing a distributed MapReduce system, including the Master data structure, fault tolerance, and some refinements.
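
To make the programming model concrete, here is a minimal single-process sketch of word count, the example the MapReduce paper itself uses. It only illustrates the map, group-by-key, and reduce steps; it has no Master, no workers, and no fault tolerance, and it is not the lab's implementation. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_fn(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: List[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return sum(counts)


def run_mapreduce(documents: List[str]) -> Dict[str, int]:
    """Run map, group intermediate pairs by key, then reduce each group."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for document in documents:
        for key, value in map_fn(document):
            intermediate[key].append(value)  # grouping stands in for the shuffle
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}


if __name__ == "__main__":
    print(run_mapreduce(["the quick fox", "The lazy dog"]))
```

In a real system the map and reduce calls would run on different machines, and the grouping step would move intermediate data across the network.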

Chinese Spam Email Classification Based on Naive Bayes

May 6, 2020 · 897 words · 2 min · ML

Training and Testing Data: This project primarily uses open-source data from GitHub. Data Processing: First, we use regular expressions to filter the content of the Chinese emails in the training set, removing all non-Chinese characters. The remaining content is then segmented into words with jieba, and stopwords are filtered out using a Chinese stopword list. The processed results for spam and normal emails