Chinese Spam Email Classification Based on Naive Bayes

May 6, 2020 · 897 words · 2 min · ML

Training and Testing Data

This project primarily uses open-source data on GitHub.

Data Processing

First, we use regular expressions to filter the content of the Chinese emails in the training set, removing all non-Chinese characters. The remaining content is then segmented into words with jieba, and stopwords are removed using a Chinese stopword list. The processed results for spam and normal emails
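
The preprocessing steps described above (regex filtering, word segmentation, stopword removal) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `keep_chinese` and `preprocess`, the `tokenize` parameter, and the choice of the `\u4e00-\u9fff` range to mean "Chinese characters" are all assumptions made for the example. The only jieba call used is `jieba.lcut`, which returns a list of segmented words.

```python
import re

# Assumption: "Chinese characters" means the basic CJK Unified Ideographs
# block (U+4E00 to U+9FFF). Everything outside it is stripped.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def keep_chinese(text):
    """Remove all non-Chinese characters (letters, digits, punctuation, spaces)."""
    return NON_CHINESE.sub("", text)


def preprocess(text, stopwords, tokenize=None):
    """Filter to Chinese characters, segment into words, and drop stopwords.

    `stopwords` is any container of words to discard (a set is fastest).
    `tokenize` is a hypothetical hook for swapping the segmenter; when it is
    omitted, jieba.lcut is used.
    """
    if tokenize is None:
        import jieba  # third-party package: pip install jieba
        tokenize = jieba.lcut
    cleaned = keep_chinese(text)
    return [token for token in tokenize(cleaned) if token not in stopwords]
```

Calling `preprocess(email_text, stopwords)` with no `tokenize` argument segments the text with jieba. Here `stopwords` would be a set loaded from whichever Chinese stopword list is in use, one word per line in most published lists.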