Chinese Spam Email Classification Based on Naive Bayes
May 6, 2020 · 897 words · 2 min · ML
Training and Testing Data

This project primarily uses open-source data from GitHub.

Data Processing

First, we use regular expressions to filter the content of the Chinese emails in the training set, removing all non-Chinese characters. The remaining text is then segmented into words with jieba, and stopwords are removed using a Chinese stopword list. The processed results for spam and normal emails