Build a Tokenizer for the Thai Language from Scratch
A step-by-step guide to building a Thai multilingual sub-word tokenizer based on a BPE algorithm trained on Thai and English datasets
Source: towardsdatascience.com
A step-by-step guide to building a Thai multilingual sub-word tokenizer based on a BPE algorithm trained on Thai and English datasets