Build a Tokenizer for the Thai Language from Scratch

A step-by-step guide to building a Thai multilingual sub-word tokenizer based on a BPE algorithm trained on Thai and English datasets

By · · 1 min read

Source: towardsdatascience.com

A step-by-step guide to building a Thai multilingual sub-word tokenizer based on a BPE algorithm trained on Thai and English datasets