Build a Tokenizer for the Thai Language from Scratch | Towards Data Science

A step-by-step guide to building a multilingual sub-word tokenizer for Thai, based on the BPE algorithm and trained on Thai and English datasets

Source: Towards Data Science