In this article I’ll teach you how to build a text classification model from scratch. You’ll enter some text from a language, and the app will identify which language it comes from (English, Spanish, Vietnamese, etc). All it’ll take you to get started is a rudimentary knowledge of Python, the command line, and Git.
Completing this project will give you a good sense of the “full stack” of technical concepts that data scientists encounter: data acquisition, data analysis, modeling, visualization, app development, etc.
This project can roughly be divided into three steps. The first involves downloading data from Wikipedia and saving it to a database. The second step is building a model that predicts a language when given some text. The third step is deploying the app to the web.