We’ve been working on how to cluster texts, a technique that is useful for FAQs and for text classification when there isn’t enough data, time or computational power to fine-tune a Transformer model.
In the first part (available here) we saw how to get from texts to numerical vectors through tokenization and embedding. Now that our texts are in a machine-readable format, we can tackle our main objective: clustering them.
Clustering is one of the most useful unsupervised learning tasks out there, so a number of different ways to do it have been developed over the years. We’ll briefly illustrate the ones we’ve tried…
Classifying things comes quite naturally to us: our books, movies and music all have genres; the things we study are split into different subjects; and even the food we eat belongs to different cuisines!
In recent years we’ve been able to develop better and better algorithms to classify text: models like BERT-ITPT-FiT (BERT + withIn-Task Pre-Training + Fine-Tuning) or XLNet seem to be the reigning champions in this category, at least on the 29 benchmark datasets available on PapersWithCode.
Natural Language Processing (NLP) is a wonderfully complex field, composed of two main branches: Natural Language Understanding (NLU) and Natural Language Generation (NLG). If we were talking about a kid learning English, we’d simply call them reading and writing. It’s an exciting time to work in NLP: the introduction of Transformer models in 2017 drastically improved performance, and the release of the seemingly all-powerful GPT-3 earlier this year has brought along a wave of excitement. We’ll talk more about those later.
Stories have always been part of human nature: we’ve been telling stories to children for as long as we’ve been around, to teach them about life, to warn them about dangers, to build up their character without risking their safety.
Stories are extremely important, and books are one of the main means by which they are told. With books, the reader identifies with the characters and feels their emotions. We want readers to connect with the book as deeply as possible, and we want both the story and the illustrations to reflect who they are, personally: imagine a book…
Data Science intern at Digitiamo. Passionate about Deep Learning.