Natural Language Processing

K-Means, Gaussian Mixture, DBSCAN and OPTICS compared, with Python code examples

We’ve been working on clustering texts, which is useful for FAQs and text classification when there isn’t enough data, time or computational power to fine-tune a Transformer model.

In the first part (available here) we saw how to get from texts to numerical vectors through Tokenization and Embedding. Now that our texts are in a machine-readable format, we can tackle our main objective: clustering them.

Clustering Methods

Clustering is one of the most useful Unsupervised Learning tasks out there, so a number of different ways to do it have been developed over the years. We’ll briefly illustrate the ones we’ve tried…
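The four methods named in the subtitle can be tried side by side with scikit-learn. The sketch below is a hedged illustration, not the author’s code. The questions are hypothetical, and the parameter values (`n_clusters=2`, `eps=0.7`, `min_samples=2`, cosine distance) are guesses chosen for a toy example. K-Means and Gaussian Mixture need the number of clusters up front, while DBSCAN and OPTICS infer it from density and mark outliers with the label `-1`.

```python
from sklearn.cluster import DBSCAN, OPTICS, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture

# Hypothetical FAQ-style questions, for illustration only
texts = [
    "How do I reset my password?",
    "I forgot my password, how can I reset it?",
    "Password reset link not working",
    "What are your opening hours?",
    "When are you open on weekends?",
    "What time do you open on Saturday?",
]

# Dense TF-IDF vectors: GaussianMixture does not accept sparse input
X = TfidfVectorizer().fit_transform(texts).toarray()

# Illustrative parameter values, not tuned
models = {
    # Centroid-based: fixed number of clusters
    "K-Means": KMeans(n_clusters=2, n_init=10, random_state=0),
    # Probabilistic: diagonal covariances keep the toy fit small
    "Gaussian Mixture": GaussianMixture(
        n_components=2, covariance_type="diag", random_state=0
    ),
    # Density-based: points farther than eps from any core point become noise (-1)
    "DBSCAN": DBSCAN(eps=0.7, min_samples=2, metric="cosine"),
    # Density-based with a reachability ordering instead of a single eps
    "OPTICS": OPTICS(min_samples=2, metric="cosine"),
}

# fit_predict returns one cluster label per text for every model
labels = {name: model.fit_predict(X) for name, model in models.items()}
for name, lab in labels.items():
    print(f"{name:>16}: {lab.tolist()}")
```

The printed labels show how each method groups the same six questions. On real data, the density parameters (`eps` for DBSCAN, `min_samples` for both density-based methods) depend heavily on the embedding and need tuning.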

Natural Language Processing

Help chatbots deal with FAQs, with Python code for Tokenization, GloVe, and TF-IDF

Classifying things comes quite naturally to us: our books, movies and music all have genres; the things we study are split between different subjects and even the food we eat belongs to different cuisines!
In recent years we’ve been able to develop better and better algorithms to classify text: models like BERT-ITPT-FiT (BERT + withIn-Task Pre-Training + Fine-Tuning) or XLNet seem to be the reigning champions in this category, at least on the 29 benchmark datasets available on PapersWithCode.

Natural Language Processing

Helping story writers with text generation

Natural Language Processing (NLP) is a wonderfully complex field, composed of two main branches: Natural Language Understanding (NLU) and Natural Language Generation (NLG). If we were talking about a kid learning English, we’d simply call them reading and writing. It’s an exciting time to work in NLP: the introduction of Transformer models in 2017 drastically improved performance, and the release of the seemingly all-powerful GPT-3 earlier this year has brought a wave of excitement. We’ll talk more about those later.

First, a simple but powerful realization: writing is not easy. Authors often get stuck trying to find that perfect…

Computer Vision

Stories have always been part of human nature: we’ve been telling stories to children for as long as we’ve been around, to teach them about life, to warn them about dangers, to build up their character without risking their safety.

Stories are extremely important, and books are one of the main means by which they are told. With books, the reader identifies with the characters and feels their emotions. We want the reader to connect with the book as deeply as possible, and we want both the story and the illustrations to reflect who you are, personally: imagine a book…

Francesco Fumagalli

Data Science intern at Digitiamo. Passionate about Deep Learning.
