System for official industry codes classification
Posted on 21 November 2024 in projects
A solution for multi-class classification of unstructured company descriptions into more than 1,800 classes of the German WZ 2008 classification of economic activities, built with Python libraries, large language models (LLMs) and the Retrieval Augmented Generation (RAG) technique.
The project has been evolving since 2021 alongside advances in artificial intelligence. It was initially approached with a Random Forest classifier, which was later replaced by a zero-shot transformer-based solution. It was finally solved with an approach based on RAG and LLMs, which yielded significantly improved results.
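The post does not describe how the RAG step is implemented. The sketch below only illustrates the general idea behind RAG for classification with very many labels: first retrieve a short list of candidate classes that resemble the company description, then ask an LLM to choose among those candidates instead of all 1,800+ classes. It is a minimal sketch under stated assumptions. A plain bag-of-words cosine similarity stands in for the embedding model and vector store a real system would more likely use. `CLASSES` is a tiny hypothetical subset with shortened English descriptions. `retrieve`, `build_prompt` and the prompt wording are illustrative names, not the author's code. The actual LLM call is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy subset of class labels with shortened English descriptions;
# a real system would index all 1,800+ WZ 2008 classes.
CLASSES = {
    "10.71": "manufacture of bread, fresh pastry goods and cakes; bakery",
    "49.41": "freight transport by road; trucks, haulage",
    "56.10": "restaurants and mobile food service activities",
    "62.01": "computer programming activities; software development",
}

# Minimal stopword list (assumption) so filler words do not drive similarity.
STOPWORDS = {"a", "and", "for", "in", "of", "the", "we"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(description: str, k: int = 2) -> list[str]:
    """Retrieval step: return the k class codes most similar to the description."""
    query = Counter(tokenize(description))
    return sorted(
        CLASSES,
        key=lambda code: cosine(query, Counter(tokenize(CLASSES[code]))),
        reverse=True,
    )[:k]


def build_prompt(description: str, candidates: list[str]) -> str:
    """Generation step input: a prompt restricting the LLM to the retrieved candidates."""
    options = "\n".join(f"{code}: {CLASSES[code]}" for code in candidates)
    return (
        "Assign the company to exactly one of the following WZ 2008 classes.\n"
        f"{options}\n\n"
        f"Company description: {description}\n"
        "Answer with the class code only."
    )


if __name__ == "__main__":
    desc = "We develop custom software and web applications for clients"
    # The resulting prompt would then be sent to an LLM of choice (not shown).
    print(build_prompt(desc, retrieve(desc)))
```

Narrowing the label space before generation keeps the prompt short and gives the model concrete class descriptions to compare against. The number of retrieved candidates `k` trades recall of the correct class against prompt length.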
More details can be found in the article.

