BRIGHTCODE – Michał Żarnecki Portfolio

Hi, I'm Michał Żarnecki, a programmer, machine learning specialist, and educator. I build systems and solutions at the intersection of artificial intelligence, machine learning, and data-driven technologies. With a strong foundation in Python and PHP, I focus on web-based systems in areas such as data mining, big data, and natural language processing. On this website you can browse some of my projects and recent activity.

Tag: NLP

System for official industry codes classification

Posted on 21 November 2024  in projects

Solution for multi-class classification of unstructured texts containing company descriptions categorized into over 1800 classes (WZ 2008), using Python libraries, large language models (LLMs) and the Retrieval Augmented Generation (RAG) technique.
This project has evolved since 2021 alongside advances in artificial intelligence. It started with a Random Forest classifier, which was later replaced by a zero-shot transformer-based solution and finally by a solution based on RAG and LLMs, which yielded significantly improved results.
More details can be found in the article.
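
To illustrate the general idea of the RAG step, here is a minimal sketch of how retrieved candidate classes could be placed into an LLM prompt and how the reply could be checked. This is not the production code: the `Candidate` type, the function names, the prompt wording, and the `CLASS_*` labels are hypothetical placeholders (they are not real WZ 2008 codes), and the call to the LLM itself is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    code: str         # placeholder label, NOT a real WZ 2008 code
    description: str  # text describing the class


def build_prompt(company_text: str, candidates: List[Candidate]) -> str:
    """Assemble a classification prompt from retrieved candidate classes."""
    lines = [
        "Choose the single best matching class for the company description.",
        "Answer with exactly one code from the list below.",
        "",
        "Candidate classes:",
    ]
    lines += [f"- {c.code}: {c.description}" for c in candidates]
    lines += ["", f"Company description: {company_text}", "Answer:"]
    return "\n".join(lines)


def parse_answer(reply: str, candidates: List[Candidate]) -> Optional[str]:
    """Accept the reply only if its first token is one of the retrieved codes."""
    allowed = {c.code for c in candidates}
    stripped = reply.strip()
    token = stripped.split()[0].strip(".,:;") if stripped else ""
    return token if token in allowed else None


# Hypothetical candidates, as if returned by a retrieval step
candidates = [
    Candidate("CLASS_A", "Development of software and web applications"),
    Candidate("CLASS_B", "Production of bread and pastry"),
]
print(build_prompt("We build custom web shops for retailers.", candidates))
```

Restricting the accepted answer to the retrieved codes means a reply that names any other class is rejected instead of being stored as a label.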



Data Science Summit 2024 at PGE National Stadium in Warsaw

Posted on 21 November 2024  in events

Lecture: Classifying unstructured texts into 1800 categories!
Problem: In this presentation, I will examine the development of a text classifier created by the team at CompanyHouse AG to address the challenge of classifying unstructured texts that describe companies’ activities into the official German industry codes, WZ 2008. Over the years, we have experimented with various techniques to manage classification across a vast number of categories (1,800 in total). I will discuss the strategies we employed to tackle this complexity and demonstrate the evolution of our model from a random forest classifier to an innovative solution based on large language models and retrieval-augmented generation (RAG) techniques.

Methodology: Our approach includes a range of methodologies: multiclass classification, retrieval-augmented generation (RAG), random forest classifiers, similarity algorithms, embedding techniques, and the use of vector databases.
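
As a rough illustration of how embeddings, similarity algorithms, and a vector store fit together, the sketch below ranks class descriptions by cosine similarity to a query text. It is a toy under stated assumptions: `embed` uses plain word counts in place of a real embedding model, `TinyVectorIndex` is a hypothetical in-memory stand-in for a vector database, and the `CLASS_*` labels are placeholders, not real WZ 2008 codes.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorIndex:
    """Hypothetical in-memory replacement for a vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Counter]] = []

    def add(self, label: str, text: str) -> None:
        self._items.append((label, embed(text)))

    def top_k(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        """Return the k labels whose vectors are most similar to the query."""
        q = embed(query)
        scored = [(label, cosine(q, vec)) for label, vec in self._items]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


index = TinyVectorIndex()
index.add("CLASS_A", "development of software and web applications")
index.add("CLASS_B", "production of bread and pastry")
index.add("CLASS_C", "retail sale of clothing")

for label, score in index.top_k("we bake bread and sell pastry", k=2):
    print(label, round(score, 3))
```

In a RAG setup, the top-ranked classes would be passed to the language model as context, so the model only has to choose among a handful of candidates instead of all 1,800 categories.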

Conclusions: Integrating additional knowledge into models using retrieval-augmented generation combined with similarity algorithms and techniques such as chain-of-thought reasoning can effectively address complex multiclass classification problems. This approach achieves high evaluation scores and outperforms pre-trained classifiers.


E-learning course: Machine learning – how to use the potential of data to get better results and make smart decisions

Posted on 3 January 2024  in lectures

Course scenario:

  1. Definition and applications of machine learning
    • Data deluge and the definition of machine learning
    • Machine learning examples and related fields of knowledge
    • Types of machine learning
  2. Machine learning tools used in the course
    • Programs used in the course
    • Orange Data Mining
    • Jupyter Lab
  3. Supervised machine learning
    • Machine learning process
    • Data collection, labeling and analysis
    • Feature engineering and division into training and testing sets
    • Model training and evaluation
    • Model export, corrective actions
    • Regression example
    • Classification example
(more…)
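
The supervised learning steps listed above (splitting data into training and testing sets, training a model, and evaluating it) can be sketched in a few lines of plain Python. This is only an illustration, not course material: the data is synthetic, and the nearest-centroid classifier and all function names are assumptions chosen to keep the example dependency-free.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Sample = Tuple[Point, str]


def train_test_split(data: Sequence[Sample], test_ratio: float,
                     seed: int) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the data and hold out a fraction of it for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train: Sequence[Sample]) -> Dict[str, Point]:
    """'Training': compute the mean point of each class."""
    sums: Dict[str, List[float]] = {}
    for (x, y), label in train:
        acc = sums.setdefault(label, [0.0, 0.0, 0.0])
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}


def predict(centroids: Dict[str, Point], point: Point) -> str:
    """Assign the class whose centroid is closest to the point."""
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - point[0]) ** 2
        + (centroids[c][1] - point[1]) ** 2,
    )


def accuracy(centroids: Dict[str, Point], test: Sequence[Sample]) -> float:
    """Share of test samples that are classified correctly."""
    if not test:
        return 0.0
    hits = sum(predict(centroids, p) == label for p, label in test)
    return hits / len(test)


# Synthetic data: two well-separated clusters labeled "low" and "high"
rng = random.Random(0)
data: List[Sample] = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), "low") for _ in range(50)]
data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), "high") for _ in range(50)]

train, test = train_test_split(data, test_ratio=0.2, seed=42)
model = fit_centroids(train)
print(f"test accuracy: {accuracy(model, test):.2f}")
```

Evaluating on the held-out test set rather than on the training data is what makes the score an estimate of how the model behaves on unseen examples.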


System to collect networking and finance data about German companies

Posted on 1 July 2020  in projects

Web application to collect networking and finance data about German companies.
companyhouse.de

Responsibilities:

  • Implementing data mining tools and parsers using deterministic algorithms and deep learning models
  • Creating a fast and efficient search engine
  • Integrating with external platforms, APIs, and web services
  • Working with Selenium; automating acceptance, integration, functional, and unit tests; practicing TDD
  • Conducting data analysis using Python and R
  • Setting up and configuring the server environment
