BRIGHTCODE – Michał Żarnecki Portfolio

Hi, I'm Michał Żarnecki: programmer, machine learning specialist, and educator. I specialize in building systems and solutions at the intersection of artificial intelligence, machine learning, and data-driven technologies. With a strong foundation in Python and PHP, I focus on delivering web-based systems in areas such as data mining, big data, and natural language processing. On this website you can browse some of my projects and recent activity.

Tag: LLM

Photo: Adam Stepien / ministerstwoportretu.pl

Python Summit 2024 at the Copernicus Science Center in Warsaw

Posted on 11 December 2024  in events

During my talk, I shared the journey of building a system that maps unstructured company descriptions to official industry codes. The task is challenging because it is a multiclass classification problem with 1800 possible categories.

In the presentation, I described how our solutions have evolved over the years:
– From a random forest classifier
– To a zero-shot approach plus a classification-dedicated Large Language Model (LLM)
– And finally, to LLMs + Retrieval Augmented Generation (RAG)

System for official industry codes classification

Posted on 21 November 2024  in projects

Solution for multi-class classification of unstructured texts containing company descriptions categorized into over 1800 classes (WZ 2008), using Python libraries, large language models (LLMs) and the Retrieval Augmented Generation (RAG) technique.
The project has evolved since 2021 alongside advances in artificial intelligence. It was initially approached with a Random Forest Classifier, which was later replaced with a zero-shot, transformer-based solution and finally with a solution based on RAG and LLMs, yielding significantly improved results.
More details can be found in the article.


Data Science Summit 2024 at PGE National Stadium in Warsaw

Posted on 21 November 2024  in events

Lecture: Classifying unstructured texts into 1800 categories!
Problem: In this presentation, I will examine the development of a text classifier created by the team at CompanyHouse AG to address the challenge of classifying unstructured texts that describe companies’ activities into the official German industry codes, WZ 2008. Over the years, we have experimented with various techniques to manage classification across a vast number of categories (1,800 in total). I will discuss the strategies we employed to tackle this complexity and demonstrate the evolution of our model from a random forest classifier to an innovative solution based on large language models and retrieval-augmented generation (RAG) techniques.

Methodology: Our approach includes a range of methodologies: multiclass classification, retrieval-augmented generation (RAG), random forest classifiers, similarity algorithms, embedding techniques, and the use of vector databases.
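To illustrate the similarity step in general terms, here is a minimal sketch of retrieving the most similar class descriptions for a company description. It is not the system described in the talk: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory dictionary stands in for a vector database, and the class codes `A`, `B`, `C` are invented for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_classes(description: str, class_descriptions: dict, k: int = 2) -> list:
    """Return the k (code, score) pairs most similar to the description."""
    query = embed(description)
    scored = [
        (code, cosine_similarity(query, embed(text)))
        for code, text in class_descriptions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical class codes and descriptions, invented for illustration only.
CLASSES = {
    "A": "growing of cereals and vegetables",
    "B": "development of software and web applications",
    "C": "retail sale of clothing in stores",
}

candidates = top_k_classes(
    "we build web applications and custom software", CLASSES, k=2
)
print(candidates)
```

With 1800 categories, narrowing the choice to a handful of candidates in this way is what keeps the later classification step manageable.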

Conclusions: Integrating additional knowledge into models using retrieval-augmented generation combined with similarity algorithms and techniques such as chain-of-thought reasoning can effectively address complex multiclass classification problems. This approach achieves high evaluation scores and outperforms pre-trained classifiers.
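As a rough illustration of combining retrieved knowledge with chain-of-thought prompting, the sketch below builds a classification prompt from a list of candidate classes and parses the final answer from a model reply. The prompt wording, the `ANSWER:` convention, and the function names are assumptions made for this example, not the prompts used in the project, and no specific LLM API is called.

```python
def build_classification_prompt(description: str, candidates: list) -> str:
    """Build a prompt from (code, class_description) candidate pairs."""
    lines = [
        "Classify the company description into exactly one industry code.",
        "",
        "Company description:",
        description.strip(),
        "",
        "Candidate codes (retrieved by similarity search):",
    ]
    for code, text in candidates:
        lines.append(f"- {code}: {text}")
    lines += [
        "",
        # Chain-of-thought instruction: ask for reasoning before the answer.
        "Think step by step: compare the description with each candidate and",
        "explain briefly why it fits or not. Finish with one line of the form",
        "'ANSWER: <code>'.",
    ]
    return "\n".join(lines)


def parse_answer(model_output: str, allowed_codes: set):
    """Return the code from the last 'ANSWER:' line, or None if invalid."""
    for line in reversed(model_output.strip().splitlines()):
        if line.strip().upper().startswith("ANSWER:"):
            code = line.split(":", 1)[1].strip()
            return code if code in allowed_codes else None
    return None


# Hypothetical candidates, invented for illustration only.
prompt = build_classification_prompt(
    "We build web applications and custom software.",
    [("A", "growing of cereals"), ("B", "development of software")],
)
print(prompt)
```

Restricting the accepted answer to the retrieved codes guards against the model returning a code that does not exist.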

Generative AI in Text Mining – laboratories at the Collegium da Vinci

Posted on 12 July 2024  in lectures

The course introduces participants to generative artificial intelligence and the latest achievements in natural language processing. Participants learn the theoretical foundations and architecture of Large Language Models (LLMs) and gain practical skills in working with text data. The course places particular emphasis on popular LLM tasks such as text generation, machine translation, sentiment analysis, summarization, and answering questions based on a database of source documents. After completing the course, participants know the techniques used in prompt engineering, the metrics for evaluating LLM output, and methods for improving the returned content. They can also apply large language models in research, industry, and other areas.

Topics:
Generative AI tasks: translation, question answering, summarization, sentiment analysis
Architecture and types of LLMs: encoder, decoder
Text vectorization, positional encoding, attention mechanism (Multi-Head Attention)
OpenAI GPT, Google Gemini, Mistral Mixtral, Meta Llama, Claude, FLAN models
Prompt engineering, multi-task instruction fine-tuning, zero/one/few-shot learning
Parameter-Efficient Fine-Tuning (PEFT), LoRA
Breaking instructions into steps: chain-of-thought
Evaluation of LLMs: performance evaluation, ROUGE/BLEU metrics, benchmarks
RAG – Retrieval Augmented Generation, vector databases
Computational challenges of LLM training, scaling laws
Frameworks for working with LLMs: LangChain, ReAct
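
As a small illustration of one of the evaluation metrics listed above, here is a minimal sketch of ROUGE-1 (unigram overlap) between a reference text and a generated text. It assumes plain lowercase whitespace tokenization; real ROUGE implementations typically add steps such as stemming, so scores from library tools may differ.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Compute ROUGE-1 precision, recall and F1 from unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: each word counts at most as often as in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this example five of the six reference words also appear in the generated text, so precision and recall are both 5/6.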

AI chatbot for analysing companies' source documents

Posted on 26 May 2024  in projects

https://chat.companyhouse.de/


An AI-based chatbot for retrieving reliable, up-to-date, and precise information about companies.
The chatbot is built on the Streamlit framework and uses a vector database based on the PostgreSQL pgvector extension to store and access trade register documents.
The application uses the Llama 3 large language model (LLM) together with the retrieval augmented generation (RAG) approach, which makes it possible to ask any question about a company's and its managers' history, financial condition, and important changes, and to get an answer to it.
Each response also lists its source documents, which makes this approach a reliable business intelligence tool.
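
For illustration, the retrieval and source-listing steps could look roughly like the sketch below. It is not the chatbot's actual code: the `documents` table, its `id`, `title`, `content`, and `embedding` columns, and the function names are hypothetical, and the `%s` placeholders assume a psycopg-style database driver. The `<=>` operator is the cosine distance operator provided by the PostgreSQL vector extension.

```python
def to_vector_literal(embedding: list) -> str:
    """Format an embedding as a vector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_similarity_query(embedding: list, limit: int = 5) -> tuple:
    """Return (sql, params) selecting the documents closest to the embedding.

    Table and column names are hypothetical.
    """
    sql = (
        "SELECT id, title, content "
        "FROM documents "
        "ORDER BY embedding <=> %s::vector "  # cosine distance, nearest first
        "LIMIT %s"
    )
    return sql, (to_vector_literal(embedding), limit)


def format_answer(answer: str, source_titles: list) -> str:
    """Append a numbered list of source documents to the model's answer."""
    lines = [answer, "", "Sources:"]
    lines += [f"[{i}] {title}" for i, title in enumerate(source_titles, start=1)]
    return "\n".join(lines)


sql, params = build_similarity_query([0.1, 0.2, 0.3], limit=3)
print(sql)
print(params)
print(format_answer("Example answer.", ["Document one", "Document two"]))
```

Passing the embedding as a query parameter rather than formatting it into the SQL string keeps the query safe to reuse with arbitrary inputs.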

Responsibilities:

  • build application prototype
  • implement application code parts
  • implement authentication mechanism
  • specify and coordinate work on building chatbot interactions
  • specify and coordinate work on synchronizing source documents in real time and making them accessible to the LLM
  • measure answers quality

E-learning course: Machine learning – how to use the potential of data to get better results and make smart decisions

Posted on 3 January 2024  in lectures

Course scenario:

  1. Definition and applications of machine learning
    • Data deluge and the definition of machine learning
    • Machine learning examples and related fields of knowledge
    • Types of machine learning
  2. Machine learning tools used in the course
    • Programs used in the course
    • Orange Data Mining
    • Jupyter Lab
  3. Supervised machine learning
    • Machine learning process
    • Data collection, labeling and analysis
    • Feature engineering and division into training and testing sets
    • Model training and evaluation
    • Model export, corrective actions
    • Regression example
    • Classification example
(more…)
