
This project’s RAG (Retrieval-Augmented Generation) system chain varies across three variants, V1, V2, and V3, all of which follow a Markov-chain flow: each step depends only on the output of the step before it, with no influence from earlier results. Intermediate processing runs across multiple hosted or local sub-step model deployments; local runs are managed by either PyTorch or Ray, and CUDA is optional throughout.
This project aims to showcase a first attempt at a general problem that institutions face as a core challenge in LLM product development: how to accurately retrieve, and evaluate retrieval, across vast collections of mixed-style knowledge documents.
Data Background: This project retrieves from publicly available Masterclass databanks and transcriptions, and the backing evaluation datasets have been synthetically generated into a set format from that space. To restrain scope, this document focuses on the directive below.
Primary directive of this document: Allow readers to become accustomed to the general system chain, evaluation results, and usage.
Primary directive of this experimentation:
Assess, across state-of-the-art (SoTA) NLP techniques, the optimal* strategy for retrieving historically relevant information for a conversational query chain, given a document corpus with stochastic variability in test-styling.
<aside> ⚓ This package is containerizable, and can be built via the Dockerfile provided.
</aside>
This package is best run in a Unix-like environment, with a CUDA GPU available during evaluation tasks. Windows has only been partially tested with AutoRAG and this package, so WSL2 is advisable there.
pip is the preferred package manager for this project; dependencies can be installed like so:
pip install -r requirements_m3.txt
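Because the chain is CUDA-optional but evaluation tasks benefit from a GPU, it can help to confirm which device a local run would use once the dependencies are installed. The following is a minimal sketch, not part of this package: the helper names `pick_device` and `describe_environment` are hypothetical, and the only library call assumed is PyTorch's `torch.cuda.is_available()`. If PyTorch is not installed, the sketch falls back to reporting the CPU.

```python
import importlib.util
import platform


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; not provided by this package.
    """
    # If PyTorch is not installed yet, treat that the same as "no GPU".
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


def describe_environment() -> dict:
    """Summarize the host OS and the device a local run would likely use."""
    return {
        "os": platform.system(),  # e.g. "Linux", "Darwin", "Windows"
        "device": pick_device(),
    }


if __name__ == "__main__":
    print(describe_environment())
```

A result of `"device": "cpu"` does not prevent the chain from running, since CUDA is optional, though evaluation tasks may take noticeably longer.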