Document Vetting for Credit Risk using Open Source LLMs
The Challenge
A financial institution relied on a cumbersome and costly process to validate bank statements submitted for loan and credit applications. The process included checks for monthly transactions, a three-month transaction history, recent account activity, and a specified minimum account balance. Tatras built a streamlined GenAI-based solution that automates document validation and integrates these checks, improving efficiency, accuracy, and scalability.
Hypothesis
- A GenAI-based RAG system can extract data from parsed bank statements.
- Open-source LLMs could offer cost savings and ensure data privacy.
- Prompt engineering on LLMs can validate data accurately.
- Regular expressions can efficiently parse date information.
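The last hypothesis can be illustrated with a minimal sketch of regex-based date extraction. The patterns, the day-first reading of numeric dates, and the function name extract_dates are assumptions made for illustration, not the patterns used in the project; real statements would likely need more formats.

```python
import re
from datetime import date

# Hypothetical patterns covering three common statement date styles.
ISO = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")                        # 2024-03-15
DMY = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")            # 15/03/2024 (day first, assumed)
DMONY = re.compile(r"\b(\d{1,2})[ -]([A-Za-z]{3})[a-z]*[ -](\d{4})\b")  # 15 Mar 2024

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def _safe_date(year, month, day):
    """Return a date, or None when the numbers do not form a real calendar date."""
    try:
        return date(year, month, day)
    except (ValueError, TypeError):
        return None


def extract_dates(text):
    """Find dates in free text and return them in order of appearance."""
    found = []
    for match in ISO.finditer(text):
        year, month, day = map(int, match.groups())
        found.append((match.start(), _safe_date(year, month, day)))
    for match in DMY.finditer(text):
        day, month, year = map(int, match.groups())
        found.append((match.start(), _safe_date(year, month, day)))
    for match in DMONY.finditer(text):
        day, month_name, year = match.groups()
        month = MONTHS.get(month_name.lower())  # None for unknown month names
        found.append((match.start(), _safe_date(int(year), month, int(day))))
    # Drop invalid dates (for example 31/02/2024) and sort by position in the text.
    return [d for _, d in sorted(found, key=lambda pair: pair[0]) if d is not None]


print(extract_dates("Txn 2024-03-15 POS; 16/03/2024 ATM; 17 Mar 2024 NEFT"))
```

Routing every match through the date constructor means impossible dates are rejected rather than passed downstream.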
Execution
- Developed a RAG system with open-source LLMs and embeddings for accurate data extraction.
- Created a validation module to verify extracted information for credit eligibility.
- Implemented regular expressions for additional validation of transaction dates.
- Libraries used: PyTorch, TensorFlow, LangChain, LlamaIndex, Transformers, and FastAPI.
Outcomes
- Improved the efficiency of the document vetting system.
- Achieved significant cost savings and ensured data privacy.
- Implemented robust cross-validation for transaction date extraction, effectively handling various date formats.
- Deployed the system as a RESTful API.
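One way to picture the date cross-validation mentioned above is the sketch below: date strings reported by the LLM are normalized and then compared with dates found independently (for example by regular expressions). The format list, the function names, and the report layout are hypothetical; the source does not describe how the comparison was actually done. Month-name parsing with strptime depends on the process locale, and an English or default C locale is assumed here.

```python
from datetime import date, datetime

# Hypothetical list of accepted formats; numeric dates are read day first.
FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%d %b %Y", "%d %B %Y")


def normalize(raw):
    """Parse a date string in any accepted format, or return None."""
    cleaned = raw.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def cross_validate(llm_dates, regex_dates):
    """Split LLM-reported date strings by whether an independent source agrees."""
    report = {"confirmed": [], "llm_only": [], "unparsed": []}
    for raw in llm_dates:
        parsed = normalize(raw)
        if parsed is None:
            report["unparsed"].append(raw)      # not a recognizable date
        elif parsed in regex_dates:
            report["confirmed"].append(parsed)  # both sources agree
        else:
            report["llm_only"].append(parsed)   # candidate for manual review
    return report


print(cross_validate(
    ["2024-03-15", "16/03/2024", "17 March 2024", "sometime"],
    {date(2024, 3, 15), date(2024, 3, 16)},
))
```

Dates that only the LLM reports are kept separate so they can be reviewed instead of being silently accepted or dropped.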
Project Highlights
- Achieved significant cost savings by utilizing open-source LLMs.
- Ensured data privacy by keeping all data on premises.
- Developed an automated system that is less error-prone, scalable, and faster.