Document Vetting for Credit Risk using Open Source LLMs

The Challenge

A financial institution relied on a cumbersome and costly process to validate bank statements for loan and credit applications. Validation included checks for monthly transactions, a three-month transaction history, recent activity, and compliance with a specified minimum account balance. Tatras built a streamlined GenAI-based solution that automates document validation, improving efficiency, accuracy, and scalability while integrating these essential checks.

Hypothesis

  • A GenAI-based RAG system can extract data from parsed bank statements.
  • Open-source LLMs could offer cost savings and ensure data privacy.
  • Prompt engineering on LLMs can validate data accurately.
  • Regular expressions can efficiently parse date information.
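
To make the regular-expression hypothesis concrete, here is a minimal sketch of date extraction from parsed statement text. The list of patterns, the day-first reading of slash-separated dates, and the function name extract_dates are assumptions made for illustration; the case study does not say which formats were handled.

```python
import re
from datetime import date, datetime

# Hypothetical pattern/format pairs; real statements may need others.
# Slash- and dash-separated dates are assumed to be day-first.
_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),
    (re.compile(r"\b\d{2}-\d{2}-\d{4}\b"), "%d-%m-%Y"),
    (re.compile(r"\b\d{1,2} [A-Za-z]{3} \d{4}\b"), "%d %b %Y"),
]


def extract_dates(text: str) -> list[date]:
    """Return every valid date found in text, in order of appearance."""
    found = []
    for pattern, fmt in _PATTERNS:
        for match in pattern.finditer(text):
            try:
                parsed = datetime.strptime(match.group(), fmt).date()
            except ValueError:
                # Looks like a date but is not one (e.g. 99/99/2024).
                continue
            found.append((match.start(), parsed))
    found.sort(key=lambda item: item[0])
    return [parsed for _, parsed in found]
```

Normalizing every match to a date object lets later checks compare dates without caring how each statement formatted them.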

Execution

  • Developed a RAG system with open-source LLMs and embeddings for accurate data extraction.
  • Created a validation module to verify extracted information for credit eligibility.
  • Implemented regular expressions for additional validation of transaction dates.
  • Libraries used: PyTorch, TensorFlow, LangChain, LlamaIndex, Transformers, FastAPI
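
The validation module could be sketched roughly as follows, covering the four checks named in the challenge. The Transaction type, the function name validate_statement, the 30-day recency window, and the exact meaning given to each check are assumptions for illustration, not the rules used in the project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    posted: date    # posting date of the transaction
    balance: float  # running account balance after the transaction


def _month_index(d: date) -> int:
    """Map a date to a running month number so months can be compared."""
    return d.year * 12 + d.month - 1


def validate_statement(transactions, as_of, min_balance, recent_days=30):
    """Run the four statement checks and return a name -> bool mapping."""
    checks = ("three_month_history", "monthly_transactions",
              "recent_activity", "minimum_balance")
    if not transactions:
        return {name: False for name in checks}

    months = {_month_index(t.posted) for t in transactions}
    span = max(months) - min(months) + 1
    latest = max(t.posted for t in transactions)
    return {
        # Statement covers at least three calendar months.
        "three_month_history": span >= 3,
        # No calendar month in the covered span is missing activity.
        "monthly_transactions": len(months) == span,
        # Latest transaction is close enough to the assessment date.
        "recent_activity": 0 <= (as_of - latest).days <= recent_days,
        # Balance never dropped below the required minimum.
        "minimum_balance": all(t.balance >= min_balance for t in transactions),
    }
```

Returning one boolean per check, rather than a single pass or fail, keeps the reason for a rejection visible to a reviewer.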

Outcomes

  • Improved the efficiency of the document vetting system.
  • Achieved significant cost savings and ensured data privacy.
  • Implemented robust cross-validation for transaction date extractions, effectively handling various date formats.
  • Deployed the system using a RESTful API.
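
One way to picture the cross-validation of transaction dates is to compare the dates returned by the LLM with the dates found independently by regular expressions. This is an illustrative sketch only: the case study does not describe the mechanism, and the function name cross_validate_dates and the three result buckets are assumptions.

```python
from collections import Counter
from datetime import date


def cross_validate_dates(llm_dates, regex_dates):
    """Compare two lists of already-normalized dates, keeping duplicates."""
    llm, rx = Counter(llm_dates), Counter(regex_dates)
    return {
        # Dates both extractors agree on.
        "confirmed": sorted((llm & rx).elements()),
        # Dates only the LLM reported (possible hallucination).
        "llm_only": sorted((llm - rx).elements()),
        # Dates only the regex found (possible LLM omission).
        "regex_only": sorted((rx - llm).elements()),
    }
```

Counters are used instead of sets because a statement often has several transactions on the same day, and a plain set comparison would hide a missing duplicate.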

Project Highlights

  • Achieved significant cost savings by utilizing open-source LLMs.
  • Ensured data privacy by keeping all data on premises.
  • Developed an automated system that is less error-prone, scalable, and faster.