Vintage Curve Analysis for Credit Risk Using Natural Language Queries and LLMs

The Challenge

A financial institution wanted to build a question-answering system capable of responding to natural language queries. The system would draw on a bank loan dataset to generate responses and plot the Vintage Curve for a given query. Tatras used an open-source large language model to transform natural language questions into valid SQL queries. The system retrieves the necessary data and presents the results in natural language, along with an AI-generated Vintage Curve.
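
The flow described above (question, SQL query, data retrieval, natural language answer) can be sketched as follows. This is a minimal illustration, not the system Tatras built: the `loans` schema, the sample rows, and the `fake_llm_to_sql` function are all hypothetical, and the function simply returns a canned query where a real system would call the open-source LLM.

```python
import sqlite3

# Hypothetical schema: the real bank loan database is not described in the source.
SCHEMA = """
CREATE TABLE loans (
    loan_id INTEGER PRIMARY KEY,
    origination_month TEXT,
    amount REAL,
    status TEXT
);
"""


def fake_llm_to_sql(question: str) -> str:
    """Stand-in for the LLM call (assumption): returns a canned SQL query.

    A real system would send the question and the schema to the model.
    """
    return (
        "SELECT origination_month, COUNT(*) AS defaults "
        "FROM loans WHERE status = 'default' "
        "GROUP BY origination_month ORDER BY origination_month"
    )


def answer(question: str, conn: sqlite3.Connection) -> str:
    """Translate the question to SQL, run it, and phrase the rows as text."""
    sql = fake_llm_to_sql(question)
    rows = conn.execute(sql).fetchall()
    if not rows:
        return "No matching loans were found."
    parts = [f"{month}: {count}" for month, count in rows]
    return "Defaults by origination month: " + ", ".join(parts)


# Build a tiny in-memory database with made-up rows for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO loans (origination_month, amount, status) VALUES (?, ?, ?)",
    [
        ("2023-01", 1000.0, "default"),
        ("2023-01", 2000.0, "current"),
        ("2023-02", 1500.0, "default"),
    ],
)

result = answer("How many loans defaulted per origination month?", conn)
print(result)
```

In a real deployment the final phrasing step would likely also be delegated to the model; plain string formatting is used here only to keep the sketch self-contained.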

Hypothesis

  • An open-source model can effectively take a natural language query and generate a SQL query that will run against the bank loan database.
  • The LIDA library, powered by an open-source LLM, can automatically generate the required visualizations and infographics.

Execution

  • Ensure the model generates accurate SQL queries.
  • Enhance the model’s ability to understand historical queries.
  • Ensure that the generated Vintage Curves are accurate.
  • Ensure that the correct tables, columns, and filters are selected for a given query.
  • Libraries and tools used: LangChain, PyTorch, Transformers, SQL databases, vector databases, and the LIDA library.
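
One way to check that a generated query refers only to tables and columns that exist is to compile it against the schema before running it. The sketch below is an assumption about how such a guard could look, not the validation used in the project: `validate_sql` is a hypothetical helper, and SQLite stands in for the bank's database. It accepts only a single SELECT statement and relies on SQLite's EXPLAIN, which prepares the statement without executing the query itself.

```python
import sqlite3
from typing import Tuple


def validate_sql(sql: str, conn: sqlite3.Connection) -> Tuple[bool, str]:
    """Return (ok, reason) for a model-generated SQL string."""
    statement = sql.strip().rstrip(";").strip()
    # Conservative rule: any remaining semicolon is treated as a second
    # statement, even if it sits inside a string literal.
    if ";" in statement:
        return False, "multiple statements are not allowed"
    if not statement.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    try:
        # EXPLAIN compiles the statement, so an unknown table or column
        # raises an error here without the query being run.
        conn.execute("EXPLAIN " + statement)
    except sqlite3.Error as exc:
        return False, f"does not compile against the schema: {exc}"
    return True, "ok"


# Hypothetical schema used only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER, status TEXT)")

good = validate_sql("SELECT loan_id FROM loans WHERE status = 'default'", conn)
bad_column = validate_sql("SELECT missing_col FROM loans", conn)
not_select = validate_sql("DROP TABLE loans", conn)
stacked = validate_sql("SELECT 1; DROP TABLE loans", conn)
```

A query that fails the check could be returned to the model with the error message so it can try again. The guard only confirms that the query compiles, so a valid query with the wrong filter would still pass.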

Outcomes

  • A chatbot that processes natural language queries and generates SQL queries to fetch the data.
  • The chatbot also provides a Vintage Curve, which enables users to analyze the performance of a cohort of loans.
  • The chatbot is able to retain context from the previous conversation history.
  • A GenAI-based intelligent chatbot capable of understanding complex user queries and generating accurate Vintage Curves.
  • Achieved significant cost savings by utilizing open-source LLMs.
  • Ensured data privacy by keeping all data on premises.
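
For readers unfamiliar with the term, a vintage curve tracks each origination cohort (vintage) by its age, typically showing the cumulative default rate at each month on book. The sketch below computes those numbers for made-up data. The record layout and the `vintage_curves` function are illustrative assumptions, and in the project described here the chart itself is generated through LIDA rather than by code like this.

```python
from collections import defaultdict

# Hypothetical records: (origination month, months on book at default).
# None means the loan has not defaulted.
loans = [
    ("2023-01", 3),
    ("2023-01", None),
    ("2023-01", 5),
    ("2023-01", None),
    ("2023-02", 2),
    ("2023-02", None),
]


def vintage_curves(loans, max_mob):
    """Cumulative default rate per vintage for months on book 1..max_mob."""
    totals = defaultdict(int)
    defaults_at = defaultdict(lambda: defaultdict(int))
    for vintage, default_mob in loans:
        totals[vintage] += 1
        if default_mob is not None:
            defaults_at[vintage][default_mob] += 1

    curves = {}
    for vintage, total in totals.items():
        cumulative = 0
        curve = []
        for mob in range(1, max_mob + 1):
            cumulative += defaults_at[vintage][mob]
            # Share of the cohort that has defaulted by this month on book.
            curve.append(cumulative / total)
        curves[vintage] = curve
    return curves


curves = vintage_curves(loans, max_mob=6)
for vintage, curve in sorted(curves.items()):
    print(vintage, curve)
```

This sketch assumes every vintage has been observed for the full six months. With real data, younger vintages would need their curves truncated at the last observed month so that unobserved periods are not read as zero defaults.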

Project highlights

  • The real-time system significantly reduces the time needed for consumer credit risk decisions.