Optimizing an LLM-powered chatbot for college student questions

The Challenge

The challenge was to design a scalable retrieval bot capable of efficiently answering a wide range of student questions from a database. This involved training custom models to ensure high accuracy and relevance while managing the complexities of deployment on AWS SageMaker. Handling high query volumes without compromising performance, and maintaining scalability as the data grew, were critical hurdles. In addition, fine-tuning the models to the specific needs of student interactions posed its own set of challenges.

Hypothesis

  • A custom-trained, scalable retrieval bot using advanced sentence-embedding models and appropriate data-upsampling techniques would significantly improve the accuracy and relevance of answers to student queries.
  • By implementing transformer models with internal negative sampling and allowing clients to deploy and train models in any environment, the solution would offer flexibility, scalability, and continuous improvement.
  • Instant feedback and full feedback training would further enhance the system’s ability to learn and adapt in real time.
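The negative-sampling idea above can be illustrated with a minimal sketch: when training on a batch of question/answer pairs, each question's own answer is the positive and every other answer in the batch acts as a negative, which yields a softmax cross-entropy loss over the batch similarity matrix. This is a toy NumPy illustration of the general technique, not the project's actual training code; the batch size, dimensionality, and `scale` temperature are arbitrary choices for the example.

```python
import numpy as np

def in_batch_negatives_loss(q_emb: np.ndarray, a_emb: np.ndarray, scale: float = 20.0) -> float:
    """Contrastive loss where, for each question i, answer i is the positive
    and every other answer in the batch serves as a negative."""
    # L2-normalize so dot products become cosine similarities.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    a = a_emb / np.linalg.norm(a_emb, axis=1, keepdims=True)
    sim = scale * (q @ a.T)                        # (batch, batch) similarity matrix
    # Softmax cross-entropy with the diagonal (matching pairs) as the labels.
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch: 3 question/answer embedding pairs in 4 dimensions.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
print(in_batch_negatives_loss(q, q))          # identical embeddings -> low loss
print(in_batch_negatives_loss(q, q[::-1].copy()))  # mismatched pairs -> higher loss
```

Minimizing this loss pulls each question's embedding toward its answer while pushing it away from the other answers in the batch, which is what drives retrieval accuracy.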

Execution

  • Phase 1: Explored various top-tier models, both open-source and proprietary (including OpenAI’s), and conducted a thorough evaluation.
  • Phase 2: Evaluated multiple sampling techniques and loss functions to identify the most effective method for training the models, focusing on improving recall@N metrics. These optimized models were deployed, and a user interface was created to collect feedback and allow the client to perform testing.
  • Phase 3: Provided the client with the ability to take the solution to their production environments using CloudFormation, training and deploying the models on SageMaker to handle dynamic traffic patterns while optimizing for cost.
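The recall@N metric used in Phase 2 measures whether the correct answer appears among the top-N retrieved candidates, averaged over an evaluation set. A minimal sketch (the answer IDs and rankings below are invented for illustration):

```python
def recall_at_n(ranked_ids, gold_id, n):
    """1.0 if the correct answer appears in the top-N retrieved results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:n] else 0.0

def mean_recall_at_n(results, n):
    """Average recall@N over an evaluation set of (ranked_ids, gold_id) pairs."""
    return sum(recall_at_n(r, g, n) for r, g in results) / len(results)

# Toy evaluation set: each query has a ranked candidate list and one gold answer.
eval_set = [
    (["a12", "a7", "a3"], "a7"),   # gold at rank 2 -> hit for N >= 2
    (["a5", "a9", "a1"], "a1"),    # gold at rank 3 -> hit only for N >= 3
    (["a2", "a4", "a8"], "a6"),    # gold never retrieved -> always a miss
]
print(mean_recall_at_n(eval_set, 1))  # 0.0
print(mean_recall_at_n(eval_set, 2))  # ~0.33
print(mean_recall_at_n(eval_set, 3))  # ~0.67
```

Comparing mean recall@N across candidate loss functions and sampling schemes is what allowed the most effective training configuration to be selected.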

Outcomes

  • The client successfully deployed this architecture in production, with notable improvements in retrieval performance.
  • Query response times for users remained consistently in the 2–3 second range, ensuring a smooth user experience.
  • The system proved capable of scaling efficiently to handle traffic spikes, adapting to varying user request volumes.
  • The entire model and infrastructure were deployed on AWS, leveraging SageMaker for model training and Pinecone for vector storage, delivering a robust, scalable, and cost-effective solution.
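At serving time, the core operation a vector store like Pinecone performs is a top-k nearest-neighbor query over stored answer embeddings. The sketch below shows that query pattern with a tiny in-memory NumPy index standing in for the vector store; the 3-dimensional vectors and FAQ answers are invented for illustration (a real system would use sentence-embedding vectors with hundreds of dimensions).

```python
import numpy as np

def top_k(index_vecs, index_answers, query_vec, k=2):
    """Return the k stored answers most similar to the query by cosine similarity."""
    index = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    scores = index @ q                       # cosine similarity to every stored vector
    order = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
    return [(index_answers[i], float(scores[i])) for i in order]

# Toy "embeddings" for three stored FAQ answers.
vecs = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.7, 0.7, 0.0]])
answers = ["drop/add deadline", "financial aid office hours", "registration steps"]

# A student query whose embedding lies closest to the first stored answer.
print(top_k(vecs, answers, np.array([0.9, 0.1, 0.0]), k=2))
```

Offloading this search to a managed vector store is what lets the system keep low query latency while the answer database and traffic grow.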

Project Highlight

  • Recall performance increased from 71% to 92%, positively impacting user satisfaction.