Self-Healing Knowledge Base with AI-Driven Metadata and Taxonomy at Scale

The Challenge

Document bases that may hold millions of documents are very hard to manage, a classic knowledge base scalability challenge. Careful management is required for proper organization, searchability, deduplication, and more. Most of the documents have no tags or categories that identify their content (a gap in AI-assisted metadata management). Document content may also be duplicated, and this can sometimes lead to contradictory facts.

Hypothesis

To address similarity and contradiction detection, we need a system that combines a language model with human interaction, moving toward a self-healing knowledge base. The language model should have a learning loop that accepts a small amount of human feedback and applies it to many more documents automatically. The language model should also be able to read document content and generate a taxonomy for the knowledge base, or extend an existing one (taxonomy generation with LLMs).
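
The learning loop is described only at a high level, so the following is a minimal sketch of one simple reading of it, not the system's actual mechanism. It assumes each reviewed document pair has a similarity score and a human verdict, picks the score cutoff that agrees with the most verdicts, and then applies that cutoff to unreviewed pairs. The names `Feedback`, `calibrate_threshold`, and `auto_label` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    score: float        # similarity score the system gave a document pair (assumed input)
    is_duplicate: bool  # the human reviewer's verdict on that pair


def calibrate_threshold(feedback: list[Feedback], default: float = 0.8) -> float:
    """Pick the score cutoff that agrees with the most human verdicts."""
    if not feedback:
        return default  # no feedback yet: fall back to an assumed default

    def agreement(cutoff: float) -> int:
        # Count reviewed pairs where "score >= cutoff" matches the verdict.
        return sum((f.score >= cutoff) == f.is_duplicate for f in feedback)

    # Only the observed scores need to be tried as candidate cutoffs.
    candidates = sorted({f.score for f in feedback})
    return max(candidates, key=agreement)


def auto_label(scores: dict[str, float], cutoff: float) -> dict[str, bool]:
    """Apply the calibrated cutoff to pairs no human has reviewed."""
    return {pair_id: score >= cutoff for pair_id, score in scores.items()}
```

A handful of reviewed pairs is enough to move the cutoff, which is then reused for every unreviewed pair. A real loop would presumably feed the same verdicts back to the language model as well, for example as few-shot examples, which this sketch does not attempt.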

Execution

  • An LLM and traditional NLP based system that detects and explains duplication and contradiction between documents. It can be scheduled to run at regular intervals and either remove conflicting and duplicate documents or suggest them for removal, forming the backbone of automated knowledge base maintenance.
  • An LLM mentor network that learns from user feedback on several contradiction and duplication predictions, supporting AI-assisted metadata management at scale.
  • A taxonomy generator that combines an LLM with unsupervised learning techniques and traditional NLP algorithms. It can generate a new taxonomy, extend an existing one, or both, enabling robust taxonomy generation with LLMs.
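
As an illustration of the traditional NLP side of duplicate detection, here is a minimal sketch that compares documents by word shingles and Jaccard similarity. The source does not name the algorithm it uses, so the technique, the function names, and the 0.6 threshold are all assumptions. Pairs flagged here would be candidates for LLM explanation or human review, not automatic removal.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a document, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicate_candidates(
    docs: dict[str, str], threshold: float = 0.6  # assumed cutoff
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair at or above the threshold."""
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for (id_a, sig_a), (id_b, sig_b) in combinations(sigs.items(), 2):
        score = jaccard(sig_a, sig_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    # Highest-scoring pairs first, so reviewers see the likeliest duplicates.
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Comparing every pair is quadratic in the number of documents, so this form only suits small batches. At the scale of millions of documents, an approximate scheme such as MinHash with locality-sensitive hashing would be needed to narrow the candidate pairs first.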

Outcomes

The solution is a newly implemented system. Its output is evaluated against synthetically generated test data as well as actual user feedback. Based on that evaluation, the solution is accepted and deployed to production, establishing a self-healing knowledge base through automated knowledge base maintenance that directly addresses knowledge base scalability challenges.
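
The source does not say which metrics the evaluation uses. As a minimal sketch, assuming the test data lists the document pairs that are known duplicates, predicted pairs could be scored with precision, recall, and F1. The function name `evaluate` and the pair representation are hypothetical.

```python
def evaluate(
    predicted: set[frozenset[str]], expected: set[frozenset[str]]
) -> dict[str, float]:
    """Score predicted duplicate pairs against labeled pairs.

    Each pair is a frozenset of two document ids, so order does not matter.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

The same function works for both evaluation sources mentioned above: `expected` can come from the synthetic test set or from pairs that users have confirmed.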