Delphi – Quality Evaluation of AI Chatbot
All Industries
Global
Frontend: Angular; Backend: .NET; API Development: custom APIs for data integration; Deployment: Kubernetes cluster
An AI Quality Assurance platform designed to validate both modular AI components and complete AI systems. The product supports assessment of RAG pipelines, standalone AI models, and entire AI solutions against quality dimensions including accuracy, hallucination, relevancy, toxicity, faithfulness, and bias. It includes prompt evaluation, execution tracking, and performance dashboards for seamless AI validation.
Identifying the unique obstacles that hinder progress and uncovering the root causes behind complex business problems.
The platform should allow users to run tests on different AI models and chatbots — including both single-turn and multi-turn interactions — to assess how they respond in various scenarios.
After testing, the system should present results using clear visuals like charts and dashboards, making it easy to understand performance and spot issues in AI responses.
Users should be able to compare different test results to evaluate how changes to the system impact quality metrics such as accuracy, hallucination, and toxicity.
It should be possible to test multiple AI solutions simultaneously so teams can work on several projects in parallel without performance issues.
All features should be available in one simple, user-friendly platform, allowing both technical and non-technical users to easily register models, run tests, view results, and make comparisons; a sketch of the underlying test and comparison model follows.
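As a minimal Python sketch, this is roughly what such a test definition and run comparison could look like. All names here are hypothetical illustrations of the requirements above, not the platform's actual data model (the production backend is .NET):

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    user_input: str
    expected_output: str | None = None   # optional reference answer

@dataclass
class TestCase:
    name: str
    turns: list[Turn]                    # one turn = single-turn, several = multi-turn
    metrics: list[str] = field(
        default_factory=lambda: ["accuracy", "hallucination", "toxicity"]
    )

def compare_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Per-metric delta between two test runs (candidate minus baseline)."""
    return {m: round(candidate[m] - baseline[m], 3) for m in baseline if m in candidate}

# Did a system change improve quality?
before = {"accuracy": 0.81, "hallucination": 0.12, "toxicity": 0.02}
after = {"accuracy": 0.86, "hallucination": 0.09, "toxicity": 0.02}
print(compare_runs(before, after))  # {'accuracy': 0.05, 'hallucination': -0.03, 'toxicity': 0.0}
```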
Developing strategic, innovative solutions designed to address each challenge with precision, ensuring measurable impact and long-term success.
Designed a modular backend with an audit gateway that dynamically triggers tests for the selected quality dimensions such as accuracy, bias, and toxicity (see the dispatch sketch after this list).
Built an interactive UI with screens for chatbot registration, test execution, and performance dashboards for clear, visual result presentation.
Integrated test execution pipelines built around the OpenAI architecture, enabling validation of chatbot responses on both the frontend (MS Teams) and the backend (OpenAI API).
Implemented standalone evaluation modules, including accuracy and toxicity checks, and integrated DeepEval/RAGAS scoring for structured LLM QA (a DeepEval example follows the list).
Supported both individual Q&A and conversational/batch validation flows for complete test coverage across diverse AI solution types.
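A minimal sketch of the audit-gateway dispatch described above: quality dimensions map to registered evaluator functions, and only the dimensions selected for a test are run. Everything here is a hypothetical illustration (the toy scorers stand in for real classifiers, and the actual backend is .NET rather than Python):

```python
from typing import Callable

# Hypothetical registry: each quality dimension maps to a scorer
# taking (question, answer) and returning a 0-1 score.
EVALUATORS: dict[str, Callable[[str, str], float]] = {}

def evaluator(dimension: str):
    """Decorator that registers a scoring function under a quality dimension."""
    def wrap(fn: Callable[[str, str], float]):
        EVALUATORS[dimension] = fn
        return fn
    return wrap

@evaluator("toxicity")
def toxicity_score(question: str, answer: str) -> float:
    flagged = {"idiot", "stupid"}          # toy keyword check, stand-in for a real classifier
    return float(any(w in answer.lower() for w in flagged))

@evaluator("accuracy")
def accuracy_score(question: str, answer: str) -> float:
    return 1.0 if answer.strip() else 0.0  # placeholder; a real check compares to ground truth

def audit(question: str, answer: str, dimensions: list[str]) -> dict[str, float]:
    """The gateway runs only the evaluators selected for this test."""
    return {d: EVALUATORS[d](question, answer) for d in dimensions}

# The answer under test would come from the chatbot backend, e.g. via the
# OpenAI API:  client.chat.completions.create(model=..., messages=[...])
print(audit("What is 2+2?", "4", ["accuracy", "toxicity"]))  # {'accuracy': 1.0, 'toxicity': 0.0}
```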
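And for the DeepEval integration, a small example following DeepEval's documented Python interface, with an illustrative Q&A pair (the LLM-judged metrics need an OpenAI API key in the environment):

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric, FaithfulnessMetric
from deepeval.test_case import LLMTestCase

# One Q&A pair under test; retrieval_context feeds the RAG-oriented metrics.
test_case = LLMTestCase(
    input="What is the refund window?",
    actual_output="You can request a refund within 30 days of purchase.",
    retrieval_context=["Refunds are accepted within 30 days of purchase."],
)

metrics = [
    AnswerRelevancyMetric(threshold=0.7),  # is the answer on-topic for the input?
    FaithfulnessMetric(threshold=0.7),     # is the answer grounded in the retrieval context?
]

evaluate(test_cases=[test_case], metrics=metrics)  # prints per-metric scores and pass/fail
```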
Delivering tangible results that drive operational efficiency, data trust, and sustained business value for the organisation.
All-in-One AI QA Hub: Consolidates chatbot registration, test execution, and scoring into one centralised platform, significantly reducing QA effort.
Saves QA teams significant time by streamlining quality assurance across all AI solution types.
Plug-and-Play AI Validation: Easily extendable to new AI modules such as OCR, Sentiment Analysis, and Vision AI with minimal setup (see the plugin sketch below).
Voice of the Tester: Supports evaluation of both voice and text-based interactions for real-world usability (future scope).
Prompt Intelligence: Helps refine system prompts by detecting bias, toxicity, and redundancy, and tracks response-quality gains in an AI Scorecard Dashboard.
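One way the plug-and-play extensibility could work, as a purely illustrative sketch: each new AI module implements a single scoring contract and registers itself with the platform (all names hypothetical):

```python
from abc import ABC, abstractmethod

class AIModuleValidator(ABC):
    """Hypothetical plugin contract: a new module (OCR, Sentiment Analysis,
    Vision AI) ships one class like this to become testable by the platform."""

    name: str

    @abstractmethod
    def score(self, input_data: str, output_data: str) -> dict[str, float]:
        """Return quality scores keyed by metric name."""

class SentimentValidator(AIModuleValidator):
    name = "sentiment-analysis"

    def score(self, input_data: str, output_data: str) -> dict[str, float]:
        # Toy check: the predicted label must be one of the allowed classes.
        valid = output_data in {"positive", "negative", "neutral"}
        return {"label_validity": 1.0 if valid else 0.0}

REGISTRY: dict[str, AIModuleValidator] = {}

def register(validator: AIModuleValidator) -> None:
    REGISTRY[validator.name] = validator

register(SentimentValidator())
print(REGISTRY["sentiment-analysis"].score("Great product!", "positive"))  # {'label_validity': 1.0}
```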