On the Evaluation of StartupGPT: A Retrieval-Augmented AI Chatbot for Delivering Research-Driven Guidance to Startups
[Introduction] The study addresses the challenge of transferring research knowledge to industry, with a particular focus on small businesses and startups, which often lack access to empirical insights. Large Language Models (LLMs), particularly those using Retrieval-Augmented Generation (RAG), offer potential for embedding knowledge from startup research into an interactive chatbot that supports startup mentorship. However, empirical work exploring this application is limited.
[Objective] The primary objective of this research is to design and evaluate a version of “StartupGPT,” an AI-driven chatbot that uses LLMs and RAG to provide advice to software startups by drawing on a knowledge base rooted in software startup research.
[Methodology] The study follows the Design Science Research Methodology (DSRM) and spans three iterative cycles, with this paper focusing on Cycle 3. The prototype was tested with 11 startup founders, who provided both qualitative and quantitative feedback on the chatbot’s usefulness and the satisfaction it produced.
[Results] The findings from the user tests indicate that StartupGPT was generally perceived as relevant, reliable, and helpful. However, limitations were noted: users found its responses overly theoretical, lacking concrete examples, and insufficiently personalized for specific startup contexts.
[Conclusion] Future LLM-based tools for startups should focus on improving interactivity, incorporating more context-aware and specific advice, and leveraging advanced AI techniques, such as fine-tuning, to better align the chatbot’s responses with the unique needs of individual startups.
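To make the RAG pattern named above concrete, the following minimal sketch shows the generic retrieve-then-prompt loop such a chatbot relies on. It is illustrative only: the knowledge-base snippets, the word-overlap scoring, and the prompt format are assumptions introduced here, not the authors’ StartupGPT implementation, and the final LLM call is left as a placeholder.

```python
# Minimal, illustrative retrieval-augmented generation (RAG) loop.
# All passages and function names are hypothetical stand-ins, not the
# StartupGPT system described in the paper.

KNOWLEDGE_BASE = [
    "Software startups should validate product-market fit with early customer feedback.",
    "Minimum viable products help startups test assumptions before scaling.",
    "Technical debt accumulates quickly when startups prioritise speed over design.",
]

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query (a stand-in for embedding search)."""
    query_terms = set(query.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(query_terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a prompt that grounds the answer in the retrieved research passages."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}"

if __name__ == "__main__":
    question = "How should an early-stage startup test its product idea?"
    prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
    print(prompt)  # In a full system, this prompt would be sent to an LLM for generation.
```

In a production setting, the word-overlap ranking would typically be replaced by dense vector search over the research-derived knowledge base, and the printed prompt would be passed to the underlying LLM; the structure of the loop, however, stays the same.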