SCDLDS talk
by Manik Varma
Distinguished Scientist and Vice President, Microsoft Research India
Large Language Models (LLMs), such as GPT-4 and o1, have delivered game-changing reasoning and synthesis capabilities, leading pundits to proclaim that we have entered a new age of LLM reasoning. Yet LLMs can make mistakes while answering simple factual queries, such as “Who did Ashoka University hire most recently?”, and can fail spectacularly at complex tasks such as “Write a 2-page document summarizing the history of Ashoka University”. A peek under the hood reveals that these mistakes are often caused by failures of the retrieval model, which is responsible for fetching relevant information from the web and private databases. Such retrieval failures can lead to particularly egregious responses when the information required for completing the task at hand is not present in the LLM itself and needs to be fetched from external sources.
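To make the failure mode concrete, here is a minimal, purely illustrative sketch of the retrieval step in a retrieval-augmented generation (RAG) pipeline. The corpus, query, and word-overlap scoring below are all hypothetical simplifications, not the system discussed in the talk: the point is only that the LLM sees nothing beyond what the retriever returns, so a wrong top-ranked document guarantees a wrong answer.

```python
def tokenize(text):
    """Lowercase and split text into word tokens (toy normalization)."""
    return text.lower().replace("?", "").replace(",", "").split()

def score(query, doc):
    """Score a document by word overlap with the query (toy ranking)."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d)

def retrieve(query, corpus, k=1):
    """Return the top-k documents for the query; the LLM only sees these."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]

# Hypothetical mini-corpus standing in for the web / a private database.
corpus = [
    "Ashoka University announced its newest faculty hire in May.",
    "The history of Ashoka University begins with its founding in 2014.",
    "Unrelated note about campus catering services.",
]

query = "Who did Ashoka University hire most recently?"
context = retrieve(query, corpus, k=1)
# If retrieval surfaces the wrong document, the LLM has no chance of
# answering correctly -- this is the failure mode the talk examines.
```

A real retrieval platform replaces the toy scoring above with learned models and serves far larger corpora, but the dependency is the same: generation quality is capped by retrieval quality.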
In this talk, I will discuss what the architecture and flow might look like for a retrieval platform for large-scale AI workloads that can significantly reduce such retrieval failures and thereby lead to much better LLM responses. I will also discuss how we can build a state-of-the-art generative retrieval model that forms the core of the platform and can accurately and cost-effectively fetch documents in milliseconds. Finally, I will share some empirical evidence on how such a retrieval model might benefit millions of users around the world. Most parts of my talk should be broadly accessible to a lay audience.