A Conceptual Framework for a Representational Approach to Information Retrieval
Abstract:
Information retrieval (IR) - the challenge of connecting users to previously stored relevant information - dates back millennia.
The technologies have changed - from clay tablets stacked in a storehouse to books arranged according to the Dewey Decimal Classification to digital content indexed by web search engines - but the aims largely have not.
With the advent of deep learning in the "neural age", IR research of late has been flourishing, particularly building on advances in pretrained transformer models. Today, there is a confusing myriad of competing approaches: cross-encoders vs. bi-encoders, dense vs. sparse representations, inverted indexes vs. approximate nearest neighbors, etc.
In this talk, I present a conceptual framework for understanding recent developments in information retrieval. I propose a representational approach that breaks the core text retrieval problem into a logical scoring model and a physical retrieval model.
The scoring model is defined in terms of encoders, which map queries and documents into a representational space, and a comparison function that computes query-document scores. The physical retrieval model defines how a system produces the top-k scoring documents from an arbitrarily large corpus with respect to a query. I explain how recent developments in IR can be seen as different parameterizations in this framework, and how a unified view suggests a number of open research questions, providing a roadmap for future work.
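As a rough illustration of the split described above, the following minimal sketch separates a logical scoring model (an encoder plus a comparison function) from a physical retrieval model (here, a brute-force scan). It is not taken from the talk: the names `encode`, `compare`, and `brute_force_top_k` are hypothetical, the encoder is a toy term-count representation used for both queries and documents, and the comparison function is assumed to be an inner product.

```python
import heapq
from collections import Counter


def encode(text):
    """Hypothetical sparse encoder: maps text to a vector of lowercase term counts."""
    return Counter(text.lower().split())


def compare(q_vec, d_vec):
    """Comparison function (assumed): inner product of two sparse vectors."""
    return sum(weight * d_vec[term] for term, weight in q_vec.items() if term in d_vec)


def brute_force_top_k(query, corpus, k):
    """Physical retrieval by exhaustive scan: score every document, keep the best k.

    Returns a list of (score, document index) pairs, highest score first.
    """
    q_vec = encode(query)
    scored = ((compare(q_vec, encode(doc)), i) for i, doc in enumerate(corpus))
    return heapq.nlargest(k, scored)


corpus = [
    "clay tablets stacked in a storehouse",
    "books arranged by decimal classification",
    "digital content indexed by web search engines",
]
results = brute_force_top_k("web search engines", corpus, k=2)
print(results)
```

In this sketch, swapping `encode` for a learned dense encoder would change only the scoring model, while replacing the exhaustive scan with an inverted index or an approximate nearest neighbor structure would change only the physical retrieval model.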
Biography:
Professor Jimmy Lin holds the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo.
For a quarter of a century, Lin's research has been driven by the quest to develop methods and build tools that connect users to relevant information. His work mostly lies at the intersection of information retrieval and natural language processing, with a focus on two fundamental challenges: those of understanding and scale.
Host
Assoc. Prof. Guido Zuccon
This session will be conducted online via Zoom: https://uqz.zoom.us/j/89362232168
About Data Science Seminar
This seminar series will be run as weekly sessions and is hosted by ITEE Data Science.