Document Re-ranking by Generality
Speaker: Xin Yan, ITEE
When: 2005-07-12 09:00:00
Venue: 78-420
Host: Dr. Xue Li
Abstract: Document ranking is a fundamental feature of an information
retrieval system. Traditional document ranking methods, e.g., cosine
similarity, often measure how relevant a document is to a query
based on similarity computations. Due to the information explosion
and the popularity of WWW information retrieval, however, the
sufficiency of similarity alone for ranking documents has been
questioned. This report argues that a further factor, "generality",
should be taken into account. As a complement to traditional
similarity-based ranking, generality refers to how general or
conclusive a document or query is in describing a certain topic.
For example, a query
may aim to find literature reviews of broader scope rather than
technical papers. Given a large set of relevant documents retrieved
by an IR system, a user may expect the more general documents to be
moved toward the top of the list, so that the user can gain an
overview of the topic before getting into more specific technical
details. This is particularly the case in some specific domains such
as bio-medical IR. To address this problem, we propose to re-rank
the retrieved documents via generality. A novel ontology-based
approach for calculating document generality is developed by
analyzing scope and semantic cohesion. The documents are then
re-ranked by a combined score of similarity and the closeness of
documents' generality to the query's. Experiments have been
conducted on a large-scale bio-medical text corpus, OHSUMED, a
subset of the MEDLINE collection containing 348,566 medical journal
references and 101 test queries. Our approach has demonstrated
encouraging performance. Future work will focus on three issues:
enhancing our proposed algorithms, evaluating the proposed approach
comprehensively, and applying our algorithms in a general domain.
The ultimate goal of our study is to significantly improve
generality-based retrieval.
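The re-ranking step described in the abstract can be illustrated with
a minimal sketch. The abstract does not give the scoring formula, so
everything below is an assumption made for illustration: the linear
weighting with a hypothetical parameter alpha, the closeness measure
(one minus the absolute difference of generality scores), and the
example scores themselves. The generality values are taken as given
inputs; the ontology-based calculation is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    similarity: float  # similarity to the query, assumed to lie in [0, 1]
    generality: float  # generality score, assumed to lie in [0, 1]


def generality_closeness(doc_generality: float, query_generality: float) -> float:
    """Assumed closeness measure: 1 minus the absolute difference."""
    return 1.0 - abs(doc_generality - query_generality)


def rerank(docs, query_generality, alpha=0.5):
    """Sort documents by a weighted sum of similarity and generality closeness.

    alpha is a hypothetical weight: 1.0 ranks by similarity alone,
    0.0 ranks by generality closeness alone.
    """
    def combined(doc):
        closeness = generality_closeness(doc.generality, query_generality)
        return alpha * doc.similarity + (1.0 - alpha) * closeness

    return sorted(docs, key=combined, reverse=True)


# Made-up scores for three retrieved documents and a "general" query.
retrieved = [
    Doc("technical-paper", similarity=0.80, generality=0.20),
    Doc("literature-review", similarity=0.75, generality=0.90),
    Doc("case-report", similarity=0.60, generality=0.10),
]
reranked_ids = [d.doc_id for d in rerank(retrieved, query_generality=0.9)]
print(reranked_ids)
```

With these illustrative numbers the literature review moves ahead of
the technical paper even though its similarity score is slightly
lower, which is the behaviour the abstract motivates for queries
seeking an overview.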
Biography: (biography unavailable)
Type: Ph.D. confirmation
Contact: Dr. Xue Li, seminar host (xueli@itee.uq.edu.au)
or Guido Governatori (ITEE seminar co-ordinator)
(guido@itee.uq.edu.au)
