19/09/05 

 

Invited Speakers

Dr Mehran Sahami

Google, Stanford University, USA
Email:
sahami at cs.stanford.edu


Improving Web Information Finding by Learning with Large Datasets

Web search is one of the most important applications used on the Internet. It also offers many interesting opportunities to apply machine learning and make use of massive amounts of data. To help people find relevant information in a growing sea of data, we discuss automated learning techniques that can be harnessed to sift, organize, and present that information to users. In this talk, we provide a brief background on information retrieval and then look at some of the challenges faced in searching the Web in a variety of contexts. Specifically, we examine applications of machine learning to improve information retrieval, image classification, topical inference for queries, and record linkage. We show how these tasks relate directly to the overarching goal of improving various aspects of search on the Web.


Prof Geoff McLachlan

Department of Mathematics & Institute for Molecular Bioscience, University of Queensland
Email:
gjm@maths.uq.edu.au


Selection Bias and Other Issues in Applications of Machine Learning in Bioinformatics

New technologies in molecular biology allow measurements of DNA and proteins to be carried out in great detail and in large numbers ("high-throughput"). There is thus increasing interest in shifting the emphasis of cancer diagnosis from morphological to molecular criteria.
In this context, the problem is to construct a classifier (prediction rule) that can accurately predict the class of origin of a tumor tissue from its feature vector, where the tissue is unclassified with respect to a known number of distinct classes. Here the feature vector consists of the expression levels of a very large number of genes. In applications concerned with the diagnosis of cancer, one class may correspond to cancer and the other to benign tumors. In applications concerned with patient survival following treatment for cancer, one class may correspond to the good-prognosis group and the other to the poor-prognosis group. In this talk, we consider the problem of constructing a classifier for these applications and of identifying "marker" genes that characterize the different tissue classes. We also consider the estimation of the classifier's error rate and its standard error. In particular, we focus on the need to correct for selection bias when a classifier is formed from a subset of genes selected according to some "optimal" criterion from the very large number of available genes. This bias has frequently been overlooked in the bioinformatics literature.
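The selection bias described above can be illustrated with a small simulation. The sketch below is a hypothetical example of our own, not the method presented in the talk: it generates expression data with no real class signal, ranks genes by a two-sample (Welch) t statistic, and estimates the leave-one-out error of a nearest-centroid classifier in two ways. If the top genes are chosen once using all samples and only the classifier is cross-validated, the held-out sample has already influenced which genes were picked, so the error estimate should come out optimistic; if gene selection is repeated inside every fold, the estimate should stay around the 50% expected for uninformative data. The sample size, number of genes, number of selected genes, ranking statistic and classifier are all assumptions chosen for illustration.

```python
import random

def t_scores(X, y, idx):
    """Absolute Welch t statistic for every gene, computed only from samples in idx."""
    a = [i for i in idx if y[i] == 0]
    b = [i for i in idx if y[i] == 1]
    scores = []
    for g in range(len(X[0])):
        xa = [X[i][g] for i in a]
        xb = [X[i][g] for i in b]
        ma, mb = sum(xa) / len(xa), sum(xb) / len(xb)
        va = sum((v - ma) ** 2 for v in xa) / (len(xa) - 1)
        vb = sum((v - mb) ** 2 for v in xb) / (len(xb) - 1)
        scores.append(abs(ma - mb) / (va / len(xa) + vb / len(xb)) ** 0.5)
    return scores

def top_genes(X, y, idx, k):
    """Indices of the k genes with the largest |t| on the samples in idx."""
    s = t_scores(X, y, idx)
    return sorted(range(len(s)), key=lambda g: s[g], reverse=True)[:k]

def nearest_centroid(X, y, train, genes, x):
    """Assign x to the class whose centroid (over the selected genes) is closest."""
    dist = {}
    for c in (0, 1):
        rows = [i for i in train if y[i] == c]
        cent = [sum(X[i][g] for i in rows) / len(rows) for g in genes]
        dist[c] = sum((x[g] - m) ** 2 for g, m in zip(genes, cent))
    return min(dist, key=dist.get)

def loo_error(X, y, k, select_inside):
    """Leave-one-out error; select_inside=False reuses genes chosen on ALL samples."""
    n = len(y)
    fixed = None if select_inside else top_genes(X, y, range(n), k)
    errors = 0
    for test in range(n):
        train = [i for i in range(n) if i != test]
        # Correct protocol: the held-out sample plays no part in gene selection.
        genes = top_genes(X, y, train, k) if select_inside else fixed
        errors += nearest_centroid(X, y, train, genes, X[test]) != y[test]
    return errors / n

rng = random.Random(0)
n, p, k = 40, 500, 10  # hypothetical sizes: 40 tissues, 500 genes, keep 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]  # labels carry no information about X

external = loo_error(X, y, k, select_inside=False)
internal = loo_error(X, y, k, select_inside=True)
print(f"genes selected outside CV: {external:.2f}")
print(f"genes selected inside CV:  {internal:.2f}")
```

Because the labels are unrelated to the data, any error rate well below 0.5 on the first line is an artifact of having selected genes with knowledge of the held-out samples; only the second estimate treats gene selection as part of the classifier being assessed.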


Dr Adam Kowalczyk

National ICT Australia (NICTA)
Email:
Adam.kowalczyk@nicta.com.au


Challenges of Learning from a Very Small Sample in a Very High-Dimensional Space

The biological domain poses new challenges for statistical learning due to the very high dimensionality of the feature space (thousands of dimensions) and the small sample size (measured in tens). In the talk, we shall analyze and theoretically explain some counter-intuitive experimental findings showing that a systematic reversal of classifier decisions can occur when switching from the training data to independent test data (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data and show that it is distinct from overfitting.

We also discuss necessary and sufficient conditions for perfect anti-learning. Using them, we show that anti-learning is primarily a feature of the data set and that it affects a large number of algorithms, such as Support Vector Machines, Gaussian Process classification, Parzen windows, Fisher Discriminant, Nearest Neighbours and Boosting, for all practical choices of kernel (linear and non-linear).
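The systematic reversal described above can be reproduced on a tiny synthetic set. The sketch below is a toy construction of our own, not one of the data sets from the talk, and it uses a simple nearest-centroid classifier (an assumption; the conditions for anti-learning discussed in the talk are not reproduced here). Class 0 consists of the vectors +e_i and -e_i for the first m coordinate axes and class 1 of +e_i and -e_i for the remaining m axes. Holding out a point x leaves its own class with a centroid pointing towards -x, while the other class's centroid stays at the origin, so every held-out point is assigned to the wrong class. Leave-one-out accuracy is therefore systematically below chance rather than merely close to it, and reversing every decision would be perfect.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def make_toy_data(m):
    """Hypothetical data: class 0 = {+e_i, -e_i : i < m}, class 1 = {+e_i, -e_i : m <= i < 2m}."""
    d = 2 * m
    X, y = [], []
    for i in range(d):
        for sign in (1.0, -1.0):
            v = [0.0] * d
            v[i] = sign
            X.append(v)
            y.append(0 if i < m else 1)
    return X, y

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for t in range(len(X)):
        dists = {}
        for c in set(y):
            # Centroid of class c computed without the held-out point t.
            members = [X[i] for i in range(len(X)) if i != t and y[i] == c]
            dists[c] = sq_dist(X[t], centroid(members))
        correct += min(dists, key=dists.get) == y[t]
    return correct / len(X)

X, y = make_toy_data(m=5)
acc = loo_accuracy(X, y)
print(acc)      # → 0.0
print(1 - acc)  # → 1.0, reversing every decision would be perfect
```

With m = 5 there are 20 points in 10 dimensions; the held-out point's squared distance to its own class centroid is (1 + 1/9)^2, about 1.23, against exactly 1 for the other class, which is why every decision flips.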

