School of Information Technology and Electrical Engineering
Semester 1, 2012
COMP4702/COMP7703 - Machine Learning
Course Material
Lecture Notes
Notes are listed here in the order that we will cover them in the course. These slides are based on those provided by Alpaydin (the author of the text), with modifications made where possible.
- Introduction (Ch. 1) - Slides
- Supervised Learning (Ch. 2) - Slides (updated to new version 16/3 - minor additions)
- Occam's Razor (wikipedia entry)
- VC Dimension: a useful reference is via the Support Vector Machines" .org website (see later in the course for SVMs!).
- PAC Learning: for a better overview of this area, see the Russel and Norvig text, Section 18.5.
- Bayesian Decision Theory (Ch. 3) - Slides
- Thomas Bayes (wikipedia entry)
- Parametric Methods (Ch. 4) - Slides
- Multivariate Methods (Ch. 5) - Slides
- Dimensionality Reduction (Ch. 6) - Slides
- Clustering (Ch. 7) - Slides
- k-means figures: [Bis] pic1, [Bis] pic2, [HTF] pic3.
- Gaussian mixture model figures: [Bis] pic1, [Bis] pic2.
- Dendrogram figures: [HTF] pic1, [HTF] pic2.
- Jain, A., Murty, M. and Flynn, P. Data Clustering: A Review. ACM Computing Surveys 34(3), 1999.
- Xu, R. and Wunsch II, D. Survey of Clustering Algorithms. IEEE Transactions on Neural Networks 16(3), 2005.
- A nice applet illustrating Gaussian mixture models and the EM algorithm.
- A nice web page on Cluster analysis, with some examples using hierarchical clustering/dendrograms.
- Nonparametric Methods (Ch. 8) - Slides
- Kernel density estimation figures: [Bis] pic1, [HTF] pic2, [HTF] pic3.
- k-Nearest Neighbour figures: [Bis] pic1 (density estimation), [HTF] pic2 (classification), [HTF] pic3.
- Multilayer Perceptrons (Ch. 11) - Slides
- MLP figures: [Bis] pic1, [DHS] pic1, [DHS] pic2, [DHS] pic3, [HTF] pic2, [HTF] pic3.
- and some additional notes.
- Combining Multiple Learners (Ch. 17) - Slides
- Design and Analysis of Machine Learning Experiments (Ch. 19) - Slides
- Graphical Models (Ch. 16) - Slides
- Probabilistic Graphical Models: link to book website (Chapter 1 can be downloaded and is required reading).
- SpamBayes - popular spam filter based on Naive Bayes models.
- An article on Bayesian networks in the Microsoft Assistant (link).
- A truly awesome list of links to Bayesian net resources (link). Look here for many real-world example applications of Bayesian nets, as well as information about advanced topics such as learning in Bayesian nets and inference in non-DAG graphical models.
- Linear Discrimination (Ch. 10) - Slides
- Kernel Machines (Ch. 13) - Slides
- A few supplementary slides (from lectures, summarizing motivation for logistic sigmoid, cross-entropy, etc.)
- A nice applet illustrating Support Vectors Machines for 2-D classification.
- Bayesian Estimation (Ch. 14) - Slides
- Additional slides (on Gaussian Processes) that I will use during the lecture.
- Hidden Markov Models (Ch. 13) - Slides
Textbooks
- Course text: Introduction to Machine Learning, second edition. Ethem Alpaydin, The MIT Press, February 2010. Book Website
- Reference texts:
- R. Duda, P. Hart and D. Stork. Pattern Classification, Second edition. Wiley, 2001.
- [Bis] Bishop, C. M. Pattern Recognition and Machine Learning. Springer, 2006.
- [HTF] T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. Springer. 2009. You can download the entire book for free!
- D. Hand, H. Mannila and P. Smyth, Principles of Data Mining, MIT Press, 2001.
- The text for the AI course (COMP3702) is a useful reference - Russell S. and Norvig P., Artificial Intelligence: A modern approach, 2nd ed., 2003. Prentice Hall.
- C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006. This one is also available to download - book website.
Pracs
- Prac 1 (Week 2): Introduction to Classification and Weka. Note: this would also be a good chance to have a look at Matlab if you are not familiar with it - working through some of the "Matlab Primer" document below would be a good start.
- Prac 2 (Week 3): Regression and Parametric Models
- Temperature dataset
- Iris dataset (in plain text format from UCI repository)
- Prac 3 (Week 4): Dimensionality Reduction using Principal Component Analysis
- Prac 4 (Week 5): Clustering
- Prac 5 (Week 6): Nonparametric techniques
- Prac 6 (Week 7): Single and
Multilayer Perceptrons and Trajan
- Sonar dataset (bdf format)
- Gorman, R. P.; Sejnowski, T. J.; Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets, Neural Networks, 1, 75-89, 1988 (PDF)
- Neural Computing notes 2
- Neural Computing notes 3
- Prac 7 (Week 10): Assessing algorithms, Bagging and Boosting
- (Use diabetes dataset from Prac 3).
- Sonar dataset
- Ionosphere dataset
- Glass dataset
- D. Opitz and R. Maclin. Popular ensemble methods: an empirical study. Journal of Artificial Intelligence Research v.11 (1999) pp.169-198. (PDF)
- Prac 8 (Week 11): Bayesian Networks
- Prac 9 (Week 12): Support Vector Machines
Assignments
The assignments will be comprised of some of the questions on the pracs. If you complete each prac, producing your assignment will be quite easy. (NB: in the assignment question a.b refers to question b from Prac a).
- Assignment 1: Questions 1.4, 1.5, 2.2, 2.3, 3.1, 3.2, 3.4. Due Wednesday, 4/4/12.
Please submit via the course Blackboard site. - Assignment 2: Questions 4.1, 4.2, 5.2, 5.3, 6.3, 6.4, 6.5, 6.7. Due 11:59pm Wednesday, 2/5/12.
Please submit via the course Blackboard site. - Assignment 3: Questions 7.5, 7.6, 8.4, 8.5, 9.3, 9.4. Due 11:59pm Friday, 1/6/12.
Please submit via the course Blackboard site.
Case Study
The case study is due 16/5/12.
Please submit via the course Blackboard site.
General Guidelines
The course profile gives a brief overview of what is expected for this assessment task. Generally, you will need to focus on a research paper and attempt to reproduce the experimental results presented in that paper. This may involve implementing one or more algorithms if the paper is proposing a new algorithm that we do not have an implementation of. Note that this task is intentionally flexible: you should expect to have to make assumptions and encounter problems as you work on it. It may not be possible to exactly or fully reproduce the entire results of a paper, in which case you may need to refocus your work or make modifications. How you deal with this process is part of the assessment!
Your case study should take the form of a brief report, describing what you have done and presenting your results.
Exemplars
Here are some examples of very good case studies from previous years (used with permission). Note that the papers used for these CANNOT be chosen for your case study.
- Elizabeth Alpert's Case Study. This was based on the paper:
- A. Tsymbal et al. Eigenvector-based feature extraction for classification. Proc. 15th Int. FLAIRS Conference on Artificial Intelligence, Pensacola, FL, USA, AAAI Press, pp.354-358, 2002.
- Hilton Bristow's Case Study. This was based on the paper:
- D. Pelleg and A. Moore. X-means: extending K-means with efficient estimation of the number of clusters. Proc. 17th Int. Conference on Machine Learning, pp.722-734, 2000.
Candidate Papers
- B. D. Ripley.
Neural networks and related methods for classification. Journal of the Royal
Statistical Society Series B 56(3): 409-456, 1994.
- This is a long paper by a very well-known Professor of Statistics from Oxford. The paper is an excellent resource for some of the material in the course (though detail is sometimes more than we need). One possibility for the case study would be to focus on the parametric classification techniques that he has used for the results presented in the paper.
- M. A. Hall and G. Holmes. Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data engineering 15(6): 1437-1447, 2003.
- B. Lerner et al. A comparative study of neural network based feature extraction paradigms. Pattern Recognition Letters 20:7-14, 1999.
- J. Kolen and J. Pollack. Backpropagation is sensitive to initial conditions. Available as a Technical Report and then as a paper in Advances in neural information processing systems 3, pp860-867, 1991.
- M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. Proceedings of the IEEE international conference on neural networks, pp.586-591, 1993.
- Gaétan Monari and Gérard Dreyfus. Local Overfitting Control via Leverages. Neural Computation 2002 14:6, 1481-1506. You would probably also want to look at the comments made on this paper and the authors response (see paper at the bottom of the webpage following the previous link).
- S. D. Bay. Nearest neighbor classification from multiple feature subsets. Intelligent Data Analysis 3(3):191-209, 1999.
- D. R. Wilson and T. R. Martinez. Reduction techniques for instance-based learning algorithms. Machine Learning 38(3):257-286, 2000.
- G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 28 July 2006.
- Available here. If you want to do this paper you might need to talk to me to discuss the scope of your experiments. But I wanted to suggest it because it is so interesting!
- R. Sitte and J. Sitte. Analysis of the predictive ability of time delay neural networks applied to the S&P 500 time series. IEEE Transactions on Systems, Man and Cybernetics C 30(4): 568-572, 2000.
- F. Camastra and A. Verri. A novel kernel method for clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(5), 2005.
- D. Martens et al. Classification with Ant Colony Optimization. IEEE Transactions on Evolutionary Computation 11(5): 651-665, 2007.
- P. Scheunders. A comparison of clustering algorithms applied to color image quantization. Pattern Recognition Letters 18: 1379-1384, 1997.
- D. Comaniciu and P. Meer. Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5): 603-619, 2002.
Assessment criteria for the case study is summarised here.
Exams
The 2006 - 2011 exams are available from the library web.
The 2005 exam is also available. Note however that the course content has changed extent, hence some of the 2005 exam is irrelevant for you. In particular, you should ignore questions: 3(a), most of(b), (d); 6. Some of Q1 is a little out of context also. Please ask the lecturer if you need more clarification about the 2005 exam questions.
Study Guide/notes
Note that we do NOT cover the following material in lectures (i.e. it is not examinable/assessable):- Chapter 7: 7.5, 7.6.
- Chapter 13: 13.4, 13.6-13.12.
- Chpater 14: 14.3 (though it may be useful background for Gaussian Processes).
- Chapter 15: 15.7-15.10.
- Chapter 16: 16.4-16.8.
- Chapter 17: 17.5, 17.8-17.11.
- Chapter 19: 19.8-19.13.
Reference Material
- Matrix Identities, by Sam Roweis (source)
- Introduction to Probability Models, by Em Prof Tom Downs.
- Weka software website.
- Weka Explorer User Guide
- Neural Computing notes 9
- Neural Computing notes 10
- UCI Machine Learning Repository
- Matlab primer (An Introduction to Matlab for Cognitive Programming) by Scott Bolland.
- Hidden Markov Model tutorial (from University of Leeds).
Last modified: 23/5/12.
