Speaker: Suresh Pokharel
Host: Dr Guido Zuccon

Seminar Type: PhD Thesis Review


The ability to rapidly identify similar patients based on their electronic health records (EHRs) is a fundamental task in several clinical informatics applications, including cohort selection, treatment recommendation, patient stratification, medical prognosis and personalized medicine, and it is key to unlocking the potential of medical analytics methods for healthcare intelligence.

An effective representation of EHR data is paramount for computational similarity methods. However, health data in EHRs present many challenges that complicate their representation, including their temporal aspects, their multivariate, heterogeneous and irregular nature, and data sparsity. In this research, we mainly focus on the multivariate and temporal nature of EHRs, which gives rise to many complex inherent relationships. Of critical importance is modelling (i) compound information -- multiple clinical events for a patient that occur at the same point in time (or within a short period), and (ii) clinical patterns -- frequent sequential patterns associated with specific sequences of clinical events. Modelling these relationships uncovers important information for similarity computation that previous studies have ignored.
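The abstract does not say how co-occurring events are grouped, so the following is an illustration only, not the Temporal Tree method. It assumes each event is a (timestamp in hours, clinical code) pair and treats events falling within a fixed time window of a group's first event as one unit of compound information. The function name, the event codes and the window size are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical event format: (timestamp_in_hours, clinical_code).
Event = Tuple[float, str]


def group_compound_events(events: List[Event], window: float) -> List[Tuple[str, ...]]:
    """Group events whose timestamps lie within `window` of the group's first event.

    Each returned tuple approximates one "compound" unit: several clinical
    events recorded at (roughly) the same point in time. The fixed window is
    an assumption made for this sketch.
    """
    groups: List[Tuple[str, ...]] = []
    current: List[str] = []
    start: Optional[float] = None
    for t, code in sorted(events):
        if start is None or t - start > window:
            # Close the previous group (deduplicated, sorted for determinism).
            if current:
                groups.append(tuple(sorted(set(current))))
            current = [code]
            start = t
        else:
            current.append(code)
    if current:
        groups.append(tuple(sorted(set(current))))
    return groups


events = [
    (0.0, "lab:lactate_high"),
    (0.5, "vital:hr_high"),
    (0.5, "vital:temp_high"),
    (6.0, "med:antibiotic"),
    (30.0, "proc:ventilation"),
]
print(group_compound_events(events, window=1.0))
```

With a one-hour window, the three events recorded in the first half hour form a single compound unit, while the later antibiotic and ventilation events each stand alone. A hierarchical model such as the one described in the abstract would presumably capture co-occurrence at several time granularities rather than one fixed window.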

To address these problems, we first propose a new method for general-purpose EHR representation called Temporal Tree: a temporal hierarchical model which, based on temporal co-occurrence, preserves the compound information found at different levels of the EHR. This representation is further augmented with the doc2vec embedding technique for computing patient similarity. Second, we use the compound information captured by Temporal Tree to generate discriminative features, designed specifically for the mortality prediction task. To test the impact of preserving temporal information, we also capture compound information in terms of patient situations (i.e., the patient's clinical condition at a particular point in time) and represent a patient as a sequence of patient situations. This is contrasted with the baseline approach of representing a patient by the associated sequence of clinical events (a bag-of-words-like representation). Third, along with Temporal Tree, we apply sequential pattern mining with a gap constraint to discover more complex clinical patterns. We then combine these clinical patterns with compound information to obtain an effective EHR representation for similarity computation.
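To make the gap constraint concrete, the following is a minimal brute-force sketch, not the mining algorithm used in the thesis. It assumes a patient is a list of patient situations (tuples of co-occurring event codes) and counts a pattern as present in a patient's record if its codes appear in order in situations separated by at most `max_gap` intervening situations. All function names and data are hypothetical; a practical miner would use an algorithm such as PrefixSpan or SPADE extended with gap constraints rather than enumerating every pair.

```python
from typing import Dict, List, Sequence, Tuple

# A patient situation: clinical event codes observed together at one time point.
Situation = Tuple[str, ...]


def occurs_with_gap(seq: Sequence[Situation], pattern: Sequence[str], max_gap: int) -> bool:
    """Return True if the codes in `pattern` appear in order in `seq`, with at
    most `max_gap` unmatched situations between consecutive matched ones."""

    def match_from(pos: int, k: int) -> bool:
        if k == len(pattern):
            return True
        # The first code may start anywhere; later codes must respect the gap limit.
        end = len(seq) if k == 0 else min(len(seq), pos + max_gap + 1)
        for i in range(pos, end):
            if pattern[k] in seq[i] and match_from(i + 1, k + 1):
                return True
        return False

    return match_from(0, 0)


def support(db: List[List[Situation]], pattern: Sequence[str], max_gap: int) -> float:
    """Fraction of patients whose situation sequence contains `pattern`."""
    hits = sum(occurs_with_gap(seq, pattern, max_gap) for seq in db)
    return hits / len(db)


def frequent_pairs(db: List[List[Situation]], max_gap: int,
                   min_support: float) -> Dict[Tuple[str, str], float]:
    """Brute-force search over all length-2 patterns (illustration only)."""
    codes = sorted({c for seq in db for s in seq for c in s})
    result: Dict[Tuple[str, str], float] = {}
    for a in codes:
        for b in codes:
            s = support(db, (a, b), max_gap)
            if s >= min_support:
                result[(a, b)] = s
    return result


patients: List[List[Situation]] = [
    [("hr_high", "lactate_high"), ("antibiotic",), ("ventilation",)],
    [("lactate_high",), ("fluids",), ("fluids",), ("antibiotic",)],
    [("antibiotic",), ("lactate_high",)],
]
print(support(patients, ("lactate_high", "antibiotic"), max_gap=1))
print(support(patients, ("lactate_high", "antibiotic"), max_gap=2))
```

With `max_gap=1`, only the first patient supports `("lactate_high", "antibiotic")`; raising the limit to 2 also admits the second patient, whose antibiotic event follows two intervening situations. Tightening the gap therefore favours patterns whose events occur close together in time.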

We empirically evaluate our proposed EHR data representations, along with several state-of-the-art benchmarks, on a real EHR dataset (MIMIC-III) across two task types in an Intensive Care Unit setting: (i) similar patient retrieval and (ii) sepsis and mortality prediction. The empirical results show that representing EHRs with Temporal Tree, and with Temporal Tree combined with sequential pattern mining, significantly improves patient representation and the computation of patient similarity. Effective EHR data representation and similarity computation would enable improvements in many advanced health informatics applications, such as personalized healthcare, patient stratification and clinical pathway analysis.


Suresh Pokharel received his B.Eng. from Pokhara University, Nepal, and his M.Eng. from the Asian Institute of Technology, Thailand, in 2005 and 2010, respectively. He is currently a PhD candidate under the supervision of Dr. Guido Zuccon, Prof. Xue Li and Dr. Yu Li. His research interests include data mining, health data representation learning and health data analytics.