Drs. Dutta, Waltz, Schevon and Emerson win an NSF award to work on a Distributed Framework for Learning on EEG Data obtained from Epilepsy Patients

Project Name: EEGMine: A Distributed Framework for Learning on EEG Data obtained from Epilepsy Patients

PI/Co-PIs: Haimonti Dutta, David Waltz, Catherine A. Schevon, and Ronald Emerson
NSF Project ID: IIS-0916186
Award: $440,000 for 2 years


Project Description: The Center for Computational Learning Systems (CCLS) is collaborating with the Computational Neurophysiology Laboratory (CNL) in the Department of Neurology, Columbia University Medical Center (CUMC), to develop a distributed framework for data management and machine learning on intracranial EEG data obtained from patients with epilepsy.
 
Drs. Schevon and Emerson have initiated a trial of a dense, two-dimensional microelectrode array that can record over long periods at a sampling rate of up to 30 kHz per channel. To date, approximately 30 TB of data has been collected. The large volume of complex EEG data compels us to rethink how we deal with this “data avalanche”. Designing a data center for storage and analysis is particularly challenging because the traditional approach of storing data on a single server does not allow machine learning algorithms to run within a reasonable time. Further, because of the conditions under which the data is collected, noise of multiple types and sources is pervasive; the data must be extensively cleaned and potential seizure precursors carefully labeled. The project is investigating how to develop a cluster architecture (using Apache Hadoop) for the EEGMine Data Center that incorporates reliable storage and backup, build a library of machine learning algorithms (the EEGMine-ML library), and address the scalability of those algorithms, potentially leveraging the MapReduce programming paradigm.
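
As a rough illustration of the MapReduce idea, the following is a minimal single-machine Python sketch, not the project's code and not the Hadoop API. It assumes recordings arrive as (channel id, samples) chunks; the names map_chunk, reduce_partials and channel_stats are hypothetical. Each chunk is summarized locally (the map step), and the small partial summaries for the same channel are then merged (the reduce step) to yield per-channel statistics.

```python
from math import sqrt


def map_chunk(channel_id, samples):
    """Map step: summarize one chunk of one channel locally.

    Emits (key, partial) where partial = (count, sum, sum of squares).
    """
    n = len(samples)
    total = sum(samples)
    total_sq = sum(x * x for x in samples)
    return channel_id, (n, total, total_sq)


def reduce_partials(a, b):
    """Reduce step: merge two partial summaries for the same channel."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def channel_stats(chunks):
    """Run map and reduce over an iterable of (channel_id, samples) chunks.

    Hypothetical driver: on a real cluster the framework would schedule
    the map and reduce calls across machines instead of this loop.
    """
    merged = {}
    for channel_id, samples in chunks:
        if not samples:  # skip empty chunks to avoid dividing by zero later
            continue
        key, partial = map_chunk(channel_id, samples)
        if key in merged:
            merged[key] = reduce_partials(merged[key], partial)
        else:
            merged[key] = partial

    stats = {}
    for key, (n, total, total_sq) in merged.items():
        stats[key] = {
            "n": n,
            "mean": total / n,
            "rms": sqrt(total_sq / n),  # root-mean-square amplitude
        }
    return stats


if __name__ == "__main__":
    # Toy data standing in for EEG samples; values are made up.
    demo_chunks = [
        ("ch0", [1.0, -1.0]),
        ("ch1", [2.0, 2.0]),
        ("ch0", [3.0, -3.0]),
    ]
    for channel, summary in sorted(channel_stats(demo_chunks).items()):
        print(channel, summary)
```

Because each partial summary is just three numbers, chunks can be processed wherever they are stored and only the summaries need to travel over the network, which is the property that makes this pattern attractive for terabyte-scale recordings.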
 
This research will have an immediate impact on both epilepsy and computer science research. Because human-derived microelectrode EEG data is unique and valuable, enabling data sharing and long-distance collaboration would benefit the seizure prediction community. The most practical means of sifting through terabytes of complex EEG data is to combine distributed storage on a cluster with local processing that prepares the data and generates meta-data; this meta-data can then serve as input to machine learning algorithms, enabling identification of physiologically significant patterns. From an education perspective, the project will benefit the EWarn Research Group, which is part of CCLS and CUMC, by training its members in signal processing, machine learning, and the basics of EEG.