EEGMine: A Distributed Framework for Learning on iEEG Data
The Center for Computational Learning Systems (CCLS) is collaborating with the Computational Neurophysiology Laboratory (CNL) in the Department of Neurology, Columbia University Medical Center (CUMC), to develop a distributed framework for data management and machine learning on intracranial EEG (iEEG) data obtained from patients with epilepsy.
Drs. Schevon and Emerson have initiated a trial of a dense, two-dimensional microelectrode array measuring 4 mm x 4 mm and containing a 10 x 10 grid of microelectrodes (Neuroport™ Neural Monitoring System, Cyberkinetics Neurotechnology Systems, Foxboro, MA), which can record over long periods at a sampling rate of up to 30 kHz per channel from Layers IV and V of primate neocortex. To date, approximately 30 TB of data has been collected. The large volume of complex EEG data compels us to rethink how we deal with this “data avalanche”.
Designing a data center for storage and analysis is particularly challenging because the traditional approach of storing data on a single server does not allow machine learning algorithms to run within a reasonable time. Further, because of the conditions under which the data is collected, noise of multiple types and from multiple sources is pervasive; the data must be extensively cleaned, and potential seizure precursors must be carefully labeled using direct visualization and automated detection methods focused on both long time periods and specific time points.
The project is investigating mechanisms to:
(1) Develop a cluster architecture (such as Apache Hadoop) for the EEGMine Data Center that incorporates reliable storage and backup of EEG data obtained from patients with surgically implanted electrodes.
(2) Develop a library of machine learning algorithms (the EEGMine-ML library), based on existing EEG measures that show promise for detecting seizure precursors in a wide range of epilepsy syndromes, that uses the parallelization available in the EEGMine Data Center to achieve highly efficient run-time performance.
(3) Address the scalability needs of the algorithms in the EEGMine-ML library, potentially leveraging the MapReduce programming paradigm.
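To make the MapReduce idea in the third aim concrete, the following is a minimal single-process sketch, not the project's implementation. It assumes each input record is a (channel, samples) chunk and uses line length (the sum of absolute differences between consecutive samples) purely as an illustrative EEG measure; the project description does not name the measures in EEGMine-ML. The function names `map_chunk`, `reduce_channel`, and `run_job` are hypothetical, and differences across chunk boundaries are ignored for simplicity.

```python
from collections import defaultdict
from functools import reduce


def map_chunk(record):
    """Map step: turn one (channel, samples) chunk into a keyed partial result."""
    channel, samples = record
    # Line length: sum of absolute differences between consecutive samples.
    line_length = sum(abs(b - a) for a, b in zip(samples, samples[1:]))
    return channel, (line_length, len(samples))


def reduce_channel(acc, value):
    """Reduce step: combine two partial (line_length, n_samples) results."""
    return acc[0] + value[0], acc[1] + value[1]


def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_chunk(record)
        grouped[key].append(value)
    return {key: reduce(reduce_channel, values) for key, values in grouped.items()}


# Hypothetical toy input: two chunks for channel 0, one chunk for channel 1.
records = [
    (0, [0.0, 1.0, 3.0]),
    (0, [3.0, 2.0]),
    (1, [5.0, 5.0, 4.0]),
]
totals = run_job(records)
print(totals)  # {0: (4.0, 5), 1: (1.0, 3)}
```

Because the map step depends only on its own chunk and the reduce step is associative, a framework such as Apache Hadoop could in principle run the same two functions across many nodes, each working on locally stored chunks.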
This research will have an immediate impact on both the epilepsy research community and computer science research. Because human-derived microelectrode EEG data is unique and valuable, the seizure prediction community would benefit from data sharing and long-distance collaboration among those working in this field. The most practical means of sifting through terabytes of complex EEG data is to combine distributed storage on a cluster with local processing that prepares the data and generates meta-data for use as input to machine learning algorithms. Identifying physiologically significant patterns in turn requires the development of novel distributed machine learning algorithms, which will accelerate the transformation of distributed data into knowledge and facilitate scientific and clinical understanding of the data. From an education perspective, the project will benefit the EWarn Research Group, which is part of CCLS and CUMC, by training its members in signal processing, machine learning, and the basics of EEG.
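As a rough illustration of the "local processing to generate meta-data" step described above, the sketch below summarizes one segment of a single channel into a small record that a downstream learning algorithm could consume. The function name `summarize_segment`, the chosen summary fields, and the default 30 kHz sampling rate are assumptions made for the example and are not taken from the project.

```python
import math


def summarize_segment(channel, start_sample, samples, sampling_rate_hz=30_000):
    """Compute simple summary meta-data for one segment of one channel."""
    if not samples:
        raise ValueError("segment must contain at least one sample")
    n = len(samples)
    return {
        "channel": channel,
        "start_s": start_sample / sampling_rate_hz,   # segment start time in seconds
        "duration_s": n / sampling_rate_hz,           # segment length in seconds
        "mean": sum(samples) / n,
        "rms": math.sqrt(sum(x * x for x in samples) / n),
        "peak_to_peak": max(samples) - min(samples),
    }


# Hypothetical segment: four samples from channel 7, starting at sample 60,000.
meta = summarize_segment(channel=7, start_sample=60_000, samples=[3.0, -4.0, 3.0, -4.0])
print(meta["start_s"], meta["mean"], meta["peak_to_peak"])  # 2.0 -0.5 7.0
```

Records like this are far smaller than the raw signal, so each node could compute them next to the data it stores and ship only the summaries to the learning stage.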
Project Website: http://www1.ccls.columbia.edu/~dutta/EEGMine
MathWorks, Inc. signs a pilot agreement with CCLS to test the scalability of the Distributed Computing Server on the Amazon cloud infrastructure
A team from CCLS (Haimonti Dutta, Hatim Diab and Manoj Pooleery) is collaborating with MathWorks, Inc. to test the scalability of MATLAB Parallel Computing Toolbox and Distributed Computing Server (MDCS) on the Amazon EC2 / S3 Cloud.
Drs. Dutta, Waltz, Schevon and Emerson win an NSF award to work on a distributed framework for learning on EEG data obtained from epilepsy patients
Project Name: EEGMine: A Distributed Framework for Learning on EEG Data obtained from Epilepsy Patients
PI/Co-PIs: Haimonti Dutta, David Waltz, Catherine A. Schevon and Ronald Emerson
NSF Project ID: IIS-0916186
Award: $440,000 for 2 years