Research Applications

  • conEdison


    Current and recent Con Edison projects illustrate how the Center is applying machine learning to crucial challenges confronted by the public utility providing electricity and gas to the New York City metropolitan area.

  • Clinical Informatics

    Extract clinically useful medical knowledge from the large amounts of medical data stored in Electronic Health Records (EHR) to improve the diagnosis and prevention of diseases and conditions.
    Clinical Informatics Group (CING) at CCLS

    Our group is dedicated to developing machine learning algorithms that leverage the large volumes of medical data stored in Electronic Health Records. Our approach brings to bear new methods to derive accurate, multi-dimensional models from large collections of observational data.

    Our research team brings together machine learning and natural language processing experts, database architects, and programmers from the Center for Computational Learning Systems (CCLS) at Columbia University, along with clinicians from Columbia University Medical Center (CUMC).

    Project link:

  • Climate Informatics

    Forge collaborations between machine learning and climate science, in order to accelerate progress in answering pressing questions in climate science.

    The threat of climate change is one of the greatest challenges currently facing society.  Given the profound impact machine learning has made on the natural sciences to which it has been applied,

    Project Twiki link:

    For Spoken Dialogue Systems (SDS), investigate human strategies for handling system errors.

    Machines that speak with us (Spoken Dialogue Systems) rely disproportionately on accurate transcription of the speech signal into readable text. When the system has low confidence in the automatic speech recognition (ASR) of a caller's utterance, a typical dialogue strategy requires the system to repeat its best guess and ask for confirmation. This leads to unnatural interactions and dissatisfied callers. Our novel methodology, wizard ablation, collects simulated human-system dialogues that vary in controlled ways in order to investigate the problem-solving strategies people would use if their abilities and options were restricted to be more like a machine's. Our testbed application, the CheckItOut dialogue system, is modeled on a corpus of telephone transactions between patrons and librarians that we collected at New York City's Andrew Heiskell Braille & Talking Book Library. (Loqui, a Latin phrase meaning "I speak"; because the "I" in the case of an ablated wizard is neither the wizard nor the system, we like the alliterative allusion to Loki (lo-kee), the Norse god of mischief.)


  • An Advanced Learning Paradigm: Learning Using Hidden Information

    Develop algorithms in the SVM family that allow extra information to be used effectively during training, with the understanding that this extra information will not be available during actual operation.
    Example: using extra information, such as structural homologies between proteins, to train a system designed to predict structure from amino acid sequences.

  • An 'Early Warning' Device to Allow Epilepsy Patients to Live a More Normal Life

    To develop a wearable 'early warning' device for epilepsy patients using advanced machine learning technology
    Early Warning Device

    The goal of the proposed research is to develop a wearable "early warning" device attached to an implantable microelectrode array that will give otherwise untreatable epilepsy patients enough time to take medication or prepare for the seizure (e.g., get out of the pool, pull the car over to the side of the road, or get off a ladder or stairs). The device would use detector software based on advanced machine learning technology to detect an impending seizure. The learning system would be trained with data from the implanted

  • Online High Frequency Oscillation Detection

    To develop a combination of hardware and software that automatically detects High Frequency Oscillations (HFOs) in real time in a clinical setting

    High frequency oscillations (HFOs), or brief bursts in the high gamma band (80-500 Hz), have been studied as potential biomarkers of epileptic activity. Since the early 1990s, it has been recognized that increased high gamma power is present within the epileptogenic region at seizure onset in adults (Allen, Fish et al. 1992; Alarcon, Binnie et al. 1995) and children (Fisher, Webber et al. 1992; Traub, Whittington et al. 2001). Interictal fast ripples (Figure 1) have been detected almost exclusively in epileptogenic regions (Staba, Wilson et al. 2002; Jacobs, Levan et al.
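    To make the idea of detecting brief high-gamma bursts concrete, here is a minimal sketch of an energy-threshold detector. It is an illustration only, not the project's detector: it assumes the signal has already been band-pass filtered to 80-500 Hz, and the function name, window length, threshold multiplier `k`, and minimum duration are all hypothetical choices.

```python
import math


def detect_hfo_candidates(filtered, fs, window_ms=3.0, k=5.0, min_ms=6.0):
    """Return (start_s, end_s) spans where windowed RMS stays above threshold.

    Assumption: `filtered` is one channel, already band-pass filtered to
    80-500 Hz. `fs` is the sampling rate in Hz. All defaults are illustrative.
    """
    win = max(1, int(round(fs * window_ms / 1000.0)))
    if len(filtered) < win:
        return []

    # Sliding-window RMS: rms[i] covers samples i .. i + win - 1.
    squares = [x * x for x in filtered]
    running = sum(squares[:win])
    rms = [math.sqrt(running / win)]
    for i in range(win, len(squares)):
        running += squares[i] - squares[i - win]
        rms.append(math.sqrt(max(running, 0.0) / win))

    # Threshold is the mean RMS plus k standard deviations over the record.
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    threshold = mean + k * std

    # Keep only runs of supra-threshold windows that last long enough.
    min_len = int(round(fs * min_ms / 1000.0))
    events, start = [], None
    for i, r in enumerate(rms + [float("-inf")]):  # sentinel ends a trailing run
        if r > threshold and start is None:
            start = i
        elif r <= threshold and start is not None:
            if i - start >= min_len:
                events.append((start / fs, (i - 1 + win) / fs))
            start = None
    return events
```

    A real-time clinical system would also need streaming filtering, artifact rejection, and per-channel calibration, none of which this sketch attempts.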

  • Estimation of Mean Time Between Failures (MTBF) of Electrical Feeders and Related Components

    The project aims to estimate the time between (and to) failures of primary distribution feeders and their components (such as sections and joints).

    In the New York City power grid, electricity is transmitted via primary distribution feeders between the high-voltage transmission system and the household-voltage secondary system. These feeders are susceptible to different kinds of failures, such as emergency isolation caused by automatic substation relays (Open Autos), failures on test, problems noticed by maintenance crews, and scheduled work on different sections of the feeder.
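    As a point of reference, the simplest empirical MTBF for a single feeder is its observed time divided by the number of failures in that period. The sketch below shows only this baseline calculation; it is not the project's estimator, the function name and inputs are hypothetical, and it ignores repair downtime and the differences between failure types.

```python
from datetime import datetime


def mean_time_between_failures(failure_times, start, end):
    """Return the empirical MTBF in hours for one feeder observed from start to end.

    Assumption: `failure_times` is a list of datetimes for that feeder.
    Returns None when no failure falls inside the observation window,
    since the estimate is undefined in that case.
    """
    failures = [t for t in failure_times if start <= t <= end]
    if not failures:
        return None
    observed_hours = (end - start).total_seconds() / 3600.0
    return observed_hours / len(failures)


# Usage: three failures in a ten-day (240-hour) observation window.
start = datetime(2010, 1, 1)
end = datetime(2010, 1, 11)
outages = [datetime(2010, 1, 2), datetime(2010, 1, 5), datetime(2010, 1, 9)]
print(mean_time_between_failures(outages, start, end))  # 80.0
```

    Feeders with no recorded failure are the hard case for this baseline, which is one reason a learned model of time to failure is attractive.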

  • EEGMine: A Distributed Framework for Learning on iEEG Data

    This project aims to develop a distributed framework for data management and machine learning on iEEG data obtained from epilepsy patients.
    Distributed Data Mining (DDM) on iEEG data

    Sample of EEG Graph from a patient

    Project Twiki link:

  • CADIM: Columbia Arabic Dialect Modeling

    Arabic Dialect Modeling for Speech and Natural Language Processing