

Mini Projects & Paper Implementations

Audio-Visual Speech Separation using Deep Learning


Bachelor's Thesis Project, Prof. Arijit Sur (CSE) & Prof. Rohit Sinha (ECE), IIT Guwahati, July 2019 to June 2020

 

The aim of this thesis is to isolate the target speaker’s speech from a video containing multiple simultaneous speakers, using a multi-modal deep learning approach that exploits both audio and visual cues (lip motion) of the target speaker for speech separation.

We proposed a novel generative adversarial training architecture for audio-visual speech separation, called AVSS-GAN, for finer target-speaker separation. The architecture uses 1D convolutions on the raw speech signal and 3D convolutions to process the temporal lip motion. We optimize the Si-SNR loss for speech separation, with secondary enforcement from an LSGAN adversarial objective, and use techniques such as gradient penalty and spectral normalization to stabilize training.
Our approach achieved state-of-the-art performance, with an Si-SNR of 12.1 dB and 8.95 dB in the two-speaker and three-speaker settings, respectively, on the widely used LRS2 dataset.
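
For reference, the Si-SNR objective we optimize can be written down in a few lines. The sketch below is a generic NumPy implementation of the standard scale-invariant SNR formula (not the exact thesis code); training minimizes the negative of this value.

import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR (in dB) between an estimated and a reference waveform.

    Generic implementation of the standard formula; both inputs are
    1-D arrays of equal length.
    """
    # Remove DC offset so the measure is invariant to constant shifts.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get its "clean" component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target

    # Ratio of target energy to residual-noise energy, in decibels.
    return 10 * np.log10((np.dot(s_target, s_target) + eps) /
                         (np.dot(e_noise, e_noise) + eps))

# Training maximizes SI-SNR, i.e. minimizes loss = -si_snr(estimate, reference).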

Single-Lead ECG Signal Acquisition and Arrhythmia Classification using Deep Learning


Prof. Samarendra Dandapat, IIT Guwahati, Jan to April 2019

The goal is to perform single-lead ECG signal acquisition using a custom-designed portable device whose detection, amplification, and filtering circuits produce a good-quality analog ECG signal from electrodes placed on the body. The acquired data is then classified into four classes, namely Sinus Rhythm, AF Arrhythmia, Other Arrhythmias, and Noise, using a 34-layer deep convolutional network trained on an open-source single-lead ECG dataset from PhysioNet.org, achieving an accuracy of about 90%.
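
As an illustration of the classifier's building block, the sketch below shows the kind of 1-D residual convolutional block a deep ECG network of this sort stacks (written in PyTorch); the channel count, kernel size, and dropout rate are assumptions, not the exact thesis configuration.

import torch
import torch.nn as nn

class ECGResBlock(nn.Module):
    """One 1-D residual convolutional block of the kind stacked repeatedly
    to build a ~34-layer ECG classifier (illustrative sizes only)."""
    def __init__(self, channels, kernel_size=15, dropout=0.2):
        super().__init__()
        pad = kernel_size // 2  # keep the temporal length unchanged
        self.body = nn.Sequential(
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):
        # Skip connection keeps gradients flowing through a deep stack.
        return x + self.body(x)

# Example: a batch of 32 single-lead ECG windows with 64 feature channels.
x = torch.randn(32, 64, 1024)
print(ECGResBlock(64)(x).shape)  # torch.Size([32, 64, 1024])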

The Eye in the Sky - Satellite Image Classification


Built a deep learning pipeline for remote sensing classification by framing it as a semantic segmentation problem. Experimented with different semantic segmentation methods, namely UNet and PSPNet, and achieved an accuracy of about 85% using a modified UNet architecture and a custom encoding technique for the RGB ground-truth images (sketched below). Please refer to the GitHub repository for further details.
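
The encoding step mentioned above amounts to mapping each ground-truth colour to a class index before training. A minimal sketch follows, with a hypothetical colour palette; the actual palette and class list are defined in the repository.

import numpy as np

# Hypothetical colour -> class-index map; the real palette depends on the dataset.
COLOR_TO_CLASS = {
    (0, 255, 0): 0,     # e.g. vegetation
    (0, 0, 255): 1,     # e.g. water
    (255, 255, 0): 2,   # e.g. bare soil
    (255, 0, 255): 3,   # e.g. buildings
}

def encode_mask(rgb_mask, unknown=255):
    """Convert an (H, W, 3) RGB ground-truth image into an (H, W) class-index map."""
    h, w, _ = rgb_mask.shape
    out = np.full((h, w), unknown, dtype=np.uint8)
    for color, cls in COLOR_TO_CLASS.items():
        match = np.all(rgb_mask == np.array(color, dtype=rgb_mask.dtype), axis=-1)
        out[match] = cls
    return out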


Generation of Images through Human Speech using Generative Adversarial Networks

Personal project, Feb to August 2018

Converting human speech to images using sequential conditional generative adversarial networks, which at a higher level can generate high-quality images from a mere spoken description. This helps us convert our thoughts (in this case, expressed through speech) into images, with many applications in artistic creation, image editing, image enhancement, etc. This project will be continued as part of IIT Guwahati's artificial intelligence group, IITG.ai.
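
To make the idea concrete, the sketch below shows a minimal conditional GAN generator in PyTorch that upsamples a noise vector concatenated with a speech embedding into a 64x64 RGB image; the embedding size, layer widths, and output resolution are illustrative assumptions, not the project's exact architecture.

import torch
import torch.nn as nn

class SpeechConditionedGenerator(nn.Module):
    """Conditional GAN generator sketch: noise + speech embedding -> 64x64 RGB image."""
    def __init__(self, noise_dim=100, speech_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + speech_dim, 256, 4, 1, 0),
            nn.BatchNorm2d(256), nn.ReLU(),                            # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(),                            # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(),                             # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1),
            nn.BatchNorm2d(32), nn.ReLU(),                             # 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),             # 64x64
        )

    def forward(self, noise, speech_embedding):
        # Condition the generator by concatenating the speech embedding with the noise.
        z = torch.cat([noise, speech_embedding], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)

# Example: 8 noise vectors and 8 speech embeddings -> 8 generated RGB images.
g = SpeechConditionedGenerator()
imgs = g(torch.randn(8, 100), torch.randn(8, 128))
print(imgs.shape)  # torch.Size([8, 3, 64, 64])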
