

Mini Projects & Paper Implementations

Audio-Visual Speech Separation using Deep Learning


Bachelor's Thesis Project, Prof. Arijit Sur (CSE) & Prof. Rohit Sinha (ECE), IIT Guwahati, July 2019 to June 2020

 

The aim of this thesis is to isolate the target speaker’s speech from a video containing multiple simultaneous speakers, using a multi-modal deep learning approach that exploits both audio and visual cues (lip motion) of the target speaker for speech separation.

We proposed a novel generative adversarial training architecture for audio-visual speech separation, called AVSS-GAN, for finer target-speaker separation. The architecture uses 1D convolutions on the raw speech signal and 3D convolutions to process the temporal lip motion. We optimize the Si-SNR loss for speech separation, with secondary enforcement from an LSGAN adversarial objective, and use techniques such as gradient penalty and spectral normalization to stabilize training.
Our approach achieved state-of-the-art performance, with an Si-SNR of 12.1 dB and 8.95 dB in the two-speaker and three-speaker settings, respectively, on the widely used LRS2 dataset.
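
For reference, the Si-SNR objective we optimize can be written down in a few lines. The sketch below is a generic NumPy implementation of the standard scale-invariant SNR formula (not the exact thesis code); training minimizes the negative of this value.

import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR (in dB) between an estimated and a reference waveform.

    Generic implementation of the standard formula; both inputs are
    1-D arrays of equal length.
    """
    # Remove DC offset so the measure is invariant to constant shifts.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target to get its "clean" component.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target

    # Ratio of target energy to residual-noise energy, in decibels.
    return 10 * np.log10((np.dot(s_target, s_target) + eps) /
                         (np.dot(e_noise, e_noise) + eps))

# Training maximizes SI-SNR, i.e. minimizes loss = -si_snr(estimate, reference).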

Single-Lead ECG Signal Acquisition and Arrhythmia Classification using Deep Learning


Prof. Samarendra Dandapat, IIT Guwahati, Jan to April 2019

The goal is to perform single-lead ECG signal acquisition using a custom-designed portable device whose detection, amplification, and filtering circuits produce a good-quality analog ECG signal from electrodes placed on the body. The acquired data is then classified into four classes, namely Sinus Rhythm, AF Arrhythmia, Other Arrhythmias, and Noise, using a 34-layer deep convolutional network trained on an open-source single-lead ECG dataset from PhysioNet.org, achieving an accuracy of about 90%.
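
As an illustration of the classifier's building block, the sketch below shows the kind of 1-D residual convolutional block a deep ECG network of this sort stacks (written in PyTorch); the channel count, kernel size, and dropout rate are assumptions, not the exact thesis configuration.

import torch
import torch.nn as nn

class ECGResBlock(nn.Module):
    """One 1-D residual convolutional block of the kind stacked repeatedly
    to build a ~34-layer ECG classifier (illustrative sizes only)."""
    def __init__(self, channels, kernel_size=15, dropout=0.2):
        super().__init__()
        pad = kernel_size // 2  # keep the temporal length unchanged
        self.body = nn.Sequential(
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(), nn.Dropout(dropout),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):
        # Skip connection keeps gradients flowing through a deep stack.
        return x + self.body(x)

# Example: a batch of 32 single-lead ECG windows with 64 feature channels.
x = torch.randn(32, 64, 1024)
print(ECGResBlock(64)(x).shape)  # torch.Size([32, 64, 1024])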

The Eye in the Sky - Satellite Image Classification


Built a deep learning pipeline for remote sensing classification by framing it as a semantic segmentation problem. Experimented with different semantic segmentation methods, namely UNet and PSPNet, and achieved an accuracy of about 85% using a modified UNet architecture and a custom encoding technique for the RGB ground-truth images (sketched below). Please refer to the GitHub repository for further details.
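
The encoding step mentioned above amounts to mapping each ground-truth colour to a class index before training. A minimal sketch follows, with a hypothetical colour palette; the actual palette and class list are defined in the repository.

import numpy as np

# Hypothetical colour -> class-index map; the real palette depends on the dataset.
COLOR_TO_CLASS = {
    (0, 255, 0): 0,     # e.g. vegetation
    (0, 0, 255): 1,     # e.g. water
    (255, 255, 0): 2,   # e.g. bare soil
    (255, 0, 255): 3,   # e.g. buildings
}

def encode_mask(rgb_mask, unknown=255):
    """Convert an (H, W, 3) RGB ground-truth image into an (H, W) class-index map."""
    h, w, _ = rgb_mask.shape
    out = np.full((h, w), unknown, dtype=np.uint8)
    for color, cls in COLOR_TO_CLASS.items():
        match = np.all(rgb_mask == np.array(color, dtype=rgb_mask.dtype), axis=-1)
        out[match] = cls
    return out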


Generation of Images through Human Speech using Generative Adversarial Networks

Personal project, Feb to August 2018

Converting human speech to images using sequential conditional generative adversarial networks, which at a higher level can generate high-quality images from a mere spoken description. This helps us convert our thoughts (in this case, expressed through speech) into images, with many applications in artistic creation, image editing, image enhancement, etc. This project will be continued as part of IIT Guwahati's artificial intelligence group, IITG.ai.
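
To make the idea concrete, the sketch below shows a minimal conditional GAN generator in PyTorch that upsamples a noise vector concatenated with a speech embedding into a 64x64 RGB image; the embedding size, layer widths, and output resolution are illustrative assumptions, not the project's exact architecture.

import torch
import torch.nn as nn

class SpeechConditionedGenerator(nn.Module):
    """Conditional GAN generator sketch: noise + speech embedding -> 64x64 RGB image."""
    def __init__(self, noise_dim=100, speech_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim + speech_dim, 256, 4, 1, 0),
            nn.BatchNorm2d(256), nn.ReLU(),                            # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(),                            # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(),                             # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1),
            nn.BatchNorm2d(32), nn.ReLU(),                             # 32x32
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),             # 64x64
        )

    def forward(self, noise, speech_embedding):
        # Condition the generator by concatenating the speech embedding with the noise.
        z = torch.cat([noise, speech_embedding], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(z)

# Example: 8 noise vectors and 8 speech embeddings -> 8 generated RGB images.
g = SpeechConditionedGenerator()
imgs = g(torch.randn(8, 100), torch.randn(8, 128))
print(imgs.shape)  # torch.Size([8, 3, 64, 64])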
