Course Information
Lectures: Thu - 11:00 AM, Fri - 12:00 AM
Labs: Fri - 5:00 PM
Objectives
This course lays the foundation for students to build multimedia systems. Multimedia systems involve automated analysis and fusion of multiple types of data such as text, images, video, audio, and various sensors. The course covers state-of-the-art tools and techniques for multimedia content processing, compression, fusion, summarization, search and retrieval applicable to different areas such as social media, homeland surveillance and privacy. The objective of this course is to prepare students to develop systems using multi-source information commonly and readily available in the form of Big Data in Internet of Things and Smart Cities paradigms.
Outcomes
By taking this course, the students will be able to find answers to the following questions:
Prerequisite
CSL201 (Data Structures) for CSE B. Tech Students
Course Requirements
Students are required to attend two lectures per week. In addition, there will be weekly lab sessions. During lab sessions, the students are required to solve and implement programming assignments.
Grading Policy
There will be lab exercises, homework assignments, quizzes, a mid-semester exam, a final exam and a project. The tentative grade distribution is as follows:
Quizzes (top n-1): 10%
Lab Exercises (top n-1): 20%
Mid-semester exam: 20%
Final exam: 20%
Project: 30%
A student must score at least 33% marks to pass the course.
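As a rough illustration of how the tentative distribution above combines into a course total, here is a minimal Python sketch. It assumes that every component is scored out of 100 and that "top n-1" means the lowest score in that category is dropped and the rest are averaged. The function names and the sample marks are made up for illustration; the actual computation is at the instructor's discretion.

```python
# Tentative weights from the grading policy, in percent.
WEIGHTS = {"quizzes": 10, "labs": 20, "midsem": 20, "final": 20, "project": 30}
PASS_MARK = 33  # minimum total (in percent) needed to pass


def best_of(scores, drop=1):
    """Average the scores after dropping the `drop` lowest ones (assumed rule)."""
    if not scores:
        raise ValueError("at least one score is required")
    keep = max(len(scores) - drop, 1)
    kept = sorted(scores, reverse=True)[:keep]
    return sum(kept) / len(kept)


def course_total(quizzes, labs, midsem, final, project):
    """Weighted course total in percent; every input is a mark out of 100."""
    components = {
        "quizzes": best_of(quizzes),
        "labs": best_of(labs),
        "midsem": midsem,
        "final": final,
        "project": project,
    }
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS) / 100


# Hypothetical marks for one student.
total = course_total(
    quizzes=[60, 80, 40],    # top 2 of 3 -> average 70
    labs=[50, 70, 90, 30],   # top 3 of 4 -> average 70
    midsem=55,
    final=65,
    project=80,
)
print(total, total >= PASS_MARK)  # 69.0 True
```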
Attendance Requirement
There is no attendance requirement; however, students with more than 75% attendance would be considered punctual for future recommendations. During lectures:
Code of Ethics & Professional Responsibility
Discussions that help the student understand a concept or a problem are encouraged. However, each student must submit original work. Plagiarism or copying in any form will be met with strict disciplinary action. This includes copying from the Internet, textbooks and any other material for which you do not own the copyright. Copying or lending code (or part of the code) to others will be considered plagiarism too. If authorized by the instructor, code reuse is allowed with explicit reference to the source. Students who violate this policy will directly receive an F grade in the course. Remember: a partial submission can fetch you some points, but submitting others' work as your own can result in you failing the course. Please talk to the instructor if you have questions about this policy. All academic integrity issues will be handled in accordance with institute regulations.
Textbooks
Primary Textbook
There is no single textbook for the course. We will rely heavily on web sources for the content. A few possible reference books are given below:
Reference Books
- Fundamentals of Multimedia, Authors: Ze-Nian Li, Mark S. Drew, Jiangchuan Liu, Publisher: Springer, Year: 2014.
Language/Tools
Primarily Python
Teaching Assistant
Pratibha Kumari (2017csz0006@iitrpr.ac.in)
Contact Me
By appointment at Room No. 319, S. Ramanujan Block, Permanent Campus, IIT Ropar [offline] or mukesh@iitrpr.ac.in [online].
Tentative Topics
- Audio: Sampling, quantization, time-domain audio features (ZCR, Energy), frequency-domain audio features (MFCC, Spectral), windowing and spectrogram, pitch detection, speaker recognition (GMM and HMM), audio fingerprinting and alignment.
- Text: Bag-of-words (BoW), TF-IDF, Text clustering, Bottom-up and top-down clustering, n-grams, sentiment analysis.
- Image: Image representation, HoG, SIFT, SVM, ANN, CNN.
- Video: Motion vectors, Foreground detection using Adaptive Gaussian Mixture Model (AGMM), Object tracking using particle filters.
- Information Fusion/Case Studies
- Compression [if time permits]: MP3, MP4, JPEG, Text compression
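To give a flavour of the time-domain audio features listed above (ZCR and energy), here is a minimal pure-Python sketch. It is not course material: the function names, frame size and hop size are assumptions chosen for illustration, and the labs may well use a numerical library instead.

```python
import math


def frames(signal, size, hop):
    """Yield successive windows of `size` samples, advancing by `hop` samples."""
    for start in range(0, len(signal) - size + 1, hop):
        yield signal[start:start + size]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


# Hypothetical input: one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

for frame in frames(tone, size=400, hop=200):
    zcr = zero_crossing_rate(frame)
    energy = short_time_energy(frame)
    # A pure tone crosses zero about twice per cycle, so the ZCR should be
    # close to 2 * 440 / 8000; a noisy or unvoiced signal would be higher.
```

Features like these are typically computed per frame and then fed to a classifier, which is the general shape of the audio classification labs in the schedule.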
Quizzes
There will be 3 quizzes; the top 2 will count towards your grade.
Projects
Projects are to be done individually or in groups of at most two. A list of topics will be added soon. Project requirements:
- You should have at least one data fusion component in the project. You need to clearly justify and demonstrate the advantage of using fusion.
- The reports must be prepared in ACM Multimedia LaTeX format. Good quality English is expected in the report.
- The code should be submitted through a GitHub or Bitbucket repository. You can make the repository private and show it to me using your login. I will observe the activity on the repository (commits, etc.) to check progress.
- The dataset can be submitted on a pen drive or through Google Drive.
- You are free to use resources (code) available on the Internet with proper references. However, during evaluation you need to explicitly identify the parts that are your own work.
- There will be marks for creativity/novelty in the project.
- There will be weekly evaluations of the project after the mid-sem exam.
Lectures and Calendar
Lectures | Dates | Topics | Readings | Events |
---|---|---|---|---|
L1-2 | | Introduction, Python basics | | Lab1: Python basics |
L3-4 | | Machine Learning Basics: KNN, K-Means, Naive Bayes, SVM | | |
L5-6 | | Signals and Systems, Audio Basics, Time domain features | | Lab: Audio classification |
L7-8 | | Audio spectral features, MFCC, Artificial Neural Networks | | |
L9-10 | | Audio Spectrogram, Audio Alignment, and Fingerprinting | | Evaluation 1 |
L11-12 | | Speaker recognition using Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) | | Lab: Audio classification using ANN |
L13-14 | | Text representation, Bag of Words | | |
L15-16 | | Clustering, LBG, agglomerative clustering | | Lab: Document clustering |
L17-18 | | Sentiment Analysis, n-grams, topic modelling | | Lab evaluation 2 |
L19-20 | | Image basics, HoG, SIFT | | |
L21-22 | | Image analysis using Convolutional Neural Networks | | Lab: Classification using CNN |
L23-24 | | Video foreground detection using Adaptive Gaussian Mixture Model, Object Tracking using Particle Filters | | Lab evaluation 3 |
L25-26 | | Information fusion techniques | | Lab: BG/FG classification using AGMM |
L27-28 | | Data compression techniques: MP3, JPEG, MPEG | | Lab: Audio-visual fusion, Lab evaluation 4 |
*This is a tentative schedule and may change at the discretion of the instructor.