CS507: Multimedia Systems
Semester I, 2021-22

 


Course Information

Lectures: Thu - 11:00 AM, Fri - 12:00 AM
Labs: Fri - 5:00 PM

Objectives

This course lays the foundation for students to build multimedia systems. Multimedia systems involve automated analysis and fusion of multiple types of data such as text, images, video, audio, and various sensors. The course covers state-of-the-art tools and techniques for multimedia content processing, compression, fusion, summarization, search and retrieval applicable to different areas such as social media, homeland surveillance and privacy. The objective of this course is to prepare students to develop systems using multi-source information commonly and readily available in the form of Big Data in Internet of Things and Smart Cities paradigms.

Outcomes

By taking this course, students will be able to answer the following questions:
  • How to capture, analyse, and compress multimedia (text, audio, image, and video) data?
  • How to fuse multimedia data to build multimedia systems?

    Prerequisite

    CSL201 (Data Structures) for CSE B. Tech Students

    Course Requirements

    Students are required to attend two lectures per week. In addition, there will be weekly lab sessions, during which students are required to solve and implement programming assignments.

    Grading Policy

    There will be lab exercises, homework assignments, quizzes, a mid-semester exam, a final exam, and a project. The tentative grade distribution is as follows:

    Quizzes (top n-1): 10%
    Lab Exercises (top n-1): 20%
    Mid-semester exam: 20%
    Final exam: 20%
    Project: 30%
    A student must score at least 33% marks to pass the course.
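The arithmetic behind the distribution above can be sketched as follows. This is an illustration only, not the official grading script: it assumes that "top n-1" means the single lowest score is dropped and the rest are averaged, and that every component is first normalised to a percentage. The names `best_of`, `WEIGHTS`, and `course_total` are hypothetical.

```python
# Weights taken from the tentative grade distribution (they sum to 100).
WEIGHTS = {"quizzes": 10, "labs": 20, "midsem": 20, "final": 20, "project": 30}
PASS_MARK = 33  # minimum total (in %) needed to pass


def best_of(scores, drop=1):
    """Average of the scores after dropping the `drop` lowest ones.

    Assumption: this is how "top n-1" is applied.
    """
    keep = max(len(scores) - drop, 1)
    kept = sorted(scores, reverse=True)[:keep]
    return sum(kept) / len(kept)


def course_total(quizzes, labs, midsem, final, project):
    """Weighted total out of 100; all inputs are percentages (0-100)."""
    parts = {
        "quizzes": best_of(quizzes),
        "labs": best_of(labs),
        "midsem": midsem,
        "final": final,
        "project": project,
    }
    return sum(WEIGHTS[name] * parts[name] / 100 for name in WEIGHTS)


# Made-up scores for illustration.
total = course_total(
    quizzes=[60, 80, 40],
    labs=[90, 70, 80, 50],
    midsem=50,
    final=60,
    project=75,
)
print(total, "pass" if total >= PASS_MARK else "fail")
```

With these made-up scores, the lowest quiz (40) and the lowest lab (50) are dropped before the weights are applied.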

    Attendance Requirement

    There is no attendance requirement; however, students with more than 75% attendance will be considered punctual for future recommendations. During lectures:
  • BE SHARP ON TIME
  • STAY THROUGH THE LECTURE (DON'T LEAVE IN-BETWEEN THE LECTURE)
  • It is advised to not indulge in any activity during the lecture that might disturb other students or the instructor.

    Code of Ethics & Professional Responsibility

    Discussions that help the student understand a concept or a problem are encouraged. However, each student must submit original work. Plagiarism or copying of any form will be dealt with through strict disciplinary action. This includes copying from the Internet, textbooks, and any other material for which you do not own the copyright. Copying code from others, or lending your code (or part of it) to others, will also be considered plagiarism. If authorized by the instructor, code reuse is allowed with explicit reference to the source. Students who violate this policy will directly receive an F grade in the course. Remember: a partial submission can fetch you some points, but submitting others' work as your own can result in you failing the course. Please talk to the instructor if you have questions about this policy. All academic integrity issues will be handled in accordance with institute regulations.

    Textbooks

    Primary Textbook

    There is no single textbook for the course. We will rely heavily on web sources for the content. A few possible reference books are given below:

    Reference Books

    1. Fundamentals of Multimedia, Authors: Li, Ze-Nian, Drew, Mark S., Liu, Jiangchuan, Publisher: Springer, Year: 2014.

    Language/Tools

    Primarily Python

    Teaching Assistant

    Pratibha Kumari (2017csz0006@iitrpr.ac.in)

    Contact Me

    By appointment at Room No. 319, S. Ramanujan Block, Permanent Campus, IIT Ropar [offline] or mukesh@iitrpr.ac.in [online].

    Tentative Topics

    • Audio: Sampling, quantization, time-domain audio features (ZCR, Energy), frequency-domain audio features (MFCC, Spectral), windowing and spectrogram, pitch detection, speaker recognition (GMM and HMM), audio fingerprinting and alignment.
    • Text: Bag-of-words (BoW), TF-IDF, Text clustering, Bottom-up and top-down clustering, n-grams, sentiment analysis.
    • Image: Image representation, HoG, SIFT, SVM, ANN, CNN.
    • Video: Motion vectors, Foreground detection using Adaptive Gaussian Mixture Model (AGMM), Object tracking using particle filters.
    • Information Fusion/Case Studies
    • Compression [if time permits]: MP3, MP4, JPEG, Text compression
    *The topics may not be covered in the same order.
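As a taste of the audio topics listed above, here is a minimal sketch of two time-domain features, zero-crossing rate (ZCR) and short-time energy, computed over frames of a signal. It is not course material: the helper names (`frames`, `zero_crossing_rate`, `short_time_energy`), the frame and hop sizes, and the synthetic 440 Hz test tone are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import math


def frames(signal, size, hop):
    """Split a list of samples into (possibly overlapping) frames."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:]))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean of the squared samples in the frame."""
    return sum(x * x for x in frame) / len(frame)


# Synthetic input: one second of a 440 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# 50 ms frames with a 25 ms hop (illustrative values).
for index, frame in enumerate(frames(tone, size=400, hop=200)[:3]):
    print(index, round(zero_crossing_rate(frame), 3), round(short_time_energy(frame), 3))
```

A pure tone crosses zero twice per cycle, so the ZCR here should come out near 2 × 440 / 8000 = 0.11, and the energy of a unit-amplitude sine is close to 0.5. Noisy or unvoiced audio typically shows a higher ZCR, which is why the two features are often used together.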

    Quizzes

    There will be 3 quizzes; the top 2 will count towards your grade.

    Projects

    Projects are to be done individually or in groups of at most two. A list of topics will be added soon. Project requirements:
    • You should have at least one data fusion component in the project. You need to clearly justify and demonstrate the advantage of using fusion.
    • The reports must be prepared in ACM Multimedia LaTeX format. Good quality English is expected in the report.
    • The code should be submitted through a GitHub or Bitbucket repository. You may use a private repository and show it to me from your own login. I will monitor repository activity (commits, etc.) to check progress.
    • The dataset can be submitted on a pen drive or through Google Drive.
    • You are free to use resources (code) available on the Internet with proper references. However, during evaluation you need to explicitly identify which parts are your own work.
    • There will be marks for creativity/novelty in the project.
    • There will be weekly evaluations of the project after the mid-sem exam.

    Lectures and Calendar

    Lectures | Dates | Topics | Readings | Events
    L1-2 | | Introduction, Python basics | | Lab1: Python basics
    L3-4 | | Machine Learning Basics: KNN, K-Means, Naive Bayes, SVM | |
    L5-6 | | Signals and Systems, Audio Basics, Time domain features | | Lab: Audio classification
    L7-8 | | Audio spectral features, MFCC, Artificial Neural Networks | |
    L9-10 | | Audio Spectrogram, Audio Alignment, and Fingerprinting | | Evaluation 1
    L11-12 | | Speaker recognition using Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM) | | Lab: Audio classification using ANN
    L13-14 | | Text representation, Bag of Words | |
    L15-16 | | Clustering, LBG, agglomerative clustering | | Lab: Document clustering
    L17-18 | | Sentiment Analysis, n-grams, topic modelling | | Lab evaluation 2
    L19-20 | | Image basics, HoG, SIFT | |
    L21-22 | | Image analysis using Convolutional Neural Networks | | Lab: Classification using CNN
    L23-24 | | Video foreground detection using Adaptive Gaussian Mixture Model, Object Tracking using Particle Filters | | Lab evaluation 3
    L25-26 | | Information fusion techniques | | Lab: BG/FG classification using AGMM
    L27-28 | | Data compression techniques: MP3, JPEG, MPEG | | Lab: Audio-visual fusion, Lab evaluation 4

    *This is a tentative schedule. The schedule can change according to the need at the discretion of the instructor.


    Lab Exercises

    There will be weekly lab sessions. Each weekly lab assignment will contain a practice component and a graded component. The TA will help with the practice component. The student has to complete the graded component independently and submit it within two days of the lab session. Any plagiarism incident will be reported to the academic section immediately. There will be four evaluations of 5 marks each, all conducted through viva. Each evaluation may cover one or two lab submissions.