MSE Master of Science in Engineering

The Swiss engineering master's degree


Each module is worth 3 ECTS credits. You choose a total of 10 modules (30 ECTS) from the following module categories:

  • 12-15 ECTS in technical scientific modules (TSM)
    TSM modules teach profile-specific specialist skills and supplement the decentralised specialisation modules.
  • 9-12 ECTS in fundamental theoretical principles modules (FTP)
    FTP modules deal with theoretical fundamentals such as higher mathematics, physics, information theory, chemistry, etc. They will teach more detailed, abstract scientific knowledge and help you to bridge the gap between abstraction and application that is so important for innovation.
  • 6-9 ECTS in context modules (CM)
    CM modules will impart additional skills in areas such as technology management, business administration, communication, project management, patent law, contract law, etc.

The module description (PDF download) lists the language used in each module for the following categories:

  • instruction
  • documentation
  • examination

Information Retrieval and Data Mining (TSM_InfData)

Prerequisites

  • Knowledge of relational databases
  • Basic knowledge of statistics
  • Solid grounding in object-oriented programming (Java)

Learning Objectives

  • The course provides an introduction to the field of information retrieval and the multidisciplinary field of data mining.
  • Students are familiar with the architecture of an information retrieval system.
  • They are familiar with IR models (Boolean and Vector) and the use of these models to determine the weight of indexing terms and to calculate the correspondence between documents and queries.
  • They understand the different measures for evaluating an information retrieval system and are able to apply the comparison algorithms and interpret their results.
  • They are familiar with the use of the Apache Lucene library for indexing and information retrieval according to the Boolean and vector model.
  • They are familiar with techniques for detecting similar documents using "Locality-Sensitive Hashing" algorithms.
  • Students understand the use of modern database technologies for the processing and management of large data collections.
  • Students receive an introduction to multidimensional databases, data warehousing models and OLAP techniques. They are familiar with new data structures (data types) and with non-relational alternatives to relational database management systems (RDBMS), and are able to determine which data types and database system are appropriate for the context and the type of data available.
  • They are familiar with data pre-processing techniques (the concept of data quality and methods for data cleaning, data integration, data reduction, data transformation and data discretization).
  • They are familiar with the main data mining tasks and the main associated methods: descriptive data analysis, market basket analysis (association rules), classification (decision trees), clustering (hierarchical and non-hierarchical), estimation, detection of outliers, etc.
  • They are able to reuse the knowledge acquired during this course in their own work environment and apply it to solve their specific problems.
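
As an illustration of the vector model named in the objectives above, the following is a minimal sketch of term weighting and document-query matching. It is written in plain Python rather than with Apache Lucene, and it assumes one common weighting variant (term frequency multiplied by log(N/df)); the variant taught in the module may differ. The function names build_index, weight, cosine and rank are hypothetical and chosen for this example only.

```python
import math
from collections import Counter


def weight(tokens, idf):
    """Turn a token list into a sparse {term: tf * idf} vector."""
    tf = Counter(tokens)
    # Terms unknown to the collection carry no weight and are skipped.
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def build_index(docs):
    """Compute idf values and one weighted vector per tokenised document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    return idf, [weight(doc, idf) for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (document index, score) pairs, best match first."""
    idf, vectors = build_index(docs)
    q = weight(query, idf)
    scores = [(i, cosine(q, vec)) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    ["apache", "lucene", "index"],
    ["data", "mining", "cluster"],
    ["lucene", "query", "index", "index"],
]
for doc_id, score in rank(["lucene", "index"], docs):
    print(doc_id, round(score, 3))
```

With this toy collection, the third document ranks first because it repeats a query term, and the second document scores zero because it shares no term with the query.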

Contents of Module

The module is divided into two parts, the first dedicated to information retrieval and the second to data mining:

  • Basic concepts of IR
  • Boolean retrieval model
  • Vector space model and efficient ranking
  • Query refinement
  • Evaluation of IR systems
  • The Lucene API for Information Retrieval and evaluation
  • Near duplicate detection
  • Introduction to Data Warehousing and OLAP
  • Data pre-processing
  • Introduction to Data Mining
  • Classification
  • Market basket analysis
  • Clustering
  • Estimation

Information Retrieval: 7 weeks
Data mining: 7 weeks
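
To give a flavour of the data mining part, the sketch below computes support and confidence for association rules, the two measures behind market basket analysis. It is a brute-force illustration in plain Python that only considers rules with a single item on each side; it is not the Apriori algorithm and not necessarily the method used in the module. The names support and pair_rules, and the sample baskets, are assumptions made for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for rules A -> B."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue  # the pair is not frequent enough
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x}, transactions)
            if confidence >= min_confidence:
                rules.append((x, y, s_ab, confidence))
    return rules


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for rule in pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    print(rule)
```

In the sample data, every basket that contains butter also contains bread, so the rule butter -> bread reaches a confidence of 1.0, while the reverse rule falls below the 0.8 threshold.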

Teaching and Learning Methods

Lectures, exercises, labs.

Literature

Optional literature suggestion (books):

  • DB: Lena Wiese: Advanced Data Management for SQL, NoSQL, Cloud and Distributed Databases. De Gruyter Textbook. 2015. ISBN 978-3-11-044140-6.
  • IR: R. Baeza-Yates, B. Ribeiro-Neto: Modern Information Retrieval. New York, 2011. ISBN 9780321416919.
  • IR: C.D. Manning, P. Raghavan, H. Schütze: Introduction to Information Retrieval. Cambridge UP, 2008. Covers classical and web information retrieval systems: algorithms, mathematical foundations and practical issues.
  • IR: B. Croft, D. Metzler, T. Strohman: Search Engines: Information Retrieval in Practice. Pearson Education, 2009.
