MSE Master of Science in Engineering

The Swiss engineering master's degree


Each module is worth 3 ECTS credits. Students choose a total of 10 modules (30 ECTS credits) from the following categories:

  • 12-15 ECTS credits in technical-scientific modules (TSM)
    TSM modules convey profile-specific technical skills and complement the decentralized specialization modules.
  • 9-12 ECTS credits in extended theoretical foundations (FTP)
    FTP modules mainly cover theoretical foundations such as mathematics, physics, information theory and chemistry. They broaden the student's scientific competence and help build an important synergy between abstract concepts and application, which is fundamental for innovation.
  • 6-9 ECTS credits in context modules (CM)
    CM modules convey additional skills in areas such as technology management, business administration, communication, project management, patent law and contract law.

The module description (available as a PDF) gives the language information for each module, broken down into the following categories:

  • Teaching
  • Documentation
  • Examination

Analysis of Text Data (TSM_AnTeDe)

This module introduces the main methods of text analysis using natural language processing (NLP) techniques, from a computer science and data science perspective. The methods are introduced in relation to concrete applications, with the aim of extracting meaningful, structured knowledge along several dimensions from large amounts of unstructured text. The knowledge and applications are complementary to those of information retrieval (IR), with which they share several commonalities (e.g. document representation); advanced IR topics are included as well.

The module is divided into three parts, each of which starts with the description of one or more text analysis problems. The main methods needed to address these problems are then defined, with emphasis on their generality and reusability. Finally, in each part, the methods are instantiated and combined to enable concrete applications.

The three main parts are ordered by increasing sophistication in the analysis of language in texts:

  • Text analysis using bags-of-words (i.e. texts are treated as unordered collections of independent words)
  • Text analysis using sequences of words
  • Text analysis using sentence structure (i.e. considering also the dependencies between words)
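
To make the difference between the first two levels concrete, here is a minimal Python sketch (whitespace tokenization and lowercasing are simplifying assumptions, and the function names are illustrative): a bag-of-words representation cannot tell apart two sentences made of the same words, while even a short sequence view such as adjacent word pairs (bigrams) can.

```python
from collections import Counter


def bag_of_words(text):
    """Bag-of-words view: word counts only, word order is discarded."""
    return Counter(text.lower().split())


def word_bigrams(text):
    """Sequence view: adjacent word pairs keep local word order."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


a = "The dog bites the man"
b = "The man bites the dog"

print(bag_of_words(a) == bag_of_words(b))  # True: identical bags of words
print(word_bigrams(a) == word_bigrams(b))  # False: different word sequences
```

Capturing who did what to whom (the third level) additionally requires the dependencies between words, which neither view encodes explicitly.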

Prerequisites

  • Mathematics: basic linear algebra (e.g. matrix multiplication), probability theory (e.g. Bayes' theorem)
  • Statistics: basic descriptive and inferential statistics (e.g. mean, variance, hypothesis testing)
  • Programming: good command of Python or another programming language (C++, Java, etc.)
  • Machine learning: experimental framework (incl. data partitioning), simple classifiers (e.g. decision trees, Naive Bayes, SVMs), fundamentals of neural networks

Learning objectives

  • The students are able to categorize a text analysis problem and relate the type of analysis that is required and the features to be extracted to a range of known problems.
  • The students are able to identify which text processing methods to apply to a new problem.
  • The students are aware of a range of text processing tools and libraries and can adapt off-the-shelf systems to their needs.
  • The students understand the role of data and evaluation metrics. Given a text analysis problem they are able to design comparative experiments to identify the most promising solution.
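
As a small illustration of the role of evaluation metrics, the sketch below compares two hypothetical sentiment classifiers against the same gold-standard labels using per-class precision, recall and F1. The labels and system outputs are invented toy data; a real comparative experiment would use a held-out test set of realistic size and, ideally, a significance test.

```python
def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 of one class, from parallel lists of labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold labels and the outputs of two competing systems (hypothetical data)
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
system_a = ["pos", "neg", "neg", "neg", "pos", "pos"]
system_b = ["pos", "pos", "neg", "pos", "pos", "neg"]

for name, pred in [("A", system_a), ("B", system_b)]:
    p, r, f = precision_recall_f1(gold, pred, "pos")
    print(f"system {name}: P={p:.2f} R={r:.2f} F1={f:.2f}")
# system A: P=0.67 R=0.67 F1=0.67
# system B: P=0.75 R=1.00 F1=0.86
```

Reporting precision and recall alongside F1 shows why one system wins: here system B's advantage comes mainly from recall.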

Module contents

Introduction [5%]: importance of text analysis; layers of language analysis; basic text processing tools and notions of deep learning; basic notions of information retrieval; data sources; evaluation methods; overview of the course.

Part A. Text analysis using bags-of-words [35%]
Motivating examples: text classification and sentiment analysis, need for word representations accounting for meaning and similarity, distributional semantics.
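
A common bag-of-words baseline for text classification and sentiment analysis is multinomial Naive Bayes, which applies Bayes' theorem under the assumption that words are independent given the class. The minimal sketch below assumes add-one (Laplace) smoothing, whitespace tokenization and a four-document toy training set; it is an illustration, not the course's reference implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_lik[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_lik, vocab


def predict_nb(model, doc):
    """Return the class maximizing log P(c) + sum of log P(w | c) over known words."""
    log_prior, log_lik, vocab = model
    tokens = [t for t in doc.lower().split() if t in vocab]  # unseen words are skipped
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in tokens) for c in log_prior}
    return max(scores, key=scores.get)


docs = [
    "great movie loved it",
    "wonderful acting great plot",
    "boring movie hated it",
    "terrible plot boring acting",
]
labels = ["pos", "pos", "neg", "neg"]
model = train_nb(docs, labels)

print(predict_nb(model, "loved the great acting"))  # pos
print(predict_nb(model, "boring plot"))             # neg
```

Words never seen in training are simply ignored: in "superb acting", "superb" contributes nothing even though it means roughly the same as "great". This is the gap that word representations accounting for meaning and similarity are meant to close.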

Methods for learning low-rank word representations from data, with illustrations of the resulting vectors: topic models using latent semantic analysis (LSA); word embeddings using feed-forward neural networks.
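
LSA factorizes a term-document matrix with a truncated singular value decomposition (SVD) and represents each term by its row of U_k Σ_k. The dependency-free sketch below is a minimal illustration under several assumptions: it computes the top-k singular triplets by power iteration with deflation (swapped in for a library routine such as numpy.linalg.svd), uses raw counts instead of the tf-idf style weighting usually applied, and works on a hypothetical four-document corpus.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def truncated_svd(A, k, iters=300, seed=0):
    """Top-k singular triplets (sigma, u, v) via power iteration on R^T R with deflation."""
    rng = random.Random(seed)
    R = [row[:] for row in A]
    triplets = []
    for _ in range(k):
        Rt = transpose(R)
        v = [rng.random() for _ in range(len(A[0]))]
        for _ in range(iters):
            w = matvec(Rt, matvec(R, v))  # apply R^T R
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        u = matvec(R, v)
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0:
            break
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-1 component sigma * u v^T found above.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


terms = ["car", "automobile", "engine", "flower", "petal", "garden"]
# Term-document counts; columns are the toy documents
# d1 "car engine", d2 "automobile engine", d3 "flower petal", d4 "flower garden"
A = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

triplets = truncated_svd(A, k=2)
# LSA vector of term i: row i of U_k Sigma_k
word_vec = {t: [s * u[i] for s, u, _ in triplets] for i, t in enumerate(terms)}

print(cosine(A[0], A[1]))                                          # raw counts: 0.0
print(round(cosine(word_vec["car"], word_vec["automobile"]), 3))  # LSA: close to 1
```

"car" and "automobile" never occur in the same document, so their raw count vectors are orthogonal; in the rank-2 space they become nearly identical because both co-occur with "engine", while remaining unrelated to "flower".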

Applications: low-rank word representations applied to text classification, sentiment analysis, information retrieval and content-based text recommendation with bag-of-words models.

Part B. Text analysis using sequences of words [25%]
Motivating examples: predict the next word in a sequence, POS tagging, named entity detection, contextual word embeddings.
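
Predicting the next word can be sketched with a bigram model, the simplest n-gram model beyond unigrams. The minimal example below assumes a three-sentence toy corpus, whitespace tokenization and <s>/</s> boundary markers, and uses unsmoothed maximum-likelihood estimates.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word bigrams; <s> and </s> mark sentence boundaries."""
    counts = defaultdict(Counter)  # previous word -> Counter of next words
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood estimate P(next | prev) = c(prev, next) / c(prev, *)."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


corpus = ["the cat sat on the mat", "the cat ate the fish", "a dog sat on the rug"]
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(max(probs, key=probs.get))  # cat
print(probs["cat"])               # 0.4
```

Without smoothing, any bigram absent from the corpus gets probability zero, which is why practical n-gram language models add smoothing or backoff.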

Methods: Hidden Markov models (HMMs), Conditional Random Fields (CRFs), n-grams, seq2seq neural networks, attention-only sequence-to-sequence models (Transformers).
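
For HMM-based POS tagging, the most probable tag sequence is found with the Viterbi algorithm (dynamic programming over tag paths). The sketch below is minimal: the three-tag tagset and all probabilities are hand-set hypothetical values rather than estimates from a corpus, and it multiplies raw probabilities, whereas real taggers work in log space to avoid underflow.

```python
def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Most probable state (tag) sequence for `tokens` under an HMM."""
    # best[t][s]: probability of the best tag path for tokens[:t+1] ending in s
    best = [{s: start_p[s] * emit_p[s].get(tokens[0], 0.0) for s in states}]
    back = [{}]  # back[t][s]: predecessor state on that best path
    for t in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][r] * trans_p[r][s] * emit_p[s].get(tokens[t], 0.0), r)
                for r in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Start from the best final state and follow the back-pointers.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy, hand-set probabilities (hypothetical)
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.5, "cat": 0.3, "barks": 0.2},
    "VERB": {"barks": 0.5, "sees": 0.5},
}

print(viterbi("the dog barks".split(), states, start_p, trans_p, emit_p))
# ['DET', 'NOUN', 'VERB']
```

"barks" can be emitted by both NOUN and VERB; the high NOUN-to-VERB transition probability is what resolves the ambiguity.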

Applications: POS tagging, NE recognition, language modeling, machine translation.
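
The Transformers listed among the Part B methods are built on scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in which each position's output is a weighted average of value vectors. The sketch below is deliberately simplified: a single head, no masking, and self-attention with Q = K = V taken directly from three made-up token vectors, whereas real Transformers first apply learned projection matrices.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention softmax(Q K^T / sqrt(d_k)) V, one query row at a time."""
    d_k = len(K[0])
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per key, summing to 1
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output


# Self-attention over three toy 2-d token vectors (learned projections omitted)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(X, X, X)
for row in contextual:
    print([round(x, 3) for x in row])
```

Each output row mixes information from every token in the sequence, which is how attention yields the contextual word embeddings mentioned among the motivating examples.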

Part C. Text analysis using sentence structure [25%]
Motivating example: natural language inference (reasoning over sentences).

Methods: parsing, semantic role labeling, named entity linking, relationship and fact extraction, neural network models of dialogue.

Applications: solving logical entailment with deep neural networks (DNNs), revisiting sentiment analysis with DNNs, question answering systems, automatic information extraction from texts (entities, relationships, facts, events).
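
One simple way in which sentence structure supports relationship extraction is to read (subject, predicate, object) triples off dependency arcs. The sketch below assumes a parse is already available: the parse is hand-written (not produced by a parser), relation labels follow Universal Dependencies naming (nsubj, obj, det, root), and the Token type and extract_triples function are hypothetical. Only head words are extracted; real systems expand full noun phrases and handle passives, coordination and more.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the governing token (0 = root)
    deprel: str  # dependency relation to the head


def extract_triples(tokens):
    """Return (subject, predicate, object) triples from nsubj/obj dependency arcs."""
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)
    triples = []
    for pred in tokens:
        deps = children.get(pred.index, [])
        subjects = [d.form for d in deps if d.deprel == "nsubj"]
        objects = [d.form for d in deps if d.deprel == "obj"]
        triples.extend((s, pred.form, o) for s in subjects for o in objects)
    return triples


# Hand-written parse of "The team discovered a comet" (illustrative)
sentence = [
    Token(1, "The", 2, "det"),
    Token(2, "team", 3, "nsubj"),
    Token(3, "discovered", 0, "root"),
    Token(4, "a", 5, "det"),
    Token(5, "comet", 3, "obj"),
]
print(extract_triples(sentence))  # [('team', 'discovered', 'comet')]
```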

Part D. Special chapters [10%]
Perspectives on other text analysis tasks, on multilingual issues, question answering and dialogue, information retrieval and recommendation.

Teaching and learning methods

Classroom teaching; programming exercises

Bibliography

Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2nd edition, Prentice-Hall, 2008 / 3rd edition draft, online, 2021.

Introduction to Information Retrieval, Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008.

Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool, 2017.

Supplemental material (articles) will be indicated for each lesson.
