Each module is worth 3 ECTS. You choose a total of 10 modules/30 ECTS from the following module categories:
- 12-15 ECTS in technical-scientific modules (TSM)
TSM modules teach you profile-specific specialist skills and complement the decentralised specialisation modules.
- 9-12 ECTS in fundamental theoretical principles modules (FTP)
FTP modules cover theoretical foundations such as advanced mathematics, physics, information theory, chemistry, etc. They deepen your abstract, scientific knowledge and help you bridge the gap between abstraction and application that is important for innovation.
- 6-9 ECTS in context modules (CM)
CM modules teach you additional skills in areas such as technology management, business administration, communication, project management, patent law, contract law, etc.
The full module description (see: Download the full module description) gives the complete language information for each module, broken down by category.
This module introduces the main methods of text analysis using natural language processing (NLP) techniques, from a computer science and data science perspective. The methods are introduced in relation to concrete applications, with the aim of extracting meaningful, structured knowledge along several dimensions from large amounts of unstructured text. The knowledge and applications are complementary to those of information retrieval (IR), with several commonalities (e.g. document representation), and advanced IR topics will be included as well.
This module is divided into three parts, each of them starting with the description of one or more text analysis problems. Then, the main methods needed to address them are defined, emphasizing their generality and reusability. Finally, for each part, the methods are instantiated and combined to enable concrete applications.
The three parts are ordered by increasing sophistication of the analysis of language in texts:
- Text analysis using bags-of-words (i.e. texts are considered as sets of independent words)
- Text analysis using sequences of words
- Text analysis using sentence structure (i.e. considering also the dependencies between words)
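The difference between the first two levels can be illustrated with a minimal sketch. The two sentences and the helper names `bag_of_words` and `bigrams` are illustrative assumptions, not course material: a bag-of-words view cannot tell the two sentences apart, while a view based on word sequences can.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts, ignoring word order (hypothetical helper)."""
    return Counter(text.lower().split())

def bigrams(text):
    """Return adjacent word pairs, which preserve local word order."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))

# Toy sentences chosen for illustration: same words, opposite meaning.
a = "the dog bites the man"
b = "the man bites the dog"

same_bag = bag_of_words(a) == bag_of_words(b)   # order is lost
same_bigrams = bigrams(a) == bigrams(b)         # order is kept

print(same_bag, same_bigrams)  # prints: True False
```

Capturing who does what to whom in general requires the third level, where dependencies between words are analysed explicitly.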
- Mathematics: basic linear algebra (e.g. matrix multiplication), probability theory (e.g. Bayes' theorem)
- Statistics: basic statistics (e.g. mean, variance, hypothesis testing)
- Programming: good command of a structured programming language (e.g. Python, C++, Java)
- Machine learning: experimental framework, simple classifiers (e.g. decision trees, Naive Bayes, SVMs)
- The students are able to categorize a text analysis problem and to relate the required type of analysis and the features to be extracted to a range of known problems.
- The students are able to identify text processing methods to leverage for solving a new problem.
- The students are aware of text processing tools and can adapt off-the-shelf systems to their needs.
- The students understand the role of data and evaluation metrics. Given a text analysis problem, they are able to design comparative experiments to identify the most promising solution.
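As an illustration of the last objective, the following minimal sketch compares two hypothetical systems on the same gold labels using precision, recall and F1. The labels, the system outputs and the function name `precision_recall_f1` are invented for the example. A real experiment would use a held-out test set and typically a significance test as well.

```python
def precision_recall_f1(gold, predicted, positive="pos"):
    """Compute precision, recall and F1 for one class of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy data (assumed for illustration): six documents, two systems.
gold     = ["pos", "pos", "neg", "neg", "pos", "neg"]
system_a = ["pos", "neg", "neg", "neg", "pos", "neg"]  # cautious system
system_b = ["pos", "pos", "pos", "neg", "pos", "pos"]  # eager system

p_a, r_a, f1_a = precision_recall_f1(gold, system_a)
p_b, r_b, f1_b = precision_recall_f1(gold, system_b)

print(f"A: P={p_a:.2f} R={r_a:.2f} F1={f1_a:.2f}")  # A: P=1.00 R=0.67 F1=0.80
print(f"B: P={p_b:.2f} R={r_b:.2f} F1={f1_b:.2f}")  # B: P=0.60 R=1.00 F1=0.75
```

The two systems trade precision against recall in opposite directions, which is why a single accuracy figure is often not enough to choose between them.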
Introduction [5%]: importance of text analysis; layers of language analysis; basic text processing tools and notions of statistics; basic notions of information retrieval; data sources; evaluation methods; overview of the course.
Part A. Text analysis using bags-of-words [40%]
Motivating examples: text classification and sentiment analysis, need for word representations accounting for meaning and similarity, distributional semantics.
Methods for learning low-rank word representations from data with illustration of resulting vectors: topic models from LSA to LDA; word embeddings; word sense disambiguation (statistical vs. knowledge-based).
Applications: low-rank word representations applied to text classification, sentiment analysis, information retrieval and content-based text recommendation using bag-of-words models.
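For orientation, here is a minimal sketch of a plain bag-of-words retrieval baseline using TF-IDF weights and cosine similarity. This is not one of the low-rank methods named above: LSA, LDA or word embeddings would replace the sparse vectors below with dense, lower-dimensional ones. The documents, the query and all function names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    """Whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in tokenized]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy collection, invented for illustration.
docs = [
    "the film was a great success",
    "the match ended in a draw",
    "a great film with a great cast",
]
vectors, idf = tfidf_vectors(docs)

# Weight the query with the collection's IDF values; unseen terms get 0.
query_vec = {t: c * idf.get(t, 0.0)
             for t, c in Counter(tokenize("great film")).items()}

ranking = sorted(range(len(docs)),
                 key=lambda i: cosine(query_vec, vectors[i]),
                 reverse=True)
print(ranking)  # prints: [2, 0, 1]
```

A known limitation of this baseline is that a query such as "excellent movie" would match none of the documents, since no words are shared. Handling meaning and similarity between different words is the motivation for the representations covered in Part A.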
Part B. Text analysis using sequences of words [20%]
Motivating examples: predict the next word in a sequence, POS tagging, named entity detection.
Methods and their applications: collocation extraction with mutual information, POS tagging with HMMs, NE detection with CRFs, language modeling with n-grams and neural networks.
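As an example of the first method, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one common variant of mutual information for collocation extraction. The choice of PMI, the toy text, the `min_count` threshold and the function name `pmi_scores` are assumptions made for illustration.

```python
import math
from collections import Counter

def pmi_scores(tokens, min_count=2):
    """Score each adjacent word pair with PMI = log2(p(x,y) / (p(x) p(y))).

    Pairs seen fewer than `min_count` times are skipped, because PMI is
    unreliable for rare events.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Toy corpus, invented for illustration and already tokenized by spaces.
text = ("new york is a big city . she moved to new york last year . "
        "a new car is a big expense . the city is big")
tokens = text.split()

scores = pmi_scores(tokens)
best = max(scores, key=scores.get)
print(best)  # prints: ('new', 'york')
```

The pair ("new", "york") ranks above ("a", "big") and ("is", "a") even though all three occur twice, because "york" rarely appears without "new". The same unigram and bigram counts are also the starting point for n-gram language models.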
Part C. Text analysis using sentence structure [20%]
Motivating example: natural language inference (reasoning over sentences).
Methods: parsing, semantic role labeling, named entity linking, relationship and fact extraction, neural network models of sentence structure (e.g. CNNs or HANs).
Applications: solving logical entailment with deep neural networks (DNNs); revisiting sentiment analysis with DNNs; question answering systems; automatic information extraction from texts (entities, relationships, facts, events) and linking with ontologies (e.g. DBpedia).
Part D. Special chapters [15%]
Perspectives on other text analysis tasks, on multilingual issues, question answering and dialogue, information retrieval and recommendation.
Teaching and learning methods
Classroom teaching; programming exercises
Foundations of Statistical Natural Language Processing, Christopher Manning & Hinrich Schütze, MIT Press, 1999.
Speech and Language Processing, 2nd edition, Daniel Jurafsky and James H. Martin, Prentice-Hall, 2008.
Introduction to Information Retrieval, Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008.
Natural Language Processing with Python, Steven Bird, Ewan Klein and Edward Loper, O’Reilly, 2009.
Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool, 2017.
Supplemental material (articles) will be indicated for each lesson.