Each module is worth 3 ECTS credits. You choose a total of 10 modules (30 ECTS credits) from the following module categories:
- 12-15 ECTS in technical scientific modules (TSM)
TSM modules teach profile-specific specialist skills and supplement the decentralised specialisation modules.
- 9-12 ECTS in fundamental theoretical principles modules (FTP)
FTP modules deal with theoretical fundamentals such as higher mathematics, physics, information theory, chemistry, etc. They impart deeper, more abstract scientific knowledge and help you to bridge the gap between abstraction and application that is so important for innovation.
- 6-9 ECTS in context modules (CM)
CM modules impart additional skills in areas such as technology management, business administration, communication, project management, patent law, contract law, etc.
This module introduces the main methods of text analysis using natural language processing (NLP) techniques, from a computer science and data science perspective. The methods are introduced in relation to concrete applications, with the goal of extracting meaningful, structured knowledge along several dimensions from large amounts of unstructured text. The knowledge and applications are complementary to those of information retrieval (IR), with several commonalities (e.g. document representation), and advanced IR topics are included as well.
This module is divided into three parts, each of them starting with the description of one or more text analysis problems. Then, the main methods needed to address them are defined, emphasizing their generality and reusability. Finally, for each part, the methods are instantiated and combined to enable concrete applications.
The three parts are organized by increased sophistication of the analysis of language in texts:
- Text analysis using bags-of-words (i.e. texts are treated as multisets of independent words, ignoring word order)
- Text analysis using sequences of words
- Text analysis using sentence structure (i.e. considering also the dependencies between words)
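The contrast between the first two levels of analysis can be made concrete with a minimal Python sketch (a toy illustration, not course material): two sentences with opposite meanings become indistinguishable once word order is discarded.

```python
from collections import Counter

def bag_of_words(text):
    """Represent a text as word counts, discarding word order."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Both sentences collapse to the same bag-of-words representation,
# even though their meanings differ -- word order is lost.
print(a == b)  # True
```

Recovering the distinction requires the sequence- and structure-based analyses of the later parts.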
Entry requirements
- Mathematics: basic linear algebra (e.g. matrix multiplication), probability theory (e.g. Bayes' theorem)
- Statistics: basic descriptive and inferential statistics (e.g. mean, variance, hypothesis testing)
- Programming: good command of a structured programming language (e.g. Python, C++ or Java)
- Machine learning: experimental framework, simple classifiers (e.g. decision trees, Naive Bayes, SVMs)
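As a reminder of the level of machine-learning background assumed, the following is a minimal multinomial Naive Bayes text classifier with add-one smoothing, written from scratch in Python. The data and labels are toy examples invented for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train a multinomial Naive Bayes model.
    docs: list of (label, text) pairs."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)  # label -> per-word counts
    vocab = set()
    for label, text in docs:
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def classify(model, text):
    """Return the label maximizing log P(label) + sum log P(word|label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        lp = math.log(label_counts[label] / total_docs)  # log prior
        n = sum(word_counts[label].values())
        for w in text.lower().split():
            # add-one smoothing over the vocabulary
            lp += math.log((word_counts[label][w] + 1) / (n + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

docs = [("pos", "great film loved it"),
        ("pos", "great acting a great story"),
        ("neg", "boring film hated it"),
        ("neg", "dull and boring story")]
model = train_nb(docs)
print(classify(model, "a great story"))  # pos
```

Students comfortable reading and writing code at this level meet the programming and machine-learning prerequisites.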
Learning objectives
- The students are able to categorize a text analysis problem, relating the required type of analysis and the features to be extracted to a range of known problems.
- The students are able to identify text processing methods to leverage for solving a new problem.
- The students are aware of text processing tools and can adapt off-the-shelf systems to their needs.
- The students understand the role of data and evaluation metrics. Given a text analysis problem they are able to design comparative experiments to identify the most promising solution.
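The evaluation metrics mentioned in the last objective include, at a minimum, per-class precision, recall and F1. A self-contained sketch (toy labels, purely illustrative):

```python
def precision_recall_f1(gold, pred, positive="pos"):
    """Compute precision, recall and F1 for one class
    from two parallel lists of labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f = precision_recall_f1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Designing comparative experiments then amounts to computing such metrics for competing systems on the same held-out data.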
Contents of Module
Introduction [5%]: importance of text analysis; layers of language analysis; basic text processing tools and notions of statistics; basic notions of information retrieval; data sources; evaluation methods; overview of the course.
Part A. Text analysis using bags-of-words [40%]
Motivating examples: text classification and sentiment analysis, need for word representations accounting for meaning and similarity, distributional semantics.
Methods for learning low-rank word representations from data with illustration of resulting vectors: topic models from LSA to LDA; word embeddings; word sense disambiguation (statistical vs. knowledge-based).
Applications: applying low-rank word representations to text classification, sentiment analysis, information retrieval and content-based text recommendation using bag-of-words models.
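The distributional-semantics idea underlying all of these methods, that words occurring in similar contexts have similar meanings, can be shown without any low-rank factorization by comparing raw co-occurrence vectors. A toy sketch (invented four-sentence corpus, illustrative only; LSA, LDA and word embeddings then compress such vectors into low-rank representations):

```python
import math
from collections import Counter, defaultdict

corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "a dog and a cat play",
]

# Co-occurrence vectors: each word is represented by the counts of the
# other words appearing with it in the same sentence.
vectors = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for w in words:
        for c in words:
            if c != w:
                vectors[w][c] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

# "cat" and "dog" share many contexts, so they come out more similar
# to each other than "cat" is to "milk".
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["milk"]))
```

With a realistic corpus these vectors become huge and sparse, which is precisely what motivates the low-rank methods of Part A.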
Part B. Text analysis using sequences of words [20%]
Motivating examples: predict the next word in a sequence, POS tagging, named entity detection.
Methods and their applications: collocation extraction with mutual information, POS tagging with HMMs, NE detection with CRFs, language modeling with n-grams and neural networks.
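The motivating example of Part B, predicting the next word, can be sketched with the simplest of these methods, an unsmoothed bigram language model (toy corpus invented for illustration; the course covers higher-order n-grams, smoothing and neural language models):

```python
from collections import Counter, defaultdict

text = ("the cat sat on the mat . the dog sat on the rug . "
        "the cat ate the fish .")
tokens = text.split()

# Count bigrams: for each word, how often each word follows it.
bigrams = defaultdict(Counter)
for w1, w2 in zip(tokens, tokens[1:]):
    bigrams[w1][w2] += 1

def predict_next(word):
    """Return the most likely next word under the bigram model."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("sat"))  # on
print(predict_next("the"))  # cat
```

HMMs for POS tagging and CRFs for named entity detection generalize the same idea: the label of a token depends on its neighbours in the sequence, not just on the token itself.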
Part C. Text analysis using sentence structure [20%]
Motivating example: natural language inference (reasoning over sentences).
Methods: parsing, semantic role labeling, named entity linking, relationship and fact extraction, neural network models of sentence structure (e.g. CNNs or HANs).
Applications: solving logical entailment with deep neural networks, revisiting sentiment analysis with DNNs, question answering system; automatic information extraction from texts (entities, relationships, facts, events) and linking with ontologies (e.g. DBpedia).
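The target output of the fact-extraction applications can be previewed with a deliberately naive, pattern-based sketch (the pattern and sentences are invented for illustration; the actual systems studied in Part C rely on parsing and semantic role labeling rather than hand-written patterns):

```python
import re

# One hand-written copular pattern, standing in for a full extraction
# pipeline: it yields (subject, relation, object) triples of the kind
# that can be linked to an ontology such as DBpedia.
PATTERN = re.compile(r"(\w+) (?:was|is) the (\w+) of (\w+)")

def extract_triples(text):
    return [(m.group(1), m.group(2), m.group(3))
            for m in PATTERN.finditer(text)]

text = "Turing was the father of computing. Bern is the capital of Switzerland."
print(extract_triples(text))
# [('Turing', 'father', 'computing'), ('Bern', 'capital', 'Switzerland')]
```

Parsing and semantic role labeling make such extraction robust to the countless sentence structures a fixed pattern cannot anticipate.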
Part D. Special chapters [15%]
Perspectives on other text analysis tasks: multilingual issues, question answering and dialogue, information retrieval and recommendation.
Teaching and Learning Methods
Classroom teaching; programming exercises
Literature
Foundations of Statistical Natural Language Processing, Christopher Manning & Hinrich Schütze, MIT Press, 1999.
Speech and Language Processing, 2nd edition, Daniel Jurafsky and James H. Martin, Prentice-Hall, 2008.
Introduction to Information Retrieval, Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008.
Natural Language Processing with Python, Steven Bird, Ewan Klein and Edward Loper, O’Reilly, 2009.
Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool, 2017.
Supplemental material (articles) will be indicated for each lesson.