Each module is worth 3 ECTS credits. You choose a total of 10 modules (30 ECTS) from the following module categories:
- 12-15 ECTS in technical scientific modules (TSM)
TSM modules teach profile-specific specialist skills and supplement the decentralised specialisation modules.
- 9-12 ECTS in fundamental theoretical principles modules (FTP)
FTP modules deal with theoretical fundamentals such as higher mathematics, physics, information theory, chemistry, etc. They teach more detailed, abstract scientific knowledge and help you bridge the gap between abstraction and application that is so important for innovation.
- 6-9 ECTS in context modules (CM)
CM modules will impart additional skills in areas such as technology management, business administration, communication, project management, patent law, contract law, etc.
The module description (PDF download) contains the complete language information for each module, divided into categories.
This module introduces the main methods of text analysis using natural language processing (NLP) techniques, from a computer / data science perspective. The methods are introduced in relation to concrete applications, in order to extract meaningful, structured knowledge in several dimensions from large amounts of unstructured texts. The knowledge and applications are complementary to those of information retrieval, with several commonalities (e.g. document representation), and advanced IR topics will be included as well.
This module is divided into three parts, each of them starting with the description of one or more text analysis problems. Then, the main methods needed to address them are defined, emphasizing their generality and reusability. Finally, for each part, the methods are instantiated and combined to enable concrete applications.
The three main parts are organized by increased sophistication of the analysis of language in texts:
- Text analysis using bags-of-words (i.e. texts are considered as unordered collections of independent words)
- Text analysis using sequences of words
- Text analysis using sentence structure (i.e. considering also the dependencies between words)
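The difference between the first two levels can be illustrated with a minimal sketch. The example sentences and the helper names `bag_of_words` and `bigrams` are assumptions made for illustration, not course material. A bag-of-words representation cannot tell apart two sentences that use the same words in a different order, whereas a representation based on word sequences can.

```python
from collections import Counter

def bag_of_words(text):
    """Count word occurrences, discarding word order."""
    return Counter(text.lower().split())

def bigrams(text):
    """Return the list of adjacent word pairs, preserving local order."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

a = "the dog bites the man"
b = "the man bites the dog"

# Same words with the same counts, so the bags are identical.
same_bag = bag_of_words(a) == bag_of_words(b)
# The word pairs differ, so the sequence view separates the two sentences.
same_bigrams = bigrams(a) == bigrams(b)
print(same_bag, same_bigrams)
```

The third level, sentence structure, would additionally record which word depends on which, for instance that "dog" is the subject of "bites" in the first sentence.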
Prerequisites
- Mathematics: basic linear algebra (e.g. matrix multiplication), probability theory (e.g. Bayes' theorem)
- Statistics: basic statistics (e.g. mean, variance, hypothesis testing)
- Programming: good command of Python or another programming language (C++, Java, etc.)
- Machine learning: experimental framework (incl. data partitioning), simple classifiers (e.g. decision trees, Naive Bayes, SVMs), fundamentals of neural networks
Learning Objectives
- The students are able to categorize a text analysis problem and relate the type of analysis that is required and the features to be extracted to a range of known problems.
- The students are able to identify text processing methods to leverage for solving a new problem.
- The students are aware of a range of text processing tools and libraries and can adapt off-the-shelf systems to their needs.
- The students understand the role of data and evaluation metrics. Given a text analysis problem they are able to design comparative experiments to identify the most promising solution.
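As a hedged illustration of the last objective, the sketch below compares two hypothetical systems on the same labelled test set using precision, recall and F1. The gold labels, the system outputs and the function name `precision_recall_f1` are invented for this example. It shows that the ranking of two systems can depend on the chosen metric.

```python
def precision_recall_f1(gold, predicted, positive="pos"):
    """Compute precision, recall and F1 for one class of interest."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Invented labels: the same test set, annotated once and predicted twice.
gold     = ["pos", "pos", "pos", "neg", "neg", "neg"]
system_a = ["pos", "pos", "neg", "neg", "neg", "neg"]  # cautious system
system_b = ["pos", "pos", "pos", "pos", "pos", "neg"]  # eager system

p_a, r_a, f1_a = precision_recall_f1(gold, system_a)
p_b, r_b, f1_b = precision_recall_f1(gold, system_b)
print(f"A: P={p_a:.2f} R={r_a:.2f} F1={f1_a:.2f}")
print(f"B: P={p_b:.2f} R={r_b:.2f} F1={f1_b:.2f}")
```

On this toy data, system A has the higher precision and F1 while system B has the higher recall, so the "most promising" solution depends on which errors matter for the application.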
Contents of Module
Introduction [5%]: importance of text analysis; layers of language analysis; basic text processing tools and notions of deep learning; basic notions of information retrieval; data sources; evaluation methods; overview of the course.
Part A. Text analysis using bags-of-words [35%]
Motivating examples: text classification and sentiment analysis, need for word representations accounting for meaning and similarity, distributional semantics.
Methods for learning low-rank word representations from data, with illustration of the resulting vectors: topic models using LSA; word embeddings using feed-forward neural networks.
Applications: applying low-rank word representations to text classification, sentiment analysis, information retrieval and content-based text recommendation using bag-of-words models.
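The idea behind LSA can be sketched as a truncated singular value decomposition of a term-document count matrix. The four toy documents, the choice of k = 2 and the helper `cosine` are assumptions made for illustration. A realistic setup would typically use a weighting scheme such as TF-IDF and a dedicated library rather than raw counts and a dense SVD.

```python
import numpy as np

# Toy corpus (invented): two documents about pets, two about finance.
docs = ["cat dog", "cat dog pet", "stock market", "stock market trade"]
vocab = sorted({w for d in docs for w in d.split()})

# Term-document count matrix: one row per word, one column per document.
X = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Keep only the k largest singular values (low-rank approximation).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dimensional row per document
word_vectors = U[:, :k] * s[:k]             # one k-dimensional row per word

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_same_topic = cosine(doc_vectors[0], doc_vectors[1])
sim_other_topic = cosine(doc_vectors[0], doc_vectors[2])
print(round(sim_same_topic, 3), round(abs(sim_other_topic), 3))
```

In the reduced space, the two pet documents point in the same direction although one of them contains an extra word, while documents from different topics remain orthogonal. The resulting `doc_vectors` could then serve as features for a classifier or as a basis for similarity-based retrieval.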
Part B. Text analysis using sequences of words [25%]
Motivating examples: predicting the next word in a sequence, POS tagging, named entity detection, contextual word embeddings.
Methods: Hidden Markov models (HMMs), Conditional Random Fields (CRFs), n-grams, seq2seq neural networks, attention-only sequence-to-sequence models (Transformers).
Applications: POS tagging, NE recognition, language modeling, machine translation.
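As one possible illustration of sequence labeling with an HMM, the sketch below decodes the most probable tag sequence with the Viterbi algorithm. The tag set, the sentence and all probabilities are invented toy values, not estimated from data, and the function name `viterbi` is an assumption for this example. For longer sentences, an implementation would normally work with log-probabilities to avoid numerical underflow.

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for `words` under an HMM."""
    # Each column maps state -> (probability of the best path ending there, previous state).
    columns = [{s: (start_p[s] * emit_p[s].get(words[0], 0.0), None) for s in states}]
    for word in words[1:]:
        prev_col = columns[-1]
        column = {}
        for s in states:
            # Best predecessor for state s at this position.
            prob, prev = max((prev_col[p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s].get(word, 0.0), prev)
        columns.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy parameters, invented for illustration only.
states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET":  {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1,  "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5,  "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET":  {"the": 0.9,  "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.7,  "barks": 0.25},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.8},
}

tags = viterbi(["the", "dog", "barks"], states, start_p, trans_p, emit_p)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

The same dynamic-programming scheme applies to named entity recognition when the states are entity labels instead of parts of speech.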
Part C. Text analysis using sentence structure [25%]
Motivating example: natural language inference (reasoning over sentences).
Methods: parsing, semantic role labeling, named entity linking, relationship and fact extraction, neural network models of dialogue.
Applications: solving logical entailment with deep neural networks (DNNs); revisiting sentiment analysis with DNNs; question answering systems; automatic information extraction from texts (entities, relationships, facts, events).
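To make the link between sentence structure and fact extraction concrete, the following sketch extracts (subject, verb, object) triples from a dependency parse. The parse is written by hand for one example sentence, the tuple layout and the function name `extract_triples` are assumptions for this illustration, and the relation labels `nsubj` and `obj` follow the Universal Dependencies naming. In practice the parse would be produced by a parser, and many more constructions would need to be handled.

```python
# Hand-written dependency parse of "Curie discovered polonium".
# Each token is (index, word, head index, relation); head index 0 marks the root.
parse = [
    (1, "Curie", 2, "nsubj"),
    (2, "discovered", 0, "root"),
    (3, "polonium", 2, "obj"),
]

def extract_triples(parse):
    """Return (subject, verb, object) triples for heads that have both arguments."""
    words = {idx: word for idx, word, _, _ in parse}
    subjects, objects = {}, {}
    for _, word, head, rel in parse:
        if rel == "nsubj":
            subjects[head] = word
        elif rel == "obj":
            objects[head] = word
    return [(subjects[h], words[h], objects[h]) for h in subjects if h in objects]

triples = extract_triples(parse)
print(triples)  # [('Curie', 'discovered', 'polonium')]
```

A bag-of-words or purely sequential model has no direct access to these head-dependent links, which is why this kind of extraction belongs to the structure-based part of the module.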
Part D. Special chapters [10%]
Perspectives on other text analysis tasks, on multilingual issues, question answering and dialogue, information retrieval and recommendation.
Teaching and Learning Methods
Classroom teaching; programming exercises
Literature
Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2nd edition, Prentice-Hall, 2008 / 3rd edition draft, online, 2021.
Introduction to Information Retrieval, Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, 2008.
Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool, 2017.
Supplemental material (articles) will be indicated for each lesson.