INTRODUCTION TO COMPUTATIONAL LINGUISTICS AND LABORATORY
Module B: NATURAL LANGUAGE ANALYTICAL TOOLS

Academic Year 2023/2024 - Teacher: MISAEL MONGIOVI'

Expected Learning Outcomes

According to the Dublin descriptors, at the end of the course students will demonstrate:

1)      Knowledge and comprehension: students will learn the basics of computational thinking and the main rules of structured programming in Python (first module). Students will also learn the principles of text processing in Python.

2)      Ability to apply knowledge and comprehension: students will acquire the ability to solve simple problems algorithmically from both a formal and a practical point of view, the latter by using Python. Students will be able to write simple Python programs.

3)      Autonomy of judgment: the subject matter will be examined with the help of case studies, in order to prompt the students’ personal intuitions.

4)      Communicative abilities: students will acquire the ability to express, analyze and discuss programming issues and their solutions appropriately with other people.

5)      Learning abilities: students will learn to apply computational thinking to a large family of problems and to use Python programming libraries by studying their documentation.

Course Structure

Teaching is delivered through lectures that present the course contents, including programming demonstrations. Concepts are applied using the Python language. In addition to the lecture hours, students will have the opportunity to consolidate their preparation in structured programming in Python. A learning platform will also be available for practice during study hours and for assessing what has been learned in class. The same platform is a useful tool for exam preparation.

Attendance of Lessons

Compulsory attendance.

Detailed Course Content

The course is divided into two main modules.

In the first module, students will learn the basics of computational thinking and of programming in Python, including the basic constructs, functions, recursion, files, and the main data structures available in Python.

The second module aims to provide essential tools for text processing and natural language processing, from text normalization up to the semantic representation of words and sentences and their use for classification. The program includes tokenization, stemming and lemmatization, Part-Of-Speech tagging and Named Entity Recognition, using the Python libraries NLTK and spaCy. It continues with accessing NLTK corpora, the WordNet dictionary and the datasets available on HuggingFace. The basics of data and matrix handling with pandas and numpy will be introduced. Students will also learn about the semantic representation of words through word embeddings, starting from static word embeddings and moving on to contextual embeddings and document embeddings. Finally, the basics of text classification will be introduced, together with classification metrics computed with the scikit-learn library.
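
As a rough illustration of the first steps listed above (text normalization and tokenization), the following sketch uses only the Python standard library rather than NLTK or spaCy, whose tokenizers handle many more cases. The function name and the regular expression are illustrative assumptions and are not taken from the course material.

```python
import re
from collections import Counter


def normalize_and_tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical helper)."""
    # Lowercasing is one simple form of text normalization.
    lowered = text.lower()
    # \w+ matches runs of letters, digits and underscores; punctuation is dropped.
    return re.findall(r"\w+", lowered)


# Tokenize a short example and count how often each token occurs.
tokens = normalize_and_tokenize("The cat sat. The CAT slept!")
counts = Counter(tokens)
print(tokens)
print(counts)
```

A library tokenizer would typically keep punctuation as separate tokens and treat contractions and abbreviations more carefully; this sketch only shows the general idea.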

Textbook Information

Module B.

Text:

- J. Perkins, Python 3 Text Processing with NLTK 3 Cookbook, Packt Publishing, 2014, pp. 1-228.

The text introduces the student to the essential techniques of text and natural language processing. The second part of the second module, relating to Natural Language Processing with spaCy, will be covered following lecture notes provided by the teacher (28 pages) and the official spaCy documentation available online (https://spacy.io/usage).

 

Please remember that, in compliance with art. 171 of Law no. 633 of 22 April 1941 and its amendments, it is illegal to copy entire books or journals; only 15% of their content may be copied.

For further information on sanctions and regulations concerning photocopying please refer to the regulations on copyright (Linee Guida sulla Gestione dei Diritti d’Autore) provided by AIDRO - Associazione Italiana per i Diritti di Riproduzione delle opere dell’ingegno (the Italian Association on Copyright).

All the books listed in the programs can be consulted in the Library.
