Computer Vision for Image and Video Data Analysis
About
Location:
Mannheim, B6 4-5
Course duration:
09:00-16:00 (CEST / UTC+2)
Fees:
Students: 550 €
Academics: 825 €
Commercial: 1650 €
Lecturer(s): Andreu Casas
Course description
Social scientists have long argued that images play a crucial role in shaping and reflecting political life. This role is heightened by the bombardment of images and videos that people experience today through many communication channels, from television to social media. Digitization has both increased the presence of images in daily life and made it easier for scholars to access and collect large quantities of pictures. However, using images collected in observational settings as data for social science inference is an arduous task. Fortunately, recent innovations in computer vision, the subfield of computer science concerned with automated image analysis, can reduce the costs of using images and videos as data.
In this course, we will build the theoretical and methodological expertise needed to apply machine learning methods to social science questions. The course combines three kinds of sessions. In theoretical sessions, we will discuss research that uses computer vision methods to study politics, communication science, and related fields. In methodological sessions, we will cover in detail the key advances needed to fully understand state-of-the-art computer vision methods (deep learning, neural networks, convolutional neural networks, multimodal models, visual language models, etc.). In practical sessions, we will go over several Python tutorials implementing different computer vision techniques: image processing (e.g. splitting videos into analytical frames), object and face detection, supervised and unsupervised image classification, facial trait analysis, multimodal modeling, and visual language models. In addition, a session on cloud computing will give participants an overview of the options available if they need to train and deploy computer vision models on large amounts of data, along with concrete examples of how to use particular cloud computing services.
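To give a flavor of the image processing step mentioned above (splitting videos into analytical frames), the following is a minimal sketch and not the course's own tutorial code. It only computes which frame indices to keep when sampling roughly one frame every few seconds; the function name `sample_frame_indices` and its parameters are hypothetical and chosen for illustration. Reading the actual frames would require a video library, which is left out here.

```python
def sample_frame_indices(total_frames: int, fps: float,
                         every_seconds: float = 1.0) -> list[int]:
    """Return the indices of frames to keep, roughly one every `every_seconds`.

    Hypothetical helper for illustration: `total_frames` and `fps` would
    normally come from the video's metadata.
    """
    if total_frames < 0 or fps <= 0 or every_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and every_seconds must be > 0")
    # Number of frames to skip between two sampled frames (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 frames per second, sampled every 2 seconds.
indices = sample_frame_indices(total_frames=300, fps=30.0, every_seconds=2.0)
print(indices)
```

Sampling at a fixed interval is only one possible design choice; depending on the research question, scene-change detection or denser sampling may be more appropriate.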
Participants with basic programming skills/experience in Python and some machine learning background will get the most out of the course. That said, we will take the time to briefly review the key machine learning concepts needed to implement the computer vision methods taught in this course, and participants will be provided with clear and easy-to-follow sample code for each of the practical tutorials. In the cloud computing session, we will also use a minimal amount of bash (terminal commands), for which no prior knowledge is required. By the end of the course, participants will have a good understanding of the kinds of research questions that can be answered using computer vision methods, as well as of several techniques and how to apply them in their own research.
For additional details on the course and a day-to-day schedule, please download the full-length syllabus.
Organizational Structure of the Course
The course will be organized around three different types of sessions:
- Lectures in which the lecturer will present relevant literature, theory, concepts, and methods; and discuss them with the participants.
- Tutorials in which the lecturer will provide, run, and discuss sample code designed to implement different computer vision techniques. Participants will also run the code on their own and can ask as many clarifying questions as needed.
- Consulting sessions in which participants will work on implementing the learned techniques on their own, with the support of the lecturer. During these sessions, participants can bring their own data (or work on sample data provided by the lecturer) and ask questions about how to adapt the sample code for their own project, or what additional computer vision methods can help them answer their substantive questions of interest.
Target group
You will find the course useful if:
- you are a PhD student, early-career scholar, or industry professional, or are generally interested in using computational methods to automatically analyze large quantities of image/video (or multimodal, e.g. text + image) data.
Learning objectives
By the end of the course, you will:
- have a good overview of the existing images-as-data literature in the social sciences
- have a good understanding of key deep learning concepts relevant for the implementation of computer vision methods
- have a good understanding of several computer vision techniques (object and face detection/recognition, image classification, facial trait analysis, etc.)
- have a good understanding of the many options and techniques available to store and process visual data
- be able to implement different computer vision techniques in Python
- be able to use/adapt different computer vision techniques for your own research projects
- be able to use/adapt different multimodal modeling approaches
Prerequisites
- basic programming skills/experience in Python (e.g. data loading, pandas data frames, loops and basic data operations)
- basic machine learning knowledge (e.g. the distinction between supervised and unsupervised learning, and familiarity with the machine learning training process, such as train/validation/test splits and cross-validation; these concepts will be reviewed in more detail during the course)
- a Google account: we will use Google Colab in the course tutorials.
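As a quick self-check for the machine learning prerequisite, the train/validation/test split mentioned above can be sketched in plain Python. This is an illustrative sketch only, not material from the course; the function name `train_val_test_split`, the default fractions, and the image file names are all hypothetical.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle `items` and split them into train, validation, and test lists.

    Hypothetical helper for illustration; the fractions are arbitrary defaults.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to less than 1")
    shuffled = list(items)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical image file names standing in for a labeled dataset.
images = [f"img_{i:03d}.jpg" for i in range(100)]
train, val, test = train_val_test_split(images)
print(len(train), len(val), len(test))
```

The model is fit on the training set, tuned on the validation set, and evaluated once on the held-out test set. If this logic is familiar, the level of machine learning background assumed here should be sufficient.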
For those who would like a primer or refresher in Python, we recommend taking the online workshop “Introduction to Python” (25-28 August) and/or the online blended learning course “Introduction to Computational Social Science with Python” (01-05 September).
Software and Hardware Requirements
You should bring your own laptop for use in the course.
The course will use Google Colab, so you need a Google account. There is no need to install Python or any Python package locally.