Scientific Coordination

Verena Kunz
E-Mail

Administrative Coordination

Noemi Hartung
E-Mail

Causal Machine Learning

About

Date:
22.09 - 26.09.2025

Location:
Online via Zoom

Course duration:

09:00-12:00 and 13:00-16:00 (CEST / UTC+2)

General Topics:

Computational Social Science, Data Analysis

Course Level:

Beginner, Advanced

Format:

Fall Seminar

Software used:

R

Duration:

1 week

Language:

English

Fees:

Students: 550 €

Academics: 825 €

Commercial: 1650 €

Keywords

Machine Learning, Causal Inference, High-Dimensional Data, Heterogeneous Treatment Effect Estimation, Policy Evaluation, R Software

Additional links

Terms and Conditions
FAQs

Lecturer(s): Marica Valente

About the lecturer - Marica Valente

Course description

Machine learning (ML) has revolutionized the way we analyze data, making it an essential tool for prediction in a wide range of applications, from forecasting economic trends to assessing environmental risks. However, while ML excels at predicting what is likely to happen, many key questions in science go beyond prediction and require understanding causal relationships, that is, answering "what if" questions about the effects of treatments and policies. This course is designed to equip you with the fundamental ML techniques for prediction and to show how they can be tailored to answer causal questions.

Starting from linear regression, the foundation of many ML models, the course will introduce you to high-dimensional predictive modeling and the challenges of applying these tools to causal inference. While standard ML techniques are optimized for minimizing prediction errors, they often fail when directly applied to causal questions. For instance, we might use ML to predict tomorrow's air pollution levels based on weather conditions, but a policymaker needs to know whether restricting traffic would actually reduce pollution, which is a fundamentally different problem requiring causal analysis.

This course will teach you how to adapt machine learning methods for causal inference, integrating modern ML algorithms with causal models from econometrics and statistical inference. Through hands-on tutorials in R, the primary software for implementing causal machine learning, you will work with real-world datasets to apply, compare, and critically evaluate these methods.

You will explore the differences between standard causal effect estimation techniques and causal ML approaches, gaining a clear understanding of what each method delivers and the contexts in which each is most effective. Additionally, you will learn to distinguish between predicting observable outcomes and estimating causal effects, developing the necessary skills to bridge the gap between conventional ML tools and rigorous causal analysis.

Beyond technical skills, the course will emphasize critical thinking and the ability to identify meaningful research questions. You will have the opportunity to present your research ideas and preliminary findings through optional oral presentations, fostering discussion and feedback.

By the end of the course, you will not only be proficient in machine learning tools for prediction but will also understand how to adapt them for rigorous causal analysis, a crucial skill for treatment effect evaluation and evidence-based policy decision-making.

For additional details on the course and a day-to-day schedule, please download the full-length syllabus.

Organizational Structure of the Course

The course consists of live online sessions every day, combining lectures on methods and applications with hands-on R tutorials. Lectures will be highly interactive, with dedicated Q&A sessions at the end of each section to engage participants. R tutorials will feature practical exercises using real-world datasets to reinforce key concepts. You will have the opportunity to present your ongoing research in the field during dedicated short presentation slots. Those interested in presenting are encouraged to submit a brief, informal summary (e.g., an abstract) of their research topic to marica.valente@uibk.ac.at.

Target group

This course is ideal for you if:

You are a researcher, student or practitioner interested in causal inference methods for evaluating treatment effects and policy interventions.
You want to estimate personalized treatment effects in the social sciences, such as assessing individualized causal effects of policies to optimize targeting.
You work with or plan to analyze high-dimensional datasets containing a large number of variables and/or observations.
You seek applications in the social sciences and economics, where data-driven insights can inform decision-making.
You have an interest in coding in R and applying machine learning methods for causal analysis.
You are not a social scientist but work, or plan to work, in a field where personalized treatment effect estimation is valuable, such as evaluating the heterogeneous impacts of medical treatments on health.

Learning objectives

By the end of the course, you will:

Understand the distinction between statistics, econometrics, and machine learning, and how these fields approach data analysis differently, particularly in high-dimensional settings.
Develop proficiency in machine learning methods for prediction, including non-parametric (CART, Random Forests) and parametric (LASSO) techniques, and apply them using R.
Gain a strong foundation in causal inference methods, learning when and how machine learning can be adapted for causal analysis, including Double Machine Learning and orthogonalization techniques.
Apply standard methods for causal effect estimation and causal machine learning using R, understanding what each approach delivers and when to use it.
Explore advanced causal inference methods in high-dimensional settings, such as synthetic controls and synthetic differences-in-differences, and understand their application in policy evaluation.
Learn how to estimate heterogeneous treatment effects, differentiating between Average Treatment Effects (ATE) and Conditional Average Treatment Effects (CATE), and implement causal trees and generalized random forests in R.
Improve your ability to critically assess and communicate empirical findings, through hands-on exercises, oral presentations, and discussions on the strengths and limitations of machine learning for causal inference.
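
To make the orthogonalization idea behind Double Machine Learning concrete, here is a minimal sketch in base R. It is not taken from the course materials. The data are simulated, the variable names (x, d, y, theta_hat) and the true effect of 1 are illustrative assumptions, and plain lm() stands in for the ML learners (e.g. LASSO or random forests) that would normally estimate the nuisance functions.

```r
# Partialling-out with 2-fold cross-fitting on simulated data (illustrative only).
set.seed(42)
n <- 2000
x <- matrix(rnorm(n * 5), n, 5)            # five covariates; only the first is a confounder
d <- 0.5 * x[, 1] + rnorm(n)               # treatment depends on the confounder
y <- 1.0 * d + 2 * x[, 1] + rnorm(n)       # outcome; the assumed true effect of d is 1
dat <- data.frame(y, d, x)                 # covariate columns are named X1, ..., X5

folds <- sample(rep(1:2, length.out = n))  # random split into two folds
res_y <- numeric(n)
res_d <- numeric(n)

for (k in 1:2) {
  train <- dat[folds != k, ]
  hold  <- dat[folds == k, ]
  # Nuisance models are fitted on the other fold; lm() is a stand-in for an ML learner.
  m_y <- lm(y ~ . - d, data = train)       # predicts y from the covariates only
  m_d <- lm(d ~ . - y, data = train)       # predicts d from the covariates only
  res_y[folds == k] <- hold$y - predict(m_y, newdata = hold)
  res_d[folds == k] <- hold$d - predict(m_d, newdata = hold)
}

# Regressing outcome residuals on treatment residuals gives the effect estimate.
theta_hat <- coef(lm(res_y ~ res_d))[["res_d"]]
# For comparison: the naive regression that ignores the confounder.
naive <- coef(lm(y ~ d, data = dat))[["d"]]
print(c(orthogonalized = theta_hat, naive = naive))
```

In this simulated setup the naive coefficient is biased upward because x1 drives both treatment and outcome, while the residual-on-residual estimate should land close to the assumed effect of 1.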

Prerequisites

You should have completed an undergraduate-level introduction to statistics or econometrics.
The course requires basic knowledge of linear regression (OLS).
No previous knowledge of machine learning is required.
Prior experience with R is not a prerequisite, but it is strongly recommended. Participants with prior experience in Stata or Python can use some of the resources below (see Recommended Literature to Look at in Advance) to ensure they have sufficient proficiency in R to follow the course. If you have little experience with R or want to refresh your skills, I recommend familiarizing yourself with the software using free online tools, e.g. https://www.datacamp.com/courses/free-introduction-to-r (sign up and start the free course on Introduction to R) or https://swirlstats.com/ (learn R, in R). You may also consider taking the online workshop “Introduction to R”, which takes place from 25 to 27 August.

Software Requirements

You should have R and RStudio installed on your machine, including the following packages:

rpart, rpart.plot, randomForest, caret, pdp, glmnet, hdm, weights, gplots, dplyr, plm, lmtest, Synth, SCtools, synthdid, tidyverse, grf, fixest, car, haven, spatstat
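
One possible way to check which of these packages are still missing is sketched below. This is a convenience script and not part of the official course instructions. The names required and missing_pkgs are illustrative. Some of the listed packages may not be available from CRAN for every R version, in which case the respective package documentation should describe an alternative installation route.

```r
# Packages listed in the course requirements.
required <- c(
  "rpart", "rpart.plot", "randomForest", "caret", "pdp", "glmnet", "hdm",
  "weights", "gplots", "dplyr", "plm", "lmtest", "Synth", "SCtools",
  "synthdid", "tidyverse", "grf", "fixest", "car", "haven", "spatstat"
)

# Compare against the packages already installed in the current library paths.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) == 0) {
  message("All required packages are installed.")
} else {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Install only in an interactive session (e.g. the RStudio console),
  # so that running the script non-interactively never triggers downloads.
  if (interactive()) install.packages(missing_pkgs)
}
```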

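We recommend using the latest R version 4.4.2 (2024) and RStudio version 2024.12.0.467. If you need to update R on Windows, you can run:

# The installr package targets Windows; on macOS or Linux, install the new R version from CRAN instead.
install.packages("installr")
library(installr)
updateR()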

Schedule

Note: The syllabus readings are not mandatory, but reviewing some before each session will be beneficial.

Day 1: Low- vs. High-Dimensional Problems

Differences between Statistics, Econometrics, and Machine Learning
Linear Regression (OLS), Assumptions, and Flexibility: What to do when OLS breaks down?
The Curse of Dimensionality: Lost in High-Dimensional Spaces
Non-parametric methods: CART (Classification and Regression Trees)
R Tutorial: Mortality Predictions with CART
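
As a rough preview of what fitting a regression tree looks like, here is a minimal sketch using rpart on simulated data. It does not use the course's mortality dataset. The variables age, smoker, and risk and the data-generating process are hypothetical and serve only to illustrate the workflow of fitting on a training set and evaluating on held-out data.

```r
library(rpart)

set.seed(1)
n <- 500
age    <- runif(n, 20, 90)
smoker <- rbinom(n, 1, 0.3)
# Hypothetical risk score: increases with age and smoking, plus noise.
risk <- 0.02 * age + 0.8 * smoker + rnorm(n, sd = 0.3)
dat  <- data.frame(risk, age, smoker)

train <- dat[1:400, ]
test  <- dat[401:500, ]

# Regression tree (method = "anova"); cp controls how aggressively the tree is pruned.
fit <- rpart(risk ~ age + smoker, data = train, method = "anova",
             control = rpart.control(cp = 0.01))

# Out-of-sample error of the tree versus always predicting the training mean.
pred      <- predict(fit, newdata = test)
rmse_tree <- sqrt(mean((test$risk - pred)^2))
rmse_mean <- sqrt(mean((test$risk - mean(train$risk))^2))
print(c(tree = rmse_tree, mean_only = rmse_mean))
```

Comparing the two error values shows how much predictive accuracy the tree gains over a constant prediction. Calling printcp(fit) displays the complexity table that is typically used to choose the pruning level.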