Correlation, causation & proxy variables

The field of automated decision systems is far-reaching and complex. Technical concepts are often difficult to understand for users and affected individuals, which makes such systems even more obscure than they already are. With our explanatory videos, we offer an easy introduction to important basic concepts. We hope to shed some light on the "black box" of algorithms and explain risks associated with automated decision-making in the context of human resources.

Although the videos are narrated in German, English subtitles are available. You can activate them by clicking the settings icon to the left of the YouTube logo.

Proxy variables

A proxy variable is a feature that stands in for a characteristic that is not recorded directly. Even when a system is supposed to ignore certain characteristics, such as applicants' gender, those characteristics can often still be inferred from proxy variables. This can lead to discrimination, e.g., in software used for the automated evaluation of job applications.
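To make the idea concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the six applicants, the `womens_club` feature standing in as a proxy, and the scoring rule are assumptions, not taken from any real application system. The scoring function never reads the `gender` field, yet its results still differ by gender because it reacts to the proxy.

```python
# Hypothetical toy data. "womens_club" is an assumed proxy feature;
# "gender" is stored only so the outcome can be audited afterwards.
applicants = [
    {"gender": "f", "womens_club": True,  "years_experience": 5},
    {"gender": "f", "womens_club": True,  "years_experience": 7},
    {"gender": "f", "womens_club": False, "years_experience": 6},
    {"gender": "m", "womens_club": False, "years_experience": 5},
    {"gender": "m", "womens_club": False, "years_experience": 7},
    {"gender": "m", "womens_club": False, "years_experience": 6},
]


def score(applicant):
    """Hypothetical scoring rule: never reads 'gender', but penalizes the proxy."""
    penalty = 2 if applicant["womens_club"] else 0
    return applicant["years_experience"] - penalty


def mean_score_by_gender(rows):
    """Average score per gender, used only to audit the scoring rule."""
    scores = {}
    for row in rows:
        scores.setdefault(row["gender"], []).append(score(row))
    return {gender: sum(vals) / len(vals) for gender, vals in scores.items()}


def proxy_accuracy(rows):
    """Share of applicants whose gender is guessed correctly from the proxy alone."""
    hits = sum(("f" if r["womens_club"] else "m") == r["gender"] for r in rows)
    return hits / len(rows)


if __name__ == "__main__":
    print(mean_score_by_gender(applicants))
    print(proxy_accuracy(applicants))
```

In this toy data both groups have identical work experience, so any gap between the average scores comes from the proxy feature alone. Removing the `gender` column from the input would therefore not remove the disadvantage.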

Correlation & causality

Correlation ≠ causality. We encounter correlations every day. For example, in summer both ice cream consumption and the number of deaths by drowning increase. It does not follow from this correlation that rising ice cream consumption causes drownings; a more plausible explanation is that both go up in warm weather. Whenever correlation and causality are equated in the interpretation of data, fallacies can quickly arise.
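The ice cream example can be sketched with a few lines of Python. The numbers are invented, and the sketch assumes that temperature is the common driver behind both quantities; the variable names and the noise patterns are hypothetical. It compares the correlation over all days with the correlation among days that share the same temperature.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented data: twelve days at three temperatures (assumption: temperature
# is the only thing that ice cream sales and drownings have in common).
temperature = [10] * 4 + [20] * 4 + [30] * 4
noise_a = [1, 1, -1, -1] * 3   # day-to-day variation in ice cream sales
noise_b = [1, -1, 1, -1] * 3   # day-to-day variation in drownings
ice_cream = [2 * t + a for t, a in zip(temperature, noise_a)]
drownings = [t // 10 + b for t, b in zip(temperature, noise_b)]


def correlation_at(temp):
    """Correlation between the two series on days with the given temperature."""
    days = [i for i, t in enumerate(temperature) if t == temp]
    return pearson([ice_cream[i] for i in days], [drownings[i] for i in days])


overall = pearson(ice_cream, drownings)
per_temperature = {t: correlation_at(t) for t in sorted(set(temperature))}

if __name__ == "__main__":
    print(overall)
    print(per_temperature)
```

Over all days the two series are clearly positively correlated, but once temperature is held fixed the correlation disappears in this constructed data set. Real data is rarely this clean, so a check like this can hint at a common cause but cannot prove or rule out causality on its own.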

Intelligibility of algorithmic systems

Software used in public administration or other socially sensitive areas should be subject to democratic control. Transparency alone, e.g. in the form of open source code, is not sufficient for this. What matters most is explaining the process: which data is processed, how, and by whom? This also plays an important role in learning systems, which must first be "fed" with training data.