DoWhy：因果估计器的解释器#

这是对 DoWhy 因果推断库中解释器使用方法的快速介绍。我们将加载一个样本数据集，使用不同的方法估计（预先指定）处理变量对（预先指定）结果变量的因果效应，并演示如何解释获得的结果。

首先，让我们添加所需的 Python 路径，以便找到 DoWhy 代码并加载所有必需的包。

[1]:

%load_ext autoreload
%autoreload 2

[2]:

import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel
import dowhy.datasets

现在，让我们加载一个数据集。为简单起见，我们模拟一个数据集，其中共同原因与处理之间、以及共同原因与结果之间存在线性关系。

Beta 是真实的因果效应。

[3]:

data = dowhy.datasets.linear_dataset(beta=1,
        num_common_causes=5,
        num_instruments = 2,
        num_treatments=1,
        num_discrete_common_causes=1,
        num_samples=10000,
        treatment_is_binary=True,
        outcome_is_binary=False)
df = data["df"]
print(df[df.v0==True].shape[0])
df

[3]:

	Z0	Z1	W0	W1	W2	W3	W4	v0	y
0	1.0	0.758179	0.223113	-0.583247	1.561522	0.016196	3	True	3.919456
1	1.0	0.336647	0.339881	0.042399	-1.543854	0.047453	1	True	0.311371
2	1.0	0.019540	-0.224140	-1.984048	0.460944	-0.704755	3	True	1.732886
3	1.0	0.811149	-1.666045	1.904262	0.062447	-0.203613	3	True	2.805750
4	1.0	0.816470	0.409841	-0.913059	-0.064024	0.073619	3	True	2.319611
...	...	...	...	...	...	...	...	...	...
9995	0.0	0.577807	-1.296264	-0.774313	-0.110763	-1.224282	2	True	0.668734
9996	0.0	0.981149	-2.850787	0.037785	-0.597924	-2.228819	1	True	-0.898895
9997	0.0	0.747377	-0.952558	0.711825	2.137963	-1.312254	0	True	2.795585
9998	0.0	0.828307	-0.555047	0.259252	0.169997	0.896667	0	True	1.032831
9999	0.0	0.064636	-0.465600	-1.135197	-0.625661	-0.010386	0	False	-1.554850

10000 行 × 9 列

请注意，我们正在使用 pandas dataframe 来加载数据。

识别因果可估计量#

我们现在以 GML 图格式输入因果图。

[4]:

# With graph
model=CausalModel(
        data = df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
        instruments=data["instrument_names"]
        )

[5]:

model.view_model()

../_images/example_notebooks_dowhy_interpreter_9_0.png

[6]:

from IPython.display import Image, display
display(Image(filename="causal_model.png"))

../_images/example_notebooks_dowhy_interpreter_10_0.png

我们得到了一个因果图。现在识别和估计已完成。

[7]:

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₀  Z₁]    ⎝d[Z₀  Z₁]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

方法 1：倾向得分分层#

我们将使用倾向得分对数据中的单元进行分层。

[8]:

causal_estimate_strat = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_stratification",
                                              target_units="att")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: att

## Estimate
Mean value: 1.002656737901175

Causal Estimate is 1.002656737901175

文本解释器#

文本解释器（用文字）描述了处理变量单位变化对结果变量的影响。

[9]:

# Textual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.002656737901175 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

可视化解释器#

可视化解释器绘制了基于倾向得分调整数据集前后，标准化平均差 (SMD) 的变化。SMD 的公式如下所示。

\(SMD = \frac{\bar X_{1} - \bar X_{2}}{\sqrt{(S_{1}^{2} + S_{2}^{2})/2}}\)

此处，\(\bar X_{1}\) 和 \(\bar X_{2}\) 分别是处理组和对照组的样本均值。

[10]:

# Visual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="propensity_balance_interpreter")

../_images/example_notebooks_dowhy_interpreter_18_0.png

此图显示了 SMD 如何从未调整单位下降到分层单位。

方法 2：倾向得分匹配#

我们将使用倾向得分来匹配数据中的单位。

[11]:

causal_estimate_match = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_matching",
                                              target_units="atc")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: atc

## Estimate
Mean value: 1.0038102718387791

Causal Estimate is 1.0038102718387791

[12]:

# Textual Interpreter
interpretation = causal_estimate_match.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.0038102718387791 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

此处无法使用倾向平衡解释器，因为该解释器方法仅支持倾向得分分层估计器。

方法 3：加权#

我们将使用（逆向）倾向得分来为数据中的单位分配权重。DoWhy 支持几种不同的加权方案：1. 普通逆倾向得分加权 (IPS) (weighting_scheme=”ips_weight”) 2. 自归一化 IPS 加权（也称为 Hajek 估计器）(weighting_scheme=”ips_normalized_weight”) 3. 稳定化 IPS 加权 (weighting_scheme = “ips_stabilized_weight”)

[13]:

causal_estimate_ipw = model.estimate_effect(identified_estimand,
                                            method_name="backdoor.propensity_score_weighting",
                                            target_units = "ate",
                                            method_params={"weighting_scheme":"ips_weight"})
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W1,W0,W2,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W1,W0,W2,W4,U) = P(y|v0,W3,W1,W0,W2,W4)

## Realized estimand
b: y~v0+W3+W1+W0+W2+W4
Target units: ate

## Estimate
Mean value: 1.2641975291140912

Causal Estimate is 1.2641975291140912

[14]:

# Textual Interpreter
interpretation = causal_estimate_ipw.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.2641975291140912 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

[15]:

interpretation = causal_estimate_ipw.interpret(method_name="confounder_distribution_interpreter", fig_size=(8,8), font_size=12, var_name='W4', var_type='discrete')

../_images/example_notebooks_dowhy_interpreter_27_0.png

[ ]: