Mediation analysis with DoWhy: Direct and Indirect Effects
[1]:
import numpy as np
import pandas as pd
from dowhy import CausalModel
import dowhy.datasets
# Warnings and logging
import warnings
warnings.filterwarnings('ignore')
Creating a dataset
[2]:
# Creating a dataset with a single confounder and a single mediator (num_frontdoor_variables)
data = dowhy.datasets.linear_dataset(10, num_common_causes=1, num_samples=10000,
                                     num_instruments=0, num_effect_modifiers=0,
                                     num_treatments=1,
                                     num_frontdoor_variables=1,
                                     treatment_is_binary=False,
                                     outcome_is_binary=False)
df = data['df']
print(df.head())
FD0 W0 v0 y
0 -5.161805 -0.238899 -2.774685 -24.838799
1 -5.297022 -0.639080 -2.613453 -27.150863
2 -21.917294 -2.305822 -10.291745 -110.928846
3 -5.663202 -0.588138 -2.992840 -28.633400
4 -4.601948 -0.892113 -2.831342 -25.002046
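Step 1: Modeling the causal mechanism
We create a dataset that follows a causal graph based on the frontdoor criterion. That is, the treatment has no direct effect on the outcome; the entire effect is mediated through the frontdoor variable FD0.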
[3]:
model = CausalModel(df,
                    data["treatment_name"], data["outcome_name"],
                    data["gml_graph"],
                    missing_nodes_as_confounders=True)
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
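To make the frontdoor structure concrete, here is a minimal NumPy sketch that simulates data with the same shape of graph (W0 -> v0, W0 -> y, v0 -> FD0 -> y, and no v0 -> y edge). The coefficients and noise scales are hypothetical values chosen for illustration; they are not the ones dowhy.datasets.linear_dataset uses internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical path coefficients (assumptions for this sketch only).
a = 2.0               # v0 -> FD0
b = 5.0               # FD0 -> y
w_v, w_y = 1.5, 3.0   # W0 -> v0 and W0 -> y

W0 = rng.normal(size=n)                                  # common cause
v0 = w_v * W0 + rng.normal(size=n)                       # treatment
FD0 = a * v0 + rng.normal(scale=0.1, size=n)             # mediator
y = b * FD0 + w_y * W0 + rng.normal(scale=0.1, size=n)   # outcome, no direct v0 term

# Regress y on v0 and W0 (plus an intercept). With no direct edge,
# the coefficient on v0 should be close to the product a * b.
X = np.column_stack([v0, W0, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
total_effect = coef[0]
print(total_effect, a * b)
```

Because every directed path from v0 to y passes through FD0 in this sketch, the total effect and the indirect effect coincide.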


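Step 2: Identifying the natural direct and indirect effects
We use the estimand_type argument to specify whether the target estimand is the natural direct effect or the natural indirect effect. For definitions, see Interpretation and Identification of Causal Mediation by Judea Pearl.
Natural direct effect: the effect due to the path v0->y.
Natural indirect effect: the effect due to the path v0->FD0->y (mediated by FD0).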
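For reference, the two quantities can be written in counterfactual notation. The symbols below are introduced here for illustration and do not appear in the notebook: t_0 is a reference treatment level, t_1 a new level, M(t) the value the mediator would take under treatment t, and y(t, m) the outcome under treatment t with the mediator held at m. The linear special case assumes a model of the form M = a t + ... and y = c t + b M + ... with no treatment-mediator interaction.

```latex
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\big[\,y(t_1, M(t_0))\,\big] - \mathbb{E}\big[\,y(t_0, M(t_0))\,\big] \\
\mathrm{NIE} &= \mathbb{E}\big[\,y(t_0, M(t_1))\,\big] - \mathbb{E}\big[\,y(t_0, M(t_0))\,\big] \\[4pt]
\text{Linear case:}\quad
\mathrm{NDE} &= c\,(t_1 - t_0), \qquad
\mathrm{NIE} = a\,b\,(t_1 - t_0)
\end{aligned}
```

In the simulated dataset used here, the direct coefficient c is zero, so the natural direct effect is expected to vanish.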
[4]:
# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde",
proceed_when_unidentifiable=True)
print(identified_estimand_nde)
Estimand type: EstimandType.NONPARAMETRIC_NDE
### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d          ⎤
E⎢─────(y|FD0)⎥
 ⎣d[v₀]        ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)
[5]:
# Natural indirect effect (nie)
identified_estimand_nie = model.identify_effect(estimand_type="nonparametric-nie",
proceed_when_unidentifiable=True)
print(identified_estimand_nie)
Estimand type: EstimandType.NONPARAMETRIC_NIE
### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         d          ⎤
E⎢──────(y)⋅─────([FD₀])⎥
 ⎣d[FD₀]    d[v₀]        ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)
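Step 3: Effect estimation
Currently, only two-stage linear regression is supported for estimation. A non-parametric Monte Carlo method, as described in Imai, Keele and Yamamoto (2010), is planned.
Natural Indirect Effect
The estimator converts the mediation effect estimation into a series of backdoor effect estimations:
1. The first-stage model estimates the effect of the treatment (v0) on the mediator (FD0).
2. The second-stage model estimates the effect of the mediator (FD0) on the outcome (y).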
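The two stages can be sketched by hand with ordinary least squares. This is an illustrative NumPy version under stated assumptions, not DoWhy's implementation: the helper ols_coef and the simulated coefficients are hypothetical, and the data mimics a frontdoor graph with a single common cause W0.

```python
import numpy as np

def ols_coef(target, *regressors):
    """Fit OLS with an intercept and return the slope coefficients only."""
    X = np.column_stack([*regressors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:-1]  # drop the intercept

# Simulated frontdoor data (hypothetical coefficients).
rng = np.random.default_rng(42)
n = 10_000
W0 = rng.normal(size=n)
v0 = 1.5 * W0 + rng.normal(size=n)
FD0 = 2.0 * v0 + rng.normal(scale=0.1, size=n)
y = 5.0 * FD0 + 3.0 * W0 + rng.normal(scale=0.1, size=n)

# Stage 1: effect of the treatment on the mediator, adjusting for W0.
first_stage = ols_coef(FD0, v0, W0)[0]    # b: FD0 ~ v0 + W0
# Stage 2: effect of the mediator on the outcome, adjusting for W0.
second_stage = ols_coef(y, FD0, W0)[0]    # b: y ~ FD0 + W0

# The indirect effect estimate is the product of the two stage coefficients.
nie_estimate = first_stage * second_stage
print(first_stage, second_stage, nie_estimate)
```

With the coefficients assumed above, the product should land near 2.0 * 5.0 = 10, up to sampling noise.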
[6]:
import dowhy.causal_estimators.linear_regression_estimator
causal_estimate_nie = model.estimate_effect(identified_estimand_nie,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params = {
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            }
                                           )
print(causal_estimate_nie)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_NIE
### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d         d          ⎤
E⎢──────(y)⋅─────([FD₀])⎥
 ⎣d[FD₀]    d[v₀]        ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)
## Realized estimand
(b: FD0~v0+W0)*(b: y~FD0+W0)
Target units: ate
## Estimate
Mean value: 8.854778256842833
Note that the estimate matches the true value of the natural indirect effect, up to random noise.
[7]:
print(causal_estimate_nie.value, data["ate"])
8.854778256842833 8.961633713652406
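The true value is stored under the key ate because the direct effect is set to zero in the simulated dataset, so the total effect equals the indirect effect.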
Natural Direct Effect
Now let us check whether the direct effect estimator returns the (correct) estimate of zero.
[8]:
causal_estimate_nde = model.estimate_effect(identified_estimand_nde,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params = {
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            }
                                           )
print(causal_estimate_nde)
*** Causal Estimate ***
## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_NDE
### Estimand : 1
Estimand name: mediation
Estimand expression:
 ⎡  d          ⎤
E⎢─────(y|FD0)⎥
 ⎣d[v₀]        ⎦
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→y then P(y|FD0, v0, U) = P(y|FD0, v0)
## Realized estimand
(b: y~v0+W0) - ((b: FD0~v0+W0)*(b: y~FD0+W0))
Target units: ate
## Estimate
Mean value: -3.29335893400895e-05
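The realized estimand printed above subtracts the indirect effect from the total effect of v0 on y. A minimal NumPy sketch of that subtraction follows; it is an illustration under assumed data, not DoWhy's code. The helper ols_coef and all simulation coefficients are hypothetical, and the simulated direct effect is set to zero to mirror the dataset in this notebook.

```python
import numpy as np

def ols_coef(target, *regressors):
    """Fit OLS with an intercept and return the slope coefficients only."""
    X = np.column_stack([*regressors, np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:-1]  # drop the intercept

# Simulated frontdoor data with no direct v0 -> y edge (hypothetical values).
rng = np.random.default_rng(7)
n = 10_000
W0 = rng.normal(size=n)
v0 = 1.5 * W0 + rng.normal(size=n)
FD0 = 2.0 * v0 + rng.normal(scale=0.1, size=n)
y = 5.0 * FD0 + 3.0 * W0 + rng.normal(scale=0.1, size=n)

total = ols_coef(y, v0, W0)[0]           # b: y ~ v0 + W0
first_stage = ols_coef(FD0, v0, W0)[0]   # b: FD0 ~ v0 + W0
second_stage = ols_coef(y, FD0, W0)[0]   # b: y ~ FD0 + W0

# Direct effect estimate = total effect minus indirect effect.
nde_estimate = total - first_stage * second_stage
print(nde_estimate)
```

Since the simulated outcome has no direct dependence on v0, the difference should be close to zero, matching the behavior seen in the notebook output.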
Step 4: Refutations
TODO