Conditional Average Treatment Effects (CATE) with DoWhy and EconML#

This is an experimental feature where we use EconML methods from DoWhy. Using EconML allows CATE estimation with a variety of methods.

All four steps of causal inference in DoWhy remain the same: model, identify, estimate, and refute. The main difference is that we now call econml methods in the estimation step. There is also a simple example using linear regression to build intuition for CATE estimators.

All datasets are generated using linear structural equations.
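As a rough sketch of what such a dataset involves, a hand-written linear structural equation model might look as follows. The variable roles mirror the generated columns (W: common causes, X: effect modifiers, Z: instruments, v: treatment, y: outcome), but all coefficients here are made up for illustration and are not what dowhy.datasets.linear_dataset actually draws:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
w = rng.normal(size=n)                        # common cause (confounder)
x = rng.normal(size=n)                        # effect modifier
z = rng.integers(0, 2, size=n).astype(float)  # instrument: affects treatment only
v = 2.0 * w + z + rng.normal(size=n)          # treatment: linear in its parents
y = (10.0 + 2.0 * x) * v + 5.0 * w + rng.normal(size=n)  # outcome: CATE(x) = 10 + 2x
```

The key structural feature is that the effect of v on y depends linearly on the effect modifier x, which is exactly the heterogeneity the CATE estimators below try to recover.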

[1]:
%load_ext autoreload
%autoreload 2
[2]:
import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel
import dowhy.datasets

import econml
import warnings
warnings.filterwarnings('ignore')

BETA = 10
[3]:
data = dowhy.datasets.linear_dataset(BETA, num_common_causes=4, num_samples=10000,
                                     num_instruments=2, num_effect_modifiers=2,
                                     num_treatments=1,
                                     treatment_is_binary=False,
                                     num_discrete_common_causes=2,
                                     num_discrete_effect_modifiers=0,
                                     one_hot_encode=False)
df = data['df']
print(df.head())
print("True causal estimate is", data["ate"])
         X0        X1   Z0        Z1        W0        W1 W2 W3         v0  \
0 -0.520835  0.756229  0.0  0.547725 -0.348991 -0.917974  0  3  14.006824
1 -1.560984  0.546710  0.0  0.474327 -0.652985 -1.886253  0  2   5.116915
2 -0.631737  0.866077  0.0  0.993189 -0.317968  0.702546  2  2  20.962792
3 -1.471904  0.208802  1.0  0.543871 -1.858522 -2.549787  0  2   9.131259
4 -2.163623  1.074842  1.0  0.522107 -0.281958  0.102664  1  2  24.442475

            y
0  181.573850
1   56.379803
2  274.811687
3   80.831273
4  301.137661
True causal estimate is 11.573149205663968
[4]:
model = CausalModel(data=data["df"],
                    treatment=data["treatment_name"], outcome=data["outcome_name"],
                    graph=data["gml_graph"])
[5]:
model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
(Output: causal graph rendered from causal_model.png)
[6]:
identified_estimand= model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₀  Z₁]    ⎝d[Z₀  Z₁]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Linear Model#

First, let us build some intuition using a linear model for estimating CATE. Effect modifiers (which lead to a heterogeneous treatment effect) can be modeled as interaction terms with the treatment, so their values modulate the effect of the treatment.

Below is the estimated effect of changing treatment from 0 to 1.
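To see the interaction-term intuition in isolation, here is a self-contained numpy sketch. The data-generating coefficients (10 and 3) are made up for illustration; fitting OLS with a treatment-by-modifier interaction recovers the effect as a function of the modifier:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)        # effect modifier
t = rng.normal(size=n)        # continuous treatment, randomized here for simplicity
y = (10 + 3 * x) * t + 2 * x + rng.normal(size=n)  # true CATE(x) = 10 + 3x

# OLS on [1, t, x, t*x]: the coefficients on t and t*x recover CATE as a function of x
design = np.column_stack([np.ones(n), t, x, t * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def cate(x_val):
    return coef[1] + coef[3] * x_val  # estimated effect of a unit change in t at x = x_val
```

This is the same mechanism the linear_regression estimator below uses: the realized estimand includes v0*X0 and v0*X1 interaction terms.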

[7]:
linear_estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1)
print(linear_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

## Realized estimand
b: y~v0+W3+W2+W1+W0+v0*X0+v0*X1
Target units:

## Estimate
Mean value: 11.573167881816286
### Conditional Estimates
__categorical__X0  __categorical__X1
(-4.375, -1.64]    (-3.165, -0.208]      5.078562
                   (-0.208, 0.387]       8.485901
                   (0.387, 0.892]       10.316470
                   (0.892, 1.479]       12.246467
                   (1.479, 4.369]       15.190344
(-1.64, -1.05]     (-3.165, -0.208]      6.211164
                   (-0.208, 0.387]       9.229992
                   (0.387, 0.892]       11.081200
                   (0.892, 1.479]       13.054074
                   (1.479, 4.369]       16.098330
(-1.05, -0.546]    (-3.165, -0.208]      6.672065
                   (-0.208, 0.387]       9.717292
                   (0.387, 0.892]       11.599016
                   (0.892, 1.479]       13.509362
                   (1.479, 4.369]       16.612933
(-0.546, 0.0363]   (-3.165, -0.208]      6.791592
                   (-0.208, 0.387]      10.133414
                   (0.387, 0.892]       12.047274
                   (0.892, 1.479]       13.993364
                   (1.479, 4.369]       17.081919
(0.0363, 2.709]    (-3.165, -0.208]      7.808295
                   (-0.208, 0.387]      10.903614
                   (0.387, 0.892]       12.788630
                   (0.892, 1.479]       14.663872
                   (1.479, 4.369]       18.027674
dtype: float64

EconML Methods#

We now move to the more advanced methods from the EconML package for estimating CATE.

First, let us look at the double machine learning estimator. method_name corresponds to the fully qualified name of the class to be used. For double ML, it is "econml.dml.DML".

target_units defines the units over which the causal estimate is computed. This can be a lambda-function filter on the original dataframe, a new pandas dataframe, or a string corresponding to the three main kinds of target units ("ate", "att" and "atc"). Below we show an example of a lambda function.

method_params are passed directly to EconML. For details on the allowed parameters, refer to the EconML documentation.
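The lambda form of target_units is simply a function from the dataframe to a boolean row mask; the causal estimate is then averaged over the selected rows. A minimal illustration on a toy dataframe (values made up):

```python
import pandas as pd

df = pd.DataFrame({"X0": [-0.5, 0.3, 1.2, 2.0]})
flt = lambda df: df["X0"] > 1          # same shape of filter as target_units below
subset = df[flt(df)]                   # rows over which the CATE would be averaged
print(subset.index.tolist())           # -> [2, 3]
```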

[8]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
dml_estimate = model.estimate_effect(identified_estimand, method_name="backdoor.econml.dml.DML",
                                     control_value=0,
                                     treatment_value=1,
                                     target_units=lambda df: df["X0"] > 1,  # condition used for CATE
                                     confidence_intervals=False,
                                     method_params={"init_params": {'model_y': GradientBoostingRegressor(),
                                                                    'model_t': GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(fit_intercept=False),
                                                                    'featurizer': PolynomialFeatures(degree=1, include_bias=False)},
                                                    "fit_params": {}})
print(dml_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

## Realized estimand
b: y~v0+W3+W2+W1+W0 | X0,X1
Target units: Data subset defined by a function

## Estimate
Mean value: 13.416786686392697
Effect estimates: [[20.5674886 ]
 [16.81810804]
 [ 4.41081626]
 [12.7590796 ]
 [18.66045567]
 [18.51674086]
 [17.1438172 ]
 [16.72935127]
 [12.3084639 ]
 [10.89723545]
 [12.2995654 ]
 [15.4393769 ]
 [15.12603867]
 [ 9.92507887]
 [21.45669981]
 [14.24272826]
 [ 5.1408623 ]
 [13.95199732]
 [16.10403499]
 [15.51334418]
 [12.86616718]
 [17.69517585]
 [14.98275821]
 [14.02400959]
 [ 8.04652404]
 [ 7.40925404]
 [11.66325528]
 [16.73155822]
 [11.41878005]
 [14.39183237]
 [ 7.78196165]
 [18.5219238 ]
 [ 5.40617633]
 [17.47738011]
 [13.03976874]
 [21.62362646]
 [12.7727103 ]
 [10.89263293]
 [17.30916227]
 [13.97571332]
 [10.90617174]
 [10.42957745]
 [ 8.22825142]
 [16.43221157]
 [12.23764192]
 [12.68420247]
 [11.39128513]
 [16.09066681]
 [10.02702876]
 [ 7.58134396]
 [12.19220932]
 [10.59733045]
 [13.37447019]
 [13.64565928]
 [19.16869052]
 [11.69113092]
 [17.50618788]
 [19.21822674]
 [13.83940523]
 [14.61733979]
 [11.6829634 ]
 [13.74655525]
 [14.36207116]
 [12.77088996]
 [ 9.01665902]
 [11.30305241]
 [15.89790646]
 [11.78357804]
 [16.56510634]
 [14.5326168 ]
 [ 6.39768296]
 [13.43185146]
 [12.01891169]
 [11.60103112]
 [15.70616691]
 [10.93777069]
 [12.58731849]
 [ 9.65958592]
 [17.27134558]
 [10.11003355]
 [12.69721151]
 [14.08728187]
 [13.07706806]
 [13.16812878]
 [13.81273431]
 [10.85485827]
 [11.22442377]
 [11.26772553]
 [10.91163199]
 [18.91911392]
 [14.16473814]
 [12.43741423]
 [10.90886278]
 [15.94262812]
 [ 7.34136344]
 [13.96326514]
 [19.49004536]
 [13.39017623]
 [11.01312687]
 [12.04871674]
 [ 8.74582241]
 [ 9.18646732]
 [14.41435661]
 [ 7.65377335]
 [12.70753593]
 [ 6.5546823 ]
 [20.89621188]
 [13.6687491 ]
 [12.29115227]
 [ 8.23148509]
 [13.14348103]
 [ 8.59881967]
 [14.45936563]
 [12.93855552]
 [14.96923772]
 [15.20999455]
 [ 9.25666362]
 [15.44556932]
 [16.30677664]
 [ 6.79309506]
 [11.0248835 ]
 [11.38621917]
 [12.40392566]
 [15.15631617]
 [15.47750929]
 [13.09407579]
 [21.09243156]
 [14.20065511]
 [18.44004907]
 [17.46505872]
 [13.12384027]
 [10.80277671]
 [12.90042107]
 [10.60835352]
 [15.78490693]
 [12.61665204]
 [10.55847388]
 [17.02018018]
 [ 8.91050269]
 [16.86815955]
 [17.44050478]
 [ 9.96930217]
 [17.72905667]
 [ 9.60855556]
 [12.93686361]
 [12.69410935]
 [13.12214551]
 [10.4047489 ]
 [13.70034905]
 [16.2873264 ]
 [12.54950062]
 [13.07973091]
 [12.94038306]
 [ 8.55823341]
 [14.97447905]
 [12.30526785]
 [ 9.7865365 ]
 [10.0164226 ]
 [ 8.39509301]
 [13.73544153]
 [ 6.24179927]
 [12.11277178]
 [15.53916016]
 [11.05322366]
 [12.12149138]
 [10.06573177]
 [11.90917829]
 [12.56205525]
 [12.90046028]
 [ 9.62295253]
 [18.13306695]
 [17.18491478]
 [15.42384531]
 [11.79491557]
 [14.98991145]
 [15.03724975]
 [11.26121636]
 [12.67032938]
 [ 8.68729779]
 [ 7.51402856]
 [ 9.75685123]
 [12.60398272]
 [12.99152973]
 [15.09644118]
 [16.60299135]
 [16.09588776]
 [ 3.64772799]
 [12.84148723]
 [18.65626365]
 [14.39374335]
 [13.30649259]
 [13.67661457]
 [10.50907391]
 [ 7.14469413]
 [ 9.56842616]
 [ 9.42719284]
 [19.63900172]
 [12.89246116]
 [ 8.51291399]
 [13.26315487]
 [ 9.42047433]
 [11.55964207]
 [11.37217564]
 [ 7.86415732]
 [16.99067817]
 [ 9.95581975]
 [ 7.62928437]
 [15.67276703]
 [15.23104827]
 [14.74048851]
 [11.4638321 ]
 [14.36781003]
 [18.24133044]
 [14.53730603]
 [ 8.93658313]
 [17.13530402]
 [ 9.63114204]
 [14.82085309]
 [14.30631252]
 [13.27430389]
 [22.7958537 ]
 [14.52528945]
 [12.01047948]
 [17.07857452]
 [12.87676195]
 [13.19365817]
 [11.78923515]
 [13.84325536]
 [15.28329488]
 [14.8810602 ]
 [19.02864126]
 [13.24922779]
 [11.08788091]
 [ 9.14628163]
 [12.45375332]
 [13.11554873]
 [16.94156874]
 [15.2703654 ]
 [15.4388345 ]
 [12.66600935]
 [12.21227572]
 [18.48199464]
 [15.73234323]
 [14.98888595]
 [10.50379867]
 [15.36877202]
 [14.41978644]
 [19.1282748 ]
 [ 8.74649804]
 [18.0238775 ]
 [16.50881595]
 [10.51615594]
 [ 5.16953339]
 [20.92100312]
 [12.17479305]
 [14.34722732]
 [ 9.48866414]
 [ 9.27444633]
 [15.30851141]
 [15.60089383]
 [14.87701042]
 [11.39777685]
 [10.39819466]
 [10.52082995]
 [12.78910962]
 [13.08950908]
 [15.14449594]
 [14.14229255]
 [17.33829309]
 [15.11863113]
 [13.36840657]
 [17.27580312]
 [ 9.63258133]
 [13.81493757]
 [15.31935159]
 [16.65902264]
 [18.30038777]
 [10.05234907]
 [15.99178775]
 [14.50080589]
 [19.2293583 ]
 [15.80951257]
 [10.53998897]
 [12.94333869]
 [ 9.27430062]
 [ 7.90520875]
 [ 5.00443678]
 [ 8.62821179]
 [13.16819391]
 [17.04930048]
 [14.83520756]
 [11.11807073]
 [17.01245274]
 [15.80654   ]
 [13.32692666]
 [12.20093365]
 [14.66867576]
 [11.44647894]
 [19.02857772]
 [15.40038588]
 [11.61615852]
 [12.53974592]
 [ 9.5410054 ]
 [11.46179022]
 [16.09647595]
 [10.29908036]
 [10.36165292]
 [17.80286842]
 [17.03426414]
 [14.28988041]
 [ 5.36218566]
 [17.68735257]
 [14.90463919]
 [12.50118809]
 [13.20463625]
 [13.45133909]
 [19.52980369]
 [16.36723555]
 [17.95656214]
 [ 9.07603694]
 [12.99942557]
 [ 7.58217009]
 [12.25094009]
 [20.62468395]
 [15.53609165]
 [15.05133301]
 [13.82908115]
 [14.40748572]
 [11.60776728]
 [13.26077053]
 [ 7.29519633]
 [14.72728296]
 [13.18829522]
 [22.35169842]
 [11.88613095]
 [11.89403583]
 [13.33237917]
 [18.13117447]
 [16.21136452]
 [ 9.46587805]
 [15.05493252]
 [13.11146559]
 [12.94026322]
 [16.74566772]
 [23.87810341]
 [11.38879393]
 [14.16939295]
 [14.46740696]
 [17.20850582]
 [14.87149497]
 [ 9.71447662]
 [ 7.90234328]
 [16.45053096]
 [11.76390866]
 [16.19268945]
 [ 9.99966499]
 [13.66432805]
 [17.05656237]
 [13.16772128]
 [15.21632243]
 [12.74924754]
 [16.00549041]
 [18.74638992]
 [ 8.81325287]
 [20.21752126]
 [14.41172835]
 [11.44879343]
 [13.86334141]
 [16.00583361]
 [ 6.66165918]
 [12.96753674]
 [ 9.60090159]
 [13.28635303]
 [18.4905823 ]
 [ 9.24040548]
 [ 8.27216878]
 [12.17418539]
 [ 6.78689615]
 [13.10914365]
 [ 9.09830979]
 [15.22533319]
 [18.44379055]
 [ 9.53270009]
 [11.06633756]
 [20.46065545]
 [19.78971092]
 [13.14965442]
 [17.03035734]
 [15.86081423]
 [10.07036957]
 [24.56620577]
 [19.94939981]
 [14.09161592]
 [14.38762808]
 [19.65759238]
 [19.37197678]
 [12.76002652]
 [17.86634742]]

[9]:
print("True causal estimate is", data["ate"])
True causal estimate is 11.573149205663968
[10]:
dml_estimate = model.estimate_effect(identified_estimand, method_name="backdoor.econml.dml.DML",
                                     control_value=0,
                                     treatment_value=1,
                                     target_units=1,
                                     confidence_intervals=False,
                                     method_params={"init_params": {'model_y': GradientBoostingRegressor(),
                                                                    'model_t': GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(fit_intercept=False),
                                                                    'featurizer': PolynomialFeatures(degree=1, include_bias=True)},
                                                    "fit_params": {}})
print(dml_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

## Realized estimand
b: y~v0+W3+W2+W1+W0 | X0,X1
Target units:

## Estimate
Mean value: 11.545193396441825
Effect estimates: [[12.21983071]
 [10.53410266]
 [12.50857838]
 ...
 [12.10591118]
 [11.42940987]
 [18.22728622]]

CATE Object and Confidence Intervals#

EconML provides its own methods for computing confidence intervals. BootstrapInference is used in the example below.
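BootstrapInference refits the full estimator on resampled data. The underlying resampling idea, shown here for a plain mean rather than a fitted CATE model (the numbers are illustrative), is:

```python
import numpy as np

rng = np.random.default_rng(0)
effects = rng.normal(loc=11.5, scale=3.0, size=1000)  # stand-in for per-unit effect estimates

# Resample with replacement, recompute the statistic, and take percentiles
boot_means = np.array([rng.choice(effects, size=effects.size, replace=True).mean()
                       for _ in range(100)])          # cf. n_bootstrap_samples=100 below
lo, hi = np.percentile(boot_means, [2.5, 97.5])       # 95% confidence interval for the mean
```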

[11]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.ensemble import GradientBoostingRegressor
from econml.inference import BootstrapInference
dml_estimate = model.estimate_effect(identified_estimand,
                                     method_name="backdoor.econml.dml.DML",
                                     target_units="ate",
                                     confidence_intervals=True,
                                     method_params={"init_params": {'model_y': GradientBoostingRegressor(),
                                                                    'model_t': GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(fit_intercept=False),
                                                                    'featurizer': PolynomialFeatures(degree=1, include_bias=True)},
                                                    "fit_params": {'inference': BootstrapInference(n_bootstrap_samples=100, n_jobs=-1)}})
print(dml_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

## Realized estimand
b: y~v0+W3+W2+W1+W0 | X0,X1
Target units: ate

## Estimate
Mean value: 11.527529711637378
Effect estimates: [[12.19093781]
 [10.55565123]
 [12.48968174]
 ...
 [12.1142495 ]
 [11.50570656]
 [18.37636746]]
95.0% confidence interval: [[[12.21716494 10.56284631 12.52228849 ... 12.14895059 11.5008028
   18.41722706]]

 [[12.3653399  10.68942253 12.67145822 ... 12.27769598 11.69965469
   18.82137994]]]

New inputs can be provided as target units, and CATE can be estimated for them.#

[12]:
test_cols = data['effect_modifier_names']  # only need effect modifiers' values
test_arr = [np.random.uniform(0, 1, 10) for _ in range(len(test_cols))]  # all variables are sampled uniformly, sample of 10
test_df = pd.DataFrame(np.array(test_arr).transpose(), columns=test_cols)
dml_estimate = model.estimate_effect(identified_estimand,
                                     method_name="backdoor.econml.dml.DML",
                                     target_units=test_df,
                                     confidence_intervals=False,
                                     method_params={"init_params": {'model_y': GradientBoostingRegressor(),
                                                                    'model_t': GradientBoostingRegressor(),
                                                                    "model_final": LassoCV(),
                                                                    'featurizer': PolynomialFeatures(degree=1, include_bias=True)},
                                                    "fit_params": {}})
print(dml_estimate.cate_estimates)
[[13.58279001]
 [10.41890505]
 [11.24343158]
 [13.72711318]
 [13.35554588]
 [13.62401139]
 [11.78798609]
 [13.37448703]
 [10.96215956]
 [10.33472942]]

The raw EconML estimator object can also be retrieved for any further operations#

[13]:
print(dml_estimate._estimator_object)
<econml.dml.dml.DML object at 0x7fa48296b910>

Works with any EconML method#

In addition to double machine learning, below we show examples of using orthogonal forests, DRLearner (bug to be fixed), and neural-network-based instrumental variables.

Binary treatment, binary outcome#

[14]:
data_binary = dowhy.datasets.linear_dataset(BETA, num_common_causes=4, num_samples=10000,
                                            num_instruments=1, num_effect_modifiers=2,
                                            treatment_is_binary=True, outcome_is_binary=True)
# convert boolean values to {0,1} numeric
data_binary['df'].v0 = data_binary['df'].v0.astype(int)
data_binary['df'].y = data_binary['df'].y.astype(int)
print(data_binary['df'])

model_binary = CausalModel(data=data_binary["df"],
                           treatment=data_binary["treatment_name"], outcome=data_binary["outcome_name"],
                           graph=data_binary["gml_graph"])
identified_estimand_binary = model_binary.identify_effect(proceed_when_unidentifiable=True)
            X0        X1   Z0        W0        W1        W2        W3  v0  y
0    -1.793037 -1.399146  1.0  1.918945 -1.837178  0.236404  0.374686   1  1
1    -0.333448  0.655739  1.0  1.879264 -0.963963 -0.836016 -1.358999   1  1
2     0.888209 -0.468418  0.0 -0.009608 -2.166280 -2.133876 -0.399757   0  0
3     0.047108 -0.112075  1.0 -0.157878  0.960046 -1.215147 -1.075899   1  1
4    -1.136697  0.346339  0.0  1.288702  1.054372  1.008400 -0.854865   1  1
...        ...       ...  ...       ...       ...       ...       ...  .. ..
9995 -0.335132  0.656147  1.0  1.728914  0.080934 -0.744671 -2.788066   1  1
9996  0.722839 -0.265596  1.0  0.752381  0.256524  1.775311 -1.318501   1  1
9997 -0.297344 -0.506091  0.0  2.640441 -1.229769 -0.645940 -0.184254   1  1
9998  1.975701 -0.538139  1.0  1.343029 -2.131004  0.019488 -0.951944   1  1
9999 -0.008991  0.438879  0.0  1.805087 -0.785812 -1.315040 -0.813219   1  1

[10000 rows x 9 columns]

Using DRLearner estimator#

[15]:
from sklearn.linear_model import LogisticRegressionCV
#todo needs binary y
drlearner_estimate = model_binary.estimate_effect(identified_estimand_binary,
                                                  method_name="backdoor.econml.dr.LinearDRLearner",
                                                  confidence_intervals=False,
                                                  method_params={"init_params": {'model_propensity': LogisticRegressionCV(cv=3, solver='lbfgs', multi_class='auto')},
                                                                 "fit_params": {}})
print(drlearner_estimate)
print("True causal estimate is", data_binary["ate"])
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,U) = P(y|v0,W3,W2,W1,W0)

## Realized estimand
b: y~v0+W3+W2+W1+W0 | X0,X1
Target units: ate

## Estimate
Mean value: 0.5076726046276433
Effect estimates: [[0.33560034]
 [0.58497292]
 [0.46481803]
 ...
 [0.45091439]
 [0.46551344]
 [0.56250906]]

True causal estimate is 0.3994

Instrumental Variable Method#

[16]:
dmliv_estimate = model.estimate_effect(identified_estimand,
                                       method_name="iv.econml.iv.dml.DMLIV",
                                       target_units=lambda df: df["X0"] > -1,
                                       confidence_intervals=False,
                                       method_params={"init_params": {'discrete_treatment': False,
                                                                      'discrete_instrument': False},
                                                      "fit_params": {}})
print(dmliv_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₀  Z₁]    ⎝d[Z₀  Z₁]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0,Z1})
Estimand assumption 2, Exclusion: If we remove {Z0,Z1}→{v0}, then ¬({Z0,Z1}→y)

## Realized estimand
b: y~v0+W3+W2+W1+W0 | X0,X1
Target units: Data subset defined by a function

## Estimate
Mean value: 12.318378868817403
Effect estimates: [[12.36189086]
 [12.65896699]
 [ 9.27652617]
 ...
 [11.52012298]
 [12.72535513]
 [11.49785524]]
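The identification logic behind these IV estimators can be seen in the classical Wald/ratio form: the instrument shifts the treatment but is independent of the unobserved confounder, so the ratio of covariances isolates the causal effect. A minimal numpy sketch with a synthetic confounder (all coefficients made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
z = rng.integers(0, 2, size=n).astype(float)  # instrument: affects t, not y directly
u = rng.normal(size=n)                        # unobserved confounder
t = 0.8 * z + u + rng.normal(size=n)          # treatment driven by z and u
y = 10 * t + 3 * u + rng.normal(size=n)       # true effect of t on y is 10

# Naive regression of y on t is biased upward by u; the IV ratio is not
naive = np.cov(t, y)[0, 1] / np.var(t, ddof=1)
iv = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]
```

DMLIV generalizes this idea with flexible machine-learning models and effect heterogeneity in the X columns.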

Metalearners#

[17]:
data_experiment = dowhy.datasets.linear_dataset(BETA, num_common_causes=5, num_samples=10000,
                                                num_instruments=2, num_effect_modifiers=5,
                                                treatment_is_binary=True, outcome_is_binary=False)
# convert boolean values to {0,1} numeric
data_experiment['df'].v0 = data_experiment['df'].v0.astype(int)
print(data_experiment['df'])
model_experiment = CausalModel(data=data_experiment["df"],
                               treatment=data_experiment["treatment_name"], outcome=data_experiment["outcome_name"],
                               graph=data_experiment["gml_graph"])
identified_estimand_experiment = model_experiment.identify_effect(proceed_when_unidentifiable=True)
            X0        X1        X2        X3        X4   Z0        Z1  \
0     1.173809  1.459526 -0.734409  1.271968 -1.139768  1.0  0.601753
1     0.414449  0.717820 -0.550879  1.542919 -0.061394  1.0  0.249743
2     1.482742  0.255102 -1.919030 -0.285862 -0.863307  1.0  0.756099
3     0.026937 -1.199462 -0.973142 -0.289309 -1.290227  0.0  0.136899
4     0.024090  0.561883 -1.853785  1.091348 -1.094074  0.0  0.825579
...        ...       ...       ...       ...       ...  ...       ...
9995  2.078520  0.356694 -2.000105  0.670750 -1.748592  1.0  0.741970
9996 -0.124005 -0.166842 -0.093543  0.959838 -1.905656  0.0  0.300446
9997  1.150224 -1.330117  0.489152  0.191833  0.757605  1.0  0.193838
9998  1.304547  0.938613  0.023811  1.904848 -0.811467  1.0  0.214526
9999  0.732159  0.033379 -0.953824  1.192767 -1.144805  1.0  0.375597

            W0        W1        W2        W3        W4  v0          y
0     0.136926 -0.476399 -0.570402  0.077995 -0.197060   1   4.698979
1     0.006260  2.123803 -0.660587  0.510811  0.708137   1  21.577703
2    -0.271943 -1.551507 -1.564531 -0.826516 -0.109544   1 -14.805524
3     0.118405  1.626826 -0.408900 -0.041457 -0.787333   1   7.264344
4    -1.691144  1.842948  0.261925 -1.378161  1.596990   1  11.047906
...        ...       ...       ...       ...       ...  ..        ...
9995  1.196646  2.431578  0.756643 -0.104238 -1.087868   1  17.433346
9996 -0.753776  1.279296  0.805955 -0.917385 -1.207200   0   1.261266
9997  0.075716  0.774895 -0.733659 -0.301633  1.135854   1  17.049905
9998  2.535041  1.245978 -0.121139 -0.032694 -0.994657   1  26.679028
9999  0.484762  0.763548 -1.651844  0.680308 -2.926768   1  -4.694555

[10000 rows x 14 columns]
[18]:
from sklearn.ensemble import RandomForestRegressor
metalearner_estimate = model_experiment.estimate_effect(identified_estimand_experiment,
                                                        method_name="backdoor.econml.metalearners.TLearner",
                                                        confidence_intervals=False,
                                                        method_params={"init_params": {'models': RandomForestRegressor()},
                                                                       "fit_params": {}})
print(metalearner_estimate)
print("True causal estimate is", data_experiment["ate"])
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,W4,U) = P(y|v0,W3,W2,W1,W0,W4)

## Realized estimand
b: y~v0+X4+X0+X3+X1+X2+W3+W2+W1+W0+W4
Target units: ate

## Estimate
Mean value: 14.846358811540133
Effect estimates: [[10.21441061]
 [20.81706358]
 [-1.88236367]
 ...
 [20.25643333]
 [23.70695711]
 [ 4.46810009]]

True causal estimate is 8.16819957456251
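The TLearner used above can be written out by hand: fit one outcome model per treatment arm and take the difference of their predictions. A sketch with plain least-squares lines instead of the random forests above (data-generating coefficients made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.normal(size=n)                    # effect modifier
t = rng.integers(0, 2, size=n)            # binary treatment, randomized here
y = 5 * x + (10 + 4 * x) * t + rng.normal(size=n)   # true CATE(x) = 10 + 4x

def fit_line(xs, ys):
    """Least-squares line; stands in for an arbitrary outcome regressor."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return lambda v: slope * v + intercept

mu1 = fit_line(x[t == 1], y[t == 1])      # outcome model for the treated arm
mu0 = fit_line(x[t == 0], y[t == 0])      # outcome model for the control arm

def cate(x_val):
    return mu1(x_val) - mu0(x_val)        # T-learner CATE estimate
```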

Avoiding retraining the estimator#

Once an estimator is fitted, it can be reused to estimate effects on different data points. In that case, you can pass fit_estimator=False to estimate_effect. This works for any EconML estimator. We show an example for the T-learner below.

[19]:
# For metalearners, need to provide all the features (except treatment and outcome)
metalearner_estimate = model_experiment.estimate_effect(identified_estimand_experiment,
                                                        method_name="backdoor.econml.metalearners.TLearner",
                                                        confidence_intervals=False,
                                                        fit_estimator=False,
                                                        target_units=data_experiment["df"].drop(["v0", "y", "Z0", "Z1"], axis=1)[9995:],
                                                        method_params={})
print(metalearner_estimate)
print("True causal estimate is", data_experiment["ate"])
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W3,W2,W1,W0,W4])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W3,W2,W1,W0,W4,U) = P(y|v0,W3,W2,W1,W0,W4)

## Realized estimand
b: y~v0+X4+X0+X3+X1+X2+W3+W2+W1+W0+W4
Target units: Data subset provided as a data frame

## Estimate
Mean value: 14.565445047253522
Effect estimates: [[12.60870682]
 [11.78702789]
 [20.25643333]
 [23.70695711]
 [ 4.46810009]]

True causal estimate is 8.16819957456251

Refuting the estimate#

Adding a random common cause variable#

[20]:
res_random=model.refute_estimate(identified_estimand, dml_estimate, method_name="random_common_cause")
print(res_random)
Refute: Add a random common cause
Estimated effect:12.241115919202636
New effect:12.169842164671763
p value:0.02

Adding an unobserved common cause variable#

[21]:
res_unobserved = model.refute_estimate(identified_estimand, dml_estimate, method_name="add_unobserved_common_cause",
                                       confounders_effect_on_treatment="linear", confounders_effect_on_outcome="linear",
                                       effect_strength_on_treatment=0.01, effect_strength_on_outcome=0.02)
print(res_unobserved)
Refute: Add an Unobserved Common Cause
Estimated effect:12.241115919202636
New effect:12.12830320273598

Replacing treatment with a random (placebo) variable#

[22]:
res_placebo=model.refute_estimate(identified_estimand, dml_estimate,
        method_name="placebo_treatment_refuter", placebo_type="permute",
        num_simulations=10 # at least 100 is good, setting to 10 for speed
        )
print(res_placebo)
Refute: Use a Placebo Treatment
Estimated effect:12.241115919202636
New effect:-0.007014803477057729
p value:0.348413255039955

Removing a random subset of the data#

[23]:
res_subset=model.refute_estimate(identified_estimand, dml_estimate,
        method_name="data_subset_refuter", subset_fraction=0.8,
        num_simulations=10)
print(res_subset)
Refute: Use a subset of data
Estimated effect:12.241115919202636
New effect:12.151513329399062
p value:0.020034500872382233

More refutation methods are coming, especially ones tailored to the CATE estimators.