Functional API Preview#

This notebook is part of a set of notebooks that provides a preview of the proposed functional API for dowhy. For details on the new API for DoWhy, see py-why/dowhy. This is a work in progress, and we will update it as new functionality is added. Your feedback is welcome via Discord or the Discussions page. This functional API is designed with backwards compatibility in mind, so in upcoming releases the old and new APIs will continue to co-exist and work together. The old API based on CausalModel will be gradually deprecated in favor of the new API.

The current functional API covers:

* Identify Effect:
  * identify_effect(...): Runs the identify-effect algorithm with default settings; you only provide the graph, treatment, and outcome.
  * identify_effect_auto(...): A more configurable version of identify_effect(...).
  * identify_effect_id(...): Identifies the effect using the ID algorithm.
* Refute Estimate:
  * refute_estimate: Runs a set of the refuters below with their default parameters.
  * refute_bootstrap: Refutes an estimate by rerunning it on random samples of the data that contain measurement error in the confounders.
  * refute_data_subset: Refutes an estimate by rerunning it on a random subset of the original data.
  * refute_random_common_cause: Refutes an estimate by introducing a randomly generated confounder (that may have been unobserved).
  * refute_placebo_treatment: Refutes an estimate by replacing the treatment with a randomly generated placebo variable.
  * sensitivity_simulation: Adds an unobserved confounder for refutation (simulation of an unobserved confounder).
  * sensitivity_linear_partial_r2: Adds an unobserved confounder for refutation (linear partial R2: sensitivity analysis for linear models).
  * sensitivity_non_parametric_partial_r2: Adds an unobserved confounder for refutation (non-parametric partial R2: sensitivity analysis for non-parametric models).
  * sensitivity_e_value: Computes the E-value for the point estimate and confidence limits, benchmarks E-values against measured confounders using observed covariate E-values, and plots both.
  * refute_dummy_outcome: Refutes an estimate by replacing the outcome with an independently simulated dummy outcome, for which the treatment's true effect is known to be zero.
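As a hedged illustration of the idea behind refute_placebo_treatment (a plain NumPy sketch, not DoWhy's implementation): if we swap the real treatment for random noise that cannot have caused the outcome, the re-estimated effect should collapse toward zero, while the original estimate stays large.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
confounder = rng.normal(size=n)
treatment = (confounder + rng.normal(size=n) > 0).astype(float)
outcome = 10 * treatment + 2 * confounder + rng.normal(size=n)

def naive_effect(t, y):
    # Difference in mean outcomes between treated and control units
    # (a deliberately simple stand-in for a real estimator).
    return y[t == 1].mean() - y[t == 0].mean()

real_effect = naive_effect(treatment, outcome)

# Placebo refutation idea: replace the treatment with a random
# "placebo" variable; re-estimating should give an effect near zero.
placebo = rng.integers(0, 2, size=n).astype(float)
placebo_effect = naive_effect(placebo, outcome)
```

A large placebo effect would signal that the estimator is picking up something other than the treatment's causal influence.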

Import Dependencies#

[1]:
# Config dict to set the logging level
import logging.config

from dowhy import CausalModel  # We still need this as we haven't created the functional API for effect estimation
from dowhy.causal_estimators.econml import Econml
from dowhy.causal_estimators.propensity_score_matching_estimator import PropensityScoreMatchingEstimator
from dowhy.causal_graph import CausalGraph
from dowhy.graph import build_graph

# Functional API imports
from dowhy.causal_identifier import (
    BackdoorAdjustment,
    EstimandType,
    identify_effect,
    identify_effect_auto,
    identify_effect_id,
)
from dowhy.causal_refuters import (
    refute_bootstrap,
    refute_data_subset,
    refute_estimate,
)
from dowhy.datasets import linear_dataset

DEFAULT_LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "loggers": {
        "": {
            "level": "WARN",
        },
    },
}


# set random seed for deterministic dataset generation
# and avoid problems when running tests
import numpy as np
np.random.seed(1)

logging.config.dictConfig(DEFAULT_LOGGING)
# Disabling warnings output
import warnings
from sklearn.exceptions import DataConversionWarning

warnings.filterwarnings(action="ignore", category=DataConversionWarning)

Create the Dataset#

[2]:
# Parameters for creating the Dataset
TREATMENT_IS_BINARY = True
BETA = 10
NUM_SAMPLES = 500
NUM_CONFOUNDERS = 3
NUM_INSTRUMENTS = 2
NUM_EFFECT_MODIFIERS = 2

# Creating a Linear Dataset with the given parameters
data = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)

data_2 = linear_dataset(
    beta=BETA,
    num_common_causes=NUM_CONFOUNDERS,
    num_instruments=NUM_INSTRUMENTS,
    num_effect_modifiers=NUM_EFFECT_MODIFIERS,
    num_samples=NUM_SAMPLES,
    treatment_is_binary=TREATMENT_IS_BINARY,
)

treatment_name = data["treatment_name"]
print(treatment_name)
outcome_name = data["outcome_name"]
print(outcome_name)

graph = build_graph(
    action_nodes=treatment_name,
    outcome_nodes=outcome_name,
    effect_modifier_nodes=data["effect_modifier_names"],
    common_cause_nodes=data["common_causes_names"],
)
observed_nodes = data["df"].columns.tolist()
['v0']
y
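The build_graph call above assembles a directed causal graph from the variable roles in the generated dataset. As a rough, hand-built illustration of that structure (using networkx directly, not dowhy's output), confounders point at both treatment and outcome, instruments point only at the treatment, and effect modifiers point only at the outcome:

```python
import networkx as nx

# Hand-built causal graph mirroring the generated dataset's structure,
# shown for one variable of each role (the real dataset has several).
g = nx.DiGraph()
g.add_edges_from([
    ("W0", "v0"), ("W0", "y"),  # common cause (confounder)
    ("Z0", "v0"),               # instrument: affects treatment only
    ("v0", "y"),                # treatment -> outcome
    ("X0", "y"),                # effect modifier: affects outcome only
])
```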

Identify Effect - Functional API (Preview)#

[3]:
# Default identify_effect call example:
identified_estimand = identify_effect(graph, treatment_name, outcome_name, observed_nodes)

# identify_effect_auto example with extra parameters:
identified_estimand_auto = identify_effect_auto(
    graph,
    treatment_name,
    outcome_name,
    observed_nodes,
    estimand_type=EstimandType.NONPARAMETRIC_ATE,
    backdoor_adjustment=BackdoorAdjustment.BACKDOOR_EFFICIENT,
)

# identify_effect_id example:
identified_estimand_id = identify_effect_id(
    graph, treatment_name, outcome_name
)  # Note that the return type for identify_effect_id is IDExpression and not IdentifiedEstimand

print(identified_estimand)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

### Estimand : 2
Estimand name: iv
No such variable(s) found!

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Estimate Effect - Functional API (Preview)#

[4]:
# Basic Estimate Effect function
estimator = PropensityScoreMatchingEstimator(
    identified_estimand=identified_estimand,
    test_significance=None,
    evaluate_effect_strength=False,
    confidence_intervals=False,
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"]
)

estimate = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

# Using same estimator with different data
second_estimate = estimator.estimate_effect(
    data=data_2["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

print(estimate)
print("-----------")
print(second_estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067

-----------
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 15.620265231373276
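The two point estimates above differ even though both datasets were generated with the same beta=10, because each is an independent sample from the same process. A minimal sketch of this sampling variability (plain NumPy with a simple OLS adjustment, not DoWhy's propensity-score estimator):

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_and_estimate(n=500, beta=10):
    # Draw a fresh dataset from the same linear process and estimate
    # the treatment coefficient, adjusting for the confounder w.
    w = rng.normal(size=n)                       # confounder
    t = (w + rng.normal(size=n) > 0).astype(int) # binary treatment
    y = beta * t + 2 * w + rng.normal(size=n)
    X = np.column_stack([np.ones(n), t, w])      # design: [1, t, w]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]                               # coefficient on t

# Independent samples yield different point estimates around beta.
estimates = np.array([simulate_and_estimate() for _ in range(20)])
```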

[5]:
# EconML estimator example
from econml.dml import DML
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import PolynomialFeatures

from sklearn.ensemble import GradientBoostingRegressor

estimator = Econml(
    identified_estimand=identified_estimand,
    econml_estimator=DML(
        model_y=GradientBoostingRegressor(),
        model_t=GradientBoostingRegressor(),
        model_final=LassoCV(fit_intercept=False),
        featurizer=PolynomialFeatures(degree=1, include_bias=True),
    ),
).fit(
    data=data["df"],
    effect_modifier_names=data["effect_modifier_names"],
)

estimate_econml = estimator.estimate_effect(
    data=data["df"],
    control_value=0,
    treatment_value=1,
    target_units="ate",
)

print(estimate)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067

Refute Estimate - Functional API (Preview)#

[6]:
# You can call the refute_estimate function to run several refuters with their default parameters
# Currently this function does not support the sensitivity_* functions
refutation_results = refute_estimate(
    data["df"],
    identified_estimand,
    estimate,
    treatment_name=treatment_name,
    outcome_name=outcome_name,
    refuters=[refute_bootstrap, refute_data_subset],
)

for result in refutation_results:
    print(result)

# Or you can execute refute methods directly
# You can swap refute_bootstrap / refute_data_subset for any of the other refuters and add the required parameters

bootstrap_refutation = refute_bootstrap(data["df"], identified_estimand, estimate)
print(bootstrap_refutation)

data_subset_refutation = refute_data_subset(data["df"], identified_estimand, estimate)
print(data_subset_refutation)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.88856219387756
p value:0.54

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.66112498000895
p value:0.6599999999999999

Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.681262891992505
p value:0.7

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.743625685672644
p value:0.5800000000000001
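One way to read the p-values printed above: the refuter re-estimates the effect on many perturbed datasets and asks how extreme the original estimate is within that distribution, so a large p-value means the refutation does not reject the estimate. A hedged sketch of this idea for the bootstrap case (simple difference-in-means estimator, not DoWhy's internal computation):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
t = rng.integers(0, 2, size=n)
y = 10 * t + rng.normal(size=n)

def estimate_ate(t, y):
    # Difference-in-means stand-in for the notebook's estimator.
    return y[t == 1].mean() - y[t == 0].mean()

original = estimate_ate(t, y)

# Re-estimate on bootstrap resamples of the rows.
boot = np.array([
    estimate_ate(t[idx], y[idx])
    for idx in (rng.integers(0, n, size=n) for _ in range(200))
])

# Two-sided p-value: fraction of bootstrap estimates on either side
# of the original. Near 1 means the original estimate sits comfortably
# inside the bootstrap distribution (refutation passes).
p_value = min(1.0, 2 * min((boot >= original).mean(),
                           (boot <= original).mean()))
```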

Backwards Compatibility#

This section shows how to reproduce the same results using only the CausalModel API.

[7]:
# Create Causal Model
causal_model = CausalModel(data=data["df"], treatment=treatment_name, outcome=outcome_name, graph=data["gml_graph"])

Identify Effect#

[8]:
identified_estimand_causal_model_api = (
    causal_model.identify_effect()
)  # graph, treatment and outcome comes from the causal_model object

print(identified_estimand_causal_model_api)
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₁  Z₀]    ⎝d[Z₁  Z₀]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z1,Z0})
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→{v0}, then ¬({Z1,Z0}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Estimate Effect#

[9]:
estimate_causal_model_api = causal_model.estimate_effect(
    identified_estimand_causal_model_api, method_name="backdoor.propensity_score_matching"
)

print(estimate_causal_model_api)
*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W0,W2])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,W2,U) = P(y|v0,W1,W0,W2)

## Realized estimand
b: y~v0+W1+W0+W2
Target units: ate

## Estimate
Mean value: 11.32687325588067

Refute Estimate#

[10]:
bootstrap_refutation_causal_model_api = causal_model.refute_estimate(identified_estimand_causal_model_api, estimate_causal_model_api, "bootstrap_refuter")
print(bootstrap_refutation_causal_model_api)

data_subset_refutation_causal_model_api = causal_model.refute_estimate(
    identified_estimand_causal_model_api, estimate_causal_model_api, "data_subset_refuter"
)

print(data_subset_refutation_causal_model_api)
Refute: Bootstrap Sample Dataset
Estimated effect:11.32687325588067
New effect:11.80905615602231
p value:0.5

Refute: Use a subset of data
Estimated effect:11.32687325588067
New effect:11.766914477757974
p value:0.6399999999999999