Example to demonstrate optimized backdoor variable search for causal identification#
This notebook compares the performance of causal identification using the vanilla backdoor search and the optimized backdoor search, and demonstrates the speed-up obtained with the latter.
[1]:
import time
import random
import numpy as np
import pandas as pd
import networkx as nx
import dowhy
from dowhy import CausalModel
from dowhy.utils import graph_operations
import dowhy.datasets
Create a random graph#
In this section, we create a random directed graph with a specified number of nodes (10 in this example).
[2]:
n = 10
p = 0.5
# Generate a random directed graph with n nodes and edge probability p.
G = nx.generators.random_graphs.fast_gnp_random_graph(n, p, directed=True)
# Keep only edges (u, v) with u < v so that the resulting graph is acyclic.
graph = nx.DiGraph([(u, v) for (u, v) in G.edges() if u < v])
nodes = [str(i) for i in graph.nodes]
# Convert the graph to a DOT string that CausalModel can consume.
adjacency_matrix = np.asarray(nx.to_numpy_array(graph))
graph_dot = graph_operations.adjacency_matrix_to_graph(adjacency_matrix, nodes)
graph_dot = graph_operations.str_to_dot(graph_dot.source)
print("Graph Generated.")
# Create an empty dataframe whose columns are the node names.
df = pd.DataFrame(columns=nodes)
print("Dataframe Generated.")
Graph Generated.
Dataframe Generated.
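Because only edges with u < v are kept, the generated graph should already be a directed acyclic graph, which backdoor identification requires. A minimal sanity-check sketch, assuming the graph object from the cell above:

# Sketch: verify that the edge filtering above indeed produced a DAG.
assert nx.is_directed_acyclic_graph(graph)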
Testing the optimized backdoor search#
In this section, we compare the runtimes of causal identification using the vanilla backdoor search and the optimized backdoor search.
[3]:
start = time.time()
# I. Create a causal model from the data and given graph.
model = CausalModel(
    data=df,
    treatment=str(random.randint(0, n-1)),
    outcome=str(random.randint(0, n-1)),
    graph=graph_dot,
)
time1 = time.time()
print("Time taken for initializing model =", time1-start)
# II. Identify causal effect and return target estimands
identified_estimand = model.identify_effect()
time2 = time.time()
print("Time taken for vanilla identification =", time2-time1)
# III. Identify causal effect using the optimized backdoor implementation
identified_estimand = model.identify_effect(optimize_backdoor=True)
end = time.time()
print("Time taken for optimized backdoor identification =", end-time2)
Time taken for initializing model = 0.004637241363525391
Time taken for vanilla identification = 0.00022339820861816406
Time taken for optimized backdoor identification = 0.00013709068298339844
It can be seen that the optimized backdoor search makes causal identification faster than the vanilla implementation.
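A single timing run such as the one above can be noisy. One way to make the comparison more robust is to average over several repetitions. The following is a minimal sketch that reuses the model object from the cell above; the helper average_runtime is illustrative and not part of DoWhy:

import time

def average_runtime(identify_fn, repetitions=10):
    # Return the mean wall-clock time of `repetitions` calls to `identify_fn`.
    start = time.time()
    for _ in range(repetitions):
        identify_fn()
    return (time.time() - start) / repetitions

vanilla_time = average_runtime(lambda: model.identify_effect())
optimized_time = average_runtime(lambda: model.identify_effect(optimize_backdoor=True))
print("Average time for vanilla identification =", vanilla_time)
print("Average time for optimized backdoor identification =", optimized_time)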