Simulates data from an (augmented) DAG, respecting the optional
metadata fields form
, for functional form of relationships, and
distribution
, for the distributions of exogenous nodes and
residuals.
Arguments
- x
An object of class
dagitty
.- beta_default
Function used to specify missing edge coefficients. Default:
runif(1, min = -0.6, max = 0.6)
- n
Atomic integer defining the sample size, default:
500
- run
Logical, indicating whether or not to run the simulation. Default:
TRUE
.- duplicated
Atomic character, indicating how to resolve duplicate terms from multiple edges pointing to the same node. Default:
"unique"
. See Details.
Value
If run
is TRUE
, this function returns a data.frame
with
an additional attribute called attr( , which = "script")
that contains the
script for simulating data. If run
is FALSE
, this function returns the
script as character
vector.
Details
Data is simulated sequentially, first from exogenous nodes and then
from their descendants. If x
is an augmented DAG with metadata indicating
the functional form
of relationships and distribution
of exogenous nodes
and residuals, this information is used. If this information is absent, nodes
and residuals are assumed to be normally distributed, and edges are assumed
to be linear, with coefficients samples based on beta_default
.
The argument duplicated
controls how multiplicative terms are merged across
edges pointing to the same outcome node. The default duplicated = "unique"
removes terms that are duplicated across edges (i.e.,
if two edges point to node "O"
, and both edges specify .5*E
, the resulting
function will say .5*E
. However, if one edge specifies .2*E
and the other
specifies .3*E
, they are not duplicated and will be added.
Alternatively, duplicated = "add"
just sums terms across all edges pointing
into the same outcome node.
Examples
x <- dagitty::dagitty('
dag {
X [distribution="rbinom(size = 2, prob = .5)"]
Z [distribution="rexp()"]
Y [distribution="rnorm()"]
X -> Y [form="0.5+X"]
Z -> Y [form="2*Z"]
A -> X
}
')
txt <- simulate_data(x, n = 5, run = FALSE)
#> ! No valid distribution for data, used rnorm() instead.
df <- simulate_data(x, n = 5, run = TRUE)
#> ! No valid distribution for data, used rnorm() instead.
df_from_txt <- eval(parse(text = txt))