Adolescent Social Structure by Jim Moody
The political blogosphere and the 2004 election: Divided they blog by Lada Adamic
Network Map of Protein-Protein Interactions by Erich E. Wanker of the Max Delbrück Center for Molecular Medicine (MDC)
A social network analysis of Twitter: Mapping the digital humanities community by Martin Grandjean
Bilateral Investment Treaty (BIT) Formation
International Conflict Event Warning System (ICEWS): Material Conflict by Minhas, Hoff, & Ward
Relational data consists of measurements on pairs of units (dyads): for example, trade flows between countries or friendships among students
GLM: \(y_{ij} \sim \beta^{T} X_{ij} + e_{ij}\)
Networks typically show evidence against independence of {\(e_{ij} : i \neq j\)}
Not accounting for dependence can lead to biased parameter estimates, overconfident (too small) standard errors, and poor out-of-sample predictions
We've been hearing this concern for decades now:
Thompson & Walker (1982)
Frank & Strauss (1986)
Kenny (1996)
Krackhardt (1998)
Beck et al. (1998)
Signorino (1999)
Li & Loken (2002)
Hoff and Ward (2004)
Snijders (2011)
Erikson et al. (2014)
Aronow et al. (2015)
Athey et al. (2016)
Values across a row, say \(\{y_{ij},y_{ik},y_{il}\}\), may be more similar to each other than other values in the adjacency matrix because each of these values has a common sender \(i\)
Values across a column, say \(\{y_{ji},y_{ki},y_{li}\}\), may be more similar to each other than other values in the adjacency matrix because each of these values has a common receiver \(i\)
Actors who are more likely to send ties in a network may also be more likely to receive them
Values of \(y_{ij}\) and \(y_{ji}\) may be statistically dependent
# library(devtools) ; devtools::install_github('s7minhas/amen')
library(amen) # Load additive and multiplicative effects pkg
data(IR90s) # Load trade data
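# Note (my assumption, not from the original slides): Y below is taken to be
# the logged trade-flow matrix extracted from IR90s on an earlier slide,
# e.g. something like  Y = log(IR90s$dyadvars[,,'exports'] + 1)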
Y[1:5,1:5] # Data organized in an adjacency matrix
##           ARG        AUL       BEL        BNG        BRA
## ARG        NA 0.05826891 0.2468601 0.03922071 1.76473080
## AUL 0.0861777         NA 0.3784364 0.10436002 0.21511138
## BEL 0.2700271 0.35065687        NA 0.01980263 0.39877612
## BNG 0.0000000 0.01980263 0.1222176         NA 0.01980263
## BRA 1.6937791 0.23901690 0.6205765 0.03922071         NA
# Reciprocity
cor(c(Y), c(t(Y)), use='complete')
## [1] 0.9392867
# Reciprocity beyond nodal variation?
senMean  = apply(Y, 1, mean, na.rm=TRUE)
recMean  = apply(Y, 2, mean, na.rm=TRUE)
globMean = mean(Y, na.rm=TRUE)
resid <- Y - ( globMean + outer(senMean, recMean, "+") )
cor(c(resid), c(t(resid)), use='complete')
## [1] 0.8591242
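The same summaries can be used to check sender/receiver heterogeneity and whether the two go together (an extra diagnostic I am adding here, not from the original slides):
# Using senMean and recMean computed above
sd(senMean)             # heterogeneity across senders
sd(recMean)             # heterogeneity across receivers
cor(senMean, recMean)   # do big exporters also import more?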
\[ \begin{aligned} y_{ij} &= \color{red}{\mu} + \color{red}{e_{ij}} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\Sigma_{ab}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= \color{red}{a_{i} + b_{j}} + \epsilon_{ij} \\ \color{red}{\{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \}} &\sim N(0,\Sigma_{ab}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\color{red}{\Sigma_{ab}}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \color{red}{\Sigma_{ab}} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \color{red}{\epsilon_{ij}} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\Sigma_{ab}) \\ \color{red}{\{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\}} &\sim N(0,\color{red}{\Sigma_{\epsilon}}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; & \color{red}{\Sigma_{\epsilon}} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
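Before layering on covariates, a minimal simulation sketch (my own, with hypothetical values for \(\Sigma_{ab}\) and \(\Sigma_{\epsilon}\)) of what data generated from this model look like:
library(MASS)   # for mvrnorm()
set.seed(6886)
n    = 100
Sab  = matrix(c(1, .5, .5, 1), 2, 2)     # Sigma_ab: cov of (a_i, b_i)
Seps = matrix(c(1, .7, .7, 1), 2, 2)     # Sigma_eps: cov of (eps_ij, eps_ji)
ab   = mvrnorm(n, c(0, 0), Sab)          # sender/receiver effects
E    = matrix(0, n, n)
for(i in 1:(n-1)){ for(j in (i+1):n){
  e = mvrnorm(1, c(0, 0), Seps)
  E[i, j] = e[1] ; E[j, i] = e[2] } }
Ysim = outer(ab[, 1], ab[, 2], '+') + E  # mu set to 0 for simplicity
diag(Ysim) = NA
cor(c(Ysim), c(t(Ysim)), use='complete') # nontrivial reciprocity falls out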
\[ \begin{aligned} y_{i,j} = \beta_d^T \textbf{x}_{d,i,j} + \beta_r^T \textbf{x}_{r,i} +\beta_c^T \textbf{x}_{c,j} + a_i + b_j + \epsilon_{i,j} \end{aligned} \]
Variables we might want to include:
(Hoff 2005; Westveld & Hoff 2010; Hoff et al. 2013; Fosdick & Hoff 2015; Minhas et al. 2016)
Threshold model: linking latent \(Z\) to \(Y\)
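A minimal illustration of the threshold idea (my sketch, using the probit convention amen applies to binary data):
\[ y_{ij} = 1[z_{ij} > 0], \qquad z_{ij} = \mu + e_{ij} \]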
Social relations model: inducing network covariance
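Working these out from the assumptions above, the SRM implies the following covariance structure among the errors:
\[ \begin{aligned} \text{Var}(e_{ij}) &= \sigma_{a}^{2} + \sigma_{b}^{2} + \sigma_{\epsilon}^{2} \\ \text{Cov}(e_{ij}, e_{ik}) &= \sigma_{a}^{2} \quad \text{(common sender)} \\ \text{Cov}(e_{ij}, e_{kj}) &= \sigma_{b}^{2} \quad \text{(common receiver)} \\ \text{Cov}(e_{ij}, e_{jk}) &= \sigma_{ab} \quad \text{(sender-receiver)} \\ \text{Cov}(e_{ij}, e_{ji}) &= 2\sigma_{ab} + \rho\sigma_{\epsilon}^{2} \quad \text{(reciprocity)} \end{aligned} \]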
Estimation:
MCMC routine:
Arguments:
Y           an n x n square relational matrix
Xdyad       an n x n x pd array of dyadic covariates
Xrow        an n x pr array of sender covariates
Xcol        an n x pc array of receiver covariates
rvar        TRUE/FALSE: fit sender random effects
cvar        TRUE/FALSE: fit receiver random effects
dcor        TRUE/FALSE: fit dyadic correlation
model       one of "nrm", "bin", "ord", "cbin", "frn", "rrl"
intercept   TRUE/FALSE: fit with an intercept?
symmetric   TRUE/FALSE: are relations undirected (symmetric)?
nscan       number of iterations of the Markov chain
burn        burn-in for the chain
odens       output density
R           dimension of the multiplicative effects
Nodal covariates should be structured as:
Xrow and Xcol
Xn[1:10,]
##          pop      gdp polity
## ARG 3.548755 5.864710   7.18
## AUL 2.895912 6.011414  10.00
## BEL 2.314514 5.370685  10.00
## BNG 4.789989 5.177956   5.00
## BRA 5.070915 6.963597   8.00
## CAN 3.377588 6.531009  10.00
## CHN 7.091101 8.114522  -7.00
## COL 3.652734 5.324862   7.82
## EGY 4.063542 5.371521  -3.55
## FRN 4.082272 7.101956   9.00
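For reference, a hedged reconstruction of how Xn might be built from IR90s (the exact construction, including any country subsetting, was presumably on an earlier slide):
Xn = IR90s$nodevars   # assumed: node-level matrix with pop, gdp, polity columns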
Dyadic covariates should be structured as:
Xd[1:3,1:3,]
conflicts
##     ARG AUL BEL
## ARG  NA   0   0
## AUL   0  NA   0
## BEL   0   0  NA
distance
##       ARG   AUL   BEL
## ARG    NA 11.72 11.31
## AUL 11.72    NA 16.71
## BEL 11.31 16.71    NA
shared_igos
##      ARG  AUL  BEL
## ARG   NA 3.83 3.92
## AUL 3.83   NA 4.02
## BEL 3.92 4.02   NA
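And a similarly hedged reconstruction for the dyadic array, assuming the IR90s dyadic variables carry the names printed above:
Xd = IR90s$dyadvars[ , , c('conflicts', 'distance', 'shared_igos') ]   # n x n x 3 array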
fitSRM = ame(Y=Y,
Xdyad=Xd, # incorp dyadic covariates
Xrow=Xn, # incorp sender covariates
Xcol=Xn, # incorp receiver covariates
symmetric=FALSE, # tell AME trade is directed
intercept=TRUE, # add an intercept
model='nrm', # model type
rvar=TRUE, # sender random effects (a)
cvar=TRUE, # receiver random effects (b)
dcor=TRUE, # dyadic correlation
R=0, # we'll get to this later
nscan=10000, burn=5000, odens=25,
plot=FALSE, print=FALSE, gof=TRUE
)
Objects returned in fitSRM:
names(fitSRM)
## [1] "BETA" "VC" "APM" "BPM" "U" "V" "UVPM" "EZ" "YPM" "GOF"
# paramPlot, abPlot, gofPlot, and ggCirc are plotting helpers (assumed to come
# with the s7minhas/amen fork installed above or to be sourced with the slides)
paramPlot(fitSRM$BETA[,1:5])
paramPlot(fitSRM$BETA[,6:ncol(fitSRM$BETA)])
library(gridExtra)   # for grid.arrange() and arrangeGrob()
grid.arrange(
  paramPlot(fitSRM$VC),
  arrangeGrob( abPlot(fitSRM$APM, 'Sender Effects'),
               abPlot(fitSRM$BPM, 'Receiver Effects') ),
  ncol=2 )
gofPlot(fitSRM$GOF, symmetric=FALSE)
Let's build on what we have so far and find an expression for \(\gamma\):
\[ y_{ij} \approx \beta^{T} X_{ij} + a_{i} + b_{j} + \gamma(u_{i},v_{j}) \]
(Holland et al. 1983; Nowicki & Snijders 2001; Rohe et al. 2011; Airoldi et al. 2013)
Each node \(i\) is a member of an (unknown) latent class:
\[ \textbf{u}_{i} \in \{1, \ldots, K \}, \; i \in \{1,\ldots, n\} \\ \] The probability of a tie between \(i\) and \(j\) is:
\[ Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) = \theta_{\textbf{u}_{i} \textbf{u}_{j}} \]
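A quick generative sketch in base R (my own toy example, not from the slides) of what this model implies:
set.seed(6886)
n = 60 ; K = 3
theta = matrix(.05, K, K) ; diag(theta) = .4   # denser ties within blocks
u = sample(1:K, n, replace=TRUE)               # latent class memberships
P = theta[u, u]                                # P[i,j] = theta[u_i, u_j]
Ysbm = matrix(rbinom(n^2, 1, c(P)), n, n) ; diag(Ysbm) = NA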
Software packages:
statnet (Handcock et al. 2016)
blockmodels (Leger 2015)
Newman (2006): adjectives and nouns in a word adjacency network
White & Murphy (2016): Mixed membership stochastic block model
(Hoff et al. 2002; Krivitsky et al. 2009; Sewell & Chen 2015)
Each node \(i\) has an unknown latent position
\[ \textbf{u}_{i} \in \mathbb{R}^{k} \]
The probability of a tie from \(i\) to \(j\) decreases with the distance between them, e.g., on the log-odds scale:
\[ \text{logit} \; Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) = \theta - |\textbf{u}_{i} - \textbf{u}_{j}| \]
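A matching toy simulation (again my own sketch, not from the slides):
set.seed(6886)
n = 60 ; k = 2 ; theta = 2
u = matrix(rnorm(n*k), n, k)       # latent positions in R^k
D = as.matrix(dist(u))             # pairwise distances |u_i - u_j|
P = plogis(theta - D)              # tie probabilities via the logit link
Yldm = matrix(rbinom(n^2, 1, c(P)), n, n) ; diag(Yldm) = NA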
Software packages:
latentnet (Krivitsky et al. 2015)
VBLPCM (Salter-Townshend 2015)
Kirkland (2012): North Carolina legislators
Kuh et al. (2015): discerning prey and predators from a food web
(Hoff 2003; Hoff 2007)
Each node \(i\) has an unknown latent factor
\[ \textbf{u}_{i} \in \mathbb{R}^{k} \]
The probability of a tie from \(i\) to \(j\) depends on their latent factors
\[ \begin{aligned} Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) &= \Phi(\theta + \textbf{u}_{i}^{T} \Lambda \textbf{u}_{j}) \, \text{, where} \\ &\Lambda \text{ is a } K \times K \text{ diagonal matrix and } \Phi \text{ is the standard normal CDF} \end{aligned} \]
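A toy simulation (my own sketch) highlighting that the sign pattern of \(\Lambda\) lets the latent factor model capture both homophily (positive entries) and stochastic equivalence (negative entries):
set.seed(6886)
n = 60 ; K = 2 ; theta = -1
u = matrix(rnorm(n*K), n, K)             # latent factors
L = diag(c(1, -1))                       # Lambda with one pos, one neg entry
P = pnorm(theta + u %*% L %*% t(u))      # probit tie probabilities
Ylfm = matrix(rbinom(n^2, 1, c(P)), n, n) ; diag(Ylfm) = NA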
Software packages:
amen (Hoff et al. 2015)
\[ \begin{aligned} y_{ij} &= g(\theta_{ij}) \\ \theta_{ij} &= \beta^{T} \mathbf{X}_{ij} + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} + \textbf{u}_{i}^{T} \textbf{D} \textbf{v}_{j} \end{aligned} \]
Multiplicative effects can be added by setting the R argument (the dimension of the multiplicative effects) to a value greater than zero
fitAME = ame(Y=Y,
Xdyad=Xd, # incorp dyadic covariates
Xrow=Xn, # incorp sender covariates
Xcol=Xn, # incorp receiver covariates
symmetric=FALSE, # tell AME trade is directed
intercept=TRUE, # add an intercept
model='nrm', # model type
rvar=TRUE, # sender random effects (a)
cvar=TRUE, # receiver random effects (b)
dcor=TRUE, # dyadic correlation
R=2, # 2 dimensional multiplicative effects
nscan=10000, burn=5000, odens=25,
plot=FALSE, print=FALSE, gof=TRUE
)
gofPlot(fitAME$GOF, symmetric=FALSE)
ggCirc(Y=Y, U=fitAME$U, V=fitAME$V)
Cranmer et al. (2017)
Out-of-sample Network Cross-Validation (a rough sketch appears after the list below)
Null model: model with no covariates
- Weschle (2017)
- Gallop and Minhas (WP)
- Greenhill (2016)
Covariate estimation in longitudinal networks
- Ward et al. (2013)
- Metternich et al. (2015)
- Dorff et al. (WP)
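A rough sketch of dyad-level, out-of-sample cross-validation with amen (my own illustration rather than the Cranmer et al. procedure; it assumes ame() imputes dyads set to NA and that YPM stores the posterior mean of Y):
set.seed(6886)
obs = which(!is.na(Y))
holdout = sample(obs, round(.1 * length(obs)))   # hold out ~10% of dyads
Ytrain = Y ; Ytrain[holdout] = NA
fitCV = ame(Ytrain, Xdyad=Xd, Xrow=Xn, Xcol=Xn, model='nrm', R=2,
            nscan=10000, burn=5000, odens=25, plot=FALSE, print=FALSE)
cor(Y[holdout], fitCV$YPM[holdout])              # out-of-sample performance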
The latent factor model (LFM) is a powerful framework that has proven useful in applications like those above
These interdependencies may themselves be of substantive interest, or they may simply help us predict better
Does AME actually reduce bias?
Hoff provides an argument that rests on exchangeability (Aldous, 1985)
Basis of simulation analysis
# Network simulation (networkTerm, edgeVar, and the *Value entries are
# placeholders that get filled in for each simulation scenario)
library(ergm) ; library(network) ; library(amen)
simY = simulate(network.initialize(n, directed=TRUE) ~
                  edges + edgecov(edgeVar) + networkTerm,
                coef=c(
                  interceptValue,
                  dyadParamValue,
                  netParamValue
                ) )

# Run ergm
ergm(simY ~ edges + edgecov(edgeVar) + networkTerm)

# Run ame with and without multiplicative effects
ame(as.matrix(simY), Xdyad=edgeVar, model='bin', R=0)
ame(as.matrix(simY), Xdyad=edgeVar, model='bin', R=2)
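For concreteness, here is one way the template above might be instantiated (purely illustrative choices on my part: a 50-node directed network, a single continuous dyadic covariate, and mutuality as the extra network term):
library(ergm) ; library(network) ; library(amen)
set.seed(6886)
n = 50
edgeVar = matrix(rnorm(n*n), n, n)            # one dyadic covariate

# simulate a network with reciprocity (mutual) beyond the covariate
simY = simulate(network.initialize(n, directed=TRUE) ~
                  edges + edgecov(edgeVar) + mutual,
                coef=c(-2, 1, .5), nsim=1)

# ergm recovers the parameters it was simulated from
ergm(simY ~ edges + edgecov(edgeVar) + mutual)

# ame without / with multiplicative effects
Ysim = as.matrix(simY) ; diag(Ysim) = NA
Xdsim = array(edgeVar, dim=c(n, n, 1))
ame(Ysim, Xdyad=Xdsim, model='bin', R=0, plot=FALSE, print=FALSE)
ame(Ysim, Xdyad=Xdsim, model='bin', R=2, plot=FALSE, print=FALSE)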