Adolescent Social Structure by Jim Moody
The political blogosphere and the 2004 election: Divided they blog by Lada Adamic & Natalie Glance
Network Map of Protein-Protein Interactions by Erich E. Wanker of the Max Delbrück Center for Molecular Medicine (MDC)
A social network analysis of Twitter: Mapping the digital humanities community by Martin Grandjean
BIT Formation
International Conflict Event Warning System (ICEWS): Material Conflict by Minhas, Hoff, & Ward
Relational data consists of a set of actors (nodes) and a set of measurements on pairs of actors (dyadic ties)
GLM: \(y_{ij} \sim \beta^{T} X_{ij} + e_{ij}\)
Networks typically show evidence against independence of \(\{e_{ij} : i \neq j\}\)
Not accounting for dependence can lead to biased estimates of covariate effects and overly confident (uncalibrated) measures of uncertainty
We've been hearing this concern for decades now:
Thompson & Walker (1982)
Frank & Strauss (1986)
Kenny (1996)
Krackhardt (1998)
Beck et al. (1998)
Signorino (1999)
Li & Loken (2002)
Hoff & Ward (2004)
Snijders (2011)
Erikson et al. (2014)
Aronow et al. (2015)
Athey et al. (2016)
Values across a row, say \(\{y_{ij},y_{ik},y_{il}\}\), may be more similar to each other than to other values in the adjacency matrix because each of these values has a common sender \(i\)
Values across a column, say \(\{y_{ji},y_{ki},y_{li}\}\), may be more similar to each other than to other values in the adjacency matrix because each of these values has a common receiver \(i\)
Actors who are more likely to send ties in a network may also be more likely to receive them
Values of \(y_{ij}\) and \(y_{ji}\) may be statistically dependent
# library(devtools) ; devtools::install_github('s7minhas/amen')
library(amen)  # Load additive and multiplicative effects pkg
data(IR90s)    # Load trade data
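The snippet above only installs and loads the package and data; the Y, Xn, and Xd objects used throughout the rest of the walkthrough still need to be pulled out of IR90s. A minimal sketch of one way to do that: the dyadvars and nodevars slots are documented in amen, but the 'exports' slice name and the log transform are assumptions based on the output shown below.

# dyadvars / nodevars are the documented slots of IR90s;
# the 'exports' slice and the log transform are assumptions
Y  = log( IR90s$dyadvars[,,'exports'] + 1 )                     # directed trade flows
Xn = IR90s$nodevars                                             # pop, gdp, polity
Xd = IR90s$dyadvars[,,c('conflicts','distance','shared_igos')]  # dyadic covariates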
Y[1:5,1:5] # Data organized in an adjacency matrix
##           ARG        AUL       BEL        BNG        BRA
## ARG        NA 0.05826891 0.2468601 0.03922071 1.76473080
## AUL 0.0861777         NA 0.3784364 0.10436002 0.21511138
## BEL 0.2700271 0.35065687        NA 0.01980263 0.39877612
## BNG 0.0000000 0.01980263 0.1222176         NA 0.01980263
## BRA 1.6937791 0.23901690 0.6205765 0.03922071         NA
# Reciprocity
cor(c(Y), c(t(Y)), use='complete')
## [1] 0.9392867
# Reciprocity beyond nodal variation?
senMean  = apply(Y, 1, mean, na.rm=TRUE)
recMean  = apply(Y, 2, mean, na.rm=TRUE)
globMean = mean(Y, na.rm=TRUE)
resid <- Y - ( globMean + outer(senMean, recMean, "+") )
cor(c(resid), c(t(resid)), use='complete')
## [1] 0.8591242
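The senMean and recMean vectors just computed also give a quick read on the sender and receiver heterogeneity described earlier; a minimal check:

sd(senMean)            # how much do countries differ as senders?
sd(recMean)            # how much do countries differ as receivers?
cor(senMean, recMean)  # do heavy exporters also tend to be heavy importers?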
\[ \begin{aligned} y_{ij} &= \color{red}{\mu} + \color{red}{e_{ij}} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\Sigma_{ab}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= \color{red}{a_{i} + b_{j}} + \epsilon_{ij} \\ \color{red}{\{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \}} &\sim N(0,\Sigma_{ab}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\color{red}{\Sigma_{ab}}) \\ \{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\} &\sim N(0,\Sigma_{\epsilon}), \text{ where } \\ \color{red}{\Sigma_{ab}} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; &\Sigma_{\epsilon} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
\[ \begin{aligned} y_{ij} &= \mu + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \color{red}{\epsilon_{ij}} \\ \{ (a_{1}, b_{1}), \ldots, (a_{n}, b_{n}) \} &\sim N(0,\Sigma_{ab}) \\ \color{red}{\{ (\epsilon_{ij}, \epsilon_{ji}) : \; i \neq j\}} &\sim N(0,\color{red}{\Sigma_{\epsilon}}), \text{ where } \\ \Sigma_{ab} = \begin{pmatrix} \sigma_{a}^{2} & \sigma_{ab} \\ \sigma_{ab} & \sigma_{b}^2 \end{pmatrix} \;\;\;\;\; & \color{red}{\Sigma_{\epsilon}} = \sigma_{\epsilon}^{2} \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \end{aligned} \]
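Before layering covariates on, a minimal simulation sketch in base R can make this error structure concrete (all parameter values here are arbitrary illustrations):

set.seed(6886)
n = 50
mu = 0; sa2 = 1; sb2 = 1; sab = 0.5; se2 = 1; rho = 0.6

# correlated sender/receiver effects: (a_i, b_i) ~ N(0, Sigma_ab)
Sab = matrix(c(sa2, sab, sab, sb2), 2, 2)
ab  = matrix(rnorm(n*2), n, 2) %*% chol(Sab)
a = ab[,1] ; b = ab[,2]

# dyadic errors: (eps_ij, eps_ji) ~ N(0, Sigma_eps)
Seps = se2 * matrix(c(1, rho, rho, 1), 2, 2)
E = matrix(0, n, n)
for(i in 1:(n-1)){ for(j in (i+1):n){
  e = c(rnorm(2) %*% chol(Seps))
  E[i,j] = e[1] ; E[j,i] = e[2]
} }

Ysim = mu + outer(a, b, '+') + E   # y_ij = mu + a_i + b_j + eps_ij
diag(Ysim) = NA
cor(c(Ysim), c(t(Ysim)), use='complete')  # reciprocity induced by the SRM

Rows of Ysim share \(a_{i}\), columns share \(b_{j}\), and \(\rho\) generates reciprocity, mirroring the empirical checks run on the trade data above.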
\[ y_{ij} = \beta_d^T \textbf{x}_{d,ij} + \beta_r^T \textbf{x}_{r,i} + \beta_c^T \textbf{x}_{c,j} + a_i + b_j + \epsilon_{ij} \]
Variables we might want to include:
- dyadic covariates \(\textbf{x}_{d,ij}\) (e.g., distance, conflicts, shared IGOs)
- sender covariates \(\textbf{x}_{r,i}\) (e.g., population, GDP, polity of the exporter)
- receiver covariates \(\textbf{x}_{c,j}\) (e.g., population, GDP, polity of the importer)
(Hoff 2005; Westveld & Hoff 2010; Hoff et al. 2013; Fosdick & Hoff 2015; Minhas et al. 2016)
Threshold model: linking latent \(Z\) to \(Y\)
Social relations model: inducing network covariance
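Spelling out how the SRM induces network covariance: under the error decomposition above, standard social relations model results (e.g., Hoff 2005) give

\[ \begin{aligned} \text{Var}(e_{ij}) &= \sigma_{a}^{2} + \sigma_{b}^{2} + \sigma_{\epsilon}^{2} \\ \text{Cov}(e_{ij}, e_{ik}) &= \sigma_{a}^{2} \; \text{(common sender)} \\ \text{Cov}(e_{ij}, e_{kj}) &= \sigma_{b}^{2} \; \text{(common receiver)} \\ \text{Cov}(e_{ij}, e_{jk}) &= \sigma_{ab} \; \text{(sender-receiver covariance)} \\ \text{Cov}(e_{ij}, e_{ji}) &= 2\sigma_{ab} + \rho\sigma_{\epsilon}^{2} \; \text{(reciprocity)} \end{aligned} \]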
Estimation: Bayesian, via a Markov chain Monte Carlo routine (the ame() function in the amen package)
Arguments to ame():
- Y: an n x n square relational matrix
- Xdyad: an n x n x pd array of dyadic covariates
- Xrow: an n x pr array of sender covariates
- Xcol: an n x pc array of receiver covariates
- rvar: TRUE/FALSE, fit sender random effects
- cvar: TRUE/FALSE, fit receiver random effects
- dcor: TRUE/FALSE, fit dyadic correlation
- model: one of "nrm", "bin", "ord", "cbin", "frn", "rrl"
- intercept: TRUE/FALSE, fit with an intercept?
- symmetric: TRUE/FALSE, are relations symmetric (undirected)?
- nscan: number of iterations of the Markov chain
- burn: burn-in for the chain
- odens: output density (save every odens-th iteration)
- R: dimension of the multiplicative effects

Nodal covariates (Xrow and Xcol) should be structured as:
Xn[1:10,]
##          pop      gdp polity
## ARG 3.548755 5.864710   7.18
## AUL 2.895912 6.011414  10.00
## BEL 2.314514 5.370685  10.00
## BNG 4.789989 5.177956   5.00
## BRA 5.070915 6.963597   8.00
## CAN 3.377588 6.531009  10.00
## CHN 7.091101 8.114522  -7.00
## COL 3.652734 5.324862   7.82
## EGY 4.063542 5.371521  -3.55
## FRN 4.082272 7.101956   9.00
Dyadic covariates should be structured as:
Xd[1:3,1:3,]
conflicts
##     ARG AUL BEL
## ARG  NA   0   0
## AUL   0  NA   0
## BEL   0   0  NA
distance
##       ARG   AUL   BEL
## ARG    NA 11.72 11.31
## AUL 11.72    NA 16.71
## BEL 11.31 16.71    NA
shared_igos
##      ARG  AUL  BEL
## ARG   NA 3.83 3.92
## AUL 3.83   NA 4.02
## BEL 3.92 4.02   NA
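If you ever need to assemble such an n x n x p array yourself from separate dyadic matrices, one option is base R's array(); the matrix names below are hypothetical stand-ins:

# conflictMat, distMat, igoMat: hypothetical n x n matrices with matching row/column order
Xd = array(
  c(conflictMat, distMat, igoMat),
  dim = c(nrow(conflictMat), ncol(conflictMat), 3),
  dimnames = list(rownames(conflictMat), colnames(conflictMat),
                  c('conflicts', 'distance', 'shared_igos'))
)

The row/column ordering should match Y across all slices.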
fitSRM = ame(
  Y=Y,
  Xdyad=Xd,        # incorp dyadic covariates
  Xrow=Xn,         # incorp sender covariates
  Xcol=Xn,         # incorp receiver covariates
  symmetric=FALSE, # tell AME trade is directed
  intercept=TRUE,  # add an intercept
  model='nrm',     # model type
  rvar=TRUE,       # sender random effects (a)
  cvar=TRUE,       # receiver random effects (b)
  dcor=TRUE,       # dyadic correlation
  R=0,             # we'll get to this later
  nscan=10000, burn=5000, odens=25,
  plot=FALSE, print=FALSE, gof=TRUE
)
Objects returned in fitSRM:
names(fitSRM)
## [1] "BETA" "VC" "APM" "BPM" "U" "V" "UVPM" "EZ" "YPM" "GOF"
paramPlot(fitSRM$BETA[,1:5])
paramPlot(fitSRM$BETA[,6:ncol(fitSRM$BETA)])
grid.arrange( paramPlot(fitSRM$VC), arrangeGrob( abPlot(fitSRM$APM, 'Sender Effects'), abPlot(fitSRM$BPM, 'Receiver Effects') ), ncol=2 )
gofPlot(fitSRM$GOF, symmetric=FALSE)
Let's build on what we have so far and find an expression for \(\gamma\):
\[ y_{ij} \approx \beta^{T} X_{ij} + a_{i} + b_{j} + \gamma(u_{i},v_{j}) \]
(Holland et al. 1983; Nowicki & Snijders 2001; Rohe et al. 2011; Airoldi et al. 2013)
Each node \(i\) is a member of an (unknown) latent class:
\[ \textbf{u}_{i} \in \{1, \ldots, K \}, \; i \in \{1,\ldots, n\} \]
The probability of a tie between \(i\) and \(j\) is:
\[ Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) = \theta_{\textbf{u}_{i} \textbf{u}_{j}} \]
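A minimal sketch of this generative story in R, simulating a small directed network from K latent classes (the class assignments and theta values are arbitrary illustrations):

set.seed(6886)
n = 60 ; K = 3
u = sample(1:K, n, replace=TRUE)                 # latent class of each node
theta = matrix(0.05, K, K) ; diag(theta) = 0.40  # within-class ties more likely

# Pr(Y_ij = 1 | u_i, u_j) = theta[u_i, u_j]
P = theta[u, u]
Ysbm = matrix(rbinom(n*n, 1, P), n, n)
diag(Ysbm) = NA

Diagonal entries of theta larger than the off-diagonal ones produce the familiar community structure.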
Software packages:

- statnet (Handcock et al. 2016)
- blockmodels (Leger 2015)

Newman (2006): Adjectives and Nouns
White & Murphy (2016): Mixed membership stochastic block model
(Hoff et al. 2002; Krivitsky et al. 2009; Sewell & Chen 2015)
Each node \(i\) has an unknown latent position
\[ \textbf{u}_{i} \in \mathbb{R}^{k} \]
The probability of a tie from \(i\) to \(j\) depends on the distance between them
\[ Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) = \theta - |\textbf{u}_{i} - \textbf{u}_{j}| \]
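A minimal sketch of the latent distance idea: draw positions in \(\mathbb{R}^{2}\) and let tie probabilities fall off with Euclidean distance. A logistic link is used here to keep values on the probability scale, an assumption the shorthand above leaves implicit:

set.seed(6886)
n = 60 ; k = 2 ; theta = 1
U = matrix(rnorm(n*k), n, k)   # latent positions in R^k
D = as.matrix(dist(U))         # pairwise Euclidean distances
P = plogis(theta - D)          # closer nodes -> higher tie probability
Yldm = matrix(rbinom(n*n, 1, P), n, n)
diag(Yldm) = NA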
Software packages:

- latentnet (Krivitsky et al. 2015)
- VBLPCM (Salter-Townshend 2015)

Kirkland (2012): North Carolina Legislators

Kuh et al. (2015): Discerning prey and predators from a food web
(Hoff 2003; Hoff 2007)
Each node \(i\) has an unknown latent factor
\[ \textbf{u}_{i} \in \mathbb{R}^{k} \]
The probability of a tie from \(i\) to \(j\) depends on their latent factors
\[ \begin{aligned} Pr(Y_{ij}=1 | \textbf{u}_{i}, \textbf{u}_{j}) =& \theta + \textbf{u}_{i}^{T} \Lambda \textbf{u}_{j} \, \text{, where} \\ &\Lambda \text{ is a } K \times K \text{ diagonal matrix} \end{aligned} \]
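A matching sketch for the latent factor model: with a diagonal \(\Lambda\), pairs whose factors line up on a dimension get a boost (or, for a negative entry, a penalty) to their tie probability. As above, a logistic link is assumed to keep things on the probability scale:

set.seed(6886)
n = 60 ; k = 2 ; theta = -1
U = matrix(rnorm(n*k), n, k)                # latent factors
Lambda = diag(c(1.5, -1.0))                 # K x K diagonal matrix
P = plogis(theta + U %*% Lambda %*% t(U))   # Pr(Y_ij=1) rises with u_i' Lambda u_j
Ylfm = matrix(rbinom(n*n, 1, P), n, n)
diag(Ylfm) = NA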
Software packages:

- amen (Hoff et al. 2015)

\[ \begin{aligned} y_{ij} &= g(\theta_{ij}) \\ \theta_{ij} &= \beta^{T} \mathbf{X}_{ij} + e_{ij} \\ e_{ij} &= a_{i} + b_{j} + \epsilon_{ij} + \textbf{u}_{i}^{T} \textbf{D} \textbf{v}_{j} \end{aligned} \]
Multiplicative effects can be added by toggling the R input parameter:
fitAME = ame(
  Y=Y,
  Xdyad=Xd,        # incorp dyadic covariates
  Xrow=Xn,         # incorp sender covariates
  Xcol=Xn,         # incorp receiver covariates
  symmetric=FALSE, # tell AME trade is directed
  intercept=TRUE,  # add an intercept
  model='nrm',     # model type
  rvar=TRUE,       # sender random effects (a)
  cvar=TRUE,       # receiver random effects (b)
  dcor=TRUE,       # dyadic correlation
  R=2,             # 2 dimensional multiplicative effects
  nscan=10000, burn=25, odens=25,
  plot=FALSE, print=FALSE, gof=TRUE
)
gofPlot(fitAME$GOF, symmetric=FALSE)
ggCirc(Y=Y, U=fitAME$U, V=fitAME$V)
Cranmer et al. (2017)
Out-of-sample Network Cross-Validation (a minimal holdout sketch with amen follows the list below)
Null model: model with no covariates
- Weschle (2017)
- Gallop and Minhas (WP)
- Greenhill (2016)
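As referenced above, here is a minimal sketch of the dyad-level holdout exercise with amen. It relies on ame() accepting NA entries in Y (missing values are imputed during the MCMC) and on EZ holding the posterior mean of the model's linear predictor; the 10% holdout rate and the seed are arbitrary:

# hold out 10% of the observed (off-diagonal) dyads
set.seed(6886)
Ymiss = Y
holdout = which( !is.na(Y) & matrix(runif(length(Y)) < 0.10, nrow(Y), ncol(Y)) )
Ymiss[holdout] = NA

# refit on the incomplete network
fitCV = ame(Y=Ymiss, Xdyad=Xd, Xrow=Xn, Xcol=Xn,
            symmetric=FALSE, model='nrm', R=2,
            nscan=10000, burn=5000, odens=25,
            plot=FALSE, print=FALSE)

# compare predictions for the held-out cells to their true values
cor(fitCV$EZ[holdout], Y[holdout])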
Covariate estimation in longitudinal networks
- Ward et al. (2013)
- Metternich et al. (2015)
- Dorff et al. (WP)
The LFM is a powerful framework that has proven useful for capturing unobserved dependence patterns in relational data
These interdependencies may at times be of substantive interest themselves; in other cases they may simply help us to better predict the relational outcome
Does AME actually reduce bias?
Hoff provides an argument that rests on exchangeability (Aldous, 1985)
Basis of simulation analysis
# Network simulation
simY = simulate.formula(
  network(n) ~ edges + edgecov(edgeVar) + networkTerm,
  coef=c( interceptValue, dyadParamValue, netParamValue )
)

# Run ergm
ergm(simY ~ edges + edgecov(edgeVar) + networkTerm)

# Run ame with and without multiplicative effects
ame(simY, Xdyad=edgeVar, R=0)  # SRM only
ame(simY, Xdyad=edgeVar, R=2)  # add multiplicative effects