chapter_topological_properties.tex

\chapter[Network Topological properties]{Topological properties of directed complex networks}
\label{cpt:topology}
\section{Introduction}
Topological properties of stock complex network are the structural organisations of the interconnections of the system components, e.g., nodes, edges, and the directions and weights of edges, which are also referred to as "network architecture".

In the networks of real stock markets, there are numerous common recurring patterns of connections which have profound effects towards the way the complex financial systems behave. This chapter will introduce the critical topological properties implemented in this thesis.

\section{Degree centrality and strength centrality}
The degree of a node $k$ represents the number of its neighbours. In directed network, out-degree $k_{out}$ is the number of edges which start from the given node and end at others, while in-degree $k_{in}$ is the number of edges which end at the given node and start from others. Thus, there is relationship between $k_{in}$ and $k_{out}$:
\begin{eqnarray}
k=k_{in}+k_{out}.
\end{eqnarray}

As one of the most widespread measures to calculate network centrality, degree centrality of a node can be described as the number of direct links that relate to a specific node~\cite{freeman}. In terms of the directed stock price return network, this thesis mainly focuses on the out-degree analysis on the nodes. Moreover, the strength centrality has generally been accumulated to the sum of weights of out-degrees to form the weighted networks. The equation of this measure is shown as bellow:
\begin{eqnarray}
&&C_D^W(i)=\sum_{j}^{N}w_{ij}
\end{eqnarray}

where $W$ represents the matrix of weighed adjacencies, and $w_{ij}$ represents the weight of the link between node $i$ and $j$.

\section{Degree distribution and strength distribution}
The degree distribution of stock price return network $p(k)$ can be defined as:
\begin{eqnarray}
p_d(k)=\frac{N_k}{N},
\end{eqnarray}

while $N_k$ represents the number of nodes whose out-degree value is k. The distribution of strength has a similar definition:
\begin{eqnarray}
p_s(w)=\frac{N_w}{N},
\end{eqnarray}

while $N_w$ represents the number of nodes whose strength value is w.

\section{Average shortest path length}
In a network, the distance between two nodes is the number of edges contained on the shortest path connecting the two nodes. The average path length of the network refers to the average distance of all pairs of nodes in the network. It indicates the degree of separation between nodes in the network and reflects the global characteristics of the network.

The average shortest path length of a directed network $G$ is defined as the following equation:
\begin{eqnarray}
l_G=\frac{1}{n(n-1)}\sum_{i,j\in V}{d(i,j)}
\end{eqnarray}

where $V$ is the set of nodes of $G$.

\section{Betweenness centrality}
Other than strength, betweenness centrality~\cite{freeman1977set} can be used to determine the critical nodes among the entire network and to recognise the most associated firms in the chosen stock market. When it comes to weighted networks, betweenness centrality of a node is the sum of the weights in the fraction of all-pairs shortest paths that pass through this node, which can be described as the following equation:
\begin{eqnarray}
&&C_B(v)=\sum_{s,t \in V}\frac{\sigma(s,t|v)}{\sigma(s,t)}
\end{eqnarray}

where $V$ is the set of nodes, $\sigma(s,t)$ is the sum of weights of all-pairs shortest $(s,t)$-paths, and $\sigma(s,t|v)$ is the sum of weights of those paths passing through some node $v$ other than $s,t$. If $s=t$, $\sigma(s,t)=1$, and if $v\in s,t$, $\sigma(s,t|v)=0$.

\section{Clustering coefficient} %%*改写*
The clustering coefficient of a node refers to the ratio of the number of connected edges between all of adjacent nodes of the node to the maximum number of possible edges between these adjacent nodes. The clustering coefficient of the network refers to the average of the clustering coefficients of all nodes in the network, which indicates the clustering of nodes in the network, i.e., the clustering characteristic of the network. It also means that how large is the possibility of two nodes are adjacent nodes if they are both adjacent nodes of a same node, which reflects the local characteristics of the network.

In a nutshell, clustering coefficient is a measure of the degree to which nodes in a network tend to cluster together. Concerning the clustering coefficient of the complex networks, it is defined as:
\begin{eqnarray}
C_i=\frac{2E_i}{(k_i(k_i-1))},
\end{eqnarray}

where $k_i$ is the degree of a given node $v_i$, $E_i$ is the real existing edges among the nearest neighbour nodes of the given node $v_i$, and $k_i(k_i-1)/2$ means the maximum possible edges existing between its nearest neighbours of the node $v_i$. Besides, the clustering coefficient of a node accounts for the extent to which the transmission relationship between the given node and its neighbours also exists between its neighbours, and the clustering coefficient may be given by:
\begin{eqnarray}
C=\frac{3\times number\ of\ triangles\ in\ the\ networks}{number\ of\ connected\ triples\ of\ nodes}.
\end{eqnarray}

This measure gives an indication of the clustering in the whole network, and can be applied to both undirected and directed networks.

\section{Efficiency}
Network efficiency measures how efficient for information being conducted and exchanged in the network, which can help to determine whether the objective network shows small-world property. There are global and local efficiencies that on the different scale sizes~\cite{latora2001efficient}.

\subsection{Global efficiency}
Global efficiency quantifies the conduction and exchange of information through out the entire network. The global efficiency of network \textbf{G} is defined as:
\begin{eqnarray}
&&E_{glob}(\textbf{G})=\frac{\sum_{i \neq j \in \textbf{G}} \epsilon_{ij}}{N(N-1)}=\frac{1}{N(N-1)}\sum_{i \neq j \in \textbf{G}} \frac{1}{d_{ij}}
\end{eqnarray}

In other words the global efficiency is calculated as the average of each inverse shortest path length among the entire network, hence it is inversely related to the average shortest path length.

\subsection{Local efficiency}
The local efficiency evaluates the resistance of a network towards node \textit{i} and quantifies the conduction and exchange of information among its neighbours. It can also be regarded as the global efficiency computed on node neighborhoods. The local efficiency of node \textit{i} in network \textbf{G} is defined as:
\begin{eqnarray}
&&E_{loc}(G, i)=\frac{1}{N} \sum_{i \in \textbf{G}} E_{glob}(\textbf{G}_i)
\end{eqnarray}

Therefore it can be seen that the local efficiency is related to the clustering coefficient, which can help recognise the property of small-world for a network.

\section{Assortativity and degree correlations}
The assortativity is a correlation coefficient between the degrees of all nodes on two opposite ends of an edge. A positive assortativity value indicates that nodes tend to link to other nodes with the same or similar degree.

The phenomenon of assortative~\cite{newman2002assortative} mixing can be quantified by means of an assortative coefficient. Let $E_{ij}$ be the number of edges in the network that connect a vertex of type $i$ to one of type $j$, with $i, j=1, . . . , n$, then similar in spirit to the adjacency matrix for vertices, these edges can be represented in the form of an edge
incidence matrix $\mathbf{E}$, with elements $E_{ij}$. A normalized mixing matrix is defined as follows:
\begin{eqnarray}
&&\mathbf{e}=\frac{\mathbf{E}}{\|\mathbf{E}\|},
\end{eqnarray}

where $\|\mathbf{E}\|$ refers to the sum of the elements of the matrix $\mathbf{E}$. The entries $e_{ij}$ in the normalized matrix represent the fraction of edges that connect vertices of types $i$ and $j$, and satisfies the normalization condition,

\begin{eqnarray}
&&\sum_{ij}e_{ij}=1.
\end{eqnarray}

The assortativity coefficient $r$ is then defined thus,
\begin{eqnarray}
&&r=\frac{Tr(\mathbf{e})-\|\mathbf{e}\|^2}{1-\|\mathbf{e}\|^2},\label{con:degreecorr}
\end{eqnarray}

where $Tr(\mathbf{e})$ is the standard matrix trace—the sum of the diagonal elements $e_{ii}$. The value of the coefficient $r$ lies in the range $-1\leq{r}\leq{1}$, where 1 represents a perfectly assortative network, 0 a randomly mixed one and -1 a perfectly disassortative network.

Since the degree is an important topological measure, degree correlations assume a significant amount of relevance as they can give rise to complicated network structural effects. The degree correlation can be computed using Eqn.~\ref{con:degreecorr}, where the elements $e_{ij}$ represent the fraction of edges that connect a vertex of degree $i$ to that with degree $j$.

\section{Modularity}
Modularity stands for the difference between fraction of links that fall within communities and the expected fraction if links are randomly distributed~\cite{newman2004finding}. This project introduces modularity as a measure to evaluate the connection strength between node pair within a group. Regarding to the industry where the stocks belong to, these stocks are divided into different groups hence modularity is used to measure the closeness of intra- and inter-group.

Two groups are combined to generate the modularity value while computing the closeness of two groups, as formula below shows:
\begin{eqnarray}
&&Q=\frac{1}{2m}\sum_{j}\left[w_{ij}-\frac{k_ik_j}{2m}\right]\delta\left(c_i,c_j\right)
\end{eqnarray}

where $c_i$ is the community to which node $i$ is assigned, and $k_i$ represents the degree of node $i$. The $\delta$-function $\delta(u,v)$ is 1 if $u=v$ and 0 otherwise and $m=0.5\sum_{ij}w_{ij}$ is the sum of weights in the whole network.

\section{Summary}
In this chapter the critical topological properties for this thesis have been introduced. Corresponding properties will be calculated if they are applicable for the directed-unweighted or directed-weighted network during the empirical practice.