Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs
This is a review of the paper titled Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs which uses a novel technique called Time-Decayed Line Graph (TDLG) to effectively capture temporal information into both edges and nodes.
Introduction
Temporal graph is a special kind of graphs that comes with an extra piece of information i.e. time of interactions between the entities a.k.a. nodes in a graph. Conventionally, when an event or an interaction takes place, an edge between a pair of nodes is added to the graph but it does not save the time of event rather discards that information forever. If we follow this paradigm, the order of events over time gets lost which may result in sub - optimal performance in subsequent graph based tasks such as classification and prediction. The modelling techniques in use are primarily focused on nodes and their embeddings where as the authors of this paper propose a simple solution that can learn embeddings corresponding to edges over nodes.
Edge representations can be derived indirectly from node emebeddings by aggregating them, for example if we want to get the embedding of an edge that connects nodes a and b, we can aggregate the embeddings corresponding to those two nodes (assuming the node embeddings are already learned using Node2Vec, CTDNE or any other similar methods) by some aggregation function such as addition, average or even element-wise product. But the quality of edges represented this way is questionable and may lead to not-so-good semantic descriptions. Given that timestamps are associated with edges rather than nodes, the authors claim that it is intuitively a good idea to learn representations and use them for different down-stream tasks.
If you are strongly interested about temporal networks and embedding generation by neighborhood formation, consider reading this.
Framework
Authors construct a line graph of the network which converts each edge of the network into a node and connects interactions that share an endpoint node.
Here in the above image, there are 3 nodes namely a, b and c. Edge e₁ connects nodes a and b and edge e₂ connects node b and c. Edges e₁ and e₂ will be converted to nodes in the resulting line graph and an edge between them will be formed. The edge in the line graph is valid because the two edges in the temporal graph share a common end point i.e. node b. The weight of the edge in the line graph will be calculated based on differences in time between interactions - here t₁ and t₂. The interactions that occur closer in time will have higher edge weights whereas interactions far from each other will have lower edge weights.
The newly created line graph well describes topological proximity i.e. those edges that are topologically closer in the temporal graph will still be represented in its corresponding line graph without loosing any proximity information. Temporal proximity is also captured with the help of edge weights where recent interactions will have higher weights in comparison with older events. Temporal embeddings are calculated from this derived line graph instead of using the original temporal graph.
Time-Decayed Line Graph
We will first touch base on what a temporal graph is in terms of the authors,
A temporal graph G = (V, E) is a set of vertices V and a set of temporal edges E, where E ⊆ V × V × ℝ, t is the time of the temporal edge (u, v, t).
Given a temporal graph G, the authors first convert it into a line graph. The line graph of an undirected, unweighted graph is another undirected, unweighted graph that represents the adjancencies between edges in the original graph. The time information is incorporated to the line graph by taking time of interactions in the temporal graph into account. The resulting Time-Decayed Line Graph (TDLG) is defined as,
Given a temporal graph G = (V, E), the associated time decayed line graph is LTD(G) = (Vₗ, Eₗ, w) where Vₗ = E, Eₗ = {((u, v, t₁), (v, z, t₂)): (u, v, t₁), (v, z, t₂) ∈ E}, and the edge weight function w: Eₗ → ℝ⁺ evaluated on edge ((u, v, t₁), (v, z, t₂)) is given by exp[(-1 / (2 σ² t)) * (t₁ - t₂)²] for σₜ > 0.
The parameter σₜ in the above definition controls the decay of edge weights over time.
Assume that we have a graph with n nodes and m edges and its incidence matrix B ∈ {0, 1}ⁿ ˣ ᵐ where each column of B is an n dimensional vector that corresponds to an edge in the graph and is 1 for the edge’s two node end points and 0 elsewhere. The adjacency matrix A ∈ ℝᵐ ˣ ᵐ of the corresponding line graph can be calculated as,
where bᵢ and bⱼ are two edges or columns in matrix B. This adjacency matrix A can be understood in two ways,
- A is the edge adjacency matrix of the original temporal graph G
- A is the node adjacency matrix of the corresponding line graph of G
The operation bᵢᵀ bⱼ in the above equation is indeed interesting, when bᵢ and bⱼ do not share any common end points (nodes), the result will be 0 and 1 or 2 if the two edges have at-least one end point in common.
For downstream tasks such as link prediction and classification, we will need temporal edge embeddings which are real-valued vectors for each edge which can be then fed into a different model say a logistic regression model. For temporal edge i, we will just have to retrieve either the iᵗʰ row or column of the adjacency matrix A. Both the row and column will work because matrix A is symmetric in nature. This will return an m dimensional vector which will mostly be sparse. Authors say we can try a dimensionality reduction technique such as eigen-decomposition to reduce the dimensions and but this is something not necessarily be done because the sparse vectors themselves show some real representational power.
Temporal Stochastic Block Model
To demonstrate the representational power of the latent structures in the graphs, authors propose a model based on Stochastic Block Model (SBM). We will take a pause here and discuss what SBMs are and why they are important.
Stochastic Block Model is a generative model for random graphs, commonly used to identify and model community structures within networks. An SBM partitions nodes into different groups or blocks and assigns edges between nodes with probabilities that depend on the groups to which the nodes belong. In an SBM, the setup begins by assigning nodes to different groups or communities, where each node i is assigned to a group zᵢ chosen from a set of K possible communities. This assignment is often based on a probability distribution that specifies the likelihood of a node belonging to each community. Once groups are assigned, edges between nodes are created according to an edge probability matrix P of size K x K. This matrix defines the probability Pab of an edge forming between nodes in groups a and b. For instance, if the values on the diagonal of P are high relative to the off-diagonal values, it indicates that nodes more likely to connect with others in their own group than with nodes from other groups, thus creating a strong community structure. Conversely, more uniform values in P would suggest a lack of distinct community structure, resembling a more random graph.
Authors introduce a slightly modified version of the SBM called temporal SBM (TSBM) which comprises the union of several SBMs. TSBMs are officially defined as,
Consider a set of n nodes partitioned into two equal-sized communities {U, V}. For some integer ∆ > 0 and real numbers 0 < α₁, α₂ < 1, construct a random temporal edge set over this graph as follows: Let there be (∆·n) / 2 temporal edges in each of two time periods; the times associated with edges in each time period are drawn from the normal distributions N(μ₁, σ₁²) and N(μ₂, σ₂²), respectively. Of the edges in time period 1, let (1 − α₁) · (∆·n / 2) edges be drawn uniformly at random from U × V , and let the remaining α₁ · (∆·n) / 2 edges be drawn half each from U × U and V × V. Similarly, of the edges in time period 2, let (1 − α₂) · ∆·n / 2 edges be drawn
from U × V , and the remaining α₂ · (∆·n / 2) edges be drawn half each from U × U and V × V.
This model defines a method for generating a random temporal graph with two communities of nodes, each with specific connectivity patterns over time. In this setup, we have n nodes divided equally into two communities, denoted as U and V. The temporal graph evolves over two distinct time periods, with each period containing a fixed number of (∆·n) / 2 temporal edges. The times of these edges are drawn from two separate normal distributions, with timestamps in the first time period following N(μ₁, σ₁²) and in the second time period following N(μ₂, σ₂²).
During each time period, the edges are assigned according to pre-defined community interaction patterns controlled by parameters α₁ and α₂. In the first time period, a portion (1 — α₁) · (∆·n) / 2 of edges is placed between nodes in different communities, connecting nodes in U to nodes in V randomly. The remaining α₁ · (∆·n) / 2 edges are assigned within communities, split evenly between connections within U and within V. A similar assignment process occurs in the second time period, with α₂ controlling the ratio of within and between-community edges. This temporal SBM model thus enables structured temporal network generation by specifying edge counts, temporal distributions, and proportions of community-based interactions over time.
Authors construct a TSBM based on the above technique where it is not possible to distinguish any community structure unless time is considered. As per the definition, each SBM will be formed on the basis of the two distinct distributions for edge times and the corresponding edges with times sampled from these distributions will be called ‘early edges’ and ‘later edges’ respectively.
The first three figures are of adjacency matrices of an early SBM, late SBM and a combination or union of both. If we look at the first matrix of the three, we will notice how dense and sparse the connections are with in -communities U and V (intra-community relation) and inter - communities. Whereas in the late SBM, we see more dense connections with inter-communities in comparison with in individual communities. We will see some sort community structures in the first two figures but not at all in the third figure which is a union of the two SBMs.
Authors are trying to prove that it is still possible to recover the community structures from the union of SBMs if temporal information can be considered. In order to do so, we need the embeddings corresponding to each node in the graph but we only have edge embeddings available. Since the proposed time-decayed line graph method only produces temporal edge embeddings, we must aggregate them into node embeddings. Here is how it works, for each node, get the mean of the embeddings of all edges incident on that node - simple as that!
The first two figures are the adjacency representations of the raw line graph in consideration and its corresponding time decayed graph, respectively. The visualizations in third and fourth figures show that the proposed method succeeded in recovering edge types and node types using both edge and derived node embeddings. The edge embeddings are roughly grouped into 6 categories, each for U and V communities alone and one for U - V (inter - community edges) and these three types will be present in both early and later times (which add up to 6 edge types). Also, the node embeddings are now linearly separated to two communities i.e. U and V successfully. This result highlights that preserving temporal information in graphs allows for the recovery of community structures that might otherwise be lost.
Conclusion
The novel idea of this paper is a framework called Time-Decayed Line Graph (TDLG) which is a non-deep way of representing edges in a semantic space and thereby deriving node features from them. Two things authors emphasize are how effectively the proposed framework is able to capture both temporal and topological proximities in a graph. To show its full effectiveness, an SBM based approach is also illustrated.
The directions of future work mentioned in the paper is increasing this technique’s scalability via reduction of the number of features. The TDLG method return m features (sparse in most cases) per temporal edge based on proximities to all other temporal edges.
Anyway that is all about this one. It was a good read, indeed. I hope you also enjoyed reading the post. Have a good one.
Thank you.