Continuous-Time Dynamic Network Embeddings

Kelvin Jose
9 min read · Oct 3, 2024


This is a holistic review of the paper Continuous-Time Dynamic Network Embeddings, first presented at BigNet 2018. It is one of the early attempts to capture the temporal information present in networks along with topological and other external features.

If you are interested in reading about the rise of temporal graph networks, consider this.

Introduction

Network evolution is now a highly discussed topic among researchers, given that almost all real-world networks are dynamic in nature: nodes and edges can join and leave the network as time progresses. A simple example is a graph formed from the interactions on an e-commerce platform. For this illustration, consider the users and products as the nodes of a graph and the interactions between them as edges; new members can join, and products can be added to or removed from the platform. The authors of this paper propose a framework that incorporates temporal information into existing embedding approaches based on random walks, such as DeepWalk (Perozzi et al., 2014) and Node2Vec (Grover et al., 2016). A random walk is a stochastic process with random variables W₁(vᵢ), W₂(vᵢ), …, Wₖ(vᵢ) such that Wₖ₊₁(vᵢ) is a vertex chosen at random from the neighbors of vertex vₖ (Perozzi et al., 2014). Such algorithms simply run random walks starting from every node, ignoring temporal properties. The proposed framework learns node embeddings that fully respect time dependencies, which in turn removes spurious information caused by non-existent sequences.

Time-independent Random Walks

The authors claim that their approach learns stronger representations from continuous-time dynamic networks, also called temporal networks. We should pause here and think about which shortcomings of existing random walks they are trying to solve. Given a snapshot of a graph, one typically generates random walks of arbitrary length based on the neighborhood information of each node and assumes all such sequences are valid. These sequences are then fed to sequence embedding techniques such as Word2Vec to produce embeddings that capture the topological as well as external features available in the network. Naively, a random walk starts from a random node, control transitions from that node to one of its neighbors, and this process repeats until the number of nodes selected, in other words the length of the node sequence, reaches some limit. The only criterion we follow is that control is transferred to one of the neighbors of the current node, and nothing else.

The assumption that every sequence following this criterion is valid need not always hold, since noise already present in the data, which we knowingly or unknowingly ignored, can poison an entire sequence by corrupting some of the intermediate nodes in the walk. Imagine two nodes A and B with an edge between them. This edge could denote a relationship in either direction, A to B or B to A, but we cannot tell which one is true unless directions are given. If A is the sender of a message and B the corresponding receiver, the edge 'sends' is valid from A to B but not vice versa. In an undirected graph the two nodes appear to be neighbors of each other, yet in the real world only one direction actually holds. So randomly selecting neighbors of each node is risky, because non-existent relations like the one just described show up as noise that makes the sequence invalid. If training algorithms encounter such invalid sequences, they may learn invalid and spurious information.

Time-dependent Walks

Existing random walk based embedding approaches try to model the space S of all random-walk sequences of a graph G, but only a subset Sₜ is actually valid. The proposed approach limits the space to be learned to Sₜ by introducing a time-based condition on top of the existing neighborhood criterion. That is, we may now sample new nodes based not just on neighborhood information but also on the time at which each edge between the neighbors came into existence. A temporal walk from node vᵢ₁ to node vᵢₗ₊₁ is defined as a sequence of edges {(vᵢ₁, vᵢ₂, tᵢ₁), (vᵢ₂, vᵢ₃, tᵢ₂), …, (vᵢₗ, vᵢₗ₊₁, tᵢₗ)} such that tᵢ₁ ≤ tᵢ₂ ≤ … ≤ tᵢₗ. In other words, a temporal walk is a temporally valid sequence of edges traversed in non-decreasing order of edge times, and therefore respects time. In simple terms, three nodes vᵢ₁, vᵢ₂, and vᵢ₃ can appear in a temporal walk if and only if vᵢ₂ is a neighbor of vᵢ₁, vᵢ₃ is a neighbor of vᵢ₂, and the edge time between vᵢ₂ and vᵢ₃ is greater than the edge time between vᵢ₁ and vᵢ₂. If we did plain random walks on these nodes, both sequences vᵢ₁ - vᵢ₂ - vᵢ₃ and vᵢ₃ - vᵢ₂ - vᵢ₁ would be possible, but the second one would be invalid. If vᵢ₁ - vᵢ₂ - vᵢ₃ are one-way communication channels, the temporal walk captures the flow of information from vᵢ₁ to vᵢ₃, while communicating backwards is not possible, so the walk vᵢ₃ - vᵢ₂ - vᵢ₁ is irrelevant.
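The validity condition above is easy to check in code. Here is a minimal sketch (function name and edge representation are my own, not from the paper) that verifies a walk, given as a list of (u, v, t) edges, is connected and has non-decreasing edge times:

```python
# Hypothetical helper: check that a list of (u, v, t) edges forms a valid
# temporal walk, i.e. consecutive edges share a node and times never decrease.
def is_temporal_walk(edges):
    for (u1, v1, t1), (u2, v2, t2) in zip(edges, edges[1:]):
        if v1 != u2:   # the walk must continue from the node it arrived at
            return False
        if t2 < t1:    # edge times must be non-decreasing
            return False
    return True

valid = [("v1", "v2", 3), ("v2", "v3", 5)]      # forward in time
invalid = [("v3", "v2", 5), ("v2", "v1", 3)]    # traverses backwards in time
```

Running the check on the two walks from the example, `is_temporal_walk(valid)` is `True` while `is_temporal_walk(invalid)` is `False`.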

We can now see that an embedding model trained on such invalid sequences would learn unwanted representations, which eventually degrades evaluation metrics. In a nutshell, time-dependent walks are sequences of nodes connected by edges with non-decreasing edge times.

Selection of First Edge and Subsequent Nodes

We now discuss how to select the first edge from which a temporal walk starts and progresses. So far we have been thinking about selecting a random node from the graph and then iteratively sampling neighbors until we reach some walk limit, which is how ordinary random walks work. Unlike a random walk, a temporal walk begins by selecting an edge with some timestamp. We do not start from a random node because a node has no time information associated with it. Instead, we select an edge connecting two nodes, and we assume the edge carries a timestamp denoting the time at which the interaction took place.

The authors propose two independent strategies, unbiased and biased, to select the first edge. In unbiased selection, every edge in the temporal graph has an equal probability of being selected as the first edge: if Eₜ is the set of all temporal edges, each edge eᵢ is equally likely to be sampled, so the probabilities follow a uniform distribution.

Probability(eᵢ) = 1 / |Eₜ|

In biased selection, we use a non-uniform distribution to select the starting edge, where the probabilities are weighted by some function (exponential or linear). The weight function can be chosen so that it defines a probability distribution favoring edges that appear either early or late in the graph. The rationale for biasing is sensible: downstream applications depend heavily on the learned representations, and some applications prefer recent information while others prefer old information. For example, in a link prediction problem we might want to begin temporal walks from edges close to the current time, since recent edges are more indicative of the current state of the system.
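The two strategies can be sketched as follows. This is my own illustration, not the paper's code; the exponential weighting shown here favors recent edges, and an analogous weighting on earlier timestamps would favor old ones:

```python
import math
import random

# Each edge is a (u, v, t) triple; E_t is the full temporal edge set.

def pick_first_edge_unbiased(E_t):
    # Uniform: every edge has probability 1 / |E_t|.
    return random.choice(E_t)

def pick_first_edge_biased(E_t):
    # Exponentially biased toward recent edges: weight grows with t.
    t_min = min(t for _, _, t in E_t)
    weights = [math.exp(t - t_min) for _, _, t in E_t]
    return random.choices(E_t, weights=weights, k=1)[0]

E_t = [("a", "b", 1.0), ("b", "c", 2.0), ("c", "a", 3.0)]
```

With `random.choices` the weights need not sum to one; they are normalized internally, so any monotone function of the timestamp works as a bias.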

After the first edge is selected, we sample subsequent nodes to complete the walk. First we identify the temporal neighborhood of the second node of the first edge: if edge eᵢ connects node vᵢ₁ to vᵢ₂, we look into the temporal neighborhood of vᵢ₂ and find the eligible nodes. Being in the neighborhood is not the only criterion; there is a second one concerning the timestamps of the edges that connect vᵢ₂ to its neighbors. Only edges with timestamps greater than the timestamp of the first edge, and the nodes at the other end of those edges, are eligible as the next step in the walk (Figure 1).

Figure 1. We assume that the walk starts from edge E₀ with timestamp 11.7. Then the nodes that form the temporal neighborhood of node v₁ will be v₂ and v₄ only.

The temporal neighborhood of a node v can be defined as,

Γₜ(vᵢ) = {(w, t′) | e = (vᵢ, w, t′) ∈ Eₜ ∧ T(e) > t}

where Eₜ is the set of all temporal edges, T is a function that maps each edge e to its timestamp, and t is the timestamp of the edge connecting vᵢ₋₁ and vᵢ, i.e. the edge through which the walk arrived at vᵢ.

This means that node w is in the temporal neighborhood of vᵢ iff there exists an edge e from vᵢ to w whose timestamp is greater than t.
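The definition translates directly into code. A minimal sketch, with my own node labels rather than those in Figure 1:

```python
# Temporal neighborhood from the definition above:
# Γ_t(v) = {(w, t') | (v, w, t') ∈ E_t and t' > t}.
def temporal_neighborhood(v, t, E_t):
    return [(w, t_prime) for (u, w, t_prime) in E_t if u == v and t_prime > t]

E_t = [
    ("v1", "v2", 11.7),   # the edge the walk arrived through
    ("v2", "v3", 12.0),
    ("v2", "v4", 13.1),
    ("v2", "v5", 10.2),   # too early: predates the arrival edge
]

# Having arrived at v2 at time 11.7, only the later edges qualify:
temporal_neighborhood("v2", 11.7, E_t)  # → [("v3", 12.0), ("v4", 13.1)]
```

Note that v5 is a graph neighbor of v2 but not a temporal neighbor, because its edge time 10.2 precedes the arrival time 11.7; a static random walk would happily step there.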

From the set of temporal neighbors, the authors use the same unbiased and biased strategies to select the next node in the walk. In the unbiased setting, every node in the temporal neighborhood has the same (uniformly distributed) probability of being selected, whereas the biased setting uses an exponential or a linear function to weight the probabilities (non-uniformly distributed).
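Putting the pieces together, a whole temporal walk can be grown from the first edge by repeatedly sampling from the current node's temporal neighborhood. This is a simplified sketch of the unbiased variant only (function name and parameters are mine, not the paper's):

```python
import random

# Grow a temporal walk from a starting (u, v, t) edge by uniformly
# sampling the next node from the temporal neighborhood at each step.
def temporal_walk(first_edge, E_t, max_len=10):
    u, v, t = first_edge
    walk = [u, v]
    while len(walk) < max_len:
        neighbors = [(w, tp) for (x, w, tp) in E_t if x == v and tp > t]
        if not neighbors:   # walks can die out under the time constraint
            break
        v, t = random.choice(neighbors)
        walk.append(v)
    return walk
```

On a simple time-ordered chain such as `[("a","b",1), ("b","c",2), ("c","d",3)]`, starting from the first edge deterministically yields the walk `["a", "b", "c", "d"]`, since every temporal neighborhood contains exactly one option.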

Temporal Context Windows

The authors point out that, because of the time constraints on selecting subsequent nodes, it is quite possible to run out of eligible nodes mid-walk. A hard limit on the maximum length of the walks is therefore not always required; instead they define a minimum length ω. This is imposed by the minimum context-window length required by the sequence model used later to learn the embeddings. Any sequence of length at least ω is valid. If S is the set of all valid sequences, the total number of context windows that can be derived from it is,

β = ∑ᵢ (|Sᵢ| − ω + 1)
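As a quick illustration of the count (my own helper, assuming walks shorter than ω are discarded):

```python
# β = Σ_i (|S_i| − ω + 1), summed over walks of length at least ω.
def num_context_windows(walks, omega):
    return sum(len(s) - omega + 1 for s in walks if len(s) >= omega)

walks = [["a", "b", "c", "d"], ["x", "y", "z"]]
num_context_windows(walks, omega=3)  # (4 − 3 + 1) + (3 − 3 + 1) = 3
```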

Modeling

Once all the temporal walks are generated, the next step is to use a sequence modeling algorithm to learn time-dependent embeddings for each node in the sequences. Since the restrictions apply only to the sequence generation phase, any sequence modeling algorithm with minimal hyperparameters can be utilized here. We have transformers (Vaswani et al., 2017) now, so they could also be candidates.

Evaluation

The proposed algorithm is tested against existing algorithms such as DeepWalk, Node2Vec, and LINE on a link prediction task over multiple datasets. The train/test split is simple: the authors sort each dataset in ascending order of edge times, use the first 75% of the data to train the model, and use the rest for evaluation. The last 25% of the sorted data serves as positive examples, and an equal proportion of negative examples is generated randomly. The authors observe that the proposed algorithm consistently outperforms the existing ones, and argue that important information is lost when temporal information is ignored. An overall gain of 11.9% is observed across datasets, and I believe the biggest factor behind it is the set of constraints introduced in the sequence generation phase, which substantially reduces the noise in the training data.
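The chronological split described above can be sketched in a few lines (a simplified illustration of the idea, not the paper's evaluation code; negative sampling is omitted):

```python
# Sort edges by time; the first 75% become training data and the
# remaining 25% become positive test examples for link prediction.
def temporal_split(edges, train_frac=0.75):
    ordered = sorted(edges, key=lambda e: e[2])   # sort by edge timestamp
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 4), ("d", "a", 2)]
train, test = temporal_split(edges)
# train holds the three earliest edges; test holds the latest, ("c", "d", 4)
```

Splitting by time rather than at random matters here: a random split would leak future edges into training, which defeats the purpose of a temporal model.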

Conclusion

The authors set out to build a framework that incorporates temporal information into existing random walk methods, believing that it cannot be ignored. The novel idea of the paper is what they call a 'temporal walk': an ordered sequence of nodes and edges with non-decreasing edge times. These walks form the training data, which is later used with a sequence learning algorithm like Word2Vec. Models trained on this new type of walk outperform existing algorithms, with an overall gain of 11.9%. Since temporal walks are compatible with any sequence learning algorithm, plugging them into an existing pipeline should not be a headache.

As we navigate the landscape of Continuous-Time Dynamic Networks, we recognize the profound impact of temporal information on our understanding of complex systems. I appreciate your time and interest. Thank you for spending time here. Have a good one 👋👋👋
