Transforming Data: The Rise of Temporal Graphs
This blog is now part of the world’s biggest and ever-evolving temporal graph: the World Wide Web (WWW).
Introduction
Graphs have become an important data structure in research and development for several reasons, including their ability to model complex non-Euclidean data and their adaptability across many fields of science and engineering. They offer a simple but powerful way to define entities as nodes and the relationships between those entities as edges. The choice of entities and relationships can vary with the data abstractions and business needs, but the underlying design and core concepts of graphs stay unchanged. Take Instagram as a simple example: users and posts could play the role of nodes, whereas the interactions between users and posts, such as likes, comments or shares, are natural candidates for edges. Modeling the data is not an end in itself, but it is an important prerequisite that has to be settled before shifting focus to anything else.
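To make the Instagram example concrete, here is a minimal sketch in plain Python. The class names (Node, Edge, Graph), the node ids and the relation labels are all assumptions chosen for illustration; they do not come from any particular graph library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical node types: "user" or "post"


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str  # hypothetical relation labels: "likes", "comments", "shares"


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = Node(node_id, kind)

    def add_edge(self, source, target, relation):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, relation))

    def neighbors(self, node_id, relation=None):
        # Targets of outgoing edges, optionally filtered by relation type.
        return [
            e.target
            for e in self.edges
            if e.source == node_id and (relation is None or e.relation == relation)
        ]


g = Graph()
g.add_node("alice", "user")
g.add_node("bob", "user")
g.add_node("post_1", "post")
g.add_edge("alice", "post_1", "likes")
g.add_edge("bob", "post_1", "comments")
g.add_edge("bob", "post_1", "shares")

print(g.neighbors("alice"))                  # posts alice interacted with
print(g.neighbors("bob", relation="shares")) # posts bob shared
```

Keeping the relation as a label on the edge means a new interaction type can be added without touching the node definitions, which mirrors the point that the core graph concepts stay the same while the choice of entities and relationships varies.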
Deep learning has been pivotal in learning from graph data and performing various downstream tasks such as node and graph classification, edge prediction and clustering. Classical feature engineering techniques based on graph measures and properties such as centrality and degree also exist, but their limitations make them hard to scale, or even unusable, beyond a certain threshold. Before diving into deep learning techniques for graphs, it helps to get to know graphs in some detail. There are two broad categories of graphs: static and dynamic. Static graphs (Figure 1) are constant structures, meaning the node and edge information always stays the same, with no notion of time. Traditional machine learning algorithms were primarily designed with static graphs in mind. However, many real-world applications involve dynamic graphs whose properties change over time: nodes, edges and attributes can be added, removed or updated at different times.
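As a small illustration of the classical graph measures mentioned above, the sketch below computes degree centrality (a node's degree divided by the maximum possible degree, n - 1) on a tiny static, undirected graph. The edge list and the function name are made up for this example.

```python
from collections import defaultdict

# A toy static, undirected graph given as an edge list (illustrative data).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]


def degree_centrality(edge_list):
    """Return degree / (n - 1) for every node that appears in the edge list."""
    degree = defaultdict(int)
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n <= 1:
        return {node: 0.0 for node in degree}
    # Note: isolated nodes never appear in an edge list, so they are not counted here.
    return {node: d / (n - 1) for node, d in degree.items()}


centrality = degree_centrality(edges)
for node in sorted(centrality):
    print(node, round(centrality[node], 3))
```

A feature like this is computed once for a fixed structure; when edges keep arriving and disappearing, it has to be recomputed or updated incrementally, which is one reason such hand-crafted features sit awkwardly with dynamic graphs.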
Dynamic graphs can be further divided into Discrete-Time Dynamic Graphs (DTDGs) (Figure 1) and Continuous-Time Dynamic Graphs (CTDGs), also called temporal graphs (Figure 1). DTDGs are snapshots of a graph taken at fixed time intervals, so they inevitably lose the details of the interactions that happen between two snapshots (Figure 2). In simple terms, if we take one snapshot every 30 minutes, the individual node and edge additions and deletions that occur between two consecutive snapshots are not recorded; only their net effect is visible. One way to address this is to shrink the snapshot interval, but that leads to unnecessary computation and increased storage requirements. CTDGs, on the other hand, keep track of the timestamp of every single graph event, which realistically captures the evolution of the graph over time. Since temporal information is preserved along with nodes and edges, algorithms developed for static graphs or DTDGs become less useful on CTDGs, as they were not designed to consume event-level timestamps. Time and order are equally important components in causal inference, because future events can be predicted with some confidence as a function of the set of ordered past events.
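The difference between the two representations can be sketched in a few lines of Python. The event stream below plays the role of a CTDG, and the snapshots derived from it play the role of a DTDG. The Event class, the action names and the 30-minute interval are assumptions made for this illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # minutes since the start of observation
    action: str     # hypothetical actions: "add_edge" or "remove_edge"
    source: str
    target: str


# CTDG view: every graph event is kept together with its timestamp.
events = [
    Event(5, "add_edge", "u1", "u2"),
    Event(12, "add_edge", "u2", "u3"),
    Event(20, "remove_edge", "u2", "u3"),  # added and removed inside one window
    Event(40, "add_edge", "u1", "u3"),
]


def snapshot_at(events, t):
    """Replay all events with timestamp <= t and return the set of live edges."""
    live = set()
    for e in sorted(events, key=lambda ev: ev.timestamp):
        if e.timestamp > t:
            break
        edge = (e.source, e.target)
        if e.action == "add_edge":
            live.add(edge)
        elif e.action == "remove_edge":
            live.discard(edge)
    return live


def discretize(events, interval, horizon):
    """DTDG view: one snapshot at every multiple of `interval` up to `horizon`."""
    return {t: snapshot_at(events, t) for t in range(interval, horizon + 1, interval)}


snapshots = discretize(events, interval=30, horizon=60)

# Edges that existed at some point according to the event stream...
seen_in_stream = {(e.source, e.target) for e in events if e.action == "add_edge"}
# ...versus edges that show up in at least one snapshot.
seen_in_snapshots = set().union(*snapshots.values())

lost = seen_in_stream - seen_in_snapshots
print(lost)  # edges whose whole lifetime fell between two snapshots
```

The edge between u2 and u3 lives from minute 12 to minute 20, so neither the 30-minute nor the 60-minute snapshot ever contains it. Shrinking the interval would recover it, at the cost of storing and processing many more snapshots.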
Our focus from here on is temporal graphs. Consider this a starter guide if you are new to the topic. I will use the terms ‘evolving graphs’ and ‘temporal graphs’ interchangeably; both refer to the same thing.
Why are evolving graphs important?
We have been witnessing the staggering amount and pace at which data (both online and offline) is produced every single day. Here are some interesting statistics: roughly 402 million terabytes (TB) of data are created every day, which adds up to approximately 147 zettabytes (ZB) this year alone. The amount of data generated annually has grown year over year since 2010, and some estimates suggest that 90% of the world’s data was generated in the last two years alone.
In the early days of the Internet revolution, we mostly had to deal with unstructured plain text, because both production and consumption were limited and effortful. Eventually we started handling structured and hierarchical data, images and videos, time series, biological sequences, sensor readings and so on, which means that not only the amount of data was increasing but also the complexity of collecting, storing and maintaining it efficiently. Data is no longer perceived as isolated pieces but as systematically connected pieces. Interconnected data can be stored in many ways, such as relational databases, document stores, key-value pairs, Resource Description Framework (RDF) and graphs. For modeling complex data with well-defined entities and relationships, a graph is a good choice considering its efficiency in querying and analysis. As the volume and complexity of data continue to grow, traditional static graphs have become less capable of representing the data in a useful way. Most classic graphs are static and thus not suited to some real-world scenarios, since entities and their relationships can continually evolve over time. New entities may join or leave the graph, new edges can appear and existing ones can disappear, and the attributes of the nodes can change; all such incidents are natural in evolving graphs.
Let us now look at some of the evolving graphs around us. On e-commerce platforms such as Amazon or Walmart, new sellers, users and products can enter and leave the platform. We can consider these three kinds of entities as nodes in our temporal graph and the relationships among them as edges. If a user adds a product to their cart, that directly shows the user’s interest in the product and its category, so an ‘interested in’ relationship can easily be formed and tracked; if the user then checks out by successfully placing the order, that is another strong relationship or signal to consider. Product descriptions and user information such as location, gender, age and past purchases can act as node attributes. Recommendation engines (Ting Bai et al., 2020) can consume this information and recommend new products based on evolving preferences, the collaborative influence of peers and the temporal impact of previous purchases.
In social networks, users can join, make or remove friends, post updates and interact with existing updates. If we consider users and posts as nodes in a temporal graph, the interactions among them can be treated as different relationships. If a user adds someone to their friend list, a new ‘friends with’ edge can be added to the graph; the user can also remove this connection by unfriending them, which adds another edge, i.e. ‘unfriended’. Interesting signals can be mined and stored in the form of edges when a user interacts with posts by liking, disliking, commenting on and sharing them, because these interactions can directly or indirectly denote the user’s evolving likes and dislikes across social media topics such as entertainment, sports, politics and automobiles.
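As a rough sketch of how timestamped interaction edges can expose a user’s evolving interests, the snippet below counts one user’s interactions per topic inside two different time windows. The Interaction record, its field names and the sample data are all hypothetical; a real platform would choose its own schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    timestamp: int  # days since the user joined (illustrative unit)
    user: str
    post: str
    relation: str   # hypothetical labels: "liked", "commented", "shared"
    topic: str      # topic of the post, copied onto the edge for brevity


# Timestamped user-post edges of a toy temporal graph (made-up data).
interactions = [
    Interaction(1, "alice", "p1", "liked", "sports"),
    Interaction(3, "alice", "p2", "shared", "sports"),
    Interaction(4, "alice", "p3", "liked", "politics"),
    Interaction(30, "alice", "p4", "liked", "automobile"),
    Interaction(33, "alice", "p5", "commented", "automobile"),
    Interaction(35, "alice", "p6", "liked", "sports"),
]


def topic_interest(interactions, user, start, end):
    """Count the user's interactions per topic with start <= timestamp < end."""
    return Counter(
        i.topic
        for i in interactions
        if i.user == user and start <= i.timestamp < end
    )


early = topic_interest(interactions, "alice", start=0, end=10)
recent = topic_interest(interactions, "alice", start=28, end=38)

print(early.most_common(1))   # dominant topic in the first ten days
print(recent.most_common(1))  # dominant topic in the most recent window
```

In this toy data the dominant topic shifts from sports to automobile. A static graph that merged all six edges would report sports as the overall favorite and hide that shift, which is exactly the kind of signal the timestamps preserve.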
The nodes and edges described here are for illustration only; the actual choice depends entirely on what the business is trying to solve and achieve. In the real world, temporal evolution may sometimes occur at only some of these levels (entities, relationships or their attributes). Different parts of the graph can also evolve at different paces.
Learning from Temporal Graphs
Business owners can navigate temporally evolving graphs to answer questions about entities and the relationships among them. These questions can range from basic ‘what’, ‘when’, ‘why’ and ‘who’ queries to complex ‘how’ and ‘what next’ queries. In the context of evolving graphs, knowledge refers to a combination of pieces of information that helps in understanding causality, as explained above. Past events and interactions in an evolving temporal graph can influence the occurrence of future events. The main challenge for knowledge extraction in a graph with temporal evolution is combining information from its different dimensions, such as evolution and topology, with other external factors. A good deal of research is addressing these challenges, and innovations such as Temporal Graph Networks (Emanuele Rossi et al., 2020) and Temporal Graph Transformer (Peng Chu et al., 2023) have come out of labs and academia.
The future of learning from temporal graphs is exciting, and I believe its ChatGPT moment is yet to come.