Graph Representation Learning: From Simple to Higher-order Structures
Blogs from ODSC SpeakersMachine LearningModelingGraph Representation LearningWest 2021posted by ODSC Community November 1, 2021 ODSC Community
Editor’s note: Mayank is a speaker for ODSC West 2021. Be sure to check out his talk, “Graph Representation Learning: From Simple to Higher-Order Structures,” there!
Graphs and networks have become ubiquitous for describing “complex systems,” where it is not enough to just represent the elements of a system, but to also represent the interactions between the elements. Graphs can be defined simply as a collection of ‘nodes’ or ‘vertices’, which represent the elements, and the interactions (alternately called ‘edges’, ‘links’, ‘relations’ and ‘predicates’) between element-pairs.
Examples of complex networks include social networks, organizational networks (including networks representing M&A, subsidiary relations, and inter-company board overlap), collaboration networks, and even ecological networks that are mapped through field studies in the wild. Complex systems are therefore not always human; there is something natural and powerful about the ability of interactions to lead to emergent phenomena arising in a system. To take an example purely from the natural sciences, in biology, elements can represent proteins, with edges representing biological interactions, such as protein-protein interactions. A deep (and still growing) body of work has shown that such networks can yield important insights into the underlying system.
Large-scale graphs are rarely complete or clean, however. For instance, not all of our friends on social media are our actual friends, and there may be close friends or family who we are not connected with on social media. Some graphs are more complex than others. For instance, when the graph is a knowledge graph (KG), edges have directions, and both nodes and edges have human-readable labels. Nodes can have different ‘types’. For example, in a social KG, such as the one being built by LinkedIn, nodes can be people, organizations, educational institutions, or even jobs. Certain edges only make sense in the context of specific node types e.g., an edge label ‘employed at’ would only make sense for an edge that goes from a person-type node to an organization-type node. Since such KGs are large-scale and rarely constructed manually, noise and incompleteness are inevitable. This can make them problematic for direct use in downstream applications.
Machine learning has an obvious role to play here, at multiple levels. Aside from the well-known use case of link prediction, recent problems on which there has been enormous progress, include detecting incorrect links, and predicting the types of links (especially when the graph is represented as a KG). Before the advent of deep learning and representation learning algorithms, solutions to such applications tended to be piecemeal and problem-specific. With deep neural networks, the modern approach instead is to ‘embed’ nodes and edges in the graph into dense, continuous real-valued vectors. These vectors represent both the semantic and structural properties of the nodes and edges. They find a remarkable range of applications, not just for supervised problems, but also unsupervised problems like clustering and visualization.
In the earliest days of this line of work, called graph representation learning (GRL), statistical and linear algebra-based techniques like non-negative matrix factorization were more directly used, rather than neural networks per se. Independent researchers in academia and industry, fueled by the success of the word2vec algorithm proposed in the early part of the previous decade, sought to apply similar algorithms to GRL. One such approach, which has an intuitive explanation, is called DeepWalk. It directly used word2vec, by first conducting random walks starting from each node in the graph and ‘interpreting’ each such random walk as a ‘sentence’, with nodes treated as ‘words’. Given such sequences, word2vec can be directly applied. The algorithm was initially amenable to ordinary networks, rather than labeled-edge graphs like KGs, but researchers have since devised and successfully tested KG-equivalent versions.
DeepWalk was lightweight, easy to set up, and yielded state-of-the-art performance compared to more expensive and bulky methods. Although the skip-gram model used in word2vec (and DeepWalk) would not be considered a ‘deep’ network, either then or now, it was a compelling demonstration of the success that neural networks can have in representation learning. The next natural question that arose was whether we can do better by designing more powerful networks. At the same time, the KG community was also devising its own KG-specific algorithms using neural networks and advanced optimization techniques. Examples that have since become classic in the KG embedding community include the neural tensor network (NTN), the Trans* algorithms, and algorithms devised in Big Tech research groups, such as HolE.
Fast forward to 2021, and there are innumerable Graph Representation Learning algorithms, some of which have become mainstream (such as LINE and node2vec) and others of which remain obscure. Not all of these algorithms are unique, and the evolution of this space shows some consistent patterns. On the one hand, the neural networks have gotten deeper and more powerful, especially because of transformer architectures (like BERT) but also because of Graph Convolutional Networks. Researchers have also started moving away from simple random walks, or fortifying them with more advanced features, including triangles, weights and ‘motifs’ (special, statistically frequent structures, such as a ‘square’, found in graphs).
With all of this research, the field is at an exciting juncture, awaiting a whole host of creative applications to build better products and services. In my upcoming talk at ODSC West 2021, I will be discussing the evolution of Graph Representation Learning algorithms, and future possibilities and applications, in-depth.
About the author/ODSC West 2021 speaker on Graph Representation Learning:
Mayank Kejriwal holds joint appointments as a research assistant professor and research team lead at the University of Southern California. His research on applied AI and knowledge graphs has been covered widely in the international press, including the World Economic Forum, The Houston Chronicle, Big Think, Popular Science, the Associated Press, CNN Indonesia, and Global News Canada. His most recent book, titled “Knowledge Graphs: Fundamentals, Techniques and Applications” was published by MIT Press in 2021.