Graph Neural Networks: A Review of Architectures!
Graph Neural Networks (GNNs) offer efficient and accurate answers to problems that can be represented as graphs. This kind of solution has high potential for accurate predictions compared to other types of neural networks! Here I will discuss GNN variants such as GCN, GraphSAGE and DeepWalk. ChebNet will be discussed in another post!
The ball-and-stick model is the simplest way of using graphs in chemistry!
I’m addressing the problem from the perspective of graphs! I’ll start with a simple GNN and the primary models, then walk into the newer achievements in this field.
I want to start with semi-supervised classification with GCNs, mainly focusing on the paper by Kipf and Welling.
1. Graph Basics
I plotted a graph using GraphOnline:
This notation means that graph G = (V, E) is made up of a set V of nodes and a set E of edges.
In this graph we have 5 nodes, or vertices.
We also have 5 edges, or links.
Edges can also carry values (weights)!
We will introduce 3 main matrices:
The adjacency matrix A is a node × node matrix which declares the connections between nodes.
The degree matrix D is a node × node diagonal matrix which shows the number of edges incident to each node.
The Laplacian matrix is simply calculated by subtracting A from D (L = D − A).
These three matrices are the key to graph-based processing! We can also define other matrices, like the feature matrix X. This matrix contains the features of each node (e.g. if nodes represent people, node features can be their age, height, weight, etc.).
So we are dealing with n rows and f columns in this matrix (n × f), where f is the number of features.
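The three matrices above can be built in a few lines of numpy. Since the plotted figure isn’t reproduced here, the edge list below is an assumption, chosen to match the text (5 nodes, 5 edges, node 2 with degree 3 and neighbors 0, 3 and 4, node 4 with degree 1):

```python
import numpy as np

# Assumed edge list for the 5-node example graph (the original figure
# is not shown; degrees match the text: node 2 has degree 3, node 4
# has degree 1, and node 2's neighbors are 0, 3 and 4).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
n = 5

A = np.zeros((n, n), dtype=int)      # adjacency matrix (node x node)
for i, j in edges:
    A[i, j] = A[j, i] = 1            # undirected graph: A is symmetric

D = np.diag(A.sum(axis=1))           # degree matrix (diagonal)
L = D - A                            # Laplacian: L = D - A

print(D.diagonal())                  # degrees of nodes 0..4
print(L.sum(axis=1))                 # every row of a Laplacian sums to zero
```

Note the sanity check at the end: by construction, each row of L sums to zero, because a node’s degree equals the number of ones in its adjacency row.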
2. Graph Convolutional Network (GCN)
Recall the graph above: it’s layer one! So what is layer two?!
If we apply some math functions to our matrices, we obtain the second layer! What we actually operate on are the node features!
So we have a function F, the adjacency matrix A and the features X. We also define H^l, which holds the values of layer l. We also have H^0 = X.
So in math notation:
H^(l+1) = F(H^l, A) = σ(A H^l W^l)
The activation σ can be anything; I choose ReLU! We also need to define the matrix W^l to carry the layer-specific trainable weights.
In a GCN, each node’s representation is the aggregation of its neighbors’.
You may come across the term “SEMI-SUPERVISED” learning. In graphs, we may not have labels for all of the nodes! So we use the labeled nodes to predict and determine labels for the unlabeled ones! As you know, in supervised learning we have all the labels, which is why this setting is called semi-supervised.
Let’s get back to our graph! We talked about aggregation, but there is a problem!! Node 2 has degree three, while node 4 has only one! You see what I mean? In aggregation we sum up the neighbors, so more neighbors means a larger value!
So we need to do symmetric normalization with the degree matrix! To achieve this we take these steps:
I is the identity matrix, which helps us add self-loops and normalize A: A hat = A + I. D hat is the diagonal degree matrix of A hat!
So finally we have this equation:
H^(l+1) = σ(D̂^(−1/2) Â D̂^(−1/2) H^l W^l)
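A single normalized GCN layer can be sketched in numpy. This is a minimal illustration, not the authors’ implementation: the edge list is the same assumed 5-node graph as before, and the features and weights are random placeholders:

```python
import numpy as np

np.random.seed(0)

# Same assumed 5-node example graph as before (edges chosen to match
# the degrees mentioned in the text).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
n, f = 5, 2                          # 5 nodes, 2 features each
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

A_hat = A + np.eye(n)                # add self-loops: A_hat = A + I
deg = A_hat.sum(axis=1)              # degrees of A_hat
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # D_hat^(-1/2)

X = np.random.rand(n, f)             # placeholder feature matrix (n x f)
W = np.random.rand(f, f)             # layer-specific trainable weights

# One GCN layer: H = ReLU(D_hat^(-1/2) A_hat D_hat^(-1/2) X W)
H = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
print(H.shape)  # (5, 2): one new feature vector per node
```

Thanks to the symmetric normalization, high-degree nodes like node 2 no longer dominate just because they have more neighbors to sum over.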
3. DeepWalk
My reference for this part of the article is “DeepWalk: Online Learning of Social Representations” by Perozzi et al., who demonstrate DeepWalk on Zachary’s Karate Club network.
This network learns in an unsupervised way, so we may not have a feature matrix. The algorithm starts off with random features and refines them in each iteration.
Which problem did they solve?!
– They wanted to cluster the members of the Karate club.
They aimed to prevent cascading errors by separating labels from features! Simply put, we use a subset of the data to learn! RandomWalk and SkipGram help to achieve the desired results!
RandomWalk: at each node, we choose the number of walks and the walk length!
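Generating the walks is straightforward to sketch with the standard library. This is a simplified illustration on the assumed 5-node graph from earlier, not the paper’s code; the two knobs are exactly the ones mentioned above (walks per node and walk length):

```python
import random

random.seed(42)

# Adjacency list of the assumed 5-node example graph.
neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4], 3: [1, 2], 4: [2]}

def random_walk(start, walk_length):
    """Uniform random walk of fixed length starting at `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        # step to a uniformly chosen neighbor of the current node
        walk.append(random.choice(neighbours[walk[-1]]))
    return walk

# The two knobs mentioned in the text: walks per node and walk length.
walks_per_node, walk_length = 2, 5
corpus = [random_walk(node, walk_length)
          for node in neighbours
          for _ in range(walks_per_node)]
print(corpus[0])  # one walk: a node sequence treated as a "sentence"
```

DeepWalk then feeds these node sequences to SkipGram as if they were sentences, so nodes that co-occur on walks end up with similar embeddings.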
4. GraphSAGE
GraphSAGE is a practical network! The previous nets have a problem with extending the model: whenever the graph grows, we have to train the model again! GraphSAGE made this easy!
Three main steps of this algorithm:
1. Sample the neighborhood
2. Aggregate feature information from the neighbors
3. Predict the graph context and label
The main difference from the other nets is that GraphSAGE learns a set of aggregation functions over a node’s neighborhood instead of a separate vector for each node!
OK! I want to define embedding: it’s nothing strange, just calculations on node features! I mean, imagine a graph with three nodes in a triangle shape! To get a new feature value for node 0, we feed its neighbors’ values to a function, for example averaging!
So we take these steps:
First we need to assign random values to each node! For example, I give these random numbers to our graph:
For example, I select node 2. I talked about embedding! In fact, we capture its neighbors and apply a function! The values of nodes 0, 3 and 4 are the inputs of our function, whose output is the embedding! So we feed 0.8, 0.6 and 0.7 to the function, and its output becomes the new value of node 2.
As you may have noticed, we didn’t involve node 2 itself! And this is a disaster :)))) So in the implementation phase we add a self-loop to each node, so its own value is included too!
We repeat this for all the other nodes! This creates layer one! Then we do it again to create the second layer!
There is a question about the function we used! What can we use?! We can use anything, even a small neural network! But mainly we use averaging, which here gives (0.8 + 0.6 + 0.7) / 3 = 0.7! I mean:
Or we can use the maximum function, which selects 0.8 in this example (the pool aggregator)!
We also used one-hop embedding! In two-hop embedding, we use the neighbors of neighbors too.
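The toy aggregation above can be written out directly. Only the values 0.8, 0.6 and 0.7 come from the text; the values for nodes 1 and 2 (and the edge list) are assumptions made for this sketch:

```python
# Assumed scalar features per node: 0.8, 0.6 and 0.7 are taken from
# the text; the values for nodes 1 and 2 are made up for this sketch.
h = {0: 0.8, 1: 0.5, 2: 0.4, 3: 0.6, 4: 0.7}
neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 3, 4], 3: [1, 2], 4: [2]}

def mean_aggregate(node):
    """Mean aggregator: average the one-hop neighbors' values."""
    vals = [h[n] for n in neighbours[node]]
    return sum(vals) / len(vals)

def pool_aggregate(node):
    """Pool (max) aggregator: keep the largest neighbor value."""
    return max(h[n] for n in neighbours[node])

print(round(mean_aggregate(2), 4))  # 0.7  (mean of 0.8, 0.6 and 0.7)
print(pool_aggregate(2))            # 0.8  (the pool aggregator's pick)
```

Running both aggregators over every node once produces layer one; a second pass gives the two-hop (layer two) embeddings, since each node’s layer-one value already mixes in its own neighbors.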
5. Further Posts
In the next posts, I’ll discuss important, novel and applicable networks used in graph signal processing (GSP)!