Graph Retrieval Toolkit

GRT (Graph Retrieval Toolkit) is a library of graph similarity and graph retrieval models built on top of PyTorch Geometric.

Note

This project is under active development

Contents

Installation

Tutorial

SimGNN to Predict Graph Similarity on AIDS700nef Dataset

Import Basic Libraries

We first import all basic libraries before we begin:

import tqdm
import os
import os.path as osp
import torch
from torch_geometric.loader import DataLoader
import numpy as np

Now that we have the basic libraries in place, we import the required classes from the sgmatch package:

from sgmatch.utils.utility import Namespace, GraphPair
from sgmatch.models.matcher import graphMatcher

The sgmatch.utils.utility.Namespace class serves as a container for the hyperparameters and other arguments used to instantiate the models. The sgmatch.utils.utility.GraphPair class is the fundamental building block of this toolkit: it builds on PyTorch Geometric’s data API and represents a pair of graphs (or, when batched, a batch of graph pairs). sgmatch.models.matcher.graphMatcher is a wrapper class for all the graph similarity / graph retrieval models implemented in the Graph Retrieval Toolkit.
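
As a quick illustration, the following minimal sketch builds a single GraphPair from two tiny, made-up graphs; the tensors are arbitrary and only meant to show how the container is filled, not real data:

import torch
from sgmatch.utils.utility import GraphPair

# Source graph: a 3-node path with 2-dimensional node features
edge_index_s = torch.tensor([[0, 1, 1, 2],
                             [1, 0, 2, 1]])
x_s = torch.randn(3, 2)

# Target graph: 2 connected nodes with 2-dimensional node features
edge_index_t = torch.tensor([[0, 1],
                             [1, 0]])
x_t = torch.randn(2, 2)

# A GraphPair holds both graphs (plus any pair-level attributes) in one object
pair = GraphPair(edge_index_s=edge_index_s, x_s=x_s,
                 edge_index_t=edge_index_t, x_t=x_t)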

Load and Create Datasets

In this tutorial we predict the similarity between pairs of graphs from the ‘AIDS700nef’ dataset. We first download the dataset through torch_geometric as follows:

from torch_geometric.datasets import GEDDataset
name = "AIDS700nef"
ROOT_DIR = '../'
train_graphs = GEDDataset(root=os.path.join(ROOT_DIR, f'data/{name}/train'), train=True, name=name)
test_graphs = GEDDataset(root=os.path.join(ROOT_DIR, f'data/{name}/test'), train=False, name=name)

print(f"Number of Graphs in Train Set : {len(train_graphs)}")
print(f"Number of Graphs in Test Set : {len(test_graphs)}")

Although we have downloaded the graphs from the AIDS700nef dataset, we are not done yet. Our objective is to compute the similarity between two graphs, so SimGNN takes a pair of graphs as input. We therefore pair up the downloaded graphs:

## *** Training Set Pair ***
train_graph_pair_list = []
# Making the Pairs of Graphs
for graph_s in train_graphs:
    for graph_t in train_graphs:
        edge_index_s = graph_s.edge_index
        x_s = graph_s.x
        edge_index_t = graph_t.edge_index
        x_t = graph_t.x
        ged = train_graphs.ged[graph_s.i, graph_t.i]
        norm_ged = train_graphs.norm_ged[graph_s.i, graph_t.i]
        graph_sim = torch.exp(-norm_ged)

        # Making Graph Pair
        graph_pair = GraphPair(edge_index_s=edge_index_s, x_s=x_s,
                               edge_index_t=edge_index_t, x_t=x_t,
                               ged=ged, norm_ged=norm_ged, graph_sim=graph_sim)

        train_graph_pair_list.append(graph_pair)

## *** Test Set Pair ***
test_graph_pair_list = []
# Making the Pairs of Graphs
for graph_s in test_graphs:
    for graph_t in train_graphs:
        edge_index_s = graph_s.edge_index
        x_s = graph_s.x
        edge_index_t = graph_t.edge_index
        x_t = graph_t.x
        ged = train_graphs.ged[graph_s.i, graph_t.i] # Yes, train_graphs.ged is correct
        norm_ged = train_graphs.norm_ged[graph_s.i, graph_t.i] # Yes, train_graphs.norm_ged is correct
        graph_sim = torch.exp(-norm_ged)

        # Making Graph Pair
        graph_pair = GraphPair(edge_index_s=edge_index_s, x_s=x_s,
                               edge_index_t=edge_index_t, x_t=x_t,
                               ged=ged, norm_ged=norm_ged, graph_sim=graph_sim)

        test_graph_pair_list.append(graph_pair)

For some use cases, and to help prevent overfitting, we also need a validation set. Although the dataset does not ship with a built-in validation split, we can carve a validation set of graph pairs out of the training pairs as shown:

# Sample validation indices without replacement so no pair is duplicated
val_idxs = np.random.choice(len(train_graph_pair_list), size=len(test_graph_pair_list), replace=False)
val_graph_pair_list = [train_graph_pair_list[idx] for idx in val_idxs]
# Keep the remaining pairs for training
train_idxs = set(range(len(train_graph_pair_list))) - set(val_idxs)
train_graph_pair_list = [train_graph_pair_list[idx] for idx in train_idxs]
del val_idxs, train_idxs

print("Number of Training Graph Pairs = {}".format(len(train_graph_pair_list)))
print("Number of Validation Graph Pairs = {}".format(len(val_graph_pair_list)))
print("Number of Test Graph Pairs = {}".format(len(test_graph_pair_list)))

Now that we have training, validation and test graph pairs, we can create a DataLoader for each split:

from torch_geometric.loader import DataLoader
batch_size = 128
train_loader = DataLoader(train_graph_pair_list, batch_size=batch_size, follow_batch=["x_s", "x_t"], shuffle=True)
val_loader = DataLoader(val_graph_pair_list, batch_size=batch_size, follow_batch=["x_s", "x_t"], shuffle=True)
test_loader = DataLoader(test_graph_pair_list, batch_size=batch_size, follow_batch=["x_s", "x_t"], shuffle=True)
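
Passing follow_batch=["x_s", "x_t"] asks PyTorch Geometric to also build batch assignment vectors for the source and target node features, so node embeddings can later be pooled back into per-graph embeddings. As a quick sanity check (attribute names follow PyTorch Geometric's follow_batch convention), we can inspect one mini-batch:

# Peek at one mini-batch of graph pairs
batch = next(iter(train_loader))
print(batch.x_s.shape)        # [total source nodes in batch, num_features]
print(batch.x_s_batch.shape)  # [total source nodes in batch]; graph index of each source node
print(batch.x_t_batch.shape)  # [total target nodes in batch]; graph index of each target node
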
Training the Model

Now we define a sgmatch.utils.utility.Namespace object whose arguments are passed to the sgmatch.models.matcher.graphMatcher wrapper class to initialize our graph similarity model:

av = Namespace(model_name        = "simgnn",
               ntn_slices        = 16,
               filters           = [64, 32, 16],
               mlp_neurons       = [32,16,8,4],
               hist_bins         = 16,
               conv              = 'GCN',
               activation        = 'tanh',
               activation_slope  = None,
               include_histogram = True,
               input_dim         = train_graphs.num_features)

For convenience, we also define a training function which takes the train and validation loaders and trains our model:

def train(train_loader, val_loader, model, loss_criterion, optimizer, device, num_epochs=10):
    train_losses = []
    val_losses = []

    for epoch in range(num_epochs):
        # Training loop
        model.train()
        for batch_idx, batch in enumerate(train_loader):
            batch = batch.to(device)
            optimizer.zero_grad()

            pred_sim = model(batch.x_s, batch.edge_index_s, batch.x_t, batch.edge_index_t)
            loss = loss_criterion(pred_sim, batch.graph_sim)
            # Compute Gradients via Backpropagation
            loss.backward()
            # Update Parameters
            optimizer.step()
            train_losses.append(loss.item())

        # Validation loop
        model.eval()
        for batch_idx, val_batch in enumerate(val_loader):
            with torch.no_grad():
                val_batch = val_batch.to(device)
                pred_sim = model(val_batch.x_s, val_batch.edge_index_s,
                        val_batch.x_t, val_batch.edge_index_t)
                val_loss = loss_criterion(pred_sim, val_batch.graph_sim)
                val_losses.append(val_loss.item())

        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        # Print epoch summary (losses shown are from the last batch of the epoch)
        print(f"Epoch: {epoch+1}/{num_epochs} | Train MSE: {loss.item():.6f} | Validation MSE: {val_loss.item():.6f}")

With everything in place above, we train our model:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = graphMatcher(av).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), 0.01)
train(train_loader, val_loader, model, criterion, optimizer, device)
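
Once training finishes, we can estimate the error on the held-out test pairs using the same forward call as in the training loop. This is a minimal sketch that reports an unweighted mean of the per-batch MSE values:

model.eval()
test_losses = []
with torch.no_grad():
    for batch in test_loader:
        batch = batch.to(device)
        pred_sim = model(batch.x_s, batch.edge_index_s, batch.x_t, batch.edge_index_t)
        test_losses.append(criterion(pred_sim, batch.graph_sim).item())

print(f"Test MSE (mean over batches): {sum(test_losses) / len(test_losses):.6f}")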

sgmatch.models

Graph Matching Base Class

class sgmatch.models.matcher.graphMatcher(av: Namespace)[source]

A Wrapper Class for all the Graph Similarity / Matching models implemented in the library

Parameters:

av (Namespace) – Object of Namespace containing arguments to be passed to models

Returns:

The initialized model selected by the user through the model_name attribute of the av Namespace

Graph Matching Networks

class sgmatch.models.GMN.GMNEmbed(node_feature_dim: int, enc_node_hidden_sizes: List[int], prop_node_hidden_sizes: List[int], prop_message_hidden_sizes: List[int], aggr_gate_hidden_sizes: List[int], aggr_mlp_hidden_sizes: List[int], edge_feature_dim: Optional[int] = None, enc_edge_hidden_sizes: Optional[List[int]] = None, message_net_init_scale: float = 0.1, node_update_type: str = 'residual', use_reverse_direction: bool = True, reverse_dir_param_different: bool = True, layer_norm: bool = False)[source]

End to end implementation of Graph Matching Networks - Embed from the “Graph Matching Networks for Learning the Similarity of Graph Structured Objects” paper.

Parameters:
  • node_feature_dim (int) – Input dimension of node feature embedding vectors

  • enc_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the encoder MLP that transforms the input node features.

  • prop_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the node update MLP in the propagation step.

  • prop_message_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the message computation MLP in the propagation step.

  • aggr_gate_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the gating network in the graph aggregation module.

  • aggr_mlp_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the transformation MLP in the graph aggregation module.

  • edge_feature_dim (int, Optional) – Input dimension of edge feature embedding vectors. (default: None)

  • enc_edge_hidden_sizes ([int], Optional) – Number of hidden neurons in each linear layer of the encoder MLP that transforms the input edge features. (default: None)

  • message_net_init_scale (float) – Initialisation scale for the message net output vectors. (default: 0.1)

  • node_update_type (str) – Type of update applied to node feature vectors ("GRU" or "MLP" or "residual"). (default: 'residual')

  • use_reverse_direction (bool) – Flag for whether to use messages in the reverse direction for node updates. (default: True)

  • reverse_dir_param_different (bool) – Flag for whether the reverse-direction message computation uses parameters separate from the forward direction. (default: True)

  • layer_norm (bool) – Flag for applying layer normalization in the propagation step. (default: False)
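
For reference, a minimal instantiation might look like the sketch below; the hidden-layer sizes are illustrative choices, not values prescribed by the paper or the library:

from sgmatch.models.GMN import GMNEmbed

# Illustrative hyperparameters (not prescribed defaults)
model = GMNEmbed(node_feature_dim=32,
                 enc_node_hidden_sizes=[32],        # node encoder MLP layer sizes
                 prop_node_hidden_sizes=[64, 32],   # node update MLP layer sizes
                 prop_message_hidden_sizes=[64],    # message MLP layer sizes
                 aggr_gate_hidden_sizes=[64],       # aggregator gating network layer sizes
                 aggr_mlp_hidden_sizes=[64])        # aggregator MLP layer sizes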

class sgmatch.models.GMN.GMNMatch(node_feature_dim: int, enc_node_hidden_sizes: List[int], prop_node_hidden_sizes: List[int], prop_message_hidden_sizes: List[int], aggr_gate_hidden_sizes: List[int], aggr_mlp_hidden_sizes: List[int], edge_feature_dim: Optional[int] = None, enc_edge_hidden_sizes: Optional[List[int]] = None, message_net_init_scale: float = 0.1, node_update_type: str = 'residual', use_reverse_direction: bool = True, reverse_dir_param_different: bool = True, attention_sim_metric: str = 'euclidean', layer_norm: bool = False)[source]

End to end implementation of Graph Matching Networks - Match from the “Graph Matching Networks for Learning the Similarity of Graph Structured Objects” paper.

TODO: Provide description of implementation and differences from paper if any

Parameters:
  • node_feature_dim (int) – Input dimension of node feature embedding vectors

  • enc_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the encoder MLP that transforms the input node features.

  • prop_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the node update MLP in the propagation step.

  • prop_message_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the message computation MLP in the propagation step.

  • aggr_gate_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the gating network in the graph aggregation module.

  • aggr_mlp_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of the transformation MLP in the graph aggregation module.

  • edge_feature_dim (int, Optional) – Input dimension of edge feature embedding vectors. (default: None)

  • enc_edge_hidden_sizes ([int], Optional) – Number of hidden neurons in each linear layer of the encoder MLP that transforms the input edge features. (default: None)

  • message_net_init_scale (float) – Initialisation scale for the message net output vectors. (default: 0.1)

  • node_update_type (str) – Type of update applied to node feature vectors ("GRU" or "MLP" or "residual"). (default: 'residual')

  • use_reverse_direction (bool) – Flag for whether to use messages in the reverse direction for node updates. (default: True)

  • reverse_dir_param_different (bool) – Flag for whether the reverse-direction message computation uses parameters separate from the forward direction. (default: True)

  • attention_sim_metric (str) – Similarity metric used to compute the cross-graph attention scores. (default: 'euclidean')

  • layer_norm (bool) – Flag for applying layer normalization in the propagation step. (default: False)

GraphSim

class sgmatch.models.GraphSim.GraphSim(input_dim: int, gnn: str = 'GCN', gnn_filters: List[int] = [64, 32, 16], conv_filters: Optional[ModuleList] = None, mlp_neurons: List[int] = [32, 16, 8, 4, 1], padding_correction: bool = True, resize_dim: int = 10, resize_mode='bilinear', gnn_activation: str = 'relu', mlp_activation: str = 'relu', gnn_dropout_p: float = 0.5, activation_slope: Optional[float] = 0.1)[source]

End to end implementation of GraphSim from the “Learning-based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching” paper.

NOTE: Model assumes that node features of input graph data are arranged according to Breadth-First Search of the graph

TODO: Provide description of implementation and differences from paper if any

Parameters:
  • input_dim (int) – Input dimension of node feature vectors.

  • gnn (str, optional) – Type of Graph Neural Network to use to embed the node features ("Neuro-PNA" or "PNA" or "GCN" or "GAT" or "SAGE" or "GIN" or "graph" or "gated"). (default: 'GCN')

  • gnn_filters ([int], optional) – Number of hidden neurons in each layer of the GNN for embedding input node features. (default: [64,32,16])

  • conv_filters (torch.nn.ModuleList, optional) – List of Convolution Filters to be applied to each similarity matrix generated from each GNN pass. (default: None)

  • mlp_neurons ([int], optional) – Number of hidden neurons in each layer of decoder MLP (default: [32,16,8,4,1])

  • padding_correction (bool, optional) – Flag whether to include padding correction as specified in the paper which is voided due to batching of graphs (default: True)

  • resize_dim (int, optional) – Dimension to resize the similarity image matrices to. (default: 10)

  • resize_mode (str, optional) – Interpolation method to resize the similarity images ('nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area' | 'nearest-exact'). (default: 'bilinear')

  • gnn_activation (str, optional) – Activation to be used in the GNN layers (default: relu)

  • mlp_activation (str, optional) – Activation to be used in the MLP decoder layers (default: relu)

  • activation_slope (float, optional) – Slope of the negative part in case of "leaky_relu" activation. (default: 0.1)
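
A minimal instantiation sketch, relying on the documented defaults for the remaining arguments; the input dimension is an arbitrary example value:

from sgmatch.models.GraphSim import GraphSim

# input_dim is illustrative; it should equal your dataset's node feature dimension
model = GraphSim(input_dim=29,
                 gnn='GCN',
                 gnn_filters=[64, 32, 16],
                 resize_dim=10,
                 resize_mode='bilinear')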

class sgmatch.models.GraphSim.GraphSim_v2(input_dim: int, conv_kernel_sizes, conv_in_channels, conv_out_channels, conv_stride, maxpool_kernel_sizes, maxpool_stride, cnn_dropout_p=0.2, gnn: str = 'GCN', gnn_filters: List[int] = [64, 32, 16], mlp_neurons: List[int] = [32, 16, 8, 4, 1], padding_correction: bool = True, resize_dim: int = 10, resize_mode='bilinear', gnn_activation: str = 'relu', mlp_activation: str = 'relu', gnn_dropout_p: float = 0.5, activation_slope: Optional[float] = 0.1)[source]

A more efficient implementation of GraphSim from the “Learning-based Efficient Graph Similarity Computation via Multi-Scale Convolutional Set Matching” paper.

Uses PyTorch's grouped convolution layers to speed up the embedding of hierarchical similarity image matrices by parallelizing computations. Prefer this variant over version 1 when the convolution network architecture is the same for all similarity image matrices.

TODO: Provide description of implementation and differences from paper if any and update argument description

Parameters:
  • input_dim (int) – Input dimension of node feature vectors.

  • gnn (str, optional) – Type of Graph Neural Network to use to embed the node features ("Neuro-PNA" or "PNA" or "GCN" or "GAT" or "SAGE" or "GIN" or "graph" or "gated"). (default: 'GCN')

  • gnn_filters ([int], optional) – Number of hidden neurons in each layer of the GNN for embedding input node features. (default: [64,32,16])

  • conv_kernel_sizes – Kernel size of each convolution layer applied to the similarity image matrices.

  • conv_in_channels – Number of input channels for each convolution layer.

  • conv_out_channels – Number of output channels for each convolution layer.

  • conv_stride – Stride of each convolution layer.

  • maxpool_kernel_sizes – Kernel size of each max-pooling layer applied after the convolutions.

  • maxpool_stride – Stride of each max-pooling layer.

  • cnn_dropout_p (float, optional) – Dropout probability applied in the convolutional layers. (default: 0.2)

  • mlp_neurons ([int], optional) – Number of hidden neurons in each layer of decoder MLP (default: [32,16,8,4,1])

  • padding_correction (bool, optional) – Flag whether to include padding correction as specified in the paper which is voided due to batching of graphs (default: True)

  • resize_dim (int, optional) – Dimension to resize the similarity image matrices to. (default: 10)

  • resize_mode (str, optional) – Interpolation method to resize the similarity images ('nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear' | 'area' | 'nearest-exact'). (default: 'bilinear')

  • gnn_activation (str, optional) – Activation to be used in the GNN layers (default: relu)

  • mlp_activation (str, optional) – Activation to be used in the MLP decoder layers (default: relu)

  • activation_slope (float, optional) – Slope of the negative part in case of "leaky_relu" activation. (default: 0.1)

NeuroMatch

class sgmatch.models.NeuroMatch.SkipLastGNN(input_dim: int, hidden_dim: int, output_dim: int, num_layers: int, conv_type: str = 'Neuro-PNA', dropout: float = 0.0, skip: str = 'learnable')[source]

End to end implementation of NeuroMatch from the “Neural Subgraph Matching” paper

TODO: Provide argument description

Parameters:
  • input_dim (int) – Input dimension of node feature vectors.

  • hidden_dim (int) – Hidden dimension of the node embeddings in the intermediate GNN layers.

  • output_dim (int) – Output dimension of the node embedding vectors.

  • num_layers (int) – Number of GNN layers.

  • conv_type (str, optional) – Type of Graph Neural Network to encode input features ("Neuro-PNA" or "PNA" or "GCN" or "GAT" or "SAGE" or "GIN" or "graph" or "gated"). (default: "Neuro-PNA")

  • dropout (float, optional) – Dropout probability to prevent overfitting (default: 0.0)

  • skip (str, optional) – Type of skip connection to use. (default: "learnable")
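
A minimal instantiation sketch with illustrative dimensions (not values prescribed by the paper):

from sgmatch.models.NeuroMatch import SkipLastGNN

model = SkipLastGNN(input_dim=32,    # illustrative node feature dimension
                    hidden_dim=64,
                    output_dim=64,
                    num_layers=4,
                    conv_type='Neuro-PNA',
                    skip='learnable')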

SimGNN

class sgmatch.models.SimGNN.SimGNN(input_dim: int, ntn_slices: int = 16, filters: list = [64, 32, 16], mlp_neurons: List[int] = [32, 16, 8, 4], hist_bins: int = 16, conv: str = 'GCN', activation: str = 'tanh', activation_slope: Optional[float] = None, include_histogram: bool = True)[source]

End to end implementation of SimGNN from the “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” paper

TODO: Provide description of implementation and differences from paper if any

Parameters:
  • input_dim (int) – Input dimension of node feature embedding vectors.

  • ntn_slices (int, optional) – Hyperparameter for the number of tensor slices in the Neural Tensor Network. In this domain, it denotes the number of interaction (similarity) scores produced by the model for each graph embedding pair. (default: 16)

  • filters ([int], optional) – Number of filters per convolutional layer in the graph convolutional encoder model. (default: [64, 32, 16])

  • mlp_neurons ([int], optional) – Number of hidden neurons in each linear layer of MLP for reducing dimensionality of concatenated output of neural tensor network and histogram features. Note that the final scoring weight tensor of size [mlp_neurons[-1], 1] is kept separate from the MLP, therefore specifying only the hidden layer sizes will suffice. (default: [32,16,8,4])

  • hist_bins (int, optional) – Hyperparameter controlling the number of bins in the node ordering histogram scheme. (default: 16)

  • conv (str, optional) – Type of graph convolutional architecture to be used for encoding ('GCN' or 'SAGE' or 'GAT') (default: 'GCN')

  • activation (str, optional) – Type of activation used in Attention and NTN modules. ('sigmoid' or 'relu' or 'leaky_relu' or 'tanh') (default: 'tanh')

  • activation_slope (float, optional) – Slope of function for leaky_relu activation. (default: None)

  • include_histogram (bool, optional) – Flag for including Strategy Two: Nodewise comparison from SimGNN. (default: True)
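
These arguments correspond one-to-one with the Namespace used in the tutorial above, so SimGNN can also be instantiated directly instead of going through graphMatcher; the input dimension below is illustrative and should equal your dataset's num_features:

from sgmatch.models.SimGNN import SimGNN

model = SimGNN(input_dim=29,            # illustrative; use e.g. train_graphs.num_features
               ntn_slices=16,
               filters=[64, 32, 16],
               mlp_neurons=[32, 16, 8, 4],
               hist_bins=16,
               conv='GCN',
               activation='tanh',
               include_histogram=True)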

ISONET

class sgmatch.models.ISONET.ISONET(node_feature_dim: int, enc_node_hidden_sizes: List[int], prop_node_hidden_sizes: List[int], prop_message_hidden_sizes: List[int], edge_feature_dim: Optional[int] = None, enc_edge_hidden_sizes: Optional[List[int]] = None, message_net_init_scale: float = 0.1, node_update_type: str = 'GRU', use_reverse_direction: bool = True, reverse_dir_param_different: bool = True, layer_norm: bool = False, lrl_hidden_sizes: List[int] = [16, 16], temp: float = 0.1, eps: float = 1e-20, noise_factor: float = 1, gs_num_iters: int = 20)[source]

End-to-End implementation of the ISONET model from the “Interpretable Neural Subgraph Matching for Graph Retrieval” paper.

Parameters:
  • node_feature_dim (int) – Input dimension of node feature embedding vectors.

  • enc_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer for transforming the node features.

  • prop_node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of node update MLP f_node. node_feature_dim is appended as the size of the final linear layer to maintain node embedding dimensionality

  • prop_message_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of message computation MLP f_node. Note that the message vector dimensionality (prop_message_hidden_sizes[-1]) may not be equal to node_feature_dim.

  • edge_feature_dim (int, optional) – Input dimension of edge feature embedding vectors. (default: None)

  • enc_edge_hidden_sizes ([int], optional) – Number of hidden neurons in each linear layer for transforming the edge features. (default: None)

  • message_net_init_scale (float, optional) – Initialisation scale for the message net output vectors. (default: 0.1)

  • node_update_type (str, optional) – Type of update applied to node feature vectors ("GRU" or "MLP" or "residual"). (default: "GRU")

  • use_reverse_direction (bool, optional) – Flag for whether or not to use the reverse message aggregation for propagation step. (default: True)

  • reverse_dir_param_different (bool, optional) – Flag for whether or not message computation model parameters should be shared by forward and reverse messages in propagation step. (default: True)

  • layer_norm (bool, optional) – Flag for applying layer normalization in propagation step. (default: False)

  • lrl_hidden_sizes ([int], optional) – List containing the sizes for LRL network to pass edge features of input graphs. (default: [16,16])

  • temp (float, optional) – Temperature parameter in the Gumbel-Sinkhorn Network. (default: 0.1)

  • eps (float, optional) – Small value for numerical stability and precision in the Gumbel-Sinkhorn Network. (default: 1e-20)

  • noise_factor (float, optional) – Parameter which controls the magnitude of the effect of sampled Gumbel Noise. (default: 1)

  • gs_num_iters (int, optional) – Number of iterations of Sinkhorn row and column scaling (in practice, as few as 20 iterations are needed to achieve decent convergence for N~100). (default: 20)
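
A minimal instantiation sketch; the encoder and propagation layer sizes are illustrative, while the Gumbel-Sinkhorn settings keep the documented defaults:

from sgmatch.models.ISONET import ISONET

model = ISONET(node_feature_dim=32,             # illustrative node feature dimension
               enc_node_hidden_sizes=[32],      # node encoder MLP layer sizes
               prop_node_hidden_sizes=[64],     # node update MLP layer sizes
               prop_message_hidden_sizes=[64],  # message MLP layer sizes
               lrl_hidden_sizes=[16, 16],       # documented default
               gs_num_iters=20)                 # documented default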

sgmatch.modules

Attention

class sgmatch.modules.attention.GlobalContextAttention(input_dim: int, activation: str = 'tanh', activation_slope: Optional[float] = None)[source]

Attention Mechanism layer for the attention operator from the “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” paper

TODO: Include latex formula for attention computation and aggregation update

Parameters:
  • input_dim – Input Dimension of the Node Embeddings

  • activation – The Activation Function to be used for the Attention Layer

  • activation_slope – Slope of the negative part of the activation when leaky_relu is used

class sgmatch.modules.attention.CrossGraphAttention(similarity_metric: str = 'euclidean')[source]

Attention mechanism layer for the cross-graph attention operator from the “Graph Matching Networks for Learning the Similarity of Graph Structured Objects” paper

TODO: Include latex formula for attention computation and aggregation update

Parameters:

similarity_metric – Similarity metric to be used to compute attention scoring

sgmatch.modules.scoring.similarity(h_i, h_j, mode: str = 'cosine')[source]

Computes the similarity between the embedding vectors h_i and h_j using the metric specified by mode. (default: 'cosine')

Encoding

class sgmatch.modules.encoder.MLPEncoder(node_feature_dim: int, node_hidden_sizes: List[int], edge_feature_dim: Optional[int] = None, edge_hidden_sizes: Optional[List[int]] = None)[source]

MLP node/edge feature encoding scheme following the “Graph Matching Networks for Learning the Similarity of Graph Structured Objects” paper.

NOTE: This is a generic MLP Encoder for graph encoding rather than something explicitly novel from the paper; it has been referenced for clarity. Both node and edge features have separately initialised MLP models.

Parameters:
  • node_feature_dim (int) – Input dimension of node feature embedding vectors

  • node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer for transforming the node features.

  • edge_feature_dim (int, Optional) – Input dimension of edge feature embedding vectors. (default: None)

  • edge_hidden_sizes ([int], Optional) – Number of hidden neurons in each linear layer for transforming the edge features. (default: None)

class sgmatch.modules.encoder.OrderEmbedder(margin, use_intersection: bool = False)[source]

Propagation

class sgmatch.modules.propagation.GraphProp(node_feature_dim: int, node_hidden_sizes: List[int], message_hidden_sizes: List[int], edge_feature_dim: Optional[int] = None, message_net_init_scale: float = 0.1, node_update_type: str = 'residual', use_reverse_direction: bool = False, reverse_dir_param_different: bool = True, layer_norm: bool = False, prop_type: str = 'embedding')[source]

Implementation of the message-propagation module from the “Graph Matching Networks for Learning the Similarity of Graph Structured Objects” paper (https://arxiv.org/pdf/1904.12787.pdf).

NOTE: This module only computes one propagation step at a time and needs to be called T times for T propagation steps (step-wise calls need to be defined by user in model training scripts).

Parameters:
  • node_feature_dim (int) – Input dimension of node feature embedding vectors

  • node_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of node update MLP f_node. node_feature_dim is appended as the size of the final linear layer to maintain node embedding dimensionality

  • message_hidden_sizes ([int]) – Number of hidden neurons in each linear layer of message computation MLP f_node. Note that the message vector dimensionality (message_hidden_sizes[-1]) may not be equal to node_feature_dim.

  • edge_feature_dim (int, Optional) – Input dimension of edge feature embedding vectors. (default: None)

  • message_net_init_scale (float) – Initialisation scale for the message net output vectors. (default: 0.1)

  • node_update_type (str) – Type of update applied to node feature vectors ("GRU" or "MLP" or "residual") (default: "residual")

  • use_reverse_direction (bool) – Specifies whether or not to use the reverse message aggregation for propagation step. (default: False)

  • reverse_dir_param_different (bool) – Specifies whether or not message computation model parameters should be shared by forward and reverse messages. (default: True)

  • layer_norm (bool) – Flag for applying layer normalization in the propagation step. (default: False)

  • prop_type (str) – Propagation computation type ("embedding" or "matching") (default: "embedding")

Scoring

class sgmatch.modules.scoring.NeuralTensorNetwork(input_dim: int, slices: int = 16, activation: str = 'tanh')[source]

Neural Tensor Network layer from the “SimGNN: A Neural Network Approach to Fast Graph Similarity Computation” paper

TODO: Include latex formula for NTN interaction score computation

Parameters:
  • input_dim – Input dimension of the graph-level embeddings.

  • slices – Number of slices (K) the weight tensor possesses. Often interpreted as the number of entity-pair (in this use case, pairwise node) relations the data might possess. (default: 16)

  • activation – Non-linearity applied on the computed output of the layer
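
A minimal instantiation sketch using the documented defaults; the input dimension is illustrative and should equal the graph-level embedding dimension produced by the encoder:

from sgmatch.modules.scoring import NeuralTensorNetwork

ntn = NeuralTensorNetwork(input_dim=16, slices=16, activation='tanh')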