
Create Test Cuda ML Flavor #1026

Closed
tomuben opened this issue Nov 25, 2024 · 7 comments
tomuben commented Nov 25, 2024

Background

  • We need a generic CUDA-enabled SLC for our customers and for testing the DB
    1. We have a template CUDA SLC for customization by our customers
    2. We probably also need a CUDA-enabled SLC with basic CUDA-enabled ML libraries

Acceptance Criteria

A ready-to-start CUDA SLC with CUDA-enabled ML libraries:

Name: test-Exasol-8-cuda-ml
New Packages:

  • numba
  • pytorch
tomuben self-assigned this Nov 25, 2024
tomuben added the feature (Product feature) label Nov 25, 2024
tkilias commented Dec 3, 2024

think about pola.rs

tkilias commented Dec 3, 2024

Maybe have one SLC for data engineering and one for model execution (however, with only one of PyTorch or TensorFlow).

tomuben commented Dec 4, 2024

The SLC with all dependencies became too big: ~50 GB (16 GB as tar.gz).
We decided to have a Test SLC for now, which can be used for further development on the database side. This Test SLC should contain:

  1. CUDA toolkit
  2. CUDA compat
  3. PyTorch
  4. Numba

The reason is that both can be installed with conda without any conflicts (in contrast to TensorFlow). Also, PyTorch enables UDF tests without JIT, and Numba enables tests with JIT. The manual test showed that for JIT (which requires a proper setup of the CUDA compat module), LD_LIBRARY_PATH needs to be set correctly: the path to the compat shared libraries needs to come before the path to the Nvidia driver shared libraries.
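
A quick way to check the ordering from SQL is a tiny UDF that just returns the variable. This is only an illustrative sketch (the script name check_ld_library_path is made up; it assumes the same PYTHON_GPU language alias as in the tests below):

CREATE OR REPLACE PYTHON_GPU SCALAR SCRIPT
check_ld_library_path()
RETURNS VARCHAR(10000) AS

import os

def run(ctx):
    # Return LD_LIBRARY_PATH so the ordering can be inspected from SQL:
    # the cuda-compat directory has to appear before the Nvidia driver libraries.
    return os.environ.get("LD_LIBRARY_PATH", "<not set>")
/

SELECT check_ld_library_path();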

tomuben changed the title "Create standard Cuda ML Flavor" to "Create Test Cuda ML Flavor" Dec 5, 2024
tomuben commented Dec 5, 2024

As this will not be the final standard ML SLC, I renamed it to `test-Exasol-8-cuda-ml`.

tomuben commented Dec 5, 2024

The test flavor uses the slow-wrapper scripts for now, which have the LD_LIBRARY_PATH configured with the path to the cuda-compat shared libraries.
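
As a quick sanity check that the GPU is actually usable from this flavor, a minimal UDF along these lines can be run before the larger tests (sketch only; the name check_gpu is made up, and the language alias is assumed to be configured as in the tests below):

CREATE OR REPLACE PYTHON_GPU SCALAR SCRIPT
check_gpu()
RETURNS VARCHAR(10000) AS

def run(ctx):
    import torch
    # Report whether CUDA is usable and which device the flavor sees.
    if torch.cuda.is_available():
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "CUDA not available"
/

SELECT check_gpu();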

tomuben commented Dec 6, 2024

My pytorch test:

CREATE SCHEMA TEST;
OPEN SCHEMA TEST;

ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON_GPU=localzmq+protobuf:///bfsdefault/default/ml-slc/?lang=python#buckets/bfsdefault/default/ml-slc/exaudf/exaudfclient';

CREATE OR REPLACE PYTHON_GPU SCALAR SCRIPT
test_pytorch(epochs INTEGER)
RETURNS VARCHAR(10000) AS

def run(ctx):
    import os
    import time
    import urllib.request
    import tarfile
    import numpy as np
    
    import torch
    from torch import nn
    import torch.nn.functional as F
    from torch.optim import Adam
    
    
    class GraphConv(nn.Module):
        """
            Graph Convolutional Layer described in "Semi-Supervised Classification with Graph Convolutional Networks".
    
            Given an input feature representation for each node in a graph, the Graph Convolutional Layer aims to aggregate
            information from the node's neighborhood to update its own representation. This is achieved by applying a graph
            convolutional operation that combines the features of a node with the features of its neighboring nodes.
    
            Mathematically, the Graph Convolutional Layer can be described as follows:
    
                H' = f(D^(-1/2) * A * D^(-1/2) * H * W)
    
            where:
                H: Input feature matrix with shape (N, F_in), where N is the number of nodes and F_in is the number of 
                    input features per node.
                A: Adjacency matrix of the graph with shape (N, N), representing the relationships between nodes.
                W: Learnable weight matrix with shape (F_in, F_out), where F_out is the number of output features per node.
                D: The degree matrix.
        """
        def __init__(self, input_dim, output_dim, use_bias=False):
            super(GraphConv, self).__init__()
    
            # Initialize the weight matrix W (in this case called `kernel`)
            self.kernel = nn.Parameter(torch.Tensor(input_dim, output_dim))
            nn.init.xavier_normal_(self.kernel) # Initialize the weights using Xavier initialization
    
            # Initialize the bias (if use_bias is True)
            self.bias = None
            if use_bias:
                self.bias = nn.Parameter(torch.Tensor(output_dim))
                nn.init.zeros_(self.bias) # Initialize the bias to zeros
    
        def forward(self, input_tensor, adj_mat):
            """
            Performs a graph convolution operation.
    
            Args:
                input_tensor (torch.Tensor): Input tensor representing node features.
                adj_mat (torch.Tensor): Normalized adjacency matrix representing graph structure.
    
            Returns:
                torch.Tensor: Output tensor after the graph convolution operation.
            """
    
            support = torch.mm(input_tensor, self.kernel) # Matrix multiplication between input and weight matrix
            output = torch.spmm(adj_mat, support) # Sparse matrix multiplication between adjacency matrix and support
            # Add the bias (if bias is not None)
            if self.bias is not None:
                output = output + self.bias
    
            return output
    
    
    class GCN(nn.Module):
        """
        Graph Convolutional Network (GCN) as described in the paper `"Semi-Supervised Classification with Graph 
        Convolutional Networks" <https://arxiv.org/pdf/1609.02907.pdf>`.
    
        The Graph Convolutional Network is a deep learning architecture designed for semi-supervised node 
        classification tasks on graph-structured data. It leverages the graph structure to learn node representations 
        by propagating information through the graph using graph convolutional layers.
    
        The original implementation consists of two stacked graph convolutional layers. The ReLU activation function is 
        applied to the hidden representations, and the Softmax activation function is applied to the output representations.
        """
        def __init__(self, input_dim, hidden_dim, output_dim, use_bias=True, dropout_p=0.1):
            super(GCN, self).__init__()
    
            # Define the Graph Convolution layers
            self.gc1 = GraphConv(input_dim, hidden_dim, use_bias=use_bias)
            self.gc2 = GraphConv(hidden_dim, output_dim, use_bias=use_bias)
    
            # Define the dropout layer
            self.dropout = nn.Dropout(dropout_p)
    
        def forward(self, input_tensor, adj_mat):
            """
            Performs forward pass of the Graph Convolutional Network (GCN).
    
            Args:
                input_tensor (torch.Tensor): Input node feature matrix with shape (N, input_dim), where N is the number of nodes
                    and input_dim is the number of input features per node.
                adj_mat (torch.Tensor): Normalized adjacency matrix of the graph with shape (N, N), representing the relationships between
                    nodes.
    
            Returns:
                torch.Tensor: Output tensor with shape (N, output_dim), representing the predicted class probabilities for each node.
            """
    
            # Perform the first graph convolutional layer
            x = self.gc1(input_tensor, adj_mat)
            x = F.relu(x) # Apply ReLU activation function
            x = self.dropout(x) # Apply dropout regularization
    
            # Perform the second graph convolutional layer
            x = self.gc2(x, adj_mat)
    
            # Apply log-softmax activation function for classification
            return F.log_softmax(x, dim=1)
    
    
    def load_cora(path='./cora', device='cpu'):
        """
        The graph convolutional operation requires the normalized adjacency matrix: D^(-1/2) * A * D^(-1/2). This step 
        scales the adjacency matrix such that the features of neighboring nodes are weighted appropriately during 
        aggregation. The steps involved in the renormalization trick are as follows:
            - Compute the degree matrix.
            - Compute the inverse square root of the degree matrix.
            - Multiply the inverse square root of the degree matrix with the adjacency matrix.
        """
    
        # Set the paths to the data files
        content_path = os.path.join(path, 'cora.content')
        cites_path = os.path.join(path, 'cora.cites')
    
        # Load data from files
        content_tensor = np.genfromtxt(content_path, dtype=np.dtype(str))
        cites_tensor = np.genfromtxt(cites_path, dtype=np.int32)
    
        # Process features
        features = torch.FloatTensor(content_tensor[:, 1:-1].astype(np.int32)) # Extract feature values
        scale_vector = torch.sum(features, dim=1) # Compute sum of features for each node
        scale_vector = 1 / scale_vector # Compute reciprocal of the sums
        scale_vector[scale_vector == float('inf')] = 0 # Handle division by zero cases
        scale_vector = torch.diag(scale_vector).to_sparse() # Convert the scale vector to a sparse diagonal matrix
        features = scale_vector @ features # Scale the features using the scale vector
    
        # Process labels
        classes, labels = np.unique(content_tensor[:, -1], return_inverse=True) # Extract unique classes and map labels to indices
        labels = torch.LongTensor(labels) # Convert labels to a tensor
    
        # Process adjacency matrix
        idx = content_tensor[:, 0].astype(np.int32) # Extract node indices
        idx_map = {id: pos for pos, id in enumerate(idx)} # Create a dictionary to map indices to positions
    
        # Map node indices to positions in the adjacency matrix
        edges = np.array(
            list(map(lambda edge: [idx_map[edge[0]], idx_map[edge[1]]], 
                cites_tensor)), dtype=np.int32)
    
        V = len(idx) # Number of nodes
        E = edges.shape[0] # Number of edges
        adj_mat = torch.sparse_coo_tensor(edges.T, torch.ones(E), (V, V), dtype=torch.int64) # Create the initial adjacency matrix as a sparse tensor
        adj_mat = torch.eye(V) + adj_mat # Add self-loops to the adjacency matrix
    
        degree_mat = torch.sum(adj_mat, dim=1) # Compute the sum of each row in the adjacency matrix (degree matrix)
        degree_mat = torch.sqrt(1 / degree_mat) # Compute the reciprocal square root of the degrees
        degree_mat[degree_mat == float('inf')] = 0 # Handle division by zero cases
        degree_mat = torch.diag(degree_mat).to_sparse() # Convert the degree matrix to a sparse diagonal matrix
    
        adj_mat = degree_mat @ adj_mat @ degree_mat # Apply the renormalization trick
    
        return features.to_sparse().to(device), labels.to(device), adj_mat.to_sparse().to(device)
    
    
    def train_iter(epoch, model, optimizer, criterion, input, target, mask_train, mask_val, print_every=10):
        start_t = time.time()
        model.train()
        optimizer.zero_grad()
    
        # Forward pass
        output = model(*input) 
        loss = criterion(output[mask_train], target[mask_train]) # Compute the loss using the training mask
    
        loss.backward()
        optimizer.step()
    
        # Evaluate the model performance on training and validation sets
        loss_train, acc_train = test(model, criterion, input, target, mask_train)
        loss_val, acc_val = test(model, criterion, input, target, mask_val)
    
        if epoch % print_every == 0:
            # Print the training progress at specified intervals
            print(f'Epoch: {epoch:04d} ({(time.time() - start_t):.4f}s) loss_train: {loss_train:.4f} acc_train: {acc_train:.4f} loss_val: {loss_val:.4f} acc_val: {acc_val:.4f}')
    
    
    def test(model, criterion, input, target, mask):
        model.eval()
        with torch.no_grad():
            output = model(*input)
            output, target = output[mask], target[mask]
    
            loss = criterion(output, target)
            acc = (output.argmax(dim=1) == target).float().sum() / len(target)
        return loss.item(), acc.item()
    
    # Hyperparameters for the GCN training run
    include_bias = False
    val_every = 2
    hidden_dim = 16
    dropout_p = 0.5
    l2 = 0.00004
    lr = 0.01
    epochs = ctx.epochs  # number of training epochs, passed in via the UDF parameter

    torch.manual_seed(42)

    # Run on the GPU if available (this flavor is meant to provide CUDA)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    cora_url = 'https://linqs-data.soe.ucsc.edu/public/lbc/cora.tgz'
    print('Downloading dataset...')
    os.chdir("/tmp")
    local_filename, headers = urllib.request.urlretrieve(cora_url)
    with tarfile.open(name=local_filename, mode='r:gz') as tgz_object:
        tgz_object.extractall()

    print('Loading dataset...')
    features, labels, adj_mat = load_cora(device=device)
    idx = torch.randperm(len(labels)).to(device)
    idx_test, idx_val, idx_train = idx[:1000], idx[1000:1500], idx[1500:]

    gcn = GCN(features.shape[1], hidden_dim, labels.max().item() + 1, include_bias, dropout_p).to(device)
    optimizer = Adam(gcn.parameters(), lr=lr, weight_decay=l2)
    criterion = nn.NLLLoss()

    for epoch in range(epochs):
        train_iter(epoch + 1, gcn, optimizer, criterion, (features, adj_mat), labels, idx_train, idx_val, val_every)

    loss_test, acc_test = test(gcn, criterion, (features, adj_mat), labels, idx_test)
    return f'Test set results: loss {loss_test:.4f} accuracy {acc_test:.4f}'
/


SELECT test_pytorch(5);

tomuben commented Dec 6, 2024

Numba Test

CREATE SCHEMA TEST;
OPEN SCHEMA TEST;


ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON_GPU=localzmq+protobuf:///bfsdefault/default/ml-slc/?lang=python#buckets/bfsdefault/default/ml-slc/exaudf/exaudfclient';



CREATE OR REPLACE PYTHON_GPU SCALAR SCRIPT
test_numba(epochs INTEGER)
RETURNS VARCHAR(10000) AS

import math
from numba import vectorize, cuda
import numpy as np
import os

@vectorize(['float32(float32, float32, float32)',
            'float64(float64, float64, float64)'],
           target='cuda')
def cu_discriminant(a, b, c):
    return math.sqrt(b ** 2 - 4 * a * c)

def run(ctx):
#    os.environ["LD_LIBRARY_PATH"] = f"/opt/conda/cuda-compat/:{os.environ['LD_LIBRARY_PATH']}"
#    return os.environ["LD_LIBRARY_PATH"]
    N = ctx.epochs
    dtype = np.float32

    # prepare the input
    A = np.array(np.random.sample(N), dtype=dtype)
    B = np.array(np.random.sample(N) + 10, dtype=dtype)
    C = np.array(np.random.sample(N), dtype=dtype)

    D = cu_discriminant(A, B, C)

    return str(D)
/
    
SELECT test_numba(1000000000);

tomuben added a commit to exasol/script-languages that referenced this issue Dec 6, 2024
tomuben added a commit that referenced this issue Dec 6, 2024
tomuben added a commit that referenced this issue Dec 9, 2024
closes #1026 

---------

Co-authored-by: Torsten Kilias <[email protected]>
tomuben closed this as completed Dec 9, 2024
tomuben added a commit that referenced this issue Jan 3, 2025
Changelog:
- #1026: Created test-Exasol-8-cuda-ml flavor (#1037) 
- Update Dependencies on top of 9.0.0 #1038 (#1039)