Orleans.GpuBridge.Grains 0.3.0

.NET 9.0

dotnet add package Orleans.GpuBridge.Grains --version 0.3.0

NuGet\Install-Package Orleans.GpuBridge.Grains -Version 0.3.0

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Orleans.GpuBridge.Grains" Version="0.3.0" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:

<PackageVersion Include="Orleans.GpuBridge.Grains" Version="0.3.0" />

Project file:

<PackageReference Include="Orleans.GpuBridge.Grains" />

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.

paket add Orleans.GpuBridge.Grains --version 0.3.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Orleans.GpuBridge.Grains, 0.3.0"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Orleans.GpuBridge.Grains@0.3.0

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=Orleans.GpuBridge.Grains&version=0.3.0

Install as a Cake Tool:

#tool nuget:?package=Orleans.GpuBridge.Grains&version=0.3.0

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Orleans.GpuBridge.Grains

GPU-native Orleans grains for high-performance distributed computing with DotCompute backend support.

Overview

Orleans.GpuBridge.Grains provides production-ready Orleans grain base classes and implementations for GPU acceleration. This package supports two deployment models:

Model	Base Class	Latency	Throughput	Use Case
GPU-Offload	GpuGrainBase<TState>	10-100μs	15K msg/s	Batch processing
GPU-Native	RingKernelGrainBase<TState, TMessage>	100-500ns	2M msg/s	High-frequency messaging

Features

GPU-Offload Model: Traditional grain pattern with GPU kernel execution
GPU-Native Model: Actors that live permanently in GPU memory with sub-microsecond latency
Ring Kernels: Persistent GPU threads processing messages without kernel launch overhead
K2K Messaging: Kernel-to-Kernel communication for GPU-resident actors
Temporal Alignment: Hybrid Logical Clocks (HLC) and vector clocks for causal ordering on GPU
Hypergraph Support: Multi-way relationships with GPU-accelerated pattern matching
Automatic Fallback: CPU fallback when GPU resources are unavailable
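
The Temporal Alignment entry refers to hybrid logical clocks (HLC). As a rough illustration of the idea only, the following CPU-side C# sketch applies the commonly described HLC update rules. The type names HlcTimestamp and HybridLogicalClock are hypothetical and are not taken from this package, whose GPU implementation may differ.

```csharp
using System;

// Hypothetical types for illustration; not part of Orleans.GpuBridge.Grains.
public readonly record struct HlcTimestamp(long Physical, int Logical) : IComparable<HlcTimestamp>
{
    // Order by the physical component first, then by the logical counter.
    public int CompareTo(HlcTimestamp other) =>
        Physical != other.Physical
            ? Physical.CompareTo(other.Physical)
            : Logical.CompareTo(other.Logical);
}

public sealed class HybridLogicalClock
{
    private readonly Func<long> _physicalNow; // injected wall clock, e.g. Unix milliseconds
    private readonly object _gate = new();
    private long _physical;
    private int _logical;

    public HybridLogicalClock(Func<long> physicalNow) =>
        _physicalNow = physicalNow ?? throw new ArgumentNullException(nameof(physicalNow));

    // Local or send event: advance to the wall clock if it moved forward,
    // otherwise bump the logical counter.
    public HlcTimestamp Tick()
    {
        lock (_gate)
        {
            long previous = _physical;
            _physical = Math.Max(previous, _physicalNow());
            _logical = _physical == previous ? _logical + 1 : 0;
            return new HlcTimestamp(_physical, _logical);
        }
    }

    // Receive event: merge the remote timestamp so the result is later than
    // both the local clock and the message.
    public HlcTimestamp Receive(HlcTimestamp remote)
    {
        lock (_gate)
        {
            long previous = _physical;
            _physical = Math.Max(Math.Max(previous, remote.Physical), _physicalNow());
            if (_physical == previous && _physical == remote.Physical)
                _logical = Math.Max(_logical, remote.Logical) + 1;
            else if (_physical == previous)
                _logical++;
            else if (_physical == remote.Physical)
                _logical = remote.Logical + 1;
            else
                _logical = 0;
            return new HlcTimestamp(_physical, _logical);
        }
    }
}
```

Injecting the wall clock as a delegate keeps the sketch deterministic in tests; a production clock would typically pass something like DateTimeOffset.UtcNow.ToUnixTimeMilliseconds.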

Grain Base Classes

GpuGrainBase<TState>

Base class for Orleans grains with GPU-offload kernel execution.

Key Features:

Familiar Orleans grain pattern with GPU kernel invocation
Automatic CPU fallback when the GPU is unavailable
Kernel execution via InvokeKernelAsync<TIn, TOut>
State management via Orleans persistence

Use Cases:

Batch processing with large datasets
Machine learning inference
Image/video processing pipelines
Infrequent GPU operations (< 1000 msg/s)

RingKernelGrainBase<TState, TMessage>

Base class for GPU-native actors using persistent ring kernels.

Key Features:

Sub-microsecond message latency (100-500ns)
State resides in GPU memory
Zero kernel launch overhead
Lock-free atomic queue operations
Automatic CPU fallback

Use Cases:

High-frequency messaging (> 1000 msg/s)
Real-time trading systems
Game server tick processing
Digital twin simulations

Additional Grain Types

GpuBatchGrain<TIn, TOut>: Generic batch processing grain
GpuStreamGrain<TIn, TOut>: Orleans Streaming with GPU processing
GpuResidentGrain<T>: Persistent GPU memory management
HypergraphVertexGrain: Vertex actor for hypergraph structures
HypergraphHyperedgeGrain: Hyperedge actor for multi-way relationships
PatternDetectorGrain: Temporal pattern detection grain
TemporalGraphGrain: Graph with temporal ordering
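
The temporal grains above order events in time, and the Features list names vector clocks as one mechanism for causal ordering. The sketch below shows, on the CPU and with hypothetical names (CausalOrder, VectorClock), how two vector clocks can be compared and merged. It illustrates the general technique and is not this package's API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper types for illustration; not part of Orleans.GpuBridge.Grains.
public enum CausalOrder { Equal, Before, After, Concurrent }

public static class VectorClock
{
    // Compares clock a to clock b. Missing entries count as 0.
    public static CausalOrder Compare(
        IReadOnlyDictionary<string, long> a,
        IReadOnlyDictionary<string, long> b)
    {
        bool aBehind = false, bBehind = false;
        var keys = new HashSet<string>(a.Keys);
        keys.UnionWith(b.Keys);

        foreach (var key in keys)
        {
            long av = a.TryGetValue(key, out var x) ? x : 0;
            long bv = b.TryGetValue(key, out var y) ? y : 0;
            if (av < bv) aBehind = true;
            else if (av > bv) bBehind = true;
        }

        return (aBehind, bBehind) switch
        {
            (false, false) => CausalOrder.Equal,
            (true, false) => CausalOrder.Before,   // a happened before b
            (false, true) => CausalOrder.After,    // b happened before a
            _ => CausalOrder.Concurrent,           // neither dominates
        };
    }

    // Component-wise maximum, applied when a message is received.
    public static Dictionary<string, long> Merge(
        IReadOnlyDictionary<string, long> a,
        IReadOnlyDictionary<string, long> b)
    {
        var merged = new Dictionary<string, long>();
        foreach (var pair in a) merged[pair.Key] = pair.Value;
        foreach (var pair in b)
        {
            long current = merged.TryGetValue(pair.Key, out var v) ? v : 0;
            merged[pair.Key] = Math.Max(current, pair.Value);
        }
        return merged;
    }
}
```

Two events are reported as Concurrent when each clock is ahead of the other in at least one component, which is the case a single scalar timestamp cannot express.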

Installation

Add the package reference to your Orleans application:

<PackageReference Include="Orleans.GpuBridge.Grains" Version="0.3.0" />

Quick Start

1. Configure GPU Bridge with Ring Kernel Support

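using Microsoft.Extensions.Hosting; // Host.CreateApplicationBuilder
using Orleans.Hosting;              // UseLocalhostClustering
using Orleans.GpuBridge.Runtime.Extensions;
using Orleans.GpuBridge.Backends.DotCompute.Extensions;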

var builder = Host.CreateApplicationBuilder(args);

builder.Services
    .AddOrleans(orleans =>
    {
        orleans.UseLocalhostClustering();
    });

builder.Services
    .AddGpuBridge(options =>
    {
        options.PreferGpu = true;
        options.FallbackToCpu = true;
        options.MaxConcurrentKernels = 100;
    })
    .AddDotGpuBackend()
    .Services
    .AddRingKernelSupport()
    .AddK2KSupport()
    .AddDotComputeRingKernelBridge();

var host = builder.Build();
await host.RunAsync();

2. Using GpuGrainBase (GPU-Offload Model)

using Orleans.GpuBridge.Grains.Base;

public interface IComputeGrain : IGrainWithGuidKey
{
    ValueTask<float[]> ProcessDataAsync(float[] input);
    ValueTask<ComputeStats> GetStatsAsync();
}

public class ComputeGrain : GpuGrainBase<ComputeState>, IComputeGrain
{
    public async ValueTask<float[]> ProcessDataAsync(float[] input)
    {
        // Execute kernel on GPU (or CPU fallback)
        var result = await InvokeKernelAsync<float[], float[]>("vector-process", input);

        // Update local state
        State.ProcessedCount++;
        State.LastProcessedAt = DateTime.UtcNow;

        return result;
    }

    public ValueTask<ComputeStats> GetStatsAsync() =>
        ValueTask.FromResult(new ComputeStats(State.ProcessedCount, State.LastProcessedAt));
}

public struct ComputeState
{
    public int ProcessedCount;
    public DateTime LastProcessedAt;
}

public record ComputeStats(int ProcessedCount, DateTime LastProcessedAt);

3. Using RingKernelGrainBase (GPU-Native Model)

using Orleans.GpuBridge.Grains.Base;
using System.Runtime.InteropServices;

public interface IHighFrequencyActor : IGrainWithIntegerKey
{
    ValueTask<int> IncrementAsync(int amount);
    ValueTask<int> GetValueAsync();
    ValueTask ResetAsync();
}

public class HighFrequencyActor : RingKernelGrainBase<CounterState, CounterMessage>, IHighFrequencyActor
{
    protected override string KernelId => "counters/high-frequency";

    public async ValueTask<int> IncrementAsync(int amount)
    {
        // Message processed at 100-500ns latency on GPU
        var request = new CounterMessage { Operation = CounterOp.Increment, Amount = amount };
        return await InvokeKernelAsync<CounterMessage, int>(request);
    }

    public async ValueTask<int> GetValueAsync()
    {
        // Read current state from GPU memory
        var state = await GetGpuStateAsync();
        return state.Value;
    }

    public async ValueTask ResetAsync()
    {
        var request = new CounterMessage { Operation = CounterOp.Reset, Amount = 0 };
        await InvokeKernelAsync<CounterMessage, int>(request);
    }
}

[StructLayout(LayoutKind.Sequential)]
public struct CounterState
{
    public int Value;
    public long LastUpdated;
    public long MessageCount;
}

[StructLayout(LayoutKind.Sequential)]
public struct CounterMessage
{
    public CounterOp Operation;
    public int Amount;
}

public enum CounterOp : int { Increment = 0, Decrement = 1, Reset = 2 }

4. Using Hypergraph Grains

using Orleans.GpuBridge.Abstractions.Hypergraph;

// Create a vertex
var vertex = grainFactory.GetGrain<IHypergraphVertex>("user-123");
await vertex.InitializeAsync(new VertexInitRequest
{
    EntityType = "User",
    InitialProperties = new Dictionary<string, object>
    {
        ["name"] = "Alice",
        ["score"] = 100
    }
});

// Create a hyperedge connecting multiple vertices
var hyperedge = grainFactory.GetGrain<IHypergraphHyperedge>("friendship-group-1");
await hyperedge.InitializeAsync(new HyperedgeInitRequest
{
    RelationType = "FriendshipGroup",
    Members = new[]
    {
        new HyperedgeMemberInit { VertexId = "user-123", Role = "Admin" },
        new HyperedgeMemberInit { VertexId = "user-456", Role = "Member" },
        new HyperedgeMemberInit { VertexId = "user-789", Role = "Member" }
    }
});
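
To make the idea of a multi-way relationship concrete, here is a small in-memory sketch of the data a hyperedge carries and of one simple pattern query (which hyperedges contain all of a given set of vertices). InMemoryHypergraph and its members are hypothetical names used for illustration. The grains in this package hold this state in distributed actors, and their actual query API is not shown in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical single-process model; not part of Orleans.GpuBridge.Grains.
public sealed class InMemoryHypergraph
{
    // hyperedge id -> (vertex id -> role)
    private readonly Dictionary<string, Dictionary<string, string>> _members = new();
    // vertex id -> ids of hyperedges the vertex belongs to
    private readonly Dictionary<string, HashSet<string>> _incident = new();

    // Registers a hyperedge. Sketch only: replacing an existing id does not
    // clean up stale incidence entries.
    public void AddHyperedge(string edgeId, IEnumerable<(string VertexId, string Role)> members)
    {
        var roles = new Dictionary<string, string>();
        foreach (var (vertexId, role) in members)
        {
            roles[vertexId] = role;
            if (!_incident.TryGetValue(vertexId, out var edges))
                _incident[vertexId] = edges = new HashSet<string>();
            edges.Add(edgeId);
        }
        _members[edgeId] = roles;
    }

    // Returns the hyperedges that contain every vertex in the query.
    public IReadOnlyCollection<string> EdgesContainingAll(params string[] vertexIds)
    {
        if (vertexIds.Length == 0 || !_incident.TryGetValue(vertexIds[0], out var first))
            return Array.Empty<string>();

        var result = new HashSet<string>(first);
        foreach (var id in vertexIds.Skip(1))
        {
            if (!_incident.TryGetValue(id, out var edges))
                return Array.Empty<string>();
            result.IntersectWith(edges);
        }
        return result;
    }

    // Role of a vertex within a hyperedge, or an empty string when absent.
    public string RoleOf(string edgeId, string vertexId) =>
        _members.TryGetValue(edgeId, out var roles) && roles.TryGetValue(vertexId, out var role)
            ? role
            : string.Empty;
}
```

Unlike an ordinary edge, one hyperedge here links any number of vertices, so a group of three users is a single relationship rather than three pairwise edges.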

Configuration

GpuBridgeOptions

services.AddGpuBridge(options =>
{
    // GPU preferences
    options.PreferGpu = true;
    options.FallbackToCpu = true;
    options.MaxRetries = 3;

    // Performance tuning
    options.DefaultMicroBatch = 8192;
    options.MaxConcurrentKernels = 100;
    options.MemoryPoolSizeMB = 1024;
    options.BatchSize = 1024;

    // Device management
    options.MaxDevices = 4;
    options.EnableGpuDirectStorage = false;

    // Backend configuration
    options.DefaultBackend = "DotCompute";
    options.EnableProviderDiscovery = true;

    // Telemetry
    options.EnableProfiling = false;
    options.Telemetry = new TelemetryOptions
    {
        EnableMetrics = true,
        EnableTracing = true,
        SamplingRate = 0.1
    };
});

RingKernelOptions

services.AddRingKernelSupport(options =>
{
    options.DefaultGridSize = 1;        // Single block for single-actor
    options.DefaultBlockSize = 256;     // Optimal for most GPUs
    options.DefaultQueueCapacity = 256; // Message queue size (power of 2)
    options.EnableKernelCaching = true; // Cache compiled kernels
    options.DeviceIndex = 0;            // GPU device to use
});
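
The comment on DefaultQueueCapacity asks for a power of 2. The package does not state why here, but a common reason in ring buffers is that a power-of-two capacity lets the slot index be computed with a bit mask instead of a modulo. The following CPU-side sketch of a single-producer, single-consumer ring queue illustrates that technique. SpscRingQueue is a hypothetical type and is not how the package's GPU queues are necessarily implemented.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Numerics;
using System.Threading;

// Hypothetical illustration; safe for exactly one producer thread and one consumer thread.
public sealed class SpscRingQueue<T>
{
    private readonly T[] _slots;
    private readonly int _mask;
    private long _head; // next position to read; written only by the consumer
    private long _tail; // next position to write; written only by the producer

    public SpscRingQueue(int capacity)
    {
        if (capacity <= 0 || !BitOperations.IsPow2(capacity))
            throw new ArgumentException("Capacity must be a positive power of two.", nameof(capacity));
        _slots = new T[capacity];
        _mask = capacity - 1; // e.g. 256 -> 0xFF, so (position & _mask) == position % 256
    }

    public bool TryEnqueue(T item)
    {
        long tail = _tail;
        if (tail - Volatile.Read(ref _head) == _slots.Length)
            return false; // queue is full

        _slots[(int)(tail & _mask)] = item;
        Volatile.Write(ref _tail, tail + 1); // publish the slot to the consumer
        return true;
    }

    public bool TryDequeue([MaybeNullWhen(false)] out T item)
    {
        long head = _head;
        if (head == Volatile.Read(ref _tail))
        {
            item = default;
            return false; // queue is empty
        }

        item = _slots[(int)(head & _mask)];
        Volatile.Write(ref _head, head + 1); // release the slot to the producer
        return true;
    }
}
```

The head and tail counters only ever increase, and the mask maps them onto the fixed slot array, so wrap-around needs no branch.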

GpuExecutionHints

// For GpuBatchGrain operations
var result = await batchGrain.ExecuteAsync(inputBatches, new GpuExecutionHints(
    PreferGpu: true,
    BatchSize: 1024,
    DeviceId: 0,
    Timeout: TimeSpan.FromSeconds(30)));

Performance Characteristics

GpuBatchGrain Performance

Batch Size	Throughput (ops/sec)	Latency (ms)	Memory Usage
256	15,000	12.5	512 MB
1024	45,000	18.2	1.2 GB
4096	120,000	28.7	3.8 GB
16384	200,000	45.1	12.1 GB

Memory Management Best Practices

Use appropriate batch sizes: Start with 1024 and tune based on your GPU memory
Enable memory pooling: Reduces allocation overhead for repeated operations
Monitor memory usage: Use GetMemoryInfoAsync() to track allocations
Clean up resources: Always release memory handles when done
Consider memory type: Use pinned memory for frequent host-device transfers
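
As a starting point for the first practice, the sketch below picks the largest batch size from the GpuBatchGrain performance table whose listed memory usage fits a given budget. BatchSizePlanner is a hypothetical helper, the 1 GB = 1024 MB conversion and the default 80% headroom are assumptions, and the table figures may not match your hardware, so treat the result as an initial guess to tune from.

```csharp
using System;

// Hypothetical helper for illustration; not part of Orleans.GpuBridge.Grains.
public static class BatchSizePlanner
{
    // Rows copied from the GpuBatchGrain performance table, in ascending order.
    // GB values are converted assuming 1 GB = 1024 MB.
    private static readonly (int BatchSize, double MemoryMb)[] Measured =
    {
        (256, 512.0),
        (1024, 1.2 * 1024),
        (4096, 3.8 * 1024),
        (16384, 12.1 * 1024),
    };

    // Returns the largest listed batch size whose memory usage fits within
    // availableMb * headroom, or 0 when even the smallest row does not fit.
    public static int LargestBatchThatFits(double availableMb, double headroom = 0.8)
    {
        if (availableMb < 0)
            throw new ArgumentOutOfRangeException(nameof(availableMb));
        if (headroom <= 0 || headroom > 1)
            throw new ArgumentOutOfRangeException(nameof(headroom));

        double budget = availableMb * headroom;
        int best = 0;
        foreach (var (batchSize, memoryMb) in Measured)
        {
            if (memoryMb <= budget)
                best = batchSize;
        }
        return best;
    }
}
```

For example, with 8192 MB of free GPU memory the budget is about 6554 MB, so the helper selects 4096 (listed at 3.8 GB) and rejects 16384 (listed at 12.1 GB).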

Advanced Usage Examples

Custom Kernel Implementation

public class MatrixMultiplyKernel : IGpuKernel<MatrixPair, Matrix>
{
    public async Task<Matrix> ExecuteAsync(MatrixPair input)
    {
        // GPU path. Simplified: a real implementation would dispatch a CUDA/OpenCL kernel.
        return await ComputeMatrixMultiplyGpu(input.A, input.B);
    }

    public Task<Matrix> ExecuteFallbackAsync(MatrixPair input)
    {
        // CPU fallback. The computation is synchronous, so wrap the result
        // instead of marking the method async without an await.
        return Task.FromResult(ComputeMatrixMultiplyCpu(input.A, input.B));
    }
}

// Register the kernel
services.AddKernel(k => k
    .Id("matrix-multiply")
    .In<MatrixPair>()
    .Out<Matrix>()
    .FromFactory<MatrixMultiplyKernel>());

Error Handling and Resilience

try
{
    var result = await batchGrain.ExecuteAsync(inputData);
    
    if (!result.Success)
    {
        _logger.LogError("Batch processing failed: {Error}", result.Error);
        // Handle error appropriately
        return await ProcessWithCpuFallback(inputData);
    }
    
    return result.Results;
}
catch (GpuResourceException ex)
{
    // GPU resource exhaustion: log the exception itself, then retry with a smaller batch
    _logger.LogWarning(ex, "GPU resources exhausted, retrying with smaller batch");
    return await ProcessWithSmallerBatch(inputData);
}
catch (KernelExecutionException ex)
{
    // Kernel execution failure
    _logger.LogError(ex, "Kernel execution failed");
    throw; // Re-throw for upstream handling
}
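
The example above calls ProcessWithSmallerBatch without showing it. One way such a helper could work, offered as a sketch under assumptions rather than the package's behavior, is to halve the batch size each time a resource error occurs and to continue from the first unprocessed item. BatchRetry and its parameters are hypothetical, and the exception type is left generic because the namespace of GpuResourceException is not given in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Orleans.GpuBridge.Grains.
public static class BatchRetry
{
    // Processes items in chunks. When processBatch throws TException, the chunk
    // size is halved (down to minBatchSize) and the same chunk start is retried.
    // At minBatchSize the exception propagates to the caller.
    public static async Task<List<TOut>> ProcessWithShrinkingBatchesAsync<TIn, TOut, TException>(
        IReadOnlyList<TIn> items,
        int initialBatchSize,
        Func<IReadOnlyList<TIn>, Task<IReadOnlyList<TOut>>> processBatch,
        int minBatchSize = 1)
        where TException : Exception
    {
        if (minBatchSize < 1)
            throw new ArgumentOutOfRangeException(nameof(minBatchSize));
        if (initialBatchSize < minBatchSize)
            throw new ArgumentOutOfRangeException(nameof(initialBatchSize));

        var results = new List<TOut>(items.Count);
        int batchSize = initialBatchSize;
        int offset = 0;

        while (offset < items.Count)
        {
            var chunk = items.Skip(offset).Take(batchSize).ToArray();
            try
            {
                results.AddRange(await processBatch(chunk));
                offset += chunk.Length; // advance only after the chunk succeeded
            }
            catch (TException) when (batchSize > minBatchSize)
            {
                batchSize = Math.Max(minBatchSize, batchSize / 2);
            }
        }

        return results;
    }
}
```

Because the offset advances only after a chunk succeeds, no item is processed twice in the results, and the reduced batch size is kept for the remaining items instead of being retried at the original size.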

Monitoring and Diagnostics

// Monitor batch grain performance
var batchGrain = grainFactory.GetGrain<IGpuBatchGrain<float[], float[]>>("my-kernel");
var result = await batchGrain.ExecuteAsync(data);

Console.WriteLine($"Execution time: {result.ExecutionTime}");
Console.WriteLine($"Throughput: {data.Count / result.ExecutionTime.TotalSeconds:F2} items/sec");

// Monitor stream processing
var streamGrain = grainFactory.GetGrain<IGpuStreamGrain<float[], float[]>>("stream-processor");
var stats = await streamGrain.GetStatsAsync();

Console.WriteLine($"Items processed: {stats.ItemsProcessed}");
Console.WriteLine($"Success rate: {(double)stats.ItemsProcessed / (stats.ItemsProcessed + stats.ItemsFailed):P2}");
Console.WriteLine($"Average latency: {stats.AverageLatencyMs:F2}ms");

// Monitor memory usage
var residentGrain = grainFactory.GetGrain<IGpuResidentGrain<float>>("memory-manager");
var memInfo = await residentGrain.GetMemoryInfoAsync();

Console.WriteLine($"Total allocated: {memInfo.TotalAllocatedBytes / (1024 * 1024)}MB");
Console.WriteLine($"Active allocations: {memInfo.ActiveAllocations}");
Console.WriteLine($"Memory efficiency: {memInfo.MemoryEfficiency:P2}");

Best Practices

Grain Design Patterns

Use StatelessWorker: For high-throughput scenarios, apply the [StatelessWorker] attribute so that Orleans can create multiple activations of the grain per silo
Enable Reentrancy: Apply [Reentrant] to let a grain interleave requests at await points
Implement Proper Cleanup: Always dispose GPU resources in grain deactivation
Monitor Resource Usage: Track GPU memory and processing utilization

Performance Optimization

Batch Size Tuning: Optimize batch sizes based on your GPU memory and compute capacity
Memory Reuse: Use resident grains for frequently accessed data
Asynchronous Processing: Leverage Orleans' async nature for overlapping operations
Stream Processing: Use streaming for real-time scenarios that require low latency

Error Handling

Implement Fallbacks: Always provide CPU fallback implementations
Handle Resource Limits: Gracefully handle GPU memory exhaustion
Timeout Management: Set appropriate timeouts for long-running kernels
Logging and Monitoring: Implement comprehensive logging for troubleshooting

Resource Management

Clean Shutdown: Implement proper grain deactivation to free GPU resources
Memory Pooling: Enable memory pooling for better performance
Device Selection: Consider multi-GPU scenarios with device selection
Load Balancing: Use Orleans placement strategies for GPU load balancing

Dependencies

Microsoft.Orleans.Core (≥9.2.1): Core Orleans framework
Microsoft.Orleans.Runtime (≥9.2.1): Orleans runtime components
Microsoft.Orleans.Server (≥9.2.1): Orleans server infrastructure
Microsoft.Orleans.Streaming (≥9.2.1): Orleans streaming support
Orleans.GpuBridge.Abstractions: Core GPU bridge abstractions
Orleans.GpuBridge.Runtime: GPU bridge runtime implementation

Platform Requirements

.NET 9.0 or later
CUDA 12.0+ or OpenCL 2.0+ for GPU acceleration
Orleans 9.2+ compatible silo
Windows 10/11 or Linux with GPU drivers

License

Copyright (c) 2025 Michael Ivertowski

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Contributing

Contributions are welcome! Please read our Contributing Guidelines and Code of Conduct before submitting pull requests.

Support

Documentation: Orleans GPU Bridge Documentation
Issues: GitHub Issues
Discussions: GitHub Discussions
Orleans Community: Orleans Discord

Orleans.GpuBridge.Grains - Bringing GPU acceleration to Orleans distributed applications with minimal complexity and maximum performance.

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.


net9.0
- DotCompute.Backends.CPU (>= 0.5.3)
- Microsoft.CodeAnalysis.Workspaces.Common (>= 5.0.0)
- Microsoft.DotNet.ILCompiler (>= 10.0.0)
- Microsoft.NET.ILLink.Tasks (>= 9.0.11)
- Microsoft.Orleans.Core (>= 9.2.1)
- Microsoft.Orleans.Runtime (>= 9.2.1)
- Microsoft.Orleans.Server (>= 9.2.1)
- Microsoft.Orleans.Streaming (>= 9.2.1)
- Orleans.GpuBridge.Abstractions (>= 0.3.0)
- Orleans.GpuBridge.Runtime (>= 0.3.0)

NuGet packages (1)

Showing the top 1 NuGet packages that depend on Orleans.GpuBridge.Grains:

Package	Downloads
Orleans.GpuBridge.BridgeFX: High-level pipeline API for Orleans GPU Bridge - fluent API for GPU-accelerated data processing	1.1K

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
0.3.0	115	2/9/2026
0.2.1	487	12/8/2025
0.2.0	219	12/5/2025
0.1.0	399	11/30/2025