Microsoft.AI.Foundry.Local.WinML
1.0.0
Prefix Reserved
dotnet add package Microsoft.AI.Foundry.Local.WinML --version 1.0.0
Foundry Local C# SDK
The Foundry Local C# SDK provides a .NET interface for running AI models locally via the Foundry Local Core. Discover, download, load, and run inference entirely on your own machine — no cloud required.
Features
- Model catalog — browse and search all available models; filter by cached or loaded state
- Lifecycle management — download, load, unload, and remove models programmatically
- Chat completions — synchronous and IAsyncEnumerable streaming via OpenAI-compatible types
- Audio transcription — transcribe audio files with streaming support
- Download progress — wire up an Action<float> callback for real-time download percentage
- Model variants — select specific hardware/quantization variants per model alias
- Optional web service — start an OpenAI-compatible REST endpoint (/v1/chat_completions, /v1/models)
- WinML acceleration — opt-in Windows hardware acceleration with automatic EP download
- Full async/await — every operation supports CancellationToken and async patterns
- IDisposable — deterministic cleanup of native resources
Installation
dotnet add package Microsoft.AI.Foundry.Local
Building from source
cd sdk/cs
dotnet build src/Microsoft.AI.Foundry.Local.csproj
Or open Microsoft.AI.Foundry.Local.SDK.sln in Visual Studio / VS Code.
WinML: Automatic Hardware Acceleration (Windows)
On Windows, Foundry Local can leverage WinML for GPU/NPU hardware acceleration via ONNX Runtime execution providers (EPs). EPs are large binaries downloaded on first use and cached for subsequent runs.
To enable WinML acceleration, install the WinML package variant instead of the base package:
dotnet add package Microsoft.AI.Foundry.Local.WinML
Or build from source with:
dotnet build src/Microsoft.AI.Foundry.Local.csproj /p:UseWinML=true
Triggering EP download
EP management is explicit via two methods:
- DiscoverEps() — returns an array of EpInfo describing each available EP and whether it is already registered.
- DownloadAndRegisterEpsAsync(names?, progressCallback?, ct?) — downloads and registers the specified EPs (or all available EPs if no names are given), returning an EpDownloadResult. Overloads are provided so you can pass just a callback without specifying names.
// Initialize the manager first (see Quick Start)
await FoundryLocalManager.CreateAsync(
new Configuration { AppName = "my-app" },
NullLogger.Instance);
var mgr = FoundryLocalManager.Instance;
// Discover what EPs are available
var eps = mgr.DiscoverEps();
foreach (var ep in eps)
{
Console.WriteLine($"{ep.Name} — registered: {ep.IsRegistered}");
}
// Download and register all EPs
var result = await mgr.DownloadAndRegisterEpsAsync();
Console.WriteLine($"Success: {result.Success}, Status: {result.Status}");
// Or download only specific EPs
var result2 = await mgr.DownloadAndRegisterEpsAsync(new[] { eps[0].Name });
Per-EP download progress
Pass an optional Action<string, double> callback to receive (epName, percent) updates
as each EP downloads (percent is 0–100):
string currentEp = "";
await mgr.DownloadAndRegisterEpsAsync((epName, percent) =>
{
if (epName != currentEp)
{
if (currentEp != "")
{
Console.WriteLine();
}
currentEp = epName;
}
Console.Write($"\r {epName} {percent,6:F1}%");
});
Console.WriteLine();
Catalog access no longer blocks on EP downloads. Call DownloadAndRegisterEpsAsync explicitly when you need hardware-accelerated execution providers.
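For example, a minimal sketch (assuming the manager has already been initialized as in Quick Start) that downloads only the EPs that are not yet registered, so already-cached EPs are skipped:

```csharp
using System.Linq; // for Where/Select

var mgr = FoundryLocalManager.Instance;

// Keep only EPs that have not been registered yet.
var missing = mgr.DiscoverEps()
    .Where(ep => !ep.IsRegistered)
    .Select(ep => ep.Name)
    .ToArray();

if (missing.Length > 0)
{
    var result = await mgr.DownloadAndRegisterEpsAsync(missing);
    if (!result.Success)
        Console.Error.WriteLine($"EP registration failed: {result.Status}");
}
```

This relies only on the members documented above (DiscoverEps, EpInfo.Name, EpInfo.IsRegistered, DownloadAndRegisterEpsAsync, EpDownloadResult.Success/Status).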
Quick Start
using Microsoft.AI.Foundry.Local;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Logging.Abstractions;
using Betalgo.Ranul.OpenAI.ObjectModels.RequestModels;
// 1. Initialize the singleton manager
await FoundryLocalManager.CreateAsync(
new Configuration { AppName = "my-app" },
NullLogger.Instance);
// 2. Get the model catalog and look up a model
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
var model = await catalog.GetModelAsync("phi-3.5-mini")
?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");
// 3. Download (if needed) and load the model
await model.DownloadAsync();
await model.LoadAsync();
// 4. Get a chat client and run inference
var chatClient = await model.GetChatClientAsync();
var response = await chatClient.CompleteChatAsync(new[]
{
new ChatMessage { Role = "user", Content = "Why is the sky blue?" }
});
Console.WriteLine(response.Choices![0].Message.Content);
// 5. Clean up
FoundryLocalManager.Instance.Dispose();
Usage
Initialization
FoundryLocalManager is an async singleton. Call CreateAsync once at startup:
await FoundryLocalManager.CreateAsync(
new Configuration { AppName = "my-app" },
loggerFactory.CreateLogger("FoundryLocal"));
Access it anywhere afterward via FoundryLocalManager.Instance. Check FoundryLocalManager.IsInitialized to verify creation.
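Since CreateAsync should run only once, a guard on the documented IsInitialized flag is a reasonable pattern in apps with multiple startup paths (whether a second CreateAsync call throws is not stated above, so this sketch simply avoids it):

```csharp
if (!FoundryLocalManager.IsInitialized)
{
    await FoundryLocalManager.CreateAsync(
        new Configuration { AppName = "my-app" },
        NullLogger.Instance);
}

var mgr = FoundryLocalManager.Instance; // safe to use from here on
```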
Catalog
The catalog lists all models known to the Foundry Local Core:
var catalog = await FoundryLocalManager.Instance.GetCatalogAsync();
// List all available models
var models = await catalog.ListModelsAsync();
foreach (var m in models)
Console.WriteLine($"{m.Alias} — {m.Info.DisplayName}");
// Get a specific model by alias
var model = await catalog.GetModelAsync("phi-3.5-mini")
?? throw new Exception("Model 'phi-3.5-mini' not found in catalog.");
// Get a specific variant by its unique model ID
var variant = await catalog.GetModelVariantAsync("phi-3.5-mini-generic-gpu-4")
?? throw new Exception("Variant 'phi-3.5-mini-generic-gpu-4' not found in catalog.");
// List models already downloaded to the local cache
var cached = await catalog.GetCachedModelsAsync();
// List models currently loaded in memory
var loaded = await catalog.GetLoadedModelsAsync();
Model Lifecycle
Each model may have multiple variants (different quantizations, hardware targets). The SDK auto-selects the best variant, or you can pick one. All models implement the IModel interface.
// Check and select variants
Console.WriteLine($"Selected: {model.Id}");
foreach (var v in model.Variants)
Console.WriteLine($" {v.Id} (cached: {await v.IsCachedAsync()})");
// Switch to a different variant
model.SelectVariant(model.Variants[1]);
Download, load, and unload:
// Download with progress reporting
await model.DownloadAsync(progress =>
Console.WriteLine($"Download: {progress:F1}%"));
// Load into memory
await model.LoadAsync();
// Unload when done
await model.UnloadAsync();
// Remove from local cache entirely
await model.RemoveFromCacheAsync();
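Because a loaded model holds memory until UnloadAsync runs, it can be worth pairing load and unload in a try/finally; a sketch:

```csharp
await model.DownloadAsync();
await model.LoadAsync();
try
{
    var chatClient = await model.GetChatClientAsync();
    // ... run inference ...
}
finally
{
    // Free model memory even if inference throws.
    await model.UnloadAsync();
}
```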
Chat Completions
var chatClient = await model.GetChatClientAsync();
var response = await chatClient.CompleteChatAsync(new[]
{
new ChatMessage { Role = "system", Content = "You are a helpful assistant." },
new ChatMessage { Role = "user", Content = "Explain async/await in C#." }
});
Console.WriteLine(response.Choices![0].Message.Content);
Streaming
Use IAsyncEnumerable for token-by-token output:
using var cts = new CancellationTokenSource();
await foreach (var chunk in chatClient.CompleteChatStreamingAsync(
new[] { new ChatMessage { Role = "user", Content = "Write a haiku about .NET" } }, cts.Token))
{
Console.Write(chunk.Choices?[0]?.Message?.Content);
}
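To cap a long generation, the same CancellationTokenSource can be constructed with a timeout; a sketch (the 30-second value is illustrative):

```csharp
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)); // auto-cancel after 30s
try
{
    await foreach (var chunk in chatClient.CompleteChatStreamingAsync(
        new[] { new ChatMessage { Role = "user", Content = "Tell a long story" } }, cts.Token))
    {
        Console.Write(chunk.Choices?[0]?.Message?.Content);
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine("\n[stopped after timeout]");
}
```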
Chat Settings
Tune generation parameters per client:
chatClient.Settings.Temperature = 0.7f;
chatClient.Settings.MaxTokens = 256;
chatClient.Settings.TopP = 0.9f;
chatClient.Settings.FrequencyPenalty = 0.5f;
Audio Transcription
var audioClient = await model.GetAudioClientAsync();
// One-shot transcription
var result = await audioClient.TranscribeAudioAsync("recording.mp3");
Console.WriteLine(result.Text);
// Streaming transcription
await foreach (var chunk in audioClient.TranscribeAudioStreamingAsync("recording.mp3", CancellationToken.None))
{
Console.Write(chunk.Text);
}
Audio Settings
audioClient.Settings.Language = "en";
audioClient.Settings.Temperature = 0.0f;
Web Service
Start an OpenAI-compatible REST endpoint for use by external tools or processes:
// Configure the web service URL in your Configuration
await FoundryLocalManager.CreateAsync(
new Configuration
{
AppName = "my-app",
Web = new Configuration.WebService { Urls = "http://127.0.0.1:5000" }
},
NullLogger.Instance);
await FoundryLocalManager.Instance.StartWebServiceAsync();
Console.WriteLine($"Listening on: {string.Join(", ", FoundryLocalManager.Instance.Urls!)}");
// ... use the service ...
await FoundryLocalManager.Instance.StopWebServiceAsync();
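While the service is running, any OpenAI-compatible client can talk to it. As a sketch, a plain HttpClient call against the /v1/models endpoint mentioned above (the base address is whatever Urls you configured):

```csharp
using var http = new HttpClient();
// 127.0.0.1:5000 matches the Urls value configured above.
var json = await http.GetStringAsync("http://127.0.0.1:5000/v1/models");
Console.WriteLine(json); // OpenAI-style model list as JSON
```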
Configuration
| Property | Type | Default | Description |
|---|---|---|---|
| AppName | string | (required) | Your application name |
| AppDataDir | string? | ~/.{AppName} | Application data directory |
| ModelCacheDir | string? | {AppDataDir}/cache/models | Where models are stored locally |
| LogsDir | string? | {AppDataDir}/logs | Log output directory |
| LogLevel | LogLevel | Warning | Verbose, Debug, Information, Warning, Error, Fatal |
| Web | WebService? | null | Web service configuration (see below) |
| AdditionalSettings | IDictionary<string, string>? | null | Extra key-value settings passed to Core |
Configuration.WebService
| Property | Type | Default | Description |
|---|---|---|---|
| Urls | string? | 127.0.0.1:0 | Bind address; semicolon-separated for multiple |
| ExternalUrl | Uri? | null | URI for accessing the web service from a separate process |
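Putting the two tables together, a configuration that overrides several defaults might look like this (the paths, URLs, and the AdditionalSettings key are illustrative, not documented values):

```csharp
var config = new Configuration
{
    AppName = "my-app",              // required
    ModelCacheDir = "D:/models",     // overrides {AppDataDir}/cache/models
    LogLevel = LogLevel.Information,
    // Semicolon-separated bind addresses for the optional web service.
    Web = new Configuration.WebService
    {
        Urls = "http://127.0.0.1:5000;http://127.0.0.1:5001"
    },
    // Hypothetical key — Core's accepted keys are not listed in this README.
    AdditionalSettings = new Dictionary<string, string> { ["example-setting"] = "value" }
};
```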
Disposal
FoundryLocalManager implements IDisposable. Dispose stops the web service (if running) and releases native resources:
FoundryLocalManager.Instance.Dispose();
API Reference
Auto-generated API docs live in docs/api/. See GENERATE-DOCS.md to regenerate.
Key types:
| Type | Description |
|---|---|
| FoundryLocalManager | Singleton entry point — create, catalog, web service |
| Configuration | Initialization settings |
| ICatalog | Model catalog interface |
| IModel | Model interface — identity, metadata, lifecycle, variant selection |
| Model | Model with variant selection (implements IModel) |
| OpenAIChatClient | Chat completions (sync + streaming) |
| OpenAIAudioClient | Audio transcription (sync + streaming) |
| ModelInfo | Full model metadata record |
Tests
dotnet test
See test/FoundryLocal.Tests/LOCAL_MODEL_TESTING.md for prerequisites and local model setup.
| Product | Compatible and computed target frameworks |
|---|---|
| .NET | net9.0-windows10.0.26100 is compatible; net10.0-windows was computed. |
Dependencies (net9.0-windows10.0.26100):
- Betalgo.Ranul.OpenAI (>= 9.1.0)
- Microsoft.AI.Foundry.Local.Core.WinML (>= 1.0.0)
- Microsoft.Extensions.Logging (>= 9.0.9)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories (2)
Showing the top 2 popular GitHub repositories that depend on Microsoft.AI.Foundry.Local.WinML:
- microsoft/ai-dev-gallery — An open-source project for Windows developers to learn how to add AI with local models and APIs to Windows apps.
- rwjdk/MicrosoftAgentFrameworkSamples — Samples demonstrating the Microsoft Agent Framework in C#