UnSpec 1.0.0
UnSpec
A .NET 9.0 class library implementing the Universal Normalization Algorithm Specification (UNAS) — deterministic normalization of entity identifiers across all world scripts, naming conventions, creative works, geographic regions, and languages.
UnSpec produces stable, comparable identifiers from messy real-world data by normalizing Unicode text, transliterating scripts, parsing culture-specific names, and generating BLAKE3-based composite identifiers with FRBR-layered granularity.
Key Features
- Unicode Preprocessing Pipeline — NFC normalization, whitespace/punctuation/diacritic canonicalization, case folding with Turkish/German/Greek special cases
- Script Detection & Transliteration — 15+ script families (Cyrillic, Greek, Arabic, Hebrew, Devanagari, CJK, Georgian, Armenian, Ethiopic, Thai, Bengali, Tamil, Thaana, Tifinagh, and more) to ASCII Latin
- Person Name Normalization — Culture-aware strategies for Western, Spanish/Portuguese, East Asian, Arabic, South Asian, Icelandic, mononymous, and pseudonymous names
- Creative Work & Product Normalization — Title preprocessing, article removal (40+ languages), subtitle/edition stripping, Roman numeral and spelled-out number normalization
- Geographic & Language Normalization — Place name canonicalization and BCP 47 language tag normalization
- BLAKE3 Identifier Generation — 128-bit deterministic hashes across four FRBR layers (Work, Expression, Manifestation, Item)
- Collision Detection — Alias registry for managing known variant-to-canonical mappings
- Versioning — Algorithm version migration with deterministic re-hashing
- Phone Normalization — E.164-aligned canonical form with country code detection, trunk prefix stripping, vanity number mapping, and 100+ country metadata
- Email Normalization — Provider-aware canonicalization: Gmail dot-stripping, sub-address (+tag) removal for Gmail/Outlook/Yahoo/Proton/iCloud, domain alias resolution
- Address Normalization — Street abbreviation expansion (St→street, Ave→avenue, 50+ types), directional/unit normalization, ordinal stripping, US state and CA province expansion, postal code cleaning
- Confidence Tracking — Every result carries a Green/Amber/Red confidence flag indicating normalization quality
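For readers unfamiliar with the Unicode operations behind the first feature, two of the pipeline's building blocks (NFC composition and the Turkish case-folding special case) can be reproduced with the BCL alone. This sketch uses no UnSpec APIs:

```csharp
using System;
using System.Globalization;
using System.Text;

// NFC normalization: "é" typed as 'e' + combining acute (U+0301)
// composes to the single code point U+00E9.
string decomposed = "Jose\u0301";
string composed = decomposed.Normalize(NormalizationForm.FormC);
Console.WriteLine(composed == "Jos\u00E9"); // True

// Turkish case folding: uppercase 'I' lowers to dotless 'ı' (U+0131)
// under tr-TR rules, not to invariant 'i'.
string turkishLower = "I".ToLower(new CultureInfo("tr-TR"));
Console.WriteLine(turkishLower == "\u0131"); // True
```

UnSpec's pipeline applies these (plus whitespace, punctuation, and diacritic handling) automatically; this only shows why a naive `ToLower()` comparison is not enough.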
The Problem UnSpec Solves
Real-world systems ingest entity data from many sources — user forms, API partners, imports, manual entry — and the same entity arrives in dozens of surface forms:
| Source | Person | Address/Org |
|---|---|---|
| CRM Import | Dr. José María García-López | Müller & Söhne GmbH |
| Web Form | jose garcia lopez | Mueller and Soehne |
| API Partner | GARCIA LOPEZ, Jose Maria | MULLER SOHNE GMBH |
| Call Center | García López, J.M. | Müller Söhne |
These all refer to the same person and the same organization. Without normalization, your Person, Contact, Address, and Organization tables accumulate duplicates that poison search, reporting, and matching.
UnSpec gives each entity a single deterministic normalized form and a BLAKE3 hash so you can deduplicate, match, and index reliably across any script or culture.
Architecture
┌──────────────────────────────────────────────────────────┐
│ NormalizationPipeline │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Encoding │→│Whitespace│→│Punctuatn │→│ Case Folding│→...
│ │ NFC │ │ Collapse│ │ Symbols │ │ + Special │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────────┘ │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────┐ ┌──────────────────────────┐
│ Script Detection │────→│ Transliteration Registry │
│ (per-segment) │ │ (script → transliterator)│
└──────────────────┘ └──────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ Domain Normalizers │
│ PersonName │ CreativeWork │ Product │ Geographic │ Lang │
└──────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ BLAKE3 Identifier Generation │
│ Work (W) → Expression (E) → Manifestation (M) → Item │
└──────────────────────────────────────────────────────────┘
Every component implements an interface and is independently composable. Pipelines can nest inside other pipelines. Strategies and transliterators can be added or replaced at runtime.
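A minimal sketch of that composability, using plain delegates rather than the actual UnSpec interfaces (the names and shapes here are illustrative only):

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Each stage is a string→string function; a composed pipeline is itself a
// stage, so pipelines can nest inside pipelines, as described above.
Func<string, string> nfc = s => s.Normalize(NormalizationForm.FormC);
Func<string, string> collapseWhitespace = s => Regex.Replace(s.Trim(), @"\s+", " ");

Func<string, string> Compose(params Func<string, string>[] stages) =>
    input => stages.Aggregate(input, (s, stage) => stage(s));

var inner = Compose(nfc, collapseWhitespace);
var outer = Compose(inner, s => s.ToLowerInvariant()); // pipeline nested in a pipeline

Console.WriteLine(outer("  Jose\u0301   GARCIA  ")); // "josé garcia"
```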
Requirements
- .NET 9.0+
- Blake3 NuGet package (>= 1.1.0, pulled automatically)
Quick Start
dotnet add package UnSpec
using UnSpec;
using UnSpec.Identifiers;
using UnSpec.Pipeline;
// Generate a work identifier
var workGen = new WorkIdentifierGenerator();
var workId = workGen.Generate("The Last Unicorn", "Peter S. Beagle", "book");
// → v1.W.a3f7c9e2b1d4f6a8.g
// Generate the full FRBR chain
var exprId = new ExpressionIdentifierGenerator().Generate(workId, "en");
var manId = new ManifestationIdentifierGenerator().Generate(exprId, "hardcover", "Penguin", "1968");
var itemId = new ItemIdentifierGenerator().Generate(manId, "Library of Congress", "LC-123456");
See CONSUMPTION.md for complete API documentation.
Real-World Usage with RDBMS
The following sections show how to integrate UnSpec with relational databases for common business entities. The pattern is always the same: normalize on write, index the hash, query by hash.
Schema Design Pattern
For any entity table, add two computed columns alongside the original data:
┌─────────────────────────────────────────────────────────────┐
│ Raw columns (preserve original) │ UnSpec columns (query) │
│ first_name, last_name, ... │ normalized_key │
│ │ normalized_hash │
│ │ confidence │
└─────────────────────────────────────────────────────────────┘
- `normalized_key` — the human-readable canonical form (for debugging and display)
- `normalized_hash` — the BLAKE3 identifier (for indexing and matching)
- `confidence` — Green/Amber/Red flag (for filtering questionable matches)
Always preserve the original raw data. The normalized columns exist purely for deduplication and lookup.
Person Table
A person arrives as "Dr. José María García-López", "jose garcia lopez", or "GARCIA LOPEZ, Jose Maria". All three should resolve to the same row.
Schema:
CREATE TABLE Person (
id BIGINT IDENTITY PRIMARY KEY,
-- Original data (preserved exactly as received)
first_name NVARCHAR(200) NOT NULL,
last_name NVARCHAR(200) NOT NULL,
full_name_raw NVARCHAR(500) NOT NULL,
source_system NVARCHAR(100),
language_tag NVARCHAR(20) DEFAULT 'en',
-- UnSpec normalized columns
normalized_key NVARCHAR(500) NOT NULL, -- "garcia-lopez:jose maria"
normalized_hash CHAR(52) NOT NULL, -- "v1.W.a3f7c9e2b1d4f6a8c9d0e1f2a3b4c5d6.g"
confidence CHAR(1) NOT NULL, -- 'g', 'a', or 'r'
-- Timestamps
created_at DATETIME2 DEFAULT SYSUTCDATETIME(),
updated_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
CREATE INDEX IX_Person_NormalizedHash ON Person (normalized_hash);
CREATE INDEX IX_Person_Confidence ON Person (confidence) WHERE confidence <> 'r';
C# — Populate on insert/update:
using UnSpec.PersonName;
using UnSpec.Identifiers;
using UnSpec.Pipeline;
public class PersonService
{
private readonly PersonNameNormalizer _nameNorm = new();
private readonly WorkIdentifierGenerator _idGen = new();
public (string key, string hash, char confidence) NormalizePerson(
string fullName, string languageTag)
{
var ctx = new NormalizationContext { LanguageTag = languageTag };
var result = _nameNorm.Normalize(fullName, ctx);
// Canonical form: "primary:secondary"
string key = result.ToCanonicalForm();
// BLAKE3 hash for indexing
var id = _idGen.Generate(key, "", "person");
char conf = id.Confidence switch
{
Confidence.Green => 'g',
Confidence.Amber => 'a',
_ => 'r'
};
return (key, id.ToString(), conf);
}
}
What this buys you:
INSERT 1: "Dr. José María García-López" → hash: v1.W.7f3a... (garcia-lopez:jose maria)
INSERT 2: "jose garcia lopez" → hash: v1.W.7f3a... (same hash!)
INSERT 3: "GARCIA LOPEZ, Jose Maria" → hash: v1.W.7f3a... (same hash!)
-- Find all variants of the same person
SELECT * FROM Person WHERE normalized_hash = 'v1.W.7f3a...';
-- Deduplicate: find persons that appear more than once
SELECT normalized_hash, COUNT(*) as occurrences
FROM Person
WHERE confidence <> 'r'
GROUP BY normalized_hash
HAVING COUNT(*) > 1;
Contact Table
Contacts combine a person with communication channels. Use PersonNameNormalizer for the name, EmailNormalizer for email, and PhoneNormalizer for phone — each has its own domain-specific rules.
Schema:
CREATE TABLE Contact (
id BIGINT IDENTITY PRIMARY KEY,
-- Original data
display_name NVARCHAR(300) NOT NULL,
email_raw NVARCHAR(320),
phone_raw NVARCHAR(50),
country_code CHAR(2) DEFAULT 'US',
language_tag NVARCHAR(20) DEFAULT 'en',
-- UnSpec: person normalization
person_norm_key NVARCHAR(500) NOT NULL,
person_norm_hash CHAR(52) NOT NULL,
-- UnSpec: email normalization (provider-aware canonical form)
email_normalized NVARCHAR(320),
email_confidence CHAR(1),
-- UnSpec: phone normalization (E.164 canonical form)
phone_normalized VARCHAR(20), -- "+{cc}{national}", e.g. "+12125551234"
phone_confidence CHAR(1),
-- UnSpec: composite contact hash (person + email + phone for dedup)
contact_hash CHAR(52) NOT NULL,
confidence CHAR(1) NOT NULL,
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
CREATE INDEX IX_Contact_PersonHash ON Contact (person_norm_hash);
CREATE INDEX IX_Contact_ContactHash ON Contact (contact_hash);
CREATE INDEX IX_Contact_Email ON Contact (email_normalized);
CREATE INDEX IX_Contact_Phone ON Contact (phone_normalized);
C#:
using UnSpec;
using UnSpec.PersonName;
using UnSpec.Email;
using UnSpec.Phone;
using UnSpec.Identifiers;
using UnSpec.Pipeline;
public class ContactService
{
private readonly PersonNameNormalizer _nameNorm = new();
private readonly EmailNormalizer _emailNorm = new();
private readonly PhoneNormalizer _phoneNorm = new();
private readonly IdentifierGenerator _idGen = new();
public ContactNormalized Normalize(
string displayName, string? email, string? phone,
string lang, string countryCode = "US")
{
var ctx = new NormalizationContext { LanguageTag = lang };
// Normalize person name
var person = _nameNorm.Normalize(displayName, ctx);
var personId = _idGen.Generate(
person.ToCanonicalForm(), IdentifierLayer.Work, person.Confidence);
// Normalize email (provider-aware: Gmail dot-strip, sub-address removal, etc.)
string? emailNorm = null;
Confidence emailConf = Confidence.Red;
if (!string.IsNullOrWhiteSpace(email))
{
var emailResult = _emailNorm.Normalize(email);
emailNorm = emailResult.CanonicalForm;
emailConf = emailResult.Confidence;
}
// Normalize phone (E.164 canonical form)
string? phoneNorm = null;
Confidence phoneConf = Confidence.Red;
if (!string.IsNullOrWhiteSpace(phone))
{
var phoneResult = _phoneNorm.Normalize(phone, countryCode);
phoneNorm = phoneResult.CanonicalForm;
phoneConf = phoneResult.Confidence;
}
// Composite hash: person + email + phone
string compositeInput = $"{person.ToCanonicalForm()}|{emailNorm ?? ""}|{phoneNorm ?? ""}";
var contactId = _idGen.Generate(
compositeInput, IdentifierLayer.Work, person.Confidence);
return new ContactNormalized(person, personId, emailNorm, emailConf,
phoneNorm, phoneConf, contactId);
}
}
What this buys you for email:
john.doe+promo@gmail.com → johndoe@gmail.com (dots stripped, sub-address removed)
John.Doe@googlemail.com → johndoe@gmail.com (same!)
j.o.h.n.d.o.e@Gmail.COM → johndoe@gmail.com (same!)
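The Gmail rules above can be sketched in a few lines. This is a simplified illustration, not the UnSpec implementation; `CanonicalizeGmail` is a hypothetical helper, and the real EmailNormalizer covers more providers and edge cases:

```csharp
using System;

// Gmail-only canonicalization sketch: lowercase, resolve the googlemail.com
// alias, drop the +tag sub-address, and strip dots from the local part.
// Assumes a syntactically valid "local@domain" input.
string CanonicalizeGmail(string email)
{
    var parts = email.Trim().ToLowerInvariant().Split('@');
    string local = parts[0], domain = parts[1];
    if (domain == "googlemail.com") domain = "gmail.com"; // domain alias
    if (domain == "gmail.com")
    {
        int plus = local.IndexOf('+');
        if (plus >= 0) local = local.Substring(0, plus);  // remove sub-address tag
        local = local.Replace(".", "");                   // dots are not significant
    }
    return $"{local}@{domain}";
}

Console.WriteLine(CanonicalizeGmail("John.Doe+promo@googlemail.com")); // johndoe@gmail.com
```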
What this buys you for phone:
+1 (212) 555-1234 → +12125551234
1-212-555-1234 → +12125551234 (same!)
212.555.1234 (country=US) → +12125551234 (same!)
020 7946 0958 (country=GB) → +442079460958 (trunk prefix stripped)
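A reduced sketch of the E.164 logic, hard-coding metadata for just the US and GB. `ToE164` is a hypothetical helper; the real PhoneNormalizer carries metadata for 100+ countries plus vanity-letter mapping:

```csharp
using System;
using System.Linq;

// Minimal E.164 sketch: keep digits, honor an explicit "+" country code,
// otherwise apply per-country rules (strip a repeated country code for US,
// strip the "0" trunk prefix for GB).
string ToE164(string raw, string country)
{
    string digits = new string(raw.Where(char.IsDigit).ToArray());
    if (raw.TrimStart().StartsWith("+")) return "+" + digits;
    switch (country)
    {
        case "US":
            if (digits.Length == 11 && digits[0] == '1') digits = digits.Substring(1);
            return "+1" + digits;
        case "GB":
            if (digits.StartsWith("0")) digits = digits.Substring(1); // trunk prefix
            return "+44" + digits;
        default:
            throw new ArgumentException($"No metadata for {country}");
    }
}

Console.WriteLine(ToE164("+1 (212) 555-1234", "US")); // +12125551234
Console.WriteLine(ToE164("020 7946 0958", "GB"));     // +442079460958
```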
Dedup query — find contacts that are probably the same person regardless of format:
-- Same person across different entries
SELECT person_norm_key, COUNT(*) as entries
FROM Contact
WHERE confidence IN ('g', 'a')
GROUP BY person_norm_hash, person_norm_key
HAVING COUNT(*) > 1;
-- Same email, different person records (possible duplicate identities)
SELECT email_normalized, COUNT(*) as cnt
FROM Contact
WHERE email_normalized IS NOT NULL AND email_confidence <> 'r'
GROUP BY email_normalized
HAVING COUNT(*) > 1;
-- Same phone, different records
SELECT phone_normalized, COUNT(*) as cnt
FROM Contact
WHERE phone_normalized IS NOT NULL AND phone_confidence <> 'r'
GROUP BY phone_normalized
HAVING COUNT(*) > 1;
Organization Table
Organization names suffer from suffix noise (Inc., Corp., GmbH, Ltd., S.A.), transliteration differences (Müller vs Mueller), and ampersand variants (& vs and). UnSpec handles all of these.
Schema:
CREATE TABLE Organization (
id BIGINT IDENTITY PRIMARY KEY,
-- Original data
legal_name NVARCHAR(500) NOT NULL,
trade_name NVARCHAR(500),
country_code CHAR(2),
language_tag NVARCHAR(20) DEFAULT 'en',
-- UnSpec normalized columns
normalized_key NVARCHAR(500) NOT NULL,
normalized_hash CHAR(52) NOT NULL,
confidence CHAR(1) NOT NULL,
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
CREATE INDEX IX_Org_NormalizedHash ON Organization (normalized_hash);
C#:
using UnSpec;
using UnSpec.Identifiers;
using UnSpec.Product;
using UnSpec.Pipeline;
public class OrganizationService
{
    private readonly ProductNormalizer _prodNorm = new();
    private readonly IdentifierGenerator _idGen = new();
    public (string key, string hash, char confidence) NormalizeOrg(
        string legalName, string? countryCode, string lang)
    {
        var ctx = new NormalizationContext { LanguageTag = lang };
        // ProductNormalizer strips publisher suffixes (Inc, Corp, GmbH, Ltd, etc.)
        // and normalizes Unicode, ampersands, diacritics
        var result = _prodNorm.Normalize(legalName, "", "organization", ctx);
        char conf = result.Confidence switch
        {
            Confidence.Green => 'g',
            Confidence.Amber => 'a',
            _ => 'r'
        };
        // Persist the versioned BLAKE3 identifier — never String.GetHashCode(),
        // which is randomized per process in .NET and unsafe to store.
        var id = _idGen.Generate(result.Value, IdentifierLayer.Work, result.Confidence);
        return (result.Value, id.ToString(), conf);
    }
}
What normalizes identically:
| Raw Input | Normalized Key |
|---|---|
| Müller & Söhne GmbH | muller and sohne\|organization |
| Mueller and Soehne | muller and sohne\|organization (after pipeline: ü→u, ö→o) |
| MÜLLER SÖHNE GMBH | muller and sohne\|organization |
| Muller & Sohne, Inc. | muller and sohne\|organization |
Store `id.ToString()` (the versioned BLAKE3 string, e.g. `v1.W.…`) in the `normalized_hash` column.
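The suffix stripping, diacritic folding, and ampersand mapping can be illustrated with BCL primitives. This is a simplified sketch; `CanonicalizeOrg` is a hypothetical helper, and UnSpec's suffix list and transliteration coverage are far larger:

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Lowercase, fold diacritics by NFD-decomposing and dropping combining marks,
// map "&" to "and", and drop a few common legal suffixes.
string[] suffixes = { "gmbh", "inc", "corp", "ltd", "llc" };

string CanonicalizeOrg(string name)
{
    string folded = new string(name.ToLowerInvariant()
        .Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray());
    var words = folded.Replace("&", " and ")
        .Split(new[] { ' ', ',', '.' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(w => !suffixes.Contains(w));
    return string.Join(" ", words);
}

Console.WriteLine(CanonicalizeOrg("Müller & Söhne GmbH")); // muller and sohne
Console.WriteLine(CanonicalizeOrg("Muller & Sohne, Inc.")); // muller and sohne
```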
Address Table
The AddressNormalizer handles full structured addresses: street abbreviation expansion (St→street, Ave→avenue, 50+ types), directional expansion (N→north, NE→northeast), unit normalization (Apt→apartment, Ste→suite), ordinal stripping (1st→1), US state/CA province expansion, and postal code cleaning.
Schema:
CREATE TABLE Address (
id BIGINT IDENTITY PRIMARY KEY,
-- Original data
street_line_1 NVARCHAR(500),
street_line_2 NVARCHAR(500),
city_raw NVARCHAR(200) NOT NULL,
state_province_raw NVARCHAR(200),
postal_code NVARCHAR(20),
country_raw NVARCHAR(200) NOT NULL,
language_tag NVARCHAR(20) DEFAULT 'en',
-- UnSpec: normalized components
street_normalized NVARCHAR(500),
city_normalized NVARCHAR(200) NOT NULL,
state_normalized NVARCHAR(200),
postal_normalized VARCHAR(20),
country_normalized NVARCHAR(200) NOT NULL,
-- UnSpec: composite address hash for dedup
address_canonical NVARCHAR(1000) NOT NULL, -- "street|city|state|postal|country"
address_hash CHAR(52) NOT NULL,
confidence CHAR(1) NOT NULL,
created_at DATETIME2 DEFAULT SYSUTCDATETIME()
);
CREATE INDEX IX_Address_Hash ON Address (address_hash);
CREATE INDEX IX_Address_City ON Address (city_normalized);
CREATE INDEX IX_Address_Postal ON Address (postal_normalized);
C#:
using UnSpec;
using UnSpec.Address;
using UnSpec.Identifiers;
using UnSpec.Pipeline;
public class AddressService
{
private readonly AddressNormalizer _addrNorm = new();
private readonly IdentifierGenerator _idGen = new();
public (AddressNormalizationResult result, string hash) Normalize(
string? street1, string? street2, string city,
string? state, string? postal, string country, string lang)
{
var ctx = new NormalizationContext { LanguageTag = lang };
var result = _addrNorm.Normalize(new AddressInput
{
Street1 = street1,
Street2 = street2,
City = city,
StateProvince = state,
PostalCode = postal,
Country = country
}, ctx);
var id = _idGen.Generate(
result.CanonicalForm, IdentifierLayer.Work, result.Confidence);
return (result, id.ToString());
}
}
What normalizes identically:
| Raw Address | Normalized Canonical Form |
|---|---|
123 Main St, Apt 4B, Springfield, IL 62704 |
123 main street apartment 4b\|springfield\|illinois\|62704\|us |
123 Main Street, Apartment 4B, Springfield, Illinois 62704 |
123 main street apartment 4b\|springfield\|illinois\|62704\|us |
123 MAIN ST APT 4B + SPRINGFIELD + IL |
123 main street apartment 4b\|springfield\|illinois\|\| |
Abbreviation expansion ensures St = Street, Ave = Avenue, Apt = Apartment, N = North, CA = California, ON = Ontario, etc. Ordinals like 1st and 42nd are normalized to 1 and 42.
Note: For geographic matching at the city/country level (without street-level precision), use `GeographicNormalizer` directly. `AddressNormalizer` is for full postal address dedup.
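The expansion and ordinal rules can be sketched as a word-by-word rewrite. This is simplified; `NormalizeStreet` is a hypothetical helper covering only a handful of the 50+ abbreviations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Expand a few street/unit/directional abbreviations and strip ordinal
// suffixes (1st → 1). Case-insensitive lookup, punctuation treated as spaces.
var expansions = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["st"] = "street", ["ave"] = "avenue", ["apt"] = "apartment",
    ["ste"] = "suite", ["n"] = "north", ["ne"] = "northeast"
};

string NormalizeStreet(string line)
{
    var words = Regex.Replace(line.ToLowerInvariant(), @"[.,]", " ")
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(w => expansions.TryGetValue(w, out var full) ? full : w)
        .Select(w => Regex.Replace(w, @"^(\d+)(st|nd|rd|th)$", "$1"));
    return string.Join(" ", words);
}

Console.WriteLine(NormalizeStreet("123 Main St, Apt 4B")); // 123 main street apartment 4b
```

Note the ordering: abbreviation expansion runs before ordinal stripping, so a standalone "st" becomes "street" while "1st" loses only its suffix.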
When to Use Each Normalizer
| Entity Type | Normalizer | What It Does | When to Use |
|---|---|---|---|
| Person | `PersonNameNormalizer` | Culture-aware name parsing into primary:secondary form. Handles particles (van, von, de, al-), prefixes (Mc/Mac/O'), titles (Dr., Prof.), suffixes (Jr., III), and 8 culture strategies | Person, Employee, Customer, Author, Patient |
| Email | `EmailNormalizer` | Provider-aware canonicalization. Gmail: strip dots + sub-address. Outlook/Yahoo/Proton/iCloud: strip sub-address. Domain alias resolution (googlemail→gmail) | Contact, User, Subscriber, Lead — any table with email addresses |
| Phone | `PhoneNormalizer` | E.164-aligned canonical form. Country code detection, trunk prefix stripping (0→), vanity letter mapping (1-800-FLOWERS), 100+ country metadata | Contact, Customer, Lead — any table with phone numbers |
| Address | `AddressNormalizer` | Full postal address: street abbreviation expansion (50+ types), directional/unit normalization, ordinal stripping, US state and CA province expansion, postal code cleaning | Address, Location, Branch, Warehouse, Store — structured postal addresses |
| Organization | `ProductNormalizer` | Strips legal suffixes (Inc., GmbH, Ltd., S.A.), normalizes &→and, handles diacritics and case | Organization, Company, Vendor, Publisher, Employer |
| Place / City | `GeographicNormalizer` | Canonical name\|geo_type\|parent_region form with controlled geo-type vocabulary | City/country-level geographic matching (not full postal addresses) |
| Creative Work | `WorkNormalizer` | Title normalization (article/subtitle/edition removal) + creator + type in canonical form | Book, Film, Album, Game, media catalog tables |
| Product / SKU | `ProductNormalizer` | Name + manufacturer + category canonical form with suffix stripping | Product, SKU, Inventory, Catalog tables |
| Language Tag | `LanguageNormalizer` | BCP 47 canonicalization: legacy codes, script suppression, case normalization | Any column storing locale/language codes |
| Free Text | `UnicodePreprocessingPipeline` | Raw Unicode normalization only — NFC, whitespace, punctuation, case fold, diacritics | Normalizing search terms, tags, notes, or any string column |
Pattern: Dedup on Insert
A common pattern is to check the normalized hash before inserting, to prevent duplicates at the application layer:
public async Task<long> UpsertPerson(string fullName, string lang, DbConnection db)
{
    var (key, hash, conf) = _personService.NormalizePerson(fullName, lang);
    // Check for an existing match (data access here uses Dapper extension methods)
    var existing = await db.QueryFirstOrDefaultAsync<long?>(
        "SELECT id FROM Person WHERE normalized_hash = @hash",
        new { hash });
    if (existing.HasValue)
        return existing.Value;
    // Insert new; cast SCOPE_IDENTITY() (numeric) to BIGINT for the long return type
    return await db.ExecuteScalarAsync<long>(
        @"INSERT INTO Person (full_name_raw, language_tag, normalized_key, normalized_hash, confidence)
          VALUES (@raw, @lang, @key, @hash, @conf);
          SELECT CAST(SCOPE_IDENTITY() AS BIGINT);",
        new { raw = fullName, lang, key, hash, conf });
}
Pattern: Fuzzy Match with Confidence Filtering
When matching across systems, use the confidence flag to control strictness:
-- High-confidence matches only (deterministic, unambiguous)
SELECT a.*, b.*
FROM System_A a
JOIN System_B b ON a.normalized_hash = b.normalized_hash
WHERE a.confidence = 'g' AND b.confidence = 'g';
-- Include heuristic matches for review
SELECT a.*, b.*, a.confidence AS conf_a, b.confidence AS conf_b
FROM System_A a
JOIN System_B b ON a.normalized_hash = b.normalized_hash
WHERE a.confidence IN ('g', 'a') AND b.confidence IN ('g', 'a')
ORDER BY
CASE WHEN a.confidence = 'g' AND b.confidence = 'g' THEN 1
WHEN a.confidence = 'g' OR b.confidence = 'g' THEN 2
ELSE 3 END;
Pattern: Alias Resolution for Known Variants
Some entities normalize to different strings but represent the same thing (Mark Twain vs Samuel Clemens, Munich vs München). Use the alias registry:
using UnSpec.Collision;
var aliases = new InMemoryAliasRegistry();
// Register known alias
aliases.Register(new AliasEntry(
Type: "person",
VariantNormalized: "twain:mark",
CanonicalNormalized: "clemens:samuel",
CanonicalId: null,
Source: "manual_review",
Confidence: Confidence.Green
));
// At query time, check both direct hash match AND alias
public async Task<List<Person>> FindPerson(string name, string lang, DbConnection db)
{
    var (key, hash, _) = _personService.NormalizePerson(name, lang);
    // Check the alias registry; fall back to the direct normalized key
    var alias = aliases.Lookup(key, "person");
    string canonicalKey = alias?.CanonicalNormalized ?? key;
    var rows = await db.QueryAsync<Person>(
        "SELECT * FROM Person WHERE normalized_key = @key",
        new { key = canonicalKey });
    return rows.ToList();
}
For production systems, implement IAliasRegistry with a database-backed store:
CREATE TABLE NormalizationAlias (
id BIGINT IDENTITY PRIMARY KEY,
entity_type NVARCHAR(50) NOT NULL, -- 'person', 'organization', etc.
variant_normalized NVARCHAR(500) NOT NULL,
canonical_normalized NVARCHAR(500) NOT NULL,
canonical_id CHAR(52),
source NVARCHAR(100) NOT NULL,
confidence CHAR(1) NOT NULL,
added_by NVARCHAR(100),
added_date DATE DEFAULT GETDATE(),
UNIQUE (entity_type, variant_normalized)
);
CREATE INDEX IX_Alias_Canonical ON NormalizationAlias (entity_type, canonical_normalized);
Running Tests
dotnet test
887 tests covering every normalization stage, transliterator, name strategy, identifier generator, phone/email/address normalizer, and integration scenario.
Project Structure
src/UnSpec/
├── Pipeline/ # Composable normalization stages
│ └── UnicodePipeline/ # §2: Encoding, whitespace, punctuation, case, diacritics
├── ScriptDetection/ # §3.1: Unicode script identification
├── Transliteration/ # §3.2–3.14: Script-to-Latin transliterators
├── PersonName/ # §4: Culture-specific name normalization
│ └── Strategies/ # Western, Arabic, East Asian, etc.
├── CreativeWork/ # §5: Title and work normalization
├── Product/ # §5: Product normalization
├── Phone/ # E.164 phone normalization
├── Email/ # Provider-aware email normalization
├── Address/ # Structured address normalization
├── Geographic/ # §6: Geographic entity normalization
├── Language/ # §7: BCP 47 language tag normalization
├── Identifiers/ # §8: BLAKE3 identifier generation (4 FRBR layers)
├── Collision/ # §9: Alias registry for collision resolution
├── Versioning/ # §10: Algorithm version migration
└── Vocabularies/ # Appendices: Controlled vocabularies
License
See LICENSE file.
See https://github.com/JasSra/UnSpec/releases for release notes.