Mostly switch object writer to UTF-8 #122160

MichalStrehovsky · 2025-12-03T22:18:34Z

When compiling hello world:

byte[] allocations before: 564000. string allocations before: 302000
byte[] allocations after: 625000. string allocations after: 241000

So this is mostly a wash allocation-wise (866000 combined allocations both before and after). However, not allocating strings means we're also avoiding the UTF-8 -> UTF-16 -> UTF-8 conversions.
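A minimal sketch of the round trip being avoided, assuming a name that already exists as UTF-8 bytes. The class, method, and symbol names here are hypothetical and are not taken from the PR; the sketch only contrasts decoding to a `string` and re-encoding against passing the bytes through unchanged.

```csharp
using System;
using System.Text;

static class RoundTripSketch
{
    // Hypothetical symbol name, already encoded as UTF-8.
    static readonly byte[] s_nameUtf8 = "Program__Main"u8.ToArray();

    // String-based path: decode to UTF-16, then encode back to UTF-8 for output.
    public static byte[] ViaString(byte[] utf8)
    {
        string utf16 = Encoding.UTF8.GetString(utf8); // UTF-8 -> UTF-16, allocates a string
        return Encoding.UTF8.GetBytes(utf16);         // UTF-16 -> UTF-8, allocates a byte[]
    }

    // Byte-based path: the bytes are already in the output encoding.
    public static ReadOnlySpan<byte> Direct(byte[] utf8) => utf8;

    static void Main()
    {
        byte[] roundTripped = ViaString(s_nameUtf8);
        // For valid UTF-8 input, both paths produce the same bytes;
        // the string-based path just does two extra conversions to get there.
        Console.WriteLine(roundTripped.AsSpan().SequenceEqual(Direct(s_nameUtf8)));
    }
}
```

The byte counts are unchanged in both paths; what differs is the intermediate `string` and the two transcoding passes.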

Cc @dotnet/ilc-contrib

dotnet-policy-service · 2025-12-03T22:24:37Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR converts the object writer subsystem from using string to Utf8String to avoid UTF-8 ↔ UTF-16 ↔ UTF-8 conversions during compilation. The change primarily affects symbol naming, mangling, and object file generation code.

Key Changes

Introduced IsNull property to Utf8String for null checking
Added multiple Concat overload methods for efficient UTF-8 string concatenation
Updated all object writer interfaces and implementations to accept/return Utf8String instead of string
Changed dictionaries and collections to use Utf8String keys
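To illustrate the kind of API surface the list above describes, here is a rough sketch of a UTF-8 string wrapper with a null check, a two-argument concatenation, and value equality so it can serve as a dictionary key. `Utf8Name` is a hypothetical stand-in written for this example; it is not the real `Utf8String` from the PR, whose members and signatures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in, NOT the real Internal.Text.Utf8String.
readonly struct Utf8Name : IEquatable<Utf8Name>
{
    private readonly byte[] _bytes;

    public Utf8Name(byte[] bytes) => _bytes = bytes;
    public Utf8Name(string s) => _bytes = Encoding.UTF8.GetBytes(s);

    // default(Utf8Name) has no backing array.
    public bool IsNull => _bytes is null;

    // A null array converts to an empty span.
    public ReadOnlySpan<byte> AsSpan() => _bytes;

    // Concatenate raw UTF-8 bytes without going through string.
    public static Utf8Name Concat(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        byte[] result = new byte[a.Length + b.Length];
        a.CopyTo(result);
        b.CopyTo(result.AsSpan(a.Length));
        return new Utf8Name(result);
    }

    // Byte-wise value equality, needed for use as a dictionary key.
    public bool Equals(Utf8Name other) => AsSpan().SequenceEqual(other.AsSpan());
    public override bool Equals(object obj) => obj is Utf8Name other && Equals(other);

    public override int GetHashCode()
    {
        var hash = new HashCode();
        hash.AddBytes(AsSpan());
        return hash.ToHashCode();
    }

    public override string ToString() => Encoding.UTF8.GetString(AsSpan());
}

static class Utf8NameDemo
{
    static void Main()
    {
        var offsets = new Dictionary<Utf8Name, int>();
        Utf8Name symbol = Utf8Name.Concat("__imp_"u8, "memcpy"u8);
        offsets[symbol] = 0;

        // Lookup succeeds because equality compares bytes, not references.
        Console.WriteLine(offsets.ContainsKey(new Utf8Name("__imp_memcpy")));
        Console.WriteLine(default(Utf8Name).IsNull);
    }
}
```

One simplification in this sketch: a null instance and an empty instance compare equal, which a real implementation might want to distinguish.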

Reviewed changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| Internal/Text/Utf8String.cs | Added `IsNull` property and new `Concat` overloads for UTF-8 string operations |
| ObjectWriter/ObjectWriter.cs | Core object writer changed to use `Utf8String` for symbol names and relocations |
| ObjectWriter/StringTableBuilder.cs | Updated string table to work directly with UTF-8 bytes |
| ObjectWriter/CoffObjectWriter.cs | COFF format writer updated to use `Utf8String` |
| ObjectWriter/ElfObjectWriter.cs | ELF format writer updated to use `Utf8String` |
| ObjectWriter/MachObjectWriter.cs | Mach-O format writer updated to use `Utf8String` |
| ObjectWriter/UnixObjectWriter.cs | Unix-specific object writer base updated |
| Compiler/NodeMangler.cs | Name mangling infrastructure changed to return `Utf8String` |
| Compiler/WindowsNodeMangler.cs | Windows-specific name mangling updated with UTF-8 concatenation |
| Compiler/UnixNodeMangler.cs | Unix-specific name mangling updated with UTF-8 concatenation |
| Compiler/UserDefinedTypeDescriptor.cs | Debug info type descriptors updated to use `Utf8String` |
| DependencyAnalysis/NodeFactory.cs | Factory methods updated for `Utf8String` symbol names |
| DependencyAnalysis/*Node.cs | Various node types updated to use `Utf8String` for mangled names |

MichalStrehovsky · 2025-12-08T07:55:57Z

/azp run runtime-nativeaot-outerloop

azure-pipelines · 2025-12-08T07:56:14Z

Azure Pipelines successfully started running 1 pipeline(s).

ANahr · 2025-12-08T22:32:42Z

src/coreclr/tools/Common/Compiler/NativeAotNameMangler.cs


+        private Utf8String SanitizeNameWithHash(Utf8String literal)
+        {
+            Utf8String mangledName = SanitizeName(literal);


This code seems dangerous.
The result is not really a UTF-8 string but ASCII, and it needs to be: the next few lines would be incorrect if it were UTF-8 and contained any multi-byte characters (you can't simply cut off arbitrary UTF-8 at byte position 30).
So if anybody ever changes SanitizeName to actually return UTF-8, this will create hard-to-spot errors.

MichalStrehovsky · 2025-12-09

> So if anybody ever changes SanitizeName to actually return UTF-8, this will create hard-to-spot errors.

It's not likely we'd ever allow SanitizeName to return characters outside the basic ASCII set.

MichalPetryka · 2025-12-09

Would a Debug.Assert(Ascii.IsValid(mangledName)) make sense here?
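The hazard discussed in this thread can be shown with a small sketch. `TruncateBytes` is a hypothetical helper written for this example, not the code from `NativeAotNameMangler.cs`, and it passes a `byte[]` to `Ascii.IsValid` rather than a `Utf8String`. It shows that byte-wise truncation is only safe for ASCII input and how the suggested assert would guard that assumption in debug builds.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class TruncationSketch
{
    // Hypothetical helper: keep at most maxBytes bytes of a name.
    public static byte[] TruncateBytes(byte[] name, int maxBytes)
    {
        // Byte-wise truncation is only safe when every character is a single byte.
        Debug.Assert(Ascii.IsValid(name), "expected an ASCII-only name");
        return name.AsSpan(0, Math.Min(maxBytes, name.Length)).ToArray();
    }

    static void Main()
    {
        // ASCII input: cutting at any byte offset still yields valid text.
        byte[] ascii = Encoding.UTF8.GetBytes("Sanitized_Name");
        Console.WriteLine(Encoding.UTF8.GetString(TruncateBytes(ascii, 9)));

        // U+00E9 is two bytes in UTF-8 (0xC3 0xA9); cutting after 4 bytes splits it.
        byte[] nonAscii = Encoding.UTF8.GetBytes("caf\u00E9X");
        byte[] cut = nonAscii.AsSpan(0, 4).ToArray();

        // A strict decoder rejects the truncated sequence.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strictUtf8.GetString(cut);
            Console.WriteLine("still valid UTF-8");
        }
        catch (DecoderFallbackException)
        {
            Console.WriteLine("truncated in the middle of a character");
        }
    }
}
```

The non-ASCII case is cut directly in `Main` rather than through `TruncateBytes`, because the assert would fire on that input in a debug build, which is the point of adding it.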

Object writer

787fcfa

github-actions bot added the area-NativeAOT-coreclr label Dec 3, 2025

MichalStrehovsky mentioned this pull request Dec 3, 2025

Remove implicit string operator from Utf8String #122149

Open

dotnet-policy-service bot assigned MichalStrehovsky Dec 3, 2025

am11 reviewed Dec 3, 2025

src/coreclr/tools/Common/Internal/Text/Utf8String.cs (outdated, resolved)

MichalStrehovsky added 3 commits December 5, 2025 06:02

Fix crossgen2

fd0c4f2

Review feedback

cd66282

Usings

860936a

MichalStrehovsky changed the title ~~Switch object writer to UTF-8~~ Mostly switch object writer to UTF-8 Dec 5, 2025

MichalStrehovsky marked this pull request as ready for review December 5, 2025 05:49

Copilot AI review requested due to automatic review settings December 5, 2025 05:49

Copilot started reviewing on behalf of MichalStrehovsky December 5, 2025 05:50

Copilot finished reviewing on behalf of MichalStrehovsky December 5, 2025 05:53

Copilot AI reviewed Dec 5, 2025

MichalStrehovsky mentioned this pull request Dec 5, 2025

Crossgen Wasm Object Writer V0 #122111

Merged

build-analysis bot mentioned this pull request Dec 5, 2025

Vector saturate tests failing on arm32 #122185

Closed

PaulusParssinen reviewed Dec 5, 2025

src/coreclr/tools/Common/Compiler/ObjectWriter/StringTableBuilder.cs (resolved)

PaulusParssinen reviewed Dec 5, 2025

src/coreclr/tools/Common/Compiler/ObjectWriter/StringTableBuilder.cs (outdated, resolved)

Review feedback

e578822

jkotas approved these changes Dec 8, 2025

build-analysis bot mentioned this pull request Dec 8, 2025

Unable to pull image from mcr.microsoft.com #117164

Open

filipnavara approved these changes Dec 8, 2025

MichalStrehovsky merged commit 8c8bfb2 into dotnet:main Dec 8, 2025
111 of 120 checks passed

MichalStrehovsky deleted the utf8objwriter branch December 8, 2025 12:59

build-analysis bot mentioned this pull request Dec 8, 2025

System.Security.Cryptography.Tests .IsSupported_AgreesWithPlatform failures #122228

Open

ANahr reviewed Dec 8, 2025

dotnet-maestro bot mentioned this pull request Dec 9, 2025

[main] Source code updates from dotnet/runtime dotnet/dotnet#3725

Closed

dotnet-maestro bot mentioned this pull request Dec 9, 2025

[main] Source code updates from dotnet/runtime dotnet/dotnet#3766

Open

maestro-local-dev bot mentioned this pull request Dec 10, 2025

[repro/d7f049ef-d337-4132-bc67-a9d2b6ecd9a8] Source code updates from maestro-auth-test/runtime maestro-auth-test/dotnet#234

Closed
