Add a feature flag to not use GVM in Linq Select #109978

Merged

keegan-caruso
Contributor

@keegan-caruso keegan-caruso commented Nov 19, 2024

Adds a feature flag to allow LINQ Select to not use a generic virtual method (GVM) implementation.

With Native AOT, the compiled size of a GVM instantiated over value types grows on the order of n² in the number of types used in the call. This change avoids that growth at the cost of a slower implementation of chained LINQ calls.

A real-world example of where this caused an inability to compile was in this issue: #102131

Adapting this to something a bit contrived, but easy to measure:

using System;
using System.Diagnostics.CodeAnalysis;
using System.Linq;
using System.Reflection;

GetEnumValue<GeneratedEnum0>();
// … one call per generated enum type …
GetEnumValue<GeneratedEnumN>();

static T? GetEnumValue<[DynamicallyAccessedMembers(DynamicallyAccessedMemberTypes.PublicFields)] T>() where T : struct, Enum
{
    var rawValue = "foo,bar";
    if (string.IsNullOrEmpty(rawValue)) return null;

    var type = typeof(T);
    if (type.GetCustomAttributes<FlagsAttribute>().Any())
    {
        // Chained Select/Where/Select over value types: the pattern whose
        // Native AOT code size grows with the number of enum types.
        return (T)(object)rawValue!
            .Split(',')
            .Select(x => Enum.TryParse<T>(x, true, out var result) ? result : (T?)null)
            .Where(x => !x.Equals(null))
            .Select(x => (int)(object)x!)
            .Sum();
    }
    else
        return Enum.TryParse<T>(rawValue, true, out var result) ? result : null;
}

class FlagsAttribute : Attribute { }
enum GeneratedEnum0 { };
// …
enum GeneratedEnumN { };

The size of the System.Linq namespace, as measured by sizoscope when compiled with Native AOT:

| N | Size (.NET 9.0) | Size (feature flag enabled) |
|---|---|---|
| 10 | 1016.6 KB | 77.7 KB |
| 25 | 5.2 MB | 190.4 KB |
| 50 | 19.7 MB | 378.3 KB |
| 75 | 43.4 MB | 566.2 KB |
| 100 | 76.3 MB | 754.1 KB |
| 1000 | Failed to compile | 7475.2 KB |
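To make the trade-off concrete, here is a simplified, hypothetical model of the two code paths the flag chooses between. `MiniIterator`, `MiniArrayIterator`, `MiniLinq.MiniSelect`, and the `sizeOptimized` parameter are invented for this sketch and are not the System.Linq implementation. The speed path dispatches through a generic virtual `Select<TResult>` so that a specific iterator can specialize; the size path uses one ordinary generic method. Under Native AOT, value-type instantiations of a GVM need their own compiled bodies in each override, which is where the growth measured above comes from.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base iterator: Select<TResult> is a generic *virtual* method (GVM).
abstract class MiniIterator<TSource> : IEnumerable<TSource>
{
    public abstract IEnumerator<TSource> GetEnumerator();
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() => GetEnumerator();

    // Each override may need a separate native body for every value-type TResult.
    public abstract IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector);
}

sealed class MiniArrayIterator<TSource> : MiniIterator<TSource>
{
    private readonly TSource[] _items;
    public MiniArrayIterator(TSource[] items) => _items = items;

    public override IEnumerator<TSource> GetEnumerator() =>
        ((IEnumerable<TSource>)_items).GetEnumerator();

    // Specialized override: indexes the backing array directly.
    public override IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector)
    {
        for (int i = 0; i < _items.Length; i++)
            yield return selector(_items[i]);
    }
}

static class MiniLinq
{
    public static IEnumerable<TResult> MiniSelect<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector, bool sizeOptimized)
    {
        // Speed path: virtual generic dispatch lets the iterator specialize.
        if (!sizeOptimized && source is MiniIterator<TSource> iterator)
            return iterator.Select(selector);

        // Size path: a single plain generic method, no GVM involved.
        return SelectEnumerable(source, selector);
    }

    private static IEnumerable<TResult> SelectEnumerable<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (TSource item in source)
            yield return selector(item);
    }
}
```

Both paths produce the same sequence in this sketch; what differs is how much native code a whole-program compiler has to generate and how fast chained calls run.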

@dotnet-policy-service dotnet-policy-service bot added the community-contribution label Nov 19, 2024
@stephentoub
Member

A few questions:

  1. Who do we expect to set this to false?
  2. This is specific to Select because it's the only one of the virtuals on the base Iterator type that's generic? There have separately been concerns about the size impact of all of these specializations, GVM or not.
  3. We already have the OptimizeForSize build constant: https://github.com/dotnet/runtime/pull/109978/files#diff-bcb77d7db7721bb5508d93dc432d9a40d920e6b248c98c9ad14a3640bbe6fa2bR11. I'm not a huge fan of having yet another flavor. Could we get rid of that existing one, so that there's just one build of LINQ, and then use a switch like this to control things at publish / execution time?

@am11
Member

am11 commented Nov 19, 2024

> The size of the Linq Namespace measured by sizoscope and when compiled with Native AOT:
>
> | N | Size (.NET 9.0) | Size (feature flag enabled) |
> |---|---|---|
> | 10 | 1016.6 KB | 77.7 KB |
> | 25 | 5.2 MB | 190.4 KB |
> | 50 | 19.7 MB | 378.3 KB |
> | 75 | 43.4 MB | 566.2 KB |
> | 100 | 76.3 MB | 754.1 KB |
> | 1000 | Failed to compile | 7475.2 KB |

@MichalStrehovsky could ILCompiler optimize this pattern for value types a bit more broadly (or even conservatively for .Select() verbatim) without introducing the feature switch?

@keegan-caruso
Contributor Author

> A few questions:
>
>   1. Who do we expect to set this to false?
>   2. This is specific to Select because it's the only one of the virtuals on the base Iterator type that's generic? There have separately been concerns about the size impact of all of these specializations, GVM or not.
>   3. We already have the OptimizeForSize build constant: https://github.com/dotnet/runtime/pull/109978/files#diff-bcb77d7db7721bb5508d93dc432d9a40d920e6b248c98c9ad14a3640bbe6fa2bR11. I'm not a huge fan of having yet another flavor. Could we get rid of that existing one, so that there's just one build of LINQ, and then use a switch like this to control things at publish / execution time?

  1. It is an opt-out for developers who find the performance unacceptable, that is, who are unwilling to make the size-vs-performance trade-off for this scenario and also unwilling to rewrite their code to avoid LINQ Select.

  2. Yes, it is the only virtual that is also a generic method. I updated the description to add an example of the size difference we see with this change; it can be significant.

  3. I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.

@stephentoub
Member

> I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.

I'm suggesting we wouldn't gate it on TPM, and instead guard everything relevant with the feature switch, which when set would cause lots of the specializations to be trimmed away.
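A minimal sketch of what guarding code with a feature switch can look like, assuming an AppContext switch with the hypothetical name `Example.Linq.IsSizeOptimized` (not the name System.Linq uses). `AppContext.TryGetSwitch` and `AppContext.SetSwitch` are real APIs; for a trimmer to actually remove the guarded branch, the switch value must also be made known to it (for example through an ILLink substitutions file), which this sketch does not show.

```csharp
using System;

// Hypothetical feature switch wrapper; the switch name is invented for illustration.
static class SizeOptFeature
{
    public const string SwitchName = "Example.Linq.IsSizeOptimized";

    // Re-reads the switch on every call to keep the sketch simple. Real feature
    // switches are usually cached in a static readonly member so the JIT, and a
    // trimmer told the switch value, can treat it as a constant.
    public static bool IsEnabled =>
        AppContext.TryGetSwitch(SwitchName, out bool enabled) && enabled;
}

static class GuardedSelect
{
    public static string ChoosePath()
    {
        if (SizeOptFeature.IsEnabled)
        {
            // Size-optimized: one shared generic implementation.
            return "shared generic path";
        }

        // Speed-optimized: type-specific iterator specializations. If a trimmer
        // knows the switch is on, this branch (and code only it references) can go.
        return "specialized iterator path";
    }
}
```

In this sketch an unset switch reads as `false`, so the specialized path stays the default until something (a project property mapped into runtimeconfig, or `AppContext.SetSwitch`) turns it on.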

src/libraries/System.Linq/src/System/Linq/Select.cs (outdated)
@@ -38,6 +38,7 @@ Copyright (c) .NET Foundation. All rights reserved.
<EnableUnsafeBinaryFormatterSerialization Condition="'$(EnableUnsafeBinaryFormatterSerialization)' == ''">false</EnableUnsafeBinaryFormatterSerialization>
<EnableUnsafeUTF7Encoding Condition="'$(EnableUnsafeUTF7Encoding)' == ''">false</EnableUnsafeUTF7Encoding>
<BuiltInComInteropSupport Condition="'$(BuiltInComInteropSupport)' == ''">false</BuiltInComInteropSupport>
<ValueTypeTrimFriendlySelect Condition="'$(ValueTypeTrimFriendlySelect)' == ''">true</ValueTypeTrimFriendlySelect>
Member

Do you have any measurements of how this affects PublishTrimmed? This enables it if PublishTrimmed is true; I wonder if it should set defaults based on PublishAot instead.

@MichalStrehovsky
Member

> > I guess it is a question if we can conditionally set the value of the feature flag from TargetPlatformIdentifier and if it should be controllable at publish time.
>
> I'm suggesting we wouldn't gate it on TPM, and instead guard everything relevant with the feature switch, which when set would cause lots of the specializations to be trimmed away.

I only very briefly spot-checked what code is in the SpeedOpt files, but if it's stuff like this:

public static partial class Enumerable
{
    private sealed partial class RangeIterator : IList<int>, IReadOnlyList<int>
    {
        public override IEnumerable<TResult> Select<TResult>(Func<int, TResult> selector)
        {
            return new RangeSelectIterator<TResult>(_start, _end, selector);
        }

        public override int[] ToArray()
        {
            int start = _start;
            int[] array = new int[_end - start];
            FillIncrementing(array, start);
            return array;
        }

        public override List<int> ToList()
        {
            (int start, int end) = (_start, _end);
            List<int> list = new List<int>(end - start);
            FillIncrementing(SetCountAndGetSpan(list, end - start), start);
            return list;
        }

        public void CopyTo(int[] array, int arrayIndex) =>
            FillIncrementing(array.AsSpan(arrayIndex, _end - _start), _start);

        public override int GetCount(bool onlyIfCheap) => _end - _start;

        public int Count => _end - _start;

        public override Iterator<int>? Skip(int count)
        {
            if (count >= _end - _start)
            {
                return null;
            }
            return new RangeIterator(_start + count, _end - _start - count);
        }

        public override Iterator<int> Take(int count)
        {
            int curCount = _end - _start;
            if (count >= curCount)
            {
                return this;
            }
            return new RangeIterator(_start, count);
        }

        public override int TryGetElementAt(int index, out bool found)
        {
            if ((uint)index < (uint)(_end - _start))
            {
                found = true;
                return _start + index;
            }
            found = false;
            return 0;
        }

        public override int TryGetFirst(out bool found)
        {
            found = true;
            return _start;
        }

        public override int TryGetLast(out bool found)
        {
            found = true;
            return _end - 1;
        }

        public bool Contains(int item) =>
            (uint)(item - _start) < (uint)(_end - _start);

        public int IndexOf(int item) =>
            Contains(item) ? item - _start : -1;

        public int this[int index]
        {
            get
            {
                if ((uint)index >= (uint)(_end - _start))
                {
                    ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.index);
                }
                return _start + index;
            }
            set => ThrowHelper.ThrowNotSupportedException();
        }

        public bool IsReadOnly => true;

        void ICollection<int>.Add(int item) => ThrowHelper.ThrowNotSupportedException();
        void ICollection<int>.Clear() => ThrowHelper.ThrowNotSupportedException();
        void IList<int>.Insert(int index, int item) => ThrowHelper.ThrowNotSupportedException();
        bool ICollection<int>.Remove(int item) => ThrowHelper.ThrowNotSupportedException_Boolean();
        void IList<int>.RemoveAt(int index) => ThrowHelper.ThrowNotSupportedException();
    }
}

It would introduce untrimmable methods into compilation - these are all virtual/interface methods. Even if we stub them out, this feels like it's going to be a size regression since we need the method bodies, even if they pretty much do nothing.

@MichalStrehovsky
Member

> @MichalStrehovsky could ILCompiler optimize this pattern for value types a bit more broadly (or even conservatively for .Select() verbatim) without introducing the feature switch?

Do you have a more concrete idea? The change in this PR modifies behaviors, the resulting behavior is not identical; we take different codepaths and call different methods. It would require some very advanced analysis to do the equivalent change in the compiler.

@MichalStrehovsky
Member

Size statistics from rt-sz:

Nice savings for Avalonia. The rest probably already learned the lesson to just steer clear of LINQ if perf is of any concern.

Size statistics for pull request #109978 (sizes in bytes):

| Project | Size before | Size after | Difference |
|---|---|---|---|
| avalonia.app-linux | 22175368 | 20849944 | -1325424 |
| avalonia.app-windows | 19117568 | 18252800 | -864768 |
| hello-linux | 1352352 | 1352352 | 0 |
| hello-minimal-linux | 1081896 | 1081896 | 0 |
| hello-minimal-windows | 858112 | 858112 | 0 |
| hello-windows | 1103360 | 1103360 | 0 |
| kestrel-minimal-linux | 5474240 | 5474240 | 0 |
| kestrel-minimal-windows | 4908544 | 4909056 | 512 |
| reflection-linux | 2063440 | 2063440 | 0 |
| reflection-windows | 1750016 | 1750016 | 0 |
| webapiaot-linux | 10120480 | 10120480 | 0 |
| webapiaot-windows | 9157632 | 9158144 | 512 |
| winrt-component-full-windows | 5602304 | 5600768 | -1536 |
| winrt-component-minimal-windows | 1747456 | 1747456 | 0 |

@stephentoub
Member

stephentoub commented Nov 20, 2024

> this feels like it's going to be a size regression since we need the method bodies, even if they pretty much do nothing.

It'd mean smaller code for Native AOT / trimmed CoreCLR apps in exchange for possibly slightly larger code for mobile, plus not having to maintain two build flavors of the library, and now another variation on top of it. I'm not excited about having both the build constant and the trimming constant, both meant to optimize for size, both with similar goals, but each doing it differently.

> This enables it if PublishTrimmed is true, I wonder if it should set defaults based on PublishAot instead.

This is on by default if trimming is enabled?

I don't see it mentioned anywhere, but these changes still trade off throughput and allocations for that size benefit. Some of the optimizations covered by this PR have been there since the earliest days of LINQ.

Also, this can have observable behavioral differences, which I thought we tried to avoid as part of trimming by default.

@MichalStrehovsky
Copy link
Member

> Also, this can have observable behavioral differences, which I thought we tried to avoid as part of trimming by default.

Is this behavior or just perf difference? I'm not thrilled about introducing perf differences either, but the generic expansion caused by this generic virtual method can lead to actual failure to compile (#102131 mentioned in the top post) because it becomes physically impossible to compile that much code.

Brainstorming alternatives, we could also add a perf analyzer that simply flags all uses of LINQ in code that sets IsAotCompatible as a perf problem so that people know to stay away from it. It's basically how we solved these issues in the past, but without the analyzer, just deleting LINQ use in e.g. ASP.NET.

@MichalStrehovsky
Copy link
Member

Looking at the test results, looks like this change is not correct in this shape, the tests are hitting a stack overflow.

@MichalStrehovsky
Copy link
Member

> Brainstorming alternatives, we could also add a perf analyzer that simply flags all uses of LINQ in code that sets IsAotCompatible as a perf problem so that people know to stay away from it. It's basically how we solved these issues in the past, but without the analyzer, just deleting LINQ use in e.g. ASP.NET.

Maybe a perf analyzer wouldn't be the worst idea in general. LINQ expressions are another thing that performs very differently under AOT, and we could use something that would steer people away from them better than a line in the Native AOT docs.

@neon-sunset
Copy link
Contributor

neon-sunset commented Nov 20, 2024

> Do you have a more concrete idea? The change in this PR modifies behaviors, the resulting behavior is not identical; we take different codepaths and call different methods. It would require some very advanced analysis to do the equivalent change in the compiler.
>
> Maybe a perf analyzer wouldn't be the worst idea in general. LINQ expressions are another thing that performs very differently under AOT, and we could use something that would steer people away from them better than a line in the Native AOT docs.

IlcFoldIdenticalMethodBodies=true appears to remove about 260 KB from the LINQ namespace for N = 10.

I wanted to ask if something could be done to fold codegen-identical type instantiations besides removing optimized iterator implementations. This is not the first report here that concerns a problematic interaction of LINQ + enums with NAOT.

@am11 am11 added the size-reduction label Nov 20, 2024
@stephentoub
Member

stephentoub commented Nov 20, 2024

> Is this behavior or just perf difference?

Behavior. It's generally minor, but for example if you have:

IList<T> list = ...;
foreach (var item in list.Skip(3).Take(4).Select(...)) { ... }

that Select will end up producing an enumerable that will use the IList<T>'s indexer, but with this PR, it would end up enumerating it via GetEnumerator/MoveNext/Current.
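A small probe makes that difference observable. `ProbeList` and `ProbeDemo` are hypothetical helpers for this sketch: `ProbeList` is an `IList<int>` that counts indexer reads and `GetEnumerator` calls, and `ProbeDemo.Run` drives the same `Skip(3).Take(4).Select(...)` chain with `foreach`. Both code paths yield the same elements, but which counter goes up depends on the runtime version and on whether the size-optimized LINQ path is in effect, so the sketch reports the counts instead of asserting one outcome.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical read-only list that records how LINQ accesses it.
sealed class ProbeList : IList<int>
{
    private readonly List<int> _inner;
    public int IndexerReads { get; private set; }
    public int EnumeratorsCreated { get; private set; }

    public ProbeList(IEnumerable<int> items) => _inner = new List<int>(items);

    public int this[int index]
    {
        get { IndexerReads++; return _inner[index]; }
        set => throw new NotSupportedException();
    }

    public IEnumerator<int> GetEnumerator()
    {
        EnumeratorsCreated++;
        return _inner.GetEnumerator();
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public int Count => _inner.Count;
    public bool IsReadOnly => true;
    public bool Contains(int item) => _inner.Contains(item);
    public void CopyTo(int[] array, int arrayIndex) => _inner.CopyTo(array, arrayIndex);
    public int IndexOf(int item) => _inner.IndexOf(item);
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public void Insert(int index, int item) => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
    public void RemoveAt(int index) => throw new NotSupportedException();
}

static class ProbeDemo
{
    // Mirrors the example in the comment above: Skip/Take/Select over an IList<T>.
    public static (int[] Items, int IndexerReads, int EnumeratorsCreated) Run()
    {
        var probe = new ProbeList(Enumerable.Range(0, 10));
        IList<int> list = probe;
        var results = new List<int>();
        foreach (var item in list.Skip(3).Take(4).Select(x => x * 10))
        {
            results.Add(item);
        }
        return (results.ToArray(), probe.IndexerReads, probe.EnumeratorsCreated);
    }
}
```

Per the comment above, an indexer-based path shows up as non-zero `IndexerReads`, while the GetEnumerator/MoveNext/Current path shows up as `EnumeratorsCreated`; the yielded values (30, 40, 50, 60) are the same either way.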

@MichalStrehovsky
Member

> IlcFoldIdenticalMethodBodies=true appears to remove about 260 KB from the LINQ namespace for N = 10.

#103951 would help then, but still would not solve #102131 because we cannot even represent that many methods within the compiler (and we need to represent and compile these methods before we find out they have identical method bodies).

@agocke
Copy link
Member

agocke commented Nov 21, 2024

> Behavior. It's generally minor, but for example if you have:
>
> IList<T> list = ...;
> foreach (var item in list.Skip(3).Take(4).Select(...)) { ... }

I see that there's already an ifdef that controls some of the behavior here. It looks like that ifdef may already be enabled for some of the mobile platforms and WASM. Is this a known behavioral difference?

@agocke
Copy link
Member

agocke commented Nov 21, 2024

Also, just to provide some clarity for @keegan-caruso, I think we should definitely have a runtime feature-flag for this behavior. Whether or not it's on by default when running AOT can be a separate decision, but there seems to be a wealth of evidence that this is a potentially blocking implementation for some people and we should provide a way to workaround the problem for large apps.

@stephentoub
Copy link
Member

stephentoub commented Nov 22, 2024

> Is this a known behavioral difference?

I mean, I told you about it, so... yes? :) This is an implementation detail; there are two different interface methods that can be used to achieve the same result, and the implementation is using one or the other. There's no guarantee or documentation about which is used. To my knowledge, we've also never heard anyone having an issue based on it using one versus the other.

I raise it, though, because historically folks on our team have been adamant that behavior with and without the linker should be identical, that you should be able to test pre-trimming and not have to test again post-trimming. This change puts in place a possibly observable difference that logically goes against that. This is different from how the difference currently manifests, which is that mobile platforms get one behavior and everything else gets another, and we absolutely say you must test on all platforms you care about as there are platform differences that can manifest and result in testing on one platform not guaranteeing correctness on others.

@MichalStrehovsky
Member

> I raise it, though, because historically folks on our team have been adamant that behavior with and without the linker should be identical, that you should be able to test pre-trimming and not have to test again post-trimming.

The nuance here is that the AppContext switch would be set if PublishTrimmed=true is in the project file - this is irrespective of whether we're doing an F5 launch without trimming, or running the trimmed result of dotnet publish. So there wouldn't be a behavioral difference between trimmed/untrimmed, just a difference between "I have PublishTrimmed=true in the project file" and "I don't have it".

It's a bit of weaseling-out from the rules, but we've been forced to do this for various things in the past (e.g. startup hooks don't work with PublishTrimmed and there's no build-time warning about the behavioral difference, but we also disable them for dotnet build, not just publish, so it doesn't break the rule).

The difference here is that we're not forced into this - this code works, it's just not great size-wise and the size can be prohibitive.

@agocke
Copy link
Member

agocke commented Nov 26, 2024

Yeah, I think the feature switch does mean that we're technically correct (the best kind of correct) about same behavior during JIT and during AOT -- but I'm definitely open to more discussion on the right defaults.

@keegan-caruso keegan-caruso changed the title from "Don't use a GVM in Linq Select with NAOT by default" to "Add a feature flag to not use GVM in Linq Select" Nov 27, 2024
Adds a feature flag to allow Linq Select to not use
a GVM implementation.
MichalStrehovsky and others added 4 commits November 27, 2024 11:48
Co-authored-by: Michal Strehovský <MichalStrehovsky@users.noreply.github.com>
- Don't set feature flag by default for trimming
- Move option to Enumerable
- Fix error in OfType.SpeedOpt
@keegan-caruso keegan-caruso force-pushed the nativeaot-remove-gvm-select branch from 2f6ab0c to c1531ec November 27, 2024 19:49
@MichalStrehovsky
Member

Unless there's any objections, I'd like to fold this new codepath under the same feature switch that we introduced in #111743. The latest commit does that. This can all just be "size-optimized" without going into details of the mechanism.

Some EXE size numbers for the N=11 sample in the top post:

| Build | EXE size (bytes) |
|---|---|
| main | 3,287,552 |
| main (UseSizeOptimizedLinq) | 2,689,024 |
| PR (UseSizeOptimizedLinq) | 1,792,512 |

Additional numbers are in rt-sz measurements at MichalStrehovsky/rt-sz#106 (comment). That one measures UseSizeOptimizedLinq for both baseline size (before) and PR size (after).

So this is making UseSizeOptimizedLinq even more size optimized.
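The relative savings implied by the three EXE sizes above can be worked out directly. This sketch only restates those numbers as constants (the `SizeDelta` class and its member names are made up for illustration) and computes the percentage saved between two builds.

```csharp
using System;

// EXE sizes in bytes for the N=11 sample, copied from the table above.
static class SizeDelta
{
    public const long Main = 3_287_552;
    public const long MainSizeOpt = 2_689_024; // main with UseSizeOptimizedLinq
    public const long PrSizeOpt = 1_792_512;   // this PR with UseSizeOptimizedLinq

    // Percentage of `before` saved by moving to `after`, rounded to one decimal place.
    public static double PercentSaved(long before, long after) =>
        Math.Round(100.0 * (before - after) / before, 1);
}
```

With these inputs, `PercentSaved(Main, MainSizeOpt)` comes out at about 18.2, `PercentSaved(MainSizeOpt, PrSizeOpt)` at about 33.3, and `PercentSaved(Main, PrSizeOpt)` at about 45.5; in other words, this PR removes roughly a third of what remained after the existing size optimization.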

One of the reasons I'd like to fold this into the existing switch is that there are actually interactions between the existing size optimization and this size optimization - as you can see in 05d43cf, I'm deleting a test that was testing a difference in behavior of the new switch added here. It turns out the difference in behavior only exists if the new switch (added in this PR) is enabled but the existing switch is not. If both are enabled (or both are disabled), the difference in behavior doesn't exist.

@MichalStrehovsky MichalStrehovsky merged commit 481eab6 into dotnet:main Feb 6, 2025
86 checks passed
grendello added a commit to grendello/runtime that referenced this pull request Feb 6, 2025
* main: (23 commits)
  add important remarks to NrbfDecoder (dotnet#111286)
  docs: fix spelling grammar and missing words in clr-code-guide.md (dotnet#112222)
  Consider type declaration order in MethodImpls (dotnet#111998)
  Add a feature flag to not use GVM in Linq Select (dotnet#109978)
  [cDAC] Implement ISOSDacInterface::GetMethodDescPtrFromIp (dotnet#110755)
  Restructure JSImport/JSExport generators to share more code and utilize more Microsoft.Interop.SourceGeneration shared code (dotnet#107769)
  Add more detailed explanations to control-flow RegexOpcode values (dotnet#112170)
  Add repo-specific condition to labeling workflows (dotnet#112169)
  Fix bad assembly when a nested exported type is marked via link.xml (dotnet#107945)
  Make `CalculateAssemblyAction` virtual. (dotnet#112154)
  JIT: Enable reusing profile-aware DFS trees between phases (dotnet#112198)
  Add support for LDAPTLS_CACERTDIR \ TrustedCertificateDirectory (dotnet#111877)
  JIT: Support custom `ClassLayout` instances with GC pointers in them (dotnet#112064)
  Factor positive lookaheads better into find optimizations (dotnet#112107)
  Add ImmutableCollectionsMarshal.AsMemory (dotnet#112177)
  [mono] ILStrip write directly to the output filestream (dotnet#112142)
  Allow the NativeAOT runtime pack to be specified as the ILC runtime package (dotnet#111876)
  JIT: some reworking for conditional escape analysis (dotnet#112194)
  Replace HELPER_METHOD_FRAME with DynamicHelperFrame in patchpoints (dotnet#112025)
  [Android] Decouple runtime initialization and entry point execution for Android sample (dotnet#111742)
  ...
@keegan-caruso keegan-caruso deleted the nativeaot-remove-gvm-select branch February 6, 2025 21:30
Labels
area-System.Linq, community-contribution, size-reduction

7 participants