Skip to content

Add project template build tests + CG reporting #6355

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 20 commits into from
May 2, 2025
Merged

Conversation

MackinnonBuck
Copy link
Member

@MackinnonBuck MackinnonBuck commented Apr 30, 2025

Adds project template create/restore/build tests and CG reporting.

Summary of changes:

  • Added infrastructure for invoking .NET commands from test code
  • Added a TemplateSandbox folder that gets used to build (and in the future, run) tests in an environment isolated from the rest of the repo
  • Added a script that configures the local PowerShell environment to match that of template execution tests
    • This makes debugging generated tests locally a lot easier than it would otherwise have been
  • Added tests that create, restore, and build variations of the AI Chat Web project template
    • Note: Test output does not get cleared until the next test run. This enables easy debugging of generated templates and allows template dependencies to get discovered for CG.
  • Added a README explaining how to debug template execution tests
  • Made the Microsoft.Extensions.AI.Templates.Test project an "integration test" so it runs separately from existing unit tests
    • This is required because template execution tests can only be run after the repo gets packed
  • Updated pipelines to include a "Run integration tests" step that runs after packing the repo

Fixes #6322

Microsoft Reviewers: Open in CodeFlow

@github-actions github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Apr 30, 2025
@MackinnonBuck MackinnonBuck marked this pull request as ready for review April 30, 2025 23:43
@MackinnonBuck MackinnonBuck requested review from a team as code owners April 30, 2025 23:43
Copy link
Member

@RussKie RussKie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@RussKie RussKie added area-ai-templates Microsoft.Extensions.AI.Templates and removed area-ai Microsoft.Extensions.AI libraries labels Apr 30, 2025
Copy link
Member

@SteveSandersonMS SteveSandersonMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

This is all pretty sophisticated. Did you take much of the infrastructure code from our other repos or is this all custom for dotnet/extensions? If it's custom, do you think we should look into ways to share it with dotnet/aspnetcore etc or is this unique to the scenarios in this repo?

Presumably this could be used as the basis for Playwright tests in the future, though that would also force us to provide working AI service endpoints in CI which is a whole other challenge. Right?

@RussKie
Copy link
Member

RussKie commented May 1, 2025

/cc: @radical

@MackinnonBuck
Copy link
Member Author

MackinnonBuck commented May 1, 2025

Did you take much of the infrastructure code from our other repos or is this all custom for dotnet/extensions?

It's mostly custom, but I did look into the approaches taken in the aspnetcore, sdk, and aspire repos to see if there was anything that could be borrowed. Thankfully, it's quite a lot simpler to get these kinds of tests working in the extensions repo because we don't need to mutate the local .NET SDK installation to execute template projects (unlike e.g., aspnetcore, which needs to make a copy of the SDK with the just-built shared framework swapped in, or aspire, which has to install a workload). In the extensions case, it's enough to specify an alternate NuGet package cache location and a custom template install directory. Therefore, it didn't seem worth it to do a direct copy/paste of existing solutions when the complexity isn't warranted.

The main candidate for shared logic seems to be the ability to reliably run the .NET CLI from tests. The sdk repo has the most sophisticated solution for this out of the repos I looked at (Microsoft.DotNet.Cli.Utils + Microsoft.NET.TestFramework), and I effectively used a very simplified version of that (similar to aspire, but simpler still). And aspnetcore sort of does its own separate thing.

That said, it's likely possible to create a shared utility that provides the most common functionality required by each repo (maybe something like what the SDK already has). I'd be curious to see what others think about this.

Presumably this could be used as the basis for Playwright tests in the future, though that would also force us to provide working AI service endpoints in CI which is a whole other challenge. Right?

Yes, right.

MackinnonBuck and others added 4 commits April 30, 2025 20:59
…ationTests/TemplateSandbox/Directory.Build.props

Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>
…ationTests/TemplateSandbox/Directory.Build.targets

Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>
@radical
Copy link
Member

radical commented May 1, 2025

This is what we have - https://github.com/dotnet/aspire/blob/main/tests/Aspire.Templates.Tests/NewUpAndBuildStandaloneTemplateTests.cs

It doesn't depend on a workload now, so doesn't need to modify local dotnet installs. It supports testing against multiple sdks.

@radical
Copy link
Member

radical commented May 1, 2025

Playwright tests - https://github.com/dotnet/aspire/blob/main/tests/Aspire.Templates.Tests/TemplateTestsBase.cs#L173

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this now in place, what are your thoughts on the existing snapshot tests? Changes to the snapshot tests should be a separate PR if you suggest any though.

@MackinnonBuck
Copy link
Member Author

MackinnonBuck commented May 1, 2025

With this now in place, what are your thoughts on the existing snapshot tests? Changes to the snapshot tests should be a separate PR if you suggest any though.

I think snapshot tests are still valuable for catching unintended changes in template content, which these new tests might not do. Aside from adding more snapshot tests over time as we see fit, I don't have any suggestions for how they need to change yet 🙂

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, @MackinnonBuck. The only comment I have left that I'd like you to address is that I don't know what specific side-effect of generating and building the template combinations gives component governance visibility into the dependencies.

Can you add a comment in the place or places where there are any subtle details that make that wire up correctly? Is it how the projects are built, the fact that they simply perform a restore, the output location, and/or other aspects that result in CG visibility?

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent comments and clarity; thanks!

@MackinnonBuck MackinnonBuck merged commit 13d0313 into main May 2, 2025
6 checks passed
@MackinnonBuck MackinnonBuck deleted the mbuck/template-cg branch May 2, 2025 00:42
jeffhandley pushed a commit that referenced this pull request May 9, 2025
Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>
jeffhandley added a commit that referenced this pull request May 9, 2025
* Update CHANGELOGS for M.E.AI libs (#6359)

* Add project template build tests + CG reporting (#6355)

Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>

* Use ConversationId instead of ChatThreadId (#6367)

* Update OpenTelemetryChatClient/EmbeddingGenerator to 1.33 (#6366)

Also adds the Enable SensitiveData property to OpenTelemetryEmbeddingGenerators. This was missed when adding it to OpenTelemetryChatClient.

* Add AsIEmbeddingGenerator for Azure.AI.Inference ImageEmbeddingsClient (#6363)

* Add DataContent.Base64Data (#6365)

* Fix AIFunctionFactory handling of default struct arguments (#6381)

* Fix AIFunctionFactory handling of default struct arguments

* Extend testing to custom structs

* Use GetUninitializedObject instead of Activator

* Add JS dependency update instructions to chat template README (#6376)

* Add see also links to conceptual docs (#6368)

* Translate OpenAI refusals to ErrorContent (#6393)

Refusals in OpenAI are errors reported when the service can't generate an output that matches the requested schema. Translate refusals to ErrorContent now that we have it.

* Rename useJsonSchema parameter (#6394)

* Rename useJsonSchema parameter

The `GetResponseAsync<T>` methods accept a `bool?` parameter currently called `useJsonSchema`. This is confusing, because the whole point of the method is to create and use a JSON schema from the `T`. The parameter actually controls _how_ that schema is used, whether it's included as part of the messages (false), as part of a ChatResponseFormat in the ChatOptions (true), or up to the system to decide.

I've clarified it by renaming it from `useJsonSchema` to `useJsonSchemaResponseFormat`. It's wordier, but it disambiguates the intent.

* Update src/Libraries/Microsoft.Extensions.AI/ChatCompletion/ChatClientStructuredOutputExtensions.cs

Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>

---------

Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>

* Add JSON schema transformation functionality to `AIJsonUtilities` (#6383)

* Add initial schema transformation functionality and incorporate into the OpenAI leaf client.

* Update all leaf client implementions, improve naming, add testing.

* Remove redundant suppressions

* Address feedback.

* Add ChatOptions.RawRepresentationFactory (#6319)

* Look for OpenAI.ChatCompletionOptions in top-level additional properties and stop looking for individually specific additional properties

* Add RawRepresentation to ChatOptions and use it in OpenAI and AzureAIInference

* Remove now unused locals

* Add [JsonIgnore] and update roundtrip tests

* Overwirte properties only if the underlying model don't specify it already

* Clone RawRepresentation

* Reflection workaround for ToolChoice not being cloned

* Style changes

* AI.Inference: Bring back propagation of additional properties

* Don't use 0.1f, it doesn't roundtrip properly in .NET Framework

* Add RawRepresentationFactory instead of object? property

* Augment remarks to discourage returning shared instances

* Documentation feedback

* AI.Inference: keep passing TopK as AdditionalProperty if not already there

* Avoid caching in CachingChatClient when ConversationId is set (#6400)

* Add comment LoggingChatClient et al trace-level logging (#6391)

Also fixed the name of the LoggingSpeechToTextClientBuilderExtensions type to conform to patterns used elsewhere in the library.

* Fix test validation of aggregate usage counts (#6401)

* Add BinaryEmbedding (#6398)

* Add BinaryEmbedding

Also:
- Renames the polymorphic discriminators to conform with typical lingo for these types.
- Adds an Embedding.Dimensions virtual property.

* Fix keys for EvaluationResult.Metrics dictionary to reflect the correct metric names for Safety evaluators (#6361)

This was an unfortunate regression that was introduced during a recent refactoring.

The metrics returned from the Azure AI Foundry Evaluation service have different names than the ones we use in the Safety library. We translate the EvaluationMetric.Name of the metrics returned by the service to the more display friendly names used in the library before returning the metrics to the caller.

While the returned metrics were correctly patched up, the EvaluationResult.Metrics dictionary still stored these metrics by the original names returned from the service. Unfortunately, this meant EvaluationResult.Get would throw an exception when trying to fetch metric with name ViolenceEvaluator.ViolenceMetricName.

The fix in this commit fixes the keys in the dictionary as well. This commit also updates tests to cover the case being fixed.

Fixes #6360

* Update branding for Azure service used by Safety evaluators (#6362)

`Azure AI Content Safety service` -> `Azure AI Foundry Evaluation service`

* Remove CacheOptions from DiskBasedResponseCache (#6395)

Details for this change are available in #6387.

Fixes #6387

* Add back net9.0 version of the aieval dotnet tool (#6396)

In #6148, we disabled net9.0 TFM for the aieval tool to work around the race described in dotnet/sdk#47696. The underlying issue was subsequently fixed in the SDK (via dotnet/sdk#47788). However, this fix has not been backported to the dotnet 9 SDK yet.

The SDK team is working on backporting the fix (see discussion in dotnet/sdk#47788 (comment)). But in the meanwhile, we can add back the net9.0 TFM and continue to work around the race by disabling parallel build.

This would help users of the aieval tool sidestep errors such as the ones described in #6388 when they dont have dotnet8 installed on the machine.

We can remove this workaround, once the backported fix is available in the dotnet 9 SDK.

Fixes #6388

* Add some additional documentation around usage of cache, and CSP properties on report (#6377)

* Add documentation around proper usage of IDistributedCache

* Add Content-Security-Policy to prevent page from calling into other sites.

* Remove remark about IDistributedCache usage

* Fix package-lock.json

* Remove <remarks> start tag.

* Some API related fixes for the evaluation libraries (#6402)

* Rename IResultStore and IResponseCacheProvider

IResultStore -> IEvaluationResultStore and
IResponseCacheProvider -> IEvaluationResponseCacheProvider

* Include missing EvaluationContextConverter in AzureStorageJsonUtilities

Also use linked files to avoid the need to duplicate code.

* Reorder enum members

The new order goes from least desirable rating to most desirable.

* Refactor extension method overloads

Implement overloads that take ChatMessage by calling corresponding overloads that take ChatResponse.

* Refactor AddTurnDetails to support adding details for multiple turns

Adding single turns continues to be supported via a params array overload.

* Add missing parameter for timeToLiveForCacheEntries to DiskBasedReportingConfiguration

This was missed in an earlier PR that introduced the timeToLiveForCacheEntries on the constructor of DiskBasedResponseCacheProvider.

Also reorder constructor parameters for AzureStorageReportingConfiguration so that the parameters for caching apear next to each other and so that the parameter ordering is aligned with DiskBasedReportingConfiguration.

* Minor formatting changes

* Bump vite from 6.2.6 to 6.3.4 in /src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/TypeScript (#6354)

* Bump vite

Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 6.2.6 to 6.3.4.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v6.3.4/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 6.3.4
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shyam Namboodiripad <gnamboo@microsoft.com>

* Allow image rendering in evaluation report (#6407)

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Stephen Toub <stoub@microsoft.com>
Co-authored-by: Mackinnon Buck <mackinnon.buck@gmail.com>
Co-authored-by: Steve Sanderson <SteveSandersonMS@users.noreply.github.com>
Co-authored-by: Genevieve Warren <24882762+gewarren@users.noreply.github.com>
Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>
Co-authored-by: David Cantú <dacantu@microsoft.com>
Co-authored-by: Shyam N <shyamnamboodiripad@users.noreply.github.com>
Co-authored-by: Peter Waldschmidt <pewaldsc@microsoft.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shyam Namboodiripad <gnamboo@microsoft.com>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
area-ai-templates Microsoft.Extensions.AI.Templates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure templates are built via a pipeline to support CG
5 participants