Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Support regional ESTS (ests-r) #2508

Closed
bgavrilMS opened this issue Mar 25, 2021 · 12 comments
Closed

Support regional ESTS (ests-r) #2508

bgavrilMS opened this issue Mar 25, 2021 · 12 comments

Comments

@bgavrilMS
Copy link
Member

bgavrilMS commented Mar 25, 2021

Problem

AAD is adding support for regional STS (ESTSR). Currently only S2S (client_credentials) flow and available via opt-in (1st party only for now). Although a DNS-only solution is preferred, this is not practical. The client SDK needs to help route the traffic to the regional authority.

Goals

  1. Provide an API that allows traffic to be routed to a regional authority (P0)
  2. Help developers detect the region (P1)
  3. Existing telemetry needs to work. We must detect if user region != detected region and report it via telemetry.

Proposal

// if app knows the region, use it as MSAL cannot reliably detect it
string region = Config.Region ?? ConfidentialClientApplication.AttemptRegionDiscovery;

var cca = ConfidentialClientApplicationBuilder(client_id)
                  .WithAuthority(AzureCloud.PublicCloud)
                  .WithRegion(region)  
                  .Build();

Behavior

  • if region = null, ignore
  • if cloud is PUBLIC cloud, use {region}.microsoft.com
  • otherwise, use {region}.environment
@jmprieur
Copy link
Contributor

jmprieur commented Mar 25, 2021

What about, for the detection:

string region = await ConfidentialClientApplication.AutoDetectRegion();

@jmprieur
Copy link
Contributor

To be able to send telemetry on if we auto detect correctly or not, the region returned by the detector could even contain a character (an exclamation mark at the end?), to signify that the region was auto detected. And of course, .WithRegion would remember it was auto-detected, and trim the extra character and use the region

@bgavrilMS
Copy link
Member Author

bgavrilMS commented Mar 25, 2021

  1. For the detection, do you mean a static method on ConfidentialClientApplication or an instance method? We'd probably need to pass in a logger as well, so it'll be more like:
string region = await ConfidentialClientApplication.AutoDetectRegion(LogCallback);
  1. For the telemetry, I like your idea, but I think we should go all in on it:
  • if env variable detection, return "uswest1_msal_autodetect_variable"
  • if imds detection succesfull, return "uswest1_msal_autodetect_imds"

The alternative would be to have a static variable that captures this.

@jennyf19
Copy link
Collaborator

jennyf19 commented Mar 26, 2021

for the cca options, something like public string? AzureRegion {get; set;} would work i think. would be nice to have the default be the AutoDetectRegion.

For the detection, what if this comes to pca? do we want something that is more generalized? if not, i think the above looks good.

@bgavrilMS
Copy link
Member Author

@jennyf19 - yes, we discussed about adding it to public client app but since it's unlikely to happen anytime soon (i.e. years), decided not to polute the api and keep in at confidential level only. It's easy to add to PCA later.

Thanks for looking at the API @jennyf19, I agree on the Options part.

@pmaytak
Copy link
Contributor

pmaytak commented Mar 26, 2021

After I wrote out thoughts below, I guess I'm leaning to a solution that's similar to what we have now. I guess I'm not a huge fan of having to call TryGetRegionAsync before creating PCA. I'd rather have it simply encapsulated inside WithRegion.


1 Doesn't using WithRegion already imply autodetection? (So, AutoDetectRegion is unnecessary?)

  • If WithRegion() (null / empty string region parameter) is used, we detect region, and use it.
    • If region cannot be autodetected, we log error and fall back to global.
  • If WithRegion(userRegion) is specified, we still detect region; send both regions in telemetry, and use the user-provided region for requests.
    • If region cannot be autodetected, we use the user-specified one.

2 In the simple scenario code snippet, is calling a RegionDetector.TryGetRegionAsync a prerequisite to calling WithRegion(region)?

  • I don't think it should be. Users should just call WithRegion() and it will be auto detected for them.
  • I think RegionDetector.TryGetRegionAsync should still be public, so that end users can get the region detected by MSAL for their own purposes.
  • WithRegion will internally call RegionDetector.TryGetRegionAsync which will save the detected region into some static variable like we do now.

3 Provide a mechanism to fallback to the non-regional authority in case the regional one is down (P2)
We would never know if regional one is down, right? Gateway will just forward the request to global and not tell us.

@bgavrilMS
Copy link
Member Author

Thanks @pmaytak for the feedback.

After playing around with the idea for a while, I realized that it's very inconvenient to introduce new top-level options in MSAL, because those objects need logging, HTTP stack etc. all of which are configurable in our Application objects. So having a top-level option would be like:

RegionDiscovery.GetRegionAsynz(LogCallback logger, IHttpClientFactory httpClientFactory);

This is too complicated, so went back to WithRegion(region) and WithRegion("auto")

@bgavrilMS bgavrilMS added this to the 4.29.0 milestone Mar 26, 2021
@jabbera
Copy link
Contributor

jabbera commented Mar 26, 2021

From an outsiders perspective how does failover work w.r.t. a regional outage?

@jennyf19
Copy link
Collaborator

@henrik-me is probably the best to answer this.

From an outsiders perspective how does failover work w.r.t. a regional outage?

@henrik-me
Copy link
Contributor

@jabbera : Not sure exactly what you mean. Can you please provide more context to the question?

Region means the traffic will stay in the region, thus impacts from other regions or even global will not impact a service running in a another region.

@jabbera
Copy link
Contributor

jabbera commented Mar 26, 2021

@henrik-me the scenario described assumes that Azure AD and the region will always fail together. Is it possible for Azure AD in EastUS2 (EU2) to have an outage but the region itself would still be up? In that case don't want to be glued to only EU2 for my azure ad auth. I don't know how the Azure AD network is designed, but if {region}.microsoft.com had a GTM with primary\hot standby (in central or something) in front of it my hypothetical scenario may not be an issue.

@henrik-me
Copy link
Contributor

@jabbera, thanks for the added details to your question. Networking issues including DNS issues are not covered with this change, for DNS and network handling we allow you to pass in a HttpClientFactory where you can do what is right for your service to mitigate those issues (e.g. small outages to DNS would typically not be felt as DNS is cached on the node and only refreshed at certain intervals, you can decide which interval would be the right for your scenario).

Two other perspectives that regional and MSAL in general helps with for the scenario you are describing is:

  1. MSAL has tokens in the cache and depending on the outage period and the lifetime of the token there will be less of a need to call AAD for a token, and thus mitigates if say just AAD in the region is down.
  2. AAD regional also have lots of failure modes covered, including a gateway in front of it which can re-direct traffic as needed.

Hope this helped bring some perspectives to the question you raised. Please also feel free to share what you believe should happen and how it can be done.

@henrik-me henrik-me changed the title Regional Redesign Support regional ESTS (ests-r) Apr 6, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

6 participants