Attending: Eve, Domenico, Justin, Andi, Adrian, George, Maciej, Sarah, Mike

See below.

On Tue, Feb 2, 2016 at 10:42 AM, Eve Maler <eve@xmlgrrl.com> wrote:
Attending: Eve, Andi, Sarah, Domenico, Justin, Mike, James, John W, Maciej, George (partial)?

Thanks to the intrepid attendees of this ad hoc meeting series!

The progressive agenda for this meeting series:
1. Analyze the nature of the vulnerability and the use cases at risk at the various time horizons (now/near-future/far-future) to be sure we understand its parameters
2. Analyze the proposed mitigations and see if there are any others we have missed
3. Assemble pros and cons for the mitigations and the forms in which they can be "packaged" to bring forward to the whole WG
Adding:

4. How to communicate around this attack?

NOTE: We have renamed the issue to call it "Session Fixation Attack on UMA Claims-Gathering Protocol (UMA 1.0.1 Section 3.6.3)". This is because in the spec, trust elevation encompasses all of the ways to test the RqP (and even the client) for suitability, including any extension mechanisms, not just claims-gathering.

Logistics: We will hold a short ad hoc meeting in the 30 minutes prior to Thursday's WG telecon. We nearly got through agenda item 1b; the agenda will be to finish 1b, start 2a, and set up next week's ad hoc meeting schedule. We may be able to complete all the agenda items by next week at this pace.

AI: Eve: Set up the next ad hoc meeting in the calendar.

1a. Analyze the nature of the vulnerability

The preconditions are laid out in the issue in GitHub. Let's see if we can identify other "truths" in the scenario:
The nature of the grant flow that was used to mint the AAT is irrelevant to this attack.
We're assuming that the attack is possible even if only one of the two RqPs is compromised and nothing else (AS, client, etc.).
This doesn't have to do with step-up authentication; there may be vulnerability dragons there, but they're not the same as this vulnerability. We talked about asking the guys who just formally analyzed OAuth 2.0 to analyze UMA too.

AI: Eve: Reach out to the security researchers who analyzed OAuth.

Recall that we are showing in our WSD two clients because the attacker uses either the same client or a different client with the same client ID, but with the same session that is able to compromise the existing permission ticket. The (persistent) permission ticket is usable by both pieces of software (or the "same" piece of software used by both victim and attacker).

The attacker is able to phish the true RqP and substitute the true RqP for themselves as soon as interaction with an RqP is needed for any claims-gathering, and then substitute themselves back in with the original permission ticket when it's time to "turn in the winning ticket to get the winnings" (get the RPT and then go on to access the resource). You're getting the legitimate winner Bob to present his ID as is proper to get his winnings, but at the last minute, Eve the Eavesdropper swoops in and knocks the winning chit out of his hands and runs up to the counter.

Here is a consolidated web sequence diagram (which will update whenever I make changes to it!) that reflects, I think, a current understanding of the attack. I've added a few comments, and some questions that we can ponder regarding additional potential mitigations. Many thanks to Justin for kicking off this flow.

http://www.websequencediagrams.com/files/render?link=tQ9pVR17H-5H93gM2wzU

We walked through the new swimlane.

AI: Eve: Change "requesting party (RqP)" in WSD to mention "victim" and edit notes.

Is Eve's question above "Load claims gathering endpoint at AS" pertinent? The "pushed claims" scenario, vs. using the claims-gathering endpoint, is what she is seeing around narrow ecosystems. Justin has deployed "tight ecosystems" inside enterprises that do leverage OAuth redirect flows, so he's certain that behavior-signaling is a totally insufficient mitigation because of the high likelihood that the interaction can still be silent (as noted in the writeup of the attack: "the victim’s claims might be passed to the server automatically through SSO"). As long as the phishing email is convincing enough, the victim will go along with it. It's possible that better guidance on this can help at the margin. But by this point in the attack, the AS hasn't even seen the client at all.

What about the question about loading claims_redirect? The attacker just polls whenever, and when they achieve success, then they're off to the races. This is the exact same mechanism as in the old OAuth 1.0 session fixation attack.
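To make the polling step concrete, here is a minimal toy model of it. It is only a sketch under stated assumptions: the class ToyAuthorizationServer and its methods are hypothetical names made up for illustration, not the UMA API, and the real flow (the RS registering the permission with Alice's PAT, redirects, the AAT) is collapsed into direct method calls. What it tries to show is that nothing but the static ticket ties the RPT request to the party who supplied the claims.

```python
import secrets


class ToyAuthorizationServer:
    """Hypothetical in-memory AS with a static permission ticket (not the UMA API)."""

    def __init__(self):
        self.claims_by_ticket = {}  # ticket -> claims gathered so far (None = none yet)

    def register_permission(self):
        # The ticket is minted once and stays the same for every later call.
        ticket = secrets.token_urlsafe(16)
        self.claims_by_ticket[ticket] = None
        return ticket

    def gather_claims(self, ticket, claims):
        # Whoever completes claims-gathering is recorded against the ticket.
        self.claims_by_ticket[ticket] = claims

    def request_rpt(self, ticket):
        # Only the ticket is checked: nothing ties this call to the party
        # who actually supplied the claims.
        claims = self.claims_by_ticket.get(ticket)
        if claims is None:
            return None  # stands in for a "need_info" style response
        return {"rpt": secrets.token_urlsafe(16), "authorized_as": claims["sub"]}


auth_server = ToyAuthorizationServer()
ticket = auth_server.register_permission()         # ticket ends up in the attacker's session

first_poll = auth_server.request_rpt(ticket)       # attacker polls: no claims yet
auth_server.gather_claims(ticket, {"sub": "bob"})  # phished victim supplies "Bobly claims"
second_poll = auth_server.request_rpt(ticket)      # attacker polls again and gets the RPT
```

In this toy run the attacker never interacts with the claims-gathering step at all; they simply keep presenting the ticket they already hold.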

Alice's PAT was used to generate the permission ticket, and until Bob submits his "Bobly claims" (a Justinism :-), we're not sure who is who on the requesting side at all.

1b. Analyze the use cases at risk at the various time horizons

Eve's point in suggesting this topic is that it helps us figure out when and how to potentially apply tactical (less invasive/disruptive?) and strategic (more invasive/disruptive?) mitigations. One possibility:

NOW........SOON........LATER
(tactical?)...........(strategic?)

This is necessarily a more subjective conversation because vulnerability is technical and risk is business-legal.

Mike's deployment doesn't use interactive claims-gathering. It should be noted that he has deployed only #APIsec use cases. Which use cases are likeliest to use interactive claims-gathering? Justin has implemented for it. His use cases are cross-domain (#wideeco) and the human RqP is present and using a client app. He expects interactive claims-gathering. Eve and James are seeing imminent deployments where interactive claims-gathering isn't expected at first; this is because the human RqP experience is expected to be extra-smooth, and the client apps are either published by the operator of the AS or by close partners, and the first deployments are likely to use "hub-and-spoke" identity federations where the AS is also where the RqP's claims come from. However, wider -- at least "medium" -- ecosystem deployments will eventually be on the scene.

...tbs: more to come...

To be more precise, the "medium" ecosystems they are seeing arising have Alice's AS as a claims client (at maximum) to Bob's claims providers (typically a single IdP, but it could eventually go beyond just that "identity-centric" role) with a trust relationship pre-established between them. Justin is seeing this "identity-centric" claims-gathering role initially too.

2a. Analyze the proposed mitigations

Door #1: Add an unpredictable code parameter to the post-claims-gathering response

Door #2: Rotating permission tickets with each call to the Authorization Server

Both of these proposed mitigations involve returning some value from the claims-gathering URL to the (attacker's) client that is unpredictable. The permission ticket is currently required to be static across all interactions that seek the same protected resource; Door #1 puts the randomness in a new parameter, and Door #2 puts the randomness in the permission ticket itself. The consequences differ.
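As a rough illustration of how the two doors differ, here is a toy sketch of each. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the flow is reduced to direct calls, and for brevity the Door #2 sketch rotates the ticket only at the claims-gathering step rather than on every call to the AS. The unpredictable value is assumed to travel back on the redirect in the browser of whoever completed claims-gathering, so a party who only holds the original ticket never sees it.

```python
import secrets


class CodeProtectedAS:
    """Door #1 sketch: static ticket plus an unpredictable code (hypothetical names)."""

    def __init__(self):
        self.pending = {}  # ticket -> {"claims": ..., "code": ...}

    def register_permission(self):
        ticket = secrets.token_urlsafe(16)
        self.pending[ticket] = {"claims": None, "code": None}
        return ticket

    def gather_claims(self, ticket, claims):
        # The code is handed back only on the post-claims-gathering redirect.
        code = secrets.token_urlsafe(16)
        self.pending[ticket] = {"claims": claims, "code": code}
        return code

    def request_rpt(self, ticket, code=None):
        entry = self.pending.get(ticket)
        if entry is None or entry["claims"] is None:
            return None
        if code is None or not secrets.compare_digest(code, entry["code"]):
            return None  # the ticket alone is no longer enough
        return {"authorized_as": entry["claims"]["sub"]}


class RotatingTicketAS:
    """Door #2 sketch: claims-gathering replaces the ticket (hypothetical names)."""

    def __init__(self):
        self.claims_by_ticket = {}

    def register_permission(self):
        ticket = secrets.token_urlsafe(16)
        self.claims_by_ticket[ticket] = None
        return ticket

    def gather_claims(self, ticket, claims):
        # The old ticket stops working; the fresh one goes back on the redirect.
        del self.claims_by_ticket[ticket]
        new_ticket = secrets.token_urlsafe(16)
        self.claims_by_ticket[new_ticket] = claims
        return new_ticket

    def request_rpt(self, ticket):
        claims = self.claims_by_ticket.get(ticket)
        if claims is None:
            return None  # unknown, stale, or not-yet-satisfied ticket
        del self.claims_by_ticket[ticket]
        return {"authorized_as": claims["sub"]}


door1 = CodeProtectedAS()
ticket1 = door1.register_permission()
redirect_code = door1.gather_claims(ticket1, {"sub": "bob"})
door1_ticket_only = door1.request_rpt(ticket1)               # poller without the code
door1_with_code = door1.request_rpt(ticket1, redirect_code)  # session that got the redirect

door2 = RotatingTicketAS()
ticket2 = door2.register_permission()
rotated_ticket = door2.gather_claims(ticket2, {"sub": "bob"})
door2_stale_ticket = door2.request_rpt(ticket2)         # poller with the original ticket
door2_fresh_ticket = door2.request_rpt(rotated_ticket)  # session that got the redirect
```

In both sketches the polling party holding only the original ticket gets nothing back, which is the property the two doors share; where the randomness lives (a new parameter versus the ticket itself) is what drives the differing consequences.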

Door #3: Collapse the claims gathering endpoint and the authorization endpoint into one

If we go through all of the work to do this (as Justin has suggested in various other more "wholesale" design-related issues), then the attack goes away.

2b. Identify additional potential mitigations

Are there other "doors" (approaches), or other ways to apply these same techniques?
It appears that behavioral signaling is a bust.
We had explored various mitigations in our existing Security Considerations for RqP redirection and impersonation threats, but they don't work in this case because at the last minute, the attacker "steals Bob's candy".
The "polling"/session fixation attack works because there's no binding of the authenticated client to the polling attempt, just a client ID. Would it be too complex to require that the client be authenticated at that point?

(We switched over to holding the regular WG telecon at this point.)

There is an OAuth vulnerability being discussed where an attacker can replay an authorization code of a victim. Is there any analogy to be made here regarding ways to add entropy? In that case, the state parameter is being added to the token endpoint to provide sufficient correlation. If an OAuth client works with multiple AS's, then there's a problem with a mismatch between authorization codes and states. A binding between them is needed. The mitigation in their case is of a different nature than ours, because we don't need that binding semantic.

3. Assemble pros and cons for the mitigations and their "packages"

Mitigation #1 would require the client to hang onto the static ticket plus, sometimes, a new variable code.

Justin has implemented mitigation #2 in code, and so can speak to pros and cons. He found that it made the client simpler because it doesn't have to remember the ticket. Good to know! This particular AS implementation changed only very slightly; this was because it made use of an artifact of how tickets were implemented.
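The difference in client-side bookkeeping between mitigations #1 and #2 can be sketched roughly as follows. This is not Justin's implementation; the class names, method names, and the "ticket" and "code" parameter names are all hypothetical and chosen only to illustrate what the client has to remember in each case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Door1ClientState:
    """Mitigation #1: keep the static ticket plus, sometimes, a code (hypothetical)."""

    ticket: str                 # static for the whole flow
    code: Optional[str] = None  # present only after claims-gathering

    def on_claims_redirect(self, code: str) -> None:
        # The client must hold on to the code alongside the ticket.
        self.code = code

    def rpt_request_params(self) -> dict:
        params = {"ticket": self.ticket}
        if self.code is not None:
            params["code"] = self.code
        return params


@dataclass
class Door2ClientState:
    """Mitigation #2: always use whatever ticket came back last (hypothetical)."""

    ticket: str

    def on_as_response(self, new_ticket: str) -> None:
        # Nothing to correlate: the latest ticket simply replaces the old one.
        self.ticket = new_ticket

    def rpt_request_params(self) -> dict:
        return {"ticket": self.ticket}


door1_state = Door1ClientState(ticket="ticket-0")
door1_before = door1_state.rpt_request_params()
door1_state.on_claims_redirect("code-1")
door1_after = door1_state.rpt_request_params()

door2_state = Door2ClientState(ticket="ticket-0")
door2_state.on_as_response("ticket-1")
door2_after = door2_state.rpt_request_params()
```

Under these assumptions the mitigation #2 client carries a single value that it overwrites, while the mitigation #1 client has two values with different lifetimes to track.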

What does mitigation #3 look like? Would we be able to execute on a comprehensive door #3, which would speak to many additional goals besides mitigating a vulnerability, in good time to mitigate a vulnerability? Justin notes that "door #3" may not be implementable in the real world. :-) There's definitely more exploration to be done. If the RqP has to go through an OAuth flow to reach a page at the AS to get authorized (whatever it is) anyway, then it may be a wash.

Regarding packaging, how does our semver decision impact our choices? Any backwards-incompatible change would force a V2.0. Any new Kantara Recommendation would force an approval cycle that adds about three months if all goes well. But anyone not using the claims-gathering endpoint doesn't need the mitigation; draft technical specifications can be applied before they're fully approved; and extension specifications can be published separately from the main specifications if we want to go that route.

Changing the permission ticket semantics also would involve having the claims-gathering endpoint return the new ticket, and the server would need to advertise that it has endpoints that speak the special semantics as well. We have a way of advertising this in the configuration data (cf. the permission registration extension spec) -- or never signal it here? Justin had also submitted the issue about making the claims-gathering endpoint be able to be more dynamic, with the AS returning it from need_info (issue #167). Justin has in mind a rationale for why it's more secure to have the AS signal the endpoint dynamically, even if it's the same endpoint every time. Eve would be in favor of handling issue #167 in the security extension only if it has a true security rationale, but not if it's just executing a #simplify use case.

Regarding "packaging": What might this mean for our semver decision?

Thinking about who is affected by this vulnerability: Granted that it's a pretty sophisticated phishing attack (not that phishing attacks can't be sophisticated!). The mitigation could reasonably be described as a security enhancement rather than a required fix.

Title of the proposed separate spec: Enhanced Claims-Gathering Security Extension.

At some point, we can slot this into the appropriate UMA version.

We haven't really fully talked through the semver consequences.

(we should add this one too:) 4. How to communicate about this attack and related information
...tbs...

The extension spec publication, likely with just approval by the WG and not full Kantara approval for now, is the way we'd like to propose going.

This concludes the ad hoc meeting series. We'll cancel the 30-minute call scheduled for after the WG call tomorrow, and just present the results of our analysis to the WG during the WG telecon.

Thanks, everybody!