diff mbox series

EAP-TEAP peer: keep inner EAP method when processing Identity method

Message ID 356ec6de-003f-4653-b227-b6672596618c@app.fastmail.com
State Superseded
Headers show
Series EAP-TEAP peer: keep inner EAP method when processing Identity method | expand

Commit Message

Alexander Clouter Nov. 27, 2022, 2:13 p.m. UTC
We need the inner EAP method's MSK/EMSK material to verify/calculate
the Cryptobinding CMACs so do not dispose of them when seeing an
Identity request; this occurs duing EAP sequences (machine+user auth)

Signed-off-by: Alexander Clouter <aclouter@networkradius.com>
---
 src/eap_peer/eap_teap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Jouni Malinen Nov. 27, 2022, 2:47 p.m. UTC | #1
On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote:
> We need the inner EAP method's MSK/EMSK material to verify/calculate
> the Cryptobinding CMACs so do not dispose of them when seeing an
> Identity request; this occurs duing EAP sequences (machine+user auth)

Why would this be needed for the Identity method? It is not an EAP
authentication method and it is not followed by the
Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP
authentication methods would be). Unless I missed something here, this
seems to be related to this errata entry on the RFC 7170:
https://www.rfc-editor.org/errata/eid5767
Alexander Clouter Nov. 27, 2022, 3:21 p.m. UTC | #2
On Sun, 27 Nov 2022, at 14:47, Jouni Malinen wrote:
> On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote:
>> We need the inner EAP method's MSK/EMSK material to verify/calculate
>> the Cryptobinding CMACs so do not dispose of them when seeing an
>> Identity request; this occurs duing EAP sequences (machine+user auth)
>
> Why would this be needed for the Identity method? It is not an EAP
> authentication method and it is not followed by the
> Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP
> authentication methods would be). Unless I missed something here, this
> seems to be related to this errata entry on the RFC 7170:
> https://www.rfc-editor.org/errata/eid5767

When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity.

This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6.

Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain.

Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk().

Cheers
Jouni Malinen Nov. 29, 2022, 9:30 p.m. UTC | #3
On Sun, Nov 27, 2022 at 03:21:11PM +0000, Alexander Clouter wrote:
> On Sun, 27 Nov 2022, at 14:47, Jouni Malinen wrote:
> > On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote:
> >> We need the inner EAP method's MSK/EMSK material to verify/calculate
> >> the Cryptobinding CMACs so do not dispose of them when seeing an
> >> Identity request; this occurs duing EAP sequences (machine+user auth)
> >
> > Why would this be needed for the Identity method? It is not an EAP
> > authentication method and it is not followed by the
> > Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP
> > authentication methods would be). Unless I missed something here, this
> > seems to be related to this errata entry on the RFC 7170:
> > https://www.rfc-editor.org/errata/eid5767

> When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity.

OK, but that Identity-Type TLV is not another instance of Identity
method (behavior of which is what this patch is proposing to change)..

> This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6.

That Appendix C.6 example is explicitly noting that an identity request
is not sent before negotiating EAP-Type=Y.

> Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain.

I'm not completely sure what "replies with a null identity" means in
this context. Are you about EAP Payload TLV, not Identity-Type TLV here?
I.e., there is actually two instances of EAP-Request/Identity within the
tunnel?

> Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk().

Would you be able to share a debug log showing this?

I'm testing this with EAP-MSCHAPv2 for user authentication followed by
EAP-MSCHAPv2 for machine authentication and when the second
EAP-Request/Identity message is processed between those, the IMSK for
the first EAP-MSCHAPv2 has already been fetched and used for CMK
calculation. As such, this works fine with the first phase 2
authentication method being cleared after that point. I do not see how
eap_teap_derive_cmk_basic_pw_auth() would come into picture here since
CMK was already derived before processing the EAP-Request/Identity.

It sounds like there is something else happening in the sequence if
eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe
you are sending out the second EAP-Request/Identity in an
EAP-Payload-TLV before the Crypto-Binding-TLV for the previously
completed EAP authentication method? If so, I would suggest reordering
that (see that C.6 example for an example when moving from EAP-Type X to
Y) so that the crypto binding and CMK derivation is completed prior to
starting any additional EAP methods, including the Identity method.
Alexander Clouter Nov. 30, 2022, 3:53 p.m. UTC | #4
On Tue, 29 Nov 2022, at 21:30, Jouni Malinen wrote:
>> When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity.
>
> OK, but that Identity-Type TLV is not another instance of Identity
> method (behavior of which is what this patch is proposing to change)..

I probably muddled the explanation, sorry.

>> This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6.
>
> That Appendix C.6 example is explicitly noting that an identity request
> is not sent before negotiating EAP-Type=Y.

I thought RFC7170 appendix C.6 did show the second EAP Identity response, but a second non-glancing look found you are right, it does not.

The EAP Identity response though is needed for Win10/11...well it was the only way I could get this working for sequenced authentication.

>> Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain.
>
> I'm not completely sure what "replies with a null identity" means in
> this context. Are you about EAP Payload TLV, not Identity-Type TLV here?
> I.e., there is actually two instances of EAP-Request/Identity within the
> tunnel?

Whilst doing interop with Windows, I noticed that if I did not start the next method in the sequence without EAP Identity (inside the EAP-Payload TLV), Windows replied with a zero byte length identity and refused to go any further.

But, let me go back and reproduce this to refresh my brain, it was a month or two ago when I figured this out.

There are two instances of EAP-Identity in the tunnel.

1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
2. peer<->server: EAP-Payload-TLV[do EAP-<anything>]
3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
5. peer<->server: EAP-Payload-TLV[do EAP-<anything>]
6. server->peer: {Success,Cryptobinding}-TLV
7. server<-peer: {Success,Cryptobinding}-TLV

I will get some PCAPs of this though, this is from memory.

>> Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk().
>
> Would you be able to share a debug log showing this?

Sure, give me a few days to find the time and put back together my rig.

I also submitted patches to Wireshark which made it into version 4 so you get TEAP decoding and inner EAP-TLS decryption. The SSLKEYLOGFILE support I submitted and you merged into hostapd the other month combined with creating an embedded pcapng with the keying material makes things a lot easier to collaborate on.

https://gitlab.com/wireshark/wireshark/-/wikis/TLS#embedding-decryption-secrets-in-a-pcapng-file

You want the debug logs and/or PCAPs here on the list, or offline?

> I'm testing this with EAP-MSCHAPv2 for user authentication followed by
> EAP-MSCHAPv2 for machine authentication and when the second
> EAP-Request/Identity message is processed between those, the IMSK for
> the first EAP-MSCHAPv2 has already been fetched and used for CMK
> calculation. As such, this works fine with the first phase 2
> authentication method being cleared after that point. I do not see how
> eap_teap_derive_cmk_basic_pw_auth() would come into picture here since
> CMK was already derived before processing the EAP-Request/Identity.

The problem is though I got Windows 10/11 working with FreeRADIUS, I needed to get eapol_test invited to the party too.

What got my attention from the eapol_test logs was 'CMK[no-inner-EAP]' after the machine inner EAP-TLS authentication had just completed successfully.

Digging around I  I found the problem seemed to be that data->phase2_method was NULL and a bit more debugging found that it was that EAP identity was blanking it.

Removing the deinit when an EAP-Identity was processed fixed the problem, it made sense to me at the time why too.

Though maybe not the right fix, at least it is better than "does not work", right? :)

I have not looked into or tested using hostapd as a server for sequences against Win10/11.

> It sounds like there is something else happening in the sequence if
> eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe
> you are sending out the second EAP-Request/Identity in an
> EAP-Payload-TLV before the Crypto-Binding-TLV for the previously
> completed EAP authentication method? If so, I would suggest reordering
> that (see that C.6 example for an example when moving from EAP-Type X to
> Y) so that the crypto binding and CMK derivation is completed prior to
> starting any additional EAP methods, including the Identity method.

I do not think this will work, for non-technical reasons.

RFC7170 has only one ordering requirement I could find and that is to process the Identity-Type TLV before the EAP-Payload as it could affect routing.

https://www.rfc-editor.org/rfc/rfc7170#section-4.3

Depending on a state machine implementation of the peer (or vice versa) I can see that a server pushing the Crypto-Binding TLV to the front of the list is helpful but that is definitely not mentioned in the RFC so it would be unsporting to beat people up over this.

If you would prefer to see an ordering fix here, then I think applying a change to hostapd to search and process any Crypto-Binding TLV first, then then Identity-Type TLV and then anything afterwards could work?

Your state machine is nice to work with...be a shame to muck it up with TEAP.

Alan has just posted to EMU that the TEAP RFC really does need to be fixed up:

https://mailarchive.ietf.org/arch/msg/emu/Asg-ramjDWHr67nYVTyhZRJMI9o/

Maybe you want to chip in?

Thanks
Jouni Malinen Nov. 30, 2022, 6:05 p.m. UTC | #5
On Wed, Nov 30, 2022 at 03:53:15PM +0000, Alexander Clouter wrote:
> Whilst doing interop with Windows, I noticed that if I did not start the next method in the sequence without EAP Identity (inside the EAP-Payload TLV), Windows replied with a zero byte length identity and refused to go any further.

Interesting.. Anyway, I did check what I had implemented and I did
include the second EAP-Identity exchange within the tunnel. I guess that
is kind of needed for some EAP methods anyway since there has to be a
username provided for those. This does get a bit interesting with the
cryptobinding requirements, though.

> There are two instances of EAP-Identity in the tunnel.
> 
> 1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
> 2. peer<->server: EAP-Payload-TLV[do EAP-<anything>]
> 3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
> 4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]

(I did realize something about the implementation while writing this
email, but I do not want to lose all the comments here about the
protocol design, so I'm not going to rewrite this part.. Anyway, as a
prior warning, the implementation consideration below is quite different
from what the protocol consideration here starts with..)

I think I'm fine with everything here as long as that list of TLVs is
indeed the sequence in which those TLVs are encoded in the tunnel..

> 5. peer<->server: EAP-Payload-TLV[do EAP-<anything>]
> 6. server->peer: {Success,Cryptobinding}-TLV
> 7. server<-peer: {Success,Cryptobinding}-TLV
> 
> I will get some PCAPs of this though, this is from memory.

eapol_test will also print those things out in debug output and I have
automated hwsim test cases that go through a set of the TEAP
combinations, so I can easily compare the behavior against this type of
list of TLVs.

> You want the debug logs and/or PCAPs here on the list, or offline?

I guess offline might be more convenient if things start getting any
bigger (the list has a pretty small maximum message length anyway).

> The problem is though I got Windows 10/11 working with FreeRADIUS, I needed to get eapol_test invited to the party too.
> 
> What got my attention from the eapol_test logs was 'CMK[no-inner-EAP]' after the machine inner EAP-TLS authentication had just completed successfully.

While I have not yet managed to force hostapd to send the Crypto-Binding
TLV after the second EAP-Request/Identity, I'm pretty sure that is the
difference here between what you see with FreeRADIUS and I see with
hostapd as the TEAP server.

> Digging around I  I found the problem seemed to be that data->phase2_method was NULL and a bit more debugging found that it was that EAP identity was blanking it.
> 
> Removing the deinit when an EAP-Identity was processed fixed the problem, it made sense to me at the time why too.
> 
> Though maybe not the right fix, at least it is better than "does not work", right? :)

Well, the debug log showing the issue would be even better ;-). But yes,
this "CMK[no-inner-EAP]" part is certainly better than just "does not
work".

> > It sounds like there is something else happening in the sequence if
> > eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe
> > you are sending out the second EAP-Request/Identity in an
> > EAP-Payload-TLV before the Crypto-Binding-TLV for the previously
> > completed EAP authentication method? If so, I would suggest reordering
> > that (see that C.6 example for an example when moving from EAP-Type X to
> > Y) so that the crypto binding and CMK derivation is completed prior to
> > starting any additional EAP methods, including the Identity method.
> 
> I do not think this will work, for non-technical reasons.

Interesting.. I'm not sure what those non-technical reasons are, but I
would still recommend completing Crypto-Binding for an EAP
authentication method by writing the TLV for this before starting
another EAP method to keep the TEAP design within the tunnel reasonable.

> RFC7170 has only one ordering requirement I could find and that is to process the Identity-Type TLV before the EAP-Payload as it could affect routing.
> 
> https://www.rfc-editor.org/rfc/rfc7170#section-4.3
> 
> Depending on a state machine implementation of the peer (or vice versa) I can see that a server pushing the Crypto-Binding TLV to the front of the list is helpful but that is definitely not mentioned in the RFC so it would be unsporting to beat people up over this.

I'd agree that this should be clearly defined (and IMHO, required) in
the RFC.

> If you would prefer to see an ordering fix here, then I think applying a change to hostapd to search and process any Crypto-Binding TLV first, then then Identity-Type TLV and then anything afterwards could work?

While that would likely work around this (well, I'd need to see the
debug logs first to confirm the exact sequence of TLVs), I think this is
not exactly clean due to the way TLS tunnels work in general. The part
about being able to determine that you have received a "full message" is
not really supposed to happen since the TLS tunnel is a stream protocol
and there is no explicit beginning and end of a message. It would feel
more logical to process the received TLVs one by one as they are
decrypted from the TLS tunnel rather than force full reassembly of all
TEAP fragments and decryption of all the application data before being
able to process the first TLV that was received.

Sure, in this particular instance use of the encapsulating TEAP with
lock step EAP protocol ends up generating such information about
"message boundaries" at upper layer in the protocol, but this does not
feel correct to use when parsing the TLVs within the tunnel.

All that said, I do realize that RFC 7170 does specify maximum number of
TLVs that can be included in a "Request" or a "Response", so I guess
that is already messing up my model of a stream of TLVs within the
tunnel.

And all that above was based on protocol design part, but if we get back
to the implementation side, I can actually see that I did end up
implementing this in a manner that parses the full set of TLVs from on
TEAP "message" (or whatever those things are called when reassembling
together one or more EAP-Request/TEAP packets).

What I do really understand now is why I ended up processing the
EAP-Payload TLV before Crypto-Binding TLV.. I cannot really think of a
good reason for that now and it was already that way in the first TEAP
commit, so the reason for that did not get documented either. Anyway,
instead of what the patch here proposed to do, the most practical change
could indeed be this reordering of implementation-internal processing
steps (and separately, also get all the requirements spelled out in an
updated RFC). If you have the setup easily ready for testing, please
check with the following applied to eapol_test (all this does is move
that segment of code within a function):


diff --git a/src/eap_peer/eap_teap.c b/src/eap_peer/eap_teap.c
index bc7f6f4f5abe..e8dfc70f6219 100644
--- a/src/eap_peer/eap_teap.c
+++ b/src/eap_peer/eap_teap.c
@@ -1298,6 +1298,33 @@ static int eap_teap_process_decrypted(struct eap_sm *sm,
 		goto done;
 	}
 
+	if (tlv.crypto_binding) {
+		if (tlv.iresult != TEAP_STATUS_SUCCESS &&
+		    tlv.result != TEAP_STATUS_SUCCESS) {
+			wpa_printf(MSG_DEBUG,
+				   "EAP-TEAP: Unexpected Crypto-Binding TLV without Result TLV or Intermediate-Result TLV indicating success");
+			failed = 1;
+			error = TEAP_ERROR_UNEXPECTED_TLVS_EXCHANGED;
+			goto done;
+		}
+
+		tmp = eap_teap_process_crypto_binding(sm, data, ret,
+						      tlv.crypto_binding,
+						      tlv.crypto_binding_len);
+		if (!tmp) {
+			failed = 1;
+			error = TEAP_ERROR_TUNNEL_COMPROMISE_ERROR;
+		} else {
+			resp = wpabuf_concat(resp, tmp);
+			if (tlv.result == TEAP_STATUS_SUCCESS && !failed)
+				data->result_success_done = 1;
+			if (tlv.iresult == TEAP_STATUS_SUCCESS && !failed) {
+				data->inner_method_done = 0;
+				data->iresult_verified = 1;
+			}
+		}
+	}
+
 	if (tlv.identity_type == TEAP_IDENTITY_TYPE_MACHINE) {
 		struct eap_peer_config *config = eap_get_config(sm);
 
@@ -1353,33 +1380,6 @@ static int eap_teap_process_decrypted(struct eap_sm *sm,
 		}
 	}
 
-	if (tlv.crypto_binding) {
-		if (tlv.iresult != TEAP_STATUS_SUCCESS &&
-		    tlv.result != TEAP_STATUS_SUCCESS) {
-			wpa_printf(MSG_DEBUG,
-				   "EAP-TEAP: Unexpected Crypto-Binding TLV without Result TLV or Intermediate-Result TLV indicating success");
-			failed = 1;
-			error = TEAP_ERROR_UNEXPECTED_TLVS_EXCHANGED;
-			goto done;
-		}
-
-		tmp = eap_teap_process_crypto_binding(sm, data, ret,
-						      tlv.crypto_binding,
-						      tlv.crypto_binding_len);
-		if (!tmp) {
-			failed = 1;
-			error = TEAP_ERROR_TUNNEL_COMPROMISE_ERROR;
-		} else {
-			resp = wpabuf_concat(resp, tmp);
-			if (tlv.result == TEAP_STATUS_SUCCESS && !failed)
-				data->result_success_done = 1;
-			if (tlv.iresult == TEAP_STATUS_SUCCESS && !failed) {
-				data->inner_method_done = 0;
-				data->iresult_verified = 1;
-			}
-		}
-	}
-
 	if (data->result_success_done && data->session_ticket_used &&
 	    eap_teap_derive_msk(data) == 0) {
 		/* Assume the server might accept authentication without going
Jouni Malinen Dec. 1, 2022, 4:45 p.m. UTC | #6
On Wed, Nov 30, 2022 at 08:05:44PM +0200, Jouni Malinen wrote:
> On Wed, Nov 30, 2022 at 03:53:15PM +0000, Alexander Clouter wrote:
> > There are two instances of EAP-Identity in the tunnel.
> > 
> > 1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
> > 2. peer<->server: EAP-Payload-TLV[do EAP-<anything>]
> > 3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
> > 4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity]
..

> While I have not yet managed to force hostapd to send the Crypto-Binding
> TLV after the second EAP-Request/Identity, I'm pretty sure that is the
> difference here between what you see with FreeRADIUS and I see with
> hostapd as the TEAP server.

I was able to reproduce this now. I had not used the optimized sequence
within the tunnel by combination start of the next EAP method with the
cryptobinding of the previous one. I implemented that in hostapd and saw
the same issue in wpa_supplicant. This is now fixed in hostap.git using
the changes I described here. This will hopefully work with FreeRADIUS
as well.
diff mbox series

Patch

diff --git a/src/eap_peer/eap_teap.c b/src/eap_peer/eap_teap.c
index 42769eb64..86529f41c 100644
--- a/src/eap_peer/eap_teap.c
+++ b/src/eap_peer/eap_teap.c
@@ -429,7 +429,7 @@  static int eap_teap_phase2_request(struct eap_sm *sm,
 	wpa_printf(MSG_DEBUG, "EAP-TEAP: Phase 2 Request: type=%u:%u",
 		   vendor, method);
 	if (vendor == EAP_VENDOR_IETF && method == EAP_TYPE_IDENTITY) {
-		eap_teap_deinit_inner_eap(sm, data);
+		/* do not deinit the inner EAP method as we need it for the Cryptobinding CMACs */
 		*resp = eap_sm_buildIdentity(sm, hdr->identifier, 1);
 		return 0;
 	}