Message ID | 356ec6de-003f-4653-b227-b6672596618c@app.fastmail.com |
---|---|
State | Superseded |
Headers | show |
Series | EAP-TEAP peer: keep inner EAP method when processing Identity method | expand |
On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote: > We need the inner EAP method's MSK/EMSK material to verify/calculate > the Cryptobinding CMACs so do not dispose of them when seeing an > Identity request; this occurs duing EAP sequences (machine+user auth) Why would this be needed for the Identity method? It is not an EAP authentication method and it is not followed by the Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP authentication methods would be). Unless I missed something here, this seems to be related to this errata entry on the RFC 7170: https://www.rfc-editor.org/errata/eid5767
On Sun, 27 Nov 2022, at 14:47, Jouni Malinen wrote: > On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote: >> We need the inner EAP method's MSK/EMSK material to verify/calculate >> the Cryptobinding CMACs so do not dispose of them when seeing an >> Identity request; this occurs duing EAP sequences (machine+user auth) > > Why would this be needed for the Identity method? It is not an EAP > authentication method and it is not followed by the > Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP > authentication methods would be). Unless I missed something here, this > seems to be related to this errata entry on the RFC 7170: > https://www.rfc-editor.org/errata/eid5767 When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity. This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6. Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain. Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk(). Cheers
On Sun, Nov 27, 2022 at 03:21:11PM +0000, Alexander Clouter wrote: > On Sun, 27 Nov 2022, at 14:47, Jouni Malinen wrote: > > On Sun, Nov 27, 2022 at 02:13:33PM +0000, Alexander Clouter wrote: > >> We need the inner EAP method's MSK/EMSK material to verify/calculate > >> the Cryptobinding CMACs so do not dispose of them when seeing an > >> Identity request; this occurs duing EAP sequences (machine+user auth) > > > > Why would this be needed for the Identity method? It is not an EAP > > authentication method and it is not followed by the > > Intermediate-Result/Crypto-Binding exchange (unlike the actual EAP > > authentication methods would be). Unless I missed something here, this > > seems to be related to this errata entry on the RFC 7170: > > https://www.rfc-editor.org/errata/eid5767 > When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity. OK, but that Identity-Type TLV is not another instance of Identity method (behavior of which is what this patch is proposing to change).. > This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6. That Appendix C.6 example is explicitly noting that an identity request is not sent before negotiating EAP-Type=Y. > Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain. I'm not completely sure what "replies with a null identity" means in this context. Are you about EAP Payload TLV, not Identity-Type TLV here? I.e., there is actually two instances of EAP-Request/Identity within the tunnel? > Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk(). Would you be able to share a debug log showing this? I'm testing this with EAP-MSCHAPv2 for user authentication followed by EAP-MSCHAPv2 for machine authentication and when the second EAP-Request/Identity message is processed between those, the IMSK for the first EAP-MSCHAPv2 has already been fetched and used for CMK calculation. As such, this works fine with the first phase 2 authentication method being cleared after that point. I do not see how eap_teap_derive_cmk_basic_pw_auth() would come into picture here since CMK was already derived before processing the EAP-Request/Identity. It sounds like there is something else happening in the sequence if eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe you are sending out the second EAP-Request/Identity in an EAP-Payload-TLV before the Crypto-Binding-TLV for the previously completed EAP authentication method? If so, I would suggest reordering that (see that C.6 example for an example when moving from EAP-Type X to Y) so that the crypto binding and CMK derivation is completed prior to starting any additional EAP methods, including the Identity method.
On Tue, 29 Nov 2022, at 21:30, Jouni Malinen wrote: >> When EAP chaining (eg. EAP-TLS[machine] followed by EAP-MSCHAPv2[user]) in TEAP, the server sends to the client (along with the cryptobinding and interediate-status TLVs) an EAP Identity-Type to ask the peer for the correct type of identity. > > OK, but that Identity-Type TLV is not another instance of Identity > method (behavior of which is what this patch is proposing to change).. I probably muddled the explanation, sorry. >> This is something that is described in RFC7170, in section 4.2.3, 4.3 and in particular is shown in appendix C.6. > > That Appendix C.6 example is explicitly noting that an identity request > is not sent before negotiating EAP-Type=Y. I thought RFC7170 appendix C.6 did show the second EAP Identity response, but a second non-glancing look found you are right, it does not. The EAP Identity response though is needed for Win10/11...well it was the only way I could get this working for sequenced authentication. >> Windows 11 needs to be prompted on what to use, otherwise it replies with a null identity which opens a whole world of pain. > > I'm not completely sure what "replies with a null identity" means in > this context. Are you about EAP Payload TLV, not Identity-Type TLV here? > I.e., there is actually two instances of EAP-Request/Identity within the > tunnel? Whilst doing interop with Windows, I noticed that if I did not start the next method in the sequence without EAP Identity (inside the EAP-Payload TLV), Windows replied with a zero byte length identity and refused to go any further. But, let me go back and reproduce this to refresh my brain, it was a month or two ago when I figured this out. There are two instances of EAP-Identity in the tunnel. 1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] 2. peer<->server: EAP-Payload-TLV[do EAP-<anything>] 3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] 4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] 5. peer<->server: EAP-Payload-TLV[do EAP-<anything>] 6. server->peer: {Success,Cryptobinding}-TLV 7. server<-peer: {Success,Cryptobinding}-TLV I will get some PCAPs of this though, this is from memory. >> Back to hostapd though, the presence of that second inner EAP Identity causes it to remove reference to the phase2 method just completed (ie. EAP-TLS) and so results in the fall back onto using eap_teap_derive_cmk_basic_pw_auth() when calculating the CMK in eap_teap.c:eap_teap_get_cmk(). > > Would you be able to share a debug log showing this? Sure, give me a few days to find the time and put back together my rig. I also submitted patches to Wireshark which made it into version 4 so you get TEAP decoding and inner EAP-TLS decryption. The SSLKEYLOGFILE support I submitted and you merged into hostapd the other month combined with creating an embedded pcapng with the keying material makes things a lot easier to collaborate on. https://gitlab.com/wireshark/wireshark/-/wikis/TLS#embedding-decryption-secrets-in-a-pcapng-file You want the debug logs and/or PCAPs here on the list, or offline? > I'm testing this with EAP-MSCHAPv2 for user authentication followed by > EAP-MSCHAPv2 for machine authentication and when the second > EAP-Request/Identity message is processed between those, the IMSK for > the first EAP-MSCHAPv2 has already been fetched and used for CMK > calculation. As such, this works fine with the first phase 2 > authentication method being cleared after that point. I do not see how > eap_teap_derive_cmk_basic_pw_auth() would come into picture here since > CMK was already derived before processing the EAP-Request/Identity. The problem is though I got Windows 10/11 working with FreeRADIUS, I needed to get eapol_test invited to the party too. What got my attention from the eapol_test logs was 'CMK[no-inner-EAP]' after the machine inner EAP-TLS authentication had just completed successfully. Digging around I I found the problem seemed to be that data->phase2_method was NULL and a bit more debugging found that it was that EAP identity was blanking it. Removing the deinit when an EAP-Identity was processed fixed the problem, it made sense to me at the time why too. Though maybe not the right fix, at least it is better than "does not work", right? :) I have not looked into or tested using hostapd as a server for sequences against Win10/11. > It sounds like there is something else happening in the sequence if > eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe > you are sending out the second EAP-Request/Identity in an > EAP-Payload-TLV before the Crypto-Binding-TLV for the previously > completed EAP authentication method? If so, I would suggest reordering > that (see that C.6 example for an example when moving from EAP-Type X to > Y) so that the crypto binding and CMK derivation is completed prior to > starting any additional EAP methods, including the Identity method. I do not think this will work, for non-technical reasons. RFC7170 has only one ordering requirement I could find and that is to process the Identity-Type TLV before the EAP-Payload as it could affect routing. https://www.rfc-editor.org/rfc/rfc7170#section-4.3 Depending on a state machine implementation of the peer (or vice versa) I can see that a server pushing the Crypto-Binding TLV to the front of the list is helpful but that is definitely not mentioned in the RFC so it would be unsporting to beat people up over this. If you would prefer to see an ordering fix here, then I think applying a change to hostapd to search and process any Crypto-Binding TLV first, then then Identity-Type TLV and then anything afterwards could work? Your state machine is nice to work with...be a shame to muck it up with TEAP. Alan has just posted to EMU that the TEAP RFC really does need to be fixed up: https://mailarchive.ietf.org/arch/msg/emu/Asg-ramjDWHr67nYVTyhZRJMI9o/ Maybe you want to chip in? Thanks
On Wed, Nov 30, 2022 at 03:53:15PM +0000, Alexander Clouter wrote: > Whilst doing interop with Windows, I noticed that if I did not start the next method in the sequence without EAP Identity (inside the EAP-Payload TLV), Windows replied with a zero byte length identity and refused to go any further. Interesting.. Anyway, I did check what I had implemented and I did include the second EAP-Identity exchange within the tunnel. I guess that is kind of needed for some EAP methods anyway since there has to be a username provided for those. This does get a bit interesting with the cryptobinding requirements, though. > There are two instances of EAP-Identity in the tunnel. > > 1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] > 2. peer<->server: EAP-Payload-TLV[do EAP-<anything>] > 3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] > 4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] (I did realize something about the implementation while writing this email, but I do not want to lose all the comments here about the protocol design, so I'm not going to rewrite this part.. Anyway, as a prior warning, the implementation consideration below is quite different from what the protocol consideration here starts with..) I think I'm fine with everything here as long as that list of TLVs is indeed the sequence in which those TLVs are encoded in the tunnel.. > 5. peer<->server: EAP-Payload-TLV[do EAP-<anything>] > 6. server->peer: {Success,Cryptobinding}-TLV > 7. server<-peer: {Success,Cryptobinding}-TLV > > I will get some PCAPs of this though, this is from memory. eapol_test will also print those things out in debug output and I have automated hwsim test cases that go through a set of the TEAP combinations, so I can easily compare the behavior against this type of list of TLVs. > You want the debug logs and/or PCAPs here on the list, or offline? I guess offline might be more convenient if things start getting any bigger (the list has a pretty small maximum message length anyway). > The problem is though I got Windows 10/11 working with FreeRADIUS, I needed to get eapol_test invited to the party too. > > What got my attention from the eapol_test logs was 'CMK[no-inner-EAP]' after the machine inner EAP-TLS authentication had just completed successfully. While I have not yet managed to force hostapd to send the Crypto-Binding TLV after the second EAP-Request/Identity, I'm pretty sure that is the difference here between what you see with FreeRADIUS and I see with hostapd as the TEAP server. > Digging around I I found the problem seemed to be that data->phase2_method was NULL and a bit more debugging found that it was that EAP identity was blanking it. > > Removing the deinit when an EAP-Identity was processed fixed the problem, it made sense to me at the time why too. > > Though maybe not the right fix, at least it is better than "does not work", right? :) Well, the debug log showing the issue would be even better ;-). But yes, this "CMK[no-inner-EAP]" part is certainly better than just "does not work". > > It sounds like there is something else happening in the sequence if > > eap_teap_get_cmk() is reached with data->phase2_method == NULL.. Maybe > > you are sending out the second EAP-Request/Identity in an > > EAP-Payload-TLV before the Crypto-Binding-TLV for the previously > > completed EAP authentication method? If so, I would suggest reordering > > that (see that C.6 example for an example when moving from EAP-Type X to > > Y) so that the crypto binding and CMK derivation is completed prior to > > starting any additional EAP methods, including the Identity method. > > I do not think this will work, for non-technical reasons. Interesting.. I'm not sure what those non-technical reasons are, but I would still recommend completing Crypto-Binding for an EAP authentication method by writing the TLV for this before starting another EAP method to keep the TEAP design within the tunnel reasonable. > RFC7170 has only one ordering requirement I could find and that is to process the Identity-Type TLV before the EAP-Payload as it could affect routing. > > https://www.rfc-editor.org/rfc/rfc7170#section-4.3 > > Depending on a state machine implementation of the peer (or vice versa) I can see that a server pushing the Crypto-Binding TLV to the front of the list is helpful but that is definitely not mentioned in the RFC so it would be unsporting to beat people up over this. I'd agree that this should be clearly defined (and IMHO, required) in the RFC. > If you would prefer to see an ordering fix here, then I think applying a change to hostapd to search and process any Crypto-Binding TLV first, then then Identity-Type TLV and then anything afterwards could work? While that would likely work around this (well, I'd need to see the debug logs first to confirm the exact sequence of TLVs), I think this is not exactly clean due to the way TLS tunnels work in general. The part about being able to determine that you have received a "full message" is not really supposed to happen since the TLS tunnel is a stream protocol and there is no explicit beginning and end of a message. It would feel more logical to process the received TLVs one by one as they are decrypted from the TLS tunnel rather than force full reassembly of all TEAP fragments and decryption of all the application data before being able to process the first TLV that was received. Sure, in this particular instance use of the encapsulating TEAP with lock step EAP protocol ends up generating such information about "message boundaries" at upper layer in the protocol, but this does not feel correct to use when parsing the TLVs within the tunnel. All that said, I do realize that RFC 7170 does specify maximum number of TLVs that can be included in a "Request" or a "Response", so I guess that is already messing up my model of a stream of TLVs within the tunnel. And all that above was based on protocol design part, but if we get back to the implementation side, I can actually see that I did end up implementing this in a manner that parses the full set of TLVs from on TEAP "message" (or whatever those things are called when reassembling together one or more EAP-Request/TEAP packets). What I do really understand now is why I ended up processing the EAP-Payload TLV before Crypto-Binding TLV.. I cannot really think of a good reason for that now and it was already that way in the first TEAP commit, so the reason for that did not get documented either. Anyway, instead of what the patch here proposed to do, the most practical change could indeed be this reordering of implementation-internal processing steps (and separately, also get all the requirements spelled out in an updated RFC). If you have the setup easily ready for testing, please check with the following applied to eapol_test (all this does is move that segment of code within a function): diff --git a/src/eap_peer/eap_teap.c b/src/eap_peer/eap_teap.c index bc7f6f4f5abe..e8dfc70f6219 100644 --- a/src/eap_peer/eap_teap.c +++ b/src/eap_peer/eap_teap.c @@ -1298,6 +1298,33 @@ static int eap_teap_process_decrypted(struct eap_sm *sm, goto done; } + if (tlv.crypto_binding) { + if (tlv.iresult != TEAP_STATUS_SUCCESS && + tlv.result != TEAP_STATUS_SUCCESS) { + wpa_printf(MSG_DEBUG, + "EAP-TEAP: Unexpected Crypto-Binding TLV without Result TLV or Intermediate-Result TLV indicating success"); + failed = 1; + error = TEAP_ERROR_UNEXPECTED_TLVS_EXCHANGED; + goto done; + } + + tmp = eap_teap_process_crypto_binding(sm, data, ret, + tlv.crypto_binding, + tlv.crypto_binding_len); + if (!tmp) { + failed = 1; + error = TEAP_ERROR_TUNNEL_COMPROMISE_ERROR; + } else { + resp = wpabuf_concat(resp, tmp); + if (tlv.result == TEAP_STATUS_SUCCESS && !failed) + data->result_success_done = 1; + if (tlv.iresult == TEAP_STATUS_SUCCESS && !failed) { + data->inner_method_done = 0; + data->iresult_verified = 1; + } + } + } + if (tlv.identity_type == TEAP_IDENTITY_TYPE_MACHINE) { struct eap_peer_config *config = eap_get_config(sm); @@ -1353,33 +1380,6 @@ static int eap_teap_process_decrypted(struct eap_sm *sm, } } - if (tlv.crypto_binding) { - if (tlv.iresult != TEAP_STATUS_SUCCESS && - tlv.result != TEAP_STATUS_SUCCESS) { - wpa_printf(MSG_DEBUG, - "EAP-TEAP: Unexpected Crypto-Binding TLV without Result TLV or Intermediate-Result TLV indicating success"); - failed = 1; - error = TEAP_ERROR_UNEXPECTED_TLVS_EXCHANGED; - goto done; - } - - tmp = eap_teap_process_crypto_binding(sm, data, ret, - tlv.crypto_binding, - tlv.crypto_binding_len); - if (!tmp) { - failed = 1; - error = TEAP_ERROR_TUNNEL_COMPROMISE_ERROR; - } else { - resp = wpabuf_concat(resp, tmp); - if (tlv.result == TEAP_STATUS_SUCCESS && !failed) - data->result_success_done = 1; - if (tlv.iresult == TEAP_STATUS_SUCCESS && !failed) { - data->inner_method_done = 0; - data->iresult_verified = 1; - } - } - } - if (data->result_success_done && data->session_ticket_used && eap_teap_derive_msk(data) == 0) { /* Assume the server might accept authentication without going
On Wed, Nov 30, 2022 at 08:05:44PM +0200, Jouni Malinen wrote: > On Wed, Nov 30, 2022 at 03:53:15PM +0000, Alexander Clouter wrote: > > There are two instances of EAP-Identity in the tunnel. > > > > 1. server->peer: Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] > > 2. peer<->server: EAP-Payload-TLV[do EAP-<anything>] > > 3. server->peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] > > 4. server<-peer: {Intermediate-Success,Cryptobinding}-TLV + Identity-Type-TLV + EAP-Payload-TLV[EAP-Identity] .. > While I have not yet managed to force hostapd to send the Crypto-Binding > TLV after the second EAP-Request/Identity, I'm pretty sure that is the > difference here between what you see with FreeRADIUS and I see with > hostapd as the TEAP server. I was able to reproduce this now. I had not used the optimized sequence within the tunnel by combination start of the next EAP method with the cryptobinding of the previous one. I implemented that in hostapd and saw the same issue in wpa_supplicant. This is now fixed in hostap.git using the changes I described here. This will hopefully work with FreeRADIUS as well.
diff --git a/src/eap_peer/eap_teap.c b/src/eap_peer/eap_teap.c index 42769eb64..86529f41c 100644 --- a/src/eap_peer/eap_teap.c +++ b/src/eap_peer/eap_teap.c @@ -429,7 +429,7 @@ static int eap_teap_phase2_request(struct eap_sm *sm, wpa_printf(MSG_DEBUG, "EAP-TEAP: Phase 2 Request: type=%u:%u", vendor, method); if (vendor == EAP_VENDOR_IETF && method == EAP_TYPE_IDENTITY) { - eap_teap_deinit_inner_eap(sm, data); + /* do not deinit the inner EAP method as we need it for the Cryptobinding CMACs */ *resp = eap_sm_buildIdentity(sm, hdr->identifier, 1); return 0; }
We need the inner EAP method's MSK/EMSK material to verify/calculate the Cryptobinding CMACs so do not dispose of them when seeing an Identity request; this occurs duing EAP sequences (machine+user auth) Signed-off-by: Alexander Clouter <aclouter@networkradius.com> --- src/eap_peer/eap_teap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)