diff mbox series

umbim: fix invalid mbim message string encoding

Message ID 20220510095553.4056033-1-daniel@dd-wrt.com
State Superseded
Delegated to: Hauke Mehrtens
Headers show
Series umbim: fix invalid mbim message string encoding | expand

Commit Message

Daniel Danzberger May 10, 2022, 9:55 a.m. UTC
Strings in mbim messages have to follow these formatting rules:
 - 4 byte alignment, padded if not.
 - utf-16 little endian.

Fixes:
 - mbim connect fails with more than 1 string parameter (apn/user/pass)
   when they are not 4 byte aligned.

Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com>
---
 mbim-msg.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

Comments

Bjørn Mork May 10, 2022, 10:57 a.m. UTC | #1
Daniel Danzberger <daniel@dd-wrt.com> writes:

> Strings in mbim messages have to follow these formatting rules:
>  - 4 byte alignment, padded if not.

ack. MBIM spec section 10.3:

    All fields (either static or variable size) start at a 4-byte
    boundary. Hence, all field offsets shall be multiples of 4. Between
    the end of a field (not necessarily at a 4-byte boundary) and the
    beginning of the next field (always at a 4-byte boundary), padding
    may be required. Padding bytes must be set to 0 upon transmission
    and ignored upon reception.
    ..
    If the size of the payload in the variable field is not a multiple
    of 4 bytes, the field shall be padded up to the next 4 byte
    multiple. This shall be true even for the last payload in
    DataBuffer.

>  - utf-16 little endian.

ack. MBIM spec section 10.4:

    Unless otherwise specified, all strings use UNICODE UTF-16LE
    encodings limited to characters from the Basic Multilingual
    Plane. Strings shall not be terminated by a NULL character.

> +	/* convert to utf-16 little endian */
>  	for (i = 0; i < l; i++)
> -		p[i * 2] = in[i];
> +		p[i * 2] = htole16(in[i]);
>  
>  	return 0;
>  }


This new code is buggy.  It fails on BE and makes no difference on
LE. Both p and in are byte arrays. The byte you write into p on a BE
system will be aither 0x00 or 0xff depending on the MSB of in[i].

The previous code was sort of correct, assuming that the input is mostly
ascii mapping to the same value in unicode. Ideally we should do real
utf16le conversion of the input. But then someone has to decide if the
input is utf8 or something else first.  And you need a real utf16
implementation.  Not sure it's worth it.  Did you ever see an operator
use a non-ascii APN, username, password or pin code?



Bjørn
Daniel Danzberger May 10, 2022, 11:23 a.m. UTC | #2
On Tue, 2022-05-10 at 12:57 +0200, Bjørn Mork wrote:
>     Unless otherwise specified, all strings use UNICODE UTF-16LE
>     encodings limited to characters from the Basic Multilingual
>     Plane. Strings shall not be terminated by a NULL character.
> 
> > +       /* convert to utf-16 little endian */
> >         for (i = 0; i < l; i++)
> > -               p[i * 2] = in[i];
> > +               p[i * 2] = htole16(in[i]);
> >  
> >         return 0;
> >  }
> 
> 
> This new code is buggy.  It fails on BE and makes no difference on
> LE. Both p and in are byte arrays. The byte you write into p on a BE
> system will be aither 0x00 or 0xff depending on the MSB of in[i].

True, I will remove that le16 conversion part and retest.
> 
> The previous code was sort of correct, assuming that the input is mostly
> ascii mapping to the same value in unicode. Ideally we should do real
> utf16le conversion of the input. But then someone has to decide if the
> input is utf8 or something else first.  And you need a real utf16
> implementation.  Not sure it's worth it.  Did you ever see an operator
> use a non-ascii APN, username, password or pin code?
Agree, it's best for now to just leave it as is.
diff mbox series

Patch

diff --git a/mbim-msg.c b/mbim-msg.c
index 5ec04f4..8465091 100644
--- a/mbim-msg.c
+++ b/mbim-msg.c
@@ -53,8 +53,10 @@  mbim_add_payload(uint8_t len)
 int
 mbim_encode_string(struct mbim_string *str, char *in)
 {
-	int l = strlen(in);
-	int s = mbim_add_payload(l * 2);
+	const int l = strlen(in);
+	const int utf16_len = l * 2;
+	const int pad_len = utf16_len % 4;
+	const int s = mbim_add_payload(utf16_len + pad_len);
 	uint8_t *p = &payload_buffer[s];
 	int i;
 
@@ -62,14 +64,15 @@  mbim_encode_string(struct mbim_string *str, char *in)
 		return -1;
 
 	str->offset = htole32(s);
-	str->length = htole32(l * 2);
+	str->length = htole32(utf16_len);
+
+	/* convert to utf-16 little endian */
 	for (i = 0; i < l; i++)
-		p[i * 2] = in[i];
+		p[i * 2] = htole16(in[i]);
 
 	return 0;
 }
 
-
 char *
 mbim_get_string(struct mbim_string *str, char *in)
 {