diff mbox series

[PULL,V2,25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly

Message ID 20220720090313.55169-26-jasowang@redhat.com
State New
Headers show
Series [PULL,V2,01/25] vhost: move descriptor translation to vhost_svq_vring_write_descs | expand

Commit Message

Jason Wang July 20, 2022, 9:03 a.m. UTC
From: Zhang Chen <chen.zhang@intel.com>

When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c       | 9 ++++++++-
 net/trace-events | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

Comments

Peter Maydell July 29, 2022, 1:58 p.m. UTC | #1
On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
>
> From: Zhang Chen <chen.zhang@intel.com>
>
> When COLO use only one vnet_hdr_support parameter between
> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> with segmentation fault. Back track as follow:
>
> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> (gdb) bt
> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> net/colo.c:49
> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> net/filter-rewriter.c:63
>
> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
> raise error and add trace-events to track vnet_hdr_len.
>
> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Hi -- Coverity points out that there's a problem with this fix
(CID 1490786):

> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
>      static const uint8_t vlan[] = {0x81, 0x00};
>      uint8_t *data = pkt->data + pkt->vnet_hdr_len;

data here is set to pkt->data + pkt->vnet_hdr_len.
If pkt->data is NULL then this addition is C undefined behaviour,
and if pkt->data is not NULL but the integer added is such
that the pointer ends up not pointing within data, then that
is also C undefined behaviour...

>      uint16_t l3_proto;
> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> +    ssize_t l2hdr_len;
> +
> +    if (data == NULL) {

...so the compiler is allowed to assume that data cannot be NULL
here, and this entire error checking clause is logically dead code.

If you're trying to check whether vnet_hdr_len is valid, you
need to do that as an integer arithmetic check before you start
using it for pointer arithmetic. And you probably want to be
checking against some kind of bound, not just for whether the
result is going to be NULL.

Overall this looks kinda sketchy and could probably use more
detailed code review. Where does the bogus vnet_hdr_len come from in
the first place? Is this data from the guest, or from the network
(ie uncontrolled)?

> +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> +        return 1;
> +    }

thanks
-- PMM
Jason Wang Aug. 1, 2022, 4:17 a.m. UTC | #2
在 2022/7/29 21:58, Peter Maydell 写道:
> On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
>> From: Zhang Chen <chen.zhang@intel.com>
>>
>> When COLO use only one vnet_hdr_support parameter between
>> filter-redirector and filter-mirror(or colo-compare), COLO will crash
>> with segmentation fault. Back track as follow:
>>
>> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
>> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
>> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
>> (gdb) bt
>> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
>>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
>> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
>> net/colo.c:49
>> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
>> net/filter-rewriter.c:63
>>
>> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
>> raise error and add trace-events to track vnet_hdr_len.
>>
>> Signed-off-by: Tao Xu <tao3.xu@intel.com>
>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Hi -- Coverity points out that there's a problem with this fix
> (CID 1490786):
>
>> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
>>       static const uint8_t vlan[] = {0x81, 0x00};
>>       uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> data here is set to pkt->data + pkt->vnet_hdr_len.
> If pkt->data is NULL then this addition is C undefined behaviour,
> and if pkt->data is not NULL but the integer added is such
> that the pointer ends up not pointing within data, then that
> is also C undefined behaviour...


Right. And I don't see how pkt->data can be NULL, maybe a hint of bug 
elsewhere.


>
>>       uint16_t l3_proto;
>> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
>> +    ssize_t l2hdr_len;
>> +
>> +    if (data == NULL) {
> ...so the compiler is allowed to assume that data cannot be NULL
> here, and this entire error checking clause is logically dead code.
>
> If you're trying to check whether vnet_hdr_len is valid, you
> need to do that as an integer arithmetic check before you start
> using it for pointer arithmetic. And you probably want to be
> checking against some kind of bound, not just for whether the
> result is going to be NULL.


Or we can let the caller to validate the Pkt before calling this function.


>
> Overall this looks kinda sketchy and could probably use more
> detailed code review. Where does the bogus vnet_hdr_len come from in
> the first place? Is this data from the guest, or from the network
> (ie uncontrolled)?


If I understand correctly the vnet_hdr_len is set during handshake 
(net_fill_rstate()) which is sent from a network backend.

Chen or Xu, please post a fix and explain what happens in the changelog.

Thanks


>
>> +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
>> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
>> +        return 1;
>> +    }
> thanks
> -- PMM
>
Zhang, Chen Aug. 1, 2022, 5:32 a.m. UTC | #3
> -----Original Message-----
> From: Jason Wang <jasowang@redhat.com>
> Sent: Monday, August 1, 2022 12:18 PM
> To: Zhang, Chen <chen.zhang@intel.com>; Xu, Tao3 <tao3.xu@intel.com>
> Cc: qemu-devel@nongnu.org; Li Zhijian <lizhijian@fujitsu.com>; Peter
> Maydell <peter.maydell@linaro.org>
> Subject: Re: [PULL V2 25/25] net/colo.c: fix segmentation fault when packet
> is not parsed correctly
> 
> 
> 在 2022/7/29 21:58, Peter Maydell 写道:
> > On Wed, 20 Jul 2022 at 10:04, Jason Wang <jasowang@redhat.com> wrote:
> >> From: Zhang Chen <chen.zhang@intel.com>
> >>
> >> When COLO use only one vnet_hdr_support parameter between
> >> filter-redirector and filter-mirror(or colo-compare), COLO will crash
> >> with segmentation fault. Back track as follow:
> >>
> >> Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation
> fault.
> >> 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> >>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> >> 296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
> >> (gdb) bt
> >> 0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
> >>      at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
> >> 1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
> >> net/colo.c:49
> >> 2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
> >> net/filter-rewriter.c:63
> >>
> >> So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
> >> raise error and add trace-events to track vnet_hdr_len.
> >>
> >> Signed-off-by: Tao Xu <tao3.xu@intel.com>
> >> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> >> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > Hi -- Coverity points out that there's a problem with this fix (CID
> > 1490786):
> >
> >> @@ -46,7 +46,14 @@ int parse_packet_early(Packet *pkt)
> >>       static const uint8_t vlan[] = {0x81, 0x00};
> >>       uint8_t *data = pkt->data + pkt->vnet_hdr_len;
> > data here is set to pkt->data + pkt->vnet_hdr_len.
> > If pkt->data is NULL then this addition is C undefined behaviour, and
> > if pkt->data is not NULL but the integer added is such that the
> > pointer ends up not pointing within data, then that is also C
> > undefined behaviour...
> 
> 
> Right. And I don't see how pkt->data can be NULL, maybe a hint of bug
> elsewhere.
> 
> 
> >
> >>       uint16_t l3_proto;
> >> -    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
> >> +    ssize_t l2hdr_len;
> >> +
> >> +    if (data == NULL) {
> > ...so the compiler is allowed to assume that data cannot be NULL
> > here, and this entire error checking clause is logically dead code.
> >
> > If you're trying to check whether vnet_hdr_len is valid, you
> > need to do that as an integer arithmetic check before you start
> > using it for pointer arithmetic. And you probably want to be
> > checking against some kind of bound, not just for whether the
> > result is going to be NULL.
> 
> 
> Or we can let the caller to validate the Pkt before calling this function.
> 
> 
> >
> > Overall this looks kinda sketchy and could probably use more
> > detailed code review. Where does the bogus vnet_hdr_len come from in
> > the first place? Is this data from the guest, or from the network
> > (ie uncontrolled)?
> 
> 
> If I understand correctly the vnet_hdr_len is set during handshake
> (net_fill_rstate()) which is sent from a network backend.
> 
> Chen or Xu, please post a fix and explain what happens in the changelog.

OK, we will send a patch to fix it.

Thanks
Chen

> 
> Thanks
> 
> 
> >
> >> +        trace_colo_proxy_main_vnet_info("This packet is not parsed
> correctly, "
> >> +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
> >> +        return 1;
> >> +    }
> > thanks
> > -- PMM
> >
diff mbox series

Patch

diff --git a/net/colo.c b/net/colo.c
index 694f3c9..6b0ff56 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -46,7 +46,14 @@  int parse_packet_early(Packet *pkt)
     static const uint8_t vlan[] = {0x81, 0x00};
     uint8_t *data = pkt->data + pkt->vnet_hdr_len;
     uint16_t l3_proto;
-    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+    ssize_t l2hdr_len;
+
+    if (data == NULL) {
+        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
+                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
+        return 1;
+    }
+    l2hdr_len = eth_get_l2_hdr_length(data);
 
     if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < ETH_HLEN");
diff --git a/net/trace-events b/net/trace-events
index d7a1725..6af927b 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -9,6 +9,7 @@  vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
 
 # colo.c
 colo_proxy_main(const char *chr) ": %s"
+colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
 
 # colo-compare.c
 colo_compare_main(const char *chr) ": %s"