diff mbox

[2/2] linux-user: SOCK_PACKET uses network endian to encode protocol in socket()

Message ID 1356982680-12436-3-git-send-email-laurent@vivier.eu
State New
Headers show

Commit Message

Laurent Vivier Dec. 31, 2012, 7:38 p.m. UTC
From: Laurent Vivier <Laurent@Vivier.EU>

in PACKET(7) :
                                 protocol is the  IEEE  802.3  protocol
number in network order.  See the <linux/if_ether.h> include file for a
list of allowed protocols.  When protocol is  set  to  htons(ETH_P_ALL)
then all protocols are received.  All incoming packets of that protocol
type will be passed to the packet socket before they are passed to  the
protocols implemented in the kernel.

Signed-off-by: Laurent Vivier <Laurent@Vivier.EU>
---
 include/exec/user/abitypes.h |   22 ++++++++++++++++++++++
 linux-user/syscall.c         |    8 +++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

Comments

Peter Maydell Dec. 31, 2012, 9:32 p.m. UTC | #1
On 31 December 2012 19:38, Laurent Vivier <laurent@vivier.eu> wrote:
> @@ -1900,6 +1900,12 @@ static abi_long do_socket(int domain, int type, int protocol)
>  #endif
>      if (domain == PF_NETLINK)
>          return -EAFNOSUPPORT; /* do not NETLINK socket connections possible */
> +    if (type == SOCK_PACKET) {
> +        /* in this case, socket() needs a network endian short */
> +        protocol = tswapal(protocol); /* restore network endian long */
> +        protocol = abi_ntohl(protocol); /* a host endian long */
> +        protocol = htons(protocol); /* network endian short */
> +    }

Are you sure this is correct for little endian guests? I've only
desk-checked it rather than running a test program, but it looks
to me like you end up passing the wrong value to socket().

Also it seems rather involved since we swap things three times and
have an entirely new abi_* function. Either I'm completely confused
or it should be enough to just have

if (type == SOCK_PACKET) {
      protocol = tswap16(protocol);
}

-- PMM
Laurent Vivier Dec. 31, 2012, 10:19 p.m. UTC | #2
Le lundi 31 décembre 2012 à 21:32 +0000, Peter Maydell a écrit :
> On 31 December 2012 19:38, Laurent Vivier <laurent@vivier.eu> wrote:
> > @@ -1900,6 +1900,12 @@ static abi_long do_socket(int domain, int type, int protocol)
> >  #endif
> >      if (domain == PF_NETLINK)
> >          return -EAFNOSUPPORT; /* do not NETLINK socket connections possible */
> > +    if (type == SOCK_PACKET) {
> > +        /* in this case, socket() needs a network endian short */
> > +        protocol = tswapal(protocol); /* restore network endian long */
> > +        protocol = abi_ntohl(protocol); /* a host endian long */
> > +        protocol = htons(protocol); /* network endian short */
> > +    }
> 
> Are you sure this is correct for little endian guests? I've only
> desk-checked it rather than running a test program, but it looks
> to me like you end up passing the wrong value to socket().

I tried to find a solution working in every case.

> Also it seems rather involved since we swap things three times and
> have an entirely new abi_* function. Either I'm completely confused
> or it should be enough to just have
> 
> if (type == SOCK_PACKET) {
>       protocol = tswap16(protocol);
> }

works... sometime. In fact, work if target endianess is network endianess.

Correct me if I'm wrong.

target          host
little endian / big endian

memory   00 00 00 03
protocol 03000000
tswap16  00000000 -> don't work

tswapal()   00000003
abi_ntohl() 00000003
htons()     00000003 -> work

big endian / little endian:

memory    00 00 00 03
protocol  00000003
tswap16() 00000300 -> work

tswapal() 03000000
abi_ntohl() 00000003
htons()     00000300 -> work

little endian/little endian:

memory: 00 00 00 03 (network endian)
protocol : 03000000
tswap16() : 00000000 -> don't work

tswapal() 03000000
abi_ntohl() 00000003
htons()     00000300 -> work

big endian / big endian

memory 00 00 00 03
protocol 00000003
tswap16() 00000003 -> work

tswapal() 00000003
abi_ntohl() 00000003
htons()     00000003 -> work

Laurent
Peter Maydell Jan. 1, 2013, 3:03 p.m. UTC | #3
On 31 December 2012 22:19, Laurent Vivier <Laurent@vivier.eu> wrote:
> Le lundi 31 décembre 2012 à 21:32 +0000, Peter Maydell a écrit :
>> Also it seems rather involved since we swap things three times and
>> have an entirely new abi_* function. Either I'm completely confused
>> or it should be enough to just have
>>
>> if (type == SOCK_PACKET) {
>>       protocol = tswap16(protocol);
>> }

Looking more carefully at packet(7) this is actually the wrong
guard anyway. You need to check for
 (domain == AF_PACKET) || (type == SOCK_PACKET)

since SOCK_PACKET is the obsolete Linux 2.0 way of doing packet sockets.

> works... sometime. In fact, work if target endianess is network endianess.
>
> Correct me if I'm wrong.
>
> target          host
> little endian / big endian
>
> memory   00 00 00 03

Syscall arguments aren't generally passed in memory, they're
in registers (and if they were pased in memory for some architecture
then that arch would do a load-and-swap-from-memory in main.c).
So the value you see in do_socket() is always "the integer passed
as a syscall parameter, as a host-order integer".

So in this case, with a simple guest program:
#include <sys/socket.h>
#include <netpacket/packet.h>
#include <net/ethernet.h>
#include <arpa/inet.h>

int main(void) {
   return socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
}

you will find that do_socket() in QEMU is passed either 0x3 [if the
guest is bigendian and the guest htons() is a no-op] or 0x0300
[if the guest is littleendian]. Since what we want to pass to the
host socket() call is 0x3 if the host is bigendian and 0x0300 if
the host is little endian, this amounts to needing to do a 16 bit
byteswap if the host and guest are different endianness, which
is exactly what tswap16() does. I checked with i386-to-i386
that do_socket() gets passed 0x300 and we correctly send it
through to the host socket().

-- PMM
Laurent Vivier Jan. 1, 2013, 5:27 p.m. UTC | #4
Le mardi 01 janvier 2013 à 15:03 +0000, Peter Maydell a écrit :
> On 31 December 2012 22:19, Laurent Vivier <Laurent@vivier.eu> wrote:
> > Le lundi 31 décembre 2012 à 21:32 +0000, Peter Maydell a écrit :
> >> Also it seems rather involved since we swap things three times and
> >> have an entirely new abi_* function. Either I'm completely confused
> >> or it should be enough to just have
> >>
> >> if (type == SOCK_PACKET) {
> >>       protocol = tswap16(protocol);
> >> }
> 
> Looking more carefully at packet(7) this is actually the wrong
> guard anyway. You need to check for
>  (domain == AF_PACKET) || (type == SOCK_PACKET)

I agree.

> since SOCK_PACKET is the obsolete Linux 2.0 way of doing packet sockets.

But dhclient is always using this...

> > works... sometime. In fact, work if target endianess is network endianess.
> >
> > Correct me if I'm wrong.
> >
> > target          host
> > little endian / big endian
> >
> > memory   00 00 00 03
> 
> Syscall arguments aren't generally passed in memory, they're
> in registers (and if they were pased in memory for some architecture
> then that arch would do a load-and-swap-from-memory in main.c).
> So the value you see in do_socket() is always "the integer passed
> as a syscall parameter, as a host-order integer".

Yes, I missed that.

> So in this case, with a simple guest program:
> #include <sys/socket.h>
> #include <netpacket/packet.h>
> #include <net/ethernet.h>
> #include <arpa/inet.h>
> 
> int main(void) {
>    return socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
> }
> 
> you will find that do_socket() in QEMU is passed either 0x3 [if the
> guest is bigendian and the guest htons() is a no-op] or 0x0300
> [if the guest is littleendian]. Since what we want to pass to the
> host socket() call is 0x3 if the host is bigendian and 0x0300 if
> the host is little endian, this amounts to needing to do a 16 bit
> byteswap if the host and guest are different endianness, which
> is exactly what tswap16() does. I checked with i386-to-i386
> that do_socket() gets passed 0x300 and we correctly send it
> through to the host socket().

Yes, I agree. I correct the patch.

Thank you,
Laurent
Laurent Vivier Jan. 1, 2013, 6:37 p.m. UTC | #5
Le mardi 01 janvier 2013 à 18:27 +0100, Laurent Vivier a écrit :
> Le mardi 01 janvier 2013 à 15:03 +0000, Peter Maydell a écrit :
> > On 31 December 2012 22:19, Laurent Vivier <Laurent@vivier.eu> wrote:
> > > works... sometime. In fact, work if target endianess is network endianess.
> > >
> > > Correct me if I'm wrong.
> > >
> > > target          host
> > > little endian / big endian
> > >
> > > memory   00 00 00 03
> > 
> > Syscall arguments aren't generally passed in memory, they're
> > in registers (and if they were pased in memory for some architecture
> > then that arch would do a load-and-swap-from-memory in main.c).
> > So the value you see in do_socket() is always "the integer passed
> > as a syscall parameter, as a host-order integer".
> 
> Yes, I missed that.

But, in fact, for socketcall(), they are read from memory :

        static abi_long do_socketcall(int num, abi_ulong vptr)
        {
            abi_long ret;
            const int n = sizeof(abi_ulong);
        
            switch(num) {
            case SOCKOP_socket:
                {
                    abi_ulong domain, type, protocol;
        
                    if (get_user_ual(domain, vptr)
                        || get_user_ual(type, vptr + n)
                        || get_user_ual(protocol, vptr + 2 * n))
                        return -TARGET_EFAULT;
        
                    ret = do_socket(domain, type, protocol);
                }
                break;


So, I don't know if "tswap16()" is always correct. It works for
m68k-to-x86_64, but I don't understand how it can works for
i386-to-i386.

Your opinion ?

Regards,
Laurent
Peter Maydell Jan. 1, 2013, 7:45 p.m. UTC | #6
On 1 January 2013 18:37, Laurent Vivier <Laurent@vivier.eu> wrote:
> Le mardi 01 janvier 2013 à 18:27 +0100, Laurent Vivier a écrit :
>> Le mardi 01 janvier 2013 à 15:03 +0000, Peter Maydell a écrit :
>> > On 31 December 2012 22:19, Laurent Vivier <Laurent@vivier.eu> wrote:
>> > > works... sometime. In fact, work if target endianess is network endianess.
>> > >
>> > > Correct me if I'm wrong.
>> > >
>> > > target          host
>> > > little endian / big endian
>> > >
>> > > memory   00 00 00 03
>> >
>> > Syscall arguments aren't generally passed in memory, they're
>> > in registers (and if they were pased in memory for some architecture
>> > then that arch would do a load-and-swap-from-memory in main.c).
>> > So the value you see in do_socket() is always "the integer passed
>> > as a syscall parameter, as a host-order integer".
>>
>> Yes, I missed that.
>
> But, in fact, for socketcall(), they are read from memory :

Yes, this is because socketcall is weird. The actual kernel
implementation also reads them from memory:
  http://lxr.linux.no/#linux+v3.7.1/net/socket.c#L2443
as an array of unsigned longs. So as long as qemu also reads
them out of memory as an array of target abi_ulongs (which as
you can see we do) then we'll retrieve the same value (0x3 or
0x300) to pass to do_socket() as the guest program wrote into
its guest view of memory (since it should have written an
unsigned long). (What is happening here is that the guest
binary writes the protocol value to memory as an unsigned
long, so it goes in as 4 bytes in whichever order the guest uses;
qemu's get_user_ual() then rereads those 4 bytes, swapping
the value back so we get the same integer value the guest
program stored. Note that the guest doesn't write the protocol
argument as a 2 byte value!)

I would encourage you to write some simple test programs
and check them using strace (both of the native program and
of qemu running the program).

-- PMM
Laurent Vivier Jan. 1, 2013, 10:12 p.m. UTC | #7
Le mardi 01 janvier 2013 à 19:45 +0000, Peter Maydell a écrit :
> On 1 January 2013 18:37, Laurent Vivier <Laurent@vivier.eu> wrote:
> > Le mardi 01 janvier 2013 à 18:27 +0100, Laurent Vivier a écrit :
> >> Le mardi 01 janvier 2013 à 15:03 +0000, Peter Maydell a écrit :
> >> > On 31 December 2012 22:19, Laurent Vivier <Laurent@vivier.eu> wrote:
> >> > > works... sometime. In fact, work if target endianess is network endianess.
> >> > >
> >> > > Correct me if I'm wrong.
> >> > >
> >> > > target          host
> >> > > little endian / big endian
> >> > >
> >> > > memory   00 00 00 03
> >> >
> >> > Syscall arguments aren't generally passed in memory, they're
> >> > in registers (and if they were pased in memory for some architecture
> >> > then that arch would do a load-and-swap-from-memory in main.c).
> >> > So the value you see in do_socket() is always "the integer passed
> >> > as a syscall parameter, as a host-order integer".
> >>
> >> Yes, I missed that.
> >
> > But, in fact, for socketcall(), they are read from memory :
> 
> Yes, this is because socketcall is weird. The actual kernel
> implementation also reads them from memory:
>   http://lxr.linux.no/#linux+v3.7.1/net/socket.c#L2443
> as an array of unsigned longs. So as long as qemu also reads
> them out of memory as an array of target abi_ulongs (which as
> you can see we do) then we'll retrieve the same value (0x3 or
> 0x300) to pass to do_socket() as the guest program wrote into
> its guest view of memory (since it should have written an
> unsigned long). (What is happening here is that the guest
> binary writes the protocol value to memory as an unsigned
> long, so it goes in as 4 bytes in whichever order the guest uses;
> qemu's get_user_ual() then rereads those 4 bytes, swapping
> the value back so we get the same integer value the guest
> program stored. Note that the guest doesn't write the protocol
> argument as a 2 byte value!)
> 
> I would encourage you to write some simple test programs
> and check them using strace (both of the native program and
> of qemu running the program).

OK, I will... but I think we will fall back to my original patch ;-)

Regards,
Laurent
Peter Maydell Jan. 1, 2013, 10:50 p.m. UTC | #8
On 1 January 2013 22:12, Laurent Vivier <Laurent@vivier.eu> wrote:
> Le mardi 01 janvier 2013 à 19:45 +0000, Peter Maydell a écrit :
>> I would encourage you to write some simple test programs
>> and check them using strace (both of the native program and
>> of qemu running the program).
>
> OK, I will... but I think we will fall back to my original patch ;-)

You'll need to provide test programs which work with that and
fail with the simple tswap16() approach first.

-- PMM
diff mbox

Patch

diff --git a/include/exec/user/abitypes.h b/include/exec/user/abitypes.h
index fe7f662..f4f526a 100644
--- a/include/exec/user/abitypes.h
+++ b/include/exec/user/abitypes.h
@@ -15,6 +15,15 @@  static inline abi_ulong tswapal(abi_ulong v)
     return tswap32(v);
 }
 
+static inline abi_ulong abi_ntohl(abi_ulong v)
+{
+#if defined(HOST_BIG_ENDIAN)
+    return v;
+#else
+    return bswap_32(v);
+#endif
+}
+
 #else
 typedef target_ulong abi_ulong;
 typedef target_long abi_long;
@@ -32,5 +41,18 @@  static inline abi_ulong tswapal(abi_ulong v)
     return tswapl(v);
 }
 
+static inline abi_ulong abi_ntohl(abi_ulong v)
+{
+#if defined(HOST_BIG_ENDIAN)
+    return v;
+#else
+#if TARGET_LONG_SIZE == 4
+    return bswap_32(v);
+#else
+    return bswap_64(v);
+#endif
+#endif
+}
+
 #endif
 #endif
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 000b640..29151a6 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1874,7 +1874,7 @@  static void unlock_iovec(struct iovec *vec, abi_ulong target_addr,
 }
 
 /* do_socket() Must return target values and target errnos. */
-static abi_long do_socket(int domain, int type, int protocol)
+static abi_long do_socket(int domain, int type, abi_ulong protocol)
 {
 #if defined(TARGET_MIPS)
     switch(type) {
@@ -1900,6 +1900,12 @@  static abi_long do_socket(int domain, int type, int protocol)
 #endif
     if (domain == PF_NETLINK)
         return -EAFNOSUPPORT; /* do not NETLINK socket connections possible */
+    if (type == SOCK_PACKET) {
+        /* in this case, socket() needs a network endian short */
+        protocol = tswapal(protocol); /* restore network endian long */
+        protocol = abi_ntohl(protocol); /* a host endian long */
+        protocol = htons(protocol); /* network endian short */
+    }
     return get_errno(socket(domain, type, protocol));
 }