Patchwork [RFC RDMA support v2 4/6] initialize RDMA options when QEMU first runs on command-line

Submitter mrhines@linux.vnet.ibm.com
Date Feb. 11, 2013, 10:49 p.m.
Message ID <1360622997-26904-4-git-send-email-mrhines@linux.vnet.ibm.com>
Permalink /patch/219689/
State New
Headers show

Comments

mrhines@linux.vnet.ibm.com - Feb. 11, 2013, 10:49 p.m.
From: "Michael R. Hines" <mrhines@us.ibm.com>


Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
---
 exec.c |   27 +++++++++++++++++++++++++++
 vl.c   |   13 +++++++++++++
 2 files changed, 40 insertions(+)
Paolo Bonzini - Feb. 18, 2013, 10:37 a.m.
Il 11/02/2013 23:49, Michael R. Hines ha scritto:
> From: "Michael R. Hines" <mrhines@us.ibm.com>
> 
> 
> Signed-off-by: Michael R. Hines <mrhines@us.ibm.com>
> ---
>  exec.c |   27 +++++++++++++++++++++++++++
>  vl.c   |   13 +++++++++++++
>  2 files changed, 40 insertions(+)
> 
> diff --git a/exec.c b/exec.c
> index b85508b..b7ac6fa 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -25,6 +25,8 @@
>  #endif
>  
>  #include "qemu-common.h"
> +#include "qemu/rdma.h"
> +#include "monitor/monitor.h"
>  #include "cpu.h"
>  #include "tcg.h"
>  #include "hw/hw.h"
> @@ -104,6 +106,31 @@ static MemoryRegion io_mem_watch;
>  
>  #if !defined(CONFIG_USER_ONLY)
>  
> +/*
> + * Memory regions need to be registered with the device and queue pairs set up
> + * in advance before the migration starts. This tells us where the RAM blocks
> + * are so that we can register them individually.
> + */
> +int rdma_init_ram_blocks(struct rdma_ram_blocks *rdma_ram_blocks)
> +{
> +    RAMBlock *block;
> +    int num_blocks = 0;
> +
> +    memset(rdma_ram_blocks, 0, sizeof *rdma_ram_blocks);
> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> +        if (num_blocks >= RDMA_MAX_RAM_BLOCKS) {
> +                return -1;
> +        }
> +        rdma_ram_blocks->block[num_blocks].local_host_addr = block->host;
> +        rdma_ram_blocks->block[num_blocks].offset = (uint64_t)block->offset;
> +        rdma_ram_blocks->block[num_blocks].length = (uint64_t)block->length;
> +        num_blocks++;
> +    }
> +    rdma_ram_blocks->num_blocks = num_blocks;
> +
> +    return 0;
> +}

Memory regions are not static data, so you have to do this at the time
migration starts.

For the RDMA-impaired among us, why do you need a separate host+port?
Can it be the same by default, and if it is different you can then
specify it like

    rdma://host:port/?rdmahost=HOST&rdmaport=PORT

Paolo
mrhines@linux.vnet.ibm.com - Feb. 19, 2013, 6 a.m.
Yes, this is done at migration time (see the functions
rdma_client_init() and rdma_server_prepare()).

To explain the host and port:

The separate host and port are used by the librdmacm library. This
library translates an IP address and port into the unique InfiniBand
user-level port number and the physical interface that has the RDMA
capabilities. To work, the library requires an IP address and port
bound specifically to the requested RDMA interface.

The patch does not assume that the network interface used for TCP 
traffic will necessarily be the same as the interface used for RDMA traffic.

Alternatively, this host and port could be specified using the QMP 
"migrate" command, but this command already has the URI for the TCP side 
of things reserved.

If you guys like, we could specify a *second* URI on the QMP command 
line - we don't really have a preference.

Either way is fine; whatever the consensus is.

- Michael

On 02/18/2013 05:37 AM, Paolo Bonzini wrote:
> Il 11/02/2013 23:49, Michael R. Hines ha scritto:
>> +/*
>> + * Memory regions need to be registered with the device and queue pairs set up
>> + * in advance before the migration starts. This tells us where the RAM blocks
>> + * are so that we can register them individually.
>> + */
>> +int rdma_init_ram_blocks(struct rdma_ram_blocks *rdma_ram_blocks)
>> +{
>> +    RAMBlock *block;
>> +    int num_blocks = 0;
>> +
>> +    memset(rdma_ram_blocks, 0, sizeof *rdma_ram_blocks);
>> +    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
>> +        if (num_blocks >= RDMA_MAX_RAM_BLOCKS) {
>> +                return -1;
>> +        }
>> +        rdma_ram_blocks->block[num_blocks].local_host_addr = block->host;
>> +        rdma_ram_blocks->block[num_blocks].offset = (uint64_t)block->offset;
>> +        rdma_ram_blocks->block[num_blocks].length = (uint64_t)block->length;
>> +        num_blocks++;
>> +    }
>> +    rdma_ram_blocks->num_blocks = num_blocks;
>> +
>> +    return 0;
>> +}
> Memory regions are not static data, so you have to do this at the time
> migration starts.
>
> For the RDMA-impaired among us, why do you need a separate host+port?
> Can it be the same by default, and if it is different you can then
> specify it like
>
>      rdma://host:port/?rdmahost=HOST&rdmaport=PORT
>
> Paolo
>
Paolo Bonzini - Feb. 19, 2013, 8:42 a.m.
Il 19/02/2013 07:00, Michael R. Hines ha scritto:
> Yes, this is done at migration time (see functions "rdma_client_init"
> and "rdma_server_prepare()")
> 
> To explain the host and port:
> 
> The separate host and port are used by the library "librdmacm". This
> library performs a network translation between the IP address and a
> unique infiniband user-level Port number and the physical interface that
> has the RDMA capabilities. This library requires an IP address and port
> bound specifically to the requested RDMA interface to work.
> 
> The patch does not assume that the network interface used for TCP
> traffic will necessarily be the same as the interface used for RDMA
> traffic.

Of course the best thing to do would be to have all traffic on the RDMA
interface... :)

Paolo

> Alternatively, this host and port could be specified using the QMP
> "migrate" command, but this command already has the URI for the TCP side
> of things reserved.
> 
> If you guys like, we could specify a *second* URI on the QMP command
> line - we don't really have a preference.
> 
> Either way is fine........ whatever the consensus is.
> 
> - Michael
Michael S. Tsirkin - Feb. 21, 2013, 8:12 p.m.
On Tue, Feb 19, 2013 at 09:42:45AM +0100, Paolo Bonzini wrote:
> Il 19/02/2013 07:00, Michael R. Hines ha scritto:
> > Yes, this is done at migration time (see functions "rdma_client_init"
> > and "rdma_server_prepare()")
> > 
> > To explain the host and port:
> > 
> > The separate host and port are used by the library "librdmacm". This
> > library performs a network translation between the IP address and a
> > unique infiniband user-level Port number and the physical interface that
> > has the RDMA capabilities. This library requires an IP address and port
> > bound specifically to the requested RDMA interface to work.
> > 
> > The patch does not assume that the network interface used for TCP
> > traffic will necessarily be the same as the interface used for RDMA
> > traffic.
> 
> Of course the best thing to do would be to have all traffic on the RDMA
> interface... :)
> 
> Paolo

You can't do this with InfiniBand; RDMA is only possible once the
connection is established.


> > Alternatively, this host and port could be specified using the QMP
> > "migrate" command, but this command already has the URI for the TCP side
> > of things reserved.
> > 
> > If you guys like, we could specify a *second* URI on the QMP command
> > line - we don't really have a preference.
> > 
> > Either way is fine........ whatever the consensus is.
> > 
> > - Michael
>
Paolo Bonzini - March 6, 2013, 10:10 a.m.
> On Tue, Feb 19, 2013 at 09:42:45AM +0100, Paolo Bonzini wrote:
> > Il 19/02/2013 07:00, Michael R. Hines ha scritto:
> > > Yes, this is done at migration time (see functions
> > > "rdma_client_init"
> > > and "rdma_server_prepare()")
> > > 
> > > To explain the host and port:
> > > 
> > > The separate host and port are used by the library "librdmacm".  This
> > > library performs a network translation between the IP address and a
> > > unique infiniband user-level Port number and the physical
> > > interface that has the RDMA capabilities. This library requires an
> > > IP address and port bound specifically to the requested RDMA interface
> > > to work.
> > > 
> > > The patch does not assume that the network interface used for TCP
> > > traffic will necessarily be the same as the interface used for
> > > RDMA traffic.
> > 
> > Of course the best thing to do would be to have all traffic on the
> > RDMA interface... :)
> 
> You can't do this with infiniband, RDMA is only possible once the
> connection is established.

Sorry, I meant on the infiniband interface.

Right now Michael (Hines)'s code needs two sockets, one for TCP and
one for RDMA.  If I understand correctly, the rdmacm library does not
need a separate address to set up the connection, that's just an
artifact of the implementation.

Whatever goes on in the TCP socket can be done on RDMA after establishing
the connection, or can be done with SEND.

Paolo

> 
> > > Alternatively, this host and port could be specified using the
> > > QMP
> > > "migrate" command, but this command already has the URI for the
> > > TCP side
> > > of things reserved.
> > > 
> > > If you guys like, we could specify a *second* URI on the QMP
> > > command
> > > line - we don't really have a preference.
> > > 
> > > Either way is fine........ whatever the consensus is.
> > > 
> > > - Michael
> > 
>

Patch

diff --git a/exec.c b/exec.c
index b85508b..b7ac6fa 100644
--- a/exec.c
+++ b/exec.c
@@ -25,6 +25,8 @@ 
 #endif
 
 #include "qemu-common.h"
+#include "qemu/rdma.h"
+#include "monitor/monitor.h"
 #include "cpu.h"
 #include "tcg.h"
 #include "hw/hw.h"
@@ -104,6 +106,31 @@  static MemoryRegion io_mem_watch;
 
 #if !defined(CONFIG_USER_ONLY)
 
+/*
 + * Memory regions need to be registered with the device and queue pairs set up
 + * in advance before the migration starts. This tells us where the RAM blocks
+ * are so that we can register them individually.
+ */
+int rdma_init_ram_blocks(struct rdma_ram_blocks *rdma_ram_blocks)
+{
+    RAMBlock *block;
+    int num_blocks = 0;
+
+    memset(rdma_ram_blocks, 0, sizeof *rdma_ram_blocks);
+    QTAILQ_FOREACH(block, &ram_list.blocks, next) {
+        if (num_blocks >= RDMA_MAX_RAM_BLOCKS) {
+                return -1;
+        }
+        rdma_ram_blocks->block[num_blocks].local_host_addr = block->host;
+        rdma_ram_blocks->block[num_blocks].offset = (uint64_t)block->offset;
+        rdma_ram_blocks->block[num_blocks].length = (uint64_t)block->length;
+        num_blocks++;
+    }
+    rdma_ram_blocks->num_blocks = num_blocks;
+
+    return 0;
+}
+
 static void phys_map_node_reserve(unsigned nodes)
 {
     if (phys_map_nodes_nb + nodes > phys_map_nodes_nb_alloc) {
diff --git a/vl.c b/vl.c
index 7aab73b..170d209 100644
--- a/vl.c
+++ b/vl.c
@@ -29,6 +29,7 @@ 
 #include <sys/time.h>
 #include <zlib.h>
 #include "qemu/bitmap.h"
+#include "qemu/rdma.h"
 
 /* Needed early for CONFIG_BSD etc. */
 #include "config-host.h"
@@ -233,6 +234,9 @@  int boot_menu;
 uint8_t *boot_splash_filedata;
 size_t boot_splash_filedata_size;
 uint8_t qemu_extra_params_fw[2];
+int rdmaport = -1;
+char rdmahost[64] = "";
+struct rdma_data rdma_mdata;
 
 typedef struct FWBootEntry FWBootEntry;
 
@@ -3622,6 +3626,13 @@  int main(int argc, char **argv, char **envp)
                 default_sdcard = 0;
                 default_vga = 0;
                 break;
+            case QEMU_OPTION_rdmaport:
+                rdmaport = atoi(optarg);
+                break;
+            case QEMU_OPTION_rdmahost:
+                strncpy(rdmahost, optarg, 64);
+                rdmahost[63] = '\0';
+                break;
             case QEMU_OPTION_xen_domid:
                 if (!(xen_available())) {
                     printf("Option %s not supported for this target\n", popt->name);
@@ -3725,6 +3736,8 @@  int main(int argc, char **argv, char **envp)
     }
     loc_set_none();
 
+    rdma_data_init(&rdma_mdata);
+
     if (qemu_init_main_loop()) {
         fprintf(stderr, "qemu_init_main_loop failed\n");
         exit(1);