[ovs-dev,v3,3/3] OVN: Add ovn-northd IGMP support

Message ID 20190711084520.15842.4299.stgit@dceara.remote.csb
State Superseded
Series: OVN: Add IGMP support

Commit Message

Dumitru Ceara July 11, 2019, 8:45 a.m. UTC
New IP multicast snooping options are added to the Northbound DB
Logical_Switch:other_config column. These allow enabling IGMP snooping
and querier functionality on a logical switch; ovn-northd translates
them into rows in the Southbound DB IP_Multicast table.

ovn-northd monitors for changes made by ovn-controllers in the Southbound DB
IGMP_Group table. Based on the entries in IGMP_Group, ovn-northd creates
Multicast_Group entries in the Southbound DB, one per IGMP_Group address X,
containing the list of logical switch ports (aggregated from all controllers)
that have IGMP_Group entries for that datapath and address X. ovn-northd
also creates a logical flow that matches IP multicast traffic destined
to address X and outputs it on the tunnel key of the corresponding
Multicast_Group entry.
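For example, if ports on two different chassis join group 239.0.1.68 on
the same logical switch, a single Multicast_Group row (named after its
tunnel key) ends up listing all the joined ports, and the switch gets a
logical flow matching "eth.mcast && ip4 && ip4.dst == 239.0.1.68" whose
action is outport = "<group name>"; output;.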

Signed-off-by: Dumitru Ceara <dceara@redhat.com>
Acked-by: Mark Michelson <mmichels@redhat.com>
---
 ovn/northd/ovn-northd.c |  460 ++++++++++++++++++++++++++++++++++++++++++++---
 ovn/ovn-nb.xml          |   54 ++++++
 tests/ovn.at            |  270 ++++++++++++++++++++++++++++
 tests/system-ovn.at     |  119 ++++++++++++
 4 files changed, 871 insertions(+), 32 deletions(-)

Comments

Dumitru Ceara July 11, 2019, 3:20 p.m. UTC | #1
On Thu, Jul 11, 2019 at 10:57 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> New IP Multicast Snooping Options are added to the Northbound DB
> Logical_Switch:other_config column. These allow enabling IGMP snooping and
> querier on the logical switch and get translated by ovn-northd to rows in
> the IP_Multicast Southbound DB table.
>
> ovn-northd monitors for changes done by ovn-controllers in the Southbound DB
> IGMP_Group table. Based on the entries in IGMP_Group ovn-northd creates
> Multicast_Group entries in the Southbound DB, one per IGMP_Group address X,
> containing the list of logical switch ports (aggregated from all controllers)
> that have IGMP_Group entries for that datapath and address X. ovn-northd
> also creates a logical flow that matches on IP multicast traffic destined
> to address X and outputs it on the tunnel key of the corresponding
> Multicast_Group entry.
>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> Acked-by: Mark Michelson <mmichels@redhat.com>
> ---
>  ovn/northd/ovn-northd.c |  460 ++++++++++++++++++++++++++++++++++++++++++++---
>  ovn/ovn-nb.xml          |   54 ++++++
>  tests/ovn.at            |  270 ++++++++++++++++++++++++++++
>  tests/system-ovn.at     |  119 ++++++++++++
>  4 files changed, 871 insertions(+), 32 deletions(-)
>

<snip>

> +
> +static void
> +build_mcast_groups(struct northd_context *ctx,
> +                   struct hmap *datapaths, struct hmap *ports,
> +                   struct hmap *mcast_groups,
> +                   struct hmap *igmp_groups)
> +{
> +    struct ovn_port *op;
> +
> +    hmap_init(mcast_groups);
> +    hmap_init(igmp_groups);
> +
> +    HMAP_FOR_EACH (op, key_node, ports) {
> +        if (!op->nbsp) {
> +            continue;
> +        }
> +
> +        if (lsp_is_enabled(op->nbsp)) {
> +            ovn_multicast_add(mcast_groups, &mc_flood, op);
> +        }
> +    }
> +
> +    const struct sbrec_igmp_group *sb_igmp, *sb_igmp_next;
> +
> +    SBREC_IGMP_GROUP_FOR_EACH_SAFE (sb_igmp, sb_igmp_next, ctx->ovnsb_idl) {
> +        /* If this is a stale group (e.g., the controller that
> +         * created it crashed), purge it.
> +         */
> +        if (!sb_igmp->chassis || !sb_igmp->datapath) {
> +            sbrec_igmp_group_delete(sb_igmp);
> +            continue;
> +        }
> +
> +        struct ovn_datapath *od =
> +            ovn_datapath_from_sbrec(datapaths, sb_igmp->datapath);
> +        if (!od) {
> +            sbrec_igmp_group_delete(sb_igmp);
> +            continue;
> +        }
> +
> +        ovn_igmp_group_add(mcast_groups, igmp_groups, od, ports,
> +                           sb_igmp->address, sb_igmp->ports, sb_igmp->n_ports);
> +    }
> +}

Hi Ben, Mark,

While doing some scale testing I realized that walking the rows of the
IGMP_Group table in ovn-northd in the order we get them from the
database might create an issue: ovn_igmp_group_add will create a new
multicast_group for every unique IGMP group address and allocate a
tunnel-id for it. However, because rows are not processed in the order
they were added to the database, it can happen that multicast groups
that didn't actually change will get a different tunnel-id, triggering
a change in the associated logical flows.

In order to avoid this I would need to reuse the tunnel-ids of
multicast groups that didn't change between different runs of the
ovn-northd loop. So far I've come up with two approaches (each with
its own advantages and disadvantages):

1. Force ovn-northd to walk the IGMP_Group table in a way that ensures
that IGMP groups are processed in the order they were added to the
database:

Add a column to the IGMP_Group table storing a free-running counter
value (unique per ovn-controller instance) and add another compound
index [datapath + address + counter]. Every time an ovn-controller
adds an IGMP group it increments its own counter. Then have ovn-northd
walk the IGMP_Group table with SBREC_IGMP_GROUP_FOR_EACH_BYINDEX, which
would give us a stable ordering of the entries (a rough sketch follows
the list below).

Advantages:
- relatively straightforward to code and maintain

Disadvantages:
- extra column in SB DB
- populating the index in ovn-northd will take N log(N) operations if
I understand the IDL index implementation correctly (N = number of
IGMP_Group entries in the DB)
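
A rough sketch of what 1. could look like (assumptions: the "seq_no"
column and its sbrec_igmp_group_col_seq_no accessor are hypothetical and
would only exist after extending the SB schema and regenerating the IDL;
the index-creation call is the generic one from lib/ovsdb-idl.h):

    /* Compound index over (datapath, address, seq_no) so that ovn-northd
     * visits IGMP_Group rows in the order the ovn-controllers created
     * them. */
    static struct ovsdb_idl_index *
    igmp_group_index_create(struct ovsdb_idl *idl)
    {
        const struct ovsdb_idl_index_column cols[] = {
            { .column = &sbrec_igmp_group_col_datapath },
            { .column = &sbrec_igmp_group_col_address },
            { .column = &sbrec_igmp_group_col_seq_no }, /* Hypothetical. */
        };

        return ovsdb_idl_index_create(idl, cols, ARRAY_SIZE(cols));
    }

    const struct sbrec_igmp_group *sb_igmp;

    SBREC_IGMP_GROUP_FOR_EACH_BYINDEX (sb_igmp, sbrec_igmp_group_by_seq) {
        /* Rows now arrive in a stable order, so groups that didn't
         * change keep their previously allocated tunnel-ids. */
    }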

2. Maintain a cache (hashtable) of allocated multicast group
tunnel-ids between subsequent runs of the ovn-northd loop (sketch
after the list below):
- Once all IGMP_Group entries are processed and their corresponding
Multicast_Group entries are collected we'd need to store a mapping
(per datapath) between IGMP group address and multicast group
tunnel-id.
- Next time ovn-northd walks the IGMP_Group table, before allocating a
new tunnel-id for a multicast group entry it would check the "cache"
from the previous run. If there's already an entry it would reuse its
tunnel-id; if not, it would allocate a new one and store the
(IGMP group address, tunnel-id) mapping for the next run.

Advantages:
- Changes are all local to ovn-northd, no need to store additional
information in the DB.
- Should be faster on average when processing small batches of new
IGMP_Groups from the DB.

Disadvantages:
- We need a multicast group tunnel-id allocator. There's already
similar code (allocate_tnlid) for datapath tunnel-keys, but it walks
the whole range of tunnel-ids until it finds one that's unused, and
I'm not completely sure about the worst-case complexity of that
approach.
- The code to maintain the tunnel-ids across ovn-northd loop
iterations might end up complicated.
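
And a rough sketch of the cache for 2. (all names are hypothetical;
hmap, hash_bytes(), ipv6_addr_equals() and xmalloc() are the only
existing OVS helpers used; one such hmap would live per datapath,
e.g. inside struct mcast_info):

    /* Maps an IGMP group address to the tunnel-id allocated for it in
     * the previous ovn-northd iteration. */
    struct mcast_group_tnlid {
        struct hmap_node hmap_node; /* In the per-datapath cache. */
        struct in6_addr address;    /* IGMP group address. */
        uint32_t tnlid;             /* Previously allocated tunnel-id. */
    };

    static bool
    mcast_group_tnlid_lookup(const struct hmap *tnlids,
                             const struct in6_addr *address,
                             uint32_t *tnlid)
    {
        uint32_t hash = hash_bytes(address, sizeof *address, 0);
        struct mcast_group_tnlid *entry;

        HMAP_FOR_EACH_WITH_HASH (entry, hmap_node, hash, tnlids) {
            if (ipv6_addr_equals(&entry->address, address)) {
                *tnlid = entry->tnlid;
                return true;
            }
        }
        return false;
    }

    /* On a cache miss a new id still has to be allocated (e.g., with an
     * allocate_tnlid()-style helper) and recorded for the next run. */
    static void
    mcast_group_tnlid_store(struct hmap *tnlids,
                            const struct in6_addr *address, uint32_t tnlid)
    {
        struct mcast_group_tnlid *entry = xmalloc(sizeof *entry);

        entry->address = *address;
        entry->tnlid = tnlid;
        hmap_insert(tnlids, &entry->hmap_node,
                    hash_bytes(address, sizeof *address, 0));
    }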

What do you guys think? Any alternatives?

Thanks,
Dumitru
Ben Pfaff July 12, 2019, 5:34 p.m. UTC | #2
On Thu, Jul 11, 2019 at 05:20:58PM +0200, Dumitru Ceara wrote:
> While doing some scale testing I realized that walking the rows of the
> IGMP_Group table in ovn-northd in the order we get them from the
> database might create an issue: ovn_igmp_group_add will create a new
> multicast_group for every unique IGMP group address and allocate a
> tunnel-id for it. However, because rows are not processed in the order
> they were added to the database, it can happen that multicast groups
> that didn't actually change will get a different tunnel-id triggering
> a change in the associated logical flows.

Ouch.  Let's avoid that problem!

> 2. Maintain a cache (hashtable) of allocated multicast group
> tunnel-ids between subsequent  runs of the ovn-northd loop:
> - Once all IGMP_Group entries are processed and their corresponding
> Multicast_Group entries are collected we'd need to store a mapping
> (per datapath) between IGMP group address and multicast group
> tunnel-id.
> - Next time ovn-northd walks the IGMP_Group table, before allocating a
> new tunnel-id for a multicast group entry it would check the "cache"
> from the previous run. If there's already an entry it would reuse the
> tunnel-id. If not, it will have to allocate a tunnel-id. Store the
> (IGMP group address, tunnel-id) mapping for next run.

This is the customary way, so it's what I recommend.
Dumitru Ceara July 15, 2019, 2:40 p.m. UTC | #3
On Fri, Jul 12, 2019 at 7:34 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Thu, Jul 11, 2019 at 05:20:58PM +0200, Dumitru Ceara wrote:
> > While doing some scale testing I realized that walking the rows of the
> > IGMP_Group table in ovn-northd in the order we get them from the
> > database might create an issue: ovn_igmp_group_add will create a new
> > multicast_group for every unique IGMP group address and allocate a
> > tunnel-id for it. However, because rows are not processed in the order
> > they were added to the database, it can happen that multicast groups
> > that didn't actually change will get a different tunnel-id triggering
> > a change in the associated logical flows.
>
> Ouch.  Let's avoid that problem!
>
> > 2. Maintain a cache (hashtable) of allocated multicast group
> > tunnel-ids between subsequent  runs of the ovn-northd loop:
> > - Once all IGMP_Group entries are processed and their corresponding
> > Multicast_Group entries are collected we'd need to store a mapping
> > (per datapath) between IGMP group address and multicast group
> > tunnel-id.
> > - Next time ovn-northd walks the IGMP_Group table, before allocating a
> > new tunnel-id for a multicast group entry it would check the "cache"
> > from the previous run. If there's already an entry it would reuse the
> > tunnel-id. If not, it will have to allocate a tunnel-id. Store the
> > (IGMP group address, tunnel-id) mapping for next run.
>
> This is the customary way, so it's what I recommend.

I'm working on it and I'll send out a v4 soon.

Thanks,
Dumitru

Patch

diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index ce382ac..2b71526 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -29,6 +29,7 @@ 
 #include "openvswitch/json.h"
 #include "ovn/lex.h"
 #include "ovn/lib/chassis-index.h"
+#include "ovn/lib/ip-mcast-index.h"
 #include "ovn/lib/ovn-l7.h"
 #include "ovn/lib/ovn-nb-idl.h"
 #include "ovn/lib/ovn-sb-idl.h"
@@ -57,6 +58,7 @@  struct northd_context {
     struct ovsdb_idl_txn *ovnnb_txn;
     struct ovsdb_idl_txn *ovnsb_txn;
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name;
+    struct ovsdb_idl_index *sbrec_ip_mcast_by_dp;
 };
 
 static const char *ovnnb_db;
@@ -424,6 +426,33 @@  struct ipam_info {
     bool mac_only;
 };
 
+#define OVN_MIN_MULTICAST 32768
+#define OVN_MAX_MULTICAST OVN_MCAST_FLOOD_TUNNEL_KEY
+BUILD_ASSERT_DECL(OVN_MIN_MULTICAST < OVN_MAX_MULTICAST);
+
+#define OVN_MIN_IP_MULTICAST OVN_MIN_MULTICAST
+#define OVN_MAX_IP_MULTICAST (OVN_MCAST_UNKNOWN_TUNNEL_KEY - 1)
+BUILD_ASSERT_DECL(OVN_MAX_IP_MULTICAST >= OVN_MIN_MULTICAST);
+
+/*
+ * Multicast snooping and querier per datapath configuration.
+ */
+struct mcast_info {
+    bool enabled;
+    bool querier;
+    bool flood_unregistered;
+
+    int64_t table_size;
+    int64_t idle_timeout;
+    int64_t query_interval;
+    char *eth_src;
+    char *ipv4_src;
+    int64_t  query_max_response;
+
+    uint32_t group_key_next;
+    uint32_t active_flows;
+};
+
 /* The 'key' comes from nbs->header_.uuid or nbr->header_.uuid or
  * sb->external_ids:logical-switch. */
 struct ovn_datapath {
@@ -448,6 +477,9 @@  struct ovn_datapath {
     /* IPAM data. */
     struct ipam_info ipam_info;
 
+    /* Multicast data. */
+    struct mcast_info mcast_info;
+
     /* OVN northd only needs to know about the logical router gateway port for
      * NAT on a distributed router.  This "distributed gateway port" is
      * populated only when there is a "redirect-chassis" specified for one of
@@ -522,6 +554,8 @@  ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
         hmap_remove(datapaths, &od->key_node);
         destroy_tnlids(&od->port_tnlids);
         bitmap_free(od->ipam_info.allocated_ipv4s);
+        free(od->mcast_info.eth_src);
+        free(od->mcast_info.ipv4_src);
         free(od->router_ports);
         ovn_ls_port_group_destroy(&od->nb_pgs);
         free(od);
@@ -659,6 +693,85 @@  init_ipam_info_for_datapath(struct ovn_datapath *od)
 }
 
 static void
+init_mcast_info_for_datapath(struct ovn_datapath *od)
+{
+    if (!od->nbs) {
+        return;
+    }
+
+    struct mcast_info *mcast_info = &od->mcast_info;
+
+    mcast_info->enabled =
+        smap_get_bool(&od->nbs->other_config, "mcast_snoop", false);
+    mcast_info->querier =
+        smap_get_bool(&od->nbs->other_config, "mcast_querier", true);
+    mcast_info->flood_unregistered =
+        smap_get_bool(&od->nbs->other_config, "mcast_flood_unregistered",
+                      false);
+
+    mcast_info->table_size =
+        smap_get_ullong(&od->nbs->other_config, "mcast_table_size",
+                        OVN_MCAST_DEFAULT_MAX_ENTRIES);
+
+    uint32_t idle_timeout =
+        smap_get_ullong(&od->nbs->other_config, "mcast_idle_timeout",
+                        OVN_MCAST_DEFAULT_IDLE_TIMEOUT_S);
+    if (idle_timeout < OVN_MCAST_MIN_IDLE_TIMEOUT_S) {
+        idle_timeout = OVN_MCAST_MIN_IDLE_TIMEOUT_S;
+    } else if (idle_timeout > OVN_MCAST_MAX_IDLE_TIMEOUT_S) {
+        idle_timeout = OVN_MCAST_MAX_IDLE_TIMEOUT_S;
+    }
+    mcast_info->idle_timeout = idle_timeout;
+
+    uint32_t query_interval =
+        smap_get_ullong(&od->nbs->other_config, "mcast_query_interval",
+                        mcast_info->idle_timeout / 2);
+    if (query_interval < OVN_MCAST_MIN_QUERY_INTERVAL_S) {
+        query_interval = OVN_MCAST_MIN_QUERY_INTERVAL_S;
+    } else if (query_interval > OVN_MCAST_MAX_QUERY_INTERVAL_S) {
+        query_interval = OVN_MCAST_MAX_QUERY_INTERVAL_S;
+    }
+    mcast_info->query_interval = query_interval;
+
+    mcast_info->eth_src =
+        nullable_xstrdup(smap_get(&od->nbs->other_config, "mcast_eth_src"));
+    mcast_info->ipv4_src =
+        nullable_xstrdup(smap_get(&od->nbs->other_config, "mcast_ip4_src"));
+
+    mcast_info->query_max_response =
+        smap_get_ullong(&od->nbs->other_config, "mcast_query_max_response",
+                        OVN_MCAST_DEFAULT_QUERY_MAX_RESPONSE_S);
+
+    mcast_info->group_key_next = OVN_MAX_IP_MULTICAST;
+    mcast_info->active_flows = 0;
+}
+
+static void
+store_mcast_info_for_datapath(const struct sbrec_ip_multicast *sb,
+                              struct ovn_datapath *od)
+{
+    struct mcast_info *mcast_info = &od->mcast_info;
+
+    sbrec_ip_multicast_set_datapath(sb, od->sb);
+    sbrec_ip_multicast_set_enabled(sb, &mcast_info->enabled, 1);
+    sbrec_ip_multicast_set_querier(sb, &mcast_info->querier, 1);
+    sbrec_ip_multicast_set_table_size(sb, &mcast_info->table_size, 1);
+    sbrec_ip_multicast_set_idle_timeout(sb, &mcast_info->idle_timeout, 1);
+    sbrec_ip_multicast_set_query_interval(sb,
+                                          &mcast_info->query_interval, 1);
+    sbrec_ip_multicast_set_query_max_resp(sb,
+                                          &mcast_info->query_max_response, 1);
+
+    if (mcast_info->eth_src) {
+        sbrec_ip_multicast_set_eth_src(sb, mcast_info->eth_src);
+    }
+
+    if (mcast_info->ipv4_src) {
+        sbrec_ip_multicast_set_ip4_src(sb, mcast_info->ipv4_src);
+    }
+}
+
+static void
 ovn_datapath_update_external_ids(struct ovn_datapath *od)
 {
     /* Get the logical-switch or logical-router UUID to set in
@@ -741,6 +854,7 @@  join_datapaths(struct northd_context *ctx, struct hmap *datapaths,
         }
 
         init_ipam_info_for_datapath(od);
+        init_mcast_info_for_datapath(od);
     }
 
     const struct nbrec_logical_router *nbr;
@@ -910,7 +1024,7 @@  ovn_port_destroy(struct hmap *ports, struct ovn_port *port)
 }
 
 static struct ovn_port *
-ovn_port_find(struct hmap *ports, const char *name)
+ovn_port_find(const struct hmap *ports, const char *name)
 {
     struct ovn_port *op;
 
@@ -2700,20 +2814,19 @@  build_ports(struct northd_context *ctx,
     cleanup_sb_ha_chassis_groups(ctx, &active_ha_chassis_grps);
     sset_destroy(&active_ha_chassis_grps);
 }
-
-#define OVN_MIN_MULTICAST 32768
-#define OVN_MAX_MULTICAST 65535
 
 struct multicast_group {
-    const char *name;
+    char *name;
     uint16_t key;               /* OVN_MIN_MULTICAST...OVN_MAX_MULTICAST. */
 };
 
 #define MC_FLOOD "_MC_flood"
-static const struct multicast_group mc_flood = { MC_FLOOD, 65535 };
+static const struct multicast_group mc_flood =
+    { MC_FLOOD, OVN_MCAST_FLOOD_TUNNEL_KEY };
 
 #define MC_UNKNOWN "_MC_unknown"
-static const struct multicast_group mc_unknown = { MC_UNKNOWN, 65534 };
+static const struct multicast_group mc_unknown =
+    { MC_UNKNOWN, OVN_MCAST_UNKNOWN_TUNNEL_KEY };
 
 static bool
 multicast_group_equal(const struct multicast_group *a,
@@ -2756,10 +2869,10 @@  ovn_multicast_find(struct hmap *mcgroups, struct ovn_datapath *datapath,
 }
 
 static void
-ovn_multicast_add(struct hmap *mcgroups, const struct multicast_group *group,
-                  struct ovn_port *port)
+ovn_multicast_add_ports(struct hmap *mcgroups, struct ovn_datapath *od,
+                        const struct multicast_group *group,
+                        struct ovn_port **ports, size_t n_ports)
 {
-    struct ovn_datapath *od = port->od;
     struct ovn_multicast *mc = ovn_multicast_find(mcgroups, od, group);
     if (!mc) {
         mc = xmalloc(sizeof *mc);
@@ -2770,11 +2883,27 @@  ovn_multicast_add(struct hmap *mcgroups, const struct multicast_group *group,
         mc->allocated_ports = 4;
         mc->ports = xmalloc(mc->allocated_ports * sizeof *mc->ports);
     }
-    if (mc->n_ports >= mc->allocated_ports) {
+
+    size_t n_ports_total = mc->n_ports + n_ports;
+
+    if (n_ports_total > 2 * mc->allocated_ports) {
+        mc->allocated_ports = n_ports_total;
+        mc->ports = xrealloc(mc->ports,
+                             mc->allocated_ports * sizeof *mc->ports);
+    } else if (n_ports_total > mc->allocated_ports) {
         mc->ports = x2nrealloc(mc->ports, &mc->allocated_ports,
                                sizeof *mc->ports);
     }
-    mc->ports[mc->n_ports++] = port;
+
+    memcpy(&mc->ports[mc->n_ports], &ports[0], n_ports * sizeof *ports);
+    mc->n_ports += n_ports;
+}
+
+static void
+ovn_multicast_add(struct hmap *mcgroups, const struct multicast_group *group,
+                  struct ovn_port *port)
+{
+    ovn_multicast_add_ports(mcgroups, port->od, group, &port, 1);
 }
 
 static void
@@ -2798,7 +2927,115 @@  ovn_multicast_update_sbrec(const struct ovn_multicast *mc,
     sbrec_multicast_group_set_ports(sb, ports, mc->n_ports);
     free(ports);
 }
-
+
+/*
+ * IGMP group entry.
+ */
+struct ovn_igmp_group {
+    struct hmap_node hmap_node; /* Index on 'datapath' and 'mcgroup.name'. */
+
+    struct ovn_datapath *datapath;
+    struct in6_addr address; /* Multicast IPv6-mapped-IPv4 or IPv4 address */
+    char address_s[INET6_ADDRSTRLEN + 1];
+    struct multicast_group mcgroup;
+};
+
+static uint32_t
+ovn_igmp_group_hash(const struct ovn_datapath *datapath,
+                    const struct in6_addr *address)
+{
+    return hash_pointer(datapath, hash_bytes(address, sizeof *address, 0));
+}
+
+static struct ovn_igmp_group *
+ovn_igmp_group_find(struct hmap *igmp_groups,
+                    const struct ovn_datapath *datapath,
+                    const struct in6_addr *address)
+{
+    struct ovn_igmp_group *group;
+
+    HMAP_FOR_EACH_WITH_HASH (group, hmap_node,
+                             ovn_igmp_group_hash(datapath, address),
+                             igmp_groups) {
+        if (group->datapath == datapath &&
+                ipv6_addr_equals(&group->address, address)) {
+            return group;
+        }
+    }
+    return NULL;
+}
+
+static void
+ovn_igmp_group_add(struct hmap *mcast_groups, struct hmap *igmp_groups,
+                   struct ovn_datapath *datapath,
+                   const struct hmap *ports,
+                   const char *address,
+                   struct sbrec_port_binding **igmp_ports,
+                   size_t n_ports)
+{
+    struct in6_addr group_address;
+    ovs_be32 ipv4;
+
+    if (ip_parse(address, &ipv4)) {
+        group_address = in6_addr_mapped_ipv4(ipv4);
+    } else if (!ipv6_parse(address, &group_address)) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+        VLOG_WARN_RL(&rl, "invalid IGMP group address: %s", address);
+        return;
+    }
+
+    struct ovn_igmp_group *igmp_group =
+        ovn_igmp_group_find(igmp_groups, datapath, &group_address);
+
+    if (!igmp_group) {
+        if (datapath->mcast_info.group_key_next == OVN_MIN_IP_MULTICAST) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
+            VLOG_WARN_RL(&rl, "all IP multicast tunnel ids exhausted");
+            return;
+        }
+
+        igmp_group = xmalloc(sizeof *igmp_group);
+
+        igmp_group->datapath = datapath;
+        igmp_group->address = group_address;
+        ovs_strzcpy(igmp_group->address_s, address,
+                    sizeof igmp_group->address_s);
+        igmp_group->mcgroup.key = datapath->mcast_info.group_key_next;
+        igmp_group->mcgroup.name =
+            xasprintf("%u", datapath->mcast_info.group_key_next);
+
+        hmap_insert(igmp_groups, &igmp_group->hmap_node,
+                    ovn_igmp_group_hash(datapath, &group_address));
+
+        datapath->mcast_info.group_key_next--;
+    }
+
+    struct ovn_port **oports = xmalloc(n_ports * sizeof *oports);
+    size_t n_oports = 0;
+
+    for (size_t i = 0; i < n_ports; i++) {
+        oports[n_oports] = ovn_port_find(ports, igmp_ports[i]->logical_port);
+        if (oports[n_oports]) {
+            n_oports++;
+        }
+    }
+
+    ovn_multicast_add_ports(mcast_groups, datapath, &igmp_group->mcgroup,
+                            oports, n_oports);
+    free(oports);
+}
+
+static void
+ovn_igmp_group_destroy(struct hmap *igmp_groups,
+                       struct ovn_igmp_group *igmp_group)
+{
+    if (igmp_group) {
+        hmap_remove(igmp_groups, &igmp_group->hmap_node);
+        free(igmp_group->mcgroup.name);
+        free(igmp_group);
+    }
+}
+
 /* Logical flow generation.
  *
  * This code generates the Logical_Flow table in the southbound database, as a
@@ -4444,7 +4681,7 @@  build_lrouter_groups(struct hmap *ports, struct ovs_list *lr_list)
 static void
 build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                     struct hmap *port_groups, struct hmap *lflows,
-                    struct hmap *mcgroups)
+                    struct hmap *mcgroups, struct hmap *igmp_groups)
 {
     /* This flow table structure is documented in ovn-northd(8), so please
      * update ovn-northd.8.xml if you change anything. */
@@ -4908,24 +5145,63 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
             }
         }
     }
+
     /* Ingress table 17: Destination lookup, broadcast and multicast handling
-     * (priority 100). */
-    HMAP_FOR_EACH (op, key_node, ports) {
-        if (!op->nbsp) {
+     * (priority 70 - 100). */
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbs) {
             continue;
         }
 
-        if (lsp_is_enabled(op->nbsp)) {
-            ovn_multicast_add(mcgroups, &mc_flood, op);
+        if (od->mcast_info.enabled) {
+            /* Punt IGMP traffic to controller. */
+            ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100,
+                          "ip4 && ip.proto == 2", "igmp;");
+
+            /* Flood all IP multicast traffic destined to 224.0.0.X to all
+             * ports - RFC 4541, section 2.1.2, item 2.
+             */
+            ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 85,
+                          "ip4 && ip4.dst == 224.0.0.0/24",
+                          "outport = \""MC_FLOOD"\"; output;");
+
+            /* Drop unregistered IP multicast if not allowed. */
+            if (!od->mcast_info.flood_unregistered) {
+                ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 80,
+                              "ip4 && ip4.mcast", "drop;");
+            }
         }
+
+        ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 70, "eth.mcast",
+                      "outport = \""MC_FLOOD"\"; output;");
     }
-    HMAP_FOR_EACH (od, key_node, datapaths) {
-        if (!od->nbs) {
+
+    /* Ingress table 17: Add IP multicast flows learnt from IGMP
+     * (priority 90). */
+    struct ovn_igmp_group *igmp_group, *next_igmp_group;
+
+    HMAP_FOR_EACH_SAFE (igmp_group, next_igmp_group, hmap_node, igmp_groups) {
+        ds_clear(&match);
+        ds_clear(&actions);
+
+        if (!igmp_group->datapath) {
             continue;
         }
 
-        ovn_lflow_add(lflows, od, S_SWITCH_IN_L2_LKUP, 100, "eth.mcast",
-                      "outport = \""MC_FLOOD"\"; output;");
+        struct mcast_info *mcast_info = &igmp_group->datapath->mcast_info;
+
+        if (mcast_info->active_flows >= mcast_info->table_size) {
+            continue;
+        }
+        mcast_info->active_flows++;
+
+        ds_put_format(&match, "eth.mcast && ip4 && ip4.dst == %s ",
+                      igmp_group->address_s);
+        ds_put_format(&actions, "outport = \"%s\"; output; ",
+                      igmp_group->mcgroup.name);
+
+        ovn_lflow_add(lflows, igmp_group->datapath, S_SWITCH_IN_L2_LKUP, 90,
+                      ds_cstr(&match), ds_cstr(&actions));
     }
 
     /* Ingress table 17: Destination lookup, unicast handling (priority 50), */
@@ -7526,12 +7802,13 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
  * constructing their contents based on the OVN_NB database. */
 static void
 build_lflows(struct northd_context *ctx, struct hmap *datapaths,
-             struct hmap *ports, struct hmap *port_groups)
+             struct hmap *ports, struct hmap *port_groups,
+             struct hmap *mcgroups, struct hmap *igmp_groups)
 {
     struct hmap lflows = HMAP_INITIALIZER(&lflows);
-    struct hmap mcgroups = HMAP_INITIALIZER(&mcgroups);
 
-    build_lswitch_flows(datapaths, ports, port_groups, &lflows, &mcgroups);
+    build_lswitch_flows(datapaths, ports, port_groups, &lflows, mcgroups,
+                        igmp_groups);
     build_lrouter_flows(datapaths, ports, &lflows);
 
     /* Push changes to the Logical_Flow table to database. */
@@ -7606,24 +7883,26 @@  build_lflows(struct northd_context *ctx, struct hmap *datapaths,
 
         struct multicast_group group = { .name = sbmc->name,
                                          .key = sbmc->tunnel_key };
-        struct ovn_multicast *mc = ovn_multicast_find(&mcgroups, od, &group);
+        struct ovn_multicast *mc = ovn_multicast_find(mcgroups, od, &group);
         if (mc) {
             ovn_multicast_update_sbrec(mc, sbmc);
-            ovn_multicast_destroy(&mcgroups, mc);
+            ovn_multicast_destroy(mcgroups, mc);
         } else {
             sbrec_multicast_group_delete(sbmc);
         }
     }
     struct ovn_multicast *mc, *next_mc;
-    HMAP_FOR_EACH_SAFE (mc, next_mc, hmap_node, &mcgroups) {
+    HMAP_FOR_EACH_SAFE (mc, next_mc, hmap_node, mcgroups) {
+        if (!mc->datapath) {
+            continue;
+        }
         sbmc = sbrec_multicast_group_insert(ctx->ovnsb_txn);
         sbrec_multicast_group_set_datapath(sbmc, mc->datapath->sb);
         sbrec_multicast_group_set_name(sbmc, mc->group->name);
         sbrec_multicast_group_set_tunnel_key(sbmc, mc->group->key);
         ovn_multicast_update_sbrec(mc, sbmc);
-        ovn_multicast_destroy(&mcgroups, mc);
+        ovn_multicast_destroy(mcgroups, mc);
     }
-    hmap_destroy(&mcgroups);
 }
 
 static void
@@ -8043,6 +8322,80 @@  destroy_datapaths_and_ports(struct hmap *datapaths, struct hmap *ports,
 }
 
 static void
+build_ip_mcast(struct northd_context *ctx, struct hmap *datapaths)
+{
+    struct ovn_datapath *od;
+
+    HMAP_FOR_EACH (od, key_node, datapaths) {
+        if (!od->nbs) {
+            continue;
+        }
+
+        const struct sbrec_ip_multicast *ip_mcast =
+            ip_mcast_lookup(ctx->sbrec_ip_mcast_by_dp, od->sb);
+
+        if (!ip_mcast) {
+            ip_mcast = sbrec_ip_multicast_insert(ctx->ovnsb_txn);
+        }
+        store_mcast_info_for_datapath(ip_mcast, od);
+    }
+
+    /* Delete southbound records without northbound matches. */
+    const struct sbrec_ip_multicast *sb, *sb_next;
+
+    SBREC_IP_MULTICAST_FOR_EACH_SAFE (sb, sb_next, ctx->ovnsb_idl) {
+        if (!sb->datapath ||
+                !ovn_datapath_from_sbrec(datapaths, sb->datapath)) {
+            sbrec_ip_multicast_delete(sb);
+        }
+    }
+}
+
+static void
+build_mcast_groups(struct northd_context *ctx,
+                   struct hmap *datapaths, struct hmap *ports,
+                   struct hmap *mcast_groups,
+                   struct hmap *igmp_groups)
+{
+    struct ovn_port *op;
+
+    hmap_init(mcast_groups);
+    hmap_init(igmp_groups);
+
+    HMAP_FOR_EACH (op, key_node, ports) {
+        if (!op->nbsp) {
+            continue;
+        }
+
+        if (lsp_is_enabled(op->nbsp)) {
+            ovn_multicast_add(mcast_groups, &mc_flood, op);
+        }
+    }
+
+    const struct sbrec_igmp_group *sb_igmp, *sb_igmp_next;
+
+    SBREC_IGMP_GROUP_FOR_EACH_SAFE (sb_igmp, sb_igmp_next, ctx->ovnsb_idl) {
+        /* If this is a stale group (e.g., the controller that
+         * created it crashed), purge it.
+         */
+        if (!sb_igmp->chassis || !sb_igmp->datapath) {
+            sbrec_igmp_group_delete(sb_igmp);
+            continue;
+        }
+
+        struct ovn_datapath *od =
+            ovn_datapath_from_sbrec(datapaths, sb_igmp->datapath);
+        if (!od) {
+            sbrec_igmp_group_delete(sb_igmp);
+            continue;
+        }
+
+        ovn_igmp_group_add(mcast_groups, igmp_groups, od, ports,
+                           sb_igmp->address, sb_igmp->ports, sb_igmp->n_ports);
+    }
+}
+
+static void
 ovnnb_db_run(struct northd_context *ctx,
              struct ovsdb_idl_index *sbrec_chassis_by_name,
              struct ovsdb_idl_loop *sb_loop,
@@ -8053,23 +8406,36 @@  ovnnb_db_run(struct northd_context *ctx,
         return;
     }
     struct hmap port_groups;
+    struct hmap mcast_groups;
+    struct hmap igmp_groups;
 
     build_datapaths(ctx, datapaths, lr_list);
     build_ports(ctx, sbrec_chassis_by_name, datapaths, ports);
     build_ipam(datapaths, ports);
     build_port_group_lswitches(ctx, &port_groups, ports);
     build_lrouter_groups(ports, lr_list);
-    build_lflows(ctx, datapaths, ports, &port_groups);
+    build_ip_mcast(ctx, datapaths);
+    build_mcast_groups(ctx, datapaths, ports, &mcast_groups, &igmp_groups);
+    build_lflows(ctx, datapaths, ports, &port_groups, &mcast_groups,
+                 &igmp_groups);
 
     sync_address_sets(ctx);
     sync_port_groups(ctx);
     sync_meters(ctx);
     sync_dns_entries(ctx, datapaths);
 
+    struct ovn_igmp_group *igmp_group, *next_igmp_group;
+
+    HMAP_FOR_EACH_SAFE (igmp_group, next_igmp_group, hmap_node, &igmp_groups) {
+        ovn_igmp_group_destroy(&igmp_groups, igmp_group);
+    }
+
     struct ovn_port_group *pg, *next_pg;
     HMAP_FOR_EACH_SAFE (pg, next_pg, key_node, &port_groups) {
         ovn_port_group_destroy(&port_groups, pg);
     }
+    hmap_destroy(&igmp_groups);
+    hmap_destroy(&mcast_groups);
     hmap_destroy(&port_groups);
 
     /* Sync ipsec configuration.
@@ -8866,12 +9232,41 @@  main(int argc, char *argv[])
     add_column_noalert(ovnsb_idl_loop.idl,
                        &sbrec_ha_chassis_group_col_ref_chassis);
 
+    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_igmp_group);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_igmp_group_col_address);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_igmp_group_col_datapath);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_igmp_group_col_chassis);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_igmp_group_col_ports);
+
+    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ip_multicast);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_datapath);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_enabled);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_querier);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_eth_src);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_ip4_src);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_table_size);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_idle_timeout);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_query_interval);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_ip_multicast_col_query_max_resp);
+
     struct ovsdb_idl_index *sbrec_chassis_by_name
         = chassis_index_create(ovnsb_idl_loop.idl);
 
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name
         = ha_chassis_group_index_create(ovnsb_idl_loop.idl);
 
+    struct ovsdb_idl_index *sbrec_ip_mcast_by_dp
+        = ip_mcast_index_create(ovnsb_idl_loop.idl);
+
     /* Ensure that only a single ovn-northd is active in the deployment by
      * acquiring a lock called "ovn_northd" on the southbound database
      * and then only performing DB transactions if the lock is held. */
@@ -8887,6 +9282,7 @@  main(int argc, char *argv[])
             .ovnsb_idl = ovnsb_idl_loop.idl,
             .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
             .sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
+            .sbrec_ip_mcast_by_dp = sbrec_ip_mcast_by_dp,
         };
 
         if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 318379c..0aba420 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -274,6 +274,60 @@ 
       </column>
     </group>
 
+    <group title="IP Multicast Snooping Options">
+      <p>
+        These options control IP Multicast Snooping configuration of the
+        logical switch. To enable IP Multicast Snooping set
+        <ref column="other_config" key="mcast_snoop"/> to true. To enable IP
+        Multicast Querier set <ref column="other_config" key="mcast_querier"/>
+        to true. If IP Multicast Querier is enabled,
+        <ref column="other_config" key="mcast_eth_src"/> and
+        <ref column="other_config" key="mcast_ip4_src"/> must be set.
+      </p>
+      <column name="other_config" key="mcast_snoop"
+          type='{"type": "boolean"}'>
+        Enables/disables IP Multicast Snooping on the logical switch.
+      </column>
+      <column name="other_config" key="mcast_querier"
+          type='{"type": "boolean"}'>
+        Enables/disables IP Multicast Querier on the logical switch.
+      </column>
+      <column name="other_config" key="mcast_flood_unregistered"
+          type='{"type": "boolean"}'>
+        Determines whether unregistered multicast traffic should be flooded
+        or not. Only applicable if
+        <ref column="other_config" key="mcast_snoop"/> is enabled.
+      </column>
+      <column name="other_config" key="mcast_table_size"
+          type='{"type": "integer", "minInteger": 1, "maxInteger": 32766}'>
+        Number of multicast groups to be stored. Default: 2048.
+      </column>
+      <column name="other_config" key="mcast_idle_timeout"
+          type='{"type": "integer", "minInteger": 15, "maxInteger": 3600}'>
+        Configures the IP Multicast Snooping group idle timeout (in seconds).
+        Default: 300 seconds.
+      </column>
+      <column name="other_config" key="mcast_query_interval"
+          type='{"type": "integer", "minInteger": 1, "maxInteger": 3600}'>
+        Configures the IP Multicast Querier interval between queries (in
+        seconds). Default:
+        <ref column="other_config" key="mcast_idle_timeout"/> / 2.
+      </column>
+      <column name="other_config" key="mcast_query_max_response"
+          type='{"type": "integer", "minInteger": 1, "maxInteger": 10}'>
+        Configures the value of the "max-response" field in the multicast
+        queries originated by the logical switch. Default: 1 second.
+      </column>
+      <column name="other_config" key="mcast_eth_src">
+        Configures the source Ethernet address for queries originated by the
+        logical switch.
+      </column>
+      <column name="other_config" key="mcast_ip4_src">
+        Configures the source IPv4 address for queries originated by the
+        logical switch.
+      </column>
+    </group>
+
     <group title="Common Columns">
       <column name="external_ids">
         See <em>External IDs</em> at the beginning of this document.
diff --git a/tests/ovn.at b/tests/ovn.at
index 55f612c..5e7cd94 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -14338,3 +14338,273 @@  AT_CHECK([ovn-nbctl ls-add sw1], [1], [ignore],
 ])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- IGMP snoop/querier])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# Two independent logical switches (sw1 and sw2).
+# sw1:
+#   - subnet 10.0.0.0/8
+#   - 2 ports bound on hv1 (sw1-p11, sw1-p12)
+#   - 2 ports bound on hv2 (sw1-p21, sw1-p22)
+# sw2:
+#   - subnet 20.0.0.0/8
+#   - 1 port bound on hv1 (sw2-p1)
+#   - 1 port bound on hv2 (sw2-p2)
+#   - IGMP Querier from 20.0.0.254
+
+reset_pcap_file() {
+    local iface=$1
+    local pcap_file=$2
+    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+    rm -f ${pcap_file}*.pcap
+    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+ip_to_hex() {
+     printf "%02x%02x%02x%02x" "$@"
+}
+
+#
+# send_igmp_v3_report INPORT HV ETH_SRC IP_SRC IP_CSUM GROUP REC_TYPE
+#                     IGMP_CSUM OUTFILE
+#
+# This shell function causes an IGMPv3 report to be received on INPORT of HV.
+# The packet's content has Ethernet destination 01:00:5E:00:00:22 and source
+# ETH_SRC (exactly 12 hex digits). Ethernet type is set to IP.
+# GROUP is the IP multicast group to be joined/to leave (based on REC_TYPE).
+# REC_TYPE == 04: join GROUP
+# REC_TYPE == 03: leave GROUP
+# The packet hexdump is also stored in OUTFILE.
+#
+send_igmp_v3_report() {
+    local inport=$1 hv=$2 eth_src=$3 ip_src=$4 ip_chksum=$5 group=$6
+    local rec_type=$7 igmp_chksum=$8 outfile=$9
+
+    local eth_dst=01005e000016
+    local ip_dst=$(ip_to_hex 224 0 0 22)
+    local ip_ttl=01
+    local ip_ra_opt=94040000
+
+    local igmp_type=2200
+    local num_rec=00000001
+    local aux_dlen=00
+    local num_src=0000
+
+    local eth=${eth_dst}${eth_src}0800
+    local ip=46c0002800004000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst}${ip_ra_opt}
+    local igmp=${igmp_type}${igmp_chksum}${num_rec}${rec_type}${aux_dlen}${num_src}${group}
+    local packet=${eth}${ip}${igmp}
+
+    echo ${packet} >> ${outfile}
+    as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
+}
+
+#
+# store_igmp_v3_query ETH_SRC IP_SRC IP_CSUM OUTFILE
+#
+# This shell function builds an IGMPv3 general query from ETH_SRC and IP_SRC
+# and stores the hexdump of the packet in OUTFILE.
+#
+store_igmp_v3_query() {
+    local eth_src=$1 ip_src=$2 ip_chksum=$3 outfile=$4
+
+    local eth_dst=01005e000001
+    local ip_dst=$(ip_to_hex 224 0 0 1)
+    local ip_ttl=01
+    local igmp_type=11
+    local max_resp=0a
+    local igmp_chksum=eeeb
+    local addr=00000000
+
+    local eth=${eth_dst}${eth_src}0800
+    local ip=4500002000004000${ip_ttl}02${ip_chksum}${ip_src}${ip_dst}
+    local igmp=${igmp_type}${max_resp}${igmp_chksum}${addr}000a0000
+    local packet=${eth}${ip}${igmp}
+
+    echo ${packet} >> ${outfile}
+}
+
+#
+# send_ip_multicast_pkt INPORT HV ETH_SRC ETH_DST IP_SRC IP_DST IP_LEN
+#    IP_PROTO DATA OUTFILE
+#
+# This shell function causes an IP multicast packet to be received on INPORT
+# of HV.
+# The hexdump of the packet is stored in OUTFILE.
+#
+send_ip_multicast_pkt() {
+    local inport=$1 hv=$2 eth_src=$3 eth_dst=$4 ip_src=$5 ip_dst=$6
+    local ip_len=$7 ip_chksum=$8 proto=$9 data=${10} outfile=${11}
+
+    local ip_ttl=20
+
+    local eth=${eth_dst}${eth_src}0800
+    local ip=450000${ip_len}95f14000${ip_ttl}${proto}${ip_chksum}${ip_src}${ip_dst}
+    local packet=${eth}${ip}${data}
+
+    as $hv ovs-appctl netdev-dummy/receive ${inport} ${packet}
+    echo ${packet} >> ${outfile}
+}
+
+ovn-nbctl ls-add sw1
+ovn-nbctl ls-add sw2
+
+ovn-nbctl lsp-add sw1 sw1-p11
+ovn-nbctl lsp-add sw1 sw1-p12
+ovn-nbctl lsp-add sw1 sw1-p21
+ovn-nbctl lsp-add sw1 sw1-p22
+ovn-nbctl lsp-add sw2 sw2-p1
+ovn-nbctl lsp-add sw2 sw2-p2
+
+net_add n1
+sim_add hv1
+as hv1
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=sw1-p11 \
+    options:tx_pcap=hv1/vif1-tx.pcap \
+    options:rxq_pcap=hv1/vif1-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int hv1-vif2 -- \
+    set interface hv1-vif2 external-ids:iface-id=sw1-p12 \
+    options:tx_pcap=hv1/vif2-tx.pcap \
+    options:rxq_pcap=hv1/vif2-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int hv1-vif3 -- \
+    set interface hv1-vif3 external-ids:iface-id=sw2-p1 \
+    options:tx_pcap=hv1/vif3-tx.pcap \
+    options:rxq_pcap=hv1/vif3-rx.pcap \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+ovs-vsctl -- add-port br-int hv2-vif1 -- \
+    set interface hv2-vif1 external-ids:iface-id=sw1-p21 \
+    options:tx_pcap=hv2/vif1-tx.pcap \
+    options:rxq_pcap=hv2/vif1-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int hv2-vif2 -- \
+    set interface hv2-vif2 external-ids:iface-id=sw1-p22 \
+    options:tx_pcap=hv2/vif2-tx.pcap \
+    options:rxq_pcap=hv2/vif2-rx.pcap \
+    ofport-request=1
+ovs-vsctl -- add-port br-int hv2-vif3 -- \
+    set interface hv2-vif3 external-ids:iface-id=sw2-p2 \
+    options:tx_pcap=hv2/vif3-tx.pcap \
+    options:rxq_pcap=hv2/vif3-rx.pcap \
+    ofport-request=1
+
+OVN_POPULATE_ARP
+
+# Enable IGMP snooping on sw1.
+ovn-nbctl set Logical_Switch sw1 other_config:mcast_querier="false"
+ovn-nbctl set Logical_Switch sw1 other_config:mcast_snoop="true"
+
+# No IGMP query should be generated by sw1 (mcast_querier="false").
+truncate -s 0 expected
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv1/vif2-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv2/vif2-tx.pcap], [expected])
+
+ovn-nbctl --wait=hv sync
+
+# Inject IGMP Join for 239.0.1.68 on sw1-p11.
+send_igmp_v3_report hv1-vif1 hv1 \
+    000000000001 $(ip_to_hex 10 0 0 1) f9f8 \
+    $(ip_to_hex 239 0 1 68) 04 e9b9 \
+    /dev/null
+# Inject IGMP Join for 239.0.1.68 on sw1-p21.
+send_igmp_v3_report hv2-vif1 hv2 000000000002 $(ip_to_hex 10 0 0 2) f9f9 \
+    $(ip_to_hex 239 0 1 68) 04 e9b9 \
+    /dev/null
+
+# Check that the IGMP Group is learned on both hv.
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    test "${total_entries}" = "2"
+])
+
+# Send traffic and make sure it gets forwarded only on the two ports that
+# joined.
+truncate -s 0 expected
+truncate -s 0 expected_empty
+send_ip_multicast_pkt hv1-vif2 hv1 \
+    000000000001 01005e000144 \
+    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e ca70 11 \
+    e518e518000a3b3a0000 \
+    expected
+
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv1/vif2-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv2/vif2-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv1/vif3-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv2/vif3-tx.pcap], [expected_empty])
+
+# Inject IGMP Leave for 239.0.1.68 on sw1-p11.
+send_igmp_v3_report hv1-vif1 hv1 \
+    000000000001 $(ip_to_hex 10 0 0 1) f9f8 \
+    $(ip_to_hex 239 0 1 68) 03 eab9 \
+    /dev/null
+
+# Check IGMP_Group table on both HV.
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    test "${total_entries}" = "1"
+])
+
+# Send traffic and make sure it gets forwarded only on the port that
+# joined.
+as hv1 reset_pcap_file hv1-vif1 hv1/vif1
+as hv2 reset_pcap_file hv2-vif1 hv2/vif1
+truncate -s 0 expected
+truncate -s 0 expected_empty
+send_ip_multicast_pkt hv1-vif2 hv1 \
+    000000000001 01005e000144 \
+    $(ip_to_hex 10 0 0 42) $(ip_to_hex 239 0 1 68) 1e ca70 11 \
+    e518e518000a3b3a0000 \
+    expected
+
+OVN_CHECK_PACKETS([hv1/vif1-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv1/vif2-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv2/vif2-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv1/vif3-tx.pcap], [expected_empty])
+OVN_CHECK_PACKETS([hv2/vif3-tx.pcap], [expected_empty])
+
+# Flush IGMP groups.
+ovn-sbctl ip-multicast-flush sw1
+ovn-nbctl --wait=hv -t 3 sync
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    test "${total_entries}" = "0"
+])
+
+# Enable IGMP snooping and querier on sw2 and set query interval to minimum.
+ovn-nbctl set Logical_Switch sw2 \
+    other_config:mcast_snoop="true" \
+    other_config:mcast_querier="true" \
+    other_config:mcast_query_interval=1 \
+    other_config:mcast_eth_src="00:00:00:00:02:fe" \
+    other_config:mcast_ip4_src="20.0.0.254"
+
+# Wait for 1 query interval (1 sec) and check that two queries are generated.
+truncate -s 0 expected
+store_igmp_v3_query 0000000002fe $(ip_to_hex 20 0 0 254) 84dd expected
+store_igmp_v3_query 0000000002fe $(ip_to_hex 20 0 0 254) 84dd expected
+
+sleep 1
+OVN_CHECK_PACKETS([hv1/vif3-tx.pcap], [expected])
+OVN_CHECK_PACKETS([hv2/vif3-tx.pcap], [expected])
+
+OVN_CLEANUP([hv1], [hv2])
+AT_CLEANUP
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index b7e2d77..10fbd26 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -1542,3 +1542,122 @@  as
 OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
 /connection dropped.*/d"])
 AT_CLEANUP
+
+AT_SETUP([ovn -- 2 LSs IGMP])
+AT_KEYWORDS([ovnigmp])
+
+ovn_start
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-int])
+
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+
+# Logical network:
+# Two independent logical switches (sw1 and sw2).
+# sw1:
+#   - subnet 10.0.0.0/8
+#   - 2 ports (sw1-p1 - sw1-p2)
+# sw2:
+#   - subnet 20.0.0.0/8
+#   - 2 ports (sw2-p1 - sw2-p2)
+#   - IGMP Querier from 20.0.0.254
+
+ovn-nbctl ls-add sw1
+ovn-nbctl ls-add sw2
+
+for i in `seq 1 2`
+do
+    ADD_NAMESPACES(sw1-p$i)
+    ADD_VETH(sw1-p$i, sw1-p$i, br-int, "10.0.0.$i/24", "00:00:00:00:01:0$i", \
+            "10.0.0.254")
+    ovn-nbctl lsp-add sw1 sw1-p$i \
+        -- lsp-set-addresses sw1-p$i "00:00:00:00:01:0$i 10.0.0.$i"
+done
+
+for i in `seq 1 2`
+do
+    ADD_NAMESPACES(sw2-p$i)
+    ADD_VETH(sw2-p$i, sw2-p$i, br-int, "20.0.0.$i/24", "00:00:00:00:02:0$i", \
+            "20.0.0.254")
+    ovn-nbctl lsp-add sw2 sw2-p$i \
+        -- lsp-set-addresses sw2-p$i "00:00:00:00:02:0$i 20.0.0.$i"
+done
+
+# Enable IGMP snooping on sw1.
+ovn-nbctl set Logical_Switch sw1 other_config:mcast_querier="false"
+ovn-nbctl set Logical_Switch sw1 other_config:mcast_snoop="true"
+
+# Inject IGMP Join for 239.0.1.68 on sw1-p1.
+NS_CHECK_EXEC([sw1-p1], [ip addr add dev sw1-p1 239.0.1.68/32 autojoin], [0])
+
+# Inject IGMP Join for 239.0.1.68 on sw1-p2
+NS_CHECK_EXEC([sw1-p2], [ip addr add dev sw1-p2 239.0.1.68/32 autojoin], [0])
+
+# Check that the IGMP Group is learned.
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    ports=`ovn-sbctl find IGMP_Group | grep ports | cut -f 2 -d ":" | wc -w`
+    test "${total_entries}" = "1"
+    test "${ports}" = "2"
+])
+
+# Inject IGMP Leave for 239.0.1.68 on sw1-p2.
+NS_CHECK_EXEC([sw1-p2], [ip addr del dev sw1-p2 239.0.1.68/32], [0])
+
+# Check that only one port is left in the group.
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    ports=`ovn-sbctl find IGMP_Group | grep ports | cut -f 2 -d ":" | wc -w`
+    test "${total_entries}" = "1"
+    test "${ports}" = "1"
+])
+
+# Flush IGMP groups.
+ovn-sbctl ip-multicast-flush sw1
+ovn-nbctl --wait=hv -t 3 sync
+OVS_WAIT_UNTIL([
+    total_entries=`ovn-sbctl find IGMP_Group | grep "239.0.1.68" | wc -l`
+    test "${total_entries}" = "0"
+])
+
+# Enable IGMP snooping and querier on sw2 and set query interval to minimum.
+ovn-nbctl set Logical_Switch sw2 \
+    other_config:mcast_snoop="true" \
+    other_config:mcast_querier="true" \
+    other_config:mcast_query_interval=1 \
+    other_config:mcast_eth_src="00:00:00:00:02:fe" \
+    other_config:mcast_ip4_src="20.0.0.254"
+
+# Check that queries are generated.
+NS_CHECK_EXEC([sw2-p1], [tcpdump -n -c 2 -i sw2-p1 igmp > sw2-p1.pcap &])
+
+OVS_WAIT_UNTIL([
+    total_queries=`cat sw2-p1.pcap | grep "igmp query" | wc -l`
+    test "${total_queries}" = "2"
+])
+
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as northd
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+as
+OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
+/connection dropped.*/d"])
+AT_CLEANUP