diff mbox series

[ovs-dev,1/7] netdev: add netdev_get_speed() to nedev API

Message ID 20230421151651.3032616-2-amorenoz@redhat.com
State Changes Requested
Headers show
Series Improve linux QoS for exotic and fast links | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Adrian Moreno April 21, 2023, 3:16 p.m. UTC
Currently, the netdev's speed is being calculated by taking the link's
feature bits (using netdev_get_features()) and transforming them into
bps.

This mechanism can be both inaccurate and difficult to maintain, mainly
because we currently use the feature bits supported by OpenFlow which
would have to be extended to support all new feature bits of all netdev
implementations while keeping the OpenFlow API intact.

In order to expose the link speed accurately for all current and future
hardware, add a new netdev API call that allows the implementations to
provide the current and maximum link speeds in Mbps.

Internally, the logic to get the maximum supported speed still relies on
feature bits so it might still get out of sync in the future. However,
the maximum configurable speed is not used as much as the current speed
and these feature bits are not exposed through the netdev interface so
it should be easier to add more.

Use this new function instead of netdev_get_features() where the link
speed is needed.

As a consecuence of this patch, link speeds of cards is properly
reported (internally in OVSDB) even if not supported by OpenFlow.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
---
 include/openvswitch/netdev.h |  1 +
 lib/dpif.h                   |  4 ++-
 lib/netdev-dpdk.c            | 48 ++++++++++++++++++++++++++++++++++++
 lib/netdev-linux-private.h   |  1 +
 lib/netdev-linux.c           | 44 ++++++++++++++++++++++++++-------
 lib/netdev-provider.h        |  9 +++++++
 lib/netdev.c                 | 23 +++++++++++++++++
 ofproto/ofproto-dpif-sflow.c | 10 ++++++--
 ofproto/ofproto.c            |  6 +++--
 vswitchd/bridge.c            | 30 +++++++++++++---------
 10 files changed, 151 insertions(+), 25 deletions(-)

Comments

Simon Horman April 25, 2023, 12:47 p.m. UTC | #1
On Fri, Apr 21, 2023 at 05:16:45PM +0200, Adrian Moreno wrote:
> Currently, the netdev's speed is being calculated by taking the link's
> feature bits (using netdev_get_features()) and transforming them into
> bps.
> 
> This mechanism can be both inaccurate and difficult to maintain, mainly
> because we currently use the feature bits supported by OpenFlow which
> would have to be extended to support all new feature bits of all netdev
> implementations while keeping the OpenFlow API intact.
> 
> In order to expose the link speed accurately for all current and future
> hardware, add a new netdev API call that allows the implementations to
> provide the current and maximum link speeds in Mbps.
> 
> Internally, the logic to get the maximum supported speed still relies on
> feature bits so it might still get out of sync in the future. However,
> the maximum configurable speed is not used as much as the current speed
> and these feature bits are not exposed through the netdev interface so
> it should be easier to add more.
> 
> Use this new function instead of netdev_get_features() where the link
> speed is needed.
> 
> As a consecuence of this patch, link speeds of cards is properly
> reported (internally in OVSDB) even if not supported by OpenFlow.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>

...

> @@ -2456,14 +2459,19 @@ iface_refresh_netdev_status(struct iface *iface)
>      ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
>  
>      error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
> -    bps = !error ? netdev_features_to_bps(current, 0) : 0;
> -    if (bps) {
> +    if (!error) {
>          ovsrec_interface_set_duplex(iface->cfg,
>                                      netdev_features_is_full_duplex(current)
>                                      ? "full" : "half");
> -        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>      } else {
>          ovsrec_interface_set_duplex(iface->cfg, NULL);
> +    }
> +
> +    error = netdev_get_speed(iface->netdev, &mbps, NULL);
> +    if (!error) {
> +        bps = (uint64_t) mbps * 1000000;

Hi Adrian,

nit: Can 1000000ULL be used to avoid the need for a cast?
     Likewise in patch 2.

> +        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
> +    } else {
>          ovsrec_interface_set_link_speed(iface->cfg, NULL, 0);
>      }

...
Adrian Moreno May 3, 2023, 1:53 p.m. UTC | #2
On 4/25/23 14:47, Simon Horman wrote:
> On Fri, Apr 21, 2023 at 05:16:45PM +0200, Adrian Moreno wrote:
>> Currently, the netdev's speed is being calculated by taking the link's
>> feature bits (using netdev_get_features()) and transforming them into
>> bps.
>>
>> This mechanism can be both inaccurate and difficult to maintain, mainly
>> because we currently use the feature bits supported by OpenFlow which
>> would have to be extended to support all new feature bits of all netdev
>> implementations while keeping the OpenFlow API intact.
>>
>> In order to expose the link speed accurately for all current and future
>> hardware, add a new netdev API call that allows the implementations to
>> provide the current and maximum link speeds in Mbps.
>>
>> Internally, the logic to get the maximum supported speed still relies on
>> feature bits so it might still get out of sync in the future. However,
>> the maximum configurable speed is not used as much as the current speed
>> and these feature bits are not exposed through the netdev interface so
>> it should be easier to add more.
>>
>> Use this new function instead of netdev_get_features() where the link
>> speed is needed.
>>
>> As a consecuence of this patch, link speeds of cards is properly
>> reported (internally in OVSDB) even if not supported by OpenFlow.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> 
> ...
> 
>> @@ -2456,14 +2459,19 @@ iface_refresh_netdev_status(struct iface *iface)
>>       ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
>>   
>>       error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
>> -    bps = !error ? netdev_features_to_bps(current, 0) : 0;
>> -    if (bps) {
>> +    if (!error) {
>>           ovsrec_interface_set_duplex(iface->cfg,
>>                                       netdev_features_is_full_duplex(current)
>>                                       ? "full" : "half");
>> -        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>>       } else {
>>           ovsrec_interface_set_duplex(iface->cfg, NULL);
>> +    }
>> +
>> +    error = netdev_get_speed(iface->netdev, &mbps, NULL);
>> +    if (!error) {
>> +        bps = (uint64_t) mbps * 1000000;
> 
> Hi Adrian,
> 
> nit: Can 1000000ULL be used to avoid the need for a cast?
>       Likewise in patch 2.
> 

Sure! Thanks for the suggestion.

>> +        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>> +    } else {
>>           ovsrec_interface_set_link_speed(iface->cfg, NULL, 0);
>>       }
> 
> ...
>
Ilya Maximets May 4, 2023, 3:52 p.m. UTC | #3
On 4/21/23 17:16, Adrian Moreno wrote:
> Currently, the netdev's speed is being calculated by taking the link's
> feature bits (using netdev_get_features()) and transforming them into
> bps.
> 
> This mechanism can be both inaccurate and difficult to maintain, mainly
> because we currently use the feature bits supported by OpenFlow which
> would have to be extended to support all new feature bits of all netdev
> implementations while keeping the OpenFlow API intact.
> 
> In order to expose the link speed accurately for all current and future
> hardware, add a new netdev API call that allows the implementations to
> provide the current and maximum link speeds in Mbps.
> 
> Internally, the logic to get the maximum supported speed still relies on
> feature bits so it might still get out of sync in the future. However,
> the maximum configurable speed is not used as much as the current speed
> and these feature bits are not exposed through the netdev interface so
> it should be easier to add more.
> 
> Use this new function instead of netdev_get_features() where the link
> speed is needed.
> 
> As a consecuence of this patch, link speeds of cards is properly
> reported (internally in OVSDB) even if not supported by OpenFlow.

Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
comments inline.

Best regards, Ilya Maximets.

> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567

I'm not sure about this link.  The BZ talks about switching to
ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
predefined values.  Current patch is a step forward to be able
to support other link speeds, but IIUC, we will still not able
to detect 25 and 50 Gbps links in netdev-linux.

> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
> ---
>  include/openvswitch/netdev.h |  1 +
>  lib/dpif.h                   |  4 ++-
>  lib/netdev-dpdk.c            | 48 ++++++++++++++++++++++++++++++++++++
>  lib/netdev-linux-private.h   |  1 +
>  lib/netdev-linux.c           | 44 ++++++++++++++++++++++++++-------
>  lib/netdev-provider.h        |  9 +++++++
>  lib/netdev.c                 | 23 +++++++++++++++++
>  ofproto/ofproto-dpif-sflow.c | 10 ++++++--
>  ofproto/ofproto.c            |  6 +++--
>  vswitchd/bridge.c            | 30 +++++++++++++---------
>  10 files changed, 151 insertions(+), 25 deletions(-)
> 
> diff --git a/include/openvswitch/netdev.h b/include/openvswitch/netdev.h
> index cafd6fd7b..83e8633dd 100644
> --- a/include/openvswitch/netdev.h
> +++ b/include/openvswitch/netdev.h
> @@ -132,6 +132,7 @@ int netdev_get_features(const struct netdev *,
>                          enum netdev_features *advertised,
>                          enum netdev_features *supported,
>                          enum netdev_features *peer);
> +int netdev_get_speed(const struct netdev *, uint32_t *current, uint32_t *max);
>  uint64_t netdev_features_to_bps(enum netdev_features features,
>                                  uint64_t default_bps);
>  bool netdev_features_is_full_duplex(enum netdev_features features);
> diff --git a/lib/dpif.h b/lib/dpif.h
> index 129cbf6a1..9e9d0aa1b 100644
> --- a/lib/dpif.h
> +++ b/lib/dpif.h
> @@ -91,7 +91,9 @@
>   *
>   *    - Carrier status (netdev_get_carrier()).
>   *
> - *    - Speed (netdev_get_features()).
> + *    - Link features (netdev_get_features()).
> + *
> + *    - Speed (netdev_get_speed()).
>   *
>   *    - QoS queue configuration (netdev_get_queue(), netdev_set_queue() and
>   *      related functions.)
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fb0dd43f7..b136f25a4 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -3505,6 +3505,53 @@ netdev_dpdk_get_features(const struct netdev *netdev,
>      return 0;
>  }
>  
> +static int
> +netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current,
> +                      uint32_t *max)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    struct rte_eth_link link;
> +    struct rte_eth_dev_info dev_info;

Reverse x-mass tree, please.

> +
> +    ovs_mutex_lock(&dev->mutex);
> +    link = dev->link;
> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
> +    ovs_mutex_unlock(&dev->mutex);
> +
> +    *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN
> +        ? link.link_speed : 0;
> +
> +    if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_200G) {
> +        *max = RTE_ETH_SPEED_NUM_200G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100G) {
> +        *max = RTE_ETH_SPEED_NUM_100G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_56G) {
> +        *max = RTE_ETH_SPEED_NUM_56G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_50G) {
> +        *max = RTE_ETH_SPEED_NUM_50G;

Should there be 25 and 40 as well?

> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_20G) {
> +        *max = RTE_ETH_SPEED_NUM_20G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10G) {
> +        *max = RTE_ETH_SPEED_NUM_10G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_5G) {
> +        *max = RTE_ETH_SPEED_NUM_5G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_2_5G) {
> +        *max = RTE_ETH_SPEED_NUM_2_5G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_1G) {
> +        *max = RTE_ETH_SPEED_NUM_1G;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M ||
> +        dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M_HD) {
> +        *max = RTE_ETH_SPEED_NUM_100M;
> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M ||
> +        dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M_HD) {
> +        *max = RTE_ETH_SPEED_NUM_10M;
> +    } else {
> +        *max = 0;
> +    }
> +
> +    return 0;
> +}
> +
>  static struct ingress_policer *
>  netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst)
>  {
> @@ -5809,6 +5856,7 @@ parse_vhost_config(const struct smap *ovs_other_config)
>      .get_stats = netdev_dpdk_get_stats,                 \
>      .get_custom_stats = netdev_dpdk_get_custom_stats,   \
>      .get_features = netdev_dpdk_get_features,           \
> +    .get_speed = netdev_dpdk_get_speed,                 \
>      .get_status = netdev_dpdk_get_status,               \
>      .reconfigure = netdev_dpdk_reconfigure,             \
>      .rxq_recv = netdev_dpdk_rxq_recv
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index deb015bdb..39508090c 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -92,6 +92,7 @@ struct netdev_linux {
>      enum netdev_features current;    /* Cached from ETHTOOL_GSET. */
>      enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
>      enum netdev_features supported;  /* Cached from ETHTOOL_GSET. */
> +    uint32_t speed;                  /* Cached from ETHTOOL_GSET. */

Might make sense to name it 'current_speed' ?

>  
>      struct ethtool_drvinfo drvinfo;  /* Cached from ETHTOOL_GDRVINFO. */
>      struct tc *tc;
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index 36620199e..576adbf3f 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2348,7 +2348,6 @@ static void
>  netdev_linux_read_features(struct netdev_linux *netdev)
>  {
>      struct ethtool_cmd ecmd;
> -    uint32_t speed;
>      int error;
>  
>      if (netdev->cache_valid & VALID_FEATURES) {
> @@ -2462,20 +2461,20 @@ netdev_linux_read_features(struct netdev_linux *netdev)
>      }
>  
>      /* Current settings. */
> -    speed = ethtool_cmd_speed(&ecmd);
> -    if (speed == SPEED_10) {
> +    netdev->speed = ethtool_cmd_speed(&ecmd);
> +    if (netdev->speed == SPEED_10) {
>          netdev->current = ecmd.duplex ? NETDEV_F_10MB_FD : NETDEV_F_10MB_HD;
> -    } else if (speed == SPEED_100) {
> +    } else if (netdev->speed == SPEED_100) {
>          netdev->current = ecmd.duplex ? NETDEV_F_100MB_FD : NETDEV_F_100MB_HD;
> -    } else if (speed == SPEED_1000) {
> +    } else if (netdev->speed == SPEED_1000) {
>          netdev->current = ecmd.duplex ? NETDEV_F_1GB_FD : NETDEV_F_1GB_HD;
> -    } else if (speed == SPEED_10000) {
> +    } else if (netdev->speed == SPEED_10000) {
>          netdev->current = NETDEV_F_10GB_FD;
> -    } else if (speed == 40000) {
> +    } else if (netdev->speed == 40000) {
>          netdev->current = NETDEV_F_40GB_FD;
> -    } else if (speed == 100000) {
> +    } else if (netdev->speed == 100000) {
>          netdev->current = NETDEV_F_100GB_FD;
> -    } else if (speed == 1000000) {
> +    } else if (netdev->speed == 1000000) {
>          netdev->current = NETDEV_F_1TB_FD;
>      } else {
>          netdev->current = 0;
> @@ -2529,6 +2528,31 @@ exit:
>      return error;
>  }
>  
> +static int
> +netdev_linux_get_speed(const struct netdev *netdev_, uint32_t *current,
> +                       uint32_t *max)
> +{
> +    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> +    int error;
> +
> +    ovs_mutex_lock(&netdev->mutex);
> +    if (netdev_linux_netnsid_is_remote(netdev)) {
> +        error = EOPNOTSUPP;
> +        goto exit;
> +    }
> +
> +    netdev_linux_read_features(netdev);
> +    if (!netdev->get_features_error) {
> +        *current = netdev->speed == SPEED_UNKNOWN ? 0 : netdev->speed;
> +        *max = netdev_features_to_bps(netdev->supported, 0) / 1000000ULL;
> +    }
> +    error = netdev->get_features_error;
> +
> +exit:
> +    ovs_mutex_unlock(&netdev->mutex);
> +    return error;
> +}
> +
>  /* Set the features advertised by 'netdev' to 'advertise'. */
>  static int
>  netdev_linux_set_advertisements(struct netdev *netdev_,
> @@ -3655,6 +3679,7 @@ const struct netdev_class netdev_linux_class = {
>      .destruct = netdev_linux_destruct,
>      .get_stats = netdev_linux_get_stats,
>      .get_features = netdev_linux_get_features,
> +    .get_speed = netdev_linux_get_speed,
>      .get_status = netdev_linux_get_status,
>      .get_block_id = netdev_linux_get_block_id,
>      .send = netdev_linux_send,
> @@ -3671,6 +3696,7 @@ const struct netdev_class netdev_tap_class = {
>      .destruct = netdev_linux_destruct,
>      .get_stats = netdev_tap_get_stats,
>      .get_features = netdev_linux_get_features,
> +    .get_speed = netdev_linux_get_speed,
>      .get_status = netdev_linux_get_status,
>      .send = netdev_linux_send,
>      .rxq_construct = netdev_linux_rxq_construct,
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index b5420947d..cf17aaea9 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -500,6 +500,15 @@ struct netdev_class {
>                          enum netdev_features *supported,
>                          enum netdev_features *peer);
>  
> +    /* Stores the current and maximum supported link speed by 'netdev' into
> +     * each of '*current' and '*max'. Each value represents the speed in Mbps.
> +     * If any of the speeds is unknown, a zero value must be returned.

s/returned/stored/ ?

> +     *
> +     * This function may be set to null if it would always return EOPNOTSUPP.
> +     */
> +    int (*get_speed)(const struct netdev *netdev, uint32_t *current,
> +                     uint32_t *max);
> +
>      /* Set the features advertised by 'netdev' to 'advertise', which is a
>       * set of NETDEV_F_* bits.
>       *
> diff --git a/lib/netdev.c b/lib/netdev.c
> index c79778378..aea4e2d46 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -1167,6 +1167,29 @@ netdev_get_features(const struct netdev *netdev,
>      return error;
>  }
>  
> +int
> +netdev_get_speed(const struct netdev *netdev, uint32_t *current, uint32_t *max)
> +{
> +    uint32_t current_dummy, max_dummy;
> +    int error;
> +
> +    if (!current) {
> +        current = &current_dummy;
> +    }
> +    if (!max) {
> +        max = &max_dummy;
> +    }
> +
> +    error = netdev->netdev_class->get_speed
> +        ? netdev->netdev_class->get_speed(netdev, current, max)
> +        : EOPNOTSUPP;

Please, indent ? and : to the level where condition starts, i.e.
shift 4 spaces to the rigth.  Same goes to other ternary blocks
in the patch set.

> +
> +    if (error) {
> +        *current = *max = 0;
> +    }
> +    return error;
> +}
> +
>  /* Returns the maximum speed of a network connection that has the NETDEV_F_*
>   * bits in 'features', in bits per second.  If no bits that indicate a speed
>   * are set in 'features', returns 'default_bps'. */
> diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
> index a405eb056..7a8f259a2 100644
> --- a/ofproto/ofproto-dpif-sflow.c
> +++ b/ofproto/ofproto-dpif-sflow.c
> @@ -307,6 +307,7 @@ sflow_agent_get_counters(void *ds_, SFLPoller *poller,
>      enum netdev_flags flags;
>      struct lacp_member_stats lacp_stats;
>      const char *ifName;
> +    uint32_t curr_speed;

We don't have a reverse x-mass tree here, but it might be better
to not make it worse.

>  
>      dsp = dpif_sflow_find_port(ds, u32_to_odp(poller->bridgePort));
>      if (!dsp) {
> @@ -320,13 +321,18 @@ sflow_agent_get_counters(void *ds_, SFLPoller *poller,
>      if (!netdev_get_features(dsp->ofport->netdev, &current, NULL, NULL, NULL)) {
>          /* The values of ifDirection come from MAU MIB (RFC 2668): 0 = unknown,
>             1 = full-duplex, 2 = half-duplex, 3 = in, 4=out */
> -        counters->ifSpeed = netdev_features_to_bps(current, 0);
>          counters->ifDirection = (netdev_features_is_full_duplex(current)
>                                   ? 1 : 2);
>      } else {
> -        counters->ifSpeed = 100000000;
>          counters->ifDirection = 0;
>      }
> +
> +    if (!netdev_get_speed(dsp->ofport->netdev, &curr_speed, NULL)) {
> +        counters->ifSpeed = curr_speed * 1000000;
> +    } else {
> +        counters->ifSpeed = 100000000;
> +    }
> +
>      if (!netdev_get_flags(dsp->ofport->netdev, &flags) && flags & NETDEV_UP) {
>          counters->ifStatus = 1; /* ifAdminStatus up. */
>          if (netdev_get_carrier(dsp->ofport->netdev)) {
> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> index 11cc0c6f6..2df210921 100644
> --- a/ofproto/ofproto.c
> +++ b/ofproto/ofproto.c
> @@ -2478,6 +2478,7 @@ ofport_open(struct ofproto *ofproto,
>  {
>      enum netdev_flags flags;
>      struct netdev *netdev;
> +    uint32_t curr_speed, max_speed;
>      int error;

ditto.

>  
>      *p_netdev = NULL;
> @@ -2514,8 +2515,9 @@ ofport_open(struct ofproto *ofproto,
>      pp->state = netdev_get_carrier(netdev) ? 0 : OFPUTIL_PS_LINK_DOWN;
>      netdev_get_features(netdev, &pp->curr, &pp->advertised,
>                          &pp->supported, &pp->peer);
> -    pp->curr_speed = netdev_features_to_bps(pp->curr, 0) / 1000;
> -    pp->max_speed = netdev_features_to_bps(pp->supported, 0) / 1000;
> +    netdev_get_speed(netdev, &curr_speed, &max_speed);
> +    pp->curr_speed = curr_speed * 1000;
> +    pp->max_speed = max_speed * 1000;
>  
>      *p_netdev = netdev;
>      return 0;
> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
> index f5dc59ad0..2059f7315 100644
> --- a/vswitchd/bridge.c
> +++ b/vswitchd/bridge.c
> @@ -1694,11 +1694,12 @@ port_configure_stp(const struct ofproto *ofproto, struct port *port,
>      if (config_str) {
>          port_s->path_cost = strtoul(config_str, NULL, 10);
>      } else {
> -        enum netdev_features current;
> -        unsigned int mbps;
> +        uint32_t mbps;
>  
> -        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
> -        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
> +        netdev_get_speed(iface->netdev, &mbps, NULL);
> +        if (mbps == 0) {

If (!mbps) ?

> +            mbps = NETDEV_DEFAULT_BPS / 1000000;
> +        }
>          port_s->path_cost = stp_convert_speed_to_cost(mbps);
>      }
>  
> @@ -1777,11 +1778,12 @@ port_configure_rstp(const struct ofproto *ofproto, struct port *port,
>      if (config_str) {
>          port_s->path_cost = strtoul(config_str, NULL, 10);
>      } else {
> -        enum netdev_features current;
> -        unsigned int mbps;
> +        uint32_t mbps;
>  
> -        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
> -        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
> +        netdev_get_speed(iface->netdev, &mbps, NULL);
> +        if (mbps == 0) {
> +            mbps = NETDEV_DEFAULT_BPS / 1000000;
> +        }
>          port_s->path_cost = rstp_convert_speed_to_cost(mbps);
>      }
>  
> @@ -2417,6 +2419,7 @@ iface_refresh_netdev_status(struct iface *iface)
>      const char *link_state;
>      struct eth_addr mac;
>      int64_t bps, mtu_64, ifindex64, link_resets;
> +    uint32_t mbps;
>      int mtu, error;

Same here.

>  
>      if (iface_is_synthetic(iface)) {
> @@ -2456,14 +2459,19 @@ iface_refresh_netdev_status(struct iface *iface)
>      ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
>  
>      error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
> -    bps = !error ? netdev_features_to_bps(current, 0) : 0;
> -    if (bps) {
> +    if (!error) {
>          ovsrec_interface_set_duplex(iface->cfg,
>                                      netdev_features_is_full_duplex(current)
>                                      ? "full" : "half");
> -        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>      } else {
>          ovsrec_interface_set_duplex(iface->cfg, NULL);
> +    }
> +
> +    error = netdev_get_speed(iface->netdev, &mbps, NULL);
> +    if (!error) {

Should we check for error or for mbps being zero?

> +        bps = (uint64_t) mbps * 1000000;
> +        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
> +    } else {
>          ovsrec_interface_set_link_speed(iface->cfg, NULL, 0);
>      }
>
Adrian Moreno May 5, 2023, 8:04 a.m. UTC | #4
On 5/4/23 17:52, Ilya Maximets wrote:
> On 4/21/23 17:16, Adrian Moreno wrote:
>> Currently, the netdev's speed is being calculated by taking the link's
>> feature bits (using netdev_get_features()) and transforming them into
>> bps.
>>
>> This mechanism can be both inaccurate and difficult to maintain, mainly
>> because we currently use the feature bits supported by OpenFlow which
>> would have to be extended to support all new feature bits of all netdev
>> implementations while keeping the OpenFlow API intact.
>>
>> In order to expose the link speed accurately for all current and future
>> hardware, add a new netdev API call that allows the implementations to
>> provide the current and maximum link speeds in Mbps.
>>
>> Internally, the logic to get the maximum supported speed still relies on
>> feature bits so it might still get out of sync in the future. However,
>> the maximum configurable speed is not used as much as the current speed
>> and these feature bits are not exposed through the netdev interface so
>> it should be easier to add more.
>>
>> Use this new function instead of netdev_get_features() where the link
>> speed is needed.
>>
>> As a consecuence of this patch, link speeds of cards is properly
>> reported (internally in OVSDB) even if not supported by OpenFlow.
> 
> Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
> comments inline.
> 
> Best regards, Ilya Maximets.
> 
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
> 
> I'm not sure about this link.  The BZ talks about switching to
> ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
> predefined values.  Current patch is a step forward to be able
> to support other link speeds, but IIUC, we will still not able
> to detect 25 and 50 Gbps links in netdev-linux.

It's true that we will not be able to report a bit in the feature bitmask that 
indicates 25 or 50 Gbps. I think this is not a huge deal because we don't have 
those flags in the ofproto layer anyway.

However, with this patch netdev-linux should be able to read the speed directly 
from ethtool_cmd_speed() which uses a 32-bit field in Mbps. Using this speed 
directly (and not the bitmask) to calculate QoS maximum rates and even to report 
the speed in ofproto port information (OFP >= 1.1) should, IIUC, address the 
limitations reported in the BZ.


> 
>> Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
>> ---
>>   include/openvswitch/netdev.h |  1 +
>>   lib/dpif.h                   |  4 ++-
>>   lib/netdev-dpdk.c            | 48 ++++++++++++++++++++++++++++++++++++
>>   lib/netdev-linux-private.h   |  1 +
>>   lib/netdev-linux.c           | 44 ++++++++++++++++++++++++++-------
>>   lib/netdev-provider.h        |  9 +++++++
>>   lib/netdev.c                 | 23 +++++++++++++++++
>>   ofproto/ofproto-dpif-sflow.c | 10 ++++++--
>>   ofproto/ofproto.c            |  6 +++--
>>   vswitchd/bridge.c            | 30 +++++++++++++---------
>>   10 files changed, 151 insertions(+), 25 deletions(-)
>>
>> diff --git a/include/openvswitch/netdev.h b/include/openvswitch/netdev.h
>> index cafd6fd7b..83e8633dd 100644
>> --- a/include/openvswitch/netdev.h
>> +++ b/include/openvswitch/netdev.h
>> @@ -132,6 +132,7 @@ int netdev_get_features(const struct netdev *,
>>                           enum netdev_features *advertised,
>>                           enum netdev_features *supported,
>>                           enum netdev_features *peer);
>> +int netdev_get_speed(const struct netdev *, uint32_t *current, uint32_t *max);
>>   uint64_t netdev_features_to_bps(enum netdev_features features,
>>                                   uint64_t default_bps);
>>   bool netdev_features_is_full_duplex(enum netdev_features features);
>> diff --git a/lib/dpif.h b/lib/dpif.h
>> index 129cbf6a1..9e9d0aa1b 100644
>> --- a/lib/dpif.h
>> +++ b/lib/dpif.h
>> @@ -91,7 +91,9 @@
>>    *
>>    *    - Carrier status (netdev_get_carrier()).
>>    *
>> - *    - Speed (netdev_get_features()).
>> + *    - Link features (netdev_get_features()).
>> + *
>> + *    - Speed (netdev_get_speed()).
>>    *
>>    *    - QoS queue configuration (netdev_get_queue(), netdev_set_queue() and
>>    *      related functions.)
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index fb0dd43f7..b136f25a4 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -3505,6 +3505,53 @@ netdev_dpdk_get_features(const struct netdev *netdev,
>>       return 0;
>>   }
>>   
>> +static int
>> +netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current,
>> +                      uint32_t *max)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    struct rte_eth_link link;
>> +    struct rte_eth_dev_info dev_info;
> 
> Reverse x-mass tree, please.
> 
>> +
>> +    ovs_mutex_lock(&dev->mutex);
>> +    link = dev->link;
>> +    rte_eth_dev_info_get(dev->port_id, &dev_info);
>> +    ovs_mutex_unlock(&dev->mutex);
>> +
>> +    *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN
>> +        ? link.link_speed : 0;
>> +
>> +    if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_200G) {
>> +        *max = RTE_ETH_SPEED_NUM_200G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100G) {
>> +        *max = RTE_ETH_SPEED_NUM_100G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_56G) {
>> +        *max = RTE_ETH_SPEED_NUM_56G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_50G) {
>> +        *max = RTE_ETH_SPEED_NUM_50G;
> 
> Should there be 25 and 40 as well?

Yes, they were missing in the previous implementation and I thought they were 
not defined in rte_ethdev.h but they are. I'll add them.

> 
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_20G) {
>> +        *max = RTE_ETH_SPEED_NUM_20G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10G) {
>> +        *max = RTE_ETH_SPEED_NUM_10G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_5G) {
>> +        *max = RTE_ETH_SPEED_NUM_5G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_2_5G) {
>> +        *max = RTE_ETH_SPEED_NUM_2_5G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_1G) {
>> +        *max = RTE_ETH_SPEED_NUM_1G;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M ||
>> +        dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M_HD) {
>> +        *max = RTE_ETH_SPEED_NUM_100M;
>> +    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M ||
>> +        dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M_HD) {
>> +        *max = RTE_ETH_SPEED_NUM_10M;
>> +    } else {
>> +        *max = 0;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static struct ingress_policer *
>>   netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst)
>>   {
>> @@ -5809,6 +5856,7 @@ parse_vhost_config(const struct smap *ovs_other_config)
>>       .get_stats = netdev_dpdk_get_stats,                 \
>>       .get_custom_stats = netdev_dpdk_get_custom_stats,   \
>>       .get_features = netdev_dpdk_get_features,           \
>> +    .get_speed = netdev_dpdk_get_speed,                 \
>>       .get_status = netdev_dpdk_get_status,               \
>>       .reconfigure = netdev_dpdk_reconfigure,             \
>>       .rxq_recv = netdev_dpdk_rxq_recv
>> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
>> index deb015bdb..39508090c 100644
>> --- a/lib/netdev-linux-private.h
>> +++ b/lib/netdev-linux-private.h
>> @@ -92,6 +92,7 @@ struct netdev_linux {
>>       enum netdev_features current;    /* Cached from ETHTOOL_GSET. */
>>       enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
>>       enum netdev_features supported;  /* Cached from ETHTOOL_GSET. */
>> +    uint32_t speed;                  /* Cached from ETHTOOL_GSET. */
> 
> Might make sense to name it 'current_speed' ?
> 

Sure. Will do.

>>   
>>       struct ethtool_drvinfo drvinfo;  /* Cached from ETHTOOL_GDRVINFO. */
>>       struct tc *tc;
>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
>> index 36620199e..576adbf3f 100644
>> --- a/lib/netdev-linux.c
>> +++ b/lib/netdev-linux.c
>> @@ -2348,7 +2348,6 @@ static void
>>   netdev_linux_read_features(struct netdev_linux *netdev)
>>   {
>>       struct ethtool_cmd ecmd;
>> -    uint32_t speed;
>>       int error;
>>   
>>       if (netdev->cache_valid & VALID_FEATURES) {
>> @@ -2462,20 +2461,20 @@ netdev_linux_read_features(struct netdev_linux *netdev)
>>       }
>>   
>>       /* Current settings. */
>> -    speed = ethtool_cmd_speed(&ecmd);
>> -    if (speed == SPEED_10) {
>> +    netdev->speed = ethtool_cmd_speed(&ecmd);
>> +    if (netdev->speed == SPEED_10) {
>>           netdev->current = ecmd.duplex ? NETDEV_F_10MB_FD : NETDEV_F_10MB_HD;
>> -    } else if (speed == SPEED_100) {
>> +    } else if (netdev->speed == SPEED_100) {
>>           netdev->current = ecmd.duplex ? NETDEV_F_100MB_FD : NETDEV_F_100MB_HD;
>> -    } else if (speed == SPEED_1000) {
>> +    } else if (netdev->speed == SPEED_1000) {
>>           netdev->current = ecmd.duplex ? NETDEV_F_1GB_FD : NETDEV_F_1GB_HD;
>> -    } else if (speed == SPEED_10000) {
>> +    } else if (netdev->speed == SPEED_10000) {
>>           netdev->current = NETDEV_F_10GB_FD;
>> -    } else if (speed == 40000) {
>> +    } else if (netdev->speed == 40000) {
>>           netdev->current = NETDEV_F_40GB_FD;
>> -    } else if (speed == 100000) {
>> +    } else if (netdev->speed == 100000) {
>>           netdev->current = NETDEV_F_100GB_FD;
>> -    } else if (speed == 1000000) {
>> +    } else if (netdev->speed == 1000000) {
>>           netdev->current = NETDEV_F_1TB_FD;
>>       } else {
>>           netdev->current = 0;
>> @@ -2529,6 +2528,31 @@ exit:
>>       return error;
>>   }
>>   
>> +static int
>> +netdev_linux_get_speed(const struct netdev *netdev_, uint32_t *current,
>> +                       uint32_t *max)
>> +{
>> +    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
>> +    int error;
>> +
>> +    ovs_mutex_lock(&netdev->mutex);
>> +    if (netdev_linux_netnsid_is_remote(netdev)) {
>> +        error = EOPNOTSUPP;
>> +        goto exit;
>> +    }
>> +
>> +    netdev_linux_read_features(netdev);
>> +    if (!netdev->get_features_error) {
>> +        *current = netdev->speed == SPEED_UNKNOWN ? 0 : netdev->speed;
>> +        *max = netdev_features_to_bps(netdev->supported, 0) / 1000000ULL;
>> +    }
>> +    error = netdev->get_features_error;
>> +
>> +exit:
>> +    ovs_mutex_unlock(&netdev->mutex);
>> +    return error;
>> +}
>> +
>>   /* Set the features advertised by 'netdev' to 'advertise'. */
>>   static int
>>   netdev_linux_set_advertisements(struct netdev *netdev_,
>> @@ -3655,6 +3679,7 @@ const struct netdev_class netdev_linux_class = {
>>       .destruct = netdev_linux_destruct,
>>       .get_stats = netdev_linux_get_stats,
>>       .get_features = netdev_linux_get_features,
>> +    .get_speed = netdev_linux_get_speed,
>>       .get_status = netdev_linux_get_status,
>>       .get_block_id = netdev_linux_get_block_id,
>>       .send = netdev_linux_send,
>> @@ -3671,6 +3696,7 @@ const struct netdev_class netdev_tap_class = {
>>       .destruct = netdev_linux_destruct,
>>       .get_stats = netdev_tap_get_stats,
>>       .get_features = netdev_linux_get_features,
>> +    .get_speed = netdev_linux_get_speed,
>>       .get_status = netdev_linux_get_status,
>>       .send = netdev_linux_send,
>>       .rxq_construct = netdev_linux_rxq_construct,
>> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
>> index b5420947d..cf17aaea9 100644
>> --- a/lib/netdev-provider.h
>> +++ b/lib/netdev-provider.h
>> @@ -500,6 +500,15 @@ struct netdev_class {
>>                           enum netdev_features *supported,
>>                           enum netdev_features *peer);
>>   
>> +    /* Stores the current and maximum supported link speed by 'netdev' into
>> +     * each of '*current' and '*max'. Each value represents the speed in Mbps.
>> +     * If any of the speeds is unknown, a zero value must be returned.
> 
> s/returned/stored/ ?
> 
>> +     *
>> +     * This function may be set to null if it would always return EOPNOTSUPP.
>> +     */
>> +    int (*get_speed)(const struct netdev *netdev, uint32_t *current,
>> +                     uint32_t *max);
>> +
>>       /* Set the features advertised by 'netdev' to 'advertise', which is a
>>        * set of NETDEV_F_* bits.
>>        *
>> diff --git a/lib/netdev.c b/lib/netdev.c
>> index c79778378..aea4e2d46 100644
>> --- a/lib/netdev.c
>> +++ b/lib/netdev.c
>> @@ -1167,6 +1167,29 @@ netdev_get_features(const struct netdev *netdev,
>>       return error;
>>   }
>>   
>> +int
>> +netdev_get_speed(const struct netdev *netdev, uint32_t *current, uint32_t *max)
>> +{
>> +    uint32_t current_dummy, max_dummy;
>> +    int error;
>> +
>> +    if (!current) {
>> +        current = &current_dummy;
>> +    }
>> +    if (!max) {
>> +        max = &max_dummy;
>> +    }
>> +
>> +    error = netdev->netdev_class->get_speed
>> +        ? netdev->netdev_class->get_speed(netdev, current, max)
>> +        : EOPNOTSUPP;
> 
> Please, indent ? and : to the level where condition starts, i.e.
> shift 4 spaces to the rigth.  Same goes to other ternary blocks
> in the patch set.
> 
>> +
>> +    if (error) {
>> +        *current = *max = 0;
>> +    }
>> +    return error;
>> +}
>> +
>>   /* Returns the maximum speed of a network connection that has the NETDEV_F_*
>>    * bits in 'features', in bits per second.  If no bits that indicate a speed
>>    * are set in 'features', returns 'default_bps'. */
>> diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
>> index a405eb056..7a8f259a2 100644
>> --- a/ofproto/ofproto-dpif-sflow.c
>> +++ b/ofproto/ofproto-dpif-sflow.c
>> @@ -307,6 +307,7 @@ sflow_agent_get_counters(void *ds_, SFLPoller *poller,
>>       enum netdev_flags flags;
>>       struct lacp_member_stats lacp_stats;
>>       const char *ifName;
>> +    uint32_t curr_speed;
> 
> We don't have a reverse x-mass tree here, but it might be better
> to not make it worse.
> 
>>   
>>       dsp = dpif_sflow_find_port(ds, u32_to_odp(poller->bridgePort));
>>       if (!dsp) {
>> @@ -320,13 +321,18 @@ sflow_agent_get_counters(void *ds_, SFLPoller *poller,
>>       if (!netdev_get_features(dsp->ofport->netdev, &current, NULL, NULL, NULL)) {
>>           /* The values of ifDirection come from MAU MIB (RFC 2668): 0 = unknown,
>>              1 = full-duplex, 2 = half-duplex, 3 = in, 4=out */
>> -        counters->ifSpeed = netdev_features_to_bps(current, 0);
>>           counters->ifDirection = (netdev_features_is_full_duplex(current)
>>                                    ? 1 : 2);
>>       } else {
>> -        counters->ifSpeed = 100000000;
>>           counters->ifDirection = 0;
>>       }
>> +
>> +    if (!netdev_get_speed(dsp->ofport->netdev, &curr_speed, NULL)) {
>> +        counters->ifSpeed = curr_speed * 1000000;
>> +    } else {
>> +        counters->ifSpeed = 100000000;
>> +    }
>> +
>>       if (!netdev_get_flags(dsp->ofport->netdev, &flags) && flags & NETDEV_UP) {
>>           counters->ifStatus = 1; /* ifAdminStatus up. */
>>           if (netdev_get_carrier(dsp->ofport->netdev)) {
>> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
>> index 11cc0c6f6..2df210921 100644
>> --- a/ofproto/ofproto.c
>> +++ b/ofproto/ofproto.c
>> @@ -2478,6 +2478,7 @@ ofport_open(struct ofproto *ofproto,
>>   {
>>       enum netdev_flags flags;
>>       struct netdev *netdev;
>> +    uint32_t curr_speed, max_speed;
>>       int error;
> 
> ditto.
> 
>>   
>>       *p_netdev = NULL;
>> @@ -2514,8 +2515,9 @@ ofport_open(struct ofproto *ofproto,
>>       pp->state = netdev_get_carrier(netdev) ? 0 : OFPUTIL_PS_LINK_DOWN;
>>       netdev_get_features(netdev, &pp->curr, &pp->advertised,
>>                           &pp->supported, &pp->peer);
>> -    pp->curr_speed = netdev_features_to_bps(pp->curr, 0) / 1000;
>> -    pp->max_speed = netdev_features_to_bps(pp->supported, 0) / 1000;
>> +    netdev_get_speed(netdev, &curr_speed, &max_speed);
>> +    pp->curr_speed = curr_speed * 1000;
>> +    pp->max_speed = max_speed * 1000;
>>   
>>       *p_netdev = netdev;
>>       return 0;
>> diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
>> index f5dc59ad0..2059f7315 100644
>> --- a/vswitchd/bridge.c
>> +++ b/vswitchd/bridge.c
>> @@ -1694,11 +1694,12 @@ port_configure_stp(const struct ofproto *ofproto, struct port *port,
>>       if (config_str) {
>>           port_s->path_cost = strtoul(config_str, NULL, 10);
>>       } else {
>> -        enum netdev_features current;
>> -        unsigned int mbps;
>> +        uint32_t mbps;
>>   
>> -        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
>> -        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
>> +        netdev_get_speed(iface->netdev, &mbps, NULL);
>> +        if (mbps == 0) {
> 
> If (!mbps) ?
> 



>> +            mbps = NETDEV_DEFAULT_BPS / 1000000;
>> +        }
>>           port_s->path_cost = stp_convert_speed_to_cost(mbps);
>>       }
>>   
>> @@ -1777,11 +1778,12 @@ port_configure_rstp(const struct ofproto *ofproto, struct port *port,
>>       if (config_str) {
>>           port_s->path_cost = strtoul(config_str, NULL, 10);
>>       } else {
>> -        enum netdev_features current;
>> -        unsigned int mbps;
>> +        uint32_t mbps;
>>   
>> -        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
>> -        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
>> +        netdev_get_speed(iface->netdev, &mbps, NULL);
>> +        if (mbps == 0) {
>> +            mbps = NETDEV_DEFAULT_BPS / 1000000;
>> +        }
>>           port_s->path_cost = rstp_convert_speed_to_cost(mbps);
>>       }
>>   
>> @@ -2417,6 +2419,7 @@ iface_refresh_netdev_status(struct iface *iface)
>>       const char *link_state;
>>       struct eth_addr mac;
>>       int64_t bps, mtu_64, ifindex64, link_resets;
>> +    uint32_t mbps;
>>       int mtu, error;
> 
> Same here.
> 
>>   
>>       if (iface_is_synthetic(iface)) {
>> @@ -2456,14 +2459,19 @@ iface_refresh_netdev_status(struct iface *iface)
>>       ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
>>   
>>       error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
>> -    bps = !error ? netdev_features_to_bps(current, 0) : 0;
>> -    if (bps) {
>> +    if (!error) {
>>           ovsrec_interface_set_duplex(iface->cfg,
>>                                       netdev_features_is_full_duplex(current)
>>                                       ? "full" : "half");
>> -        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>>       } else {
>>           ovsrec_interface_set_duplex(iface->cfg, NULL);
>> +    }
>> +
>> +    error = netdev_get_speed(iface->netdev, &mbps, NULL);
>> +    if (!error) {
> 
> Should we check for error or for mbps being zero?
> 

I thought of showing zero for unknown addresses (i.e. no !error && !mbps) as 
ofproto layer also shows zero if the speed is unknown, but thinking again this 
changing the existing behavior which shows NULL link speed (i.e: []) in that 
case. I'll change it in the next version.

>> +        bps = (uint64_t) mbps * 1000000;
>> +        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
>> +    } else {
>>           ovsrec_interface_set_link_speed(iface->cfg, NULL, 0);
>>       }
>>   
>
Ilya Maximets May 5, 2023, 9:56 a.m. UTC | #5
On 5/5/23 10:04, Adrian Moreno wrote:
> 
> 
> On 5/4/23 17:52, Ilya Maximets wrote:
>> On 4/21/23 17:16, Adrian Moreno wrote:
>>> Currently, the netdev's speed is being calculated by taking the link's
>>> feature bits (using netdev_get_features()) and transforming them into
>>> bps.
>>>
>>> This mechanism can be both inaccurate and difficult to maintain, mainly
>>> because we currently use the feature bits supported by OpenFlow which
>>> would have to be extended to support all new feature bits of all netdev
>>> implementations while keeping the OpenFlow API intact.
>>>
>>> In order to expose the link speed accurately for all current and future
>>> hardware, add a new netdev API call that allows the implementations to
>>> provide the current and maximum link speeds in Mbps.
>>>
>>> Internally, the logic to get the maximum supported speed still relies on
>>> feature bits so it might still get out of sync in the future. However,
>>> the maximum configurable speed is not used as much as the current speed
>>> and these feature bits are not exposed through the netdev interface so
>>> it should be easier to add more.
>>>
>>> Use this new function instead of netdev_get_features() where the link
>>> speed is needed.
>>>
>>> As a consecuence of this patch, link speeds of cards is properly
>>> reported (internally in OVSDB) even if not supported by OpenFlow.
>>
>> Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
>> comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
>>
>> I'm not sure about this link.  The BZ talks about switching to
>> ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
>> predefined values.  Current patch is a step forward to be able
>> to support other link speeds, but IIUC, we will still not able
>> to detect 25 and 50 Gbps links in netdev-linux.
> 
> It's true that we will not be able to report a bit in the feature bitmask that indicates 25 or 50 Gbps. I think this is not a huge deal because we don't have those flags in the ofproto layer anyway.
> 
> However, with this patch netdev-linux should be able to read the speed directly from ethtool_cmd_speed() which uses a 32-bit field in Mbps. Using this speed directly (and not the bitmask) to calculate QoS maximum rates and even to report the speed in ofproto port information (OFP >= 1.1) should, IIUC, address the limitations reported in the BZ.

Oh, I see.  I missed the fact that ETHTOOL_GSET reports a numerical
value of the speed.

Is it possible to add a system test for this functionality? e.g. can
we change the speed of a tap or veth interface with ethtool to some
currently unsupported value (50G) ?

OTOH, ETHTOOL_GSET is still a deprecated interface, so we'll need to
eventually move to ETHTOOL_GLINKSETTINGS.  But that can be a separate
change.

Best regards, Ilya Maximets.
Adrian Moreno May 5, 2023, 10:39 a.m. UTC | #6
On 5/5/23 11:56, Ilya Maximets wrote:
> On 5/5/23 10:04, Adrian Moreno wrote:
>>
>>
>> On 5/4/23 17:52, Ilya Maximets wrote:
>>> On 4/21/23 17:16, Adrian Moreno wrote:
>>>> Currently, the netdev's speed is being calculated by taking the link's
>>>> feature bits (using netdev_get_features()) and transforming them into
>>>> bps.
>>>>
>>>> This mechanism can be both inaccurate and difficult to maintain, mainly
>>>> because we currently use the feature bits supported by OpenFlow which
>>>> would have to be extended to support all new feature bits of all netdev
>>>> implementations while keeping the OpenFlow API intact.
>>>>
>>>> In order to expose the link speed accurately for all current and future
>>>> hardware, add a new netdev API call that allows the implementations to
>>>> provide the current and maximum link speeds in Mbps.
>>>>
>>>> Internally, the logic to get the maximum supported speed still relies on
>>>> feature bits so it might still get out of sync in the future. However,
>>>> the maximum configurable speed is not used as much as the current speed
>>>> and these feature bits are not exposed through the netdev interface so
>>>> it should be easier to add more.
>>>>
>>>> Use this new function instead of netdev_get_features() where the link
>>>> speed is needed.
>>>>
>>>> As a consecuence of this patch, link speeds of cards is properly
>>>> reported (internally in OVSDB) even if not supported by OpenFlow.
>>>
>>> Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
>>> comments inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
>>>
>>> I'm not sure about this link.  The BZ talks about switching to
>>> ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
>>> predefined values.  Current patch is a step forward to be able
>>> to support other link speeds, but IIUC, we will still not able
>>> to detect 25 and 50 Gbps links in netdev-linux.
>>
>> It's true that we will not be able to report a bit in the feature bitmask that indicates 25 or 50 Gbps. I think this is not a huge deal because we don't have those flags in the ofproto layer anyway.
>>
>> However, with this patch netdev-linux should be able to read the speed directly from ethtool_cmd_speed() which uses a 32-bit field in Mbps. Using this speed directly (and not the bitmask) to calculate QoS maximum rates and even to report the speed in ofproto port information (OFP >= 1.1) should, IIUC, address the limitations reported in the BZ.
> 
> Oh, I see.  I missed the fact that ETHTOOL_GSET reports a numerical
> value of the speed.
> 
> Is it possible to add a system test for this functionality? e.g. can
> we change the speed of a tap or veth interface with ethtool to some
> currently unsupported value (50G) ?
> 
> OTOH, ETHTOOL_GSET is still a deprecated interface, so we'll need to
> eventually move to ETHTOOL_GLINKSETTINGS.  But that can be a separate
> change.
> 

That's true. However, there is a compatibility layer so even if new drivers 
don't implement the old driver API, it'll still be usable from uapi. What the 
new API definitely improves the amount (and extensibility) of feature bits.

But as long as the compatibility layer properly reports the numeric speed, we 
will be accurate in reporting the link speed (as well as calculating QoS max 
rates, etc).

Now, if we evaluate the benefit OVS will take from the new API, I think it won't 
be very big since OFP layer will not support the new feature bits. The only real 
benefit would be increased accuracy in the numeric maximum speed reported since 
the only way is to calculate this based on the feature bits. My feeling is that 
it's not that big of a benefit compared with the complexity of having to 
maintain both APIs, what do you think?


> Best regards, Ilya Maximets.
>
Ilya Maximets May 5, 2023, 10:54 a.m. UTC | #7
On 5/5/23 12:39, Adrian Moreno wrote:
> 
> 
> On 5/5/23 11:56, Ilya Maximets wrote:
>> On 5/5/23 10:04, Adrian Moreno wrote:
>>>
>>>
>>> On 5/4/23 17:52, Ilya Maximets wrote:
>>>> On 4/21/23 17:16, Adrian Moreno wrote:
>>>>> Currently, the netdev's speed is being calculated by taking the link's
>>>>> feature bits (using netdev_get_features()) and transforming them into
>>>>> bps.
>>>>>
>>>>> This mechanism can be both inaccurate and difficult to maintain, mainly
>>>>> because we currently use the feature bits supported by OpenFlow which
>>>>> would have to be extended to support all new feature bits of all netdev
>>>>> implementations while keeping the OpenFlow API intact.
>>>>>
>>>>> In order to expose the link speed accurately for all current and future
>>>>> hardware, add a new netdev API call that allows the implementations to
>>>>> provide the current and maximum link speeds in Mbps.
>>>>>
>>>>> Internally, the logic to get the maximum supported speed still relies on
>>>>> feature bits so it might still get out of sync in the future. However,
>>>>> the maximum configurable speed is not used as much as the current speed
>>>>> and these feature bits are not exposed through the netdev interface so
>>>>> it should be easier to add more.
>>>>>
>>>>> Use this new function instead of netdev_get_features() where the link
>>>>> speed is needed.
>>>>>
>>>>> As a consecuence of this patch, link speeds of cards is properly
>>>>> reported (internally in OVSDB) even if not supported by OpenFlow.
>>>>
>>>> Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
>>>> comments inline.
>>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>>>
>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
>>>>
>>>> I'm not sure about this link.  The BZ talks about switching to
>>>> ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
>>>> predefined values.  Current patch is a step forward to be able
>>>> to support other link speeds, but IIUC, we will still not able
>>>> to detect 25 and 50 Gbps links in netdev-linux.
>>>
>>> It's true that we will not be able to report a bit in the feature bitmask that indicates 25 or 50 Gbps. I think this is not a huge deal because we don't have those flags in the ofproto layer anyway.
>>>
>>> However, with this patch netdev-linux should be able to read the speed directly from ethtool_cmd_speed() which uses a 32-bit field in Mbps. Using this speed directly (and not the bitmask) to calculate QoS maximum rates and even to report the speed in ofproto port information (OFP >= 1.1) should, IIUC, address the limitations reported in the BZ.
>>
>> Oh, I see.  I missed the fact that ETHTOOL_GSET reports a numerical
>> value of the speed.
>>
>> Is it possible to add a system test for this functionality? e.g. can
>> we change the speed of a tap or veth interface with ethtool to some
>> currently unsupported value (50G) ?
>>
>> OTOH, ETHTOOL_GSET is still a deprecated interface, so we'll need to
>> eventually move to ETHTOOL_GLINKSETTINGS.  But that can be a separate
>> change.
>>
> 
> That's true. However, there is a compatibility layer so even if new drivers don't implement the old driver API, it'll still be usable from uapi. What the new API definitely improves the amount (and extensibility) of feature bits.
> 
> But as long as the compatibility layer properly reports the numeric speed, we will be accurate in reporting the link speed (as well as calculating QoS max rates, etc).
> 
> Now, if we evaluate the benefit OVS will take from the new API, I think it won't be very big since OFP layer will not support the new feature bits. The only real benefit would be increased accuracy in the numeric maximum speed reported since the only way is to calculate this based on the feature bits. My feeling is that it's not that big of a benefit compared with the complexity of having to maintain both APIs, what do you think?

I agree that there is no much benefit today.  And I would not expect the
interface to actually be removed any time soon, even though it's clearly
marked as deprecated.  We can revisit this later, if needed.

BTW, do we need a get_speed() implementation for netdev-bsd ?

Best regards, Ilya Maximets.
Adrian Moreno May 5, 2023, 11:42 a.m. UTC | #8
On 5/5/23 12:54, Ilya Maximets wrote:
> On 5/5/23 12:39, Adrian Moreno wrote:
>>
>>
>> On 5/5/23 11:56, Ilya Maximets wrote:
>>> On 5/5/23 10:04, Adrian Moreno wrote:
>>>>
>>>>
>>>> On 5/4/23 17:52, Ilya Maximets wrote:
>>>>> On 4/21/23 17:16, Adrian Moreno wrote:
>>>>>> Currently, the netdev's speed is being calculated by taking the link's
>>>>>> feature bits (using netdev_get_features()) and transforming them into
>>>>>> bps.
>>>>>>
>>>>>> This mechanism can be both inaccurate and difficult to maintain, mainly
>>>>>> because we currently use the feature bits supported by OpenFlow which
>>>>>> would have to be extended to support all new feature bits of all netdev
>>>>>> implementations while keeping the OpenFlow API intact.
>>>>>>
>>>>>> In order to expose the link speed accurately for all current and future
>>>>>> hardware, add a new netdev API call that allows the implementations to
>>>>>> provide the current and maximum link speeds in Mbps.
>>>>>>
>>>>>> Internally, the logic to get the maximum supported speed still relies on
>>>>>> feature bits so it might still get out of sync in the future. However,
>>>>>> the maximum configurable speed is not used as much as the current speed
>>>>>> and these feature bits are not exposed through the netdev interface so
>>>>>> it should be easier to add more.
>>>>>>
>>>>>> Use this new function instead of netdev_get_features() where the link
>>>>>> speed is needed.
>>>>>>
>>>>>> As a consecuence of this patch, link speeds of cards is properly
>>>>>> reported (internally in OVSDB) even if not supported by OpenFlow.
>>>>>
>>>>> Hi, Adrian.  Thanks for the set!  I didn't test it, but I have a few
>>>>> comments inline.
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>>>
>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2137567
>>>>>
>>>>> I'm not sure about this link.  The BZ talks about switching to
>>>>> ETHTOOL_GLINKSETTINGS interface to get speeds that do not have
>>>>> predefined values.  Current patch is a step forward to be able
>>>>> to support other link speeds, but IIUC, we will still not able
>>>>> to detect 25 and 50 Gbps links in netdev-linux.
>>>>
>>>> It's true that we will not be able to report a bit in the feature bitmask that indicates 25 or 50 Gbps. I think this is not a huge deal because we don't have those flags in the ofproto layer anyway.
>>>>
>>>> However, with this patch netdev-linux should be able to read the speed directly from ethtool_cmd_speed() which uses a 32-bit field in Mbps. Using this speed directly (and not the bitmask) to calculate QoS maximum rates and even to report the speed in ofproto port information (OFP >= 1.1) should, IIUC, address the limitations reported in the BZ.
>>>
>>> Oh, I see.  I missed the fact that ETHTOOL_GSET reports a numerical
>>> value of the speed.
>>>
>>> Is it possible to add a system test for this functionality? e.g. can
>>> we change the speed of a tap or veth interface with ethtool to some
>>> currently unsupported value (50G) ?
>>>
>>> OTOH, ETHTOOL_GSET is still a deprecated interface, so we'll need to
>>> eventually move to ETHTOOL_GLINKSETTINGS.  But that can be a separate
>>> change.
>>>
>>
>> That's true. However, there is a compatibility layer so even if new drivers don't implement the old driver API, it'll still be usable from uapi. What the new API definitely improves the amount (and extensibility) of feature bits.
>>
>> But as long as the compatibility layer properly reports the numeric speed, we will be accurate in reporting the link speed (as well as calculating QoS max rates, etc).
>>
>> Now, if we evaluate the benefit OVS will take from the new API, I think it won't be very big since OFP layer will not support the new feature bits. The only real benefit would be increased accuracy in the numeric maximum speed reported since the only way is to calculate this based on the feature bits. My feeling is that it's not that big of a benefit compared with the complexity of having to maintain both APIs, what do you think?
> 
> I agree that there is no much benefit today.  And I would not expect the
> interface to actually be removed any time soon, even though it's clearly
> marked as deprecated.  We can revisit this later, if needed.
> 
> BTW, do we need a get_speed() implementation for netdev-bsd ?
> 

Good point. I can add one that falls back to using feature flags to determine 
the speed as I don't see any way of getting the numeric value (I am no bsd 
expert so I might be missing something).

> Best regards, Ilya Maximets.
>
diff mbox series

Patch

diff --git a/include/openvswitch/netdev.h b/include/openvswitch/netdev.h
index cafd6fd7b..83e8633dd 100644
--- a/include/openvswitch/netdev.h
+++ b/include/openvswitch/netdev.h
@@ -132,6 +132,7 @@  int netdev_get_features(const struct netdev *,
                         enum netdev_features *advertised,
                         enum netdev_features *supported,
                         enum netdev_features *peer);
+int netdev_get_speed(const struct netdev *, uint32_t *current, uint32_t *max);
 uint64_t netdev_features_to_bps(enum netdev_features features,
                                 uint64_t default_bps);
 bool netdev_features_is_full_duplex(enum netdev_features features);
diff --git a/lib/dpif.h b/lib/dpif.h
index 129cbf6a1..9e9d0aa1b 100644
--- a/lib/dpif.h
+++ b/lib/dpif.h
@@ -91,7 +91,9 @@ 
  *
  *    - Carrier status (netdev_get_carrier()).
  *
- *    - Speed (netdev_get_features()).
+ *    - Link features (netdev_get_features()).
+ *
+ *    - Speed (netdev_get_speed()).
  *
  *    - QoS queue configuration (netdev_get_queue(), netdev_set_queue() and
  *      related functions.)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index fb0dd43f7..b136f25a4 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -3505,6 +3505,53 @@  netdev_dpdk_get_features(const struct netdev *netdev,
     return 0;
 }
 
+static int
+netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current,
+                      uint32_t *max)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct rte_eth_link link;
+    struct rte_eth_dev_info dev_info;
+
+    ovs_mutex_lock(&dev->mutex);
+    link = dev->link;
+    rte_eth_dev_info_get(dev->port_id, &dev_info);
+    ovs_mutex_unlock(&dev->mutex);
+
+    *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN
+        ? link.link_speed : 0;
+
+    if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_200G) {
+        *max = RTE_ETH_SPEED_NUM_200G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100G) {
+        *max = RTE_ETH_SPEED_NUM_100G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_56G) {
+        *max = RTE_ETH_SPEED_NUM_56G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_50G) {
+        *max = RTE_ETH_SPEED_NUM_50G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_20G) {
+        *max = RTE_ETH_SPEED_NUM_20G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10G) {
+        *max = RTE_ETH_SPEED_NUM_10G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_5G) {
+        *max = RTE_ETH_SPEED_NUM_5G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_2_5G) {
+        *max = RTE_ETH_SPEED_NUM_2_5G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_1G) {
+        *max = RTE_ETH_SPEED_NUM_1G;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M ||
+        dev_info.speed_capa & RTE_ETH_LINK_SPEED_100M_HD) {
+        *max = RTE_ETH_SPEED_NUM_100M;
+    } else if (dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M ||
+        dev_info.speed_capa & RTE_ETH_LINK_SPEED_10M_HD) {
+        *max = RTE_ETH_SPEED_NUM_10M;
+    } else {
+        *max = 0;
+    }
+
+    return 0;
+}
+
 static struct ingress_policer *
 netdev_dpdk_policer_construct(uint32_t rate, uint32_t burst)
 {
@@ -5809,6 +5856,7 @@  parse_vhost_config(const struct smap *ovs_other_config)
     .get_stats = netdev_dpdk_get_stats,                 \
     .get_custom_stats = netdev_dpdk_get_custom_stats,   \
     .get_features = netdev_dpdk_get_features,           \
+    .get_speed = netdev_dpdk_get_speed,                 \
     .get_status = netdev_dpdk_get_status,               \
     .reconfigure = netdev_dpdk_reconfigure,             \
     .rxq_recv = netdev_dpdk_rxq_recv
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index deb015bdb..39508090c 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -92,6 +92,7 @@  struct netdev_linux {
     enum netdev_features current;    /* Cached from ETHTOOL_GSET. */
     enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */
     enum netdev_features supported;  /* Cached from ETHTOOL_GSET. */
+    uint32_t speed;                  /* Cached from ETHTOOL_GSET. */
 
     struct ethtool_drvinfo drvinfo;  /* Cached from ETHTOOL_GDRVINFO. */
     struct tc *tc;
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 36620199e..576adbf3f 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2348,7 +2348,6 @@  static void
 netdev_linux_read_features(struct netdev_linux *netdev)
 {
     struct ethtool_cmd ecmd;
-    uint32_t speed;
     int error;
 
     if (netdev->cache_valid & VALID_FEATURES) {
@@ -2462,20 +2461,20 @@  netdev_linux_read_features(struct netdev_linux *netdev)
     }
 
     /* Current settings. */
-    speed = ethtool_cmd_speed(&ecmd);
-    if (speed == SPEED_10) {
+    netdev->speed = ethtool_cmd_speed(&ecmd);
+    if (netdev->speed == SPEED_10) {
         netdev->current = ecmd.duplex ? NETDEV_F_10MB_FD : NETDEV_F_10MB_HD;
-    } else if (speed == SPEED_100) {
+    } else if (netdev->speed == SPEED_100) {
         netdev->current = ecmd.duplex ? NETDEV_F_100MB_FD : NETDEV_F_100MB_HD;
-    } else if (speed == SPEED_1000) {
+    } else if (netdev->speed == SPEED_1000) {
         netdev->current = ecmd.duplex ? NETDEV_F_1GB_FD : NETDEV_F_1GB_HD;
-    } else if (speed == SPEED_10000) {
+    } else if (netdev->speed == SPEED_10000) {
         netdev->current = NETDEV_F_10GB_FD;
-    } else if (speed == 40000) {
+    } else if (netdev->speed == 40000) {
         netdev->current = NETDEV_F_40GB_FD;
-    } else if (speed == 100000) {
+    } else if (netdev->speed == 100000) {
         netdev->current = NETDEV_F_100GB_FD;
-    } else if (speed == 1000000) {
+    } else if (netdev->speed == 1000000) {
         netdev->current = NETDEV_F_1TB_FD;
     } else {
         netdev->current = 0;
@@ -2529,6 +2528,31 @@  exit:
     return error;
 }
 
+static int
+netdev_linux_get_speed(const struct netdev *netdev_, uint32_t *current,
+                       uint32_t *max)
+{
+    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+    int error;
+
+    ovs_mutex_lock(&netdev->mutex);
+    if (netdev_linux_netnsid_is_remote(netdev)) {
+        error = EOPNOTSUPP;
+        goto exit;
+    }
+
+    netdev_linux_read_features(netdev);
+    if (!netdev->get_features_error) {
+        *current = netdev->speed == SPEED_UNKNOWN ? 0 : netdev->speed;
+        *max = netdev_features_to_bps(netdev->supported, 0) / 1000000ULL;
+    }
+    error = netdev->get_features_error;
+
+exit:
+    ovs_mutex_unlock(&netdev->mutex);
+    return error;
+}
+
 /* Set the features advertised by 'netdev' to 'advertise'. */
 static int
 netdev_linux_set_advertisements(struct netdev *netdev_,
@@ -3655,6 +3679,7 @@  const struct netdev_class netdev_linux_class = {
     .destruct = netdev_linux_destruct,
     .get_stats = netdev_linux_get_stats,
     .get_features = netdev_linux_get_features,
+    .get_speed = netdev_linux_get_speed,
     .get_status = netdev_linux_get_status,
     .get_block_id = netdev_linux_get_block_id,
     .send = netdev_linux_send,
@@ -3671,6 +3696,7 @@  const struct netdev_class netdev_tap_class = {
     .destruct = netdev_linux_destruct,
     .get_stats = netdev_tap_get_stats,
     .get_features = netdev_linux_get_features,
+    .get_speed = netdev_linux_get_speed,
     .get_status = netdev_linux_get_status,
     .send = netdev_linux_send,
     .rxq_construct = netdev_linux_rxq_construct,
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index b5420947d..cf17aaea9 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -500,6 +500,15 @@  struct netdev_class {
                         enum netdev_features *supported,
                         enum netdev_features *peer);
 
+    /* Stores the current and maximum supported link speed by 'netdev' into
+     * each of '*current' and '*max'. Each value represents the speed in Mbps.
+     * If any of the speeds is unknown, a zero value must be returned.
+     *
+     * This function may be set to null if it would always return EOPNOTSUPP.
+     */
+    int (*get_speed)(const struct netdev *netdev, uint32_t *current,
+                     uint32_t *max);
+
     /* Set the features advertised by 'netdev' to 'advertise', which is a
      * set of NETDEV_F_* bits.
      *
diff --git a/lib/netdev.c b/lib/netdev.c
index c79778378..aea4e2d46 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -1167,6 +1167,29 @@  netdev_get_features(const struct netdev *netdev,
     return error;
 }
 
+int
+netdev_get_speed(const struct netdev *netdev, uint32_t *current, uint32_t *max)
+{
+    uint32_t current_dummy, max_dummy;
+    int error;
+
+    if (!current) {
+        current = &current_dummy;
+    }
+    if (!max) {
+        max = &max_dummy;
+    }
+
+    error = netdev->netdev_class->get_speed
+        ? netdev->netdev_class->get_speed(netdev, current, max)
+        : EOPNOTSUPP;
+
+    if (error) {
+        *current = *max = 0;
+    }
+    return error;
+}
+
 /* Returns the maximum speed of a network connection that has the NETDEV_F_*
  * bits in 'features', in bits per second.  If no bits that indicate a speed
  * are set in 'features', returns 'default_bps'. */
diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c
index a405eb056..7a8f259a2 100644
--- a/ofproto/ofproto-dpif-sflow.c
+++ b/ofproto/ofproto-dpif-sflow.c
@@ -307,6 +307,7 @@  sflow_agent_get_counters(void *ds_, SFLPoller *poller,
     enum netdev_flags flags;
     struct lacp_member_stats lacp_stats;
     const char *ifName;
+    uint32_t curr_speed;
 
     dsp = dpif_sflow_find_port(ds, u32_to_odp(poller->bridgePort));
     if (!dsp) {
@@ -320,13 +321,18 @@  sflow_agent_get_counters(void *ds_, SFLPoller *poller,
     if (!netdev_get_features(dsp->ofport->netdev, &current, NULL, NULL, NULL)) {
         /* The values of ifDirection come from MAU MIB (RFC 2668): 0 = unknown,
            1 = full-duplex, 2 = half-duplex, 3 = in, 4=out */
-        counters->ifSpeed = netdev_features_to_bps(current, 0);
         counters->ifDirection = (netdev_features_is_full_duplex(current)
                                  ? 1 : 2);
     } else {
-        counters->ifSpeed = 100000000;
         counters->ifDirection = 0;
     }
+
+    if (!netdev_get_speed(dsp->ofport->netdev, &curr_speed, NULL)) {
+        counters->ifSpeed = curr_speed * 1000000;
+    } else {
+        counters->ifSpeed = 100000000;
+    }
+
     if (!netdev_get_flags(dsp->ofport->netdev, &flags) && flags & NETDEV_UP) {
         counters->ifStatus = 1; /* ifAdminStatus up. */
         if (netdev_get_carrier(dsp->ofport->netdev)) {
diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
index 11cc0c6f6..2df210921 100644
--- a/ofproto/ofproto.c
+++ b/ofproto/ofproto.c
@@ -2478,6 +2478,7 @@  ofport_open(struct ofproto *ofproto,
 {
     enum netdev_flags flags;
     struct netdev *netdev;
+    uint32_t curr_speed, max_speed;
     int error;
 
     *p_netdev = NULL;
@@ -2514,8 +2515,9 @@  ofport_open(struct ofproto *ofproto,
     pp->state = netdev_get_carrier(netdev) ? 0 : OFPUTIL_PS_LINK_DOWN;
     netdev_get_features(netdev, &pp->curr, &pp->advertised,
                         &pp->supported, &pp->peer);
-    pp->curr_speed = netdev_features_to_bps(pp->curr, 0) / 1000;
-    pp->max_speed = netdev_features_to_bps(pp->supported, 0) / 1000;
+    netdev_get_speed(netdev, &curr_speed, &max_speed);
+    pp->curr_speed = curr_speed * 1000;
+    pp->max_speed = max_speed * 1000;
 
     *p_netdev = netdev;
     return 0;
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index f5dc59ad0..2059f7315 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -1694,11 +1694,12 @@  port_configure_stp(const struct ofproto *ofproto, struct port *port,
     if (config_str) {
         port_s->path_cost = strtoul(config_str, NULL, 10);
     } else {
-        enum netdev_features current;
-        unsigned int mbps;
+        uint32_t mbps;
 
-        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
-        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
+        netdev_get_speed(iface->netdev, &mbps, NULL);
+        if (mbps == 0) {
+            mbps = NETDEV_DEFAULT_BPS / 1000000;
+        }
         port_s->path_cost = stp_convert_speed_to_cost(mbps);
     }
 
@@ -1777,11 +1778,12 @@  port_configure_rstp(const struct ofproto *ofproto, struct port *port,
     if (config_str) {
         port_s->path_cost = strtoul(config_str, NULL, 10);
     } else {
-        enum netdev_features current;
-        unsigned int mbps;
+        uint32_t mbps;
 
-        netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
-        mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
+        netdev_get_speed(iface->netdev, &mbps, NULL);
+        if (mbps == 0) {
+            mbps = NETDEV_DEFAULT_BPS / 1000000;
+        }
         port_s->path_cost = rstp_convert_speed_to_cost(mbps);
     }
 
@@ -2417,6 +2419,7 @@  iface_refresh_netdev_status(struct iface *iface)
     const char *link_state;
     struct eth_addr mac;
     int64_t bps, mtu_64, ifindex64, link_resets;
+    uint32_t mbps;
     int mtu, error;
 
     if (iface_is_synthetic(iface)) {
@@ -2456,14 +2459,19 @@  iface_refresh_netdev_status(struct iface *iface)
     ovsrec_interface_set_link_resets(iface->cfg, &link_resets, 1);
 
     error = netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
-    bps = !error ? netdev_features_to_bps(current, 0) : 0;
-    if (bps) {
+    if (!error) {
         ovsrec_interface_set_duplex(iface->cfg,
                                     netdev_features_is_full_duplex(current)
                                     ? "full" : "half");
-        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
     } else {
         ovsrec_interface_set_duplex(iface->cfg, NULL);
+    }
+
+    error = netdev_get_speed(iface->netdev, &mbps, NULL);
+    if (!error) {
+        bps = (uint64_t) mbps * 1000000;
+        ovsrec_interface_set_link_speed(iface->cfg, &bps, 1);
+    } else {
         ovsrec_interface_set_link_speed(iface->cfg, NULL, 0);
     }