diff mbox series

[endianness,bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts

Message ID 20180805172238.GH15082@ZenIV.linux.org.uk
State Accepted, archived
Delegated to: David Miller
Headers show
Series [endianness,bug] cxgb4: mk_act_open_req() buggers ->{local,peer}_ip on big-endian hosts | expand

Commit Message

Al Viro Aug. 5, 2018, 5:22 p.m. UTC
Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
sets fs.val.{l,f}ip to net-endian values without conversion - they come
straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp.  So
the assignment in mk_act_open_req() ought to be a straigh copy.

	As far as I know, T4 PCIe cards do exist, so it's not as if that
thing could only be found on little-endian systems...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Rahul Lakkireddy Aug. 6, 2018, 12:15 p.m. UTC | #1
On Sunday, August 08/05/18, 2018 at 22:52:38 +0530, Al Viro wrote:
> 	Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
> sets fs.val.{l,f}ip to net-endian values without conversion - they come
> straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp.  So
> the assignment in mk_act_open_req() ought to be a straigh copy.
> 
> 	As far as I know, T4 PCIe cards do exist, so it's not as if that
> thing could only be found on little-endian systems...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> index 00fc5f1afb1d..7dddb9e748b8 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> @@ -1038,10 +1038,8 @@ static void mk_act_open_req(struct filter_entry *f, struct sk_buff *skb,
>  	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ACT_OPEN_REQ, qid_filterid));
>  	req->local_port = cpu_to_be16(f->fs.val.lport);
>  	req->peer_port = cpu_to_be16(f->fs.val.fport);
> -	req->local_ip = f->fs.val.lip[0] | f->fs.val.lip[1] << 8 |
> -		f->fs.val.lip[2] << 16 | f->fs.val.lip[3] << 24;
> -	req->peer_ip = f->fs.val.fip[0] | f->fs.val.fip[1] << 8 |
> -		f->fs.val.fip[2] << 16 | f->fs.val.fip[3] << 24;
> +	memcpy(&req->local_ip, f->fs.val.lip, 4);
> +	memcpy(&req->peer_ip, f->fs.val.fip, 4);
>  	req->opt0 = cpu_to_be64(NAGLE_V(f->fs.newvlan == VLAN_REMOVE ||
>  					f->fs.newvlan == VLAN_REWRITE) |
>  				DELACK_V(f->fs.hitcnts) |

Thanks for fix Al!

Acked-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
David Miller Aug. 7, 2018, 7:35 p.m. UTC | #2
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Sun, 5 Aug 2018 18:22:38 +0100

> 	Unlike fs.val.lport and fs.val.fport, cxgb4_process_flow_match()
> sets fs.val.{l,f}ip to net-endian values without conversion - they come
> straight from flow_dissector_key_ipv4_addrs ->dst and ->src resp.  So
> the assignment in mk_act_open_req() ought to be a straigh copy.
> 
> 	As far as I know, T4 PCIe cards do exist, so it's not as if that
> thing could only be found on little-endian systems...
> 
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Applied.
Al Viro Aug. 14, 2018, 9:25 p.m. UTC | #3
How can cxgb4/cxgb4_tc_flower.c handling of 16bit
fields possibly work on b-e?  Look:
        case TCA_PEDIT_KEY_EX_HDR_TYPE_TCP:
                switch (offset) {
                case PEDIT_TCP_SPORT_DPORT:
                        if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
                                offload_pedit(fs, cpu_to_be32(val) >> 16,
                                              cpu_to_be32(mask) >> 16,
                                              TCP_SPORT);

OK, we are feeding two results of >> 16 (i.e. the values in
range 0..65535 from the host POV) to offload_pedit().  Which does

static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
                          u8 field)
{
        u32 set_val = val & ~mask;

OK, it's a value in range 0..65535.

        u32 offset = 0;
        u8 size = 1;
        int i;

        for (i = 0; i < ARRAY_SIZE(pedits); i++) {
                if (pedits[i].field == field) {
go until we finally find this:
        PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
i.e.
        {TCP_SPORT, 2, offsetof(struct ch_filter_specification, nat_fport)}
                        offset = pedits[i].offset;
                        size = pedits[i].size;
... resulting in offset = offsetof(..., nat_fport), size = 2
                        break;
                }
        }
        memcpy((u8 *)fs + offset, &set_val, size);
... and we copy the first two bytes of set_val to fs->nat_fport, right?

On little-endian, assuming that val & 0xffff was 256 * V0 + V1 and
mask & 0xffff - 256 * M0 + M1, we get cpu_to_be32(val) >> 16 equal to
256 * V1 + V0, and similar for mask, resuling in set_val containing
{V0 & ~M0, V1 & ~M1, 0, 0}, with the first two bytes copied to fs->nat_fport.

Now, think what will happen on big-endian.  The value in set_val has upper
16 bits all zero, no matter what - shift anything 32bit down by 16 and you'll
get that.  And on big-endian that's first two bytes of memory representation,
so this memcpy() is absolutely guaranteed to set fs->nat_fport to zero.
No matter how fancy the hardware is, it can't guess what had the other two
bytes been - CPU has discarded those before the NIC had a chance to see
them.

Am I right assuming that the val is supposed to be {S1, S0, D1, D0},
with sport == S1 * 256 + S0, dport == D1 * 256 + D0?  If so, the following
ought to work [== COMPLETELY UNTESTED, in other words] on l-e same as the
current code does and do the right thing on b-e.  Objections?

offload_pedit() is broken for big-endian; it's actually easier to spell the
memcpy (and in case of ports - memcpy-with-byteswap) explicitly, avoiding
both the b-e problems and getting rid of a lot of LoC, including an unpleasant
macro.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 3db969eefba9..020ca0121fb4 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -43,27 +43,6 @@
 
 #define STATS_CHECK_PERIOD (HZ / 2)
 
-static struct ch_tc_pedit_fields pedits[] = {
-	PEDIT_FIELDS(ETH_, DMAC_31_0, 4, dmac, 0),
-	PEDIT_FIELDS(ETH_, DMAC_47_32, 2, dmac, 4),
-	PEDIT_FIELDS(ETH_, SMAC_15_0, 2, smac, 0),
-	PEDIT_FIELDS(ETH_, SMAC_47_16, 4, smac, 2),
-	PEDIT_FIELDS(IP4_, SRC, 4, nat_fip, 0),
-	PEDIT_FIELDS(IP4_, DST, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, SRC_31_0, 4, nat_fip, 0),
-	PEDIT_FIELDS(IP6_, SRC_63_32, 4, nat_fip, 4),
-	PEDIT_FIELDS(IP6_, SRC_95_64, 4, nat_fip, 8),
-	PEDIT_FIELDS(IP6_, SRC_127_96, 4, nat_fip, 12),
-	PEDIT_FIELDS(IP6_, DST_31_0, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, DST_63_32, 4, nat_lip, 4),
-	PEDIT_FIELDS(IP6_, DST_95_64, 4, nat_lip, 8),
-	PEDIT_FIELDS(IP6_, DST_127_96, 4, nat_lip, 12),
-	PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
-	PEDIT_FIELDS(TCP_, DPORT, 2, nat_lport, 0),
-	PEDIT_FIELDS(UDP_, SPORT, 2, nat_fport, 0),
-	PEDIT_FIELDS(UDP_, DPORT, 2, nat_lport, 0),
-};
-
 static struct ch_tc_flower_entry *allocate_flower_entry(void)
 {
 	struct ch_tc_flower_entry *new = kzalloc(sizeof(*new), GFP_KERNEL);
@@ -306,81 +285,63 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 	return 0;
 }
 
-static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
-			  u8 field)
-{
-	u32 set_val = val & ~mask;
-	u32 offset = 0;
-	u8 size = 1;
-	int i;
-
-	for (i = 0; i < ARRAY_SIZE(pedits); i++) {
-		if (pedits[i].field == field) {
-			offset = pedits[i].offset;
-			size = pedits[i].size;
-			break;
-		}
-	}
-	memcpy((u8 *)fs + offset, &set_val, size);
-}
-
-static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
-				u32 mask, u32 offset, u8 htype)
+static void process_pedit_field(struct ch_filter_specification *fs, __be32 val,
+				__be32 mask, u32 offset, u8 htype)
 {
+	val &= ~mask;
 	switch (htype) {
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_ETH:
 		switch (offset) {
 		case PEDIT_ETH_DMAC_31_0:
 			fs->newdmac = 1;
-			offload_pedit(fs, val, mask, ETH_DMAC_31_0);
+			memcpy(fs->dmac, &val, 4);
 			break;
 		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
 			if (~mask & PEDIT_ETH_DMAC_MASK)
-				offload_pedit(fs, val, mask, ETH_DMAC_47_32);
+				memcpy(fs->dmac + 4, &val, 2);
 			else
-				offload_pedit(fs, val >> 16, mask >> 16,
-					      ETH_SMAC_15_0);
+				memcpy(fs->smac, (__be16 *)&val + 1, 2);
 			break;
 		case PEDIT_ETH_SMAC_47_16:
 			fs->newsmac = 1;
-			offload_pedit(fs, val, mask, ETH_SMAC_47_16);
+			memcpy(fs->smac + 2, &val, 4);
 		}
 		break;
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_IP4:
 		switch (offset) {
 		case PEDIT_IP4_SRC:
-			offload_pedit(fs, val, mask, IP4_SRC);
+			memcpy(fs->nat_fip, &val, 4);
 			break;
 		case PEDIT_IP4_DST:
-			offload_pedit(fs, val, mask, IP4_DST);
+			memcpy(fs->nat_lip, &val, 4);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case TCA_PEDIT_KEY_EX_HDR_TYPE_IP6:
 		switch (offset) {
 		case PEDIT_IP6_SRC_31_0:
-			offload_pedit(fs, val, mask, IP6_SRC_31_0);
+			memcpy(fs->nat_fip, &val, 4);
 			break;
 		case PEDIT_IP6_SRC_63_32:
-			offload_pedit(fs, val, mask, IP6_SRC_63_32);
+			memcpy(fs->nat_fip + 4, &val, 4);
 			break;
 		case PEDIT_IP6_SRC_95_64:
-			offload_pedit(fs, val, mask, IP6_SRC_95_64);
+			memcpy(fs->nat_fip + 8, &val, 4);
 			break;
 		case PEDIT_IP6_SRC_127_96:
-			offload_pedit(fs, val, mask, IP6_SRC_127_96);
+			memcpy(fs->nat_fip + 12, &val, 4);
 			break;
 		case PEDIT_IP6_DST_31_0:
-			offload_pedit(fs, val, mask, IP6_DST_31_0);
+			memcpy(fs->nat_lip, &val, 4);
 			break;
 		case PEDIT_IP6_DST_63_32:
-			offload_pedit(fs, val, mask, IP6_DST_63_32);
+			memcpy(fs->nat_lip + 4, &val, 4);
 			break;
 		case PEDIT_IP6_DST_95_64:
-			offload_pedit(fs, val, mask, IP6_DST_95_64);
+			memcpy(fs->nat_lip + 8, &val, 4);
 			break;
 		case PEDIT_IP6_DST_127_96:
-			offload_pedit(fs, val, mask, IP6_DST_127_96);
+			memcpy(fs->nat_lip + 12, &val, 4);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
@@ -388,12 +349,9 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 		switch (offset) {
 		case PEDIT_TCP_SPORT_DPORT:
 			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      TCP_SPORT);
+				fs->nat_fport = be16_to_cpup((__be16 *)&val);
 			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), TCP_DPORT);
+				fs->nat_lport = be16_to_cpup((__be16 *)&val + 1);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
@@ -401,12 +359,9 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 		switch (offset) {
 		case PEDIT_UDP_SPORT_DPORT:
 			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      UDP_SPORT);
+				fs->nat_fport = be16_to_cpup((__be16 *)&val);
 			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), UDP_DPORT);
+				fs->nat_lport = be16_to_cpup((__be16 *)&val + 1);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 	}
@@ -453,7 +408,8 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 				break;
 			}
 		} else if (is_tcf_pedit(a)) {
-			u32 mask, val, offset;
+			__be32 mask, val;
+			u32 offset;
 			int nkeys, i;
 			u8 htype;
 
@@ -471,23 +427,18 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 	}
 }
 
-static bool valid_l4_mask(u32 mask)
+static bool valid_l4_mask(__be32 mask)
 {
-	u16 hi, lo;
-
-	/* Either the upper 16-bits (SPORT) OR the lower
-	 * 16-bits (DPORT) can be set, but NOT BOTH.
+	/* Either the SPORT OR DPORT can be set, but NOT BOTH.
 	 */
-	hi = (mask >> 16) & 0xFFFF;
-	lo = mask & 0xFFFF;
-
-	return hi && lo ? false : true;
+	return !(mask && htonl(0xffff)) || !(mask & htonl(0xffff0000));
 }
 
 static bool valid_pedit_action(struct net_device *dev,
 			       const struct tc_action *a)
 {
-	u32 mask, offset;
+	__be32 mask;
+	u32 offset;
 	u8 cmd, htype;
 	int nkeys, i;
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
index 050c8a50ae41..4da5267726a9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
@@ -54,44 +54,8 @@ struct ch_tc_flower_entry {
 	u32 filter_id;
 };
 
-enum {
-	ETH_DMAC_31_0,	/* dmac bits 0.. 31 */
-	ETH_DMAC_47_32,	/* dmac bits 32..47 */
-	ETH_SMAC_15_0,	/* smac bits 0.. 15 */
-	ETH_SMAC_47_16,	/* smac bits 16..47 */
-
-	IP4_SRC,	/* 32-bit IPv4 src  */
-	IP4_DST,	/* 32-bit IPv4 dst  */
-
-	IP6_SRC_31_0,	/* src bits 0..  31 */
-	IP6_SRC_63_32,	/* src bits 63.. 32 */
-	IP6_SRC_95_64,	/* src bits 95.. 64 */
-	IP6_SRC_127_96,	/* src bits 127..96 */
-
-	IP6_DST_31_0,	/* dst bits 0..  31 */
-	IP6_DST_63_32,	/* dst bits 63.. 32 */
-	IP6_DST_95_64,	/* dst bits 95.. 64 */
-	IP6_DST_127_96,	/* dst bits 127..96 */
-
-	TCP_SPORT,	/* 16-bit TCP sport */
-	TCP_DPORT,	/* 16-bit TCP dport */
-
-	UDP_SPORT,	/* 16-bit UDP sport */
-	UDP_DPORT,	/* 16-bit UDP dport */
-};
-
-struct ch_tc_pedit_fields {
-	u8 field;
-	u8 size;
-	u32 offset;
-};
-
-#define PEDIT_FIELDS(type, field, size, fs_field, offset) \
-	{ type## field, size, \
-		offsetof(struct ch_filter_specification, fs_field) + (offset) }
-
-#define PEDIT_ETH_DMAC_MASK		0xffff
-#define PEDIT_TCP_UDP_SPORT_MASK	0xffff
+#define PEDIT_ETH_DMAC_MASK		htonl(0xffff0000)
+#define PEDIT_TCP_UDP_SPORT_MASK	htonl(0xffff0000)
 #define PEDIT_ETH_DMAC_31_0		0x0
 #define PEDIT_ETH_DMAC_47_32_SMAC_15_0	0x4
 #define PEDIT_ETH_SMAC_47_16		0x8
Ganesh Goudar Aug. 17, 2018, 1:05 p.m. UTC | #4
Thanks, Al. The patch looks good to me but it does not seem
to be showing up in patchwork, should I resend the patch on
your behalf to net tree ?

Ganesh
Al Viro Aug. 17, 2018, 3:43 p.m. UTC | #5
On Fri, Aug 17, 2018 at 06:35:41PM +0530, Ganesh Goudar wrote:
> Thanks, Al. The patch looks good to me but it does not seem
> to be showing up in patchwork, should I resend the patch on
> your behalf to net tree ?

Umm...  I thought net-next had been closed until -rc1, hadn't
it?

Anyway, endianness cleanups and fixes of drivers/net/ethernet/chelsio
can be found in vfs.git #net-endian.chelsio; I was planning to post
that stuff on netdev after -rc1, but I would certainly appreciate
a look from somebody familiar with the hardware prior to that, assuming
you have time for that at the moment...  The stuff in there (it's
based off net/master):
      struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian
      cxgb4: fix TC-PEDIT-related parts of cxgb4_tc_flower on big-endian
      cxgb4_tc_u32: trivial endianness annotations
      cxgb4: trivial endianness annotations
      libcxgb: trivial __percpu annotations
      libcxgb: trivial endianness annotations
      cxgb3: trivial endianness annotations
      cxgb3: don't get cute with !! and shifts in t3_prep_adapter()...
      [investigate][endianness bug] cxgb3: assigning htonl(11-bit value) to __be16 field is wrong
      cxgb: trivial endianness annotations

The first two are fixes (the second being the patch you've just replied to),
next-to-last might or might not be (see "[RFC] weirdness in
cxgb3_main.c:init_tp_parity()" posted to netdev a couple of weeks
ago), the rest is pure annotations.  Result is not entirely sparse-clean, but
fairly close to that:

drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67: warning: incorrect type in argument 3 (different base types)
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67:    expected restricted __le32 [usertype] data
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:681:67:    got int
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31:    expected unsigned int [unsigned] [usertype] <noident>
drivers/net/ethernet/chelsio/cxgb3/t3_hw.c:898:31:    got restricted __be32 [usertype] <noident>
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43:    expected restricted __wsum [usertype] csum
drivers/net/ethernet/chelsio/cxgb3/sge.c:2435:43:    got restricted __be32 [assigned] [usertype] rss_hi
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47:    expected unsigned int [unsigned] [usertype] priority
drivers/net/ethernet/chelsio/cxgb3/sge.c:2436:47:    got restricted __be32 [assigned] [usertype] rss_lo
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38:    expected restricted __be32 [usertype] mask
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:419:38:    got unsigned int
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37:    expected restricted __be32 [usertype] val
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:420:37:    got unsigned int
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22: warning: incorrect type in assignment (different base types)
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22:    expected restricted __be32 [usertype] mask
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c:449:22:    got unsigned int

Remaining tc_flower warnings are misannotated tcf_pedit_mask()/tcf_pedit_val()
(both are actually returning __be32)

sge.c ones are
                                /* Preserve the RSS info in csum & priority */
                                skb->csum = rss_hi;
                                skb->priority = rss_lo;
and I've no idea whether it's a problem or not - ->csum is almost
certainly being abused there, and it looks like ->priority also
is...  I don't know - it's certainly used as host-endian in the
same driver (see e.g. queue_set() and is_ctrl_pkt()), but those are
on TX path and this is RX...  I'm not familiar enough with the
code to tell what's going on here - what *is* the consumer of
the data stored there?  cxgb3_offload.c get_hwtid() and get_opcode()?
If so (and if that's all there is), why not simply something like
	skb->priority = (ntohl(rss_lo) & 0xffff00) | G_OPCODE(ntohl(rss_hi));
before passing skb to rx_offload(), with
static inline u32 get_hwtid(struct sk_buff *skb)
{
        return skb->priority >> 8;
}
static inline u32 get_opcode(struct sk_buff *skb)
{
        return skb->priority & 0xff;
}
and don't bother touching ->csum...  Again, I'm not familiar with hardware
*or* the driver; I've no idea how much is e.g. shared with the infiniband
or scsi sides of the things (hadn't even looked there).  So this one is
up to somebody familiar with the design of that thing.

t3_hw.c: the first one is
        return t3_seeprom_write(adapter, EEPROM_STAT_ADDR, enable ? 0xc : 0);
which looks bloody odd, seeing that t3_seeprom_write() expects __le32 as
the last argument and appears to be serious about that -
        pci_write_config_dword(adapter->pdev, base + PCI_VPD_DATA,
                               le32_to_cpu(data));
is where that 0 or 0xc ends up.  Might or might not be a bug, but I'd
suggest triggering that sucker on big-endian host and checking whether
it works there...

The second in t3_hw.c...  I'd be tempted to make sf1_read() to do
htonl() before storing the result, adjusted the callers (i.e.
                if (!(status & htonl(1)))
in flash_wait_op(), no byteswaps in t3_read_flash()) and
had t3_read_flash() do explicit ntohl() on the value read before
storing it in *vers, but that's cosmetical anyway - making it
easier for sparse (and human readers) to keep track of what's
host- and what's net-endian; no real bug there.

Said that, there's another thing worth looking into - uses of __force
in the driver(s).  I mean, look at
        union {
                u32 word;
                char byte[4];
        } last;
        unsigned char *bp;
        int i;

        if (dir == T4_MEMORY_READ) {
                last.word = le32_to_cpu((__force __le32)
                                        t4_read_reg(adap, addr));
                for (bp = (unsigned char *)buf, i = off; i < 4; i++)
                        bp[i] = last.byte[i];

That "__force __le32" crap is a plain and simple result of confusion -
the code clearly intends last.word to be fixed-endian, since it then
proceeds to access individual bytes via aliasing.  IOW, this
le32_to_cpu() is a misspelled cpu_to_le32(), and the force-cast
is an attempt to confuse sparse into not noticing the misannotation.
All too successful attempt, at that - the other half
        } else {
                last.word = *buf;
                for (i = off; i < 4; i++)
                        last.byte[i] = 0;
                t4_write_reg(adap, addr,
                             (__force u32)cpu_to_le32(last.word));
        }
also wants last.word fixed-endian, which makes the first assignment
very dubious - buf is a pointer to _byte_, so that makes very little
sense.  I very much doubt that this thing (t4_memory_rw_residual())
is correct - e.g. if len = 4n + 2 in "write" t4_memory_rw(), the
first 4n bytes will be fed to hardware and then
	* on little-endian hbuf[4n] will be padded with 3 zeroes and
passed to t4_write_reg(), completely ignoring hbuf[4n+1], while
	* on big-endian zero will be passed to t4_write_reg(),
completely ignoring both hbuf[4n] and hbuf[4n+1].
Looks rather odd, doesn't it?

It smells like
	__le32 v = 0;
	memcpy(&v, buf, off);
	t4_write_reg(adap, addr, le32_to_cpu(v));
on the write side.  And on the read side...  Are you sure you want
to copy 4 - off bytes?  After all, if it was a *read* for 4n+1 bytes,
we'll copy the first 4n bytes from the hardware, then call that thing
with off = 1.  Presumably wanting to copy one more byte, not three...
Should that be
	__le32 v = cpu_to_le32(t4_read_reg(adap, addr));
	memcpy(buf, &v, off);
instead?

Looking at the callers, I'd say that t4_memory_rw() ought to declare
buf as __le32 *, and have
                if (dir == T4_MEMORY_READ)
                        *buf++ = cpu_to_le32(t4_read_reg(adap,
                                                mem_base + offset));
                else
                        t4_write_reg(adap, mem_base + offset,
                                     le32_to_cpu(*buf++));
                offset += sizeof(__le32);
                len -= sizeof(__le32);
and to hell with force-casts in there (BTW, where has __be32 come
from in the mainline?  It's only in sizeof, but still...).

And cudbg_memory_read() looks somewhat fishy in one more way -
you are doing 64bit host memory stores to pointer that is only
32bit-aligned.  Sure, x86 can cope, but what about e.g. sparc64?
And what of the PCIe side of that - what happens if your 64bit
read straddles the aperture window boundary?  You only check that
addr is 32bit-aligned, so what happens if you set it 4 bytes
below the window end and ask to read you 8 bytes?
David Miller Aug. 17, 2018, 4:34 p.m. UTC | #6
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Fri, 17 Aug 2018 16:43:20 +0100

> Umm...  I thought net-next had been closed until -rc1, hadn't
> it?

That's correct.
Al Viro Aug. 17, 2018, 6:09 p.m. UTC | #7
On Fri, Aug 17, 2018 at 04:43:20PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 06:35:41PM +0530, Ganesh Goudar wrote:
> > Thanks, Al. The patch looks good to me but it does not seem
> > to be showing up in patchwork, should I resend the patch on
> > your behalf to net tree ?
> 
> Umm...  I thought net-next had been closed until -rc1, hadn't
> it?
> 
> Anyway, endianness cleanups and fixes of drivers/net/ethernet/chelsio
> can be found in vfs.git #net-endian.chelsio; I was planning to post
> that stuff on netdev after -rc1, but I would certainly appreciate
> a look from somebody familiar with the hardware prior to that, assuming
> you have time for that at the moment...  The stuff in there (it's
> based off net/master):
>       struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian
>       cxgb4: fix TC-PEDIT-related parts of cxgb4_tc_flower on big-endian
>       cxgb4_tc_u32: trivial endianness annotations
>       cxgb4: trivial endianness annotations
>       libcxgb: trivial __percpu annotations
>       libcxgb: trivial endianness annotations
>       cxgb3: trivial endianness annotations
>       cxgb3: don't get cute with !! and shifts in t3_prep_adapter()...
>       [investigate][endianness bug] cxgb3: assigning htonl(11-bit value) to __be16 field is wrong
>       cxgb: trivial endianness annotations

... and updated for some of today's catch (== stuff caught while finding the
reply).  Another very likely bug hidden by force-casts: in t4_fwcache()
you put host-endian 0 or 1 into __be32 field, hiding that by force-cast.
Then feed that to hardware, which, judging by everything else nearby
expects big-endian there.  The only reason it works, AFAICS, is that you
only pass it FW_PARAM_DEV_FWCACHE_FLUSH (== 0); as soon as you get
a caller passing FW_PARAM_DEV_FWCACHE_FLUSHINV, you'll get breakage.

And frankly, comments like
                        while (count) {
                                /* the (__force u64) is because the compiler
                                 * doesn't understand the endian swizzling
                                 * going on
                                 */
                                writeq((__force u64)*src, dst);
                                src++;
                                dst++;
                                count--;
                        }
are more than slightly terrifying - piss on compiler, what about reviewers?  And
the authors themselves, for that matter...  FWIW, see the dmr's comments on
"you are not expected to understand that" story (bell labs site is buggered,
but wayback machine has it on
https://web.archive.org/web/20140724213028/http://cm.bell-labs.com/cm/cs/who/dmr/odd.html)
Especially the "The real problem is that we didn't understand what was going on either"
part...

Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
I understand about PCI (which matches just fine to the comments in the same driver),
you probably do need that.  Again, the only real way to find out is to test on
big-endian host...
Al Viro Aug. 17, 2018, 6:58 p.m. UTC | #8
On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:

> Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
> I understand about PCI (which matches just fine to the comments in the same driver),
> you probably do need that.  Again, the only real way to find out is to test on
> big-endian host...

BTW, would that, by any chance, be an open-coded
	_iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))
Al Viro Aug. 17, 2018, 6:59 p.m. UTC | #9
On Fri, Aug 17, 2018 at 07:58:41PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:
> 
> > Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
> > I understand about PCI (which matches just fine to the comments in the same driver),
> > you probably do need that.  Again, the only real way to find out is to test on
> > big-endian host...
> 
> BTW, would that, by any chance, be an open-coded
> 	_iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))

__iowrite64_copy, even...
Al Viro Aug. 17, 2018, 7:44 p.m. UTC | #10
On Fri, Aug 17, 2018 at 07:59:44PM +0100, Al Viro wrote:
> On Fri, Aug 17, 2018 at 07:58:41PM +0100, Al Viro wrote:
> > On Fri, Aug 17, 2018 at 07:09:49PM +0100, Al Viro wrote:
> > 
> > > Re that code - are you sure it doesn't need le64_to_cpu(*src)?  Because from what
> > > I understand about PCI (which matches just fine to the comments in the same driver),
> > > you probably do need that.  Again, the only real way to find out is to test on
> > > big-endian host...
> > 
> > BTW, would that, by any chance, be an open-coded
> > 	_iowrite64_copy(dst, src, EQ_UNIT/sizeof(u64))
> 
> __iowrite64_copy, even...

FWIW, it looks like the confusion had been between the endianness of the data structures
(b-e both on host and NIC side) and the fact that PCI is l-e.  *IF* that code wants to
copy data from host data structures to iomem as-is, it needs to use __raw_writeq() and
its ilk or writeq(le64_to_cpu(...)) to compensate.  The latter would, indeed, confuse
sparse - we are accessing b-e data as if it was l-e.

If we want copying that wouldn't affect the endianness, we need memcpy_toio() or similar
beasts.  And AFAICS that code is very close to
                /* If we're only writing a single Egress Unit and the BAR2
                 * Queue ID is 0, we can use the Write Combining Doorbell
                 * Gather Buffer; otherwise we use the simple doorbell.
                 */
                if (n == 1 && tq->bar2_qid == 0) {
                        unsigned int index = (tq->pidx ?: tq->size) - 1;
                        /* Copy the TX Descriptor in a tight loop in order to
                         * try to get it to the adapter in a single Write
                         * Combined transfer on the PCI-E Bus.  If the Write
                         * Combine fails (say because of an interrupt, etc.)
                         * the hardware will simply take the last write as a
                         * simple doorbell write with a PIDX Increment of 1
                         * and will fetch the TX Descriptor from memory via
                         * DMA.
                         */
			__iowrite64_copy(tq->bar2_addr + SGE_UDB_WCDOORBELL,
					 &tq->desc[index], EQ_UNIT/sizeof(u64))
		} else {
                        writel(val | QID_V(tq->bar2_qid),
                               tq->bar2_addr + SGE_UDB_KDOORBELL);
		}
                /* This Write Memory Barrier will force the write to the User
                 * Doorbell area to be flushed.  This is needed to prevent
                 * writes on different CPUs for the same queue from hitting
                 * the adapter out of order.  This is required when some Work
                 * Requests take the Write Combine Gather Buffer path (user
                 * doorbell area offset [SGE_UDB_WCDOORBELL..+63]) and some
                 * take the traditional path where we simply increment the
                 * PIDX (User Doorbell area SGE_UDB_KDOORBELL) and have the
                 * hardware DMA read the actual Work Request.
                 */
                wmb();

which wouldn't have looked unusual...  Again, that really needs review from
the folks familiar with the hardware in question, as well as testing - I'm
not trying to push patches like that.  If the current mainline variant
really works on b-e, I'd like to understand how does it manage that, though...
Ganesh Goudar Aug. 22, 2018, 10:29 a.m. UTC | #11
Hi Al,

All the issues you have mentioned make sense to me, I will fix them
and try to have them tested on big endian machine.
I got all the patches from net-endian.chelsio they all look good,
But I am yet to go through
(struct cxgb4_next_header .match_val/.match_mask/mask should be net-endian).

regarding "le64_to_cpu(*src)", I think we have not tested our vf driver on
a big endian machine. will address this as well.

Thanks
Ganesh
diff mbox series

Patch

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 00fc5f1afb1d..7dddb9e748b8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -1038,10 +1038,8 @@  static void mk_act_open_req(struct filter_entry *f, struct sk_buff *skb,
 	OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ACT_OPEN_REQ, qid_filterid));
 	req->local_port = cpu_to_be16(f->fs.val.lport);
 	req->peer_port = cpu_to_be16(f->fs.val.fport);
-	req->local_ip = f->fs.val.lip[0] | f->fs.val.lip[1] << 8 |
-		f->fs.val.lip[2] << 16 | f->fs.val.lip[3] << 24;
-	req->peer_ip = f->fs.val.fip[0] | f->fs.val.fip[1] << 8 |
-		f->fs.val.fip[2] << 16 | f->fs.val.fip[3] << 24;
+	memcpy(&req->local_ip, f->fs.val.lip, 4);
+	memcpy(&req->peer_ip, f->fs.val.fip, 4);
 	req->opt0 = cpu_to_be64(NAGLE_V(f->fs.newvlan == VLAN_REMOVE ||
 					f->fs.newvlan == VLAN_REWRITE) |
 				DELACK_V(f->fs.hitcnts) |