From patchwork Tue Jun 19 17:32:49 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Kumar X-Patchwork-Id: 931779 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=vmware.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 419FSQ133Rz9s4w for ; Wed, 20 Jun 2018 03:33:58 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 8A25E111B; Tue, 19 Jun 2018 17:32:59 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 36D79110A for ; Tue, 19 Jun 2018 17:32:57 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id AB72C13E for ; Tue, 19 Jun 2018 17:32:55 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 19 Jun 2018 10:32:51 -0700 Received: from WIN-ANAND1.vmware.com (win-anand1.prom.eng.vmware.com [10.33.79.212]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 70D5A40EF7; Tue, 19 Jun 2018 10:32:54 -0700 (PDT) From: Anand Kumar To: Date: Tue, 19 Jun 2018 10:32:49 -0700 Message-ID: <20180619173251.6764-2-kumaranand@vmware.com> X-Mailer: git-send-email 2.9.3.windows.1 In-Reply-To: <20180619173251.6764-1-kumaranand@vmware.com> References: <20180619173251.6764-1-kumaranand@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: kumaranand@vmware.com does not designate permitted sender hosts) X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCH v4 1/3] datapath-windows: Use spinlock instead of RW lock for ct entry X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org This patch mainly changes a ndis RW lock for conntrack entry to a spinlock along with some minor refactor in conntrack. Using spinlock instead of RW lock as RW locks causes performance hits when acquired/released multiple times. - Use NdisInterlockedXX wrapper api's instead of InterlockedXX. - Update 'ctTotalRelatedEntries' using interlocked functions. - Move conntrack lock out of NAT module. Testing: Verified loading/unloading the driver with driver verified enabled. Ran TCP/UDP and ICMP traffic. Signed-off-by: Anand Kumar Acked-by: Alin Gabriel Serdean --- v1->v2: Calculate the dispatch level only in cases where the locks are being acquired multiple times within a given context and minor style change. v2->v3: Fix kernel crash while executing cleanup thread in conntrack-related v3->v4: Fix a bug found through code analysis --- datapath-windows/ovsext/Conntrack-nat.c | 7 +- datapath-windows/ovsext/Conntrack-related.c | 21 ++--- datapath-windows/ovsext/Conntrack.c | 135 ++++++++++++++-------------- datapath-windows/ovsext/Conntrack.h | 2 +- datapath-windows/ovsext/Util.h | 18 ++++ 5 files changed, 96 insertions(+), 87 deletions(-) diff --git a/datapath-windows/ovsext/Conntrack-nat.c b/datapath-windows/ovsext/Conntrack-nat.c index 316c946..da1814f 100644 --- a/datapath-windows/ovsext/Conntrack-nat.c +++ b/datapath-windows/ovsext/Conntrack-nat.c @@ -167,16 +167,13 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx, { UINT32 natFlag; const struct ct_endpoint* endpoint; - LOCK_STATE_EX lockState; - /* XXX: Move conntrack locks out of NAT after implementing lock in NAT. */ - NdisAcquireRWLockRead(entry->lock, &lockState, 0); + /* When it is NAT, only entry->rev_key contains NATTED address; When it is unNAT, only entry->key contains the UNNATTED address;*/ const OVS_CT_KEY *ctKey = reverse ? &entry->key : &entry->rev_key; BOOLEAN isSrcNat; if (!(natAction & (NAT_ACTION_SRC | NAT_ACTION_DST))) { - NdisReleaseRWLock(entry->lock, &lockState); return; } isSrcNat = (((natAction & NAT_ACTION_SRC) && !reverse) || @@ -206,7 +203,6 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx, } } else if (ctKey->dl_type == htons(ETH_TYPE_IPV6)){ // XXX: IPv6 packet not supported yet. - NdisReleaseRWLock(entry->lock, &lockState); return; } if (natAction & (NAT_ACTION_SRC_PORT | NAT_ACTION_DST_PORT)) { @@ -220,7 +216,6 @@ OvsNatPacket(OvsForwardingContext *ovsFwdCtx, } } } - NdisReleaseRWLock(entry->lock, &lockState); } diff --git a/datapath-windows/ovsext/Conntrack-related.c b/datapath-windows/ovsext/Conntrack-related.c index ec4b536..950be98 100644 --- a/datapath-windows/ovsext/Conntrack-related.c +++ b/datapath-windows/ovsext/Conntrack-related.c @@ -18,7 +18,7 @@ #include "Jhash.h" static PLIST_ENTRY ovsCtRelatedTable; /* Holds related entries */ -static UINT64 ctTotalRelatedEntries; +static ULONG ctTotalRelatedEntries; static OVS_CT_THREAD_CTX ctRelThreadCtx; static PNDIS_RW_LOCK_EX ovsCtRelatedLockObj; extern POVS_SWITCH_CONTEXT gOvsSwitchContext; @@ -75,13 +75,11 @@ OvsCtRelatedLookup(OVS_CT_KEY key, UINT64 currentTime) POVS_CT_REL_ENTRY entry; LOCK_STATE_EX lockState; - NdisAcquireRWLockRead(ovsCtRelatedLockObj, &lockState, 0); - if (!ctTotalRelatedEntries) { - NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); return NULL; } + NdisAcquireRWLockRead(ovsCtRelatedLockObj, &lockState, 0); for (int i = 0; i < CT_HASH_TABLE_SIZE; i++) { /* XXX - Scan the table based on the hash instead */ LIST_FORALL_SAFE(&ovsCtRelatedTable[i], link, next) { @@ -103,7 +101,7 @@ OvsCtRelatedEntryDelete(POVS_CT_REL_ENTRY entry) { RemoveEntryList(&entry->link); OvsFreeMemoryWithTag(entry, OVS_CT_POOL_TAG); - ctTotalRelatedEntries--; + NdisInterlockedDecrement((PLONG)&ctTotalRelatedEntries); } NDIS_STATUS @@ -139,7 +137,7 @@ OvsCtRelatedEntryCreate(UINT8 ipProto, NdisAcquireRWLockWrite(ovsCtRelatedLockObj, &lockState, 0); InsertHeadList(&ovsCtRelatedTable[hash & CT_HASH_TABLE_MASK], &entry->link); - ctTotalRelatedEntries++; + NdisInterlockedIncrement((PLONG)&ctTotalRelatedEntries); NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); return NDIS_STATUS_SUCCESS; @@ -150,20 +148,19 @@ OvsCtRelatedFlush() { PLIST_ENTRY link, next; POVS_CT_REL_ENTRY entry; - LOCK_STATE_EX lockState; - NdisAcquireRWLockWrite(ovsCtRelatedLockObj, &lockState, 0); if (ctTotalRelatedEntries) { + NdisAcquireRWLockWrite(ovsCtRelatedLockObj, &lockState, 0); for (int i = 0; i < CT_HASH_TABLE_SIZE; i++) { LIST_FORALL_SAFE(&ovsCtRelatedTable[i], link, next) { entry = CONTAINING_RECORD(link, OVS_CT_REL_ENTRY, link); OvsCtRelatedEntryDelete(entry); } } + NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); } - NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); return NDIS_STATUS_SUCCESS; } @@ -189,9 +186,8 @@ OvsCtRelatedEntryCleaner(PVOID data) /* Lock has been freed by 'OvsCleanupCtRelated()' */ break; } - NdisAcquireRWLockWrite(ovsCtRelatedLockObj, &lockState, 0); + if (context->exit) { - NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); break; } @@ -201,6 +197,7 @@ OvsCtRelatedEntryCleaner(PVOID data) threadSleepTimeout = currentTime + CT_CLEANUP_INTERVAL; if (ctTotalRelatedEntries) { + NdisAcquireRWLockWrite(ovsCtRelatedLockObj, &lockState, 0); for (int i = 0; i < CT_HASH_TABLE_SIZE; i++) { LIST_FORALL_SAFE(&ovsCtRelatedTable[i], link, next) { entry = CONTAINING_RECORD(link, OVS_CT_REL_ENTRY, link); @@ -209,8 +206,8 @@ OvsCtRelatedEntryCleaner(PVOID data) } } } + NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); } - NdisReleaseRWLock(ovsCtRelatedLockObj, &lockState); KeWaitForSingleObject(&context->event, Executive, KernelMode, FALSE, (LARGE_INTEGER *)&threadSleepTimeout); } diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c index add1491..4c9f518 100644 --- a/datapath-windows/ovsext/Conntrack.c +++ b/datapath-windows/ovsext/Conntrack.c @@ -34,7 +34,7 @@ static OVS_CT_THREAD_CTX ctThreadCtx; static PNDIS_RW_LOCK_EX *ovsCtBucketLock = NULL; static PNDIS_RW_LOCK_EX ovsCtNatLockObj; extern POVS_SWITCH_CONTEXT gOvsSwitchContext; -static LONG ctTotalEntries; +static ULONG ctTotalEntries; static __inline OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple); static __inline NDIS_STATUS @@ -208,7 +208,6 @@ OvsPostCtEventEntry(POVS_CT_ENTRY entry, UINT8 type) { OVS_CT_EVENT_ENTRY ctEventEntry = {0}; NdisMoveMemory(&ctEventEntry.entry, entry, sizeof(OVS_CT_ENTRY)); - ctEventEntry.entry.lock = NULL; ctEventEntry.entry.parent = NULL; ctEventEntry.type = type; OvsPostCtEvent(&ctEventEntry); @@ -217,8 +216,7 @@ OvsPostCtEventEntry(POVS_CT_ENTRY entry, UINT8 type) static __inline VOID OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST nbl) { - LOCK_STATE_EX lockState; - NdisAcquireRWLockWrite(entry->lock, &lockState, 0); + NdisAcquireSpinLock(&(entry->lock)); if (reply) { entry->rev_key.byteCount+= OvsPacketLenNBL(nbl); entry->rev_key.packetCount++; @@ -226,11 +224,11 @@ OvsCtIncrementCounters(POVS_CT_ENTRY entry, BOOLEAN reply, PNET_BUFFER_LIST nbl) entry->key.byteCount += OvsPacketLenNBL(nbl); entry->key.packetCount++; } - NdisReleaseRWLock(entry->lock, &lockState); + NdisReleaseSpinLock(&(entry->lock)); } static __inline BOOLEAN -OvsCtAddEntry(POVS_SWITCH_CONTEXT switchContext, POVS_CT_ENTRY entry, +OvsCtAddEntry(POVS_CT_ENTRY entry, OvsConntrackKeyLookupCtx *ctx, PNAT_ACTION_INFO natInfo, UINT64 now) { @@ -261,18 +259,14 @@ OvsCtAddEntry(POVS_SWITCH_CONTEXT switchContext, POVS_CT_ENTRY entry, } entry->timestampStart = now; - entry->lock = NdisAllocateRWLock(switchContext->NdisFilterHandle); - if (entry->lock == NULL) { - OVS_LOG_ERROR("NdisAllocateRWLock failed : Insufficient resources"); - return FALSE; - } + NdisAllocateSpinLock(&(entry->lock)); UINT32 bucketIdx = ctx->hash & CT_HASH_TABLE_MASK; entry->bucketLockRef = ovsCtBucketLock[bucketIdx]; NdisAcquireRWLockWrite(ovsCtBucketLock[bucketIdx], &lockState, 0); InsertHeadList(&ovsConntrackTable[bucketIdx], &entry->link); - InterlockedIncrement((LONG volatile *)&ctTotalEntries); + NdisInterlockedIncrement((PLONG)&ctTotalEntries); NdisReleaseRWLock(ovsCtBucketLock[bucketIdx], &lockState); return TRUE; } @@ -351,8 +345,7 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx, if (state != OVS_CS_F_INVALID && commit) { if (entry) { entry->parent = parentEntry; - if (OvsCtAddEntry(fwdCtx->switchContext, entry, ctx, - natInfo, currentTime)) { + if (OvsCtAddEntry(entry, ctx, natInfo, currentTime)) { *entryCreated = TRUE; } else { /* Unable to add entry to the list */ @@ -382,8 +375,6 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry, UINT64 now) { CT_UPDATE_RES status; - LOCK_STATE_EX lockState; - NdisAcquireRWLockWrite(entry->lock, &lockState, 0); switch (ipProto) { case IPPROTO_TCP: { @@ -394,20 +385,29 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry, status = CT_UPDATE_INVALID; break; } + NdisAcquireSpinLock(&(entry->lock)); status = OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now); + NdisReleaseSpinLock(&(entry->lock)); break; } case IPPROTO_ICMP: + { + NdisAcquireSpinLock(&(entry->lock)); status = OvsConntrackUpdateIcmpEntry(entry, reply, now); + NdisReleaseSpinLock(&(entry->lock)); break; + } case IPPROTO_UDP: + { + NdisAcquireSpinLock(&(entry->lock)); status = OvsConntrackUpdateOtherEntry(entry, reply, now); + NdisReleaseSpinLock(&(entry->lock)); break; + } default: status = CT_UPDATE_INVALID; break; } - NdisReleaseRWLock(entry->lock, &lockState); return status; } @@ -431,8 +431,8 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete) if (entry == NULL) { return; } - LOCK_STATE_EX lockState; - NdisAcquireRWLockWrite(entry->lock, &lockState, 0); + KIRQL irql = KeGetCurrentIrql(); + OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); if (forceDelete || OvsCtEntryExpired(entry)) { if (entry->natInfo.natAction) { LOCK_STATE_EX lockStateNat; @@ -442,13 +442,13 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete) } OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE); RemoveEntryList(&entry->link); - NdisReleaseRWLock(entry->lock, &lockState); - NdisFreeRWLock(entry->lock); + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); + NdisFreeSpinLock(&(entry->lock)); OvsFreeMemoryWithTag(entry, OVS_CT_POOL_TAG); - InterlockedDecrement((LONG volatile*)&ctTotalEntries); + NdisInterlockedDecrement((PLONG)&ctTotalEntries); return; } - NdisReleaseRWLock(entry->lock, &lockState); + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); } static __inline NDIS_STATUS @@ -499,7 +499,7 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) POVS_CT_ENTRY entry; BOOLEAN reply = FALSE; POVS_CT_ENTRY found = NULL; - LOCK_STATE_EX lockState, lockStateTable; + LOCK_STATE_EX lockStateTable; UINT32 bucketIdx; /* Reverse NAT must be performed before OvsCtLookup, so here * we simply need to flip the src and dst in key and compare @@ -512,11 +512,14 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) if (!ctTotalEntries) { return found; } + + KIRQL irql = KeGetCurrentIrql(); bucketIdx = ctx->hash & CT_HASH_TABLE_MASK; NdisAcquireRWLockRead(ovsCtBucketLock[bucketIdx], &lockStateTable, 0); LIST_FORALL(&ovsConntrackTable[bucketIdx], link) { entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link); - NdisAcquireRWLockRead(entry->lock, &lockState, 0); + OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); + if (OvsCtKeyAreSame(ctx->key, entry->key)) { found = entry; reply = FALSE; @@ -533,10 +536,10 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) } else { ctx->reply = reply; } - NdisReleaseRWLock(entry->lock, &lockState); + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); break; } - NdisReleaseRWLock(entry->lock, &lockState); + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); } NdisReleaseRWLock(ovsCtBucketLock[bucketIdx], &lockStateTable); @@ -689,7 +692,7 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, POVS_CT_ENTRY entry = ctx->entry; UINT32 state = 0; PNET_BUFFER_LIST curNbl = fwdCtx->curNbl; - LOCK_STATE_EX lockState, lockStateTable; + LOCK_STATE_EX lockStateTable; PNDIS_RW_LOCK_EX bucketLockRef = NULL; *entryCreated = FALSE; @@ -730,7 +733,7 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, } } if (entry) { - NdisAcquireRWLockRead(entry->lock, &lockState, 0); + NdisAcquireSpinLock(&(entry->lock)); if (key->ipKey.nwProto == IPPROTO_TCP) { /* Update the related bit if there is a parent */ if (entry->parent) { @@ -748,7 +751,7 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, /* Copy mark and label from entry into flowKey. If actions specify different mark and label, update the flowKey. */ OvsCtUpdateFlowKey(key, state, zone, entry->mark, &entry->labels); - NdisReleaseRWLock(entry->lock, &lockState); + NdisReleaseSpinLock(&(entry->lock)); } else { OvsCtUpdateFlowKey(key, state, zone, 0, NULL); } @@ -802,8 +805,6 @@ OvsCtSetMarkLabel(OvsFlowKey *key, MD_LABELS *labels, BOOLEAN *triggerUpdateEvent) { - LOCK_STATE_EX lockState; - NdisAcquireRWLockWrite(entry->lock, &lockState, 0); if (mark) { OvsConntrackSetMark(key, entry, mark->value, mark->mask, triggerUpdateEvent); @@ -813,7 +814,6 @@ OvsCtSetMarkLabel(OvsFlowKey *key, OvsConntrackSetLabels(key, entry, &labels->value, &labels->mask, triggerUpdateEvent); } - NdisReleaseRWLock(entry->lock, &lockState); } /* @@ -854,10 +854,11 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, { NDIS_STATUS status = NDIS_STATUS_SUCCESS; BOOLEAN triggerUpdateEvent = FALSE; + BOOLEAN entryCreated = FALSE; POVS_CT_ENTRY entry = NULL; PNET_BUFFER_LIST curNbl = fwdCtx->curNbl; OvsConntrackKeyLookupCtx ctx = { 0 }; - LOCK_STATE_EX lockState, lockStateTable; + LOCK_STATE_EX lockStateTable; UINT64 currentTime; NdisGetCurrentSystemTime((LARGE_INTEGER *) ¤tTime); @@ -866,10 +867,9 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, /* Lookup Conntrack entries for a matching entry */ entry = OvsCtLookup(&ctx); - BOOLEAN entryCreated = FALSE; /* Delete entry in reverse direction if 'force' is specified */ - if (entry && force && ctx.reply) { + if (force && ctx.reply && entry) { PNDIS_RW_LOCK_EX bucketLockRef = entry->bucketLockRef; NdisAcquireRWLockWrite(bucketLockRef, &lockStateTable, 0); OvsCtEntryDelete(entry, TRUE); @@ -877,34 +877,33 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, entry = NULL; } - if (!entry && commit && ctTotalEntries >= CT_MAX_ENTRIES) { - /* Don't proceed with processing if the max limit has been hit. - * This blocks only new entries from being created and doesn't - * affect existing connections. + if (entry) { + /* Increment stats for the entry if it wasn't tracked previously or + * if they are on different zones */ - OVS_LOG_ERROR("Conntrack Limit hit: %lu", ctTotalEntries); - return NDIS_STATUS_RESOURCES; - } - - /* Increment stats for the entry if it wasn't tracked previously or - * if they are on different zones - */ - if (entry && (entry->key.zone != key->ct.zone || - (!(key->ct.state & OVS_CS_F_TRACKED)))) { - OvsCtIncrementCounters(entry, ctx.reply, curNbl); - } + if ((entry->key.zone != key->ct.zone || + (!(key->ct.state & OVS_CS_F_TRACKED)))) { + OvsCtIncrementCounters(entry, ctx.reply, curNbl); + } + /* Process the entry and update CT flags */ + entry = OvsProcessConntrackEntry(fwdCtx, layers->l4Offset, &ctx, key, + zone, natInfo, commit, currentTime, + &entryCreated); - if (!entry) { + } else { + if (commit && ctTotalEntries >= CT_MAX_ENTRIES) { + /* Don't proceed with processing if the max limit has been hit. + * This blocks only new entries from being created and doesn't + * affect existing connections. + */ + OVS_LOG_ERROR("Conntrack Limit hit: %lu", ctTotalEntries); + return NDIS_STATUS_RESOURCES; + } /* If no matching entry was found, create one and add New state */ entry = OvsCtEntryCreate(fwdCtx, key->ipKey.nwProto, layers->l4Offset, &ctx, key, natInfo, commit, currentTime, &entryCreated); - } else { - /* Process the entry and update CT flags */ - entry = OvsProcessConntrackEntry(fwdCtx, layers->l4Offset, &ctx, key, - zone, natInfo, commit, currentTime, - &entryCreated); } if (entry == NULL) { @@ -918,6 +917,8 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, * NAT_ACTION_REVERSE, yet entry->natInfo is NAT_ACTION_SRC or * NAT_ACTION_DST without NAT_ACTION_REVERSE */ + KIRQL irql = KeGetCurrentIrql(); + OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); if (natInfo->natAction != NAT_ACTION_NONE) { LOCK_STATE_EX lockStateNat; @@ -939,15 +940,14 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, OVS_LOG_ERROR("Error while parsing the FTP packet"); } } - NdisAcquireRWLockRead(entry->lock, &lockState, 0); + /* Add original tuple information to flow Key */ if (entry->key.dl_type == ntohs(ETH_TYPE_IPV4)) { if (entry->parent != NULL) { POVS_CT_ENTRY parent = entry->parent; - LOCK_STATE_EX lockStateParent; - NdisAcquireRWLockRead(parent->lock, &lockStateParent, 0); + OVS_ACQUIRE_SPIN_LOCK(&(parent->lock), irql); OvsCtUpdateTuple(key, &parent->key); - NdisReleaseRWLock(parent->lock, &lockStateParent); + OVS_RELEASE_SPIN_LOCK(&(parent->lock), irql); } else { OvsCtUpdateTuple(key, &entry->key); } @@ -955,13 +955,11 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, if (entryCreated) { OvsPostCtEventEntry(entry, OVS_EVENT_CT_NEW); - } - if (postUpdateEvent && !entryCreated && triggerUpdateEvent) { + } else if (postUpdateEvent && triggerUpdateEvent) { OvsPostCtEventEntry(entry, OVS_EVENT_CT_UPDATE); } - NdisReleaseRWLock(entry->lock, &lockState); - + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); return status; } @@ -1738,8 +1736,9 @@ OvsCtDumpCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx, UINT32 inIndex = instance->dumpState.index[1]; UINT32 i = CT_HASH_TABLE_SIZE; UINT32 outIndex = 0; + KIRQL irql = KeGetCurrentIrql(); + LOCK_STATE_EX lockStateTable; - LOCK_STATE_EX lockState, lockStateTable; if (ctTotalEntries) { for (i = inBucket; i < CT_HASH_TABLE_SIZE; i++) { PLIST_ENTRY head, link; @@ -1756,7 +1755,7 @@ OvsCtDumpCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx, */ if (outIndex >= inIndex) { entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link); - NdisAcquireRWLockRead(entry->lock, &lockState, 0); + OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); rc = OvsCreateNlMsgFromCtEntry(entry, usrParamsCtx->outputBuffer, usrParamsCtx->outputLength, @@ -1765,7 +1764,7 @@ OvsCtDumpCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx, msgIn->nlMsg.nlmsgPid, msgIn->nfGenMsg.version, 0); - NdisReleaseRWLock(entry->lock, &lockState); + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); if (rc != NDIS_STATUS_SUCCESS) { NdisReleaseRWLock(ovsCtBucketLock[i], &lockStateTable); return STATUS_UNSUCCESSFUL; diff --git a/datapath-windows/ovsext/Conntrack.h b/datapath-windows/ovsext/Conntrack.h index 3be309e..7dc92a1 100644 --- a/datapath-windows/ovsext/Conntrack.h +++ b/datapath-windows/ovsext/Conntrack.h @@ -101,7 +101,7 @@ typedef struct _NAT_ACTION_INFO { typedef struct OVS_CT_ENTRY { /* Reference to ovsCtBucketLock of ovsConntrackTable.*/ PNDIS_RW_LOCK_EX bucketLockRef; - PNDIS_RW_LOCK_EX lock; /* Protects OVS_CT_ENTRY. */ + NDIS_SPIN_LOCK lock; /* Protects OVS_CT_ENTRY. */ OVS_CT_KEY key; OVS_CT_KEY rev_key; UINT64 expiration; diff --git a/datapath-windows/ovsext/Util.h b/datapath-windows/ovsext/Util.h index 6f02147..a9bccf3 100644 --- a/datapath-windows/ovsext/Util.h +++ b/datapath-windows/ovsext/Util.h @@ -105,6 +105,24 @@ VOID OvsAppendList(PLIST_ENTRY dst, PLIST_ENTRY src); #define BIT16(_x) ((UINT16)0x1 << (_x)) #define BIT32(_x) ((UINT32)0x1 << (_x)) +#define OVS_ACQUIRE_SPIN_LOCK(_pLock, _dispatchLevel) \ +{ \ + if (_dispatchLevel) { \ + NdisDprAcquireSpinLock(_pLock); \ + } else { \ + NdisAcquireSpinLock(_pLock); \ + } \ +} + +#define OVS_RELEASE_SPIN_LOCK(_pLock, _dispatchLevel) \ +{ \ + if (_dispatchLevel) { \ + NdisDprReleaseSpinLock(_pLock); \ + } else { \ + NdisReleaseSpinLock(_pLock); \ + } \ +} + BOOLEAN OvsCompareString(PVOID string1, PVOID string2); /* From patchwork Tue Jun 19 17:32:50 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Kumar X-Patchwork-Id: 931778 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=vmware.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 419FRm4cV5z9s4w for ; Wed, 20 Jun 2018 03:33:24 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id B45FF1115; Tue, 19 Jun 2018 17:32:58 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id F3B2A1107 for ; Tue, 19 Jun 2018 17:32:56 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from EX13-EDG-OU-001.vmware.com (ex13-edg-ou-001.vmware.com [208.91.0.189]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 6A70917E for ; Tue, 19 Jun 2018 17:32:55 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-001.vmware.com (10.113.208.155) with Microsoft SMTP Server id 15.0.1156.6; Tue, 19 Jun 2018 10:32:21 -0700 Received: from WIN-ANAND1.vmware.com (win-anand1.prom.eng.vmware.com [10.33.79.212]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id 8B59340EED; Tue, 19 Jun 2018 10:32:54 -0700 (PDT) From: Anand Kumar To: Date: Tue, 19 Jun 2018 10:32:50 -0700 Message-ID: <20180619173251.6764-3-kumaranand@vmware.com> X-Mailer: git-send-email 2.9.3.windows.1 In-Reply-To: <20180619173251.6764-1-kumaranand@vmware.com> References: <20180619173251.6764-1-kumaranand@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-001.vmware.com: kumaranand@vmware.com does not designate permitted sender hosts) X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCH v4 2/3] datapath-windows: Implement locking in conntrack NAT. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org This patch primarily replaces existing ndis RWlock based implementaion for NAT in conntrack with a spinlock based implementation inside NAT, module along with some conntrack optimization. - The 'ovsNatTable' and 'ovsUnNatTable' tables are shared between cleanup threads and packet processing thread. In order to protect these two tables use a spinlock. Also introduce counters to track number of nat entries. - Introduce a new function OvsGetTcpHeader() to retrieve TCP header and payload length, to optimize for TCP traffic. - Optimize conntrack look up. - Remove 'bucketlockRef' member from conntrack entry structure. Testing: Verified loading/unloading the driver with driver verified enabled. Ran TCP/UDP and ICMP traffic. Signed-off-by: Anand Kumar Acked-by: Alin Gabriel Serdean --- v1->v2: Merge patch 2 and 3 so that NAT locks related changes are in a single patch. v2->v3: No change v3->v4: No change --- datapath-windows/ovsext/Conntrack-ftp.c | 4 +- datapath-windows/ovsext/Conntrack-nat.c | 27 +++++++- datapath-windows/ovsext/Conntrack-tcp.c | 15 ++--- datapath-windows/ovsext/Conntrack.c | 110 +++++++++++++------------------- datapath-windows/ovsext/Conntrack.h | 36 +++++++---- 5 files changed, 100 insertions(+), 92 deletions(-) diff --git a/datapath-windows/ovsext/Conntrack-ftp.c b/datapath-windows/ovsext/Conntrack-ftp.c index 6830dfa..ce09a65 100644 --- a/datapath-windows/ovsext/Conntrack-ftp.c +++ b/datapath-windows/ovsext/Conntrack-ftp.c @@ -129,14 +129,14 @@ OvsCtHandleFtp(PNET_BUFFER_LIST curNbl, char temp[256] = { 0 }; char ftpMsg[256] = { 0 }; + UINT32 len; TCPHdr tcpStorage; const TCPHdr *tcp; - tcp = OvsGetTcp(curNbl, layers->l4Offset, &tcpStorage); + tcp = OvsGetTcpHeader(curNbl, layers, &tcpStorage, &len); if (!tcp) { return NDIS_STATUS_INVALID_PACKET; } - UINT32 len = OvsGetTcpPayloadLength(curNbl); if (len > sizeof(temp)) { /* We only care up to 256 */ len = sizeof(temp); diff --git a/datapath-windows/ovsext/Conntrack-nat.c b/datapath-windows/ovsext/Conntrack-nat.c index da1814f..11057e6 100644 --- a/datapath-windows/ovsext/Conntrack-nat.c +++ b/datapath-windows/ovsext/Conntrack-nat.c @@ -3,7 +3,8 @@ PLIST_ENTRY ovsNatTable = NULL; PLIST_ENTRY ovsUnNatTable = NULL; - +static NDIS_SPIN_LOCK ovsCtNatLock; +static ULONG ovsNatEntries; /* *--------------------------------------------------------------------------- * OvsHashNatKey @@ -109,6 +110,8 @@ NTSTATUS OvsNatInit() InitializeListHead(&ovsUnNatTable[i]); } + NdisAllocateSpinLock(&ovsCtNatLock); + ovsNatEntries = 0; return STATUS_SUCCESS; } @@ -121,6 +124,11 @@ NTSTATUS OvsNatInit() VOID OvsNatFlush(UINT16 zone) { PLIST_ENTRY link, next; + if (!ovsNatEntries) { + return; + } + + NdisAcquireSpinLock(&ovsCtNatLock); for (int i = 0; i < NAT_HASH_TABLE_SIZE; i++) { LIST_FORALL_SAFE(&ovsNatTable[i], link, next) { POVS_NAT_ENTRY entry = @@ -131,6 +139,7 @@ VOID OvsNatFlush(UINT16 zone) } } } + NdisReleaseSpinLock(&ovsCtNatLock); } /* @@ -144,10 +153,14 @@ VOID OvsNatCleanup() if (ovsNatTable == NULL) { return; } + + NdisAcquireSpinLock(&ovsCtNatLock); OvsFreeMemoryWithTag(ovsNatTable, OVS_CT_POOL_TAG); OvsFreeMemoryWithTag(ovsUnNatTable, OVS_CT_POOL_TAG); ovsNatTable = NULL; ovsUnNatTable = NULL; + NdisReleaseSpinLock(&ovsCtNatLock); + NdisFreeSpinLock(&ovsCtNatLock); } /* @@ -250,10 +263,13 @@ static UINT32 OvsNatHashRange(const OVS_CT_ENTRY *entry, UINT32 basis) VOID OvsNatAddEntry(OVS_NAT_ENTRY* entry) { + NdisAcquireSpinLock(&ovsCtNatLock); InsertHeadList(OvsNatGetBucket(&entry->key, FALSE), &entry->link); InsertHeadList(OvsNatGetBucket(&entry->value, TRUE), &entry->reverseLink); + NdisReleaseSpinLock(&ovsCtNatLock); + NdisInterlockedIncrement((PLONG)&ovsNatEntries); } /* @@ -399,21 +415,29 @@ OvsNatLookup(const OVS_CT_KEY *ctKey, BOOLEAN reverse) PLIST_ENTRY link; POVS_NAT_ENTRY entry; + if (!ovsNatEntries) { + return NULL; + } + + NdisAcquireSpinLock(&ovsCtNatLock); LIST_FORALL(OvsNatGetBucket(ctKey, reverse), link) { if (reverse) { entry = CONTAINING_RECORD(link, OVS_NAT_ENTRY, reverseLink); if (OvsNatKeyAreSame(ctKey, &entry->value)) { + NdisReleaseSpinLock(&ovsCtNatLock); return entry; } } else { entry = CONTAINING_RECORD(link, OVS_NAT_ENTRY, link); if (OvsNatKeyAreSame(ctKey, &entry->key)) { + NdisReleaseSpinLock(&ovsCtNatLock); return entry; } } } + NdisReleaseSpinLock(&ovsCtNatLock); return NULL; } @@ -432,6 +456,7 @@ OvsNatDeleteEntry(POVS_NAT_ENTRY entry) RemoveEntryList(&entry->link); RemoveEntryList(&entry->reverseLink); OvsFreeMemoryWithTag(entry, OVS_CT_POOL_TAG); + NdisInterlockedDecrement((PLONG)&ovsNatEntries); } /* diff --git a/datapath-windows/ovsext/Conntrack-tcp.c b/datapath-windows/ovsext/Conntrack-tcp.c index 8cbab24..eda42ac 100644 --- a/datapath-windows/ovsext/Conntrack-tcp.c +++ b/datapath-windows/ovsext/Conntrack-tcp.c @@ -194,9 +194,9 @@ OvsCastConntrackEntryToTcpEntry(OVS_CT_ENTRY* conn) enum CT_UPDATE_RES OvsConntrackUpdateTcpEntry(OVS_CT_ENTRY* conn_, const TCPHdr *tcp, - PNET_BUFFER_LIST nbl, BOOLEAN reply, - UINT64 now) + UINT64 now, + UINT32 tcpPayloadLen) { struct conn_tcp *conn = OvsCastConntrackEntryToTcpEntry(conn_); /* The peer that sent 'pkt' */ @@ -207,7 +207,6 @@ OvsConntrackUpdateTcpEntry(OVS_CT_ENTRY* conn_, UINT16 tcp_flags = ntohs(tcp->flags); uint16_t win = ntohs(tcp->window); uint32_t ack, end, seq, orig_seq; - uint32_t p_len = OvsGetTcpPayloadLength(nbl); int ackskew; if (OvsCtInvalidTcpFlags(tcp_flags)) { @@ -248,7 +247,7 @@ OvsConntrackUpdateTcpEntry(OVS_CT_ENTRY* conn_, ack = ntohl(tcp->ack_seq); - end = seq + p_len; + end = seq + tcpPayloadLen; if (tcp_flags & TCP_SYN) { end++; if (dst->wscale & CT_WSCALE_FLAG) { @@ -287,7 +286,7 @@ OvsConntrackUpdateTcpEntry(OVS_CT_ENTRY* conn_, } else { ack = ntohl(tcp->ack_seq); - end = seq + p_len; + end = seq + tcpPayloadLen; if (tcp_flags & TCP_SYN) { end++; } @@ -469,8 +468,8 @@ OvsConntrackValidateTcpPacket(const TCPHdr *tcp) OVS_CT_ENTRY * OvsConntrackCreateTcpEntry(const TCPHdr *tcp, - PNET_BUFFER_LIST nbl, - UINT64 now) + UINT64 now, + UINT32 tcpPayloadLen) { struct conn_tcp* newconn; struct tcp_peer *src, *dst; @@ -486,7 +485,7 @@ OvsConntrackCreateTcpEntry(const TCPHdr *tcp, dst = &newconn->peer[1]; src->seqlo = ntohl(tcp->seq); - src->seqhi = src->seqlo + OvsGetTcpPayloadLength(nbl) + 1; + src->seqhi = src->seqlo + tcpPayloadLen + 1; if (tcp->flags & TCP_SYN) { src->seqhi++; diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c index 4c9f518..8fa1e07 100644 --- a/datapath-windows/ovsext/Conntrack.c +++ b/datapath-windows/ovsext/Conntrack.c @@ -32,7 +32,6 @@ KSTART_ROUTINE OvsConntrackEntryCleaner; static PLIST_ENTRY ovsConntrackTable; static OVS_CT_THREAD_CTX ctThreadCtx; static PNDIS_RW_LOCK_EX *ovsCtBucketLock = NULL; -static PNDIS_RW_LOCK_EX ovsCtNatLockObj; extern POVS_SWITCH_CONTEXT gOvsSwitchContext; static ULONG ctTotalEntries; @@ -54,19 +53,11 @@ OvsInitConntrack(POVS_SWITCH_CONTEXT context) ctTotalEntries = 0; UINT32 numBucketLocks = CT_HASH_TABLE_SIZE; - /* Init the sync-lock */ - ovsCtNatLockObj = NdisAllocateRWLock(context->NdisFilterHandle); - if (ovsCtNatLockObj == NULL) { - return STATUS_INSUFFICIENT_RESOURCES; - } - /* Init the Hash Buffer */ ovsConntrackTable = OvsAllocateMemoryWithTag(sizeof(LIST_ENTRY) * CT_HASH_TABLE_SIZE, OVS_CT_POOL_TAG); if (ovsConntrackTable == NULL) { - NdisFreeRWLock(ovsCtNatLockObj); - ovsCtNatLockObj = NULL; return STATUS_INSUFFICIENT_RESOURCES; } @@ -121,8 +112,6 @@ freeBucketLock: freeTable: OvsFreeMemoryWithTag(ovsConntrackTable, OVS_CT_POOL_TAG); ovsConntrackTable = NULL; - NdisFreeRWLock(ovsCtNatLockObj); - ovsCtNatLockObj = NULL; return status; } @@ -135,7 +124,6 @@ freeTable: VOID OvsCleanupConntrack(VOID) { - LOCK_STATE_EX lockStateNat; ctThreadCtx.exit = 1; KeSetEvent(&ctThreadCtx.event, 0, FALSE); KeWaitForSingleObject(ctThreadCtx.threadObject, Executive, @@ -160,12 +148,7 @@ OvsCleanupConntrack(VOID) } OvsFreeMemoryWithTag(ovsCtBucketLock, OVS_CT_POOL_TAG); ovsCtBucketLock = NULL; - - NdisAcquireRWLockWrite(ovsCtNatLockObj, &lockStateNat, 0); OvsNatCleanup(); - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); - NdisFreeRWLock(ovsCtNatLockObj); - ovsCtNatLockObj = NULL; } static __inline VOID @@ -243,25 +226,20 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, if (natInfo == NULL) { entry->natInfo.natAction = NAT_ACTION_NONE; } else { - LOCK_STATE_EX lockStateNat; - NdisAcquireRWLockWrite(ovsCtNatLockObj, &lockStateNat, 0); if (OvsIsForwardNat(natInfo->natAction)) { entry->natInfo = *natInfo; if (!OvsNatTranslateCtEntry(entry)) { - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); return FALSE; } ctx->hash = OvsHashCtKey(&entry->key); } else { entry->natInfo.natAction = natInfo->natAction; } - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); } entry->timestampStart = now; NdisAllocateSpinLock(&(entry->lock)); UINT32 bucketIdx = ctx->hash & CT_HASH_TABLE_MASK; - entry->bucketLockRef = ovsCtBucketLock[bucketIdx]; NdisAcquireRWLockWrite(ovsCtBucketLock[bucketIdx], &lockState, 0); InsertHeadList(&ovsConntrackTable[bucketIdx], &entry->link); @@ -274,7 +252,7 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, static __inline POVS_CT_ENTRY OvsCtEntryCreate(OvsForwardingContext *fwdCtx, UINT8 ipProto, - UINT32 l4Offset, + OVS_PACKET_HDR_INFO *layers, OvsConntrackKeyLookupCtx *ctx, OvsFlowKey *key, PNAT_ACTION_INFO natInfo, @@ -292,16 +270,18 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx, switch (ipProto) { case IPPROTO_TCP: { + UINT32 tcpPayloadLen; TCPHdr tcpStorage; const TCPHdr *tcp; - tcp = OvsGetTcp(curNbl, l4Offset, &tcpStorage); + tcp = OvsGetTcpHeader(curNbl, layers, &tcpStorage, &tcpPayloadLen); if (!OvsConntrackValidateTcpPacket(tcp)) { state = OVS_CS_F_INVALID; break; } if (commit) { - entry = OvsConntrackCreateTcpEntry(tcp, curNbl, currentTime); + entry = OvsConntrackCreateTcpEntry(tcp, currentTime, + tcpPayloadLen); } break; } @@ -309,7 +289,7 @@ OvsCtEntryCreate(OvsForwardingContext *fwdCtx, { ICMPHdr storage; const ICMPHdr *icmp; - icmp = OvsGetIcmp(curNbl, l4Offset, &storage); + icmp = OvsGetIcmp(curNbl, layers->l4Offset, &storage); if (!OvsConntrackValidateIcmpPacket(icmp)) { if(icmp) { OVS_LOG_TRACE("Invalid ICMP packet detected, icmp->type %u", @@ -370,7 +350,7 @@ static enum CT_UPDATE_RES OvsCtUpdateEntry(OVS_CT_ENTRY* entry, PNET_BUFFER_LIST nbl, UINT8 ipProto, - UINT32 l4Offset, + OVS_PACKET_HDR_INFO *layers, BOOLEAN reply, UINT64 now) { @@ -378,15 +358,17 @@ OvsCtUpdateEntry(OVS_CT_ENTRY* entry, switch (ipProto) { case IPPROTO_TCP: { + UINT32 tcpPayloadLen; TCPHdr tcpStorage; const TCPHdr *tcp; - tcp = OvsGetTcp(nbl, l4Offset, &tcpStorage); + tcp = OvsGetTcpHeader(nbl, layers, &tcpStorage, &tcpPayloadLen); if (!tcp) { status = CT_UPDATE_INVALID; break; } NdisAcquireSpinLock(&(entry->lock)); - status = OvsConntrackUpdateTcpEntry(entry, tcp, nbl, reply, now); + status = OvsConntrackUpdateTcpEntry(entry, tcp, reply, now, + tcpPayloadLen); NdisReleaseSpinLock(&(entry->lock)); break; } @@ -435,10 +417,7 @@ OvsCtEntryDelete(POVS_CT_ENTRY entry, BOOLEAN forceDelete) OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); if (forceDelete || OvsCtEntryExpired(entry)) { if (entry->natInfo.natAction) { - LOCK_STATE_EX lockStateNat; - NdisAcquireRWLockWrite(ovsCtNatLockObj, &lockStateNat, 0); OvsNatDeleteKey(&entry->key); - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); } OvsPostCtEventEntry(entry, OVS_EVENT_CT_DELETE); RemoveEntryList(&entry->link); @@ -481,15 +460,12 @@ OvsDetectCtPacket(OvsForwardingContext *fwdCtx, } BOOLEAN -OvsCtKeyAreSame(OVS_CT_KEY ctxKey, OVS_CT_KEY entryKey) +OvsCtEndpointsAreSame(OVS_CT_KEY ctxKey, OVS_CT_KEY entryKey) { return ((NdisEqualMemory(&ctxKey.src, &entryKey.src, sizeof(struct ct_endpoint))) && (NdisEqualMemory(&ctxKey.dst, &entryKey.dst, - sizeof(struct ct_endpoint))) && - (ctxKey.dl_type == entryKey.dl_type) && - (ctxKey.nw_proto == entryKey.nw_proto) && - (ctxKey.zone == entryKey.zone)); + sizeof(struct ct_endpoint)))); } POVS_CT_ENTRY @@ -501,6 +477,11 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) POVS_CT_ENTRY found = NULL; LOCK_STATE_EX lockStateTable; UINT32 bucketIdx; + + if (!ctTotalEntries) { + return found; + } + /* Reverse NAT must be performed before OvsCtLookup, so here * we simply need to flip the src and dst in key and compare * they are equal. Note that flipped key is not equal to @@ -509,10 +490,6 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) OVS_CT_KEY revCtxKey = ctx->key; OvsCtKeyReverse(&revCtxKey); - if (!ctTotalEntries) { - return found; - } - KIRQL irql = KeGetCurrentIrql(); bucketIdx = ctx->hash & CT_HASH_TABLE_MASK; NdisAcquireRWLockRead(ovsCtBucketLock[bucketIdx], &lockStateTable, 0); @@ -520,12 +497,19 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) entry = CONTAINING_RECORD(link, OVS_CT_ENTRY, link); OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); - if (OvsCtKeyAreSame(ctx->key, entry->key)) { + if ((ctx->key.dl_type != entry->key.dl_type) || + (ctx->key.nw_proto != entry->key.nw_proto) || + (ctx->key.zone != entry->key.zone)) { + OVS_RELEASE_SPIN_LOCK(&(entry->lock), irql); + continue; + } + + if (OvsCtEndpointsAreSame(ctx->key, entry->key)) { found = entry; reply = FALSE; } - if (!found && OvsCtKeyAreSame(revCtxKey, entry->key)) { + if (!found && OvsCtEndpointsAreSame(revCtxKey, entry->key)) { found = entry; reply = TRUE; } @@ -651,10 +635,7 @@ OvsCtSetupLookupCtx(OvsFlowKey *flowKey, return NDIS_STATUS_INVALID_PACKET; } - LOCK_STATE_EX lockStateNat; - NdisAcquireRWLockRead(ovsCtNatLockObj, &lockStateNat, 0); natEntry = OvsNatLookup(&ctx->key, TRUE); - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); if (natEntry) { /* Translate address first for reverse NAT */ ctx->key = natEntry->ctEntry->key; @@ -680,7 +661,7 @@ OvsDetectFtpPacket(OvsFlowKey *key) { */ static __inline POVS_CT_ENTRY OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, - UINT32 l4Offset, + OVS_PACKET_HDR_INFO *layers, OvsConntrackKeyLookupCtx *ctx, OvsFlowKey *key, UINT16 zone, @@ -693,7 +674,7 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, UINT32 state = 0; PNET_BUFFER_LIST curNbl = fwdCtx->curNbl; LOCK_STATE_EX lockStateTable; - PNDIS_RW_LOCK_EX bucketLockRef = NULL; + *entryCreated = FALSE; /* If an entry was found, update the state based on TCP flags */ @@ -704,8 +685,9 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, } } else { CT_UPDATE_RES result; - result = OvsCtUpdateEntry(entry, curNbl, key->ipKey.nwProto, - l4Offset, ctx->reply, currentTime); + UINT32 bucketIdx; + result = OvsCtUpdateEntry(entry, curNbl, key->ipKey.nwProto, layers, + ctx->reply, currentTime); switch (result) { case CT_UPDATE_VALID: state |= OVS_CS_F_ESTABLISHED; @@ -718,12 +700,12 @@ OvsProcessConntrackEntry(OvsForwardingContext *fwdCtx, break; case CT_UPDATE_NEW: //Delete and update the Conntrack - bucketLockRef = entry->bucketLockRef; - NdisAcquireRWLockWrite(bucketLockRef, &lockStateTable, 0); + bucketIdx = ctx->hash & CT_HASH_TABLE_MASK; + NdisAcquireRWLockWrite(ovsCtBucketLock[bucketIdx], &lockStateTable, 0); OvsCtEntryDelete(ctx->entry, TRUE); - NdisReleaseRWLock(bucketLockRef, &lockStateTable); + NdisReleaseRWLock(ovsCtBucketLock[bucketIdx], &lockStateTable); ctx->entry = NULL; - entry = OvsCtEntryCreate(fwdCtx, key->ipKey.nwProto, l4Offset, + entry = OvsCtEntryCreate(fwdCtx, key->ipKey.nwProto, layers, ctx, key, natInfo, commit, currentTime, entryCreated); if (!entry) { @@ -870,10 +852,10 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, /* Delete entry in reverse direction if 'force' is specified */ if (force && ctx.reply && entry) { - PNDIS_RW_LOCK_EX bucketLockRef = entry->bucketLockRef; - NdisAcquireRWLockWrite(bucketLockRef, &lockStateTable, 0); + UINT32 bucketIdx = ctx.hash & CT_HASH_TABLE_MASK; + NdisAcquireRWLockWrite(ovsCtBucketLock[bucketIdx], &lockStateTable, 0); OvsCtEntryDelete(entry, TRUE); - NdisReleaseRWLock(bucketLockRef, &lockStateTable); + NdisReleaseRWLock(ovsCtBucketLock[bucketIdx], &lockStateTable); entry = NULL; } @@ -886,7 +868,7 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, OvsCtIncrementCounters(entry, ctx.reply, curNbl); } /* Process the entry and update CT flags */ - entry = OvsProcessConntrackEntry(fwdCtx, layers->l4Offset, &ctx, key, + entry = OvsProcessConntrackEntry(fwdCtx, layers, &ctx, key, zone, natInfo, commit, currentTime, &entryCreated); @@ -901,7 +883,7 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, } /* If no matching entry was found, create one and add New state */ entry = OvsCtEntryCreate(fwdCtx, key->ipKey.nwProto, - layers->l4Offset, &ctx, + layers, &ctx, key, natInfo, commit, currentTime, &entryCreated); } @@ -919,13 +901,9 @@ OvsCtExecute_(OvsForwardingContext *fwdCtx, */ KIRQL irql = KeGetCurrentIrql(); OVS_ACQUIRE_SPIN_LOCK(&(entry->lock), irql); - if (natInfo->natAction != NAT_ACTION_NONE) - { - LOCK_STATE_EX lockStateNat; - NdisAcquireRWLockWrite(ovsCtNatLockObj, &lockStateNat, 0); + if (natInfo->natAction != NAT_ACTION_NONE) { OvsNatPacket(fwdCtx, entry, entry->natInfo.natAction, key, ctx.reply); - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); } OvsCtSetMarkLabel(key, entry, mark, labels, &triggerUpdateEvent); @@ -1157,7 +1135,7 @@ OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple) { PLIST_ENTRY link, next; POVS_CT_ENTRY entry; - LOCK_STATE_EX lockState, lockStateNat; + LOCK_STATE_EX lockState; if (ctTotalEntries) { for (UINT32 i = 0; i < CT_HASH_TABLE_SIZE; i++) { @@ -1189,9 +1167,7 @@ OvsCtFlush(UINT16 zone, struct ovs_key_ct_tuple_ipv4 *tuple) } } - NdisAcquireRWLockWrite(ovsCtNatLockObj, &lockStateNat, 0); OvsNatFlush(zone); - NdisReleaseRWLock(ovsCtNatLockObj, &lockStateNat); return NDIS_STATUS_SUCCESS; } diff --git a/datapath-windows/ovsext/Conntrack.h b/datapath-windows/ovsext/Conntrack.h index 7dc92a1..7a80eea 100644 --- a/datapath-windows/ovsext/Conntrack.h +++ b/datapath-windows/ovsext/Conntrack.h @@ -99,8 +99,6 @@ typedef struct _NAT_ACTION_INFO { } NAT_ACTION_INFO, *PNAT_ACTION_INFO; typedef struct OVS_CT_ENTRY { - /* Reference to ovsCtBucketLock of ovsConntrackTable.*/ - PNDIS_RW_LOCK_EX bucketLockRef; NDIS_SPIN_LOCK lock; /* Protects OVS_CT_ENTRY. */ OVS_CT_KEY key; OVS_CT_KEY rev_key; @@ -156,23 +154,33 @@ OvsConntrackUpdateExpiration(OVS_CT_ENTRY *ctEntry, ctEntry->expiration = now + interval; } -static __inline UINT32 -OvsGetTcpPayloadLength(PNET_BUFFER_LIST nbl) +static const TCPHdr* +OvsGetTcpHeader(PNET_BUFFER_LIST nbl, + OVS_PACKET_HDR_INFO *layers, + VOID *storage, + UINT32 *tcpPayloadLen) { IPHdr *ipHdr; TCPHdr *tcp; - char *ipBuf[sizeof(EthHdr) + sizeof(IPHdr) + sizeof(TCPHdr)]; + VOID *dest = storage; - ipHdr = NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(nbl), sizeof *ipBuf, - (PVOID)&ipBuf, 1 /*no align*/, 0); + ipHdr = NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(nbl), + layers->l4Offset + sizeof(TCPHdr), + NULL, 1 /*no align*/, 0); if (ipHdr == NULL) { - return 0; + return NULL; } - ipHdr = (IPHdr *)((PCHAR)ipHdr + sizeof(EthHdr)); + ipHdr = (IPHdr *)((PCHAR)ipHdr + layers->l3Offset); tcp = (TCPHdr *)((PCHAR)ipHdr + ipHdr->ihl * 4); + if (tcp->doff * 4 >= sizeof *tcp) { + NdisMoveMemory(dest, tcp, sizeof(TCPHdr)); + *tcpPayloadLen = ntohs((ipHdr->tot_len) - (ipHdr->ihl * 4) - + (TCP_HDR_LEN(tcp))); + return storage; + } - return (ntohs(ipHdr->tot_len) - (ipHdr->ihl * 4) - (TCP_HDR_LEN(tcp))); + return NULL; } VOID OvsCleanupConntrack(VOID); @@ -184,17 +192,17 @@ NDIS_STATUS OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx, BOOLEAN OvsConntrackValidateTcpPacket(const TCPHdr *tcp); BOOLEAN OvsConntrackValidateIcmpPacket(const ICMPHdr *icmp); OVS_CT_ENTRY * OvsConntrackCreateTcpEntry(const TCPHdr *tcp, - PNET_BUFFER_LIST nbl, - UINT64 now); + UINT64 now, + UINT32 tcpPayloadLen); NDIS_STATUS OvsCtMapTcpProtoInfoToNl(PNL_BUFFER nlBuf, OVS_CT_ENTRY *conn_); OVS_CT_ENTRY * OvsConntrackCreateOtherEntry(UINT64 now); OVS_CT_ENTRY * OvsConntrackCreateIcmpEntry(UINT64 now); enum CT_UPDATE_RES OvsConntrackUpdateTcpEntry(OVS_CT_ENTRY* conn_, const TCPHdr *tcp, - PNET_BUFFER_LIST nbl, BOOLEAN reply, - UINT64 now); + UINT64 now, + UINT32 tcpPayloadLen); enum CT_UPDATE_RES OvsConntrackUpdateOtherEntry(OVS_CT_ENTRY *conn_, BOOLEAN reply, UINT64 now); From patchwork Tue Jun 19 17:32:51 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Anand Kumar X-Patchwork-Id: 931781 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=vmware.com Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 419FTM42PFz9s4w for ; Wed, 20 Jun 2018 03:34:47 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 85212111F; Tue, 19 Jun 2018 17:33:01 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 1ADB9110A for ; Tue, 19 Jun 2018 17:32:58 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6 Received: from EX13-EDG-OU-002.vmware.com (ex13-edg-ou-002.vmware.com [208.91.0.190]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 4534C13E for ; Tue, 19 Jun 2018 17:32:57 +0000 (UTC) Received: from sc9-mailhost3.vmware.com (10.113.161.73) by EX13-EDG-OU-002.vmware.com (10.113.208.156) with Microsoft SMTP Server id 15.0.1156.6; Tue, 19 Jun 2018 10:32:51 -0700 Received: from WIN-ANAND1.vmware.com (win-anand1.prom.eng.vmware.com [10.33.79.212]) by sc9-mailhost3.vmware.com (Postfix) with ESMTP id B118240EF5; Tue, 19 Jun 2018 10:32:54 -0700 (PDT) From: Anand Kumar To: Date: Tue, 19 Jun 2018 10:32:51 -0700 Message-ID: <20180619173251.6764-4-kumaranand@vmware.com> X-Mailer: git-send-email 2.9.3.windows.1 In-Reply-To: <20180619173251.6764-1-kumaranand@vmware.com> References: <20180619173251.6764-1-kumaranand@vmware.com> MIME-Version: 1.0 Received-SPF: None (EX13-EDG-OU-002.vmware.com: kumaranand@vmware.com does not designate permitted sender hosts) X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCH v4 3/3] datapath-windows: Compute ct hash based on 5-tuple and zone X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org Conntrack 5-tuple consists of src address, dst address, src port, dst port and protocol which will be unique to a ct session. Use this information along with zone to compute hash. Also re-factor conntrack code related to parsing netlink attributes. Testing: Verified loading/unloading the driver with driver verified enabled. Ran TCP/UDP and ICMP traffic. Signed-off-by: Anand Kumar Acked-by: Alin Gabriel Serdean --- v1->v2: Updated commit message to include testing done. v2->v3: No change v3->v4: No change --- datapath-windows/ovsext/Conntrack.c | 228 ++++++++++++++++++------------------ datapath-windows/ovsext/Conntrack.h | 2 - 2 files changed, 116 insertions(+), 114 deletions(-) diff --git a/datapath-windows/ovsext/Conntrack.c b/datapath-windows/ovsext/Conntrack.c index 8fa1e07..dd16602 100644 --- a/datapath-windows/ovsext/Conntrack.c +++ b/datapath-windows/ovsext/Conntrack.c @@ -151,6 +151,24 @@ OvsCleanupConntrack(VOID) OvsNatCleanup(); } +/* + *---------------------------------------------------------------------------- + * OvsCtHashKey + * Compute hash using 5-tuple and zone. + *---------------------------------------------------------------------------- + */ +UINT32 +OvsCtHashKey(const OVS_CT_KEY *key) +{ + UINT32 hsrc, hdst, hash; + hsrc = key->src.addr.ipv4 | ntohl(key->src.port); + hdst = key->dst.addr.ipv4 | ntohl(key->dst.port); + hash = hsrc ^ hdst; /* TO identify reverse traffic */ + hash = hash | (key->zone + key->nw_proto); + hash = OvsJhashWords((uint32_t*) &hash, 1, hash); + return hash; +} + static __inline VOID OvsCtKeyReverse(OVS_CT_KEY *key) { @@ -231,7 +249,7 @@ OvsCtAddEntry(POVS_CT_ENTRY entry, if (!OvsNatTranslateCtEntry(entry)) { return FALSE; } - ctx->hash = OvsHashCtKey(&entry->key); + ctx->hash = OvsCtHashKey(&entry->key); } else { entry->natInfo.natAction = natInfo->natAction; } @@ -531,20 +549,6 @@ OvsCtLookup(OvsConntrackKeyLookupCtx *ctx) return found; } -UINT32 -OvsHashCtKey(const OVS_CT_KEY *key) -{ - UINT32 hsrc, hdst, hash; - hsrc = OvsJhashBytes((UINT32*) &key->src, sizeof(key->src), 0); - hdst = OvsJhashBytes((UINT32*) &key->dst, sizeof(key->dst), 0); - hash = hsrc ^ hdst; /* TO identify reverse traffic */ - hash = OvsJhashBytes((uint32_t *) &key->dst + 1, - ((uint32_t *) (key + 1) - - (uint32_t *) (&key->dst + 1)), - hash); - return hash; -} - static UINT8 OvsReverseIcmpType(UINT8 type) { @@ -642,7 +646,7 @@ OvsCtSetupLookupCtx(OvsFlowKey *flowKey, OvsCtKeyReverse(&ctx->key); } - ctx->hash = OvsHashCtKey(&ctx->key); + ctx->hash = OvsCtHashKey(&ctx->key); return NDIS_STATUS_SUCCESS; } @@ -953,7 +957,6 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx, OvsFlowKey *key, const PNL_ATTR a) { - PNL_ATTR ctAttr; BOOLEAN commit = FALSE; BOOLEAN force = FALSE; BOOLEAN postUpdateEvent = FALSE; @@ -973,109 +976,110 @@ OvsExecuteConntrackAction(OvsForwardingContext *fwdCtx, return status; } - /* XXX Convert this to NL_ATTR_FOR_EACH */ - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_ZONE); - if (ctAttr) { - zone = NlAttrGetU16(ctAttr); - } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_COMMIT); - if (ctAttr) { - commit = TRUE; - } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_MARK); - if (ctAttr) { - mark = NlAttrGet(ctAttr); - } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_LABELS); - if (ctAttr) { - labels = NlAttrGet(ctAttr); - } - natActionInfo.natAction = NAT_ACTION_NONE; - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_NAT); - if (ctAttr) { - /* Pares Nested NAT attributes. */ - PNL_ATTR natAttr; - unsigned int left; - BOOLEAN hasMinIp = FALSE; - BOOLEAN hasMinPort = FALSE; - BOOLEAN hasMaxIp = FALSE; - BOOLEAN hasMaxPort = FALSE; - NL_NESTED_FOR_EACH_UNSAFE (natAttr, left, ctAttr) { - enum ovs_nat_attr subtype = NlAttrType(natAttr); - switch(subtype) { - case OVS_NAT_ATTR_SRC: - case OVS_NAT_ATTR_DST: - natActionInfo.natAction |= - ((subtype == OVS_NAT_ATTR_SRC) - ? NAT_ACTION_SRC : NAT_ACTION_DST); + PNL_ATTR ctAttr = NULL; + INT left; + + NL_NESTED_FOR_EACH (ctAttr, left, a) { + switch(NlAttrType(ctAttr)) { + case OVS_CT_ATTR_ZONE: + zone = NlAttrGetU16(ctAttr); + break; + case OVS_CT_ATTR_COMMIT: + commit = TRUE; + break; + case OVS_CT_ATTR_MARK: + mark = NlAttrGet(ctAttr); break; - case OVS_NAT_ATTR_IP_MIN: - memcpy(&natActionInfo.minAddr, - NlAttrData(natAttr), NlAttrGetSize(natAttr)); - hasMinIp = TRUE; + case OVS_CT_ATTR_LABELS: + labels = NlAttrGet(ctAttr); break; - case OVS_NAT_ATTR_IP_MAX: - memcpy(&natActionInfo.maxAddr, - NlAttrData(natAttr), NlAttrGetSize(natAttr)); - hasMaxIp = TRUE; + case OVS_CT_ATTR_HELPER: + helper = NlAttrGetString(ctAttr); + if (helper == NULL) { + return NDIS_STATUS_INVALID_PARAMETER; + } + if (strcmp("ftp", helper) != 0) { + /* Only support FTP */ + return NDIS_STATUS_NOT_SUPPORTED; + } break; - case OVS_NAT_ATTR_PROTO_MIN: - natActionInfo.minPort = NlAttrGetU16(natAttr); - hasMinPort = TRUE; + case OVS_CT_ATTR_FORCE_COMMIT: + force = TRUE; + /* Force implicitly means commit */ + commit = TRUE; break; - case OVS_NAT_ATTR_PROTO_MAX: - natActionInfo.maxPort = NlAttrGetU16(natAttr); - hasMaxPort = TRUE; + case OVS_CT_ATTR_EVENTMASK: + eventmask = NlAttrGetU32(ctAttr); + /* Only mark and label updates are supported. */ + if (eventmask & (1 << IPCT_MARK | 1 << IPCT_LABEL)) + postUpdateEvent = TRUE; break; - case OVS_NAT_ATTR_PERSISTENT: - case OVS_NAT_ATTR_PROTO_HASH: - case OVS_NAT_ATTR_PROTO_RANDOM: + case OVS_CT_ATTR_NAT: + natActionInfo.natAction = NAT_ACTION_NONE; + /* Pares Nested NAT attributes. */ + PNL_ATTR natAttr; + unsigned int natLeft; + BOOLEAN hasMinIp = FALSE; + BOOLEAN hasMinPort = FALSE; + BOOLEAN hasMaxIp = FALSE; + BOOLEAN hasMaxPort = FALSE; + NL_NESTED_FOR_EACH_UNSAFE (natAttr, natLeft, ctAttr) { + enum ovs_nat_attr subtype = NlAttrType(natAttr); + switch(subtype) { + case OVS_NAT_ATTR_SRC: + case OVS_NAT_ATTR_DST: + natActionInfo.natAction |= + ((subtype == OVS_NAT_ATTR_SRC) + ? NAT_ACTION_SRC : NAT_ACTION_DST); + break; + case OVS_NAT_ATTR_IP_MIN: + memcpy(&natActionInfo.minAddr, + NlAttrData(natAttr), NlAttrGetSize(natAttr)); + hasMinIp = TRUE; + break; + case OVS_NAT_ATTR_IP_MAX: + memcpy(&natActionInfo.maxAddr, + NlAttrData(natAttr), NlAttrGetSize(natAttr)); + hasMaxIp = TRUE; + break; + case OVS_NAT_ATTR_PROTO_MIN: + natActionInfo.minPort = NlAttrGetU16(natAttr); + hasMinPort = TRUE; + break; + case OVS_NAT_ATTR_PROTO_MAX: + natActionInfo.maxPort = NlAttrGetU16(natAttr); + hasMaxPort = TRUE; + break; + case OVS_NAT_ATTR_PERSISTENT: + case OVS_NAT_ATTR_PROTO_HASH: + case OVS_NAT_ATTR_PROTO_RANDOM: + break; + } + } + if (natActionInfo.natAction == NAT_ACTION_NONE) { + natActionInfo.natAction = NAT_ACTION_REVERSE; + } + if (hasMinIp && !hasMaxIp) { + memcpy(&natActionInfo.maxAddr, + &natActionInfo.minAddr, + sizeof(natActionInfo.maxAddr)); + } + if (hasMinPort && !hasMaxPort) { + natActionInfo.maxPort = natActionInfo.minPort; + } + if (hasMinPort || hasMaxPort) { + if (natActionInfo.natAction & NAT_ACTION_SRC) { + natActionInfo.natAction |= NAT_ACTION_SRC_PORT; + } else if (natActionInfo.natAction & NAT_ACTION_DST) { + natActionInfo.natAction |= NAT_ACTION_DST_PORT; + } + } + break; + default: + OVS_LOG_TRACE("Invalid netlink attr type: %u", NlAttrType(ctAttr)); break; - } - } - if (natActionInfo.natAction == NAT_ACTION_NONE) { - natActionInfo.natAction = NAT_ACTION_REVERSE; - } - if (hasMinIp && !hasMaxIp) { - memcpy(&natActionInfo.maxAddr, - &natActionInfo.minAddr, - sizeof(natActionInfo.maxAddr)); - } - if (hasMinPort && !hasMaxPort) { - natActionInfo.maxPort = natActionInfo.minPort; - } - if (hasMinPort || hasMaxPort) { - if (natActionInfo.natAction & NAT_ACTION_SRC) { - natActionInfo.natAction |= NAT_ACTION_SRC_PORT; - } else if (natActionInfo.natAction & NAT_ACTION_DST) { - natActionInfo.natAction |= NAT_ACTION_DST_PORT; - } - } - } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_HELPER); - if (ctAttr) { - helper = NlAttrGetString(ctAttr); - if (helper == NULL) { - return NDIS_STATUS_INVALID_PARAMETER; - } - if (strcmp("ftp", helper) != 0) { - /* Only support FTP */ - return NDIS_STATUS_NOT_SUPPORTED; } } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_FORCE_COMMIT); - if (ctAttr) { - force = TRUE; - /* Force implicitly means commit */ - commit = TRUE; - } - ctAttr = NlAttrFindNested(a, OVS_CT_ATTR_EVENTMASK); - if (ctAttr) { - eventmask = NlAttrGetU32(ctAttr); - /* Only mark and label updates are supported. */ - if (eventmask & (1 << IPCT_MARK | 1 << IPCT_LABEL)) - postUpdateEvent = TRUE; - } /* If newNbl is not allocated, use the current Nbl*/ status = OvsCtExecute_(fwdCtx, key, layers, commit, force, zone, mark, labels, helper, &natActionInfo, diff --git a/datapath-windows/ovsext/Conntrack.h b/datapath-windows/ovsext/Conntrack.h index 7a80eea..d4152b3 100644 --- a/datapath-windows/ovsext/Conntrack.h +++ b/datapath-windows/ovsext/Conntrack.h @@ -237,6 +237,4 @@ NDIS_STATUS OvsCtHandleFtp(PNET_BUFFER_LIST curNbl, UINT64 currentTime, POVS_CT_ENTRY entry, BOOLEAN request); - -UINT32 OvsHashCtKey(const OVS_CT_KEY *key); #endif /* __OVS_CONNTRACK_H_ */