From patchwork Fri Jan 27 22:26:59 2017
X-Patchwork-Submitter: Anand Kumar
X-Patchwork-Id: 720968
X-Patchwork-Delegate: diproiettod@vmware.com
From: Anand Kumar
To: Shashank Ram, "dev@openvswitch.org"
Date: Fri, 27 Jan 2017 22:26:59 +0000
References: <20170112211346.612-1-kumaranand@vmware.com> <20170112211346.612-2-kumaranand@vmware.com>
Subject: Re: [ovs-dev] [PATCH v2 1/5] datapath-windows: Added a new file to support Ipv4 fragments.

Hi Shashank,

Thank you for reviewing the patch. I have sent out a new version of the patch series addressing your comments.
Based on your suggestion, I’m using an event to signal the clean up thread to free up the memory.

Regards,
Anand Kumar

From: Shashank Ram
Date: Wednesday, January 18, 2017 at 9:06 AM
To: Anand Kumar, "dev@openvswitch.org"
Subject: Re: [ovs-dev] [PATCH v2 1/5] datapath-windows: Added a new file to support Ipv4 fragments.

Hi Anand, following are my comments:

1. Since you are just using a RW lock without specifically differentiating between read and write protection, you could use a spin lock instead. Spin locks in general are recommended if all you want is a lock.

2. Instead of running the fragment cleaner thread every minute, you should also make use of an event to signal the thread when a fragment is removed from your fragment list. This way, you avoid holding unnecessary memory when it could be cleared. Please make use of an event to signal the thread along with a default timeout, in case the signal never arrives.

Thanks,
Shashank

________________________________
From: ovs-dev-bounces@openvswitch.org on behalf of Anand Kumar
Sent: Thursday, January 12, 2017 1:13:42 PM
To: dev@openvswitch.org
Subject: [ovs-dev] [PATCH v2 1/5] datapath-windows: Added a new file to support Ipv4 fragments.

This patch adds functionality to handle IPv4 fragments, which will be
used by the Conntrack module.

Added a new structure to hold the IPv4 fragments and a hash table to
hold IPv4 datagram entries. Also added a clean up thread that runs
every minute to delete the expired IPv4 datagram entries.

The individual fragments are ignored by the conntrack. Once all the
fragments are received, a new NBL is created out of the reassembled
fragments and conntrack executes actions on the new NBL.

Created new APIs OvsProcessIpv4Fragment() to process individual fragments,
OvsIpv4Reassemble() to reassemble IPv4 fragments.
---
 datapath-windows/automake.mk           |   2 +
 datapath-windows/ovsext/Debug.h        |   3 +-
 datapath-windows/ovsext/IpFragment.c   | 506 +++++++++++++++++++++++++++++++++
 datapath-windows/ovsext/IpFragment.h   |  74 +++++
 datapath-windows/ovsext/Switch.c       |   9 +
 datapath-windows/ovsext/ovsext.vcxproj |   2 +
 6 files changed, 595 insertions(+), 1 deletion(-)
 create mode 100644 datapath-windows/ovsext/IpFragment.c
 create mode 100644 datapath-windows/ovsext/IpFragment.h

--
2.9.3.windows.1

diff --git a/datapath-windows/automake.mk b/datapath-windows/automake.mk
index 53983ae..4f7b55a 100644
--- a/datapath-windows/automake.mk
+++ b/datapath-windows/automake.mk
@@ -32,6 +32,8 @@ EXTRA_DIST += \
 	datapath-windows/ovsext/Flow.h \
 	datapath-windows/ovsext/Gre.h \
 	datapath-windows/ovsext/Gre.c \
+	datapath-windows/ovsext/IpFragment.c \
+	datapath-windows/ovsext/IpFragment.h \
 	datapath-windows/ovsext/IpHelper.c \
 	datapath-windows/ovsext/IpHelper.h \
 	datapath-windows/ovsext/Jhash.c \
diff --git a/datapath-windows/ovsext/Debug.h b/datapath-windows/ovsext/Debug.h
index cae6ac9..6de1812 100644
--- a/datapath-windows/ovsext/Debug.h
+++ b/datapath-windows/ovsext/Debug.h
@@ -42,8 +42,9 @@
 #define OVS_DBG_STT BIT32(22)
 #define OVS_DBG_CONTRK BIT32(23)
 #define OVS_DBG_GENEVE BIT32(24)
+#define OVS_DBG_IPFRAG BIT32(25)
 
-#define OVS_DBG_LAST 24 /* Set this to the last defined module number. */
+#define OVS_DBG_LAST 25 /* Set this to the last defined module number. */
 /* Please add above OVS_DBG_LAST. */
 
 #define OVS_DBG_ERROR DPFLTR_ERROR_LEVEL
diff --git a/datapath-windows/ovsext/IpFragment.c b/datapath-windows/ovsext/IpFragment.c
new file mode 100644
index 0000000..2ce3932
--- /dev/null
+++ b/datapath-windows/ovsext/IpFragment.c
@@ -0,0 +1,506 @@
+/*
+ * Copyright (c) 2017 VMware, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include "Conntrack.h"
+#include "Debug.h"
+#include "IpFragment.h"
+#include "Jhash.h"
+#include "Offload.h"
+#include "PacketParser.h"
+
+#ifdef OVS_DBG_MOD
+#undef OVS_DBG_MOD
+#endif
+#define OVS_DBG_MOD OVS_DBG_IPFRAG
+
+/* Function declarations */
+static VOID OvsIpFragmentEntryCleaner(PVOID data);
+static VOID OvsIpFragmentEntryDelete(POVS_IPFRAG_ENTRY entry);
+
+/* Global and static variables */
+static OVS_IPFRAG_THREAD_CTX ipFragThreadCtx;
+static PNDIS_RW_LOCK_EX ovsIpFragmentHashLockObj;
+static UINT64 ipTotalEntries;
+static PLIST_ENTRY OvsIpFragTable;
+
+NDIS_STATUS
+OvsInitIpFragment(POVS_SWITCH_CONTEXT context)
+{
+
+    NDIS_STATUS status;
+    HANDLE threadHandle = NULL;
+
+    /* Init the sync-lock */
+    ovsIpFragmentHashLockObj = NdisAllocateRWLock(context->NdisFilterHandle);
+    if (ovsIpFragmentHashLockObj == NULL) {
+        return STATUS_INSUFFICIENT_RESOURCES;
+    }
+
+    /* Init the Hash Buffer */
+    OvsIpFragTable = OvsAllocateMemoryWithTag(sizeof(LIST_ENTRY)
+                                              * IP_FRAG_HASH_TABLE_SIZE,
+                                              OVS_MEMORY_TAG);
+    if (OvsIpFragTable == NULL) {
+        NdisFreeRWLock(ovsIpFragmentHashLockObj);
+        ovsIpFragmentHashLockObj = NULL;
+        return STATUS_INSUFFICIENT_RESOURCES;
+    }
+
+    for (int i = 0; i < IP_FRAG_HASH_TABLE_SIZE; i++) {
+        InitializeListHead(&OvsIpFragTable[i]);
+    }
+
+    /* Init Cleaner Thread */
+    KeInitializeEvent(&ipFragThreadCtx.event, NotificationEvent, FALSE);
+    status = PsCreateSystemThread(&threadHandle, SYNCHRONIZE, NULL, NULL,
+                                  NULL, OvsIpFragmentEntryCleaner,
+                                  &ipFragThreadCtx);
+
+    if (status != STATUS_SUCCESS) {
+        OvsFreeMemoryWithTag(OvsIpFragTable, OVS_MEMORY_TAG);
+        OvsIpFragTable = NULL;
+        NdisFreeRWLock(ovsIpFragmentHashLockObj);
+        ovsIpFragmentHashLockObj = NULL;
+        return status;
+    }
+
+    ObReferenceObjectByHandle(threadHandle, SYNCHRONIZE, NULL, KernelMode,
+                              &ipFragThreadCtx.threadObject, NULL);
+    ZwClose(threadHandle);
+    threadHandle = NULL;
+    return STATUS_SUCCESS;
+}
+
+static __inline UINT32
+OvsGetIPFragmentHash(POVS_IPFRAG_KEY fragKey)
+{
+    UINT32 arr[6];
+    arr[0] = (UINT32)fragKey->protocol;
+    arr[1] = (UINT32)fragKey->id;
+    arr[2] = (UINT32)fragKey->sAddr;
+    arr[3] = (UINT32)fragKey->dAddr;
+    arr[4] = (UINT32)((fragKey->tunnelId & 0xFFFFFFFF00000000LL) >> 32);
+    arr[5] = (UINT32)(fragKey->tunnelId & 0xFFFFFFFFLL);
+    return OvsJhashWords(arr, 6, OVS_HASH_BASIS);
+}
+
+static __inline POVS_IPFRAG_ENTRY
+OvsLookupIPFrag(POVS_IPFRAG_KEY fragKey, UINT32 hash)
+{
+    POVS_IPFRAG_ENTRY entry;
+    PLIST_ENTRY link;
+    LOCK_STATE_EX lockState;
+
+    NdisAcquireRWLockWrite(ovsIpFragmentHashLockObj, &lockState, 0);
+    LIST_FORALL(&OvsIpFragTable[hash & IP_FRAG_HASH_TABLE_MASK], link) {
+        entry = CONTAINING_RECORD(link, OVS_IPFRAG_ENTRY, link);
+        if (entry->fragKey.dAddr == fragKey->dAddr &&
+            entry->fragKey.sAddr == fragKey->sAddr &&
+            entry->fragKey.id == fragKey->id &&
+            entry->fragKey.protocol == fragKey->protocol &&
+            entry->fragKey.tunnelId == fragKey->tunnelId) {
+            NdisReleaseRWLock(ovsIpFragmentHashLockObj, &lockState);
+            return entry;
+        }
+    }
+    NdisReleaseRWLock(ovsIpFragmentHashLockObj, &lockState);
+    return NULL;
+}
+
+/*
+*----------------------------------------------------------------------------
+* OvsIpv4Reassemble
+*     Reassemble the ipv4 fragments and return newNbl on success.
+*     Should be called after acquiring the lockObj for the entry.
+*----------------------------------------------------------------------------
+*/
+NDIS_STATUS
+OvsIpv4Reassemble(POVS_SWITCH_CONTEXT switchContext,
+                  PNET_BUFFER_LIST *curNbl,
+                  OvsCompletionList *completionList,
+                  NDIS_SWITCH_PORT_ID sourcePort,
+                  POVS_IPFRAG_ENTRY entry,
+                  PNET_BUFFER_LIST *newNbl)
+{
+    NDIS_STATUS status = NDIS_STATUS_SUCCESS;
+    NDIS_STRING filterReason;
+    POVS_BUFFER_CONTEXT ctx;
+    PNET_BUFFER curNb;
+    EthHdr *eth;
+    IPHdr *ipHdr, *newIpHdr;
+    CHAR *ethBuf[sizeof(EthHdr)];
+    CHAR *packetBuf;
+    UINT16 ipHdrLen, packetLen, packetHeader;
+    POVS_FRAGMENT_LIST head = NULL;
+
+    curNb = NET_BUFFER_LIST_FIRST_NB(*curNbl);
+    ASSERT(NET_BUFFER_NEXT_NB(curNb) == NULL);
+
+    eth = (EthHdr*)NdisGetDataBuffer(curNb, ETH_HEADER_LENGTH,
+                                     (PVOID)&ethBuf, 1, 0);
+    if (eth == NULL) {
+        return NDIS_STATUS_INVALID_PACKET;
+    }
+    ipHdr = (IPHdr *)((PCHAR)eth + ETH_HEADER_LENGTH);
+    if (ipHdr == NULL) {
+        return NDIS_STATUS_INVALID_PACKET;
+    }
+    ipHdrLen = (UINT16)(ipHdr->ihl * 4);
+    packetLen = ETH_HEADER_LENGTH + ipHdrLen + entry->totalLen;
+    packetBuf = (CHAR*)OvsAllocateMemoryWithTag(packetLen,
+                                                OVS_MEMORY_TAG);
+    if (packetBuf == NULL) {
+        OVS_LOG_ERROR("Insufficient resources, failed to allocate packetBuf");
+        return NDIS_STATUS_RESOURCES;
+    }
+
+    /* copy Ethernet header */
+    NdisMoveMemory(packetBuf, eth, ETH_HEADER_LENGTH);
+    /* copy ipv4 header to packet buff */
+    NdisMoveMemory(packetBuf + ETH_HEADER_LENGTH, ipHdr, ipHdrLen);
+
+    /* update new ip header */
+    newIpHdr = (IPHdr *)(packetBuf + ETH_HEADER_LENGTH);
+    newIpHdr->frag_off = 0;
+    newIpHdr->tot_len = htons(packetLen - ETH_HEADER_LENGTH);
+    newIpHdr->check = 0;
+    newIpHdr->check = IPChecksum((UINT8 *)packetBuf + ETH_HEADER_LENGTH,
+                                 ipHdrLen, 0);
+    packetHeader = ETH_HEADER_LENGTH + ipHdrLen;
+    head = entry->head;
+    while (head) {
+        ASSERT((packetHeader + head->offset) <= packetLen);
+        NdisMoveMemory(packetBuf + packetHeader + head->offset,
+                       head->pbuff, head->len);
+        head = head->next;
+    }
+    /* Create new nbl from the flat buffer */
+    *newNbl = OvsAllocateNBLFromBuffer(switchContext, packetBuf, packetLen);
+    if (*newNbl == NULL) {
+        OVS_LOG_ERROR("Insufficient resources, failed to allocate newNbl");
+        status = NDIS_STATUS_RESOURCES;
+    }
+
+    OvsFreeMemoryWithTag(packetBuf, OVS_MEMORY_TAG);
+    /* Timeout the entry so that clean up thread deletes it. */
+    entry->expiration -= IPFRAG_ENTRY_TIMEOUT;
+
+    /* Complete the fragment NBL */
+    ctx = (POVS_BUFFER_CONTEXT)NET_BUFFER_LIST_CONTEXT_DATA_START(*curNbl);
+    if (ctx->flags & OVS_BUFFER_NEED_COMPLETE) {
+        RtlInitUnicodeString(&filterReason, L"Complete last fragment");
+        OvsAddPktCompletionList(completionList, TRUE, sourcePort, *curNbl, 1,
+                                &filterReason);
+    } else {
+        OvsCompleteNBL(switchContext, *curNbl, TRUE);
+    }
+    *curNbl = *newNbl;
+    return status;
+}
+/*
+*----------------------------------------------------------------------------
+* OvsProcessIpv4Fragment
+*     Reassemble the fragments once all the fragments are received and
+*     return NDIS_STATUS_PENDING for the pending fragments
+*     XXX - Instead of copying NBls, Keep the NBLs in limbo state.
+*----------------------------------------------------------------------------
+*/
+NDIS_STATUS
+OvsProcessIpv4Fragment(POVS_SWITCH_CONTEXT switchContext,
+                       PNET_BUFFER_LIST *curNbl,
+                       OvsCompletionList *completionList,
+                       NDIS_SWITCH_PORT_ID sourcePort,
+                       UINT16 *mru,
+                       ovs_be64 tunnelId,
+                       PNET_BUFFER_LIST *newNbl)
+{
+    NDIS_STATUS status = NDIS_STATUS_PENDING;
+    PNET_BUFFER curNb;
+    CHAR *ethBuf[sizeof(EthHdr)];
+    UINT16 offset, flags;
+    UINT16 payloadLen, ipHdrLen;
+    UINT32 hash;
+    UINT64 currentTime;
+    EthHdr *eth;
+    IPHdr *ipHdr;
+    OVS_IPFRAG_KEY fragKey;
+    POVS_IPFRAG_ENTRY entry;
+    POVS_FRAGMENT_LIST fragStorage;
+    LOCK_STATE_EX htLockState, entryLockState;
+
+    curNb = NET_BUFFER_LIST_FIRST_NB(*curNbl);
+    ASSERT(NET_BUFFER_NEXT_NB(curNb) == NULL);
+
+    eth = (EthHdr*)NdisGetDataBuffer(curNb, ETH_HEADER_LENGTH,
+                                     (PVOID)&ethBuf, 1, 0);
+    if (eth == NULL) {
+        return NDIS_STATUS_INVALID_PACKET;
+    }
+
+    ipHdr = (IPHdr *)((PCHAR)eth + ETH_HEADER_LENGTH);
+    if (ipHdr == NULL) {
+        return NDIS_STATUS_INVALID_PACKET;
+    }
+    ipHdrLen = (UINT16)(ipHdr->ihl * 4);
+    payloadLen = ntohs(ipHdr->tot_len) - ipHdrLen;
+    offset = ntohs(ipHdr->frag_off) & IP_OFFSET;
+    offset <<= 3;
+    flags = ntohs(ipHdr->frag_off) & IP_MF;
+
+    /* Copy fragment specific fields. */
+    fragKey.protocol = ipHdr->protocol;
+    fragKey.id = ipHdr->id;
+    fragKey.sAddr = ipHdr->saddr;
+    fragKey.dAddr = ipHdr->daddr;
+    fragKey.tunnelId = tunnelId;
+    /* Padding. */
+    NdisZeroMemory(&fragKey.pad_1, 3);
+    fragKey.pad_2 = 0;
+
+    fragStorage = (POVS_FRAGMENT_LIST )
+        OvsAllocateMemoryWithTag(sizeof(OVS_FRAGMENT_LIST), OVS_MEMORY_TAG);
+    if (fragStorage == NULL) {
+        OVS_LOG_ERROR("Insufficient resources, failed to allocate fragStorage");
+        return NDIS_STATUS_RESOURCES;
+    }
+
+    fragStorage->pbuff = (CHAR *)OvsAllocateMemoryWithTag(payloadLen,
+                                                          OVS_MEMORY_TAG);
+    if (fragStorage->pbuff == NULL) {
+        OVS_LOG_ERROR("Insufficient resources, failed to allocate fragStorage");
+        OvsFreeMemoryWithTag(fragStorage, OVS_MEMORY_TAG);
+        return NDIS_STATUS_RESOURCES;
+    }
+
+    /* Copy payload from nbl to fragment storage. */
+    if (OvsGetPacketBytes(*curNbl, payloadLen, ETH_HEADER_LENGTH + ipHdrLen,
+                          fragStorage->pbuff) == NULL) {
+        status = NDIS_STATUS_RESOURCES;
+        goto payload_copy_error;
+    }
+    fragStorage->len = payloadLen;
+    fragStorage->offset = offset;
+    fragStorage->next = NULL;
+    hash = OvsGetIPFragmentHash(&fragKey);
+    entry = OvsLookupIPFrag(&fragKey, hash);
+    if (entry == NULL) {
+        entry = (POVS_IPFRAG_ENTRY)
+            OvsAllocateMemoryWithTag(sizeof(OVS_IPFRAG_ENTRY),
+                                     OVS_MEMORY_TAG);
+        if (entry == NULL) {
+            status = NDIS_STATUS_RESOURCES;
+            goto payload_copy_error;
+        }
+        /* Copy the fragment key. */
+        NdisZeroMemory(entry, sizeof(OVS_IPFRAG_ENTRY));
+        NdisMoveMemory(&(entry->fragKey), &fragKey,
+                       sizeof(OVS_IPFRAG_KEY));
+        /* Update maximum receive unit. */
+        entry->mru = payloadLen + ipHdrLen;
+        entry->recvdLen += fragStorage->len;
+        entry->head = entry->tail = fragStorage;
+        if (!flags) {
+            entry->totalLen = offset + payloadLen;
+        }
+        NdisGetCurrentSystemTime((LARGE_INTEGER *)&currentTime);
+        entry->expiration = currentTime + IPFRAG_ENTRY_TIMEOUT;
+
+        /* Init the sync-lock. */
+        entry->lockObj = NdisAllocateRWLock(switchContext->NdisFilterHandle);
+        if (entry->lockObj == NULL) {
+            OvsFreeMemoryWithTag(entry, OVS_MEMORY_TAG);
+            status = NDIS_STATUS_RESOURCES;
+            goto payload_copy_error;
+        }
+
+        NdisAcquireRWLockWrite(ovsIpFragmentHashLockObj, &htLockState, 0);
+        InsertHeadList(&OvsIpFragTable[hash & IP_FRAG_HASH_TABLE_MASK],
+                       &entry->link);
+
+        ipTotalEntries++;
+        NdisReleaseRWLock(ovsIpFragmentHashLockObj, &htLockState);
+        return NDIS_STATUS_PENDING;
+    } else {
+        /* Acquire the entry lock. */
+        NdisAcquireRWLockWrite(entry->lockObj, &entryLockState, 0);
+        NdisGetCurrentSystemTime((LARGE_INTEGER *)&currentTime);
+        if (currentTime > entry->expiration) {
+            /* Expired entry. */
+            goto fragment_error;
+        }
+        POVS_FRAGMENT_LIST next = entry->head;
+        POVS_FRAGMENT_LIST prev = entry->tail;
+        if (prev != NULL && prev->offset < offset) {
+            next = NULL;
+            goto found;
+        }
+        prev = NULL;
+        for (next = entry->head; next != NULL; next = next->next) {
+            if (next->offset > fragStorage->offset) {
+                break;
+            }
+            prev = next;
+        }
+found:
+        /* Check for overlap. */
+        if (prev) {
+            /* i bytes overlap. */
+            int i = (prev->offset + prev->len) - fragStorage->offset;
+            if (i > 0) {
+                goto fragment_error;
+            }
+        }
+        if (next) {
+            /* i bytes overlap. */
+            int i = (fragStorage->offset + fragStorage->len) - next->offset;
+            if (i > 0) {
+                goto fragment_error;
+            }
+        }
+        /* Insert. */
+        if (prev) {
+            prev->next = fragStorage;
+            fragStorage->next = next;
+        } else {
+            fragStorage->next = next;
+            entry->head = fragStorage;
+        }
+        if (!next) {
+            entry->tail = fragStorage;
+        }
+
+        entry->mru = entry->mru > (payloadLen + ipHdrLen) ?
+            entry->mru : (payloadLen + ipHdrLen);
+        if (entry->recvdLen + fragStorage->len > entry->recvdLen) {
+            entry->recvdLen += fragStorage->len;
+        } else {
+            /* Overflow, ignore the fragment. */
+            goto fragment_error;
+        }
+        if (!flags) {
+            entry->totalLen = offset + payloadLen;
+        }
+        if (entry->recvdLen == entry->totalLen) {
+            /* Update mru of the forwarding context. */
+            *mru = entry->mru + ETH_HEADER_LENGTH;
+            status = OvsIpv4Reassemble(switchContext, curNbl, completionList,
+                                       sourcePort, entry, newNbl);
+        }
+        NdisReleaseRWLock(entry->lockObj, &entryLockState);
+        return status;
+    }
+fragment_error:
+    /* Release the entry lock. */
+    NdisReleaseRWLock(entry->lockObj, &entryLockState);
+payload_copy_error:
+    OvsFreeMemoryWithTag(fragStorage->pbuff, OVS_MEMORY_TAG);
+    OvsFreeMemoryWithTag(fragStorage, OVS_MEMORY_TAG);
+    return status;
+}
+
+
+/*
+*----------------------------------------------------------------------------
+* OvsIpFragmentEntryCleaner
+*     Runs periodically and cleans up the Ip Fragment table
+*     Interval is selected as twice the entry timeout
+*----------------------------------------------------------------------------
+*/
+static VOID
+OvsIpFragmentEntryCleaner(PVOID data)
+{
+
+    POVS_IPFRAG_THREAD_CTX context = (POVS_IPFRAG_THREAD_CTX)data;
+    PLIST_ENTRY link, next;
+    POVS_IPFRAG_ENTRY entry;
+    BOOLEAN success = TRUE;
+
+    while (success) {
+        LOCK_STATE_EX lockState;
+        NdisAcquireRWLockWrite(ovsIpFragmentHashLockObj, &lockState, 0);
+        if (context->exit) {
+            NdisReleaseRWLock(ovsIpFragmentHashLockObj, &lockState);
+            break;
+        }
+
+        /* Set the timeout for the thread and cleanup. */
+        UINT64 currentTime, threadSleepTimeout;
+        NdisGetCurrentSystemTime((LARGE_INTEGER *)&currentTime);
+        threadSleepTimeout = currentTime + IPFRAG_CLEANUP_INTERVAL;
+        for (int i = 0; i < IP_FRAG_HASH_TABLE_SIZE && ipTotalEntries; i++) {
+            LIST_FORALL_SAFE(&OvsIpFragTable[i], link, next) {
+                entry = CONTAINING_RECORD(link, OVS_IPFRAG_ENTRY, link);
+                if (entry->expiration < currentTime) {
+                    OvsIpFragmentEntryDelete(entry);
+                }
+            }
+        }
+
+        NdisReleaseRWLock(ovsIpFragmentHashLockObj, &lockState);
+        KeWaitForSingleObject(&context->event, Executive, KernelMode,
+                              FALSE, (LARGE_INTEGER *)&threadSleepTimeout);
+    }
+
+    PsTerminateSystemThread(STATUS_SUCCESS);
+}
+
+static VOID
+OvsIpFragmentEntryDelete(POVS_IPFRAG_ENTRY entry)
+{
+    LOCK_STATE_EX lockState;
+    NdisAcquireRWLockWrite(entry->lockObj, &lockState, 0);
+    POVS_FRAGMENT_LIST head = entry->head;
+    POVS_FRAGMENT_LIST temp = NULL;
+    while (head) {
+        temp = head;
+        head = head->next;
+        OvsFreeMemoryWithTag(temp->pbuff, OVS_MEMORY_TAG);
+        OvsFreeMemoryWithTag(temp, OVS_MEMORY_TAG);
+    }
+    RemoveEntryList(&entry->link);
+    ipTotalEntries--;
+    NdisReleaseRWLock(entry->lockObj, &lockState);
+    NdisFreeRWLock(entry->lockObj);
+    OvsFreeMemoryWithTag(entry, OVS_MEMORY_TAG);
+}
+
+VOID
+OvsCleanupIpFragment(VOID)
+{
+    PLIST_ENTRY link, next;
+    POVS_IPFRAG_ENTRY entry;
+    LOCK_STATE_EX lockState;
+    NdisAcquireRWLockWrite(ovsIpFragmentHashLockObj, &lockState, 0);
+    ipFragThreadCtx.exit = 1;
+    KeSetEvent(&ipFragThreadCtx.event, 0, FALSE);
+    NdisReleaseRWLock(ovsIpFragmentHashLockObj, &lockState);
+    KeWaitForSingleObject(ipFragThreadCtx.threadObject, Executive,
+                          KernelMode, FALSE, NULL);
+    ObDereferenceObject(ipFragThreadCtx.threadObject);
+
+    if (OvsIpFragTable) {
+        for (int i = 0; i < IP_FRAG_HASH_TABLE_SIZE && ipTotalEntries; i++) {
+            LIST_FORALL_SAFE(&OvsIpFragTable[i], link, next) {
+                entry = CONTAINING_RECORD(link, OVS_IPFRAG_ENTRY, link);
+                OvsIpFragmentEntryDelete(entry);
+            }
+        }
+        OvsFreeMemoryWithTag(OvsIpFragTable, OVS_MEMORY_TAG);
+        OvsIpFragTable = NULL;
+    }
+    NdisFreeRWLock(ovsIpFragmentHashLockObj);
+    ovsIpFragmentHashLockObj = NULL;
+ }
diff --git a/datapath-windows/ovsext/IpFragment.h b/datapath-windows/ovsext/IpFragment.h
new file mode 100644
index 0000000..8d87451
--- /dev/null
+++ b/datapath-windows/ovsext/IpFragment.h
@@ -0,0 +1,74 @@
+/*
+* Copyright (c) 2017 VMware, Inc.
+*
+* Licensed under the Apache License, Version 2.0 (the "License");
+* you may not use this file except in compliance with the License.
+* You may obtain a copy of the License at:
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+#ifndef __IPFRAGMENT_H_
+#define __IPFRAGMENT_H_ 1
+#include "PacketIO.h"
+
+typedef struct _OVS_FRAGMENT_LIST {
+    CHAR *pbuff;
+    UINT16 len;
+    UINT16 offset;
+    struct _OVS_FRAGMENT_LIST *next;
+} OVS_FRAGMENT_LIST, *POVS_FRAGMENT_LIST;
+
+typedef struct _OVS_IPFRAG_KEY {
+    UINT8 protocol;
+    UINT8 pad_1[3]; /* Align the structure to address boundaries. */
+    UINT16 id;
+    UINT16 pad_2; /* Align the structure to address boundaries. */
+    UINT32 sAddr;
+    UINT32 dAddr;
+    ovs_be64 tunnelId;
+} OVS_IPFRAG_KEY, *POVS_IPFRAG_KEY;
+
+typedef struct _OVS_IPFRAG_ENTRY {
+    PNDIS_RW_LOCK_EX lockObj; /* To access the entry. */
+    UINT16 totalLen;
+    UINT16 recvdLen;
+    UINT16 mru;
+    UINT64 expiration;
+    OVS_IPFRAG_KEY fragKey;
+    POVS_FRAGMENT_LIST head;
+    POVS_FRAGMENT_LIST tail;
+    LIST_ENTRY link;
+} OVS_IPFRAG_ENTRY, *POVS_IPFRAG_ENTRY;
+
+typedef struct _OVS_IPFRAG_THREAD_CTX {
+    KEVENT event;
+    PVOID threadObject;
+    UINT32 exit;
+} OVS_IPFRAG_THREAD_CTX, *POVS_IPFRAG_THREAD_CTX;
+
+#define IP_FRAG_HASH_TABLE_SIZE ((UINT32)1 << 10)
+#define IP_FRAG_HASH_TABLE_MASK (IP_FRAG_HASH_TABLE_SIZE - 1)
+/* 30s - Sufficient time to receive all fragments. */
+#define IPFRAG_ENTRY_TIMEOUT 300000000LL
+#define IPFRAG_CLEANUP_INTERVAL IPFRAG_ENTRY_TIMEOUT * 2 /* 1m. */
+PNET_BUFFER_LIST OvsIpv4FragmentNBL(PVOID ovsContext,
+                                    PNET_BUFFER_LIST nbl,
+                                    UINT16 mru);
+
+NDIS_STATUS OvsProcessIpv4Fragment(POVS_SWITCH_CONTEXT switchContext,
+                                   PNET_BUFFER_LIST *curNbl,
+                                   OvsCompletionList *completionList,
+                                   NDIS_SWITCH_PORT_ID sourcePort,
+                                   UINT16 *mru,
+                                   ovs_be64 tunnelId,
+                                   PNET_BUFFER_LIST *newNbl);
+NDIS_STATUS OvsInitIpFragment(POVS_SWITCH_CONTEXT context);
+VOID OvsCleanupIpFragment(VOID);
+#endif /* __IPFRAGMENT_H_ */
diff --git a/datapath-windows/ovsext/Switch.c b/datapath-windows/ovsext/Switch.c
index 138a656..558e3af 100644
--- a/datapath-windows/ovsext/Switch.c
+++ b/datapath-windows/ovsext/Switch.c
@@ -27,6 +27,7 @@
 #include "Flow.h"
 #include "IpHelper.h"
 #include "Oid.h"
+#include "IpFragment.h"
 
 #ifdef OVS_DBG_MOD
 #undef OVS_DBG_MOD
@@ -229,6 +230,12 @@ OvsCreateSwitch(NDIS_HANDLE ndisFilterHandle,
     if (status != STATUS_SUCCESS) {
         OvsUninitSwitchContext(switchContext);
         OVS_LOG_ERROR("Exit: Failed to initialize Connection tracking");
+    }
+
+    status = OvsInitIpFragment(switchContext);
+    if (status != STATUS_SUCCESS) {
+        OvsUninitSwitchContext(switchContext);
+        OVS_LOG_ERROR("Exit: Failed to initialize Ip Fragment");
         goto create_switch_done;
     }
 
@@ -265,6 +272,8 @@ OvsExtDetach(NDIS_HANDLE filterModuleContext)
     OvsCleanupSttDefragmentation();
     OvsCleanupConntrack();
     OvsCleanupCtRelated();
+    OvsCleanupIpFragment();
+
     /* This completes the cleanup, and a new attach can be handled now. */
 
     OVS_LOG_TRACE("Exit: OvsDetach Successfully");
diff --git a/datapath-windows/ovsext/ovsext.vcxproj b/datapath-windows/ovsext/ovsext.vcxproj
index 44aea19..ecfc0b8 100644
--- a/datapath-windows/ovsext/ovsext.vcxproj
+++ b/datapath-windows/ovsext/ovsext.vcxproj
@@ -112,6 +112,7 @@
+
@@ -268,6 +269,7 @@
+
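
The ordered insert and overlap check that OvsProcessIpv4Fragment() applies to
an existing entry can be modelled outside the kernel. What follows is only a
minimal userspace sketch under stated assumptions: struct frag, struct
frag_entry, frag_insert() and frag_complete() are hypothetical names that do
not appear in the patch, offsets and lengths are in bytes, and locking,
expiration and NBL handling are left out. The sketch appends only when a tail
exists and starts before the new fragment, and otherwise scans from the head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one stored fragment (not the patch's type). */
struct frag {
    uint16_t offset;      /* byte offset of the payload in the datagram */
    uint16_t len;         /* payload length in bytes */
    struct frag *next;
};

/* Hypothetical model of one reassembly entry. */
struct frag_entry {
    struct frag *head, *tail;
    uint32_t recvd_len;   /* payload bytes collected so far */
    uint32_t total_len;   /* set when the fragment with MF clear arrives */
};

/* Insert f in offset order.
 * Returns 0 on success, -1 if f overlaps a stored fragment. */
static int
frag_insert(struct frag_entry *e, struct frag *f, int more_fragments)
{
    struct frag *prev, *next = NULL;

    assert(e != NULL && f != NULL);
    prev = e->tail;

    /* Fast path: keep prev = tail when f starts after the tail.
     * Otherwise walk the list to find the neighbours of f. */
    if (prev == NULL || prev->offset >= f->offset) {
        prev = NULL;
        for (next = e->head; next != NULL; next = next->next) {
            if (next->offset > f->offset) {
                break;
            }
            prev = next;
        }
    }

    /* Reject any byte overlap with either neighbour. */
    if (prev && (uint32_t)prev->offset + prev->len > f->offset) {
        return -1;
    }
    if (next && (uint32_t)f->offset + f->len > next->offset) {
        return -1;
    }

    /* Link f between prev and next, updating head and tail as needed. */
    f->next = next;
    if (prev) {
        prev->next = f;
    } else {
        e->head = f;
    }
    if (!next) {
        e->tail = f;
    }

    e->recvd_len += f->len;
    if (!more_fragments) {
        e->total_len = (uint32_t)f->offset + f->len;
    }
    return 0;
}

/* The datagram is complete once the last fragment has been seen and
 * the collected bytes add up to the total length. */
static int
frag_complete(const struct frag_entry *e)
{
    return e->total_len != 0 && e->recvd_len == e->total_len;
}
```

Feeding fragments at (offset 0, length 8), (offset 16, length 8, MF clear) and
(offset 8, length 8) in that order leaves the list sorted as 0, 8, 16, after
which frag_complete() returns 1. A fragment at offset 4 with length 8 is
rejected as overlapping.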