From patchwork Wed Oct 4 20:21:48 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Daniel Henrique Barboza
X-Patchwork-Id: 821471
To: linuxppc-dev@lists.ozlabs.org
From: Daniel Henrique Barboza
Subject: Possible LMB hot unplug bug in 4.13+ kernels
Date: Wed, 4 Oct 2017 17:21:48 -0300
Message-Id: <94618f16-b47f-b714-9cb5-4bbbf7fdccdf@linux.vnet.ibm.com>
Cc: Nathan Fontenot

Hi,

I've stumbled on an LMB hot unplug problem when running a guest with a 4.13+
kernel using QEMU 2.10. When trying to hot unplug a recently hotplugged LMB,
this is what I got, using an upstream kernel:

---------------
QEMU cmd line:

sudo ./qemu-system-ppc64 -name migrate_qemu -boot strict=on \
  -device nec-usb-xhci,id=usb,bus=pci.0,addr=0xf \
  -device spapr-vscsi,id=scsi0,reg=0x2000 \
  -smp 32,maxcpus=32,sockets=32,cores=1,threads=1 \
  --machine pseries,accel=kvm,kvm-type=HV,usb=off,dump-guest-core=off \
  -m 4G,slots=32,maxmem=32G \
  -drive file=/home/danielhb/vm_imgs/f26.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -nographic

Last login: Wed Oct  4 12:28:25 on hvc0
[danielhb@localhost ~]$ grep Mem /proc/meminfo
MemTotal:        4161728 kB
MemFree:         3204352 kB
MemAvailable:    3558336 kB
[danielhb@localhost ~]$ QEMU 2.10.50 monitor - type 'help' for more information
(qemu)
(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0
(qemu)

[danielhb@localhost ~]$ grep Mem /proc/meminfo
MemTotal:        5210304 kB
MemFree:         4254656 kB
MemAvailable:    4598144 kB
[danielhb@localhost ~]$
(qemu)
(qemu) device_del dimm0
(qemu) [  136.058727] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

(qemu)
[danielhb@localhost ~]$ grep Mem /proc/meminfo
MemTotal:        5210304 kB
MemFree:         4253696 kB
MemAvailable:    4597184 kB
[danielhb@localhost ~]$
-------------

The LMBs are about to be unplugged, then
something happens and they end up being hotplugged back. This isn't
reproducible with 4.11 guests. I can reliably reproduce it in 4.13+. I haven't
tried 4.12. Changing QEMU snapshots or even the hypervisor kernel/OS didn't
affect the result.

In an attempt to better understand the issue I made the following changes to
the upstream kernel:

        scns_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
@@ -442,8 +444,10 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb)

 #ifdef CONFIG_FA_DUMP
        /* Don't hot-remove memory that falls in fadump boot memory area */
-       if (is_fadump_boot_memory_area(phys_addr, block_sz))
+       if (is_fadump_boot_memory_area(phys_addr, block_sz)) {
+               pr_err("lmb belongs to fadump boot memory area\n");
                return false;
+       }
 #endif

        for (i = 0; i < scns_per_block; i++) {
@@ -454,7 +458,9 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb)
                rc &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
                phys_addr += MIN_MEMORY_BLOCK_SIZE;
        }
-
+       if (!rc) {
+               pr_err("is_mem_section_removable returned false \n");
+       }
        return rc ? true : false;
 }

@@ -465,12 +471,16 @@ static int dlpar_remove_lmb(struct of_drconf_cell *lmb)
        unsigned long block_sz;
        int nid, rc;

-       if (!lmb_is_removable(lmb))
+       if (!lmb_is_removable(lmb)) {
+               pr_err("dlpar_remove_lmb: lmb is not removable! \n");
                return -EINVAL;
+       }

        rc = dlpar_offline_lmb(lmb);
-       if (rc)
+       if (rc) {
+               pr_err("dlpar_remove_lmb: offline_lmb returned not zero \n");
                return rc;
+       }

        block_sz = pseries_memory_block_size();
        nid = memory_add_physaddr_to_nid(lmb->base_addr);

And this is the output:

---------
[danielhb@localhost ~]$ QEMU 2.10.50 monitor - type 'help' for more information
(qemu)
(qemu) object_add memory-backend-ram,id=ram0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=ram0
(qemu)

[danielhb@localhost ~]$ grep Mem /proc/meminfo
MemTotal:        5210304 kB
MemFree:         4254656 kB
MemAvailable:    4598144 kB
[danielhb@localhost ~]$
(qemu)
(qemu) device_del dimm0
(qemu) [  136.058473] pseries-hotplug-mem: is_mem_section_removable returned false
[  136.058607] pseries-hotplug-mem: dlpar_remove_lmb: lmb is not removable!
[  136.058727] pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs

(qemu)
[danielhb@localhost ~]$ grep Mem /proc/meminfo
MemTotal:        5210304 kB
MemFree:         4253696 kB
MemAvailable:    4597184 kB
[danielhb@localhost ~]$
---------------

It appears that the hot unplug is failing because lmb_is_removable(lmb) is
returning false inside dlpar_remove_lmb(), which triggers the hotplug of the
LMBs again:

        if (rc) {
                pr_err("Memory indexed-count-remove failed, adding any removed LMBs\n");

                for (i = start_index; i < end_index; i++) {
                        if (!lmbs[i].reserved)
                                continue;

                        rc = dlpar_add_lmb(&lmbs[i]);
                        if (rc)
                                pr_err("Failed to add LMB, drc index %x\n",
                                       be32_to_cpu(lmbs[i].drc_index));

                        lmbs[i].reserved = 0;
                }

I am not aware of anything that I can do from the QEMU side to fix this. Can
anyone take a look or provide guidance?
Am I missing something in my tests?


Thanks,

Daniel

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 1d48ab424bd9..37550833cdb0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -433,8 +433,10 @@ static bool lmb_is_removable(struct of_drconf_cell *lmb)
        unsigned long pfn, block_sz;
        u64 phys_addr;

-       if (!(lmb->flags & DRCONF_MEM_ASSIGNED))
+       if (!(lmb->flags & DRCONF_MEM_ASSIGNED)) {
+               pr_err("lmb is not assigned \n");
                return false;
+       }

        block_sz = memory_block_size_bytes();
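
For anyone following along outside the kernel tree, here is a minimal userspace
sketch of the aggregation done by the per-section loop in lmb_is_removable()
quoted above: each section's result is ANDed into rc, so a single non-removable
section is enough to make the whole LMB non-removable. The function names, the
section_ok[] array standing in for is_mem_section_removable(), and the
assumption that rc starts out non-zero are illustrative only and are not taken
from the kernel source.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model of the loop in lmb_is_removable(). section_ok[i] stands in
 * for is_mem_section_removable() on section i of one LMB. Both function names
 * here are hypothetical and exist only for this sketch.
 */
bool lmb_removable_model(const bool *section_ok, size_t scns_per_block)
{
	int rc = 1;	/* assumption: the kernel starts from a non-zero rc */
	size_t i;

	for (i = 0; i < scns_per_block; i++)
		rc &= section_ok[i] ? 1 : 0;	/* one false clears rc for good */

	return rc ? true : false;
}

/* Index of the first non-removable section, or -1 if every section passes. */
int first_blocking_section(const bool *section_ok, size_t scns_per_block)
{
	size_t i;

	for (i = 0; i < scns_per_block; i++)
		if (!section_ok[i])
			return (int)i;

	return -1;
}
```

A similar per-section message in the real loop would presumably show which
section blocks the removal, rather than only the aggregated result that the
debug patch above prints.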