Regular oops on shutdown of KVM/ARM64 machines with VGA device

Message ID 559270D3.8030305@arm.com
State New

Commit Message

Marc Zyngier June 30, 2015, 10:34 a.m. UTC
On 30/06/15 08:54, Dirk Müller wrote:
> Hi Marc,
> 
>> Also, care to provide some hints about your kernel configuration?
> 
> I believe the relevant parameters are:
> 
> CONFIG_PGTABLE_LEVELS=4
> # CONFIG_ARM64_64K_PAGES is not set
> # CONFIG_ARM64_VA_BITS_39 is not set
> CONFIG_ARM64_VA_BITS_48=y
> CONFIG_ARM64_VA_BITS=48
> CONFIG_KVM_MMIO=y
> CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT=y
> CONFIG_KVM_COMPAT=y
> CONFIG_VIRTUALIZATION=y
> CONFIG_KVM=y
> CONFIG_KVM_ARM_HOST=y
> CONFIG_KVM_ARM_MAX_VCPUS=4
> 
> 
> the full config is here: http://pastebin.com/raw.php?i=GKAaVLYE
> 
>> What is the VGA device you mention in $subject?
>> A QEMU command line so that we can try and reproduce the issue you're
>> seeing?
> 
> with qemu 2.3.0:
> 
> qemu-system-aarch64 --enable-kvm -M virt -cpu host -vnc :4 -bios
> /usr/share/qemu/qemu-uefi-aarch64.bin -m 1G -device VGA
> 
> Then connect over VNC so that the VGA device gets initialized, and
> simply Ctrl-C the qemu process: you'll get this crash every single
> time. If you want additional debug output, or want me to try
> something out, just let me know and I'll be happy to provide it.

Can you try the following patch?

It seems to fix the issue for me, though with a relatively different
configuration.

Thanks,

	M.

Comments

Dirk Müller June 30, 2015, 4:16 p.m. UTC | #1
Hi Marc,

> Can try the following patch?

[..]

Thanks a lot for the quick patch; from brief testing this seems to
fix the issue (on a 4k kernel). I'll retest this in our original
configuration (which was 64k) but so far I don't see a reason why it
shouldn't fix the issue.

Thanks again,
Greetings,
Dirk

Marc Zyngier June 30, 2015, 4:20 p.m. UTC | #2
On 30/06/15 17:16, Dirk Müller wrote:
> Hi Marc,
> 
>> Can try the following patch?
> 
> [..]
> 
> Thanks a lot for the quick patch, from a brief testing this seems to
> fix the issue (on a 4k kernel). I'll retest this in our original
> configuration (which was 64k) but so far I don't see a reason why it
> shouldn't fix the issue.

Awesome. Mind if I put your Tested-by on the patch?

Thanks,

	M.

Marc Zyngier July 1, 2015, 8:20 a.m. UTC | #3
[+Will, Catalin]

On 30/06/15 19:50, Christoffer Dall wrote:
> On Tue, Jun 30, 2015 at 05:20:11PM +0100, Marc Zyngier wrote:
>> On 30/06/15 17:16, Dirk Müller wrote:
>>> Hi Marc,
>>>
>>>> Can try the following patch?
>>>
>>> [..]
>>>
>>> Thanks a lot for the quick patch, from a brief testing this seems to
>>> fix the issue (on a 4k kernel). I'll retest this in our original
>>> configuration (which was 64k) but so far I don't see a reason why it
>>> shouldn't fix the issue.
>>
>> Awesome. Mind if I put your Tested-by on the patch?
>>
> Looks to me like the definition of pmd_huge() on arm64 is broken; pretty
> sure when I reviewed this original patch I followed the path of both
> pmd_huge() and pmd_trans_huge() and checked that they don't return true
> if the entry is clear.  This happens to be the case on both arm and x86,
> and I probably only looked at the arm code and not the arm64 code.
> 
> I'm fine with this patch, but I think we should also merge the
> following, since by definition, a clear pmd cannot also be a huge pmd:
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 2de9d2e..779520b 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -40,7 +40,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
>  
>  int pmd_huge(pmd_t pmd)
>  {
> -	return !(pmd_val(pmd) & PMD_TABLE_BIT);
> +	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>  }
>  
>  int pud_huge(pud_t pud)
> 

If the convention is for pmd_huge to check for pmd_none, then we don't
need my patch, and only this should be merged.

Catalin, Will: your thoughts?

        M.

Catalin Marinas July 1, 2015, 11:27 a.m. UTC | #4
On Wed, Jul 01, 2015 at 09:20:28AM +0100, Marc Zyngier wrote:
> [+Will, Catalin]
> 
> On 30/06/15 19:50, Christoffer Dall wrote:
> > On Tue, Jun 30, 2015 at 05:20:11PM +0100, Marc Zyngier wrote:
> >> On 30/06/15 17:16, Dirk Müller wrote:
> >>> Hi Marc,
> >>>
> >>>> Can try the following patch?
> >>>
> >>> [..]
> >>>
> >>> Thanks a lot for the quick patch, from a brief testing this seems to
> >>> fix the issue (on a 4k kernel). I'll retest this in our original
> >>> configuration (which was 64k) but so far I don't see a reason why it
> >>> shouldn't fix the issue.
> >>
> >> Awesome. Mind if I put your Tested-by on the patch?
> >>
> > Looks to me like the definition of pmd_huge() on arm64 is broken; pretty
> > sure when I reviewed this original patch I followed the path of both
> > pmd_huge() and pmd_trans_huge() and checked that they don't return true
> > if the entry is clear.  This happens to be the case on both arm and x86,
> > and I probably only looked at the arm code and not the arm64 code.
> > 
> > I'm fine with this patch, but I think we should also merge the
> > following, since by definition, a clear pmd cannot also be a huge pmd:
> > 
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 2de9d2e..779520b 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -40,7 +40,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
> >  
> >  int pmd_huge(pmd_t pmd)
> >  {
> > -	return !(pmd_val(pmd) & PMD_TABLE_BIT);
> > +	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
> >  }
> >  
> >  int pud_huge(pud_t pud)
> 
> If the convention is for pmd_huge to check for pmd_none, then we don't
> need my patch, and only this should be merged.

Adding Steve on cc. I can see that the mm code checks for pmd_none()
before calling pmd_huge() but I'm not sure it does this all the time
(same goes for pud_huge).

Steve, do you have any more insight here?

Steve Capper July 1, 2015, 11:44 a.m. UTC | #5
On 1 July 2015 at 12:27, Catalin Marinas <catalin.marinas@arm.com> wrote:
> On Wed, Jul 01, 2015 at 09:20:28AM +0100, Marc Zyngier wrote:
>> [+Will, Catalin]
>>
>> On 30/06/15 19:50, Christoffer Dall wrote:
>> > On Tue, Jun 30, 2015 at 05:20:11PM +0100, Marc Zyngier wrote:
>> >> On 30/06/15 17:16, Dirk Müller wrote:
>> >>> Hi Marc,
>> >>>
>> >>>> Can try the following patch?
>> >>>
>> >>> [..]
>> >>>
>> >>> Thanks a lot for the quick patch, from a brief testing this seems to
>> >>> fix the issue (on a 4k kernel). I'll retest this in our original
>> >>> configuration (which was 64k) but so far I don't see a reason why it
>> >>> shouldn't fix the issue.
>> >>
>> >> Awesome. Mind if I put your Tested-by on the patch?
>> >>
>> > Looks to me like the definition of pmd_huge() on arm64 is broken; pretty
>> > sure when I reviewed this original patch I followed the path of both
>> > pmd_huge() and pmd_trans_huge() and checked that they don't return true
>> > if the entry is clear.  This happens to be the case on both arm and x86,
>> > and I probably only looked at the arm code and not the arm64 code.
>> >
>> > I'm fine with this patch, but I think we should also merge the
>> > following, since by definition, a clear pmd cannot also be a huge pmd:
>> >
>> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>> > index 2de9d2e..779520b 100644
>> > --- a/arch/arm64/mm/hugetlbpage.c
>> > +++ b/arch/arm64/mm/hugetlbpage.c
>> > @@ -40,7 +40,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
>> >
>> >  int pmd_huge(pmd_t pmd)
>> >  {
>> > -   return !(pmd_val(pmd) & PMD_TABLE_BIT);
>> > +   return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>> >  }
>> >
>> >  int pud_huge(pud_t pud)
>>
>> If the convention is for pmd_huge to check for pmd_none, then we don't
>> need my patch, and only this should be merged.
>
> Adding Steve on cc. I can see that the mm code checks for pmd_none()
> before calling pmd_huge() but I'm not sure it does this all the time
> (same goes for pud_huge).
>
> Steve, do you have any more insight here?
>

I thought pmd_none was always called before pmd_huge, but this was an
oversight on my part as clear pud's and pmd's cannot also be huge.
I think Christoffer's patch should be applied (with the equivalent for
pud_huge too) in case the logic ever changes.

Cheers,
--
Steve
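
For readers following along, here is a minimal user-space sketch of why a
clear entry was reported as huge, and what the proposed check changes. It
is a model only, not the kernel code: the pmd_t/pud_t types, the value of
the table bit, and the names pmd_huge_old(), pmd_huge_fixed() and
pud_huge_fixed() are assumptions made for illustration. The pud variant is
only a guess at what "the equivalent for pud_huge" could look like.

```c
/* User-space model of the pmd_huge() discussion; NOT the kernel's code. */
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel's page table entry types. */
typedef struct { uint64_t pmd; } pmd_t;
typedef struct { uint64_t pud; } pud_t;

#define pmd_val(x) ((x).pmd)
#define pud_val(x) ((x).pud)

/* Assumed encoding for this model: bit 0 = valid, bit 1 = table. */
#define PMD_TABLE_BIT (UINT64_C(1) << 1)
#define PUD_TABLE_BIT (UINT64_C(1) << 1)

/* Definition quoted as the "-" line: only the table bit is tested. */
int pmd_huge_old(pmd_t pmd)
{
	return !(pmd_val(pmd) & PMD_TABLE_BIT);
}

/* Definition quoted as the "+" line: a clear entry is never huge. */
int pmd_huge_fixed(pmd_t pmd)
{
	return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
}

/* Hypothetical pud counterpart, mirroring the pmd check above. */
int pud_huge_fixed(pud_t pud)
{
	return pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
}

/* A clear entry has no table bit, so the old test calls it huge. */
void model_self_check(void)
{
	pmd_t clear = { 0 };

	assert(pmd_huge_old(clear));
	assert(!pmd_huge_fixed(clear));
}
```

In this model an all-zero entry passes the old test simply because its
table bit is not set, which is the case the extra pmd_val() check rules
out. Valid block and table entries give the same answer either way.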

Christoffer Dall July 1, 2015, 12:05 p.m. UTC | #6
On Wed, Jul 1, 2015 at 1:44 PM, Steve Capper <steve.capper@linaro.org> wrote:
> On 1 July 2015 at 12:27, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> On Wed, Jul 01, 2015 at 09:20:28AM +0100, Marc Zyngier wrote:
>>> [+Will, Catalin]
>>>
>>> On 30/06/15 19:50, Christoffer Dall wrote:
>>> > On Tue, Jun 30, 2015 at 05:20:11PM +0100, Marc Zyngier wrote:
>>> >> On 30/06/15 17:16, Dirk Müller wrote:
>>> >>> Hi Marc,
>>> >>>
>>> >>>> Can try the following patch?
>>> >>>
>>> >>> [..]
>>> >>>
>>> >>> Thanks a lot for the quick patch, from a brief testing this seems to
>>> >>> fix the issue (on a 4k kernel). I'll retest this in our original
>>> >>> configuration (which was 64k) but so far I don't see a reason why it
>>> >>> shouldn't fix the issue.
>>> >>
>>> >> Awesome. Mind if I put your Tested-by on the patch?
>>> >>
>>> > Looks to me like the definition of pmd_huge() on arm64 is broken; pretty
>>> > sure when I reviewed this original patch I followed the path of both
>>> > pmd_huge() and pmd_trans_huge() and checked that they don't return true
>>> > if the entry is clear.  This happens to be the case on both arm and x86,
>>> > and I probably only looked at the arm code and not the arm64 code.
>>> >
>>> > I'm fine with this patch, but I think we should also merge the
>>> > following, since by definition, a clear pmd cannot also be a huge pmd:
>>> >
>>> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> > index 2de9d2e..779520b 100644
>>> > --- a/arch/arm64/mm/hugetlbpage.c
>>> > +++ b/arch/arm64/mm/hugetlbpage.c
>>> > @@ -40,7 +40,7 @@ int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr, pte_t *ptep)
>>> >
>>> >  int pmd_huge(pmd_t pmd)
>>> >  {
>>> > -   return !(pmd_val(pmd) & PMD_TABLE_BIT);
>>> > +   return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>>> >  }
>>> >
>>> >  int pud_huge(pud_t pud)
>>>
>>> If the convention is for pmd_huge to check for pmd_none, then we don't
>>> need my patch, and only this should be merged.
>>
>> Adding Steve on cc. I can see that the mm code checks for pmd_none()
>> before calling pmd_huge() but I'm not sure it does this all the time
>> (same goes for pud_huge).
>>
>> Steve, do you have any more insight here?
>>
>
> I thought pmd_none was always called before pmd_huge, but this was an
> oversight on my part as clear pud's and pmd's cannot also be huge.
> I think Christoffer's patch should be applied (with the equivalent for
> pud_huge too) in case the logic ever changes.
>
OK, I'll send out a patch.

-Christoffer

Patch

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7b42012..d902a53 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -109,7 +109,7 @@  static void kvm_flush_dcache_pud(pud_t pud)
  */
 static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd)
 {
-	if (!kvm_pmd_huge(*pmd))
+	if (pmd_none(*pmd) || !kvm_pmd_huge(*pmd))
 		return;
 
 	pmd_clear(pmd);
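
The effect of the added pmd_none() guard can be sketched in user space as
follows. This is a model under stated assumptions, not the KVM code: the
pmd_t type, the table bit value, and the helper names
kvm_pmd_huge_unfixed(), dissolve_pmd_unguarded() and
dissolve_pmd_guarded() are hypothetical, and whatever follows pmd_clear()
in the real function is not shown in the diff context, so the model only
reports whether the body ran.

```c
/* User-space model of the stage2_dissolve_pmd() guard; NOT kernel code. */
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t pmd; } pmd_t;

#define pmd_val(x) ((x).pmd)

/* Assumed encoding for this model: bit 0 = valid, bit 1 = table. */
#define PMD_TABLE_BIT (UINT64_C(1) << 1)

/* A clear (all-zero) entry maps nothing. */
int pmd_none(pmd_t pmd)
{
	return !pmd_val(pmd);
}

/* Stand-in for kvm_pmd_huge() with the unpatched table-bit-only test. */
int kvm_pmd_huge_unfixed(pmd_t pmd)
{
	return !(pmd_val(pmd) & PMD_TABLE_BIT);
}

/* Before the patch: returns 1 if the dissolve body ran, 0 if skipped. */
int dissolve_pmd_unguarded(pmd_t *pmd)
{
	if (!kvm_pmd_huge_unfixed(*pmd))
		return 0;

	pmd->pmd = 0;	/* stands in for pmd_clear() */
	return 1;
}

/* After the patch: a clear entry is rejected before the huge test. */
int dissolve_pmd_guarded(pmd_t *pmd)
{
	if (pmd_none(*pmd) || !kvm_pmd_huge_unfixed(*pmd))
		return 0;

	pmd->pmd = 0;	/* stands in for pmd_clear() */
	return 1;
}

/* The two variants differ only for a clear entry. */
void dissolve_self_check(void)
{
	pmd_t clear = { 0 };

	assert(dissolve_pmd_unguarded(&clear) == 1);
	assert(dissolve_pmd_guarded(&clear) == 0);
}
```

In the model, the guarded and unguarded versions agree for block and
table entries and differ only for a clear entry. That is also why fixing
pmd_huge() itself would make the caller-side guard redundant, as noted in
the thread.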