[RFC] Implement GIC-500 from GICv3 family for arm64

Message ID 1425912119-15681-1-git-send-email-shlomo.pongratz@toganetworks.com
State New

Commit Message

shlomo.pongratz@toganetworks.com March 9, 2015, 2:41 p.m. UTC
From: Shlomo Pongratz <shlomo.pongratz@huawei.com>

This patch is a first step toward 128-core support for arm64.

Initially only 64 cores are supported, for two reasons:
First, the largest integer type is 64 bits wide, and modifying the
essential data structures to support 128 cores will require the
use of bitops.
Second, Linux can currently be configured to support at most
64 cores, so there is no urgency for 128-core support.
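
To illustrate the bitops point: a single 64-bit integer can hold one bit
per core only up to 64 cores, so a wider mask has to be an array of words
indexed by bit number. The following is a minimal stand-alone sketch, not
code from this patch. SketchCpuMask, sketch_mask_set() and
sketch_mask_test() are hypothetical names; in-tree code would more likely
use QEMU's existing helpers from qemu/bitops.h and qemu/bitmap.h.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical limit; the patch description targets 128 cores. */
#define SKETCH_MAX_CPUS 128
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_MASK_WORDS \
    ((SKETCH_MAX_CPUS + SKETCH_WORD_BITS - 1) / SKETCH_WORD_BITS)

/* A CPU mask spanning as many words as needed, instead of one uint64_t. */
typedef struct {
    unsigned long bits[SKETCH_MASK_WORDS];
} SketchCpuMask;

/* Mark one CPU as present in the mask. */
static inline void sketch_mask_set(SketchCpuMask *m, unsigned cpu)
{
    assert(cpu < SKETCH_MAX_CPUS);
    m->bits[cpu / SKETCH_WORD_BITS] |= 1UL << (cpu % SKETCH_WORD_BITS);
}

/* Report whether one CPU is present in the mask. */
static inline bool sketch_mask_test(const SketchCpuMask *m, unsigned cpu)
{
    assert(cpu < SKETCH_MAX_CPUS);
    return (m->bits[cpu / SKETCH_WORD_BITS] >> (cpu % SKETCH_WORD_BITS)) & 1UL;
}
```

With this shape, raising the core limit only changes SKETCH_MAX_CPUS; the
callers never touch the word layout directly.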

Things left to do:

Currently, booting Linux may get stuck. The probability of getting stuck
increases with the number of cores. I'd appreciate code review.

Cluster sizes need to be flexible. The GIC-500 can support
up to 128 cores, in up to 32 clusters of up to 8 cores each.
So, for example, if one wishes to have 16 cores, the options include
2 clusters of 8 cores each or 4 clusters of 4 cores each.
Currently only the first option is supported.
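
As an illustration only, the limits quoted above and the mapping from a
linear core index to a (cluster, core) pair could be expressed as follows.
All names are hypothetical, and the limits are taken from this description
rather than checked against the GIC-500 documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Limits as stated in the patch description for the GIC-500. */
#define SKETCH_MAX_CORES             128
#define SKETCH_MAX_CLUSTERS          32
#define SKETCH_MAX_CORES_PER_CLUSTER 8

typedef struct {
    unsigned cluster;   /* index of the cluster */
    unsigned core;      /* index of the core inside that cluster */
} SketchCpuLocation;

/* Check a (total cores, cluster size) pair against the stated limits. */
static inline bool sketch_topology_valid(unsigned cores, unsigned per_cluster)
{
    if (cores == 0 || cores > SKETCH_MAX_CORES) {
        return false;
    }
    if (per_cluster == 0 || per_cluster > SKETCH_MAX_CORES_PER_CLUSTER) {
        return false;
    }
    /* Round up so a partly filled last cluster still counts. */
    return (cores + per_cluster - 1) / per_cluster <= SKETCH_MAX_CLUSTERS;
}

/* Map a linear CPU index to its cluster and its position in the cluster. */
static inline SketchCpuLocation sketch_locate(unsigned cpu, unsigned per_cluster)
{
    assert(per_cluster > 0 && per_cluster <= SKETCH_MAX_CORES_PER_CLUSTER);
    SketchCpuLocation loc = { cpu / per_cluster, cpu % per_cluster };
    return loc;
}
```

Making per_cluster a parameter is what would allow both 2x8 and 4x4 for
16 cores; fixing it at 8 corresponds to the only option supported today.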

There is an issue with passing the clock affinity via the dtb. In the dtb
interrupt section only 24 bits are left for affinity, since the
variable is a 32-bit entity and 8 bits are reserved for flags.
See Documentation/devicetree/bindings/arm/arch_timer.txt.
Note that this issue does not seem to be critical: when checking
/proc/irq/3/smp_affinity with 32 cores, all 32 bits are set.
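
The constraint can be sketched as below, assuming the layout described
here (low 8 bits for flags, remaining 24 bits for one affinity bit per
core). That layout is an assumption made for illustration; the device tree
binding documents are authoritative for the real cell format, and
sketch_pack_irq_cell() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits [7:0] flags, bits [31:8] one affinity bit per core. */
#define SKETCH_FLAG_BITS     8u
#define SKETCH_AFFINITY_BITS 24u

/* Build the 32-bit cell; cores beyond the 24th cannot be represented. */
static inline uint32_t sketch_pack_irq_cell(uint8_t flags, unsigned num_cpus)
{
    assert(num_cpus >= 1);
    unsigned n = num_cpus > SKETCH_AFFINITY_BITS ? SKETCH_AFFINITY_BITS
                                                 : num_cpus;
    uint32_t mask = (UINT32_C(1) << n) - 1;   /* n <= 24, so no overflow */
    return (mask << SKETCH_FLAG_BITS) | flags;
}
```

Under this assumption, 32 cores and 24 cores produce the same cell value,
which is the loss of information the paragraph above refers to.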

The last issue is adding support for 128 cores. This requires the use
of bitops; currently the code can be tested with up to 64 cores.

Signed-off-by: Shlomo Pongratz <shlomo.pongratz@toganetworks.com>
---
 hw/arm/Makefile.objs               |    2 +-
 hw/arm/virtv2.c                    |  774 +++++++++++++++++
 hw/intc/Makefile.objs              |    2 +
 hw/intc/arm_gic_common.c           |    2 +
 hw/intc/arm_gicv3.c                | 1596 ++++++++++++++++++++++++++++++++++++
 hw/intc/arm_gicv3_common.c         |  188 +++++
 hw/intc/gicv3_internal.h           |  153 ++++
 include/hw/intc/arm_gicv3.h        |   44 +
 include/hw/intc/arm_gicv3_common.h |  136 +++
 target-arm/cpu.c                   |    1 +
 target-arm/cpu.h                   |    6 +
 target-arm/cpu64.c                 |   92 +++
 target-arm/helper.c                |   12 +-
 target-arm/psci.c                  |   18 +-
 target-arm/translate-a64.c         |   14 +
 15 files changed, 3034 insertions(+), 6 deletions(-)
 create mode 100644 hw/arm/virtv2.c
 create mode 100644 hw/intc/arm_gicv3.c
 create mode 100644 hw/intc/arm_gicv3_common.c
 create mode 100644 hw/intc/gicv3_internal.h
 create mode 100644 include/hw/intc/arm_gicv3.h
 create mode 100644 include/hw/intc/arm_gicv3_common.h

--
1.9.1

-------------------------------------------------------------------------------------------------------------------------------------------------
This email and any files transmitted and/or attachments with it are confidential and proprietary information of
Toga Networks Ltd., and intended solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system manager. This message contains confidential
information of Toga Networks Ltd., and is intended only for the individual named. If you are not the named
addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately
by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. If you are not
the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
------------------------------------------------------------------------------------------------------------------------------------------------

Comments

Shannon Zhao March 10, 2015, 1:18 a.m. UTC | #1
On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
> [...]
>  hw/arm/Makefile.objs               |    2 +-
>  hw/arm/virtv2.c                    |  774 +++++++++++++++++

Hi,

I think the goal of this patch is to introduce GICv3. Is it really necessary to
add a new virtv2 machine for that? Most of its code is the same as virt.

Maybe we can add a parameter such as -GICv3 to the virt machine to select GICv3,
and default to GICv2 without it. Then we can reuse more code.
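
A rough sketch of what such a parameter could look like, assuming a
hypothetical gic_version field in the machine configuration (none of these
names come from QEMU): the version-dependent choices stay in small helpers
and the rest of the board code is shared. The limit of 8 CPUs for GICv2 is
architectural; the 64 for GICv3 is the limit this patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical machine option selecting the interrupt controller. */
typedef enum {
    SKETCH_GIC_V2 = 2,   /* default when the parameter is absent */
    SKETCH_GIC_V3 = 3
} SketchGicVersion;

typedef struct {
    SketchGicVersion gic_version;
    unsigned smp_cpus;
} SketchVirtConfig;

/* GICv2 handles at most 8 CPUs; 64 is the limit stated for this patch. */
static inline unsigned sketch_max_cpus(SketchGicVersion v)
{
    return v == SKETCH_GIC_V3 ? 64 : 8;
}

/* One version check here instead of a second copy of the board file. */
static inline bool sketch_config_valid(const SketchVirtConfig *cfg)
{
    assert(cfg);
    return cfg->smp_cpus >= 1 &&
           cfg->smp_cpus <= sketch_max_cpus(cfg->gic_version);
}
```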
Pei XiaoYong March 10, 2015, 7:06 a.m. UTC | #2
On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
> [...]
As far as we know, a GICv3 implementation has many components,
such as the distributor, redistributor, ITS, and LPI. Their offsets are not
defined in the GICv3 specification; I think we should implement these
components independently, unlike the GICv1/v2 implementation in QEMU.
shlomo.pongratz@toganetworks.com March 10, 2015, 9:30 a.m. UTC | #3
On 09 March, 2015 05:13 PM, Peter Maydell wrote:
> On 9 March 2015 at 23:41,  <shlomo.pongratz@toganetworks.com> wrote:
>> [...]
> This is way too big to review as a single patch; you should
> find a way to split it into a series of multiple coherent patches.
>
> thanks
> -- PMM
Hi Peter,

Thanks, I'll do that.

Best regards,

S.P.
shlomo.pongratz@toganetworks.com March 10, 2015, 9:34 a.m. UTC | #4
On 10 March, 2015 03:18 AM, Shannon Zhao wrote:
> On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
>> [...]
>>   hw/arm/Makefile.objs               |    2 +-
>>   hw/arm/virtv2.c                    |  774 +++++++++++++++++
> Hi,
>
> I think the goal of this patch is to introduce GICv3. Is it really necessary to
> add a new virtv2 machine for that? Most of its code is the same as virt.
>
> Maybe we can add a parameter such as -GICv3 to the virt machine to select GICv3,
> and default to GICv2 without it. Then we can reuse more code.
>
Hi Shannon,

Using a parameter and configuring the virtual machine makes the code
unreadable.
There are too many if...else statements.

Best regards.
shlomo.pongratz@toganetworks.com March 10, 2015, 9:47 a.m. UTC | #5
On 10 March, 2015 09:06 AM, Pei XiaoYong wrote:
> On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
>> [...]
> As far as we know, a GICv3 implementation has many components,
> such as the distributor, redistributor, ITS, and LPI. Their offsets are not
> defined in the GICv3 specification; I think we should implement these
> components independently, unlike the GICv1/v2 implementation in QEMU.
>
Hi Peixiaoyong,

My immediate goal is to run more than 8 cores, so ITS and
LPI are currently not supported.
I've used the offset rules from the GIC-500, which is an implementation of
GICv3.
If and when ITS and LPI are implemented, I think a new virt
machine will need to be created,
as that would be like a new HW BSP with a different architecture.

Best regards,

S.P.
Shannon Zhao March 10, 2015, 9:50 a.m. UTC | #6
On 2015/3/10 17:34, Shlomo Pongratz wrote:
> 
> On 10 March, 2015 03:18 AM, Shannon Zhao wrote:
>> On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
>>> [...]
>>>   hw/arm/Makefile.objs               |    2 +-
>>>   hw/arm/virtv2.c                    |  774 +++++++++++++++++
>> Hi,
>>
>> I think the goal of this patch is to introduce GICv3. Is it really necessary to
>> add a new virtv2 machine for that? Most of its code is the same as virt.
>>
>> Maybe we can add a parameter such as -GICv3 to the virt machine to select GICv3,
>> and default to GICv2 without it. Then we can reuse more code.
>>
> Hi Shannon,
>
> Using a parameter and configuring the virtual machine makes the code
> unreadable.
> There are too many if...else statements.

Sorry, I don't think so. We have implemented GICv3 in QEMU using the parameter approach,
and only about 10 if...else statements are needed. The duplicated code is huge
compared with those statements.
Claudio Fontana March 10, 2015, 9:59 a.m. UTC | #7
On 10.03.2015 10:50, Shannon Zhao wrote:
> On 2015/3/10 17:34, Shlomo Pongratz wrote:
>>
>> On 10 March, 2015 03:18 AM, Shannon Zhao wrote:
>>> On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
>>>> [...]
>>>>   hw/arm/Makefile.objs               |    2 +-
>>>>   hw/arm/virtv2.c                    |  774 +++++++++++++++++
>>> Hi,
>>>
>>> I think the goal of this patch is to introduce GICv3. Is it really necessary to
>>> add a new virtv2 machine for that? Most of its code is the same as virt.
>>>
>>> Maybe we can add a parameter such as -GICv3 to the virt machine to select GICv3,
>>> and default to GICv2 without it. Then we can reuse more code.
>>>
>> Hi Shannon,
>>
>> Using a parameter and configuring the virtual machine makes the code
>> unreadable.
>> There are too many if...else statements.
>
> Sorry, I don't think so. We have implemented GICv3 in QEMU using the parameter approach,
> and only about 10 if...else statements are needed. The duplicated code is huge
> compared with those statements.
> 

As Shannon mentions, it should be possible to at least try to do it without that amount of code duplication.
Should every minor change to the virt platform have to be made in two files?
I don't think so. Let's try to put it in virt and see what the result looks like.

Btw, as Peter mentioned, this patch needs to be split.

Ciao,

Claudio
michael March 10, 2015, 3:01 p.m. UTC | #8
On 2015-03-10 17:47, Shlomo Pongratz wrote:
>
> On 10 March, 2015 09:06 AM, Pei XiaoYong wrote:
>> On 2015/3/9 22:41, shlomo.pongratz@toganetworks.com wrote:
>>> [...]
>> As far as we know, a GICv3 implementation has many components,
>> such as the distributor, redistributor, ITS, and LPI. Their offsets are not
>> defined in the GICv3 specification; I think we should implement these
>> components independently, unlike the GICv1/v2 implementation in QEMU.
>>
> Hi Peixiaoyong,
>
> My immediate goal is to run more than 8 cores, so ITS and
> LPI are currently not supported.
> I've used the offset rules from the GIC-500, which is an implementation of
> GICv3.
> If and when ITS and LPI are implemented, I think a new virt
> machine will need to be created,
> as that would be like a new HW BSP with a different architecture.
>
> Best regards,


Hi,
I think we should focus on the scalability of the code. On the other hand,
we need to remove the redundant code.

Patch

diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs
index 6088e53..b01801b 100644
--- a/hw/arm/Makefile.objs
+++ b/hw/arm/Makefile.objs
@@ -2,7 +2,7 @@  obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o
 obj-$(CONFIG_DIGIC) += digic_boards.o
 obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o
 obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o
-obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o
+obj-y += tosa.o versatilepb.o vexpress.o virt.o virtv2.o xilinx_zynq.o z2.o

 obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o
 obj-$(CONFIG_DIGIC) += digic.o
diff --git a/hw/arm/virtv2.c b/hw/arm/virtv2.c
new file mode 100644
index 0000000..69653ca
--- /dev/null
+++ b/hw/arm/virtv2.c
@@ -0,0 +1,774 @@ 
+/*
+ * ARM mach-virt emulation
+ *
+ * Copyright (c) 2013 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Emulate a virtual board which works by passing Linux all the information
+ * it needs about what devices are present via the device tree.
+ * There are some restrictions about what we can do here:
+ *  + we can only present devices whose Linux drivers will work based
+ *    purely on the device tree with no platform data at all
+ *  + we want to present a very stripped-down minimalist platform,
+ *    both because this reduces the security attack surface from the guest
+ *    and also because it reduces our exposure to being broken when
+ *    the kernel updates its device tree bindings and requires further
+ *    information in a device binding that we aren't providing.
+ * This is essentially the same approach kvmtool uses.
+ */
+
+#include "hw/sysbus.h"
+#include "hw/arm/arm.h"
+#include "hw/arm/primecell.h"
+#include "hw/devices.h"
+#include "net/net.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/device_tree.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/kvm.h"
+#include "hw/boards.h"
+#include "hw/loader.h"
+#include "exec/address-spaces.h"
+#include "qemu/bitops.h"
+#include "qemu/error-report.h"
+
+#undef DEBUG_VIRT2
+
+#ifdef DEBUG_VIRT2
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, "virt2: " fmt , ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while(0)
+#endif
+
+
+#define NUM_VIRTIO_TRANSPORTS 32
+
+/* Number of external interrupt lines to configure the GIC with */
+#define NUM_IRQS 128
+
+#define GIC_FDT_IRQ_TYPE_SPI 0
+#define GIC_FDT_IRQ_TYPE_PPI 1
+
+#define GIC_FDT_IRQ_FLAGS_EDGE_LO_HI 1
+#define GIC_FDT_IRQ_FLAGS_EDGE_HI_LO 2
+#define GIC_FDT_IRQ_FLAGS_LEVEL_HI 4
+#define GIC_FDT_IRQ_FLAGS_LEVEL_LO 8
+
+#define GIC_FDT_IRQ_PPI_CPU_START 8
+#define GIC_FDT_IRQ_PPI_CPU_WIDTH 24
+
+enum {
+    VIRT_FLASH,
+    VIRT_MEM,
+    VIRT_CPUPERIPHS,
+    VIRT_GIC_DIST,
+    VIRT_GIC_DIST_SPI,
+    VIRT_ITS_CONTROL,
+    VIRT_ITS_TRANSLATION,
+    VIRT_LPI,
+    VIRT_UART,
+    VIRT_MMIO,
+    VIRT_RTC,
+    VIRT_FW_CFG,
+    VIRT_GIC_CPU,
+};
+
+typedef struct MemMapEntry {
+    hwaddr base;
+    hwaddr size;
+} MemMapEntry;
+
+typedef struct VirtBoardInfo {
+    struct arm_boot_info bootinfo;
+    const char *cpu_model;
+    const MemMapEntry *memmap;
+    const int *irqmap;
+    int smp_cpus;
+    void *fdt;
+    int fdt_size;
+    uint32_t clock_phandle;
+} VirtBoardInfo;
+
+typedef struct {
+    MachineClass parent;
+    VirtBoardInfo *daughterboard;
+} VirtMachineClass;
+
+typedef struct {
+    MachineState parent;
+    bool secure;
+} VirtMachineState;
+
+#define TYPE_VIRT_MACHINE   "virt2"
+#define VIRT_MACHINE(obj) \
+    OBJECT_CHECK(VirtMachineState, (obj), TYPE_VIRT_MACHINE)
+#define VIRT_MACHINE_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(VirtMachineClass, obj, TYPE_VIRT_MACHINE)
+#define VIRT_MACHINE_CLASS(klass) \
+    OBJECT_CLASS_CHECK(VirtMachineClass, klass, TYPE_VIRT_MACHINE)
+
+/* Addresses and sizes of our components.
+ * 0..128MB is space for a flash device so we can run bootrom code such as UEFI.
+ * 128MB..256MB is used for miscellaneous device I/O.
+ * 256MB..1GB is reserved for possible future PCI support (ie where the
+ * PCI memory window will go if we add a PCI host controller).
+ * 1GB and up is RAM (which may happily spill over into the
+ * high memory region beyond 4GB).
+ * This represents a compromise between how much RAM can be given to
+ * a 32 bit VM and leaving space for expansion and in particular for PCI.
+ * Note that devices should generally be placed at multiples of 0x10000,
+ * to accommodate guests using 64K pages.
+ */
+static const MemMapEntry a57memmap[] = {
+    /* Space up to 0x8000000 is reserved for a boot ROM */
+    [VIRT_FLASH] =           {          0, 0x08000000 },
+    [VIRT_CPUPERIPHS] =      { 0x08000000, 0x00840000 },
+    [VIRT_GIC_DIST] =        { 0x08000000, 0x00010000 },
+    [VIRT_GIC_DIST_SPI]=     { 0x08010000, 0x00010000 },
+    [VIRT_ITS_CONTROL] =     { 0x08020000, 0x00010000 },
+    [VIRT_ITS_TRANSLATION] = { 0x08030000, 0x00010000 },
+    [VIRT_LPI] =             { 0x08040000, 0x00800000 },
+    [VIRT_UART] =            { 0x09000000, 0x00001000 },
+    [VIRT_RTC] =             { 0x09010000, 0x00001000 },
+    [VIRT_FW_CFG] =          { 0x09020000, 0x0000000a },
+    [VIRT_MMIO] =            { 0x0a000000, 0x00000200 },
+    /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
+    /* 0x10000000 .. 0x40000000 reserved for PCI */
+    [VIRT_MEM] = { 0x40000000, 30ULL * 1024 * 1024 * 1024 },
+    /* Need to support both memory-mapped and system registers according to
+     * section D7.6.22 4 SRE_EL1 of the ARM arch ref manual */
+    [VIRT_GIC_CPU] = { 0x40000000, 30ULL * 1024 * 1024 * 1024 },
+};
+
+static const int a57irqmap[] = {
+    [VIRT_UART] = 1,
+    [VIRT_RTC] = 2,
+    [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
+};
+
+static VirtBoardInfo machines[] = {
+    {
+        .cpu_model = "cortex-a53",
+        .memmap = a57memmap,
+        .irqmap = a57irqmap,
+    },
+    {
+        .cpu_model = "cortex-a57",
+        .memmap = a57memmap,
+        .irqmap = a57irqmap,
+    },
+    {
+        .cpu_model = "host",
+        .memmap = a57memmap,
+        .irqmap = a57irqmap,
+    },
+};
+
+static VirtBoardInfo *find_machine_info(const char *cpu)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(machines); i++) {
+        if (strcmp(cpu, machines[i].cpu_model) == 0) {
+            return &machines[i];
+        }
+    }
+    return NULL;
+}
+
+static void create_fdt(VirtBoardInfo *vbi)
+{
+    void *fdt = create_device_tree(&vbi->fdt_size);
+
+    if (!fdt) {
+        error_report("create_device_tree() failed");
+        exit(1);
+    }
+
+    vbi->fdt = fdt;
+
+    /* Header */
+    qemu_fdt_setprop_string(fdt, "/", "compatible", "linux,dummy-virt");
+    qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x2);
+    qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x2);
+
+    /*
+     * /chosen and /memory nodes must exist for load_dtb
+     * to fill in necessary properties later
+     */
+    qemu_fdt_add_subnode(fdt, "/chosen");
+    qemu_fdt_add_subnode(fdt, "/memory");
+    qemu_fdt_setprop_string(fdt, "/memory", "device_type", "memory");
+
+    /* Clock node, for the benefit of the UART. The kernel device tree
+     * binding documentation claims the PL011 node clock properties are
+     * optional but in practice if you omit them the kernel refuses to
+     * probe for the device.
+     */
+    vbi->clock_phandle = qemu_fdt_alloc_phandle(fdt);
+    qemu_fdt_add_subnode(fdt, "/apb-pclk");
+    qemu_fdt_setprop_string(fdt, "/apb-pclk", "compatible", "fixed-clock");
+    qemu_fdt_setprop_cell(fdt, "/apb-pclk", "#clock-cells", 0x0);
+    qemu_fdt_setprop_cell(fdt, "/apb-pclk", "clock-frequency", 24000000);
+    qemu_fdt_setprop_string(fdt, "/apb-pclk", "clock-output-names",
+                                "clk24mhz");
+    qemu_fdt_setprop_cell(fdt, "/apb-pclk", "phandle", vbi->clock_phandle);
+
+}
+
+static void fdt_add_psci_node(const VirtBoardInfo *vbi)
+{
+    uint32_t cpu_suspend_fn;
+    uint32_t cpu_off_fn;
+    uint32_t cpu_on_fn;
+    uint32_t migrate_fn;
+    void *fdt = vbi->fdt;
+    ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(0));
+
+    qemu_fdt_add_subnode(fdt, "/psci");
+    if (armcpu->psci_version == 2) {
+       const char comp[] = "arm,psci-0.2\0arm,psci";
+       qemu_fdt_setprop(fdt, "/psci", "compatible", comp, sizeof(comp));
+       cpu_off_fn = QEMU_PSCI_0_2_FN_CPU_OFF;
+       if (arm_feature(&armcpu->env, ARM_FEATURE_AARCH64)) {
+            cpu_suspend_fn = QEMU_PSCI_0_2_FN64_CPU_SUSPEND;
+            cpu_on_fn = QEMU_PSCI_0_2_FN64_CPU_ON;
+            migrate_fn = QEMU_PSCI_0_2_FN64_MIGRATE;
+       } else {
+            cpu_suspend_fn = QEMU_PSCI_0_2_FN_CPU_SUSPEND;
+            cpu_on_fn = QEMU_PSCI_0_2_FN_CPU_ON;
+            migrate_fn = QEMU_PSCI_0_2_FN_MIGRATE;
+       }
+    } else {
+        qemu_fdt_setprop_string(fdt, "/psci", "compatible", "arm,psci");
+        cpu_suspend_fn = QEMU_PSCI_0_1_FN_CPU_SUSPEND;
+        cpu_off_fn = QEMU_PSCI_0_1_FN_CPU_OFF;
+        cpu_on_fn = QEMU_PSCI_0_1_FN_CPU_ON;
+        migrate_fn = QEMU_PSCI_0_1_FN_MIGRATE;
+    }
+
+    /* We adopt the PSCI spec's nomenclature, and use 'conduit' to refer
+     * to the instruction that should be used to invoke PSCI functions.
+     * However, the device tree binding uses 'method' instead, so that is
+     * what we should use here.
+     */
+    qemu_fdt_setprop_string(fdt, "/psci", "method", "hvc");
+
+    qemu_fdt_setprop_cell(fdt, "/psci", "cpu_suspend", cpu_suspend_fn);
+    qemu_fdt_setprop_cell(fdt, "/psci", "cpu_off", cpu_off_fn);
+    qemu_fdt_setprop_cell(fdt, "/psci", "cpu_on", cpu_on_fn);
+    qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn);
+}
+
+//PPI is moved to different location
+
+static void fdt_add_timer_nodes(const VirtBoardInfo *vbi)
+{
+    /* Note that on A57 h/w these interrupts are level-triggered,
+     * but for the GIC implementation provided by both QEMU and KVM
+     * they are edge-triggered.
+     */
+    const char compat[] = "arm,armv8-timer\0arm,armv7-timer";
+    uint32_t max;
+    uint32_t irqflags = GIC_FDT_IRQ_FLAGS_EDGE_LO_HI;
+    /* Argument is 32 bit but 8 bits are reserved for flags */
+    max = (vbi->smp_cpus >= 24) ? 24 : vbi->smp_cpus;
+    irqflags = deposit32(irqflags, GIC_FDT_IRQ_PPI_CPU_START,
+        GIC_FDT_IRQ_PPI_CPU_WIDTH, (1 << max) - 1);
+
+    qemu_fdt_add_subnode(vbi->fdt, "/timer");
+    /* Note that we can't use setprop_string because of the embedded NUL */
+    qemu_fdt_setprop(vbi->fdt, "/timer", "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_cells(vbi->fdt, "/timer", "interrupts",
+                               GIC_FDT_IRQ_TYPE_PPI, 13, irqflags,
+                               GIC_FDT_IRQ_TYPE_PPI, 14, irqflags,
+                               GIC_FDT_IRQ_TYPE_PPI, 11, irqflags,
+                               GIC_FDT_IRQ_TYPE_PPI, 10, irqflags);
+}
+
+static void fdt_add_cpu_nodes(const VirtBoardInfo *vbi)
+{
+    int cpu;
+
+    qemu_fdt_add_subnode(vbi->fdt, "/cpus");
+    /* From Documentation/devicetree/bindings/arm/cpus.txt
+     *  On ARM v8 64-bit systems value should be set to 2,
+     *  that corresponds to the MPIDR_EL1 register size.
+     *  If MPIDR_EL1[63:32] value is equal to 0 on all CPUs
+     *  in the system, #address-cells can be set to 1, since
+     *  MPIDR_EL1[63:32] bits are not used for CPUs
+     *  identification.
+     *
+     *  GIC-500 does not support affinity levels 2 and 3, so for now
+     *  #address-cells can stay 1 until a future GIC requires more.
+     */
+    qemu_fdt_setprop_cell(vbi->fdt, "/cpus", "#address-cells", 0x1);
+    qemu_fdt_setprop_cell(vbi->fdt, "/cpus", "#size-cells", 0x0);
+
+    for (cpu = vbi->smp_cpus - 1; cpu >= 0; cpu--) {
+        int Aff1, Aff0;
+        char *nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
+        ARMCPU *armcpu = ARM_CPU(qemu_get_cpu(cpu));
+
+        qemu_fdt_add_subnode(vbi->fdt, nodename);
+        qemu_fdt_setprop_string(vbi->fdt, nodename, "device_type", "cpu");
+        qemu_fdt_setprop_string(vbi->fdt, nodename, "compatible",
+                                    armcpu->dtb_compatible);
+
+        if (vbi->smp_cpus > 1) {
+            qemu_fdt_setprop_string(vbi->fdt, nodename,
+                                        "enable-method", "psci");
+        }
+
+        /* If the cpus node's #address-cells property is set to 1,
+         * the reg cell bits [23:0] must be set to bits [23:0] of MPIDR_EL1.
+         * Currently we support a simple affinity scheme, e.g. we can have
+         * 1 cluster with 8 cores but we can't have 8 clusters with 1 core each.
+         */
+        Aff1 = cpu / 8;
+        Aff0 = cpu % 8;
+        qemu_fdt_setprop_cell(vbi->fdt, nodename, "reg", (Aff1 << 8) | Aff0);
+        g_free(nodename);
+    }
+}
+
+static void fdt_add_gic_node(const VirtBoardInfo *vbi)
+{
+    uint32_t gic_phandle;
+
+    gic_phandle = qemu_fdt_alloc_phandle(vbi->fdt);
+    qemu_fdt_setprop_cell(vbi->fdt, "/", "interrupt-parent", gic_phandle);
+
+    qemu_fdt_add_subnode(vbi->fdt, "/intc");
+    /* 'arm,gic-v3' is the device tree compatible string for GICv3 */
+    qemu_fdt_setprop_string(vbi->fdt, "/intc", "compatible",
+                            "arm,gic-v3");
+    qemu_fdt_setprop_cell(vbi->fdt, "/intc", "#interrupt-cells", 3);
+    qemu_fdt_setprop(vbi->fdt, "/intc", "interrupt-controller", NULL, 0);
+    qemu_fdt_setprop_sized_cells(vbi->fdt, "/intc", "reg",
+                             2, vbi->memmap[VIRT_GIC_DIST].base,
+                             2, vbi->memmap[VIRT_GIC_DIST].size,
+#if 0
+                             /* Currently no need for SPI & ITS */
+                             2, vbi->memmap[VIRT_GIC_DIST_SPI].base,
+                             2, vbi->memmap[VIRT_GIC_DIST_SPI].size,
+                             2, vbi->memmap[VIRT_ITS_CONTROL].base,
+                             2, vbi->memmap[VIRT_ITS_CONTROL].size,
+                             2, vbi->memmap[VIRT_ITS_TRANSLATION].base,
+                             2, vbi->memmap[VIRT_ITS_TRANSLATION].size,
+#endif
+                             2, vbi->memmap[VIRT_LPI].base,
+                             2, vbi->memmap[VIRT_LPI].size);
+    qemu_fdt_setprop_cell(vbi->fdt, "/intc", "phandle", gic_phandle);
+}
+
+static void create_gic(const VirtBoardInfo *vbi, qemu_irq *pic)
+{
+    /* We create a standalone GIC v3 */
+    DeviceState *gicdev;
+    SysBusDevice *gicbusdev;
+    const char *gictype = "arm_gicv3";
+    int i;
+
+    if (kvm_irqchip_in_kernel()) {
+        gictype = "kvm-arm-gic";
+    }
+
+    gicdev = qdev_create(NULL, gictype);
+
+    for (i = 0; i < vbi->smp_cpus; i++) {
+        CPUState *cpu = qemu_get_cpu(i);
+        CPUARMState *env = cpu->env_ptr;
+        env->nvic = gicdev;
+    }
+
+    qdev_prop_set_uint32(gicdev, "revision", 3);
+    qdev_prop_set_uint32(gicdev, "num-cpu", smp_cpus);
+    /* Note that the num-irq property counts both internal and external
+     * interrupts; there are always 32 of the former (mandated by GIC spec).
+     */
+    qdev_prop_set_uint32(gicdev, "num-irq", NUM_IRQS + 32);
+    qdev_init_nofail(gicdev);
+    gicbusdev = SYS_BUS_DEVICE(gicdev);
+
+    sysbus_mmio_map(gicbusdev, 0, vbi->memmap[VIRT_GIC_DIST].base);
+    sysbus_mmio_map(gicbusdev, 1, vbi->memmap[VIRT_GIC_DIST_SPI].base);
+    sysbus_mmio_map(gicbusdev, 2, vbi->memmap[VIRT_ITS_CONTROL].base);
+    sysbus_mmio_map(gicbusdev, 3, vbi->memmap[VIRT_ITS_TRANSLATION].base);
+    sysbus_mmio_map(gicbusdev, 4, vbi->memmap[VIRT_LPI].base);
+    /* Wire the outputs from each CPU's generic timer to the
+     * appropriate GIC PPI inputs, and the GIC's IRQ output to
+     * the CPU's IRQ input.
+     */
+    for (i = 0; i < smp_cpus ; i++) {
+        DeviceState *cpudev = DEVICE(qemu_get_cpu(i));
+        int ppibase = NUM_IRQS + i * 32;
+        /* physical timer; we wire it up to the non-secure timer's ID,
+         * since a real A57 always has TrustZone but QEMU doesn't.
+         */
+        qdev_connect_gpio_out(cpudev, 0,
+                              qdev_get_gpio_in(gicdev, ppibase + 30));
+        /* virtual timer */
+        qdev_connect_gpio_out(cpudev, 1,
+                              qdev_get_gpio_in(gicdev, ppibase + 27));
+
+        sysbus_connect_irq(gicbusdev, i, qdev_get_gpio_in(cpudev, ARM_CPU_IRQ));
+    }
+
+    for (i = 0; i < NUM_IRQS; i++) {
+        pic[i] = qdev_get_gpio_in(gicdev, i);
+    }
+
+    fdt_add_gic_node(vbi);
+}
+
+static void create_uart(const VirtBoardInfo *vbi, qemu_irq *pic)
+{
+    char *nodename;
+    hwaddr base = vbi->memmap[VIRT_UART].base;
+    hwaddr size = vbi->memmap[VIRT_UART].size;
+    int irq = vbi->irqmap[VIRT_UART];
+    const char compat[] = "arm,pl011\0arm,primecell";
+    const char clocknames[] = "uartclk\0apb_pclk";
+
+    sysbus_create_simple("pl011", base, pic[irq]);
+
+    nodename = g_strdup_printf("/pl011@%" PRIx64, base);
+    qemu_fdt_add_subnode(vbi->fdt, nodename);
+    /* Note that we can't use setprop_string because of the embedded NUL */
+    qemu_fdt_setprop(vbi->fdt, nodename, "compatible",
+                         compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                     2, base, 2, size);
+    qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, irq,
+                               GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+    qemu_fdt_setprop_cells(vbi->fdt, nodename, "clocks",
+                               vbi->clock_phandle, vbi->clock_phandle);
+    qemu_fdt_setprop(vbi->fdt, nodename, "clock-names",
+                         clocknames, sizeof(clocknames));
+
+    qemu_fdt_setprop_string(vbi->fdt, "/chosen", "stdout-path", nodename);
+    g_free(nodename);
+}
+
+static void create_rtc(const VirtBoardInfo *vbi, qemu_irq *pic)
+{
+    char *nodename;
+    hwaddr base = vbi->memmap[VIRT_RTC].base;
+    hwaddr size = vbi->memmap[VIRT_RTC].size;
+    int irq = vbi->irqmap[VIRT_RTC];
+    const char compat[] = "arm,pl031\0arm,primecell";
+
+    sysbus_create_simple("pl031", base, pic[irq]);
+
+    nodename = g_strdup_printf("/pl031@%" PRIx64, base);
+    qemu_fdt_add_subnode(vbi->fdt, nodename);
+    qemu_fdt_setprop(vbi->fdt, nodename, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                 2, base, 2, size);
+    qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
+                           GIC_FDT_IRQ_TYPE_SPI, irq,
+                           GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+    qemu_fdt_setprop_cell(vbi->fdt, nodename, "clocks", vbi->clock_phandle);
+    qemu_fdt_setprop_string(vbi->fdt, nodename, "clock-names", "apb_pclk");
+    g_free(nodename);
+}
+
+static void create_virtio_devices(const VirtBoardInfo *vbi, qemu_irq *pic)
+{
+    int i;
+    hwaddr size = vbi->memmap[VIRT_MMIO].size;
+
+    /* Note that we have to create the transports in forwards order
+     * so that command line devices are inserted lowest address first,
+     * and then add dtb nodes in reverse order so that they appear in
+     * the finished device tree lowest address first.
+     */
+    for (i = 0; i < NUM_VIRTIO_TRANSPORTS; i++) {
+        int irq = vbi->irqmap[VIRT_MMIO] + i;
+        hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
+
+        sysbus_create_simple("virtio-mmio", base, pic[irq]);
+    }
+
+    for (i = NUM_VIRTIO_TRANSPORTS - 1; i >= 0; i--) {
+        char *nodename;
+        int irq = vbi->irqmap[VIRT_MMIO] + i;
+        hwaddr base = vbi->memmap[VIRT_MMIO].base + i * size;
+
+        nodename = g_strdup_printf("/virtio_mmio@%" PRIx64, base);
+        qemu_fdt_add_subnode(vbi->fdt, nodename);
+        qemu_fdt_setprop_string(vbi->fdt, nodename,
+                                "compatible", "virtio,mmio");
+        qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                     2, base, 2, size);
+        qemu_fdt_setprop_cells(vbi->fdt, nodename, "interrupts",
+                               GIC_FDT_IRQ_TYPE_SPI, irq,
+                               GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
+        g_free(nodename);
+    }
+}
+
+static void create_one_flash(const char *name, hwaddr flashbase,
+                             hwaddr flashsize)
+{
+    /* Create and map a single flash device. We use the same
+     * parameters as the flash devices on the Versatile Express board.
+     */
+    DriveInfo *dinfo = drive_get_next(IF_PFLASH);
+    DeviceState *dev = qdev_create(NULL, "cfi.pflash01");
+    const uint64_t sectorlength = 256 * 1024;
+
+    if (dinfo && qdev_prop_set_drive(dev, "drive",
+                                     blk_by_legacy_dinfo(dinfo))) {
+        abort();
+    }
+
+    qdev_prop_set_uint32(dev, "num-blocks", flashsize / sectorlength);
+    qdev_prop_set_uint64(dev, "sector-length", sectorlength);
+    qdev_prop_set_uint8(dev, "width", 4);
+    qdev_prop_set_uint8(dev, "device-width", 2);
+    qdev_prop_set_uint8(dev, "big-endian", 0);
+    qdev_prop_set_uint16(dev, "id0", 0x89);
+    qdev_prop_set_uint16(dev, "id1", 0x18);
+    qdev_prop_set_uint16(dev, "id2", 0x00);
+    qdev_prop_set_uint16(dev, "id3", 0x00);
+    qdev_prop_set_string(dev, "name", name);
+    qdev_init_nofail(dev);
+
+    sysbus_mmio_map(SYS_BUS_DEVICE(dev), 0, flashbase);
+}
+
+static void create_flash(const VirtBoardInfo *vbi)
+{
+    /* Create two flash devices to fill the VIRT_FLASH space in the memmap.
+     * Any file passed via -bios goes in the first of these.
+     */
+    hwaddr flashsize = vbi->memmap[VIRT_FLASH].size / 2;
+    hwaddr flashbase = vbi->memmap[VIRT_FLASH].base;
+    char *nodename;
+
+    if (bios_name) {
+        const char *fn;
+
+        if (drive_get(IF_PFLASH, 0, 0)) {
+            error_report("The contents of the first flash device may be "
+                         "specified with -bios or with -drive if=pflash... "
+                         "but you cannot use both options at once");
+            exit(1);
+        }
+        fn = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
+        if (!fn || load_image_targphys(fn, flashbase, flashsize) < 0) {
+            error_report("Could not load ROM image '%s'", bios_name);
+            exit(1);
+        }
+    }
+
+    create_one_flash("virt2.flash0", flashbase, flashsize);
+    create_one_flash("virt2.flash1", flashbase + flashsize, flashsize);
+
+    nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
+    qemu_fdt_add_subnode(vbi->fdt, nodename);
+    qemu_fdt_setprop_string(vbi->fdt, nodename, "compatible", "cfi-flash");
+    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                 2, flashbase, 2, flashsize,
+                                 2, flashbase + flashsize, 2, flashsize);
+    qemu_fdt_setprop_cell(vbi->fdt, nodename, "bank-width", 4);
+    g_free(nodename);
+}
+
+static void create_fw_cfg(const VirtBoardInfo *vbi)
+{
+    hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
+    hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
+    char *nodename;
+
+    fw_cfg_init_mem_wide(base + 8, base, 8);
+
+    nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
+    qemu_fdt_add_subnode(vbi->fdt, nodename);
+    qemu_fdt_setprop_string(vbi->fdt, nodename,
+                            "compatible", "qemu,fw-cfg-mmio");
+    qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
+                                 2, base, 2, size);
+    g_free(nodename);
+}
+
+static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
+{
+    const VirtBoardInfo *board = (const VirtBoardInfo *)binfo;
+
+    *fdt_size = board->fdt_size;
+    return board->fdt;
+}
+
+static void machvirt2_init(MachineState *machine)
+{
+    VirtMachineState *vms = VIRT_MACHINE(machine);
+    qemu_irq pic[NUM_IRQS];
+    MemoryRegion *sysmem = get_system_memory();
+    int n;
+    MemoryRegion *ram = g_new(MemoryRegion, 1);
+    const char *cpu_model = machine->cpu_model;
+    VirtBoardInfo *vbi;
+
+    if (!cpu_model) {
+        cpu_model = "cortex-a57";
+    }
+
+    vbi = find_machine_info(cpu_model);
+    if (!vbi) {
+        error_report("mach-virt: CPU %s not supported", cpu_model);
+        exit(1);
+    }
+
+    /* With a full GICv3 implementation (with bitops) raise the limit to 128 */
+    if (smp_cpus > 64) {
+        error_report("mach-virt: cannot model more than 64 cores");
+        exit(1);
+    }
+
+    vbi->smp_cpus = smp_cpus;
+
+    if (machine->ram_size > vbi->memmap[VIRT_MEM].size) {
+        error_report("mach-virt: cannot model more than 30GB RAM");
+        exit(1);
+    }
+
+    create_fdt(vbi);
+
+    for (n = 0; n < smp_cpus; n++) {
+        ObjectClass *oc = cpu_class_by_name(TYPE_AARCH64_CPU, cpu_model);
+        Object *cpuobj;
+
+        if (!oc) {
+            fprintf(stderr, "Unable to find CPU definition\n");
+            exit(1);
+        }
+        cpuobj = object_new(object_class_get_name(oc));
+
+        if (!vms->secure) {
+            object_property_set_bool(cpuobj, false, "has_el3", NULL);
+        }
+
+        object_property_set_int(cpuobj, QEMU_PSCI_CONDUIT_HVC, "psci-conduit",
+                                NULL);
+
+        /* Secondary CPUs start in PSCI powered-down state */
+        if (n > 0) {
+            object_property_set_bool(cpuobj, true, "start-powered-off", NULL);
+        }
+
+        if (object_property_find(cpuobj, "reset-cbar", NULL)) {
+            object_property_set_int(cpuobj, vbi->memmap[VIRT_CPUPERIPHS].base,
+                                    "reset-cbar", &error_abort);
+        }
+
+        object_property_set_bool(cpuobj, true, "realized", NULL);
+    }
+    fdt_add_timer_nodes(vbi);
+    fdt_add_cpu_nodes(vbi);
+    fdt_add_psci_node(vbi);
+
+    memory_region_init_ram(ram, NULL, "mach-virt.ram", machine->ram_size,
+                           &error_abort);
+    vmstate_register_ram_global(ram);
+    memory_region_add_subregion(sysmem, vbi->memmap[VIRT_MEM].base, ram);
+
+    create_flash(vbi);
+    create_gic(vbi, pic);
+    create_uart(vbi, pic);
+    create_rtc(vbi, pic);
+
+    /* Create mmio transports, so the user can create virtio backends
+     * (which will be automatically plugged in to the transports). If
+     * no backend is created the transport will just sit harmlessly idle.
+     */
+    create_virtio_devices(vbi, pic);
+
+    create_fw_cfg(vbi);
+
+    vbi->bootinfo.ram_size = machine->ram_size;
+    vbi->bootinfo.kernel_filename = machine->kernel_filename;
+    vbi->bootinfo.kernel_cmdline = machine->kernel_cmdline;
+    vbi->bootinfo.initrd_filename = machine->initrd_filename;
+    vbi->bootinfo.nb_cpus = smp_cpus;
+    vbi->bootinfo.board_id = -1;
+    vbi->bootinfo.loader_start = vbi->memmap[VIRT_MEM].base;
+    vbi->bootinfo.get_dtb = machvirt_dtb;
+    vbi->bootinfo.firmware_loaded = bios_name || drive_get(IF_PFLASH, 0, 0);
+    DPRINTF(" ---- kernel load ----- \n");
+    arm_load_kernel(ARM_CPU(first_cpu), &vbi->bootinfo);
+    DPRINTF(" ---- kernel load finish ----- \n");
+}
+
+
+static bool virt_get_secure(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->secure;
+}
+
+static void virt_set_secure(Object *obj, bool value, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->secure = value;
+}
+
+static void virt_instance_init(Object *obj)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    /* EL3 is enabled by default on virt */
+    vms->secure = true;
+    object_property_add_bool(obj, "secure", virt_get_secure,
+                             virt_set_secure, NULL);
+    object_property_set_description(obj, "secure",
+                                    "Set on/off to enable/disable the ARM "
+                                    "Security Extensions (TrustZone)",
+                                    NULL);
+}
+
+static void virt_class_init(ObjectClass *oc, void *data)
+{
+    MachineClass *mc = MACHINE_CLASS(oc);
+
+    mc->name = TYPE_VIRT_MACHINE;
+    mc->desc = "ARM Virtual Machine";
+    mc->init = machvirt2_init;
+    /* With a full GICv3 implementation (with bitops) raise the limit to 128 */
+    mc->max_cpus = 64;
+}
+
+
+static const TypeInfo machvirt2_info = {
+    .name = TYPE_VIRT_MACHINE,
+    .parent = TYPE_MACHINE,
+    .instance_size = sizeof(VirtMachineState),
+    .instance_init = virt_instance_init,
+    .class_size = sizeof(VirtMachineClass),
+    .class_init = virt_class_init,
+};
+
+static void machvirt2_machine_init(void)
+{
+    type_register_static(&machvirt2_info);
+}
+
+machine_init(machvirt2_machine_init);
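
Note on the device tree encodings in virtv2.c: the sketch below restates, as
standalone C, the two small computations that fdt_add_cpu_nodes() and
fdt_add_timer_nodes() perform: packing a linear CPU index into the Aff1/Aff0
fields of the cpu node's reg cell, and building the PPI flags cell whose upper
24 bits carry the CPU mask. The helper names cpu_index_to_reg() and
ppi_irqflags() and the cores_per_cluster parameter are assumptions made for
illustration only; the patch hard-codes 8 cores per cluster and calls
deposit32() directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): pack a linear CPU index
 * into the low MPIDR_EL1 affinity fields, Aff1 in bits [15:8] and Aff0
 * in bits [7:0]. cores_per_cluster is an assumed parameter and must be
 * nonzero; the patch always uses 8.
 */
static uint32_t cpu_index_to_reg(unsigned cpu, unsigned cores_per_cluster)
{
    uint32_t aff1 = cpu / cores_per_cluster;   /* cluster number */
    uint32_t aff0 = cpu % cores_per_cluster;   /* core within the cluster */

    return (aff1 << 8) | aff0;
}

/* Hypothetical helper (not part of the patch): build the third cell of a
 * PPI interrupt specifier. Bits [7:0] hold the trigger flags and bits
 * [31:8] hold a CPU mask, so at most 24 CPUs can be named.
 */
static uint32_t ppi_irqflags(uint32_t trigger, unsigned smp_cpus)
{
    unsigned width = smp_cpus >= 24 ? 24 : smp_cpus;  /* clamp to 24 bits */
    uint32_t mask = (1u << width) - 1;

    return (trigger & 0xffu) | (mask << 8);
}
```

With 8 cores per cluster, CPU 15 gets reg 0x107 (cluster 1, core 7). Passing 4
as cores_per_cluster is one way a flexible cluster size could be expressed,
though the patch does not do this. For 24 or more CPUs the PPI mask saturates
at 0xffffff, which is the 24-bit limit noted in the commit message.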
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 843864a..41fe9ec 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -11,6 +11,8 @@  common-obj-$(CONFIG_SLAVIO) += slavio_intctl.o
 common-obj-$(CONFIG_IOAPIC) += ioapic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic_common.o
 common-obj-$(CONFIG_ARM_GIC) += arm_gic.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv3_common.o
+common-obj-$(CONFIG_ARM_GIC) += arm_gicv3.o
 common-obj-$(CONFIG_OPENPIC) += openpic.o

 obj-$(CONFIG_APIC) += apic.o apic_common.o
diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c
index 18b01ba..190df46 100644
--- a/hw/intc/arm_gic_common.c
+++ b/hw/intc/arm_gic_common.c
@@ -2,7 +2,9 @@ 
  * ARM GIC support - common bits of emulated and KVM kernel model
  *
  * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
  * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
new file mode 100644
index 0000000..1dd73d0
--- /dev/null
+++ b/hw/intc/arm_gicv3.c
@@ -0,0 +1,1596 @@ 
+/*
+ * ARM Generic/Distributed Interrupt Controller
+ *
+ * Copyright (c) 2006-2007 CodeSourcery.
+ * Copyright (c) 2015 Huawei.
+ * Written by Shlomo Pongratz
+ * Based on gic.c by Paul Brook
+ *
+ * This code is licensed under the GPL.
+ */
+
+/* This file contains implementation code for the GIC-500 interrupt
+ * controller, which is an implementation of the GICv3 architecture.
+ * Currently it supports up to 64 cores. Enhancement to 128 cores requires
+ * working with bitops.
+ */
+
+#include "hw/sysbus.h"
+#include "gicv3_internal.h"
+#include "qom/cpu.h"
+
+
+#undef DEBUG_GICV3
+
+#ifdef DEBUG_GICV3
+#define DPRINTF(fmt, ...) \
+do { fprintf(stderr, "arm_gicv3::%s: " fmt , __func__, ## __VA_ARGS__); } while (0)
+#else
+#define DPRINTF(fmt, ...) do {} while(0)
+#endif
+
+void armv8_gicv3_set_sgi(void *opaque, int cpuindex, uint64_t value);
+uint64_t armv8_gicv3_acknowledge_irq(void *opaque, int cpuindex);
+void armv8_gicv3_complete_irq(void *opaque, int cpuindex, int irq);
+uint64_t armv8_gicv3_get_priority_mask(void *opaque, int cpuindex);
+void armv8_gicv3_set_priority_mask(void *opaque, int cpuindex, uint32_t mask);
+
+static const uint8_t gic_dist_ids[] = {
+    0x44, 0x00, 0x00, 0x00, 0x92, 0xB4, 0x3B, 0x00, 0x0D, 0xF0, 0x05, 0xB1
+};
+
+static const uint8_t gic_lpi_ids[] = {
+    0x44, 0x00, 0x00, 0x00, 0x93, 0xB4, 0x3B, 0x00, 0x0D, 0xF0, 0x05, 0xB1
+};
+
+#define NUM_CPU(s) ((s)->num_cpu)
+
+static inline int gic_get_current_cpu(GICState *s)
+{
+    if (s->num_cpu > 1) {
+        return current_cpu->cpu_index;
+    }
+    return 0;
+}
+
+/* TODO: Many places that call this routine could be optimized.  */
+/* Update interrupt status after enabled or pending bits have been changed.  */
+void gicv3_update(GICState *s)
+{
+    int best_irq;
+    int best_prio;
+    int irq;
+    int level;
+    int cpu;
+    uint64_t cm;
+
+    for (cpu = 0; cpu < NUM_CPU(s); cpu++) {
+        cm = 1ll << cpu;
+        s->current_pending[cpu] = 1023;
+        if (!s->enabled || !s->cpu_enabled[cpu]) {
+            qemu_irq_lower(s->parent_irq[cpu]);
+            /* In the original GIC code (gic.c) there is a return here. But
+             * if the distributor is disabled then all parent IRQs need to be
+             * lowered. Also, if cpu i is disabled, a return would mean that
+             * the CPUs before i are considered but the CPUs after it are not.
+             */
+            continue;
+        }
+        best_prio = 0x100;
+        best_irq = 1023;
+        for (irq = 0; irq < s->num_irq; irq++) {
+            if (GIC_TEST_ENABLED(irq, cm) && gic_test_pending(s, irq, cm) &&
+                (irq < GICV3_INTERNAL || GIC_TARGET(irq) & cm)) {
+                if (GIC_GET_PRIORITY(irq, cpu) < best_prio) {
+                    best_prio = GIC_GET_PRIORITY(irq, cpu);
+                    best_irq = irq;
+                }
+            }
+        }
+        level = 0;
+        if (best_prio < s->priority_mask[cpu]) {
+            s->current_pending[cpu] = best_irq;
+            if (best_prio < s->running_priority[cpu]) {
+                DPRINTF("Raised pending IRQ %d (cpu %d)\n", best_irq, cpu);
+                level = 1;
+            }
+        }
+        qemu_set_irq(s->parent_irq[cpu], level);
+    }
+}
+
+static void gicv3_set_irq_generic(GICState *s, int irq, int level,
+                                  uint64_t cm, uint64_t target)
+{
+    if (level) {
+        GIC_SET_LEVEL(irq, cm);
+        DPRINTF("Set %d pending mask 0x%" PRIx64 "\n", irq, target);
+        if (GIC_TEST_EDGE_TRIGGER(irq)) {
+            GIC_SET_PENDING(irq, target);
+        }
+    } else {
+        GIC_CLEAR_LEVEL(irq, cm);
+    }
+}
+
+/* Process a change in an external IRQ input.  */
+static void gic_set_irq(void *opaque, int irq, int level)
+{
+    /* Meaning of the 'irq' parameter:
+     *  [0..N-1] : external interrupts
+     *  [N..N+31] : PPI (internal) interrupts for CPU 0
+     *  [N+32..N+63] : PPI (internal) interrupts for CPU 1
+     *  ...
+     */
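+    /* Worked example (illustrative only; assumes GICV3_INTERNAL is 32 and
+     * a hypothetical num_irq of 96, i.e. N = 64 external lines):
+     *   input 5   -> SPI, internal interrupt 5 + 32 = 37, cm = ALL_CPU_MASK
+     *   input 123 -> 123 - 64 = 59, cpu = 59 / 32 = 1, PPI 59 % 32 = 27
+     */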
+    GICState *s = (GICState *)opaque;
+    uint64_t cm, target;
+
+    if (irq < (s->num_irq - GICV3_INTERNAL)) {
+        /* The first external input line is internal interrupt 32.  */
+        cm = ALL_CPU_MASK;
+        irq += GICV3_INTERNAL;
+        target = GIC_TARGET(irq);
+    } else {
+        int cpu;
+        irq -= (s->num_irq - GICV3_INTERNAL);
+        cpu = irq / GICV3_INTERNAL;
+        irq %= GICV3_INTERNAL;
+        cm = 1ll << cpu;
+        target = cm;
+    }
+
+    assert(irq >= GICV3_NR_SGIS);
+
+    if (level == GIC_TEST_LEVEL(irq, cm)) {
+        return;
+    }
+
+    gicv3_set_irq_generic(s, irq, level, cm, target);
+
+    gicv3_update(s);
+}
+
+static void gic_set_running_irq(GICState *s, int cpu, int irq)
+{
+    s->running_irq[cpu] = irq;
+    if (irq == 1023) {
+        s->running_priority[cpu] = 0x100;
+    } else {
+        s->running_priority[cpu] = GIC_GET_PRIORITY(irq, cpu);
+    }
+    gicv3_update(s);
+}
+
+uint32_t gicv3_acknowledge_irq(GICState *s, int cpu)
+{
+    int ret, irq, src;
+    uint64_t cm = 1ll << cpu;
+    irq = s->current_pending[cpu];
+    if (irq == 1023
+            || GIC_GET_PRIORITY(irq, cpu) >= s->running_priority[cpu]) {
+        DPRINTF("ACK no pending IRQ\n");
+        return 1023;
+    }
+    s->last_active[irq][cpu] = s->running_irq[cpu];
+
+    if (irq < GICV3_NR_SGIS) {
+        /* Lookup the source CPU for the SGI and clear this in the
+         * sgi_pending map.  Return the src and clear the overall pending
+         * state on this CPU if the SGI is not pending from any CPUs.
+         */
+        assert(s->sgi_state[irq].pending[cpu] != 0);
+        src = ctz64(s->sgi_state[irq].pending[cpu]);
+        s->sgi_state[irq].pending[cpu] &= ~(1ll << src);
+        if (s->sgi_state[irq].pending[cpu] == 0) {
+            GIC_CLEAR_PENDING(irq, GIC_TEST_MODEL(irq) ? ALL_CPU_MASK : cm);
+        }
+        /* The GICv3 kernel driver doesn't mask the src bits like the GICv2
+         * driver does, so don't add src, i.e. ret = irq | ((src & 0x7) << 10);
+         * Need to check the spec to see whether this is right.
+         */
+        ret = irq;
+    } else {
+        //DPRINTF("ACK irq(%d) cpu(%d) \n", irq, cpu);
+        /* Clear pending state for both level and edge triggered
+         * interrupts. (level triggered interrupts with an active line
+         * remain pending, see gic_test_pending)
+         */
+        GIC_CLEAR_PENDING(irq, GIC_TEST_MODEL(irq) ? ALL_CPU_MASK : cm);
+        ret = irq;
+    }
+
+    gic_set_running_irq(s, cpu, irq);
+    DPRINTF("out ACK irq-ret(%d) cpu(%d) \n", ret, cpu);
+    return ret;
+}
+
+void gicv3_set_priority(GICState *s, int cpu, int irq, uint8_t val)
+{
+    if (irq < GICV3_INTERNAL) {
+        s->priority1[irq][cpu] = val;
+    } else {
+        s->priority2[(irq) - GICV3_INTERNAL] = val;
+    }
+}
+
+void gicv3_complete_irq(GICState *s, int cpu, int irq)
+{
+    int update = 0;
+
+    DPRINTF("EOI irq(%d) cpu (%d)\n", irq, cpu);
+    if (irq >= s->num_irq) {
+        /* This handles two cases:
+         * 1. If software writes the ID of a spurious interrupt [ie 1023]
+         * to the GICC_EOIR, the GIC ignores that write.
+         * 2. If software writes the number of a non-existent interrupt
+         * this must be a subcase of "value written does not match the last
+         * valid interrupt value read from the Interrupt Acknowledge
+         * register" and so this is UNPREDICTABLE. We choose to ignore it.
+         */
+        return;
+    }
+
+    if (s->running_irq[cpu] == 1023)
+        return; /* No active IRQ.  */
+
+    if (irq != s->running_irq[cpu]) {
+        /* Complete an IRQ that is not currently running.  */
+        int tmp = s->running_irq[cpu];
+        while (s->last_active[tmp][cpu] != 1023) {
+            if (s->last_active[tmp][cpu] == irq) {
+                s->last_active[tmp][cpu] = s->last_active[irq][cpu];
+                break;
+            }
+            tmp = s->last_active[tmp][cpu];
+        }
+        if (update) {
+            gicv3_update(s);
+        }
+    } else {
+        /* Complete the current running IRQ.  */
+        gic_set_running_irq(s, cpu, s->last_active[s->running_irq[cpu]][cpu]);
+    }
+}
+
+static uint64_t gic_dist_readb(void *opaque, hwaddr offset)
+{
+    GICState *s = (GICState *)opaque;
+    uint64_t res;
+    int irq;
+    int i;
+    int cpu;
+    uint64_t cm;
+    uint64_t mask;
+
+    cpu = gic_get_current_cpu(s);
+    if (offset & 3)
+        return 0;
+    cm = 1ll << cpu;
+    if (offset < 0x100) {
+        if (offset == 0) {/* GICD_CTLR */
+            DPRINTF("Distribution GICD_CTLR(%d) enable(%d)\n", cpu, s->enabled);
+            return s->enabled;
+        }
+        if (offset == 4) { /* GICD_TYPER */
+            uint64_t num = NUM_CPU(s);
+            /* the number of cores in the system, saturated to 8 minus one. */
+            if (num > 8)
+                num = 8;
+            res = (s->num_irq / 32) - 1; /* ITLinesNumber: 32 * (N + 1) IDs */
+            res |= (num - 1) << 5;
+            res |= 0xF << 19;
+            return res;
+        }
+        if (offset < 0x08)
+            return 0;
+        if (offset == 0x08)
+            return 0x43B; /* GIC_IIDR ARM */
+        if (offset >= 0x80) {
+            /* Interrupt Security , RAZ/WI */
+            return 0;
+        }
+        goto bad_reg;
+    } else if (offset < 0x200) {
+        /* Interrupt Set/Clear Enable.  */
+        if (offset < 0x180)
+            irq = (offset - 0x100) * 8;
+        else
+            irq = (offset - 0x180) * 8;
+        irq += GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        res = 0;
+        for (i = 0; i < 8; i++) {
+            if (GIC_TEST_ENABLED(irq + i, cm)) {
+                res |= (1 << i);
+            }
+        }
+    } else if (offset < 0x300) {
+        /* Interrupt Set/Clear Pending.  */
+        if (offset < 0x280)
+            irq = (offset - 0x200) * 8;
+        else
+            irq = (offset - 0x280) * 8;
+        irq += GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        res = 0;
+        mask = (irq < GICV3_INTERNAL) ?  cm : ALL_CPU_MASK;
+        for (i = 0; i < 8; i++) {
+            if (gic_test_pending(s, irq + i, mask)) {
+                res |= (1 << i);
+            }
+        }
+    } else if (offset < 0x400) {
+        /* Interrupt Active.  */
+        irq = (offset - 0x300) * 8 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        res = 0;
+        mask = (irq < GICV3_INTERNAL) ?  cm : ALL_CPU_MASK;
+        for (i = 0; i < 8; i++) {
+            if (GIC_TEST_ACTIVE(irq + i, mask)) {
+                res |= (1 << i);
+            }
+        }
+    } else if (offset < 0x800) {
+        /* Interrupt Priority.  */
+        irq = (offset - 0x400) + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        res = GIC_GET_PRIORITY(irq, cpu);
+    } else if (offset < 0xc00) {
+        /* Interrupt CPU Target.  */
+        if (s->num_cpu == 1) {
+            /* For uniprocessor GICs these RAZ/WI */
+            res = 0;
+        } else {
+            irq = (offset - 0x800) + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq) {
+                goto bad_reg;
+            }
+            if (irq >= 29 && irq <= 31) {
+                res = cm;
+            } else {
+                res = GIC_TARGET(irq);
+            }
+        }
+    } else if (offset < 0xf00) {
+        /* Interrupt Configuration.  */
+        irq = (offset - 0xc00) * 4 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        res = 0;
+        for (i = 0; i < 4; i++) {
+            if (GIC_TEST_MODEL(irq + i))
+                res |= (1 << (i * 2));
+            if (GIC_TEST_EDGE_TRIGGER(irq + i))
+                res |= (2 << (i * 2));
+        }
+    } else if (offset < 0xf10) {
+        goto bad_reg;
+    } else if (offset < 0xf30) {
+        /* These are 32-bit registers; should not be used with 128 cores. */
+        if (offset < 0xf20) {
+            /* GICD_CPENDSGIRn */
+            irq = (offset - 0xf10);
+        } else {
+            irq = (offset - 0xf20);
+            /* GICD_SPENDSGIRn */
+        }
+
+        res = s->sgi_state[irq].pending[cpu];
+    } else if (offset < 0xffd0) {
+        goto bad_reg;
+    } else /* offset >= 0xffd0 */ {
+        if (offset & 3) {
+            res = 0;
+        } else {
+            res = gic_dist_ids[(offset - 0xffd0) >> 2];
+        }
+    }
+    return res;
+bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+    return 0;
+}
+
+static uint64_t gic_dist_readw(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_dist_readb(opaque, offset);
+    val |= gic_dist_readb(opaque, offset + 1) << 8;
+    return val;
+}
+
+static uint64_t gic_dist_readl(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_dist_readw(opaque, offset);
+    val |= gic_dist_readw(opaque, offset + 2) << 16;
+    return val;
+}
+
+static uint64_t gic_dist_readll(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_dist_readl(opaque, offset);
+    val |= gic_dist_readl(opaque, offset + 4) << 32;
+    return val;
+}
+
+static void gic_dist_writeb(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    int irq;
+    int i;
+    int cpu;
+
+    cpu = gic_get_current_cpu(s);
+
+    if (offset < 0x100) {
+        if (offset == 0) {
+            s->enabled = (value & 1);
+            DPRINTF("Distribution %sabled\n", s->enabled ? "En" : "Dis");
+        } else if (offset < 4) {
+            /* ignored.  */
+        } else if (offset >= 0x80) {
+            /* Interrupt Security Registers, RAZ/WI */
+        } else {
+            goto bad_reg;
+        }
+    } else if (offset < 0x180) {
+        /* Interrupt Set Enable.  */
+        irq = (offset - 0x100) * 8 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        if (irq < GICV3_NR_SGIS) {
+            DPRINTF("ISENABLERn SGI should be only in redistributor %d\n", irq);
+            /* Ignored according to comment g in the document. */
+            return;
+        }
+
+        for (i = 0; i < 8; i++) {
+            if (value & (1 << i)) {
+                uint64_t mask =
+                    (irq < GICV3_INTERNAL) ? (1ll << cpu) : GIC_TARGET(irq + i);
+                uint64_t cm = (irq < GICV3_INTERNAL) ? (1ll << cpu) : ALL_CPU_MASK;
+
+                if (!GIC_TEST_ENABLED(irq + i, cm)) {
+                    DPRINTF("Enabled IRQ %d\n", irq + i);
+                }
+                GIC_SET_ENABLED(irq + i, cm);
+                /* If a raised level-triggered IRQ is enabled then mark
+                   it as pending.  */
+                if (GIC_TEST_LEVEL(irq + i, mask)
+                        && !GIC_TEST_EDGE_TRIGGER(irq + i)) {
+                    if (irq + i == 0)
+                        DPRINTF("Set %d pending mask %" PRIx64 "\n",
+                                irq + i, mask);
+                    GIC_SET_PENDING(irq + i, mask);
+                }
+            }
+        }
+    } else if (offset < 0x200) {
+        /* Interrupt Clear Enable.  */
+        irq = (offset - 0x180) * 8 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        if (irq < GICV3_NR_SGIS) {
+            DPRINTF("ICENABLERn SGI should be only in redistributor %d\n", irq);
+            /* Ignored according to comment g in the document. */
+            return;
+        }
+
+        for (i = 0; i < 8; i++) {
+            if (value & (1 << i)) {
+                uint64_t cm = (irq < GICV3_INTERNAL) ? (1ll << cpu) : ALL_CPU_MASK;
+
+                if (GIC_TEST_ENABLED(irq + i, cm)) {
+                    DPRINTF("Disabled IRQ %d\n", irq + i);
+                }
+                GIC_CLEAR_ENABLED(irq + i, cm);
+            }
+        }
+    } else if (offset < 0x280) {
+        /* Interrupt Set Pending.  */
+        irq = (offset - 0x200) * 8 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        if (irq < GICV3_NR_SGIS) {
+            value = 0;
+        }
+
+        for (i = 0; i < 8; i++) {
+            if (value & (1 << i)) {
+                GIC_SET_PENDING(irq + i, GIC_TARGET(irq + i));
+            }
+        }
+    } else if (offset < 0x300) {
+        /* Interrupt Clear Pending.  */
+        irq = (offset - 0x280) * 8 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        if (irq < GICV3_NR_SGIS) {
+            value = 0;
+        }
+
+        for (i = 0; i < 8; i++) {
+            /* ??? This currently clears the pending bit for all CPUs, even
+               for per-CPU interrupts. It's unclear whether this is the
+               correct behavior.  */
+            if (value & (1 << i)) {
+                GIC_CLEAR_PENDING(irq + i, ALL_CPU_MASK);
+            }
+        }
+    } else if (offset < 0x400) {
+        /* Interrupt Active.  */
+        goto bad_reg;
+    } else if (offset < 0x800) {
+        /* Interrupt Priority.  */
+        irq = (offset - 0x400) + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        gicv3_set_priority(s, cpu, irq, value);
+    } else if (offset < 0xc00) {
+        /* Interrupt CPU Target. RAZ/WI on uni-processor GICs, with the
+         * annoying exception of the 11MPCore's GIC.
+         */
+        if (s->num_cpu != 1) {
+            irq = (offset - 0x800) + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq) {
+                goto bad_reg;
+            }
+            if (irq < 29) {
+                value = 0;
+            } else if (irq < GICV3_INTERNAL) {
+                value = ALL_CPU_MASK;
+            }
+            s->irq_target[irq] = value & ALL_CPU_MASK;
+        }
+    } else if (offset < 0xf00) {
+        /* Interrupt Configuration.  */
+        irq = (offset - 0xc00) * 4 + GICV3_BASE_IRQ;
+        if (irq >= s->num_irq)
+            goto bad_reg;
+        if (irq < GICV3_NR_SGIS)
+            value |= 0xaa;
+        for (i = 0; i < 4; i++) {
+            if (value & (2 << (i * 2))) {
+                GIC_SET_EDGE_TRIGGER(irq + i);
+            } else {
+                GIC_CLEAR_EDGE_TRIGGER(irq + i);
+            }
+        }
+    } else if (offset < 0xf10) {
+        /* 0xf00 is only handled for 32-bit writes.  */
+        goto bad_reg;
+    } else if (offset < 0xf20) {
+        /* GICD_CPENDSGIRn */
+        /* This is a 32-bit register; it shouldn't be used with 128 cores */
+        irq = (offset - 0xf10);
+        DPRINTF("GICD_CPENDSGIRn irq(%d) %" PRIu64 "\n", irq, value);
+
+        s->sgi_state[irq].pending[cpu] &= ~value;
+        if (s->sgi_state[irq].pending[cpu] == 0) {
+            GIC_CLEAR_PENDING(irq, 1ll << cpu);
+        }
+    } else if (offset < 0xf30) {
+        /* GICD_SPENDSGIRn */
+        irq = (offset - 0xf20);
+        DPRINTF("GICD_SPENDSGIRn irq(%d) %" PRIu64 "\n", irq, value);
+
+        GIC_SET_PENDING(irq, 1ll << cpu);
+        s->sgi_state[irq].pending[cpu] |= value;
+    } else {
+        goto bad_reg;
+    }
+    gicv3_update(s);
+    return;
+bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+}
+
+static void gic_dist_writew(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_dist_writeb(opaque, offset, value & 0xff);
+    gic_dist_writeb(opaque, offset + 1, value >> 8);
+}
+
+static void gic_dist_writel(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    if (offset == 0xf00) {
+        /* GICD_SGIR Software Generated Interrupt Register
+         * This register should not be used if GICv2 backwards compatibility
+         * support is not included. (comment t page 3-8 on GIC-500 doc)
+         */
+        int cpu;
+        int irq;
+        uint64_t mask, cm;
+        int target_cpu;
+        GICState *s = (GICState *)opaque;
+
+        DPRINTF("GICv2 backwards compatibility is not supported\n");
+        cpu = gic_get_current_cpu(s);
+        irq = value & 0x3ff;
+        switch ((value >> 24) & 3) {
+        case 0:
+            mask = (value >> 16) & ALL_CPU_MASK;
+            break;
+        case 1:
+            mask = ALL_CPU_MASK ^ (1ll << cpu);
+            break;
+        case 2:
+            mask = 1ll << cpu;
+            break;
+        default:
+            DPRINTF("Bad Soft Int target filter\n");
+            mask = ALL_CPU_MASK;
+            break;
+        }
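+        /* Worked example (illustrative; assumes ALL_CPU_MASK covers at least
+         * CPUs 0..2): value 0x00050003 selects target filter 0, irq = 3 and
+         * mask = 0x5, so SGI 3 is marked pending for CPUs 0 and 2, with the
+         * writing CPU recorded as the source in sgi_state[3].pending[].
+         */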
+        cm = (1ll << cpu);
+        DPRINTF("irq(%d) mask(%" PRIu64 ")\n", irq, mask);
+        GIC_SET_PENDING(irq, mask);
+        target_cpu = ctz64(mask);
+        while (target_cpu < GICV3_NCPU) {
+            s->sgi_state[irq].pending[target_cpu] |= cm;
+            mask &= ~(1ll << target_cpu);
+            target_cpu = ctz64(mask);
+        }
+        gicv3_update(s);
+        return;
+    }
+    gic_dist_writew(opaque, offset, value & 0xffff);
+    gic_dist_writew(opaque, offset + 2, value >> 16);
+}
+
+static void gic_dist_writell(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    //DPRINTF("offset %p data %p\n", (void *) offset, (void *) value);
+
+    if (offset >= 0x6100 && offset <= 0x7EF8) {
+        int irq = (offset - 0x6000) / 8; /* GICD_IROUTER<n>: 0x6000 + 8n */
+        /* GICD_IROUTERn [affinity-3:X:affinity-2:affinity-1:affinity-0]
+         * See kernel code for fields
+         * GIC 500 currently supports 32 clusters with 8 cores each,
+         * but virtv2 fills the Aff0 before filling Aff1 so
+         * 16 = 2 * 8 but not 4 x 4 nor 8 x 2 nor 16 x 1
+         */
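+        /* Worked example (illustrative; assumes the 8-cores-per-cluster
+         * layout described above): value 0x0103 has Aff1 = 1 and Aff0 = 3,
+         * so Aff1 contributes (0x100 >> 5) = 8 and cpu = 8 + 3 = 11; the
+         * resulting target mask is 1ll << 11.
+         */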
+        uint32_t cpu, Aff1, Aff0;
+        Aff1 = (value & 0xf00) >> (8 - 3); /* Shift by 8 multiply by 8 */
+        Aff0 = value & 0x7;
+        cpu = Aff1 + Aff0;
+        s->irq_target[irq] = 1ll << cpu;
+        gicv3_update(s);
+        DPRINTF("irq(%d) cpu(%d)\n", irq, cpu);
+        return;
+    }
+
+    gic_dist_writel(opaque, offset, value & 0xffffffff);
+    gic_dist_writel(opaque, offset + 4, value >> 32);
+}
+
+static uint64_t gic_dist_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t data;
+    switch (size) {
+    case 1:
+        data = gic_dist_readb(opaque, addr);
+        break;
+    case 2:
+        data = gic_dist_readw(opaque, addr);
+        break;
+    case 4:
+        data = gic_dist_readl(opaque, addr);
+        break;
+    case 8:
+        data = gic_dist_readll(opaque, addr);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+    //DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    return data;
+}
+
+static void gic_dist_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+    //DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    switch (size) {
+    case 1:
+        gic_dist_writeb(opaque, addr, data);
+        break;
+    case 2:
+        gic_dist_writew(opaque, addr, data);
+        break;
+    case 4:
+        gic_dist_writel(opaque, addr, data);
+        break;
+    case 8:
+        gic_dist_writell(opaque, addr, data);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+}
+
+static const MemoryRegionOps gic_dist_ops = {
+    .read = gic_dist_read,
+    .write = gic_dist_write,
+    .impl = {
+         .min_access_size = 4,
+         .max_access_size = 8,
+     },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static uint64_t gic_its_readb(void *opaque, hwaddr offset)
+{
+    return 0;
+}
+
+static uint64_t gic_its_readw(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_readb(opaque, offset);
+    val |= gic_its_readb(opaque, offset + 1) << 8;
+    return val;
+}
+
+static uint64_t gic_its_readl(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_readw(opaque, offset);
+    val |= gic_its_readw(opaque, offset + 2) << 16;
+    return val;
+}
+
+static uint64_t gic_its_readll(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_readl(opaque, offset);
+    val |= gic_its_readl(opaque, offset + 4) << 32;
+    return val;
+}
+
+static void gic_its_writeb(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    gicv3_update(s);
+    return;
+}
+
+static void gic_its_writew(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_writeb(opaque, offset, value & 0xff);
+    gic_its_writeb(opaque, offset + 1, value >> 8);
+}
+
+static void gic_its_writel(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_writew(opaque, offset, value & 0xffff);
+    gic_its_writew(opaque, offset + 2, value >> 16);
+}
+
+static void gic_its_writell(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_writel(opaque, offset, value & 0xffffffff);
+    gic_its_writel(opaque, offset + 4, value >> 32);
+}
+
+static uint64_t gic_its_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t data;
+    switch (size) {
+    case 1:
+        data = gic_its_readb(opaque, addr);
+        break;
+    case 2:
+        data = gic_its_readw(opaque, addr);
+        break;
+    case 4:
+        data = gic_its_readl(opaque, addr);
+        break;
+    case 8:
+        data = gic_its_readll(opaque, addr);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    return data;
+}
+
+static void gic_its_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    switch (size) {
+    case 1:
+        gic_its_writeb(opaque, addr, data);
+        break;
+    case 2:
+        gic_its_writew(opaque, addr, data);
+        break;
+    case 4:
+        gic_its_writel(opaque, addr, data);
+        break;
+    case 8:
+        gic_its_writell(opaque, addr, data);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+}
+
+static const MemoryRegionOps gic_its_ops = {
+    .read = gic_its_read,
+    .write = gic_its_write,
+    .impl = {
+         .min_access_size = 4,
+         .max_access_size = 8,
+     },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static uint64_t gic_spi_readb(void *opaque, hwaddr offset)
+{
+    return 0;
+}
+
+static uint64_t gic_spi_readw(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_spi_readb(opaque, offset);
+    val |= gic_spi_readb(opaque, offset + 1) << 8;
+    return val;
+}
+
+static uint64_t gic_spi_readl(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_spi_readw(opaque, offset);
+    val |= gic_spi_readw(opaque, offset + 2) << 16;
+    return val;
+}
+
+static uint64_t gic_spi_readll(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_spi_readl(opaque, offset);
+    val |= gic_spi_readl(opaque, offset + 4) << 32;
+    return val;
+}
+
+static void gic_spi_writeb(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    gicv3_update(s);
+    return;
+}
+
+static void gic_spi_writew(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_spi_writeb(opaque, offset, value & 0xff);
+    gic_spi_writeb(opaque, offset + 1, value >> 8);
+}
+
+static void gic_spi_writel(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_spi_writew(opaque, offset, value & 0xffff);
+    gic_spi_writew(opaque, offset + 2, value >> 16);
+}
+
+static void gic_spi_writell(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_spi_writel(opaque, offset, value & 0xffffffff);
+    gic_spi_writel(opaque, offset + 4, value >> 32);
+}
+
+static uint64_t gic_spi_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t data;
+    switch (size) {
+    case 1:
+        data = gic_spi_readb(opaque, addr);
+        break;
+    case 2:
+        data = gic_spi_readw(opaque, addr);
+        break;
+    case 4:
+        data = gic_spi_readl(opaque, addr);
+        break;
+    case 8:
+        data = gic_spi_readll(opaque, addr);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    return data;
+}
+
+static void gic_spi_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    switch (size) {
+    case 1:
+        gic_spi_writeb(opaque, addr, data);
+        break;
+    case 2:
+        gic_spi_writew(opaque, addr, data);
+        break;
+    case 4:
+        gic_spi_writel(opaque, addr, data);
+        break;
+    case 8:
+        gic_spi_writell(opaque, addr, data);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+}
+
+static const MemoryRegionOps gic_spi_ops = {
+    .read = gic_spi_read,
+    .write = gic_spi_write,
+    .impl = {
+         .min_access_size = 4,
+         .max_access_size = 8,
+     },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+
+static uint64_t gic_its_cntrl_readb(void *opaque, hwaddr offset)
+{
+    GICState *s = (GICState *)opaque;
+    uint64_t res = 0;
+
+    if (offset < 0x100) {
+        if (offset == 0)
+            return 0;
+        if (offset == 4)
+            return 0;
+        if (offset < 0x08)
+            return s->num_cpu;
+        if (offset >= 0x80) {
+            return 0;
+        }
+        goto bad_reg;
+    }
+    return res;
+bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+    return 0;
+}
+
+static uint64_t gic_its_cntrl_readw(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_cntrl_readb(opaque, offset);
+    val |= gic_its_cntrl_readb(opaque, offset + 1) << 8;
+    return val;
+}
+
+static uint64_t gic_its_cntrl_readl(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_cntrl_readw(opaque, offset);
+    val |= gic_its_cntrl_readw(opaque, offset + 2) << 16;
+    return val;
+}
+
+static uint64_t gic_its_cntrl_readll(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_its_cntrl_readl(opaque, offset);
+    val |= gic_its_cntrl_readl(opaque, offset + 4) << 32;
+    return val;
+}
+
+static void gic_its_cntrl_writeb(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    if (offset < 0x100) {
+        if (offset < 0x08)
+            s->num_cpu = value;
+        else
+            goto bad_reg;
+    }
+    gicv3_update(s);
+    return;
+bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+}
+
+static void gic_its_cntrl_writew(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_cntrl_writeb(opaque, offset, value & 0xff);
+    gic_its_cntrl_writeb(opaque, offset + 1, value >> 8);
+}
+
+static void gic_its_cntrl_writel(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_cntrl_writew(opaque, offset, value & 0xffff);
+    gic_its_cntrl_writew(opaque, offset + 2, value >> 16);
+}
+
+static void gic_its_cntrl_writell(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_its_cntrl_writel(opaque, offset, value & 0xffffffff);
+    gic_its_cntrl_writel(opaque, offset + 4, value >> 32);
+}
+
+static uint64_t gic_its_cntrl_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t data;
+    switch (size) {
+    case 1:
+        data = gic_its_cntrl_readb(opaque, addr);
+        break;
+    case 2:
+        data = gic_its_cntrl_readw(opaque, addr);
+        break;
+    case 4:
+        data = gic_its_cntrl_readl(opaque, addr);
+        break;
+    case 8:
+        data = gic_its_cntrl_readll(opaque, addr);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    return data;
+}
+
+static void gic_its_cntrl_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+    DPRINTF("offset %p data %p\n", (void *) addr, (void *) data);
+    switch (size) {
+    case 1:
+        gic_its_cntrl_writeb(opaque, addr, data);
+        break;
+    case 2:
+        gic_its_cntrl_writew(opaque, addr, data);
+        break;
+    case 4:
+        gic_its_cntrl_writel(opaque, addr, data);
+        break;
+    case 8:
+        gic_its_cntrl_writell(opaque, addr, data);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+}
+
+static const MemoryRegionOps gic_its_cntrl_ops = {
+    .read = gic_its_cntrl_read,
+    .write = gic_its_cntrl_write,
+    .impl = {
+         .min_access_size = 4,
+         .max_access_size = 8,
+     },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+
+static uint64_t gic_lpi_readb(void *opaque, hwaddr offset)
+{
+    GICState *s = (GICState *)opaque;
+    uint64_t res = 0;
+    uint64_t sgi_ppi, core, off;
+    uint64_t cm, i;
+
+    /* Table 3-2 page 3-4
+     * [          1<bits-for-core#>x<16-bits-offset>]
+     * x = 0: LPIs
+     * x = 1: SGIs & PPIs
+     */
+    off = offset;
+    sgi_ppi = off & (1 << 16);
+    core = (off >> 17) & s->cpu_mask;
+    offset = off & 0xFFFF;
+    cm = 1ll << core;
+
+    if (sgi_ppi) {
+        /* SGIs, PPIs */
+        /* Interrupt Set/Clear Enable.  */
+        if (offset < 0x200) {
+            int irq;
+            if (offset < 0x180)
+                irq = (offset - 0x100) * 8;
+            else
+                irq = (offset - 0x180) * 8;
+            irq += GICV3_BASE_IRQ;
+            if (irq >= s->num_irq)
+                goto bad_reg;
+            res = 0;
+            for (i = 0; i < 8; i++) {
+                if (GIC_TEST_ENABLED(irq + i, cm)) {
+                    res |= (1 << i);
+                }
+            }
+        } else if (offset < 0xc00) {
+            /* Interrupt Priority.  */
+            int irq;
+            irq = (offset - 0x400) + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq)
+                goto bad_reg;
+            res = GIC_GET_PRIORITY(irq, core);
+        }
+    } else {
+        /* LPIs */
+        if (offset < 0x100) {
+            if (offset == 0) {/* GICR_CTLR */
+                DPRINTF("Redist-GICR_CTLR-CPU caller cpu(%d) core(%lu)\n",
+                    gic_get_current_cpu(s), core);
+                return 0;
+            }
+            if (offset == 4)
+                return 0x43B; /* GICR_IIDR: implementer is ARM */
+            if (offset == 0x8) { /* GICR_TYPER */
+                res = core << 8; /* Processor_Number (linear) */
+                /* Simple clustering */
+                res |= (core % 8) << 32; /* Affinity 0 */
+                res |= (core / 8) << 40; /* Affinity 1 */
+                if (core == s->num_cpu - 1) {
+                    /* Last redistributor */
+                    res |= GICR_TYPER_LAST;
+                }
+                return res;
+            }
+            if (offset == 0x14) { /* GICR_WAKER */
+                DPRINTF("Redist-CPU (%d) is enabled(%d)\n",
+                        gic_get_current_cpu(s), s->cpu_enabled[core]);
+                if (s->cpu_enabled[core])
+                    return 0;
+                else
+                    return GICR_WAKER_ProcessorSleep;
+            }
+            if (offset >= 0x80 && offset < 0xFFD0)
+                return 0;
+            goto bad_reg;
+        }
+        if (offset < 0xffd0) {
+            goto bad_reg;
+        } else /* offset >= 0xffd0 */ {
+            if (offset & 3) {
+                res = 0;
+            } else {
+                res = gic_lpi_ids[(offset - 0xffd0) >> 2];
+            }
+        }
+    }
+    return res;
+bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+    return 0;
+}
+
+static uint64_t gic_lpi_readw(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_lpi_readb(opaque, offset);
+    val |= gic_lpi_readb(opaque, offset + 1) << 8;
+    return val;
+}
+
+static uint64_t gic_lpi_readl(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_lpi_readw(opaque, offset);
+    val |= gic_lpi_readw(opaque, offset + 2) << 16;
+    return val;
+}
+
+static uint64_t gic_lpi_readll(void *opaque, hwaddr offset)
+{
+    uint64_t val;
+    val = gic_lpi_readl(opaque, offset);
+    val |= gic_lpi_readl(opaque, offset + 4) << 32;
+    return val;
+}
+
+
+static void gic_lpi_writeb(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    GICState *s = (GICState *)opaque;
+    uint64_t sgi_ppi, core, off;
+    uint64_t cm;
+    /* Table 3-2 page 3-4
+     * [          1<bits-for-core#>x<16-bits-offset>]
+     * x = 0: LPIs
+     * x = 1: SGIs & PPIs
+     */
+    off = offset;
+    sgi_ppi = off & (1 << 16);
+    core = (off >> 17) & s->cpu_mask;
+    offset = off & 0xFFFF;
+    cm = 1ll << core;
+
+    if (sgi_ppi) {
+        /* SGIs, PPIs */
+        if (offset < 0x180) {
+            /* Interrupt Set Enable.  */
+            int irq, i;
+            irq = (offset - 0x100) * 8 + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq)
+                goto bad_reg;
+            if (irq >= GICV3_INTERNAL) {
+                DPRINTF("ISENABLERn non Internal should be only in distributor %d\n", irq);
+                /* The registers after 0x100 are reserved */
+                return;
+            }
+            if (irq < GICV3_NR_SGIS) {
+                value = 0xff;
+            }
+
+            for (i = 0; i < 8; i++) {
+                if (value & (1 << i)) {
+                    /* This is the redistributor; ALL doesn't apply */
+                    if (!GIC_TEST_ENABLED(irq + i, cm)) {
+                        DPRINTF("Enabled IRQ %d\n", irq + i);
+                    }
+                    GIC_SET_ENABLED(irq + i, cm);
+                    /* If a raised level-triggered IRQ is enabled then mark
+                       it as pending.  */
+                    if (GIC_TEST_LEVEL(irq + i, cm)
+                            && !GIC_TEST_EDGE_TRIGGER(irq + i)) {
+                        DPRINTF("Set %d pending mask %lx\n", irq + i, cm);
+                        GIC_SET_PENDING(irq + i, cm);
+                    }
+                }
+            }
+        } else if (offset < 0x200) {
+            /* Interrupt Clear Enable.  */
+            int irq, i;
+            irq = (offset - 0x180) * 8 + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq)
+                goto bad_reg;
+            if (irq >= GICV3_INTERNAL) {
+                DPRINTF("ICENABLERn non Internal should be only in distributor %d\n", irq);
+                /* The registers after 0x180 are reserved */
+                return;
+            }
+            if (irq < GICV3_NR_SGIS) {
+                value = 0;
+            }
+
+            for (i = 0; i < 8; i++) {
+                if (value & (1 << i)) {
+                    /* This is the redistributor; ALL doesn't apply */
+                    if (GIC_TEST_ENABLED(irq + i, cm)) {
+                        DPRINTF("Disabled IRQ %d\n", irq + i);
+                    }
+                    GIC_CLEAR_ENABLED(irq + i, cm);
+                }
+            }
+        } else if (offset < 0xc00) {
+            /* Interrupt Priority. */
+            int irq;
+            irq = (offset - 0x400) + GICV3_BASE_IRQ;
+            if (irq >= s->num_irq)
+                goto bad_reg;
+            gicv3_set_priority(s, core, irq, value);
+        }
+    } else {
+        /* LPIs */
+        if (offset == 0x14) { /* GICR_WAKER */
+            if (value & GICR_WAKER_ProcessorSleep)
+                s->cpu_enabled[core] = 0;
+            else
+                s->cpu_enabled[core] = 1;
+            DPRINTF("Redist-CPU (%d) core(%lu) set enabled(%d)\n",
+                    gic_get_current_cpu(s), core, s->cpu_enabled[core]);
+        }
+    }
+    gicv3_update(s);
+    return;
+
+    bad_reg:
+    qemu_log_mask(LOG_GUEST_ERROR,
+                  "%s: Bad offset %x\n", __func__, (int)offset);
+}
+
+static void gic_lpi_writew(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_lpi_writeb(opaque, offset, value & 0xff);
+    gic_lpi_writeb(opaque, offset + 1, value >> 8);
+}
+
+static void gic_lpi_writel(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_lpi_writew(opaque, offset, value & 0xffff);
+    gic_lpi_writew(opaque, offset + 2, value >> 16);
+}
+
+static void gic_lpi_writell(void *opaque, hwaddr offset,
+                            uint64_t value)
+{
+    gic_lpi_writel(opaque, offset, value & 0xffffffff);
+    gic_lpi_writel(opaque, offset + 4, value >> 32);
+}
+
+static uint64_t gic_lpi_read(void *opaque, hwaddr addr, unsigned size)
+{
+    uint64_t data;
+    switch (size) {
+    case 1:
+        data = gic_lpi_readb(opaque, addr);
+        break;
+    case 2:
+        data = gic_lpi_readw(opaque, addr);
+        break;
+    case 4:
+        data = gic_lpi_readl(opaque, addr);
+        break;
+    case 8:
+        data = gic_lpi_readll(opaque, addr);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+//    DPRINTF("[%s] offset %p data %p\n", addr & (1 << 16) ? "SGI/PPI" : "LPI" , (void *) addr, (void *) data);
+    return data;
+}
+
+static void gic_lpi_write(void *opaque, hwaddr addr, uint64_t data, unsigned size)
+{
+//    DPRINTF("[%s] offset %p data %p\n", addr & (1 << 16) ? "SGI/PPI" : "LPI" , (void *) addr, (void *) data);
+    switch (size) {
+    case 1:
+        gic_lpi_writeb(opaque, addr, data);
+        break;
+    case 2:
+        gic_lpi_writew(opaque, addr, data);
+        break;
+    case 4:
+        gic_lpi_writel(opaque, addr, data);
+        break;
+    case 8:
+        gic_lpi_writell(opaque, addr, data);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: size %u\n", __func__, size);
+        assert(0);
+        break;
+    }
+}
+
+static const MemoryRegionOps  gic_lpi_ops = {
+    .read = gic_lpi_read,
+    .write = gic_lpi_write,
+    .impl = {
+         .min_access_size = 4,
+         .max_access_size = 8,
+     },
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+void gicv3_init_irqs_and_distributor(GICState *s, int num_irq)
+{
+    SysBusDevice *sbd = SYS_BUS_DEVICE(s);
+    int i;
+
+    DPRINTF(" ---- gicv3_init_irqs_and_distributor   ----- \n");
+    i = s->num_irq - GICV3_INTERNAL;
+    /* For the GIC, also expose incoming GPIO lines for PPIs for each CPU.
+     * GPIO array layout is thus:
+     *  [0..N-1] spi
+     *  [N..N+31] PPIs for CPU 0
+     *  [N+32..N+63] PPIs for CPU 1
+     *   ...
+     */
+    i += (GICV3_INTERNAL * s->num_cpu);
+    qdev_init_gpio_in(DEVICE(s), gic_set_irq, i);
+    for (i = 0; i < NUM_CPU(s); i++) {
+        sysbus_init_irq(sbd, &s->parent_irq[i]);
+    }
+
+    memory_region_init_io(&s->iomem_dist, OBJECT(s), &gic_dist_ops, s,
+                          "gic_dist", 0x10000);
+    memory_region_init_io(&s->iomem_spi, OBJECT(s), &gic_spi_ops, s,
+                          "gic_spi", 0x10000);
+    memory_region_init_io(&s->iomem_its_cntrl, OBJECT(s), &gic_its_cntrl_ops, s,
+                          "gic_its_cntrl", 0x10000);
+    memory_region_init_io(&s->iomem_its, OBJECT(s), &gic_its_ops, s,
+                          "gic_its_trans", 0x10000);
+    memory_region_init_io(&s->iomem_lpi, OBJECT(s), &gic_lpi_ops, s,
+                          "gic_lpi", 0x800000);
+}
+
+static void arm_gic_realize(DeviceState *dev, Error **errp)
+{
+    /* Device instance realize function for the GIC sysbus device */
+    int i;
+    GICState *s = ARM_GIC(dev);
+    SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
+    ARMGICClass *agc = ARM_GIC_GET_CLASS(s);
+    Error *local_err = NULL;
+    uint32_t power2;
+
+    agc->parent_realize(dev, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    gicv3_init_irqs_and_distributor(s, s->num_irq);
+
+    /* Tell the common code we're a GICv3 */
+    s->revision = REV_V3;
+
+    /* Compute mask for decoding the core number in the redistributor */
+    if (is_power_of_2(NUM_CPU(s)))
+        power2 = NUM_CPU(s);
+    else
+        /* QEMU only has pow2floor(); round up via pow2floor(2 * n) */
+        power2 = pow2floor(2 * NUM_CPU(s));
+    s->cpu_mask = (power2 - 1);
+
+    DPRINTF(" -- NUM_CPUS(%d) - cpu mask(0x%x) -- \n", NUM_CPU(s), s->cpu_mask);
+
+    for (i = 0; i < NUM_CPU(s); i++) {
+        s->backref[i] = s;
+    }
+
+    for (i = 0; i < GICV3_NCPU; i++)
+        s->cpu_enabled[i] = 0;
+
+    /* Init memory regions */
+    sysbus_init_mmio(sbd, &s->iomem_dist);
+    sysbus_init_mmio(sbd, &s->iomem_spi);
+    sysbus_init_mmio(sbd, &s->iomem_its_cntrl);
+    sysbus_init_mmio(sbd, &s->iomem_its);
+    sysbus_init_mmio(sbd, &s->iomem_lpi);
+}
+
+void armv8_gicv3_set_sgi(void *opaque, int cpuindex, uint64_t value)
+{
+    GICState *s = (GICState *) opaque;
+    int irq, i;
+    uint32_t target;
+    uint64_t cm = (1ll << cpuindex);
+
+    /* Page 2227 ICC_SGI1R_EL1 */
+
+    irq = (value >> 24) & 0xf;
+
+    /* The external routines use the hardware vector numbering, ie. the first
+     * IRQ is #16.  The internal GIC routines use #32 as the first IRQ.
+     */
+    if (irq >= 16)
+        irq += 16;
+
+    /* IRM bit */
+    if (value & (1ll << 40)) {
+        /* Send to all the cores */
+        for (i = 0; i < s->num_cpu; i++) {
+            s->sgi_state[irq].pending[i] |= cm;
+        }
+        GIC_SET_PENDING(irq, ALL_CPU_MASK);
+        DPRINTF("cpu(%d) sends irq(%d) to ALL\n", cpuindex, irq);
+    } else {
+        /* Find the linear index of the first core in the cluster. See page
+         * 2227 ICC_SGI1R_EL1. With our GIC-500 implementation we can have
+         * 16 clusters of 8 CPUs each.
+         */
+#if 1
+        target = (value & (0xfULL << 16)) >> (16 - 3); /* shift 16 mult by 8 */
+#else
+        /* Prep for more advanced GIC */
+        target  = (value & (0xffULL << 16)) >> (16 - 8);
+        target |= (value & (0xffULL << 32)) >> (32 - 16);
+        target |= (value & (0xffULL << 48)) >> (48 - 24);
+#endif
+
+        /* Use 8 and not 16 since only 8 cores can be in a cluster of GIC-500 */
+        assert((value & 0xff00) == 0);
+        for (i = 0; i < 8; i++) {
+            if (value & (1 << i)) {
+                //DPRINTF("cpu(%d) sends irq(%d) to cpu(%d)\n", cpuindex, irq, target + i);
+                s->sgi_state[irq].pending[target + i] |= cm;
+                GIC_SET_PENDING(irq, (1ll << (target + i)));
+            }
+        }
+    }
+    gicv3_update(s);
+}
+
+uint64_t armv8_gicv3_acknowledge_irq(void *opaque, int cpuindex)
+{
+    GICState *s = (GICState *) opaque;
+    return gicv3_acknowledge_irq(s, cpuindex);
+}
+
+void armv8_gicv3_complete_irq(void *opaque, int cpuindex, int irq)
+{
+    GICState *s = (GICState *) opaque;
+    irq &= 0xffffff;
+    gicv3_complete_irq(s, cpuindex, irq);
+}
+
+uint64_t armv8_gicv3_get_priority_mask(void *opaque, int cpuindex)
+{
+    GICState *s = (GICState *) opaque;
+    return s->priority_mask[cpuindex];
+}
+
+void armv8_gicv3_set_priority_mask(void *opaque, int cpuindex, uint32_t mask)
+{
+    GICState *s = (GICState *) opaque;
+    s->priority_mask[cpuindex] = mask & 0xff;
+}
+
+static void arm_gicv3_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    ARMGICClass *agc = ARM_GIC_CLASS(klass);
+
+    agc->parent_realize = dc->realize;
+    dc->realize = arm_gic_realize;
+}
+
+static const TypeInfo arm_gicv3_info = {
+    .name = TYPE_ARM_GICV3,
+    .parent = TYPE_ARM_GICV3_COMMON,
+    .instance_size = sizeof(GICState),
+    .class_init = arm_gicv3_class_init,
+    .class_size = sizeof(ARMGICClass),
+};
+
+static void arm_gicv3_register_types(void)
+{
+    type_register_static(&arm_gicv3_info);
+}
+
+type_init(arm_gicv3_register_types)
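Reviewer note, not part of the patch: the redistributor handlers in arm_gicv3.c split the incoming offset into a core number, a frame selector (bit 16: 0 for the LPI/RD frame, 1 for the SGI/PPI frame) and a 16-bit register offset, and arm_gic_realize() rounds the CPU count up to a power of two to build cpu_mask. The standalone sketch below mirrors that decoding so it can be checked in isolation. The names redist_cpu_mask(), redist_decode() and RedistAccess are hypothetical and exist only for this illustration; the sketch also assumes at least one CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask covering core numbers 0..num_cpu-1, i.e.
 * (next power of two >= num_cpu) - 1. Assumes num_cpu >= 1. */
static uint32_t redist_cpu_mask(uint32_t num_cpu)
{
    uint32_t power2 = 1;

    while (power2 < num_cpu) {
        power2 <<= 1;
    }
    return power2 - 1;
}

/* Hypothetical container for one decoded redistributor access. */
typedef struct RedistAccess {
    uint64_t core;    /* which core's redistributor is addressed */
    bool sgi_ppi;     /* true: SGI/PPI frame, false: LPI (RD) frame */
    uint32_t reg;     /* register offset inside the 64KB frame */
} RedistAccess;

/* Split a region offset the same way gic_lpi_readb()/gic_lpi_writeb() do:
 * bit 16 selects the frame, bits above it select the core. */
static RedistAccess redist_decode(uint64_t offset, uint32_t cpu_mask)
{
    RedistAccess a;

    a.sgi_ppi = (offset & (1u << 16)) != 0;
    a.core = (offset >> 17) & cpu_mask;
    a.reg = (uint32_t)(offset & 0xffff);
    return a;
}
```

With six CPUs the mask is 7, so an access at (3 << 17) | (1 << 16) | 0x100 decodes to core 3, the SGI/PPI frame, register 0x100 (the set-enable register).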
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
new file mode 100644
index 0000000..5875710
--- /dev/null
+++ b/hw/intc/arm_gicv3_common.c
@@ -0,0 +1,188 @@ 
+/*
+ * ARM GIC support - common bits of emulated and KVM kernel model
+ *
+ * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "gicv3_internal.h"
+
+static void gicv3_pre_save(void *opaque)
+{
+    GICState *s = (GICState *)opaque;
+    ARMGICCommonClass *c = ARM_GIC_COMMON_GET_CLASS(s);
+
+    if (c->pre_save) {
+        c->pre_save(s);
+    }
+}
+
+static int gicv3_post_load(void *opaque, int version_id)
+{
+    GICState *s = (GICState *)opaque;
+    ARMGICCommonClass *c = ARM_GIC_COMMON_GET_CLASS(s);
+
+    if (c->post_load) {
+        c->post_load(s);
+    }
+    return 0;
+}
+
+static const VMStateDescription vmstate_gicv3_irq_state = {
+    .name = "arm_gicv3_irq_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(enabled, gicv3_irq_state),
+        VMSTATE_UINT64(pending, gicv3_irq_state),
+        VMSTATE_UINT64(active, gicv3_irq_state),
+        VMSTATE_UINT64(level, gicv3_irq_state),
+        VMSTATE_BOOL(model, gicv3_irq_state),
+        VMSTATE_BOOL(edge_trigger, gicv3_irq_state),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_gicv3_sgi_state = {
+    .name = "arm_gicv3_sgi_state",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64_ARRAY(pending, gicv3_sgi_state, GICV3_NCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_gicv3 = {
+    .name = "arm_gicv3",
+    .version_id = 7,
+    .minimum_version_id = 7,
+    .pre_save = gicv3_pre_save,
+    .post_load = gicv3_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(enabled, GICState),
+        VMSTATE_BOOL_ARRAY(cpu_enabled, GICState, GICV3_NCPU),
+        VMSTATE_STRUCT_ARRAY(irq_state, GICState, GICV3_MAXIRQ, 1,
+                             vmstate_gicv3_irq_state, gicv3_irq_state),
+        VMSTATE_UINT64_ARRAY(irq_target, GICState, GICV3_MAXIRQ),
+        VMSTATE_UINT8_2DARRAY(priority1, GICState, GICV3_INTERNAL, GICV3_NCPU),
+        VMSTATE_UINT8_ARRAY(priority2, GICState, GICV3_MAXIRQ - GICV3_INTERNAL),
+        VMSTATE_UINT16_2DARRAY(last_active, GICState, GICV3_MAXIRQ, GICV3_NCPU),
+        VMSTATE_STRUCT_ARRAY(sgi_state, GICState, GICV3_NR_SGIS, 1,
+                             vmstate_gicv3_sgi_state, gicv3_sgi_state),
+        VMSTATE_UINT16_ARRAY(priority_mask, GICState, GICV3_NCPU),
+        VMSTATE_UINT16_ARRAY(running_irq, GICState, GICV3_NCPU),
+        VMSTATE_UINT16_ARRAY(running_priority, GICState, GICV3_NCPU),
+        VMSTATE_UINT16_ARRAY(current_pending, GICState, GICV3_NCPU),
+        VMSTATE_UINT8_ARRAY(bpr, GICState, GICV3_NCPU),
+        VMSTATE_UINT8_ARRAY(abpr, GICState, GICV3_NCPU),
+        VMSTATE_UINT32_2DARRAY(apr, GICState, GIC_NR_APRS, GICV3_NCPU),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void arm_gicv3_common_realize(DeviceState *dev, Error **errp)
+{
+    GICState *s = ARM_GIC_COMMON(dev);
+    int num_irq = s->num_irq;
+
+    if (s->num_cpu > GICV3_NCPU) {
+        error_setg(errp, "requested %u CPUs exceeds GIC maximum %d",
+                   s->num_cpu, GICV3_NCPU);
+        return;
+    }
+    s->num_irq += GICV3_BASE_IRQ;
+    if (s->num_irq > GICV3_MAXIRQ) {
+        error_setg(errp,
+                   "requested %u interrupt lines exceeds GIC maximum %d",
+                   num_irq, GICV3_MAXIRQ);
+        return;
+    }
+    /* ITLinesNumber is represented as (N / 32) - 1 (see
+     * gic_dist_readb) so this is an implementation imposed
+     * restriction, not an architectural one:
+     */
+    if (s->num_irq < 32 || (s->num_irq % 32)) {
+        error_setg(errp,
+                   "%d interrupt lines unsupported: not divisible by 32",
+                   num_irq);
+        return;
+    }
+}
+
+static void arm_gicv3_common_reset(DeviceState *dev)
+{
+    GICState *s = ARM_GIC_COMMON(dev);
+    int i;
+    memset(s->irq_state, 0, GICV3_MAXIRQ * sizeof(gicv3_irq_state));
+    for (i = 0; i < s->num_cpu; i++) {
+        s->priority_mask[i] = 0;
+        s->current_pending[i] = 1023;
+        s->running_irq[i] = 1023;
+        s->running_priority[i] = 0x100;
+        s->cpu_enabled[i] = false;
+    }
+    for (i = 0; i < GICV3_NR_SGIS; i++) {
+        GIC_SET_ENABLED(i, ALL_CPU_MASK);
+        GIC_SET_EDGE_TRIGGER(i);
+    }
+    if (s->num_cpu == 1) {
+        /* For uniprocessor GICs all interrupts always target the sole CPU */
+        for (i = 0; i < GICV3_MAXIRQ; i++) {
+            s->irq_target[i] = 1;
+        }
+    }
+    s->enabled = false;
+}
+
+static Property arm_gicv3_common_properties[] = {
+    DEFINE_PROP_UINT32("num-cpu", GICState, num_cpu, 1),
+    DEFINE_PROP_UINT32("num-irq", GICState, num_irq, 32),
+    /* Revision is 3 for the GICv3 architecture. The special values
+     * 0 (legacy 11MPCore GIC) and 0xffffffff ("not a GIC but an NVIC")
+     * are inherited from the GICv2 model.
+     */
+    DEFINE_PROP_UINT32("revision", GICState, revision, 3),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void arm_gicv3_common_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+
+    dc->reset = arm_gicv3_common_reset;
+    dc->realize = arm_gicv3_common_realize;
+    dc->props = arm_gicv3_common_properties;
+    dc->vmsd = &vmstate_gicv3;
+}
+
+static const TypeInfo arm_gicv3_common_type = {
+    .name = TYPE_ARM_GICV3_COMMON,
+    .parent = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(GICState),
+    .class_size = sizeof(ARMGICCommonClass),
+    .class_init = arm_gicv3_common_class_init,
+    .abstract = true,
+};
+
+static void register_types(void)
+{
+    type_register_static(&arm_gicv3_common_type);
+}
+
+type_init(register_types)
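Reviewer note, not part of the patch: arm_gicv3_common_realize() accepts an interrupt count only if it is at least 32, a multiple of 32 and no larger than GICV3_MAXIRQ, because ITLinesNumber is encoded as (N / 32) - 1. The sketch below restates that rule on its own. SKETCH_MAXIRQ, irq_count_supported() and it_lines_number() are hypothetical names for illustration, and GICV3_BASE_IRQ is taken as 0, as in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors GICV3_MAXIRQ from the patch; renamed to mark it as a sketch. */
#define SKETCH_MAXIRQ 1020

/* True if the realize checks in the patch would accept this line count. */
static bool irq_count_supported(uint32_t num_irq)
{
    return num_irq >= 32 && num_irq <= SKETCH_MAXIRQ && (num_irq % 32) == 0;
}

/* ITLinesNumber encoding, valid only for counts accepted above. */
static uint32_t it_lines_number(uint32_t num_irq)
{
    return num_irq / 32 - 1;
}
```

Note that 1020 itself is rejected because it is not a multiple of 32; the largest count these checks accept is 992.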
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
new file mode 100644
index 0000000..fae2b8c
--- /dev/null
+++ b/hw/intc/gicv3_internal.h
@@ -0,0 +1,153 @@ 
+/*
+ * ARM GIC support - internal interfaces
+ *
+ * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef QEMU_ARM_GICV3_INTERNAL_H
+#define QEMU_ARM_GICV3_INTERNAL_H
+
+#include "hw/intc/arm_gicv3.h"
+
+#define ALL_CPU_MASK ((uint64_t) (0xffffffffffffffff))
+
+/* The NVIC has 16 internal vectors.  However these are not exposed
+   through the normal GIC interface.  */
+#define GICV3_BASE_IRQ (0)
+
+#define GIC_SET_ENABLED(irq, cm) s->irq_state[irq].enabled |= (cm)
+#define GIC_CLEAR_ENABLED(irq, cm) s->irq_state[irq].enabled &= ~(cm)
+#define GIC_TEST_ENABLED(irq, cm) ((s->irq_state[irq].enabled & (cm)) != 0)
+#define GIC_SET_PENDING(irq, cm) s->irq_state[irq].pending |= (cm)
+#define GIC_TEST_PENDING(irq, cm) ((s->irq_state[irq].pending & (cm)) != 0)
+#define GIC_CLEAR_PENDING(irq, cm) s->irq_state[irq].pending &= ~(cm)
+#define GIC_SET_ACTIVE(irq, cm) s->irq_state[irq].active |= (cm)
+#define GIC_CLEAR_ACTIVE(irq, cm) s->irq_state[irq].active &= ~(cm)
+#define GIC_TEST_ACTIVE(irq, cm) ((s->irq_state[irq].active & (cm)) != 0)
+#define GIC_SET_MODEL(irq) s->irq_state[irq].model = true
+#define GIC_CLEAR_MODEL(irq) s->irq_state[irq].model = false
+#define GIC_TEST_MODEL(irq) s->irq_state[irq].model
+#define GIC_SET_LEVEL(irq, cm) s->irq_state[irq].level |= (cm)
+#define GIC_CLEAR_LEVEL(irq, cm) s->irq_state[irq].level &= ~(cm)
+#define GIC_TEST_LEVEL(irq, cm) ((s->irq_state[irq].level & (cm)) != 0)
+#define GIC_SET_EDGE_TRIGGER(irq) s->irq_state[irq].edge_trigger = true
+#define GIC_CLEAR_EDGE_TRIGGER(irq) s->irq_state[irq].edge_trigger = false
+#define GIC_TEST_EDGE_TRIGGER(irq) (s->irq_state[irq].edge_trigger)
+#define GIC_GET_PRIORITY(irq, cpu) (((irq) < GICV3_INTERNAL) ?          \
+                                    s->priority1[irq][cpu] :            \
+                                    s->priority2[(irq) - GICV3_INTERNAL])
+#define GIC_TARGET(irq) s->irq_target[irq]
+
+/* The special cases for the revision property: */
+#define REV_11MPCORE 0
+#define REV_V3 3
+#define REV_NVIC 0xffffffff
+
+uint32_t gicv3_acknowledge_irq(GICState *s, int cpu);
+void gicv3_complete_irq(GICState *s, int cpu, int irq);
+void gicv3_update(GICState *s);
+void gicv3_init_irqs_and_distributor(GICState *s, int num_irq);
+void gicv3_set_priority(GICState *s, int cpu, int irq, uint8_t val);
+
+static inline bool gic_test_pending(GICState *s, int irq, uint64_t cm)
+{
+    /* Edge-triggered interrupts are marked pending on a rising edge, but
+     * level-triggered interrupts are either considered pending when the
+     * level is active or if software has explicitly written to
+     * GICD_ISPENDR to set the state pending.
+     */
+    return (s->irq_state[irq].pending & cm) ||
+        (!GIC_TEST_EDGE_TRIGGER(irq) && GIC_TEST_LEVEL(irq, cm));
+}
+
+
+#define GICD_CTLR            0x0000
+#define GICD_TYPER           0x0004
+#define GICD_IIDR            0x0008
+#define GICD_STATUSR         0x0010
+#define GICD_SETSPI_NSR      0x0040
+#define GICD_CLRSPI_NSR      0x0048
+#define GICD_SETSPI_SR       0x0050
+#define GICD_CLRSPI_SR       0x0058
+#define GICD_SEIR            0x0068
+#define GICD_ISENABLER       0x0100
+#define GICD_ICENABLER       0x0180
+#define GICD_ISPENDR         0x0200
+#define GICD_ICPENDR         0x0280
+#define GICD_ISACTIVER       0x0300
+#define GICD_ICACTIVER       0x0380
+#define GICD_IPRIORITYR      0x0400
+#define GICD_ICFGR           0x0C00
+#define GICD_IROUTER         0x6000
+#define GICD_PIDR2           0xFFE8
+
+#define GICD_CTLR_RWP           (1U << 31)
+#define GICD_CTLR_ARE_NS        (1U << 4)
+#define GICD_CTLR_ENABLE_G1A    (1U << 1)
+#define GICD_CTLR_ENABLE_G1     (1U << 0)
+
+#define GICD_IROUTER_SPI_MODE_ONE    (0U << 31)
+#define GICD_IROUTER_SPI_MODE_ANY    (1U << 31)
+
+#define GIC_PIDR2_ARCH_MASK   0xf0
+#define GIC_PIDR2_ARCH_GICv3  0x30
+#define GIC_PIDR2_ARCH_GICv4  0x40
+
+/*
+ * Re-Distributor registers, offsets from RD_base
+ */
+#define GICR_CTLR             GICD_CTLR
+#define GICR_IIDR             0x0004
+#define GICR_TYPER            0x0008
+#define GICR_STATUSR          GICD_STATUSR
+#define GICR_WAKER            0x0014
+#define GICR_SETLPIR          0x0040
+#define GICR_CLRLPIR          0x0048
+#define GICR_SEIR             GICD_SEIR
+#define GICR_PROPBASER        0x0070
+#define GICR_PENDBASER        0x0078
+#define GICR_INVLPIR          0x00A0
+#define GICR_INVALLR          0x00B0
+#define GICR_SYNCR            0x00C0
+#define GICR_MOVLPIR          0x0100
+#define GICR_MOVALLR          0x0110
+#define GICR_PIDR2            GICD_PIDR2
+
+#define GICR_WAKER_ProcessorSleep    (1U << 1)
+#define GICR_WAKER_ChildrenAsleep    (1U << 2)
+
+/*
+ * Re-Distributor registers, offsets from SGI_base
+ */
+#define GICR_ISENABLER0         GICD_ISENABLER
+#define GICR_ICENABLER0         GICD_ICENABLER
+#define GICR_ISPENDR0           GICD_ISPENDR
+#define GICR_ICPENDR0           GICD_ICPENDR
+#define GICR_ISACTIVER0         GICD_ISACTIVER
+#define GICR_ICACTIVER0         GICD_ICACTIVER
+#define GICR_IPRIORITYR0        GICD_IPRIORITYR
+#define GICR_ICFGR0             GICD_ICFGR
+
+#define GICR_TYPER_VLPIS        (1U << 1)
+#define GICR_TYPER_LAST         (1U << 4)
+
+
+
+
+#endif /* !QEMU_ARM_GICV3_INTERNAL_H */
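Reviewer note, not part of the patch: the GIC_* macros in gicv3_internal.h keep one bit per CPU in 64-bit words, and gic_test_pending() treats a level-triggered interrupt as pending whenever its line is high, even if the latched pending bit is clear. The sketch below reproduces that rule without the QEMU types. SketchIrqState and sketch_test_pending() are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down stand-in for gicv3_irq_state: one bit per CPU in each word. */
typedef struct SketchIrqState {
    uint64_t pending;   /* latched pending bits */
    uint64_t level;     /* current input line level */
    bool edge_trigger;  /* true: edge-triggered, false: level-triggered */
} SketchIrqState;

/* Same rule as gic_test_pending(): latched pending always counts, and a
 * level-triggered interrupt is also pending while its line is high. */
static bool sketch_test_pending(const SketchIrqState *st, uint64_t cm)
{
    return (st->pending & cm) != 0 ||
           (!st->edge_trigger && (st->level & cm) != 0);
}
```

Here cm is the CPU mask, for example 1ULL << 5 for CPU 5, which is why the patch is limited to 64 cores until the state is converted to bitops.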
diff --git a/include/hw/intc/arm_gicv3.h b/include/hw/intc/arm_gicv3.h
new file mode 100644
index 0000000..e315bda
--- /dev/null
+++ b/include/hw/intc/arm_gicv3.h
@@ -0,0 +1,44 @@ 
+/*
+ * ARM GIC support
+ *
+ * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_GICV3_H
+#define HW_ARM_GICV3_H
+
+#include "arm_gicv3_common.h"
+
+#define TYPE_ARM_GICV3 "arm_gicv3"
+#define ARM_GIC(obj) \
+     OBJECT_CHECK(GICState, (obj), TYPE_ARM_GICV3)
+#define ARM_GIC_CLASS(klass) \
+     OBJECT_CLASS_CHECK(ARMGICClass, (klass), TYPE_ARM_GICV3)
+#define ARM_GIC_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(ARMGICClass, (obj), TYPE_ARM_GICV3)
+
+typedef struct ARMGICClass {
+    /*< private >*/
+    ARMGICCommonClass parent_class;
+    /*< public >*/
+
+    DeviceRealize parent_realize;
+} ARMGICClass;
+
+#endif
diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
new file mode 100644
index 0000000..c02b4cb
--- /dev/null
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -0,0 +1,136 @@ 
+/*
+ * ARM GIC support
+ *
+ * Copyright (c) 2012 Linaro Limited
+ * Copyright (c) 2015 Huawei.
+ * Written by Peter Maydell
+ * Extended to 64 cores by Shlomo Pongratz
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_ARM_GICV3_COMMON_H
+#define HW_ARM_GICV3_COMMON_H
+
+#include "hw/sysbus.h"
+
+/* Maximum number of possible interrupts, determined by the GIC architecture */
+#define GICV3_MAXIRQ 1020
+/* First 32 are private to each CPU (SGIs and PPIs). */
+#define GICV3_INTERNAL 32
+#define GICV3_NR_SGIS 16
+#define GICV3_NCPU 64
+
+#define MAX_NR_GROUP_PRIO 128
+#define GIC_NR_APRS (MAX_NR_GROUP_PRIO / 32)
+
+typedef struct gicv3_irq_state {
+    /* The enable bits are only banked for per-cpu interrupts.  */
+    uint64_t enabled;
+    uint64_t pending;
+    uint64_t active;
+    uint64_t level;
+    bool model; /* 0 = N:N, 1 = 1:N */
+    bool edge_trigger; /* true: edge-triggered, false: level-triggered  */
+} gicv3_irq_state;
+
+typedef struct gicv3_sgi_state {
+    uint64_t pending[GICV3_NCPU];
+} gicv3_sgi_state;
+
+typedef struct GICState {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+
+    qemu_irq parent_irq[GICV3_NCPU];
+    bool enabled;
+    bool cpu_enabled[GICV3_NCPU];
+
+    gicv3_irq_state irq_state[GICV3_MAXIRQ];
+    uint64_t irq_target[GICV3_MAXIRQ];
+    uint8_t priority1[GICV3_INTERNAL][GICV3_NCPU];
+    uint8_t priority2[GICV3_MAXIRQ - GICV3_INTERNAL];
+    uint16_t last_active[GICV3_MAXIRQ][GICV3_NCPU];
+    /* For each SGI on the target CPU, we store a 64-bit mask
+     * indicating which source CPUs have made this SGI
+     * pending on the target CPU. These play the role of
+     * the bytes in the GIC_SPENDSGIR* registers of the GICv2
+     * model, widened to one bit per possible source CPU.
+     */
+    gicv3_sgi_state sgi_state[GICV3_NR_SGIS];
+
+    uint16_t priority_mask[GICV3_NCPU];
+    uint16_t running_irq[GICV3_NCPU];
+    uint16_t running_priority[GICV3_NCPU];
+    uint16_t current_pending[GICV3_NCPU];
+
+    /* We present the GICv2 without security extensions to a guest and
+     * therefore the guest can configure the GICC_CTLR to configure group 1
+     * binary point in the abpr.
+     */
+    uint8_t  bpr[GICV3_NCPU];
+    uint8_t  abpr[GICV3_NCPU];
+
+    /* The APR is implementation defined, so we choose a layout identical to
+     * the KVM ABI layout for QEMU's implementation of the gic:
+     * If an interrupt for preemption level X is active, then
+     *   APRn[X mod 32] == 0b1,  where n = X / 32
+     * otherwise the bit is clear.
+     *
+     * TODO: rewrite the interrupt acknowledge/complete routines to use
+     * the APR registers to track the necessary information to update
+     * s->running_priority[] on interrupt completion (ie completely remove
+     * last_active[][] and running_irq[]). This will be necessary if we ever
+     * want to support TCG<->KVM migration, or TCG guests which can
+     * do power management involving powering down and restarting
+     * the GIC.
+     */
+    uint32_t apr[GIC_NR_APRS][GICV3_NCPU];
+
+    uint32_t cpu_mask; /* For redistributor */
+    uint32_t num_cpu;
+    uint32_t num_processor;
+    MemoryRegion iomem_dist; /* Distributor */
+    MemoryRegion iomem_spi;
+    MemoryRegion iomem_its_cntrl;
+    MemoryRegion iomem_its;
+    MemoryRegion iomem_lpi; /* Redistributor */
+    /* This is just so we can have an opaque pointer which identifies
+     * both this GIC and which CPU interface we should be accessing.
+     */
+    struct GICState *backref[GICV3_NCPU];
+    uint32_t num_irq;
+    uint32_t revision;
+    int dev_fd; /* kvm device fd if backed by kvm vgic support */
+} GICState;
+
+#define TYPE_ARM_GICV3_COMMON "arm_gicv3_common"
+#define ARM_GIC_COMMON(obj) \
+     OBJECT_CHECK(GICState, (obj), TYPE_ARM_GICV3_COMMON)
+#define ARM_GIC_COMMON_CLASS(klass) \
+     OBJECT_CLASS_CHECK(ARMGICCommonClass, (klass), TYPE_ARM_GICV3_COMMON)
+#define ARM_GIC_COMMON_GET_CLASS(obj) \
+     OBJECT_GET_CLASS(ARMGICCommonClass, (obj), TYPE_ARM_GICV3_COMMON)
+
+typedef struct ARMGICCommonClass {
+    /*< private >*/
+    SysBusDeviceClass parent_class;
+    /*< public >*/
+
+    void (*pre_save)(GICState *s);
+    void (*post_load)(GICState *s);
+} ARMGICCommonClass;
+
+#endif
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 986f04c..5603a82 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -278,6 +278,7 @@  static void arm_cpu_set_irq(void *opaque, int irq, int level)
     ARMCPU *cpu = opaque;
     CPUARMState *env = &cpu->env;
     CPUState *cs = CPU(cpu);
+
     static const int mask[] = {
         [ARM_CPU_IRQ] = CPU_INTERRUPT_HARD,
         [ARM_CPU_FIQ] = CPU_INTERRUPT_FIQ,
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 11845a6..6ea99ea 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -1013,6 +1013,12 @@  void armv7m_nvic_set_pending(void *opaque, int irq);
 int armv7m_nvic_acknowledge_irq(void *opaque);
 void armv7m_nvic_complete_irq(void *opaque, int irq);

+void armv8_gicv3_set_sgi(void *opaque, int cpuindex, uint64_t value);
+uint64_t armv8_gicv3_acknowledge_irq(void *opaque, int cpuindex);
+void armv8_gicv3_complete_irq(void *opaque, int cpuindex, int irq);
+uint64_t armv8_gicv3_get_priority_mask(void *opaque, int cpuindex);
+void armv8_gicv3_set_priority_mask(void *opaque, int cpuindex, uint32_t mask);
+
 /* Interface for defining coprocessor registers.
  * Registers are defined in tables of arm_cp_reginfo structs
  * which are passed to define_arm_cp_regs().
diff --git a/target-arm/cpu64.c b/target-arm/cpu64.c
index 823c739..890e358 100644
--- a/target-arm/cpu64.c
+++ b/target-arm/cpu64.c
@@ -45,6 +45,62 @@  static uint64_t a57_l2ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 }
 #endif

+#ifndef CONFIG_USER_ONLY
+static void sgi_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+{
+    /* Write to ICC_SGI1R_EL1: ask the GIC to generate an SGI */
+    CPUState *cpu = ENV_GET_CPU(env);
+    armv8_gicv3_set_sgi(env->nvic, cpu->cpu_index, value);
+}
+
+static uint64_t iar_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /* Read of ICC_IAR1_EL1: acknowledge the pending interrupt via the GIC */
+    uint64_t value;
+    CPUState *cpu = ENV_GET_CPU(env);
+    value = armv8_gicv3_acknowledge_irq(env->nvic, cpu->cpu_index);
+    return value;
+}
+
+static uint32_t sre = 1;
+static void sre_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+{
+    /* Write to ICC_SRE_EL1: latch the low 32 bits of the written value */
+    sre = value & 0xFFFFFFFF;
+}
+
+static uint64_t sre_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /* Read of ICC_SRE_EL1: return the latched value */
+    uint64_t value;
+    value = sre;
+    return value;
+}
+
+static void eoir_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+{
+    /* Write to ICC_EOIR1_EL1: signal end of interrupt to the GIC */
+    CPUState *cpu = ENV_GET_CPU(env);
+    armv8_gicv3_complete_irq(env->nvic, cpu->cpu_index, value);
+}
+
+static uint64_t pmr_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+    /* Read of ICC_PMR_EL1: fetch this CPU's priority mask from the GIC */
+    uint64_t value;
+    CPUState *cpu = ENV_GET_CPU(env);
+    value = armv8_gicv3_get_priority_mask(env->nvic, cpu->cpu_index);
+    return value;
+}
+
+static void pmr_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
+{
+    /* Write to ICC_PMR_EL1: update this CPU's priority mask in the GIC */
+    CPUState *cpu = ENV_GET_CPU(env);
+    armv8_gicv3_set_priority_mask(env->nvic, cpu->cpu_index, value);
+}
+#endif
+
 static const ARMCPRegInfo cortexa57_cp_reginfo[] = {
 #ifndef CONFIG_USER_ONLY
     { .name = "L2CTLR_EL1", .state = ARM_CP_STATE_AA64,
@@ -89,6 +145,42 @@  static const ARMCPRegInfo cortexa57_cp_reginfo[] = {
     { .name = "L2MERRSR",
       .cp = 15, .opc1 = 3, .crm = 15,
       .access = PL1_RW, .type = ARM_CP_CONST | ARM_CP_64BIT, .resetvalue = 0 },
+    { .name = "EOIR1_EL1", .state = ARM_CP_STATE_AA64,
+#ifndef CONFIG_USER_ONLY
+      .writefn = eoir_write,
+#endif
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 1,
+      .access = PL1_W, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
+    { .name = "IAR1_EL1", .state = ARM_CP_STATE_AA64,
+#ifndef CONFIG_USER_ONLY
+      .readfn = iar_read,
+#endif
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 0,
+      .access = PL1_R, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
+    { .name = "SGI1R_EL1", .state = ARM_CP_STATE_AA64,
+#ifndef CONFIG_USER_ONLY
+      .writefn = sgi_write,
+#endif
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 11, .opc2 = 5,
+      .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
+    { .name = "PMR_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 4, .crm = 6, .opc2 = 0,
+#ifndef CONFIG_USER_ONLY
+      .readfn = pmr_read, .writefn = pmr_write,
+#endif
+      .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
+    { .name = "CTLR_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 4,
+      .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
+    { .name = "SRE_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 5, .resetvalue = 1, /* Use system registers */
+#ifndef CONFIG_USER_ONLY
+      .readfn = sre_read, .writefn = sre_write,
+#endif
+      .access = PL1_RW, .type = ARM_CP_SPECIAL },
+    { .name = "IGRPEN1_EL1", .state = ARM_CP_STATE_AA64,
+      .opc0 = 3, .opc1 = 0, .crn = 12, .crm = 12, .opc2 = 7,
+      .access = PL1_RW, .type = ARM_CP_SPECIAL, .resetvalue = 0 },
     REGINFO_SENTINEL
 };

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 3bc20af..3733201 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2026,11 +2026,19 @@  static const ARMCPRegInfo strongarm_cp_reginfo[] = {
 static uint64_t mpidr_read(CPUARMState *env, const ARMCPRegInfo *ri)
 {
     CPUState *cs = CPU(arm_env_get_cpu(env));
-    uint32_t mpidr = cs->cpu_index;
-    /* We don't support setting cluster ID ([8..11]) (known as Aff1
+    uint32_t mpidr, aff0, aff1;
+    uint32_t cpuid = cs->cpu_index;
+    /* We don't support setting cluster ID ([16..23]) (known as Aff2
      * in later ARM ARM versions), or any of the higher affinity level fields,
      * so these bits always RAZ.
      */
+    /* Currently the GIC-500 code supports 64 cores, grouped into clusters
+     * of 8 cores each. Future code will remove this limitation.
+     * This code is valid for the GIC-400 too.
+     */
+    aff0 = cpuid % 8;
+    aff1 = cpuid / 8;
+    mpidr = (aff1 << 8) | aff0;
     if (arm_feature(env, ARM_FEATURE_V7MP)) {
         mpidr |= (1U << 31);
         /* Cores which are uniprocessor (non-coherent)
diff --git a/target-arm/psci.c b/target-arm/psci.c
index d8fafab..0d45d66 100644
--- a/target-arm/psci.c
+++ b/target-arm/psci.c
@@ -86,6 +86,7 @@  void arm_handle_psci_call(ARMCPU *cpu)
     CPUARMState *env = &cpu->env;
     uint64_t param[4];
     uint64_t context_id, mpidr;
+    uint32_t core, Aff1, Aff0;
     target_ulong entry;
     int32_t ret = 0;
     int i;
@@ -121,7 +122,14 @@  void arm_handle_psci_call(ARMCPU *cpu)

         switch (param[2]) {
         case 0:
-            target_cpu_state = qemu_get_cpu(mpidr & 0xff);
+            /* MPIDR_EL1 [RES0:affinity-3:RES1:U:RES0:MT:affinity-1:affinity-0]
+             * GIC-500 code currently assumes up to 16 clusters of 8 cores each
+             * GIC-400 supports 8 cores so 0x7 for Aff0 is O.K. too
+             */
+            Aff1 = (mpidr & 0xf00) >> (8 - 3); /* Aff1 field * 8 cores */
+            Aff0 = mpidr & 0x7;
+            core = Aff1 + Aff0;
+            target_cpu_state = qemu_get_cpu(core);
             if (!target_cpu_state) {
                 ret = QEMU_PSCI_RET_INVALID_PARAMS;
                 break;
@@ -153,7 +161,11 @@  void arm_handle_psci_call(ARMCPU *cpu)
         context_id = param[3];

         /* change to the cpu we are powering up */
-        target_cpu_state = qemu_get_cpu(mpidr & 0xff);
+        /* Currently supports 64 cores, in clusters of 8 cores each */
+        Aff1 = (mpidr & 0xf00) >> (8 - 3); /* Aff1 field * 8 cores */
+        Aff0 = mpidr & 0x7;
+        core = Aff1 + Aff0;
+        target_cpu_state = qemu_get_cpu(core);
         if (!target_cpu_state) {
             ret = QEMU_PSCI_RET_INVALID_PARAMS;
             break;
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 34b489f..c2fcc3d 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1334,13 +1334,27 @@  static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
         qemu_log_mask(LOG_UNIMP, "%s access to unsupported AArch64 "
                       "system register op0:%d op1:%d crn:%d crm:%d op2:%d\n",
                       isread ? "read" : "write", op0, op1, crn, crm, op2);
+        fprintf(stderr, "%s access to unsupported AArch64 "
+                      "system register op0:%d op1:%d crn:%d crm:%d op2:%d\n",
+                      isread ? "read" : "write", op0, op1, crn, crm, op2);
         unallocated_encoding(s);
         return;
     }

+#if 0
+    fprintf(stderr, "%s access to supported AArch64 "
+                  "system register op0:%d op1:%d crn:%d crm:%d op2:%d\n",
+                  isread ? "read" : "write", op0, op1, crn, crm, op2);
+
+    fprintf(stderr, "read offset %p\n", (void *)ri->fieldoffset);
+#endif
+
     /* Check access permissions */
     if (!cp_access_ok(s->current_el, ri, isread)) {
         unallocated_encoding(s);
+        fprintf(stderr, "%s access denied by permission check: AArch64 "
+                      "system register op0:%d op1:%d crn:%d crm:%d op2:%d\n",
+                      isread ? "read" : "write", op0, op1, crn, crm, op2);
         return;
     }