
[v3] powerpc/85xx: add support to JOG feature using cpufreq interface

Message ID 1324985129-26219-1-git-send-email-chenhui.zhao@freescale.com (mailing list archive)
State Superseded
Delegated to: Kumar Gala

Commit Message

chenhui zhao Dec. 27, 2011, 11:25 a.m. UTC
From: Li Yang <leoli@freescale.com>

Some 85xx SoCs, such as the MPC8536 and P1022, have a JOG feature, which provides
a dynamic mechanism to lower or raise the CPU core clock at runtime.

This patch adds support for changing the CPU frequency using the standard
cpufreq interface. The CORE to CCB ratio can be 1:1 (except on MPC8536), 3:2,
2:1, 5:2, 3:1, 7:2 and 4:1.
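
The ratio arithmetic above can be sketched in plain userspace C. This is
only an illustration, not the driver's code: the helper name core_freq_hz
and the 400 MHz CCB clock are assumptions made for the example, and 64-bit
arithmetic is used to sidestep overflow. The formula itself
(corefreq = sysfreq * pll / 2) is the one used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: ratio_x2 is the PLL ratio field expressed in
 * halves (2 => 1:1, 3 => 3:2, ... 8 => 4:1), matching the formula
 * corefreq = sysfreq * pll / 2 used in the patch.
 */
static uint64_t core_freq_hz(uint64_t ccb_hz, unsigned int ratio_x2)
{
	/* Only the ratios listed in the commit message are accepted. */
	assert(ratio_x2 >= 2 && ratio_x2 <= 8);
	return ccb_hz * ratio_x2 / 2;
}
```

With the assumed 400 MHz CCB clock, a field value of 3 (3:2) gives 600 MHz
and a field value of 8 (4:1) gives 1.6 GHz.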

The two CPU cores on P1022 must not be in a low power state during the frequency
transition. The driver uses a flag to meet this requirement.

The jog mode frequency transition process on the MPC8536 is similar to
the deep sleep process. The driver needs to save the CPU state and restore
it after the CPU warm reset.

Note:
 * The I/O peripherals such as PCIe and eTSEC may lose packets during
   the jog mode frequency transition.
 * The driver doesn't support MPC8536 Rev 1.0 due to a JOG erratum.
   Subsequent revisions of MPC8536 have corrected the erratum.

Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jerry Huang <Chang-Ming.Huang@freescale.com>
Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
CC: Scott Wood <scottwood@freescale.com>
---
This patch depends on my previous patches related to power management.

Changes for v3:
 - Use different set_pll() functions for P1022 and MPC8536.
 - Fix a race issue for p1022.
 - Add "mpc85xx_enter_jog".

 arch/powerpc/platforms/85xx/Makefile      |    1 +
 arch/powerpc/platforms/85xx/cpufreq-jog.c |  404 +++++++++++++++++++++++++++++
 arch/powerpc/platforms/85xx/sleep.S       |    1 +
 arch/powerpc/platforms/Kconfig            |    8 +
 arch/powerpc/sysdev/fsl_soc.h             |    1 +
 5 files changed, 415 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/platforms/85xx/cpufreq-jog.c

Comments

Scott Wood Jan. 3, 2012, 10:14 p.m. UTC | #1
On 12/27/2011 05:25 AM, Zhao Chenhui wrote:
>  * The driver doesn't support MPC8536 Rev 1.0 due to a JOG erratum.
>    Subsequent revisions of MPC8536 have corrected the erratum.

Where do you check for this?

> +#define POWMGTCSR_LOSSLESS_MASK	0x00400000
> +#define POWMGTCSR_JOG_MASK	0x00200000

Are these really masks, or just values to use?

> +#define POWMGTCSR_CORE0_IRQ_MSK	0x80000000
> +#define POWMGTCSR_CORE0_CI_MSK	0x40000000
> +#define POWMGTCSR_CORE0_DOZING	0x00000008
> +#define POWMGTCSR_CORE0_NAPPING	0x00000004
> +
> +#define POWMGTCSR_CORE_INT_MSK	0x00000800
> +#define POWMGTCSR_CORE_CINT_MSK	0x00000400
> +#define POWMGTCSR_CORE_UDE_MSK	0x00000200
> +#define POWMGTCSR_CORE_MCP_MSK	0x00000100
> +#define P1022_POWMGTCSR_MSK	(POWMGTCSR_CORE_INT_MSK | \
> +				 POWMGTCSR_CORE_CINT_MSK | \
> +				 POWMGTCSR_CORE_UDE_MSK | \
> +				 POWMGTCSR_CORE_MCP_MSK)
> +
> +static void keep_waking_up(void *dummy)
> +{
> +	unsigned long flags;
> +
> +	local_irq_save(flags);
> +	mb();
> +
> +	in_jog_process = 1;
> +	mb();
> +
> +	while (in_jog_process != 0)
> +		mb();
> +
> +	local_irq_restore(flags);
> +}

Please document this.  Compare in_jog_process == 1, not != 0 -- it's
unlikely, but what if the other cpu sees that in_jog_process has been
set to 1, exits and sets in_jog_process to 0, then re-enters set_pll and
sets in_jog_process to -1 again before this function does another load
of in_jog_process?

Do you really need all these mb()s?  I think this would suffice:

	local_irq_save(flags);

	in_jog_process = 1;

	while (in_jog_process == 1)
		barrier();

	local_irq_restore(flags);

It's not really a performance issue, just simplicity.
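
The missed-zero interleaving described above can be replayed in a small
single-threaded model. This is a sketch under assumptions, not kernel code:
waiter_exits is a hypothetical helper, and the "loads" array stands in for
the values the spinning core happens to read from in_jog_process after it
has stored 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the waiter's spin loop. "loads" is the sequence of values the
 * waiter reads from in_jog_process after storing 1. Returns true if the
 * waiter leaves the loop within those loads.
 */
static bool waiter_exits(const int *loads, size_t n, bool compare_eq_1)
{
	assert(loads != NULL);
	for (size_t i = 0; i < n; i++) {
		/* Either "while (flag == 1)" or "while (flag != 0)". */
		bool keep_spinning = compare_eq_1 ? (loads[i] == 1)
						  : (loads[i] != 0);
		if (!keep_spinning)
			return true;
	}
	return false;
}
```

If the waiter reads 1, 1, -1, -1 (the 0 was overwritten before it looked
again), the "!= 0" loop never exits in this model, while the "== 1" loop
exits on the first -1.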

> +static int p1022_set_pll(unsigned int cpu, unsigned int pll)
> +{
> +	int index, hw_cpu = get_hard_smp_processor_id(cpu);
> +	int shift;
> +	u32 corefreq, val, mask = 0;
> +	unsigned int cur_pll = get_pll(hw_cpu);
> +	unsigned long flags;
> +	int ret = 0;
> +
> +	if (pll == cur_pll)
> +		return 0;
> +
> +	shift = hw_cpu * CORE_RATIO_BITS + CORE0_RATIO_SHIFT;
> +	val = (pll & CORE_RATIO_MASK) << shift;
> +
> +	corefreq = sysfreq * pll / 2;
> +	/*
> +	 * Set the COREx_SPD bit if the requested core frequency
> +	 * is larger than the threshold frequency.
> +	 */
> +	if (corefreq > FREQ_533MHz)
> +		val |= PMJCR_CORE0_SPD_MASK << hw_cpu;

P1022 manual says the threshold is 500 MHz (but doesn't say how to set
the bit if the frequency is exactly 500 MHz).  Where did 533340000 come
from?

> +
> +	mask = (CORE_RATIO_MASK << shift) | (PMJCR_CORE0_SPD_MASK << hw_cpu);
> +	clrsetbits_be32(guts + PMJCR, mask, val);
> +
> +	/* readback to sync write */
> +	val = in_be32(guts + PMJCR);

You don't use val after this -- just ignore the return value from in_be32().

> +	/*
> +	 * A Jog request can not be asserted when any core is in a low
> +	 * power state on P1022. Before executing a jog request, any
> +	 * core which is in a low power state must be waked by a
> +	 * interrupt, and keep waking up until the sequence is
> +	 * finished.
> +	 */
> +	for_each_present_cpu(index) {
> +		if (!cpu_online(index))
> +			return -EFAULT;
> +	}

EFAULT is not the appropriate error code -- it is for when userspace
passes a bad virtual address.

Better, don't fail here -- bring the other core out of the low power
state in order to do the jog.  cpufreq shouldn't stop working just
because we took a core offline.

What prevents a core from going offline just after you check here?

> +	in_jog_process = -1;
> +	mb();
> +	smp_call_function(keep_waking_up, NULL, 0);

What does "keep waking up" mean?  Something like spin_while_jogging
might be clearer.

> +	local_irq_save(flags);
> +	mb();
> +	/* Wait for the other core to wake. */
> +	while (in_jog_process != 1)
> +		mb();

Timeout?  And more unnecessary mb()s.

Might be nice to support more than two cores, even if this code isn't
currently expected to be used on such hardware (it's just a generic
"hold other cpus" loop; might as well make it reusable).  You could do
this by using an atomic count for other cores to check in and out of the
spin loop.

> +	out_be32(guts + POWMGTCSR, POWMGTCSR_JOG_MASK | P1022_POWMGTCSR_MSK);
> +
> +	if (!spin_event_timeout(((in_be32(guts + POWMGTCSR) &
> +	    POWMGTCSR_JOG_MASK) == 0), 10000, 10)) {
> +		pr_err("%s: Fail to switch the core frequency.\n", __func__);
> +		ret = -EFAULT;
> +	}
> +
> +	clrbits32(guts + POWMGTCSR, P1022_POWMGTCSR_MSK);
> +	in_jog_process = 0;
> +	mb();

This mb() (or better, a readback of POWMGTCSR) should be before you
clear in_jog_process.  For clarity of its purpose, the clearing of
POWMGTCSR should go in the failure branch of spin_event_timeout().

> +	/* the latency of a transition, the unit is ns */
> +	policy->cpuinfo.transition_latency = 2000;

Is this based on observation?

> diff --git a/arch/powerpc/platforms/85xx/sleep.S b/arch/powerpc/platforms/85xx/sleep.S
> index 763d2f2..919781d 100644
> --- a/arch/powerpc/platforms/85xx/sleep.S
> +++ b/arch/powerpc/platforms/85xx/sleep.S
> @@ -59,6 +59,7 @@ powmgtreq:
>  	 * r5 = JOG or deep sleep request
>  	 *      JOG-0x00200000, deep sleep-0x00100000
>  	 */
> +_GLOBAL(mpc85xx_enter_jog)
>  _GLOBAL(mpc85xx_enter_deep_sleep)
>  	lis	r6, ccsrbase_low@ha
>  	stw	r4, ccsrbase_low@l(r6)

Why does this need two entry points rather than a more appropriate name?

> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index 3fe6d92..1d0c4e0 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -200,6 +200,14 @@ config CPU_FREQ_PMAC64
>  	  This adds support for frequency switching on Apple iMac G5,
>  	  and some of the more recent desktop G5 machines as well.
>  
> +config MPC85xx_CPUFREQ
> +	bool "Support for Freescale MPC85xx CPU freq"
> +	depends on PPC_85xx && PPC32
> +	select CPU_FREQ_TABLE
> +	help
> +	  This adds support for frequency switching on Freescale MPC85xx,
> +	  currently including P1022 and MPC8536.

default y, given the dependencies?  Or wait for more testing before we
do that?

-Scott

Zhao Chenhui Jan. 4, 2012, 9:34 a.m. UTC | #2
> On 12/27/2011 05:25 AM, Zhao Chenhui wrote:
> >  * The driver doesn't support MPC8536 Rev 1.0 due to a JOG erratum.
> >    Subsequent revisions of MPC8536 have corrected the erratum.
> 
> Where do you check for this?

Nowhere. I am just noting that this patch doesn't support MPC8536 Rev 1.0.

> 
> > +#define POWMGTCSR_LOSSLESS_MASK	0x00400000
> > +#define POWMGTCSR_JOG_MASK	0x00200000
> 
> Are these really masks, or just values to use?

They are masks.

> 
> > +#define POWMGTCSR_CORE0_IRQ_MSK	0x80000000
> > +#define POWMGTCSR_CORE0_CI_MSK	0x40000000
> > +#define POWMGTCSR_CORE0_DOZING	0x00000008
> > +#define POWMGTCSR_CORE0_NAPPING	0x00000004
> > +
> > +#define POWMGTCSR_CORE_INT_MSK	0x00000800
> > +#define POWMGTCSR_CORE_CINT_MSK	0x00000400
> > +#define POWMGTCSR_CORE_UDE_MSK	0x00000200
> > +#define POWMGTCSR_CORE_MCP_MSK	0x00000100
> > +#define P1022_POWMGTCSR_MSK	(POWMGTCSR_CORE_INT_MSK | \
> > +				 POWMGTCSR_CORE_CINT_MSK | \
> > +				 POWMGTCSR_CORE_UDE_MSK | \
> > +				 POWMGTCSR_CORE_MCP_MSK)
> > +
> > +static void keep_waking_up(void *dummy)
> > +{
> > +	unsigned long flags;
> > +
> > +	local_irq_save(flags);
> > +	mb();
> > +
> > +	in_jog_process = 1;
> > +	mb();
> > +
> > +	while (in_jog_process != 0)
> > +		mb();
> > +
> > +	local_irq_restore(flags);
> > +}
> 
> Please document this.  Compare in_jog_process == 1, not != 0 -- it's
> unlikely, but what if the other cpu sees that in_jog_process has been
> set to 1, exits and sets in_jog_process to 0, then re-enters set_pll and
> sets in_jog_process to -1 again before this function does another load
> of in_jog_process?

Thanks. I'll fix it.

> 
> Do you really need all these mb()s?  I think this would suffice:
> 
> 	local_irq_save(flags);
> 
> 	in_jog_process = 1;
> 
> 	while (in_jog_process == 1)
> 		barrier();
> 
> 	local_irq_restore(flags);
> 
> It's not really a performance issue, just simplicity.
> 
> > +static int p1022_set_pll(unsigned int cpu, unsigned int pll)
> > +{
> > +	int index, hw_cpu = get_hard_smp_processor_id(cpu);
> > +	int shift;
> > +	u32 corefreq, val, mask = 0;
> > +	unsigned int cur_pll = get_pll(hw_cpu);
> > +	unsigned long flags;
> > +	int ret = 0;
> > +
> > +	if (pll == cur_pll)
> > +		return 0;
> > +
> > +	shift = hw_cpu * CORE_RATIO_BITS + CORE0_RATIO_SHIFT;
> > +	val = (pll & CORE_RATIO_MASK) << shift;
> > +
> > +	corefreq = sysfreq * pll / 2;
> > +	/*
> > +	 * Set the COREx_SPD bit if the requested core frequency
> > +	 * is larger than the threshold frequency.
> > +	 */
> > +	if (corefreq > FREQ_533MHz)
> > +		val |= PMJCR_CORE0_SPD_MASK << hw_cpu;
> 
> P1022 manual says the threshold is 500 MHz (but doesn't say how to set
> the bit if the frequency is exactly 500 MHz).  Where did 533340000 come
> from?

Please refer to Chapter 25 "25.4.1.11 Power Management Jog Control Register (PMJCR)".

> 
> > +
> > +	mask = (CORE_RATIO_MASK << shift) | (PMJCR_CORE0_SPD_MASK << hw_cpu);
> > +	clrsetbits_be32(guts + PMJCR, mask, val);
> > +
> > +	/* readback to sync write */
> > +	val = in_be32(guts + PMJCR);
> 
> You don't use val after this -- just ignore the return value from
> in_be32().

OK.

> 
> > +	/*
> > +	 * A Jog request can not be asserted when any core is in a low
> > +	 * power state on P1022. Before executing a jog request, any
> > +	 * core which is in a low power state must be waked by a
> > +	 * interrupt, and keep waking up until the sequence is
> > +	 * finished.
> > +	 */
> > +	for_each_present_cpu(index) {
> > +		if (!cpu_online(index))
> > +			return -EFAULT;
> > +	}
> 
> EFAULT is not the appropriate error code -- it is for when userspace
> passes a bad virtual address.
> 
> Better, don't fail here -- bring the other core out of the low power
> state in order to do the jog.  cpufreq shouldn't stop working just
> because we took a core offline.
> 
> What prevents a core from going offline just after you check here?
> 
> > +	in_jog_process = -1;
> > +	mb();
> > +	smp_call_function(keep_waking_up, NULL, 0);
> 
> What does "keep waking up" mean?  Something like spin_while_jogging
> might be clearer.
> 
> > +	local_irq_save(flags);
> > +	mb();
> > +	/* Wait for the other core to wake. */
> > +	while (in_jog_process != 1)
> > +		mb();
> 
> Timeout?  And more unnecessary mb()s.
> 
> Might be nice to support more than two cores, even if this code isn't
> currently expected to be used on such hardware (it's just a generic
> "hold other cpus" loop; might as well make it reusable).  You could do
> this by using an atomic count for other cores to check in and out of the
> spin loop.

This is just for P1022, a dual-core chip. A separate patch will
support multi-core chips, such as P4080, etc.

> 
> > +	out_be32(guts + POWMGTCSR, POWMGTCSR_JOG_MASK | P1022_POWMGTCSR_MSK);
> > +
> > +	if (!spin_event_timeout(((in_be32(guts + POWMGTCSR) &
> > +	    POWMGTCSR_JOG_MASK) == 0), 10000, 10)) {
> > +		pr_err("%s: Fail to switch the core frequency.\n", __func__);
> > +		ret = -EFAULT;
> > +	}
> > +
> > +	clrbits32(guts + POWMGTCSR, P1022_POWMGTCSR_MSK);
> > +	in_jog_process = 0;
> > +	mb();
> 
> This mb() (or better, a readback of POWMGTCSR) should be before you
> clear in_jog_process.  For clarity of its purpose, the clearing of
> POWMGTCSR should go in the failure branch of spin_event_timeout().

According to the manual, P1022_POWMGTCSR_MSK should be reset
by software regardless of failure or success.

-chenhui

Scott Wood Jan. 4, 2012, 8:41 p.m. UTC | #3
On 01/04/2012 03:34 AM, Zhao Chenhui-B35336 wrote:
>> On 12/27/2011 05:25 AM, Zhao Chenhui wrote:
>>>  * The driver doesn't support MPC8536 Rev 1.0 due to a JOG erratum.
>>>    Subsequent revisions of MPC8536 have corrected the erratum.
>>
>> Where do you check for this?
> 
> Nowhere. I am just noting that this patch doesn't support MPC8536 Rev 1.0.

Is mpc8536 rev 1.0 supported by the kernel in general?  If so, and this
code doesn't work with it, it needs to check for that revision and not
register the cpufreq handler if found.

>>> +#define POWMGTCSR_LOSSLESS_MASK	0x00400000
>>> +#define POWMGTCSR_JOG_MASK	0x00200000
>>
>> Are these really masks, or just values to use?
> 
> They are masks.

They're bits.  Sometimes you use it additively, to set this bit along
with others.  Sometimes you use it subtractively, to test whether the
bit has cleared -- you could argue that it's used as a mask in that
context, but I don't think adding _MASK to the name really adds anything
here (likewise for things like PMJCR_CORE0_SPD_MASK).
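
The two uses described above can be shown side by side. This is a
userspace sketch: the bit values are copied from the patch, the name
POWMGTCSR_JOG (without the _MASK suffix) and both helper functions are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the patch; the _MASK suffix is dropped here. */
#define POWMGTCSR_JOG		0x00200000u
#define POWMGTCSR_CORE_INT_MSK	0x00000800u
#define POWMGTCSR_CORE_CINT_MSK	0x00000400u
#define POWMGTCSR_CORE_UDE_MSK	0x00000200u
#define POWMGTCSR_CORE_MCP_MSK	0x00000100u

static_assert((POWMGTCSR_JOG & (POWMGTCSR_JOG - 1)) == 0,
	      "JOG request must be a single bit");

/* Additive use: OR the request bit with the interrupt-mask bits. */
static uint32_t jog_request_value(void)
{
	return POWMGTCSR_JOG | POWMGTCSR_CORE_INT_MSK |
	       POWMGTCSR_CORE_CINT_MSK | POWMGTCSR_CORE_UDE_MSK |
	       POWMGTCSR_CORE_MCP_MSK;
}

/* Subtractive use: AND with the bit to see whether it has cleared. */
static bool jog_request_done(uint32_t powmgtcsr)
{
	return (powmgtcsr & POWMGTCSR_JOG) == 0;
}
```

Either way the constant names a single bit, which is the argument for
dropping the suffix.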

>>> +static int p1022_set_pll(unsigned int cpu, unsigned int pll)
>>> +{
>>> +	int index, hw_cpu = get_hard_smp_processor_id(cpu);
>>> +	int shift;
>>> +	u32 corefreq, val, mask = 0;
>>> +	unsigned int cur_pll = get_pll(hw_cpu);
>>> +	unsigned long flags;
>>> +	int ret = 0;
>>> +
>>> +	if (pll == cur_pll)
>>> +		return 0;
>>> +
>>> +	shift = hw_cpu * CORE_RATIO_BITS + CORE0_RATIO_SHIFT;
>>> +	val = (pll & CORE_RATIO_MASK) << shift;
>>> +
>>> +	corefreq = sysfreq * pll / 2;
>>> +	/*
>>> +	 * Set the COREx_SPD bit if the requested core frequency
>>> +	 * is larger than the threshold frequency.
>>> +	 */
>>> +	if (corefreq > FREQ_533MHz)
>>> +		val |= PMJCR_CORE0_SPD_MASK << hw_cpu;
>>
>> P1022 manual says the threshold is 500 MHz (but doesn't say how to set
>> the bit if the frequency is exactly 500 MHz).  Where did 533340000 come
>> from?
> 
> Please refer to Chapter 25 "25.4.1.11 Power Management Jog Control Register (PMJCR)".

You seem to have a different version of the p1022 manual than I (and the
FSL docs website) do.  In my copy 25.4.1 is "Performance Monitor
Interrupt" and it has no subsections.

PMJCR is described in 26.4.1.11 and for CORE0_SPD says:

> 0 Core0 frequency at 400–500 MHz
> 1 Core0 frequency at 500–1067 MHz
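
To make the disagreement concrete, here is a small userspace sketch of the
comparison with the threshold passed in as a parameter. The helper
core_spd_bit and the FREQ_500MHZ constant are assumptions for illustration;
533340000 is the FREQ_533MHz value from the patch, and which threshold is
correct is exactly what is in question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FREQ_500MHZ	500000000u	/* boundary quoted from the manual */
#define FREQ_533MHZ	533340000u	/* FREQ_533MHz in the patch */

/*
 * Mirrors the patch's test: the SPD bit is set only when the core
 * frequency is strictly above the threshold.
 */
static bool core_spd_bit(uint32_t core_hz, uint32_t threshold_hz)
{
	assert(threshold_hz != 0);
	return core_hz > threshold_hz;
}
```

A core running at 533.34 MHz gets SPD = 0 with the patch's threshold but
SPD = 1 with the 500 MHz boundary, and a core at exactly 500 MHz gets
SPD = 0 under a strict comparison, a case the quoted ranges leave open.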

>>> +	local_irq_save(flags);
>>> +	mb();
>>> +	/* Wait for the other core to wake. */
>>> +	while (in_jog_process != 1)
>>> +		mb();
>>
>> Timeout?  And more unnecessary mb()s.
>>
>> Might be nice to support more than two cores, even if this code isn't
>> currently expected to be used on such hardware (it's just a generic
>> "hold other cpus" loop; might as well make it reusable).  You could do
>> this by using an atomic count for other cores to check in and out of the
>> spin loop.
> 
> This is just for P1022, a dual-core chip. A separate patch will
> support multi-core chips, such as P4080, etc.

My point was that this specific function isn't really doing anything
p1022-specific, it's just a way to get other CPUs in the system to halt
until signalled to continue.  I thought it would be nice to just write
it generically from the start, but it's up to you.
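
A generic version could look roughly like the following userspace C11
sketch. All names (struct cpu_gate and the gate_* helpers) are
hypothetical; kernel code would use atomic_t and cpu_relax() instead of
<stdatomic.h>, and would still need to handle the gate being re-armed
before every CPU has checked out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "hold other CPUs" gate; not part of the patch. */
struct cpu_gate {
	atomic_int parked;	/* CPUs currently inside the spin loop */
	atomic_bool hold;	/* true while the initiator wants them held */
};

/* Initiator: arm the gate before signalling the other CPUs. */
static void gate_arm(struct cpu_gate *g)
{
	atomic_store(&g->parked, 0);
	atomic_store(&g->hold, true);
}

/* Other CPU: announce arrival, then spin while gate_should_spin(). */
static void gate_check_in(struct cpu_gate *g)
{
	atomic_fetch_add(&g->parked, 1);
}

static bool gate_should_spin(struct cpu_gate *g)
{
	return atomic_load(&g->hold);
}

/* Other CPU: announce that it has left the spin loop. */
static void gate_check_out(struct cpu_gate *g)
{
	atomic_fetch_sub(&g->parked, 1);
}

/* Initiator: true once "others" CPUs are parked (or have all left). */
static bool gate_all_parked(struct cpu_gate *g, int others)
{
	assert(others >= 0);
	return atomic_load(&g->parked) == others;
}

/* Initiator: let the parked CPUs continue. */
static void gate_release(struct cpu_gate *g)
{
	atomic_store(&g->hold, false);
}
```

The initiator arms the gate, waits for gate_all_parked(g, n) with n equal
to the number of other online CPUs, performs the jog, calls gate_release(),
and can wait for gate_all_parked(g, 0) before allowing another transition.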

>>> +	out_be32(guts + POWMGTCSR, POWMGTCSR_JOG_MASK | P1022_POWMGTCSR_MSK);
>>> +
>>> +	if (!spin_event_timeout(((in_be32(guts + POWMGTCSR) &
>>> +	    POWMGTCSR_JOG_MASK) == 0), 10000, 10)) {
>>> +		pr_err("%s: Fail to switch the core frequency.\n", __func__);
>>> +		ret = -EFAULT;
>>> +	}
>>> +
>>> +	clrbits32(guts + POWMGTCSR, P1022_POWMGTCSR_MSK);
>>> +	in_jog_process = 0;
>>> +	mb();
>>
>> This mb() (or better, a readback of POWMGTCSR) should be before you
>> clear in_jog_process.  For clarity of its purpose, the clearing of
>> POWMGTCSR should go in the failure branch of spin_event_timeout().
> 
> According to the manual, P1022_POWMGTCSR_MSK should be reset
> by software regardless of failure or success.

OK, I missed that you're clearing more bits than you checked in
spin_event_timeout().  Could you rename P1022_POWMGTCSR_MSK to something
more meaningful (especially since you use _MASK all over the place to
mean something else)?

-Scott

Patch

diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
index f9fcbf4..eeaa1f4 100644
--- a/arch/powerpc/platforms/85xx/Makefile
+++ b/arch/powerpc/platforms/85xx/Makefile
@@ -5,6 +5,7 @@  obj-$(CONFIG_SMP) += smp.o
 ifneq ($(CONFIG_PPC_E500MC),y)
 obj-$(CONFIG_SUSPEND)	+= sleep.o
 endif
+obj-$(CONFIG_MPC85xx_CPUFREQ) += cpufreq-jog.o
 
 obj-y += common.o
 
diff --git a/arch/powerpc/platforms/85xx/cpufreq-jog.c b/arch/powerpc/platforms/85xx/cpufreq-jog.c
new file mode 100644
index 0000000..d1418d9
--- /dev/null
+++ b/arch/powerpc/platforms/85xx/cpufreq-jog.c
@@ -0,0 +1,404 @@ 
+/*
+ * Copyright (C) 2008-2011 Freescale Semiconductor, Inc.
+ * Author: Dave Liu <daveliu@freescale.com>
+ * Modifier: Chenhui Zhao <chenhui.zhao@freescale.com>
+ *
+ * The cpufreq driver is for Freescale 85xx processor,
+ * based on arch/powerpc/platforms/cell/cbe_cpufreq.c
+ * (C) Copyright IBM Deutschland Entwicklung GmbH 2005-2007
+ *	Christian Krafft <krafft@de.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/cpufreq.h>
+#include <linux/of_platform.h>
+#include <linux/suspend.h>
+
+#include <asm/prom.h>
+#include <asm/time.h>
+#include <asm/reg.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+#include <asm/smp.h>
+
+#include <sysdev/fsl_soc.h>
+
+static DEFINE_MUTEX(mpc85xx_switch_mutex);
+static void __iomem *guts;
+
+static u32 sysfreq;
+static int in_jog_process;
+static struct cpufreq_frequency_table *mpc85xx_freqs;
+static int (*set_pll)(unsigned int cpu, unsigned int pll);
+
+static struct cpufreq_frequency_table mpc8536_freqs_table[] = {
+	{3,	0},
+	{4,	0},
+	{5,	0},
+	{6,	0},
+	{7,	0},
+	{8,	0},
+	{0,	CPUFREQ_TABLE_END},
+};
+
+static struct cpufreq_frequency_table p1022_freqs_table[] = {
+	{2,	0},
+	{3,	0},
+	{4,	0},
+	{5,	0},
+	{6,	0},
+	{7,	0},
+	{8,	0},
+	{0,	CPUFREQ_TABLE_END},
+};
+
+#define FREQ_533MHz	533340000
+#define FREQ_800MHz	800000000
+
+#define CORE_RATIO_BITS		8
+#define CORE_RATIO_MASK		0x3f
+#define CORE0_RATIO_SHIFT	16
+
+#define PORPLLSR	0x0
+
+#define PMJCR		0x7c
+#define PMJCR_CORE0_SPD_MASK	0x00001000
+#define PMJCR_CORE_SPD_MASK	0x00002000
+
+#define POWMGTCSR	0x80
+#define POWMGTCSR_LOSSLESS_MASK	0x00400000
+#define POWMGTCSR_JOG_MASK	0x00200000
+#define POWMGTCSR_CORE0_IRQ_MSK	0x80000000
+#define POWMGTCSR_CORE0_CI_MSK	0x40000000
+#define POWMGTCSR_CORE0_DOZING	0x00000008
+#define POWMGTCSR_CORE0_NAPPING	0x00000004
+
+#define POWMGTCSR_CORE_INT_MSK	0x00000800
+#define POWMGTCSR_CORE_CINT_MSK	0x00000400
+#define POWMGTCSR_CORE_UDE_MSK	0x00000200
+#define POWMGTCSR_CORE_MCP_MSK	0x00000100
+#define P1022_POWMGTCSR_MSK	(POWMGTCSR_CORE_INT_MSK | \
+				 POWMGTCSR_CORE_CINT_MSK | \
+				 POWMGTCSR_CORE_UDE_MSK | \
+				 POWMGTCSR_CORE_MCP_MSK)
+
+static void keep_waking_up(void *dummy)
+{
+	unsigned long flags;
+
+	local_irq_save(flags);
+	mb();
+
+	in_jog_process = 1;
+	mb();
+
+	while (in_jog_process != 0)
+		mb();
+
+	local_irq_restore(flags);
+}
+
+/*
+ * hardware specific functions
+ */
+static int get_pll(int hw_cpu)
+{
+	int ret, shift;
+	u32 cur_pll = in_be32(guts + PORPLLSR);
+
+	shift = hw_cpu * CORE_RATIO_BITS + CORE0_RATIO_SHIFT;
+	ret = (cur_pll >> shift) & CORE_RATIO_MASK;
+	return ret;
+}
+
+static int mpc8536_set_pll(unsigned int cpu, unsigned int pll)
+{
+	u32 corefreq, val, mask;
+	unsigned int cur_pll = get_pll(0);
+	int ret = 0;
+	unsigned long flags;
+
+	if (pll == cur_pll)
+		return 0;
+
+	val = (pll & CORE_RATIO_MASK) << CORE0_RATIO_SHIFT;
+
+	corefreq = sysfreq * pll / 2;
+	/*
+	 * Set the COREx_SPD bit if the requested core frequency
+	 * is larger than the threshold frequency.
+	 */
+	if (corefreq > FREQ_800MHz)
+			val |= PMJCR_CORE_SPD_MASK;
+
+	mask = (CORE_RATIO_MASK << CORE0_RATIO_SHIFT) | PMJCR_CORE_SPD_MASK;
+	clrsetbits_be32(guts + PMJCR, mask, val);
+
+	/* readback to sync write */
+	val = in_be32(guts + PMJCR);
+
+	local_irq_save(flags);
+	mpc85xx_enter_jog(get_immrbase(), POWMGTCSR_JOG_MASK);
+	local_irq_restore(flags);
+
+	/* verify */
+	cur_pll =  get_pll(0);
+	if (cur_pll != pll) {
+		pr_err("%s: Error. The current PLL of core 0 is %d instead of %d.\n",
+				__func__, cur_pll, pll);
+		ret = -EFAULT;
+	}
+	return ret;
+}
+
+static int p1022_set_pll(unsigned int cpu, unsigned int pll)
+{
+	int index, hw_cpu = get_hard_smp_processor_id(cpu);
+	int shift;
+	u32 corefreq, val, mask = 0;
+	unsigned int cur_pll = get_pll(hw_cpu);
+	unsigned long flags;
+	int ret = 0;
+
+	if (pll == cur_pll)
+		return 0;
+
+	shift = hw_cpu * CORE_RATIO_BITS + CORE0_RATIO_SHIFT;
+	val = (pll & CORE_RATIO_MASK) << shift;
+
+	corefreq = sysfreq * pll / 2;
+	/*
+	 * Set the COREx_SPD bit if the requested core frequency
+	 * is larger than the threshold frequency.
+	 */
+	if (corefreq > FREQ_533MHz)
+		val |= PMJCR_CORE0_SPD_MASK << hw_cpu;
+
+	mask = (CORE_RATIO_MASK << shift) | (PMJCR_CORE0_SPD_MASK << hw_cpu);
+	clrsetbits_be32(guts + PMJCR, mask, val);
+
+	/* readback to sync write */
+	val = in_be32(guts + PMJCR);
+
+	/*
+	 * A Jog request can not be asserted when any core is in a low
+	 * power state on P1022. Before executing a jog request, any
+	 * core which is in a low power state must be waked by a
+	 * interrupt, and keep waking up until the sequence is
+	 * finished.
+	 */
+	for_each_present_cpu(index) {
+		if (!cpu_online(index))
+			return -EFAULT;
+	}
+
+	in_jog_process = -1;
+	mb();
+	smp_call_function(keep_waking_up, NULL, 0);
+
+	local_irq_save(flags);
+	mb();
+	/* Wait for the other core to wake. */
+	while (in_jog_process != 1)
+		mb();
+
+	out_be32(guts + POWMGTCSR, POWMGTCSR_JOG_MASK | P1022_POWMGTCSR_MSK);
+
+	if (!spin_event_timeout(((in_be32(guts + POWMGTCSR) &
+	    POWMGTCSR_JOG_MASK) == 0), 10000, 10)) {
+		pr_err("%s: Fail to switch the core frequency.\n", __func__);
+		ret = -EFAULT;
+	}
+
+	clrbits32(guts + POWMGTCSR, P1022_POWMGTCSR_MSK);
+	in_jog_process = 0;
+	mb();
+
+	local_irq_restore(flags);
+
+	/* verify */
+	cur_pll =  get_pll(hw_cpu);
+	if (cur_pll != pll) {
+		pr_err("%s: Error. The current PLL of core %d is %d instead of %d.\n",
+				__func__, hw_cpu, cur_pll, pll);
+		ret = -EFAULT;
+	}
+	return ret;
+}
+
+/*
+ * cpufreq functions
+ */
+static int mpc85xx_cpufreq_cpu_init(struct cpufreq_policy *policy)
+{
+	unsigned int i, cur_pll;
+	int hw_cpu = get_hard_smp_processor_id(policy->cpu);
+
+	if (!cpu_present(policy->cpu))
+		return -ENODEV;
+
+	/* the latency of a transition, the unit is ns */
+	policy->cpuinfo.transition_latency = 2000;
+
+	cur_pll = get_pll(hw_cpu);
+
+	/* initialize frequency table */
+	pr_debug("core%d frequency table:\n", hw_cpu);
+	for (i = 0; mpc85xx_freqs[i].frequency != CPUFREQ_TABLE_END; i++) {
+		/* The frequency unit is kHz. */
+		mpc85xx_freqs[i].frequency =
+				(sysfreq * mpc85xx_freqs[i].index / 2) / 1000;
+		pr_debug("%d: %dkHz\n", i, mpc85xx_freqs[i].frequency);
+
+		if (mpc85xx_freqs[i].index == cur_pll)
+			policy->cur = mpc85xx_freqs[i].frequency;
+	}
+	pr_debug("current pll is at %d, and core freq is%d\n",
+					cur_pll, policy->cur);
+
+	cpufreq_frequency_table_get_attr(mpc85xx_freqs, policy->cpu);
+
+	/*
+	 * This ensures that policy->cpuinfo_min
+	 * and policy->cpuinfo_max are set correctly.
+	 */
+	return cpufreq_frequency_table_cpuinfo(policy, mpc85xx_freqs);
+}
+
+static int mpc85xx_cpufreq_cpu_exit(struct cpufreq_policy *policy)
+{
+	cpufreq_frequency_table_put_attr(policy->cpu);
+	return 0;
+}
+
+static int mpc85xx_cpufreq_verify(struct cpufreq_policy *policy)
+{
+	return cpufreq_frequency_table_verify(policy, mpc85xx_freqs);
+}
+
+static int mpc85xx_cpufreq_target(struct cpufreq_policy *policy,
+			      unsigned int target_freq,
+			      unsigned int relation)
+{
+	struct cpufreq_freqs freqs;
+	unsigned int new;
+	int ret = 0;
+
+	if (!set_pll)
+		return -ENODEV;
+
+	cpufreq_frequency_table_target(policy,
+				       mpc85xx_freqs,
+				       target_freq,
+				       relation,
+				       &new);
+
+	freqs.old = policy->cur;
+	freqs.new = mpc85xx_freqs[new].frequency;
+	freqs.cpu = policy->cpu;
+
+	mutex_lock(&mpc85xx_switch_mutex);
+	cpufreq_notify_transition(&freqs, CPUFREQ_PRECHANGE);
+
+	ret = set_pll(policy->cpu, mpc85xx_freqs[new].index);
+	if (!ret) {
+		pr_info("cpufreq: Setting core%d frequency to %d kHz and " \
+			 "PLL ratio to %d:2\n",
+			 policy->cpu,
+			 mpc85xx_freqs[new].frequency,
+			 mpc85xx_freqs[new].index);
+
+		ppc_proc_freq = freqs.new * 1000ul;
+	}
+	cpufreq_notify_transition(&freqs, CPUFREQ_POSTCHANGE);
+	mutex_unlock(&mpc85xx_switch_mutex);
+
+	return ret;
+}
+
+static struct cpufreq_driver mpc85xx_cpufreq_driver = {
+	.verify		= mpc85xx_cpufreq_verify,
+	.target		= mpc85xx_cpufreq_target,
+	.init		= mpc85xx_cpufreq_cpu_init,
+	.exit		= mpc85xx_cpufreq_cpu_exit,
+	.name		= "mpc85xx-JOG",
+	.owner		= THIS_MODULE,
+	.flags		= CPUFREQ_CONST_LOOPS,
+};
+
+static int mpc85xx_job_probe(struct platform_device *ofdev)
+{
+	struct device_node *np = ofdev->dev.of_node;
+
+	if (of_device_is_compatible(np, "fsl,mpc8536-guts")) {
+		mpc85xx_freqs = mpc8536_freqs_table;
+		set_pll = mpc8536_set_pll;
+	} else if (of_device_is_compatible(np, "fsl,p1022-guts")) {
+		mpc85xx_freqs = p1022_freqs_table;
+		set_pll = p1022_set_pll;
+	}
+
+	sysfreq = fsl_get_sys_freq();
+
+	guts = of_iomap(np, 0);
+	if (guts == NULL)
+		return -ENOMEM;
+
+	pr_info("Freescale MPC85xx CPU frequency switching(JOG) driver\n");
+
+	return cpufreq_register_driver(&mpc85xx_cpufreq_driver);
+}
+
+static int mpc85xx_jog_remove(struct platform_device *ofdev)
+{
+	iounmap(guts);
+	cpufreq_unregister_driver(&mpc85xx_cpufreq_driver);
+
+	return 0;
+}
+
+static struct of_device_id mpc85xx_jog_ids[] = {
+	{ .compatible = "fsl,mpc8536-guts", },
+	{ .compatible = "fsl,p1022-guts", },
+	{}
+};
+
+static struct platform_driver mpc85xx_jog_driver = {
+	.driver = {
+		.name = "mpc85xx_cpufreq_jog",
+		.owner = THIS_MODULE,
+		.of_match_table = mpc85xx_jog_ids,
+	},
+	.probe = mpc85xx_job_probe,
+	.remove = mpc85xx_jog_remove,
+};
+
+static int __init mpc85xx_jog_init(void)
+{
+	return platform_driver_register(&mpc85xx_jog_driver);
+}
+
+static void __exit mpc85xx_jog_exit(void)
+{
+	platform_driver_unregister(&mpc85xx_jog_driver);
+}
+
+module_init(mpc85xx_jog_init);
+module_exit(mpc85xx_jog_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Dave Liu <daveliu@freescale.com>");
diff --git a/arch/powerpc/platforms/85xx/sleep.S b/arch/powerpc/platforms/85xx/sleep.S
index 763d2f2..919781d 100644
--- a/arch/powerpc/platforms/85xx/sleep.S
+++ b/arch/powerpc/platforms/85xx/sleep.S
@@ -59,6 +59,7 @@  powmgtreq:
 	 * r5 = JOG or deep sleep request
 	 *      JOG-0x00200000, deep sleep-0x00100000
 	 */
+_GLOBAL(mpc85xx_enter_jog)
 _GLOBAL(mpc85xx_enter_deep_sleep)
 	lis	r6, ccsrbase_low@ha
 	stw	r4, ccsrbase_low@l(r6)
diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
index 3fe6d92..1d0c4e0 100644
--- a/arch/powerpc/platforms/Kconfig
+++ b/arch/powerpc/platforms/Kconfig
@@ -200,6 +200,14 @@  config CPU_FREQ_PMAC64
 	  This adds support for frequency switching on Apple iMac G5,
 	  and some of the more recent desktop G5 machines as well.
 
+config MPC85xx_CPUFREQ
+	bool "Support for Freescale MPC85xx CPU freq"
+	depends on PPC_85xx && PPC32
+	select CPU_FREQ_TABLE
+	help
+	  This adds support for frequency switching on Freescale MPC85xx,
+	  currently including P1022 and MPC8536.
+
 config PPC_PASEMI_CPUFREQ
 	bool "Support for PA Semi PWRficient"
 	depends on PPC_PASEMI
diff --git a/arch/powerpc/sysdev/fsl_soc.h b/arch/powerpc/sysdev/fsl_soc.h
index 29a87ee..8735ab0 100644
--- a/arch/powerpc/sysdev/fsl_soc.h
+++ b/arch/powerpc/sysdev/fsl_soc.h
@@ -62,5 +62,6 @@  void fsl_hv_halt(void);
  * code can be compatible with both 32-bit & 36-bit.
  */
 extern void mpc85xx_enter_deep_sleep(u64 ccsrbar, u32 powmgtreq);
+extern void mpc85xx_enter_jog(u64 ccsrbar, u32 powmgtreq);
 #endif
 #endif