Patchwork Patch for AMD Dispatch Scheduler

Submitter reza yazdani
Date Aug. 24, 2010, 10:02 p.m.
Message ID <79771.93512.qm@web33002.mail.mud.yahoo.com>
Permalink /patch/62630/
State New

Comments

reza yazdani - Aug. 24, 2010, 10:02 p.m.
Dispatch scheduling is a new feature for AMD Bulldozer (BD) processors. It is composed of two parts: the scheduling part and the alignment part.

The scheduling part (this patch) arranges instructions to maximize the throughput of the hardware dispatcher. It makes sure dispatch window boundaries are roughly observed. The boundaries are only roughly observed because instruction lengths, in bytes, are not known at scheduling time. On x86, some instruction lengths may not be known until assembly time, when information such as branch offsets is computed. The scheduling part is called twice: once before register allocation and once after.

The alignment part (not in this patch) makes sure dispatch windows align at the correct boundaries.

Dispatch scheduling is implemented as an extension to the Haifa scheduler pass. The scheduler is programmed to follow the x86-BD dispatching rules during scheduling.

Two GCC hook functions are used to communicate from the machine-independent part to the machine-dependent parts of the scheduler.
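
As a rough illustration of this two-hook interface, the following self-contained sketch models one boolean hook and one action hook, each taking an insn and an integer selector. The selector values are the ones defined in the sched-int.h hunk of the patch; everything prefixed with toy_ (the insn type, the hook table, the target state, and the window capacity of 4) is a hypothetical stand-in and not GCC code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Selector values as defined in the patch's sched-int.h hunk.  */
#define IS_DISPATCH_ON 1
#define IS_CMP 2
#define DISPATCH_VIOLATION 3
#define FITS_DISPATCH_WINDOW 4
#define DISPATCH_INIT 5
#define ADD_TO_DISPATCH_WINDOW 6

/* Hypothetical stand-in for GCC's rtx.  */
typedef struct toy_insn { int uid; } *toy_rtx;

/* Toy model of the two scheduler hooks.  */
struct toy_sched_hooks
{
  bool (*dispatch) (toy_rtx insn, int x);
  void (*dispatch_do) (toy_rtx insn, int x);
};

/* Defaults, in the spirit of hook_bool_rtx_int_false and
   hook_void_rtx_int: answer "no" and do nothing.  */
static bool
toy_default_dispatch (toy_rtx insn, int x)
{
  (void) insn;
  (void) x;
  return false;
}

static void
toy_default_dispatch_do (toy_rtx insn, int x)
{
  (void) insn;
  (void) x;
}

/* Hypothetical target state: insns in the current window.  */
static int toy_window_count;
static bool toy_initialized;

/* Boolean hook: the selector X picks the question being asked.  */
static bool
toy_target_dispatch (toy_rtx insn, int x)
{
  switch (x)
    {
    case IS_DISPATCH_ON:
      return true;
    case FITS_DISPATCH_WINDOW:
      /* Assumed capacity of 4 insns, for illustration only.  */
      return insn != NULL && toy_window_count < 4;
    default:
      return false;
    }
}

/* Action hook: the selector X picks the operation to perform.  */
static void
toy_target_dispatch_do (toy_rtx insn, int x)
{
  if (x == DISPATCH_INIT)
    {
      toy_window_count = 0;
      toy_initialized = true;
    }
  else if (x == ADD_TO_DISPATCH_WINDOW && insn != NULL)
    toy_window_count++;
}

static const struct toy_sched_hooks toy_generic_hooks =
  { toy_default_dispatch, toy_default_dispatch_do };
static const struct toy_sched_hooks toy_target_hooks =
  { toy_target_dispatch, toy_target_dispatch_do };
```

With the generic table, every query returns false, so machine-independent code guarded by IS_DISPATCH_ON is skipped on targets that do not override the hooks.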

A new command-line flag, -mdispatch-scheduler, is defined. This option sets flag_dispatch_scheduling. To perform dispatch scheduling, "-march=bdver1" and the Haifa scheduling flags must all be selected on the command line.

Testing
-------

Self compile ran with "-mdispatch-scheduler -fschedule-insns -fsched-pressure -O2". The dispatch scheduling flag was manually turned on in the self compile to exercise the new code. No new tests were added for this implementation. Make check of the i386 tests passes, with no difference in the number of failures with and without the dispatch flag.

ChangeLog
---------

2010-08-12  Reza Yazdani  <reza.yazdani@amd.com>

    * tm.texi.in (TARGET_SCHED_DISPATCH): New.
    (TARGET_SCHED_DISPATCH_DO): New.
    * tm.texi: Regenerated.
    * hooks.c (hook_bool_rtx_int_false): New.
    (hook_void_rtx_int): New.
    * hooks.h (hook_bool_rtx_int_false): New.
    (hook_void_rtx_int): New.
    * target.def (dispatch): Defined.
    (dispatch_do): Defined.
    * haifa-sched.c (ready_remove_first_dispatch): New.
    (number_in_ready): New.
    (get_ready_element): New.
    * sched-int.h (get_ready_element): Declared.
    (number_in_ready): Declared.
    (debug_ready_dispatch): Declared.
    (debug_dispatch_window): Declared.
    * i386.opt (-mdispatch-scheduler): Declared.
    (flag_dispatch_scheduling): Declared.
    * i386.c (has_dispatch): New.
    (get_mem_group): New.
    (is_cmp): New.
    (dispatch_violation): New.
    (is_branch): New.
    (is_prefetch): New.
    (init_window): New.
    (allocate_window): New.
    (init_dispatch_sched): New.
    (is_end_basic_block): New.
    (process_end_window): New.
    (allocate_next_window): New.
    (find_constant_1): New.
    (find_constant): New.
    (get_num_immediate): New.
    (has_immediate): New.
    (get_insn_path): New.
    (dispatch_group): New.
    (count_num_restricted): New.
    (fits_dispatch_window): New.
    (add_insn_window): New.
    (add_to_dispatch_window): New.
    (debug_dispatch_window_file): New.
    (debug_dispatch_window): New.
    (debug_insn_dispatch_info_file): New.
    (debug_ready_dispatch): New.
    (do_dispatch): New.
    (has_dispatch): New.

Reza Yazdani
--------------------------------------------------------------------------
Previous communications regarding this patch:

There were complaints that there are too many hooks in my implementation. I changed the interface so that only two hooks are used in the current implementation: one for boolean functions and one for action routines.

RY
 
--- On Fri, 8/20/10, Vladimir N. Makarov <vmakarov@redhat.com> wrote:

> From: Vladimir N. Makarov <vmakarov@redhat.com>
> Subject: Re: Patch for AMD Dispatch Scheduler
> To: "reza yazdani" <yazdani_reza@yahoo.com>
> Cc: gcc-patches@gcc.gnu.org, jh@suse.cz, ubizjak@gmail.com, sebpop@gmail.com
> Date: Friday, August 20, 2010, 8:29 AM
> On 08/12/2010 08:27 PM, reza yazdani
> wrote:
> > Dispatch scheduling is a new BD feature. It is
> composed of two parts: the scheduling part and the alignment
> part.
> > 
> > The scheduling part (this patch) arranges instructions
> to maximize the throughput of the hardware dispatcher. It
> makes sure dispatch widow boundaries are roughly observed.
> It is roughly, because the lengths of instructions, in
> number of bytes, are not known at the scheduling time. In
> x86 some instruction lengths may not be known until assembly
> time where information such as branch offsets are computed.
> Scheduling part is called once before register allocation
> and once after register allocation.
> > 
> > The alignment part (not in this patch) makes sure
> dispatch widows align at the correct boundaries.
> > 
> > Dispatch Scheduling is implemented as an extension to
> Haifa Scheduler pass. Scheduler is programed to follow
> x86-BD dispatching rules during the scheduling.
> > 
>   I thought about this patch for a long time. 
> The insn scheduler is
> very machine-dependent pass and it is hard to use the same
> model for
> so different targets.  The insn scheduler reached a
> point where it has
> too many hooks some of them is duplicated.  We need to
> do some work to
> decrease number of hooks and their renaming.  For
> example, code for
> add_to_dispatch_window call can be hidden in variable_issue
> or
> dispatch_init in issue_rate or
> dfa_{pre,post}_cycle_insn.  Of course,
> the work is not for you because it needs a lot of knowledge
> about GCC
> insn scheduler.
> 
>   As for the scheduler part of the patch.  I
> think it is ok for now.
> But if you used dfa pipeline hazard recognizer as Richard
> Henderson
> mentioned, you would not need function
> ready_remove_first_dispatch and
> you could use first cycle multi-pass insn scheduling
> (although it is
> not necessary for your target because constraints do not
> depend on the
> insn order in the window as for Itanium but it might change
> in future
> who knows).  You could also use modulo-scheduling
> which might be more
> important for OOO processors than insn scheduling. 
> You would not need
> a new option too for dispatch scheduling.  Writing for
> dfa description
> of a new processor would require nontraditional programming
> like
> defining new attributes for md insns and usage of them in
> dfa
> description.  I think it is too late for GCC4.6 to
> rewrite this all.
> But I think it is worth to be considered for next GCC
> release.  I guess you
> could avoid the current implementation if you discussed the
> approach with the GCC
> community before starting the implementation.
> 
>   On the other hand, the current implementation of dfa
> pipeline hazard recognizer
> has own disadvantage for x86/x86_64.  The recognizer
> is considered to be too big
> because of numerous automata.
> 
>   The scheduler part of patch is ok for the trunk.
> 
> Thanks.
> 
> 
> > A new command line flag –mdispatch-scheduler is
> defined. This option sets flag_dispatch_scheduling. To
> perform dispatch scheduling “-march=bdver1” and Haifa
> Scheduling flags must all be selected on the command line.
> > 
> 
> 
------------------------------------
From: "Richard Henderson" <rth@redhat.com>

On 08/12/2010 05:27 PM, reza yazdani wrote:
> The scheduling part (this patch) arranges instructions to maximize
> the throughput of the hardware dispatcher. It makes sure dispatch
> widow boundaries are roughly observed. It is roughly, because the
> lengths of instructions, in number of bytes, are not known at the
> scheduling time. In x86 some instruction lengths may not be known
> until assembly time where information such as branch offsets are
> computed. Scheduling part is called once before register allocation
> and once after register allocation.

I'm a bit confused how this "dispatch" scheduling is different
from other scheduling, and why it needs so many new hooks.  The
whole process smells very similar to "bundling" like we'd do on
ia64 for instance.

For instance, if I compare the structure of your new
ready_remove_first_dispatch function to choose_ready, I see
many similarities.  It sure looks like the multipass_dfa
scheduling hooks can be made to do what you want.

Can you explain what's fundamentally different about your bits?
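
For readers comparing the two functions, the selection policy of ready_remove_first_dispatch, as it appears in the haifa-sched.c hunk of the patch, can be modeled roughly as below. This is a simplified sketch: the toy_ready_insn type and toy_pick_ready are hypothetical names, the ready list is a plain array with index 0 as the head, and the patch's checks for unrecognized or inactive insns are left out.

```c
#include <stdbool.h>

/* Hypothetical per-insn summary of the target's answers.  */
struct toy_ready_insn
{
  bool fits_window;	/* FITS_DISPATCH_WINDOW would return true.  */
  bool is_cmp;		/* IS_CMP would return true.  */
};

/* Return the index of the insn to issue next.  N must be at least 1.
   VIOLATION models the DISPATCH_VIOLATION query.  */
static int
toy_pick_ready (const struct toy_ready_insn *ready, int n, bool violation)
{
  int i;

  /* A single candidate, or a head that fits, is taken as is.  */
  if (n == 1 || ready[0].fits_window)
    return 0;

  /* Otherwise prefer the first later candidate that fits.  */
  for (i = 1; i < n; i++)
    if (ready[i].fits_window)
      return i;

  /* Nothing fits.  If a violation is already recorded, keep the
     scheduler's original order.  */
  if (violation)
    return 0;

  /* Else fall back to the first compare, if there is one.  */
  for (i = 1; i < n; i++)
    if (ready[i].is_cmp)
      return i;

  return 0;
}
```

The model makes the structure visible: two linear scans over the ready list with a fallback to the head, which is the part that resembles choose_ready.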

---------------------------------------------------------------
From:"Uros Bizjak" <ubizjak@gmail.com>

On Fri, Aug 13, 2010 at 2:27 AM, reza yazdani <yazdani_reza@yahoo.com> wrote:
> Dispatch scheduling is a new BD feature. It is composed of two parts: the scheduling part and the alignment part.
>
> The scheduling part (this patch) arranges instructions to maximize the throughput of the hardware dispatcher. It makes sure dispatch widow boundaries are roughly observed. It is roughly, because the lengths of instructions, in number of bytes, are not known at the scheduling time. In x86 some instruction lengths may not be known until assembly time where information such as branch offsets are computed. Scheduling part is called once before register allocation and once after register allocation.
>
> The alignment part (not in this patch) makes sure dispatch widows align at the correct boundaries.
>
> Dispatch Scheduling is implemented as an extension to Haifa Scheduler pass. Scheduler is programed to follow x86-BD dispatching rules during the scheduling.
>
> A new command line flag –mdispatch-scheduler is defined. This option sets flag_dispatch_scheduling. To perform dispatch scheduling “-march=bdver1” and Haifa Scheduling flags must all be selected on the command line.
>
> Testing
> -------
>
> Self compile ran with “–mdispatch-scheduling –O3”. No new test added for this implementation. Make check of i386 tests passes. No difference in the number of failures with and without the dispatch flag.

+/* Number of allowable groups in a dispatch window.  It is an array
+   indexed by dispatch_group enum.  100 is used as a big number,
+   because the number of these kind of operations does not have any
+   effect in dispatch window, but we need them for other reasons in
+   the table.  */
+static int num_allowable_groups[disp_last] =
+{
+  0, 2, 1, 1, 2, 4, 4, 2, 1, BIG, BIG
+};

Wouldn't some invalid number, like -1, fit there?
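
A minimal sketch of that suggestion, using an explicit sentinel instead of a large count. The group names, table contents, and helper below are hypothetical; the patch's dispatch_group enum is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical group names, for illustration only.  */
enum toy_group { toy_no_group, toy_load, toy_store, toy_other, toy_last };

/* Sentinel meaning "this group is not limited by the window".  */
#define TOY_NO_LIMIT (-1)

static const int toy_allowable[toy_last] = { 0, 2, 1, TOY_NO_LIMIT };

/* Return true if one more insn of GROUP fits when COUNT insns of that
   group are already in the window.  */
static bool
toy_group_fits (enum toy_group group, int count)
{
  int limit = toy_allowable[group];

  return limit == TOY_NO_LIMIT || count < limit;
}
```

The sentinel has to be tested explicitly before any comparison, which is the cost of this approach compared with a large positive number.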

+/* Returns decode type on an AMDFAM10 machine which can be
+   "DIRECT", "DOUBLE", "VECTOR" which are decoded
+   to 1, 2 or more than 2 micro-ops respectively for INSN.  */
+static enum attr_amdfam10_decode
+dispatch_group_amdfam10_decode (rtx insn)

Space between comment and function declaration.

+/* Return true if INSN is a prefetch instruction.  */
+static bool
+is_prefetch (rtx insn)
+{
+  return ((strcmp (GET_RTX_NAME (GET_CODE (insn)), "prefetch_sse") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_sse_rex") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_3dnow") == 0)
+         || (strcmp (GET_RTX_NAME (GET_CODE (insn)),
+                     "prefetch_3dnow_rex") == 0));
+}

No! Please introduce "prefetch" type and handle it elsewhere instead
of the call to is_prefetch. The names already changed in the
mainline...

+static void
+process_end_window (void)
+{
+  gcc_assert (dispatch_window_list->num_insn <= MAX_INSN);
+  if (dispatch_window_list->next)
+    {
+      gcc_assert (dispatch_window_list1->num_insn <= MAX_INSN);
+      gcc_assert (dispatch_window_list->window_size +
dispatch_window_list1->window_size <= 48);
+      init_window (1);
+    }
+  init_window (0);
+}

Watch for line lengths (there are multiple violations in the patch).

+static void
+find_con (const_rtx in_rtx, int *imm, int *imm32, int *imm64)

find_constant, please?

+  code = GET_CODE (in_rtx);
+  if (code == CONST_INT)
+    {
+      (*imm)++;
+      (*imm32)++;
+    }
+  else if (code == CONST_DOUBLE)
+    {
+      (*imm)++;
+      (*imm64)++;
+    }

This will work in the wrong way on 64bit hosts, where CONST_INT also
covers 64bit immediates. Try to compile:

long long test (long long a)
{
  return a + 0x1122334455667788ll;
}

#(insn:TI 6 3 23 imm.c:2 (set (reg:DI 0 ax [61])
#        (const_int 1234605616436508552 [0x1122334455667788])) 89
{*movdi_1_rex64} (expr_list:REG_EQUIV (const_int 1234605616436508552
[0x1122334455667788])
#        (nil)))
    movabsq    $1234605616436508552, %rax    # 6    *movdi_1_rex64/3    [length = 10]
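
One possible way to avoid depending on the rtx code is to classify an integer immediate by its value. The sketch below assumes that a 32-bit immediate is one that fits in a sign-extended 32-bit field; toy_classify_immediate and the enum are hypothetical names and this is not what the patch does.

```c
#include <stdint.h>

enum toy_imm_kind { TOY_IMM32, TOY_IMM64 };

/* Classify VALUE by the immediate width needed to encode it, assuming
   32-bit immediates are sign-extended.  */
static enum toy_imm_kind
toy_classify_immediate (int64_t value)
{
  if (value >= INT32_MIN && value <= INT32_MAX)
    return TOY_IMM32;
  return TOY_IMM64;
}
```

With this classification, the constant 0x1122334455667788 from the example above counts as a 64-bit immediate regardless of how the host represents it.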

+static int
+get_num_imm (rtx insn, int *imm, int *imm32, int *imm64)

get_num_immediates...
Also, this function should be rewritten to use for_each_rtx to call
find_con, see many examples in gcc source.
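
The callback style being asked for looks roughly like the toy model below. It is not the GCC API: the tree type, toy_for_each, and the counter struct are hypothetical, and the real for_each_rtx has extra conventions (for example around skipping sub-expressions) that are not modeled.

```c
#include <stddef.h>

enum toy_code { TOY_PLUS, TOY_REG, TOY_CONST_INT, TOY_CONST_DOUBLE };

/* Hypothetical expression node with up to two operands.  */
struct toy_rtx
{
  enum toy_code code;
  struct toy_rtx *op[2];	/* NULL for missing operands.  */
};

typedef int (*toy_rtx_fn) (struct toy_rtx *x, void *data);

/* Call F on X and on each sub-expression of X.  Stop and return the
   first nonzero result of F, or 0 if the whole tree was visited.  */
static int
toy_for_each (struct toy_rtx *x, toy_rtx_fn f, void *data)
{
  int i, result;

  if (x == NULL)
    return 0;
  result = f (x, data);
  if (result != 0)
    return result;
  for (i = 0; i < 2; i++)
    {
      result = toy_for_each (x->op[i], f, data);
      if (result != 0)
	return result;
    }
  return 0;
}

struct toy_imm_counts { int imm; };

/* Callback: count constant operands into DATA.  */
static int
toy_count_constant (struct toy_rtx *x, void *data)
{
  struct toy_imm_counts *counts = data;

  if (x->code == TOY_CONST_INT || x->code == TOY_CONST_DOUBLE)
    counts->imm++;
  return 0;
}
```

The point of the pattern is that the traversal lives in one shared walker and the per-node logic shrinks to a small callback.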

+static bool
+has_imm (rtx insn)

has_immediate

+/* Get bytes length of INSN.
+   This function is very similar to the static function min_insn_size
+   in i386.c.  The main difference is the use of get_attr_length.  */
+
+static int
+get_insn_length (rtx insn)

Huh? min_insn_size also uses get_attr_length. This function is almost
exact copy of min_insn_size (minus some early discards and "important
cases" that are present in min_insn_size and not here). Please remove
this function.

+static enum insn_path
+get_insn_path (rtx insn)
+{
+  enum attr_amdfam10_decode path = dispatch_group_amdfam10_decode (insn);
+
+  if ((int)path == 0)
+    return path_single;
+
+  if ((int)path == 1)
+    return path_double;
+
+  return path_multi;

Just inline dispatch_group_amdfam10_decode (this is the only user).
Also, don't cast attributes, you can use
AMDFAM10_DECODE_{DIRECT,VECTOR,DOUBLE} defines.
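
A sketch of the mapping written against named enumerators rather than integer casts. The enums here are hypothetical stand-ins for the generated attribute enum and for the patch's insn_path; the numeric order of the enumerators is an assumption, which is exactly why the code should not depend on it.

```c
/* Hypothetical stand-in for the generated decode attribute enum.  */
enum toy_decode { TOY_DECODE_DIRECT, TOY_DECODE_VECTOR, TOY_DECODE_DOUBLE };

/* Hypothetical stand-in for the patch's insn_path enum.  */
enum toy_insn_path
{
  toy_no_path,
  toy_path_single,
  toy_path_double,
  toy_path_multi
};

/* Map a decode type to a dispatch path by name, so the result does not
   change if the enumerators are ever reordered.  */
static enum toy_insn_path
toy_get_insn_path (enum toy_decode decode)
{
  switch (decode)
    {
    case TOY_DECODE_DIRECT:
      return toy_path_single;
    case TOY_DECODE_DOUBLE:
      return toy_path_double;
    case TOY_DECODE_VECTOR:
      return toy_path_multi;
    }
  return toy_path_multi;
}
```

Note that with the enumerator order assumed here, a test such as (int) path == 1 would select the vector case rather than the double case.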

+static int
+count_num_restricted (rtx insn, dispatch_windows *window_list)

Watch line lengths in this function!

+static void
+print_dispatch_window (FILE *file, int window_num)

Please also make this a DEBUG_FUNCTION, the convention is to name it
"debug_dispatch_window_file".

+  fprintf (file, "  num_imm = %d, num_imm_32 = %d, num_imm_64 = %d,\
+ imm_size = %d\n",
+          list->num_imm, list->num_imm_32, list->num_imm_64, list->imm_size);

No multiline strings... If it can't be split, it is tolerable for the
string to go over the 72-character limit.
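
In C, the usual alternative to a backslash-continued string is to rely on adjacent string literals being concatenated by the compiler. A small sketch follows; the struct and the function name are hypothetical, and snprintf into a buffer is used only to keep the example self-contained.

```c
#include <stdio.h>

/* Hypothetical subset of the window fields printed by the patch.  */
struct toy_window
{
  int num_imm;
  int num_imm_32;
  int num_imm_64;
  int imm_size;
};

/* Format W into BUF.  The two adjacent literals form one format string,
   so no line continuation is needed inside the string.  */
static int
toy_format_window (char *buf, size_t size, const struct toy_window *w)
{
  return snprintf (buf, size,
                   "  num_imm = %d, num_imm_32 = %d, num_imm_64 = %d,"
                   " imm_size = %d\n",
                   w->num_imm, w->num_imm_32, w->num_imm_64, w->imm_size);
}
```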

+/* Print to stderr a dispatch window.  */

Please print to stdout, as all other debug functions do.

+DEBUG_FUNCTION void
+debug_print_dispatch_window (int window_num)

debug_dispatch_window

+static void
+print_insn_dispatch_info (FILE *file, rtx insn)

Similar to above, make this a DEBUG_FUNCTION and rename it to
debug_insn_dispatch_info_file. Also, watch for multiline strings.

+DEBUG_FUNCTION void
+debug_print_insn_dispatch_info (rtx insn)
+{
+  print_insn_dispatch_info (stderr, insn);
+}

As above, print to stdout and rename this function to debug_insn_dispatch_info.

Thanks,
Uros.
Vladimir Makarov - Aug. 31, 2010, 8:23 p.m.
On 08/24/2010 06:02 PM, reza yazdani wrote:
> Dispatch scheduling is a new BD feature. It is composed of two parts: the scheduling part and the alignment part.
>
> The scheduling part (this patch) arranges instructions to maximize the throughput of the hardware dispatcher. It makes sure dispatch widow boundaries are roughly observed. It is roughly, because the lengths of instructions, in number of bytes, are not known at the scheduling time. In x86 some instruction lengths may not be known until assembly time where information such as branch offsets are computed. Scheduling part is called once before register allocation and once after register allocation.
>
> The alignment part (not in this patch) makes sure dispatch widows align at the correct boundaries.
>
> Dispatch Scheduling is implemented as an extension to Haifa Scheduler pass. Scheduler is programed to follow x86-BD dispatching rules during the scheduling.
>
> 2 GCC hook functions are used to communicate from the machine independent part to the machine dependent parts of the scheduler.
>
> A new command line flag –mdispatch-scheduler is defined. This option sets flag_dispatch_scheduling. To perform dispatch scheduling “-march=bdver1” and Haifa Scheduling flags must all be selected on the command line.
>
> Testing
> -------
>
> Self compile ran with “–mdispatch-scheduling -fschedule-insns -fsched-pressure –O2". Dispatch scheduling flag was manually set on in the self compile to exercise the new code. No new test added for this implementation. Make check of i386 tests passes. No difference in the number of failures with and without the dispatch flag.
>
> ChangeLog
> ---------
>
> 2010-08-12  Reza Yazdani<reza.yazdani@amd.com>
>
>      * tm.texi.in (TARGET_SCHED_DISPATCH): New.
>      (TARGET_SCHED_DISPATCH_DO): New.
>      * tm.texi: Regererated.
>      * hooks.c (hook_bool_rtx_int_false): New.
>      (hook_void_rtx_int): New.
>      * hooks.h (hook_bool_rtx_int_false): New.
>      (hook_void_rtx_int): New.
>      * target.def (dispatch): Defined.
>      (dispatch_do): Defined.
>      * haifa-sched.c (ready_remove_first_dispatch): New.
>      (number_in_ready): New.
>      (get_ready_element): New.
>      * sched-init.h (get_ready_element): Declared.
>      (number_in_ready): Declared.
>      (debug_ready_dispatch): Declared.
>      (debug_dispatch_window): Declared.
>      * i386.opt (-mdispatch-scheduler): Declared.
>      (flag_dispatch_scheduling): Declared.
>      * i386.c (has_dispatch): New.
>      (get_mem_group): New.
>      (is_cmp): New.
>      (dispatch_violation): New.
>      (is_branch): New.
>      (is_prefetch): New.
>      (init_window): New.
>      (allocate_window): New.
>      (init_dispatch_sched): New.
>      (is_end_basic_block): New.
>      (process_end_window): New.
>      (allocate_next_window): New.
>      (find_constant_1): New.
>      (find_constant): New.
>      (get_num_immediate): New.
>      (has_immediate): New.
>      (get_insn_path): New.
>      (dispatch_group): New.
>      (count_num_restricted): New.
>      (fits_dispatch_window): New.
>      (add_insn_window): New.
>      (add_to_dispatch_window): New.
>      (debug_dispatch_window_file): New.
>      (debug_dispatch_window): New.
>      (debug_insn_dispatch_info_file): New.
>      (debug_ready_dispatch): New.
>      (do_dispatch): New.
>      (has_dispatch): New.
>
> Reza Yazdani
> --------------------------------------------------------------------------
> Previous communications regarding this patch:
>
> There were complains that there are too many hooks in my implementation. I changed the interface and only two hooks are used in the current implementation. One for boolean functions and one for action routines.
>
Thanks for addressing this issue.  The scheduler part (haifa-sched.c) is
ok.  I'd move the external declarations of debug_ready_dispatch and
debug_dispatch_window from sched-int.h to i386.h because they are
defined there.  Sched-int.h is for definitions of machine-independent
parts of the insn scheduler.
reza yazdani - Sept. 7, 2010, 9:51 p.m.
Hello All,

I received Vladimir's response and moved the externs he mentioned to i386.h in my own copy.

Is the rest of the code, mostly in i386.[ch], okay to be checked into the trunk after the above change?

Please let me know.

Thanks,
reza

Uros Bizjak - Sept. 8, 2010, 6:09 a.m.
On Tue, Sep 7, 2010 at 11:51 PM, reza yazdani <yazdani_reza@yahoo.com> wrote:

> I receive Vladimir's repose and moved the extern's he mentioned to i386.h in my own copy.
>
> Is the rest of the code, mostly in i386.[ch], okay to be checked into the trunk after the above change?
>
> Please let me know.

One question for following code:

+/* Return true if insn is a compare instruction.  */
+
+static bool
+is_cmp (rtx insn)
+{
+  enum attr_type type;
+  type = get_attr_type (insn);
+  return (type == TYPE_TEST || type == TYPE_ICMP || type == TYPE_FCMP);
+}

We have also SSE compare instructions (TYPE_SSECOMI) and several x87
compare sequences involving fcmp/fnstsw of TYPE_MULTI. Perhaps you
should check for GET_CODE (PATTERN (insn)) == COMPARE here?

The x86 part is otherwise OK, but please wait a day or two for
eventual comments from other maintainers.

Thanks,
Uros.

Patch

Index: doc/tm.texi
===================================================================
--- doc/tm.texi	(revision 163494)
+++ doc/tm.texi	(working copy)
@@ -6759,6 +6759,16 @@  bound will be used in case this hook is 
 of instructions divided by the issue rate.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_SCHED_DISPATCH (rtx @var{insn}, int @var{x})
+This hook is called by Haifa Scheduler.  It returns true if dispatch scheduling
+is supported in hardware and the condition specified in the parameter is true.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_SCHED_DISPATCH_DO (rtx @var{insn}, int @var{x})
+This hook is called by Haifa Scheduler.  It performs the operation specified
+in its second parameter.
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in	(revision 163494)
+++ doc/tm.texi.in	(working copy)
@@ -6759,6 +6759,16 @@  bound will be used in case this hook is 
 of instructions divided by the issue rate.
 @end deftypefn
 
+@hook TARGET_SCHED_DISPATCH
+This hook is called by Haifa Scheduler.  It returns true if dispatch scheduling
+is supported in hardware and the condition specified in the parameter is true.
+@end deftypefn
+
+@hook TARGET_SCHED_DISPATCH_DO
+This hook is called by Haifa Scheduler.  It performs the operation specified
+in its second parameter.
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
Index: hooks.c
===================================================================
--- hooks.c	(revision 163494)
+++ hooks.c	(working copy)
@@ -340,3 +340,18 @@  hook_tree_const_tree_null (const_tree t 
 {
   return NULL;
 }
+
+/* Generic hook that takes a rtx and an int and returns a bool.  */
+
+bool
+hook_bool_rtx_int_false (rtx insn ATTRIBUTE_UNUSED, int mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Generic hook that takes a rtx and an int and returns void.  */
+
+void
+hook_void_rtx_int (rtx insn ATTRIBUTE_UNUSED, int mode ATTRIBUTE_UNUSED)
+{
+}
Index: hooks.h
===================================================================
--- hooks.h	(revision 163494)
+++ hooks.h	(working copy)
@@ -46,6 +46,7 @@  extern bool hook_bool_const_tree_hwi_hwi
 							  HOST_WIDE_INT,
 							  const_tree);
 extern bool hook_bool_rtx_false (rtx);
+extern bool hook_bool_rtx_int_false (rtx, int);
 extern bool hook_bool_uintp_uintp_false (unsigned int *, unsigned int *);
 extern bool hook_bool_rtx_int_int_intp_bool_false (rtx, int, int, int *, bool);
 extern bool hook_bool_size_t_constcharptr_int_true (size_t, const char *, int);
@@ -55,6 +56,7 @@  extern bool hook_bool_tree_bool_false (t
 
 extern void hook_void_void (void);
 extern void hook_void_constcharptr (const char *);
+extern void hook_void_rtx_int (rtx, int);
 extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
 extern void hook_void_tree (tree);
 extern void hook_void_tree_treeptr (tree, tree *);
Index: target.def
===================================================================
--- target.def	(revision 163494)
+++ target.def	(working copy)
@@ -761,6 +761,24 @@  DEFHOOK
  "",
  int, (struct ddg *g), NULL)
 
+/* The following member value is a function that initializes dispatch
+   scheduling and adds instructions to dispatch window according to its
+   parameters.  */
+DEFHOOK
+(dispatch_do,
+"",
+void, (rtx insn, int x),
+hook_void_rtx_int)
+
+/* The following member value is a function that returns true if
+   dispatch scheduling is supported in hardware and condition passed
+   as the second parameter is true.  */
+DEFHOOK
+(dispatch,
+"",
+bool, (rtx insn, int x),
+hook_bool_rtx_int_false)
+
 HOOK_VECTOR_END (sched)
 
 /* Functions relating to vectorization.  */
Index: haifa-sched.c
===================================================================
--- haifa-sched.c	(revision 163494)
+++ haifa-sched.c	(working copy)
@@ -532,6 +532,7 @@  static void extend_h_i_d (void);
 
 static void ready_add (struct ready_list *, rtx, bool);
 static rtx ready_remove_first (struct ready_list *);
+static rtx ready_remove_first_dispatch (struct ready_list *ready);
 
 static void queue_to_ready (struct ready_list *);
 static int early_queue_to_ready (state_t, struct ready_list *);
@@ -2636,13 +2637,15 @@  choose_ready (struct ready_list *ready, 
     }
 
   lookahead = 0;
-
   if (targetm.sched.first_cycle_multipass_dfa_lookahead)
     lookahead = targetm.sched.first_cycle_multipass_dfa_lookahead ();
   if (lookahead <= 0 || SCHED_GROUP_P (ready_element (ready, 0))
       || DEBUG_INSN_P (ready_element (ready, 0)))
     {
-      *insn_ptr = ready_remove_first (ready);
+      if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
+	*insn_ptr = ready_remove_first_dispatch (ready);
+      else
+	*insn_ptr = ready_remove_first (ready);
       return 0;
     }
   else
@@ -3140,6 +3143,10 @@  schedule_block (basic_block *target_bb)
 						       last_scheduled_insn);
 
 	  move_insn (insn, last_scheduled_insn, current_sched_info->next_tail);
+
+	  if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
+	    targetm.sched.dispatch_do (insn, ADD_TO_DISPATCH_WINDOW);
+
 	  reemit_notes (insn);
 	  last_scheduled_insn = insn;
 
@@ -3364,8 +3371,11 @@  sched_init (void)
   flag_schedule_speculative_load = 0;
 #endif
 
+  if (targetm.sched.dispatch (NULL_RTX, IS_DISPATCH_ON))
+    targetm.sched.dispatch_do (NULL_RTX, DISPATCH_INIT);
   sched_pressure_p = (flag_sched_pressure && ! reload_completed
 		      && common_sched_info->sched_pass_id == SCHED_RGN_PASS);
+
   if (sched_pressure_p)
     ira_setup_eliminable_regset ();
 
@@ -5557,4 +5567,69 @@  sched_emit_insn (rtx pat)
   return insn;
 }
 
+/* This function returns a candidate satisfying dispatch constraints from
+   the ready list.  */
+
+static rtx
+ready_remove_first_dispatch (struct ready_list *ready)
+{
+  int i;
+  rtx insn = ready_element (ready, 0);
+
+  if (ready->n_ready == 1
+      || INSN_CODE (insn) < 0
+      || !INSN_P (insn)
+      || !active_insn_p (insn)
+      || targetm.sched.dispatch (insn, FITS_DISPATCH_WINDOW))
+    return ready_remove_first (ready);
+
+  for (i = 1; i < ready->n_ready; i++)
+    {
+      insn = ready_element (ready, i);
+      if (INSN_CODE (insn) < 0
+	  || !INSN_P (insn)
+	  || !active_insn_p (insn))
+	continue;
+      if (targetm.sched.dispatch (insn, FITS_DISPATCH_WINDOW))
+	{
+	  /* Return ith element of ready.  */
+	  insn = ready_remove (ready, i);
+	  return insn;
+	}
+    }
+
+  if (targetm.sched.dispatch (NULL_RTX, DISPATCH_VIOLATION))
+    return ready_remove_first (ready);
+
+  for (i = 1; i < ready->n_ready; i++)
+    {
+      insn = ready_element (ready, i);
+      if (INSN_CODE (insn) < 0
+	  || !INSN_P (insn)
+	  || !active_insn_p (insn))
+	continue;
+      if (targetm.sched.dispatch (insn, IS_CMP))
+	  /* Return ith element of ready.  */
+	  return ready_remove (ready, i);
+    }
+
+  return ready_remove_first (ready);
+}
+
+/* Get number of ready insn in the ready list.  */
+
+int
+number_in_ready (void)
+{
+  return ready.n_ready;
+}
+
+/* Get the Ith element of the ready list.  */
+
+rtx
+get_ready_element (int i)
+{
+  return ready_element (&ready, i);
+}
+
 #endif /* INSN_SCHEDULING */
Index: sched-int.h
===================================================================
--- sched-int.h	(revision 163494)
+++ sched-int.h	(working copy)
@@ -1477,6 +1477,13 @@  sd_iterator_next (sd_iterator_def *it_pt
        sd_iterator_cond (&(ITER), &(DEP));			\
        sd_iterator_next (&(ITER)))
 
+#define IS_DISPATCH_ON 1
+#define IS_CMP 2
+#define DISPATCH_VIOLATION 3
+#define FITS_DISPATCH_WINDOW 4
+#define DISPATCH_INIT 5
+#define ADD_TO_DISPATCH_WINDOW 6
+
 extern int sd_lists_size (const_rtx, sd_list_types_def);
 extern bool sd_lists_empty_p (const_rtx, sd_list_types_def);
 extern void sd_init_insn (rtx);
@@ -1496,5 +1503,9 @@  extern void sd_debug_lists (rtx, sd_list
 extern void print_insn (char *, const_rtx, int);
 extern void print_pattern (char *, const_rtx, int);
 extern void print_value (char *, const_rtx, int);
+extern rtx get_ready_element (int i);
+extern int number_in_ready (void);
+extern void debug_ready_dispatch (void);
+extern void debug_dispatch_window (int);
 
 #endif /* GCC_SCHED_INT_H */
Index: config/i386/i386.opt
===================================================================
--- config/i386/i386.opt	(revision 163494)
+++ config/i386/i386.opt	(working copy)
@@ -250,6 +250,11 @@  Enable automatic generation of fused flo
 if the ISA supports such instructions.  The -mfused-madd option is on by
 default.
 
+mdispatch-scheduler
+Target RejectNegative Var(flag_dispatch_scheduler)
+Do dispatch scheduling if processor is bdver1 and Haifa scheduling
+is selected.
+
 ;; ISA support
 
 m32
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 163494)
+++ config/i386/i386.c	(working copy)
@@ -55,7 +55,7 @@  along with GCC; see the file COPYING3.  
 #include "cselib.h"
 #include "debug.h"
 #include "dwarf2out.h"
-
+#include "sched-int.h"
 static rtx legitimize_dllimport_symbol (rtx, bool);
 
 #ifndef CHECK_STACK_LIMIT
@@ -31465,6 +31465,797 @@  ix86_enum_va_list (int idx, const char *
   return 0;
 }
 
+#undef TARGET_SCHED_DISPATCH
+#define TARGET_SCHED_DISPATCH has_dispatch
+#undef TARGET_SCHED_DISPATCH_DO
+#define TARGET_SCHED_DISPATCH_DO do_dispatch
+
+/* The size of the dispatch window is the total number of bytes of
+   object code allowed in a window.  */
+#define DISPATCH_WINDOW_SIZE 16
+
+/* Number of dispatch windows considered for scheduling.  */
+#define MAX_DISPATCH_WINDOWS 3
+
+/* Maximum number of instructions in a window.  */
+#define MAX_INSN 4
+
+/* Maximum number of immediate operands in a window.  */
+#define MAX_IMM 4
+
+/* Maximum number of immediate bits allowed in a window.  */
+#define MAX_IMM_SIZE 128
+
+/* Maximum number of 32 bit immediates allowed in a window.  */
+#define MAX_IMM_32 4
+
+/* Maximum number of 64 bit immediates allowed in a window.  */
+#define MAX_IMM_64 2
+
+/* Maximum total of loads or prefetches allowed in a window.  */
+#define MAX_LOAD 2
+
+/* Maximum total of stores allowed in a window.  */
+#define MAX_STORE 1
+
+#undef BIG
+#define BIG 100
+
+
+/* Dispatch groups.  Instructions that affect the mix in a dispatch window.  */
+enum dispatch_group {
+  disp_no_group = 0,
+  disp_load,
+  disp_store,
+  disp_load_store,
+  disp_prefetch,
+  disp_imm,
+  disp_imm_32,
+  disp_imm_64,
+  disp_branch,
+  disp_cmp,
+  disp_jcc,
+  disp_last
+};
+
+/* Number of allowable groups in a dispatch window.  It is an array
+   indexed by dispatch_group enum.  BIG (100) is used as a large number,
+   because the number of these kinds of operations does not have any
+   effect on the dispatch window, but we need them for other reasons in
+   the table.  */
+static unsigned int num_allowable_groups[disp_last] = {
+  0, 2, 1, 1, 2, 4, 4, 2, 1, BIG, BIG
+};
+
+char group_name[disp_last + 1][16] = {
+  "disp_no_group", "disp_load", "disp_store", "disp_load_store",
+  "disp_prefetch", "disp_imm", "disp_imm_32", "disp_imm_64",
+  "disp_branch", "disp_cmp", "disp_jcc", "disp_last"
+};
+
+/* Instruction path.  */
+enum insn_path {
+  no_path = 0,
+  path_single, /* Single micro op.  */
+  path_double, /* Double micro op.  */
+  path_multi,  /* Instructions with more than 2 micro ops.  */
+  last_path
+};
+
+/* sched_insn_info defines a window to the instructions scheduled in
+   the basic block.  It contains a pointer to the insn_info table and
+   the instruction scheduled.
+
+   Windows are allocated for each basic block and are linked
+   together.  */
+typedef struct sched_insn_info_s {
+  rtx insn;
+  enum dispatch_group group;
+  enum insn_path path;
+  int byte_len;
+  int imm_bytes;
+} sched_insn_info;
+
+/* Linked list of dispatch windows.  This is a doubly linked list of
+   the dispatch windows of a basic block.  Each entry records the
+   number of uops in the window and the total number of
+   instructions and of bytes in the object code for this dispatch
+   window.  */
+typedef struct dispatch_windows_s {
+  int num_insn;            /* Number of insn in the window.  */
+  int num_uops;            /* Number of uops in the window.  */
+  int window_size;         /* Number of bytes in the window.  */
+  int window_num;          /* Window number, either 0 or 1.  */
+  int num_imm;             /* Number of immediates in the window.  */
+  int num_imm_32;          /* Number of 32 bit immediates in the window.  */
+  int num_imm_64;          /* Number of 64 bit immediates in the window.  */
+  int imm_size;            /* Total size of immediates in the window.  */
+  int num_loads;           /* Total memory loads in the window.  */
+  int num_stores;          /* Total memory stores in the window.  */
+  int violation;           /* Violation exists in window.  */
+  sched_insn_info *window; /* Pointer to the window.  */
+  struct dispatch_windows_s *next;
+  struct dispatch_windows_s *prev;
+} dispatch_windows;
+
+/* Immediate values used in an insn.  */
+typedef struct imm_info_s
+  {
+    int imm;
+    int imm32;
+    int imm64;
+  } imm_info;
+
+static dispatch_windows *dispatch_window_list;
+static dispatch_windows *dispatch_window_list1;
+
+/* Get dispatch group of insn.  */
+
+static enum dispatch_group
+get_mem_group (rtx insn)
+{
+  enum attr_memory memory;
+
+  if (INSN_CODE (insn) < 0)
+    return disp_no_group;
+  memory = get_attr_memory (insn);
+  if (memory == MEMORY_STORE)
+    return disp_store;
+
+  if (memory == MEMORY_LOAD)
+    return disp_load;
+
+  if (memory == MEMORY_BOTH)
+    return disp_load_store;
+
+  return disp_no_group;
+}
+
+/* Return true if insn is a compare instruction.  */
+
+static bool
+is_cmp (rtx insn)
+{
+  enum attr_type type;
+  type = get_attr_type (insn);
+  return (type == TYPE_TEST || type == TYPE_ICMP || type == TYPE_FCMP);
+}
+
+/* Return true if a dispatch violation was encountered.  */
+
+static bool
+dispatch_violation (void)
+{
+  if (dispatch_window_list->next)
+    return dispatch_window_list->next->violation;
+  return dispatch_window_list->violation;
+}
+
+/* Return true if insn is a branch instruction.  */
+
+static bool
+is_branch (rtx insn)
+{
+  return (CALL_P (insn) || JUMP_P (insn));
+}
+
+/* Return true if insn is a prefetch instruction.  */
+
+static bool
+is_prefetch (rtx insn)
+{
+  return NONJUMP_INSN_P (insn) && GET_CODE (PATTERN (insn)) == PREFETCH;
+}
+
+/* This function initializes a dispatch window and the list container holding a
+   pointer to the window.  */
+
+static void
+init_window (int window_num)
+{
+  int i;
+  dispatch_windows *new_list;
+
+  if (window_num == 0)
+    new_list = dispatch_window_list;
+  else
+    new_list = dispatch_window_list1;
+
+  new_list->num_insn = 0;
+  new_list->num_uops = 0;
+  new_list->window_size = 0;
+  new_list->next = NULL;
+  new_list->prev = NULL;
+  new_list->window_num = window_num;
+  new_list->num_imm = 0;
+  new_list->num_imm_32 = 0;
+  new_list->num_imm_64 = 0;
+  new_list->imm_size = 0;
+  new_list->num_loads = 0;
+  new_list->num_stores = 0;
+  new_list->violation = false;
+
+  for (i = 0; i < MAX_INSN; i++)
+    {
+      new_list->window[i].insn = NULL;
+      new_list->window[i].group = disp_no_group;
+      new_list->window[i].path = no_path;
+      new_list->window[i].byte_len = 0;
+      new_list->window[i].imm_bytes = 0;
+    }
+  return;
+}
+
+/* This function allocates a dispatch window and the list container
+   holding a pointer to the window.  Initialization is left to init_window.  */
+
+static dispatch_windows *
+allocate_window (void)
+{
+  dispatch_windows *new_list = XNEW (struct dispatch_windows_s);
+  new_list->window = XNEWVEC (struct sched_insn_info_s, MAX_INSN + 1);
+
+  return new_list;
+}
+
+/* This routine initializes the dispatch scheduling information.  It
+   initiates building dispatch scheduler tables and constructs the
+   first dispatch window.  */
+
+static void
+init_dispatch_sched (void)
+{
+  /* Allocate a dispatch list and a window.  */
+  dispatch_window_list = allocate_window ();
+  dispatch_window_list1 = allocate_window ();
+  init_window (0);
+  init_window (1);
+}
+
+/* This function returns true if a branch is detected.  End of a basic block
+   does not have to be a branch, but here we assume only branches end a
+   window.  */
+
+static bool
+is_end_basic_block (enum dispatch_group group)
+{
+  return group == disp_branch;
+}
+
+/* This function is called when the end of a window processing is reached.  */
+
+static void
+process_end_window (void)
+{
+  gcc_assert (dispatch_window_list->num_insn <= MAX_INSN);
+  if (dispatch_window_list->next)
+    {
+      gcc_assert (dispatch_window_list1->num_insn <= MAX_INSN);
+      gcc_assert (dispatch_window_list->window_size + dispatch_window_list1->window_size <= 48);
+      init_window (1);
+    }
+  init_window (0);
+}
+
+/* Allocates a new dispatch window and adds it to WINDOW_LIST.
+   WINDOW_NUM is either 0 or 1.  A maximum of two windows are generated
+   for 48 bytes of instructions.  Note that these are not hardware
+   dispatch windows, whose size is DISPATCH_WINDOW_SIZE.  */
+
+static dispatch_windows *
+allocate_next_window (int window_num)
+{
+  if (window_num == 0)
+    {
+      if (dispatch_window_list->next)
+	init_window (1);
+      init_window (0);
+      return dispatch_window_list;
+    }
+
+  dispatch_window_list->next = dispatch_window_list1;
+  dispatch_window_list1->prev = dispatch_window_list;
+
+  return dispatch_window_list1;
+}
+
+/* Increment the number of immediate operands of an instruction.  */
+
+static int
+find_constant_1 (rtx *in_rtx, imm_info *imm_values)
+{
+  if (*in_rtx == 0)
+    return 0;
+
+  switch (GET_CODE (*in_rtx))
+    {
+    case CONST:
+    case SYMBOL_REF:
+    case CONST_INT:
+      (imm_values->imm)++;
+      if (x86_64_immediate_operand (*in_rtx, SImode))
+	(imm_values->imm32)++;
+      else
+	(imm_values->imm64)++;
+      break;
+
+    case CONST_DOUBLE:
+      (imm_values->imm)++;
+      (imm_values->imm64)++;
+      break;
+
+    case CODE_LABEL:
+      if (LABEL_KIND (*in_rtx) == LABEL_NORMAL)
+	{
+	  (imm_values->imm)++;
+	  (imm_values->imm32)++;
+	}
+      break;
+
+    default:
+      break;
+    }
+
+  return 0;
+}
+
+/* Compute number of immediate operands of an instruction.  */
+
+static void
+find_constant (rtx in_rtx, imm_info *imm_values)
+{
+  for_each_rtx (INSN_P (in_rtx) ? &PATTERN (in_rtx) : &in_rtx,
+		(rtx_function)find_constant_1, (void *)imm_values);
+}
+
+/* Return total size of immediate operands of an instruction along with number
+   of corresponding immediate-operands.  It initializes its parameters to zero
+   before calling FIND_CONSTANT.
+   INSN is the input instruction.  IMM is the total of immediates.
+   IMM32 is the number of 32 bit immediates.  IMM64 is the number of 64
+   bit immediates.  */
+
+static int
+get_num_immediates (rtx insn, int *imm, int *imm32, int *imm64)
+{
+  imm_info imm_values = {0, 0, 0};
+
+  find_constant (insn, &imm_values);
+  *imm = imm_values.imm;
+  *imm32 = imm_values.imm32;
+  *imm64 = imm_values.imm64;
+  return imm_values.imm32 * 4 + imm_values.imm64 * 8;
+}
+
+/* This function indicates if an operand of an instruction is an
+   immediate.  */
+
+static bool
+has_immediate (rtx insn)
+{
+  int num_imm_operand;
+  int num_imm32_operand;
+  int num_imm64_operand;
+
+  if (insn)
+    return get_num_immediates (insn, &num_imm_operand, &num_imm32_operand,
+			&num_imm64_operand);
+  return false;
+}
+
+/* Return single or double path for instructions.  */
+
+static enum insn_path
+get_insn_path (rtx insn)
+{
+  enum attr_amdfam10_decode path = get_attr_amdfam10_decode (insn);
+
+  if ((int)path == 0)
+    return path_single;
+
+  if ((int)path == 1)
+    return path_double;
+
+  return path_multi;
+}
+
+/* Return insn dispatch group.  */
+
+static enum dispatch_group
+get_insn_group (rtx insn)
+{
+  enum dispatch_group group = get_mem_group (insn);
+  if (group)
+    return group;
+
+  if (is_branch (insn))
+    return disp_branch;
+
+  if (is_cmp (insn))
+    return disp_cmp;
+
+  if (has_immediate (insn))
+    return disp_imm;
+
+  if (is_prefetch (insn))
+    return disp_prefetch;
+
+  return disp_no_group;
+}
+
+/* Count number of GROUP restricted instructions in a dispatch
+   window WINDOW_LIST.  */
+
+static int
+count_num_restricted (rtx insn, dispatch_windows *window_list)
+{
+  enum dispatch_group group = get_insn_group (insn);
+  int imm_size;
+  int num_imm_operand;
+  int num_imm32_operand;
+  int num_imm64_operand;
+
+  if (group == disp_no_group)
+    return 0;
+
+  if (group == disp_imm)
+    {
+      imm_size = get_num_immediates (insn, &num_imm_operand, &num_imm32_operand,
+			      &num_imm64_operand);
+      if (window_list->imm_size + imm_size > MAX_IMM_SIZE
+	  || num_imm_operand + window_list->num_imm > MAX_IMM
+	  || (num_imm32_operand > 0
+	      && (window_list->num_imm_32 + num_imm32_operand > MAX_IMM_32
+		  || window_list->num_imm_64 * 2 + num_imm32_operand > MAX_IMM_32))
+	  || (num_imm64_operand > 0
+	      && (window_list->num_imm_64 + num_imm64_operand > MAX_IMM_64
+		  || window_list->num_imm_32 + num_imm64_operand * 2 > MAX_IMM_32))
+	  || (window_list->imm_size + imm_size == MAX_IMM_SIZE
+	      && num_imm64_operand > 0
+	      && ((window_list->num_imm_64 > 0
+		   && window_list->num_insn >= 2)
+		  || window_list->num_insn >= 3)))
+	return BIG;
+
+      return 1;
+    }
+
+  if ((group == disp_load_store
+       && (window_list->num_loads >= MAX_LOAD
+	   || window_list->num_stores >= MAX_STORE))
+      || ((group == disp_load
+	   || group == disp_prefetch)
+	  && window_list->num_loads >= MAX_LOAD)
+      || (group == disp_store
+	  && window_list->num_stores >= MAX_STORE))
+    return BIG;
+
+  return 1;
+}
+
+/* This function returns true if insn satisfies dispatch rules on the
+   last window scheduled.  */
+
+static bool
+fits_dispatch_window (rtx insn)
+{
+  dispatch_windows *window_list = dispatch_window_list;
+  dispatch_windows *window_list_next = dispatch_window_list->next;
+  unsigned int num_restrict;
+  enum dispatch_group group = get_insn_group (insn);
+  enum insn_path path = get_insn_path (insn);
+  int sum;
+
+  /* Make disp_cmp and disp_jcc get scheduled at the latest.  These
+     instructions should be given the lowest priority in the
+     scheduling process in Haifa scheduler to make sure they will be
+     scheduled in the same dispatch window as the reference to them.  */
+  if (group == disp_jcc || group == disp_cmp)
+    return false;
+
+  /* Check nonrestricted.  */
+  if (group == disp_no_group || group == disp_branch)
+    return true;
+
+  /* Get last dispatch window.  */
+  if (window_list_next)
+    window_list = window_list_next;
+
+  if (window_list->window_num == 1)
+    {
+      sum = window_list->prev->window_size + window_list->window_size;
+      if (sum == 32
+	  || (min_insn_size (insn) + sum) >= 48)
+	/* Window 1 is full.  Go for next window.  */
+	return true;
+    }
+
+  num_restrict = count_num_restricted (insn, window_list);
+
+  if (num_restrict > num_allowable_groups[group])
+    return false;
+
+  /* See if it fits in the first window.  */
+  if (window_list->window_num == 0)
+    {
+      /* The first window should have only single and double path
+	 uops.  */
+      if (path == path_double
+	  && (window_list->num_uops + 2) > MAX_INSN)
+	return false;
+      else if (path != path_single)
+	return false;
+    }
+  return true;
+}
+
+/* Add an instruction INSN with NUM_UOPS micro-operations to the
+   dispatch window WINDOW_LIST.  */
+
+static void
+add_insn_window (rtx insn, dispatch_windows *window_list, int num_uops)
+{
+  int byte_len = min_insn_size (insn);
+  int num_insn = window_list->num_insn;
+  int imm_size;
+  sched_insn_info *window = window_list->window;
+  enum dispatch_group group = get_insn_group (insn);
+  enum insn_path path = get_insn_path (insn);
+  int num_imm_operand;
+  int num_imm32_operand;
+  int num_imm64_operand;
+
+  if (!window_list->violation && group != disp_cmp
+      && !fits_dispatch_window (insn))
+    window_list->violation = true;
+
+  imm_size = get_num_immediates (insn, &num_imm_operand, &num_imm32_operand,
+			  &num_imm64_operand);
+  /* Initialize window with new instruction.  */
+  window[num_insn].insn = insn;
+  window[num_insn].byte_len = byte_len;
+  window[num_insn].group = group;
+  window[num_insn].path = path;
+  window[num_insn].imm_bytes = imm_size;
+
+  window_list->window_size += byte_len;
+  window_list->num_insn = num_insn + 1;
+  window_list->num_uops = window_list->num_uops + num_uops;
+  window_list->imm_size += imm_size;
+  window_list->num_imm += num_imm_operand;
+  window_list->num_imm_32 += num_imm32_operand;
+  window_list->num_imm_64 += num_imm64_operand;
+
+  if (group == disp_store)
+    window_list->num_stores += 1;
+  else if (group == disp_load
+	   || group == disp_prefetch)
+    window_list->num_loads += 1;
+  else if (group == disp_load_store)
+    {
+      window_list->num_stores += 1;
+      window_list->num_loads += 1;
+    }
+}
+
+/* Adds a scheduled instruction, INSN, to the current dispatch window.
+   If the total bytes of instructions or the number of instructions in
+   the window exceeds the allowable limit, it allocates a new window.  */
+
+static void
+add_to_dispatch_window (rtx insn)
+{
+  int byte_len;
+  dispatch_windows *window_list;
+  dispatch_windows *next_list;
+  dispatch_windows *window0_list;
+  enum insn_path path;
+  enum dispatch_group insn_group;
+  bool insn_fits;
+  int num_insn;
+  int num_uops;
+  int window_num;
+  int insn_num_uops;
+  int sum;
+
+  if (INSN_CODE (insn) < 0)
+    return;
+
+  byte_len = min_insn_size (insn);
+  window_list = dispatch_window_list;
+  next_list = window_list->next;
+  path = get_insn_path (insn);
+  insn_group = get_insn_group (insn);
+
+  /* Get the last dispatch window.  */
+  if (next_list)
+    window_list = dispatch_window_list->next;
+
+  if (path == path_single)
+    insn_num_uops = 1;
+  else if (path == path_double)
+    insn_num_uops = 2;
+  else
+    insn_num_uops = (int) path;
+
+  /* If current window is full, get a new window.
+     Window number zero is full, if MAX_INSN uops are scheduled in it.
+     Window number one is full, if window zero's bytes plus window
+     one's bytes is 32, or if the bytes of the new instruction added
+     to the total make it greater than or equal to 48, or it already
+     has MAX_INSN instructions in it.  */
+  num_insn = window_list->num_insn;
+  num_uops = window_list->num_uops;
+  window_num = window_list->window_num;
+  insn_fits = fits_dispatch_window (insn);
+
+  if (num_insn >= MAX_INSN
+      || num_uops + insn_num_uops > MAX_INSN
+      || !(insn_fits))
+    {
+      window_num = ~window_num & 1;
+      window_list = allocate_next_window (window_num);
+    }
+
+  if (window_num == 0)
+    {
+      add_insn_window (insn, window_list, insn_num_uops);
+      if (window_list->num_insn >= MAX_INSN
+	  && insn_group == disp_branch)
+	{
+	  process_end_window ();
+	  return;
+	}
+    }
+  else if (window_num == 1)
+    {
+      window0_list = window_list->prev;
+      sum = window0_list->window_size + window_list->window_size;
+      if (sum == 32
+	  || (byte_len + sum) >= 48)
+	{
+	  process_end_window ();
+	  window_list = dispatch_window_list;
+	}
+
+      add_insn_window (insn, window_list, insn_num_uops);
+    }
+  else
+    gcc_unreachable ();
+
+  if (is_end_basic_block (insn_group))
+    {
+      /* End of basic block is reached; do end-of-basic-block processing.  */
+      process_end_window ();
+      return;
+    }
+}
+
+/* Print the dispatch window, WINDOW_NUM, to FILE.  */
+
+DEBUG_FUNCTION static void
+debug_dispatch_window_file (FILE *file, int window_num)
+{
+  dispatch_windows *list;
+  int i;
+
+  if (window_num == 0)
+    list = dispatch_window_list;
+  else
+    list = dispatch_window_list1;
+
+  fprintf (file, "Window #%d:\n", list->window_num);
+  fprintf (file, "  num_insn = %d, num_uops = %d, window_size = %d\n",
+	  list->num_insn, list->num_uops, list->window_size);
+  fprintf (file, "  num_imm = %d, num_imm_32 = %d, num_imm_64 = %d, imm_size = %d\n",
+	   list->num_imm, list->num_imm_32, list->num_imm_64, list->imm_size);
+
+  fprintf (file, "  num_loads = %d, num_stores = %d\n", list->num_loads,
+	  list->num_stores);
+  fprintf (file, " insn info:\n");
+
+  for (i = 0; i < MAX_INSN; i++)
+    {
+      if (!list->window[i].insn)
+	break;
+      fprintf (file, "    group[%d] = %s, insn[%d] = %p, path[%d] = %d, byte_len[%d] = %d, imm_bytes[%d] = %d\n",
+	      i, group_name[list->window[i].group],
+	      i, (void *)list->window[i].insn,
+	      i, list->window[i].path,
+	      i, list->window[i].byte_len,
+	      i, list->window[i].imm_bytes);
+    }
+}
+
+/* Print to stdout a dispatch window.  */
+
+DEBUG_FUNCTION void
+debug_dispatch_window (int window_num)
+{
+  debug_dispatch_window_file (stdout, window_num);
+}
+
+/* Print INSN dispatch information to FILE.  */
+
+DEBUG_FUNCTION static void
+debug_insn_dispatch_info_file (FILE *file, rtx insn)
+{
+  int byte_len;
+  enum insn_path path;
+  enum dispatch_group group;
+  int imm_size;
+  int num_imm_operand;
+  int num_imm32_operand;
+  int num_imm64_operand;
+
+  if (INSN_CODE (insn) < 0)
+    return;
+
+  byte_len = min_insn_size (insn);
+  path = get_insn_path (insn);
+  group = get_insn_group (insn);
+  imm_size = get_num_immediates (insn, &num_imm_operand, &num_imm32_operand,
+				 &num_imm64_operand);
+
+  fprintf (file, " insn info:\n");
+  fprintf (file, "  group = %s, path = %d, byte_len = %d\n",
+	   group_name[group], path, byte_len);
+  fprintf (file, "  num_imm = %d, num_imm_32 = %d, num_imm_64 = %d, imm_size = %d\n",
+	   num_imm_operand, num_imm32_operand, num_imm64_operand, imm_size);
+}
+
+/* Print to STDOUT the status of the ready list with respect to
+   dispatch windows.  */
+
+DEBUG_FUNCTION void
+debug_ready_dispatch (void)
+{
+  int i;
+  int no_ready = number_in_ready ();
+  fprintf (stdout, "Number of ready: %d\n", no_ready);
+  for (i = 0; i < no_ready; i++)
+    debug_insn_dispatch_info_file (stdout, get_ready_element (i));
+}
+
+/* This routine is the driver of the dispatch scheduler.  */
+
+static void
+do_dispatch (rtx insn, int mode)
+{
+  if (mode == DISPATCH_INIT)
+    init_dispatch_sched ();
+  else if (mode == ADD_TO_DISPATCH_WINDOW)
+    add_to_dispatch_window (insn);
+}
+
+/* Return TRUE if Dispatch Scheduling is supported.  */
+
+static bool
+has_dispatch (rtx insn, int action)
+{
+  if (ix86_tune == PROCESSOR_BDVER1 && flag_dispatch_scheduler)
+    switch (action)
+      {
+      default:
+	return false;
+
+      case IS_DISPATCH_ON:
+	return true;
+	break;
+
+      case IS_CMP:
+	return is_cmp (insn);
+
+      case DISPATCH_VIOLATION:
+	return dispatch_violation ();
+
+      case FITS_DISPATCH_WINDOW:
+	return fits_dispatch_window (insn);
+      }
+
+  return false;
+}
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory