| Submitter | Xinliang David Li |
|---|---|
| Date | Dec. 1, 2012, 5:50 a.m. |
| Message ID | <CAAkRFZJv-CUKDju7QT2ngDFT50BpU6w7fwPQXuiZfAHycG74dw@mail.gmail.com> |
| Permalink | /patch/203112/ |
| State | New |
Comments
On Sat, Dec 1, 2012 at 6:50 AM, Xinliang David Li wrote:
> 2010-11-30  Xinliang David Li  <>
>
>         * config/i386/i386.c: Allow sign extend instructions (cltd etc)
>         on modern CPUs.

You installed the patch without the ChangeLog entry...
(http://gcc.gnu.org/ml/gcc-cvs/2012-12/msg00027.html)

Ciao!
Steven
Fixed.

thanks,

David

On Sat, Dec 1, 2012 at 4:08 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sat, Dec 1, 2012 at 6:50 AM, Xinliang David Li wrote:
>> 2010-11-30  Xinliang David Li  <>
>>
>>         * config/i386/i386.c: Allow sign extend instructions (cltd etc)
>>         on modern CPUs.
>
> You installed the patch without the ChangeLog entry...
> (http://gcc.gnu.org/ml/gcc-cvs/2012-12/msg00027.html)
>
> Ciao!
> Steven
Patch
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c	(revision 193861)
+++ config/i386/i386.c	(working copy)
@@ -1822,7 +1822,7 @@ static unsigned int initial_ix86_tune_fe
   m_K6,
 
   /* X86_TUNE_USE_CLTD */
-  ~(m_PENT | m_CORE2I7 | m_ATOM | m_K6 | m_GENERIC),
+  ~(m_PENT | m_ATOM | m_K6),
 
   /* X86_TUNE_USE_XCHGB: Use xchgb %rh,%rl instead of rolw/rorw $8,rx.  */
   m_PENT4,
Compiling the following code with -O2:

typedef unsigned long ulong;
typedef __SIZE_TYPE__ size_t;

long woo_i(long a, long b)
{
  return a/b;
}

GCC generates:

.LFB0:
        .cfi_startproc
        movq    %rdi, %rdx
        movq    %rdi, %rax
        sarq    $63, %rdx
        idivq   %rsi
        ret

but both ICC and LLVM generate a smaller and faster version:

        movq    %rdi, %rax
        cqto
        idivq   %rsi
        ret

For reference, see http://www.agner.org/optimize/instruction_tables.pdf. On Pentium, the latency of the instruction is 3 cycles, while on modern CPUs the instruction has only one uOp with 1 cycle latency.

The following proposed patch fixes the problem. Note that for Atom, only the CWD instruction is slow, with 5 cycle latency; the rest of the sign extension instructions are fast. The fix for Atom needs finer-grained control and can be done separately.

Ok to install after testing?

2010-11-30  Xinliang David Li  <davidxl@google.com>

        * config/i386/i386.c: Allow sign extend instructions (cltd etc)
        on modern CPUs.

thanks,

David