diff mbox series

[v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE

Message ID 20240202051548.877087-1-kconsul@linux.vnet.ibm.com
State New
Headers show
Series [v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE | expand

Commit Message

Kautuk Consul Feb. 2, 2024, 5:15 a.m. UTC
Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP
as is being used by the rest of the compiler.
Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP
as that is more informative as it mirrors the way the col() macro defines
a colon definition.

Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
---
 slof/engine.in | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Kautuk Consul Feb. 12, 2024, 9:28 a.m. UTC | #1
Hi Segher/Thomas,

On 2024-02-02 00:15:48, Kautuk Consul wrote:
> Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP
> as is being used by the rest of the compiler.
> Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP
> as that is more informative as it mirrors the way the col() macro defines
> a colon definition.
> 
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
>  slof/engine.in | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/slof/engine.in b/slof/engine.in
> index 59e82f1..fa4d82e 100644
> --- a/slof/engine.in
> +++ b/slof/engine.in
> @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE)
>  col(COMPILE R> CELL+ DUP @ COMPILE, >R)
> 
>  var(THERE 0)
> -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,)
> +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE)

What do you think about the above changes ?
Are there any more changes I could do to this patch ?
Or if you want to reject can you tell me why exactly ?

> 
>  // Structure words.
>  col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !)
> -- 
> 2.31.1
>
Kautuk Consul Feb. 22, 2024, 7:38 a.m. UTC | #2
Hi Segher/Thomas,

On 2024-02-02 00:15:48, Kautuk Consul wrote:
> Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP
> as is being used by the rest of the compiler.
> Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP
> as that is more informative as it mirrors the way the col() macro defines
> a colon definition.
> 
> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> ---
>  slof/engine.in | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/slof/engine.in b/slof/engine.in
> index 59e82f1..fa4d82e 100644
> --- a/slof/engine.in
> +++ b/slof/engine.in
> @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE)
>  col(COMPILE R> CELL+ DUP @ COMPILE, >R)
> 
>  var(THERE 0)
> -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,)
> +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> 
Did you get time to review this simple patch ?
Is there something wrong in it or the description ?
>  // Structure words.
>  col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !)
> -- 
> 2.31.1
>
Thomas Huth Feb. 22, 2024, 9:04 a.m. UTC | #3
On 22/02/2024 08.38, Kautuk Consul wrote:
> Hi Segher/Thomas,
> 
> On 2024-02-02 00:15:48, Kautuk Consul wrote:
>> Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP
>> as is being used by the rest of the compiler.
>> Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP
>> as that is more informative as it mirrors the way the col() macro defines
>> a colon definition.
>>
>> Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
>> ---
>>   slof/engine.in | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/slof/engine.in b/slof/engine.in
>> index 59e82f1..fa4d82e 100644
>> --- a/slof/engine.in
>> +++ b/slof/engine.in
>> @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE)
>>   col(COMPILE R> CELL+ DUP @ COMPILE, >R)
>>
>>   var(THERE 0)
>> -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
>> -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
>> +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,)
>> +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE)
>>
> Did you get time to review this simple patch ?
> Is there something wrong in it or the description ?

  Hi Kautuk,

could you maybe do some performance checks to see whether this make a 
difference (e.g. by running the command in a tight loop many times)?
You can use "tb@" to get the current value of the timebase counter, so 
reading that before and after the loop should provide you with a way of 
measuring the required time.

  Thomas
Kautuk Consul Feb. 22, 2024, 11:46 a.m. UTC | #4
Hi Thomas,

> 
>  Hi Kautuk,
> 
> could you maybe do some performance checks to see whether this make a
> difference (e.g. by running the command in a tight loop many times)?
> You can use "tb@" to get the current value of the timebase counter, so
> reading that before and after the loop should provide you with a way of
> measuring the required time.
> 
>  Thomas
> 
This patch is to improve compilation timings of the 
IF/AHEAD/THEN/CASE/ENDCASE/BEGIN/AGAIN/UNTIL/DO/?DO/LOOP/+LOOP/ Forth words
that are NOT within any Forth procedure.
And it does this in the same way for all of these Forth words because
all of these words simply utilize the +COMP and -COMP words.

I created a patch on top of this patch file that introduces the older
implementation of IF and THEN and I called them IF2 and THEN2 as
follows:
col(+COMP-BEFORE STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
col(-COMP-BEFORE -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE SEMICOLON THERE @ DOTO HERE COMP-BUFFER EXECUTE)
imm(IF2 +COMP-BEFORE DOTICK DO0BRANCH COMPILE, HERE 0 COMPILE,)
imm(THEN2 ?COMP RESOLVE-ORIG -COMP-BEFORE)

The IF2 and THEN2 use -COMP-BEFORE and +COMP-BEFORE in order to have the
changes before I applied my "[PATCH v2] slof/engine.in: refine +COMP and -COMP by not using"
patch file.

Now that I have both implementation, I used the timebase in order to
test what is the difference in timebase before and after invocation of
numerous IF-THEN and IF2-THEN2 Forth words. I made the following changes
to ./board-qemu/slof/OF.fs:
diff --git a/board-qemu/slof/OF.fs b/board-qemu/slof/OF.fs
index df33c80..56805fc 100644
--- a/board-qemu/slof/OF.fs
+++ b/board-qemu/slof/OF.fs
@@ -22,6 +22,7 @@ hex
 
 #include "base.fs"
 
+
 \ Set default load-base to 0x4000
 4000 to default-load-base
 
@@ -329,6 +330,151 @@ check-boot-from-ram
 
 8ff cp
 
+." BEFORE-PATCH: BEFORE TB is: " tb@ .
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+cr ." BEFORE-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr

With the above changes in slof/engine.in and board-qemu/slof/OF.fs I
complied SLOF and got the following output on running a guest:
[root@r223l performance_work]# virsh start vm4 --console                
Domain 'vm4' started
Connected to domain 'vm4'
Escape character is ^] (Ctrl + ])
Populating /vdevice methods
Populating /vdevice/vty@30000000
Populating /vdevice/nvram@71000000
Populating /pci@800000020000000
                     00 0800 (D) : 1b36 000d    serial bus [ usb-xhci ]
                     00 1000 (D) : 1af4 1003    virtio [ serial ]
                     00 1800 (D) : 1af4 1001    virtio [ block ]
                     00 2000 (D) : 1af4 1002    legacy-device*
                     00 2800 (D) : 1234 1111    qemu vga
No NVRAM common partition, re-initializing...
Installing QEMU fb



Scanning USB
  XHCI: Initializing
    USB Keyboard
No console specified using screen & keyboard
BEFORE-PATCH: BEFORE TB is: 9de978a1
BEFORE-PATCH: AFTER TB is: 9e78efba
AFTER-PATCH: BEFORE TB is: 9ebb67aa
AFTER-PATCH: AFTER TB is: 9f2247cc
AFTER-PATCH: BEFORE TB is: 9f64b9fd
AFTER-PATCH: AFTER TB is: 9fc33e6c

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci@800000020000000/scsi@3 ...   Successfully loaded

[root@r223l performance_work]# echo $((0x9e78efba-0x9de978a1))
9402137
[root@r223l performance_work]# echo $((0x9f2247cc-0x9ebb67aa))                                                                                                                                             
6742050
[root@r223l performance_work]# echo $((0x9fc33e6c-0x9f64b9fd))                                                                                                                                             
6194287
[root@r223l performance_work]# echo "scale=4;(9402137-6742050)/512" | bc                                                                                                                                   
5195.4824
[root@r223l performance_work]# echo "scale=4;(9402137-6194287)/512" | bc                                                                                                                                   
6265.3320
[root@r223l performance_work]#

As per the calculations in the output of the BEFORE-PATCH and
AFTER-PATCH logs I find that there is a very noticeable and consistent improvement
in multiple runs in terms of microseconds. (My POWER9 bare-metal has 512 MHz timebase-frequency
so thats why I am dividing by 512).

Note: The above figures include the execution speed of IF-THEN and
IF2-THEN2 after compilation. But since the actual execution speeds of the IF-THEN and the IF2-THEN2 after
their compilation should be the same, this should get adjusted in my
subtraction in the above 2 bc commands.
>
Kautuk Consul Feb. 23, 2024, 5:59 a.m. UTC | #5
Just to clarify one last point:

On 2024-02-22 17:16:33, Kautuk Consul wrote:
> Hi Thomas,
> 
> > 
> >  Hi Kautuk,
> > 
> > could you maybe do some performance checks to see whether this make a
> > difference (e.g. by running the command in a tight loop many times)?
Running a single loop many times will not expose much because that loop
(which is NOT within a Forth colon subroutine) will compile only once.
In my performance benchmarking with tb@ I have put 45 IF-THEN and
IF2-THEN2 control statements that will each compile once and reveal the
difference in compilation speeds.

> > You can use "tb@" to get the current value of the timebase counter, so
> > reading that before and after the loop should provide you with a way of
> > measuring the required time.
> > 
> >  Thomas
> >
Segher Boessenkool Feb. 23, 2024, 8:57 p.m. UTC | #6
On Mon, Feb 12, 2024 at 02:58:46PM +0530, Kautuk Consul wrote:
> Hi Segher/Thomas,
> 
> On 2024-02-02 00:15:48, Kautuk Consul wrote:
> > Use the standard "DOTICK <word> COMPILE," mechanism in +COMP and -COMP
> > as is being used by the rest of the compiler.
> > Also use "SEMICOLON" instead of "EXIT" to compile into HERE in -COMP
> > as that is more informative as it mirrors the way the col() macro defines
> > a colon definition.
> > 
> > Signed-off-by: Kautuk Consul <kconsul@linux.vnet.ibm.com>
> > ---
> >  slof/engine.in | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/slof/engine.in b/slof/engine.in
> > index 59e82f1..fa4d82e 100644
> > --- a/slof/engine.in
> > +++ b/slof/engine.in
> > @@ -422,8 +422,8 @@ imm(.( LIT(')') PARSE TYPE)
> >  col(COMPILE R> CELL+ DUP @ COMPILE, >R)
> > 
> >  var(THERE 0)
> > -col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> > -col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> > +col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,)
> > +col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> 
> What do you think about the above changes ?
> Are there any more changes I could do to this patch ?
> Or if you want to reject can you tell me why exactly ?

I think both changes are bad.  They reduce abstraction, for no reason at
all.

If you think the compiler should inline more, or do better optimisations
even, work on *that*, don't do one unimportant case of it manually.

I never made the indirect threading engine in Paflof faster, because it
was plenty fast already.  In SLOF, almost everything is compiled at
runtime, and if it is important to speed that up there are some well-
known usual caching tricks to make things *factors* faster.  The main
focus points for SLOF were to have an engine that is easily adapted for
different purposes (and it was! Ask me about it :-) ), and to have
things using it as debuggable as possible (you really need some hardware
debugging thing to make it real easy; I had one back then.  You need to
be able to look at all memory state after a stop (a crash, perhaps), and
seeing all CPU registers is useful as well.

If you want to improve engine.in, get rid of it completely?  Make the
whol thing cross-compile perhaps.  Everything from source code.  The
engine.in thing is essentially an already compiled thing (but not
relocated yet, not fixed to some address), which is still in mostly
obvious 1-1 correspondence to it source code, which can be easily
"uncompiled" as well.  Like:

col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)

: +comp  ( -- )
  state @  1 state +!  IF exit THEN
  here there !
  comp-buffer to here
  compile docol ;
: -comp ( -- )
  -1 state +!
  state @ IF exit THEN
  compile exit
  there @ to here
  comp-buffer execute ;

"['] semicolon compile," is not something a user would ever write.  A
user would write "compile exit".  It is standard Forth, it works
anywhere.  It is much more idiomatic..


Segher
Segher Boessenkool Feb. 23, 2024, 9:04 p.m. UTC | #7
On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote:
> > > difference (e.g. by running the command in a tight loop many times)?
> Running a single loop many times will not expose much because that loop
> (which is NOT within a Forth colon subroutine) will compile only once.
> In my performance benchmarking with tb@ I have put 45 IF-THEN and
> IF2-THEN2 control statements that will each compile once and reveal the
> difference in compilation speeds.

All of this is only for things compiled in interpretation mode anyway.
Even how you get the source code in (read it from a slow flash rom in
the best case!) dominates performance.

You do not write things in Forth because it is perfect speed.  Write
things directly in machine code if you want that, or in another high-
level language that emphasises optimal execution speed.  The good things
about Forth are rapid prototyping, immediate testing of all code you
write, simple compact code, that kind of goodness.  Ideal for (system)
firmware!


Segher
Kautuk Consul Feb. 26, 2024, 4:32 a.m. UTC | #8
On 2024-02-23 15:04:56, Segher Boessenkool wrote:
> On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote:
> > > > difference (e.g. by running the command in a tight loop many times)?
> > Running a single loop many times will not expose much because that loop
> > (which is NOT within a Forth colon subroutine) will compile only once.
> > In my performance benchmarking with tb@ I have put 45 IF-THEN and
> > IF2-THEN2 control statements that will each compile once and reveal the
> > difference in compilation speeds.
> 
> All of this is only for things compiled in interpretation mode anyway.
> Even how you get the source code in (read it from a slow flash rom in
> the best case!) dominates performance.
> 
> You do not write things in Forth because it is perfect speed.  Write
> things directly in machine code if you want that, or in another high-
> level language that emphasises optimal execution speed.  The good things
> about Forth are rapid prototyping, immediate testing of all code you
> write, simple compact code, that kind of goodness.  Ideal for (system)
> firmware!
> 
> 
> Segher

Yes, but SLOF will be there on the product we sell to our customers.
Considering that there is a noticeable improvement in performance I just
thought maybe IBM management would be interested in it.
Kautuk Consul Feb. 26, 2024, 4:39 a.m. UTC | #9
Hi,

On 2024-02-23 14:57:23, Segher Boessenkool wrote:
> 
> I think both changes are bad.  They reduce abstraction, for no reason at
> all.
> 
> If you think the compiler should inline more, or do better optimisations
> even, work on *that*, don't do one unimportant case of it manually.
> 
> I never made the indirect threading engine in Paflof faster, because it
> was plenty fast already.  In SLOF, almost everything is compiled at
> runtime, and if it is important to speed that up there are some well-
> known usual caching tricks to make things *factors* faster.  The main
> focus points for SLOF were to have an engine that is easily adapted for
> different purposes (and it was! Ask me about it :-) ), and to have
> things using it as debuggable as possible (you really need some hardware
> debugging thing to make it real easy; I had one back then.  You need to
> be able to look at all memory state after a stop (a crash, perhaps), and
> seeing all CPU registers is useful as well.
> 
> If you want to improve engine.in, get rid of it completely?  Make the
> whol thing cross-compile perhaps.  Everything from source code.  The
> engine.in thing is essentially an already compiled thing (but not
> relocated yet, not fixed to some address), which is still in mostly
> obvious 1-1 correspondence to it source code, which can be easily
> "uncompiled" as well.  Like:

:-). Getting rid of it completely and making the whole thing
cross-compile would require more time that I'm not so sure that I or
even my manager would be able to spare in our project.

> 
> col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> 
> : +comp  ( -- )
>   state @  1 state +!  IF exit THEN
>   here there !
>   comp-buffer to here
>   compile docol ;
> : -comp ( -- )
>   -1 state +!
>   state @ IF exit THEN
>   compile exit
>   there @ to here
>   comp-buffer execute ;
> 
> "['] semicolon compile," is not something a user would ever write.  A
> user would write "compile exit".  It is standard Forth, it works
> anywhere.  It is much more idiomatic..

Okay, I can accept the fact that maybe we should use EXIT instead of
SEMICOLON. But at least can we remove the invocation of the "COMPILE"
keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in
uses the standard "DOTICK <word> COMPILE," format so why cannot we use
this for -COMP as well as +COMP ?


> 
> 
> Segher
Kautuk Consul Feb. 26, 2024, 4:43 a.m. UTC | #10
Hi Thomas,

On 2024-02-26 10:02:55, Kautuk Consul wrote:
> On 2024-02-23 15:04:56, Segher Boessenkool wrote:
> > On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote:
> > > > > difference (e.g. by running the command in a tight loop many times)?
> > > Running a single loop many times will not expose much because that loop
> > > (which is NOT within a Forth colon subroutine) will compile only once.
> > > In my performance benchmarking with tb@ I have put 45 IF-THEN and
> > > IF2-THEN2 control statements that will each compile once and reveal the
> > > difference in compilation speeds.
> > 
> > All of this is only for things compiled in interpretation mode anyway.
> > Even how you get the source code in (read it from a slow flash rom in
> > the best case!) dominates performance.
> > 
> > You do not write things in Forth because it is perfect speed.  Write
> > things directly in machine code if you want that, or in another high-
> > level language that emphasises optimal execution speed.  The good things
> > about Forth are rapid prototyping, immediate testing of all code you
> > write, simple compact code, that kind of goodness.  Ideal for (system)
> > firmware!
> > 
> > 
> > Segher
> 
> Yes, but SLOF will be there on the product we sell to our customers.
> Considering that there is a noticeable improvement in performance I just
> thought maybe IBM management would be interested in it.

On this note, what did you also try to understand the performance
implications of my patch ? What improvements did you observer on your
set up?
Kautuk Consul Feb. 26, 2024, 6:47 a.m. UTC | #11
On 2024-02-26 10:13:52, Kautuk Consul wrote:
> Hi Thomas,
> 
> On 2024-02-26 10:02:55, Kautuk Consul wrote:
> > On 2024-02-23 15:04:56, Segher Boessenkool wrote:
> > > On Fri, Feb 23, 2024 at 11:29:23AM +0530, Kautuk Consul wrote:
> > > > > > difference (e.g. by running the command in a tight loop many times)?
> > > > Running a single loop many times will not expose much because that loop
> > > > (which is NOT within a Forth colon subroutine) will compile only once.
> > > > In my performance benchmarking with tb@ I have put 45 IF-THEN and
> > > > IF2-THEN2 control statements that will each compile once and reveal the
> > > > difference in compilation speeds.
> > > 
> > > All of this is only for things compiled in interpretation mode anyway.
> > > Even how you get the source code in (read it from a slow flash rom in
> > > the best case!) dominates performance.
> > > 
> > > You do not write things in Forth because it is perfect speed.  Write
> > > things directly in machine code if you want that, or in another high-
> > > level language that emphasises optimal execution speed.  The good things
> > > about Forth are rapid prototyping, immediate testing of all code you
> > > write, simple compact code, that kind of goodness.  Ideal for (system)
> > > firmware!
> > > 
> > > 
> > > Segher
> > 
> > Yes, but SLOF will be there on the product we sell to our customers.
> > Considering that there is a noticeable improvement in performance I just
> > thought maybe IBM management would be interested in it.
> 
> On this note, what did you also try to understand the performance
> implications of my patch ? What improvements did you observer on your
> set up?
One strange thing I noticed is that when I copy paste the same IF-THEN
or IF2-THEN2 lines below the ones that I already have in my OF.fs, the
timebase seems to keep reducing for some strange reason. The number of
IF-THEN and IF2-THEN2 statements are the exact same.
Kautuk Consul Feb. 26, 2024, 6:59 a.m. UTC | #12
One change:

On 2024-02-26 12:17:54, Kautuk Consul wrote:
> One strange thing I noticed is that when I copy paste the same IF-THEN
> or IF2-THEN2 lines below the ones that I already have in my OF.fs, the
> timebase seems to keep reducing for some strange reason. The number of
> IF-THEN and IF2-THEN2 statements are the exact same.

It seems it is not a good idea to put the before patch and after patch
statements together as the timebase difference keeps decreasing. So then
I ran the VM once with the "BEFORE-PATCH" and "AFTER-PATCH" changes and
I find that for 45 lines of IF-THEN and IF2-THEN2 the performance
improvement is aroung 35 microseconds only. Per IF-THEN pair then the
improvement is only in nanoseconds.
Kautuk Consul March 7, 2024, 6:33 a.m. UTC | #13
Hi Segher,

> > If you want to improve engine.in, get rid of it completely?  Make the
> > whol thing cross-compile perhaps.  Everything from source code.  The
> > engine.in thing is essentially an already compiled thing (but not
> > relocated yet, not fixed to some address), which is still in mostly
> > obvious 1-1 correspondence to it source code, which can be easily
> > "uncompiled" as well.  Like:
> 
> :-). Getting rid of it completely and making the whole thing
> cross-compile would require more time that I'm not so sure that I or
> even my manager would be able to spare in our project.
> 
> > 
> > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> > 
> > : +comp  ( -- )
> >   state @  1 state +!  IF exit THEN
> >   here there !
> >   comp-buffer to here
> >   compile docol ;
> > : -comp ( -- )
> >   -1 state +!
> >   state @ IF exit THEN
> >   compile exit
> >   there @ to here
> >   comp-buffer execute ;
> > 
> > "['] semicolon compile," is not something a user would ever write.  A
> > user would write "compile exit".  It is standard Forth, it works
> > anywhere.  It is much more idiomatic..
> 
> Okay, I can accept the fact that maybe we should use EXIT instead of
> SEMICOLON. But at least can we remove the invocation of the "COMPILE"
> keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in
> uses the standard "DOTICK <word> COMPILE," format so why cannot we use
> this for -COMP as well as +COMP ?
> 
Do you agree with the above reasoning as well as the fact that I think
we would all here (in the KVM team) appreciate even this small
improvement in performance ?
Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ?
Kautuk Consul March 18, 2024, 5:12 a.m. UTC | #14
Hi Segher,
> 
> > > If you want to improve engine.in, get rid of it completely?  Make the
> > > whol thing cross-compile perhaps.  Everything from source code.  The
> > > engine.in thing is essentially an already compiled thing (but not
> > > relocated yet, not fixed to some address), which is still in mostly
> > > obvious 1-1 correspondence to it source code, which can be easily
> > > "uncompiled" as well.  Like:
> > 
> > :-). Getting rid of it completely and making the whole thing
> > cross-compile would require more time that I'm not so sure that I or
> > even my manager would be able to spare in our project.
> > 
> > > 
> > > col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
> > > col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
> > > 
> > > : +comp  ( -- )
> > >   state @  1 state +!  IF exit THEN
> > >   here there !
> > >   comp-buffer to here
> > >   compile docol ;
> > > : -comp ( -- )
> > >   -1 state +!
> > >   state @ IF exit THEN
> > >   compile exit
> > >   there @ to here
> > >   comp-buffer execute ;
> > > 
> > > "['] semicolon compile," is not something a user would ever write.  A
> > > user would write "compile exit".  It is standard Forth, it works
> > > anywhere.  It is much more idiomatic..
> > 
> > Okay, I can accept the fact that maybe we should use EXIT instead of
> > SEMICOLON. But at least can we remove the invocation of the "COMPILE"
> > keyword in +COMP and -COMP ? The rest of the compiler in slof/engine.in
> > uses the standard "DOTICK <word> COMPILE," format so why cannot we use
> > this for -COMP as well as +COMP ?
> > 
Do you all agree with the above reasoning as well as the fact that I think
we would all here (in the KVM team) appreciate even this small
improvement in performance ?
Can I send a v3 patch with the "DOTICK EXIT COMPILE," "DOTICK DOCOL COMPILE," changes ?
diff mbox series

Patch

diff --git a/slof/engine.in b/slof/engine.in
index 59e82f1..fa4d82e 100644
--- a/slof/engine.in
+++ b/slof/engine.in
@@ -422,8 +422,8 @@  imm(.( LIT(')') PARSE TYPE)
 col(COMPILE R> CELL+ DUP @ COMPILE, >R)
 
 var(THERE 0)
-col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
-col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE EXIT THERE @ DOTO HERE COMP-BUFFER EXECUTE)
+col(+COMP STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE DOTICK DOCOL COMPILE,)
+col(-COMP -1 STATE +! STATE @ 0BRANCH(1) EXIT DOTICK SEMICOLON COMPILE, THERE @ DOTO HERE COMP-BUFFER EXECUTE)
 
 // Structure words.
 col(RESOLVE-ORIG HERE OVER CELL+ - SWAP !)