|
PowerBASIC Forums
Third-Party Addons
A Very Small Fast Parser ( Assembly code) (Page 4)
|
This topic is 4 pages long: 1 2 3 4 |
| Author | Topic: A Very Small Fast Parser ( Assembly code) |
|
Michael Mattias Member |
>File input was not the issue here
I could not help myself. This is simply a golden opportunity to extol the virtues of memory-mapped files!
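For readers unfamiliar with the technique, here is a minimal sketch of counting words through a memory-mapped view instead of reading the file into a string first. It is written in Python rather than PowerBASIC, and the function name `count_words_mapped` and the word pattern are illustrative assumptions, not anything taken from the parser posted in this thread.

```python
import mmap
import os
import re
import tempfile

# Assumption for this sketch: a "word" is a run of ASCII letters or apostrophes.
WORD = re.compile(rb"[A-Za-z']+")


def count_words_mapped(path):
    """Count words by scanning a read-only memory-mapped view of the file."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # The regex walks the mapped bytes directly; the operating system
            # pages the file in on demand, with no explicit read into a buffer.
            return sum(1 for _ in WORD.finditer(view))


if __name__ == "__main__":
    # Small temporary file so the sketch runs on its own.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"In the beginning God created the heaven and the earth.")
    try:
        print(count_words_mapped(tmp.name))
    finally:
        os.remove(tmp.name)
```

On Windows the same idea is normally expressed with the file-mapping API; Python's `mmap` module is used here only to keep the sketch short.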
|
Charles Pegge Member |
New: Fast Parser with Word List (PB Assembler)
Verily, this code addeth further functionality by enabling the use
As before, it canst read the entire Bible in about 50-70 milliseconds
(posting 19 Aug 2006)
http://www.forum.it-berater.org/index.php?topic=158.0
[This message has been edited by Charles Pegge (edited August 19, 2006).]
|
Paul Squires Member |
Wow! I have a HUGE need for your ParserWords code. I am extremely impressed with the speed at which it parses the code. It is almost unbelievable. Thanks a million for sharing this with us. I greatly appreciate it.
|
Charles Pegge Member |
Thanks Paul,
Always glad to help. It's been a long time since I did assembler, but I am getting the
If you have any ideas about semantics checking, which could be
[This message has been edited by Charles Pegge (edited August 20, 2006).]
|
Nick Luick Member |
Charles,
I keep a 20 meg working RAM disk, so execution & Bible read: 247 asm, 83 bas, 330 msec total. Barely let the Enter key up and it was over.
On a P3-500MHz, Win98, 256 ram
|
Paul Dixon Member |
Charles, I haven't read through the whole of your code but if you want it to be a little faster.. The LOOP opcode is slow compared to coding the same thing yourself. You should try to avoid using it.
Unfortunately, the PB compiler doesn't recognise the PREFETCH opcodes so you have to assemble them by hand, which can be a bit of a nuisance.
Paul.
|
Charles Pegge Member |
Paul,
I've just tried removing the LOOP instruction, replacing it with
Couldn't get PREFETCH to improve performance but got it to slow
I need to check this further.
[This message has been edited by Charles Pegge (edited August 22, 2006).]
|
Charles Pegge Member |
Paul,
I confirm that LOOP takes a lot longer than DEC ECX: JNZ short ..
In an empty loop with 2gig repeats, the LOOP instruction took
I also found
The best way to get a speed improvement is to run
|
Michael Mattias Member |
quote:
Um, don't the processor chip people publish tables of "clocks per instruction" anymore?
|
Charles Pegge Member |
Not in the Intel Architecture Software Developer’s Manual. Too many CPUs, clock speeds, architectures. Not like it used to be. Easier just to test
|
Eddy Van Esch Member |
quote:
Due to pipelining and other optimisations, the sum of the clock cycles of the individual instructions does not equal the number of clock cycles needed to execute a series of commands.
Kind regards
[This message has been edited by Eddy Van Esch (edited August 24, 2006).]
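A toy model can make this point concrete. The sketch below is in Python, not PowerBASIC, and it is a deliberately simplified assumption: it supposes that independent instructions can begin on successive cycles and overlap, while dependent ones must wait for the previous result. The function names and the latency values are hypothetical and do not describe any real CPU.

```python
def chain_cycles(latencies):
    """Each instruction needs the previous result, so latencies simply add up."""
    return sum(latencies)


def pipelined_cycles(latencies, issue_width=1):
    """Independent instructions: issue_width of them start per cycle and overlap.

    Returns the cycle on which the last instruction finishes.
    """
    finish = 0
    for index, latency in enumerate(latencies):
        start = index // issue_width      # cycle on which this one is issued
        finish = max(finish, start + latency)
    return finish


if __name__ == "__main__":
    lat = [3, 3, 3, 3]                    # hypothetical per-instruction latencies
    print("dependent chain :", chain_cycles(lat))
    print("pipelined, 1/clk:", pipelined_cycles(lat))
    print("pipelined, 2/clk:", pipelined_cycles(lat, issue_width=2))
```

In this model the same four instructions need anywhere from the full sum of their latencies down to little more than one latency, which is why a table of per-instruction clocks cannot simply be added up.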
|
Charles Pegge Member |
For a mind-bending discourse on what a Pentium can get up to:
http://www.intel.com/design/pentiumii/manuals/24319202.pdf
(Volume 3) Chapter 14. Optimization.
Caching, pipelining, parallel execution, branch prediction. And
PS
[This message has been edited by Charles Pegge (edited August 24, 2006).]
|
Eddy Van Esch Member |
quote:
Indeed. You can read there why 2 + 2 sometimes equals 3 ..
BTW you can order that book (and the other 4 volumes) as hardcopy for free from Intel on simple request.
Kind regards
|
Paul Dixon Member |
Charles,
I can't get much more speed out of this without spending a lot of time making sure every jump target is aligned, and that's rarely worth the effort.
I did get a little improvement using a PREFETCH but it was only about 2% and it messed up code alignment elsewhere so it's probably not worth it.
Two things I did notice:
2) your timing uses GetTickCount, which has a resolution of only 15-16ms. Since you're timing things in the order of a few tens of ms, you really need to use a better resolution timer such as the Performance Counters. Here's a simple example of how to use it:
#INCLUDE "win32api.inc"
There are some instruction latency numbers in the Athlon Optimisation Manual:
www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf
There are more, including Pentium here, plus lots of other information:
If you read the Athlon Optimisation Guide it lists the LOOP instruction as having a latency of 8 clocks, and it's a vector decode instruction, which stops other instructions executing in parallel.
Paul.
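To illustrate why timer resolution matters at this scale, here is a small sketch in Python (not the PowerBASIC example from the post, which did not survive on this page). `coarse_ms` is a hypothetical stand-in that simulates a 16 ms tick; `timer_step` and `time_best_of` are illustrative helper names. As far as I know, Python's `time.perf_counter` is backed by the performance counter on Windows, which is why it is used as the fine-grained clock here.

```python
import time


def coarse_ms():
    """Simulated tick counter that only advances in 16 ms steps (an assumption)."""
    return int(time.monotonic() * 1000) // 16 * 16


def timer_step(clock, samples=5):
    """Smallest observed increment of clock(), in the clock's own units."""
    best = None
    for _ in range(samples):
        t0 = clock()
        t1 = clock()
        while t1 == t0:                   # spin until the clock visibly advances
            t1 = clock()
        step = t1 - t0
        if best is None or step < best:
            best = step
    return best


def time_best_of(func, repeats=5):
    """Run func several times and return the shortest elapsed time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("coarse step (ms):", timer_step(coarse_ms))
    print("fine step (ns)  :", timer_step(time.perf_counter_ns))
    print("best of 5 (s)   :", time_best_of(lambda: sum(range(100_000))))
```

With a 16 ms step, a run of a few tens of milliseconds is only a handful of ticks, so a single reading can be off by a large fraction of the result. Taking the best of several runs with a fine-grained counter is one common way to reduce that noise.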
|
Charles Pegge Member |
Many thanks Paul for the info and your investigations.
That AMD Volume on optimization is very useful, especially
One further area of optimization is to exploit parallel execution
At first glance, it seems that there are no further opportunities
But here is a test rig for investigating the effects of both
The absolute clock count is not significant, but is good for
[This message has been edited by Charles Pegge (edited August 25, 2006).]
All times are Eastern Time (US)
|
Copyright © 1999-2006 PowerBASIC, Inc. All Rights Reserved.