|
PowerBASIC Forums
![]() Third-Party Addons
![]() A Very Small Fast Parser ( Assembly code) (Page 3)
|
This topic is 4 pages long: 1 2 3 4 |
next newest topic | next oldest topic |
| Author | Topic: A Very Small Fast Parser ( Assembly code) |
|
Paul Dixon Member |
Charles, I tried an alternative table solution with smaller tables but it didn't work too well so I gave up due to lack of time. I tried tweaking your code and made it 20% faster so there are still savings to be made there. If you're going to try jump tables, they don't need to be near the code. Paul. ------------------ IP: Logged |
|
Charles Pegge Member |
Eros, I am working on Paul's idea of a jump table system. It is more Meanwhile here is the existing code in SUB form.
------------------ IP: Logged |
|
Michael Mattias Member |
I just looked at the source code for the demo. While that sample (one dinky little sentence) is nicely constructed to test various data condition, it is simply too tiny to claim any kind of 'performance benchmarking' data. Why don't you test it on the "bible search" file used for the contest last year? Let's see if I can find a link to that....yup, here it is... Here's the original contest announcement, contains link to the package containing the test file: http://www.powerbasic.com/support/forums/Forum12/HTML/001733.html (I tested the download, and the file is still out there on my web site!) IP: Logged |
|
Charles Pegge Member |
Michael, I have just downloaded the zip. Thanks! The little test string was there primarily to test functionality. I have just written a pure jump-table version of the parser I'll try out the 2 versions on your text and see what happens. Perhaps there ought to be a standard measure of parsing speed: Paul, How did you manage to tweak the original for an extra 20%? ------------------ IP: Logged |
|
Brad D Byrne Member |
Charles, Impressive numbers!! Thanks! also, as far as the jump table, wouldn't it be better to access it as
also, I wouldn't be suprised if the tree type structure as your original ------------------ IP: Logged |
|
Michael Mattias Member |
quote: I'm not sure we are ready for a new unit of measure for speed, but I certainly would not object to "the standard" for text-parsing, file access and other "Hey, look how fast my text processing is!" code were to be that same bible text file. There's some sneaky data conditions in that file, too. IP: Logged |
|
Brad D Byrne Member |
just an idea, can we store the jump to label in the table, allowing us to eliminate the compare? ie.
------------------ IP: Logged |
|
Paul Dixon Member |
Charles, the extra speed from tweaking your code was straight forward. First, esi. ! mov esi,eax ' use the char as an index Why don't you just use eax in the index and then you don't need to transfer the value to esi at all saving 2 opcodes: ! mov al,[ebx+eax] Of course now you aren't using esi at all so you no longer need to push/pop it to the stack saving another 2 opcodes. Second, tblp. ! push tblp ' hold the tbl pointer on the stack until all other variable But you aren't using ebx between that first instruction and the last so why are you pushing tblp? Why not just load it into ebx as soon as ebx becomes available saving 1 opcode: '! mov ebx,tblp ' hold the tbl pointer on the stack until all other variable Third. Use the size extension instructions, they're as fast as the normal ones but can save opcodes and time: ! xor eax,eax ' clear eax for another use Delete that saving 1 opcode and load eax with ! movzx eax,byte ptr [ebp] ' get character instead. This will clear the upper part of the register for you. More importantly, you should use the movzx version of the instruction anywhere that you use move to al and you don't need upper part of the register. This is because the CPU has trouble handling a reference to any full 32-bit register immediatley after part of that register has been written to. So e.g. in your code where you have: ! mov al,[ebp] ' get character you should change it to ! movzx eax,byte ptr [ebp] ' get char Try them out and see if it improves your code speed. Paul. ------------------ IP: Logged |
|
Eros Olmi Member |
Charles, thanks a lot for SUB version. What about transforming it into a function or adding another BYREF param in order to return Token type? Delimiter, Number, String, QuotedString, ... Thanks again. ------------------ IP: Logged |
|
Charles Pegge Member |
Paul, Many thanks. My last major encounter with x86 was back in 1984, designing So all these luxury Pentium opcodes are a new experience for me. But still happy to lose a few opcodes. Incidently I found that you cant do this in PB:
For the jump-table version, I used an array of LONGs and generated the ------------------ IP: Logged |
|
Paul Dixon Member |
Charles, no speed increase? When I tried it on my Athlon 2600+ I got similar figures to your Athlon 3200, that's 23% faster. <<! fld double ptr [ebx] >> Try DWORD instead of DOUBLE. Paul. ------------------ IP: Logged |
|
Charles Pegge Member |
Eros, Paul, Michael I have posted the latest version. Managed to shed a few more opcodes but more functionality I think this has the right logistics to do useful parsing. But The code is quite heavily annotated so its too long to show here. download from: ------------------ IP: Logged |
|
Michael Mattias Member |
quote: I thought that 1/18th second number was bogus [well, the word which actually occurred to me started with 'b'] so I d/l'd the package again. It's not really doing the "whole Bible reading demo" in that your timer does not include the time it takes to open the file and load it into a string.
Ok, so this is supposed to be showing off the pure "text parsing" code, not necessarily the time it takes to actually do something, which perforce includes the time required to open the file and load to a string. But as long as I brought it up...instead of loading to a string (which takes time and resouces), since all you need is the address and length of the string data, this is an IDEAL place to use File Mapping. Mapping a file gets you the address and length of the file data (as a pointer) without the overhead (both performance and user memory resources) of creating an OLE string. And you know, it's just so darned easy to do. All you do is download this demo .... Memory-Mapped File Version of LINE INPUT .. and call "MapThisFile" and bingo, there's your address and length. MCM IP: Logged |
|
Charles Pegge Member |
File input was not the issue here - just getting the most efficient x86-friendly algorithm. Making comparative measurements.. And it's more geared towards computer scripts. Strictly speaking it should be called a lexer rather than a ------------------ IP: Logged |
|
Eros Olmi Member |
Charles, thanks a lot for your code! ------------------ IP: Logged |
This topic is 4 pages long: 1 2 3 4 All times are EasternTime (US) | next newest topic | next oldest topic |
![]() |
|
Copyright © 1999-2006 PowerBASIC, Inc. All Rights Reserved.