|
PowerBASIC Forums
PowerBASIC for Windows
|
| Author | Topic: Dwords are bad m'kay?... |
|
Mike Joseph Member |
... well, DWORDs seem to be quite fast when used with SHIFT LEFT and SHIFT RIGHT. But otherwise, they seem to be very slow. I've spent quite a few hours today and yesterday trying to optimize a piece of code which I had translated from C source. The C version was roughly 6x as fast as mine and I couldn't figure out why. I was using DWORDs in every case where the C code used "unsigned long". When compiled, my app was taking ~40 seconds to go from start to finish; the C app was taking ~7 seconds. After converting all my DWORDs to LONGs, my code now takes ~11 seconds, which is much better. I'm going to make a few more changes and hopefully I can catch or even beat 7 seconds.
Here are the results of a few tests I was running. If you can give any insight into why some of the numbers are the way they are (some of which seem counterintuitive), please do!
---------------------------------------
FUNCTION PBMAIN()
  Itterations = 50000000
  lngTime = TIMER
  MSGBOX FORMAT$(Itterations,"#,###,###") & " itterations in " & FORMAT$(TIMER - lngTime,"####.##") & " seconds."
END FUNCTION

FUNCTION PBMAIN()
  Itterations = 50000000
  lngTime = TIMER
  MSGBOX FORMAT$(Itterations,"#,###,###") & " itterations in " & FORMAT$(TIMER - lngTime,"####.##") & " seconds."
END FUNCTION

FUNCTION PBMAIN()
  Itterations = 5000000
  lngTime = TIMER
  MSGBOX FORMAT$(Itterations,"#,###,###") & " itterations in " & FORMAT$(TIMER - lngTime,"####.##") & " seconds."
END FUNCTION

FUNCTION PBMAIN()
  Itterations = 50000000
  lngTime = TIMER
  MSGBOX FORMAT$(Itterations,"#,###,###") & " itterations in " & FORMAT$(TIMER - lngTime,"####.##") & " seconds."
END FUNCTION
---------------------------------------------
FUNCTION PBMAIN()
  Itterations = 50000000
  lngTime = TIMER
  MSGBOX FORMAT$(Itterations,"#,###,###") & " itterations in " & FORMAT$(TIMER - lngTime,"####.##") & " seconds."
END FUNCTION
-----------------------------------------------------
CONCLUSIONS
Thanks!
[This message has been edited by Mike Joseph (edited April 01, 2000).]
IP: Logged |
|
Enoch S Ceshkovsky unregistered |
quote:
1. It's bad for the performance of your app and of other apps running.
2. Yep.
3. To put it simply, 2 extra copy operations per SHIFT call. SHIFT is a function and wants a variable pointer in EBX. So the compiler creates a hidden LOCAL LONG, copies ESI/EDI into the long, calls SHIFT, then copies the long back into ESI/EDI.
4. SQR is a floating-point operation. Power uses a function (see #3).
I'm not too good with ASM, but I think PB thinks of DWORDs as floating point numbers trapped in an integer body.
[This message has been edited by Enoch S Ceshkovsky (edited April 02, 2000).]
IP: Logged |
|
Mike Joseph Member |
Thanks for the feedback. I think you are right about DWORD performance being a bug in the compiler. I ran the same 5 tests below against QUAD integers, and even QUAD integers are significantly faster than DWORDs in 3 of the 5 tests.
Test1: XOR, AND, OR
Test2: SHIFT LEFT, SHIFT RIGHT
Test3: MULTIPLY, DIVIDE
Test4: ADDITIONS, SUBTRACTION
Test5: SQUARE ROOTS AND EXPONENTS
So if you need integers larger than the LONG type, you don't plan on using SHIFT, and you are mostly doing multiplies/divides and bitwise operations with very few additions/subtractions, then you are better off going with QUAD integers over DWORDs.
-Mike
------------------
[This message has been edited by Mike Joseph (edited April 03, 2000).]
IP: Logged |
|
Enoch S Ceshkovsky unregistered |
It appears PTRs are affected by this as well, and worse, they have to call PB runtime functions that do the dword->quad->float and float->quad->dword conversions. Quad should be faster because it doesn't have the extra step of dword->quad. Regardless, PB shouldn't be using floating math with a quad, a dword, or a pointer.
[This message has been edited by Enoch S Ceshkovsky (edited April 03, 2000).]
IP: Logged |
|
Ron Pierce Member |
Mike, try using inline asm to shift bits. I found a while ago that there was something about PowerBASIC's SHIFT statement that needed optimizing. Hopefully they'll change the SHIFT statement to a function, which would permit more efficient coding.
'SHIFT LEFT TestVariable1, 8&
'SHIFT LEFT TestVariable2, 8&
!shl TestVariable1, 8&
!shl TestVariable2, 8&
'SHIFT RIGHT TestVariable1, 8&
'SHIFT RIGHT TestVariable2, 8&
!shr TestVariable1, 8&
!shr TestVariable2, 8&
Ron
IP: Logged |
|
Mike Joseph Member |
Awesome, thank you! You made my day. With the code I've been working on, I was able to improve the time for processing a simulated 4 meg file to ~6.3 seconds. Switching to inline assembly for shifting just dropped it down to ~4.8 seconds! I guess you really can squeeze blood out of a turnip.

The VC6 compiled code I'm using for comparison is churning out ~2.9 seconds (originally I thought it was ~7 seconds, but I was seriously disappointed to find out I was compiling a debug version. When I switched it to the Release configuration, it improved from 7 to 2.9). The LCC-Win32 compiler produces an exe that can process the file in ~4.9 seconds, which makes the PB version faster by about 0.1 seconds.

What's interesting is that I used _asm shr x,8; in place of x >>= 8; in both the VC compiled version and the LCC-Win32 version and the performance stayed pretty much the same. Guess now I'm going to have to start learning asm in my spare time.
------------------
IP: Logged |
|
Ron Pierce Member |
Mike, declare the same register variables in C and see if the results are different. Were you able to do inline asm in LCC? Ron IP: Logged |
|
Enoch S Ceshkovsky unregistered |
Another comment to make here.
The first isn't very fair to PB if you want to compare it against VC6.
------------------
IP: Logged |
|
Mike Joseph Member |
Enoch, I did update the position of the timer as you suggested. Results seem more or less the same. Unless the change is worth at least 0.1 second it's hard to notice, since even separate timing runs on the exact same .exe vary by +/- 0.2 seconds (seems OS timing related).

Ron, in my PB code I'm already using register variables, but I wasn't in the C version. I did a couple of tests on the C version compiled with VC and I was able to improve it from 2.9 to just about 2.09. Using register variables with the LCC-Win32 compiled version didn't seem to help at all. Regarding the use of _asm in the LCC version, I was in error. In fact, I had to comment those out to get it to compile.

I've added !xor assembly operators in place of some of the basic XOR operators and now my PB version is ~4.3 seconds, which is better than the LCC version by .7 seconds. I have a few other XOR locations I'd like to do this to, but I'm not sure how to get it to work with a pointer. Any ideas? Here is the line I'd like to convert to asm:
X = X XOR @data.P(i)
Is there a faster way to do this than the following?
temp = @data.P(i)
It also seems that !xor in PB only works if either the source or the destination is a register variable... the AND operator seems to work with both.

Also, I have used asm on the AND operator and have replaced the following line:
d = x AND 255 REM this is the original line
with an asm version which looks as follows:
!and x,255 REM this version requires 3 lines of code
This gives me a speed boost, but because the result is stored in the x variable, x needs to be restored since I must retain its value. Is there a way to avoid this?
-Mike
------------------
IP: Logged |
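For comparison, here is a minimal C sketch of the two operations Mike asks about: masking the low byte without disturbing x, and XORing x with a value reached through a pointer. The function names low_byte and xor_with_entry are hypothetical and do not come from the thread. The sketch only illustrates the copy-then-operate pattern (work on a copy so the original survives); it says nothing about what PowerBASIC actually generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes x AND 255 without modifying the caller's x.
   In asm terms this is a register copy followed by an AND on the copy,
   so x never has to be restored afterwards. */
uint32_t low_byte(uint32_t x)
{
    uint32_t d = x;   /* copy first, so x itself is preserved */
    d &= 255u;        /* mask the copy, not the original */
    return d;
}

/* Hypothetical helper: computes x XOR table[i], where the table is reached
   through a pointer. The entry is loaded once into a temporary and the XOR
   is then done on plain values. */
uint32_t xor_with_entry(uint32_t x, const uint32_t *table, size_t i)
{
    uint32_t temp = table[i];  /* single load through the pointer */
    return x ^ temp;
}
```

Calling low_byte(x) leaves x untouched, which is the property Mike wants from the asm version.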
|
Enoch S Ceshkovsky unregistered |
Valid for XOR,OR,AND:
Here's a website you might find useful: http://www.ece.uiuc.edu/ece291/books/artofasm.html There are also ASM tutorials for PB programmers, if you search the Programming Forum. [This message has been edited by Enoch S Ceshkovsky (edited April 03, 2000).] IP: Logged |
|
Mike Joseph Member |
Cool, thanks Enoch. -Mike ------------------ IP: Logged |
|
Steve Hutchesson Member |
I have got much the same type of results as Ron Pierce with the difference between using inline asm instructions and the intrinsic PB functions.

The times below are specific to my own computer, which is a 600 PIII, so if you have either a faster or slower box, just alter the loop count so that the minimum duration is over half a second and you will get results within a percent or so.

I have constructed the loop in asm to remove any interaction with
In the 3 comparisons, the shl/shr instructions using a register are a lot
The analysis by Enoch makes this slower operation clear as it is doing
The speed difference between a global variable and a local variable has
Another factor from the Intel documentation is the order and size of stack
LOCAL var1 as LONG
If you need to control the alignment, there is a trick in PowerBASIC where
LOCAL dummy as My_DWORD_Aligned_Dummy_UDT
I tested the speed difference with Do loops and For loops and it seems that
If the loop is truly speed critical, construct the loop using a label and

Regards, hutch@pbq.com.au
------------------ IP: Logged |
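Steve's advice to raise the loop count until a run lasts more than half a second can be sketched in C roughly as follows. Everything here is an assumption made for illustration: the names shift_workload and calibrate_loop_count are hypothetical, the starting count of 1,000,000 is arbitrary, and clock() reports processor time rather than wall-clock time, so this is not the harness used in the thread.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: shifts a value left and right by 8 bits, n times.
   volatile keeps the compiler from optimizing the loop away. */
uint32_t shift_workload(uint32_t n)
{
    volatile uint32_t v = 0x12345678u;
    uint32_t i;
    for (i = 0; i < n; ++i) {
        v <<= 8;
        v >>= 8;
    }
    return v;
}

/* Doubles the loop count until a single run of the workload takes at least
   min_seconds of processor time, then returns that count. Stops doubling
   before the count could overflow 32 bits. */
uint32_t calibrate_loop_count(double min_seconds)
{
    uint32_t n = 1000000u;
    for (;;) {
        clock_t start = clock();
        double elapsed;
        (void)shift_workload(n);
        elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= min_seconds || n > UINT32_MAX / 2u)
            return n;
        n *= 2u;
    }
}
```

A call such as calibrate_loop_count(0.5) gives a loop count long enough that timer jitter of the kind Mike reports (+/- 0.2 seconds) is at least smaller than the run itself; longer minimum durations shrink its relative effect further.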
|
Enoch S Ceshkovsky unregistered |
I wonder if any of the PB staff have a comment about why DWORDs & QUADs are using floating math... ------------------ IP: Logged |
|
Dave Navarro Member |
I don't know why DWORDs are using floating point, but the reason QUADs (64-bit) use floating point is that, the last time I looked, Intel wasn't shipping a 64-bit CPU yet.
--Dave
------------------
IP: Logged |
|
Enoch S Ceshkovsky unregistered |
quote:
Cute.
Quads can be manipulated faster with integer math, but it's easier for the compiler to use floats. Same idea as multiplying 32-bit variables in 16-bit. Something to add to the bottom of the list of "PB Optimizations".
------------------
IP: Logged |
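Enoch's point that 64-bit values can be handled with integer math on a 32-bit CPU can be illustrated with a small C sketch: a 64-bit addition built from two 32-bit additions plus a carry. The type quad_parts and the functions quad_add and quad_to_u64 are hypothetical names chosen for this example. The sketch uses unsigned halves for simplicity and is not a description of how PowerBASIC implements QUAD.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit integer held as two 32-bit halves. */
typedef struct {
    uint32_t lo;   /* low 32 bits */
    uint32_t hi;   /* high 32 bits */
} quad_parts;

/* Adds two 64-bit values using only 32-bit arithmetic. */
quad_parts quad_add(quad_parts a, quad_parts b)
{
    quad_parts r;
    uint32_t carry;
    r.lo = a.lo + b.lo;        /* wraps modulo 2^32 on overflow */
    carry = (r.lo < a.lo);     /* 1 if the low addition wrapped, else 0 */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Reassembles the halves into a native 64-bit value, for checking only. */
uint64_t quad_to_u64(quad_parts q)
{
    return ((uint64_t)q.hi << 32) | q.lo;
}
```

On x86 the same pattern is usually written as an add on the low halves followed by an add-with-carry (adc) on the high halves, with no floating point involved.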
All times are Eastern Time (US) |
|
Copyright © 1999-2007 PowerBASIC, Inc. All Rights Reserved.