|
Author
|
Topic: Dicing With Probability
|
Charles Pegge Member
|
posted February 01, 2007 01:39 PM
Emil, You nearly got it but were mislead by the 'e' which is just a temp variable. Also my scripting language does naughty C things like storing values while in the middle of an expression. ( by using the word 'to' )I use PBwin so I have not compiled this but the algorithm is now as intended. It can be regarded as Darwinian evolution in the sense that each loop represents a single generation with a mutation conferring possible advantage. The advantage is an increase in probability of hitting the target value. #COMPILE EXE #DIM ALL FUNCTION PBMAIN () AS LONG DIM m1 AS EXT, i AS QUAD, r AS EXT, a AS EXT, ee AS EXT, ans AS EXT DIM fbr AS EXT, pi AS EXT, e AS EXT, ml AS QUAD DIM pr AS STRING, tg AS STRING, seed AS EXT ml=1e3: i=ml: r=0: a=1: ee=1: ans=1: fbr=6: pi=ATN(1)*4 pr="#.############" tg=USING$(pr,pi) PRINT "Successive approximation using random numbers" PRINT "Target: ";tg PRINT seed=TIMER RANDOMIZE seed DO r=RND(1) r=(r*2-1)*ee+ans e=ABS(SIN(r/6)-.5) IF e < a THEN a=e ans=r IF ee > e*fbr THEN ee= e*fbr END IF IF USING$(pr, ans)=tg THEN EXIT DO DECR i LOOP UNTIL i <= 0 IF i<=0 THEN PRINT "Unable to complete in "; ml; "loops" PRINT ans ELSE PRINT "Hit target Est of pi to "; pr "decimal places in "; ml-i+1; "loops" END IF WAITKEY$ END FUNCTION
------------------ www.pegge.net IP: Logged |
Emil Menzel Member
|
posted February 01, 2007 04:39 PM
Charles: Thank you for fixing my bugs. Actually I like your script, but I don't have a compiler that will run it.Darwinian variation & natural selection, as I understand it, has no purpose, target or goal unless one so construes "having more grandchildren". ------------------
IP: Logged |
Charles Pegge Member
|
posted February 01, 2007 05:32 PM
Yes I think the only `target` of Darwinian evolution is to be reproducable. It's not really 'Survival of the Fittest'. After all, there are some highly impractical creatures out there in the rain forests, like the birds of paradise.Speaking of which, the scripting language is in a rather fluid state at the moment. One of my greatest pleasures was to remove any user interfacing. All inputs and outputs are files. Notepad provides the IDE. So far I have used it for scripts to generate scripts like those 3d models on my website. I hope to lose a few more words before releasing it. You may have noticed that it does not use '<' and '>'. This is to allow code and markup languages to be freely intermixed. ------------------ www.pegge.net IP: Logged |
David Roberts Member
|
posted February 02, 2007 11:44 AM
quote: >'RND is uniformly distributed beteen 0 and 1 That's not a documented claim for RND.
quote: How random is the RND function?
Working on the null hypothesis that the mean of all possible RNDs is 0.5, ignoring the fact that RND will never equal 1, the following code looks at generating 1000 RNDs 10000 times to see how well the null hypothesis holds.
#COMPILE EXE #REGISTER NONE #DIM ALL #TOOLS OFF %NOMMIDS = 1 %NOGDI = 1 '#INCLUDE "WIN32API.INC" TYPE PercentagePoints one AS LONG two AS LONG four AS LONG eight AS LONG sixteen AS LONG END TYPE FUNCTION PBMAIN( ) AS LONG LOCAL pPoints AS PercentagePoints LOCAL tot, xbar, z AS SINGLE LOCAL n, i, j AS LONG LOCAL one, two, four, eight, sixteen AS LONG RANDOMIZE FOR j= 1 TO 10000 n = 1000 tot = 0 FOR i = 1 TO n tot = tot + RND NEXT xbar = tot/n ' Average of n * RNDs ' Rectangular distribution probability density function ' f(x; alpha, beta) = 1/(beta - alpha) ' Mean: (beat + alpha)/2 ' Variance: (beta - alpha)^2/12 ' With RND we have beta = 1 and alpha = 0. ' The random variable z = (xbar - mean)/( SQR(Variance)/SQR(n) ) ' is normally distributed. ' SQR(12) = 3.464101615 z = ABS(xbar - 0.5)*3.464101615*SQR(n) SELECT CASE z CASE > 2.32635 ' 1% point of standardised normal distribution INCR pPoints.one CASE > 2.05375 ' 2% INCR pPoints.two CASE > 1.75069 ' 4% INCR pPoints.four CASE > 1.40507 ' 8% INCR pPoints.eight CASE > 0.99446 ' 16% INCR pPoints.sixteen END SELECT ' Reseed for the next loop RANDOMIZE RND ' On fast machines RANDOMIZE [TIMER] will give ' a repeat as TIMER will not have moved on NEXT j one = pPoints.one two = one + pPoints.two four = two + pPoints.four eight = four + pPoints.eight sixteen = eight + pPoints.sixteen ? " 1%:" & STR$(one) ? " 2%:" & STR$(two) ? " 4%:" & STR$(four) ? " 8%:" & STR$(eight) ? "16%:" & STR$(sixteen) WAITKEY$ END FUNCTION This output has been found to be common.
1%: 209 2%: 399 4%: 776 8%: 1535 16%: 3148 What this is saying is that 209 times out of 10000 runs of 1000 RNDs the actual average would occur less than 1% of the time. That is to say there is strong evidence that the uniformity of RND is highly suspect 2.09% of the time. In statistical significance terms we use 'some evidence', 'strong evidence' and 'very strong evidence'. In practice, what we do with them depends upon what we are considering. With a 'big bucks' or 'life and death' decision then 'some evidence' may be enough to rethink. In other cases we may decide to rethink only if there is 'very strong evidence' to question a null hypothesis. We are getting a 4% significant level nearly 8% of the time which isn't bad but nothing to shout about so RND may not be the best vehicle for an iterative process.Of course, for occasional use in an application RND is more than adequate and may still be adequate with frequent use; whatever we mean by frequent. If a more robust generator is required then there will probably be a speed penalty and if we go the cryptographic route then the speed penalty will be massive compared with RND.
[This message has been edited by David Roberts (edited February 02, 2007).] IP: Logged |
Charles Pegge Member
|
posted February 02, 2007 03:53 PM
David, Trying to understand your algorithm. Where does the 12 come from? Can you explain a bit about the Rectangular distribution probability density function?------------------ www.pegge.net IP: Logged |
Charles Pegge Member
|
posted February 02, 2007 05:13 PM
My results are similar to yours but I wonder if the percentages on the left are specified for a single tail of the probability curve. If you double them. It's a good match. 1% 2.02% 2% 4.16% 4% 8.41% 8% 16.18% 16% 32.98% ------------------ www.pegge.net IP: Logged |
Emil Menzel Member
|
posted February 02, 2007 06:05 PM
This might be a simpler approach. And if the results do not meet theoretical expectation you can sue Lady Luck, not me. I add a chi-square test, which might be the first type of inferential statistic to apply. It is also easiest to compute, if not also to understand. If you reliably get more than 5% of the chi-square values within the printed table >= 3.84, then worry. If you do not get any values that large, repeat the test a few times more and you surely will. The value of Chi-tot that is considered statistically significant depends on the degrees of freedom (here, number of "bins" or rows, minus 1). For df=10, the .05 level value is 18.31; for df=25 it is 37.65. NOTE ADDED Feb 12, 2007 In PBDOS 3.5 [but not in PBCC 4.02] if you set "number of bins" in the program equal to 2^M (e.g.) 4, 16, 32, 64, 128.... and N equal to (e.g.) 2^16, you will get Chi-squares = 0.00000.... Change number of bins by a count of 1 and you won't get that, or at least you will not get it for sure. 'language PB3.5 for DOS 'test the mean, sd and distribution of RND dim N as quad, seed as quad, X as quad, UB as quad, LB as quad dim mean as ext, sd as ext dim expect as ext, chi as ext, chiTot as extN=2^16 seed=timer 'set number of "bins" desired in freq. distr. UB=15 LB=0 dim freq(LB to UB) as quad randomize seed CLS print x=Stats(mean,sd,N) print using "RND Mean ##.############ sd ##.############ N #########";mean;sd;N print "expected 0.500000 sqr(1/12) ";sqr(1/12) print print "Frequency distribution of RND" print "bin obs expected chi-sq " X=Uniform_distrib (N, freq()) expect=N / ((UB-LB)+1) 'expected freq in each bin, if distribution is uniform for I=lbound(freq) to ubound(freq) chi=(freq(i)-expect)^2 / expect chiTot =chiTot + chi print using "#### ###### ######.## ######.##";i; freq(i); expect; chi next print print "Chi-square (tot)=";chiTot; " d.f. ="; UB-LB print "Tables of Chi-square are available in almost any statistics textbook" print "and of course online" FUNCTION Uniform_Distrib (N as quad, freq() as quad) as quad dim i as long, X as ext, LB as long, UB as long LB=lbound(freq) UB=ubound(freq) FOR i = 1 to N X = RND(LB,UB) incr freq(X) next FUNCTION = 0 END FUNCTION FUNCTION Stats (Mean as ext, sd as ext, N as quad) as long dim x as ext, sumx as ext, sumx2 as ext, i as quad for i = 1 to N x=rnd(1) sumx=sumx+x sumx2=sumx2+ x*x next mean=sumx/N sd=sqr((sumx2-sumx^2/N)/(N-1)) FUNCTION=0 END FUNCTION
------------------
[This message has been edited by Emil Menzel (edited February 12, 2007).] IP: Logged |
David Roberts Member
|
posted February 02, 2007 09:13 PM
CharlesUniform Distribution I never think either/or and have habitually used ABS and single sided. Emil What's the period of RND? Can it stand nearly 2^20 without reseeding? IP: Logged |
Emil Menzel Member
|
posted February 02, 2007 10:49 PM
David: RND's period is 2^32. A program to test this is given in: http://www.powerbasic.com/support/forums/Forum7/HTML/002589.html I have read that for critical problems it is best not to draw more than about 3 million numbers before reseeding or switching one's RNG. But I don't know how the author arrived at that number. ------------------
IP: Logged |
David Roberts Member
|
posted February 03, 2007 02:10 AM
> RND's period is 2^32Does that go for PB3.5 DOS as well? IP: Logged |
Charles Pegge Member
|
posted February 03, 2007 04:57 AM
Chi Squared TestsApplying Chi Squared pairs to my previous binomial probability curves versus RND values I get a summed value of between 18 and 25. I tested this with trial sessions varying from 500 to 1 million an got this result consistently. Since there are 26 degrees of freedom involved in this test, this is well within the expected range of variability. However, just to muddy the waters, the individual Chi for #11 or #12 were often above 4. So a more sensitive test might reveal some subtle bias, which suggests that you get the equivalent of say 12heads:14tails more often than you would expect by chance. Here is a rerun of the test with the individual Chi squareds. Results Generating Probability Distribution for p=.5 with 50000 repeats of 26 Compared with binomial distribution for (.5+.5)^26 0 1.49011611938476E-8 #0 0 3.8743019104004E-7 #1 0 4.84287738800048E-6 #2 0 3.8743019104004E-5 #3 Chi^2=1.9 .00016 2.22772359848022E-4 #4 Chi^2=0.9 .00088 9.801983833313E-4 #5 Chi^2=0.5 _.00332 _3.43069434165955E-3 #6 Chi^2=0.2 ____.00938 ____9.80198383331297E-3 #7 Chi^2=0.9 ________.02204 _________2.32797116041185E-2 #8 Chi^2=3.4 __________________.04706 __________________4.65594232082365E-2 #9 Chi^2=0.3 ______________________________.07888 ______________________________7.91510194540031E-2 #10 Chi^2=0.1 _____________________________________________.1177 ____________________________________________.115128755569458 #11 Chi^2=3.2 ________________________________________________________.14458 _______________________________________________________.143910944461822 #12 Chi^2=0.2 ____________________________________________________________.15608 ____________________________________________________________.154981017112731 #13 Chi^2=0.5 _______________________________________________________.1427 _______________________________________________________.143910944461823 #14 Chi^2=0.6 ____________________________________________.11446 ____________________________________________.115128755569459 #15 Chi^2=0.2 _______________________________.07952 ______________________________7.91510194540031E-2 #16 Chi^2=0.1 __________________.0461 __________________4.65594232082366E-2 #17 Chi^2=0.2 _________.02388 _________2.32797116041185E-2 #18 Chi^2=0.8 ___.009 ____9.80198383331297E-3 #19 Chi^2=3.3 _.0033 _3.43069434165954E-3 #20 Chi^2=0.2 .00066 9.801983833313E-4 #21 Chi^2=5.2 .00026 2.22772359848023E-4 #22 Chi^2=0.3 .00004 3.8743019104004E-5 #23 Chi^2=0. 0 4.84287738800049E-6 #24 0 3.8743019104004E-7 #25 0 1.49011611938476E-8 #26 total chi^2=23.0802957820652 at 26 degrees of freedom
------------------ www.pegge.net
[This message has been edited by Charles Pegge (edited February 03, 2007).] IP: Logged |
Charles Pegge Member
|
posted February 03, 2007 06:07 AM
David,Thanks for the Wolfram link. I will have to improve my Greek. This is very hard core. It will take a while to absorb. These are the confidence figures normally given for standard deviations, so I think the sd % values used in your program need to be doubled. 1s 68.26894921371% 2s 95.44997361036% 3s 99.73002039367% 4s 99.99366575163% http://en.wikipedia.org/wiki/Standard_deviation ------------------ www.pegge.net
[This message has been edited by Charles Pegge (edited February 03, 2007).] IP: Logged |
David Roberts Member
|
posted February 03, 2007 12:36 PM
quote: so I think the sd % values used in your program need to be doubled.
Charles, you have persuaded me.  This leads to a remarkable rule of thumb. Assuming that RND is uniformly distributed then there is an x% probability that it will produce results which only occur x% of the time. Suppose we define suspect at 5% and highly suspect at 1% then our rule of thumb says: 1% of the time RND is highly suspect. 5% of the time RND is suspect. OR 99% of the time RND is not highly suspect. 95% of the time RND is not suspect. For most practical purposes this gives, excuse the pun, the thumbs up for RND; as if it needed it.  IP: Logged |
Emil Menzel Member
|
posted February 03, 2007 12:50 PM
David: I believe that PBDOS RND period is also 2^32 but I did not have time to test it. Charles: Wolfram on distributions might be fun for PhD candidates in mathematical statistics, but in any applied science I know it would suffice to say, as an intro stat textbook might, that: "A rectangular distribution is one that has uniform frequencies over all class intervals" (or equal frequencies in each "bin", to use the language I used earlier). Re Chi-square. The formula I gave [chi-sq= ((o-e)^2) / e] was for frequencies, not proportions. I hope that you caught that rather than vice versa. Another potential gotcha in using chi-square is that if expected frequencies are "very small", chi-square values will be "artificially" inflated. E.g, (3-1)^2 / 1 = 4 which, for 1 d.f. is supposedly p<.05; but it ain't, in my book. But in any event I did not see anything in your results that would lead me to worry about random number generator. If you do worry, I'd advise skipping the transformation of your raw RNDs into a theoretically normal distribution; examine the raw RNDs as such. Also, there are a number of different ways of "normalizing" RNDs and some might do better than others. Some, for example, use a large sample of RNDs to generate a single value of RND-normal, and in principle, they should have the edge.
------------------
IP: Logged |
Charles Pegge Member
|
posted February 03, 2007 01:13 PM
David,I agree with your conclusions but I am seeing a consistently high Chi value in the #11 or #12 area. #13 being the peak of the curve where 50% of RNDs are below 0.5 and 50% above. So I am setting up a chi chart to see where the variance shows up over multiple sessions. I'm just curious. Does anyone here know what algorithm is used in the RND function?
------------------ www.pegge.net IP: Logged |