|
Author
|
Topic: Dicing With Probability
|
Charles Pegge Member
|
posted February 07, 2007 02:45 PM
>>xi=( (d(c)-pp)^2/pp ) + ( ( (g-d(c))-(g-pp) )^2 / (g-pp) ) '// chi terms>>(d(c)-pp)^2 * ( 1/pp + 1/(g-pp) ) Yes, David I agree with your formula reduction. It is just (Observed-Expected)^2/Expected for both heads and tails summed. In this case the tails are the exact complement of the heads. (One degree of freedom for this.) As for selecting seeds, I knew it would stir controversy.  It all depends how you are going to use the numbers, especially when you have a limited sample of them. Lumpy sugar is okay in the mixing bowl but not for putting in your cup of coffee. John, if the tail Chis are included, should a Yates correction be applied here?
------------------ www.pegge.net IP: Logged |
John Gleason Member
|
posted February 07, 2007 04:58 PM
Charles, quote:
if the tail Chis are included, should a Yates correction be applied here?
Yowie, I'm afraid I'd be sending a boy to do a man's job if I try to answer that one. Maybe a little help from Dave or Emil here? I wish tho that, like all of you do, I could reduce, transform, compress, and otherwise play the equations "just like you're ringing a bell."  But I can add this: If you're certain that it's correct and you don't want the tails, then when you throw away a data set, continue the next set by generating randoms from the current point in your sequence, without re-seeding. I'm quite sure that this is even required by law in several municipalities. ------------------
IP: Logged |
Emil Menzel Member
|
posted February 07, 2007 05:05 PM
Charles: I agree with you that whether picking a seed number is naughty, nice or "other" depends on one's purpose. But if you want to assure that RND data should conform to theoretical expectation, there is an easier way to do it, and it is perfectly legit. It is called "sampling with replacement". You define an entire finite "population" of numbers that you are interested in; then you draw numbers from this population "at random" using one or another acceptable method of sampling. Thanks to John we know that PB's RND has a finite store of numbers, and that if you RANDOMIZE and then call RND 2^32 times, you will get all of these numbers & no others, regardless of what seed you use. As the program below shows, the mean of all these RND numbers is not the theoretically expected .5 -- it is "off" after the 9th decimal place. The sd is not precisely the theoretically expected sqr( 1/12) either, & is off to about the same degree. If one were to use these same numbers to estimate pi, the departure from theoretical expectation would similarly be "trivial" for most people's ordinary, everyday purposes but "very substantial" as far as some other people would judge the issue. So what is the true mean & sd of RND? I would think that that will depend on whether you choose to call the 2^32 numbers in question your "population" or merely a sample thereof. I'd choose the latter option myself. 'RNDstats.BAS 'by Emil Menzel 'based on John Gleason's program 'Initially I plugged the stats routines into his original program; 'here I do them separately, for speed & exhaustiveness. #COMPILE EXE #DIM ALL FUNCTION PBMAIN () AS LONG' LOCAL ii, i2, i3 AS LONG LOCAL eRnd AS EXT, t1 AS DOUBLE LOCAL A AS STRING, sumx AS EXT, sumx2 AS EXT, sd AS EXT, N AS EXT ' eRnd=TIMER RANDOMIZE eRnd '''5777766666 'any valid number here can be used LOCATE 5,1: PRINT "RND Stats" PRINT "This program tests all possible 2^32 RND random numbers" PRINT "in PowerBASIC's random number generator." PRINT "Use any seed number you want; mean & sd should be the same" LOCATE 10 PRINT "seed ";eRnd FOR ii = 1 TO 65536 FOR i2 = 1 TO 65536 eRnd = RND INCR N sumX=sumx+eRnD sumx2=sumx2+eRnd*eRnd NEXT LOCATE 12, 1 PRINT N; NEXT LOCATE 12,1 sd=SQR((sumx2-sumx^2/N)/(N-1)) A$=USING$( "N ############ mean ##.############ sd ##.############",N,sumX/N,sd) PRINT PRINT A$ PRINT WAITKEY$ END FUNCTION
------------------
IP: Logged |
David Roberts Member
|
posted February 07, 2007 08:53 PM
Since we are using all 2^32 then it is academic what seed we use. RND has a default seed so if we did not implement RANDOMIZE then the default would be used and we'd get the same results.  quote: the mean of all these RND numbers is not the theoretically expected .5 -- it is "off" after the 9th decimal place.
0.5 is not the theoretical mean of RND since it generates an extended-precision value in the range [0,1) and not [0,1] so we would expect a value less than 0.5. Your app, Emil, gave me 0.499999999767 so would, as you say, have an effect on our pi calcs but, of course, would be insignificant, excuse the pun, with regard to testing a statistical hypothesis. IP: Logged |
Charles Pegge Member
|
posted February 08, 2007 01:50 PM
Emil, You obviously need a high quality stream of random numbers with which to build pi. At one extreme you depend heavily of the random numbers with minimal procedure and at the other end, you have totally deterministic methods with no random elements in the computation. And somewhere in between there are the delightfully named Monte Carlo methods, which you mentioned above.I found a link explaining what they are in very simple terms http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html Some of the things I was doing before probably qualify for this definition. Adding a random element to successive approximation seems to simplify the procedure of getting answers from awkward formulae while imposing a minimal penalty in computational efficiency when compared to binary approximation. My favorite awkward formula is sin(.5*a)/sin(a)=sin(72/2) This relates the side of the spherical triangles in an 'inflated' icosahedron to the surface angles at the vertex of each triangle. This is a very important figure required for designing geodesic domes. Before I found the trig solution to this I used to use successive approximation to get an accurate answer, but using random numbers the approximation is much easier to set up, and looks very similar to that PI method you recoded. For this sort of method, high quality random numbers are not needed. It is also possible to add a line so you drop out of the loop when 'a' falls below a threshold, according to the precision required. #COMPILE EXE #DIM ALL GLOBAL pi AS DOUBLEFUNCTION sind(a AS DOUBLE) AS DOUBLE: FUNCTION=SIN(pi*a/180): END FUNCTION FUNCTION PBMAIN () AS LONG ' to solve sin(.5*ans)/Sin(a)=sin(36) DIM pi AS GLOBAL DOUBLE: pi=ATN(1)*4 DIM ml AS DOUBLE, i AS DOUBLE ,r AS DOUBLE ,e AS DOUBLE, a AS DOUBLE, ee AS DOUBLE, ans AS DOUBLE, fbr AS DOUBLE, sins AS DOUBLE sins=sind(36) ' right side of equation. const ml=200 ' number of loops i=ml ' down counter a=2 ' prior error: start with an unlikely figure ee=90 ' random span scaling factor ans=45 ' rough estimate fbr=180 ' feedback scaling factor DO r=(RND(1)*2-1)*ee+ans e=ABS(sind(.5*r)/sind(r)-sins) IF e < a THEN a=e: ans=r: ee=MIN(e*fbr,ee) DECR i: IF i<=0 THEN EXIT LOOP LOOP MSGBOX "Est of central angle for icos triangle side "+STR$(ans)+ " compared with "+STR$(ATN(2)*180/pi) END FUNCTION
------------------ www.pegge.net IP: Logged |
Mark Hunter Member
|
posted February 08, 2007 06:53 PM
Raphson's Method will solve this kind of problem. (We're pretending we don't know the arctan(2) solution.)
Global pi As Double ' Sin in degrees rather than radians. Function sind(a As Double) As Double Function = Sin(pi*a/180) End Function ' Our function F. ' We will solve F(x) = sind(36). Function F(a As Double) As Double Function = sind(a*.5)/sind(a) End Function Function PBMain Dim sins As Double Dim x1 As Double, x2 As Double Dim t As Double, m As Double pi = Atn(1)*4 sins = sind(36) x1 = 45 ' one rough estimate x2 = 46 ' another rough estimate Do Until Abs(x1-x2) < .00000000001 ' Cauchy convergence m = (F(x2)-F(x1))/(x2-x1) ' slope of chord t = x2 ' save x2 x2 = x1 - (F(x1) - sins)/m ' new x2 where chord cuts x axis x1 = t ' new x1 = old x2 Print x1 Loop Print "compared with" Print Atn(2)*180/pi WaitKey$ End Function
------------------ ARI Watch IP: Logged |
Emil Menzel Member
|
posted February 08, 2007 09:09 PM
David: Your comment about significance or lack thereof for statistical hypotheses sounds odd to me. What is significant statistically depends on what significance level one (or one's profession) chooses to set, and who are you or I to choose it for everybody? As you say, RND is never supposed to be >= 1. But how often is it <= 0? According to my program, it is 2 times out of the 2^32 of RND's entire period. Drop those 2 cases, & what is the mean RND? It is 5, precisely (or I should say to the 18 decimal places of my EXT variables). Alas, the s.d. is still "off" from SQR( 1/ 12) at about the 10th decimal place... Of course, such empirical considerations do not prove what RND's *theoretical* mean, s.d. and distribution are. For that I'll go back and re-read, for starters, Donald E. Knuth, Chapter 3, Vol. 2, The art of computer programming (1998); chapter on random numbers. A great book! Charles & Mark: Thanks for the programs. I hope to study them more closely.
------------------
IP: Logged |
David Roberts Member
|
posted February 09, 2007 01:16 AM
quote: Your comment about significance or lack thereof for statistical hypotheses sounds odd to me. What is significant statistically depends on what significance level one (or one's profession) chooses to set, and who are you or I to choose it for everybody?
Emil, I entirely agree with the second sentence although, for the life of me, I cannot figure out why you wrote it because there doesn't seem to be a link between what I wrote and what you wrote. I did write 'excuse the pun'.IP: Logged |
John Gleason Member
|
posted February 09, 2007 12:00 PM
Charles, That was an excellent article you linked to by Joy Woller. I always wondered what the heck "Monty Carlo" simulations were, and how you could derive pi from random numbers. What else can you want from a page? hehheh.I had a thought re. making the pi derivation more dependent on good numbers from the random number generator, but I don't know if it's valid: Rather than just dividing pi by 2 as Joy did, divide it by maybe 64 or 128 or more. It seems like since many more randoms will therefore need to be generated, the random algorithm becomes increasingly important, hence one algorithm may begin to be distinguishable from another by how accurately and even quickly it estimates pi. IP: Logged |
Charles Pegge Member
|
posted February 09, 2007 01:46 PM
John, there are Las Vegas methods too! but good information is harder to come by, since it seems to be locked up in various Math journals. Perhaps they dont want us to know.Mark, Thanks for your program showing Raphson's method. It has given me the idea of using 2 estimates in the generic approximation procedure. This allows the feedback to be automatically scaled, which makes it much simpler to use. My improved algorithm, working to 12 digit precision, gets pi out in about 45 loops and the icos angle problem in 74, where a binary approximation would take 40 loops.
------------------ www.pegge.net IP: Logged |
Mark Hunter Member
|
posted February 09, 2007 02:19 PM
Your method works without using random numbers. If – in your original program above – we replacer = (RND(1)*2 - 1)*ee + ans Withr = -ee + ans so that the factor, instead of random between -1 and 1, is fixed at -1, and print out - ans instead of ans, it converges just as well.
------------------ ARI Watch IP: Logged |
Charles Pegge Member
|
posted February 09, 2007 03:09 PM
Mark, This is what I have so far. It applies the algorithm to 9 formulae and while the randomising method obtains answers for all of them, a deterministic convergence only gets two or three of them. In a sense it is self-correcting when it tries to converge on the wrong side of the answer.PS. The function pointering used here makes the code a little more complex than it needs to be. #COMPILE EXE #DIM ALL GLOBAL pi AS DOUBLE DECLARE FUNCTION gen(v AS DOUBLE) AS DOUBLE FUNCTION sind(a AS DOUBLE) AS DOUBLE: FUNCTION=SIN(pi*a/180): END FUNCTION FUNCTION ff(ans AS DOUBLE, f AS LONG) AS DOUBLE: DIM v AS DOUBLE: CALL DWORD f USING gen(ans) TO v: FUNCTION=v:END FUNCTION ' FUNCTION ranprox(f AS LONG, ans AS DOUBLE, i AS LONG) AS DOUBLE: DIM r AS DOUBLE ,e AS DOUBLE, a AS DOUBLE, ee AS DOUBLE, fbr AS DOUBLE i=0 ' up counter fbr=1/ABS(ff(ans+1,f)-ff(ans,f)) ' feedback scaling factor ee=fbr ' random span scaling factor a=1e307 ' prior error with ridiculous value DO r=(RND(1)*2-1)*ee+ans: 'r = -ee + ans 'r = ee + ans e=ff(r,f):IF e < a THEN a=e: ans=r: ee=MIN(e*fbr,ee) INCR i: IF (a*fbr<1e-12)OR(i>1000) THEN FUNCTION=ans: EXIT LOOP 'ee=ee*.5 LOOP END FUNCTION ' ' to solve sin(.5*ans)/Sin(a)=sin(36) FUNCTION fp(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS(SIN(r/6)-.5): END FUNCTION FUNCTION f1(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS(sind(.5*r)/sind(r)-sind(36)): END FUNCTION FUNCTION f2(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+1): END FUNCTION FUNCTION f3(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+.5): END FUNCTION FUNCTION f4(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+SQR(.5)): END FUNCTION FUNCTION f5(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+2): END FUNCTION FUNCTION f6(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+3): END FUNCTION FUNCTION f7(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+4): END FUNCTION FUNCTION f8(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+6): END FUNCTION ' ' FUNCTION PBMAIN () AS LONG: DIM i AS LONG pi=ATN(1)*4 ' compute pi MSGBOX _ "Pi: "+STR$(ranprox(CODEPTR(fp),3,i))+" in "+STR$(i)+" loops"+$CRLF + _ "Central angle of icos triangle: "+STR$(ranprox(CODEPTR(f1),45,i))+" in "+STR$(i)+" loops"+$CRLF + _ "Phi: "+STR$(ranprox(CODEPTR(f2),2,i))+" in "+STR$(i)+" loops"+$CRLF + _ ".5: "+STR$(ranprox(CODEPTR(f3),3,i))+" in "+STR$(i)+" loops"+$CRLF + _ "sqr(.5): "+STR$(ranprox(CODEPTR(f4),3,i))+" in "+STR$(i)+" loops"+$CRLF + _ "2: "+STR$(ranprox(CODEPTR(f5),3,i))+" in "+STR$(i)+" loops"+$CRLF + _ "3: "+STR$(ranprox(CODEPTR(f6),4,i))+" in "+STR$(i)+" loops"+$CRLF + _ "4: "+STR$(ranprox(CODEPTR(f7),5,i))+" in "+STR$(i)+" loops"+$CRLF + _ "6: "+STR$(ranprox(CODEPTR(f8),10,i))+" in "+STR$(i)+" loops"+$CRLF + _ "" END FUNCTION
------------------ www.pegge.net IP: Logged |
Mark Hunter Member
|
posted February 09, 2007 04:11 PM
Broadly speaking both you and Raphson are doing the same thing: computing the y axis error and adding it (after multiplying by a suitable factor) to the x axis value, and doing this repeatedly.Your methods differ in what “suitable factor” is used. Like yours, the Raphson Method is general and will work for all nine of your equations. If you want to solve F(x) = a use Dim a As Double Dim x1 As Double, x2 As Double Dim t As Double, m As Double x1 = rough estimate x2 = different rough estimate Do Until Abs(x1 - x2) < .00000000001 ' Cauchy convergence m = (F(x2) - F(x1))/(x2 - x1) ' slope of chord t = x2 ' save x2 x2 = x1 - (F(x1) - a)/m ' new x2 where chord intersects x axis x1 = t ' new x1 = old x2 Loop Print x1 If you draw a picture you can see why this works.Your method, as I said, is somewhat similar. Essentially instead of using the 1/slope as the factor, it uses a random factor within a certain range. I’d bet that in all cases you can replace this by a suitable constant factor and it will converge, though still far more slowly than if you use the “best” factor 1/slope. (Raphson takes 7 iterations versus over a hundred to get the same accuracy). But as you observed, the clever randomness eliminates the need to find it. For what it’s worth the case sin(x/2)/sin(x) = sin(36) using a constant “feedback scaling factor” can be simplified further: Global pi As DoubleFunction sind(a As Double) As Double Function = Sin(pi*a/180) End Function Function PBMain ' to solve sin(.5*ans)/Sin(a) = sin(36) Dim i As Long, r As Double, e As Double, sins As Double pi = Atn(1)*4 sins = sind(36) ' constant r = 45 ' rough estimate %s = 80 ' feedback scaling factor For i = 1 To 120 e = Abs(sind(.5*r)/sind(r) - sins) 'error r = r + e*%s ' new r by adding magnified error Next Print r Print "compared with" Print Atn(2)*180/pi WaitKey$ End Function ------------------ ARI Watch
[This message has been edited by Mark Hunter (edited February 09, 2007).] IP: Logged |
Emil Menzel Member
|
posted February 09, 2007 08:45 PM
Re the distribution of RND. I used the function RND (0, 9) to examine the relative frequencies of these numbers across the entire 2^32 period of RND. Eight of the 10 "bins" had exactly the same frequencies, & the other 2 were "off" from that by a count of 2. Now that is what I call a uniform or rectangular distribution. The overall Chi-square was 1.49 (d.f. =9) or just about as close to zero as one could get... Perhaps I'll try this test for 52 bins (for card players) or 100 bins. David: My apologies for reading between the lines. Charles & John: I see no reason to use Yates' correction. And I'd say use any reasonable or customary significance level -- .1, .05, .025, .01, .001, etc. I recommend Donald Knuth's chapter on random numbers for a thorough account of Chi-square & other tests, and how to use & interpret them. "Monte Carlo Pi" seems like quite a popular topic; more accurately, Pi seems like a popular way to introduce some of the basic issues and methods. What I like most about the Monte Carlo methods is that you can tailor-make them for almost any sort of statistical problem. Check out this web site if you can http://cs.sunysb.edu/~algorith/. It covers both of the above problems, & more. Mark & Charles; "Deterministic versus randomization-based algorithms" sounds like an interesting challenge.
------------------
IP: Logged |
Charles Pegge Member
|
posted February 10, 2007 03:56 AM
Mark, Got it! Many thanks. Raphson's method is very robust. Here is my interpretation replacing the inside of the ranprox function. And all is deftly accomplished in less than 10 loops! i=0 ' up counter r=ans+1 ' second extimate of ans a=ff(ans,f) ' first result e=ff(r,f) ' second result DO fbr=(r-ans)/(e-a): ans=r: r=r-fbr*e: a=e:e=ff(r,f) INCR i: IF (ABS(r-ans)<1e-12)OR(i>1000) THEN FUNCTION=ans: EXIT LOOP LOOP
------------------ www.pegge.net
[This message has been edited by Charles Pegge (edited February 10, 2007).] IP: Logged |