PowerBASIC Forums
  Cafe PowerBASIC
  Dicing With Probability (Page 9)

Post New Topic  Post A Reply
profile | register | preferences | faq | search

UBBFriend: Email This Page to Someone!
This topic is 10 pages long:   1  2  3  4  5  6  7  8  9  10 
next newest topic | next oldest topic
Author Topic:   Dicing With Probability
Charles Pegge
Member
posted February 07, 2007 02:45 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
>>xi=( (d(c)-pp)^2/pp ) + ( ( (g-d(c))-(g-pp) )^2 / (g-pp) ) '// chi terms

>>(d(c)-pp)^2 * ( 1/pp + 1/(g-pp) )

Yes, David I agree with your formula reduction.
It is just (Observed-Expected)^2/Expected for both heads and tails summed.
In this case the tails are the exact complement of the heads. (One degree of freedom for this.)

As for selecting seeds, I knew it would stir controversy.
It all depends how you are going to use the numbers, especially when you have a
limited sample of them. Lumpy sugar is okay in the mixing bowl but not for putting in your
cup of coffee.

John, if the tail Chis are included, should a Yates correction be applied here?

------------------
www.pegge.net

IP: Logged

John Gleason
Member
posted February 07, 2007 04:58 PM     Click Here to See the Profile for John Gleason     Edit/Delete Message   Reply w/Quote
Charles,
quote:

if the tail Chis are included, should a Yates correction be applied here?


Yowie, I'm afraid I'd be sending a boy to do a man's job if I try
to answer that one. Maybe a little help from Dave or Emil here? I wish
tho that, like all of you do, I could reduce, transform, compress, and otherwise
play the equations "just like you're ringing a bell."

But I can add this: If you're certain that it's correct and you don't want the tails,
then when you throw away a data set, continue the next set by generating randoms from
the current point in your sequence, without re-seeding. I'm quite sure that this is
even required by law in several municipalities.

------------------

IP: Logged

Emil Menzel
Member
posted February 07, 2007 05:05 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
Charles:
I agree with you that whether picking a seed number is naughty, nice or
"other" depends on one's purpose. But if you want to assure that
RND data should conform to theoretical expectation, there is an
easier way to do it, and it is perfectly legit. It is called
"sampling with replacement". You define an entire finite
"population" of numbers that you are interested in; then you draw
numbers from this population "at random" using one or another
acceptable method of sampling.

Thanks to John we know that PB's RND has a finite store of numbers,
and that if you RANDOMIZE and then call RND 2^32 times, you will
get all of these numbers & no others, regardless of what seed you
use. As the program below shows, the mean of all these RND
numbers is not the theoretically expected .5 -- it is "off" after
the 9th decimal place. The sd is not precisely the theoretically
expected sqr( 1/12) either, & is off to about the same degree. If
one were to use these same numbers to estimate pi, the departure
from theoretical expectation would similarly be "trivial" for most
people's ordinary, everyday purposes but "very substantial" as far
as some other people would judge the issue.

So what is the true mean & sd of RND? I would think that that will
depend on whether you choose to call the 2^32 numbers in question
your "population" or merely a sample thereof. I'd choose the latter
option myself.

'RNDstats.BAS
'by Emil Menzel
'based on John Gleason's program
'Initially I plugged the stats routines into his original program;
'here I do them separately, for speed & exhaustiveness.
#COMPILE EXE
#DIM ALL


FUNCTION PBMAIN () AS LONG

'
LOCAL ii, i2, i3 AS LONG
LOCAL eRnd AS EXT, t1 AS DOUBLE
LOCAL A AS STRING, sumx AS EXT, sumx2 AS EXT, sd AS EXT, N AS EXT
'
eRnd=TIMER
RANDOMIZE eRnd '''5777766666 'any valid number here can be used
LOCATE 5,1: PRINT "RND Stats"
PRINT "This program tests all possible 2^32 RND random numbers"
PRINT "in PowerBASIC's random number generator."
PRINT "Use any seed number you want; mean & sd should be the same"

LOCATE 10
PRINT "seed ";eRnd
FOR ii = 1 TO 65536
FOR i2 = 1 TO 65536
eRnd = RND
INCR N
sumX=sumx+eRnD
sumx2=sumx2+eRnd*eRnd
NEXT
LOCATE 12, 1
PRINT N;
NEXT
LOCATE 12,1
sd=SQR((sumx2-sumx^2/N)/(N-1))
A$=USING$( "N ############ mean ##.############ sd ##.############",N,sumX/N,sd)
PRINT
PRINT A$
PRINT
WAITKEY$
END FUNCTION

------------------

IP: Logged

David Roberts
Member
posted February 07, 2007 08:53 PM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
Since we are using all 2^32 then it is academic what seed we use. RND has a default seed so if we did not implement RANDOMIZE then the default would be used and we'd get the same results.
quote:
the mean of all these RND numbers is not the theoretically expected .5 -- it is "off" after the 9th decimal place.

0.5 is not the theoretical mean of RND since it generates an extended-precision value in the range [0,1) and not [0,1] so we would expect a value less than 0.5. Your app, Emil, gave me 0.499999999767 so would, as you say, have an effect on our pi calcs but, of course, would be insignificant, excuse the pun, with regard to testing a statistical hypothesis.

IP: Logged

Charles Pegge
Member
posted February 08, 2007 01:50 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Emil,
You obviously need a high quality stream of random numbers with which to build pi.
At one extreme you depend heavily of the random numbers with minimal procedure and
at the other end, you have totally deterministic methods with no random elements
in the computation. And somewhere in between there are the delightfully named Monte
Carlo methods, which you mentioned above.

I found a link explaining what they are in very simple terms
http://www.chem.unl.edu/zeng/joy/mclab/mcintro.html

Some of the things I was doing before probably qualify for this definition. Adding
a random element to successive approximation seems to simplify the procedure of getting
answers from awkward formulae while imposing a minimal penalty in computational
efficiency when compared to binary approximation.

My favorite awkward formula is sin(.5*a)/sin(a)=sin(72/2)

This relates the side of the spherical triangles in an 'inflated' icosahedron
to the surface angles at the vertex of each triangle. This is a very important figure
required for designing geodesic domes. Before I found the trig solution to this
I used to use successive approximation to get an accurate answer, but using random numbers
the approximation is much easier to set up, and looks very similar to that PI method
you recoded.

For this sort of method, high quality random numbers are not needed.
It is also possible to add a line so you drop out of the loop when 'a' falls below a
threshold, according to the precision required.


#COMPILE EXE
#DIM ALL
GLOBAL pi AS DOUBLE

FUNCTION sind(a AS DOUBLE) AS DOUBLE: FUNCTION=SIN(pi*a/180): END FUNCTION

FUNCTION PBMAIN () AS LONG
' to solve sin(.5*ans)/Sin(a)=sin(36)
DIM pi AS GLOBAL DOUBLE: pi=ATN(1)*4
DIM ml AS DOUBLE, i AS DOUBLE ,r AS DOUBLE ,e AS DOUBLE, a AS DOUBLE, ee AS DOUBLE, ans AS DOUBLE, fbr AS DOUBLE, sins AS DOUBLE
sins=sind(36) ' right side of equation. const
ml=200 ' number of loops
i=ml ' down counter
a=2 ' prior error: start with an unlikely figure
ee=90 ' random span scaling factor
ans=45 ' rough estimate
fbr=180 ' feedback scaling factor
DO
r=(RND(1)*2-1)*ee+ans
e=ABS(sind(.5*r)/sind(r)-sins)
IF e < a THEN a=e: ans=r: ee=MIN(e*fbr,ee)
DECR i: IF i<=0 THEN EXIT LOOP
LOOP
MSGBOX "Est of central angle for icos triangle side "+STR$(ans)+ " compared with "+STR$(ATN(2)*180/pi)
END FUNCTION

------------------
www.pegge.net

IP: Logged

Mark Hunter
Member
posted February 08, 2007 06:53 PM     Click Here to See the Profile for Mark Hunter     Edit/Delete Message   Reply w/Quote
Raphson's Method will solve this kind of problem.
(We're pretending we don't know the arctan(2) solution.)
Global pi As Double

' Sin in degrees rather than radians.
Function sind(a As Double) As Double
Function = Sin(pi*a/180)
End Function

' Our function F.
' We will solve F(x) = sind(36).
Function F(a As Double) As Double
Function = sind(a*.5)/sind(a)
End Function

Function PBMain
Dim sins As Double
Dim x1 As Double, x2 As Double
Dim t As Double, m As Double

pi = Atn(1)*4
sins = sind(36)

x1 = 45 ' one rough estimate
x2 = 46 ' another rough estimate

Do Until Abs(x1-x2) < .00000000001 ' Cauchy convergence
m = (F(x2)-F(x1))/(x2-x1) ' slope of chord
t = x2 ' save x2
x2 = x1 - (F(x1) - sins)/m ' new x2 where chord cuts x axis
x1 = t ' new x1 = old x2
Print x1
Loop

Print "compared with"
Print Atn(2)*180/pi
WaitKey$
End Function

------------------
ARI Watch

IP: Logged

Emil Menzel
Member
posted February 08, 2007 09:09 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
David:
Your comment about significance or lack thereof for statistical
hypotheses sounds odd to me. What is significant statistically
depends on what significance level one (or one's profession)
chooses to set, and who are you or I to choose it for everybody?

As you say, RND is never supposed to be >= 1. But how often is it
<= 0? According to my program, it is 2 times out of the 2^32 of RND's
entire period. Drop those 2 cases, & what is the mean RND? It is
5, precisely (or I should say to the 18 decimal places of my EXT
variables). Alas, the s.d. is still "off" from SQR( 1/ 12) at about the
10th decimal place... Of course, such empirical considerations
do not prove what RND's *theoretical* mean, s.d. and distribution
are. For that I'll go back and re-read, for starters, Donald E. Knuth,
Chapter 3, Vol. 2, The art of computer programming (1998); chapter on
random numbers. A great book!

Charles & Mark:
Thanks for the programs. I hope to study them more closely.


------------------

IP: Logged

David Roberts
Member
posted February 09, 2007 01:16 AM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
quote:
Your comment about significance or lack thereof for statistical hypotheses sounds odd to me. What is significant statistically depends on what significance level one (or one's profession) chooses to set, and who are you or I to choose it for everybody?

Emil, I entirely agree with the second sentence although, for the life of me, I cannot figure out why you wrote it because there doesn't seem to be a link between what I wrote and what you wrote. I did write 'excuse the pun'.

IP: Logged

John Gleason
Member
posted February 09, 2007 12:00 PM     Click Here to See the Profile for John Gleason     Edit/Delete Message   Reply w/Quote
Charles,
That was an excellent article you linked to by Joy Woller. I always
wondered what the heck "Monty Carlo" simulations were, and how you
could derive pi from random numbers. What else can you want from a page?
hehheh.

I had a thought re. making the pi derivation more dependent on good
numbers from the random number generator, but I don't know if it's valid:
Rather than just dividing pi by 2 as Joy did, divide it by maybe 64 or
128 or more. It seems like since many more randoms will therefore need to be
generated, the random algorithm becomes increasingly important, hence one
algorithm may begin to be distinguishable from another by how accurately
and even quickly it estimates pi.

IP: Logged

Charles Pegge
Member
posted February 09, 2007 01:46 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
John,
there are Las Vegas methods too! but good information is harder to come by, since it
seems to be locked up in various Math journals. Perhaps they dont want us to know.

Mark,
Thanks for your program showing Raphson's method. It has given me the idea of using 2
estimates in the generic approximation procedure. This allows the feedback to be
automatically scaled, which makes it much simpler to use. My improved algorithm,
working to 12 digit precision, gets pi out in about 45 loops and the icos angle problem
in 74, where a binary approximation would take 40 loops.

------------------
www.pegge.net

IP: Logged

Mark Hunter
Member
posted February 09, 2007 02:19 PM     Click Here to See the Profile for Mark Hunter     Edit/Delete Message   Reply w/Quote
Your method works without using random numbers.
If – in your original program above – we replace
r = (RND(1)*2 - 1)*ee + ans
With
r = -ee + ans
so that the factor, instead of random between -1 and 1,
is fixed at -1, and print out - ans instead of ans, it converges
just as well.

------------------
ARI Watch

IP: Logged

Charles Pegge
Member
posted February 09, 2007 03:09 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Mark,
This is what I have so far. It applies the algorithm to 9 formulae and
while the randomising method obtains answers for all of them, a deterministic convergence
only gets two or three of them. In a sense it is self-correcting when it tries to
converge on the wrong side of the answer.

PS.
The function pointering used here makes the code a little more complex than
it needs to be.


#COMPILE EXE
#DIM ALL
GLOBAL pi AS DOUBLE
DECLARE FUNCTION gen(v AS DOUBLE) AS DOUBLE
FUNCTION sind(a AS DOUBLE) AS DOUBLE: FUNCTION=SIN(pi*a/180): END FUNCTION
FUNCTION ff(ans AS DOUBLE, f AS LONG) AS DOUBLE: DIM v AS DOUBLE: CALL DWORD f USING gen(ans) TO v: FUNCTION=v:END FUNCTION
'
FUNCTION ranprox(f AS LONG, ans AS DOUBLE, i AS LONG) AS DOUBLE: DIM r AS DOUBLE ,e AS DOUBLE, a AS DOUBLE, ee AS DOUBLE, fbr AS DOUBLE
i=0 ' up counter
fbr=1/ABS(ff(ans+1,f)-ff(ans,f)) ' feedback scaling factor
ee=fbr ' random span scaling factor
a=1e307 ' prior error with ridiculous value
DO
r=(RND(1)*2-1)*ee+ans:
'r = -ee + ans
'r = ee + ans
e=ff(r,f):IF e < a THEN a=e: ans=r: ee=MIN(e*fbr,ee)
INCR i: IF (a*fbr<1e-12)OR(i>1000) THEN FUNCTION=ans: EXIT LOOP
'ee=ee*.5
LOOP
END FUNCTION
'
' to solve sin(.5*ans)/Sin(a)=sin(36)
FUNCTION fp(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS(SIN(r/6)-.5): END FUNCTION
FUNCTION f1(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS(sind(.5*r)/sind(r)-sind(36)): END FUNCTION
FUNCTION f2(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+1): END FUNCTION
FUNCTION f3(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+.5): END FUNCTION
FUNCTION f4(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+SQR(.5)): END FUNCTION
FUNCTION f5(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+2): END FUNCTION
FUNCTION f6(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+3): END FUNCTION
FUNCTION f7(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+4): END FUNCTION
FUNCTION f8(r AS DOUBLE) AS DOUBLE:FUNCTION=ABS((1/r)-r+6): END FUNCTION
'
'
FUNCTION PBMAIN () AS LONG: DIM i AS LONG
pi=ATN(1)*4 ' compute pi
MSGBOX _
"Pi: "+STR$(ranprox(CODEPTR(fp),3,i))+" in "+STR$(i)+" loops"+$CRLF + _
"Central angle of icos triangle: "+STR$(ranprox(CODEPTR(f1),45,i))+" in "+STR$(i)+" loops"+$CRLF + _
"Phi: "+STR$(ranprox(CODEPTR(f2),2,i))+" in "+STR$(i)+" loops"+$CRLF + _
".5: "+STR$(ranprox(CODEPTR(f3),3,i))+" in "+STR$(i)+" loops"+$CRLF + _
"sqr(.5): "+STR$(ranprox(CODEPTR(f4),3,i))+" in "+STR$(i)+" loops"+$CRLF + _
"2: "+STR$(ranprox(CODEPTR(f5),3,i))+" in "+STR$(i)+" loops"+$CRLF + _
"3: "+STR$(ranprox(CODEPTR(f6),4,i))+" in "+STR$(i)+" loops"+$CRLF + _
"4: "+STR$(ranprox(CODEPTR(f7),5,i))+" in "+STR$(i)+" loops"+$CRLF + _
"6: "+STR$(ranprox(CODEPTR(f8),10,i))+" in "+STR$(i)+" loops"+$CRLF + _
""
END FUNCTION

------------------
www.pegge.net

IP: Logged

Mark Hunter
Member
posted February 09, 2007 04:11 PM     Click Here to See the Profile for Mark Hunter     Edit/Delete Message   Reply w/Quote
Broadly speaking both you and Raphson are doing the same thing:
computing the y axis error and adding it (after multiplying by
a suitable factor) to the x axis value, and doing this repeatedly.

Your methods differ in what “suitable factor” is used.

Like yours, the Raphson Method is general and will work for all
nine of your equations. If you want to solve
F(x) = a
use

 Dim a As Double
Dim x1 As Double, x2 As Double
Dim t As Double, m As Double
x1 = rough estimate
x2 = different rough estimate
Do Until Abs(x1 - x2) < .00000000001 ' Cauchy convergence
m = (F(x2) - F(x1))/(x2 - x1) ' slope of chord
t = x2 ' save x2
x2 = x1 - (F(x1) - a)/m ' new x2 where chord intersects x axis
x1 = t ' new x1 = old x2
Loop
Print x1
If you draw a picture you can see why this works.

Your method, as I said, is somewhat similar. Essentially instead
of using the 1/slope as the factor, it uses a random factor within
a certain range. I’d bet that in all cases you can replace this
by a suitable constant factor and it will converge, though still
far more slowly than if you use the “best” factor 1/slope. (Raphson
takes 7 iterations versus over a hundred to get the same accuracy).
But as you observed, the clever randomness eliminates the need to
find it.

For what it’s worth the case
sin(x/2)/sin(x) = sin(36)
using a constant “feedback scaling factor” can be simplified further:

Global pi As Double

Function sind(a As Double) As Double
Function = Sin(pi*a/180)
End Function

Function PBMain
' to solve sin(.5*ans)/Sin(a) = sin(36)
Dim i As Long, r As Double, e As Double, sins As Double
pi = Atn(1)*4
sins = sind(36) ' constant
r = 45 ' rough estimate
%s = 80 ' feedback scaling factor
For i = 1 To 120
e = Abs(sind(.5*r)/sind(r) - sins) 'error
r = r + e*%s ' new r by adding magnified error
Next
Print r
Print "compared with"
Print Atn(2)*180/pi
WaitKey$
End Function


------------------
ARI Watch

[This message has been edited by Mark Hunter (edited February 09, 2007).]

IP: Logged

Emil Menzel
Member
posted February 09, 2007 08:45 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
Re the distribution of RND. I used the function RND (0, 9) to
examine the relative frequencies of these numbers across the
entire 2^32 period of RND. Eight of the 10 "bins" had exactly
the same frequencies, & the other 2 were "off" from that by a
count of 2. Now that is what I call a uniform or
rectangular distribution. The overall Chi-square was 1.49
(d.f. =9) or just about as close to zero as one could get...
Perhaps I'll try this test for 52 bins (for card players) or
100 bins.

David:
My apologies for reading between the lines.

Charles & John:
I see no reason to use Yates' correction. And I'd say use any
reasonable or customary significance level -- .1, .05, .025, .01,
.001, etc. I recommend Donald Knuth's chapter on random numbers
for a thorough account of Chi-square & other tests, and how to
use & interpret them.

"Monte Carlo Pi" seems like quite a popular topic; more accurately,
Pi seems like a popular way to introduce some of the basic issues
and methods. What I like most about the Monte Carlo methods is
that you can tailor-make them for almost any sort of statistical
problem.

Check out this web site if you can http://cs.sunysb.edu/~algorith/.
It covers both of the above problems, & more.

Mark & Charles;
"Deterministic versus randomization-based algorithms" sounds like
an interesting challenge.


------------------

IP: Logged

Charles Pegge
Member
posted February 10, 2007 03:56 AM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Mark,
Got it! Many thanks. Raphson's method is very robust. Here is my interpretation replacing
the inside of the ranprox function. And all is deftly accomplished in less than 10 loops!


i=0 ' up counter
r=ans+1 ' second extimate of ans
a=ff(ans,f) ' first result
e=ff(r,f) ' second result
DO
fbr=(r-ans)/(e-a): ans=r: r=r-fbr*e: a=e:e=ff(r,f)
INCR i: IF (ABS(r-ans)<1e-12)OR(i>1000) THEN FUNCTION=ans: EXIT LOOP
LOOP


------------------
www.pegge.net

[This message has been edited by Charles Pegge (edited February 10, 2007).]

IP: Logged


This topic is 10 pages long:   1  2  3  4  5  6  7  8  9  10 

All times are EasternTime (US)

next newest topic | next oldest topic

Administrative Options: Close Topic | Archive/Move | Delete Topic
Post New Topic  Post A Reply
Hop to:

Contact Us | PowerBASIC BASIC Compilers

Copyright © 1999-2006 PowerBASIC, Inc. All Rights Reserved.


Ultimate Bulletin Board 5.45c