PowerBASIC Forums
  Cafe PowerBASIC
  Dicing With Probability (Page 6)

Post New Topic  Post A Reply
profile | register | preferences | faq | search

UBBFriend: Email This Page to Someone!
This topic is 10 pages long:   1  2  3  4  5  6  7  8  9  10 
next newest topic | next oldest topic
Author Topic:   Dicing With Probability
Charles Pegge
Member
posted February 01, 2007 01:39 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Emil,
You nearly got it but were mislead by the 'e' which is just a temp variable.
Also my scripting language does naughty C things like storing values while
in the middle of an expression. ( by using the word 'to' )

I use PBwin so I have not compiled this but the algorithm is now as intended.
It can be regarded as Darwinian evolution in the sense that each loop
represents a single generation with a mutation conferring possible advantage.
The advantage is an increase in probability of hitting the target value.


#COMPILE EXE
#DIM ALL
FUNCTION PBMAIN () AS LONG
DIM m1 AS EXT, i AS QUAD, r AS EXT, a AS EXT, ee AS EXT, ans AS EXT
DIM fbr AS EXT, pi AS EXT, e AS EXT, ml AS QUAD
DIM pr AS STRING, tg AS STRING, seed AS EXT
ml=1e3: i=ml: r=0: a=1: ee=1: ans=1: fbr=6: pi=ATN(1)*4
pr="#.############"
tg=USING$(pr,pi)
PRINT "Successive approximation using random numbers"
PRINT "Target: ";tg
PRINT
seed=TIMER
RANDOMIZE seed
DO
r=RND(1)
r=(r*2-1)*ee+ans
e=ABS(SIN(r/6)-.5)
IF e < a THEN
a=e
ans=r
IF ee > e*fbr THEN ee= e*fbr
END IF
IF USING$(pr, ans)=tg THEN EXIT DO
DECR i
LOOP UNTIL i <= 0
IF i<=0 THEN
PRINT "Unable to complete in "; ml; "loops"
PRINT ans
ELSE
PRINT "Hit target Est of pi to "; pr "decimal places in "; ml-i+1; "loops"
END IF
WAITKEY$
END FUNCTION


------------------
www.pegge.net

IP: Logged

Emil Menzel
Member
posted February 01, 2007 04:39 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
Charles:
Thank you for fixing my bugs. Actually I like your script, but
I don't have a compiler that will run it.

Darwinian variation & natural selection, as I understand it, has
no purpose, target or goal unless one so construes "having more
grandchildren".

------------------

IP: Logged

Charles Pegge
Member
posted February 01, 2007 05:32 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Yes I think the only `target` of Darwinian evolution is to be reproducable.
It's not really 'Survival of the Fittest'. After all, there are some highly
impractical creatures out there in the rain forests, like the birds of paradise.

Speaking of which, the scripting language is in a rather fluid state at the moment.
One of my greatest pleasures was to remove any user interfacing. All inputs and
outputs are files. Notepad provides the IDE. So far I have used it for
scripts to generate scripts like those 3d models on my website. I hope to lose
a few more words before releasing it. You may have noticed that it does not
use '<' and '>'. This is to allow code and markup languages to be freely
intermixed.

------------------
www.pegge.net

IP: Logged

David Roberts
Member
posted February 02, 2007 11:44 AM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
quote:
>'RND is uniformly distributed beteen 0 and 1

That's not a documented claim for RND.

quote:
How random is the RND function?

Working on the null hypothesis that the mean of all possible RNDs is 0.5, ignoring the fact that RND will never equal 1, the following code looks at generating 1000 RNDs 10000 times to see how well the null hypothesis holds.

#COMPILE  EXE
#REGISTER NONE
#DIM ALL
#TOOLS OFF

%NOMMIDS = 1
%NOGDI = 1

'#INCLUDE "WIN32API.INC"

TYPE PercentagePoints
one AS LONG
two AS LONG
four AS LONG
eight AS LONG
sixteen AS LONG
END TYPE


FUNCTION PBMAIN( ) AS LONG

LOCAL pPoints AS PercentagePoints
LOCAL tot, xbar, z AS SINGLE
LOCAL n, i, j AS LONG
LOCAL one, two, four, eight, sixteen AS LONG

RANDOMIZE

FOR j= 1 TO 10000

n = 1000

tot = 0
FOR i = 1 TO n
tot = tot + RND
NEXT
xbar = tot/n ' Average of n * RNDs

' Rectangular distribution probability density function
' f(x; alpha, beta) = 1/(beta - alpha)
' Mean: (beat + alpha)/2
' Variance: (beta - alpha)^2/12
' With RND we have beta = 1 and alpha = 0.

' The random variable z = (xbar - mean)/( SQR(Variance)/SQR(n) )
' is normally distributed.

' SQR(12) = 3.464101615

z = ABS(xbar - 0.5)*3.464101615*SQR(n)

SELECT CASE z
CASE > 2.32635 ' 1% point of standardised normal distribution
INCR pPoints.one
CASE > 2.05375 ' 2%
INCR pPoints.two
CASE > 1.75069 ' 4%
INCR pPoints.four
CASE > 1.40507 ' 8%
INCR pPoints.eight
CASE > 0.99446 ' 16%
INCR pPoints.sixteen
END SELECT

' Reseed for the next loop
RANDOMIZE RND ' On fast machines RANDOMIZE [TIMER] will give
' a repeat as TIMER will not have moved on

NEXT j

one = pPoints.one
two = one + pPoints.two
four = two + pPoints.four
eight = four + pPoints.eight
sixteen = eight + pPoints.sixteen
? " 1%:" & STR$(one)
? " 2%:" & STR$(two)
? " 4%:" & STR$(four)
? " 8%:" & STR$(eight)
? "16%:" & STR$(sixteen)

WAITKEY$

END FUNCTION

This output has been found to be common.

 1%: 209
2%: 399
4%: 776
8%: 1535
16%: 3148


What this is saying is that 209 times out of 10000 runs of 1000 RNDs the actual average would occur less than 1% of the time. That is to say there is strong evidence that the uniformity of RND is highly suspect 2.09% of the time.

In statistical significance terms we use 'some evidence', 'strong evidence' and 'very strong evidence'. In practice, what we do with them depends upon what we are considering. With a 'big bucks' or 'life and death' decision then 'some evidence' may be enough to rethink. In other cases we may decide to rethink only if there is 'very strong evidence' to question a null hypothesis.

We are getting a 4% significant level nearly 8% of the time which isn't bad but nothing to shout about so RND may not be the best vehicle for an iterative process.

Of course, for occasional use in an application RND is more than adequate and may still be adequate with frequent use; whatever we mean by frequent.

If a more robust generator is required then there will probably be a speed penalty and if we go the cryptographic route then the speed penalty will be massive compared with RND.



[This message has been edited by David Roberts (edited February 02, 2007).]

IP: Logged

Charles Pegge
Member
posted February 02, 2007 03:53 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
David,
Trying to understand your algorithm. Where does the 12 come from?
Can you explain a bit about the Rectangular distribution probability
density function?

------------------
www.pegge.net

IP: Logged

Charles Pegge
Member
posted February 02, 2007 05:13 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
My results are similar to yours but I wonder if the percentages on the
left are specified for a single tail of the probability curve.
If you double them. It's a good match.

1% 2.02%
2% 4.16%
4% 8.41%
8% 16.18%
16% 32.98%

------------------
www.pegge.net

IP: Logged

Emil Menzel
Member
posted February 02, 2007 06:05 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
This might be a simpler approach. And if the results do not meet
theoretical expectation you can sue Lady Luck, not me.

I add a chi-square test, which might be the first type of
inferential statistic to apply. It is also easiest to compute, if
not also to understand. If you reliably get more than 5% of the
chi-square values within the printed table >= 3.84, then worry.
If you do not get any values that large, repeat the test a few
times more and you surely will. The value of Chi-tot that is
considered statistically significant depends on the degrees of
freedom (here, number of "bins" or rows, minus 1). For df=10, the
.05 level value is 18.31; for df=25 it is 37.65.

NOTE ADDED Feb 12, 2007
In PBDOS 3.5 [but not in PBCC 4.02] if you set "number of bins" in
the program equal to 2^M (e.g.) 4, 16, 32, 64, 128.... and N equal
to (e.g.) 2^16, you will get Chi-squares = 0.00000.... Change number
of bins by a count of 1 and you won't get that, or at least you will
not get it for sure.


'language PB3.5 for DOS
'test the mean, sd and distribution of RND
dim N as quad, seed as quad, X as quad, UB as quad, LB as quad
dim mean as ext, sd as ext
dim expect as ext, chi as ext, chiTot as ext

N=2^16
seed=timer
'set number of "bins" desired in freq. distr.
UB=15
LB=0
dim freq(LB to UB) as quad

randomize seed
CLS

print
x=Stats(mean,sd,N)
print using "RND Mean ##.############ sd ##.############ N #########";mean;sd;N
print "expected 0.500000 sqr(1/12) ";sqr(1/12)
print
print "Frequency distribution of RND"
print "bin obs expected chi-sq "

X=Uniform_distrib (N, freq())
expect=N / ((UB-LB)+1) 'expected freq in each bin, if distribution is uniform
for I=lbound(freq) to ubound(freq)
chi=(freq(i)-expect)^2 / expect
chiTot =chiTot + chi
print using "#### ###### ######.## ######.##";i; freq(i); expect; chi
next
print
print "Chi-square (tot)=";chiTot; " d.f. ="; UB-LB
print "Tables of Chi-square are available in almost any statistics textbook"
print "and of course online"

FUNCTION Uniform_Distrib (N as quad, freq() as quad) as quad
dim i as long, X as ext, LB as long, UB as long
LB=lbound(freq)
UB=ubound(freq)
FOR i = 1 to N
X = RND(LB,UB)
incr freq(X)
next
FUNCTION = 0
END FUNCTION

FUNCTION Stats (Mean as ext, sd as ext, N as quad) as long
dim x as ext, sumx as ext, sumx2 as ext, i as quad
for i = 1 to N
x=rnd(1)
sumx=sumx+x
sumx2=sumx2+ x*x
next
mean=sumx/N
sd=sqr((sumx2-sumx^2/N)/(N-1))
FUNCTION=0
END FUNCTION

------------------


[This message has been edited by Emil Menzel (edited February 12, 2007).]

IP: Logged

David Roberts
Member
posted February 02, 2007 09:13 PM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
Charles

Uniform Distribution

I never think either/or and have habitually used ABS and single sided.

Emil

What's the period of RND? Can it stand nearly 2^20 without reseeding?

IP: Logged

Emil Menzel
Member
posted February 02, 2007 10:49 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
David:
RND's period is 2^32. A program to test this is given in: http://www.powerbasic.com/support/forums/Forum7/HTML/002589.html

I have read that for critical problems it is best not to draw more
than about 3 million numbers before reseeding or switching one's
RNG. But I don't know how the author arrived at that number.

------------------

IP: Logged

David Roberts
Member
posted February 03, 2007 02:10 AM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
> RND's period is 2^32

Does that go for PB3.5 DOS as well?

IP: Logged

Charles Pegge
Member
posted February 03, 2007 04:57 AM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
Chi Squared Tests

Applying Chi Squared pairs to my previous binomial probability curves versus RND values
I get a summed value of between 18 and 25. I tested this with trial sessions varying
from 500 to 1 million an got this result consistently.

Since there are 26 degrees of freedom involved in this test, this is well within
the expected range of variability.

However, just to muddy the waters, the individual Chi for #11 or #12 were
often above 4. So a more sensitive test might reveal some subtle bias, which
suggests that you get the equivalent of say 12heads:14tails more often than you
would expect by chance.

Here is a rerun of the test with the individual Chi squareds.

Results

Generating Probability Distribution for p=.5 with 50000 repeats of 26
Compared with binomial distribution for (.5+.5)^26
0
1.49011611938476E-8
#0
0
3.8743019104004E-7
#1
0
4.84287738800048E-6
#2
0
3.8743019104004E-5
#3 Chi^2=1.9
.00016
2.22772359848022E-4
#4 Chi^2=0.9
.00088
9.801983833313E-4
#5 Chi^2=0.5
_.00332
_3.43069434165955E-3
#6 Chi^2=0.2
____.00938
____9.80198383331297E-3
#7 Chi^2=0.9
________.02204
_________2.32797116041185E-2
#8 Chi^2=3.4
__________________.04706
__________________4.65594232082365E-2
#9 Chi^2=0.3
______________________________.07888
______________________________7.91510194540031E-2
#10 Chi^2=0.1
_____________________________________________.1177
____________________________________________.115128755569458
#11 Chi^2=3.2
________________________________________________________.14458
_______________________________________________________.143910944461822
#12 Chi^2=0.2
____________________________________________________________.15608
____________________________________________________________.154981017112731
#13 Chi^2=0.5
_______________________________________________________.1427
_______________________________________________________.143910944461823
#14 Chi^2=0.6
____________________________________________.11446
____________________________________________.115128755569459
#15 Chi^2=0.2
_______________________________.07952
______________________________7.91510194540031E-2
#16 Chi^2=0.1
__________________.0461
__________________4.65594232082366E-2
#17 Chi^2=0.2
_________.02388
_________2.32797116041185E-2
#18 Chi^2=0.8
___.009
____9.80198383331297E-3
#19 Chi^2=3.3
_.0033
_3.43069434165954E-3
#20 Chi^2=0.2
.00066
9.801983833313E-4
#21 Chi^2=5.2
.00026
2.22772359848023E-4
#22 Chi^2=0.3
.00004
3.8743019104004E-5
#23 Chi^2=0.
0
4.84287738800049E-6
#24
0
3.8743019104004E-7
#25
0
1.49011611938476E-8
#26


total chi^2=23.0802957820652 at 26 degrees of freedom


------------------
www.pegge.net

[This message has been edited by Charles Pegge (edited February 03, 2007).]

IP: Logged

Charles Pegge
Member
posted February 03, 2007 06:07 AM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
David,

Thanks for the Wolfram link. I will have to improve my Greek. This is
very hard core. It will take a while to absorb.

These are the confidence figures normally given for standard deviations,
so I think the sd % values used in your program need to be doubled.

1s 68.26894921371%
2s 95.44997361036%
3s 99.73002039367%
4s 99.99366575163%

http://en.wikipedia.org/wiki/Standard_deviation


------------------
www.pegge.net

[This message has been edited by Charles Pegge (edited February 03, 2007).]

IP: Logged

David Roberts
Member
posted February 03, 2007 12:36 PM     Click Here to See the Profile for David Roberts     Edit/Delete Message   Reply w/Quote
quote:
so I think the sd % values used in your program need to be doubled.

Charles, you have persuaded me.

This leads to a remarkable rule of thumb.

Assuming that RND is uniformly distributed then there is an x% probability that it will produce results which only occur x% of the time.

Suppose we define suspect at 5% and highly suspect at 1% then our rule of thumb says:
1% of the time RND is highly suspect.
5% of the time RND is suspect.

OR

99% of the time RND is not highly suspect.
95% of the time RND is not suspect.

For most practical purposes this gives, excuse the pun, the thumbs up for RND; as if it needed it.

IP: Logged

Emil Menzel
Member
posted February 03, 2007 12:50 PM     Click Here to See the Profile for Emil Menzel     Edit/Delete Message   Reply w/Quote
David:
I believe that PBDOS RND period is also 2^32 but I did not have time
to test it.

Charles:
Wolfram on distributions might be fun for PhD candidates in mathematical
statistics, but in any applied science I know it would suffice to say,
as an intro stat textbook might, that:
"A rectangular distribution is one that has uniform frequencies over
all class intervals" (or equal frequencies in each "bin", to use the
language I used earlier).

Re Chi-square. The formula I gave [chi-sq= ((o-e)^2) / e] was for
frequencies, not proportions. I hope that you caught that rather than
vice versa. Another potential gotcha in using chi-square is that if
expected frequencies are "very small", chi-square values will be
"artificially" inflated. E.g, (3-1)^2 / 1 = 4 which, for 1 d.f.
is supposedly p<.05; but it ain't, in my book. But in any event I did
not see anything in your results that would lead me to worry about
random number generator.

If you do worry, I'd advise skipping the transformation of your raw
RNDs into a theoretically normal distribution; examine the raw RNDs
as such. Also, there are a number of different ways of "normalizing"
RNDs and some might do better than others. Some, for example, use a
large sample of RNDs to generate a single value of RND-normal, and
in principle, they should have the edge.



------------------

IP: Logged

Charles Pegge
Member
posted February 03, 2007 01:13 PM     Click Here to See the Profile for Charles Pegge     Edit/Delete Message   Reply w/Quote
David,

I agree with your conclusions but I am seeing a consistently high Chi value in the #11 or #12
area. #13 being the peak of the curve where 50% of RNDs are below 0.5 and 50% above. So I am
setting up a chi chart to see where the variance shows up over multiple sessions. I'm just
curious.

Does anyone here know what algorithm is used in the RND function?


------------------
www.pegge.net

IP: Logged


This topic is 10 pages long:   1  2  3  4  5  6  7  8  9  10 

All times are EasternTime (US)

next newest topic | next oldest topic

Administrative Options: Close Topic | Archive/Move | Delete Topic
Post New Topic  Post A Reply
Hop to:

Contact Us | PowerBASIC BASIC Compilers

Copyright © 1999-2006 PowerBASIC, Inc. All Rights Reserved.


Ultimate Bulletin Board 5.45c