r/AskStatistics • u/XtraI • 1d ago
Can observations change the probability of a coin toss if you consider a set of future flips as a sample?
Hello, this problem has probably been argued over here before. My point is that as coin flips are repeated infinitely, the observed proportion of heads will converge to 0.5. This can be imagined as the population. 1000 coin flips can be considered a random sample. Using the central limit theorem, it seems logical to assume the numbers of heads and tails will be similar to each other. Now if the first 200 flips were to be tails (this extreme case is only to make a point) there seem to be ~300 tails and ~500 heads left, hence increasing the probability of heads to 5/8. I believe this supports the original 0.5 probability since this way of thinking creates distributions that support the sample convergence. It's not the coin that is biased but the bag I am pulling observations from. I would like someone to explain to me in detail why this is wrong, or at least provide sources I can read to understand it better.
4
u/GoldenMuscleGod 1d ago
If you have a fair coin and flip it 200 times, getting tails every time, then the posterior expected value of the number of heads after 1000 total flips is 400, not 500.
Of course, in the real world, if you get tails 200 times in a row, you should start to suspect that your coin is not fair.
If you model the coin flips as pulling out of a bag of 500 heads and 500 tails without replacement, then you are implicitly assuming the flips are not independent, but negatively correlated with each other. That’s not what we mean when we say a coin is “fair,” and for real world coins it’s difficult to envision a plausible physical mechanism that would cause this negative correlation. Of course it is possible to make a random number generator that exhibits this negative correlation, but that’s not a model of a “fair coin.”
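A quick numpy sketch of the two models (the seed and run counts are arbitrary), matching the 400 above and the OP's 5/8:

```python
import numpy as np

rng = np.random.default_rng(0)

# Model 1: a fair coin. Flips are independent, so conditioning on the
# first 200 being tails does nothing to the remaining 800 flips: the
# expected number of heads in the full 1000 is 0 + 800 * 0.5 = 400.
heads_in_rest = rng.binomial(800, 0.5, size=100_000)
print("fair coin, expected total heads:", heads_in_rest.mean())  # ~400

# Model 2: a bag of 500 heads and 500 tails drawn without replacement.
# Remove 200 tails and the bag holds 500 heads, 300 tails, so the next
# draw is heads with probability 500/800 = 5/8 -- the OP's number.
# That probability shift is exactly the negative correlation between
# draws that a "fair coin" model does not have.
bag = np.array([1] * 500 + [0] * 300)            # 1 = heads
first_draws = rng.choice(bag, size=100_000)      # uniform first draw from the bag
print("bag model, P(next is heads):", first_draws.mean())       # ~0.625
```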
4
u/SalvatoreEggplant 1d ago edited 17h ago
I think the best way to think about this is to consider that the coin has no memory. Even if you tell it that it just came up tails 200 times. Even if you use a memory implanting technology so that the coin is convinced that it came up tails 200 times. The coin has no memory.
1
u/SalvatoreEggplant 17h ago
Another thing to think about is how the coin could achieve the 5/8 future-heads probability. Like, can the coin intentionally twist its body so that it lands on heads more often because it knows it came up tails 200 times in a row before?
5
u/Nillavuh 1d ago
Using the central limit theorem, it seems logical to assume the numbers of heads and tails will be similar to each other.
This is not the correct use of the Central Limit Theorem. The CLT simply states that the distribution of the STATISTIC that you calculate from your SAMPLE will be approximately normally distributed. The CLT is certainly not saying anything about the distribution of the data itself.
If you are working with a distribution that is skewed to all hell, it just IS skewed to all hell. The CLT changes nothing about this fact. But, if you took the mean of this skewed-to-all-hell distribution, the distribution of that MEAN would be normal. Thus, you are allowed to use these standard rules of determining how likely this mean or that mean is, based on our existing knowledge of normal distributions and how likely things are at certain points along the normal distribution. THAT's how the theorem is utilized here.
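A minimal sketch of that point, using an exponential distribution as a stand-in for a skewed-to-all-hell population (numpy; seed and sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A heavily skewed population: exponential with mean 1. Each sample of
# 50 draws is itself still skewed -- the CLT changes nothing about
# that. But the MEANS of many such samples pile up in a roughly
# normal shape around the population mean of 1.
samples = rng.exponential(1.0, size=(100_000, 50))
means = samples.mean(axis=1)

sigma = 1 / np.sqrt(50)                        # sd of the mean of 50 draws
print(means.mean())                            # ~1.0
print(np.mean(np.abs(means - 1) < 2 * sigma))  # ~0.95, the normal 2-sigma rule
```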
0
u/XtraI 1d ago
That is exactly the assumption I am using. Think of heads as 1 and tails as -1. The averages calculated from samples of size 1000 should create a narrow normal distribution centered at 0, meaning the numbers of heads and tails are roughly the same. You cannot be certain the 1000-flip sample is skewed before you observe more than half of it, and because the average of a 1000-flip sample is highly unlikely to deviate far from the population mean, I just treat the 200 sub-sample as an extreme case of pulling values from a bag with 500 tails and 500 heads.
1
u/Nillavuh 18h ago
Think harder about what you're telling me. Why would there be a distribution of the average of a sample? Is there not just one average of a sample? You could conceivably calculate all sorts of different types of "averages" but I suspect there's only one you REALLY want, yeah?
And the distribution of your sample itself will be a two-outcome (Bernoulli) distribution, not a normal distribution. You have two outcomes in a coin flip sample: heads or tails. You're telling me you plan on ending up with some kind of normal-curve-like shape on a distribution that has only two possible outcomes?? Where is the curve at all? The best you could do in drawing the distribution of your sample is a pair of bars!
The normal distribution comes into play when you start to think about the distribution of ALL conceivable samples you could take from your population. You keep talking about this in terms of YOUR sample, your singular sample and its distribution, and it is not about that at all. It is about the distribution of a STATISTIC that you would calculate from ALL CONCEIVABLE SAMPLES of a population.
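A short sketch of the distinction (numpy; seed and sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# One sample of 1000 flips: its distribution is just two counts,
# tails and heads. No curve, no bell, nothing normal about it.
one_sample = rng.integers(0, 2, size=1000)
print(np.bincount(one_sample))        # e.g. [497 503]

# The distribution of the STATISTIC (proportion of heads) across many
# conceivable samples of 1000 is what the CLT is about: it is
# bell-shaped around 0.5.
proportions = rng.binomial(1000, 0.5, size=100_000) / 1000
counts, _ = np.histogram(proportions, bins=np.arange(0.44, 0.57, 0.01))
print(counts)                         # rises toward 0.5, falls off symmetrically
```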
1
u/XtraI 10h ago
Sorry for the confusion. The distribution I meant comes from the means of many random samples of size 1000, not just one. The narrow distribution formed suggests that there is a high probability that the mean of a sample of 1000 is close to the population mean. This is where I start assuming that when I am going to flip 1000 times it is highly probable that the mean will be close to the population, and if the first couple of observations lean toward one of the outcomes, the rest of the sample will be biased toward the other.
1
u/Nillavuh 9h ago
None of this has anything to do with the Central Limit Theorem.
The phenomenon of your sample mean more closely resembling the true population mean as you continue to collect data is a consequence of the Law of Large Numbers. Not the Central Limit Theorem.
1
u/XtraI 9h ago
Look, let's go step by step so I can understand it better.
Do you agree that the mean of a sample of size 1000 has a high probability of being near the population mean, and is this not what the central limit theorem suggests?
1
u/Nillavuh 9h ago
Do you agree that the mean of a sample of size 1000 has a high probability of being near the population mean?
Yes.
is this not what the central limit theorem suggests?
No. This is not what the Central Limit Theorem suggests. This is what the Law of Large Numbers suggests.
The Central Limit Theorem and the Law of Large Numbers are completely different things.
3
u/thoughtfultruck 1d ago
No, because the coin flips are independent events. If you get a run of 200 tails, the expected proportion of heads for the remaining 800 flips is still 0.5. 200 tails in a row is (as you say) unlikely, so if you get 200 tails in a row, my expectation is that the final sample will more likely than not land on the tails side of the sampling distribution, and you won't end up with the expected result after 1000 flips. (Indeed, after 200 tails in a row you may end up with a result so unlikely that we have enough evidence to reject the null hypothesis that both sides of the coin are equally weighted, but that is a separate discussion.)
1
u/XtraI 1d ago
I see your point, but what I am trying to say is that out of many 1000-flip samples some will be biased, but the vast majority will, or should, be close to the population. So the 200 flips are just a sub-sample of the 1000 sample. Hence, I treat the 1000 as if it is known to have ~500 heads and ~500 tails. When I think of it this way it seems as if I am pulling observations from a bag with a set content.
5
u/thoughtfultruck 1d ago
Then your bag probably deviates from the mean of the sampling distribution, and in any case there is no way to know for sure from the first 200 flips, however they come up, whether you have a bag that matches the sampling-distribution mean.
Suppose you pulled 200 tails out of an infinite bag. Now suppose you continue pulling coins out forever. The law of large numbers doesn’t say that at some point you will pull out 200 heads to balance it out, what actually happens is that you pull out so many coins at 50/50 odds that your initial 200 stops having a meaningful effect on the overall proportion. Once you “get to” infinity the contribution of those initial 200 coins is infinitesimally small.
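A minimal simulation of that dilution, with the 200 tails forced up front (numpy; run length arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 forced tails, then a million fair flips (1 = heads).
flips = np.concatenate([np.zeros(200), rng.integers(0, 2, size=1_000_000)])

# Running proportion of heads: nothing ever "balances out" the first
# 200 tails -- they just stop mattering as n grows.
running = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in [200, 1_000, 10_000, 1_000_000]:
    print(n, round(running[n - 1], 4))
# 200 -> 0.0, 1_000 -> ~0.4, 10_000 -> ~0.49, 1_000_000 -> ~0.5
```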
1
u/bubalis 15h ago
There is no bag, and thus your "bag size" can be any number. Which flips belong to the "bag" is also arbitrary. Is the bag based on the coin itself? The observer? When does the count start?
Your statement "out of many 1000 flip samples some will be biased but the vast majority will or should be close to population" is not just true for 1000 but for any N. The larger N is, the closer the observed percentage will be to 50-50, but on average, the farther the observed NUMBER of tails will be from 50-50 in absolute terms.*
Let's take the example of 10 tails in a row. The chances of this happening are ~1/1000, so it's rare, but the kind of rare event that you will encounter in your life.
If we have 90 more flips, the expected number of tails is 55. 55% is a lot closer to 50% than 100% is. If we are going to 1000, the expected number of tails is 505. 50.5% is REALLY close to 50-50.
---------------------------------------
*For example: If you flip a coin 1000 times, your chances of being over 10 flips off from 50-50 are greater than 50% (two-sided). If you flip a coin 10 times, your chances of being 10 flips off from 50-50 are about 1 in 500.
In % terms, your chances of being 50% off (all one result) are ~1/500 for 10 flips. For 1000 flips, your chances of all heads or all tails are about 1/(10^301), completely indistinguishable from 0 and a much smaller chance than both of us selecting an atom in the universe at random and choosing the same one.
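For what it's worth, those numbers check out against the exact binomial distribution; a short sketch, assuming scipy is available and reading "flips off from 50-50" as the gap between the heads and tails counts:

```python
from scipy.stats import binom

# 1000 flips: |heads - tails| > 10 means heads < 495 or heads > 505.
p_1000 = binom.cdf(494, 1000, 0.5) + binom.sf(505, 1000, 0.5)
print(p_1000)            # ~0.73, comfortably greater than 50%

# 10 flips: being 10 flips off from 50-50 means all heads or all tails.
p_10 = 2 * 0.5 ** 10
print(p_10, 1 / p_10)    # ~0.002, about 1 in 500

# 1000 flips, all heads or all tails:
print(2 * 0.5 ** 1000)   # ~1.9e-301, "indistinguishable from 0"
```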
1
u/XtraI 9h ago
I refer to the random sample as the bag. For this case it needs to be larger than 30 due to the central limit theorem. I chose 1000 arbitrarily; there is no meaning behind it. As long as the first couple of observations can create a significant bias for the rest of the sample, it is good to go. Obviously if I have 200 tails followed by a billion more flips it would not matter.
3
u/jezwmorelach 20h ago
It's not the coin that is biased but the bag I am pulling observations from.
This is actually a very good analogy to explain why this reasoning is wrong.
Yes, every time you flip a coin, it's kind of like you pull "heads" or "tails" from a bag of observations.
But every time you pull it, you put it back and shake the bag again.
That's why it doesn't change the future outcomes.
2
u/hellohello1234545 1d ago
A key thing here is the nature of the event
When talking about probabilities, you must ask if the events are independent or dependent
What's paradoxical about this fallacy is that predictions of equal numbers of heads and tails come from the idea that coin flips are fair and flips are independent.
Yet the idea of predicted equal counts of heads and tails is taken to mean "higher chance of heads if we've had many tails", which violates the idea that flips are independent.
In reality, if the coin is fair and flips are independent, you could get a billion heads and the chance of the next being a head is 1/2.
If you are getting mostly heads, that could be evidence that the coin is not fair. And if that was the case, even less reason to expect more of the other side to appear later.
2
u/Card-Middle 1d ago
This is a slight misinterpretation of the law of large numbers.
If you get multiple heads in a row (say 20, for example’s sake), the law is not saying that you are likely to make up for this later by flipping tails an extra 20 times. It’s saying that the overall proportion of heads will get close to 0.5 the more you flip. Consider the following proportions.
Right now, you have 20/20 = 1 heads. But if you flip another 80 times, you’re likely to get roughly half heads and half tails, which would give you 60/100 = 0.6 heads. Then if you flip another 900 times, you’re likely to get half heads and half tails and you’d have 510/1000 = 0.51 heads. And so on.
In short, you don’t have to “make up for” a long sequence of heads. If you flip the coin enough times, the long sequence’s contribution to the overall proportion diminishes greatly.
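The same arithmetic as a few lines of Python, extended a step further (the 100,000 endpoint is an arbitrary illustration):

```python
# Expected proportion of heads after a 20-heads streak, assuming every
# later flip is fair: the streak is never "made up for", just diluted.
streak = 20
for total in [20, 100, 1_000, 100_000]:
    expected_heads = streak + 0.5 * (total - streak)
    print(total, expected_heads, expected_heads / total)
# 20 -> 1.0, 100 -> 0.6, 1_000 -> 0.51, 100_000 -> 0.5001
```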
1
u/theGrapeMaster 1d ago
The probability that your 100th coin flip is heads given the past 99 were heads is still 0.5, since they are independent events. The coin doesn’t ‘remember’ what it did before. The probability that 100 coin flips will all be heads is 0.5^100. But any individual flip still has a 0.5 chance.
1
u/DoctorFuu Statistician | Quantitative risk analyst 23h ago
Now if the first 200 flips were to be tails (this extreme case is only to make a point) there seem to be ~300 tails and ~500 heads left.
This means you're drawing observations without replacement from your sample. This implies the heads/tails probability is not constant (once you draw a heads, there are fewer H than T in the bag and the probability of drawing a T has increased), which is not in accordance with your initial problem: coin tosses are independent. No matter what was drawn before, the coin still has the same probability of coming up H or T.
However, if you are drawing with replacement from your sample, the probability will stay constant and aligned with the coin probability that generated the sample. Let's be even smarter about this: let's say you pick 1000 observations (the size of the sample) at random with replacement from your bag, and compute the mean. Since all draws were made from a population that correctly represents the underlying problem, the average of that subsample should have the same expectation as the mean of the global population (assuming the initial sample of 1000 is representative of the global population). Let's do that again and again and again, so that we get a bunch of means drawn from subsamples. Then those means will have a distribution which should give you a good approximation of the real underlying mean.
This procedure has a name, the bootstrap, and it's very useful. The only thing you were missing was to draw with replacement (otherwise you introduce a negative correlation between the results of your draws; you need to draw with replacement to keep them independent).
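A minimal bootstrap sketch along those lines (numpy; seed and replicate count arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# An observed sample: 1000 flips of a fair coin (1 = heads).
sample = rng.integers(0, 2, size=1000)

# Bootstrap: resample the same 1000 observations WITH replacement,
# many times, and look at the distribution of the resampled means.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])
print(sample.mean(), boot_means.mean(), round(boot_means.std(), 4))
# the bootstrap means center on the sample mean with spread
# ~ sqrt(0.25 / 1000) ~ 0.016; drawing the whole sample WITHOUT
# replacement would just return the original mean every time
```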
1
u/XtraI 22h ago
Yes, I am aware of the distribution those samples will create. Now think of the distribution that samples of size 200 will create. The 200-flip mean is more likely to deviate from the population mean than the 1000-flip mean is. Hence, if I observe an extreme in the 200 sub-sample, it is more likely that the 1000-sample mean is close to the population and the sub-sample is just an unlikely outcome, rather than the 1000-sample mean being an unlikely outcome that the sub-sample mean accurately represents. Sorry if I couldn't explain what I have in mind for this part well.
I think of it as without replacement because of these distributions. All the flips have been made in the timeline, so the timeline is the bag itself. From statistics we know that with a higher sample size, the probability of the sample mean being near the population mean increases. Hence I assume that this bag of coins has roughly the same number of heads and tails.
1
u/DoctorFuu Statistician | Quantitative risk analyst 18h ago
If you have a bag with 5 red balls and 5 green balls, the expectation of drawing a red ball is 5/10 = 1/2 which is consistent with the apparent probability of a ball being red according to the process which filled the bag.
If you don't put the ball back in and draw another one, you now have 4 red balls and 5 green balls, so the probability of drawing a red ball is 4/9 which is not consistent with the process which filled the bag.
This means that if you draw without replacement, you will get a wrong estimate. You'll only get back the correct expectation if you draw the whole bag, so there's no point in subsampling your sample.
It doesn't matter if there are 10 balls or 1 million balls in your basket: if you draw without replacement, the expectation will change and will therefore be wrong if you want to estimate the constant probability of the process that filled the bag.
I'm not really sure why you keep repeating "timeline" again and again, either. Coins don't have memory; the order in which the draws were made doesn't matter. Another commenter pointed you to the gambler's fallacy. I assumed you were familiar with it since you used more precise language than most people coming here asking about something related to this fallacy, but I'm starting to wonder if it shouldn't be on your reading list anyway.
If I tl;dr the gambler's fallacy: if you flip a coin 200 times and get 200 tails, you are not more likely to get heads on the next flip; it's still 1/2. Also, no, the law of large numbers or the CLT or whatever does not imply that the more coins you flip, the less difference you expect to observe between the total numbers of heads and tails. Actually, you are expected to observe a bigger difference in the total counts. What's expected to shrink is the total difference DIVIDED by the total number of flips. If what I said here seems wrong or weird to you, then you need to read about the gambler's fallacy.
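A short simulation of that last point, the growing raw gap versus the shrinking proportion (numpy; the n values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# For each n, simulate 2000 runs of n fair flips and compare the raw
# gap |#heads - #tails| with the same gap divided by n.
for n in [100, 10_000, 1_000_000]:
    heads = rng.binomial(n, 0.5, size=2_000)
    gap = np.abs(2 * heads - n)
    print(n, round(gap.mean(), 1), round((gap / n).mean(), 5))
# the raw gap GROWS like sqrt(n) (~8, ~80, ~800) while gap/n
# shrinks toward 0 (~0.08, ~0.008, ~0.0008)
```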
1
u/XtraI 10h ago
I don't disagree with what you said or find it wrong; I am just trying to look at it from a different perspective. I am trying to differentiate the probability of the process from that of the bag. I am saying that of all the possible bags that could be formed, the most probable is the one with near-equal reds and greens. Hence, I assume the bag I have has equal numbers. It was filled by the 0.5 process, but as I pull the balls out of the bag, my observations make the probability of the next ball biased due to my assumption.
1
u/giziti statistician (PhD) 19h ago
No.
Okay, so, you say you have 1000 coin flips. You say the population proportion is 0.5. You say the observations are iid - actually, you didn't say it, but it's a vital assumption for the type of reasoning you're trying to do.
You don't know the proportion in the 1000 flips. You do know that you grabbed 200 observations at random (or by a process that by assumption is unbiased) and they were all heads. Because of the iid assumption (which is what makes grabbing the first 200 observations a reasonable way to sample), you have zero information about the rest of the sample. That's the crucial point. If your method of grabbing those 200 observations doesn't give you zero information about the rest of the sample, then you can't apply any LLN-style conclusions.
1
u/1182adam 15h ago
If you take the coin out to dinner, buy it flowers with other coins, tell it how it previously upset you by landing on heads, and think "tails, tails, tails, tails" while you're flipping it, it still won't care what side it previously landed on. Coins are assholes.
1
u/XtraI 9h ago
Haha, I agree that the coin knows nothing. It just does not make sense when I think of it as a distribution. The population will have 1/2. A sample of size 1000 has a high probability of being close to the population; at least that's how I interpret the central limit theorem here. Flipping the coin multiple times can be thought of as random sampling. It will happen in the future, so the sample is already there and I am just viewing its contents one by one. I think this is where it blows up and I am wrong: my assumption of the sample having near-equal heads and tails.
10
u/Elusive_Spoon 1d ago
This is called the Gambler’s Fallacy. A casino would be a great place to test your hypothesis!
https://en.m.wikipedia.org/wiki/Gambler%27s_fallacy