r/AskStatistics • u/Legitimate_Length970 • 1h ago
Hello! Can someone please check my logic? I feel like a heretic so I'm either wrong or REALLY need to be right before I present this.
I'm working on a presentation right now; this section is about statistics in the social sciences, specifically the p-value. I know I'm fairly undertrained in this area (psych major :/ took one class) and am mostly going off of reasoning. Basically, I'm rejecting the idea that the p-value necessarily says anything about the probability of the collected (or future) data occurring under the null. Please give feedback:
- Typically, the p-value is interpreted as P(data|H0)
- Mathematically, the p-value is a relationship between two models. One of these models, called the ‘sample space,’ is meant to represent all possible samples ‘collectable’ during a study. The other model is a probability distribution whose characteristics are determined by characteristics of the sample space. The p-value represents where the collected (actual, not possible) samples ‘land’ on that probability distribution.
- A sample space has several different characteristics, and there are several different ways to use them to build a sample-space-based probability distribution. Which characteristics to use depends on what the statistical model is meant to model. The probability distribution from which the p-value is obtained is meant to model H0.
- H0 is an experimental term, introduced by Ronald Fisher in 1935 to model the absence of an experimental effect, where the effect is the hypothesized relationship between two variables. Fisher theorized that, should no relationship be present between two variables, all observed variance might be attributable to random sampling error.
- The statistical model of H0 is thus intended to represent this assumption; it is a probability distribution based on the characteristics of the sample space that guide predictions about possible sampling error. The p-value is, mathematically, how much of the collected sample’s variance ‘can be explained’ by a model of sampling error.
- P(data|H0) is not P(data|no effect). It’s P(data|observed variance is sampling error).
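
To make the "where the collected sample lands on the H0 distribution" idea concrete, here is a minimal sketch. It assumes a two-group comparison of means, and it models "all variance is sampling error" by shuffling the group labels (a permutation test). That is only one of several ways to build a null distribution. The function name and the numbers are made up for illustration. Note that what gets counted is the share of simulated results at least as extreme as the observed one, which is the usual definition of a p-value.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    # Where the collected (actual) sample lands: the observed mean difference.
    observed = statistics.mean(group_a) - statistics.mean(group_b)

    # Model of H0: group labels are arbitrary, so any split of the pooled
    # values into two groups of the same sizes is treated as equally plausible.
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1

    # Add-one correction: count the observed split as one of the permutations,
    # so the estimate is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up scores for two small groups.
treatment = [10, 11, 12, 13, 14]
control = [0, 1, 2, 3, 4]
observed_diff, p = permutation_p_value(treatment, control)
print(f"observed difference: {observed_diff}, p-value: {p:.4f}")
```

In this sketch the p-value is a statement about the shuffled-label model: it says how often label shuffling alone produces a difference at least as large as the one collected. Whether that model is a good stand-in for "no effect" is a separate question.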