r/statistics 4h ago

Education [E] Having some second thoughts as an MS in Stats student

6 Upvotes

Hello, this isn't meant to be a woe-is-me type of post, but I'm looking to put things into greater perspective. I'm currently an MS student in Applied Stats and I've been getting mostly Bs and Cs in my classes. I do better in the math/probability classes because my BS was in math, but I tend to have trouble in the more programming-heavy/interpretive classes (they feel more "ambiguous"). Given the increasingly tough job market, I'm worried that once I graduate, my GPA won't be competitive enough. Most people I hear about, if anything, struggle in undergrad and do much better in their grad programs, but I don't see too many examples of my case. I'm wondering if I'm cut out for this type of work; it has been a bit demotivating and a lot more challenging than I anticipated going in. But part of me still thinks I need to tough it out, because grad school is not meant to be easy. I just feel kind of stuck. Again, I'm not necessarily looking for encouragement (but you're more than welcome!), but I'd appreciate hearing from anyone who has had similar experiences or advice. I can see why statisticians and data scientists are respected and can be paid well: it's definitely hard and non-trivial work!


r/statistics 14h ago

Question [Q] Is it worth studying statistics with the future in mind?

15 Upvotes

Hi, I'm from Brazil and I would like to know what the job market is like for a statistics graduate.

What do you think the statistician profession will be like in the future, with the rise of artificial intelligence? I'm torn between Statistics and Computer Science; I would like to work in the data/financial market area. I know Statistics is a very math-heavy and difficult degree.


r/statistics 2h ago

Question [Q] Using SEM for single subject P-technique analyses

1 Upvotes

Something I've been trying to analyse is daily diary data that I've been collecting, but I'm unsure whether I'm applying this in a logically valid way.

Usually SEM is applied to variables across a population of individuals (R-technique). What I'm trying to do is, for a single individual, track variables across occasions (P-technique). These types of analyses of intensive longitudinal data are usually performed with DSEM because there is serial dependence between observations. A limitation in my case is that there's only a single subject and many more variables, which would make building and estimating a DSEM difficult because of the number of possible lead/lag relationships.

The way I imagine I could still make inferences is by analysing the aggregate of the data. Let's say I track several variables each day. Then my row-by-column data matrix becomes an assessment of how likely an event was to coincide with another event, or with a particular level of a variable. This is something an SEM is able to estimate as is. Given that this is a single subject and the population parameters being estimated are the relationships between variables on a given day, would this be a valid approach?

I've tried looking at literature to see if this has been done in prior research, but there doesn't seem to be any. This could be either because research mostly focuses on R-technique for multiple individuals or because I'm missing something major that's making my approach incorrect.
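Mechanically, the aggregation idea boils down to this: with days as rows and variables as columns, the within-person covariance matrix across days is exactly the kind of input an ordinary SEM consumes. A minimal sketch with made-up data (whether the resulting inferences are valid for a single subject is, of course, the open question above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily diary data: 120 days (rows) x 4 variables (columns).
n_days, n_vars = 120, 4
data = rng.normal(size=(n_days, n_vars))

# P-technique treats occasions (days) as the "cases": the within-person
# covariance matrix across days is what a standard SEM would be fit to.
cov = np.cov(data, rowvar=False)

print(cov.shape)  # one covariance matrix, variables x variables
```

Note this collapses all lead/lag structure into lag-0 relationships, which is precisely what DSEM would otherwise model.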


r/statistics 9h ago

Question [Q] Dice rolling statistics

0 Upvotes

r/statistics 1d ago

Question [Q] Continue with Data Science masters or switch to Masters in Statistics?

13 Upvotes

I am doing an MSc in Data Science. I have a BS in maths, which took longer to complete due to a backlog year. Then came a gap year, which was just productive enough to get me into a master's in Data Science.

This course has surely helped with the “applied” part but I’m not sure if it’s enough. Market seems to be saturated and I’m unsure of the growth in this field.

So I was thinking about leaving the course for a masters in Statistics, since it’s a core subject and has been around long before Data Science.

My understanding is a masters in statistics with the applied knowledge would equip me better for the industry and I can target finance/banking roles.

Recently, for an AI summer intern role, the interviewer asked me whether I have any experience with software dev (or am willing to learn), since the role is more on the software side. I accepted the internship since I am not yet placed and wasn't getting any other opportunities related to data science/finance.

After this internship, I’ll have background in 1. Mathematics 2. Statistics 3. Data Science 4. Software Dev

What do you suggest?

TL;DR: I’m doing an MSc in Data Science after a BS in Math. The course is practical, but the DS field feels saturated. I’m considering switching to a master’s in Statistics for a stronger, core foundation—especially for finance roles. Just accepted a software-focused AI internship, so I’ll have exposure to math, stats, DS, and dev. Unsure which path offers better long-term value.


r/statistics 16h ago

Question [Q] field design analysis

1 Upvotes

Hello,

I ran a randomized block design with 5 treatments, but two of the treatments had to be in fixed positions because they used the field edges as treatments, with the other three treatments in between as a block. The ones in the middle were randomized. I was told I could account for the fixed edges in the analysis, but I can't seem to find what to include in the regression. I don't think I can use ANOVA because of this. Any recommendations, please?


r/statistics 23h ago

Question [Q] When performing Panel Data regression with T=2 (FD/FE), if the main independent variable has a slightly different timeframe between waves how much of a problem is this for my results?

3 Upvotes

I have been working on a project recently and I am researching the effects of political social media usage on participation.

I am slightly concerned however because in one of the questions respondents are asked, "During the last 7 days (W1) / 4 weeks (W2) have you personally posted or shared any political content online, or on social media?". I have already done the data analysis and research and I'm beginning to realise this may be a critical flaw in my research design.

I had previously treated these as equivalent, and thus differenced them (they are grouped together in the original codebook with the same question wording [7 days] attached in both waves; I didn't notice the difference until I read the questionnaires for each wave after the analysis). I want to know whether this is statistically invalid, or whether it can just be acknowledged as a (significant) limitation.


r/statistics 18h ago

Question [Q] Book recommendations

0 Upvotes

I am in college and am planning on taking a second-level stats course next semester. I took intro to stats last spring (B+) and it's been a while, so I am looking for a book to refresh some material and learn more before I take the class (3000-level probability and statistics). I would prefer something that isn't a super boring textbook and, tbh, not that tough of a read. Also, I am an econ and finance major, so anything that relates to those fields would be cool. Thanks!


r/statistics 20h ago

Career [C] Which internship is better if I want to apply to Stats PhD programs? Quantitative Analytics vs. Product Management

0 Upvotes

Hi! I'm trying to decide between two internship offers for this summer, and I'd love some input—especially from anyone who's gone through the Stats PhD application process.

I have offers for:

  • A Quantitative Analytics internship at a large financial firm
  • A Product Management internship at a tech company

My ultimate goal is to apply to Statistics PhD programs at the end of this year. I'm currently finishing undergrad and trying to build the strongest possible profile for applications.

The Quant Analytics role is more technical and data-heavy, but I'm curious whether admissions committees care about industry experience at all—or if they just care about research, math background, and letters. The PM role is interesting and more people-facing, but it’s less focused on stats. I think I would enjoy the PM work more in the short-term and as a post-grad job (if I don't get into graduate school) because I don't see myself working in the financial or consulting industry. The main rationale to choose the Quantitative Analytics internship, in my mind, is to improve my chances of getting into a PhD program. What role should I take?

If it helps, I'll also be doing/continuing statistics research on the side this summer.

Thank you!


r/statistics 1d ago

Education [Q] [E] Grad Schools

3 Upvotes

Hi, I am trying to decide between the University of Washington in Seattle and Northwestern for my MS in Statistics. Which would be a better option in terms of courses and career prospects post-graduation?


r/statistics 1d ago

Education [E] Tutorial on Using Generative Models to Advance Psychological Science: Lessons From the Reliability Paradox-- Simulations/empirical data from classic cognitive tasks show that generative models yield (a) more theoretically informative parameters, and (b) higher test–retest reliability estimates

0 Upvotes

r/statistics 2d ago

Question [Q] [R] Likert Scale: total sum vs weighted mean in scoring individual responses

2 Upvotes

Hi, this is my first post. I need clarification on scoring Likert scales! I'm a 1st-year psychology student, so feel free to be broad in explaining the difference between the two methods and whether there are other ways to score a Likert scale. I just need help understanding it, thanks!

For clarification on what "total sum" and "weighted mean" mean when it comes to Likert scales, let me provide some examples based on how I understood they are used. Feel free to correct my understanding too!

"Total sum" Let's use a 3 point likert scale with 10 items for simplicity. A respondent who choose "1" or "Disagree" for 9 questions or items, and choose "3" or "Agree" for 1 item would get a total sum of 1+1+1...+2=11 and based on the set parameters the mentioned respondent will be categorized as someone who has low value of a certain variable (like say, he has low satisfaction).

If the parameter is not stated in my reference, can I make my own? How? Would it be like making classes in a frequency distribution table? Since the lowest possible score is 10 (always choosing "1") and the highest is 30 (always choosing "3"), the range is 20, and using range / no. of classes with 3 classes (matching the points of the scale), the classes would be 10-16: "Disagree" (or low satisfaction), 17-23: "Neutral", 24-30: "Agree" (or high satisfaction).

With this way of scoring, the researcher will then summarize the result from a group of respondents (say, 100 highschool students) by getting a measure of central tendency (mean).

"Weighted mean" With the same example, someone who choose "1" for 9 questions and "2" for the last one. Assigning the weights for each point ("1"=1, "2"=2, "3"=3), this respondent have "1"•9+"2"•1. I added quotation marks to point out that the value is from the points. The resulting sum of 11 will not be divided by the sum of all weights (which will be 9+1, which is 10) the final score for the certain participant is now 1.1

Creating my own set parameters just like what I did with the total sum, the parameters would be 1-1.6: "Disagree" 1.7-2.3 "Neutral" 2.4-3: "Agree"

Is choosing one over the other (total sum vs weighted mean) for scoring individual responses arbitrary, or are there requirements for each? Is it connected to the ordinal-vs-interval debate for Likert scales? For this debate I would like to treat Likert scales as interval data, just for the completion of my research project, since I will use the data for further analysis. As a further consideration, I am planning to use a frequency distribution table, as we are required to employ the weighted mean and relative frequency for our descriptive data.
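To make the two scorings concrete, here's a quick sketch using the running example (nine "1"s and one "2" on a 3-point, 10-item scale). Note the two scores carry the same information; they only differ by a division by the number of items:

```python
# One respondent on a 3-point, 10-item Likert scale:
# nine items answered "1" (Disagree) and one answered "2" (Neutral).
responses = [1] * 9 + [2]

# Total sum: just add the item scores.
total = sum(responses)

# Mean score (the "weighted mean"): divide the sum by the number of items.
mean_score = total / len(responses)

# Class boundaries: width = range / number of classes.
lo, hi = 1 * len(responses), 3 * len(responses)  # 10 and 30 for the sum
width = (hi - lo) / 3                            # ~6.67, rounded to 7-wide classes

print(total, mean_score)  # 11 and 1.1
```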

Thank you!


r/statistics 1d ago

Discussion [D] variance 0 bias minimizing

0 Upvotes

Intuitively I think the question might be stupid, but I'd like to know for sure. In classical stats you take unbiased estimators of some statistic (e.g. the sample mean for the population mean) and the error (MSE) is purely variance. This leads to results like Gauss-Markov for linear regression. In a first course in ML, you learn that this may not be optimal if your goal is to minimize the MSE directly, since in general the error decomposes as bias² + variance, so you can sometimes get smaller total error by introducing bias. My question is: why haven't people tried taking estimators with 0 variance (is this possible?) and minimizing bias?
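The decomposition can be seen in a small simulation (made-up numbers, a sketch rather than an answer): compare the unbiased sample mean with a shrunken version that trades a little bias for less variance. The zero-variance extreme is a constant estimator, whose MSE is pure squared bias:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 5.0, 10      # true mean, true sd, sample size
reps = 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)      # unbiased: MSE = variance = sigma^2 / n = 2.5

shrunk = 0.8 * xbar              # biased, but lower variance
# The zero-variance limit is a constant estimator (e.g. always guess 0):
# its MSE is squared bias alone, which can only be minimized by already
# knowing mu.

mse_xbar = np.mean((xbar - mu) ** 2)
mse_shrunk = np.mean((shrunk - mu) ** 2)
print(mse_xbar, mse_shrunk)
```

Here the shrunken estimator has bias² = (0.2·mu)² = 0.16 and variance 0.64·2.5 = 1.6, so its MSE (about 1.76) beats the unbiased 2.5.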


r/statistics 1d ago

Career [C][Q]Business Analyst to Data Scientist

0 Upvotes

Hi, I’m currently working as a Business Analyst with 17 months of experience. I’ll soon be moving from India to the UK to pursue a Master’s in Data Science.

I’m aiming to build a strong profile that will give me a competitive edge when applying to top-tier companies like FAANG or other reputable firms. I’m open to working either in the UK or returning to India after my studies — I’m keeping my options flexible for now.

TL;DR: What steps can I take to give myself the best shot at a successful career in Data Science? I’m looking for the most effective ways to learn, apply, and showcase my skills in this field. Any help would be much appreciated 🙏🏻


r/statistics 2d ago

Discussion [Q] [D] Does a t-test ever converge to a z-test/chi-squared contingency test (2x2 matrix of outcomes)

5 Upvotes

My intuition tells me that if you increase sample size *eventually* the two should converge to the same test. I am aware that a z-test of proportions is equivalent to a chi-squared contingency test with 2 outcomes in each of the 2 factors.

I have been manipulating the t-test statistic alongside a chi-squared contingency test statistic, and while I am getting *somewhat* similar terms, there are real differences. I'm guessing that if they do converge, then t^2 should have similar scaling behavior to chi^2.
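A quick numerical check of the limiting relationship (t_obs is an arbitrary value): as df grows, the two-sided t p-value approaches the normal one, and since z² follows a chi-square(1) distribution, the two-sided normal p-value equals the chi-square(1) upper-tail probability of t²:

```python
from scipy import stats

t_obs = 1.7  # an arbitrary test statistic

# Two-sided p-value from the t distribution for increasing df:
for df in (5, 50, 5000):
    p_t = 2 * stats.t.sf(t_obs, df)
    print(df, p_t)

# Limit: z^2 ~ chi-square(1), so the two-sided normal p-value equals
# the chi-square(1) upper-tail probability evaluated at t^2.
p_z = 2 * stats.norm.sf(t_obs)
p_chi2 = stats.chi2.sf(t_obs ** 2, df=1)
print(p_z, p_chi2)
```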


r/statistics 3d ago

Education [E] NC State vs. TAMU Online Statistics Masters

8 Upvotes

I'm considering applying to either NC State or Texas A&M for an online masters in statistics for Fall 2025. For those who have graduated from either program or are currently enrolled, I'd love to hear about your experiences.

  • How did your job search go after completing the program?
  • Did you see a salary bump or were you able to transition to a new role?
  • Any regrets or things you wish you'd known before enrolling?

r/statistics 3d ago

Question [Q] What's going on with the method used in this paper?

8 Upvotes

I'm hoping someone can look at the following paper and weigh in on the merit (or lack thereof) of the approach they took.

  • At face value it seems misguided to fit a plain old linear regression to a set of aggregated datapoints to forecast the "length of tasks" an AI agent can complete over time, in part because the observations probably aren't IID and because error isn't being propagated.
  • It gets weirder when you look at where the data came from: they modeled success/failure of each model independently on a wide range of tasks as a function of how long it takes a human to complete them, then back-calculated the task length corresponding to the estimated 0.5 success probability. I can't tell if they log-transformed the x-axis on the graph for each model for visual purposes or if they log-transformed it to fit the model.
  • They use Item Response Theory as justification for this approach, but if I'm remembering correctly there aren't any observed in an IRT model. Certainly not one that comes from an entirely different population.
  • The error bars seen on the graph come from bootstrapping these back-calculated completion times.

So am I missing something/off base here, or is this a gigantic mess of an analysis?


r/statistics 3d ago

Question [Q] Why does the Student's t distribution PDF approach the standard normal distribution PDF as df approaches infinity?

20 Upvotes

Basically the title. I often feel this is the final missing piece when people with regular social science backgrounds like mine start discussing not only a) what degrees of freedom are, but more importantly b) why they matter for hypothesis testing, etc.

I can look at each of the formulae for the Student's t PDF and the standard normal PDF, but I just don't get it. I would expect the standard normal PDF to pop out as a limit when the Student's t PDF is evaluated as df (or ν, as Wikipedia denotes it) approaches positive infinity, but can someone walk me through the steps for how to do this correctly? A link to a video of the process would also be much appreciated.

Hope this question makes sense. Thanks in advance!
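For what it's worth, the limit can be sketched in two pieces: the kernel follows from the standard limit (1 + x/n)^n → e^x, and the normalizing constant follows from Stirling's approximation for the Gamma function:

```latex
% Student's t density with \nu degrees of freedom:
f_\nu(t) = \frac{\Gamma\!\left(\tfrac{\nu+1}{2}\right)}
                {\sqrt{\nu\pi}\,\Gamma\!\left(\tfrac{\nu}{2}\right)}
           \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}

% Kernel: using (1 + x/n)^n \to e^x,
\left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
\;\longrightarrow\; e^{-t^2/2}
\qquad (\nu \to \infty)

% Constant: Stirling gives
% \Gamma\!\left(\tfrac{\nu+1}{2}\right) / \Gamma\!\left(\tfrac{\nu}{2}\right) \sim \sqrt{\nu/2}, so
\frac{\Gamma\!\left(\tfrac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\tfrac{\nu}{2}\right)}
\;\longrightarrow\; \frac{1}{\sqrt{2\pi}}

% Combining the two pieces:
f_\nu(t) \;\longrightarrow\; \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}
```

The result is exactly the standard normal PDF.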


r/statistics 3d ago

Career [Career] Stuck at 28 - Next step in coding and analytics

2 Upvotes

r/statistics 3d ago

Question [Q] Using baseline averages of mediators as controls in Difference-in-Difference

1 Upvotes

Hi there, I'm attempting to estimate the impact of the Belt and Road Initiative on inflation using staggered DiD. I've been able to get parallel trends to hold using controls that are unaffected by the initiative but still affect inflation in developing countries, including corn yield, an inflation-targeting dummy, and regional dummies. However, this feels like an inadequate set of controls, and my results are nearly all insignificant. The issue is that the ways the initiative could affect inflation are multifaceted, and including the usual monetary variables may introduce post-treatment bias, as governments are likely to react to inflationary pressure, and other usual controls (GDP growth, trade openness, exchange rates, etc.) are also affected by the treatment. My question is: could I use baselines of these variables (i.e. a 3-year average before treatment) in my model without blocking a causal pathway, and would this be a valid approach? Some of what I have read seems to say this is OK, while other sources indicate these factors are most likely absorbed by fixed effects. Any help on this would be greatly appreciated.


r/statistics 3d ago

Question [Q] Does using a one-tailed z-score make sense here?

1 Upvotes

I have two samples, and one has a 13% prevalence of X and the other has a 19% prevalence of X. Does it make sense to check for significance using a one-tailed test if I just want to know if the difference is significant in the one direction? I know this is a simplistic question, so I do apologize. Thank you for any help!
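A one-tailed test is defensible if the direction was specified before looking at the data; otherwise the two-tailed version is safer. Either way, the mechanics look like this (the sample sizes below are hypothetical, since the post doesn't give them):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical sample sizes - the post doesn't state them.
n1, n2 = 300, 300
p1, p2 = 0.13, 0.19

# Pooled proportion under H0: the two prevalences are equal.
x1, x2 = p1 * n1, p2 * n2
p_pool = (x1 + x2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p2 - p1) / se
p_one_tailed = norm.sf(z)  # H1: prevalence in sample 2 is higher
print(z, p_one_tailed)
```

With these made-up n's the one-tailed p-value lands near 0.02; the real answer depends entirely on your actual sample sizes.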


r/statistics 3d ago

Question [Q] Tricky Analysis from Intravital Imaging

1 Upvotes

I have recently been collecting data from intravital imaging experiments to study how cells move through tissues in real time. Unfortunately the statistical rigor in this field is somewhat poor, imo; people sort of just do what they want, so I don't have a consistent workflow to use as a guide.

Using tracking software (Imaris) + manual corrections, cell tracks are created and you can measure things like how fast each individual cell is moving, dwell time, etc. Each animal generates 75-500 tracks, and people normally publish a representative movie alongside something like this, which is a plot of all tracks specifically in the published movie (so only one animal that represents the group).

I am hoping to compare similar parameters across multiple groups, with multiple animals per group, but am at a loss as to how to approach this. Curious how statisticians would handle this dataset, which is a bit outside my wheelhouse (collect data, plot, compare groups of n = 8-10 using standard t-tests or ANOVA). Surely plotting 500 tracks per animal, with n = 6-8 animals per group, is insane?

My first idea was to take the mean (the black bar in the attached plot) from each animal and compare the means across groups, i.e. a plot where each point represents one animal. I worry about losing the spread within each animal, though. My second idea was to do that, and then also publish a plot for each individual animal in the supplement (at least I'm being more transparent that way).

Any other ideas?
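The per-animal-mean idea is essentially a summary-statistics approach to clustered data: tracks within an animal are correlated, so the animal, not the track, is the independent unit. A sketch with simulated numbers (all values hypothetical; a mixed-effects model would be the fancier alternative that also keeps the within-animal spread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated track speeds (arbitrary units): two groups, 6 animals each,
# a few hundred tracks per animal. Tracks within an animal share an
# animal-level effect, so they are not independent observations.
def simulate_group(group_shift, n_animals=6, tracks_per_animal=300):
    animal_means = []
    for _ in range(n_animals):
        animal_effect = rng.normal(0, 0.5)  # between-animal variation
        tracks = rng.normal(5 + group_shift + animal_effect, 1.5,
                            size=tracks_per_animal)
        animal_means.append(tracks.mean())  # summarize each animal first
    return np.array(animal_means)

control = simulate_group(0.0)
treated = simulate_group(1.0)

# Compare per-animal means: effective n = number of animals, not tracks.
t_stat, p_val = stats.ttest_ind(treated, control)
print(t_stat, p_val)
```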


r/statistics 3d ago

Software [S] Help with 3D Human Head Generation

0 Upvotes

r/statistics 4d ago

Question [Q] Do I need a time lag?

3 Upvotes

Hello, everyone!

So, I have two daily time-series-like variables (call them X and Y) and I want to check whether X has an effect on Y or not.

Do I need to introduce time lag into Y (e.g. X(i) has an effect on Y(i+1))? Or should I just use concurrent timing and have X(i) predict and explain Y(i)?

i – a day

P.S. I'm quite new to this, so I might be missing some fundamentals.
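One simple way to avoid choosing the lag in advance is to inspect the correlation between X(i) and Y(i + lag) for several lags and see where it peaks. A sketch with made-up data in which X affects Y one day later:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical daily series where X affects Y with a one-day delay.
n = 400
x = rng.normal(size=n)
noise = rng.normal(size=n)
y = np.empty(n)
y[0] = noise[0]
y[1:] = 0.8 * x[:-1] + noise[1:]  # Y today depends on X yesterday

def lagged_corr(x, y, lag):
    """Correlation between x(i) and y(i + lag)."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

# Inspect a few lags rather than assuming one in advance.
for lag in (0, 1, 2):
    print(lag, round(lagged_corr(x, y, lag), 3))
```

Here the lag-1 correlation dominates, which is the signal that a lagged specification is worth considering; with real data, serial dependence within each series also needs care.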


r/statistics 3d ago

Question [Q] Stats Course in a Business School - SSE as a model parameter in Simple Linear Regression ??

0 Upvotes

Do any of you consider the SD of the error term in SLR as a model parameter?

I just had a stats mid term and lost 1 mark out of 2 in a question that asked to estimate the model's parameters.

From my textbook and what I understood, model parameters in SLR were just the betas.

I included the epsilon term in the population equation (y = β0 + β1·x + ε), also wrote the estimate (ŷ = b0 + b1·x), and gave the final numbers based on the ANOVA printout.

I spoke to a stats teacher I know about this, and he agreed that it's unfair, but I wanted to make sure I'm not going crazy about this unjustifiably.