What To Do When You Can’t Run the Ideal Analysis

Many statistics students struggle in their quest to find the one right analysis. While the effort should be praised, the hard reality is that in many situations there isn’t one right analysis.

Textbooks, for example, tend to show only ideal situations and the right analysis for each. They can read like cookbooks: follow the steps and you get the perfect meal. Real data analysis rarely fits that ideal, and this is what makes it so tough.

The truth is that real data analysis depends on many contextual factors. Your research question, the types of variables, the design, and any data issues all have to come together in the best analysis available to you.

While you may have more control over what you have to work with when you are involved in the planning and execution of the data collection, your control also has a limit. And of course, sometimes you do everything right and things don’t go as you expected.

It’s not wrong to run an imperfect analysis as long as you’re transparent about its weaknesses. An imperfect analysis doesn’t necessarily mean that a better one was available to you.

Your job is to do the best analysis you can based on what you have to work with.

A Simple Example

Let’s take a look at a quick example of what we mean here. Imagine that you have a dependent variable in the form of a rate: the number of sales per employee. It is highly skewed, and the unit of analysis is the sales office.

The best way to analyze data like these is usually a count model, typically a Poisson or a negative binomial regression, because situations like this fit their assumptions. Count models assume a skewed distribution of Y|X and higher variance at higher means. In addition, they can include the number of employees as an exposure variable, and they only give positive predicted values.

The problem is that count models require the dependent variable to be a count (number of sales), with the exposure (number of employees) supplied as a separate variable. The model then combines the two into a rate.
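
For reference, here is a minimal sketch of what that ideal count model could look like when the counts and the exposure are both available. It fits a Poisson regression with a log link and log(employees) as an offset, using iteratively reweighted least squares in NumPy. The data are simulated, and the names `fit_poisson_with_exposure` and `office_score` as well as the coefficient values are illustrative assumptions, not part of the example data described here.

```python
import numpy as np

def fit_poisson_with_exposure(x, counts, exposure, n_iter=25):
    """Poisson regression (log link) with log(exposure) as an offset, fitted by IRLS."""
    X = np.column_stack([np.ones(len(counts)), x])   # intercept + predictor
    offset = np.log(exposure)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(counts.sum() / exposure.sum())  # start at the overall rate
    for _ in range(n_iter):
        eta = X @ beta + offset                      # linear predictor incl. offset
        mu = np.exp(eta)                             # expected count per office
        z = (eta - offset) + (counts - mu) / mu      # working response, offset removed
        XtW = X.T * mu                               # IRLS weights for Poisson are mu
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Simulated sales offices (hypothetical data)
rng = np.random.default_rng(0)
n_offices = 2000
employees = rng.integers(5, 50, size=n_offices)      # exposure
office_score = rng.normal(size=n_offices)            # hypothetical predictor
true_rate = np.exp(0.5 + 0.3 * office_score)         # sales per employee
sales = rng.poisson(employees * true_rate)           # the count the model needs

beta = fit_poisson_with_exposure(office_score, sales, employees)
predicted_rate = np.exp(beta[0] + beta[1] * office_score)
print("estimated coefficients:", beta)
```

Because the linear predictor is exponentiated, the fitted sales rate per employee is always positive. In practice you would normally use a statistics package rather than hand-written fitting code; the point is only that the model needs the count and the exposure as two separate inputs.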

But whoever collected these data combined these into a rate already. They didn’t keep the original variables. So the ideal analysis was just out of reach.

So, what could you do in this situation?

In our opinion, the best option would be a log transformation and a linear model instead. While it’s not ideal, this approach should mitigate issues with skew and non-constant variance. It can give a reasonable answer to the research question.
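
A minimal sketch of this fallback, assuming only the pre-computed rate survives: take the log of the rate and fit an ordinary least squares line. The data are simulated and the variable names are hypothetical. Note that a rate of exactly zero cannot be log-transformed, so real data with zeros would need a separate decision.

```python
import numpy as np

# Simulated, right-skewed sales-per-employee rates (hypothetical data)
rng = np.random.default_rng(1)
n_offices = 500
office_score = rng.normal(size=n_offices)
rate = np.exp(0.5 + 0.3 * office_score + rng.normal(scale=0.4, size=n_offices))

# Log-transform the rate, then fit a linear model by least squares
log_rate = np.log(rate)
X = np.column_stack([np.ones(n_offices), office_score])
coef, *_ = np.linalg.lstsq(X, log_rate, rcond=None)

# Back-transformed predictions are positive, but they describe a typical
# (geometric-mean-like) rate rather than the arithmetic mean rate.
predicted_rate = np.exp(X @ coef)
print("intercept and slope on the log scale:", coef)
```

The slope is on the log scale, so it describes multiplicative changes in the rate. That back-transformation issue is one of the limitations worth reporting alongside the results.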

It’s important that the researcher describes in detail what they did, along with the possible biases and assumption violations this analysis introduces, so that readers can make their own inferences.

3 Statistical Analysis Rules That You Should Forget About

If you have just started taking statistics classes, you may be feeling a bit overwhelmed. After all, there are so many new concepts, notations, and even vocabulary.

Good statistics teachers try to introduce these basic concepts gradually so that students can absorb them. However, time in the classroom is often scarce, so teachers have to rush on to new material. Students also need time to practice the new concepts and ideas.

One thing to keep in mind is that the rules in your textbook are oversimplified. When you work with real data, you will see that it can be very messy.

So, to ensure that you get a good grasp of statistics, especially if you just started your classes, make sure that you keep reading. 

3 Statistical Analysis Rules That You Should Forget About When Dealing With Real Data

The first rule to forget: when you need to check statistical assumptions, just run a test and let its significance tell you whether the assumption is met.

The reality is that every statistical model and test has assumptions. And even though they are important, sometimes they can be difficult to verify. 

For many assumptions, there are specific tests whose goal is to check whether the assumption of another test is met. These tests can help, but in most cases they aren’t definitive.

So, instead of doing this, you can:

#1: You should use the test results as just one of the many pieces of information that you can use together to decide if an assumption is violated. 
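
One reason a test result is only one piece of information is that its p-value depends on sample size as well as on how far the data depart from the assumption. The sketch below illustrates this with the Jarque-Bera normality statistic, chosen here only as a convenient example because its asymptotic p-value has a closed form. The skewness value of 0.2 is an arbitrary assumption, and the chi-square approximation is rough for small samples.

```python
import math

def jarque_bera_pvalue(n, skewness, excess_kurtosis):
    """Asymptotic Jarque-Bera p-value from sample size, skewness and excess kurtosis."""
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # JB is approximately chi-square with 2 degrees of freedom,
    # whose survival function is exp(-x / 2).
    return math.exp(-jb / 2.0)

# Same mild departure from normality, three different sample sizes
p_values = {n: jarque_bera_pvalue(n, skewness=0.2, excess_kurtosis=0.0)
            for n in (50, 500, 5000)}
for n, p in p_values.items():
    print(f"n = {n:5d}  p = {p:.4g}")
```

With an identical amount of skew, the p-value is well above 0.05 at n = 50 and far below it at n = 5000, so significance alone does not say whether the departure matters for your analysis.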

The second rule to forget: just delete outliers that are 3 or more standard deviations from the mean, and your data will look a lot better. Deleting a value can be appropriate when you have evidence that it is an error, but don’t follow this rule automatically: you may introduce bias into your results, or you may miss the most interesting part of your data set.
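
As a sketch of a gentler use of the same threshold, the snippet below flags values that are 3 or more standard deviations from the mean for review instead of deleting them. The function name `flag_outliers` and the sales figures are made up for illustration.

```python
import numpy as np

def flag_outliers(values, threshold=3.0):
    """Return a boolean mask of values at least `threshold` sample SDs from the mean."""
    values = np.asarray(values, dtype=float)
    z_scores = (values - values.mean()) / values.std(ddof=1)
    return np.abs(z_scores) >= threshold

# Hypothetical sales figures with one suspicious entry
sales = np.array([12, 15, 14, 13, 16, 15, 14, 13, 15, 14,
                  12, 16, 13, 15, 14, 13, 15, 14, 16, 95])
flags = flag_outliers(sales)

# Keep every row; list the flagged ones so they can be checked at the source
for index in np.flatnonzero(flags):
    print(f"row {index}: value {sales[index]} needs a closer look")
```

Nothing is removed here. The flag only tells you which records to trace back before deciding what to do with them.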

#2: When you discover an outlier, you should investigate it properly.

When this happens, try to find out whether it is an error and where it comes from.

The third rule to forget: check the normality of the dependent variable before you run a linear model.

As you probably already know, in a t test, there is the assumption that the dependent variable is normally distributed within each group. And this is the same thing as saying that given the group as defined by X, Y follows a normal distribution.

ANOVA also has a similar assumption: given the group as defined by X, Y follows a normal distribution.

But here’s the thing: the distribution of Y as a whole doesn’t have to be normal. In fact, if X has a big effect, the distribution of Y, across all values of X, will often be skewed or bimodal or just a big old mess. This happens even if the distribution of Y, at each value of X, is perfectly normal.
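
A small simulation can make this concrete. The sketch below assumes two groups with normally distributed Y and very different means, then compares the excess kurtosis of the pooled Y with that of the within-group residuals. A value near 0 is consistent with a normal shape, while a strongly negative value is typical of a bimodal one. The group means and sample sizes are arbitrary assumptions.

```python
import numpy as np

def excess_kurtosis(values):
    """Excess kurtosis: about 0 for a normal distribution."""
    d = np.asarray(values, dtype=float)
    d = d - d.mean()
    return (d ** 4).mean() / (d ** 2).mean() ** 2 - 3.0

rng = np.random.default_rng(42)
n_per_group = 5000
group = np.repeat([0, 1], n_per_group)               # X: two groups
y = rng.normal(loc=np.where(group == 0, 0.0, 10.0),  # big effect of X
               scale=1.0)

# Y given X: subtract each group's own mean
group_means = np.array([y[group == 0].mean(), y[group == 1].mean()])
residuals = y - group_means[group]

pooled_kurtosis = excess_kurtosis(y)            # Y across all values of X
residual_kurtosis = excess_kurtosis(residuals)  # Y at each value of X
print("pooled Y:", round(pooled_kurtosis, 2))
print("within groups:", round(residual_kurtosis, 2))
```

The pooled Y has two clearly separated humps, yet the distribution within each group is exactly what a t test or ANOVA assumes.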

#3: Another thing you can do is to simply check the assumptions after you have picked the predictors. After all, normality depends on which independent variables are in the model.
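
The same point holds for a continuous predictor. In the sketch below, which uses simulated data with hypothetical coefficients, Y looks strongly skewed before any predictor is included, while the residuals are roughly symmetric once the skewed predictor is in the model.

```python
import numpy as np

def skewness(values):
    """Sample skewness: about 0 for a symmetric distribution."""
    d = np.asarray(values, dtype=float)
    d = d - d.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

rng = np.random.default_rng(7)
n = 5000
x = rng.exponential(scale=1.0, size=n)      # skewed predictor
y = 2.0 + 3.0 * x + rng.normal(size=n)      # normal errors around the line

# Residuals from a model with no predictors (intercept only)
resid_without_x = y - y.mean()

# Residuals after the predictor is in the model
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_with_x = y - X @ coef

skew_without_x = skewness(resid_without_x)
skew_with_x = skewness(resid_with_x)
print("skewness without x:", round(skew_without_x, 2))
print("skewness with x:", round(skew_with_x, 2))
```

Checking normality before choosing predictors would have flagged a problem that disappears once the model is specified.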