blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/ #65

utterances-bot · 2021-01-04T15:29:54Z

utterances-bot
Jan 4, 2021

Wilcoxon test in R: how to compare 2 groups under the non-normality assumption - Stats and R

Learn how to do the Wilcoxon test (non-parametric version of the Student's t-test) in R, used to compare 2 groups when the normality assumption is violated

https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/

AntoineSoetewey · 2021-01-04T15:29:55Z

AntoineSoetewey
Jan 4, 2021
Maintainer

Comment written by Gerald I Cheves on June 08, 2020 01:09:13:

What's the difference between the Shapiro-Wilk normality test and the Kolmogorov test for normality?

0 replies

AntoineSoetewey · 2021-01-04T15:57:58Z

AntoineSoetewey
Jan 4, 2021
Maintainer

Comment written by Gerald I Cheves on June 08, 2020 01:09:13:

What's the difference between the Shapiro-Wilk normality test and the Kolmogorov test for normality?

Comment written by Antoine Soetewey on June 08, 2020 04:09:03:

Good question Gerald.

This article discusses the different normality tests.

In brief, the Kolmogorov-Smirnov and Shapiro-Wilk tests share the same hypotheses (H0: the data come from a normal distribution; H1: the data do not come from a normal distribution), but the Shapiro-Wilk test is generally more powerful than the Kolmogorov-Smirnov test, which is why it is usually preferred for testing normality.
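As a minimal sketch, both tests can be run on the same sample in R with `shapiro.test()` and `ks.test()`. The data below are simulated for illustration only (they are not from the article), and the normal parameters passed to `ks.test()` are estimated from the sample, which is an assumption of this sketch.

```r
# Simulated, clearly skewed sample (illustrative assumption, not the article's data)
set.seed(42)
x <- rexp(50, rate = 1)

# Shapiro-Wilk test: H0 = the data come from a normal distribution
sw <- shapiro.test(x)

# Kolmogorov-Smirnov test against a normal distribution whose mean and sd
# are estimated from the same sample; estimating them from the data makes
# the reported p-value conservative, so treat it as a rough comparison only
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

print(sw)
print(ks)
```

Comparing the two p-values on the same sample gives a feel for how differently the tests can react to the same departure from normality.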

Hope this helps.

Best,
Antoine

0 replies

xairigu · 2021-03-03T02:39:20Z

xairigu
Mar 3, 2021

Hi, why are you not testing for equality of variances, and what difference would it make to test for it or not? Does it change what is being tested with the Wilcoxon test?

0 replies

AntoineSoetewey · 2021-03-03T11:23:56Z

AntoineSoetewey
Mar 3, 2021
Maintainer

Hi, why are you not testing for equality of variances, and what difference would it make to test for it or not? Does it change what is being tested with the Wilcoxon test?

Dear Xaira,

This is a good question and it is often raised.

Here are 3 good articles discussing the concept of equal variances in Wilcoxon test: 1, 2 & 3.

See for instance in 1: "If the two distributions have a different shape, the Mann-Whitney U test is used to determine whether there are differences in the distributions of your two groups. However, if the two distributions are the same shape, the Mann-Whitney U test is used to determine whether there are differences in the medians of your two groups."

To rephrase it, if you only want to compare the two groups you do not have to test the equality of variances. However, if your goal is to compare medians of the two groups then you will need to make sure that the two distributions have the same shape.

So testing for equality of variances will change your interpretation. In this article I don't compare medians, I only compare the groups. This is the reason I don't test for equality of variances. I have added a note regarding this assumption in this section, so thanks for your question.
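To illustrate the distinction between comparing groups and comparing medians, here is a small simulated sketch (the distributions are chosen purely for illustration). Both groups have the same population median, but their shapes are mirrored, so the probability that an observation from one group exceeds an observation from the other is not 0.5.

```r
# Two simulated groups with the same population median (0) but mirrored shapes
set.seed(1)
n <- 200
x <- rexp(n) - log(2)      # right-skewed, population median 0
y <- -(rexp(n) - log(2))   # left-skewed, population median 0

# Population value of P(X > Y): X - Y = E1 + E2 - 2*log(2),
# where E1 + E2 follows a Gamma(shape = 2, rate = 1) distribution
p_superiority <- pgamma(2 * log(2), shape = 2, lower.tail = FALSE)
p_superiority  # about 0.597, i.e. not 0.5 despite equal medians

# Sample medians and the Wilcoxon rank sum test on the two groups
c(median_x = median(x), median_y = median(y))
wilcox.test(x, y)
```

With shapes this different, a small p-value would say that the groups differ, not that their medians differ.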

For your information, the same applies when using the Kruskal-Wallis test to compare 3 groups or more (see this footnote in my article about ANOVA): if you only want to compare the groups you do not need homoscedasticity, but if you want to compare the medians this assumption must be met.

Hope this helps.

Regards,
Antoine

0 replies

Cannaxuan · 2021-03-11T10:28:32Z

Cannaxuan
Mar 11, 2021

Since you want to compare the groups by determining whether there are differences in the distributions of the two groups, how should alternative = "less" or alternative = "greater" be interpreted?

0 replies

AntoineSoetewey · 2021-03-11T12:32:24Z

AntoineSoetewey
Mar 11, 2021
Maintainer

Since you want to compare the groups by determining whether there are differences in the distributions of the two groups, how should alternative = "less" or alternative = "greater" be interpreted?

Hello,

Thanks for your question.

Indeed in the first place I would like to test whether there are differences in the distribution of the two groups, so I don't specify any alternative and test the following:

H0: the 2 groups are similar
H1: the 2 groups are different

However, one may be interested in going further (based on preliminary research or on the research question, for instance) by testing whether one group performs better or worse than the other. In this case, the alternative should be specified. If one wants to test whether:

group 1 performs better than group 2, then alternative = "greater" should be added;
group 1 performs worse than group 2, then alternative = "less" should be added.
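The three choices can be sketched as follows. The scores are hypothetical and were picked so that there are no ties, in which case `wilcox.test()` computes exact p-values for samples this small.

```r
# Hypothetical scores for two small groups (no ties, so the exact test is used)
group1 <- c(10, 11, 12, 13, 14)
group2 <- c(1, 2, 3, 4, 5)

# H1: the two groups differ (default, two-sided)
two_sided <- wilcox.test(group1, group2)

# H1: group 1 is shifted to the right of group 2 (group 1 "performs better")
greater <- wilcox.test(group1, group2, alternative = "greater")

# H1: group 1 is shifted to the left of group 2 (group 1 "performs worse")
less <- wilcox.test(group1, group2, alternative = "less")

c(two_sided = two_sided$p.value, greater = greater$p.value, less = less$p.value)
```

Note that the first argument plays the role of group 1: swapping the two vectors swaps the meaning of "greater" and "less".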

Hope this makes sense; let me know if not.

Regards,
Antoine

0 replies

Cannaxuan · 2021-03-11T12:38:14Z

Cannaxuan
Mar 11, 2021

The R documentation states: "if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is carried out. In this case, the null hypothesis is that the distributions of x and y differ by a location shift of mu and the alternative is that they differ by some other location shift (and the one-sided alternative "greater" is that x is shifted to the right of y)."

0 replies

AntoineSoetewey · 2021-03-11T13:16:55Z

AntoineSoetewey
Mar 11, 2021
Maintainer

The R documentation states: "if both x and y are given and paired is FALSE, a Wilcoxon rank sum test (equivalent to the Mann-Whitney test) is carried out. In this case, the null hypothesis is that the distributions of x and y differ by a location shift of mu and the alternative is that they differ by some other location shift (and the one-sided alternative "greater" is that x is shifted to the right of y)."

Thanks for the reference.

alternative "greater" is that x is shifted to the right of y:

if I understand correctly, it seems to me that x is larger than y, which means (in our case) that group 1 (x) performs better than group 2 (y).
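A short sketch of the `mu` and `alternative` arguments described in the documentation may help here. The two samples are hypothetical and contain no ties.

```r
# Hypothetical samples without ties
x <- c(4.5, 6.0, 5.6, 8.3, 6.9)
y <- c(1.2, 3.4, 2.2, 5.1, 4.0)

# Default mu = 0; "greater" means H1: x is shifted to the right of y
wilcox.test(x, y, alternative = "greater")

# mu = 3 tests H0 "x is shifted by exactly 3 relative to y";
# this is the same as subtracting 3 from x and testing a shift of 0
shift3  <- wilcox.test(x, y, mu = 3)
shifted <- wilcox.test(x - 3, y)

c(with_mu = unname(shift3$statistic), manual = unname(shifted$statistic))
```

In both calls the hypotheses are about a shift of the whole distribution of x relative to y, which is why x being "shifted to the right" reads as x tending to take larger values than y.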

Unless you have another interpretation of the documentation? I'd be happy to discuss it.

Regards,
Antoine

0 replies

Cannaxuan · 2021-03-12T01:15:08Z

Cannaxuan
Mar 12, 2021

Thanks for your quick reply.
I am still a little confused: is "differences in the distribution of the two groups" equivalent to "differences in the mean (mu) of the two groups"?
If it is, then it is clear to me. If not, then according to the R documentation of wilcox.test, not specifying alternative means alternative = "two.sided" (the default). Is this a test of the distributions or a test of the means?
Below is the link to the R documentation of wilcox.test for your reference.
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/wilcox.test

0 replies

AntoineSoetewey · 2021-03-12T18:32:37Z

AntoineSoetewey
Mar 12, 2021
Maintainer

Thanks for your quick reply.
I am still a little confused: is "differences in the distribution of the two groups" equivalent to "differences in the mean (mu) of the two groups"?

As far as I understand, the Wilcoxon test is not comparing the means (mu). This is the reason that in this article I wrote:

H0: the 2 groups are similar
H1: the 2 groups are different

The Student's t-test is comparing the means:

H0: mean of group 1 = mean of group 2
H1: mean of group 1 is different from mean of group 2

That being said, both tests allow you to compare two groups (each through a different procedure, so to speak).

If it is, then it is clear to me. If not, then according to the R documentation of wilcox.test, not specifying alternative means alternative = "two.sided" (the default). Is this a test of the distributions or a test of the means?

If you don't specify any alternative, it is indeed a two-sided test so you are testing:

H0: the 2 groups are similar
H1: the 2 groups are different

But in any case (i.e., with or without specifying an alternative), with the Wilcoxon test you are not using the means, you are rather comparing the distributions of the two groups, so I would not call it a mean test.
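One way to see that the test works on ranks rather than means is the sketch below, on hypothetical data without ties. The statistic W reported by `wilcox.test()` equals the number of pairs in which an observation from the first group exceeds one from the second, and a monotone transformation such as `log()` changes the means but leaves the test untouched.

```r
# Hypothetical positive-valued samples without ties
x <- c(1.1, 2.3, 3.8, 5.2, 9.0)
y <- c(0.5, 1.9, 2.0, 4.1)

w <- wilcox.test(x, y)

# W counts the pairs (x_i, y_j) with x_i > y_j (no ties here)
pairs_x_greater <- sum(outer(x, y, ">"))
c(W = unname(w$statistic), pairs = pairs_x_greater)

# Share of pairs in which x exceeds y
unname(w$statistic) / (length(x) * length(y))

# A monotone transformation changes the means but not the ranks,
# so the Wilcoxon result is identical on the log scale
w_log <- wilcox.test(log(x), log(y))
c(mean_diff_raw = mean(x) - mean(y), mean_diff_log = mean(log(x)) - mean(log(y)))
c(p_raw = w$p.value, p_log = w_log$p.value)
```

A t-test on the same data would give different results on the raw and log scales, since it is built on the means.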

Hope this helps. I am not completely sure I understand your question so my apologies if I am not answering it.

Regards,
Antoine

0 replies

AnnemarieVilladsen · 2022-07-04T16:41:46Z

AnnemarieVilladsen
Jul 4, 2022 — with giscus

Thanks for a great article. However, to use the Wilcoxon signed-rank test, must data be continuous?
Perhaps you can help me. I have data from two paired groups – before and after treatment where they have answered a questionnaire. From this questionnaire, I calculate a score, and then I would like to compare the score before and after treatment. So, I would perform the Wilcoxon signed-rank test as I have paired groups (data are not normally distributed and there is equal variance), however, I have some zeros (no difference) and I get a warning that it cannot compute exact p-values with ties. Do you know how to handle these problems? Is it right that to use the test the data must be continuous? – because my data is not, as it is a score value, categorical ordinal. So do you know which test to use with my data?

1 reply

AntoineSoetewey Jul 4, 2022
Maintainer

Hello @AnnemarieVilladsen,

How do you calculate the score? You take the mean across several items? Or the score is based on a single item?

AnnemarieVilladsen · 2022-07-04T19:09:54Z

AnnemarieVilladsen
Jul 4, 2022

The score is the sum of several questions. Each question is given a value from 0 to 3.

1 reply

AntoineSoetewey Jul 4, 2022
Maintainer

Although your data is not continuous, you can still use the Wilcoxon signed-rank test because the score you computed based on several questions can still be ranked (see section “Appropriate data” in this article and Assumption 1 in this article).

Regarding the warning: it is only a warning, not an indication that your results are incorrect. Because there are ties, R reports a p-value based on a normal approximation rather than an exact p-value (see more info in the section “wilcox.test() Function” in this article).
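As a minimal sketch with hypothetical before/after scores (not your data), the normal approximation can be requested explicitly with `exact = FALSE`, which should make the intent clear and avoid the warning about exact p-values.

```r
# Hypothetical questionnaire scores before and after treatment
before <- c(12, 15, 9, 20, 14, 11, 17, 13)
after  <- c(10, 15, 9, 16, 12, 11, 14, 12)

# The paired differences contain zeros and tied values
before - after

# Wilcoxon signed-rank test using the normal approximation on purpose
res <- wilcox.test(before, after, paired = TRUE, exact = FALSE)
res
```

The p-value is the same one R falls back to when it cannot compute the exact one; the only difference is that the approximation is now an explicit choice.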

Hope this helps.

Regards,
Antoine

AnnemarieVilladsen · 2022-07-05T17:57:47Z

AnnemarieVilladsen
Jul 5, 2022

Thanks for the great answer!👏

0 replies

adrianolszewski · 2024-10-10T16:56:23Z

adrianolszewski
Oct 10, 2024 — with giscus

What a nicely written blog! It's very clean and guides the readers step by step, I like it!
I thought it might be worthwhile for readers if I proposed some more options for comparing groups under non-normality and added a few notes.

Notes first (about medians):
As you properly mentioned (so many resources ignore this fact completely!), the Mann-Whitney (-Wilcoxon) test can be used to compare medians only if the properties other than location, like dispersion (variance) and the shape of the distribution, are constrained to be the same; in short, ID (identically distributed). Only then is the location parameter the only thing left that can differ. Otherwise this test will be sensitive to all the differences, testing in general something called H0: stochastic equivalence vs. H1: stochastic superiority (or dominance).

I know you wrote that comparing medians was NOT the intention of this post, but let me take this occasion to briefly mention two alternatives for medians, just in case :)

the Brown-Mood test of medians: much older and focused solely on medians (not to be confused with Mood's test of scale parameters, mood.test() in R). It can be found in the RVAideMemoire, coin and nonpar packages; and
quantile regression. In R it's the quantreg package.
I know, quantile regression may seem like overkill for such a simple task, but 1) it compares exactly medians (and nothing other than the requested quantiles), 2) it can be extended to any number of levels and factors (relaxing the limitation of the 1-way Kruskal-Wallis, for instance), and even their interactions, and 3) it can adjust for numerical covariates. If we have just one 2-level factor, it will just compare medians without the need for ID variables. If we use mixed-model quantile regression (lqmm), the independence part of IID can be relaxed too (paired data can be handled).
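The idea behind the median test mentioned above can be sketched in base R: classify every observation as above or not above the pooled median, then test whether that classification is associated with the group. This is a hand-rolled illustration on hypothetical data using `fisher.test()`; it is not claimed to match the implementations in the packages named above.

```r
# Hypothetical data for two groups
x <- c(1, 2, 3, 4, 5, 6)
y <- c(7, 8, 9, 10, 11, 12)

values <- c(x, y)
group  <- rep(c("x", "y"), times = c(length(x), length(y)))

# Is each observation above the pooled (grand) median?
above <- values > median(values)

# 2 x 2 table: group by above/not above the pooled median
tab <- table(group, above)
tab

# Exact test of association between group and position relative to the median
res <- fisher.test(tab)
res$p.value
```

Only the position relative to the pooled median enters the test, which is why it targets medians and ignores other differences between the distributions.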

Proposals next:
Sometimes the observed data are not well described by the theoretical normal distribution (they have multiple modes, are skewed, or have fatter tails), BUT we know, from domain knowledge and past research, that in the population summarizing them with arithmetic means IS meaningful. In other words, it makes sense to compare their means even though they don't look Gaussian in samples. The same holds if the variances differ (no I.D.). So we have the generalized Behrens-Fisher problem...

And then there is one brilliant option to compare them while preserving H0 (about means) and relaxing the requirement of normality: the permutation Welch-Satterthwaite t-test. It is the good old Welch t-test, which accounts for unequal variances, plus a permutation approach (in practice, just a random subset of the huge number of possible permutations), which "builds" the reference distribution of the test statistic under a true H0, so we can test safely. (PS: we can also specify the "trim" parameter, which turns the Welch t-test into the Yuen-Welch t-test, which also accounts for extreme observations, aka outliers).

In R there are multiple packages for doing a permutation t-test, but let me name my preferred one: MKinfer. This is my favourite workhorse for both permutation and bootstrap testing. It does one fantastic thing: it prints both the results based on permutations (or bootstrap, depending on the choice) AND the ordinary Welch t-test, so we can compare them. Why is this useful? Because this way we can clearly assess how much the violation of normality affected the actual result. If it did not, i.e. if both results are very close to each other, we can safely report the classic Welch t-test without "scaring people" :) with the "permutation" name, because even though the assumption was violated, it didn't affect the estimation process noticeably.

The permutation (Welch, Yuen-Welch) t-test isn't a magic wand: it won't make the inference meaningful if summarizing our data with arithmetic means is just pointless. It will give a technically valid answer to a pointless question. But if such a question does make sense, this is just an awesome way to answer it. One needs to remember to set and save the seed (set.seed()) to be able to replicate the results in the future.
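For readers who want to see the mechanics, here is a hand-rolled base R sketch of a permutation test built on the Welch t statistic. The data are simulated, the helper name `welch_stat` and the number of permutations are assumptions of this sketch, and it is not the MKinfer implementation.

```r
# Simulated skewed samples with unequal variances (illustrative assumption)
set.seed(123)  # fixed seed so the random permutations can be replicated
x <- rexp(30, rate = 1)
y <- rexp(40, rate = 0.5)

# Hypothetical helper: Welch t statistic for two samples
welch_stat <- function(a, b) unname(t.test(a, b, var.equal = FALSE)$statistic)

t_obs  <- welch_stat(x, y)
pooled <- c(x, y)
n_x    <- length(x)
n_perm <- 1999  # random subset of all possible permutations

# Recompute the Welch statistic after randomly reassigning group labels
t_perm <- replicate(n_perm, {
  idx <- sample(length(pooled), n_x)
  welch_stat(pooled[idx], pooled[-idx])
})

# Two-sided permutation p-value (the observed statistic counts as one permutation)
p_perm  <- (1 + sum(abs(t_perm) >= abs(t_obs))) / (n_perm + 1)
p_welch <- t.test(x, y)$p.value

c(permutation = p_perm, welch = p_welch)
```

Printing the two p-values side by side mirrors the comparison described above: if they are close, the violation of normality had little effect on the result.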

If you do not mind, I would like also to provide some literature for curious readers, exactly about these topics:
👨‍🏫 Janssen, A. (1997). Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens–Fisher problem. Statist. Probab. Lett. 36 9–21. MR1491070

👩‍🏫 Janssen, A. (2005). Resampling Student's t-type statistics. Annals of the Institute of Statistical Mathematics. 57. 507-529. 10.1007/BF02509237 , https://www.researchgate.net/publication/24052826_Resampling_Student%27s_t-type_statistics

👩‍🏫 Arnold Janssen, Thorsten Pauls "How do bootstrap and permutation tests work?," The Annals of Statistics, Ann. Statist. 31(3), 768-806, (June 2003), https://projecteuclid.org/journals/annals-of-statistics/volume-31/issue-3/How-do-bootstrap-and-permutation-tests-work/10.1214/aos/1056562462.full

👨‍🏫 EunYi Chung, Joseph P. Romano "Exact and asymptotically robust permutation tests," The Annals of Statistics, Ann. Statist. 41(2), 484-507, (April 2013), https://projecteuclid.org/journals/annals-of-statistics/volume-41/issue-2/Exact-and-asymptotically-robust-permutation-tests/10.1214/13-AOS1090.full

👨‍🏫 Noguchi, K., Konietschke, F., Marmolejo-Ramos, F. et al. Permutation tests are robust and powerful at 0.5% and 5% significance levels. (2021). https://link.springer.com/content/pdf/10.3758/s13428-021-01595-5.pdf

👩‍🏫 Check also: Amro Lubna (2022), Resampling-Based Inference Methods for Repeated Measures Data with Missing Values, https://eldorado.tu-dortmund.de/bitstream/2003/40978/1/Diss.pdf

💡 Huang, Peng et al. “Formulating appropriate statistical hypotheses for treatment comparison in clinical trial design and analysis.” Contemporary clinical trials vol. 39,2 (2014): 294-302, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4254362/

2 replies

adrianolszewski Oct 10, 2024 — with giscus

Ah, I was too quick and forgot to add another part... But maybe it's better this way, because it's kind of a separate thing.
One can use ordinal logistic regression (aka the proportional odds model) to analyze numerical data (which are of course also ordinal and support arithmetic, so we can add and subtract meaningfully) non-parametrically. Professor Frank Harrell has been known for promoting this approach for years! It's just an awesome way, especially once we notice that the Mann-Whitney (Wilcoxon) test is... just ordinal logistic regression with a single 2-level predictor and no covariates. But there's more: by using the rms package (named after his book title: Regression Modeling Strategies) we can also obtain the empirical CDF for the data, and from it, means and medians.
This is a topic really worth checking! https://www.fharrell.com/post/rpo/, https://hbiostat.org/rmsc/cony#ordinal-regression-models-for-continuous-y

AntoineSoetewey Oct 11, 2024
Maintainer

Thank you so much @adrianolszewski for your detailed and insightful comment! Your perspective really adds value to the post, and I appreciate the time you took to share it. Looking forward to hearing more from you!
