This vignette can be cited as:

```
## To cite package 'correlation' in publications use:
##
## Makowski, D., Wiernik, B. M., Patil, I., Lüdecke, D., & Ben-Shachar,
## M. S. (2022). correlation: Methods for correlation analysis (0.8.3)
## [R package]. https://CRAN.R-project.org/package=correlation (Original
## work published 2020)
##
## Makowski, D., Ben-Shachar, M. S., Patil, I., & Lüdecke, D. (2019).
## Methods and algorithms for correlation analysis in R. Journal of Open
## Source Software, 5(51), 2306. https://doi.org/10.21105/joss.02306
##
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
```

Imagine we have an experiment in which **10
individuals** completed a task with **100 trials**.
For each of the 1000 trials (10 * 100) in total, we measured two things,
**V1** and **V2**, and we are interested in
**investigating the link between these two variables**.

We will generate data using the `simulate_simpson()` function from this package and look at its summary:
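A minimal sketch of this step, assuming `simulate_simpson()`'s arguments for the number of trials per group (`n`), the number of groups (`groups`), and the within-group correlation (`r`):

```
library(correlation)

# Simulate Simpson's-paradox data: 10 individuals x 100 trials each,
# with a positive within-group correlation between V1 and V2
data <- simulate_simpson(n = 100, groups = 10, r = 0.5)

summary(data)
```

The resulting data frame has 1000 rows and three columns: `V1`, `V2`, and `Group`.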

Now let’s visualize the two variables:

```
library(ggplot2)

ggplot(data, aes(x = V1, y = V2)) +
  geom_point() +
  geom_smooth(colour = "black", method = "lm", se = FALSE) +
  theme_classic()
```

That seems pretty straightforward! It seems like there is a
**negative correlation** between V1 and V2. Let’s test
this.
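One way to run this test with the package, assuming the data frame produced by `simulate_simpson()` above:

```
library(correlation)

# Same simulated data as before (10 groups of 100 trials)
data <- simulate_simpson(n = 100, groups = 10, r = 0.5)

# Correlation between all numeric variables (here, only the V1-V2 pair)
correlation(data)
```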

Indeed, there is a **strong, negative and significant
correlation** between V1 and V2.

Great, can we go ahead and **publish these results in
PNAS**?

Not so fast! Ever heard of **Simpson’s Paradox**?

Let’s colour our datapoints by group (by individuals):

```
library(ggplot2)

ggplot(data, aes(x = V1, y = V2)) +
  geom_point(aes(colour = Group)) +
  geom_smooth(aes(colour = Group), method = "lm", se = FALSE) +
  geom_smooth(colour = "black", method = "lm", se = FALSE) +
  theme_classic()
```

Mmh, interesting. It seems like, for each subject, the relationship
is different. The (global) negative trend seems to be an artifact of
**differences between the groups** and could be
spurious!

**Multilevel (as in multi-group) correlations** allow us to account for **differences between groups**. You can compute them with the **correlation** package by setting the `multilevel` argument to `TRUE`.
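For instance, on the simulated data from above:

```
library(correlation)

# Same simulated data as before (10 groups of 100 trials)
data <- simulate_simpson(n = 100, groups = 10, r = 0.5)

# Multilevel correlation: V1-V2, adjusted for the grouping factor
correlation(data, multilevel = TRUE)
```

The grouping variable (here `Group`, the only factor in the data) is used as the random-effects structure, and the reported correlation now reflects the within-group relationship.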

For completeness, let’s also see if its Bayesian cousin agrees with it:
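A sketch of that check, assuming the `bayesian` argument of `correlation()` (which relies on the **BayesFactor** package being installed):

```
library(correlation)

# Same simulated data as before (10 groups of 100 trials)
data <- simulate_simpson(n = 100, groups = 10, r = 0.5)

# Bayesian multilevel correlation
correlation(data, multilevel = TRUE, bayesian = TRUE)
```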

**Dayum!** We were too hasty in our conclusions! Taking
the group into account seems to be super important.

*Note*: In this simple case where only two variables are of interest, it would of course be best to directly proceed with a mixed regression model instead of correlations. That being said, the latter can be useful for exploratory analysis, when multiple variables are of interest, or in combination with a network or structural approach.