| Title: | Computation of 2D and 3D Elliptical Joint Confidence Regions |
|---|---|
| Description: | Computes elliptical joint confidence regions at a specified confidence level. The package provides the flexibility to estimate either classical or robust confidence regions, which can be visualized in 2D or 3D plots. The classical approach assumes normality and uses the mean and covariance matrix to define the confidence regions. The robust version instead employs estimators such as the minimum covariance determinant (MCD) and M-estimators, which make the regions less sensitive to outliers and departures from normality. The functions also allow users to group the dataset by categorical variables and estimate separate confidence regions for each group, which is particularly useful for exploring differences or similarities across subgroups within a dataset. Varmuza and Filzmoser (2009, ISBN:978-1-4200-5947-2). Johnson and Wichern (2007, ISBN:0-13-187715-1). Raymaekers and Rousseeuw (2019) <DOI:10.1080/00401706.2019.1677270>. |
| Authors: | Christian L. Goueguel [aut, cre] (ORCID: <https://orcid.org/0000-0003-0521-3446>) |
| Maintainer: | Christian L. Goueguel <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.0 |
| Built: | 2026-05-31 08:53:42 UTC |
| Source: | https://github.com/christiangoueguel/confidenceellipse |
Compute the coordinate points of confidence ellipses at a specified confidence level.

    confidence_ellipse(
      .data,
      x,
      y,
      .group_by = NULL,
      conf_level = 0.95,
      robust = FALSE,
      distribution = "normal"
    )

| Argument | Description |
|---|---|
| .data | data frame or tibble. |
| x | column name for the x-axis variable. |
| y | column name for the y-axis variable. |
| .group_by | column name for the grouping variable ( |
| conf_level | confidence level for the ellipse (0.95 by default). |
| robust | optional ( |
| distribution | optional ( |
The function computes the coordinates of the confidence ellipse based
on the specified confidence level and the provided data. It can handle both classical
and robust estimation methods, and it supports grouping by a factor variable.
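
To illustrate the classical case, the following base R sketch builds the boundary of a 95% ellipse from the sample mean and covariance matrix of simulated data. It is a minimal sketch under stated assumptions, not the package's source: the variable names (`xy`, `ellipse_xy`) are hypothetical, and the chi-square quantile is used for the radius.

```r
set.seed(1)
n <- 200
# Simulated, correlated bivariate data (hypothetical example data)
xy <- cbind(x = rnorm(n), y = rnorm(n))
xy[, "y"] <- 0.6 * xy[, "x"] + 0.8 * xy[, "y"]

center  <- colMeans(xy)            # classical location estimate
sigma   <- cov(xy)                 # classical scatter estimate
radius2 <- qchisq(0.95, df = 2)    # squared radius for a 95% region

# Eigen-decomposition gives the axis directions and variances
eig   <- eigen(sigma, symmetric = TRUE)
theta <- seq(0, 2 * pi, length.out = 361)
unit_circle <- rbind(cos(theta), sin(theta))

# Scale each axis by sqrt(eigenvalue * quantile), rotate, then shift to the center
pts <- eig$vectors %*% (sqrt(eig$values * radius2) * unit_circle)
ellipse_xy <- data.frame(x = pts[1, ] + center[1], y = pts[2, ] + center[2])

head(ellipse_xy)
```

Every point of `ellipse_xy` lies at the same squared Mahalanobis distance (`radius2`) from the center, which is what defines the ellipse boundary.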
The distribution parameter controls the statistical approach used for ellipse
calculation. The "normal" option uses the chi-square distribution quantile,
which is appropriate when working with very large samples.
The "hotelling" option instead uses the Hotelling's T² distribution quantile.
This approach accounts for uncertainty in estimating both the mean and the covariance
from sample data, producing larger ellipses that better reflect sampling uncertainty.
It is statistically more rigorous for smaller sample sizes, where parameter
estimation uncertainty is higher.
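
The effect of the two options on the ellipse size can be sketched numerically in base R. The example below assumes the standard relation between Hotelling's T² and the F distribution, T² = p(n - 1)/(n - p) · F(p, n - p); the exact formula used inside the package may differ, and `hotelling_radius2` is a hypothetical helper name.

```r
p <- 2              # number of variables (2 for an ellipse)
conf_level <- 0.95

# Squared radius under the "normal" (chi-square) approach
chisq_radius2 <- qchisq(conf_level, df = p)

# Squared radius under Hotelling's T-squared, via its F-distribution form
# (assumed formula; hypothetical helper, not a package function)
hotelling_radius2 <- function(n, p, conf_level) {
  (p * (n - 1) / (n - p)) * qf(conf_level, df1 = p, df2 = n - p)
}

sizes <- c(10, 30, 100, 1000)
comparison <- data.frame(
  n         = sizes,
  chisq     = chisq_radius2,
  hotelling = hotelling_radius2(sizes, p, conf_level)
)
# Ratio > 1 means the Hotelling-based ellipse is larger
comparison$ratio <- comparison$hotelling / comparison$chisq
print(comparison)
```

Under this assumption the ratio is always above 1 and shrinks toward 1 as `n` grows, which matches the statement that the two methods converge for large samples.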
The combination of distribution = "hotelling" and robust = TRUE offers the
most conservative and statistically rigorous approach, particularly recommended
for exploratory data analysis and when dealing with datasets that may
not meet ideal statistical assumptions. For very large samples, the default
settings (distribution = "normal", robust = FALSE) may be sufficient, as
the differences between methods diminish with increasing sample size.
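
The benefit of a robust estimator on contaminated data can be sketched as follows. This example swaps in `MASS::cov.mcd()` as one widely available MCD implementation; it is an assumption for illustration and not necessarily the estimator the package uses internally. The data are simulated, and `area()` is a hypothetical helper.

```r
library(MASS)  # provides cov.mcd(); assumed stand-in for the package's robust estimator

set.seed(123)
n <- 100
xy <- cbind(x = rnorm(n), y = rnorm(n))
xy[1:10, ] <- xy[1:10, ] + 10      # contaminate 10% of the rows with outliers

classical <- list(center = colMeans(xy), cov = cov(xy))
robust    <- cov.mcd(xy)           # MCD estimates of center and covariance

# Hotelling-based squared radius (assumed F-distribution form)
p <- ncol(xy)
radius2 <- (p * (n - 1) / (n - p)) * qf(0.95, df1 = p, df2 = n - p)

# Area of the ellipse {v : v' S^-1 v <= radius2} is pi * radius2 * sqrt(det(S))
area <- function(s) pi * radius2 * sqrt(det(s))

rbind(classical = classical$center, robust = robust$center)
c(classical = area(classical$cov), robust = area(robust$cov))
```

In this simulation the outliers pull the classical center away from the bulk of the data and inflate the classical ellipse, while the MCD-based estimates stay close to the uncontaminated majority.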
Data frame of the coordinate points.
Christian L. Goueguel
Raymaekers, J., Rousseeuw, P. J. (2019). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184-198.
Brereton, R. G. (2016). Hotelling’s T-squared distribution, its relationship to the F distribution and its use in multivariate space. Journal of Chemometrics, 30(1), 18–21.

    # Data
    data("glass", package = "ConfidenceEllipse")

    # Confidence ellipse
    ellipse <- confidence_ellipse(.data = glass, x = SiO2, y = Na2O)
    ellipse_grp <- confidence_ellipse(
      .data = glass,
      x = SiO2,
      y = Na2O,
      .group_by = glassType
    )

Compute the coordinate points of confidence ellipsoids at a specified confidence level.

    confidence_ellipsoid(
      .data,
      x,
      y,
      z,
      .group_by = NULL,
      conf_level = 0.95,
      robust = FALSE,
      distribution = "normal"
    )

| Argument | Description |
|---|---|
| .data | data frame or tibble. |
| x | column name for the x-axis variable. |
| y | column name for the y-axis variable. |
| z | column name for the z-axis variable. |
| .group_by | column name for the grouping variable ( |
| conf_level | confidence level for the ellipsoid (0.95 by default). |
| robust | optional ( |
| distribution | optional ( |
The function computes the coordinates of the confidence ellipsoid based
on the specified confidence level and the provided data. It can handle both classical
and robust estimation methods, and it supports grouping by a factor variable.
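
For the three-dimensional case, surface points of a classical 95% ellipsoid can be sketched in base R by mapping a grid on the unit sphere through the Cholesky factor of the covariance matrix. This is an illustrative sketch on simulated data, not the package's implementation; the names `xyz`, `sphere`, and `ellipsoid_xyz` are hypothetical, and the chi-square quantile with 3 degrees of freedom is assumed for the radius.

```r
set.seed(42)
n <- 150
# Simulated trivariate data (hypothetical example data)
xyz <- matrix(rnorm(n * 3), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))
xyz[, "z"] <- 0.5 * xyz[, "x"] + xyz[, "z"]

center <- colMeans(xyz)
sigma  <- cov(xyz)
radius <- sqrt(qchisq(0.95, df = 3))

# Points on the unit sphere from a grid of spherical angles
grid <- expand.grid(
  theta = seq(0, 2 * pi, length.out = 40),
  phi   = seq(0, pi, length.out = 20)
)
sphere <- cbind(
  sin(grid$phi) * cos(grid$theta),
  sin(grid$phi) * sin(grid$theta),
  cos(grid$phi)
)

# chol() returns an upper-triangular R with t(R) %*% R equal to sigma,
# so each unit vector u maps to the ellipsoid surface as radius * u %*% R
pts <- radius * sphere %*% chol(sigma)
ellipsoid_xyz <- data.frame(
  x = pts[, 1] + center[1],
  y = pts[, 2] + center[2],
  z = pts[, 3] + center[3]
)

head(ellipsoid_xyz)
```

The Cholesky route avoids an explicit eigen-decomposition; either factorization yields points at a constant Mahalanobis distance from the center.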
The distribution parameter controls the statistical approach used for ellipsoid
calculation. The "normal" option uses the chi-square distribution quantile,
which is appropriate when working with very large samples.
The "hotelling" option instead uses the Hotelling's T² distribution quantile.
This approach accounts for uncertainty in estimating both the mean and the covariance
from sample data, producing larger ellipsoids that better reflect sampling uncertainty.
It is statistically more rigorous for smaller sample sizes, where parameter
estimation uncertainty is higher.
The combination of distribution = "hotelling" and robust = TRUE offers the
most conservative and statistically rigorous approach, particularly recommended
for exploratory data analysis and when dealing with datasets that may
not meet ideal statistical assumptions. For very large samples, the default
settings (distribution = "normal", robust = FALSE) may be sufficient, as
the differences between methods diminish with increasing sample size.
Data frame of the coordinate points.
Christian L. Goueguel
Raymaekers, J., Rousseeuw, P. J. (2019). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184-198.
Brereton, R. G. (2016). Hotelling’s T-squared distribution, its relationship to the F distribution and its use in multivariate space. Journal of Chemometrics, 30(1), 18–21.

    # Data
    data("glass", package = "ConfidenceEllipse")

    # Confidence ellipsoid
    ellipsoid <- confidence_ellipsoid(.data = glass, x = SiO2, y = Na2O, z = Fe2O3)
    ellipsoid_grp <- confidence_ellipsoid(
      .data = glass,
      x = SiO2,
      y = Na2O,
      z = Fe2O3,
      .group_by = glassType
    )

The dataset comprises 13 different measurements for 180 archaeological glass vessels from different groups. Source: Janssen, K.H.A., De Raedt, I., Schalm, O., Veeckman, J. (1998). Compositions of 15th - 17th century archaeological glass vessels excavated in Antwerp. Microchim. Acta 15 (suppl.), 253-267.
glass
Data frame of 180 rows and 14 columns.