Package 'CGPfunctions' reference manual

Title:	Powell Miscellaneous Functions for Teaching and Learning Statistics
Description:	Miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not new methods but rather wrappers around either base R or other packages.
Authors:	Chuck Powell [aut, cre]
Maintainer:	Chuck Powell
License:	MIT + file LICENSE
Version:	0.6.3
Built:	2025-03-05 05:50:08 UTC
Source:	https://github.com/ibecav/cgpfunctions

Anova Tables for Type 2 sums of squares

Description

Calculates and displays type-II analysis-of-variance tables for model objects produced by aov. This is a vastly reduced version of the Anova function from package car.

Usage

aovtype2(mod)

Arguments

mod

aov model object from base R.

Details

Type-II tests are invariant with respect to (full-rank) contrast coding. They are calculated according to the principle of marginality: each term is tested after all others, except that the term's higher-order relatives are ignored. This definition of Type-II tests corresponds to the tests produced by SAS for analysis-of-variance models, where all of the predictors are factors, but not more generally (i.e., when there are quantitative predictors).
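
The marginality principle can be illustrated with base R alone. The following is a minimal sketch, not the code used by aovtype2: it assumes a two-factor design and obtains each Type-II sum of squares as the drop in residual sum of squares between two nested models. The helper name rss is hypothetical.

```r
# Illustrative only: Type-II sums of squares via nested model comparisons.
d <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Residual sum of squares of a linear model fitted to d (hypothetical helper)
rss <- function(formula) deviance(lm(formula, data = d))

rss_additive <- rss(hp ~ cyl + am)

type2_ss <- c(
  cyl      = rss(hp ~ am)  - rss_additive,       # cyl after am, ignoring cyl:am
  am       = rss(hp ~ cyl) - rss_additive,       # am after cyl, ignoring cyl:am
  "cyl:am" = rss_additive  - rss(hp ~ cyl * am)  # interaction after both main effects
)
type2_ss
```

Each main effect is adjusted for the other main effect but not for the interaction, while the interaction is adjusted for both main effects.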

Value

An object of class "anova", which usually is printed.

Author(s)

John Fox; as modified by Chuck Powell

References

Fox, J. (2016) Applied Regression Analysis and Generalized Linear Models, Third Edition. Sage.

Examples


mtcars$cyl <- factor(mtcars$cyl)
mtcars$am <- factor(mtcars$am)
mod <- aov(hp ~ cyl * am, data = mtcars)
aovtype2(mod)
  

Choose display type for BF formatting.

Description

Choose display type for BF formatting.

Usage

bf_display(bf = NULL, display_type = "bf", k = 2)

Arguments

`bf`	A numeric vector containing one or more BF values.
`display_type`	A string selecting the display option, one of "support", "logged", or "sensible".
`k`	A numeric for the number of rounded digits.

Value

a formatted character string.

Author(s)

Chuck Powell

CGPfunctions: A package of miscellaneous functions for teaching statistics.

Description

A package that includes miscellaneous functions useful for teaching statistics as well as actually practicing the art. They typically are not new methods but rather wrappers around either base R or other packages.

Functions included

newggslopegraph creates a "slopegraph" as conceptualized by Edward Tufte.
Plot2WayANOVA which as the name implies conducts a 2 way ANOVA and plots the results using 'ggplot2'
PlotXTabs2 which wraps around ggplot2 to provide Bivariate bar charts for categorical and ordinal data.
chaid_table provides tabular summary of CHAID partykit object.
cross2_var_vectors helper function to cross a vector of variables.
PlotXTabs Plots cross tabulated variables using 'ggplot2'
Mode which finds the modal value in a vector of data
SeeDist which wraps around ggplot2 to provide visualizations of univariate data.
OurConf which wraps around ggplot2 to provide visualizations of sampling confidence intervals.

Produce CHAID results tables from a partykit CHAID model

Description

Produce CHAID results tables from a partykit CHAID model

Usage

chaid_table(chaidobject)

Arguments

chaidobject

An object of type 'constparty' or 'party' produced by 'CHAID::chaid'; see the simple example below.

Value

A tibble containing the results.

Author(s)

Chuck Powell

Examples

library(CGPfunctions)
chaid_table(chaidUS)


U.S. 2000 Election Data (short)

Description

Data from a post-election survey following the year 2000 U.S. presidential elections. This is a subset from package 'CHAID'.

Usage

chaidUS

Format

A partykit object built on the following 6 variables:

vote3: candidate voted for Gore or Bush
gender: gender, a factor with levels male and female
ager: age group, an ordered factor with levels 18-24 < 25-34 < 35-44 < 45-54 < 55-64 < 65+
empstat: status of employment, a factor with levels yes, no or retired
educr: status of education, an ordered factor with levels <HS < HS < >HS < College < Post Coll
marstat: status of living situation, a factor with levels married, widowed, divorced or never married

Source

https://r-forge.r-project.org/R/?group_id=343

Cross two vectors of variable names from a dataframe

Description

Cross two vectors of variable names from a dataframe

Usage

cross2_var_vectors(data, x, y, verbose = FALSE)

Arguments

`data`	the dataframe or tibble the variables are contained in.
`x`, `y`	Either character or integer vectors containing the variable names, e.g. "am", or the column numbers, e.g. 9.
`verbose`	the default is FALSE, setting to TRUE will cat additional output to the screen

Value

a list with two sublists 'lista' and 'listb'. Very handy for feeding the lists to 'purrr' for further processing.
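
The idea of crossing two vectors of names can be sketched in base R. This is an illustration only, not the package's implementation: the helper cross_names is hypothetical, and the order in which it emits the pairs is an assumption that may differ from what cross2_var_vectors returns.

```r
# Hypothetical helper: pair every name in x with every name in y
cross_names <- function(x, y) {
  grid <- expand.grid(a = x, b = y, stringsAsFactors = FALSE)
  list(lista = grid$a, listb = grid$b)
}

crossed <- cross_names(c("am", "carb"), c("vs", "cyl", "gear"))
str(crossed)

# One cross tabulation per pair, walking the two lists in parallel
tabs <- Map(
  function(a, b) table(mtcars[[a]], mtcars[[b]]),
  crossed$lista, crossed$listb
)
length(tabs)
```

Two parallel lists of equal length are convenient because functions such as Map (or the purrr equivalents) consume them element by element.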

Author(s)

Chuck Powell

Examples

cross2_var_vectors(mtcars, 9, c(2, 10:11))
cross2_var_vectors(mtcars, "am", c("cyl", "gear", "carb"))
x2 <- c("am", "carb")
y2 <- c("vs", "cyl", "gear")
cross2_var_vectors(mtcars, x2, y2, verbose = TRUE)

## Not run: 
variables_list <- cross2_var_vectors(mtcars, x2, y2)
mytitles <- stringr::str_c(
  stringr::str_to_title(variables_list$listb),
  " by ",
  stringr::str_to_title(variables_list$lista),
  " in mtcars data"
  )
purrr::pmap(
.l = list(
   x = variables_list[[1]], # variables_list$lista
   y = variables_list[[2]], # variables_list$listb
   title = mytitles
),
.f = CGPfunctions::PlotXTabs2,
data = mtcars,
ylab = NULL,
perc.k = 1,
palette = "Set2"
)


## End(Not run)


Derive the modal value(s) for a set of data

Description

This function takes a vector and returns one or more values that represent the mode of the data.

Usage

Mode(x)

Arguments

x

a vector

Value

a vector containing one or more modal values for the input vector

Warning

Be careful: the function does some basic error checking, but Mode(NA) returns NA, and a vector in which the majority of entries are NA also returns NA.
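
For readers who want to see how modal values can be found, here is a minimal base R sketch. It is not the package's Mode function; mode_sketch is a hypothetical name, and the handling of ties and NA shown here is simply what this particular sketch does.

```r
# Hypothetical helper: return every value that occurs most often
mode_sketch <- function(x) {
  ux <- unique(x)                    # candidate values, NA included
  counts <- tabulate(match(x, ux))   # frequency of each candidate
  ux[counts == max(counts)]
}

mode_sketch(c(1, 2, 2, 3, 3))   # two values tie for most frequent
mode_sketch(c("a", "b", "a"))
mode_sketch(c(NA, NA, 1))       # NA is the most frequent entry
```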

Examples

Mode(sample(1:100, 1000, replace = TRUE))
Mode(mtcars$hp)
Mode(iris$Sepal.Length)

Tufte dataset on cancer survival rates

Description

A dataset containing cancer survival rates for different types of cancer over a 20 year period.

Usage

newcancer

Format

A data frame with 96 rows and 3 variables:

Year: ordered factor for the 5, 10, 15 and 20 year survival rates
Type: factor containing the name of the cancer type
Survival: numeric; in this dataset a whole number corresponding to the percent survival rate

Source

https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk

Tufte dataset on Gross Domestic Product, 1970 and 1979

Description

Current receipts of fifteen national governments as a percentage of gross domestic product

Usage

newgdp

Format

A data frame with 30 rows and 3 variables:

Year: character for 1970 and 1979
Country: factor country name
GDP: numeric a percentage of gross domestic product

Source

Edward Tufte. Beautiful Evidence. Graphics Press, 174-176.

Plot a Slopegraph a la Tufte using dplyr and ggplot2

Description

Creates a "slopegraph" as conceptualized by Edward Tufte. Slopegraphs are minimalist and efficient presentations of your data that can simultaneously convey the relative rankings, the actual numeric values, and the changes and directionality of the data over time. Takes a dataframe as input, with three named columns being used to draw the plot. Makes the required adjustments to the ggplot2 parameters and returns the plot.

Usage

newggslopegraph(
  dataframe,
  Times,
  Measurement,
  Grouping,
  Data.label = NULL,
  Title = "No title given",
  SubTitle = "No subtitle given",
  Caption = "No caption given",
  XTextSize = 12,
  YTextSize = 3,
  TitleTextSize = 14,
  SubTitleTextSize = 10,
  CaptionTextSize = 8,
  TitleJustify = "left",
  SubTitleJustify = "left",
  CaptionJustify = "right",
  LineThickness = 1,
  LineColor = "ByGroup",
  DataTextSize = 2.5,
  DataTextColor = "black",
  DataLabelPadding = 0.05,
  DataLabelLineSize = 0,
  DataLabelFillColor = "white",
  WiderLabels = FALSE,
  ReverseYAxis = FALSE,
  ReverseXAxis = FALSE,
  RemoveMissing = TRUE,
  ThemeChoice = "bw"
)

Arguments

`dataframe`	a dataframe or an object that can be coerced to a dataframe. Basic error checking is performed, to include ensuring that the named columns exist in the dataframe. See the `newcancer` dataset for an example of how the dataframe should be organized.
`Times`	a column inside the dataframe that will be plotted on the x axis. Traditionally this is some measure of time. The function accepts a column of class ordered, factor or character. NOTE if your variable is currently a "date" class you must convert before using the function with `as.character(variablename)`.
`Measurement`	a column inside the dataframe that will be plotted on the y axis. Traditionally this is some measure such as a percentage. Currently the function accepts a column of type integer or numeric. The slopegraph will be most effective when the measurements are not too disparate.
`Grouping`	a column inside the dataframe that will be used to group and distinguish measurements.
`Data.label`	an optional column inside the dataframe that will be used as the label for the data points plotted. Can be complex strings and have 'NA' values but must be of class 'chr'. By default 'Measurement' is converted to 'chr' and used.
`Title`	Optionally the title to be displayed. Title = NULL will remove it entirely. Title = "" will provide an empty title but retain the spacing.
`SubTitle`	Optionally the sub-title to be displayed. SubTitle = NULL will remove it entirely. SubTitle = "" will provide an empty sub-title but retain the spacing.
`Caption`	Optionally the caption to be displayed. Caption = NULL will remove it entirely. Caption = "" will provide an empty caption but retain the spacing.
`XTextSize`	Optionally the font size for the X axis labels to be displayed. The default is XTextSize = 12; must be numeric. Note that X and Y axis text are on different scales.
`YTextSize`	Optionally the font size for the Y axis labels to be displayed. The default is YTextSize = 3; must be numeric. Note that X and Y axis text are on different scales.
`TitleTextSize`	Optionally the font size for the Title to be displayed. The default is TitleTextSize = 14; must be numeric.
`SubTitleTextSize`	Optionally the font size for the SubTitle to be displayed. The default is SubTitleTextSize = 10; must be numeric.
`CaptionTextSize`	Optionally the font size for the Caption to be displayed. The default is CaptionTextSize = 8; must be numeric.
`TitleJustify`	Justification of title can be either a character "L", "R" or "C" or use the `hjust =` notation from `ggplot2` with a numeric value between '0' (left) and '1' (right).
`SubTitleJustify`	Justification of subtitle can be either a character "L", "R" or "C" or use the `hjust =` notation from `ggplot2` with a numeric value between '0' (left) and '1' (right).
`CaptionJustify`	Justification of caption can be either a character "L", "R" or "C" or use the `hjust =` notation from `ggplot2` with a numeric value between '0' (left) and '1' (right).
`LineThickness`	Optionally the thickness of the plotted lines that connect the data points. The default is LineThickness = 1; must be numeric.
`LineColor`	Optionally the color of the plotted lines. By default it will use the ggplot2 color palette for coloring by `Grouping`. The user may override with one valid color of their choice e.g. "black" (see colors() for choices) OR they may provide a vector of colors such as c("gray", "red", "green", "gray", "blue") OR a named vector like c("Green" = "gray", "Liberal" = "red", "NDP" = "green", "Others" = "gray", "PC" = "blue"). Any input must be character, and the length of a vector should equal the number of levels in `Grouping`. If the user does not provide enough colors they will be recycled.
`DataTextSize`	Optionally the font size of the plotted data points. The default is DataTextSize = 2.5; must be numeric.
`DataTextColor`	Optionally the font color of the plotted data points. The default is '"black"'; can be any of the 'colors()' or a hex value, e.g. "#FF00FF".
`DataLabelPadding`	Optionally the amount of space between the plotted data point numbers and the label "box". By default very small = 0.05 to avoid overlap. Must be a numeric. Too large a value will risk "hiding" datapoints.
`DataLabelLineSize`	Optionally how wide a line to plot around the data label box. By default = 0 to have no visible border line around the label. Must be a numeric.
`DataLabelFillColor`	Optionally the fill color or background of the plotted data points. The default is '"white"'; can be any of the 'colors()' or a hex value, e.g. "#FF00FF".
`WiderLabels`	logical, set this value to `TRUE` if your "labels" or `Grouping` variable values tend to be long as they are in the `newcancer` dataset. This setting will give them more room in the same plot size.
`ReverseYAxis`	logical, set this value to `TRUE` if you want to reverse the Y scale, especially useful for rankings when you want #1 on top.
`ReverseXAxis`	logical, set this value to `TRUE` if you want to reverse the factor levels on the X scale.
`RemoveMissing`	logical, by default set to `TRUE` so that if any `Measurement` is missing all rows for that `Grouping` are removed. If set to `FALSE` then the function will try to remove and graph what data it does have. N.B. missing values for `Times` and `Grouping` are never permitted and will generate a fatal error with a warning.
`ThemeChoice`	character, by default set to "bw" the other choices are "ipsum", "econ", "wsj", "gdocs", and "tufte".
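
The requirements on the input columns can be illustrated with a small made-up dataset. This is a sketch under stated assumptions: the data frame scores, its column names and the color vector team_colors are all hypothetical, and the final call is left commented out so that the block runs without the package installed.

```r
# Hypothetical long-format data: three groups measured on two dates
scores <- data.frame(
  When  = rep(as.Date(c("2020-01-01", "2021-01-01")), each = 3),
  Team  = rep(c("A", "B", "C"), times = 2),
  Score = c(10, 20, 30, 15, 18, 33)
)

# A Date column must be converted before use as Times
scores$When <- as.character(scores$When)

# Named color vector with one entry per level of the grouping column
team_colors <- c(A = "gray", B = "red", C = "gray")

str(scores)
# newggslopegraph(scores, When, Score, Team, LineColor = team_colors)
```

Each row holds one measurement for one group at one time, which is the same layout as the newcancer dataset.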

Value

a plot of type ggplot to the default plot device

Author(s)

Chuck Powell

References

Based on: Edward Tufte, Beautiful Evidence (2006), pages 174-176.

Examples

# the minimum command to generate a plot
newggslopegraph(newcancer, Year, Survival, Type)

# adding a title which is always recommended
newggslopegraph(newcancer, Year, Survival, Type,
  Title = "Estimates of Percent Survival Rates",
  SubTitle = NULL,
  Caption = NULL
)

# simple formatting changes
newggslopegraph(newcancer, Year, Survival, Type,
  Title = "Estimates of Percent Survival Rates",
  LineColor = "darkgray",
  LineThickness = .5,
  SubTitle = NULL,
  Caption = NULL
)

# complex formatting with recycling and wider labels see vignette for more examples
newggslopegraph(newcancer, Year, Survival, Type,
  Title = "Estimates of Percent Survival Rates",
  SubTitle = "Based on: Edward Tufte, Beautiful Evidence, 174, 176.",
  Caption = "https://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0003nk",
  LineColor = c("black", "red", "grey"),
  LineThickness = .5,
  WiderLabels = TRUE
)

# not a great example but demonstrating functionality
newgdp$rGDP <- round(newgdp$GDP)

newggslopegraph(newgdp,
  Year,
  rGDP,
  Country,
  LineColor = c(rep("grey", 3), "red", rep("grey", 11)),
  DataTextSize = 3,
  DataLabelFillColor = "gray",
  DataLabelPadding = .2,
  DataLabelLineSize = .5
)

Plotting random samples of confidence intervals around the mean

Description

This function takes some parameters and simulates random samples and their confidence intervals

Usage

OurConf(samples = 100, n = 30, mu = 0, sigma = 1, conf.level = 0.95)

Arguments

`samples`	The number of times to draw random samples
`n`	The sample size we draw each time
`mu`	The population mean mu
`sigma`	The population standard deviation
`conf.level`	What confidence level to compute 1 - alpha (significance level)
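
The simulation these arguments describe can be sketched in base R. The sketch assumes normally distributed samples and t-based intervals for the mean; whether OurConf computes its intervals in exactly this way is an assumption, and one_interval is a hypothetical helper.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
samples <- 100; n <- 30; mu <- 0; sigma <- 1; conf.level <- 0.95

# Draw one sample and return the confidence interval for its mean
one_interval <- function() {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- qt(1 - (1 - conf.level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

intervals <- as.data.frame(t(replicate(samples, one_interval())))
intervals$covers_mu <- intervals$lower <= mu & mu <= intervals$upper

# Share of intervals containing the population mean
mean(intervals$covers_mu)
```

Over many repetitions the share of intervals that contain mu should be close to conf.level, which is the point the plot is meant to make visually.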

Value

A ggplot2 object

Author(s)

Chuck Powell

Examples

OurConf(samples = 100, n = 30, mu = 0, sigma = 1, conf.level = 0.95)
OurConf(samples = 2, n = 5)
OurConf(samples = 25, n = 25, mu = 100, sigma = 20, conf.level = 0.99)

Plot a 2 Way ANOVA using dplyr and ggplot2

Description

Takes a formula and a dataframe as input, conducts an analysis of variance, prints the results (AOV summary table, table of overall model information and table of means), then uses ggplot2 to plot an interaction graph (line or bar). Also uses the Brown-Forsythe test for homogeneity of variance. Users can also choose to save the plot as a png file.

Usage

Plot2WayANOVA(formula,
               dataframe = NULL,
               confidence=.95,
               plottype = "line",
               errorbar.display = "CI",
               xlab = NULL,
               ylab = NULL,
               title = NULL,
               subtitle = NULL,
               interact.line.size = 2,
               ci.line.size = 1,
               mean.label = FALSE,
               mean.ci = TRUE,
               mean.size = 4,
               mean.shape = 23,
               mean.color = "darkred",
               mean.label.size = 3,
               mean.label.color = "black",
               offset.style = "none",
               overlay.type = NULL,
               posthoc.method = "scheffe",
               show.dots = FALSE,
               PlotSave = FALSE,
               ggtheme = ggplot2::theme_bw(),
               package = "RColorBrewer",
               palette = "Dark2",
               ggplot.component = NULL)

Arguments

`formula`	a formula with a numeric dependent (outcome) variable, and two independent (predictor) variables e.g. `mpg ~ am * vs`. The independent variables are coerced to factors (with warning) if possible.
`dataframe`	a dataframe or an object that can be coerced to a dataframe
`confidence`	what confidence level for confidence intervals
`plottype`	bar or line (quoted)
`errorbar.display`	default "CI" (confidence interval), which type of errorbar should be displayed around the mean point? Other options include "SEM" (standard error of the mean) and "SD" (standard dev). "none" removes it entirely much like `interaction.plot`
`xlab`, `ylab`	Labels for 'x' and 'y' axis variables. If 'NULL' (default), variable names for 'x' and 'y' will be used.
`title`	The text for the plot title. A generic default is provided.
`subtitle`	The text for the plot subtitle. If 'NULL' (default), key model information is provided as a subtitle.
`interact.line.size`	Line size for the line connecting the group means (Default: '2').
`ci.line.size`	Line size for the confidence interval bracketing the group means (Default: '1').
`mean.label`	Logical that decides whether the value of the group mean is to be displayed (Default: 'FALSE').
`mean.ci`	Logical that decides whether the confidence interval for group means is to be displayed (Default: 'TRUE').
`mean.size`	Point size for the data point corresponding to mean (Default: '4').
`mean.shape`	Shape of the plot symbol for the mean (Default: '23' which is a diamond).
`mean.color`	Color for the data point corresponding to mean (Default: '"darkred"').
`mean.label.size`, `mean.label.color`	Aesthetics for the label displaying mean. Defaults: '3', '"black"', respectively.
`offset.style`	A character string (e.g., '"wide"' or '"narrow"', or '"none"') which controls whether items are offset from the centerline for clarity. Useful when you want to add individual data points or when confidence interval lines overlap. (Default: '"none"').
`overlay.type`	A character string (e.g., '"box"' or '"violin"'), if you wish to overlay that information on factor1
`posthoc.method`	A character string, one of "hsd", "bonf", "lsd", "scheffe", "newmankeuls", defining the method for the pairwise comparisons. (Default: '"scheffe"').
`show.dots`	Logical that decides whether the individual data points are displayed (Default: 'FALSE').
`PlotSave`	a logical indicating whether the user wants to save the plot as a png file
`ggtheme`	A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages are allowed (e.g., hrbrthemes::theme_ipsum(), etc.).
`package`	Name of package from which the palette is desired as string or symbol.
`palette`	Name of palette as string or symbol.
`ggplot.component`	A ggplot component to be added to the plot prepared. The default is NULL. The argument should be entered as a function. For example, to change the size and color of the x axis text use: 'ggplot.component = theme(axis.text.x = element_text(size=13, color="darkred"))'. Depending on which theme is in use, the ggplot component might not work as expected.

Details

Details about how the function works in order of steps taken.

Some basic error checking to ensure a valid formula and dataframe. Only accepts fully *crossed* formula to check for interaction term
Ensure the dependent (outcome) variable is numeric and that the two independent (predictor) variables are or can be coerced to factors – user warned on the console
Remove missing cases – user warned on the console
Calculate a summarized table of means, sds, standard errors of the means, confidence intervals, and group sizes.
Use aov function to execute an Analysis of Variance (ANOVA)
Use sjstats::anova_stats to calculate eta squared and omega squared values per factor. If the design is unbalanced warn the user and use Type II sums of squares
Produce a standard ANOVA table with additional columns
Use the PostHocTest for producing a table of post hoc comparisons for all effects that were significant
Testing Homogeneity of Variance assumption with Brown-Forsythe test
Use the PostHocTest for conducting post hoc tests for effects that were significant
Use the shapiro.test for testing normality assumption with Shapiro-Wilk
Use ggplot2 to plot an interaction plot of the type the user specified.

The defaults are deliberately constructed to emphasize the nature of the interaction rather than focusing on distributions. So while a violin plot of the first factor by level is displayed along with dots for individual data points shaded by the second factor, the emphasis is on the interaction lines.
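
Several of the steps listed above can be approximated with base R alone. This is a rough sketch, not the function's own code: it assumes the Brown-Forsythe test is a one-way ANOVA on absolute deviations from the cell medians and that the Shapiro-Wilk test is applied to the model residuals. All object names are illustrative.

```r
d <- transform(mtcars, am = factor(am), vs = factor(vs))

# Table of cell means, standard deviations and group sizes
means_table <- aggregate(
  mpg ~ am + vs, data = d,
  FUN = function(v) c(mean = mean(v), sd = sd(v), n = length(v))
)

# Two-way ANOVA with a fully crossed formula
fit <- aov(mpg ~ am * vs, data = d)
summary(fit)

# Brown-Forsythe: ANOVA on absolute deviations from the cell medians
cell <- interaction(d$am, d$vs)
abs_dev <- abs(d$mpg - ave(d$mpg, cell, FUN = median))
bf_test <- anova(lm(abs_dev ~ cell))

# Shapiro-Wilk test of normality on the residuals
sw_test <- shapiro.test(residuals(fit))
```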

Value

A list with six elements, returned invisibly. The items are always printed to the console, but for convenience the function also returns them as a named list in case the user wishes to save or further process them: $ANOVATable, $ModelSummary, $MeansTable, $PosthocTable, $BFTest, and $SWTest. The plot is always sent to the default plot device.

Author(s)

Chuck Powell

References

ANOVA: Delacre, Leys, Mora, & Lakens, *PsyArXiv*, 2018

Examples


Plot2WayANOVA(mpg ~ am * cyl, mtcars, plottype = "line")
Plot2WayANOVA(mpg ~ am * cyl,
  mtcars,
  plottype = "line",
  overlay.type = "box",
  mean.label = TRUE
)

library(ggplot2)
Plot2WayANOVA(mpg ~ am * vs, 
  mtcars, 
  confidence = .99,
  ggplot.component = theme(axis.text.x = element_text(size=13, color="darkred")))
  

Plot a Cross Tabulation of two variables using dplyr and ggplot2

Description

Takes a dataframe and at least two variables as input, conducts a crosstabulation of the variables using dplyr. Removes NAs and then plots the results as one of three types of bar (column) graphs using ggplot2. The function accepts either bare variable names or column numbers as input (see examples for the possibilities)

Usage

PlotXTabs(dataframe, xwhich, ywhich, plottype = "side")

Arguments

`dataframe`	an object that is of class dataframe
`xwhich`	either a bare variable name that is valid in the dataframe or one or more column numbers. An attempt will be made to coerce the variable to a factor but odd plots will occur if you pass it a variable that is by rights continuous in nature.
`ywhich`	either a bare variable name that is valid in the dataframe or one or more column numbers that exist in the dataframe. An attempt will be made to coerce the variable to a factor but odd plots will occur if you pass it a variable that is by rights continuous in nature.
`plottype`	one of three options "side", "stack" or "percent"
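
The cross tabulation that underlies these plots can be reproduced in base R. This sketch uses table and prop.table rather than dplyr, and it assumes that "percent" means the share of each ywhich level within each xwhich level; that reading is an assumption, not something stated above.

```r
# Counts for each combination of am (rows) and vs (columns)
counts <- table(am = mtcars$am, vs = mtcars$vs)
counts

# Row percentages: distribution of vs within each level of am
percent <- prop.table(counts, margin = 1) * 100
round(percent, 1)
```

The counts correspond to what a "side" or "stack" chart would display, while the row percentages correspond to a chart in which every bar sums to 100.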

Value

One or more ggplots to the default graphics device as well as advisory information in the console

Author(s)

Chuck Powell

Examples

PlotXTabs(mtcars, am, vs)
PlotXTabs(mtcars, am, vs, "stack")
PlotXTabs(mtcars, am, vs, "percent")
PlotXTabs(mtcars, am, 8, "side")
PlotXTabs(mtcars, 8, am, "stack")
PlotXTabs(mtcars, am, c(8, 10), "percent")
PlotXTabs(mtcars, c(10, 8), am)
PlotXTabs(mtcars, c(2, 9), c(10, 8), "mispelled")
## Not run: 
PlotXTabs(happy, happy, sex) # baseline
PlotXTabs(happy, 2, 5, "stack") # same thing using column numbers
PlotXTabs(happy, 2, c(5:9), plottype = "percent") # multiple columns RHS
PlotXTabs(happy, c(2, 5), 9, plottype = "side") # multiple columns LHS
PlotXTabs(happy, c(2, 5), c(6:9), plottype = "percent")
PlotXTabs(happy, happy, c(6, 7, 9), plottype = "percent")
PlotXTabs(happy, c(6, 7, 9), happy, plottype = "percent")

## End(Not run)


Bivariate bar (column) charts with statistical tests

Description

Bivariate bar charts for nominal and ordinal data with (optionally) statistical details included in the plot as a subtitle.
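
To make the statistical side concrete, the sketch below runs a test of independence on data that are stored as counts. It assumes a Pearson chi-squared test, which may or may not be what the function reports in its subtitle, and it uses only base R and the built-in HairEyeColor table.

```r
# HairEyeColor as a data frame of counts: columns Hair, Eye, Sex, Freq
hec <- as.data.frame(HairEyeColor)

# Collapse over Sex to a two-way table of hair color by eye color
hair_eye <- xtabs(Freq ~ Hair + Eye, data = hec)

# Pearson chi-squared test of independence between the two variables
indep <- chisq.test(hair_eye)
indep
```

When the data arrive in this form, the Freq column plays the role that the counts argument describes: each row stands for many observations rather than one.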

Usage

PlotXTabs2(
  data,
  x,
  y,
  counts = NULL,
  results.subtitle = TRUE,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  plottype = "percent",
  xlab = NULL,
  ylab = "Percent",
  legend.title = NULL,
  legend.position = "right",
  labels.legend = NULL,
  sample.size.label = TRUE,
  data.label = "percentage",
  label.text.size = 4,
  label.fill.color = "white",
  label.fill.alpha = 1,
  bar.outline.color = "black",
  x.axis.orientation = NULL,
  conf.level = 0.95,
  k = 2,
  perc.k = 0,
  mosaic.offset = 0.003,
  mosaic.alpha = 1,
  bf.details = FALSE,
  bf.display = "regular",
  sampling.plan = "jointMulti",
  fixed.margin = "rows",
  prior.concentration = 1,
  paired = FALSE,
  ggtheme = ggplot2::theme_bw(),
  package = "RColorBrewer",
  palette = "Dark2",
  direction = 1,
  ggplot.component = NULL
)

Arguments

`data`	A dataframe or tibble containing the 'x' and 'y' variables.
`x`	The variable to plot on the X axis of the chart.
`y`	The variable to segment the columns and test for independence.
`counts`	If the dataframe is based upon counts rather than individual rows for observations, 'counts' must contain the name of variable that contains the counts. See 'HairEyeColor' example.
`results.subtitle`	Decides whether the results of statistical tests are displayed as a subtitle (Default: TRUE). If set to FALSE, no subtitle.
`title`	The text for the plot title.
`subtitle`	The text for the plot subtitle. N.B. if statistical results are requested through 'results.subtitle = TRUE', the results take precedence over this text.
`caption`	The text for the plot caption. Please note the interaction with 'bf.details'.
`plottype`	One of four options: "side", "stack", "mosaic" or "percent" (Default: "percent").
`xlab`	Custom text for the 'x' axis label (Default: 'NULL', which will cause the 'x' axis label to be the 'x' variable).
`ylab`	Custom text for the 'y' axis label (Default: '"Percent"'). Set to 'NULL' for no label.
`legend.title`	Title text for the legend.
`legend.position`	The position of the legend '"none"', '"left"', '"right"', '"bottom"', '"top"' (Default: '"right"').
`labels.legend`	A character vector with custom labels for levels of the 'y' variable displayed in the legend.
`sample.size.label`	Logical that decides whether sample size information should be displayed for each level of the grouping variable 'y' (Default: 'TRUE').
`data.label`	Character decides what information needs to be displayed on the label in each bar segment. Possible options are '"percentage"' (default), '"counts"', '"both"'.
`label.text.size`	Numeric that decides size for bar labels (Default: '4').
`label.fill.color`	Character that specifies fill color for bar labels (Default: 'white').
`label.fill.alpha`	Numeric that specifies fill color transparency or '"alpha"' for bar labels (Default: '1' range '0' to '1').
`bar.outline.color`	Character specifying the outline color for bars (Default: '"black"').
`x.axis.orientation`	The orientation of the 'x' axis labels one of "slant" or "vertical" to change from the default horizontal orientation (Default: 'NULL' which is horizontal).
`conf.level`	Scalar between 0 and 1 giving the confidence level used for the lower and upper confidence limits (Default: '0.95').
`k`	Number of digits after decimal point (should be an integer) (Default: k = 2) for statistical results.
`perc.k`	Numeric that decides number of decimal places for percentage labels (Default: '0').
`mosaic.offset`	Numeric that decides size of spacing between mosaic blocks (Default: '.003', which is very narrow). "Reasonable" values probably lie between .001 and .05.
`mosaic.alpha`	Numeric that controls the "alpha" level of the mosaic plot blocks (Default: '1' which is essentially no "fading"). Values must be in the range 0 to 1 see: 'ggplot2::aes_colour_fill_alpha'
`bf.details`	Logical that decides whether to display additional information from the Bayes Factor test in the caption (default:'FALSE'). This will take precedence over any text you enter as a 'caption'.
`bf.display`	Character that determines how the Bayes factor value is displayed. The default, "regular", is simply the number rounded to 'k'. Other options include "sensible", "log" and "support".
`sampling.plan`	the sampling plan (see details in ?contingencyTableBF).
`fixed.margin`	(see details in ?contingencyTableBF).
`prior.concentration`	(see details in ?contingencyTableBF).
`paired`	Not used yet.
`ggtheme`	A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages are allowed (e.g., hrbrthemes::theme_ipsum(), etc.).
`package`	Name of package from which the palette is desired as string or symbol.
`palette`	Name of palette as string or symbol.
`direction`	Either '1' or '-1'. If '-1' the palette will be reversed.
`ggplot.component`	A ggplot component to be added to the plot prepared by ggstatsplot. Default is NULL. The argument should be entered as a function. If the given function has an argument axes.range.restrict and if it has been set to TRUE, the added ggplot component might not work as expected.
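The 'counts' and 'perc.k' arguments are easier to follow with the data laid out both ways. The following is a minimal base R sketch, not the function's own code: it assumes that the percent labels are computed within each level of 'x', and it uses `chisq.test` purely as a stand-in for the independence test mentioned under 'y'. The object names (`hec`, `tab`, `long`, `pct`, `indep`) are illustrative only.

```r
# Count-based data: one row per Hair/Eye/Sex combination plus a Freq column
hec <- as.data.frame(HairEyeColor)

# The two-way table implied by x = Hair, y = Eye, counts = Freq
tab <- xtabs(Freq ~ Eye + Hair, data = hec)

# Equivalent "one row per observation" layout: repeat each row Freq times
long <- hec[rep(seq_len(nrow(hec)), times = hec$Freq), c("Hair", "Eye")]

# Percent of each Eye level within each Hair level, rounded as with perc.k = 0
# (assumption: percentages are taken within levels of x)
pct <- round(100 * prop.table(tab, margin = 2), digits = 0)

# A base R test of independence on the same table (stand-in, see lead-in)
indep <- chisq.test(tab)

print(pct)
print(indep)
```

Both layouts describe the same table, which is why a count-based dataframe only needs the extra 'counts' argument rather than reshaping.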

Author(s)

Chuck Powell, Indrajeet Patil

Examples


# for reproducibility
set.seed(123)

# simplest possible call with the defaults
PlotXTabs2(
  data = mtcars,
  y = vs,
  x =  cyl
)  

# more complex call
PlotXTabs2(
  data = datasets::mtcars,
  y = vs,
  x = cyl,
  bf.details = TRUE,
  labels.legend = c("0 = V-shaped", "1 = straight"),
  legend.title = "Engine Style",
  legend.position = "right",
  title = "The perennial mtcars example",
  palette = "Pastel1"
)

PlotXTabs2(
  data = as.data.frame(HairEyeColor),
  y = Eye,
  x = Hair,
  counts = Freq
)

## Not run: 
# mosaic plot requires ggmosaic 0.2.2 or higher from github
PlotXTabs2(
  data = mtcars,
  x = vs,
  y =  am, 
  plottype = "mosaic", 
  data.label = "both", 
  mosaic.alpha = .9, 
  bf.display = "support", 
  title = "Motorcars Mosaic Plot VS by AM"
)

## End(Not run)

SeeDist – See The Distribution

Description

This function takes a vector of numeric data and returns one or more ggplot2 plots that help you visualize the data. It is meant to be a useful wrapper for exploring univariate data. It has a plethora of options, including the type of visualization (histogram, boxplot, density, violin) as well as commonly desired overplots such as mean and median points and z and t curves. Common descriptive statistics are provided as a subtitle if desired and are sent to the console as well.

Usage

SeeDist(
  x,
  title = "Default",
  subtitle = "Default",
  numbins = 0,
  xlab = NULL,
  var_explain = NULL,
  data.fill.color = "deepskyblue",
  mean.line.color = "darkgreen",
  median.line.color = "yellow",
  mode.line.color = "orange",
  mean.line.type = "longdash",
  median.line.type = "dashed",
  mode.line.type = "dashed",
  mean.line.size = 1.5,
  median.line.size = 1.5,
  mean.point.shape = 21,
  median.point.shape = 23,
  mean.point.size = 4,
  median.point.size = 4,
  zcurve.color = "red",
  zcurve.type = "twodash",
  zcurve.size = 1,
  tcurve.color = "black",
  tcurve.type = "dotted",
  tcurve.size = 1,
  mode.line.size = 1,
  whatplots = c("d", "b", "h", "v"),
  k = 2,
  add_jitter = TRUE,
  add_rug = TRUE,
  xlim_left = NULL,
  xlim_right = NULL,
  ggtheme = ggplot2::theme_bw()
)

Arguments

`x`	the data to be visualized. Must be numeric.
`title`	Optionally replace the default title displayed. title = NULL will remove it entirely. title = "" will provide an empty title but retain the spacing. A sensible default is provided otherwise.
`subtitle`	Optionally replace the default subtitle displayed. subtitle = NULL will remove it entirely. subtitle = "" will provide an empty subtitle but retain the spacing. A sensible default is provided otherwise.
`numbins`	the number of bins to use for any plots that bin. If left at the default of '0', the function calculates a reasonable number of bins using the Freedman-Diaconis rule via the `nclass.FD` function.
`xlab`	Custom text for the 'x' axis label (Default: 'NULL', which will cause the 'x' axis label to be the 'x' variable).
`var_explain`	additional contextual information about the variable as a string such as "Miles Per Gallon" which is appended to the default title information.
`data.fill.color`	Character string that specifies fill color for the main data area (Default: 'deepskyblue').
`mean.line.color`, `median.line.color`, `mode.line.color`	Character string that specifies line color (Default: 'darkgreen', 'yellow', 'orange').
`mean.line.type`, `median.line.type`, `mode.line.type`	Character string that specifies line type (Default: 'longdash', 'dashed', 'dashed').
`mean.line.size`, `median.line.size`, `mode.line.size`	Numeric that specifies line size (Default: '1.5', '1.5', '1'). You can set to '0' to make any of the lines "disappear".
`mean.point.shape`, `median.point.shape`	Integer in 0 - 25 specifies shape of mean or median point mark on the violin plot (Default: '21', '23').
`mean.point.size`, `median.point.size`	Integer specifies size of mean or median point mark on the violin plot (Default: '4'). You can set to '0' to make any of the points "disappear".
`zcurve.color`, `tcurve.color`	Character string that specifies line color (Default: 'red', 'black').
`zcurve.type`, `tcurve.type`	Character string that specifies line type (Default: 'twodash', 'dotted').
`zcurve.size`, `tcurve.size`	Numeric that specifies line size (Default: '1'). You can set to '0' to make any of the lines "disappear".
`whatplots`	what type of plots? The default is whatplots = c("d", "b", "h", "v") for a density, a boxplot, a histogram, and a violin plot
`k`	Number of digits after decimal point (should be an integer) (Default: k = 2) for statistical results.
`add_jitter`	Logical (Default: 'TRUE') controls whether jittered data points are added to the violin plot.
`add_rug`	Logical (Default: 'TRUE') controls whether "rug" data points are added to density plot and histogram.
`xlim_left`, `xlim_right`	Numeric (Default: 'NULL'). For density plots, can be used to override the default limits, which are 3 standard deviations left and right of the mean of x. Useful for theoretical reasons (for example, horsepower cannot be less than 0) or when 'ggplot2' warns you that it has removed rows containing non-finite values (stat_density).
`ggtheme`	A function, ggplot2 theme name. Default value is ggplot2::theme_bw(). Any of the ggplot2 themes, or themes from extension packages are allowed (e.g., hrbrthemes::theme_ipsum(), etc.).
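The defaults described for 'numbins' and for 'xlim_left' / 'xlim_right' can be previewed in base R before plotting. This is a minimal sketch under the assumption that the function applies `nclass.FD` and the mean plus or minus 3 standard deviations directly to 'x'; the variable names (`bins_fd`, `window`) are illustrative and not part of the package.

```r
x <- mtcars$hp

# Freedman-Diaconis bin count from base R's grDevices package
bins_fd <- grDevices::nclass.FD(x)

# Default density window as described: 3 standard deviations either side of the mean
window <- mean(x) + c(-3, 3) * sd(x)

print(bins_fd)
print(window)

# The left edge falls below zero for horsepower, which is the kind of case
# where setting xlim_left = 0 would be the natural override
window[1] < 0
```

When the lower limit lands on an impossible value like this, passing explicit limits is the remedy the argument description suggests.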

Value

From 1 to 4 plots, depending on what the user specifies in 'whatplots', as well as an extensive summary courtesy of 'DescTools::Desc' printed to the console.

Warning

If the data has more than 3 modal values only the first three of them are plotted. The rest are ignored and the user is warned on the console.

Missing values are removed with a warning to the user.
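To make the warning concrete, here is a small base R sketch of data with more than three modal values. The helper `modes_of` is hypothetical and written only for this illustration; it is not the package's implementation, and it drops missing values silently rather than warning.

```r
# Hypothetical helper for illustration only
modes_of <- function(x) {
  x <- x[!is.na(x)]                  # drop missing values first
  counts <- table(x)                 # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

x <- c(1, 2, 2, 3, 3, 4, 4, 5, 5, NA)

all_modes <- modes_of(x)             # four values tie for most frequent
plotted <- head(all_modes, 3)        # only the first three would be drawn

print(all_modes)
print(plotted)
```

Here the values 2, 3, 4 and 5 each occur twice, so a plot limited to three mode lines would omit the last one.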

Author(s)

Chuck Powell

Examples

SeeDist(rnorm(100, mean = 100, sd = 20), numbins = 15, var_explain = "A Random Sample")
SeeDist(mtcars$hp, var_explain = "Horsepower", whatplots = c("d", "b"))
SeeDist(iris$Sepal.Length, var_explain = "Sepal Length", whatplots = "d")

U.S. 2000 Election Data (short)

Description

Data from a post-election survey following the year 2000 U.S. presidential elections. This is a subset from package 'CHAID'.

Usage

USvoteS

Format

A data frame with 1000 observations on the following 6 variables:

vote3: candidate voted for, Gore or Bush
gender: gender, a factor with levels male and female
ager: age group, an ordered factor with levels 18-24 < 25-34 < 35-44 < 45-54 < 55-64 < 65+
empstat: status of employment, a factor with levels yes, no or retired
educr: status of education, an ordered factor with levels <HS < HS < >HS < College < Post Coll
marstat: status of living situation, a factor with levels married, widowed, divorced or never married
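The "<" notation in the level lists above marks an ordered factor, as opposed to a plain factor such as gender. A minimal sketch with made-up values (not rows from the dataset) shows the difference in base R:

```r
# Hypothetical values; level names follow the Format description
ager <- factor(
  c("25-34", "65+", "18-24", "45-54"),
  levels = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  ordered = TRUE
)
gender <- factor(c("male", "female", "female", "male"),
                 levels = c("male", "female"))

# Ordered factors support comparisons against a level; plain factors do not
younger <- ager < "35-44"

print(is.ordered(ager))
print(is.ordered(gender))
print(younger)
```

Comparisons like this only make sense for ager and educr, the two variables described as ordered.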

Source

https://r-forge.r-project.org/R/?group_id=343
