A histogram represents the frequency distribution of a numerical variable in a sample. Let's see how to make a basic histogram using the age data from the Titanic data set. Make sure you have loaded the data (using read.csv) into a data frame called titanicData.

    titanicData <- read.csv("DataForLabs/titanic.csv", stringsAsFactors = TRUE)

Here's the code to make a simple histogram of age:

    ggplot(titanicData, aes(x=age)) + geom_histogram()
    # `stat_bin()` using `bins = 30`.
    # Warning: Removed 680 rows containing non-finite values (stat_bin).

Notice that there are two functions called here, put together in a single command with a plus sign. The first function is ggplot(), and it has two input arguments. Listed first is titanicData; this is the name of the data frame containing the variables that we want to graph. The second input to ggplot is an aes() function. In this case, the aes() function tells R that we want age to be the x-variable (i.e. the variable that is displayed along the x-axis). (The aes stands for "aesthetics," but if you're like us this won't help you remember it any better.) The second function in this command is geom_histogram(). This is the part that tells R that the "geometry" of our plot should be a histogram. The warning appears because 680 rows have no recorded age, so they are left out of the plot.

This is not the most beautiful graph in the world, but it conveys the information. At the end of this lab we'll see a couple of options that can make a ggplot graph look a little better.

A boxplot is a convenient way of showing the frequency distribution of a numerical variable in multiple groups. Here's the code to draw a boxplot for age in the titanic data set, separately for each sex:

    ggplot(titanicData, aes(x=sex, y=age)) + geom_boxplot()
    # Warning: Removed 680 rows containing non-finite values (stat_boxplot).

Notice that the y variable here is age, and x is the categorical variable sex that winds up on the x-axis. See the result below, and look at where the variables are. The other new feature here is the new geom function, geom_boxplot().

Here the thick bar in the middle of each boxplot is the median of that group. The upper and lower bounds of the box extend from the first to the third quartile. (The "first quartile" is the 25th percentile of the data: the value which is bigger than 25% of the other values. The "third quartile" is the 75th percentile: the value bigger than 3/4 of the other values.) The vertical lines are called whiskers, and they cover most of the range of the data. The exception is data points that lie far from the median (see text); these are plotted as individual dots, as on the male boxplot.

The code we have listed here for graphics barely scratches the surface of what ggplot, and R as a whole, are capable of. Not only are there far more choices about the kinds of plots available, but there are many, many options for customizing the look and feel of each graph. You can choose the font, the font size, the colors, the style of the axis labels, etc., and you can customize the legends and axes nearly as much as you want.

Let's dig a little deeper into just a couple of options that you can add to any of the foregoing graphs to make them look a little better. For example, you can change the text of the x-axis label or the y-axis label by using xlab or ylab. Let's do that for the scatterplot, to make the labels a little nicer to read for humans.

    ggplot(guppyFatherSonData, aes(x = fatherOrnamentation, y = sonAttractiveness)) +

The labels that we want to add are included in quotes inside the xlab and ylab functions.

It can also be nice to remove the default gray background, to make what some feel is a cleaner graph. Try adding a theme such as theme_classic() to the end of one of your lines of code making a graph, to see whether you prefer the result to the default design.

1. Make a script in RStudio that collects all your R code required to answer the following questions. Include answers to the qualitative questions using comments.
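As a recap, the plotting commands covered in this lab can be collected into one commented script. The following is only a minimal sketch: toyData is a hypothetical, made-up data frame standing in for titanicData so that the code runs without the lab's CSV file, and the axis label text, the binwidth value, and the choice of theme_classic() are illustrative assumptions rather than settings taken from the lab.

```r
library(ggplot2)

# Hypothetical stand-in for titanicData (made-up values, one missing age)
toyData <- data.frame(
  sex = factor(c("female", "female", "female", "female",
                 "male", "male", "male", "male")),
  age = c(22, 35, NA, 48, 19, 30, 41, 64)
)

# Histogram of age. Setting binwidth explicitly avoids the "bins = 30"
# message, and na.rm = TRUE drops the missing age without a warning.
histPlot <- ggplot(toyData, aes(x = age)) +
  geom_histogram(binwidth = 10, na.rm = TRUE)

# Boxplot of age for each sex, with friendlier axis labels and a
# plain background instead of the default gray one.
boxPlot <- ggplot(toyData, aes(x = sex, y = age)) +
  geom_boxplot(na.rm = TRUE) +
  xlab("Sex") +
  ylab("Age (years)") +
  theme_classic()

# First quartile, median, and third quartile of age across all rows,
# i.e. the kind of summary a boxplot's box and middle bar represent.
ageQuartiles <- quantile(toyData$age, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

# Inside a script, plots are only drawn when printed explicitly.
print(histPlot)
print(boxPlot)
print(ageQuartiles)
```

To use this with the real data, replace toyData with titanicData after loading it with read.csv, and pick a binwidth that suits the range of ages.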