We have information on perch, a type of fish, caught in a lake in Finland. For each of the 56 fish caught, we have its weight (in grams), length (in cm), and width (in cm). Create a model that uses these variables to predict the weight of perch caught at Lake Laengelmavesi in Finland.
First, let’s calculate the pairwise correlations among the three variables using a correlation matrix, and view the same relationships in a scatterplot matrix.
cor(Perch[, c(2:4)])
##           Weight    Length     Width
## Weight 1.0000000 0.9595061 0.9642244
## Length 0.9595061 1.0000000 0.9751074
## Width  0.9642244 0.9751074 1.0000000
plot(Perch[, c(2:4)])
Both width and length are highly correlated with weight (r ≈ 0.96 for length and for width); however, the scatterplot matrix shows that these relationships are curved rather than linear.
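A high correlation coefficient does not by itself rule out curvature. The sketch below illustrates this with made-up data, not the Perch measurements: the data frame `toy` and its columns `length_cm` and `weight_g` are hypothetical, and weight is assumed to grow with the cube of length purely for illustration.

```r
# Toy data (NOT the Perch measurements): weight grows with the cube of length.
toy <- data.frame(length_cm = 1:20)
toy$weight_g <- toy$length_cm^3

# Pearson correlation is high even though the relationship is curved.
r_linear <- cor(toy$length_cm, toy$weight_g)

# Fit a straight line and pull out the residuals in x order.
fit <- lm(weight_g ~ length_cm, data = toy)
res <- unname(residuals(fit))

# Curvature shows up as a sign pattern in the residuals:
# positive at both ends of the range, negative in the middle.
print(r_linear)
print(sign(res[c(1, 10, 20)]))
```

A residual pattern like this is one way to confirm what the eye picks up in a scatterplot: the straight line misses systematically, even when the correlation is above 0.9.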
library(ggplot2)    # ggplot(), geom_point(), labs()
library(gridExtra)  # grid.arrange()
# Plot Weight against each predictor and against its square
p1 <- ggplot(Perch) + geom_point(aes(x = Length, y = Weight)) +
  labs(x = "Length", y = "Weight", title = "Scatterplot: Weight vs Length")
p2 <- ggplot(Perch) + geom_point(aes(x = Length^2, y = Weight)) +
  labs(x = "Length^2", y = "Weight", title = "Scatterplot: Weight vs Length^2")
p3 <- ggplot(Perch) + geom_point(aes(x = Width, y = Weight)) +
  labs(x = "Width", y = "Weight", title = "Scatterplot: Weight vs Width")
p4 <- ggplot(Perch) + geom_point(aes(x = Width^2, y = Weight)) +
  labs(x = "Width^2", y = "Weight", title = "Scatterplot: Weight vs Width^2")
# Arrange the four scatterplots in a single grid
grid.arrange(p1, p2, p3, p4)
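The visual comparison between a predictor and its square can also be put in numbers by computing the correlation with the response for each version. The following is a minimal sketch on made-up data, not the Perch measurements: `toy`, `length_cm`, and `weight_g` are hypothetical names, and the cubic relationship is an assumption chosen only to produce a curved pattern.

```r
# Toy data (NOT the Perch measurements): weight grows with the cube of length.
toy <- data.frame(length_cm = 1:20)
toy$weight_g <- toy$length_cm^3

# Correlation of the response with the raw predictor and with its square.
r_raw     <- cor(toy$length_cm,   toy$weight_g)
r_sq_term <- cor(toy$length_cm^2, toy$weight_g)

# A larger value for the squared term suggests that it tracks
# the response more linearly than the raw predictor does.
print(c(raw = r_raw, squared_term = r_sq_term))
```

The same comparison could be run on the real data by swapping in `Perch$Length`, `Perch$Width`, and `Perch$Weight`, assuming the column names shown in the correlation matrix.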