1  Introduction

2 Beginners’ guide to R

It is often said that the majority of time is spent on data cleaning and 20% on analysis. A common trap when one starts using R is that the library have not been installed or the data are in different folder to the working directory. Installing library is easy with the command install.packages. The command getwd() will tell you which directory you are in.

Sometimes, the user encounter issues with R. This is not an uncommon problem and often not due to serious errors but forgetting that R is case-sensitive. Also when copying codes from the net across, R does not recognise the “inverted comma” but has its own style of “inverted comma”. Unlike Python, R does not tolerate space between variables. R also does not like variables to be named starting with a number.

R treats the variables in certain ways depend on their class. Some function requires that the variables are numeric. The Breast Cancer data from mlbench has 11 columns with the 9 covariates being factors and ordered factors. These issues will be dealt further in the chapter on data wrangling. One way to get an overview of the structure of the data is to use the glimpse function from dplyr.

Another issue is that some libraries have assign certain names to their function and which is also used by other libraries. In the course of opening multiple libraries, R may be confused and use the last library open and which may lead to error result. We will illustrate this with the forest function in the Statistics chapter.

A selling point of R is that there is a large community of users who often post their solutions online. it’s worth spending time on stack overflow to look at similar problems that others have and the approaches to solving them. A clue to pose questions is to look at the error. If you wish to post questions for others to help then it’s encouraged that the question include some data so that the person providing the solution can understand the nature of the errors. Useful blogs are also available at https://www.r-bloggers.com/.

This chapter focuss on the ggplot2 and its related libraries for plotting (Wickham 2016). Base R also has plotting function but lacks the flexibility of ggplot2. Plotting is introduced early to enage clinicians who may not have the patience read the following chapter on data wrangling prior. The book is not intended to be used in a sequential fashion as the reader may find elements of this chapter relevant to them and jump to another chapter such as chapter 3 on statistics.

2.1 Plot using base R

Below we illustrate bar plot using base R.

data("Leukemia", package="Stat2Data")
#AML dataset-treatment
colnames(Leukemia)
[1] "Age"    "Smear"  "Infil"  "Index"  "Blasts" "Temp"   "Resp"   "Time"  
[9] "Status"
#base R
hist(Leukemia$Age, 8, xlab = "Age",main="Acute Myeloid Leukemia Dataset")

Line plot can be performed in base R using abline argument. The font is defined by cex argument and colour defined by col argument. The title is defined by main argument. The X and Y axes can be labelled using xlab and ylab arguments within the plot function. Below we illustrate example of drawing a line to a point and in the segment below we illustrate example of drawing a line.

#set parameters of plot
X=seq(0,1,by=.1) #define a set of point from 0 to 1, separated by 0.1.
Y=seq(0,1,by=.1)

#define A and B
A=.065; B=.44

#location of point
plot(X,Y, main="ROC curve")
points(A,B,pch=8,col="red",cex=2) #add point
abline(coef = c(0,1)) #add diagonal line

#draw line to a point
segments(x0=0,y0=0,x1=A,y1=B,col="blue")
segments(x0=A,y0=B,x1=1,y=1,col="blue")

This is an illustration of using base R to plot regression line. Later, we will illustrate using geom_smooth call from ggplot2.

TPR_1=.44; FPR_1=.065

plot(X,Y,main="Likelihood ratio graph", xlab="1-Specificity",ylab="Sensitivity",cex=.25)
  points(.02,.8,pch=8,col="red",cex=2) #add point
  df1<-data.frame(c1=c(0,TPR_1),c2=c(0,FPR_1))
  reg1<-lm(c1~c2,data=df1)
  df2<-data.frame(c1=c(TPR_1,1),c2=c(FPR_1,1))
  reg2<-lm(c1~c2,data=df2)
  abline(reg1) #draw line using coefficient reg1
  abline(reg2) #draw line using coefficient reg2
  text(x=FPR_1,y=TPR_1+.3,label="Superior",cex=.7)
  text(x=FPR_1+.2,y=TPR_1+.2,label="Absence",cex=.7)
  text(x=.0125,y=TPR_1-.1,label="Presence",cex=.7)
  text(x=FPR_1+.1,y=TPR_1,label="Inferior",cex=.7)

Function can be plotted using variations of the above. This requires a formula to describe variable Y. Shading in base R is performed with the polygon function.

AUC_Logistic<-function (A,B,C,D){
  #binary data
  #A=True pos %B=False positive   %C=False negative   %D=True negative
  
  TPR=A/(A+C)
  FPR=1-(D/(D+B))
  
  #binomial distribution sqrt(np(1-p))
  #Statist. Med. 2002; 21:1237-1256 (DOI: 10.1002/sim.1099)
  STDa=sqrt((A+C)*TPR*(1-TPR));
  STDn=sqrt((B+D)*FPR*(1-FPR));
  a=STDn/STDa;
  theta=log((TPR/(1-TPR))/((FPR/(1-FPR))^a));
  
  #define a set of point from 0 to 1, separated by 0.001.
  X=seq(0,1,by=0.001)  
  
  #logistic regression model
  Y1=(X^a)/(X^a+(((1-X)^a)*exp(-1*theta)));
  AUC=round(pracma::trapz(X,Y1),2)
  AUC

#SE using Hanley & McNeil
#Preferred method if more than one TPR,FPR data point known
#Hanley is less conservative than Bamber
  Nn=B+D;
  Na=A+C;
  Q1=AUC/(2-AUC);
  Q2=2*AUC^2/(1+AUC);
  
SEhanley=sqrt(((AUC*(1-AUC))+((Na-1)*(Q1-AUC^2))+((Nn-1)*(Q2-AUC^2)))/(Na*Nn))

#SE using Bamber
#Ns is the number of patients in the smallest group
if (A+C>B+D) {
  Ns=B+D
  } else {
   Ns=A+C
  }
SEbamber=sqrt((AUC*(1-AUC))/(Ns-1))

# plot smoothed ROC
plot(X,Y1,main="ROC curve", xlab="1-Specificity",ylab="Sensitivity",cex=.25)
points(FPR,TPR,pch=8,col="red",cex=2) #add point

Y2=0
polygon(c(X[X >= 0 & X <= 1],
          rev(X[X >= 0 & X <= 1])),
        c(Y1[X >= 0 & X <= 1],
          rev(Y2[X >= 0 & X <= 1])),
        col = "#6BD7AF")


print(paste("The Area under the ROC curve using the logistic function is", 
            AUC,". The Area under the ROC curve using rank sum method is", round(.5*(TPR+(1-FPR)),2)))
}

AUC_Logistic(10,20,2,68)

[1] "The Area under the ROC curve using the logistic function is 0.84 . The Area under the ROC curve using rank sum method is 0.8"

2.2 ggplot2

The plot below uses ggplot2 or grammar of graphics. The plot is built layer by layer like constructing a sentence. Plotting is a distinct advantage of R over commercial software with GUI (graphical user interface) like SPSS. A wide variety of media organisations (BBC, Economist) are using ggplot2 with their own stylised theme. The plot has a certain structure such the name of the data and aesthetics for x and y axes. For illustration purpose the aesthetics are labelled with x and y statements. The fill argument in the aesthetics indicate the variables for coloring.

There are different ways to use ggplot2: quick plot or qplot with limited options and full ggplot2 with all the options. The choice of the method depends on individual preference and as well as reason for plotting. This function qplot has now been deprecated to encourage users to learn ggplot.

To run the examples, check that you have install the libraries. an error can occurred if you don’t have the required library. The meta-character # is used to signal that the line is meant to comment the code ie R will not read it. The install.packages command only need to be run once.

2.2.1 Histogram

The flexibility of ggplot2 is shown here in this histogram. The legend can be altered using the scale_fill_manual function. If other colours are preferred then under values add the preferred colours.

library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
#qplot
qplot(Age, data=Leukemia, bins=8)
Warning: `qplot()` was deprecated in ggplot2 3.4.0.

Now a more complex version of histogram using ggplot with color added.

#ggplot2
ggplot(data=Leukemia,aes(Age,fill=as.factor(Resp)))+
  geom_histogram(bins=8)

Note the legend is a bit untidy. The legend can be changed using scale_fill_manual. The color can be specified using rgb argument. This is important as some journals prefer certain color.

#adding legend, changing the values 0 and 1 to treatment response
ggplot(data=Leukemia,aes(Age,fill=as.factor(Resp)))+
  geom_histogram(bins=8)+
  scale_fill_manual(name="Response",values=c("#999999","#56B4E9"),
                    breaks=c(0,1),labels=c("No Treatment","Treatment"))

Adding title is easy with ggtitle.

#adding title
ggplot(data=Leukemia,aes(Age,fill=as.factor(Resp)))+
  geom_histogram(bins=8)+
  scale_fill_manual(name="Response",
                    values=c("#555555","#56B4E9"),
                    breaks=c(0,1),
                    labels=c("No Treatment","Treatment"))+
  ggtitle("Acute Myeloid Leukemia Treatment dataset")

2.2.2 Bar plot

Previously, we had used base R for bar plot. Here we use the geom_bar argument in ggplot.

library(ggplot2)
library(extrafont)
Registering fonts with R
#this data came from a paper published in Epilespia 2019 on cost of looking 
#after patients with pseudoseizure 

##                              Lower  Upper
##                    percentage 0.95   0.95  
## Duration_PNES.lmg  0.0974     0.0057 0.2906
## pseudostatus.lmg   0.2283     0.0687 0.3753
## Hx_anxiety.lmg     0.0471     0.0057 0.1457
## Hx_depression.lmg  0.0059     0.0041 0.1082
## DSP.lmg            0.0582     0.0071 0.1500
## seizure_burden.lmg 0.0179     0.0041 0.1058
## sex.lmg            0.0413     0.0030 0.1519
df<-data.frame("Predictors"=c("Duration_PNES","pseudostatus","Hx_anxiety",
"Hx_depression","DSP","seizure_burden","sex"),
    "Importance"=c(0.09737,0.22825, 0.047137,0.00487,0.058153,0.01786,0.04131),
    "Lower.95"=c(0.0057,0.0687,0.0057,0.0041,0.0071,0.0041,0.0030),
    "Upper.95"=c(0.2906,0.3753,0.1457,0.1082,0.1500,0.1058,0.1519))

#check dimensions of data frame
dim(df)
[1] 7 4
#check variables in data frame
colnames(df)
[1] "Predictors" "Importance" "Lower.95"   "Upper.95"  
#bar plot uses geom_bar 
ggplot(df, aes(x=Predictors,y=Importance))+
  geom_bar(stat="identity")

This bar plot may be considered as untidy as the variables have not been sorted. Reordering the data requires ordering the factors. The colors were requested by the journal Epilesia in order to aid recognition of the bar by readers with color blindness (Seneviratne et al. 2019).

#reordering the data
df3<-df2<-df
df3$Predictors<-factor(df2$Predictors, levels=df2[order(df$Importance),"Predictors"])
#adding color
p<-ggplot(df3, aes(x=Predictors,y=Importance,fill=Predictors))+
  geom_bar(colour="black",
    stat="identity",
    fill=
      c("#e4b84b","#ce8080","#511c23","#e37c1d","#ffde75","#abb47d","#a1c5cb"))
p

This bar plot is now ordered but the labels on the axis seem to run into each other. One solution is to tile the axis title using element_text. Note that the text size can also be specified within this argument.

#rotate legend on x axis label by 45
p+theme(axis.title.y = element_text(face="bold",  size=12),
        axis.title.x = element_text(face="bold", size=12), 
        axis.text.x  = element_text(angle=45, vjust=0.5, size=10))

The title can be broken up using the backward slash.

#adding break in title
p1<-p+geom_errorbar(aes(ymin=Lower.95,ymax=Upper.95,width=0.2))+
  labs(y="R2 exaplained (%)")+
  theme(text=element_text(size=10))+
  ggtitle(" Relative Importance of Regressors \n Cost among patients with non-epileptic seizure")
p1

2.2.3 Pie chart

This example uses the data above on contribution of non-epileptic seizure variables to hospitalised cost.

library(ggplot2)
df3$Percentage=round(df3$Importance/sum(df3$Importance)*100,0)
ggplot(df3, aes(x="" ,y=Percentage,fill=Predictors))+
  geom_bar(stat="identity", width=1, color="white") +
  coord_polar("y", start=0) +
  theme_void() 

2.2.4 Scatter plot

The above data is used here to illustrate scatter plot. We can denote the color difference among the Predictors by adding color argument in the aesthetics.

#color in qplot
qplot(data=df3, Predictors, Importance,color=Predictors)

Adding color in ggplot is the same as in qplot.

#color ggplot
ggplot(df3, aes(x=Predictors,y=Importance,color=Predictors))+
  geom_point()+
  theme(axis.title.y = element_text(face="bold",  size=10),
        axis.title.x = element_text(face="bold", size=10), 
        axis.text.x  = element_text(angle=45, vjust=0.5, size=10))

The size argument within aes can be used like color. In this case, it’s used to denote the importance of the predictors.

#size
ggplot(df3, aes(x=Predictors,y=Importance,color=Predictors,size=Predictors))+
  geom_point()+
  theme(axis.title.y = element_text(face="bold",  size=12),
        axis.title.x = element_text(face="bold", size=12), 
        axis.text.x  = element_text(angle=45, vjust=0.5, size=12))
Warning: Using size for a discrete variable is not advised.

This is a more complicated example of scatter plot combined with formula of regression line. The paste0 function is used to add the equation to the plot. The data comes from GBD 2016 publication on lifetime risk of stroke. A comparison with plotting from base R is also provided.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tibble    3.2.1
✔ purrr     1.0.1     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
load("./Data-Use/world_stroke.Rda")
#fitting a regression model
fit<-lm(MeanLifetimeRisk~LifeExpectancy,data=world_sfdf)
fitsum<-summary(fit)
#base R scatter plot with fitted line
x=world_sfdf$LifeExpectancy #define x
y=world_sfdf$MeanLifetimeRisk #define y
plot(x,y, data=world_sfdf, main = "Lifetime Stroke Risk",
     xlab = "Life Expectancy", ylab = "Life time Risk",
     pch = 19)
abline(lm(y ~ x, data = world_sfdf), col = "blue")

The ggplot version is now provided. Note that the line is fitted in the geom_smooth argument. An interesting aspect of this plot is that the data can be describe as heterosecadic in which the variance changes throughout the plot. The plot suggests that a mixed model strategy rather than linear regression is the best way to describe the data.

#ggplot2 scatter plot with fitted line
SR<-ggplot(world_sfdf,  aes(x=LifeExpectancy,y=MeanLifetimeRisk))+
  geom_smooth(method="lm")+geom_point()+
  xlab("Life Expectancy")+
  ggtitle(paste0("Life time Risk", "=", 
                 round(fitsum$coefficients[1],2),"+",
                 round(fitsum$coefficients[2],2)," x ","Life Expectancy"))
SR
`geom_smooth()` using formula = 'y ~ x'

To use the name of the country as label, we use geom_text. The world_sfdf data is now partitioned to show only data from Europe. An interesting pattern emerge. There is clumping of the data around Belgium and Portugal. The nudge_x and nudge_y function are used to adjust the labels and the size argument adjust the label.

library(tidyverse)
library(sf)
Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
world_sfdf %>% filter(Continent=="EUROPE") %>%
ggplot( aes(x=LifeExpectancy,y=MeanLifetimeRisk))+
  geom_smooth(method="lm")+
  xlab("Life Expectancy")+
  geom_text(aes(label=Country, nudge_x=.35, nudge_y=.5, 
      avoid_overlap=T), size=2)
old-style crs object detected; please recreate object with a recent sf::st_crs()
`geom_smooth()` using formula = 'y ~ x'

In order to understand the reason for deviations from the fitted line above, it is possible possible to add additional step to explore the relationship for each income group. This graph illustrates that the high income countries have a ceiling in the relationship between lifetime risk and life expectancy from age of 70 onward.

SRIncome<-ggplot(world_sfdf, aes(x=LifeExpectancy,
  y=MeanLifetimeRisk))+
    geom_smooth(method="lm", 
      aes(group=Income, linetype=Income, colour= Income))+ 
        geom_point()+
          xlab("Life Expectancy")

SRIncome
`geom_smooth()` using formula = 'y ~ x'

2.2.5 arrange plot in grids

Plots can be arrange in tabular format for presentation or journal submission.In base R multiple plots can be combined using par function and specify the number of columns by mfrow. The number of columns can be specified with ncol call when using gridExtra library.

#Leukemia data
par(mfrow=c(1,2)) #row of 1 and 2 columns

x=Leukemia$Age #define x
y=Leukemia$Blasts #define y

plot(x,y, data=Leukemia, main = "Leukemia data",
     xlab = "Age", ylab = "Blasts",
     pch = 19)
abline(lm(y ~ x, data = Leukemia), col = "blue")

y1=Leukemia$Smear
plot(x,y1, data=Leukemia, main = "Leukemia data",
     xlab = "Age", ylab = "Smear",
     pch = 19)
abline(lm(y1 ~ x, data = Leukemia), col = "blue")

library(gridExtra)

Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':

    combine
SRContinent<-ggplot(world_sfdf, aes(x=LifeExpectancy,
                y=MeanLifetimeRisk))+
                  geom_smooth(method="lm", 
                    aes(group=Continent, linetype=Continent, 
                        colour= Continent))+
                          geom_point()+
                              xlab("Life Expectancy")+
                                ylim(c(0,50))

grid.arrange(SRIncome, SRContinent, ncol=1) 
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

An alternative way to display multiple plots is to use patchwork library.

library(patchwork)
SRIncome/SRContinent

2.2.6 Line plot

The data below is generated in the section on data simulation. The data were simulated using summary data from recent clot retrieval trials in stroke Campbell et al. (2015)

library(ggplot2)
library(dplyr)
dtTime<-read.csv("./Data-Use/dtTime_simulated.csv")
dtTrial<-read.csv("./Data-Use/dtTrial_simulated.csv")

#summarise data using group_by
dtTime2<-dtTime %>%
  group_by(period, T) %>%
  summarise(meanY=mean(Y),
            sdY=sd(Y),
            upperY=meanY+sdY,
            lowerY=meanY-sdY)
`summarise()` has grouped output by 'period'. You can override using the
`.groups` argument.
#individual line plot
ggplot(dtTime,aes(x=as.factor(period),y=Y))+
  geom_line(aes(color=as.factor(T),group=id))+
  scale_color_manual(values = c("#e38e17", "#8e17e3")) + xlab("Time")+ylab("NIHSS")

The line plot can also be represented as boxplot without the connecting lines.

#box plot
gg<-ggplot(dtTime,aes(x=as.factor(period),y=Y))+
  geom_boxplot(aes(color=as.factor(T)))+  
  xlab("Time")+ylab("NIHSS")
gg+scale_fill_discrete(name="Treatment")

To perform line plot on the grouped data, first fit the regression line to the grouped data.

#linear regression Y1 predict Y2 where Y1 and Y2 are grouped data 
#from the simulated data above.
fit<-lm(Y1~Y0+T, data=dtTrial)

dtTrial2<-filter(dtTrial, T==1)

fit2<-lm(Y2~Y1, data=dtTrial2)

#line plot by group
pd <- position_dodge2(width = 0.2) # move them .2 to the left and right
gbase  = ggplot(dtTime2, aes(y=meanY, colour=as.factor(T))) + geom_errorbar(aes(ymin=lowerY, ymax=upperY), width=.3, position=pd)+
geom_point(position=pd) 
gline = gbase + geom_line(position=pd) 

dtTime2$period=as.numeric(dtTime2$period)
unique((dtTime2$period))
[1] 0 1 2 3
gline = gline %+% dtTime2

print(gline + aes(x=period))

2.2.7 Facet wrap

Facet wrap is a good way to visually explore different aspects pf the data. Using the dtTime data above, the plots are separated by trial assignment.

ggplot(dtTime,aes(x=as.factor(period),y=Y))+
  geom_line(aes(color=as.factor(T),group=id))+
  scale_color_manual(values = c("#e38e17", "#8e17e3"))+ 
  facet_wrap(~T)+ 
  xlab("Time")+ylab("NIHSS")

2.2.8 Polygons

The geom_polygon is often used in thematic plot of maps. It can be used to show polygons outside of map. It requires one data frame for coordinate and another for the values.

#simulate data
ids <- factor(c("1", "2", "3", "4","5","6"))

values <- data.frame(
  id = ids,
  value = c(3, 3.1, 3.2, 3.3,3.4,3.5)
)

a=seq(0,1.5,by=0.5)
x=c(a,a-.5,a+.5,a+.2, a-.3,a+.1)

positions <- data.frame(
  id = rep(ids, each = 4),
  x=c(a,a-.5,a+.5,a+.2, a-.3,a+.1),
  y=sample(x)
  )

# Currently we need to manually merge the two together
comb <- merge(values, positions, by = c("id"))

p <- ggplot(comb, aes(x = x, y = y)) +
  geom_polygon(aes(fill = value, group = id))
  
p

2.2.9 Gantt chart

Gantt chart can be used to illustrate project timeline. It needs a minimum of 4 data columns: Activity, Project, a start date and end date. This example below is meant as a template. If you have 6 rows of data then add 2 extra rows of data including colors.

library(tidyverse)
library(lubridate)
gantt_df<-data.frame(item=seq(1:4), 
    activity=c("Ethics submission","Length","Recruitment",
               "Follow up"),
    category=c("Ethics","Duration","Recruitment","Follow up"),
    Start=c("2020-06-30","2021-01-01","2021-01-01","2022-01-01"),
    End=c("2021-01-01","2023-01-01","2022-01-01","2023-01-01")) %>%
    mutate(
      Start=as.Date(Start, origin="1970-01-01" ),
      End=as.Date(End, origin="1970-01-01")#,
      #Set factor level to order the activities on the plot
      #activity=factor(activity,levels=activity[1:nrow(gantt_df)]),       #category=factor(category,levels=category[1:nrow(gantt_df)])
    )

#geom_linerange requires specification of min and max
plot_gantt <- qplot(ymin = Start,
                    ymax = End,
                    x = activity,
                    colour = category,
                    geom = "linerange",
                    data = gantt_df,
                    size = I(10)) +   #width of line
    scale_colour_manual(values = 
                          c("blue", "red", "purple", "yellow")) +
    coord_flip() +
    xlab("") +
    ylab("") +
    ggtitle("Project plan")
plot_gantt

Note that there’s an error from conversion of Date to numeric. More about this is available in Data Wrangling chapter.

as.numeric(gantt_df$Start)
[1] 18443 18628 18628 18993

This is a ggplot version of the above data. It requires conversion of the data into long format.

gantt_df.melt <- gantt_df%>%
  tidyr::pivot_longer(col = c(Start,End))

gantt_df.melt
# A tibble: 8 × 5
   item activity          category    name  value     
  <int> <chr>             <chr>       <chr> <date>    
1     1 Ethics submission Ethics      Start 2020-06-30
2     1 Ethics submission Ethics      End   2021-01-01
3     2 Length            Duration    Start 2021-01-01
4     2 Length            Duration    End   2023-01-01
5     3 Recruitment       Recruitment Start 2021-01-01
6     3 Recruitment       Recruitment End   2022-01-01
7     4 Follow up         Follow up   Start 2022-01-01
8     4 Follow up         Follow up   End   2023-01-01

We can plot the Gant chart

pl <- ggplot(gantt_df.melt, aes(x = value, y = category, 
        colour = activity ))+geom_line( alpha = 0.5,
              linewidth = 10)+xlab("Date")+
                labs(title="Gantt chart for study")
pl

2.2.10 Heatmap

The ggplot2 library can also be used for creating heatmap. This plot uses the geom_tile function.

library(ggplot2)
library(plyr)
------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------

Attaching package: 'plyr'
The following objects are masked from 'package:dplyr':

    arrange, count, desc, failwith, id, mutate, rename, summarise,
    summarize
The following object is masked from 'package:purrr':

    compact
library(reshape)

Attaching package: 'reshape'
The following objects are masked from 'package:plyr':

    rename, round_any
The following object is masked from 'package:lubridate':

    stamp
The following object is masked from 'package:dplyr':

    rename
The following objects are masked from 'package:tidyr':

    expand, smiths
library(scales)
Warning: package 'scales' was built under R version 4.3.3

Attaching package: 'scales'
The following object is masked from 'package:purrr':

    discard
The following object is masked from 'package:readr':

    col_factor
#swiss fertility dataset from 1888
data(swiss, package = "datasets") 

swiss$swiss_canton<-row.names(swiss) #assign column name to row name
rownames(swiss)<-NULL #remove row name
data.m <- melt(swiss)
Using swiss_canton as id variables
 data.m <- ddply(data.m, .(variable), transform, rescale = rescale(value))
 q <- ggplot(data.m, aes(variable, swiss_canton)) + 
         geom_tile(aes(fill = rescale), colour = "white")+
        scale_fill_gradient(low = "white", high = "steelblue")+
   ggtitle("Swiss Fertility Data 1888")
 q

2.3 ggplot2 extra

The following plots retains the framework of ggplot2. Their uses require installing additional libraries.

2.3.1 Alluvial and Sankey diagram

The Sankey flow diagram uses the width of the arrow used to indicate the flow rate. It is often used to indicate energy dissipation in the system. There are several libraries providing Sankey plot such as networkD3 library. Alluvial plot is a subset of Sankey diagram but differs in having a structured workflow. The ggalluvial library is chosen here for demonstration as it forms part of the ggplot2 framework.

library(ggalluvial)
library(Stat2Data)
data("Leukemia") #treatment of leukemia
#partition Age into 8 ordered factors
Leukemia$AgeCat<-cut_interval(Leukemia$Age, n=8, ordered_result=TRUE)
#axis1
ggplot(data=Leukemia, aes (y=Smear,axis1=AgeCat, axis2=Resp,axis3=Status))+
  geom_alluvium(aes(fill=AgeCat),width = 1/12)+
  geom_label(stat = "stratum", infer.label = TRUE) +
  geom_label(stat = "stratum", infer.label = TRUE)+
  scale_x_discrete(limits = c("AgeCat","Resp", "Status"),label=c("Age Category","Treatment Response","Mortality"), expand = c(.1, .1)) +
  scale_fill_brewer(type = "qual", palette = "Set1") +
  ggtitle("Outcome after Leukemia Treatment")

2.3.2 Survival plot

The survminer library extends ggplot2 style to survival plot. It requires several libraries such as survival for survival analysis and lubridate to parse time. A description of survival analysis is provided in the Statistics section.

library(survminer)
Loading required package: ggpubr

Attaching package: 'ggpubr'
The following object is masked from 'package:plyr':

    mutate
library(lubridate)
library(survival)

Attaching package: 'survival'
The following object is masked from 'package:survminer':

    myeloma
data(cancer, package="survival")

#data from survival package on NCCTG lung cancer trial
#https://stat.ethz.ch/R-manual/R-devel/library/survival/html/lung.html
#time in days
#status cesored=1, dead=2sex:
#sex:Male=1 Female=2
sfit<- survfit(Surv(time, status) ~ sex, data = cancer)
ggsurvplot(sfit, data=cancer,
           surv.median.line = "hv",
           pval=TRUE,
           pval.size=3, 
           conf.int = TRUE,
           legend.labs=c("Male","Female"),xlab="Time (Days)", 
           break.time.by=50,
           font.x=5,
           font.y=5,
           ggtheme = theme_bw(),  
           risk.table=T,
           risk.table.font=2, #adjust font
           risk.table.height=.3  #adjust table height
           )

2.3.3 ggraph and tidygraph

The igraph library does the heavy lifting in graph theory analysis. This aspect will be expanded on in the chapter on Graph Theory. However, the plotting function with igraph is still not optimal. The ggraph and tidygraph libraries extend the ggplot2 style to graph.

library(tidygraph)

Attaching package: 'tidygraph'
The following object is masked from 'package:reshape':

    rename
The following objects are masked from 'package:plyr':

    arrange, mutate, rename
The following object is masked from 'package:stats':

    filter
library(ggraph) 

#relationship among members of acute stroke team
tpa<-readr::read_csv("./Data-Use/TPA_edge010817.csv")%>%
  rename(from=V1, to=V2)
Rows: 25 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): V1, V2
dbl (2): time.line, time.capacity

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#node by degree centrality
graph<-as_tbl_graph(tpa) %>%  mutate(degree = centrality_degree())
ggraph(graph, layout = 'fr') + 
 geom_edge_link() + 
  #label size by degree centrality
  geom_node_point(aes(size=degree))+
  #label node
  geom_node_text(aes(label=name),repel=T)+
  ggtitle("Acute Stroke Team Network")
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

2.3.4 ggparty-decision tree

Decision tree can be plotted using ggplot2 and ggparty framework. This example uses data from the Breast Cancer dataset. It explores the effect of different histological features on class of breast tissue type.

library(ggparty)
Loading required package: partykit
Loading required package: grid
Loading required package: libcoin
Loading required package: mvtnorm

Attaching package: 'ggparty'
The following object is masked from 'package:ggraph':

    geom_node_label
library(partykit)
library(tidyverse)

data("BreastCancer",package = "mlbench")

#check data type
str(BreastCancer)
'data.frame':   699 obs. of  11 variables:
 $ Id             : chr  "1000025" "1002945" "1015425" "1016277" ...
 $ Cl.thickness   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 5 5 3 6 4 8 1 2 2 4 ...
 $ Cell.size      : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 1 1 2 ...
 $ Cell.shape     : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 4 1 8 1 10 1 2 1 1 ...
 $ Marg.adhesion  : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 5 1 1 3 8 1 1 1 1 ...
 $ Epith.c.size   : Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 2 7 2 3 2 7 2 2 2 2 ...
 $ Bare.nuclei    : Factor w/ 10 levels "1","2","3","4",..: 1 10 2 4 1 10 10 1 1 1 ...
 $ Bl.cromatin    : Factor w/ 10 levels "1","2","3","4",..: 3 3 3 3 3 9 3 3 1 2 ...
 $ Normal.nucleoli: Factor w/ 10 levels "1","2","3","4",..: 1 2 1 7 1 7 1 1 1 1 ...
 $ Mitoses        : Factor w/ 9 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 5 1 ...
 $ Class          : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...
BC<- BreastCancer %>% select(-Id)

treeBC<-ctree(Class~., data=BC, control = ctree_control(testtype = "Teststatistic"))

#plot tree using plot from base R
plot(treeBC)

#plot tree using ggparty

ggparty(treeBC) +
  geom_edge() +
  geom_edge_label() +
  geom_node_label(aes(label = splitvar),
                  ids = "inner") +
  geom_node_label(aes(label = info),
                  ids = "terminal")

ggparty(treeBC) +
  geom_edge() +
  geom_edge_label() +
  # map color to level and size to nodesize for all nodes
  geom_node_splitvar(aes(col = factor(level),
                         size = nodesize)) +
  geom_node_info(aes(col = factor(level),
                     size = nodesize))

2.3.5 Map

Several simple examples are provided here. They illustrate the different plotting methods used according to the type of data. It is important to check the structure of the data using class() function.

2.3.5.1 Thematic map

The ggplot2 library can also be used to generate thematic (choropleth) map. Here we use map_data function from ggplot2 to obtain a map of USA. Geocoded data are contained in the long and lat columns. The US map data is in standard dataframe format. In this case, the geom_map function is used for mapping. The USArrests data contains a column for murder, assault, rape and urban population. The assault data presented here is normalised by the population data. This section will be expanded further in the Geospatial Analysischapter.

library(dplyr)
library(ggplot2)
arrest<-data("USArrests") 
arrest<-USArrests%>% add_rownames("region") %>% 
  mutate(region=tolower(region))
Warning: `add_rownames()` was deprecated in dplyr 1.0.0.
ℹ Please use `tibble::rownames_to_column()` instead.
US <- map_data("state") 
map_assault<-ggplot()+ 
  geom_map(data=US, map=US,
                    aes(x=long, y=lat, map_id=region),
                    fill="#ffffff", color="#ffffff", size=0.15)+
  #add USArrests data here
  geom_map(data=arrest, map=US,
              aes(fill=Assault/UrbanPop, map_id=region),
                    color="#ffffff", size=0.15)+
  scale_fill_continuous(low='thistle2', high='darkred', 
                                 guide='colorbar')
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Warning in geom_map(data = US, map = US, aes(x = long, y = lat, map_id =
region), : Ignoring unknown aesthetics: x and y
map_assault

This is a more complex example and uses state by state COVID-19 data CDC website. Steps to extract the COVID-10 is shown in the next chapter. The shapefile for USA can also be extracted from tigris library. A challenge with plotting a map of US is that the country extends far North to Alaska and East to pacific islands.

library(ggplot2)
library(dplyr)
covid<-read.csv("./Data-Use/Covid_bystate_Table130420.csv") %>% mutate(region=str_to_lower(Jurisdiction))

map_covid<-ggplot()+ 
  geom_map(data=US, map=US,
                    aes(x=long, y=lat, map_id=region),
                    fill="#ffffff", color="#ffffff", size=0.15)+
  #add covid data here
  geom_map(data=covid, map=US,
              aes(fill=CumulativeIncidence31.03.20, map_id=region),
                    color="#ffffff", size=0.15)+
  scale_fill_continuous(low='thistle2', high='darkred', 
                                 guide='colorbar')
Warning in geom_map(data = US, map = US, aes(x = long, y = lat, map_id =
region), : Ignoring unknown aesthetics: x and y
map_covid

In the simple example below we will generate a map of Australian State territories color by size of area. The ggplot2 combines with sf library and uses the shape file data in the geom_sf call.

library(ggplot2)
library(sf)
#shape file
Aust<-st_read("./Data-Use/GCCSA_2016_AUST.shp") 
Reading layer `GCCSA_2016_AUST' from data source 
  `C:\Users\phant\Documents\Book\Healthcare-R-Book\Data-Use\GCCSA_2016_AUST.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 34 features and 5 fields (with 18 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 96.81694 ymin: -43.74051 xmax: 167.998 ymax: -9.142176
Geodetic CRS:  GDA94
colnames(Aust) #find out column variables
[1] "GCC_CODE16" "GCC_NAME16" "STE_CODE16" "STE_NAME16" "AREASQKM16"
[6] "geometry"  
ggplot() + 
  geom_sf(data=Aust,aes(fill=AREASQKM16))+
  labs(x="Longitude (WGS84)", y="Latitude", title="Map of Australia") + 
  theme_bw() 

2.3.5.2 tmap

The tmap library works in conjunction with ggplot2 and sf. The tm_shape function takes in the shape data. The tm_polygon function color the shape file with the column data of interest.

library(tmap)
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
     (status 2 uses the sf package in place of rgdal)
load("./Data-Use/world_stroke.Rda")
#data from GBD 2016 investigators
colnames(world_sfdf) 
 [1] "FIPS"                                
 [2] "ISO2"                                
 [3] "ISO3"                                
 [4] "UN"                                  
 [5] "NAME"                                
 [6] "AREA"                                
 [7] "POP2005"                             
 [8] "REGION"                              
 [9] "SUBREGION"                           
[10] "LON"                                 
[11] "LAT"                                 
[12] "MeanLifetimeRisk"                    
[13] "Country code"                        
[14] "y2015"                               
[15] "Continent"                           
[16] "Region, subregion, country or area *"
[17] "18+"                                 
[18] "Code"                                
[19] "Region"                              
[20] "Income"                              
[21] "LendingCategory"                     
[22] "Other"                               
[23] "LifeExpectancy"                      
[24] "MeanLifetimeRiskH"                   
[25] "MeanLifetimeRiskI"                   
[26] "geometry"                            
[27] "Country"                             
class(world_sfdf) #contains simple features
[1] "sf"         "tbl_df"     "tbl"        "data.frame"
#map of country income
m<-tm_shape(world_sfdf, projection = "+proj=eck4")+tm_polygons("Income")
m
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()

#save object as png
#tmap_save(m,file="world_income.png"")

#save as leaflet object
#tmap_save(m,file="world_income.html"")

map of lifetime stroke risk

n<-tm_shape(world_sfdf, projection = "+proj=eck4")+tm_polygons("MeanLifetimeRisk")
n
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()
old-style crs object detected; please recreate object with a recent sf::st_crs()

2.3.5.3 Voronoi

The ggplot2 and sf libraries are extended to include drawing of voronoi. Voronoi is a special type of polygon. It can be seen as a mathematical approach to partition regions such that all points within the polygons are closest (depending on distance metrics) to the seed points. Voronoi has been used in disease mapping (John Snow mapping of Broad Street Cholera outbreak) and meteorology (Alfred Thiessen polygon method for measuring rainfall in basin). This is a more complex coding task. It uses the geom_voronoi call from ggvoronoi library. Some libraries have vignettes to help you implement the codes. The vignette in the ggvoronoi library can be called using vignette(“ggvoronoi”). The osmdata library will be used to provide map from OpenStreetMap. A related library is OpenStreetMap. The latter library uses raster file whereas osmdata provides vectorised map data. In the chapter on Geospatial Analysis we will exapnd on the use of OpenStreetMap. One issue with the use of voronoi is that there are infinite sets and so a boundary needs to set. In the example below, the boundary was set for Greater Melbourne.

library(dplyr)
library(ggvoronoi)
rgeos version: 0.6-3, (SVN revision 696)
 GEOS runtime version: 3.11.2-CAPI-1.17.2 
 Please note that rgeos will be retired during October 2023,
plan transition to sf or terra functions using GEOS at your earliest convenience.
See https://r-spatial.org/r/2023/05/15/evolution4.html for details.
 GEOS using OverlayNG
 Linking to sp version: 2.0-0 
 Polygon checking: TRUE 
library(ggplot2)
library(sf)

#subset data with dplyr for metropolitan melbourne
msclinic<-read.csv("./Data-Use/msclinic.csv") %>% filter(clinic==1, metropolitan==1)

Victoria<-st_read("./Data-Use/GCCSA_2016_AUST.shp") %>% filter(STE_NAME16=="Victoria") 
Reading layer `GCCSA_2016_AUST' from data source 
  `C:\Users\phant\Documents\Book\Healthcare-R-Book\Data-Use\GCCSA_2016_AUST.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 34 features and 5 fields (with 18 geometries empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 96.81694 ymin: -43.74051 xmax: 167.998 ymax: -9.142176
Geodetic CRS:  GDA94
m<-ggplot(msclinic)+
  geom_point(aes(x=lon,y=lat))+ 
  #add hospital location
  geom_voronoi(aes(x=lon,y=lat,fill=distance),fill=NA, color="black")+ 
  #create voronoi from hospital location
  geom_sf(data=Victoria,aes(fill=AREASQKM16)) +
  labs(x="Longitude (WGS84)", y="Latitude", 
       title="Voronoi Map of MS Clinics in Melbourne")
m  
Warning: `fortify(<SpatialPolygonsDataFrame>)` was deprecated in ggplot2 3.4.4.
ℹ Please migrate to sf.

This map is not so useful as the features of Victoria overwhelm the features of Greater Melbourne.

2.3.6 ggwordcloud

The ggwordcloud library extend the ggplot2 family to creating wordcloud. The following is an illustration of wordcloud created from comments on Youtube video “Stroke Heroes Act Fast”.

library(ggwordcloud)
library(tm)
Loading required package: NLP

Attaching package: 'NLP'
The following object is masked from 'package:ggplot2':

    annotate
library(tidyverse)
heroes_df<-read.csv("./Data-Use/Feb_01_1_49_59 PM_2018_AEDT_YoutubeData.csv",stringsAsFactors = FALSE)
#cleaning data
keywords <- heroes_df$Comment 
keywords <- iconv(keywords, to = 'utf-8')
#create corpus
myCorpus <- VCorpus(VectorSource(keywords))
#lower case
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
#remove numer
myCorpus <- tm_map(myCorpus, removeNumbers)
#remove punctuation
myCorpus <- tm_map(myCorpus, removePunctuation)
#remove stopwords
myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"),lazy=TRUE) 
#remove white space
myCorpus <- tm_map(myCorpus, stripWhitespace, lazy=TRUE)
#term document matrix
dtm <- DocumentTermMatrix(myCorpus,control = list(wordLengths=c(3, 20)))
#remove sparse terms
dtm<-removeSparseTerms(dtm, 0.95)
#remove words of length <=3
tdm=TermDocumentMatrix(myCorpus,
                       control = list(minWordLength=4,maxWordLength=20) )
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
#remove words with frequency <=1
d <- data.frame(word = names(v),freq=v) %>% filter(freq>1)
#wordcloud
ggplot(data = d, 
       aes(label = word, size = freq, col = as.character(freq))) + 
  geom_text_wordcloud(rm_outside = TRUE, max_steps = 1,
                      grid_size = 1, eccentricity = .8)+
    scale_size_area(max_size = 12)+
    scale_color_brewer(palette = "Paired", direction = -1)+
  theme_void()

2.3.7 gganimate

The library gganimate can be used to generate videos in the form of gif file. The data needs to be collated from wide to long format. For the purpose of this book, the code has been turned off.

library(ggplot2)
library(gganimate)

#ensure data in long format, Y =NIHSS
u <- ggplot(dtTime2, aes(period, meanY , color = T, frame = period)) +
  geom_bar(stat="identity") +
  geom_text(aes(x = 11.9, label = period), hjust = 0) +   xlim(0,13)+
  coord_cartesian(clip = 'off') + 
  facet_wrap(~meanY)+
  labs(title = 'ECR Trials', y = 'Number') +
transition_reveal(period)
u
#create gif
#animate(u,fps=15,duration=15)
#anim_save(u,file="./Data-Use/simulated_ECR_trial.gif", width=800, height=800)

2.3.8 ggneuro

There are several ways of plotting mri scans. The ggneuro library is illustrated here as it relates to the ggplot2 family. The Colin \(T_1\) scan is a high resolution scan.

#devtools::install_github("muschellij2/ggneuro")
library(ggneuro)
library(neurobase)
Loading required package: oro.nifti
oro.nifti 0.11.4

Attaching package: 'oro.nifti'
The following object is masked from 'package:tidygraph':

    slice
The following object is masked from 'package:dplyr':

    slice
colin_1mm<-untar("./Data-Use/colin_1mm.tgz")
colinIm<-readNIfTI("colin_1mm") 
ggortho(colinIm)

2.4 Plotly

Plotly is a javascript library which has its own API . It has additional ability for interaction as well as create a video. Examples are provided with calling plotly directly or via ggplot2. In the examples below, the plots are performed using ggplot2 and then pass onto plotly using ggplotly function. This example uses the Leukemia dataset from Stat2Data library. The plotly syntax uses ~ after the = symbol to identify a variable for plotting.

2.4.1 Histogram with plotly

library (plotly)

Attaching package: 'plotly'
The following object is masked from 'package:neurobase':

    colorbar
The following object is masked from 'package:oro.nifti':

    slice
The following object is masked from 'package:reshape':

    rename
The following objects are masked from 'package:plyr':

    arrange, mutate, rename, summarise
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
data("Leukemia") #treatment of leukemia
plot_ly (data=Leukemia, x=~Age, color = ~as.factor(Status))
No trace type specified:
  Based on info supplied, a 'histogram' trace seems appropriate.
  Read more about this trace type -> https://plotly.com/r/reference/#histogram
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

2.4.2 Scatter plot with plotly

library(plotly)
library(Stat2Data)
data("Leukemia") #treatment of leukemia
#scatter plot directly from plotly
plot_ly(x=~Age,y=~Smear, #percentage of blast
        color=~as.factor(Status),  #dead or alive
        symbol=~Resp, 
        symbols=c('circle' ,'square'), #Response to treatment
        data=Leukemia) 
No trace type specified:
  Based on info supplied, a 'scatter' trace seems appropriate.
  Read more about this trace type -> https://plotly.com/r/reference/#scatter
No scatter mode specifed:
  Setting the mode to markers
  Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels

2.4.3 Bar plot with plotly

Plotly can uses a ggplot object directly.

#using the epilepsy data in p generated above for bar plot
ggplotly(p)

Bar plot in plotly is specified by the argument type.


plot_ly(data=df3, x = ~Predictors, y=~Importance, 
        color = ~Predictors,type = 'bar') 

2.4.4 Heatmap with plotly

Heatmap in plotly is specified by the argument type. It requires 3 input: x, y and z.

library(plotly)
library(dplyr)
plot_ly(x=~variable, y=~swiss_canton, z=~value, data=data.m, 
        type='heatmap')%>%
  layout(margin=list(l=120))

2.4.5 Map with plotly

The data in this example is also used to make a map of Finland using mapview in the Geospatial Analysis chapter.

library(plotly)
library(geofi)
library(tidyverse)
library(tidyr)
library(pxweb)
pxweb 0.16.2: R tools for the PX-WEB API.
https://github.com/ropengov/pxweb
library(janitor)

Attaching package: 'janitor'
The following objects are masked from 'package:stats':

    chisq.test, fisher.test
#data  on Finland
#https://pxnet2.stat.fi/PXWeb/pxweb/en/Kuntien_avainluvut/
#Kuntien_avainluvut__2021/kuntien_avainluvut_2021_viimeisin.px/table/
#tableViewLayout1/
FinPop<-readxl::read_xlsx("./Data-Use/Kuntien avainluvut_20210704-153529.xlsx", skip = 2)
New names:
• `` -> `...1`
FinPop2<-FinPop[-c(1),] #remove data for entire Finland

#get shapefile for municipality
municipalities <- get_municipalities(year = 2020, scale = 4500) 
Requesting response from: http://geo.stat.fi/geoserver/wfs?service=WFS&version=1.0.0&request=getFeature&typename=tilastointialueet%3Akunta4500k_2020
Warning: Coercing CRS to epsg:3067 (ETRS89 / TM35FIN)
Data is licensed under: Attribution 4.0 International (CC BY 4.0)
#sf object)

#join shapefile and population data
municipalities2<-right_join(municipalities, FinPop2, 
                            by=c("name"="...1")) %>%
  rename(Pensioner=`Proportion of pensioners of the population, %, 2019`,Age65=`Share of persons aged over 64 of the population, %, 2020`) %>% na.omit()

plot_ly(municipalities2, 
        split = ~Age65)
No trace type specified:
  Based on info supplied, a 'scatter' trace seems appropriate.
  Read more about this trace type -> https://plotly.com/r/reference/#scatter

The plot_geo function enable usage of high level geometries.

plot_geo(municipalities2, 
        split = ~Age65)
Warning: The trace types 'scattermapbox' and 'scattergeo' require a projected
coordinate system that is based on the WGS84 datum (EPSG:4326), but the crs
provided is: '+proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
+no_defs'. Attempting transformation to the target coordinate system.

Comparing ggplot2 output with plotly for mapping.

subplot(nrows = 2,
         ggplot(municipalities2) + geom_sf(),
         plot_ly(municipalities2)
) 
No trace type specified:
  Based on info supplied, a 'scatter' trace seems appropriate.
  Read more about this trace type -> https://plotly.com/r/reference/#scatter
Warning: Can only have one: config
library(ggplot2)
library(plotly)
library(dplyr)

covid<-read.csv("./Data-Use/Covid_bystate_Table130420.csv") %>% mutate(region=str_to_lower(Jurisdiction)) #consistent

covid_US<-ggplot()+ 
  geom_map(data=US, map=US,
                    aes(x=long, y=lat, map_id=region),
                    fill="#ffffff", color="#ffffff", size=0.15)+
  #add covid data here
  geom_map(data=covid, map=US,
              aes(fill=NumberCases31.03.20, map_id=region),
                    color="#ffffff", size=0.15)+
  scale_fill_continuous(low='thistle2', high='darkred', 
                                 guide='colorbar')

ggplotly(covid_US)

2.5 Graphviz and Mermaid

The DiagrammeR library is useful for plotting graph and flow diagram. Graphviz graph uses the function grViz.

library(DiagrammeR)

Attaching package: 'DiagrammeR'
The following object is masked from 'package:partykit':

    get_paths
The following object is masked from 'package:ggraph':

    get_edges
DiagrammeR::DiagrammeR("

sequenceDiagram;
  Triage->>Imaging: Scan 1.2 hr;
  Imaging->>Dysphagia Screen: Check Swallow 12.9 hr;
  Dysphagia Screen->> NBM Stop: Oral medication 0 hr;
  NBM Stop->>Antithrombotics: Administer 1.7 hr;
  
  Imaging->> Antithrombotics: PR medication 3.6 hr;
  
alt Scan available
  Triage->>Dysphagia Screen: Bypass Imaging 0.7 hr;
    
else TPA
  Triage->>TPA: Give TPA 0.7 hr;
  TPA->>Post TPA Scan: Exclude hemorrhage 24.7 hr;
  Post TPA Scan->>Antithrombotics: Antithrombotics 7.5 hr;
end
")

This flow diagram can also be plotted using mermaid.js. This diagram was used to illustrate the time to administer antiplatelet therapy (Phan TG 2021).

a.plot<-mermaid("
        graph TB
        
        A((Triage))
        A-->|2.3 hr|B(Imaging-No Stroke Code)
        A-->|0.6 hr|B1(Imaging-Stroke Code)
        
        B-->|14.6 hr|B2(Dysphagia Screen)
        B1-->|no TPA 10.7 hr|B2(Dysphagia Screen)

        C(Stop NBM)
        B2-->|0 hr|C
      
        C-->|Oral route 1.7 hr|E{Antithrombotics}
        
        D1-->|7.5 hr|E
        B-->|PR route 6.8 hr|E
        B1-->|PR route 3.8 hr|E
    
        B1-->|TPA 24.7 hr|D1(Post TPA Scan)
    
        style A fill:#ADF, stroke:#333, stroke-width:2px
        style B fill:#9AA, stroke:#333, stroke-width:2px
        style B2 fill:#9AA, stroke:#333, stroke-width:2px
        style B1 fill:#879, stroke:#333, stroke-width:2px
        style C fill:#9AA, stroke:#333, stroke-width:2px
        style D1 fill:#879, stroke:#333, stroke-width:2px
        style E fill:#9C2, stroke:#9C2, stroke-width:2px
        ") 
a.plot