Most Useful R Functions You Might Not Know
Almost every R user knows about popular packages like dplyr and ggplot2. But with 10,000+ packages on CRAN and yet more on GitHub, it’s not always easy to unearth libraries with great R functions. Here are ten of the most useful R functions you might not know, all of which make my life easier when working in R. If you already know them all, sorry for wasting your reading time, and please consider adding a comment with something else that you find useful for the benefit of other readers.
1. RStudio shortcut keys
This is less an R hack and more about the RStudio IDE, but the shortcut keys available for common commands are super useful and can save a lot of typing time. My two favourites are Ctrl+Shift+M for the pipe operator %>% and Alt+- for the assignment operator <-. If you want to see a full set of these awesome shortcuts just type Alt+Shift+K in RStudio.
2. Automate tidyverse styling with styler
It’s been a tough day, you’ve had a lot on your plate. Your code isn’t as neat as you’d like and you don’t have time to line edit it. Fear not. The styler package has numerous functions to allow automatic restyling of your code to match tidyverse style. It’s as simple as running styler::style_file() on your messy script and it will do a lot (though not all) of the work for you.
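To get a feel for what the restyling does before pointing it at a whole file, here is a minimal sketch using styler::style_text(), which restyles code supplied as strings. The messy lines are made up for illustration, and the snippet assumes the styler package is installed.

```r
# Assumption: styler is installed; the guard skips the example otherwise.
if (requireNamespace("styler", quietly = TRUE)) {
  # Two deliberately untidy lines of code (invented for this example)
  messy <- c("x=c(1,2,3)", "total<-sum(x)+1")

  # style_text() applies the tidyverse style guide to code held in strings
  styled <- styler::style_text(messy)
  print(styled)
}
```

styler::style_file() applies the same rules to a script on disk and overwrites it, so it is worth running it on code that is under version control.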
3. The switch function
I LOVE switch(). It's basically a convenient shortening of an if statement that chooses its value according to the value of another variable. I find it particularly useful when I am writing code that needs to load a different dataset according to a prior choice you make. For example, if you have a variable called animal and you want to load a different set of data according to whether animal is a dog, cat or rabbit you might write this:
    data <- read.csv(
      switch(animal,
        "dog" = "dogdata.csv",
        "cat" = "catdata.csv",
        "rabbit" = "rabbitdata.csv"
      )
    )
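One thing worth knowing is that switch() quietly returns NULL when nothing matches, which would then produce a confusing error from read.csv(). The sketch below wraps the same mapping in a hypothetical helper, data_file_for(), and adds a default branch plus a fall-through case. The helper name and the "kitten" entry are assumptions added purely for illustration.

```r
# Hypothetical helper: map an animal name to the file that holds its data.
data_file_for <- function(animal) {
  switch(animal,
    "dog" = "dogdata.csv",
    "kitten" = ,                      # empty value: falls through to the next entry
    "cat" = "catdata.csv",
    "rabbit" = "rabbitdata.csv",
    stop("Unknown animal: ", animal)  # unnamed last argument acts as the default
  )
}

data_file_for("dog")     # "dogdata.csv"
data_file_for("kitten")  # "catdata.csv"
```

With the default in place, a typo such as data_file_for("dgo") fails straight away with a clear message instead of passing NULL on to read.csv().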
4. k-means on long data
k-means is an increasingly popular statistical method to cluster observations in data, often to simplify a large number of data points into a smaller number of clusters or archetypes. The kml package now allows k-means clustering to take place on longitudinal data, where the ‘data points’ are actually data series. This is super useful where the data points you are studying are actually readings over time. This could be the clinical observation of weight gain or loss in hospital patients or compensation trajectories of employees.
kml works by first transforming data into an object of the class ClusterLongData using the cld() function. Then it partitions the data using a 'hill climbing' algorithm, testing several values of k 20 times each. Finally, the choice() function allows you to view the results of the algorithm for each k graphically and decide what you believe to be an optimal clustering.
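The three steps above might look roughly like this in practice. This is only a sketch on simulated data: the weight trajectories, patient ids and object names are all invented, and it assumes the kml package is installed.

```r
# Assumption: kml is installed; the guard skips the example otherwise.
if (requireNamespace("kml", quietly = TRUE)) {
  library(kml)
  set.seed(42)

  # Simulated data: 30 patients weighed over 10 weeks, one row per patient.
  weeks <- 0:9
  gainers <- t(replicate(15, 70 + 0.8 * weeks + rnorm(10)))
  losers <- t(replicate(15, 70 - 0.8 * weeks + rnorm(10)))
  weights <- rbind(gainers, losers)

  # Step 1: wrap the trajectories in a ClusterLongData object
  weight_cld <- cld(
    traj = weights,
    idAll = paste0("patient", 1:30),
    time = weeks
  )

  # Step 2: try k = 2, 3 and 4, each from 20 different starting points.
  # kml() stores the results inside weight_cld rather than returning them.
  kml(weight_cld, nbClusters = 2:4, nbRedrawing = 20, toPlot = "none")

  # Step 3: explore the candidate partitions (opens an interactive graphical
  # session, so it is left commented out here)
  # choice(weight_cld)
}
```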
5. Text searching
If you’ve been using regular expressions to search for text that starts or ends with a certain character string, there’s an easier way. “startsWith() and endsWith() — did I really not know these?” tweeted data scientist Jonathan Carroll. “That’s it, I’m sitting down and reading through dox for every #rstats function.”
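Here is a quick illustration of both functions next to their regular expression equivalents. The file names are made up for the example.

```r
# Made-up file names for illustration
files <- c("dogdata.csv", "catdata.csv", "rabbitdata.csv", "dognotes.txt")

startsWith(files, "dog")  # TRUE FALSE FALSE TRUE
endsWith(files, ".csv")   # TRUE TRUE TRUE FALSE

# The regular expression versions need anchors and escaping
grepl("^dog", files)
grepl("\\.csv$", files)

# Combine both to keep only the dog csv files
dog_csv <- files[startsWith(files, "dog") & endsWith(files, ".csv")]
dog_csv                   # "dogdata.csv"
```

Both functions match the text literally, so there is no need to escape characters such as the dot.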
6. The req and validate functions in R Shiny
R Shiny development can be frustrating, especially when you get generic error messages that don’t help you understand what is going wrong under the hood. As Shiny develops, more and more validation and testing functions are being added to help better diagnose and alert when specific errors occur. The req() function allows you to prevent an action from occurring unless a required value is available (for instance, an input that the user has already set), but does so silently and without displaying an error. So you can make the display of UI elements conditional on previous actions. For example:
    output$go_button <- shiny::renderUI({
      # only display button if an animal input has been chosen
      shiny::req(input$animal)
      # display button
      shiny::actionButton("go", paste("Conduct", input$animal, "analysis!"))
    })
validate() checks before rendering output and enables you to return a tailored error message should a certain condition not be fulfilled, for example, if the user uploaded the wrong file:
    # render table only if it is dogs
    shiny::renderTable({
      # wait until a csv file has been uploaded
      inFile <- shiny::req(input$file1)
      # read the upload; check.names = FALSE keeps the space in "Dog Name"
      data <- read.csv(inFile$datapath, check.names = FALSE)
      # check that it is the dog file, not cats or rabbits
      shiny::validate(
        shiny::need(
          "Dog Name" %in% colnames(data),
          "Dog Name column not found - did you load the right file?"
        )
      )
      data
    })
7. revealjs
revealjs is a package which allows you to create beautiful presentations in HTML with an intuitive slide navigation menu, with embedded R code. It can be used inside R Markdown and has very intuitive HTML shortcuts to allow you to create a nested, logical structure of pretty slides with a variety of styling options. The fact that the presentation is in HTML means that people can follow along on their tablets or phones as they listen to you speak, which is really handy. You can set up a revealjs presentation by installing the package and then calling it in your YAML header. Here's an example YAML header of a talk I gave recently using revealjs:
    ---
    title: "Exploring the Edge of the People Analytics Universe"
    author: "Keith McNulty"
    output:
      revealjs::revealjs_presentation:
        center: yes
        template: starwars.html
        theme: black
    date: "HR Analytics Meetup London - 18 March, 2019"
    resource_files:
      - darth.png
      - deathstar.png
      - hanchewy.png
      - millenium.png
      - r2d2-threepio.png
      - starwars.html
      - starwars.png
      - stormtrooper.png
    ---
8. Datatables in RMarkdown or Shiny using DT
The DT package is an interface from R to the DataTables JavaScript library. This allows a very easy display of tables within a Shiny app or R Markdown document, with a lot of built-in functionality and responsiveness. It saves you from having to code separate data download functions, gives the user flexibility around the presentation and the ordering of the data, and has a data search capability built in. For example, a single command is enough to display the first rows of iris as an interactive table with a caption:
    DT::datatable(
      head(iris),
      caption = 'Table 1: This is a simple caption for the table.'
    )
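The download functionality mentioned above is not switched on by default. As far as I know it comes from the Buttons extension of DataTables, so a sketch of enabling it could look like this. The caption text and the choice of buttons are assumptions for illustration, and the snippet assumes the DT package is installed.

```r
# Assumption: DT is installed; the guard skips the example otherwise.
if (requireNamespace("DT", quietly = TRUE)) {
  table_widget <- DT::datatable(
    iris,
    caption = "Table 2: iris with copy and download buttons.",
    extensions = "Buttons",
    options = list(
      dom = "Bfrtip",                      # B places the buttons above the table
      buttons = c("copy", "csv", "excel")  # which export buttons to show
    )
  )
  # Printing table_widget in RStudio or an R Markdown chunk renders the table
}
```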
9. Pimp your RMarkdown with prettydoc
prettydoc is a package by Yixuan Qiu which offers a simple set of themes to create a different, prettier look and feel for your RMarkdown documents. This is super helpful when you just want to jazz up your documents a little but don't have time to get into the styling of them yourself. It's really easy to use. Simple edits to the YAML header of your document can invoke a specific style theme throughout the document, with numerous themes available. For example, this will invoke a lovely clean blue colouring and style across titles, tables, embedded code and graphics:
    ---
    title: "My doc"
    author: "Me"
    date: June 3, 2019
    output:
      prettydoc::html_pretty:
        theme: architect
        highlight: github
    ---
10. Get minimum and maximum values with a single command
No list of useful R functions you might not know would be complete without a quick way to find the minimum and maximum values in a vector. Base R’s range() function does just that, returning a 2-value vector with the lowest and highest values. The help file says range() works on numeric and character values, but I’ve also had success using it with date objects.
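A short sketch of range() on numbers and on dates. The values below are invented for the example.

```r
# Invented values for illustration
scores <- c(12, 7, 31, 19)
range(scores)                     # 7 31

# Works on Date objects too
visit_dates <- as.Date(c("2022-08-07", "2019-06-03", "2019-03-18"))
range(visit_dates)                # "2019-03-18" "2022-08-07"

# Missing values propagate unless you drop them explicitly
range(c(4, NA, 9), na.rm = TRUE)  # 4 9

# The spread between lowest and highest value
diff(range(scores))               # 24
```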
Originally published at https://pyoflife.com on August 7, 2022.