Contributing to the Tidyverse (dbplyr)
Illustrations by Allison Horst, artist in residence at RStudio
I was acknowledged as a contributor to the version 2.0.0 release of dbplyr!
dbplyr is the database backend for dplyr (‘data pliers’),
the data manipulation package in
the tidyverse software suite for the R
statistical programming language.
Or to describe from ‘top-down’:
R is a computer programming language used by statisticians and others who want to interpret data.
tidyverse is a collection of software packages for the R language which makes it easier for R users to manipulate and process data. So much easier that R is now taught to liberal arts post-graduate students to analyze data, e.g. for environmental studies at Harvard Extension School. These students often have no prior experience in computer programming.
The tidyverse was largely the creation of a New Zealander, Hadley Wickham, and it looks like he is the chief maintainer of the tidyverse software. Like R, tidyverse is ‘open source’, freely available for use and modification, and contributed to by many enthusiasts in the data science community.
dplyr is a software package in the tidyverse collection which does many of the common data manipulation tasks, such as filtering, changing, sorting, summarizing and selection.
dbplyr allows dplyr to interact with database backends.
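To make the dplyr and dbplyr descriptions a little more concrete, here is a minimal sketch of dplyr verbs being turned into SQL by dbplyr. It assumes the dplyr and dbplyr packages are installed. The table `patients` and its columns `id`, `age` and `clinic` are made up for illustration, and no real database is contacted because `lazy_frame()` and `simulate_mssql()` only simulate one.

```r
library(dplyr)
library(dbplyr)

# A stand-in table (hypothetical columns) that never touches a real database.
# simulate_mssql() asks dbplyr to write SQL in the Microsoft SQL Server dialect.
patients <- lazy_frame(
  id = 1L, age = 40, clinic = "A",
  con = simulate_mssql()
)

query <- patients %>%
  filter(age >= 18) %>%    # filtering
  select(id, clinic) %>%   # selection
  arrange(clinic)          # sorting

# Print the SQL that dbplyr would send to the database
show_query(query)
```

The same dplyr code would run unchanged on an ordinary in-memory data frame, which is the convenience dbplyr offers.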

My contributions to the free and open-source dbplyr are (ironically) related
to dbplyr operation with Microsoft SQL Server ‘MSSQL’.
To Microsoft’s credit, the basic versions of Microsoft SQL Server are freely
available, as are client libraries (for use in Linux), and Microsoft also provides
extensive freely available documentation.
As of 21st December 2020, my two accepted contributions (‘pull requests’) are:
Cast as.double and as.numeric to FLOAT instead of NUMERIC
In MSSQL, NUMERIC with no scale specified converts floating point numbers to integers, which is not what is intended for as.double and as.numeric in R.
Use try_cast instead of cast for MSSQL version 11+ (2012+)
In MSSQL, try_cast allows more elegant handling of invalid entries. try_cast returns NULL, which R shows as NA (not available), in situations where cast will return an error.
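The effect of both changes can be inspected without a real SQL Server. The sketch below assumes a dbplyr release (2.0.0 or later) in which `simulate_mssql()` accepts a `version` argument. The column name `x` is hypothetical.

```r
library(dbplyr)

# Simulated connections: no real SQL Server is contacted.
mssql_2012 <- simulate_mssql(version = "11.0")  # SQL Server 2012
mssql_2008 <- simulate_mssql(version = "10.0")  # older than version 11

# Translate R's as.numeric() into SQL for each simulated server version.
# `x` is a hypothetical column name used only for illustration.
sql_2012 <- translate_sql(as.numeric(x), con = mssql_2012)
sql_2008 <- translate_sql(as.numeric(x), con = mssql_2008)

print(sql_2012)
print(sql_2008)
```

If the translation behaves as described above, both results should cast to FLOAT, with the version 11 connection using try_cast and the older one falling back to cast.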
As of 21st December 2020, I also have a currently open contribution (‘pull request’) to fix an error in my second contribution.
What I really would like to say is just how friendly Hadley Wickham and others
have been in helping me contribute to and improve dbplyr.
Both in initial discussion and in the process of doing a ‘pull request’, Hadley and Kirill Müller have answered the simplest of queries, amended my super-clumsy code and really encouraged me along! Hadley is an adjunct professor and something of a data science legend. I have not attended a formal computer programming class at high school, university or trade school, so I’m really humbled to feel like a valued contributor to the data science world.

(And why am I so interested in improving the operation of dbplyr with MSSQL?
It is because I use dbplyr/dplyr to interrogate the Best Practice
electronic medical record patient information database with my ‘near future’
patient care quality improvement tool GPstat!.)