Data Science Workflows in R

An introduction to deploying production-quality R code

Author: Dean Marchiori

Published: October 20, 2024

Preface

This guide is a resource for data analysts and data scientists looking to improve the way they write R code. R remains a popular choice for statistical modelling and data analysis, and its rapid development now allows users to take their work all the way through to deployment in production. However, the work done when conducting experiments and developing models is very different from packaging that work up so it can reliably drive decisions in an organisation.

This book provides an overview of contemporary frameworks for how data analysis is done in practice. It covers how R projects are usually structured and how that structure can evolve as a project grows in complexity. It examines what is meant by experimental versus production analysis code, and which principles need to be adopted. Finally, it shows current tools and frameworks for taking experimental R code and strengthening it to align with best practice for reliable, production-grade software. Readers can step through a case study and download code to follow along.

Who is this for?

This book is intended as an introductory guide for R users who have experience writing code and fitting models but want to improve how they translate those models into robust, reliable code that is used to make real-world decisions.

This material was initially developed for various workshops on R code development and MLOps, and its primary purpose was to serve as a companion to those workshops.

Warning

This is a work in progress and should be considered a DRAFT. The contents will (hopefully) change and evolve over time. I would welcome any early feedback on the content, or suggestions for interesting new additions.

Terminology

Throughout this book, the terms ‘data science’ and ‘data analysis’ are used interchangeably to mean the application of advanced analytical and statistical techniques to data to achieve an outcome. The term ‘analyst’ is used for the person carrying out these tasks.

Contact Me

If you would like to get in touch, head over to deanmarchiori.com.

Contributing

Contributions to this work are welcome via Issues on the GitHub page.

Please note that this project uses a Contributor Code of Conduct. By contributing to this book, you agree to abide by its terms.

Licence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.