Bridging Python to R#
If you’ve been through my section on correlations, you’ve likely come across at least one instance of me bridging from Python to R. This is the act of pulling R into a Python environment and using R objects and R libraries. I commonly use bridging for certain types of correlations (polychoric) and statistical tests (ART ANOVA and its post hoc tests). Now, I don’t expect bridging will be a common activity for most people reading this, but all the same it’s worth understanding how to do it.
In general, the steps of bridging include:
Installing (if necessary) a bridging package, like rpy2, and importing it.
Install and then import R packages for use.
Create data structures and pass them back and forth as needed.
Perform analyses not easily performed in Python.
As you can see, it really is using Python and R seamlessly in the same workflow.
Since my preferred tool for this is rpy2, I want to walk through some of the basics of setting up for bridging with it. If you’re working primarily in Python, you may find that you don’t have R installed. If that’s the case, go here: R Project for Statistical Computing. You’ll have to choose a download mirror, but it’s very easy to find everything you need on the site.
If you have both Python and R installed, move onto the easiest part of it all: Installation.
# Installation via pip
pip install rpy2
# Installation via conda
conda install conda-forge::rpy2
If you run into problems during your installation, confirm the following:
Your Python version is 3.7 or higher.
Your R version is 4.0 or higher.
Next you can import the most basic modules. See the comments embedded within the code snippets for explanations of their purpose.
# Supporting Python imports
import pandas as pd
import numpy as np
# rpy2 Imports
import rpy2 #Your base import
import rpy2.robjects as ro #objects submodule. You need this to declare R objects in your workflow.
from rpy2.robjects.packages import importr, isinstalled #Helps you manage your R installations
from rpy2.robjects import pandas2ri #Facilitates interoperability between R and Python for dataframes.
from rpy2.robjects.conversion import localconverter #Assists with converting Python to R.
Now that you have all of the basics installed for your bridging work, you can write the code to install all of the R libraries you will need. In the scenario below, imagine that you want to install two libraries: ARTool and emmeans. The comments in the code snippet below will explain what is happening.
# Import all of the needed R libraries
utils = importr('utils')
# Checks to see if ARTool exists in the R environment. If not, it is installed. If it exists, moves on.
if not isinstalled('ARTool'):
utils.install_packages('ARTool')
# Checks to see if emmeans exists in the R environment. If not, it is installed, but will skip it if it exists.
if not isinstalled('emmeans'):
utils.install_packages('emmeans')
# Specifies the name of the ARTool package in the R environment.
artool = importr('ARTool')
# Specifies the name of the emmeans package in the R environment.
emmeans_pkg = importr('emmeans')
More than likely you’re going to be manipulating datapoints captured in a dataframe. That means that you will want to convert your Python dataframe to an R dataframe:
# Convert to an R dataframe
with localconverter(ro.default_converter + pandas2ri.converter):
r_df = ro.conversion.py2rpy(df)
I want to take a moment and discuss conversions briefly. Converter objects, like you see above with default_converter, bundle together conversion rules. In rpy2, conversion of pandas objects into R objects can either be done automatically, or they can be controlled within a context. If you want the conversions to proceed automatically, you would use activate with something like pandas2ri:
# Allows for automatic conversion
pandas2ri.activate()
# Stops the automatic conversion
pandas2ri.dectivate()
If you want more control over the conversions, though, you would create a context. A context is a set of specific rules you can specify for interactions between Python and R, like conversions. I recommend using a context to control your conversions between Python and R to reduce the chance of throwing errors and having to waste time on debugging them. My original example using with localconverter(ro.default_converter + pandas2ri.converter has a context.
Something else to consider when you’re using R, especially if any columns in your dataframe will act as categories, is that you will need to do one of two things:
Convert the relevant columns to dtype of category in Python
Convert them to factors after converting your Python dataframe to an R dataframe.
I’ll assume that you know how to do the first option, so here is how you would convert a column in an R dataframe to a factor datatype:
# Convert col to factor
ro.r(f'{r_df_name}${col} <- as.factor({r_df_name}${col})')
In the above line of code, you may notice ```ro.r()``. This is a bridge to the interpreter. It alerts the interpreter that the string within the parentheses should be interpreted as R code. Using it enables you to define and access R variables, functions, and expressions within your Python environment. Let me show you a more detailed example where you would access a native R function.
# Define a vector of float objects
my_floats = ro.r(FloatVector([1.0, 1.75, 2.5, 3.25, 4.0]))
# Take the average using R's mean function - mean(x, trim=0, na.rm=FALSE)
mean_r_func = ro.r['mean']
# Apply mean_r_func
my_mean = mean_r_func(my_floats)
This covers the basics of bridging Python to R with rpy2. If you want to see an introduction with custom Python functions using rpy2, you can read my article on Medium: A Brief Introduction to rpy2