

Jupyter Notebook: Python or R—Or Both?
Posted by Steve Miller, June 13, 2019

I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python or R. Indeed, I like to self-demonstrate the power of each platform by recoding R work in Python and vice-versa.
I must have a dozen active notebooks, some developed in Python and some in R, that investigate the performance of the stock market using indexes like the S&P 500, Wilshire 5000, and Russell 3000. These index data can be readily downloaded from the web, using core functionality of either language, for subsequent analytic processing and display. I’ve learned a lot about both languages, and about the stock market, developing such notebooks. Alas, for the latest exercise I’d envisioned, I wished to look at Russell 3000 data, available in a Python notebook, through the lens of analytics I’d coded for the Wilshire 5000 in R. The mishmash of notebook kernel and stock market index combinations I possessed didn’t meet my needs.
I figured I had a couple of choices. I could either rewrite the R graphics code in Python using the ever-improving Seaborn library, or I could adapt the R ggplot Wilshire code to interoperate with Russell 3000 data using the Python library rpy2 and its R magic commands, in effect engaging R within Python. I decided to do the latter.
R and Python are two of the top analytics platforms. Both are open source, both have large user bases, and both have incredibly productive ecosystems. In addition, they interoperate: Python developers can use the rpy2 library to include R code in their scripts, while R developers have access to Python via the reticulate package. There’s also the feather package, available in both, for sharing data across platforms. I fully expect even more seamless collaboration between them in the near future.
For the analysis that follows, I focus on the performance of the FTSE Russell 3000 index using Python. I first download two files, a year-to-date file and a history file, that together provide final daily Russell 3000 index levels starting in 2005. Attributes include index name, date, level without dividends reinvested, and level with dividends reinvested. I then wrangle the data to build the final Pandas dataframe. From there, I build R dataframes to show the growth of 1 from the inception of the data.
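The "growth of 1" idea can be illustrated with a small sketch. It assumes daily percent changes are stored as decimal fractions (0.01 for 1%), as the pctwdiv column appears to be. The sample returns and the helper names growth_of_one and growth_path are made up for illustration and do not come from the notebook.

```python
from itertools import accumulate
from math import prod
from operator import mul


def growth_of_one(pct_changes):
    """Final value of 1 unit invested, compounding each period's return."""
    return prod(1 + p for p in pct_changes)


def growth_path(pct_changes):
    """Running value of 1 unit after each period (a cumulative product)."""
    return list(accumulate((1 + p for p in pct_changes), mul))


# Illustrative daily returns as decimal fractions; not real index data.
sample = [0.01, -0.02, 0.03]

final_value = growth_of_one(sample)
path = growth_path(sample)
print(round(final_value, 4))
print([round(v, 4) for v in path])
```

With pandas, the same quantities are (1 + s).prod() and (1 + s).cumprod() for a Series s, which is the pattern the notebook code relies on.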
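# Write the combined dataframe to CSV without the index column.
out = "r3000pd.csv"
combine.to_csv(out, index=False, sep=',')
# Growth of 1 over the full period: without, then with, dividends reinvested.
print(round((1+combine.pctwodiv).prod(),2))
print(round((1+combine.pctwdiv).prod(),2))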
blanks(2)
Load the rpy2 (R within Python) IPython extension.
%load_ext rpy2.ipython
blanks(2)
Import pertinent rpy2 libraries.
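import rpy2
import rpy2.robjects as robjects
import rpy2.robjects.numpy2ri
import rpy2.robjects.pandas2ri  # explicit import so robjects.pandas2ri is defined
# Turn on automatic conversion between pandas and R data frames.
robjects.pandas2ri.activate()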
blanks(2)
Create a version of the Pandas combine dataframe suitable for R processing.
# py2ri is the rpy2 2.x name; rpy2 3.x renamed it py2rpy.
r3000 = robjects.pandas2ri.py2ri(combine[['date','pctwdiv']])
blanks(2)
Take a peek at the R data.frame.
%R -i r3000 tail(r3000)
Load relevant R libraries.
%R require(tidyverse); require(data.table); require(RColorBrewer); require(R.utils); require(lubridate)
Create a cell of R processing. Push in the r3000 dataframe and commence R wrangling and graphics processing. Display the final chart.
%%R -w700 -h700 -i r3000
r3000 <- data.table(r3000)
mdte <- max(r3000[['date']])
dte <- substr(mdte,6,11)
tdates <- lubridate::date(paste(c("2019-","2011-","2015-"),dte,sep=""))
fdates <- lubridate::date(c("2017-01-20","2009-01-20","2013-01-21"))
#############################################
# function to build the data.table for ggplot.
#############################################
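nmkreturn <- function(to,from,dt)
{
rbind(
data.table(potus='Trump',
date=dt[date>=fdates[1] & date<=tdates[1]]$date,
returnpct=dt[date>=fdates[2] & date<=tdates[2],cumprod(1+pctwdiv)]
),
data.table(potus='Obama 2',
date=dt[date>=fdates[3] & date<=tdates[3]]$date,
# ... the source is truncated here: the rest of nmkreturn() and the code that
# builds work, nwork, and X are missing (only a trailing $V1 survives) ...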
Y <- nwork[,.(date[.N],returnpct=round(100*(returnpct[.N])-100)),.(potus)]$returnpct
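###########################################################################
# save the objects to a file. save() writes RData format (read back with
# load(), not readRDS()), despite the .rds extension used here.
###########################################################################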
ofile = "r3000.rds"
save(r3000,X,work,nwork,file=ofile)
###########################################
# set parm vars and execute the ggplot code.
###########################################
titstr <- paste("Russell 3000 Returns", " thru ", mdte,sep="")
stitstr <- "Trump vs Obama\n"
xstr <- "\nYear"
ystr <- "Growth %\n"
cstr <- "Administration\n"
pal <- brewer.pal(9,"Blues")
g <- ggplot(nwork,aes(x=date,y=100*(returnpct-1), col=legend)) +
geom_line(size=.7) +
theme(legend.position = "right", plot.background = element_rect(fill = pal[2]),
panel.background = element_rect(fill = pal[2])) +
theme(axis.text.x = element_text(size=10, angle = 45)) +
labs(title=titstr,subtitle=stitstr,x=xstr,y=ystr,col=cstr) +
annotate("text", x = X+100*24*60*60, y = Y, label = Y, size=4)
g
Twenty-eight-month stock market performance is solid for the 45th president (Trump). Performance over the comparable period is even better for each of the two administrations of the 44th (Obama).
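The comparable-period logic in the R cell can be sketched in Python. This is a minimal illustration under stated assumptions, not the notebook's code: the helper names window_end and window_growth_pct are hypothetical, and the toy returns are invented. Each window runs from an inauguration date through the month and day of the latest data point in a fixed end year, and the result is rounded the way Y is in the R code.

```python
from datetime import date
from math import prod


def window_end(latest, end_year):
    """Same month and day as the latest data point, in the given year.

    Note: raises ValueError if latest is Feb 29 and end_year is not a leap year.
    """
    return date(end_year, latest.month, latest.day)


def window_growth_pct(rows, start, end):
    """Rounded percent growth of 1 over [start, end]; rows are (date, daily_return)."""
    factors = [1 + r for d, r in rows if start <= d <= end]
    return round(100 * prod(factors) - 100)


# Toy (date, daily return) pairs; invented for illustration only.
rows = [
    (date(2009, 1, 20), 0.10),
    (date(2011, 5, 31), 0.10),
    (date(2011, 6, 1), 0.50),  # falls after the window end, so it is excluded
]

latest = date(2019, 5, 31)      # stands in for max(date) in the data
end = window_end(latest, 2011)  # end of the comparable window in 2011
growth = window_growth_pct(rows, date(2009, 1, 20), end)
print(end, growth)
```

Anchoring every window to the same month and day keeps the periods the same length, so the growth figures for the three administrations are directly comparable.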