I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python or R. Indeed, I like to demonstrate the power of each platform to myself by recoding R work in Python and vice versa.
I must have a dozen active notebooks, some developed in Python and some in R, that investigate the performance of the stock market, using indexes like the S&P 500, Wilshire 5000, and Russell 3000. These index data can be readily downloaded from the web, using core functionality of either platform, for subsequent analytic processing and display. I’ve learned a lot about both platforms, and about the stock market, developing such notebooks. Alas, for the latest exercise I’d envisioned, I wished to look at Russell 3000 data, available in a Python notebook, through the lens of analytics I’d coded for the Wilshire 5000 in R. The mishmash of notebook kernel and stock market index combinations I possessed didn’t meet my needs.
[Related Article: Snakes in a Package: Combining Python and R with Reticulate]
I figured I had a couple of choices. I could either rewrite the R graphics code in Python using the ever-improving Seaborn library, or I could adapt the R ggplot2 Wilshire code to interoperate with Russell 3000 data using the Python library rpy2 and RMagic—in effect engaging R within Python. I decided to do the latter.
R and Python are two of the top analytics platforms. Both are open source, both have large user bases, and both have incredibly productive ecosystems. In addition, they interoperate: Python developers can use the rpy2 library to include R code in their scripts, while R developers have access to Python via the reticulate package. There’s also the feather package, available in both, for sharing data across platforms. I fully expect even more seamless collaboration between them in the near future.
For the analysis that follows, I focus on performance of the FTSE Russell 3000 index using Python. I first download two files, a year-to-date file and a history file, that together provide final Russell 3000 daily index levels starting in 2005. Attributes include index name, date, level without dividends reinvested, and level with dividends reinvested. I then wrangle the data to build the final Pandas dataframe. From there, I build R dataframes showing the growth of $1 invested over time, suitable for ggplot2 charting.
[Related Article: Introduction to R Shiny]
The technology used below is JupyterLab 0.32.1, Anaconda Python 3.6.5, Pandas 0.23.0, R 3.6.0, and rpy2 2.9.4. This notebook’s kernel is Python 3 and uses the rpy2 library to enable R processing. In this instance, the initial data work is done in Python/Pandas, then handed off for graphics to the splendid R ggplot2 library.
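As a sketch of how the handoff works: in a Python-kernel notebook, loading the rpy2 IPython extension enables RMagic, after which `%%R` cells run R code and `-i` pushes Python objects into the R session. These are notebook cell magics, not plain-Python statements, and `combine` here refers to the dataframe built at the end of the article.

```
# Cell 1: enable RMagic (requires rpy2 installed alongside the Python kernel)
%load_ext rpy2.ipython

# Cell 2: push the pandas dataframe into R and chart it with ggplot2
%%R -i combine
library(ggplot2)
ggplot(combine, aes(x = date, y = idxdiv)) + geom_line()
```

The exact aesthetics used in the article's charts will differ; this only illustrates the Python-to-R plumbing.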
Tone down warnings on pandas filters.
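A minimal sketch of that cell, assuming the target is the chained-assignment warning pandas raises on some filter-then-assign patterns:

```python
import warnings

import pandas as pd

# Silence pandas' SettingWithCopyWarning on filter/assignment patterns,
# plus generic FutureWarnings, to keep notebook output tidy.
pd.set_option("mode.chained_assignment", None)
warnings.filterwarnings("ignore", category=FutureWarning)
```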
Add a local directory housing a personal library to the import path.
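Something like the following, where `~/mylib` is a hypothetical stand-in for the author's actual library directory:

```python
import os
import sys

# Hypothetical directory holding the personal library -- adjust to taste.
libdir = os.path.expanduser("~/mylib")
if libdir not in sys.path:
    sys.path.append(libdir)  # make its modules importable
```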
Load the personal library. The prarr and blanks functions are used in this article.
Import other relevant Python libraries.
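A plausible import cell for the work that follows (the article's exact list may differ):

```python
# Core libraries used throughout the notebook.
import os
import sys
import warnings

import numpy as np
import pandas as pd
```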
Set and migrate to the new working directory.
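For example, with a hypothetical directory name:

```python
import os

# Hypothetical working directory for downloads and output.
wdir = os.path.join(os.path.expanduser("~"), "data", "russell3000")
os.makedirs(wdir, exist_ok=True)  # create it if it doesn't exist yet
os.chdir(wdir)                    # migrate there
```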
Establish a list of URLs for the CSV files to be downloaded from the FTSE Russell website. The two files are year-to-date (ytd) and history (hist) of the daily Russell 3000 index levels.
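The real FTSE Russell download links aren't reproduced here, so placeholders stand in for them:

```python
# Placeholder URLs -- substitute the actual FTSE Russell download links.
urls = {
    "ytd":  "https://example.com/russell3000_ytd.csv",
    "hist": "https://example.com/russell3000_hist.csv",
}
```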
Load the ytd file into a pandas dataframe. Do a little munging.
Ditto for hist.
Next, vertically “stack” hist and ytd into a “combine” dataframe.
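The stack itself is a `pd.concat`, shown here with toy frames standing in for the real hist and ytd dataframes; dropping duplicate dates guards against overlap between the two files.

```python
import pandas as pd

# Toy frames standing in for hist and ytd; real data comes from the CSVs.
hist = pd.DataFrame({"date": pd.to_datetime(["2005-01-03", "2005-01-04"]),
                     "idxdiv": [1000.0, 1001.5]})
ytd = pd.DataFrame({"date": pd.to_datetime(["2019-01-02", "2019-01-03"]),
                    "idxdiv": [2200.0, 2210.4]})

# Vertically stack, drop any dates duplicated across the two files,
# and keep everything in chronological order.
combine = (pd.concat([hist, ytd], ignore_index=True)
             .drop_duplicates(subset="date")
             .sort_values("date")
             .reset_index(drop=True))
```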