fbpx
Jupyter Notebook: Python or R—Or Both? Jupyter Notebook: Python or R—Or Both?
I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python... Jupyter Notebook: Python or R—Or Both?

I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python or R. Indeed, I like to self-demonstrate the power of each platform by recoding R work in Python and vice-versa.

I must have a dozen active notebooks, some developed in Python and some in R, that investigate the performance of the stock market, using indexes like the S&P 500, Wilshire 5000, and Russell 3000. These index data can be readily downloaded from the web, using core functionality of either of them, for subsequent analytic processing and display. I’ve learned a lot about both programs, and the stock market developing such notebooks. Alas, for the latest exercise I’d envisioned, I wished to look at Russell 3000 data, available in a Python notebook, through the lens of analytics I’d coded for the Wilshire 5000 in R. The mishmash of notebook kernel and stock market index combinations I possessed didn’t meet my needs.

[Related Article: Snakes in a Package: Combining Python and R with Reticulate]

I figured I had a couple of choices. I could either re-write the R graphics code in Python using the ever-improving Seaborn library, or I could adapt the R ggplot Wilshire code to interoperate with Russell 3000 data using the Python library rpy2 and RMagic—in effect engaging R within Python. I decided to do the latter.

R and Python are two of the top analytics platforms. Both are open source, both have large user bases, and both have incredibly productive ecosystems. In addition, they interoperate: Python developers can use the rpy2 library to include R code in their scripts, while R developers have access to Python via the reticulate package. There’s also the feather package, available in both, for sharing data across platforms. I fully expect even more seamless collaboration between them in the near future.

For the analysis that follows, I focus on the performance of the FTSE Russell 3000 index using Python. I first download two files—a year-to-date and a history, that provide final 3000 daily index levels starting in 2005. Attributes include index name, date, level without dividends reinvested, and level with dividends reinvested. I then wrangle the data to build the final Pandas dataframe. From there, I build R dataframes to show the growth of 1 invested over time suitable for ggplot2 charting.  <a href="https://opendatascience.com/introduction-to-r-shiny/" target="_blank" rel="noopener noreferrer"><em>[Related Article: Introduction to R Shiny]</em></a>  The technology used below is JupyterLab 0.32.1, Anaconda Python 3.6.5, Pandas 0.23.0, R 3.6.0, and rpy2 2.9.4. This notebook's kernel is Python 3 and uses the rpy2 library to enable R processing. In this instance, the initial data work is done in Python/Pandas, then handed off for graphics to the splendid R ggplot2 library. <div class="cell border-box-sizing text_cell rendered"> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Tone down warnings on pandas filters.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [1]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="kn">import</span> <span class="nn">warnings</span> <span class="n">warnings</span><span class="o">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s2">"ignore"</span><span class="p">)</span>  <span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n\n</span><span class="s2">"</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre></pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Add a local directory housing a personal library to the import path.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [2]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="kn">import</span> <span class="nn">sys</span>  <span class="n">functdir</span> <span class="o">=</span> <span class="s2">"c:/data/jupyter/notebooks/functions"</span> <span class="n">sys</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">functdir</span><span class="p">)</span>  <span class="nb">print</span><span class="p">(</span><span class="s2">"</span><span class="se">\n\n</span><span class="s2">"</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre></pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Load the personal library. The prarr and blanks functions are used in this article.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [3]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="kn">from</span> <span class="nn">myfuncs</span> <span class="k">import</span> <span class="o">*</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre></pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Import other relevant Python libraries.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [4]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre></pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Set and migrate to the new working directory.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [5]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">owd</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getcwd</span><span class="p">()</span>  <span class="n">nwd</span> <span class="o">=</span> <span class="s2">"c:/data/russell/potus"</span> <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">nwd</span><span class="p">)</span>  <span class="nb">print</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">getcwd</span><span class="p">())</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>c:\data\russell\potus    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Establish a list of url's of the CSV files to be downloaded from the FTSE Russell website. The two files are year-to-date (ytd) and history (hist) of the daily Russell 3000 index levels.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [6]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">urllst</span> <span class="o">=</span> <span class="p">[</span> <span class="s2">"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US3000.csv"</span><span class="p">,</span> <span class="s2">"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US3000.csv"</span> <span class="p">]</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre></pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Load the ytd file into a pandas dataframe. Do a little munging.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [7]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">ytd</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">urllst</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="n">ytd</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"portfolio"</span><span class="p">,</span><span class="s2">"date"</span><span class="p">,</span><span class="s2">"levwodiv"</span><span class="p">,</span><span class="s2">"levwdiv"</span><span class="p">]</span> <span class="n">ytd</span><span class="p">[</span><span class="s1">'portfolio'</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"r3000"</span> <span class="n">ytd</span><span class="p">[</span><span class="s1">'date'</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DatetimeIndex</span><span class="p">(</span><span class="n">ytd</span><span class="o">.</span><span class="n">date</span><span class="p">)</span>  <span class="n">ytd</span> <span class="o">=</span> <span class="n">ytd</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s1">'date'</span><span class="p">])</span><span class="o">.</span><span class="n">iloc</span><span class="p">[</span><span class="mi">1</span><span class="p">:,]</span>  <span class="n">ytd</span><span class="o">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span><span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="n">prarr</span><span class="p">(</span><span class="n">ytd</span><span class="p">)</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>  portfolio       date     levwodiv      levwdiv 0     r3000 2019-01-01  2704.369326  7733.105730 1     r3000 2019-01-02  2707.170883  7741.158427 2     r3000 2019-01-03  2643.051723  7559.582582 3     r3000 2019-01-04  2734.493258  7821.127295 4     r3000 2019-01-07  2757.993050  7888.401675       portfolio       date     levwodiv      levwdiv 118     r3000 2019-06-14  3117.538895  8996.253148 119     r3000 2019-06-17  3122.143259  9009.636226 120     r3000 2019-06-18  3153.227920  9099.614008 121     r3000 2019-06-19  3163.499294  9129.317529 122     r3000 2019-06-20  3191.645509  9211.282316    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Ditto for hist.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [8]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">hist</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">read_csv</span><span class="p">(</span><span class="n">urllst</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="n">hist</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"portfolio"</span><span class="p">,</span><span class="s2">"date"</span><span class="p">,</span><span class="s2">"levwodiv"</span><span class="p">,</span><span class="s2">"levwdiv"</span><span class="p">]</span> <span class="n">hist</span><span class="p">[</span><span class="s1">'portfolio'</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"r3000"</span> <span class="n">hist</span><span class="p">[</span><span class="s1">'date'</span><span class="p">]</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">DatetimeIndex</span><span class="p">(</span><span class="n">hist</span><span class="o">.</span><span class="n">date</span><span class="p">)</span>  <span class="n">hist</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s1">'date'</span><span class="p">],</span><span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="n">hist</span><span class="o">.</span><span class="n">reset_index</span><span class="p">(</span><span class="n">drop</span><span class="o">=</span><span class="kc">True</span><span class="p">,</span><span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="n">prarr</span><span class="p">(</span><span class="n">hist</span><span class="p">)</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>  portfolio       date     levwodiv      levwdiv 0     r3000 2005-01-03  1261.856142  2742.745480 1     r3000 2005-01-04  1245.475031  2707.401151 2     r3000 2005-01-05  1238.467380  2692.398250 3     r3000 2005-01-06  1242.776350  2702.305567 4     r3000 2005-01-07  1240.174458  2696.663529        portfolio       date     levwodiv      levwdiv 3645     r3000 2018-12-25  2536.218945  7248.537164 3646     r3000 2018-12-26  2662.170289  7608.685862 3647     r3000 2018-12-27  2682.958961  7668.843455 3648     r3000 2018-12-28  2681.191917  7665.856863 3649     r3000 2018-12-31  2704.369326  7733.105730    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Next, vertically "stack" hist and ytd into a "combine" dataframe.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [9]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">combine</span> <span class="o">=</span> <span class="n">pd</span><span class="o">.</span><span class="n">concat</span><span class="p">([</span><span class="n">ytd</span><span class="p">,</span><span class="n">hist</span><span class="p">])</span> <span class="n">combine</span><span class="o">.</span><span class="n">sort_values</span><span class="p">([</span><span class="s1">'date'</span><span class="p">],</span><span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="nb">print</span><span class="p">(</span><span class="n">combine</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span><span class="s2">"</span><span class="se">\n\n</span><span class="s2">"</span><span class="p">)</span> <span class="n">prarr</span><span class="p">(</span><span class="n">combine</span><span class="p">)</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>(3773, 4)      portfolio       date     levwodiv      levwdiv 0     r3000 2005-01-03  1261.856142  2742.745480 1     r3000 2005-01-04  1245.475031  2707.401151 2     r3000 2005-01-05  1238.467380  2692.398250 3     r3000 2005-01-06  1242.776350  2702.305567 4     r3000 2005-01-07  1240.174458  2696.663529       portfolio       date     levwodiv      levwdiv 118     r3000 2019-06-14  3117.538895  8996.253148 119     r3000 2019-06-17  3122.143259  9009.636226 120     r3000 2019-06-18  3153.227920  9099.614008 121     r3000 2019-06-19  3163.499294  9129.317529 122     r3000 2019-06-20  3191.645509  9211.282316    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Delete duplicate level records that emerge from holidays.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [10]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">combine</span><span class="o">.</span><span class="n">drop_duplicates</span><span class="p">(</span><span class="n">subset</span><span class="o">=</span><span class="p">[</span><span class="s1">'levwodiv'</span><span class="p">,</span><span class="s1">'levwdiv'</span><span class="p">],</span> <span class="n">inplace</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>  <span class="nb">print</span><span class="p">(</span><span class="n">combine</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span><span class="s2">"</span><span class="se">\n\n</span><span class="s2">"</span><span class="p">)</span> <span class="n">prarr</span><span class="p">(</span><span class="n">combine</span><span class="p">)</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>(3640, 4)      portfolio       date     levwodiv      levwdiv 0     r3000 2005-01-03  1261.856142  2742.745480 1     r3000 2005-01-04  1245.475031  2707.401151 2     r3000 2005-01-05  1238.467380  2692.398250 3     r3000 2005-01-06  1242.776350  2702.305567 4     r3000 2005-01-07  1240.174458  2696.663529       portfolio       date     levwodiv      levwdiv 118     r3000 2019-06-14  3117.538895  8996.253148 119     r3000 2019-06-17  3122.143259  9009.636226 120     r3000 2019-06-18  3153.227920  9099.614008 121     r3000 2019-06-19  3163.499294  9129.317529 122     r3000 2019-06-20  3191.645509  9211.282316    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Compute percent change attributes for both level with dividends and level without dividends.  </div> </div> </div> <div class="cell border-box-sizing code_cell rendered"> <div class="input"> <div class="prompt input_prompt">In [11]:</div> <div class="inner_cell"> <div class="input_area"> <div class=" highlight hl-ipython3"> <pre><span class="n">combine</span><span class="p">[</span><span class="s1">'pctwodiv'</span><span class="p">]</span> <span class="o">=</span> <span class="n">combine</span><span class="o">.</span><span class="n">levwodiv</span><span class="o">.</span><span class="n">pct_change</span><span class="p">()</span> <span class="n">combine</span><span class="p">[</span><span class="s1">'pctwdiv'</span><span class="p">]</span> <span class="o">=</span> <span class="n">combine</span><span class="o">.</span><span class="n">levwdiv</span><span class="o">.</span><span class="n">pct_change</span><span class="p">()</span>  <span class="nb">print</span><span class="p">(</span><span class="n">combine</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span><span class="s2">"</span><span class="se">\n\n</span><span class="s2">"</span><span class="p">)</span> <span class="n">prarr</span><span class="p">(</span><span class="n">combine</span><span class="p">)</span>  <span class="n">blanks</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> </pre> </div> </div> </div> </div> <div class="output_wrapper"> <div class="output"> <div class="output_area"> <div class="prompt"></div> <div class="output_subarea output_stream output_stdout output_text"> <pre>(3640, 6)      portfolio       date     levwodiv      levwdiv  pctwodiv   pctwdiv 0     r3000 2005-01-03  1261.856142  2742.745480       NaN       NaN 1     r3000 2005-01-04  1245.475031  2707.401151 -0.012982 -0.012886 2     r3000 2005-01-05  1238.467380  2692.398250 -0.005626 -0.005541 3     r3000 2005-01-06  1242.776350  2702.305567  0.003479  0.003680 4     r3000 2005-01-07  1240.174458  2696.663529 -0.002094 -0.002088       portfolio       date     levwodiv      levwdiv  pctwodiv   pctwdiv 118     r3000 2019-06-14  3117.538895  8996.253148 -0.002559 -0.002440 119     r3000 2019-06-17  3122.143259  9009.636226  0.001477  0.001488 120     r3000 2019-06-18  3153.227920  9099.614008  0.009956  0.009987 121     r3000 2019-06-19  3163.499294  9129.317529  0.003257  0.003264 122     r3000 2019-06-20  3191.645509  9211.282316  0.008897  0.008978    </pre> </div> </div> </div> </div> </div> <div class="cell border-box-sizing text_cell rendered"> <div class="prompt input_prompt"></div> <div class="inner_cell"> <div class="text_cell_render border-box-sizing rendered_html">  Write a csv file from the final dataframe. Calculate the growth of1 from the inception of the data.

In [12]:
out = "r3000pd.csv"
combine.to_csv(out,index=None, sep=',')
print(round((1+combine.pctwodiv).prod(),2))
print(round((1+combine.pctwdiv).prod(),2))
blanks(2)
2.53
3.36

Load the rpy2 (R within Python) module

In [13]:
%load_ext rpy2.ipython
blanks(2)

Import pertinent rpy2 libraries.

In [14]:
import rpy2                    
import rpy2.robjects.numpy2ri  
import rpy2.robjects as robjects
robjects.pandas2ri.activate()
blanks(2)

Create a version of the Pandas combine dataframe suitable for R processing.

In [15]:
r3000 = robjects.pandas2ri.py2ri(combine[['date','pctwdiv']])
blanks(2)

Take a peek at the R data.frame.

In [16]:
%R -i r3000 tail(r3000)
Out[16]:
datepctwdiv
02019-06-13 00:00:00-05:000.004958
12019-06-14 00:00:00-05:00-0.002440
22019-06-17 00:00:00-05:000.001488
32019-06-18 00:00:00-05:000.009987
42019-06-19 00:00:00-05:000.003264
52019-06-20 00:00:00-05:000.008978

Load relevant R libraries.

In [17]:
%R require(tidyverse); require(data.table); require(RColorBrewer); require(R.utils); require(lubridate)
Out[17]:
array([1], dtype=int32)

Create a cell of R processing. Push in the r3000 dataframe and commence R wrangling and graphics processing. Display the final chart.

In [18]:
%%R  -w700 -h700 -i r3000 
r3000 <- data.table(r3000)
mdte <- max(r3000[['date']])
dte <- substr(mdte,6,11)
tdates <- lubridate::date(paste(c("2019-","2011-","2015-"),dte,sep=""))
fdates <- lubridate::date(c("2017-01-20","2009-01-20","2013-01-21"))
#############################################
# function to build the data.table for ggplot.
#############################################
nmkreturn <- function(to,from,dt)
{
    
    rbind(
    data.table(potus='Trump',
           date=dt[date>=fdates[1] & date<=tdates[1]]</span><span class="kp">date</span><span class="p">,</span>            returnpct<span class="o">=</span>dt<span class="p">[</span><span class="kp">date</span><span class="o">>=</span>fdates<span class="p">[</span><span class="m">1</span><span class="p">]</span> <span class="o">&</span> <span class="kp">date</span><span class="o"><=</span>tdates<span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="kp">cumprod</span><span class="p">(</span><span class="m">1</span><span class="o">+</span>pctwdiv<span class="p">)]</span>           <span class="p">),</span>       data.table<span class="p">(</span>potus<span class="o">=</span><span class="s">'Obama 1'</span><span class="p">,</span>            date<span class="o">=</span>dt<span class="p">[</span><span class="kp">date</span><span class="o">>=</span>fdates<span class="p">[</span><span class="m">2</span><span class="p">]</span> <span class="o">&</span> <span class="kp">date</span><span class="o"><=</span>tdates<span class="p">[</span><span class="m">2</span><span class="p">]]</span><span class="o">date,
           returnpct=dt[date>=fdates[2] & date<=tdates[2],cumprod(1+pctwdiv)]
          ),
    data.table(potus='Obama 2',
            date=dt[date>=fdates[3] & date<=tdates[3]]</span><span class="kp">date</span><span class="p">,</span>             returnpct<span class="o">=</span>dt<span class="p">[</span><span class="kp">date</span><span class="o">>=</span>fdates<span class="p">[</span><span class="m">3</span><span class="p">]</span> <span class="o">&</span> <span class="kp">date</span><span class="o"><=</span>tdates<span class="p">[</span><span class="m">3</span><span class="p">],</span><span class="kp">cumprod</span><span class="p">(</span><span class="m">1</span><span class="o">+</span>pctwdiv<span class="p">)]</span>           <span class="p">)</span>       <span class="p">)</span>       <span class="p">}</span>  <span class="c1">#########################################</span> <span class="c1"># nwork is the final graphics data.table.</span> <span class="c1">#########################################</span>  work <span class="o"><-</span> nmkreturn<span class="p">(</span>tdates<span class="p">,</span>fdates<span class="p">,</span>r3000<span class="p">)</span> nwork <span class="o"><-</span> data.table<span class="p">(</span>left_join<span class="p">(</span>work<span class="p">,</span>work<span class="p">[,</span><span class="m">.</span><span class="p">(</span>legend<span class="o">=</span><span class="kp">paste</span><span class="p">(</span>potus<span class="p">,</span><span class="kp">date</span><span class="p">[</span><span class="m">1</span><span class="p">],</span> <span class="kp">date</span><span class="p">[</span><span class="m">.</span>N<span class="p">],</span> <span class="s">"\n"</span><span class="p">,</span>sep<span class="o">=</span><span class="s">"\n"</span><span class="p">)),</span><span class="m">.</span><span class="p">(</span>potus<span class="p">)],</span>by<span class="o">=</span><span class="kt">c</span><span class="p">(</span><span class="s">"potus"</span><span class="o">=</span><span class="s">"potus"</span><span class="p">)))</span>    X <span class="o"><-</span> nwork<span class="p">[,</span><span class="m">.</span><span class="p">(</span><span class="kp">date</span><span class="p">[</span><span class="m">.</span>N<span class="p">],</span>returnpct<span class="o">=</span><span class="kp">round</span><span class="p">(</span><span class="m">100</span><span class="o">*</span><span class="p">(</span>returnpct<span class="p">[</span><span class="m">.</span>N<span class="p">])</span><span class="m">-100</span><span class="p">)),</span><span class="m">.</span><span class="p">(</span>potus<span class="p">)]</span><span class="o">V1
Y <- nwork[,.(date[.N],returnpct=round(100*(returnpct[.N])-100)),.(potus)]$returnpct
###############################
# save data to an rds data set.
###############################
ofile = "r3000.rds"
save(r3000,X,work,nwork,file=ofile)
 
###########################################
# set parm vars and execute the ggplot code.
###########################################
titstr <- paste("Russell 3000 Returns", " thru ", mdte,sep="")
stitstr <- "Trump vs Obama\n"
xstr <- "\nYear"
ystr <- "Growth %\n"
cstr <- "Administration\n"
pal <- brewer.pal(9,"Blues")
g <- ggplot(nwork,aes(x=date,y=100*(returnpct-1), col=legend)) +
      geom_line(size=.7) +
      theme(legend.position = "right", plot.background = element_rect(fill = pal[2]), 
      panel.background = element_rect(fill = pal[2])) +
      theme(axis.text.x = element_text(size=10, angle = 45)) +
      labs(title=titstr,subtitle=stitstr,x=xstr,y=ystr,col=cstr) + 
      annotate("text", x = X+100*24*60*60, y = Y, label = Y, size=4) 
g

28 month stock market performance is solid for 45. Comparable period performance is even better for each administration of 44.

About author
Steve Miller

Steve Miller

Steve is a data scientist focusing on research methods, data integration, statistics/ML, and computational analytics in R/python. Follow him on LinkedIn: https://www.linkedin.com/in/steve-miller-58ab881/

1