R User Community Worldwide: A Data-Driven Exploration
Posted by R Central, September 26, 2019
This post discusses the worldwide R user community. R is a programming language and environment for statistical computing and data visualization. An important component of the R ecosystem is its powerful user community, which has continued to expand around the world over the years. In a previous blog post, we announced an open-source dynamic dashboard dedicated to R-Ladies (a worldwide organization that promotes gender diversity in the R community). We have now extended that work to all R user groups organized on Meetup. Our new dashboard uses the Meetup API to retrieve public information about the R user groups (RUGs) organized on that platform.
In this article, we present our motivation for this work, the challenges we faced and the tools and procedures used to surmount those challenges, and highlights from our final product.
Why a dashboard to explore R user groups globally now?
- We sought to present an objective representation of R’s popularity that would inform members of the data science community about its growth and activities. By making our solution open-source and data-driven, we maximized transparency and enhanced trustworthiness. This dashboard also helps leaders understand their user groups in a broader context while revealing opportunities for potential leaders to initiate new groups. Finally, the depiction of global distribution contributes to an important and evolving story about trends and opportunities in different parts of the world.
- Communities may flourish more easily when a unified, easily accessible, and trusted resource presents information about their members and activities. Information about R user groups is spread across several sources, and each source, while valuable, displays only a portion of the available information. Some of the sources we found are:
We therefore built a more comprehensive resource that aggregates much of the data from the last three sources listed above, all of which are organized on the Meetup platform. Our next goal is to represent groups not organized on Meetup; Jumping Rivers provides a valuable list. Please reach out to us if you know of other resources we could explore.
- For decision making, R-centered organizations may need to measure the geographic presence of R users and groups for event planning, diversity programs, and other activities that could expand the use of R in under-represented regions.
New and existing R users gain a resource through which they can find groups near them and learn how to get involved.
The R Consortium
Two of the three top-level R Consortium projects focus on R user groups. For the R Consortium, it may therefore be useful to map local chapters labelled by their activity level. The RUG dashboard shows active, inactive, and unbegun R user groups as map markers, and links to the R Consortium’s RUGS grant program.
Classifying R User Community Groups
Classifying RUGs on Meetup.com was quite challenging, for reasons including:
- Some R user groups do not list “R” among their topics or areas of focus, or list only “R Project for Statistical Computing”. These include groups named “Location” + “R User Group” (e.g. “Las Vegas R User Group”) and “Location” + “Data Science” or “Analytics” (e.g. “Charleston Data Science”).
- Other groups have Python, Julia, KNIME, Hadoop, etc. in their Meetup names, yet mention “R Project for Statistical Computing” in their topics, or mention R alongside other languages in their names and topics. Searching Meetup names alone, without searching the topics field, excluded many such groups.
- Apart from “R Project for Statistical Computing”, many groups identify with the R language by referencing phrases such as “Programming in R”, “Data Science using R”, “R Programming Language”.
- We also found unexpected name stylings, such as useRs, PhillyR, and BelgradeR.
It was significantly easier to find R-Ladies groups on Meetup because R-Ladies maintains a consistent naming structure across all groups: “R-Ladies” + “Location” (e.g. R-Ladies Madison). A single find_groups() call from the meetupr package was sufficient.
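A fixed naming pattern is what makes a single query enough. The Python sketch below only illustrates that idea; it is not the project's R code, and the regular expression and the function name `is_rladies_name` are our own hypothetical illustration rather than part of meetupr.

```python
import re

# Matches names of the form "R-Ladies <Location>", e.g. "R-Ladies Madison".
# Case-insensitive; requires whitespace and at least one more character
# after "R-Ladies".
RLADIES_NAME = re.compile(r"^r-ladies\s+\S", re.IGNORECASE)


def is_rladies_name(name: str) -> bool:
    """Return True if a Meetup group name follows the R-Ladies naming scheme."""
    return RLADIES_NAME.match(name.strip()) is not None


names = ["R-Ladies Madison", "Las Vegas R User Group", "R-Ladies Lagos", "Charleston Data Science"]
print([n for n in names if is_rladies_name(n)])
# ['R-Ladies Madison', 'R-Ladies Lagos']
```

No comparable single pattern covers general R user groups, which is why the approach below falls back on several rounds of string matching.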
Curtis Kephart generously provided a key to solving this problem: a string-matching approach he uses to retrieve upcoming R events. His technique retrieves all Meetup groups associated with “data-science” and filters the results for R user groups by searching their Meetup names for strings commonly associated with R user groups. We extended his approach to:
- Retrieve all data science groups (7,700+) and all data analysis groups (1,190+) on Meetup, and use string matching to select groups whose Meetup URL names contain strings like “r user”, “r-user”, “r-lab”, “phillyr”, “rug”, “bioconductor”, and “r-data”. For each set, we then performed a second round of string matching on the groups’ topics field, searching for strings like “programming-in-r”, “r-programming-”, “-using-r”, “r-language”, and “r-project-for-statistical”.
- Separately retrieve all user groups on Meetup that mention “r-project-for-statistical-computing” in their topics.
- Retrieve all R-Ladies groups separately because some were left out by the aforementioned matches.
We then removed duplicates from the aggregated data frame.
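The two rounds of string matching and the final de-duplication can be sketched as follows. This is a language-neutral illustration written in Python, not the project's R code. It assumes each group is a dict with hypothetical `id`, `urlname`, and `topics` keys (topics given as URL-style slugs), and that a group is kept if it passes either round; the pattern lists are the ones quoted above.

```python
# Hypothetical record layout: {"id": int, "urlname": str, "topics": [str, ...]}
NAME_PATTERNS = ["r user", "r-user", "r-lab", "phillyr", "rug", "bioconductor", "r-data"]
TOPIC_PATTERNS = ["programming-in-r", "r-programming-", "-using-r", "r-language",
                  "r-project-for-statistical"]


def matches_name(group):
    """Round 1: search the Meetup URL name for R-related substrings."""
    urlname = group["urlname"].lower()
    return any(p in urlname for p in NAME_PATTERNS)


def matches_topics(group):
    """Round 2: search every topic slug for R-related substrings."""
    return any(p in topic.lower() for topic in group["topics"] for p in TOPIC_PATTERNS)


def select_r_groups(groups):
    """Keep groups that pass either round of matching (an assumption)."""
    return [g for g in groups if matches_name(g) or matches_topics(g)]


def dedupe(groups):
    """Drop repeated groups by id, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for g in groups:
        if g["id"] not in seen:
            seen.add(g["id"])
            unique.append(g)
    return unique


# Toy inputs standing in for the "data-science" and "data-analysis" searches.
data_science = [
    {"id": 1, "urlname": "Las-Vegas-R-User-Group", "topics": []},
    {"id": 2, "urlname": "Charleston-Data-Science",
     "topics": ["r-project-for-statistical-computing"]},
    {"id": 3, "urlname": "Python-Pune", "topics": ["python"]},
]
data_analysis = [
    {"id": 1, "urlname": "Las-Vegas-R-User-Group", "topics": []},
    {"id": 4, "urlname": "PhillyR", "topics": []},
]

combined = select_r_groups(data_science) + select_r_groups(data_analysis)
print([g["id"] for g in dedupe(combined)])  # [1, 2, 4]
```

Plain substring tests are deliberately loose: “rug”, for example, would also match a URL name containing “drug”, so results from this kind of matching benefit from review. Searching topics as well as names is what recovers groups such as “Charleston Data Science”.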
What Was Achieved
For the R user groups dashboard, the following was achieved:
- Used the meetupr package to extract R user groups from Meetup.com
- Improved the existing find_groups() and get_events() functions in meetupr to meet our requirements, and migrated authentication from API keys to the newly required OAuth 2.0 system
- Transformed the data retrieved from Meetup via meetupr from data frames to JSON, GeoJSON and CSV
- Stored the data by committing the JSON/GeoJSON/CSV files to the project’s GitHub repository
- Developed a static HTML dashboard interface based on the open-source Gentelella Bootstrap template, and rendered the stored data through dashboard components
- Automated the extraction, transformation, and storage of R user group data using Travis CI
- Deployed the dashboard via GitHub Pages
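The transformation step turns tabular group records into JSON, CSV, and GeoJSON. The project does this in R (the post mentions jsonlite and leafletR); the Python sketch below only illustrates the shape of the outputs using the standard `json` and `csv` modules. The field names `name`, `city`, `members`, `lat`, and `lon` are hypothetical.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of row dicts as pretty-printed JSON."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize a list of row dicts as CSV, using the first row's keys as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_geojson(records):
    """Build a GeoJSON FeatureCollection with one Point feature per group.

    GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    """
    features = []
    for r in records:
        properties = {k: v for k, v in r.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative record; the member count is a placeholder, not dashboard data.
records = [{"name": "R-Ladies Madison", "city": "Madison", "members": 100,
            "lat": 43.07, "lon": -89.4}]
print(to_csv(records))
```

Keeping coordinates out of the GeoJSON `properties` avoids duplicating them, and the longitude-first order is a common source of misplaced map markers.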
Switching from Meetup API keys to OAuth 2.0 Authentication System
During our work, Meetup announced that it was removing API keys from its authentication system and migrating to OAuth 2.0, which normally operates in an interactive browser session. We implemented this migration in a non-interactive context through Travis continuous integration (Travis CI) by encrypting a locally generated token that the httr package caches automatically.
We generated this token in an interactive session, cached it in a file named .httr-oauth, saved it as an .rds file, encrypted it using openssl, and pushed it to our repo. During each Travis build/cron job, Travis decrypts the token and the R scripts use that value to make Meetup API calls. The ideas for this approach were borrowed from this vignette on googlesheets and Travis’ documentation.
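Once the build has decrypted the token, each API call only needs to present it. The Python sketch below illustrates that last step under stated assumptions: it supposes the decrypted token has been exported as JSON with an `access_token` field (the project itself stores an httr token in an `.rds` file, which this code does not read), and the helper names `load_access_token` and `build_request` are hypothetical. The openssl decryption step is not shown.

```python
import json
import urllib.request


def load_access_token(path):
    """Read an OAuth 2.0 access token from a decrypted JSON file.

    Assumption: the file looks like {"access_token": "..."}; this is not the
    format of httr's .httr-oauth cache or of the project's .rds file.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)["access_token"]


def build_request(url, token):
    """Prepare (but do not send) a request carrying a Bearer token, per RFC 6750."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Placeholder URL and token; no network call is made here.
req = build_request("https://example.com/api/groups", "example-token")
print(req.get_header("Authorization"))  # Bearer example-token
```

Keeping the token encrypted in the repository and decrypting it only inside the build is what lets a browser-based OAuth flow serve an unattended job.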
The sodium package (featured in a separate article) provides encryption and decryption functionality that works on all operating systems available on Travis, AppVeyor, and r-hub. We hope to explore it in the future.
Some Highlights from the Dashboard
- A leaflet map with markers and pop-ups filled with information about the user groups’ membership, events, and status (active, inactive, or unbegun).
- Top destinations for R user groups based on membership across 6 regions.
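The post does not spell out how the regional rankings are computed. One plausible approach, assumed here and written in Python for illustration, sums membership per city within each region and keeps the largest cities. The field names `region`, `city`, and `members` and the function `top_cities_by_region` are hypothetical.

```python
from collections import defaultdict


def top_cities_by_region(groups, n=3):
    """Sum membership per city within each region and return the n largest cities."""
    totals = defaultdict(lambda: defaultdict(int))
    for g in groups:
        totals[g["region"]][g["city"]] += g["members"]
    return {
        region: sorted(cities.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for region, cities in totals.items()
    }


# Toy values, not dashboard data.
groups = [
    {"region": "Europe", "city": "London", "members": 100},
    {"region": "Europe", "city": "London", "members": 50},
    {"region": "Europe", "city": "Paris", "members": 120},
    {"region": "Africa", "city": "Lagos", "members": 80},
]
print(top_cities_by_region(groups, n=2))
# {'Europe': [('London', 150), ('Paris', 120)], 'Africa': [('Lagos', 80)]}
```

Aggregating by city rather than by group means a city with several mid-sized groups can outrank a city with one large group.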
To build the dashboard, we used a mix of the tools listed below:
- R, RStudio, and the following packages:
  - meetupr, curl, jsonlite, and leafletR
- Gentelella Admin Dashboard Bootstrap HTML template
- Travis CI to build the project and execute R scripts and bash commands
- Bash commands to call R scripts and commit modified files to GitHub
How We Achieved It
- We used the meetupr R package to retrieve R User Groups from meetup.com.
- We further analyzed this data by computing several summaries. We used the leafletR package to transform our data frame to GeoJSON, and used the resulting GeoJSON file to create a map with leaflet.js. In this map, R user groups fall into three categories, each with its own marker color: Active (blue), Inactive (dark-blue), and Unbegun (orange):
  - Active groups have held an event in the past 180 days or have an upcoming event
  - Inactive groups have not held an event in the past 180 days and have no upcoming event
  - Unbegun groups have never held an event and have none planned
- We persisted all data and summaries in CSV/JSON files. After each Travis build, the data and summaries are refreshed directly from the Meetup API.
- We wrote bash commands that run our R scripts and commit the updated CSV/JSON files to GitHub after every Travis build.
- We set up Travis cron jobs to build the project and update the data daily, at around 09:45 UTC.
- We then customized the Gentelella Admin Dashboard Bootstrap HTML template to our requirements.
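The three activity categories used on the map reduce to a short decision rule. The Python sketch below encodes that rule for illustration only; it is not the project's R code. Whether an event exactly 180 days ago still counts as recent is our assumption (we treat it as active), and the function name `classify_status` is hypothetical.

```python
from datetime import date, timedelta

# Window from the definitions above; counting day 180 as "recent" is an assumption.
ACTIVE_WINDOW = timedelta(days=180)

# Marker colors described for the leaflet map, as CSS color names.
MARKER_COLORS = {"active": "blue", "inactive": "darkblue", "unbegun": "orange"}


def classify_status(past_event_dates, has_upcoming_event, today):
    """Return 'active', 'inactive', or 'unbegun' for one group."""
    if has_upcoming_event:
        return "active"  # any planned event makes a group active
    if not past_event_dates:
        return "unbegun"  # no events held and none planned
    last_event = max(past_event_dates)
    if today - last_event <= ACTIVE_WINDOW:
        return "active"
    return "inactive"


today = date(2019, 9, 26)
for past, upcoming in [([], False), ([date(2019, 6, 1)], False), ([date(2018, 1, 15)], False)]:
    status = classify_status(past, upcoming, today)
    print(status, MARKER_COLORS[status])
```

Checking for upcoming events first matters: a new group with no history but a scheduled first meetup is active, not unbegun.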
The Final Product
At the time of writing, there are 883 R user groups with 658,000+ members across 83 countries and 404 cities, with more than 16,400 past events and 500+ upcoming events. Of these groups, 63% are active, 25% are inactive, and 12% are unbegun; unbegun groups have members but have not yet started organizing events. We observe new members joining the R community daily.
We hope to expand the dashboard’s features beyond its current state, and we would love to hear from you if you have ideas or find issues. We have received significant feedback from R users, which has helped us refine our string matching and discover more user groups that mention R in their names and Meetup topics. With these changes, the user-group count has grown from 600+ to 800+, and several other figures have updated automatically.
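Status percentages like those above are a simple share-of-total summary. The Python sketch below shows one way to compute them from per-group statuses; the function name `status_shares` and the input list are hypothetical toy values, not dashboard data.

```python
from collections import Counter


def status_shares(statuses):
    """Percentage of groups per status, rounded to whole numbers.

    Rounded shares may not sum to exactly 100.
    """
    counts = Counter(statuses)
    total = len(statuses)
    return {status: round(100 * n / total) for status, n in counts.items()}


# Toy input, not dashboard data.
print(status_shares(["active"] * 5 + ["inactive"] * 3 + ["unbegun"] * 2))
# {'active': 50, 'inactive': 30, 'unbegun': 20}
```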
Feel free to Follow / Star the project at its GitHub repo: https://github.com/benubah/r-community-explorer/
We thank Curtis Kephart (RStudio) for contributing code that gave us ideas for classifying R user groups.
We also thank the authors of the meetupr package for their excellent work. Special thanks to Jenny Bryan, Erin LeDell, and Greg Sutcliffe for their help over the last month with implementing the requirements for the new Meetup OAuth 2.0 authentication system.
Please reach out in the issues section of this project if you have thoughts about enhancements to this work or would like to collaborate. As mentioned above, we would like to use other data sources to add groups not featured on Meetup.
Authors: Benaiah Ubah, Claudia Vitolo, and Rick Pack