What Statistical Test Should I Do?
As a teaching assistant in statistics for students with diverse backgrounds, I get to see which topics students generally struggle with. I have realized that students usually have no trouble performing a specific statistical test when they are told which... Read more
How is Data Collection Used in the Justice System?
There’s no question that the world is becoming increasingly reliant on data, and the criminal justice system is no exception. The justice system in the United States has used various data types and forms of data collection for years. For example, police departments, states, and the... Read more
Paving the Road to Facial Classification Accuracy
Facial classification is one of the most promising and controversial machine learning use cases. The technology has considerable potential in areas like security, but it also carries substantial privacy and bias concerns. Relying on facial recognition models that aren’t accurate can lead to severe consequences. Facial... Read more
How to Install R and RStudio
R is nothing more than a programming language. At the time of writing, it is one of the leading languages in statistics, although not the only programming language used by statisticians. In order to use R,... Read more
Why Is Python the Language of Choice for Data Scientists?
Python has grown to become one of the most popular and well-liked programming languages in the world, used by millions of developers since its creation in 1991. For data scientists in particular, Python has a strong, long-time base of developers. Why is Python the language of... Read more
PyCharm vs. VSCode: Which Is the Better Python IDE?
Python debuted in 1991, making it older than many of the people who use it. In the intervening years, coders have turned it into one of the most popular programming languages ever conceived. The reasons for Python’s perennial popularity come down to three major features.... Read more
Supercharge Your Pandas Code with Apache Spark
Editor’s Note: Itai Yaffe and Daniel Haviv are speakers for ODSC East 2022. Be sure to check out their talk, “A bamboo of Pandas: crossing Pandas’ single-machine barrier with Apache Spark,” there! Pandas is a fast and powerful open-source data analysis and manipulation framework written in... Read more
3 Easy Tricks to Create New Columns in Python Pandas
In data processing and cleaning, we often need to create new columns based on values in existing columns. In this blog, I explain “How to create new columns derived from existing columns” with 3 simple methods. · Use a lambda function with the apply() method · Use the numpy.select() method... Read more
Tips and Tricks in RStudio and R Markdown
If you have the chance to work with an experienced programmer, you may be amazed by how fast she can write code. In this article, I share some tips and shortcuts you can use in RStudio and R Markdown to speed up the writing of your... Read more
Is Groovy a Viable Language for Data Science Applications? 5 Pros and Cons
Choosing the right programming language can make a remarkable difference in data science applications. While the industry standards are Python and R, some data scientists have branched off to use others they prefer. One such possible alternative is the Groovy programming language. Apache Groovy is an... Read more