A Practical Approach to Data Ethics


Tools & Languages, Workflow, Data Ethics | posted by Aakash Gupta, November 30, 2018

There is a Golden Rule in life. It’s a maxim that appears in various forms around the world: One should never do that to another which one regards as injurious to one’s own self. As a data scientist, I find this principle of reciprocity very appealing! Treat others’ data... Read more
Monthly Summary of Selected Trends, Activities and Insights for R – October 2018
Abstract: In October, the spike in activity observed in September across the R ecosystem was maintained. The following article presents a summary of selected trends, activities, and insights around the R language in October 2018, as the R language keeps trending. Data for the trends and activities... Read more
Alexandru Agachi of Empiric Capital on “Handling Missing Data in Python/Pandas” at ODSC Europe 2018
Key Takeaways: It’s important to describe missing data and the challenges it poses. You need to clarify the confusing terminology that adds to the field’s complexity. You should take the time to review methods for handling missing data. You need to learn how to apply robust multiple imputation... Read more
How Tidyverse Guides R Programmers Through Data Science Workflows
Whenever someone asks me how to get into data science using R, I invariably recommend checking out the tidyverse package. Tidyverse is a great launch pad for a language like R because it offers order and consistency. I studied programming language design as a CS undergrad. At the time,... Read more
Vectorizing Random Number Generators for Greater Speed: PCG and xorshift128+ (AVX-512 edition)
Most people designing random number generators program using regular code. If they are aiming for speed, they probably write functions in C. However, our processors have fast “vectorized” (or SIMD) instructions that can allow you to go faster. These instructions do several operations at once. For example, recent Skylake-X... Read more
Introducing ODPi Egeria – The Industry’s First Open Metadata Standard
Organizations looking to better locate, understand, manage and gain value from their data have a new industry standard to leverage. ODPi, a nonprofit Linux Foundation organization focused upon accelerating the open ecosystem of big data solutions, recently announced ODPi Egeria, a new project that supports the free flow of... Read more
Build a Multi-Class Support Vector Machine in R
Support Vector Machines (SVMs) are quite popular in the data science community. Data scientists often use SVMs for classification tasks, and they tend to perform well in a variety of problem domains. An SVM performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different... Read more
Setting Your Hypothesis Test Up For Success
I want to go deep with you on exactly how I work with stakeholders ahead of launching a hypothesis test. This step is crucial to make sure that once a test is done running, we’ll actually be able to analyze it. This includes: A well-defined hypothesis A solid test... Read more
Client-side Web Development and Machine Learning
You might not expect client-side web development and machine learning to be in the same sentence. In this article, however, we’re going to look at how and why these two are beginning to collaborate rather successfully. There are many hidden uses for a collaboration between JavaScript and machine learning.... Read more
Monthly Summary of Selected Trends, Activities, and Insights for R – September 2018
In September, there was a serious spike in activities across the R ecosystem. This article examines a summary of selected R trends, activities, and insights in September. Data for the trends and activities summarized here were obtained from popular websites used by the R community such as Google, GitHub, StackOverflow,... Read more