An Introduction to Orchestrating Data Assets with Dagster
Editor’s note: Sandy Ryza is a speaker for ODSC West this November 1st-3rd. Be sure to check out his talk, “Orchestrating Data Assets instead of Tasks, with Dagster,” there! Dagster is an open-source data orchestrator: a framework for building and running data pipelines, similar to how... Read more
What to Consider When Building Data Pipelines
In 2021 we watched Fivetran raise $565 million, Airbyte $150 million, Matillion $100 million, and Rivery $16 million, while Informatica went public. All of these companies have some piece of their business connected to data pipelines, also sometimes referred to as ETL, ELT, E(t)LT, and CDC. For... Read more
Are You Ready to Lead a Data Science Project?
What problem are you compelled to solve using data science? The power in data and the mechanisms to harness that power are now available to us. Identifying the right problem or use case is the first step. There are multiple use cases across... Read more
Three Methods of Data Pre-Processing for Text Classification
Editor’s Note: Nick will be presenting on this idea of data pre-processing during the workshop “Choosing The Right Deep Learning Framework: A Deep Learning Approach,” at ODSC Europe in London this November! As a developer advocate at IBM, I work to empower AI, machine learning, and... Read more
From Data to Process to Decision
I recently published a paper entitled “Intelligent Decisions: How Businesses Can Improve Processes Using Artificial Intelligence Technologies.” The work focused on the possibility of employing artificial intelligence in the business process management functions of the enterprise. I would like to further explore this concept and investigate... Read more
5 DevOps Challenges To Overcome To Gain Productivity
Editor’s Note: Is your business ready to implement DevOps? Learn how you can do just that at ODSC West. DevOps brought the development community into the agile era, where multiple teams can work in a collaborative environment sharing their skills, knowledge, and development responsibilities. As... Read more
Data Science + Design Thinking: a Perfect Blend to Achieve the Best User Experience
It’s one thing to rely on artificial intelligence, machine learning, and big data to make your product smarter, and quite another to build a product that’s so intuitive and easy to use that your customer falls in love with it. That’s the beauty of data science +... Read more
The Data Scientist’s Holy Grail – Labeled Data Sets
The Holy Grail for data scientists is the ability to obtain labeled data sets for the purpose of training a supervised machine learning algorithm. An algorithm’s ability to “learn” is based on training it using a labeled training set – having known response variable values that... Read more
A Practical Approach to Data Ethics
There is a Golden Rule in life. It’s a maxim that appears in various forms around the world: One should never do that to another which one regards as injurious to one’s own self. As a data scientist, I find this principle of reciprocity very appealing!... Read more
How Tidyverse Guides R Programmers Through Data Science Workflows
Whenever someone asks me how to get into data science using R, I invariably recommend checking out the tidyverse package. Tidyverse is a great launch pad for a language like R because it offers order and consistency. I studied programming language design as a CS undergrad.... Read more