At ODSC events, our goal is to provide training and workshops that help everyone, from beginners to experienced data scientists and software engineers, build their hands-on skills in all areas of data science and AI. Here are the top five hands-on training focus areas that every data scientist should know and that we’re paying extra attention to at the ODSC East 2020 Virtual Conference this April 14-17. The data science skills for 2020 are…

Natural Language Processing

Recent breakthroughs in Natural Language Processing, coupled with the fact that many companies are awash in human language data, solidify this as one of the most in-demand hands-on skills of 2020. Pre-trained language models, and transformer architectures in particular, were among ODSC speakers’ favorite topics in 2019 due to major advances such as OpenAI’s GPT-2, Google’s BERT, the Allen Institute for AI’s ELMo, and Facebook’s RoBERTa. Libraries like Hugging Face’s Transformers have greatly accelerated their adoption. These pre-trained models, BERT in particular, have been hailed by many as NLP’s ImageNet moment. By employing techniques like bidirectional context and transformers, these models save data scientists much of the time and expense normally required to train NLP models from scratch, which marks a major development. Firing up and fine-tuning these pre-trained models should be at the top of anyone’s list for 2020.


Deep Learning and Machine Learning

Hands-on skills with deep learning and machine learning are the bread and butter of any active practitioner. TensorFlow, PyTorch, Keras, and scikit-learn are a few of the popular tools and frameworks that saw major releases in 2019, further cementing their position as the leading machine learning and deep learning tools, and they are featured in many of our hands-on training sessions. With the release of version 2.0, TensorFlow kept its position as the top framework, but PyTorch continued to gain a lot of traction in 2019. Getting some hands-on experience with the latest releases of these widely used frameworks is a must for 2020.

MLOps & Workflow

Our new MLOps and Data Engineering focus area tracks coincide with the massive ramp-up of efforts to increase the percentage of data science projects deployed to production. Machine learning lifecycle tools like MLflow and Kubeflow continue to grow in popularity, as do workflow tools like Airflow. Interest in AutoML grew sharply in 2019, given its potential as a productivity tool in all stages of the machine learning life cycle. Sessions on labeling and annotation (LabelImg, etc.), model interoperability, pipelines, deployment, and testing also saw increased interest. This, coupled with a significant drop in the cost of modeling in 2019, helped accelerate deployment in production environments. Understanding how to use some of these tools to build, test, deploy, and monitor your model in production will therefore be the norm in 2020.

Trusted and Responsible AI

Increased model deployment in the real world has greatly raised the importance of Trusted and Responsible AI. Hands-on experience in security, privacy, fairness, and explainability is close to a requirement for anyone practicing data science today. Tools like IBM’s AI Fairness 360 Toolkit and Google’s Differential Privacy library, which allows one to draw insights from massive datasets while protecting user privacy, were but two of many popular projects in 2019 that allowed teams to put Responsible AI into practice. Microsoft’s SEAL, TensorFlow Privacy, AdverTorch (RBC Capital), InterpretML, and Alibi were some of the additional tools in this category that practitioners can use to implement responsible AI.
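
To give a feel for how differential privacy can protect individuals while still yielding aggregate insights, here is a minimal sketch of the Laplace mechanism, a textbook technique in this area. It is not the API of Google’s library or of any tool named above; the function names (laplace_noise, private_count) and the sample data are hypothetical and chosen only for illustration, and the sketch uses nothing beyond the Python standard library.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    A smaller epsilon means more noise and stronger privacy.
    """
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: ages of eight survey respondents.
    ages = [23, 35, 41, 29, 52, 67, 38, 45]
    noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
    print(f"Noisy count of respondents aged 40+: {noisy:.2f}")
```

Each call returns a slightly different answer, so no single released number pins down whether a particular person is in the data, while the average over many queries stays close to the true count. Production libraries add safeguards this sketch leaves out, such as privacy budget tracking and protection against floating-point attacks.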

Research Frontiers

arXiv published over 21,000 papers on AI and data science topics in 2019 alone, double 2018’s figure. Research is not an area one would normally associate with hands-on training sessions; however, more experienced practitioners can benefit from staying current on emerging research topics, especially those that move quickly into practical applications. For example, Pieter Abbeel, a leading researcher from the UC Berkeley BAIR lab, ran a deep reinforcement learning session that was very well received at ODSC West, and it is a workshop topic we will continue to explore in 2020. Other research topics that we are excited about in terms of real-world potential include federated learning, advances in recommendation systems, adversarial deep learning, active learning, semi-supervised and self-supervised learning, causal inference with machine learning, and detecting audiovisual fakes and deepfakes. These are all areas where we will be hosting hands-on workshops in 2020.

Bonus Skill – AI for Climate

AI for social good is an important track at all our events. We are excited to host our first AI for Climate track, and are welcoming industry experts such as Microsoft researcher Lester Mackey, who won the $50K Prize4Life ALS disease progression prediction challenge. He has also won prizes for temperature and precipitation forecasting in the yearlong, real-time $800K Subseasonal Climate Forecast Rodeo. Experts like Lester Mackey are helping us understand how we can put our data science talents to use in tackling one of the most important issues of our time.  

