Technical preview: Native GPU programming with CUDAnative.jl
After 2 years of slow but steady development, we would like to announce the first preview release of native GPU programming capabilities for Julia. You can now write your CUDA kernels in Julia, albeit with some restrictions, making it possible to use Julia’s high-level language features to write high-performance... Read more
Faster deep learning with GPUs and Theano
Originally posted by Manojit Nandi, Data Scientist at STEALTHbits Technologies, on the Domino data science blog. Domino recently added support for GPU instances. To celebrate this release, I will show you how to: configure the Python library Theano to use the GPU for computation, and build and train neural networks... Read more
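The first step the post describes, pointing Theano at the GPU, is typically done by setting the `THEANO_FLAGS` environment variable before Theano is imported. A minimal sketch (the exact device name varies by Theano version; older releases use `device=gpu`, newer ones `device=cuda`):

```python
import os

# Configure Theano to run on the GPU. This must happen before
# `import theano`, because Theano reads THEANO_FLAGS at import time.
# floatX=float32 is used because GPUs are much faster at single precision.
os.environ["THEANO_FLAGS"] = "device=cuda,floatX=float32"

# import theano  # now picks up the GPU configuration above
```

The same flags can instead live in a `~/.theanorc` file, which keeps scripts device-agnostic.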
Intro to Caret: Pre-Processing
Editor’s note: This is the third in a series of posts on the caret package, covering: Creating Dummy Variables; Zero- and Near Zero-Variance Predictors; Identifying Correlated Predictors; Linear Dependencies; The preProcess Function; Centering and Scaling; Imputation; Transforming Predictors; Putting It All Together; and Class Distance Calculations. caret includes several functions to pre-process... Read more
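Centering and scaling, one of the pre-processing steps caret's `preProcess(method = c("center", "scale"))` performs, simply standardizes each predictor to zero mean and unit variance. A language-agnostic sketch of that idea in plain Python (caret itself is an R package; this is an illustration of the transformation, not caret's code):

```python
def center_and_scale(values):
    """Standardize a column: subtract the mean, divide by the sample
    standard deviation (n - 1 denominator, as R's sd() uses)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / std for v in values]

# A toy predictor column; the result has mean 0 and sample std dev 1.
scaled = center_and_scale([2.0, 4.0, 6.0, 8.0])
```

In caret, the same parameters estimated on the training set are reused to transform the test set, which is the main reason to use `preProcess` rather than standardizing each split independently.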
Implementing a CNN for Text Classification in Tensorflow
The full code is available on Github. In this post we will implement a model similar to Kim Yoon’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become... Read more
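The core operation in Kim's model is a one-dimensional convolution over a sentence's word embeddings: a filter of width w slides along the word positions, each position yields one activation, and max-over-time pooling reduces the activations to a single feature. A pure-Python sketch of that operation with toy dimensions (not the paper's hyperparameters, and independent of the TensorFlow code in the post):

```python
def conv1d_max_pool(embeddings, filt, bias=0.0):
    """Slide a (w, d) filter over an (n, d) sentence of word vectors,
    apply ReLU, then take the max over the n - w + 1 positions."""
    w, d = len(filt), len(filt[0])
    n = len(embeddings)
    activations = []
    for i in range(n - w + 1):
        # Dot product between the filter and a window of w word vectors.
        s = bias
        for j in range(w):
            for k in range(d):
                s += filt[j][k] * embeddings[i + j][k]
        activations.append(max(0.0, s))  # ReLU nonlinearity
    return max(activations)  # max-over-time pooling -> one feature

# Toy sentence: 4 words with 2-dimensional embeddings, filter width 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
feature = conv1d_max_pool(sentence, [[1.0, 0.0], [0.0, 1.0]])
```

The full model runs many such filters of several widths in parallel and concatenates the pooled features before a final softmax layer.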
Amazon Enters The Open-Source Deep Learning Fray
The Synergy Research Group’s last report of 2015 attributed 31% of the cloud computing market to Amazon’s Amazon Web Services (AWS), nearly four times as much as its nearest competitor, Microsoft. This would come as no surprise to any programmer, data engineer, or data scientist: AWS is a mainstay... Read more
Google’s TensorFlow framework spread like wildfire upon its release. The slew of tutorials and extensions made an already robust ecosystem even more so. Recently, Google released one of their own extensions. It’s called SyntaxNet, a TensorFlow-based syntactic parser for Natural Language Understanding. SyntaxNet uses neural networks to model... Read more
Jupyter, Zeppelin, Beaker: The Rise of the Notebooks
Standard software development practices for web, SaaS, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programming in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and collaborations that is... Read more
Riding on Large Data with Scikit-learn
What’s a Large Data Set? A data set is said to be large when it exceeds 20% of the available RAM on a single machine. For your standard MacBook Pro with 8 GB of RAM, that corresponds to a meager 1.6 GB dataset, a size that is becoming more and more... Read more
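By that definition, 20% of an 8 GB machine is about 1.6 GB. scikit-learn's usual answer to data beyond that threshold is out-of-core learning: stream the data in fixed-size chunks and feed each chunk to an estimator that supports incremental fitting via `partial_fit` (for example `SGDClassifier` or `MiniBatchKMeans`). A plain-Python sketch of the chunking half of that pattern (the chunk size here is arbitrary):

```python
def iter_chunks(items, chunk_size):
    """Yield successive fixed-size chunks from an iterable, so only one
    chunk is ever held in memory -- the core of out-of-core learning."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # the final chunk may be smaller than chunk_size
        yield chunk

# In practice each chunk of rows would be passed to
# estimator.partial_fit(X_chunk, y_chunk) instead of collected in a list.
chunks = list(iter_chunks(range(10), 4))
```

For text data, pairing this with a stateless vectorizer such as scikit-learn's `HashingVectorizer` avoids having to hold a vocabulary for the full dataset in memory.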