The software engineering rule of 3
Here’s a dumb, extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I’ve noticed: don’t factor out shared code between two classes. Wait until you have at least three. The first two attempts to solve a problem... Read more
Our research in 2016: personal scientific highlights
The year 2016 has been a productive one for science in my team. Here are some personal highlights: bridging artificial-intelligence tools to human cognition, markers of neuropsychiatric conditions from brain activity at rest, algorithmic speedups for matrix factorization on huge datasets… Convolutional networks from artificial intelligence map well onto the human visual system Eickenberg et... Read more
Introduction Link to Part 1 Link to Part 2 In this post, we’ll summarize many of the new and important developments in the field of computer vision and convolutional neural networks. We’ll look at some of the most important papers that have been published over the... Read more
Seven Python Kernels from Kaggle You Need to See Right Now
The ability to post and share kernels is probably my favorite thing about Kaggle. Learning from other users’ kernels has often provided inspiration for a number of my own projects. I also appreciate the attention to detail and the descriptions some users provide in their code. This... Read more
Why I like the Convolution Theorem
The convolution theorem (or theorems: it has versions that some people would call distinct species and others would describe as mere subspecies) is another almost obviously almost true result, this time about asymptotic efficiency. It’s an asymptotic version of the Cramér–Rao bound. Suppose \(\hat\theta\) is an efficient estimator of... Read more
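For readers who want the statement the excerpt is building toward, it can be sketched as follows (an informal rendering of Hájek's convolution theorem; the notation is mine, not necessarily the post's):

```latex
% Informal sketch of the convolution theorem.
% Let \hat\theta_n be any regular estimator of \theta in a smooth
% parametric model with Fisher information I(\theta). Then
\sqrt{n}\,\bigl(\hat\theta_n - \theta\bigr)
  \;\rightsquigarrow\; Z + W,
\qquad Z \sim N\!\bigl(0,\, I(\theta)^{-1}\bigr),
% with W independent of Z: every regular estimator's limiting law is
% the efficient limit convolved with independent extra noise.
```

The Cramér–Rao-style bound is recovered as the best case, where the extra noise \(W\) is zero.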
A Guide to Gradient Boosted Trees with XGBoost in Python
XGBoost has become incredibly popular on Kaggle in the last year for any problems dealing with structured data. I was already familiar with sklearn’s version of gradient boosting and have used it before, but I hadn’t really considered trying XGBoost instead until I became more familiar with it. I was perfectly happy with sklearn’s... Read more
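As a reminder of what gradient boosting actually does before reaching for XGBoost, here is a minimal from-scratch sketch for squared loss: each round fits a depth-1 stump to the current residuals and adds a shrunken copy of it to the ensemble. All names here are my own illustration, not XGBoost's or sklearn's API, and XGBoost adds regularization, second-order gradients, and far better tree construction on top of this idea.

```python
# Minimal gradient boosting for squared loss, using depth-1 stumps.
# Illustrative sketch only, not XGBoost's algorithm or API.

def fit_stump(x, residuals):
    """Find the split on x that best fits residuals with two constants."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        left = [residuals[order[i]] for i in range(k)]
        right = [residuals[order[i]] for i in range(k, len(x))]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - lmean) ** 2 for v in left)
               + sum((v - rmean) ** 2 for v in right))
        thresh = (x[order[k - 1]] + x[order[k]]) / 2
        if best is None or sse < best[0]:
            best = (sse, thresh, lmean, rmean)
    _, thresh, lmean, rmean = best
    return lambda v: lmean if v <= thresh else rmean

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    base = sum(y) / len(y)              # initial constant prediction
    stumps, preds = [], [base] * len(y)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is just the residual.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(xi) for p, xi in zip(preds, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)

# Toy usage: learn a step function.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
model = gradient_boost(x, y)
```

With 50 rounds at a 0.1 learning rate, the ensemble's predictions approach the step targets geometrically, which is the behavior the shrinkage parameter controls in XGBoost and sklearn alike.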
The inspiration for this post is a joint venture by both me and my husband, and its genesis lies more than 15 years in our past. One of the recurring conversations we have in our relationship (all long-term relationships have these, right?!) is about song lyrics and place names.... Read more
Third batch of notebooks for Think Stats
As I mentioned in the previous post and the one before that, I am getting ready to teach Data Science in the spring, so I am going back through Think Stats and updating the Jupyter notebooks.  I am done with Chapters 1 through 9 now. If you are reading the book, you can get... Read more
A Gentle Introduction to Recommender Systems with Implicit Feedback
Recommender systems have become a very important part of the retail, social networking, and entertainment industries. Whether suggesting songs to try, books to read, or clothes to buy, recommender systems have greatly improved customers’ ability to make choices more... Read more
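The implicit-feedback setting the title refers to can be sketched with a tiny alternating-least-squares loop in the style of Hu, Koren & Volinsky (2008): raw interaction counts become confidence weights, binarized interactions become preferences, and user and item factors are solved in turn. Function and variable names here are my own, not a real library's API.

```python
import numpy as np

# Tiny implicit-feedback ALS sketch (Hu, Koren & Volinsky 2008 style).
# R holds raw interaction counts; confidence c = 1 + alpha * r,
# preference p = 1 if r > 0 else 0. Illustrative names only.

def implicit_als(R, factors=2, alpha=10.0, reg=0.1, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = 0.1 * rng.standard_normal((n_users, factors))
    Y = 0.1 * rng.standard_normal((n_items, factors))
    P = (R > 0).astype(float)          # binary preference
    C = 1.0 + alpha * R                # confidence weights
    I = reg * np.eye(factors)
    for _ in range(iters):
        for u in range(n_users):       # solve each user's ridge system
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):       # then each item's
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

# Toy usage: two user "tastes" over four items.
R = np.array([[5.0, 3.0, 0.0, 0.0],
              [4.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 4.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])
X, Y = implicit_als(R)
scores = X @ Y.T   # predicted preference for every user-item pair
```

Production implementations (e.g. the `implicit` library) replace the per-row dense solves with the sparse, precomputed-Gram tricks from the paper, but the objective is the same.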
Time Series Analysis with Generalized Additive Models
Whenever you spot a trend plotted against time, you are looking at a time series. The de facto choice for studying financial market performance and weather forecasts, time series analysis is one of the most pervasive techniques because of its inextricable relation to time: we are always interested to... Read more
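The core idea behind a generalized additive model for a time series, fitting the trend as a penalized smooth function of time rather than a fixed parametric curve, can be sketched with a truncated-power spline basis in plain NumPy. The basis choice and penalty here are my own simplification; real GAM libraries such as mgcv or pyGAM use better bases, smoothness selection, and link functions.

```python
import numpy as np

# Penalized-spline sketch of the "smooth function of time" idea
# behind GAM-based trend models. Simplified illustration only.

def smooth_trend(t, y, n_knots=8, penalty=1.0):
    knots = np.linspace(t.min(), t.max(), n_knots + 2)[1:-1]
    # Design matrix: intercept, linear term, cubic truncated-power basis.
    B = np.column_stack([np.ones_like(t), t] +
                        [np.clip(t - k, 0, None) ** 3 for k in knots])
    # Penalize only the spline coefficients, leaving the linear part free.
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)
    coef = np.linalg.solve(B.T @ B + penalty * D, B.T @ y)
    return B @ coef

# Toy usage: recover a smooth trend from noisy observations.
t = np.linspace(0.0, 10.0, 200)
y = np.sin(t) + 0.1 * np.random.default_rng(0).standard_normal(200)
fitted = smooth_trend(t, y)
```

Raising `penalty` trades wiggliness for smoothness, which is exactly the knob a GAM's smoothing parameter exposes when you model a trend against time.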