Many fail to understand that generative AI isn’t just a subfield within data science. Thanks to an expanding list of tools, it has become a must-have skill. You may be asking yourself why that’s the case. Or you may not even believe you need generative... Read more
We’re hearing a lot about large language models, or LLMs, in the news recently. If you don’t know, LLMs are a type of artificial intelligence that is trained on massive amounts of text data. This allows them to generate text that is often indistinguishable from human-written... Read more
Generative AI is an innovative technology that excels at creating something new from a set of inputs and has taken a bold step into the world of data. It’s a tool capable of generating realistic text, producing creative artwork, or simulating real-world scenarios. Today, its role... Read more
Conversational artificial intelligence has been around for almost 60 years now. Its first application was developed at the Massachusetts Institute of Technology in 1966, well before the dawn of personal computers. The typical application familiar to readers is much more recent, when AI operates as chatbots,... Read more
We already know a single decision tree can work surprisingly well. The idea of constructing a forest from individual trees seems like the natural next step. Today you’ll learn how the Random Forest classifier works and implement it from scratch in Python. This is the sixth of many... Read more
TL;DR: AUC is a good starting metric when comparing the performance of two models, but it does not always tell the whole story. NRI looks at the new model’s ability to correctly reclassify cancers and benigns and should be used alongside AUC. IDI quantifies improvement of the slopes of... Read more
Customer lifetime value (CLV) is the total worth of a customer to a company over the length of their relationship. The collective CLV of a company’s customer base reflects its economic value and is often measured to evaluate its future prospects. While many ways to estimate... Read more
Combinations and permutations are common throughout mathematics and statistics, and so are useful concepts for data scientists to know. In this post, I want to discuss the difference between the two and how one would calculate them for some given data.... Read more
We often need to encode text data, including words, sentences, or documents, into high-dimensional vectors. Sentence embedding is an important step in various NLP tasks such as sentiment analysis and extractive summarization. A flexible sentence embedding library is needed to prototype fast and to tune for... Read more
Interpretable machine learning techniques are becoming more popular in the data science community as increasingly complex machine learning algorithms that are not easily interpretable are adopted. Model-agnostic interpretation techniques do not care about the underlying models, but they have the capability to interpret the... Read more