SHAP is one of the best-known and most widely used model explainability libraries. In this series of posts, we'll explain its theoretical basis, cover some fundamentals of its implementation, and dig into some advanced usage points.
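Before diving into the theory, here is a minimal sketch of what a typical SHAP workflow looks like; the dataset, model and plot chosen here are illustrative assumptions rather than the examples used later in the series.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Illustrative data and model: any fitted estimator with a compatible explainer works
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Build an explainer and compute SHAP values for the training data
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Beeswarm summary plot: each point is one sample's value for one feature,
# positioned horizontally by its contribution to the model output
shap.plots.beeswarm(shap_values)
```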
Introduction

In our daily lives we are used to creating models for every challenge we face. In many cases these models are complex and we cannot easily analyze their behavior. This is the case with neural networks, for example, and more generally with black-box models.
```python
%matplotlib notebook
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib as mlp
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans, DBSCAN, SpectralClustering
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

pd.set_option('display.max_columns', None)
```

In this post we will analyze some clustering models and the importance of understanding and interpreting the model in order to achieve the best possible performance.
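To give an idea of the kind of analysis the post walks through, here is a minimal sketch building on the imports above: scaling, K-Means clustering and a 2-D PCA projection for visual inspection. The synthetic blobs and the choice of three clusters are assumptions made purely for illustration.

```python
from sklearn.datasets import make_blobs

# Illustrative data: three well-separated blobs in 5 dimensions
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)

# Scale features to [0, 1] so no single dimension dominates the distance metric
X_scaled = MinMaxScaler().fit_transform(X)

# Fit K-Means and obtain a cluster label per sample
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project to 2 dimensions with PCA to inspect the clusters visually
X_2d = PCA(n_components=2).fit_transform(X_scaled)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap='viridis', s=15)
plt.title('K-Means clusters projected onto the first two principal components')
plt.show()
```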
In machine learning projects we often want to deploy APIs to serve clients. Before deploying an API to production, one of the things we want to check is its performance under a load test in which the API has to support several concurrent users/requests. There are many tools for this kind of testing, such as Apache JMeter, K6, Gatling or Loader, although in many cases they involve using another programming language such as Java or JavaScript, writing the tests in XML, or they are paid tools.
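As one example of what such a load test can look like while staying in Python, here is a minimal sketch using Locust, an open-source Python load-testing tool. Locust is used here only as an illustration, not necessarily the tool covered in the post, and the host, endpoint and payload are hypothetical.

```python
# locustfile.py -- run with: locust -f locustfile.py --host http://localhost:8000
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests
    wait_time = between(1, 3)

    @task
    def predict(self):
        # Hypothetical prediction endpoint; adjust path and payload to your API
        self.client.post("/predict", json={"feature_1": 0.5, "feature_2": 1.2})
```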
In this post we will explain what time series are, and why their analysis and prediction are a particular case of Machine Learning problems.
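One way to see the connection with supervised learning is that a univariate series can be turned into an ordinary feature/target table by using past values as features, as in the short sketch below on made-up data (the lags chosen are purely illustrative).

```python
import pandas as pd
import numpy as np

# Illustrative daily series
dates = pd.date_range("2021-01-01", periods=10, freq="D")
series = pd.Series(np.arange(10, dtype=float), index=dates, name="y")

# Frame forecasting as supervised learning: predict y from its previous values
df = pd.DataFrame({"y": series})
df["lag_1"] = df["y"].shift(1)   # value one day before
df["lag_7"] = df["y"].shift(7)   # value one week before
df = df.dropna()                 # rows without a full history cannot be used

print(df)  # each row is now an (X = lags, y = target) training example
```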
Afterwards, we will use FB Prophet to model a particular case. For that, we will use data from a shipping company with different delivery stations, where clients drop off packages (letters, boxes, etc.) for shipment. The series represents the total number of packages received each day by the whole network of stations.
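To show the shape of a typical Prophet workflow before getting into the details, here is a minimal sketch on synthetic daily data; the column names ds and y are what Prophet expects, while the data itself and the 30-day horizon are illustrative assumptions.

```python
import pandas as pd
import numpy as np
from prophet import Prophet  # in older releases the package is named fbprophet

# Synthetic daily series standing in for "packages received per day"
dates = pd.date_range("2022-01-01", periods=365, freq="D")
packages = 1000 + 50 * np.sin(2 * np.pi * dates.dayofweek / 7) + np.random.normal(0, 20, len(dates))
df = pd.DataFrame({"ds": dates, "y": packages})  # Prophet requires columns 'ds' and 'y'

# Fit the model and forecast 30 days beyond the last observation
model = Prophet()
model.fit(df)
future = model.make_future_dataframe(periods=30)
forecast = model.predict(future)

print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```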
In this article, we will see why data analysis is such an impactful part of any data project. Starting from a public Telco Churn dataset, we will go through the main steps of an insightful data analysis at each stage of the project:
- Assessing project feasibility
- Assessing model performance
- Monitoring production models

Introduction to our Customer Churn project

Customer churn (also known as attrition) is defined as the number of customers who stopped using a service in a given timeframe.
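As a concrete starting point for the analysis, the sketch below computes the overall churn rate from the dataset; the file name and the Churn and Contract columns with Yes/No values follow the usual layout of the public Telco Churn dataset, but both are assumptions here.

```python
import pandas as pd

# Assumed file name and column layout of the public Telco Churn dataset
df = pd.read_csv("telco_churn.csv")

# Churn rate = churned customers / total customers
churn_rate = (df["Churn"] == "Yes").mean()
print(f"Churn rate: {churn_rate:.1%}")

# Churn broken down by contract type (a typical first exploratory cut)
print(df.groupby("Contract")["Churn"].value_counts(normalize=True))
```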
1. Introduction

Disclaimer: the purpose of this post is to introduce text generation models, specifically GPT-2, and to demonstrate how they are used. It is in no way intended to promote the generation or dissemination of false information.
In this post we will see how to generate text with models based on the Transformer architecture, and we will use that knowledge to show how fake news can be created. The objective is to illustrate, through this practical example, how these models work and how they are used.
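To make the idea concrete, here is a minimal sketch of text generation with the Hugging Face transformers library; the gpt2 checkpoint, the prompt and the generation parameters are illustrative assumptions and not necessarily the setup used in the post.

```python
from transformers import pipeline

# Load a text-generation pipeline with a small, publicly available GPT-2 checkpoint
generator = pipeline("text-generation", model="gpt2")

# Illustrative prompt; the model continues the text from here
prompt = "Scientists announced today that"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=2, do_sample=True)

for out in outputs:
    print(out["generated_text"])
    print("---")
```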