OVERVIEW
In my previous post, we built the bridge between Tableau and Python using TabPy. We already know that Python extends Tableau's capabilities, allowing us to perform heavy mathematical computations. Beyond that, TabPy lets you deploy those calculations to Python, update them, and control versions of your models. This is a crucial part of operationalizing data science and machine learning models: a model that ends up resting on your local computer, without delivering current predictions and quality metrics to management and stakeholders, is worth little. In this post, I will go over an example showing you how to deploy a machine learning model in Tableau.
Deploying the Model into Tableau
First things first, make sure that the TabPy server is running. In the Anaconda PowerShell Prompt (console), type tabpy. You should expect the following output:
After that, open Tableau and check the connection to TabPy as described in Part 1. Then, for simplicity, let's consider the well-known Titanic dataset. I will not show here how to model the survival of Titanic passengers, since a reader can find a number of solutions on the Internet. I will just mention that the dataset was split into train and test sets, the model was trained on the training set using the DecisionTreeClassifier from the sklearn Python library, and the trained model was then dumped with the pickle Python library.
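For completeness, a minimal sketch of that training-and-pickling step might look like the following. The synthetic data and feature names here are placeholders standing in for the feature-engineered Titanic training set; only the pickled file name matches the deployment script used later.

```python
import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature-engineered Titanic training data;
# the real post trains on the engineered Titanic dataset instead.
X = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2, 3, 1],
    "Sex":    [0, 1, 0, 1, 0, 1, 1, 0],
    "Fare":   [71.3, 7.9, 13.0, 8.1, 53.1, 21.0, 7.8, 30.0],
})
y = pd.Series([1, 0, 1, 0, 1, 0, 0, 1], name="Survived")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Dump the trained model so the deployment script can load it later.
with open("titanic_model_decision_tree.sav", "wb") as f:
    pickle.dump(model, f)
```

Pickling the fitted estimator is what lets the TabPy deployment script load the model once at startup instead of retraining it on every query.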
In this article, we will continue to work with the test dataset, which is uploaded into Tableau. Keep in mind that this dataset has undergone feature engineering and differs from the original dataset:
Then, we need to write a Python script to deploy our model:
import tabpy_client
import pandas as pd
import pickle
from sklearn.metrics import accuracy_score

client = tabpy_client.Client('http://localhost:9004/')
loaded_model = pickle.load(open('./titanic_model_decision_tree.sav', 'rb'))

def titanic_survival_predictor(_arg1, _arg2, _arg3, _arg4,
                               _arg5, _arg6, _arg7, _arg8, _arg9, _arg10,
                               _arg11, _arg12, _arg13):
    # Collect the incoming data in a dictionary
    data = {'Pclass': _arg1, 'Sex': _arg2, 'SibSp': _arg3, 'Parch': _arg4,
            'Fare': _arg5, 'Embarked': _arg6, 'Ticket_type': _arg7,
            'Name_Words_Count': _arg8, 'Has_Cabin': _arg9, 'FamilySize': _arg10,
            'CategoricalFare': _arg11, 'CategoricalAge': _arg12, 'Title': _arg13}
    # Convert it into a dataframe
    test_df = pd.DataFrame(data=data)
    return loaded_model.predict(test_df).tolist()

def titanic_survival_accuracy(_arg1, _arg2, _arg3, _arg4,
                              _arg5, _arg6, _arg7, _arg8, _arg9, _arg10,
                              _arg11, _arg12, _arg13, _arg14):
    # Collect the incoming data in a dictionary
    data = {'Pclass': _arg1, 'Sex': _arg2, 'SibSp': _arg3, 'Parch': _arg4,
            'Fare': _arg5, 'Embarked': _arg6, 'Ticket_type': _arg7,
            'Name_Words_Count': _arg8, 'Has_Cabin': _arg9, 'FamilySize': _arg10,
            'CategoricalFare': _arg11, 'CategoricalAge': _arg12, 'Title': _arg13}
    y_true = _arg14
    # Convert it into a dataframe
    test_df = pd.DataFrame(data=data)
    accuracy = accuracy_score(y_true, loaded_model.predict(test_df))
    return round(accuracy, 2)

client.deploy('titanic_survival_predictor', titanic_survival_predictor,
              'Predicts Titanic passenger survival', override=True)
client.deploy('titanic_survival_accuracy', titanic_survival_accuracy,
              'titanic_survival_accuracy', override=True)
print(client.get_endpoints())
You can see that we create a client instance that carries the host and port of TabPy. Our model is loaded with pickle, and the model instance is a global variable. After that, two functions are created which are quite similar to each other. The first, titanic_survival_predictor, accepts 13 arguments, _arg1 to _arg13. These arguments are simply the features of our model, passed from a calculated field in Tableau to TabPy, where the calculation actually takes place. The function returns a vector of our target variable, which is passed back to Tableau. A short script in the Tableau calculated field is required:
SCRIPT_INT('
return tabpy.query("titanic_survival_predictor",
    _arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7,
    _arg8, _arg9, _arg10, _arg11, _arg12, _arg13)["response"]',
ATTR([Pclass]), ATTR([Sex]), ATTR([Sib Sp]), ATTR([Parch]), ATTR([Fare]),
ATTR([Embarked]), ATTR([Ticket type]), ATTR([Name Words Count]), ATTR([Has Cabin]),
ATTR([Family Size]), ATTR([Categorical Fare]), ATTR([Categorical Age]), ATTR([Title]))
In this script, we pass the 13 arguments described above to the function using the tabpy.query method. To do so, we use the SCRIPT_INT() function in the calculated field. This function accepts the Python script that will be interpreted in TabPy, as well as the aggregated fields that represent our features in this case; the aggregation function is ATTR(). It is important to note that besides SCRIPT_INT(), which returns an integer, there are three more functions, SCRIPT_STR(), SCRIPT_BOOL() and SCRIPT_REAL(), which return string, boolean and real values respectively. Finally, we obtain the following table in Tableau:
We can see that each Passenger Id is matched to the known Survived value and the value predicted by the model (Survived_prediction). The table also indicates whether the prediction is correct for each passenger. In addition, it is important to understand the quality of the model, which is calculated in the titanic_survival_accuracy function:
SCRIPT_REAL('
return tabpy.query("titanic_survival_accuracy",
    _arg1, _arg2, _arg3, _arg4, _arg5, _arg6, _arg7,
    _arg8, _arg9, _arg10, _arg11, _arg12, _arg13, _arg14)["response"]',
ATTR([Pclass]), ATTR([Sex]), ATTR([Sib Sp]), ATTR([Parch]), ATTR([Fare]),
ATTR([Embarked]), ATTR([Ticket type]), ATTR([Name Words Count]), ATTR([Has Cabin]),
ATTR([Family Size]), ATTR([Categorical Fare]), ATTR([Categorical Age]), ATTR([Title]), ATTR([Survived_true]))

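The accuracy value returned to Tableau is the plain sklearn accuracy_score computed inside the deployed function. As a sanity check, the same computation can be reproduced locally on a small, hypothetical set of labels:

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions for five passengers;
# the deployed titanic_survival_accuracy function performs the same
# computation on the fields passed in from Tableau.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

accuracy = round(accuracy_score(y_true, y_pred), 2)
print(accuracy)  # → 0.8 (4 of 5 predictions match)
```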
We obtained decent accuracy for the model. The goal of this article is not to achieve the best possible accuracy, but to show the capabilities of Tableau extended by TabPy and Python, so we will not optimize the model further.
CONCLUSION
In this article, a short introduction to deploying a data science model into Tableau was given. The predictions were calculated and compared to the true values, as was the accuracy of the model. Ideally, such a Tableau dashboard is published on Tableau Server and supplied with new data for prediction on a regular basis. The predictions can then be analysed by different departments in a company, and appropriate measures can be taken, for example towards clients who may fall behind on their monthly credit payments. Finally, we can also monitor the model's performance and, if it declines, retrain the model on new data without any major changes in Tableau.