Tutorial
========

Python
------

Model deployment from Python - an easy way

Throughout this tutorial we are going to build and deploy a Python scikit-learn model for online scoring, so that it can be queried from other languages and environments such as R, Java, JS or PHP.

Prerequisites
^^^^^^^^^^^^^

In order to start you need:

#. Python (version 3) with the following packages installed:

   #. pandas
   #. scikit-learn

#. an account at ``_

Create Model
^^^^^^^^^^^^

A simple risk classification model will be built based on the German credit dataset::

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer as Imputer
    from sklearn import tree
    from rtblib.ml import models

    # Load the German credit dataset
    url="http://freakonometrics.free.fr/german_credit.csv"
    df = pd.read_csv(url)

    # Select the input features and the target column
    input_df = df[['Account Balance', 'Payment Status of Previous Credit', 'Purpose',
                   'Length of current employment', 'Sex & Marital Status']].copy()
    target = df['Creditability'].copy()

    # Impute missing values with the column mean, then fit a decision tree
    tree_opts = {'min_samples_leaf': 30, 'max_features': None, 'max_depth': 10}
    clf = Pipeline([("imputer", Imputer(strategy="mean")),
                    ("model", tree.DecisionTreeClassifier(**tree_opts))])
    clf = clf.fit(input_df, target)

    # Wrap the fitted pipeline and save it to a file for deployment
    mod = models.ModelScikit(clf, list(input_df.columns), 'germancredittree', '1.0',
                             model_type='CLASSIFICATION')
    models.save('german_credit_tree.mod', mod)

Deploy Model
^^^^^^^^^^^^

If you haven't created an account at scoringduck.com yet, now is the time to do it here ``_

Now install the Python client::

    pip install http://app.scoringduck.com/static/jaga_client-0.0.1-py3-none-any.whl

The last lines of the output should be similar to::

    Installing collected packages: jaga-client
    Successfully installed jaga-client-0.0.1

Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is used as the third argument when creating the ``JagaClient`` object::

    from jagaclient import JagaClient

    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    ret_list = jg.list_models()
    print(ret_list)

Output is similar to::

    [{'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]

A single model (named german_credit_logit) is provided by default when the account is created. Now deploy the model created earlier::

    result = jg.deploy("germancredittree", "1", "german_credit_tree.mod")
    print(result)

Output should be similar to::

    {'model': 'germancredittree', 'version': '1', 'msg': 'New version of model germancredittree 1 deployed', 'elapsed': 0.10997330000100192}

``deploy`` raises ``RuntimeError`` if something goes wrong.

Check that the model is there::

    ret_list = jg.list_models()
    print(ret_list)

Output is similar to::

    [{'name': 'germancredittree', 'version': '1', 'modeltype': 'Scikit', 'uploaddatetime': '2020-12-18 13:36:05', 'isdefault': True, 'isarchived': False, 'publicaccess': False}, {'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]

You have just deployed the model to the scoringduck scoring engine. It is now available for querying from various tools and environments, including Python.

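
The deploy-then-check steps above can be wrapped in a small helper. The following is only a sketch: ``find_model`` and ``deploy_and_verify`` are hypothetical names that are not part of the client, and the code assumes nothing about the client beyond what this tutorial shows, namely that ``deploy`` raises ``RuntimeError`` on failure and that ``list_models`` returns a list of dictionaries with ``name`` and ``version`` keys.

```python
def find_model(models, name, version):
    """Return the entry for (name, version) from a list_models()-style list, or None."""
    for entry in models:
        if entry.get("name") == name and str(entry.get("version")) == str(version):
            return entry
    return None


def deploy_and_verify(client, name, version, path):
    """Deploy a model file and confirm that it is listed afterwards.

    `client` is any object exposing deploy() and list_models() as used in
    this tutorial (an assumption; only those two calls are relied upon).
    """
    try:
        client.deploy(name, version, path)
    except RuntimeError as exc:
        # The tutorial states that deploy raises RuntimeError on failure.
        print(f"Deployment of {name} {version} failed: {exc}")
        return None
    # Look the model up in the listing to confirm that it is really there.
    return find_model(client.list_models(), name, version)
```

With the ``jg`` object created earlier, ``deploy_and_verify(jg, "germancredittree", "1", "german_credit_tree.mod")`` would return the listing entry of the new model, or ``None`` if the deployment failed or the model is not listed.
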

Score data
^^^^^^^^^^

To score data, proceed as follows. Let's predict using the following values, taken from one of the rows of the credit dataset:

+-----------------------------------+---+
| Account Balance                   | 4 |
+-----------------------------------+---+
| Payment Status of Previous Credit | 3 |
+-----------------------------------+---+
| Purpose                           | 3 |
+-----------------------------------+---+
| Length of current employment      | 4 |
+-----------------------------------+---+
| Sex & Marital Status              | 3 |
+-----------------------------------+---+

The Python client accepts a dictionary as input data::

    to_score = {"Account Balance": 4, "Payment Status of Previous Credit": 3, "Purpose": 3,
                "Length of current employment": 4, "Sex & Marital Status": 3}
    result = jg.score("germancredittree", "1", to_score)
    print(result)

Output is::

    {'model': 'germancredittree', 'version': '1', 'result': {'res': 0.9545454545454546}, 'elapsed': 0.013599499998235842}

The predicted value is named ``res``.

Easy, right? The model can be used the same way from other languages and environments. Refer to the R tutorial for information on installing the R client and scoring with it.

R
-

Based on

* ``_
* ``_

Model deployment from R - an easy way

Throughout this tutorial we are going to build and deploy an R model for online scoring, so that it can be queried from other languages and environments such as Python, Java, JS or PHP.

Prerequisites
^^^^^^^^^^^^^

In order to start you need two things:

#. R
#. an account at ``_

Create Model
^^^^^^^^^^^^

A simple risk classification model will be built based on the German credit dataset::

    url="http://freakonometrics.free.fr/german_credit.csv"
    credit=read.csv(url, header = TRUE, sep = ",")

    # Hold out 333 random rows; fit the model on the remaining ones
    i_test=sample(1:nrow(credit),size=333)
    i_calibration=(1:nrow(credit))[-i_test]
    logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit +
                          Purpose + Length.of.current.employment + Sex...Marital.Status,
                          family=binomial, data = credit[i_calibration,])
    saveRDS(logistic_model,"german_credit_logit.rds")

Deploy Model
^^^^^^^^^^^^

If you haven't created an account at scoringduck.com yet, now is the time to do it here ``_

Now install the R client::

    install.packages(pkgs="http://app.scoringduck.com/static/jagaclient_0.1.0.tar.gz",repos=NULL,type="source")

Output should be similar to::

    * installing *source* package 'jagaclient' ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
      converting help for package 'jagaclient'
        finding HTML links ... done
        connect                                 html
        deploy                                  html
        hello                                   html
        list_models                             html
        score                                   html
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    *** arch - i386
    *** arch - x64
    ** testing if installed package can be loaded from final location
    *** arch - i386
    *** arch - x64
    ** testing if installed package keeps a record of temporary installation path
    * DONE (jagaclient)

Check the installation; the following command should print TRUE::

    print("jagaclient" %in% rownames(installed.packages()))

Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is used as the second argument of ``connect``::

    library("jagaclient")
    conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
    ret_list <- jagaclient::list_models(conndata)
    print(ret_list)

Output is similar to::

                     name version modeltype      uploaddatetime isdefault isarchived publicaccess
    1 german_credit_logit       1         R 2020-12-18 11:27:01      TRUE      FALSE        FALSE

A single model (named german_credit_logit) is provided by default when the account is created; it is the same as the one you have just built. If it were not provided, you would deploy it as follows::

    result <- jagaclient::deploy(conndata, "german_credit_logit", "1", "german_credit_logit.rds")
    print(result)

``deploy`` returns TRUE if everything went correctly. Models are available for querying from various tools and environments, including R.

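
One detail worth keeping in mind when querying this model from another environment: the R model refers to columns by the names R assigned when reading the CSV (``Account.Balance``, ``Sex...Marital.Status``), not by the original header names used in the Python section. The sketch below converts a row keyed by the original headers into the R-style names. ``r_style_name`` and ``to_r_payload`` are hypothetical helpers, and the replacement rule only mimics what happens for these five columns; R's actual rule (``make.names``) covers further cases, such as names starting with a digit, that are not handled here.

```python
import re


def r_style_name(column):
    """Approximate the name R gives a CSV column (assumption: simple cases only)."""
    # Replace every character that is not a letter, digit, dot or underscore with a dot.
    return re.sub(r"[^A-Za-z0-9._]", ".", column)


def to_r_payload(row):
    """Rename the keys of a row so that they match the R model's field names."""
    return {r_style_name(key): value for key, value in row.items()}


row = {"Account Balance": 4, "Payment Status of Previous Credit": 3, "Purpose": 3,
       "Length of current employment": 4, "Sex & Marital Status": 3}
print(to_r_payload(row))
```

The resulting dictionary uses the field names of the ``german_credit_logit`` model.
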

Score data
^^^^^^^^^^

To score data, proceed as follows. Let's predict using the following values, taken from one of the rows of the credit dataset:

+-----------------------------------+---+
| Account.Balance                   | 4 |
+-----------------------------------+---+
| Payment.Status.of.Previous.Credit | 3 |
+-----------------------------------+---+
| Purpose                           | 3 |
+-----------------------------------+---+
| Length.of.current.employment      | 4 |
+-----------------------------------+---+
| Sex...Marital.Status              | 3 |
+-----------------------------------+---+

The server accepts data in JSON format; use jsonlite to create it::

    library(jsonlite)
    to_score <- list("Account.Balance"=jsonlite::unbox(4),
                     "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3),
                     "Purpose"=jsonlite::unbox(3),
                     "Length.of.current.employment"=jsonlite::unbox(4),
                     "Sex...Marital.Status"=jsonlite::unbox(3))
    args <- jsonlite::toJSON(to_score)
    print(args)

Output is::

    {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}

Score it::

    result <- jagaclient::score(conndata, "german_credit_logit", "1", args)
    print(result)

Output should be similar to::

    $model
    [1] "german_credit_logit"

    $version
    [1] "1"

    $result
    $result$Account.Balance
    [1] 4

    $result$Payment.Status.of.Previous.Credit
    [1] 3

    $result$Purpose
    [1] 3

    $result$Length.of.current.employment
    [1] 4

    $result$Sex...Marital.Status
    [1] 3

    $result$res
    [1] 0.9053125


    $elapsed
    [1] 0.06597743

The predicted value is named ``res``.

Easy, right? The model can be used the same way from other languages and environments.
For example, you can score data sent from a Python script::

    from jagaclient import JagaClient

    jg = JagaClient("http://app.scoringduck.com/", "YOUR_USERNAME", "YOUR_API_KEY")
    args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
    result = jg.score("german_credit_logit", "1", args)
    print(result)

Output should be similar to::

    {'model': 'german_credit_logit', 'version': '1', 'result': {'Account.Balance': 4, 'Payment.Status.of.Previous.Credit': 3, 'Purpose': 3, 'Length.of.current.employment': 4, 'Sex...Marital.Status': 3, 'res': 0.9053124547915977}, 'elapsed': 0.018342145998758497}

PMML
----

ScoringDuck supports the *Predictive Model Markup Language* (PMML), so any model-building tool that can export a model to this format can be used. As an example, the model described in the R section will be used.

Prerequisites
^^^^^^^^^^^^^

In order to start you need the following:

#. R with a working installation of ``_
#. an account at ``_

Create Model
^^^^^^^^^^^^

A simple risk classification model will be built based on the German credit dataset::

    library("r2pmml")
    url="http://freakonometrics.free.fr/german_credit.csv"
    credit=read.csv(url, header = TRUE, sep = ",")

    # r2pmml requires the target to be a factor
    credit$Creditability=factor(credit$Creditability)
    i_test=sample(1:nrow(credit),size=333)
    i_calibration=(1:nrow(credit))[-i_test]
    logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit +
                          Purpose + Length.of.current.employment + Sex...Marital.Status,
                          family=binomial, data = credit[i_calibration,])
    r2pmml(logistic_model,"german_credit_logit.pmml")

Note that, unlike in the ``R`` example, ``Creditability`` is converted from numeric to factor. Skipping this step causes ``r2pmml`` to fail.

Deploy Model
^^^^^^^^^^^^

If you already have a PMML file with a model, you can deploy it without installing any additional software. Just log in ``_, then click *Deploy models* and use the form.
The clients can also be used for deployment, in the same way as for their native formats (for client installation, refer to the relevant part of the Python section or the R section). The R client is used as follows::

    library("jagaclient")
    conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
    result <- jagaclient::deploy(conndata, "german_credit_logit", "2", "german_credit_logit.pmml")
    print(result)

The Python client is used as follows::

    from jagaclient import JagaClient

    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    result = jg.deploy("german_credit_logit", "2", "german_credit_logit.pmml")
    print(result)

Remember to specify a version that does not exist yet. Here ``2`` is used to avoid a conflict with the model described in the R section.

Score data
^^^^^^^^^^

It is possible to score without installing any additional software. Just log in ``_, then click *Manage* and then *Query* for the desired model.

The clients can also be used for scoring. R client usage example::

    library(jsonlite)
    to_score <- list("Account.Balance"=jsonlite::unbox(4),
                     "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3),
                     "Purpose"=jsonlite::unbox(3),
                     "Length.of.current.employment"=jsonlite::unbox(4),
                     "Sex...Marital.Status"=jsonlite::unbox(3))
    # Serialize the input to JSON before sending it
    args <- jsonlite::toJSON(to_score)
    result <- jagaclient::score(conndata, "german_credit_logit", "2", args)
    print(result)

Output should be similar to::

    $model
    [1] "german_credit_logit"

    $version
    [1] "2"

    $result
    $result$`probability(0)`
    [1] 0.08665164

    $result$`probability(1)`
    [1] 0.9133484


    $elapsed
    [1] 0.5412666

Python client usage example::

    from jagaclient import JagaClient

    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
    result = jg.score("german_credit_logit","2",args)
    print(result)

Output should be similar to::

    {'model': 'german_credit_logit', 'version': '2', 'result': {'probability(0)': 0.0866516359066355, 'probability(1)': 0.9133483640933645}, 'elapsed': 0.6012406999998348}

Remember to specify the same version as in Deploy Model. For a more detailed explanation of client usage, refer to *Score data* in the Python section or the R section.

Or directly from bash console:

TODO: bash script here

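
Returning to the client output shown above: unlike the native R and Python models, which return a single ``res`` value, the PMML model returns one ``probability(<class>)`` field per class. A minimal sketch of turning such a response into a predicted class follows. ``predicted_class`` is a hypothetical helper, not part of the client, and it assumes that the result contains only fields of the form shown in the sample output.

```python
def predicted_class(result):
    """Pick the class with the highest probability from a PMML-style result dict."""
    prefix = "probability("
    probabilities = {}
    for key, value in result.items():
        # Keep only fields named like probability(<label>) and extract the label.
        if key.startswith(prefix) and key.endswith(")"):
            probabilities[key[len(prefix):-1]] = value
    if not probabilities:
        raise ValueError("no probability(...) fields in result")
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Sample response copied from the output shown in this section.
response = {'model': 'german_credit_logit', 'version': '2',
            'result': {'probability(0)': 0.0866516359066355,
                       'probability(1)': 0.9133483640933645},
            'elapsed': 0.6012406999998348}
label, probability = predicted_class(response['result'])
print(label, probability)
```

For the sample response the selected class is ``1``, the one with the higher probability.
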