Tutorial

Python

Model deployment from Python - an easy way

Throughout this tutorial we will build and deploy a Python scikit-learn model for online scoring, so that it can be queried from other languages and environments such as R, Java, JS or PHP.

Prerequisites

In order to start you need:

  1. Python 3 with the following packages installed:

    1. pandas

    2. scikit-learn

    3. rtblib (the model-building code below imports rtblib.ml.models)

  2. an account at http://app.scoringduck.com/

Create Model

We will build a simple credit-risk classification model based on the German Credit dataset:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer as Imputer
from sklearn import tree
from rtblib.ml import models

# Download the German Credit data
url="http://freakonometrics.free.fr/german_credit.csv"

df = pd.read_csv(url)

# Five applicant attributes used as model inputs
input_df = df[['Account Balance', 'Payment Status of Previous Credit', 'Purpose', 'Length of current employment', 'Sex & Marital Status']].copy()

# Column the model learns to predict
target = df['Creditability'].copy()

# Mean imputation followed by a decision tree (at least 30 samples per leaf, depth at most 10)
tree_opts = {'min_samples_leaf': 30, 'max_features': None, 'max_depth': 10}
clf = Pipeline([("imputer", Imputer(strategy="mean")), ("model", tree.DecisionTreeClassifier(**tree_opts))])
clf = clf.fit(input_df, target)

# Wrap the fitted pipeline with its input column names, model name and version, then save it for deployment
mod = models.ModelScikit(clf, list(input_df.columns), 'germancredittree', '1.0', model_type='CLASSIFICATION')
models.save('german_credit_tree.mod', mod)

Deploy Model

If you haven’t created an account at scoringduck.com yet, now is the time to do it: http://app.scoringduck.com/new_account. Then install the Python client:

pip install http://app.scoringduck.com/static/jaga_client-0.0.1-py3-none-any.whl

The last lines of the output should be similar to:

Installing collected packages: jaga-client
Successfully installed jaga-client-0.0.1
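To confirm that the client is importable (the Python counterpart of the installed.packages() check in the R section), here is a minimal standard-library sketch. It assumes the package provides the top-level module jagaclient, as the import used below suggests; module_available is a hypothetical helper.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


# The client is imported below as `from jagaclient import JagaClient`,
# so "jagaclient" is the module name checked here.
print(module_available("jagaclient"))
```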

Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your API token; it is passed as the third argument when creating the JagaClient object:

from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
ret_list = jg.list_models()
print(ret_list)

Output is similar to:

[{'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]
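Hard-coding the API key in scripts makes it easy to leak. One option is to read the credentials from environment variables before creating the client. This is a sketch only: load_credentials and the variable names SCORINGDUCK_USER and SCORINGDUCK_API_KEY are hypothetical, not something the service defines.

```python
import os


def load_credentials(user_var: str = "SCORINGDUCK_USER",
                     key_var: str = "SCORINGDUCK_API_KEY") -> tuple[str, str]:
    """Read the username and API key from environment variables.

    The default variable names are hypothetical; use whatever your
    environment provides.
    """
    try:
        return os.environ[user_var], os.environ[key_var]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Usage with the client shown above (not executed here):
# from jagaclient import JagaClient
# user, key = load_credentials()
# jg = JagaClient("http://app.scoringduck.com/", user, key)
```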

A single model (named german_credit_logit) is provided by default when the account is created. Now deploy the model created earlier:

result = jg.deploy("germancredittree", "1", "german_credit_tree.mod")
print(result)

Output should be similar to:

{'model': 'germancredittree', 'version': '1', 'msg': 'New version of model germancredittree 1 deployed', 'elapsed': 0.10997330000100192}

deploy raises a RuntimeError if something goes wrong. Check that the model is there:

ret_list = jg.list_models()
print(ret_list)

Output is similar to:

[{'name': 'germancredittree', 'version': '1', 'modeltype': 'Scikit', 'uploaddatetime': '2020-12-18 13:36:05', 'isdefault': True, 'isarchived': False, 'publicaccess': False}, {'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]

You have just deployed the model to the ScoringDuck scoring engine. It is now available for querying from various tools and environments, including Python.
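Because deploy signals failure with a RuntimeError and list_models returns a list of dictionaries with name and version keys, the deploy and check steps can be combined. A sketch that relies only on those two behaviours; deploy_and_verify and is_listed are hypothetical helpers, and StubClient (including its refusal to overwrite an existing version) is invented purely so the example runs without a server.

```python
def is_listed(models: list[dict], name: str, version: str) -> bool:
    """True if list_models() output contains this model name and version."""
    return any(m.get("name") == name and m.get("version") == version for m in models)


def deploy_and_verify(client, name: str, version: str, path: str) -> bool:
    """Deploy a model file, then confirm it appears in the model list.

    `client` is expected to behave like JagaClient in this tutorial:
    deploy(name, version, path) raises RuntimeError on failure, and
    list_models() returns a list of dicts.
    """
    try:
        client.deploy(name, version, path)
    except RuntimeError as err:
        print(f"deployment failed: {err}")
        return False
    return is_listed(client.list_models(), name, version)


class StubClient:
    """Offline stand-in for JagaClient; its behaviour is invented for this example."""

    def __init__(self):
        self.models = []

    def deploy(self, name, version, path):
        # Invented rule: refuse to overwrite an existing name/version pair.
        if is_listed(self.models, name, version):
            raise RuntimeError(f"{name} {version} already exists")
        self.models.append({"name": name, "version": version})

    def list_models(self):
        return list(self.models)


# With a real client: deploy_and_verify(jg, "germancredittree", "1", "german_credit_tree.mod")
stub = StubClient()
print(deploy_and_verify(stub, "germancredittree", "1", "german_credit_tree.mod"))
```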

Score data

To score data, send a record containing the model's input columns. Let’s predict using the following values from one of the rows of the credit dataset:

Feature                              Value
Account Balance                      4
Payment Status of Previous Credit    3
Purpose                              3
Length of current employment         4
Sex & Marital Status                 3
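These values come from one row of german_credit.csv. A sketch of turning such a row into the dictionary form used for scoring, with only the standard csv module; row_to_record is a hypothetical helper, and the inline CSV text is a stand-in for the downloaded file, trimmed to the feature columns and holding the values from the table above.

```python
import csv
import io

# Feature columns used to train the model in "Create Model".
FEATURES = [
    "Account Balance",
    "Payment Status of Previous Credit",
    "Purpose",
    "Length of current employment",
    "Sex & Marital Status",
]


def row_to_record(row: dict) -> dict:
    """Keep only the model's features and convert the CSV strings to int."""
    return {name: int(row[name]) for name in FEATURES}


# Stand-in for german_credit.csv, trimmed to the feature columns.
sample_csv = io.StringIO(
    "Account Balance,Payment Status of Previous Credit,Purpose,"
    "Length of current employment,Sex & Marital Status\n"
    "4,3,3,4,3\n"
)
to_score = row_to_record(next(csv.DictReader(sample_csv)))
print(to_score)
```

With the real file, csv.DictReader yields one such dictionary per row, and row_to_record drops the other columns, such as Creditability.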

The Python client accepts a dictionary as input:

to_score = {"Account Balance": 4, "Payment Status of Previous Credit": 3, "Purpose": 3, "Length of current employment": 4, "Sex & Marital Status": 3}
result = jg.score("germancredittree", "1", to_score)
print(result)

Output is:

{'model': 'germancredittree', 'version': '1', 'result': {'res': 0.9545454545454546}, 'elapsed': 0.013599499998235842}

The predicted value is returned under the key res. Easy, right? The model can be used the same way from other languages and environments. Refer to the R tutorial for information on installing the R client and scoring with it.
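In client code it helps to read res from the response in one place, so that an unexpected response fails loudly instead of propagating a None. A minimal sketch assuming the response layout printed above; extract_prediction is a hypothetical helper.

```python
def extract_prediction(response: dict, key: str = "res") -> float:
    """Return the predicted value from a score() response.

    Assumes the layout shown in this tutorial:
    {'model': ..., 'version': ..., 'result': {'res': ...}, 'elapsed': ...}
    """
    result = response.get("result")
    if not isinstance(result, dict) or key not in result:
        raise KeyError(f"no '{key}' in response: {response!r}")
    return result[key]


response = {'model': 'germancredittree', 'version': '1',
            'result': {'res': 0.9545454545454546}, 'elapsed': 0.013599499998235842}
print(extract_prediction(response))  # 0.9545454545454546
```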

R

Based on

Model deployment from R - an easy way

Throughout this tutorial we will build and deploy an R model for online scoring, so that it can be queried from other languages and environments such as Python, Java, JS or PHP.

Prerequisites

In order to start you need two things:

  1. R

  2. an account at http://app.scoringduck.com/

Create Model

We will build a simple credit-risk classification model based on the German Credit dataset:

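# Download the German Credit data
url="http://freakonometrics.free.fr/german_credit.csv"
credit=read.csv(url, header = TRUE, sep = ",")
# Hold out 333 random rows for testing and fit on the rest (call set.seed() first for a reproducible split)
i_test=sample(1:nrow(credit),size=333)
i_calibration=(1:nrow(credit))[-i_test]

# Logistic regression on five applicant attributes
logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])

# Save the fitted model; this file is what gets deployed
saveRDS(logistic_model,"german_credit_logit.rds")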

Deploy Model

If you haven’t created an account at scoringduck.com yet, now is the time to do it: http://app.scoringduck.com/new_account. Then install the R client:

install.packages(pkgs="http://app.scoringduck.com/static/jagaclient_0.1.0.tar.gz",repos=NULL,type="source")

Output should be similar to:

* installing *source* package 'jagaclient' ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
  converting help for package 'jagaclient'
    finding HTML links ... done
    connect                                 html
    deploy                                  html
    hello                                   html
    list_models                             html
    score                                   html
** building package indices
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
** testing if installed package can be loaded from final location
*** arch - i386
*** arch - x64
** testing if installed package keeps a record of temporary installation path
* DONE (jagaclient)

Check the installation; the following command should print TRUE:

print("jagaclient" %in% rownames(installed.packages()))

Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your API token; it is passed as the second argument of connect:

library("jagaclient")
conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
ret_list <- jagaclient::list_models(conndata)
print(ret_list)

Output is similar to:

                 name version modeltype      uploaddatetime isdefault isarchived publicaccess
1 german_credit_logit       1         R 2020-12-18 11:27:01      TRUE      FALSE        FALSE

A single model (named german_credit_logit) is provided by default when the account is created, and it is the same as the one you just built. If it were not provided, you would deploy it as follows:

result <- jagaclient::deploy(conndata, "german_credit_logit", "1", "german_credit_logit.rds")
print(result)

deploy returns TRUE if everything went well. Deployed models are available for querying from various tools and environments, including R.

Score data

To score data, send a record containing the model's input columns. Let’s predict using the following values from one of the rows of the credit dataset:

Feature                              Value
Account.Balance                      4
Payment.Status.of.Previous.Credit    3
Purpose                              3
Length.of.current.employment         4
Sex...Marital.Status                 3

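The server accepts data in JSON format, so use jsonlite to create it. jsonlite::unbox() marks each value as a scalar, so it is encoded as 4 rather than as the one-element array [4]: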

library(jsonlite)
to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
args <- jsonlite::toJSON(to_score)
print(args)

Output is:

{"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
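The same payload can be produced in Python with the standard json module, which is handy for checking what the R client sends. This sketch also shows the boxed form that jsonlite::toJSON would produce without unbox(); the compact separators only mimic jsonlite's whitespace-free output.

```python
import json

# Same record as the R list above, using R's column names.
to_score = {
    "Account.Balance": 4,
    "Payment.Status.of.Previous.Credit": 3,
    "Purpose": 3,
    "Length.of.current.employment": 4,
    "Sex...Marital.Status": 3,
}

# Compact separators reproduce jsonlite's whitespace-free output.
payload = json.dumps(to_score, separators=(",", ":"))
print(payload)

# Each value wrapped in a one-element array, i.e. toJSON without unbox().
boxed = json.dumps({k: [v] for k, v in to_score.items()}, separators=(",", ":"))
print(boxed)
```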

Score it:

result <- jagaclient::score(conndata, "german_credit_logit", "1", args)
print(result)

Output should be similar to:

$model
[1] "german_credit_logit"

$version
[1] "1"

$result
$result$Account.Balance
[1] 4

$result$Payment.Status.of.Previous.Credit
[1] 3

$result$Purpose
[1] 3

$result$Length.of.current.employment
[1] 4

$result$Sex...Marital.Status
[1] 3

$result$res
[1] 0.9053125


$elapsed
[1] 0.06597743

The predicted value is returned under the key res. Easy, right? The model can be used the same way from other languages and environments. For example, you can score the same record from a Python script:

from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/", "YOUR_USERNAME", "YOUR_API_KEY")
args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
result = jg.score("german_credit_logit", "1", args)
print(result)

Output should be similar to:

{'model': 'german_credit_logit', 'version': '1', 'result': {'Account.Balance': 4, 'Payment.Status.of.Previous.Credit': 3, 'Purpose': 3, 'Length.of.current.employment': 4, 'Sex...Marital.Status': 3, 'res': 0.9053124547915977}, 'elapsed': 0.018342145998758497}
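The R model expects R-style names because read.csv, with its default check.names = TRUE, passes the CSV header through make.names, which turns every character that is not a letter, digit, dot or underscore into a dot: "Sex & Marital Status" becomes "Sex...Marital.Status". To reuse a record keyed like the one in the Python section against this model, the keys can be converted. A simplified sketch covering ASCII headers only; it ignores make.names' other rules (such as prepending X to names that start with a digit), and to_r_names is a hypothetical helper.

```python
import re

# Characters make.names keeps in ASCII names: letters, digits, '.', '_'.
_INVALID = re.compile(r"[^A-Za-z0-9._]")


def r_style_name(name: str) -> str:
    """Approximate R's make.names() for ASCII column headers."""
    return _INVALID.sub(".", name)


def to_r_names(record: dict) -> dict:
    """Rename the keys of a scoring record to the names read.csv produces."""
    return {r_style_name(k): v for k, v in record.items()}


python_record = {"Account Balance": 4, "Payment Status of Previous Credit": 3,
                 "Purpose": 3, "Length of current employment": 4,
                 "Sex & Marital Status": 3}
print(to_r_names(python_record))
# With a connected client: jg.score("german_credit_logit", "1", to_r_names(python_record))
```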

PMML

ScoringDuck supports the Predictive Model Markup Language (PMML), so any modelling tool that can export models to this format can be used. As an example, we will use the model described in the R section.

Prerequisites

In order to start you need the following:

  1. R with a working installation of r2pmml (https://github.com/jpmml/r2pmml)

  2. an account at http://app.scoringduck.com/

Create Model

We will build a simple credit-risk classification model based on the German Credit dataset:

library("r2pmml")
url="http://freakonometrics.free.fr/german_credit.csv"
credit=read.csv(url, header = TRUE, sep = ",")
credit$Creditability=factor(credit$Creditability)
i_test=sample(1:nrow(credit),size=333)
i_calibration=(1:nrow(credit))[-i_test]

logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])

r2pmml(logistic_model,"german_credit_logit.pmml")

Note that, unlike in the R example, Creditability is converted from numeric to factor. If you skip this step, r2pmml will fail.

Deploy Model

If you already have a PMML file containing your model, you can deploy it without installing any additional software: log in at http://app.scoringduck.com/, click Deploy models and fill in the form. The clients can also deploy PMML files, in the same way as their native formats (for client installation, refer to the relevant part of the Python or R section). With the R client, deployment looks as follows:

library("jagaclient")
conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
result <- jagaclient::deploy(conndata, "german_credit_logit", "2", "german_credit_logit.pmml")
print(result)

Python client usage in this case is as follows:

from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
# Check the connection by listing the deployed models
ret_list = jg.list_models()
print(ret_list)
result = jg.deploy("german_credit_logit", "2", "german_credit_logit.pmml")
print(result)

Remember to specify a version that does not exist yet. Here version 2 is used to avoid a conflict with the model described in the R section.
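To pick a version that is not taken yet, the list_models() output can be inspected first. A minimal sketch assuming, as in the outputs shown in this tutorial, that versions are strings of whole numbers; next_free_version is a hypothetical helper.

```python
def next_free_version(models: list[dict], name: str) -> str:
    """Return one more than the highest numeric version deployed for `name`.

    Assumes versions are whole numbers stored as strings ('1', '2', ...);
    non-numeric versions are ignored.
    """
    versions = [int(m["version"]) for m in models
                if m.get("name") == name and str(m.get("version", "")).isdigit()]
    return str(max(versions, default=0) + 1)


# Model list in the shape returned by list_models() earlier in this tutorial.
models = [
    {"name": "germancredittree", "version": "1", "modeltype": "Scikit"},
    {"name": "german_credit_logit", "version": "1", "modeltype": "R"},
]
print(next_free_version(models, "german_credit_logit"))  # 2
# With a connected client:
# version = next_free_version(jg.list_models(), "german_credit_logit")
```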

Score data

You can also score without installing any additional software: log in at http://app.scoringduck.com/, click Manage and then Query for the desired model. The clients can be used for scoring too. R client example:

library(jsonlite)
to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
# Convert the record to JSON before sending it
args <- jsonlite::toJSON(to_score)
# conndata comes from the connect() call in Deploy Model above
result <- jagaclient::score(conndata, "german_credit_logit", "2", args)
print(result)

Output should be similar to:

$model
[1] "german_credit_logit"

$version
[1] "2"

$result
$result$`probability(0)`
[1] 0.08665164

$result$`probability(1)`
[1] 0.9133484


$elapsed
[1] 0.5412666

Python client usage example:

from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
result = jg.score("german_credit_logit","2",args)
print(result)

Output should be similar to:

{'model': 'german_credit_logit', 'version': '2', 'result': {'probability(0)': 0.0866516359066355, 'probability(1)': 0.9133483640933645}, 'elapsed': 0.6012406999998348}
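Unlike the R model, the PMML model returns one probability(k) entry per class of the factor Creditability instead of a single res value. A sketch of turning that into a predicted class, assuming the key format shown above; class_probabilities and predicted_class are hypothetical helpers.

```python
import re

_PROB_KEY = re.compile(r"^probability\((.+)\)$")


def class_probabilities(result: dict) -> dict:
    """Collect {class_label: probability} from keys like 'probability(1)'."""
    probs = {}
    for key, value in result.items():
        match = _PROB_KEY.match(key)
        if match:
            probs[match.group(1)] = value
    return probs


def predicted_class(result: dict) -> str:
    """Return the class label with the highest probability."""
    probs = class_probabilities(result)
    if not probs:
        raise ValueError("no probability(...) entries in result")
    return max(probs, key=probs.get)


response = {'model': 'german_credit_logit', 'version': '2',
            'result': {'probability(0)': 0.0866516359066355,
                       'probability(1)': 0.9133483640933645},
            'elapsed': 0.6012406999998348}
print(predicted_class(response["result"]))  # 1
```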

Remember to specify the same version as in the Deploy Model step. For a more detailed explanation of client usage, refer to Score data in the Python or R section.
