Tutorial¶
Python¶
Model deployment from Python - an easy way
Throughout this tutorial we are going to build and deploy a Python scikit-learn model for online scoring, so that it can be queried from other languages and environments such as R, Java, JS or PHP.
Prerequisites¶
To start, you need:
Python (version 3) with the following packages installed:
pandas
scikit-learn
an account at http://app.scoringduck.com/
Create Model¶
A simple risk classification model will be built on the German credit dataset:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer as Imputer
from sklearn import tree
from rtblib.ml import models
url="http://freakonometrics.free.fr/german_credit.csv"
df = pd.read_csv(url)
input_df = df[['Account Balance', 'Payment Status of Previous Credit', 'Purpose', 'Length of current employment', 'Sex & Marital Status']].copy()
target = df['Creditability'].copy()
tree_opts = {'min_samples_leaf': 30, 'max_features': None, 'max_depth': 10}
clf = Pipeline([("imputer", Imputer(strategy="mean")), ("model", tree.DecisionTreeClassifier(**tree_opts))])
clf = clf.fit(input_df, target)
mod = models.ModelScikit(clf, list(input_df.columns), 'germancredittree', '1.0', model_type='CLASSIFICATION')
models.save('german_credit_tree.mod', mod)
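The first pipeline step, the imputer with strategy="mean", replaces each missing value with the mean of its column as computed during fitting. The sketch below illustrates that idea with the standard library only. The helper names is_missing, fit_column_means and impute_mean are hypothetical and are not part of scikit-learn or rtblib, and the sample rows are made up for illustration.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_column_means(rows):
    """Compute the mean of every column, skipping missing values.

    Assumes each column contains at least one non-missing value.
    """
    means = []
    for j in range(len(rows[0])):
        present = [row[j] for row in rows if not is_missing(row[j])]
        means.append(sum(present) / len(present))
    return means


def impute_mean(rows, means):
    """Return a copy of rows with missing values replaced by column means."""
    return [
        [means[j] if is_missing(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


# Made-up sample: two columns, one missing value in each.
training_rows = [[4, 3], [None, 1], [2, float("nan")]]
means = fit_column_means(training_rows)
filled = impute_mean(training_rows, means)
print(means)
print(filled)
```

Because the means are learned once during fitting and reused when scoring, a record with a missing field is completed in the same way at training time and at prediction time.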
Deploy Model¶
If you haven't created an account at scoringduck.com yet, now is the time to do so at http://app.scoringduck.com/new_account
Now install the Python client:
pip install http://app.scoringduck.com/static/jaga_client-0.0.1-py3-none-any.whl
The last lines of the output should be similar to:
Installing collected packages: jaga-client
Successfully installed jaga-client-0.0.1
Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is passed as the third argument when creating the JagaClient object:
from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
ret_list = jg.list_models()
print(ret_list)
Output is similar to:
[{'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]
A single model (named german_credit_logit) is provided by default when the account is created. Now deploy the model created earlier:
result = jg.deploy("germancredittree", "1", "german_credit_tree.mod")
print(result)
Output should be similar to:
{'model': 'germancredittree', 'version': '1', 'msg': 'New version of model germancredittree 1 deployed', 'elapsed': 0.10997330000100192}
deploy raises RuntimeError if something goes wrong. Check that the model is there:
ret_list = jg.list_models()
print(ret_list)
Output is similar to:
[{'name': 'germancredittree', 'version': '1', 'modeltype': 'Scikit', 'uploaddatetime': '2020-12-18 13:36:05', 'isdefault': True, 'isarchived': False, 'publicaccess': False}, {'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]
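The deploy-then-check steps above can be wrapped in one helper. This is a minimal sketch, not part of the client: deploy_and_verify is a hypothetical name, and FakeClient is an offline stand-in written only so the example runs without a server. It assumes nothing beyond what this tutorial shows, namely that deploy raises RuntimeError on failure and that list_models returns a list of dictionaries with name and version keys.

```python
def deploy_and_verify(client, name, version, path):
    """Deploy a model file, then confirm it appears in the model list.

    `client` is any object exposing deploy(name, version, path) and
    list_models(), like the JagaClient calls shown in this tutorial.
    Returns True if the model is listed afterwards, False otherwise.
    """
    try:
        client.deploy(name, version, path)
    except RuntimeError as exc:
        print(f"Deployment of {name} {version} failed: {exc}")
        return False
    return any(
        m.get("name") == name and m.get("version") == version
        for m in client.list_models()
    )


class FakeClient:
    """Hypothetical offline stand-in for the real client (illustration only)."""

    def __init__(self, fail=False):
        self.fail = fail
        self.models = []

    def deploy(self, name, version, path):
        if self.fail:
            raise RuntimeError("upload rejected")
        self.models.append({"name": name, "version": version})
        return {"model": name, "version": version}

    def list_models(self):
        return list(self.models)


ok = deploy_and_verify(FakeClient(), "germancredittree", "1", "german_credit_tree.mod")
failed = deploy_and_verify(FakeClient(fail=True), "germancredittree", "1", "german_credit_tree.mod")
print(ok, failed)
```

With a real connection, the jg object created earlier would be passed in place of FakeClient().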
You have just deployed the model to the ScoringDuck scoring engine. It is now available for querying from various tools and environments, including Python.
Score data¶
To score data, let's predict using the following values, taken from one row of the dataset:
Account Balance | 4
Payment Status of Previous Credit | 3
Purpose | 3
Length of current employment | 4
Sex & Marital Status | 3
The Python client accepts a dictionary as input data:
to_score = {"Account Balance": 4, "Payment Status of Previous Credit": 3, "Purpose": 3, "Length of current employment": 4, "Sex & Marital Status": 3}
result = jg.score("germancredittree", "1", to_score)
print(result)
Output is:
{'model': 'germancredittree', 'version': '1', 'result': {'res': 0.9545454545454546}, 'elapsed': 0.013599499998235842}
The predicted value is named res. Easy, right? Other languages and environments can use the model in the same way. Refer to the R tutorial for information on installing the R client and scoring with it.
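Since the response is a plain dictionary, the prediction can be read from result["result"]["res"]. The sketch below does that and compares the value with a cut-off. The helper name extract_score and the 0.5 threshold are hypothetical. Treating res as the probability of Creditability = 1 is also an assumption, because the tutorial does not say which class res refers to.

```python
def extract_score(response, threshold=0.5):
    """Return the predicted value and whether it reaches the threshold.

    The threshold is an illustrative choice, not a ScoringDuck setting.
    """
    score = response["result"]["res"]
    return score, score >= threshold


# Response copied from the output shown above.
response = {
    "model": "germancredittree",
    "version": "1",
    "result": {"res": 0.9545454545454546},
    "elapsed": 0.013599499998235842,
}

score, above_threshold = extract_score(response)
print(score, above_threshold)
```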
R¶
Based on
Model deployment from R - an easy way
Throughout this tutorial we are going to build and deploy an R model for online scoring, so that it can be queried from other languages and environments such as Python, Java, JS or PHP.
Create Model¶
A simple risk classification model will be built on the German credit dataset:
url="http://freakonometrics.free.fr/german_credit.csv"
credit=read.csv(url, header = TRUE, sep = ",")
i_test=sample(1:nrow(credit),size=333)
i_calibration=(1:nrow(credit))[-i_test]
logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])
saveRDS(logistic_model,"german_credit_logit.rds")
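The formula above uses names such as Sex...Marital.Status because read.csv, by default, passes column headers through make.names, which replaces characters that are not valid in R names with dots. This matters when the same model is queried from another language, since the field names must match the R ones. The Python sketch below approximates that renaming for simple headers. r_style_name is a hypothetical helper, and it deliberately ignores the other make.names rules (leading digits, reserved words, duplicate names).

```python
import re


def r_style_name(header):
    """Approximate R's make.names for simple headers.

    Every character other than a letter, digit, dot or underscore
    becomes a dot. Other make.names rules are not handled here.
    """
    return re.sub(r"[^A-Za-z0-9._]", ".", header)


csv_headers = [
    "Account Balance",
    "Payment Status of Previous Credit",
    "Purpose",
    "Length of current employment",
    "Sex & Marital Status",
]

r_names = [r_style_name(h) for h in csv_headers]
for original, converted in zip(csv_headers, r_names):
    print(f"{original} -> {converted}")
```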
Deploy Model¶
If you haven't created an account at scoringduck.com yet, now is the time to do so at http://app.scoringduck.com/new_account
Now install the R client:
install.packages(pkgs="http://app.scoringduck.com/static/jagaclient_0.1.0.tar.gz",repos=NULL,type="source")
Output should be similar to:
* installing *source* package 'jagaclient' ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
converting help for package 'jagaclient'
finding HTML links ... done
connect html
deploy html
hello html
list_models html
score html
** building package indices
** testing if installed package can be loaded from temporary location
*** arch - i386
*** arch - x64
** testing if installed package can be loaded from final location
*** arch - i386
*** arch - x64
** testing if installed package keeps a record of temporary installation path
* DONE (jagaclient)
Check the installation; the following command should print TRUE:
print("jagaclient" %in% rownames(installed.packages()))
Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is passed as the second argument of connect:
library("jagaclient")
conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
ret_list <- jagaclient::list_models(conndata)
print(ret_list)
Output is similar to:
name version modeltype uploaddatetime isdefault isarchived publicaccess
1 german_credit_logit 1 R 2020-12-18 11:27:01 TRUE FALSE FALSE
A single model (named german_credit_logit) is provided by default when the account is created; it is the same as the one you have just built. If it were not provided, you would deploy it as follows:
result <- jagaclient::deploy(conndata, "german_credit_logit", "1", "german_credit_logit.rds")
print(result)
deploy returns TRUE if everything went correctly. Models are then available for querying from various tools and environments, including R.
Score data¶
To score data, let's predict using the following values, taken from one row of credit:
Account.Balance | 4
Payment.Status.of.Previous.Credit | 3
Purpose | 3
Length.of.current.employment | 4
Sex...Marital.Status | 3
The server accepts data in JSON format; use jsonlite to create it:
library(jsonlite)
to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
args <- jsonlite::toJSON(to_score)
print(args)
Output is:
{"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
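For comparison, the same JSON document can be produced in Python with the standard json module. This is only an illustration of the payload format. As the Python section shows, the Python client takes the dictionary directly, so no manual serialisation is needed there.

```python
import json

# Same field names and values as the R list above.
to_score = {
    "Account.Balance": 4,
    "Payment.Status.of.Previous.Credit": 3,
    "Purpose": 3,
    "Length.of.current.employment": 4,
    "Sex...Marital.Status": 3,
}

# Compact separators reproduce the whitespace-free form printed by jsonlite.
payload = json.dumps(to_score, separators=(",", ":"))
print(payload)
```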
Score it:
result <- jagaclient::score(conndata, "german_credit_logit", "1", args)
print(result)
Output should be similar to:
$model
[1] "german_credit_logit"
$version
[1] "1"
$result
$result$Account.Balance
[1] 4
$result$Payment.Status.of.Previous.Credit
[1] 3
$result$Purpose
[1] 3
$result$Length.of.current.employment
[1] 4
$result$Sex...Marital.Status
[1] 3
$result$res
[1] 0.9053125
$elapsed
[1] 0.06597743
The predicted value is named res. Easy, right? Other languages and environments can use the model in the same way. For example, you can score data sent from a Python script:
from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/", "YOUR_USERNAME", "YOUR_API_KEY")
args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
result = jg.score("german_credit_logit", "1", args)
print(result)
Output should be similar to:
{'model': 'german_credit_logit', 'version': '1', 'result': {'Account.Balance': 4, 'Payment.Status.of.Previous.Credit': 3, 'Purpose': 3, 'Length.of.current.employment': 4, 'Sex...Marital.Status': 3, 'res': 0.9053124547915977}, 'elapsed': 0.018342145998758497}
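As the output shows, the R model's result echoes the input fields alongside res. A small helper can separate the two. This is a sketch based only on the response shape shown above, and split_result is a hypothetical name.

```python
def split_result(response, prediction_key="res"):
    """Split a scoring response into (echoed inputs, prediction)."""
    result = dict(response["result"])  # copy, so the response stays intact
    prediction = result.pop(prediction_key)
    return result, prediction


# Response copied from the output shown above.
response = {
    "model": "german_credit_logit",
    "version": "1",
    "result": {
        "Account.Balance": 4,
        "Payment.Status.of.Previous.Credit": 3,
        "Purpose": 3,
        "Length.of.current.employment": 4,
        "Sex...Marital.Status": 3,
        "res": 0.9053124547915977,
    },
    "elapsed": 0.018342145998758497,
}

inputs, prediction = split_result(response)
print(inputs)
print(prediction)
```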
PMML¶
ScoringDuck supports the Predictive Model Markup Language (PMML), so any model-building tool that can export a model to this format can be used. As an example, the model described in the R section will be used.
Prerequisites¶
To start, you need:
R with https://github.com/jpmml/r2pmml installed and working
an account at http://app.scoringduck.com/
Create Model¶
A simple risk classification model will be built on the German credit dataset:
library("r2pmml")
url="http://freakonometrics.free.fr/german_credit.csv"
credit=read.csv(url, header = TRUE, sep = ",")
credit$Creditability=factor(credit$Creditability)
i_test=sample(1:nrow(credit),size=333)
i_calibration=(1:nrow(credit))[-i_test]
logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])
r2pmml(logistic_model,"german_credit_logit.pmml")
Note that, unlike in the R example, Creditability is converted from numeric to factor. Skipping this step will cause r2pmml to fail.
Deploy Model¶
If you already have a PMML file with the model, you can deploy it without installing any additional software: log in at http://app.scoringduck.com/, click Deploy models and use the form. The clients can also be used for deployment, in the same way as for their native formats (for client installation, refer to the relevant part of the Python section or the R section). R client usage in this case is as follows:
library("jagaclient")
conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
result <- jagaclient::deploy(conndata, "german_credit_logit", "2", "german_credit_logit.pmml")
print(result)
Python client usage in this case is as follows:
from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
ret_list = jg.list_models()
print(ret_list)
result = jg.deploy("german_credit_logit", "2", "german_credit_logit.pmml")
print(result)
Remember to specify a version that does not exist yet. Here 2 is used to avoid a conflict with the model described in the R section.
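Picking an unused version can be automated from the output of list_models. The sketch below is one possible approach. next_free_version is a hypothetical helper, and it assumes versions are integer-like strings such as '1', as in the listings shown in this tutorial.

```python
def next_free_version(model_list, name):
    """Return the smallest integer version above those already used for `name`.

    Versions that are not purely numeric are ignored.
    """
    used = [
        int(m["version"])
        for m in model_list
        if m["name"] == name and str(m["version"]).isdigit()
    ]
    return str(max(used, default=0) + 1)


# Listing in the shape returned by list_models (fields trimmed for brevity).
model_list = [
    {"name": "germancredittree", "version": "1", "modeltype": "Scikit"},
    {"name": "german_credit_logit", "version": "1", "modeltype": "R"},
]

version_for_pmml = next_free_version(model_list, "german_credit_logit")
version_for_new = next_free_version(model_list, "brand_new_model")
print(version_for_pmml, version_for_new)
```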
Score data¶
It is possible to score without installing any additional software: log in at http://app.scoringduck.com/, click Manage and then Query for the desired model. The clients can also be used for scoring. R client usage example:
library(jsonlite)
to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
args <- jsonlite::toJSON(to_score)
result <- jagaclient::score(conndata, "german_credit_logit", "2", args)
print(result)
Output should be similar to:
$model
[1] "german_credit_logit"
$version
[1] "2"
$result
$result$`probability(0)`
[1] 0.08665164
$result$`probability(1)`
[1] 0.9133484
$elapsed
[1] 0.5412666
Python client usage example:
from jagaclient import JagaClient
jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
result = jg.score("german_credit_logit","2",args)
print(result)
Output should be similar to:
{'model': 'german_credit_logit', 'version': '2', 'result': {'probability(0)': 0.0866516359066355, 'probability(1)': 0.9133483640933645}, 'elapsed': 0.6012406999998348}
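Unlike the native R model, the PMML model returns one probability per class under keys of the form probability(label). The sketch below picks the most likely class from such a response. most_likely_class is a hypothetical helper that relies only on the key pattern visible in the output above.

```python
import re


def most_likely_class(response):
    """Return (label, probability) for the highest 'probability(label)' entry."""
    probabilities = {}
    for key, value in response["result"].items():
        match = re.fullmatch(r"probability\((.+)\)", key)
        if match:
            probabilities[match.group(1)] = value
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Response copied from the output shown above.
response = {
    "model": "german_credit_logit",
    "version": "2",
    "result": {
        "probability(0)": 0.0866516359066355,
        "probability(1)": 0.9133483640933645,
    },
    "elapsed": 0.6012406999998348,
}

label, probability = most_likely_class(response)
print(label, probability)
```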
Remember to specify the same version as in Deploy Model. For a more detailed explanation of client usage, refer to Score data in the Python section or the R section.
Or directly from bash console: TODO: bash script here