Tutorial
========


Python
------


This tutorial shows an easy way to deploy a model from Python. Throughout it we will build and deploy a Python scikit-learn model for online scoring, so that it can be queried from other languages and environments such as R, Java, JS or PHP.


Prerequisites
^^^^^^^^^^^^^


In order to start you need:

#. Python (version 3) with installed:

   #. pandas
   #. scikit-learn

#. account at `<http://app.scoringduck.com/>`_
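
A quick way to confirm the Python prerequisites is to check the interpreter version and whether the packages can be found. The sketch below uses only the standard library; the helper name ``missing_packages`` is illustrative and is not part of any ScoringDuck tooling::

    import importlib.util
    import sys

    def missing_packages(names):
        """Return the import names from `names` that cannot be found."""
        return [name for name in names if importlib.util.find_spec(name) is None]

    # Note: scikit-learn is imported under the name "sklearn".
    required = ["pandas", "sklearn"]
    print("Python 3:", sys.version_info.major == 3)
    print("Missing packages:", missing_packages(required))

An empty list in the second line means both packages are importable.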


Create Model
^^^^^^^^^^^^


A simple risk classification model will be built on the German credit dataset::

    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer as Imputer
    from sklearn import tree
    from rtblib.ml import models

    url="http://freakonometrics.free.fr/german_credit.csv"

    df = pd.read_csv(url)

    input_df = df[['Account Balance', 'Payment Status of Previous Credit', 'Purpose', 'Length of current employment', 'Sex & Marital Status']].copy()

    target = df['Creditability'].copy()

    tree_opts = {'min_samples_leaf': 30, 'max_features': None, 'max_depth': 10}
    clf = Pipeline([("imputer", Imputer(strategy="mean")), ("model", tree.DecisionTreeClassifier(**tree_opts))])
    clf = clf.fit(input_df, target)

    mod = models.ModelScikit(clf, list(input_df.columns), 'germancredittree', '1.0', model_type='CLASSIFICATION')
    models.save('german_credit_tree.mod', mod)
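
Before saving a model it can be worth checking locally that the fitted pipeline returns sensible class probabilities. The following stand-alone sketch trains the same kind of pipeline on synthetic data (random integers, not the German credit data) using the same column names; the target rule is an assumption made only so that the demo has two classes::

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier

    columns = ['Account Balance', 'Payment Status of Previous Credit', 'Purpose',
               'Length of current employment', 'Sex & Marital Status']

    # Synthetic stand-in for the real dataset: integer codes between 1 and 4.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.integers(1, 5, size=(200, len(columns))), columns=columns)
    y = (X['Account Balance'] >= 3).astype(int)  # made-up target, for the demo only

    clf = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("model", DecisionTreeClassifier(min_samples_leaf=30, max_depth=10)),
    ])
    clf.fit(X, y)

    # Score a single row; predict_proba returns one column per class.
    row = pd.DataFrame([{'Account Balance': 4, 'Payment Status of Previous Credit': 3,
                         'Purpose': 3, 'Length of current employment': 4,
                         'Sex & Marital Status': 3}], columns=columns)
    proba = clf.predict_proba(row)
    print(proba.shape)

Each row of ``proba`` sums to one, with columns ordered as in ``clf.classes_``.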


Deploy Model
^^^^^^^^^^^^


If you have not created an account at scoringduck.com yet, now is the time to do it at `<http://app.scoringduck.com/new_account>`_. Then install the Python client::

    pip install http://app.scoringduck.com/static/jaga_client-0.0.1-py3-none-any.whl

The last lines of the output should be similar to::

    Installing collected packages: jaga-client
    Successfully installed jaga-client-0.0.1

Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is passed as the third argument when creating the ``JagaClient`` object::

    from jagaclient import JagaClient
    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    ret_list = jg.list_models()
    print(ret_list)

Output is similar to::

    [{'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]
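
Since the listing is a plain list of dictionaries, it is easy to search. A minimal sketch, assuming the structure shown in the sample output above; ``find_model`` is a hypothetical helper, not a client method::

    # Sample entry copied from the output above.
    models = [
        {'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R',
         'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True,
         'isarchived': False, 'publicaccess': False},
    ]

    def find_model(models, name, version):
        """Return the first non-archived entry matching name and version, else None."""
        for entry in models:
            if (entry['name'], entry['version']) == (name, version) and not entry['isarchived']:
                return entry
        return None

    print(find_model(models, 'german_credit_logit', '1') is not None)  # True
    print(find_model(models, 'germancredittree', '1') is not None)     # False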

A single model (named ``german_credit_logit``) is provided by default when the account is created. Now deploy the previously created model::

    result = jg.deploy("germancredittree", "1", "german_credit_tree.mod")
    print(result)

Output should be similar to::

    {'model': 'germancredittree', 'version': '1', 'msg': 'New version of model germancredittree 1 deployed', 'elapsed': 0.10997330000100192}

``deploy`` raises ``RuntimeError`` if something went wrong. Check that the model is there::

    ret_list = jg.list_models()
    print(ret_list)

Output is similar to::

    [{'name': 'germancredittree', 'version': '1', 'modeltype': 'Scikit', 'uploaddatetime': '2020-12-18 13:36:05', 'isdefault': True, 'isarchived': False, 'publicaccess': False}, {'name': 'german_credit_logit', 'version': '1', 'modeltype': 'R', 'uploaddatetime': '2020-12-18 11:27:01', 'isdefault': True, 'isarchived': False, 'publicaccess': False}]

You have just deployed the model to the scoringduck scoring engine. Now it's available for querying from various tools/environments including Python.
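
Because ``deploy`` signals failure with ``RuntimeError``, a script may want to catch it rather than stop. The sketch below shows one possible pattern; ``try_deploy`` is a hypothetical wrapper, and ``FailingClient`` is a stand-in for ``JagaClient`` that exists only to make the example runnable without a server::

    class FailingClient:
        """Stand-in for JagaClient; always fails, for demonstration only."""
        def deploy(self, name, version, path):
            raise RuntimeError("simulated deployment failure")

    def try_deploy(client, name, version, path):
        """Call client.deploy and return (ok, result_or_error_message)."""
        try:
            return True, client.deploy(name, version, path)
        except RuntimeError as exc:
            return False, str(exc)

    ok, info = try_deploy(FailingClient(), "germancredittree", "1", "german_credit_tree.mod")
    print(ok, info)

With a real client, ``try_deploy(jg, ...)`` would be called the same way.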


Score data
^^^^^^^^^^


To score data, send it to the deployed model. Let's predict using the following values, taken from one of the rows of the credit dataset:

+-----------------------------------+---+
| Account Balance                   | 4 |
+-----------------------------------+---+
| Payment Status of Previous Credit | 3 |
+-----------------------------------+---+
| Purpose                           | 3 |
+-----------------------------------+---+
| Length of current employment      | 4 |
+-----------------------------------+---+
| Sex & Marital Status              | 3 |
+-----------------------------------+---+

The Python client accepts a dictionary as input data::

    to_score = {"Account Balance": 4, "Payment Status of Previous Credit": 3, "Purpose": 3, "Length of current employment": 4, "Sex & Marital Status": 3}
    result = jg.score("germancredittree", "1", to_score)
    print(result)

Output is::

    {'model': 'germancredittree', 'version': '1', 'result': {'res': 0.9545454545454546}, 'elapsed': 0.013599499998235842}

The predicted value is named ``res``. Easy, right? The model can be used the same way from other languages and environments. Refer to the R tutorial for information on installing the R client and scoring with it.
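
The returned dictionary can be turned into a decision in a few lines. A minimal sketch, assuming that ``res`` is a probability-like score between 0 and 1; the 0.5 threshold and the ``decide`` helper are illustrative choices, not something the service prescribes::

    # Sample response copied from the output above.
    result = {'model': 'germancredittree', 'version': '1',
              'result': {'res': 0.9545454545454546},
              'elapsed': 0.013599499998235842}

    def decide(result, threshold=0.5):
        """Return the score and whether it reaches the threshold."""
        score = result['result']['res']
        return score, score >= threshold

    score, accepted = decide(result)
    print(round(score, 3), accepted)  # 0.955 True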


R
-


Based on

* `<https://rpubs.com/Hgoswami/368878>`_
* `<https://www.r-bloggers.com/classification-on-the-german-credit-database/>`_

This tutorial shows an easy way to deploy a model from R. Throughout it we will build and deploy an R model for online scoring, so that it can be queried from other languages and environments such as Python, Java, JS or PHP.


Prerequisites
^^^^^^^^^^^^^


In order to start you need two things:

#. R
#. account at `<http://app.scoringduck.com/>`_


Create Model
^^^^^^^^^^^^


A simple risk classification model will be built on the German credit dataset::

    url="http://freakonometrics.free.fr/german_credit.csv"
    credit=read.csv(url, header = TRUE, sep = ",")
    i_test=sample(1:nrow(credit),size=333)
    i_calibration=(1:nrow(credit))[-i_test]

    logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])

    saveRDS(logistic_model,"german_credit_logit.rds")
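
A binomial ``glm`` with the default logit link predicts a probability by passing a weighted sum of the inputs through the logistic function. The Python sketch below illustrates that calculation only; the intercept and coefficients are hypothetical values chosen for the example, not the ones fitted above::

    import math

    def logistic_probability(intercept, coefficients, row):
        """Apply the inverse logit to intercept + sum(coefficient * value)."""
        linear = intercept + sum(coefficients[name] * row[name] for name in coefficients)
        return 1.0 / (1.0 + math.exp(-linear))

    # Hypothetical coefficients, NOT fitted on the German credit data.
    coefficients = {"Account.Balance": 0.6, "Purpose": -0.05}
    row = {"Account.Balance": 4, "Purpose": 3}
    p = logistic_probability(-1.0, coefficients, row)
    print(0.0 < p < 1.0)  # True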


Deploy Model
^^^^^^^^^^^^


If you have not created an account at scoringduck.com yet, now is the time to do it at `<http://app.scoringduck.com/new_account>`_. Then install the R client::

    install.packages(pkgs="http://app.scoringduck.com/static/jagaclient_0.1.0.tar.gz",repos=NULL,type="source")

Output should be similar to::

    * installing *source* package 'jagaclient' ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
      converting help for package 'jagaclient'
        finding HTML links ... done
        connect                                 html  
        deploy                                  html  
        hello                                   html  
        list_models                             html  
        score                                   html  
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    *** arch - i386
    *** arch - x64
    ** testing if installed package can be loaded from final location
    *** arch - i386
    *** arch - x64
    ** testing if installed package keeps a record of temporary installation path
    * DONE (jagaclient)

Check the installation; the following command should print ``TRUE``::

    print("jagaclient" %in% rownames(installed.packages()))
    
Connect to the server and check that it works. Log in to your account at app.scoringduck.com and find your token; it is passed as the second argument of ``connect``::

    library("jagaclient")
    conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
    ret_list <- jagaclient::list_models(conndata)
    print(ret_list)    

Output is similar to::

                     name version modeltype      uploaddatetime isdefault isarchived publicaccess
    1 german_credit_logit       1         R 2020-12-18 11:27:01      TRUE      FALSE        FALSE

A single model (named ``german_credit_logit``) is provided by default when the account is created, and it is the same as the one you have just built. If it were not provided, you would deploy it as follows::

    result <- jagaclient::deploy(conndata, "german_credit_logit", "1", "german_credit_logit.rds")
    print(result)    

``deploy`` returns ``TRUE`` if everything went correctly. Deployed models are available for querying from various tools and environments, including R.


Score data
^^^^^^^^^^


To score data, send it to the deployed model. Let's predict using the following values, taken from one of the rows of the credit dataset:

+-----------------------------------+---+
| Account.Balance                   | 4 |
+-----------------------------------+---+
| Payment.Status.of.Previous.Credit | 3 |
+-----------------------------------+---+
| Purpose                           | 3 |
+-----------------------------------+---+
| Length.of.current.employment      | 4 |
+-----------------------------------+---+
| Sex...Marital.Status              | 3 |
+-----------------------------------+---+

The server accepts data in JSON format; use ``jsonlite`` to create it::

    library(jsonlite)
    to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
    args <- jsonlite::toJSON(to_score)
    print(args)

Output is::

    {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
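
For comparison, the same JSON text can be produced in Python with the standard ``json`` module. This is only an illustration of the payload format; the Python client shown later in this section takes a dictionary directly, so this step is not needed there::

    import json

    to_score = {"Account.Balance": 4, "Payment.Status.of.Previous.Credit": 3,
                "Purpose": 3, "Length.of.current.employment": 4,
                "Sex...Marital.Status": 3}

    # Compact separators reproduce the whitespace-free form printed by jsonlite.
    args = json.dumps(to_score, separators=(",", ":"))
    print(args)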

Score it::

    result <- jagaclient::score(conndata, "german_credit_logit", "1", args)
    print(result)

Output should be similar to::

    $model
    [1] "german_credit_logit"

    $version
    [1] "1"

    $result
    $result$Account.Balance
    [1] 4

    $result$Payment.Status.of.Previous.Credit
    [1] 3

    $result$Purpose
    [1] 3

    $result$Length.of.current.employment
    [1] 4

    $result$Sex...Marital.Status
    [1] 3

    $result$res
    [1] 0.9053125


    $elapsed
    [1] 0.06597743

The predicted value is named ``res``. Easy, right? The model can be used the same way from other languages and environments. For example, you can score data sent from a Python script::

    from jagaclient import JagaClient
    jg = JagaClient("http://app.scoringduck.com/", "YOUR_USERNAME", "YOUR_API_KEY")
    args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
    result = jg.score("german_credit_logit", "1", args)
    print(result)

Output should be similar to::

    {'model': 'german_credit_logit', 'version': '1', 'result': {'Account.Balance': 4, 'Payment.Status.of.Previous.Credit': 3, 'Purpose': 3, 'Length.of.current.employment': 4, 'Sex...Marital.Status': 3, 'res': 0.9053124547915977}, 'elapsed': 0.018342145998758497}
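
Note that the R model expects the column names as R rewrote them when reading the CSV (for example ``Sex...Marital.Status`` rather than ``Sex & Marital Status``). The sketch below approximates that renaming in Python so a dictionary keyed by the original headers can be reused. It is a simplification of R's ``make.names``: it only replaces characters that are not valid in R names with dots and does not handle names starting with a digit or reserved words. ``r_style_name`` is a hypothetical helper::

    import re

    def r_style_name(name):
        """Replace every character that is not a letter, digit, dot or underscore with '.'."""
        return re.sub(r"[^A-Za-z0-9._]", ".", name)

    original = {"Account Balance": 4, "Payment Status of Previous Credit": 3,
                "Purpose": 3, "Length of current employment": 4,
                "Sex & Marital Status": 3}
    args = {r_style_name(key): value for key, value in original.items()}
    print(sorted(args))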


PMML
----


ScoringDuck supports the *Predictive Model Markup Language* (PMML), so any modelling tool that can export a model to this format may be used. For the sake of example, the model described in the R section is used here.


Prerequisites
^^^^^^^^^^^^^


In order to start you need the following:

#. R with installed and working `<https://github.com/jpmml/r2pmml>`_
#. account at `<http://app.scoringduck.com/>`_


Create Model
^^^^^^^^^^^^


A simple risk classification model will be built on the German credit dataset::

    library("r2pmml")
    url="http://freakonometrics.free.fr/german_credit.csv"
    credit=read.csv(url, header = TRUE, sep = ",")
    credit$Creditability=factor(credit$Creditability)
    i_test=sample(1:nrow(credit),size=333)
    i_calibration=(1:nrow(credit))[-i_test]

    logistic_model <- glm(Creditability ~ Account.Balance + Payment.Status.of.Previous.Credit + Purpose + Length.of.current.employment + Sex...Marital.Status, family=binomial, data = credit[i_calibration,])

    r2pmml(logistic_model,"german_credit_logit.pmml")

Note that, unlike in the R section, ``Creditability`` is converted from numeric to factor. Skipping this step makes ``r2pmml`` fail.
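
A PMML file is plain XML, so the field names a model declares can be inspected before deployment. The Python sketch below reads the ``DataField`` entries with the standard library. The embedded document is a hand-written fragment for illustration, not actual ``r2pmml`` output, and ``data_field_names`` is a hypothetical helper::

    import xml.etree.ElementTree as ET

    # Hand-written fragment; a real export is larger and contains the model itself.
    pmml_text = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
      <DataDictionary>
        <DataField name="Creditability" optype="categorical" dataType="string"/>
        <DataField name="Account.Balance" optype="continuous" dataType="double"/>
        <DataField name="Purpose" optype="continuous" dataType="double"/>
      </DataDictionary>
    </PMML>"""

    def data_field_names(pmml_xml):
        """Return the name attribute of every DataField, ignoring the XML namespace."""
        root = ET.fromstring(pmml_xml)
        return [elem.get("name") for elem in root.iter()
                if elem.tag.rsplit("}", 1)[-1] == "DataField"]

    print(data_field_names(pmml_text))

For a file on disk, the text can be read with ``open("german_credit_logit.pmml").read()`` and passed to the same function.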


Deploy Model
^^^^^^^^^^^^


If you already have a PMML file with a model, you can deploy it without installing any additional software: log in at `<http://app.scoringduck.com/>`_, click *Deploy models* and use the form. The clients can also be used for deployment, in the same way as for their native formats (for client installation, refer to the relevant part of the Python or R section). R client usage in this case is as follows::
  
    library("jagaclient")
    conndata <- jagaclient::connect("YOUR_USERNAME","YOUR_API_KEY","http://app.scoringduck.com/")
    result <- jagaclient::deploy(conndata, "german_credit_logit", "2", "german_credit_logit.pmml")
    print(result) 
    
Python client usage in this case is as follows::

    from jagaclient import JagaClient
    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    print(jg.list_models())  # list existing models to pick an unused version
    result = jg.deploy("german_credit_logit", "2", "german_credit_logit.pmml")
    print(result)

Remember to specify a version that does not exist yet. Here ``2`` is used to avoid a conflict with the model described in the R section.
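
An unused version can be picked from the model listing. A minimal sketch, assuming that versions are integer-like strings as in the examples in this tutorial; ``next_free_version`` is a hypothetical helper, and the list stands in for the result of ``jg.list_models()``::

    def next_free_version(models, name):
        """Return the smallest positive integer (as a string) not yet used for name."""
        used = {entry['version'] for entry in models if entry['name'] == name}
        candidate = 1
        while str(candidate) in used:
            candidate += 1
        return str(candidate)

    # Trimmed-down listing; real entries carry more keys.
    models = [{'name': 'german_credit_logit', 'version': '1'}]
    print(next_free_version(models, 'german_credit_logit'))  # 2
    print(next_free_version(models, 'germancredittree'))     # 1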


Score data
^^^^^^^^^^


It is possible to score without installing any additional software: log in at `<http://app.scoringduck.com/>`_, click *Manage* and then *Query* for the desired model. The clients can also be used for scoring. R client usage example::

    library(jsonlite)
    to_score <- list("Account.Balance"=jsonlite::unbox(4), "Payment.Status.of.Previous.Credit"=jsonlite::unbox(3), "Purpose"=jsonlite::unbox(3), "Length.of.current.employment"=jsonlite::unbox(4), "Sex...Marital.Status"=jsonlite::unbox(3))
    # Convert the list to JSON before sending it, as in the R section
    args <- jsonlite::toJSON(to_score)
    # conndata is the connection created in Deploy Model
    result <- jagaclient::score(conndata, "german_credit_logit", "2", args)
    print(result)

Output should be similar to::

    $model
    [1] "german_credit_logit"

    $version
    [1] "2"

    $result
    $result$`probability(0)`
    [1] 0.08665164

    $result$`probability(1)`
    [1] 0.9133484


    $elapsed
    [1] 0.5412666

Python client usage example::

    from jagaclient import JagaClient
    jg = JagaClient("http://app.scoringduck.com/","YOUR_USERNAME","YOUR_API_KEY")
    args = {"Account.Balance":4,"Payment.Status.of.Previous.Credit":3,"Purpose":3,"Length.of.current.employment":4,"Sex...Marital.Status":3}
    result = jg.score("german_credit_logit","2",args)
    print(result)

Output should be similar to::

    {'model': 'german_credit_logit', 'version': '2', 'result': {'probability(0)': 0.0866516359066355, 'probability(1)': 0.9133483640933645}, 'elapsed': 0.6012406999998348}
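
Unlike the native R and scikit-learn models, the PMML model returns one ``probability(<class>)`` entry per class instead of a single ``res``. The sketch below picks the most likely class from such a response; it assumes the key format shown in the sample output above, and ``most_likely_class`` is a hypothetical helper::

    import re

    # Sample response copied from the output above.
    result = {'model': 'german_credit_logit', 'version': '2',
              'result': {'probability(0)': 0.0866516359066355,
                         'probability(1)': 0.9133483640933645},
              'elapsed': 0.6012406999998348}

    def most_likely_class(result):
        """Return (label, probability) for the highest 'probability(label)' entry."""
        probs = {}
        for key, value in result['result'].items():
            match = re.fullmatch(r"probability\((.+)\)", key)
            if match:
                probs[match.group(1)] = value
        label = max(probs, key=probs.get)
        return label, probs[label]

    label, probability = most_likely_class(result)
    print(label, round(probability, 3))  # 1 0.913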

Remember to specify the same version as in *Deploy Model*. For a more detailed explanation of client usage, refer to *Score data* in the Python or R section.

Or directly from bash console: TODO: bash script here