RASON Analytics API Help

Partitioning Example

Now let's take a look at a partitioning example.


{
  "modelName": "Partitioning",
  "modelDescription": "transformation: partitioning",
  "modelType": "datamining",
  "datasources": {
    "mySrc": {
      "type": "csv",
      "connection": "hald-small-binary.txt",
      "direction": "import"
    }   
  },
  "datasets": {
    "myData": {
      "binding": "mySrc"
    }
  },
  "transformer": {
    "myPartitioner": {
      "type": "transformation", "algorithm": "partitioning",
      "parameters": {
        "partitionMethod": 'RANDOM',
        "ratios": [
          [ "training", 0.5 ],
          [ "validation", 0.3 ],
          [ "test", 0.2 ]
        ],
        "seed": 123
      }
    }
  },
  "actions": {
    "partitions": {
      "data": "myData",
      "action": 'transform',
      "evaluations": [ 'transformation' ]
    }
  }
}

The "datasources" section in this example is identical to the one in the previous Sampling Example. Inside "datasets", the "myData" dataset is bound to the datasource "mySrc". Because partitioning transforms the data, the "transformer" section defines "myPartitioner" with type "transformation" and algorithm "partitioning". Within "parameters", "partitionMethod" is set to "RANDOM", which selects random partitioning.

Random partitioning relies on simple random sampling: every observation in the main dataset has an equal probability of being selected for a given partition. For example, if you specify 60% for the training dataset, then 60% of the total observations are randomly selected for the training dataset; in other words, each observation has a 60% chance of being selected. By default, random partitioning uses the system clock to initialize the random number seed. Alternatively, the seed can be set manually, as "seed": 123 does in this example, so that the same observations are chosen for the training, validation, and test sets each time the partition is created.
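To make the role of the seed concrete, here is a minimal Python sketch of seeded random partitioning. It is not RASON's implementation: the function name `random_partition`, the shuffle-then-slice approach, and the rounding rule are all assumptions chosen for illustration, so the partition sizes and record assignments need not match what RASON returns.

```python
import random

def random_partition(record_ids, ratios, seed=None):
    """Shuffle record_ids and split them by (name, ratio) pairs.

    Hypothetical helper for illustration only; not part of RASON.
    """
    # seed=None lets Python pick a seed from the OS or the system time,
    # which mirrors the idea of an unseeded (non-repeatable) partition.
    rng = random.Random(seed)
    shuffled = list(record_ids)
    rng.shuffle(shuffled)

    partitions = {}
    start = 0
    cumulative = 0.0
    n = len(shuffled)
    for i, (name, ratio) in enumerate(ratios):
        cumulative += ratio
        # The last partition takes whatever remains so no record is dropped.
        end = n if i == len(ratios) - 1 else round(cumulative * n)
        partitions[name] = shuffled[start:end]
        start = end
    return partitions

ratios = [("training", 0.5), ("validation", 0.3), ("test", 0.2)]
first = random_partition(range(1, 14), ratios, seed=123)
second = random_partition(range(1, 14), ratios, seed=123)

print({name: len(ids) for name, ids in first.items()})
print(first == second)  # a fixed seed reproduces the same assignment
```

Calling `random_partition` without a seed would generally give a different assignment on each run, which is the behavior the fixed seed is meant to avoid.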

In this example, 50% of the records will be included in the training partition, 30% will be included in the validation partition and 20% will be included in the test partition.

Within "actions", the "partitions" action performs the partitioning (a transformation) on the "myData" dataset. The returned result contains three partitions: training, validation, and test.


Getting model results: GET https://rason.net/api/model/2590+Partitioning+2020-01-20-01-18-37-436902/result
{
  "status": {
    "id": "2590+Partitioning+2020-01-20-01-18-37-436902",
    "code": 0,
    "codeText": "Success"
  },
  "results": ["partitions.transformation"],
  "partitions": {
    "transformation": {
      "objectType": "dataFrameVector",
      "name": "myData - Partitioned",
      "data": {
        "training": {
          "objectType": "dataFrame",
          "name": "training",
          "order": "col",
          "rowNames": ["Record 1", "Record 12", "Record 6", "Record 13", "Record 9", "Record 4", "Record 2"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1, 0, 0, 0],
            [7, 11, 11, 10, 2, 11, 1],
            [26, 66, 55, 68, 54, 31, 29],
            [6, 9, 9, 8, 18, 8, 15],
            [60, 12, 22, 12, 22, 47, 52],
            [1, 1, 1, 1, 1, 2, 3]
          ]
        },
        "validation": {
          "objectType": "dataFrame",
          "name": "validation",
          "order": "col",
          "rowNames": ["Record 11", "Record 3", "Record 10", "Record 7"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1],
            [1, 11, 21, 3],
            [40, 56, 47, 71],
            [23, 8, 4, 17],
            [34, 20, 26, 6],
            [3, 2, 1, 1]
          ]
        },
        "testing": {
          "objectType": "dataFrame",
          "name": "testing",
          "rowNames": ["Record 5", "Record 8"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [1, 0],
            [7, 1],
            [52, 31],
            [6, 22],
            [33, 44],
            [1, 2]
          ]
        }
      }
    }
  }
}
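As a rough sketch of how a client might request this result and check the partition sizes from Python, the snippet below builds the GET request and counts the records per partition. The bearer-token `Authorization` header, the `RASON_BASE` constant, and the helper names `build_result_request` and `partition_sizes` are assumptions made for illustration and are not taken from this page; the REST API Endpoints topic is the authoritative source for the request details.

```python
import json
import urllib.request

RASON_BASE = "https://rason.net/api"  # base URL as it appears in the GET line above

def build_result_request(model_id, token):
    """Build (but do not send) the GET request for a model's result.

    Hypothetical helper; the bearer-token header is an assumption.
    """
    url = f"{RASON_BASE}/model/{model_id}/result"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

def partition_sizes(result, action="partitions", evaluation="transformation"):
    """Count the records in each partition of a parsed result."""
    frames = result[action][evaluation]["data"]
    return {name: len(frame["rowNames"]) for name, frame in frames.items()}

# To send the request, something along these lines could be used:
#   req = build_result_request(model_id, token)
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)

# A trimmed copy of the result shown above, keeping only the row names.
sample = {"partitions": {"transformation": {"data": {
    "training": {"rowNames": ["Record 1", "Record 12", "Record 6", "Record 13",
                              "Record 9", "Record 4", "Record 2"]},
    "validation": {"rowNames": ["Record 11", "Record 3", "Record 10", "Record 7"]},
    "testing": {"rowNames": ["Record 5", "Record 8"]},
}}}}

print(partition_sizes(sample))  # {'training': 7, 'validation': 4, 'testing': 2}
```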

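From the results, we can see which records were allocated to each partition: of the 13 records returned, 7 are in the training partition, 4 are in the validation partition, and 2 are in the test partition (named "testing" in the result). Each "data" array holds one list per column, so the tables below show the same values arranged by record.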

Training Partition Results
Index	Y	X1	X2	X3	X4	Weights
1	0	7	26	6	60	1
12	1	11	66	9	12	1
6	1	11	55	9	22	1
13	1	10	68	8	12	1
9	0	2	54	18	22	1
4	0	11	31	8	47	2
2	0	1	29	15	52	3

Validation Partition Results
Index	Y	X1	X2	X3	X4	Weights
11	0	1	40	23	34	3
3	1	11	56	8	20	2
10	1	21	47	4	26	1
7	1	3	71	17	6	1

Test Partition Results
Index	Y	X1	X2	X3	X4	Weights
5	1	7	52	6	33	1
8	0	1	31	22	44	2
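The tables above can be derived from the column-ordered "data" arrays by reading the i-th entry of every column for the i-th row name. The following Python sketch shows one way to do that; `frame_to_rows` is a hypothetical helper, and it assumes the "testing" data frame uses the same column layout as the other two even though its "order" attribute is not shown in the result.

```python
def frame_to_rows(frame):
    """Turn a column-ordered dataFrame dict into a list of per-record dicts."""
    columns = frame["data"]       # one inner list per column
    names = frame["colNames"]
    rows = []
    for i, row_name in enumerate(frame["rowNames"]):
        row = {"Index": row_name}
        for col_name, col in zip(names, columns):
            row[col_name] = col[i]  # i-th value of each column belongs to row i
        rows.append(row)
    return rows

# The "testing" data frame from the result above (unused keys omitted).
testing = {
    "rowNames": ["Record 5", "Record 8"],
    "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
    "data": [[1, 0], [7, 1], [52, 31], [6, 22], [33, 44], [1, 2]],
}

for row in frame_to_rows(testing):
    print(row)
```

Each printed dictionary corresponds to one line of the Test Partition Results table.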
