Partitioning Example

Now let's take a look at a partitioning example.


{
  "modelName": "Partitioning",
  "modelDescription": "transformation: partitioning",
  "modelType": "datamining",
  "datasources": {
    "mySrc": {
      "type": "csv",
      "connection": "hald-small-binary.txt",
      "direction": "import"
    }   
  },
  "datasets": {
    "myData": {
      "binding": "mySrc"
    }
  },
  "transformer": {
    "myPartitioner": {
      "type": "transformation", "algorithm": "partitioning",
      "parameters": {
        "partitionMethod": "RANDOM",
        "ratios": [
          [ "training", 0.5 ],
          [ "validation", 0.3 ],
          [ "test", 0.2 ]
        ],
        "seed": 123
      }
    }
  },
  "actions": {
    "partitions": {
      "data": "myData",
      "action": "transform",
      "evaluations": [ "transformation" ]
    }
  }
}
 

The "datasources" section in this example is identical to the one in the previous Sampling Example. Inside "datasets", the "myData" dataset is bound to the datasource "mySrc". Because partitioning transforms the data, the model declares a "transformer" entry ("myPartitioner") with type "transformation" and algorithm "partitioning". Within "parameters", "partitionMethod" is set to "RANDOM", which selects random partitioning.
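If you generate RASON models from code rather than typing them by hand, the structure above maps directly onto nested dictionaries. The sketch below rebuilds the same partitioning model in Python and serializes it to JSON. The helper name `build_partitioning_model` is hypothetical, and the check that the ratios add up to 1 is an assumption of this sketch, not a documented RASON rule.

```python
import json

# Hypothetical helper (not part of RASON): builds the partitioning model above as a dict.
def build_partitioning_model(csv_file, ratios, seed=None):
    """Return a RASON-style partitioning model for a CSV data source."""
    total = sum(share for _, share in ratios)
    # Assumption of this sketch: the shares should cover the whole dataset.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"ratios must sum to 1, got {total}")
    params = {
        "partitionMethod": "RANDOM",
        "ratios": [[name, share] for name, share in ratios],
    }
    if seed is not None:
        params["seed"] = seed  # omit the seed to fall back to the clock-based default
    return {
        "modelName": "Partitioning",
        "modelDescription": "transformation: partitioning",
        "modelType": "datamining",
        "datasources": {
            "mySrc": {"type": "csv", "connection": csv_file, "direction": "import"}
        },
        "datasets": {"myData": {"binding": "mySrc"}},
        "transformer": {
            "myPartitioner": {
                "type": "transformation",
                "algorithm": "partitioning",
                "parameters": params,
            }
        },
        "actions": {
            "partitions": {
                "data": "myData",
                "action": "transform",
                "evaluations": ["transformation"],
            }
        },
    }

model = build_partitioning_model(
    "hald-small-binary.txt",
    [("training", 0.5), ("validation", 0.3), ("test", 0.2)],
    seed=123,
)
print(json.dumps(model, indent=2))  # strict JSON: every string uses double quotes
```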

As in simple random sampling, every observation in the main dataset has an equal probability of being selected for a given partition. For example, if you specify 60% for the training partition, then 60% of the total observations are randomly selected for it; in other words, each observation has a 60% chance of landing in the training partition. By default, random partitioning initializes the random number seed from the system clock, so each run can produce a different split. Alternatively, you can set the seed manually, as this example does with "seed": 123. A fixed seed selects the same observations for the training, validation and test partitions every time the partition is created.
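To make the role of the seed concrete, here is a minimal sketch of one common way to partition randomly: shuffle the records with a seeded generator, then cut the shuffled list into consecutive slices sized by the ratios. This is an illustration only, not RASON's algorithm; because the generator and rounding differ, it will not reproduce RASON's split even with seed 123. The function name `random_partition` is hypothetical.

```python
import random

# Hypothetical sketch of seeded random partitioning (not RASON's implementation).
def random_partition(records, ratios, seed=None):
    """Shuffle the records, then cut them into consecutive slices sized by the ratios."""
    rng = random.Random(seed)  # seed=None: Python seeds from OS randomness or the system time
    shuffled = list(records)
    rng.shuffle(shuffled)      # uniform shuffle: every record is equally likely in any slot
    n = len(shuffled)
    partitions = {}
    start = 0
    for i, (name, share) in enumerate(ratios):
        if i == len(ratios) - 1:
            end = n  # the last partition takes every remaining record
        else:
            end = min(n, start + round(share * n))  # Python's round() rounds halves to even
        partitions[name] = shuffled[start:end]
        start = end
    return partitions

records = [f"Record {i}" for i in range(1, 14)]  # 13 records, as in the results below
ratios = [("training", 0.5), ("validation", 0.3), ("test", 0.2)]

first = random_partition(records, ratios, seed=123)
second = random_partition(records, ratios, seed=123)
print(first == second)                        # True: same seed, same split
print({k: len(v) for k, v in first.items()})  # {'training': 6, 'validation': 4, 'test': 3}
```

Calling `random_partition(records, ratios)` without a seed gives a different split on most runs, which mirrors the clock-seeded default described above.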

In this example, the "ratios" parameter assigns 50% of the records to the training partition, 30% to the validation partition and 20% to the test partition.
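The dataset in this example holds 13 records (the results further down list 7 training, 4 validation and 2 test records), so the ratios cannot be met exactly and each partition size must be rounded to a whole number of records. The short calculation below compares the fractional sizes implied by the ratios with the sizes RASON returned; the exact rounding rule RASON applies is not stated on this page.

```python
# Shares from the "ratios" parameter and the partition sizes RASON returned in this example.
ratios = {"training": 0.5, "validation": 0.3, "test": 0.2}
observed = {"training": 7, "validation": 4, "test": 2}

n = sum(observed.values())  # 13 records in total
for name, share in ratios.items():
    expected = share * n  # fractional size implied by the ratio
    print(f"{name:<10} expected {expected:4.1f}  observed {observed[name]:2d}  "
          f"actual share {observed[name] / n:.3f}")
```

The expected sizes are 6.5, 3.9 and 2.6 records, and each observed size is within one record of them.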

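Within "actions", the "partitions" action applies the transformation ("action": "transform") to the "myData" dataset and requests the "transformation" evaluation. The returned result contains the three partitions: training, validation and test (the test partition is returned under the name "testing").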


Getting model results: GET https://rason.net/api/model/2590+Partitioning+2020-01-20-01-18-37-436902/result
{
  "status": {
    "id": "2590+Partitioning+2020-01-20-01-18-37-436902",
    "code": 0,
    "codeText": "Success"
  },
  "results": ["partitions.transformation"],
  "partitions": {
    "transformation": {
      "objectType": "dataFrameVector",
      "name": "myData - Partitioned",
      "data": {
        "training": {
          "objectType": "dataFrame",
          "name": "training",
          "order": "col",
          "rowNames": ["Record 1", "Record 12", "Record 6", "Record 13", "Record 9", "Record 4", "Record 2"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1, 0, 0, 0],
            [7, 11, 11, 10, 2, 11, 1],
            [26, 66, 55, 68, 54, 31, 29],
            [6, 9, 9, 8, 18, 8, 15],
            [60, 12, 22, 12, 22, 47, 52],
            [1, 1, 1, 1, 1, 2, 3]
          ]
        },
        "validation": {
          "objectType": "dataFrame",
          "name": "validation",
          "order": "col",
          "rowNames": ["Record 11", "Record 3", "Record 10", "Record 7"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [0, 1, 1, 1],
            [1, 11, 21, 3],
            [40, 56, 47, 71],
            [23, 8, 4, 17],
            [34, 20, 26, 6],
            [3, 2, 1, 1]
          ]
        },
        "testing": {
          "objectType": "dataFrame",
          "name": "testing",
          "rowNames": ["Record 5", "Record 8"],
          "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
          "colTypes": ["double", "double", "double", "double", "double", "double"],
          "indexCols": null,
          "data": [
            [1, 0],
            [7, 1],
            [52, 31],
            [6, 22],
            [33, 44],
            [1, 2]
          ]
        }
      }
    }
  }
}
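A client usually needs to find the partitions inside a response like this one. In this response, each entry in "results" is an "action.evaluation" path ("partitions.transformation") that can be followed as nested keys. The sketch below assumes that holds in general and that a status code of 0 means success (as "codeText": "Success" suggests here); both are assumptions, and `partition_records` is a hypothetical helper. The embedded JSON is a trimmed copy of the response above, keeping only the fields the sketch reads.

```python
import json

# Trimmed copy of the response above: only the fields this sketch reads are kept.
response_text = """
{
  "status": {"id": "2590+Partitioning+2020-01-20-01-18-37-436902", "code": 0, "codeText": "Success"},
  "results": ["partitions.transformation"],
  "partitions": {
    "transformation": {
      "objectType": "dataFrameVector",
      "data": {
        "training": {"rowNames": ["Record 1", "Record 12", "Record 6", "Record 13", "Record 9", "Record 4", "Record 2"]},
        "validation": {"rowNames": ["Record 11", "Record 3", "Record 10", "Record 7"]},
        "testing": {"rowNames": ["Record 5", "Record 8"]}
      }
    }
  }
}
"""

# Hypothetical helper: map each partition name to the record names it received.
def partition_records(response):
    if response["status"]["code"] != 0:  # assumption: 0 means success
        raise RuntimeError(response["status"]["codeText"])
    out = {}
    for path in response["results"]:
        action, evaluation = path.split(".")  # "partitions.transformation"
        frames = response[action][evaluation]["data"]
        for name, frame in frames.items():
            out[name] = frame["rowNames"]
    return out

parts = partition_records(json.loads(response_text))
for name, rows in parts.items():
    print(name, len(rows), rows)
```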
 

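From the results, we can see the records allocated to the training, validation and test partitions. Each inner array in "data" holds one column (Y, X1, X2, X3, X4 and Weights, in that order), so the tables below show the same values transposed, with one record per row.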

Training Partition Results
Index  Y  X1  X2  X3  X4  Weights
1      0  7   26  6   60  1
12     1  11  66  9   12  1
6      1  11  55  9   22  1
13     1  10  68  8   12  1
9      0  2   54  18  22  1
4      0  11  31  8   47  2
2      0  1   29  15  52  3

Validation Partition Results
Index  Y  X1  X2  X3  X4  Weights
11     0  1   40  23  34  3
3      1  11  56  8   20  2
10     1  21  47  4   26  1
7      1  3   71  17  6   1

Test Partition Results
Index  Y  X1  X2  X3  X4  Weights
5      1  7   52  6   33  1
8      0  1   31  22  44  2
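Going from the column-ordered "data" arrays in the response to record-per-row tables like the ones above is a transpose. The sketch below does this for the "testing" dataFrame, copied from the response; `to_rows` is a hypothetical helper, and it assumes every partition stores its values by column. The "testing" entry has no "order" key, but its values match the test table above only when read that way.

```python
# The "testing" dataFrame from the response, keeping the fields this sketch needs.
testing = {
    "rowNames": ["Record 5", "Record 8"],
    "colNames": ["Y", "X1", "X2", "X3", "X4", "Weights"],
    "data": [[1, 0], [7, 1], [52, 31], [6, 22], [33, 44], [1, 2]],
}

# Hypothetical helper: turn a column-ordered dataFrame into one dict per record.
def to_rows(frame):
    rows = []
    # zip(*data) transposes the list of columns into one tuple of values per record.
    for record, values in zip(frame["rowNames"], zip(*frame["data"])):
        row = {"Index": record.split()[-1]}  # "Record 5" -> "5", as in the tables
        row.update(zip(frame["colNames"], values))
        rows.append(row)
    return rows

for row in to_rows(testing):
    print(row)
# {'Index': '5', 'Y': 1, 'X1': 7, 'X2': 52, 'X3': 6, 'X4': 33, 'Weights': 1}
# {'Index': '8', 'Y': 0, 'X1': 1, 'X2': 31, 'X3': 22, 'X4': 44, 'Weights': 2}
```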