From 39a7ebd2e19e0297a56ce956694857d93fc95cfa Mon Sep 17 00:00:00 2001
From: Tomas Sherwen
Date: Thu, 18 Jul 2019 15:31:48 +0100
Subject: [PATCH 1/5] Updates to point users to use .csv maker in s2s

---
 README.md                           |  52 ++--
 process_sklearn_models2csv_files.py | 365 ----------------------------
 2 files changed, 28 insertions(+), 389 deletions(-)
 delete mode 100644 process_sklearn_models2csv_files.py

diff --git a/README.md b/README.md
index 986c863e..6c84fccc 100644
--- a/README.md
+++ b/README.md
@@ -1,37 +1,37 @@
-# DOI: https://zenodo.org/record/2579240
+ https://zenodo.org/record/2579240
+# TreeSurgeon - Visualisation of Random Forest Regressor models
+*TreeSurgeon* contains routines to visualise Random Forest Regressor models. The module takes model output files made by [`sklearn`](https://scikit-learn.org/)'s RandomForestRegressor implementation of the random forest regressor algorithm. The raw output files from [`sklearn`](https://scikit-learn.org/) models (`*.pkl`) first need to be converted to the input .csv files required by *TreeSurgeon* using the
+extract_models4TreeSurgeon.py script in the
+[`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) module.
 
-# Written for usage in:
+# Quick Start
 
-## "A machine learning based global sea-surface iodide distribution"
+## Running
 
-#### Authors:
-Tomás Sherwen (1,2), Rosie J. Chance (2), Liselotte Tinel (2), Daniel Ellis (2), Mat J. Evans (1,2), and Lucy J. Carpenter (2)
+- Process the saved Random Forest Regressor model `*.pkl` files into the `.csv` files that *TreeSurgeon* expects using the script in the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) module. You will need to update some lines in the script as described there.
+`python extract_models4TreeSurgeon.py`
 
-(1) National Centre for Atmospheric Science, University of York, York, YO10 5DD, UK
-(2) Wolfson Atmospheric Chemistry Laboratories, University of York, York, YO10 5DD, UK
+- Place files in the [`csv`](https://github.com/wolfiex/TreeSurgeon/tree/master/csv) folder.
 
-#### Citation:
-Sherwen, T., Chance, R. J., Tinel, L., Ellis, D., Evans, M. J., and Carpenter, L. J.: A machine learning based global sea-surface iodide distribution, Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2019-40, in review, 2019.
-
-# Running
-Place files in csv folder.
+For composite files:
+`python start.py $NCPUS`
 
-`python start.py $NCPUS`
-for composite files
-`python start.py $NCPUS 1 `
-for single dot files
+or for single dot files:
+`python start.py $NCPUS 1 `
 
-This then runs in the background (no screen). To change edit 'show' option in main.js
+- This then runs in the background (no screen). To change this, edit the `show` option in main.js.
 
-# Set colours
-see colours.json file
+## Set colours
+The colours are set in the `colours.json` file.
 
-# Output
-This is in the pdf folder.
+## Output
+This is in the [`pdfs`](https://github.com/wolfiex/TreeSurgeon/tree/master/pdfs) folder.
 
-# Install
+## Install
 ```
 conda install nodejs
 npm install
@@ -40,13 +40,17 @@ sudo npm install -g --save electron --unsafe-perm=true --allow-root
 
 - for merge - have imagemagick and ghostscript installed
 
-
-# Montage setup
+## Montage setup
 python montage.py
 
-
-
 ## Example Output for Composite Graph
+# Usage
+
+This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations ([Sherwen et al. 2019])[https://doi.org/10.5194/essd-2019-40]. However it can be used for any Random Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to *TreeSurgeon* input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial).
+
+
+## Citation(s)
+Sherwen, T., Chance, R. J., Tinel, L., Ellis, D., Evans, M. J., and Carpenter, L. J.: A machine learning based global sea-surface iodide distribution, Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2019-40, in review, 2019.
diff --git a/process_sklearn_models2csv_files.py b/process_sklearn_models2csv_files.py
deleted file mode 100644
index 99c73aa3..00000000
--- a/process_sklearn_models2csv_files.py
+++ /dev/null
@@ -1,365 +0,0 @@
-#!/usr/bin/python
-# -*- coding: utf-8 -*-
-"""
-Function to process sklearn saved RandomForestRegressors to csv files,
-which can then be read in by forrester's nope.js plotter functions.
-
-NOTE:
- - The function get_RFR_dictionary is just pseudo code. It will need to be updated by the user to provide a dictionary of the values/models etc. required by the other functions provided here. The dictionary values required are stated in get_RFR_dictionary.
-
-"""
-from __future__ import print_function
-import os
-import sys
-import glob
-import numpy as np
-import pandas as pd
-import matplotlib.pyplot as plt
-
-
-def main():
-    """
-    Driver to make summary csv files from sklearn RandomForestRegressor models
-    """
-    # Get dictionaries of feature variables, model names etc...
-    RFR_dict = get_RFR_dictionary()
-    # Extract the pickled sklearn RandomForestRegressor models to .dot files
-    extract_trees_to_dot_files()
-    # Analyse the nodes in the models
-    # (This calls the main worker function "get_decision_point_and_values_for_tree")
-    analyse_nodes_in_models( RFR_dict=RFR_dict )
-
-
-def get_RFR_dictionary():
-    """
-    Read in RandomForestRegressor variables
-
-    Returns
-    -------
-    (dict)
-
-    Notes
-    -------
-    - This is just pseudo code listing the variables that are required to be in the
-    dictionary
-    """
-    # Setup a dictionary object
-    RFR_dict = {}
-    # Add model names and models
-    # RFR_dict['models_dict'] = {'name of model': model, ...}
-    # Add testing features for models
-    # RFR_dict['testing_features_dict'] = {'name of model': testing features of model, ...}
-    # Add a list of the top models (models to analyse)
-    # RFR_dict['topmodels'] = [...]
-    return RFR_dict
-
-
-def extract_trees_to_dot_files(folder=None, plot_tree=False,
-                               Name_of_model='Example_model',
-                               testing_features=None, max_depth=None):
-    """
-    Extract model trees to .dot files to be plotted in d3
-
-    Parameters
-    -------
-    folder (str): the folder that the model output *.pkl files are in
-    testing_features (list): list of the testing features in a given model
-    Name_of_model (str): name of the model in the filename, used when reading and saving
-    plot_tree (boolean): plot up the extracted tree
-    max_depth (int): depth up to which to extract
-
-    Returns
-    -------
-    (None)
-    """
-    from sklearn.externals import joblib
-    from sklearn import tree
-    import os
-    # Get the location of the saved model
-    model_filename = "my_model_{}.pkl".format( Name_of_model )
-    # Open as a random forest object ("rf")
-    rf = joblib.load(folder+model_filename)
-    # Loop trees ("estimators") in the forest and save each to disk
-    for n, rf_unit in enumerate( rf ):
-        out_file = 'tree_{}_{}.dot'.format( Name_of_model, n )
-        tree.export_graphviz(rf_unit, out_file=out_file, max_depth=max_depth,
-                             feature_names=testing_features )
-        # Also plot up?
-        if plot_tree:
-            os.system('dot -Tpng {} -o {}'.format(out_file,
-                      out_file.replace('.dot', '.png')))
-
-
-def analyse_nodes_in_models( RFR_dict=None, depth2investigate=5 ):
-    """
-    Analyse the nodes in a RFR model
-
-    Parameters
-    -------
-    RFR_dict (dictionary): dictionary of models, model names, features etc.
-    (see get_RFR_dictionary function)
-    depth2investigate (int): depth up to which to build statistics on
-
-    Returns
-    -------
-    (None)
-    """
-    import glob
-    # Models to analyse?
-    models2compare = [ ]
-    topmodels = RFR_dict['topmodels']
-    models2compare = topmodels
-    # Loop and analyse models2compare
-    for model_name in models2compare:
-        print( model_name )
-        get_decision_point_and_values_for_tree( model_name=model_name,
-            RFR_dict=RFR_dict, depth2investigate=depth2investigate )
-    # Loop and update the variable names
-    for model_name in models2compare:
-        print( model_name )
-        # Read and re-save the .csv files (rename variables in columns here if needed)
-        filestr = 'Oi_prj_features_of*{}*{}*.csv'
-        filestr = filestr.format( model_name, depth2investigate )
-        csv_files = glob.glob(filestr)
-        for csv_file in csv_files:
-            df = pd.read_csv( csv_file )
-            # Save the .csv
-            df.to_csv( csv_file )
-
-
-def get_decision_point_and_values_for_tree( depth2investigate=3,
-        model_name='RFR(TEMP+DEPTH+SAL)', RFR_dict=None, verbose=True,
-        debug=False ):
-    """
-    Get the variables driving decisions at each point
-
-    NOTE:
-    link: http://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html
-    # The decision estimator has an attribute called tree_ which stores the entire
-    # tree structure and allows access to low level attributes. The binary tree
-    # tree_ is represented as a number of parallel arrays. The i-th element of each
-    # array holds information about the node `i`. Node 0 is the tree's root. NOTE:
-    # Some of the arrays only apply to either leaves or split nodes, resp. In this
-    # case the values of nodes of the other type are arbitrary!
-    #
-    # Among those arrays, we have:
-    # - left_child, id of the left child of the node
-    # - right_child, id of the right child of the node
-    # - feature, feature used for splitting the node
-    # - threshold, threshold value at the node
-    """
-    from sklearn.externals import joblib
-    from sklearn import tree
-    import os
-    # Extra variables needed from RFR_dict
-    models_dict = RFR_dict['models_dict']
-    testing_features_dict = RFR_dict['testing_features_dict']
-    # Extract model from dictionary
-    model = models_dict[ model_name ]
-    # Get training_features
-    training_features = testing_features_dict[ model_name ].split('+')
-    # Core string for saving data to
-    filename_str = 'Oi_prj_features_of_{}_for_depth_{}{}.{}'
-    # Initialise a DataFrame to store values in
-    df = pd.DataFrame()
-    # Loop by estimator in model
-    for n_estimator, estimator in enumerate( model ):
-        # Extract core variables of interest
-        n_nodes = estimator.tree_.node_count
-        children_left = estimator.tree_.children_left
-        children_right = estimator.tree_.children_right
-        feature = estimator.tree_.feature
-        threshold = estimator.tree_.threshold
-        n_node_samples = estimator.tree_.n_node_samples
-        # The tree structure can be traversed to compute various properties such
-        # as the depth of each node and whether or not it is a leaf.
-        node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
-        is_leaves = np.zeros(shape=n_nodes, dtype=bool)
-        stack = [(0, -1)]  # seed is the root node id and its parent depth
-        # Now extract data
-        while len(stack) > 0:
-            node_id, parent_depth = stack.pop()
-            node_depth[node_id] = parent_depth + 1
-            # If we have a test node
-            if (children_left[node_id] != children_right[node_id]):
-                stack.append((children_left[node_id], parent_depth + 1))
-                stack.append((children_right[node_id], parent_depth + 1))
-            else:
-                is_leaves[node_id] = True
-        # - Work out which nodes are required.
-        # NOTE: numbering runs from 1 to the # of nodes (zero is the first node)
-        # Add the initial node to a dictionary
-        nodes2save = {}
-        depth = 0
-        n_node = 0
-        nodes2save[ depth ] = { n_node: [children_left[0], children_right[0]] }
-        num2node = {0: 0}
-        # For depth in depths
-        for depth in range( depth2investigate )[:-1]:
-            nodes4depth = {}
-            new_n_node = max( nodes2save[ depth ].keys() ) + 1
-            for n_node in nodes2save[ depth ].keys():
-                # Get nodes from the children of each node (LH + RH)
-                for ChildNum in nodes2save[ depth ][ n_node ]:
-                    # Get the children of this node
-                    LHnew = children_left[ ChildNum ]
-                    RHnew = children_right[ ChildNum ]
-                    # Save to temp. dict
-                    nodes4depth[ new_n_node ] = [ LHnew, RHnew ]
-                    # Increment the counter
-                    new_n_node += 1
-            # Save the new nodes for depth with assigned number
-            nodes2save[ depth+1 ] = nodes4depth
-        # Get node numbers to save as a dict
-        for d in range( depth2investigate )[1:]:
-            if debug: print( d, nodes2save[d] )
-            for n in nodes2save[d-1].keys():
-                if debug: print( n, nodes2save[d-1][n] )
-                for nn in nodes2save[d-1][n]:
-                    newnum = max( num2node.keys() ) + 1
-                    num2node[ newnum ] = nn
-        # Make a series of values for estimators
-        s = pd.Series()
-        for node_num in sorted( num2node.keys() ):
-            # Get index of node of interest
-            idx = num2node[node_num]
-            # Save threshold value
-            var_ = 'N{:0>4}: threshold '.format( node_num )
-            s[var_] = threshold[ idx ]
-            # Save feature (and convert index to variable name)
-            var_ = 'N{:0>4}: feature '.format( node_num )
-            s[var_] = training_features[ feature[ idx ] ]
-            # Save the number of samples at the node
-            var_ = 'N{:0>4}: n_node_samples '.format( node_num )
-            s[var_] = n_node_samples[ idx ]
-            # Save the right hand children
-            var_ = 'N{:0>4}: RH child '.format( node_num )
-            s[var_] = children_right[ idx ]
-            # Save the left hand children
-            var_ = 'N{:0>4}: LH child '.format( node_num )
-            s[var_] = children_left[ idx ]
-        # Also add general details for estimator
-        s['n_nodes'] = n_nodes
-        # Now save to main DataFrame
-        df[n_estimator] = s.copy()
-    # Set index to be the estimator number
-    df = df.T
-    # Save the core data on the estimators
-    filename = filename_str.format( model_name, depth2investigate, '_ALL', '')
-    df.to_csv( filename+'csv' )
-    # --- Print a summary to a file and to screen
-    dfs = {}
-    for node_num in sorted( num2node.keys() ):
-        # Get index of node of interest
-        idx = num2node[node_num]
-        vars_ = [i for i in df.columns if 'N{:0>4}'.format(node_num) in i ]
-        # Get values of interest for nodes
-        FEATvar = [i for i in vars_ if 'feature' in i][0]
-        THRESvar = [i for i in vars_ if 'threshold' in i][0]
-        SAMPLEvar = [i for i in vars_ if 'n_node_samples' in i][0]
-#        RHChildvar = [i for i in vars_ if 'RH child' in i][0]
-#        LHChildvar = [i for i in vars_ if 'LH child' in i][0]
-#        print( FEATvar, THRESvar )
-        # Get value counts
-        val_cnts = df[FEATvar].value_counts()
-        df_tmp = pd.DataFrame( val_cnts )
-        # Store the features and rename the '# of trees' column
-        df_tmp['feature'] = df_tmp.index
-        df_tmp.rename( columns={FEATvar: '# of trees'}, inplace=True )
-        # Calc percent
-        df_tmp['%'] = val_cnts.values / float(val_cnts.sum()) * 100.
-        # Save the children for node
-#        df_tmp['RH child'] = df[RHChildvar][idx]
-#        df_tmp['LH child'] = df[LHChildvar][idx]
-        # Initialise series objects to store stats
-        s_mean = pd.Series()
-        s_median = pd.Series()
-        s_std = pd.Series()
-        node_feats = list(df_tmp.index)
-        s_samples_mean = pd.Series()
-        s_samples_median = pd.Series()
-        # Now loop and get values for features
-        for feat_ in node_feats:
-            # - Get threshold value for node + stats on this
-            thres_val4node = df[THRESvar].loc[ df[FEATvar] == feat_ ]
-            # Make sure the value is a float
-            thres_val4node = thres_val4node.astype(np.float)
-            # Convert Kelvin to degrees for readability
-            if feat_ == 'WOA_TEMP_K':
-                thres_val4node = thres_val4node - 273.15
-            # Extract stats of interest
-            stats_ = thres_val4node.describe().T
-            s_mean[feat_] = stats_['mean']
-            s_median[feat_] = stats_['50%']
-            s_std[feat_] = stats_['std']
-            # - Also get avg. samples
-            sample_val4node = df[SAMPLEvar].loc[ df[FEATvar] == feat_ ]
-            # Make sure the value is a float
-            sample_val4node = sample_val4node.astype(np.float)
-            stats_ = sample_val4node.describe().T
-            s_samples_mean[feat_] = stats_['mean']
-            s_samples_median[feat_] = stats_['50%']
-        # Add stats to tmp DataFrame
-        df_tmp['std'] = s_std
-        df_tmp['median'] = s_median
-        df_tmp['mean'] = s_mean
-        # Set the depth value for each node_num
-        if node_num == 0:
-            depth = node_num
-        elif node_num in range(1, 3):
-            depth = 1
-        elif node_num in range(3, 3+(2**2)):
-            depth = 2
-        elif node_num in range(7, 7+(3**2)):
-            depth = 3
-        elif node_num in range(16, 16+(4**2)):
-            depth = 4
-        elif node_num in range(32, 32+(5**2)):
-            depth = 5
-        elif node_num in range(57, 57+(6**2)):
-            depth = 6
-        elif node_num in range(93, 93+(7**2)):
-            depth = 7
-        elif node_num in range(129, 129+(8**2)):
-            depth = 8
-        else:
-            print( 'Depth not setup for > n+8' )
-            sys.exit()
-        df_tmp['depth'] = depth
-        df_tmp['node #'] = node_num
-        df_tmp['# samples (mean)'] = s_samples_mean
-        df_tmp['# samples (median)'] = s_samples_median
-        # Set the index to just a range
-        df_tmp.index = range( len(df_tmp.index) )
-        # Save to main DataFrame
-        dfs[node_num] = df_tmp.copy()
-    # Loop and save info to files
-    filename = filename_str.format( model_name, depth2investigate, '', 'txt')
-    a = open( filename, 'w' )
-    for depth in range(depth2investigate):
-        # Print summary
-        header = '--- At depth {:0>3}:'.format( depth )
-        if verbose:
-            print( header )
-            print( dfs[depth] )
-        # Save
-        print( header, file=a )
-        print( dfs[depth], file=a )
-    # Close file to save data
-    a.close()
-    # --- Build a DataFrame with details on a node by node basis
-    # Combine by node
-    keys = sorted( dfs.keys() )
-    dfn = dfs[ keys[0] ].append( [dfs[i] for i in keys[1:]] )
-    # Re-index and order by node number
-    dfn.index = range( len(dfn.index) )
-    dfn.sort_values(by=['node #'], ascending=True, inplace=True)
-    filename = filename_str.format( model_name, depth2investigate, '', 'csv')
-    dfn.to_csv( filename )
-
-
-if __name__ == "__main__":
-    main()
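For orientation, the first stage of the conversion that the README now delegates to [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) amounts to unpacking a pickled RandomForestRegressor into one `.dot` file per tree. A minimal sketch is below; the model filename and feature names are illustrative assumptions, not the interface of `extract_models4TreeSurgeon.py`.

```python
# Minimal sketch (not extract_models4TreeSurgeon.py): export each tree of a
# pickled sklearn RandomForestRegressor to a .dot file for TreeSurgeon.
# Assumptions: 'my_model.pkl' and the feature names are placeholders.
import joblib  # older sklearn shipped this as sklearn.externals.joblib
from sklearn import tree

rf = joblib.load('my_model.pkl')          # a fitted RandomForestRegressor
feature_names = ['TEMP', 'DEPTH', 'SAL']  # features the model was trained on
for n, estimator in enumerate(rf.estimators_):
    tree.export_graphviz(estimator, out_file='tree_{}.dot'.format(n),
                         feature_names=feature_names)
```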
From 714c9826c4f4cf8155aa4483e0e9e05ff8ea6727 Mon Sep 17 00:00:00 2001
From: Tomas Sherwen
Date: Thu, 18 Jul 2019 15:34:32 +0100
Subject: [PATCH 2/5] Minor updates to markdown

---
 .DS_Store | Bin 14340 -> 0 bytes
 README.md |   6 +++---
 2 files changed, 3 insertions(+), 3 deletions(-)
 delete mode 100644 .DS_Store

diff --git a/.DS_Store b/.DS_Store
deleted file mode 100644
index fb076c38bb89f00ab228fca5a45178ed884254f0..0000000000000000000000000000000000000000
GIT binary patch
(binary .DS_Store contents elided)

diff --git a/README.md b/README.md
index 6c84fccc..0c8aa4cb 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ extract_models4TreeSurgeon.py script in the ## Running -- Process the saved Radom Forest Regressor models `*.pkl` files into the `.csv` that *TreeSurgeon* expects using the script in [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) module. You will need to update some lines in the script as described there. +- Process the saved Radom Forest Regressor models `*.pkl` files into the `.csv` that **TreeSurgeon*** expects using the script in [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) module. You will need to update some lines in the script as described there. `python extract_models4TreeSurgeon.py` @@ -48,7 +48,7 @@ python montage.py # Usage -This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations ([Sherwen et al. 2019])[https://doi.org/10.5194/essd-2019-40]. However it can be used for any Radom Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to *TreeSurgeon* input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) +This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations [[*Sherwen et al.* 2019]([https://doi.org/10.5194/essd-2019-40)]. However it can be used for any Radom Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to **TreeSurgeon** input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) ## Citation(s) From 9b5094407bf7f8f85d636bb4b7d6c96391a77722 Mon Sep 17 00:00:00 2001 From: Tomas Sherwen Date: Thu, 18 Jul 2019 15:35:33 +0100 Subject: [PATCH 3/5] Another minor updates to markdown --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0c8aa4cb..7a59fab9 100644 --- a/README.md +++ b/README.md @@ -48,7 +48,7 @@ python montage.py # Usage -This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations [[*Sherwen et al.* 2019]([https://doi.org/10.5194/essd-2019-40)]. However it can be used for any Radom Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to **TreeSurgeon** input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) +This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations [[*Sherwen et al.* 2019](https://doi.org/10.5194/essd-2019-40)]. However it can be used for any Radom Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to **TreeSurgeon** input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) ## Citation(s) From 44e531ace36ecee11cb5bc164d8bdb75693ba34b Mon Sep 17 00:00:00 2001 From: Tomas Sherwen Date: Thu, 18 Jul 2019 15:37:08 +0100 Subject: [PATCH 4/5] Minor update to README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 7a59fab9..7eeea606 100644 --- a/README.md +++ b/README.md @@ -2,8 +2,8 @@ # TreeSurgeon - Visualisation of Radom Forest Regressor models -**TreeSurgeon** contains routines to visualise Radom Forest Regressor models. 
-**TreeSurgeon** contains routines to visualise Random Forest Regressor models. The module takes model output files made by [`sklearn`](https://scikit-learn.org/)'s RandomForestRegressor implementation of the random forest regressor algorithm. The raw output files from [`sklearn`](https://scikit-learn.org/) models (`*.pkl`) first need to be converted to the input .csv files required by **TreeSurgeon** using the
-extract_models4TreeSurgeon.py script in the
+**TreeSurgeon** contains routines to visualise Random Forest Regressor models. The module takes model output files made by [`sklearn`](https://scikit-learn.org/)'s RandomForestRegressor implementation of the random forest regressor algorithm. The raw output files from [`sklearn`](https://scikit-learn.org/) models (`*.pkl`) first need to be converted to the input `.csv` files required by **TreeSurgeon** using the
+`extract_models4TreeSurgeon.py` script in the
 [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) module.
 

From 7dcc13b5d96944511b79a04e1eb15c59b3938284 Mon Sep 17 00:00:00 2001
From: Tomas Sherwen
Date: Thu, 18 Jul 2019 15:38:20 +0100
Subject: [PATCH 5/5] Another minor update to README.md

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7eeea606..5e9469a5 100644
--- a/README.md
+++ b/README.md
@@ -18,9 +18,11 @@
 
 - Place files in the [`csv`](https://github.com/wolfiex/TreeSurgeon/tree/master/csv) folder.
 
 For composite files:
+
 `python start.py $NCPUS`
 
 or for single dot files:
+
 `python start.py $NCPUS 1 `
 
 - This then runs in the background (no screen). To change this, edit the `show` option in main.js.
@@ -51,6 +53,6 @@
 This package was initially written for use with the [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial) package for work to predict sea-surface concentrations [[*Sherwen et al.* 2019](https://doi.org/10.5194/essd-2019-40)]. However it can be used for any Random Forest Regressor models made by [`sklearn`](https://scikit-learn.org/) and post-processed to **TreeSurgeon** input by [`sparse2spatial`](https://github.com/tsherwen/sparse2spatial).
 
-## Citation(s)
+## Reference
 Sherwen, T., Chance, R. J., Tinel, L., Ellis, D., Evans, M. J., and Carpenter, L. J.: A machine learning based global sea-surface iodide distribution, Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2019-40, in review, 2019.
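For anyone adapting the pipeline to a new problem, any fitted and pickled `sklearn` RandomForestRegressor can enter the workflow above. A minimal, self-contained sketch of producing such a `*.pkl` file is below; the synthetic data, model settings, and output filename are illustrative assumptions, not `sparse2spatial` defaults.

```python
# Minimal sketch: fit a RandomForestRegressor on synthetic data and pickle it,
# yielding the kind of *.pkl file the Quick Start expects as input.
# The data, n_estimators, and 'my_model_example.pkl' are assumptions.
import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(500, 3)                  # three synthetic features
y = 2.0 * X[:, 0] + X[:, 1] - X[:, 2] ** 2  # synthetic target
rf = RandomForestRegressor(n_estimators=10, random_state=0)
rf.fit(X, y)
joblib.dump(rf, 'my_model_example.pkl')
```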