Utils

This module is a collection of useful tools for working with the SceneWalk model.

loadData

As described in the section “data requirements”, the package expects data to be organized as a list of lists of arrays, referring to the levels of subject, trial and fixation respectively. The relevant data are the x and y coordinates, the image number and the fixation duration. Additionally, the model needs the fixation densities/saliency maps and the data range.

In loadData, datasets are loaded into dictionaries, which can be shortened or split in various ways and then saved to their own variables.
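As an illustration (a minimal sketch; the variable name and the numbers below are made up), the x coordinates of a dataset with two subjects could be nested like this:

>>> import numpy as np
>>> # levels: subject > trial > fixation
>>> x_dat = [
...     [np.array([12.3, 15.1, 9.8]),   # subject 1, trial 1: x position of each fixation
...      np.array([20.0, 18.4])],       # subject 1, trial 2
...     [np.array([5.5, 7.2, 6.1])],    # subject 2, trial 1
... ]

The y coordinates, fixation durations and image numbers follow the same nesting.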

Functions that load, handle and change data

scenewalk.utils.loadData.autopopulate_data_path_dict(folder_path)

finds data paths automatically, assuming a specific data folder structure

scenewalk.utils.loadData.change_resolution(densities, new_size)

Changes the resolution of a list of densities

Parameters
densities : list of arrays

list of empirical fixation densities

new_size : int

number of pixels in one direction after resizing

Returns
list

list of densities
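A minimal usage sketch (the densities here are random placeholders):

>>> import numpy as np
>>> from scenewalk.utils import loadData
>>> densities = [np.random.rand(128, 128) for _ in range(3)]  # placeholder densities
>>> small = loadData.change_resolution(densities, 64)  # resize each density to 64 pixels per side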

scenewalk.utils.loadData.check_npy_folder_complete(npy_path)

checks that an npy data folder is complete

scenewalk.utils.loadData.chop_scanpaths(lower, upper, datadict)

Cuts off scanpaths in a data set, e.g. to make them of equal length or to get a subset of fixations

Parameters
lower : int

lower cutoff for the fixation index

upper : int

upper cutoff for the fixation index

datadict : dict

data dictionary

Returns
dict

data dictionary with chopped scanpaths
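A hedged usage sketch; the interpretation of lower and upper as fixation-index cutoffs is an assumption, and the dataset name is a placeholder:

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.load_data("my_dataset")  # placeholder dataset name
>>> # keep a common range of fixations per scanpath (assumed meaning of lower/upper)
>>> chopped = loadData.chop_scanpaths(1, 10, data_dict)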

scenewalk.utils.loadData.dataDict2vars(data_dict)

takes a data dictionary and returns the vectors x, y, dur, im, densities and range (e.g. unpacked as x_dat, y_dat, dur_dat, im_dat, densities_dat, d_range)

Parameters
data_dict : dict

data dictionary

Returns
arrays

x, y, dur, im, densities, range
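Typical usage unpacks the returned arrays into separate variables (sketch; the dataset name is a placeholder):

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.load_data("my_dataset")  # placeholder dataset name
>>> x_dat, y_dat, dur_dat, im_dat, densities_dat, d_range = loadData.dataDict2vars(data_dict)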

scenewalk.utils.loadData.find_data()

tries to find where your data is hiding. Order of preference:

1. You've set the DATA_PATH variable
2. config in code directory
3. config in working directory
4. folder called DATA in your working directory

scenewalk.utils.loadData.get_ix_from_set(data_dict, subj_order=None, trials_order=None)

Takes a list of indices for trials and for subjects and shortens/reorders the dataset. The order of the lists IS relevant. No Nones are added, so the absolute indexes will change! If an integer is given in place of a list, it is expanded into a list of all subjects up to that one.

Parameters
data_dict : dict

data dictionary

subj_order : {int, list}

list of subject indexes to return

trials_order : {int, list}

list of trial indexes to return

Returns
dict

data dictionary
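For example (a sketch; the dataset name and the index lists are arbitrary):

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.load_data("my_dataset")  # placeholder dataset name
>>> subset = loadData.get_ix_from_set(data_dict, subj_order=[0, 2], trials_order=[3, 1, 0])
>>> # an integer instead of a list, e.g. subj_order=5, expands to all subjects up to that one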

scenewalk.utils.loadData.get_set_names()

get the names of the available data sets

Returns
list

list of dataset names

scenewalk.utils.loadData.load_data(data_ref)

Returns selected data set as a dictionary

Parameters
data_ref : str

name or path of the set you want to load

Returns
dict

data dictionary
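For example (the dataset name is a placeholder; use get_set_names to see what is available):

>>> from scenewalk.utils import loadData
>>> loadData.get_set_names()  # list of available data set names
>>> data_dict = loadData.load_data("my_dataset")  # placeholder: a name from get_set_names or a path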

scenewalk.utils.loadData.load_sim_data(folder_path)

Takes the absolute path of a folder of simulated data and loads the contents into a dictionary

Parameters
folder_path : str

absolute path of the simulated data folder

Returns
dict

data dictionary

scenewalk.utils.loadData.populate_path_dict_from_yml(yml_path)

loads a config yml file

scenewalk.utils.loadData.save_dict_to_folder(data_dict, name, folder)

Save a data dict into a folder as separate npy files

Parameters
data_dict : dict

data

name : str

name of the dataset

folder : str

folder to save to (must exist!)
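A usage sketch (the name and folder path are placeholders; the folder must already exist):

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.load_data("my_dataset")  # placeholder dataset name
>>> loadData.save_dict_to_folder(data_dict, "my_subset", "/path/to/existing/folder")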

scenewalk.utils.loadData.setup_data_dict()

Helper function that sets up data dict structure

Returns
dict

dict of all relevant keys for data, but populated with Nones
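This can be used to build a data dictionary from scratch (sketch; the exact key names are best checked on the returned dict rather than assumed):

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.setup_data_dict()
>>> data_dict.keys()  # all relevant keys, each populated with None
>>> # fill the entries with your own data before using the dict with the other functions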

scenewalk.utils.loadData.shorten_set(data_dict, nvp, vps=None)

Gets a given number of subjects from the selected set

Parameters
data_dict : dict

data dictionary

nvp : int

number of subjects to return

vps : list

list of vp numbers

Returns
dict

data dictionary
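For example (sketch; the dataset name and subject numbers are placeholders):

>>> from scenewalk.utils import loadData
>>> data_dict = loadData.load_data("my_dataset")  # placeholder dataset name
>>> first_three = loadData.shorten_set(data_dict, 3)  # first 3 subjects
>>> picked = loadData.shorten_set(data_dict, 2, vps=[1, 4])  # specific subjects by number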

utils

These functions mainly deal with saving the estimation chains returned by pydream in more readable formats

Utility functions for SceneWalk-related tasks

scenewalk.utils.utils.check_param_dict_order(p_dict, sw_model)

checks that the parameter dictionary is in the correct order for the model when turned into a list (pydream returns parameters as lists)

Parameters
p_dict : ordered dict

parameter dictionary to check

sw_model : scenewalk model object

scenewalk model object with some configuration

Returns
list of bools

indicating whether the parameters line up
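A hedged sketch of a call; the parameter names in p_dict are hypothetical and sw_model stands for an already configured scenewalk model object:

>>> from collections import OrderedDict
>>> from scenewalk.utils import utils
>>> # sw_model: a configured scenewalk model object (construction not shown here)
>>> p_dict = OrderedDict([("omegaAttention", 1.0), ("omegaInhib", 0.1)])  # hypothetical names
>>> checks = utils.check_param_dict_order(p_dict, sw_model)  # list of bools, one per parameter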

scenewalk.utils.utils.get_all_colors()

Returns all available color names in pyplot

scenewalk.utils.utils.get_git_sha()

Returns git sha of current repo, for meta data

scenewalk.utils.utils.save2dict_by_subj(chains_list, all_vp_list, def_args, fname, perc_last_samples=100)

saves full estimation results as a dictionary

Parameters
chains_list : list

list of pydream estimations (each element is one subject's estimation)

all_vp_list : list

list of all subject indexes

def_args : dict

dictionary of default arguments (non-estimated parameters)

fname : str

file name

perc_last_samples : int

percentage of samples that are not considered burn-in

Returns
dict

with subjects as keys and chains as values
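A hedged sketch of how this might be called after a pydream run; sampled_params, def_args and the file name are placeholders:

>>> from scenewalk.utils import utils
>>> # sampled_params: list of pydream estimation results, one element per subject (placeholder)
>>> # def_args: dict of non-estimated default arguments (placeholder)
>>> res_dict = utils.save2dict_by_subj(sampled_params, [1, 2, 3], def_args,
...                                    "estimation_by_subj", perc_last_samples=80)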

scenewalk.utils.utils.save2dict_overall_point_estimates(chains_list, all_vp_list, def_args, priors, sw, credible_interval, fname, perc_last_samples=75)

saves point estimates as a dictionary averaged over subjects. A point estimate is the center of the credible interval. The final dictionary has a key for each parameter.

Parameters
chains_list : list

list of pydream estimations (each element is one subject's estimation)

all_vp_list : list

list of all subject indexes

def_args : dict

dictionary of default arguments (non-estimated parameters)

credible_interval : float

credible interval (example 0.5 if 50% of datapoints should be in the interval)

fname : str

file name

CI : bool

is the credible interval returned in the dictionary?

perc_last_samples : int

percentage of samples that are not considered burn-in

Returns
dict

with subjects as keys and chains as values

scenewalk.utils.utils.save2npy_point_estimate_by_subj(chains_list, all_vp_list, def_args, credible_interval, fname, CI=False, perc_last_samples=75, logzeta=False)

saves point estimates as a dictionary by subject. A point estimate is the center of the credible interval. The final dictionary will have a key for each subject.

Parameters
chains_list : list

list of pydream estimations (each element is one subject's estimation)

all_vp_list : list

list of all subject indexes

def_args : dict

dictionary of default arguments (non-estimated parameters)

credible_interval : float

credible interval (example 0.5 if 50% of datapoints should be in the interval)

fname : str

file name

CI : bool

is the credible interval returned in the dictionary?

perc_last_samples : int

percentage of samples that are not considered burn-in

logzeta : bool

separately convert log zeta in the output

Returns
dict

with subjects as keys and chains as values

scenewalk.utils.utils.save2pd_overall_point_estimates(chains_list, all_vp_list, def_args, priors, sw, credible_interval, fname, perc_last_samples=75, logzeta=False)

saves point estimates as a pandas table, averaged over subjects. A point estimate is the center of the credible interval.

Parameters
chains_list : list

list of pydream estimations (each element is one subject's estimation)

all_vp_list : list

list of all subject indexes

def_args : dict

dictionary of default arguments (non-estimated parameters)

credible_interval : float

credible interval (example 0.5 if 50% of datapoints should be in the interval)

fname : str

file name

CI : bool

is the credible interval returned in the dictionary?

perc_last_samples : int

percentage of samples that are not considered burn-in

logzeta : bool

separately convert log zeta in the output

Returns
pandas table

scenewalk.utils.utils.save2pd_subj_point_estimates(chains_list, all_vp_list, priors, credible_interval, fname, perc_last_samples=75)

saves point estimates as a pandas table with separate fits for each subject. A point estimate is the center of the credible interval.

Parameters
chains_list : list

list of pydream estimations (each element is one subject's estimation)

all_vp_list : list

list of all subject indexes

def_args : dict

dictionary of default arguments (non-estimated parameters)

credible_interval : float

credible interval (example 0.5 if 50% of datapoints should be in the interval)

fname : str

file name

CI : bool

is the credible interval returned in the dictionary?

perc_last_samples : int

percentage of samples that are not considered burn-in

Returns
pandas table

scenewalk.utils.utils.show_all_colors()

Makes a plot with all available colors in pyplot

scenewalk.utils.utils.trpd(my_mean, my_std, lb, ub)

Truncated normal distribution: a wrapper that allows calling with intuitive arguments (mean, standard deviation, and lower and upper bounds).

Parameters
my_mean : float

mean of distribution

my_std : float

standard deviation of the distribution

lb : float

lower bound of truncation

ub : float

upper bound of truncation

Returns
truncated normal distribution object
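For example, to define a truncated normal prior (sketch; the numbers are arbitrary):

>>> from scenewalk.utils import utils
>>> prior = utils.trpd(my_mean=1.0, my_std=0.5, lb=0.0, ub=3.0)
>>> # prior is a truncated normal distribution object, e.g. usable as a prior for estimation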

resort

Use this with utmost caution: it is written for one particular folder structure and file naming convention and will likely need adapting to other setups. It is a small command-line script that sorts estimation files into folders, combining error and output files with the chains files into separate folders for each subject. Use:

python3 -m scenewalk.utils.resort "2019" 5 -d -o

where the first argument is the beginning of the id number, the second argument is how many estimations there are, -d enables the dry-run option (without it, files will be moved for real) and -o switches to a different file name structure.

This will probably not work without modification on other setups!