Utils¶
This module collects tools for working with the scenewalk model.
loadData¶
As described in the section “data requirements”, the package expects data to be organized as a list of lists of arrays, corresponding to the levels of subject, trial and fixation respectively. The relevant data are x and y coordinates, image number and fixation duration. Additionally, the model needs the fixation densities/saliency maps and the data range.
In loadData, datasets are loaded into dictionaries, can be shortened or split in various ways, and can then be unpacked into their own variables.
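The nesting described above can be illustrated with a small sketch. Plain Python lists stand in for the arrays, all values are made up, and the helper `fixations_per_trial` is hypothetical (it is not part of the package).

```python
# Nested layout: x_dat[subject][trial][fixation]
# Plain lists stand in for arrays; the numbers are invented for illustration.
x_dat = [
    [[512.0, 640.5, 300.2], [100.0, 220.4]],  # subject 0: two trials
    [[415.3, 98.7]],                          # subject 1: one trial
]
# Fixation durations follow exactly the same nesting as the coordinates.
dur_dat = [
    [[210.0, 180.0, 305.0], [250.0, 190.0]],
    [[400.0, 220.0]],
]


def fixations_per_trial(nested):
    """Hypothetical helper: number of fixations in each trial, per subject."""
    return [[len(trial) for trial in subject] for subject in nested]


print(x_dat[0][1][0])                # subject 0, trial 1, fixation 0 -> 100.0
print(fixations_per_trial(x_dat))    # [[3, 2], [2]]
# The different variables must line up fixation by fixation.
print(fixations_per_trial(x_dat) == fixations_per_trial(dur_dat))  # True
```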
Functions that load, handle and change data
-
scenewalk.utils.loadData.
autopopulate_data_path_dict
(folder_path)¶ finds data paths automatically, assuming a specific data structure
-
scenewalk.utils.loadData.
change_resolution
(densities, new_size)¶ Changes the resolution of a list of densities
- Parameters
- densitieslist of arrays
list of empirical fixation densities
- new_sizeint
number of pixels in one direction after resizing
- Returns
- list
of densities
-
scenewalk.utils.loadData.
check_npy_folder_complete
(npy_path)¶ checks that the npy data folder is complete
-
scenewalk.utils.loadData.
chop_scanpaths
(lower, upper, datadict)¶ Cuts off scanpaths in a data set, either to make them equal in length or to get a subset of fixations
- Parameters
- lowerint
lower bound of the fixation range to keep
- upperint
upper bound of the fixation range to keep
- datadictdict
data dictionary
- Returns
- dict
data dictionary with chopped scanpaths
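As a rough sketch of what cutting scanpaths means for the nested data layout, the snippet below slices every trial to a common fixation range. It assumes Python slice semantics (lower included, upper excluded), which the source does not specify, and `chop_nested` is a hypothetical stand-in that works on a single nested list rather than on a full data dictionary.

```python
def chop_nested(nested, lower, upper):
    """Hypothetical helper: keep fixations lower..upper-1 of every trial.

    Assumes slice semantics; the real function may treat the bounds differently.
    """
    return [[list(trial[lower:upper]) for trial in subject] for subject in nested]


# Made-up data: two subjects with scanpaths of different lengths.
x_dat = [[[1, 2, 3, 4, 5], [6, 7, 8]], [[9, 10, 11, 12]]]
chopped = chop_nested(x_dat, 1, 3)
print(chopped)  # [[[2, 3], [7, 8]], [[10, 11]]]
```

Applying the same bounds to x, y, duration and image arrays keeps the variables aligned fixation by fixation.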
-
scenewalk.utils.loadData.
dataDict2vars
(data_dict)¶ takes a data dictionary and returns the vectors x, y, dur, im, densities and range (x_dat, y_dat, dur_dat, im_dat, densities_dat, d_range)
- Parameters
- data_dictdict
data dictionary
- Returns
- arrays
x, y, dur, im, densities, range
-
scenewalk.utils.loadData.
find_data
()¶ tries to find where your data is hiding. Order of preference:
1. the DATA_PATH variable, if you have set it
2. a config in the code directory
3. a config in the working directory
4. a folder called DATA in your working directory
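The order of preference above amounts to a fallback chain. The following is only a sketch of that pattern, not the package's implementation: it assumes DATA_PATH is an environment variable, and the config file name `config.yml` and the function `find_data_sketch` are hypothetical.

```python
import os
from pathlib import Path


def find_data_sketch(code_dir, work_dir, environ=os.environ, config_name="config.yml"):
    """Return (source_label, path) for the first location that matches, else None.

    Assumptions: DATA_PATH is an environment variable and the config file is
    called config.yml. Neither detail is confirmed by the documentation.
    """
    # 1. An explicitly set DATA_PATH wins over everything else.
    env_path = environ.get("DATA_PATH")
    if env_path:
        return ("env", Path(env_path))
    # 2. and 3. A config file in the code directory, then in the working directory.
    for label, base in (("code_config", code_dir), ("cwd_config", work_dir)):
        candidate = Path(base) / config_name
        if candidate.is_file():
            return (label, candidate)
    # 4. A folder called DATA in the working directory.
    data_dir = Path(work_dir) / "DATA"
    if data_dir.is_dir():
        return ("data_folder", data_dir)
    return None
```

Passing `environ` as an argument keeps the lookup easy to test without touching the real environment.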
-
scenewalk.utils.loadData.
get_ix_from_set
(data_dict, subj_order=None, trials_order=None)¶ Takes a list of indices for trials and for subjects and shortens/reorders the dataset. The order of the list IS relevant. No Nones are added, so the absolute indexes will change! If an integer is given in place of a list, it will expand into a list of all subjects up to that one.
- Parameters
- data_dictdict
data dictionary
- subj_order{int, list}
list of subject indexes to return
- trials_order{int, list}
list of trial indexes to return
- Returns
- dict
data dictionary
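The reordering behaviour can be sketched on a single nested list. This is an illustration under stated assumptions, not the package code: `reorder_nested` and `expand_order` are hypothetical names, and an integer `n` is assumed to expand to the indexes `0 .. n-1` (the documentation does not say whether the bound is inclusive).

```python
def expand_order(order, n_available):
    """Turn None, an int, or a list into an explicit list of indexes."""
    if order is None:
        return list(range(n_available))  # keep everything, original order
    if isinstance(order, int):
        return list(range(order))        # assumption: int n means 0 .. n-1
    return list(order)


def reorder_nested(nested, subj_order=None, trials_order=None):
    """Hypothetical helper: select subjects and trials in the given order."""
    result = []
    for s in expand_order(subj_order, len(nested)):
        trials = expand_order(trials_order, len(nested[s]))
        result.append([nested[s][t] for t in trials])
    return result


# Made-up data: three subjects with two trials each.
x_dat = [[["s0t0"], ["s0t1"]], [["s1t0"], ["s1t1"]], [["s2t0"], ["s2t1"]]]
print(reorder_nested(x_dat, subj_order=[2, 0], trials_order=[1]))
# [[['s2t1']], [['s0t1']]]
print(reorder_nested(x_dat, subj_order=2))
# [[['s0t0'], ['s0t1']], [['s1t0'], ['s1t1']]]
```

Note that subject 2 ends up at position 0 of the result, which is what "the absolute indexes will change" refers to.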
-
scenewalk.utils.loadData.
get_set_names
()¶ get available data sets
- Returns
- list
list of dataset names
-
scenewalk.utils.loadData.
load_data
(data_ref)¶ Returns selected data set as a dictionary
- Parameters
- data_refstr
name or path of the set you want to load
- Returns
- dict
data dictionary
-
scenewalk.utils.loadData.
load_sim_data
(folder_path)¶ Takes the absolute path of a folder of simulated data and loads the contents into a dictionary
- Parameters
- folder_pathstr
absolute path of the simulated data folder
- Returns
- dict
data dictionary
-
scenewalk.utils.loadData.
populate_path_dict_from_yml
(yml_path)¶ loads config yml
-
scenewalk.utils.loadData.
save_dict_to_folder
(data_dict, name, folder)¶ Save a data dict into folder as separate npy files
- Parameters
- data_dictdict
data
- namestr
name of the dataset
- folderstr
folder to save to (must exist!)
-
scenewalk.utils.loadData.
setup_data_dict
()¶ Helper function that sets up data dict structure
- Returns
- dict
dict of all relevant keys for data, but populated with Nones
-
scenewalk.utils.loadData.
shorten_set
(data_dict, nvp, vps=None)¶ Gets x subjects from the selected set
- Parameters
- data_dictdict
data dictionary
- nvpint
number of subjects to return
- vpslist
list of vp numbers
- Returns
- dict
data dictionary
utils¶
These functions mainly deal with saving estimation chains returned by PyDREAM in more readable formats
Utility functions for SceneWalk-related tasks
-
scenewalk.utils.utils.
check_param_dict_order
(p_dict, sw_model)¶ checks that the parameter dictionary is in the correct order for the model when it is turned into a list (PyDREAM returns parameters as lists)
- Parameters
- p_dictordered dict
parameter dictionary to check
- sw_modelscenewalk model object
scenewalk model object with some configuration
- Returns
- list of bools
indicating whether the parameters line up
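Because a sampler hands parameters back as a bare list, the position of each value is the only link to its name. The sketch below shows the kind of positional check described here. It is not the package implementation: `check_order_sketch` is hypothetical, the expected order is passed in as a plain list instead of being read from a model object, and the parameter names are invented.

```python
from collections import OrderedDict


def check_order_sketch(p_dict, expected_names):
    """Hypothetical helper: one bool per position, True if the key matches."""
    keys = list(p_dict.keys())
    return [
        i < len(keys) and keys[i] == name
        for i, name in enumerate(expected_names)
    ]


# Invented parameter names; the second and third entries are swapped.
p_dict = OrderedDict([("omega", 1.0), ("gamma", 2.0), ("sigma", 3.0)])
expected = ["omega", "sigma", "gamma"]
print(check_order_sketch(p_dict, expected))  # [True, False, False]
```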
-
scenewalk.utils.utils.
get_all_colors
()¶ Returns all available color names in pyplot
-
scenewalk.utils.utils.
get_git_sha
()¶ Returns git sha of current repo, for meta data
-
scenewalk.utils.utils.
save2dict_by_subj
(chains_list, all_vp_list, def_args, fname, perc_last_samples=100)¶ saves full estimation results as a dictionary
- Parameters
- chains_listlist
list of pydream estimations (each element is one subject’s estimation)
- all_vp_listlist
list of all subject indexes
- def_argsdict
dictionary of default arguments (non estimated parameters)
- fnamestr
file name
- perc_last_samplesint
percentage of samples that are not considered burn in
- Returns
- dict
with subjects as keys and chains as values
-
scenewalk.utils.utils.
save2dict_overall_point_estimates
(chains_list, all_vp_list, def_args, priors, sw, credible_interval, fname, perc_last_samples=75)¶ saves point estimates as a dictionary averaged over subjects. A point estimate is the center of the credible interval. The final dictionary has a key for each parameter.
- Parameters
- chains_listlist
list of pydream estimations (each element is one subject’s estimation)
- all_vp_listlist
list of all subject indexes
- def_argsdict
dictionary of default arguments (non estimated parameters)
- credible_intervalfloat
credible interval (example 0.5 if 50% of datapoints should be in the interval)
- fnamestr
file name
- CIbool
is the credible interval returned in the dictionary?
- perc_last_samplesint
percentage of samples that are not considered burn in
- Returns
- dict
with parameters as keys and point estimates as values
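To make the terms concrete, here is a minimal sketch of a point estimate taken as the center of a credible interval after discarding burn-in. It assumes an equal-tailed interval computed from sample quantiles and a single flat list of samples for one parameter. The package may compute the interval differently, and `point_estimate` and `quantile` are hypothetical helpers.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def point_estimate(samples, credible_interval=0.5, perc_last_samples=75):
    """Center of an equal-tailed credible interval over the last samples.

    perc_last_samples is the percentage of samples kept (not burn-in).
    """
    n_keep = max(1, round(len(samples) * perc_last_samples / 100))
    kept = sorted(samples[-n_keep:])
    tail = (1 - credible_interval) / 2
    lower = quantile(kept, tail)
    upper = quantile(kept, 1 - tail)
    return (lower + upper) / 2, (lower, upper)


# Made-up chain: three burn-in samples followed by nine settled ones.
chain = [100.0, 80.0, 60.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
estimate, interval = point_estimate(chain, credible_interval=0.5, perc_last_samples=75)
print(estimate, interval)  # 5.0 (3.0, 7.0)
```

With `perc_last_samples=100` the three early samples would be included and pull the interval upward, which is why a lower default is used for point estimates.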
-
scenewalk.utils.utils.
save2npy_point_estimate_by_subj
(chains_list, all_vp_list, def_args, credible_interval, fname, CI=False, perc_last_samples=75, logzeta=False)¶ saves point estimates as a dictionary by subject. A point estimate is the center of the credible interval. The final dictionary will have a key for each subject.
- Parameters
- chains_listlist
list of pydream estimations (each element is one subject’s estimation)
- all_vp_listlist
list of all subject indexes
- def_argsdict
dictionary of default arguments (non estimated parameters)
- credible_intervalfloat
credible interval (example 0.5 if 50% of datapoints should be in the interval)
- fnamestr
file name
- CIbool
is the credible interval returned in the dictionary?
- perc_last_samplesint
percentage of samples that are not considered burn in
- logzetabool
separately convert log zeta in the output
- Returns
- dict
with subjects as keys and point estimates as values
-
scenewalk.utils.utils.
save2pd_overall_point_estimates
(chains_list, all_vp_list, def_args, priors, sw, credible_interval, fname, perc_last_samples=75, logzeta=False)¶ saves point estimates as a pandas table averaged over subjects. A point estimate is the center of the credible interval.
- Parameters
- chains_listlist
list of pydream estimations (each element is one subject’s estimation)
- all_vp_listlist
list of all subject indexes
- def_argsdict
dictionary of default arguments (non estimated parameters)
- credible_intervalfloat
credible interval (example 0.5 if 50% of datapoints should be in the interval)
- fnamestr
file name
- CIbool
is the credible interval returned in the dictionary?
- perc_last_samplesint
percentage of samples that are not considered burn in
- logzetabool
separately convert log zeta in the output
- Returns
- pandas table
-
scenewalk.utils.utils.
save2pd_subj_point_estimates
(chains_list, all_vp_list, priors, credible_interval, fname, perc_last_samples=75)¶ saves point estimates as a pandas table with separate fits for each subject. A point estimate is the center of the credible interval.
- Parameters
- chains_listlist
list of pydream estimations (each element is one subject’s estimation)
- all_vp_listlist
list of all subject indexes
- def_argsdict
dictionary of default arguments (non estimated parameters)
- credible_intervalfloat
credible interval (example 0.5 if 50% of datapoints should be in the interval)
- fnamestr
file name
- CIbool
is the credible interval returned in the dictionary?
- perc_last_samplesint
percentage of samples that are not considered burn in
- Returns
- pandas table
-
scenewalk.utils.utils.
show_all_colors
()¶ Makes a plot with all available colors in pyplot
-
scenewalk.utils.utils.
trpd
(my_mean, my_std, lb, ub)¶ Truncated Normal Distribution: wrapper to allow calling with intuitive arguments: mean, standard deviation and lower and upper bounds.
- Parameters
- my_meanfloat
mean of distribution
- my_stdfloat
standard deviation of distribution
- lbfloat
lower bound of truncation
- ubfloat
upper bound of truncation
- Returns
- truncated normal distribution object
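A wrapper like this is useful because some libraries do not take the truncation bounds on the original scale. For instance, `scipy.stats.truncnorm` expects its bounds `a` and `b` in units of standard deviations from the mean. The source does not say which backend `trpd` uses, so the sketch below is an assumption-laden illustration using only the standard library: `standardized_bounds` and `truncated_normal_pdf` are hypothetical names.

```python
import math


def standardized_bounds(my_mean, my_std, lb, ub):
    """Convert bounds on the original scale to standard-deviation units."""
    return (lb - my_mean) / my_std, (ub - my_mean) / my_std


def _std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def _std_normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def truncated_normal_pdf(x, my_mean, my_std, lb, ub):
    """Density of a normal(my_mean, my_std) truncated to [lb, ub]."""
    if x < lb or x > ub:
        return 0.0
    a, b = standardized_bounds(my_mean, my_std, lb, ub)
    z = (x - my_mean) / my_std
    # Renormalize by the probability mass that lies inside the bounds.
    return _std_normal_pdf(z) / (my_std * (_std_normal_cdf(b) - _std_normal_cdf(a)))


print(standardized_bounds(1.0, 2.0, 0.0, 5.0))  # (-0.5, 2.0)
# With SciPy installed, the same bounds could be passed on as
# scipy.stats.truncnorm(a, b, loc=my_mean, scale=my_std).
```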
resort¶
Use this script with the utmost caution. It is written to suit one particular folder structure and file naming convention and will likely need adapting to other setups! It is a small command-line script that sorts the estimation files into folders. The idea is to combine error and output files with chains files into separate folders for each subject. Use:
python3 -m scenewalk.utils.resort "2019" 5 -d -o
where the first argument is the beginning of the id number, the second argument is how many estimations there are, -d is the dry run option (without it files will be moved for real) and -o selects a different file name structure.
This will probably not work without modification on other setups!