imhr.Webgazer.processing

@purpose: Hub for running processing and analyzing raw data.
@date: Created on Sat May 1 15:12:38 2019
@author: Semeon Risom

Classes

Processing(config[, isLibrary, isDebug]) Hub for running processing and analyzing raw data.
class imhr.Webgazer.processing.Processing(config, isLibrary=False, isDebug=False)

Bases: object

Hub for running processing and analyzing raw data.

Methods

append_classify(self, df, cg_df) Appending classification to Dataframe.
classify(self, config, df[, ctype, …]) I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.
dwell(self, df[, cores, isMultiprocessing]) Calculate dwell time for sad and neutral images.
filter_data(self, df, filter_type, config) Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.
getData(self[, path]) Prepare data for use in analysis.
getEstimatedMonitor(self, diagonal, window) Calculate the estimated monitor size (w, h; cm) from the estimated monitor diagonal (hypotenuse; cm).
onset_diff(self, df0[, merge, cores]) Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.
preprocess(self, df, window) Initial data cleaning.
process(self, window, filters, gxy_df, trial) Plotting and preparing data for classification.
roi(self[, filters, flt, df, manual, …]) Check if fixation is within bounds.
run(self, path[, task_type, single_subject, …]) Processing of data.
subject_metadata(self, fpath, spath) Collect all subjects metadata.
variables(self, df) Output list of variables for easy html viewing.
getEstimatedMonitor(self, diagonal, window)

Calculate the estimated monitor size (w, h; cm) from the estimated monitor diagonal (hypotenuse; cm).

Attributes:
df_raw : pandas.DataFrame

Pandas dataframe of subjects.
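
The width and height estimate described for getEstimatedMonitor can be sketched with basic geometry. This is a minimal illustration, not the library's implementation: the function name estimate_monitor_size is hypothetical, and it assumes square pixels so that the physical aspect ratio matches the pixel resolution given in window.

```python
import math

def estimate_monitor_size(diagonal_cm, window):
    """Estimate monitor width and height (cm) from its diagonal (cm).

    Assumption: pixels are square, so the physical aspect ratio equals
    the aspect ratio of the pixel resolution in `window`.
    """
    w_px, h_px = window
    diagonal_px = math.hypot(w_px, h_px)   # diagonal length in pixels
    cm_per_px = diagonal_cm / diagonal_px  # physical size of one pixel
    return w_px * cm_per_px, h_px * cm_per_px

# A 50 cm diagonal at 1600x1200 is a 3-4-5 triangle scaled up.
width_cm, height_cm = estimate_monitor_size(50.0, (1600, 1200))
print(round(width_cm, 2), round(height_cm, 2))
```
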

preprocess(self, df, window)

Initial data cleaning.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

window : tuple

horizontal, vertical resolution

Attributes:
m_delta : int

Maximum one-sample change in velocity

Notes

remove_missing:
Remove samples with null values.
remove_bounds:
Remove samples outside of window bounds (1920,1080).
remove_spikes:
Remove one-sample spikes if the x- and y-axis delta is greater than 5.
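
The three cleaning steps in the notes above can be sketched with pandas. This is an illustration under stated assumptions rather than the library's code: the function name preprocess_sketch and the column names x and y are hypothetical, and a "one-sample spike" is interpreted here as a sample that differs from both of its neighbours by more than m_delta on both axes.

```python
import pandas as pd

def preprocess_sketch(df, window=(1920, 1080), m_delta=5):
    """Illustrative cleaning pass; column names 'x' and 'y' are assumed."""
    # remove_missing: drop samples with null gaze coordinates
    df = df.dropna(subset=["x", "y"])

    # remove_bounds: keep samples that fall inside the window
    w, h = window
    df = df[df["x"].between(0, w) & df["y"].between(0, h)]
    df = df.reset_index(drop=True)

    # remove_spikes (assumed definition): a sample is a spike when it jumps
    # away from the previous sample and back toward the next one by more
    # than m_delta on both axes. NaN differences at the edges compare False.
    dx_prev, dy_prev = df["x"].diff().abs(), df["y"].diff().abs()
    dx_next, dy_next = df["x"].diff(-1).abs(), df["y"].diff(-1).abs()
    spike = (
        (dx_prev > m_delta) & (dy_prev > m_delta)
        & (dx_next > m_delta) & (dy_next > m_delta)
    )
    return df[~spike].reset_index(drop=True)

raw = pd.DataFrame({
    "x": [100, 101, None, 150, 102, 2500, 103],
    "y": [200, 201, 205, 260, 202, 300, 203],
})
clean = preprocess_sketch(raw)
print(clean)
```

In this toy input the null sample, the out-of-bounds sample (x = 2500), and the isolated jump to (150, 260) are removed.
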
getData(self, path=None)

Prepare data for use in analysis.

Parameters:
path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

_path : list

list of files used for analysis.

Notes

You can either get data from all subjects within a directory, or from a specific subject (subject_session).

Examples

>>> # if using path:
>>> df_raw = getData(path=self.config['path']['raw'])
>>> # if getting data for a single subject:
>>> df_raw = getData(path=self.config['path']['raw'], subject_session=['1099', '1', '0'])
filter_data(self, df, filter_type, config)

Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

filter_type : str, optional

Type of filter.

config : dict

Configuration data. i.e. trial number, location.

Attributes:
filter_type : str

Filter type: ‘butterworth’
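
As a rough illustration of Butterworth filtering on a gaze trace, the sketch below uses scipy.signal.butter to design the filter coefficients and scipy.signal.filtfilt to apply them without phase shift. The helper name butterworth_lowpass, the 60 Hz sample rate, the 5 Hz cutoff, and the filter order are all assumptions chosen for the example; they are not taken from the library's configuration.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def butterworth_lowpass(values, cutoff_hz, sample_rate_hz, order=2):
    """Zero-phase low-pass Butterworth filter for a 1-D coordinate trace."""
    # butter() returns the numerator (b) and denominator (a) coefficients.
    b, a = butter(order, cutoff_hz, btype="low", fs=sample_rate_hz)
    # filtfilt() runs the filter forward and backward, cancelling phase lag.
    return filtfilt(b, a, values)

# Synthetic x-coordinate trace: slow drift plus measurement noise.
rng = np.random.default_rng(0)
t = np.arange(0, 4, 1 / 60)  # about 4 s sampled at an assumed 60 Hz
noisy = 960 + 100 * np.sin(2 * np.pi * 0.5 * t) + rng.normal(0, 15, t.size)
smooth = butterworth_lowpass(noisy, cutoff_hz=5, sample_rate_hz=60)
```

The filtered trace keeps the slow movement while the sample-to-sample jitter is reduced.
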

classify(self, config, df, ctype='ivt', filter_type=None, v_th=None, dr_th=None, di_th=None, missing=None, maxdist=None, mindur=None)

I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

The simple model detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data)

Parameters:
config : dict

Configuration data. i.e. trial number, location.

df : pandas.DataFrame

Pandas dataframe of classified data.

ctype : str

Classification type: ‘ivt’ (velocity threshold), ‘idt’ (dispersion threshold; used by SR-Research and Tobii), or ‘simple’.

filter_type : [type], optional

Filter type: ‘butter’

v_th : str

Velocity threshold in pix/sec (ivt)

dr_th : str

Fixation duration threshold in pix/msec (idt)

di_th : str

Dispersion threshold in pixels (idt)

missing : str

value to be used for missing data (simple)

maxdist : str

maximal inter sample distance in pixels (simple)

mindur : str

minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple)

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.

Raises:
ValueError

Unknown classification type.
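
The I-VT rule described for classify (compute a velocity for every sample, then compare it to a threshold) can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name classify_ivt and its argument layout are hypothetical, timestamps are assumed to be in milliseconds, and the first sample is assigned a velocity of 0 because it has no predecessor.

```python
import numpy as np

def classify_ivt(x, y, t_ms, v_th):
    """Label each sample 'fixation' or 'saccade' by point-to-point velocity."""
    x, y, t_ms = (np.asarray(a, dtype=float) for a in (x, y, t_ms))
    distance_px = np.hypot(np.diff(x), np.diff(y))  # distance between samples
    dt_s = np.diff(t_ms) / 1000.0                   # elapsed time in seconds
    velocity = distance_px / dt_s                   # px/sec
    # Assumption: the first sample has no predecessor, so give it velocity 0.
    velocity = np.concatenate(([0.0], velocity))
    return np.where(velocity < v_th, "fixation", "saccade")

labels = classify_ivt(
    x=[100, 101, 102, 300, 301],
    y=[100, 100, 100, 100, 100],
    t_ms=[0, 20, 40, 60, 80],
    v_th=100,  # px/sec
)
print(labels.tolist())
```

Only the fourth sample, which jumps 198 px in 20 ms, exceeds the threshold and is marked as part of a saccade.
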

roi(self, filters=None, flt=None, df=None, manual=False, monitorSize=None)

Check if fixation is within bounds.

Attributes:
manual : bool

Whether or not processing.roi() is accessed manually.

monitorSize : list

Monitor size.

filters : list

Filter parameters. Default [[‘SavitzkyGolay’, ‘sg’]].

df : pandas.DataFrame

Pandas dataframe of classified data.

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.
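
A bounds check of the kind roi performs can be sketched as a rectangle test on each fixation. The function name flag_in_bounds, the column names x and y, the in_roi output column, and the (left, top, right, bottom) layout of the bounds are assumptions for this example only.

```python
import pandas as pd

def flag_in_bounds(df, bounds):
    """Add a boolean 'in_roi' column; bounds = (left, top, right, bottom) in px."""
    left, top, right, bottom = bounds
    inside = df["x"].between(left, right) & df["y"].between(top, bottom)
    return df.assign(in_roi=inside)

fixations = pd.DataFrame({"x": [100, 500, 900], "y": [300, 300, 900]})
checked = flag_in_bounds(fixations, bounds=(400, 200, 600, 400))
print(checked)
```
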

process(self, window, filters, gxy_df, trial, _classify=True, ctype='simple', _param='', log=False, v_th=20, dr_th=200, di_th=20, _missing=0.0, _maxdist=25, _mindur=50)

Plotting and preparing data for classification. Combined plot of each filter.

Parameters:
window : list

horizontal, vertical resolution

filters : list

List of filters along with short-hand names.

gxy_df : pandas.DataFrame

Pandas dataframe of raw data. Unfiltered raw data.

trial : str

Trial number.

_classify : bool

parameter to include classification

ctype : str

classification type. simple, idt, ivt

_param : str

[description] (the default is ‘’, which [default_description])

log : bool

[description] (the default is False, which [default_description])

v_th : str

Velocity threshold in px/sec (ivt)

dr_th : str

Fixation duration threshold in px/msec (idt)

di_th : str

Dispersion threshold in px (idt)

_missing : bool

value to be used for missing data (simple)

_maxdist : str

maximal inter sample distance in pixels (simple)

_mindur : str

minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple) (default = 50)

Attributes:
_fxy_df : pandas.DataFrame

Pandas dataframe of raw data. Filtered data. Subset of _fgxy_df.

Returns:
_fgxy_df : pandas.DataFrame

Pandas dataframe of filtered data.

c_xy : pandas.DataFrame

Pandas dataframe of classified data.

append_classify(self, df, cg_df)

Appending classification to Dataframe.

Parameters:
df : list

Pandas dataframe of raw data.

cg_df : pandas.DataFrame

Pandas dataframe of raw data of classification events.

run(self, path, task_type='eyetracking', single_subject=False, single_trial=False, subject=0, trial=0, isMultiprocessing=True, cores=1)

Processing of data. Steps here include: cleaning data, fixation identification, and exporting data.

Parameters:
path : string

Path of raw data.

task_type : string

Running analysis on eyetracking or behavioral data.

single_subject : bool

Whether to run function with all or single subject.

single_trial : bool

Whether to run function with all or single trial.

subject : int

Subject number. Only if single_subject = True.

trial : int

Trial number. Only if single_trial = True.

isMultiprocessing : bool

Whether multiprocessing of data will be used. Only if single_subject = False.

cores : int

Number of cores to use for multiprocessing. Only if single_subject = False and isMultiprocessing = True.

Attributes:
process : bool

Process all data for export.

subject_metadata(self, fpath, spath)

Collect all subjects metadata.

Parameters:
fpath : str

The directory path of all participant data.

spath : str

The directory path of all participant data.

Returns:
df : ndarray

Pandas dataframe of subject metadata.

variables(self, df)

Output list of variables for easy html viewing.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

path : str

The directory path used to save and read the HDF5 dataframe.

Returns:
df_definitions : pandas.DataFrame
dwell(self, df, cores=1, isMultiprocessing=False)

Calculate dwell time for sad and neutral images.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

cores : int

Number of cores to use for multiprocessing.

Returns:
df : pandas.DataFrame

Pandas dataframe with dwell time.

error : list

List of participants that were not included in dataframe.
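
One way to picture a dwell-time calculation is as the summed fixation duration per image type within each participant and trial. The sketch below assumes a fixation table with hypothetical columns participant, trial, roi, and duration_ms; the library's actual column names and aggregation may differ.

```python
import pandas as pd

# Hypothetical fixation table: one row per fixation that landed on an image.
fixations = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2],
    "trial": [0, 0, 0, 0, 0],
    "roi": ["sad", "neutral", "sad", "neutral", "neutral"],
    "duration_ms": [200, 150, 100, 300, 50],
})

# Sum fixation durations per participant, trial, and image type, then
# spread the image types into columns (0 when an image was never fixated).
dwell_time = (
    fixations
    .groupby(["participant", "trial", "roi"])["duration_ms"]
    .sum()
    .unstack(fill_value=0)
)
print(dwell_time)
```
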

onset_diff(self, df0, merge=None, cores=1)

Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.

Parameters:
df0 : pandas.DataFrame

Pandas dataframe of raw data. This is used to merge variables that may be useful for analysis.

merge : list or None

Variables to merge into returned df.

cores : int

Number of cores to use for multiprocessing.

Returns:
df1 : pandas.DataFrame

Pandas dataframe.

error : pandas.DataFrame

Dataframe of each participant and the number of trials included in their data.

drop : list

List of participants that are 3 SD from median.
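
The drop list can be pictured as an outlier screen on per-participant onset differences. The sketch below is an assumption-laden illustration, not the library's method: the function name flag_onset_outliers and the columns participant and onset_diff are hypothetical, the per-participant summary is taken to be the mean, and "3 SD from median" is read as an absolute deviation from the median greater than three sample standard deviations.

```python
import pandas as pd

def flag_onset_outliers(df, column="onset_diff", n_sd=3):
    """Return per-participant means and the participants to drop."""
    per_participant = df.groupby("participant")[column].mean()
    center = per_participant.median()
    spread = per_participant.std()  # sample standard deviation
    is_outlier = (per_participant - center).abs() > n_sd * spread
    return per_participant, per_participant[is_outlier].index.tolist()

# 19 participants with a 10 ms onset difference and one with 500 ms,
# two trials each.
rows = [
    {"participant": p, "onset_diff": 500 if p == 20 else 10}
    for p in range(1, 21)
    for _ in range(2)
]
means, drop = flag_onset_outliers(pd.DataFrame(rows))
print(drop)
```
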