imhr.Webgazer

@purpose: Module designed for working with the Webgazer exploratory study.
@date: Created on Sat May 1 15:12:38 2019
@author: Semeon Risom

Classes

Classify([isLibrary]) Analysis methods for imhr.processing.preprocesing.
Metadata([isLibrary]) Process participants metadata for analysis and export.
Processing(config[, isLibrary, isDebug]) Hub for running processing and analyzing raw data.
raw([is_library]) Processing summary data for output.
redcap() Downloading data from REDCap.
class imhr.Webgazer.Classify(isLibrary=False)[source]

Bases: imhr.Webgazer.classify.Classify

Analysis methods for imhr.processing.preprocesing.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

Acceleration(time, data_x[, data_y]) Calculate the acceleration (deg/sec/sec) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
Velocity(time, config, d_x[, d_y]) Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
VisualAngle(g_x, g_y, config) Convert pixel eye-coordinates to visual angle.
hmm(data, filter_type, config) Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.
idt(data, dis_threshold, dur_threshold) Identification with Dispersion Threshold.
ivt(data, v_threshold, config) Identification with Velocity Threshold.
savitzky_golay(y, window_size, order[, …]) Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
simple(df, missing, maxdist, mindur) Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).
classmethod Acceleration(time, data_x, data_y=None)[source]

Calculate the acceleration (deg/sec/sec) for data points in data_x and (optionally) data_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

data_x, data_y : numpy.ndarray

List of gaze coordinates.

classmethod Velocity(time, config, d_x, d_y=None)[source]

Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

d_x,d_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Notes

Numpy arrays time, d_x, and d_y must all be 1D arrays of the same length. If both d_x and d_y are provided, the Euclidean distance between each pair of points is calculated and used in the velocity calculation. Time must be in seconds.msec units, while d_x and d_y are expected to be in visual degrees. If the position traces are in pixel coordinate space, use the VisualAngleCalc class to convert the data into degrees.
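The calculation described in these notes can be sketched as follows. This is a minimal illustration using finite differences, not the library's implementation:

```python
import numpy as np

def velocity(time, d_x, d_y=None):
    """Instantaneous velocity (deg/sec); a hedged sketch of the
    calculation described above, not the library implementation."""
    # time, d_x, d_y must be 1D arrays of equal length; time in sec.msec.
    if d_y is not None:
        # Euclidean distance between consecutive gaze samples.
        dist = np.hypot(np.diff(d_x), np.diff(d_y))
    else:
        dist = np.abs(np.diff(d_x))
    return dist / np.diff(time)

# Acceleration (deg/sec/sec) is then another finite difference:
# np.diff(velocity(time, d_x, d_y)) / np.diff(time)[1:]
```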

classmethod VisualAngle(g_x, g_y, config)[source]

Convert pixel eye-coordinates to visual angle.

Parameters:
g_x,g_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Notes

  • Stimulus positions (g_x,g_y) are defined in x and y pixel units, with the origin (0,0) at the center of the display, so as to match the PsychoPy pix unit coord type.
  • The pix2deg method is vectorized, meaning that it will perform the pixel-to-angle calculations on all elements of the provided pixel position numpy arrays in one numpy call.
  • The conversion process can use either a fixed eye-to-calibration-plane distance, or a numpy array of eye distances passed as eye_distance_mm. In this case the eye distance array must be the same length as the g_x, g_y arrays.
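The pixel-to-degree conversion these notes describe amounts to basic trigonometry. A minimal sketch, assuming a fixed eye-to-screen distance and a known physical screen width; the function signature and parameter names here are illustrative, not the library's:

```python
import numpy as np

def pix2deg(g_x, g_y, screen_width_mm, screen_width_pix, eye_distance_mm):
    """Convert centered pixel coordinates to visual degrees, assuming a
    fixed eye-to-screen distance. Illustrative sketch only."""
    mm_per_pix = screen_width_mm / screen_width_pix
    # arctangent of the physical offset over the viewing distance, per axis
    deg_x = np.degrees(np.arctan2(g_x * mm_per_pix, eye_distance_mm))
    deg_y = np.degrees(np.arctan2(g_y * mm_per_pix, eye_distance_mm))
    return deg_x, deg_y
```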
classmethod hmm(data, filter_type, config)[source]

Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.

Parameters:
data : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions.

filter_type : dict

Types of filters to use.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Attributes:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : str

Data threshold.

Notes

Definitions
  • Saccade: The saccade is a ballistic movement, meaning it is pre-programmed and does not change once it has started. Saccades of amplitude 40° peak at velocities of 300–600°/s and last for 80–150 ms.
  • Fixation: The point between any two saccades, during which the eyes are relatively stationary and virtually all visual input occurs. Regular eye movement alternates between saccades and visual fixations, the notable exception being in smooth pursuit.
  • Smooth pursuit: Smooth pursuit movements are much slower tracking movements of the eyes designed to keep a moving stimulus on the fovea. Such movements are under voluntary control in the sense that the observer can choose whether or not to track a moving stimulus. (Neuroscience 2nd edition).

References

[1]Pekkanen, J., & Lappi, O. (2017). A new and general approach to signal denoising and eye movement classification based on segmented linear regression. Scientific Reports, 7(1). doi:10.1038/s41598-017-17983-x.
classmethod idt(data, dis_threshold, dur_threshold)[source]

Identification with Dispersion Threshold.

Parameters:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : float

Fixation duration threshold in msec.

di_th : float

Dispersion threshold in pixels.

Returns:
ys : numpy.ndarray

The smoothed signal (or its n-th derivative).

Notes

The I-DT algorithm has two parameters: a dispersion threshold and the length of a time window in which the dispersion is calculated. The length of the time window is often set to the minimum duration of a fixation, which is around 100-200 ms.
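The windowed dispersion test described here can be sketched as follows. Units (pixels, milliseconds) and the dispersion formula (max x - min x) + (max y - min y) are standard I-DT conventions; this is an illustration of the idea, not the library's code:

```python
import numpy as np

def idt(x, y, t, di_th=20.0, dur_th=100.0):
    """I-DT fixation detection sketch: grow a window to the minimum
    duration, accept it as a fixation if dispersion is under threshold,
    then extend it while dispersion stays under threshold."""
    fixations, start = [], 0
    n = len(t)
    while start < n:
        # grow the window until it spans the minimum duration
        end = start
        while end < n and t[end] - t[start] < dur_th:
            end += 1
        if end >= n:
            break
        w = slice(start, end + 1)
        if (x[w].max() - x[w].min()) + (y[w].max() - y[w].min()) <= di_th:
            # extend the window while dispersion stays under threshold
            while end + 1 < n:
                w2 = slice(start, end + 2)
                if (x[w2].max() - x[w2].min()) + (y[w2].max() - y[w2].min()) > di_th:
                    break
                end += 1
            fixations.append((t[start], t[end],
                              float(x[start:end + 1].mean()),
                              float(y[start:end + 1].mean())))
            start = end + 1
        else:
            start += 1
    return fixations
```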

classmethod ivt(data, v_threshold, config)[source]

Identification with Velocity Threshold.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

Parameters:
data : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

v_threshold : float

Velocity threshold in pix/sec.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Returns:
ys : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

Notes

From https://github.com/ecekt/eyegaze. Formula from: https://dl.acm.org/citation.cfm?id=355028
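The sample-labelling rule described above reduces to a single threshold comparison per sample. A minimal sketch, not the library's implementation:

```python
import numpy as np

def ivt(vel, v_th):
    """I-VT labelling: samples with velocity below the threshold are
    fixation samples, the rest are saccade samples. Sketch only."""
    return np.where(vel < v_th, 'fixation', 'saccade')
```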

classmethod savitzky_golay(y, window_size, order, deriv=0, rate=1)[source]

Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

The Savitzky-Golay filter removes high frequency noise from data. It has the advantage of preserving the original shape and features of the signal better than other types of filtering approaches, such as moving averages techniques.

Parameters:
y : numpy.ndarray, shape (N,)

The values of the time history of the signal.

window_size : int

The length of the window. Must be an odd integer.

order : int

The order of the polynomial used in the filtering. Must be less than window_size - 1.

deriv : int

The order of the derivative to compute (default = 0 means only smoothing).

Returns:
ys : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

Notes

The Savitzky-Golay filter is a type of low-pass filter, particularly suited for smoothing noisy data. The main idea behind this approach is to make, for each point, a least-squares fit with a polynomial of high order over an odd-sized window centered at the point. For more information, see: http://wiki.scipy.org/Cookbook/SavitzkyGolay.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> t = np.linspace(-4, 4, 500)
>>> y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)
>>> ysg = savitzky_golay(y, window_size=31, order=4)
>>> plt.plot(t, y, label='Noisy signal')
>>> plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
>>> plt.plot(t, ysg, 'r', label='Filtered signal')
>>> plt.legend()
>>> plt.show()

References

[1]A. Savitzky, M. Golay (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8), pp. 1627-1639.
[2]W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press. ISBN-13: 9780521880688.
classmethod simple(df, missing, maxdist, mindur)[source]

Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:
df : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions

missing : float

Value to be used for missing data (default = 0.0).

maxdist : int

Maximal inter-sample distance in pixels (default = 25).

mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (default = 100).

Returns:
Sfix : list

List of lists, each containing [starttime].

Efix : list

List of lists, each containing [starttime, endtime, duration, endx, endy].

Notes

From https://github.com/esdalmaijer/PyGazeAnalyser/blob/master/pygazeanalyser/detectors.py
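The detection rule described above can be sketched as follows. This illustrates the 'simple' idea only; column handling and edge cases are simplified relative to the PyGazeAnalyser original:

```python
import numpy as np

def simple_fixations(x, y, t, missing=0.0, maxdist=25.0, mindur=100.0):
    """Runs of consecutive valid samples whose inter-sample distance
    stays under maxdist, kept if they last at least mindur ms. Sketch."""
    fixations = []
    start = None
    for i in range(1, len(t)):
        valid = x[i] != missing and y[i] != missing
        close = np.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) <= maxdist
        if valid and close:
            if start is None:
                start = i - 1
        else:
            if start is not None and t[i - 1] - t[start] >= mindur:
                fixations.append((t[start], t[i - 1], t[i - 1] - t[start],
                                  x[i - 1], y[i - 1]))
            start = None
    if start is not None and t[-1] - t[start] >= mindur:
        fixations.append((t[start], t[-1], t[-1] - t[start], x[-1], y[-1]))
    return fixations
```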

class imhr.Webgazer.Metadata(isLibrary=False)[source]

Bases: imhr.Webgazer.metadata.Metadata

Process participants metadata for analysis and export.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

predict(df) Predicting screen size (cm) and device (i.e. macbook 2018).
summary(df, path) Preparing data for use in analysis.
classmethod predict(df)[source]

Predicting screen size (cm), device (i.e. macbook 2018).

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

classmethod summary(df, path)[source]

Preparing data for use in analysis.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of processed metadata.

Notes

You can either get data from all files within a directory (directory), or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df = getData(path=self.config['path'])
>>> #if getting data for single subject:
>>> df = getData(path=self.config['path'],subject_session=['1099','1', '0'])
class imhr.Webgazer.Processing(config, isLibrary=False, isDebug=False)[source]

Bases: object

Hub for running processing and analyzing raw data.

Methods

append_classify(self, df, cg_df) Appending classification to Dataframe.
classify(self, config, df[, ctype, …]) I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.
dwell(self, df[, cores, isMultiprocessing]) Calculate dwell time for sad and neutral images.
filter_data(self, df, filter_type, config) Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.
getData(self[, path]) Preparing data for use in analysis.
getEstimatedMonitor(self, diagonal, window) Calculate estimated monitor size (w,h; cm) using the estimated monitor diagonal (hypotenuse; cm).
onset_diff(self, df0[, merge, cores]) Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.
preprocess(self, df, window) Initial data cleaning.
process(self, window, filters, gxy_df, trial) Plotting and preparing data for classification.
roi(self[, filters, flt, df, manual, …]) Check if fixation is within bounds.
run(self, path[, task_type, single_subject, …]) Processing of data.
subject_metadata(self, fpath, spath) Collect all subjects metadata.
variables(self, df) Output list of variables for easy html viewing.
append_classify(self, df, cg_df)[source]

Appending classification to Dataframe.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

cg_df : pandas.DataFrame

Pandas dataframe of classification events.

classify(self, config, df, ctype='ivt', filter_type=None, v_th=None, dr_th=None, di_th=None, missing=None, maxdist=None, mindur=None)[source]

The I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

The simple model detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:
config : dict

Configuration data. i.e. trial number, location.

df : pandas.DataFrame

Pandas dataframe of classified data.

ctype : str

Classification type: ‘ivt’ (velocity threshold), ‘idt’ (dispersion threshold; used by SR Research and Tobii), or ‘simple’.

filter_type : str, optional

Filter type: ‘butter’

v_th : float

Velocity threshold in pix/sec (ivt).

dr_th : float

Fixation duration threshold in msec (idt).

di_th : float

Dispersion threshold in pixels (idt).

missing : float

Value to be used for missing data (simple).

maxdist : int

Maximal inter-sample distance in pixels (simple).

mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple).

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.

Raises:
ValueError

Unknown classification type.

dwell(self, df, cores=1, isMultiprocessing=False)[source]

Calculate dwell time for sad and neutral images.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

cores : int

Number of cores to use for multiprocessing.

isMultiprocessing : bool

Whether multiprocessing of data will be used.

Returns:
df : pandas.DataFrame

Pandas dataframe with dwell time.

error : list

List of participants that were not included in dataframe.

filter_data(self, df, filter_type, config)[source]

Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

filter_type : str, optional

Type of filter.

config : dict

Configuration data. i.e. trial number, location.

Attributes:
filter_type : str

Filter type: ‘butterworth’

getData(self, path=None)[source]

Preparing data for use in analysis.

Parameters:
path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

_path : list

List of files used for analysis.

Notes

You can either get data from all subjects within a directory, or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df_raw = getData(path=self.config['path']['raw'])
>>> #if getting data for single subject:
>>> df_raw = getData(path=self.config['path']['raw'],subject_session=['1099','1', '0'])
getEstimatedMonitor(self, diagonal, window)[source]

Calculate estimated monitor size (w,h; cm) using the estimated monitor diagonal (hypotenuse; cm).

Attributes:
df_raw : pandas.DataFrame

Pandas dataframe of subjects.
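The geometry behind this estimate is the Pythagorean theorem: given the diagonal and the aspect ratio implied by the window resolution, width and height follow directly. A hedged sketch; the function name and signature are illustrative, not the library's:

```python
import numpy as np

def estimated_monitor(diagonal_cm, window):
    """Estimate monitor width and height (cm) from the diagonal
    (hypotenuse) and the aspect ratio of the window resolution."""
    w_pix, h_pix = window
    aspect = w_pix / h_pix
    # diagonal^2 = height^2 * (aspect^2 + 1)  (Pythagoras)
    height = diagonal_cm / np.sqrt(aspect ** 2 + 1)
    return aspect * height, height
```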

onset_diff(self, df0, merge=None, cores=1)[source]

Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.

Parameters:
df0 : pandas.DataFrame

Pandas dataframe of raw data. This is used to merge variables that may be useful for analysis.

merge : list or None

Variables to merge into returned df.

cores : int

Number of cores to use for multiprocessing.

Returns:
df1 : pandas.DataFrame

Pandas dataframe.

error : pandas.DataFrame

Dataframe of each participant and the number of trials included in their data.

drop : list

List of participants more than 3 SD from the median.

preprocess(self, df, window)[source]

Initial data cleaning.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

window : tuple

horizontal, vertical resolution

Attributes:
m_delta : int

Maximum one-sample change in velocity.

Notes

remove_missing:
Remove samples with null values.
remove_bounds:
Remove samples outside of window bounds (1920,1080).
remove_spikes:
Remove one-sample spikes if the x- and y-axis deltas are greater than 5.
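The three cleaning steps listed above can be sketched as follows. The column names 'x' and 'y' and the pandas-based approach are assumptions, not the library's exact code:

```python
import numpy as np
import pandas as pd

def preprocess(df, window=(1920, 1080), m_delta=5):
    """Sketch of the cleaning steps: drop nulls, drop out-of-bounds
    samples, drop samples where both axis deltas exceed m_delta."""
    # remove_missing: drop samples with null coordinates
    df = df.dropna(subset=['x', 'y'])
    # remove_bounds: keep samples inside the window resolution
    df = df[df['x'].between(0, window[0]) & df['y'].between(0, window[1])]
    # remove_spikes: drop rows where both x and y jump more than m_delta
    spike = (df['x'].diff().abs() > m_delta) & (df['y'].diff().abs() > m_delta)
    return df[~spike]
```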
process(self, window, filters, gxy_df, trial, _classify=True, ctype='simple', _param='', log=False, v_th=20, dr_th=200, di_th=20, _missing=0.0, _maxdist=25, _mindur=50)[source]

Plotting and preparing data for classification. Combined plot of each filter.

Parameters:
window : list

horizontal, vertical resolution

filters : list

List of filters along with short-hand names.

gxy_df : pandas.DataFrame

Pandas dataframe of unfiltered raw data.

trial : str

Trial number.

_classify : bool

Whether to include classification.

ctype : str

Classification type: ‘simple’, ‘idt’, or ‘ivt’.

_param : str

[description] (the default is ‘’, which [default_description])

log : bool

[description] (the default is False, which [default_description])

v_th : float

Velocity threshold in px/sec (ivt).

dr_th : float

Fixation duration threshold in msec (idt).

di_th : float

Dispersion threshold in px (idt).

_missing : float

Value to be used for missing data (simple).

_maxdist : int

Maximal inter-sample distance in pixels (simple).

_mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple) (default = 50).

Attributes:
_fxy_df : pandas.DataFrame

Pandas dataframe of filtered data; a subset of _fgxy_df.

Returns:
_fgxy_df : pandas.DataFrame

Pandas dataframe of filtered data.

c_xy : pandas.DataFrame

Pandas dataframe of classified data.

roi(self, filters=None, flt=None, df=None, manual=False, monitorSize=None)[source]

Check if fixation is within bounds.

Attributes:
manual : bool

Whether or not processing.roi() is accessed manually.

monitorSize : list

Monitor size.

filters : list

Filter parameters. Default [[‘SavitzkyGolay’,’sg’]].

df : pandas.DataFrame

Pandas dataframe of classified data.

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.
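The underlying bounds test is a simple rectangle check. A hypothetical helper illustrating what "within bounds" means here; the name and bounds format are assumptions:

```python
def in_roi(x, y, bounds):
    """Check whether a fixation centroid falls inside a rectangular
    region of interest; bounds = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max
```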

run(self, path, task_type='eyetracking', single_subject=False, single_trial=False, subject=0, trial=0, isMultiprocessing=True, cores=1)[source]

Processing of data. Steps here include: cleaning data, fixation identification, and exporting data.

Parameters:
path : str

Path of raw data.

task_type : str

Running analysis on eyetracking or behavioral data.

single_subject : bool

Whether to run function with all or single subject.

single_trial : bool

Whether to run function with all or single trial.

subject : int

Subject number. Only if single_subject = True.

trial : int

Trial number. Only if single_trial = True.

isMultiprocessing : bool

Whether multiprocessing of data will be used. Only if single_subject = False.

cores : int

Number of cores to use for multiprocessing. Only if single_subject = False & isMultiprocessing=True.

Attributes:
process : bool

Process all data for export.

subject_metadata(self, fpath, spath)[source]

Collect all subjects metadata.

Parameters:
fpath : str

The directory path of all participant data.

spath : str

The directory path to save subject metadata.

Returns:
df : pandas.DataFrame

Pandas dataframe of subject metadata.

variables(self, df)[source]

Output list of variables for easy html viewing.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

path : str

The directory path to save and read the hdf5 dataframe.

Returns:
df_definitions : pandas.DataFrame
class imhr.Webgazer.raw(is_library=False)[source]

Bases: object

Processing summary data for output.

Methods

download(self, l_exp, log_path, save_path, …) Download raw data for use in analysis.
library(self) Check if required libraries are available.
download(self, l_exp, log_path, save_path, hostname, username, password)[source]

Download raw data for use in analysis.

Parameters:
l_exp : list

The list of experiments to pull data from.

log_path : str

The directory path to save the log of participant data downloaded.

save_path : str

The directory path to save participant data.

hostname : str

SSH hostname.

username : str

SSH username.

password : str

SSH password.

library(self)[source]

Check if required libraries are available.

class imhr.Webgazer.redcap[source]

Bases: object

Downloading data from REDCap.

Methods

cesd(path, filename, token, url, report_id) Download CES-D data for use in analysis.
demographics(path, filename, token, url, …) Download demographics data for use in analysis.
mmpi(path, filename, token, url, report_id) Download MMPI data for use in analysis.
classmethod cesd(path, filename, token, url, report_id)[source]

Download CES-D data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

classmethod demographics(path, filename, token, url, report_id, payload=None)[source]

Download demographics data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

payload : dict or None

Parameters for type of download from REDCap.

Notes

color = [‘Light Gray’, ‘Gray’, ‘Light Blue’, ‘Blue’, ‘Violet’, ‘Blue-Green’, ‘Green’, ‘Amber’,
‘Hazel’, ‘Light Brown’, ‘Dark Brown’, ‘Black’, ‘Other’]
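These classmethods all describe the same underlying operation: a REDCap report export. A hedged sketch using the standard REDCap export API parameters; the helper names and exact behaviour are assumptions about what the library does internally:

```python
import urllib.parse
import urllib.request

def redcap_payload(token, report_id, fmt='csv'):
    """Standard REDCap report-export payload; the field names here are
    the documented REDCap API parameters."""
    return {'token': token, 'content': 'report', 'format': fmt,
            'report_id': str(report_id), 'returnFormat': 'json'}

def download_report(path, filename, token, url, report_id):
    """Hypothetical sketch of cesd()/demographics()/mmpi(): POST the
    payload to the REDCap API URL and save the response to path/filename."""
    data = urllib.parse.urlencode(redcap_payload(token, report_id)).encode()
    with urllib.request.urlopen(url, data=data) as response:
        body = response.read().decode()
    out = '%s/%s' % (path, filename)
    with open(out, 'w') as f:
        f.write(body)
    return out
```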
classmethod mmpi(path, filename, token, url, report_id, payload=None)[source]

Download MMPI data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

payload : dict or None

Parameters for type of download from REDCap.