imhr.Webgazer

@purpose: Module designed for working with the Webgazer exploratory study.
@date: Created on Sat May 1 15:12:38 2019
@author: Semeon Risom

Classes

Classify([isLibrary]) Analysis methods for imhr.processing.preprocesing.
Metadata([isLibrary]) Process participants metadata for analysis and export.
Processing(config[, isLibrary, isDebug]) Hub for running processing and analyzing raw data.
raw([is_library]) Processing summary data for output.
redcap() Downloading data from REDCap.
class imhr.Webgazer.Classify(isLibrary=False)[source]

Bases: imhr.Webgazer.classify.Classify

Analysis methods for imhr.processing.preprocesing.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

Acceleration(time, data_x[, data_y]) Calculate the acceleration (deg/sec/sec) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
Velocity(time, config, d_x[, d_y]) Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
VisualAngle(g_x, g_y, config) Convert pixel eye-coordinates to visual angle.
hmm(data, filter_type, config) Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.
idt(data, dis_threshold, dur_threshold) Identification with Dispersion Threshold.
ivt(data, v_threshold, config) Identification with Velocity Threshold.
savitzky_golay(y, window_size, order[, …]) Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
simple(df, missing, maxdist, mindur) Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).
classmethod Acceleration(time, data_x, data_y=None)[source]

Calculate the acceleration (deg/sec/sec) for data points in data_x and (optionally) data_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

data_x, data_y : numpy.ndarray

List of gaze coordinates.

classmethod Velocity(time, config, d_x, d_y=None)[source]

Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

d_x,d_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Notes

Numpy arrays time, d_x, and d_y must all be 1D arrays of the same length. If both d_x and d_y are provided, the Euclidean distance between each pair of points is calculated and used in the velocity calculation. Time must be in seconds.msec units, while d_x and d_y are expected to be in visual degrees. If the position traces are in pixel coordinate space, use the VisualAngleCalc class to convert the data into degrees.
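The calculation described in these notes can be sketched as follows. This is a minimal illustration using finite differences, not the library's implementation:

```python
import numpy as np

def velocity(time, d_x, d_y=None):
    """Instantaneous velocity (deg/sec); a hedged sketch of the
    calculation described above, not the library implementation."""
    # time, d_x, d_y must be 1D arrays of equal length; time in sec.msec.
    if d_y is not None:
        # Euclidean distance between consecutive gaze samples.
        dist = np.hypot(np.diff(d_x), np.diff(d_y))
    else:
        dist = np.abs(np.diff(d_x))
    return dist / np.diff(time)

# Acceleration (deg/sec/sec) is then another finite difference:
# np.diff(velocity(time, d_x, d_y)) / np.diff(time)[1:]
```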

classmethod VisualAngle(g_x, g_y, config)[source]

Convert pixel eye-coordinates to visual angle.

Parameters:
g_x,g_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Notes

  • Stimulus positions (g_x,g_y) are defined in x and y pixel units, with the origin (0,0) at the center of the display, so as to match the PsychoPy pix unit coord type.
  • The pix2deg method is vectorized, meaning that it will perform the pixel-to-angle calculations on all elements of the provided pixel position numpy arrays in one numpy call.
  • The conversion process can use either a fixed eye-to-calibration-plane distance, or a numpy array of eye distances passed as eye_distance_mm. In this case the eye distance array must be the same length as the g_x, g_y arrays.
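The pixel-to-degree conversion these notes describe amounts to basic trigonometry. A minimal sketch, assuming a fixed eye-to-screen distance and a known physical screen width; the function signature and parameter names here are illustrative, not the library's:

```python
import numpy as np

def pix2deg(g_x, g_y, screen_width_mm, screen_width_pix, eye_distance_mm):
    """Convert centered pixel coordinates to visual degrees, assuming a
    fixed eye-to-screen distance. Illustrative sketch only."""
    mm_per_pix = screen_width_mm / screen_width_pix
    # arctangent of the physical offset over the viewing distance, per axis
    deg_x = np.degrees(np.arctan2(g_x * mm_per_pix, eye_distance_mm))
    deg_y = np.degrees(np.arctan2(g_y * mm_per_pix, eye_distance_mm))
    return deg_x, deg_y
```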
classmethod hmm(data, filter_type, config)[source]

Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.

Parameters:
data : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions.

filter_type : dict

Types of filters to use.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Attributes:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : str

Data threshold.

Notes

Definitions
  • Saccade: The saccade is a ballistic movement, meaning it is pre-programmed and does not change once it has started. Saccades of amplitude 40° peak at velocities of 300–600°/s and last for 80–150 ms.
  • Fixation: The point between any two saccades, during which the eyes are relatively stationary and virtually all visual input occurs. Regular eye movement alternates between saccades and visual fixations, the notable exception being in smooth pursuit.
  • Smooth pursuit: Smooth pursuit movements are much slower tracking movements of the eyes designed to keep a moving stimulus on the fovea. Such movements are under voluntary control in the sense that the observer can choose whether or not to track a moving stimulus. (Neuroscience 2nd edition).

References

[1]Pekkanen, J., & Lappi, O. (2017). A new and general approach to signal denoising and eye movement classification based on segmented linear regression. Scientific Reports, 7(1). doi:10.1038/s41598-017-17983-x.
classmethod idt(data, dis_threshold, dur_threshold)[source]

Identification with Dispersion Threshold.

Parameters:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : float

Fixation duration threshold in msec.

di_th : float

Dispersion threshold in pixels.

Returns:
ys : numpy.ndarray

The smoothed signal (or its n-th derivative).

Notes

The I-DT algorithm has two parameters: a dispersion threshold and the length of a time window in which the dispersion is calculated. The length of the time window is often set to the minimum duration of a fixation, which is around 100-200 ms.
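The windowed dispersion test described here can be sketched as follows. Units (pixels, milliseconds) and the dispersion formula (max x - min x) + (max y - min y) are standard I-DT conventions; this is an illustration of the idea, not the library's code:

```python
import numpy as np

def idt(x, y, t, di_th=20.0, dur_th=100.0):
    """I-DT fixation detection sketch: grow a window to the minimum
    duration, accept it as a fixation if dispersion is under threshold,
    then extend it while dispersion stays under threshold."""
    fixations, start = [], 0
    n = len(t)
    while start < n:
        # grow the window until it spans the minimum duration
        end = start
        while end < n and t[end] - t[start] < dur_th:
            end += 1
        if end >= n:
            break
        w = slice(start, end + 1)
        if (x[w].max() - x[w].min()) + (y[w].max() - y[w].min()) <= di_th:
            # extend the window while dispersion stays under threshold
            while end + 1 < n:
                w2 = slice(start, end + 2)
                if (x[w2].max() - x[w2].min()) + (y[w2].max() - y[w2].min()) > di_th:
                    break
                end += 1
            fixations.append((t[start], t[end],
                              float(x[start:end + 1].mean()),
                              float(y[start:end + 1].mean())))
            start = end + 1
        else:
            start += 1
    return fixations
```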

classmethod ivt(data, v_threshold, config)[source]

Identification with Velocity Threshold.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

Parameters:
data : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

v_threshold : float

Velocity threshold in pix/sec.

config : dict

Configuration data for data analysis. i.e. trial number, location.

Returns:
ys : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

Notes

From https://github.com/ecekt/eyegaze. Formula from: https://dl.acm.org/citation.cfm?id=355028
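The sample-labelling rule described above reduces to a single threshold comparison per sample. A minimal sketch, not the library's implementation:

```python
import numpy as np

def ivt(vel, v_th):
    """I-VT labelling: samples with velocity below the threshold are
    fixation samples, the rest are saccade samples. Sketch only."""
    return np.where(vel < v_th, 'fixation', 'saccade')
```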

classmethod savitzky_golay(y, window_size, order, deriv=0, rate=1)[source]

Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

The Savitzky-Golay filter removes high frequency noise from data. It has the advantage of preserving the original shape and features of the signal better than other types of filtering approaches, such as moving averages techniques.

Parameters:
y : numpy.ndarray, shape (N,)

The values of the time history of the signal.

window_size : int

The length of the window. Must be an odd integer.

order : int

The order of the polynomial used in the filtering. Must be less than window_size - 1.

deriv : int

The order of the derivative to compute (default = 0 means only smoothing).

Returns:
ys : numpy.ndarray, shape (N,)

The smoothed signal (or its n-th derivative).

Notes

The Savitzky-Golay filter is a type of low-pass filter, particularly suited for smoothing noisy data. The main idea behind this approach is to make, for each point, a least-squares fit with a polynomial of high order over an odd-sized window centered at the point. For more information, see: http://wiki.scipy.org/Cookbook/SavitzkyGolay.

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> t = np.linspace(-4, 4, 500)
>>> y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)
>>> ysg = savitzky_golay(y, window_size=31, order=4)
>>> plt.plot(t, y, label='Noisy signal')
>>> plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
>>> plt.plot(t, ysg, 'r', label='Filtered signal')
>>> plt.legend()
>>> plt.show()

References

[1]A. Savitzky, M. Golay (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8), pp. 1627-1639.
[2]W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery (2007). Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press. ISBN-13: 9780521880688.
classmethod simple(df, missing, maxdist, mindur)[source]

Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:
df : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions

missing : float

Value to be used for missing data (default = 0.0).

maxdist : int

Maximal inter-sample distance in pixels (default = 25).

mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (default = 100).

Returns:
Sfix : list

List of lists, each containing [starttime].

Efix : list

List of lists, each containing [starttime, endtime, duration, endx, endy].

Notes

From https://github.com/esdalmaijer/PyGazeAnalyser/blob/master/pygazeanalyser/detectors.py
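The detection rule described above can be sketched as follows. This illustrates the 'simple' idea only; column handling and edge cases are simplified relative to the PyGazeAnalyser original:

```python
import numpy as np

def simple_fixations(x, y, t, missing=0.0, maxdist=25.0, mindur=100.0):
    """Runs of consecutive valid samples whose inter-sample distance
    stays under maxdist, kept if they last at least mindur ms. Sketch."""
    fixations = []
    start = None
    for i in range(1, len(t)):
        valid = x[i] != missing and y[i] != missing
        close = np.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) <= maxdist
        if valid and close:
            if start is None:
                start = i - 1
        else:
            if start is not None and t[i - 1] - t[start] >= mindur:
                fixations.append((t[start], t[i - 1], t[i - 1] - t[start],
                                  x[i - 1], y[i - 1]))
            start = None
    if start is not None and t[-1] - t[start] >= mindur:
        fixations.append((t[start], t[-1], t[-1] - t[start], x[-1], y[-1]))
    return fixations
```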

class imhr.Webgazer.Metadata(isLibrary=False)[source]

Bases: imhr.Webgazer.metadata.Metadata

Process participants metadata for analysis and export.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

predict(df) Predicting screen size (cm) and device (i.e. macbook 2018).
summary(df, path) Preparing data for use in analysis.
classmethod predict(df)[source]

Predicting screen size (cm), device (i.e. macbook 2018).

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

classmethod summary(df, path)[source]

Preparing data for use in analysis.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of processed metadata.

Notes

You can either get data from all files within a directory (directory), or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df = getData(path=self.config['path'])
>>> #if getting data for single subject:
>>> df = getData(path=self.config['path'],subject_session=['1099','1', '0'])
class imhr.Webgazer.Processing(config, isLibrary=False, isDebug=False)[source]

Bases: object

Hub for running processing and analyzing raw data.

Methods

append_classify(self, df, cg_df) Appending classification to Dataframe.
classify(self, config, df[, ctype, …]) I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.
dwell(self, df[, cores, isMultiprocessing]) Calculate dwell time for sad and neutral images.
filter_data(self, df, filter_type, config) Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.
getData(self[, path]) Preparing data for use in analysis.
getEstimatedMonitor(self, diagonal, window) Calculate estimated monitor size (w,h; cm) using the estimated monitor diagonal (hypotenuse; cm).
onset_diff(self, df0[, merge, cores]) Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.
preprocess(self, df, window) Initial data cleaning.
process(self, window, filters, gxy_df, trial) Plotting and preparing data for classification.
roi(self[, filters, flt, df, manual, …]) Check if fixation is within bounds.
run(self, path[, task_type, single_subject, …]) Processing of data.
subject_metadata(self, fpath, spath) Collect all subjects metadata.
variables(self, df) Output list of variables for easy html viewing.
append_classify(self, df, cg_df)[source]

Appending classification to Dataframe.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

cg_df : pandas.DataFrame

Pandas dataframe of classification events.

classify(self, config, df, ctype='ivt', filter_type=None, v_th=None, dr_th=None, di_th=None, missing=None, maxdist=None, mindur=None)[source]

The I-DT algorithm takes into account the distribution or spatial proximity of eye position points in the eye-movement trace.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

The simple model detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:
config : dict

Configuration data. i.e. trial number, location.

df : pandas.DataFrame

Pandas dataframe of classified data.

ctype : str

Classification type: ‘ivt’ (velocity threshold), ‘idt’ (dispersion threshold; used by SR Research and Tobii), or ‘simple’.

filter_type : str, optional

Filter type: ‘butter’

v_th : float

Velocity threshold in pix/sec (ivt).

dr_th : float

Fixation duration threshold in msec (idt).

di_th : float

Dispersion threshold in pixels (idt).

missing : float

Value to be used for missing data (simple).

maxdist : int

Maximal inter-sample distance in pixels (simple).

mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple).

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.

Raises:
ValueError

Unknown classification type.

dwell(self, df, cores=1, isMultiprocessing=False)[source]

Calculate dwell time for sad and neutral images.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

cores : int

Number of cores to use for multiprocessing.

isMultiprocessing : bool

Whether multiprocessing of data will be used.

Returns:
df : pandas.DataFrame

Pandas dataframe with dwell time.

error : list

List of participants that were not included in dataframe.

filter_data(self, df, filter_type, config)[source]

Butterworth: Design an Nth-order digital or analog Butterworth filter and return the filter coefficients.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

filter_type : str, optional

Type of filter.

config : dict

Configuration data. i.e. trial number, location.

Attributes:
filter_type : str

Filter type: ‘butterworth’

getData(self, path=None)[source]

Preparing data for use in analysis.

Parameters:
path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

_path : list

List of files used for analysis.

Notes

You can either get data from all subjects within a directory, or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df_raw = getData(path=self.config['path']['raw'])
>>> #if getting data for single subject:
>>> df_raw = getData(path=self.config['path']['raw'],subject_session=['1099','1', '0'])
getEstimatedMonitor(self, diagonal, window)[source]

Calculate estimated monitor size (w,h; cm) using the estimated monitor diagonal (hypotenuse; cm).

Attributes:
df_raw : pandas.DataFrame

Pandas dataframe of subjects.
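The geometry behind this estimate is the Pythagorean theorem: given the diagonal and the aspect ratio implied by the window resolution, width and height follow directly. A hedged sketch; the function name and signature are illustrative, not the library's:

```python
import numpy as np

def estimated_monitor(diagonal_cm, window):
    """Estimate monitor width and height (cm) from the diagonal
    (hypotenuse) and the aspect ratio of the window resolution."""
    w_pix, h_pix = window
    aspect = w_pix / h_pix
    # diagonal^2 = height^2 * (aspect^2 + 1)  (Pythagoras)
    height = diagonal_cm / np.sqrt(aspect ** 2 + 1)
    return aspect * height, height
```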

onset_diff(self, df0, merge=None, cores=1)[source]

Calculate differences in onset presentation (stimulus, dotloc) using bokeh, seaborn, and pandas.

Parameters:
df0 : pandas.DataFrame

Pandas dataframe of raw data. This is used to merge variables that may be useful for analysis.

merge : list or None

Variables to merge into returned df.

cores : int

Number of cores to use for multiprocessing.

Returns:
df1 : pandas.DataFrame

Pandas dataframe.

error : pandas.DataFrame

Dataframe of each participant and the number of trials included in their data.

drop : list

List of participants more than 3 SD from the median.

preprocess(self, df, window)[source]

Initial data cleaning.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

window : tuple

horizontal, vertical resolution

Attributes:
m_delta : int

Maximum one-sample change in velocity.

Notes

remove_missing:
Remove samples with null values.
remove_bounds:
Remove samples outside of window bounds (1920,1080).
remove_spikes:
Remove one-sample spikes if the x- and y-axis deltas are greater than 5.
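The three cleaning steps listed above can be sketched as follows. The column names 'x' and 'y' and the pandas-based approach are assumptions, not the library's exact code:

```python
import numpy as np
import pandas as pd

def preprocess(df, window=(1920, 1080), m_delta=5):
    """Sketch of the cleaning steps: drop nulls, drop out-of-bounds
    samples, drop samples where both axis deltas exceed m_delta."""
    # remove_missing: drop samples with null coordinates
    df = df.dropna(subset=['x', 'y'])
    # remove_bounds: keep samples inside the window resolution
    df = df[df['x'].between(0, window[0]) & df['y'].between(0, window[1])]
    # remove_spikes: drop rows where both x and y jump more than m_delta
    spike = (df['x'].diff().abs() > m_delta) & (df['y'].diff().abs() > m_delta)
    return df[~spike]
```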
process(self, window, filters, gxy_df, trial, _classify=True, ctype='simple', _param='', log=False, v_th=20, dr_th=200, di_th=20, _missing=0.0, _maxdist=25, _mindur=50)[source]

Plotting and preparing data for classification. Combined plot of each filter.

Parameters:
window : list

horizontal, vertical resolution

filters : list

List of filters along with short-hand names.

gxy_df : pandas.DataFrame

Pandas dataframe of unfiltered raw data.

trial : str

Trial number.

_classify : bool

Whether to include classification.

ctype : str

Classification type: ‘simple’, ‘idt’, or ‘ivt’.

_param : str

[description] (the default is ‘’, which [default_description])

log : bool

[description] (the default is False, which [default_description])

v_th : float

Velocity threshold in px/sec (ivt).

dr_th : float

Fixation duration threshold in msec (idt).

di_th : float

Dispersion threshold in px (idt).

_missing : float

Value to be used for missing data (simple).

_maxdist : int

Maximal inter-sample distance in pixels (simple).

_mindur : int

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (simple) (default = 50).

Attributes:
_fxy_df : pandas.DataFrame

Pandas dataframe of filtered data; a subset of _fgxy_df.

Returns:
_fgxy_df : pandas.DataFrame

Pandas dataframe of filtered data.

c_xy : pandas.DataFrame

Pandas dataframe of classified data.

roi(self, filters=None, flt=None, df=None, manual=False, monitorSize=None)[source]

Check if fixation is within bounds.

Attributes:
manual : bool

Whether or not processing.roi() is accessed manually.

monitorSize : list

Monitor size.

filters : list

Filter parameters. Default [[‘SavitzkyGolay’,’sg’]].

df : pandas.DataFrame

Pandas dataframe of classified data.

Returns:
df : pandas.DataFrame

Pandas dataframe of classified data.
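The underlying bounds test is a simple rectangle check. A hypothetical helper illustrating what "within bounds" means here; the name and bounds format are assumptions:

```python
def in_roi(x, y, bounds):
    """Check whether a fixation centroid falls inside a rectangular
    region of interest; bounds = (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bounds
    return x_min <= x <= x_max and y_min <= y <= y_max
```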

run(self, path, task_type='eyetracking', single_subject=False, single_trial=False, subject=0, trial=0, isMultiprocessing=True, cores=1)[source]

Processing of data. Steps here include: cleaning data, fixation identification, and exporting data.

Parameters:
path : str

Path of raw data.

task_type : str

Running analysis on eyetracking or behavioral data.

single_subject : bool

Whether to run function with all or single subject.

single_trial : bool

Whether to run function with all or single trial.

subject : int

Subject number. Only if single_subject = True.

trial : int

Trial number. Only if single_trial = True.

isMultiprocessing : bool

Whether multiprocessing of data will be used. Only if single_subject = False.

cores : int

Number of cores to use for multiprocessing. Only if single_subject = False & isMultiprocessing=True.

Attributes:
process : bool

Process all data for export.

subject_metadata(self, fpath, spath)[source]

Collect all subjects metadata.

Parameters:
fpath : str

The directory path of all participant data.

spath : str

The directory path to save subject metadata.

Returns:
df : pandas.DataFrame

Pandas dataframe of subject metadata.

variables(self, df)[source]

Output list of variables for easy html viewing.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data. This is used as a filter to prevent unused participants from being included in the data.

path : str

The directory path to save and read the hdf5 dataframe.

Returns:
df_definitions : pandas.DataFrame
class imhr.Webgazer.raw(is_library=False)[source]

Bases: object

Processing summary data for output.

Methods

download(self, l_exp, log_path, save_path, …) Download raw data for use in analysis.
library(self) Check if required libraries are available.
download(self, l_exp, log_path, save_path, hostname, username, password)[source]

Download raw data for use in analysis.

Parameters:
l_exp : list

The list of experiments to pull data from.

log_path : str

The directory path to save the log of participant data downloaded.

save_path : str

The directory path to save participant data.

hostname : str

SSH hostname.

username : str

SSH username.

password : str

SSH password.

library(self)[source]

Check if required libraries are available.

class imhr.Webgazer.redcap[source]

Bases: object

Downloading data from REDCap.

Methods

cesd(path, filename, token, url, report_id) Download CES-D data for use in analysis.
demographics(path, filename, token, url, …) Download demographics data for use in analysis.
mmpi(path, filename, token, url, report_id) Download MMPI data for use in analysis.
classmethod cesd(path, filename, token, url, report_id)[source]

Download CES-D data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

classmethod demographics(path, filename, token, url, report_id, payload=None)[source]

Download demographics data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

payload : dict or None

Parameters for type of download from REDCap.

Notes

color = [‘Light Gray’, ‘Gray’, ‘Light Blue’, ‘Blue’, ‘Violet’, ‘Blue-Green’, ‘Green’, ‘Amber’,
‘Hazel’, ‘Light Brown’, ‘Dark Brown’, ‘Black’, ‘Other’]
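These classmethods all describe the same underlying operation: a REDCap report export. A hedged sketch using the standard REDCap export API parameters; the helper names and exact behaviour are assumptions about what the library does internally:

```python
import urllib.parse
import urllib.request

def redcap_payload(token, report_id, fmt='csv'):
    """Standard REDCap report-export payload; the field names here are
    the documented REDCap API parameters."""
    return {'token': token, 'content': 'report', 'format': fmt,
            'report_id': str(report_id), 'returnFormat': 'json'}

def download_report(path, filename, token, url, report_id):
    """Hypothetical sketch of cesd()/demographics()/mmpi(): POST the
    payload to the REDCap API URL and save the response to path/filename."""
    data = urllib.parse.urlencode(redcap_payload(token, report_id)).encode()
    with urllib.request.urlopen(url, data=data) as response:
        body = response.read().decode()
    out = '%s/%s' % (path, filename)
    with open(out, 'w') as f:
        f.write(body)
    return out
```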
classmethod mmpi(path, filename, token, url, report_id, payload=None)[source]

Download MMPI data for use in analysis.

Parameters:
path : str

Path to save data.

token : str

API token to REDCap project.

url : str

API URL to REDCap server.

report_id : int

ID of the report to export.

payload : dict or None

Parameters for type of download from REDCap.