imhr.r33

@purpose: Module designed for working with the R33 study.
@date: Created on Sat May 1 15:12:38 2019
@author: Semeon Risom

Classes

Classify([isLibrary]) Analysis methods for imhr.processing.preprocesing.
Metadata([isLibrary]) Process participants metadata for analysis and export.
Model([isLibrary]) Run statistical models for analysis.
Processing() Hub for running processing and analyzing raw data.
Settings([isLibrary]) Default settings for imhr.r33.Processing.
class imhr.r33.Classify(isLibrary=False)[source]

Bases: imhr.r33._classify.Classify

Analysis methods for imhr.processing.preprocesing.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

Acceleration(time, data_x[, data_y]) Calculate the acceleration (deg/sec/sec) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
Velocity(time, config, d_x[, d_y]) Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
VisualAngle(g_x, g_y, config) Convert pixel eye-coordinates to visual angle.
hmm(data, filter_type, config) Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.
idt(data, dis_threshold, dur_threshold) Identification with Dispersion Threshold.
ivt(data, v_threshold, config) Identification with Velocity Threshold.
savitzky_golay(y, window_size, order[, …]) Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
simple(df, missing, maxdist, mindur) Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).
classmethod Acceleration(time, data_x, data_y=None)[source]

Calculate the acceleration (deg/sec/sec) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

d_x,d_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis, e.g. trial number, location.

classmethod Velocity(time, config, d_x, d_y=None)[source]

Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.

Parameters:
time : numpy.ndarray

Timestamp of each coordinate.

d_x,d_y : numpy.ndarray

List of gaze coordinates.

config : dict

Configuration data for data analysis, e.g. trial number, location.

Notes

Numpy arrays time, d_x, and d_y must all be 1D arrays of the same length. If both d_x and d_y are provided, the Euclidean distance between each pair of consecutive points is calculated and used in the velocity calculation. Time must be in seconds.msec units, while d_x and d_y are expected to be in visual degrees. If the position traces are in pixel coordinate space, use the VisualAngleCalc class to convert the data into degrees.
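
The calculation described in these notes can be sketched with plain numpy. This is a minimal illustration, not the library's implementation: velocity_sketch and acceleration_sketch are hypothetical names, the config argument is omitted, and the outputs are shorter than the inputs (N-1 and N-2 samples) because no padding is applied. Acceleration here is the rate of change of speed, which is an assumption about what the method reports.

```python
import numpy as np


def velocity_sketch(time, d_x, d_y=None):
    """Speed between consecutive samples (hypothetical helper, deg/sec)."""
    time = np.asarray(time, dtype=float)
    d_x = np.asarray(d_x, dtype=float)
    if d_y is None:
        # One-dimensional trace: absolute displacement along x only.
        step = np.abs(np.diff(d_x))
    else:
        # Two-dimensional trace: Euclidean distance between consecutive points.
        d_y = np.asarray(d_y, dtype=float)
        step = np.hypot(np.diff(d_x), np.diff(d_y))
    return step / np.diff(time)


def acceleration_sketch(time, d_x, d_y=None):
    """Change in speed over time (hypothetical helper, deg/sec/sec)."""
    time = np.asarray(time, dtype=float)
    velocity = velocity_sketch(time, d_x, d_y)
    # Each velocity value sits halfway between the two samples it was computed from.
    midpoints = (time[:-1] + time[1:]) / 2.0
    return np.diff(velocity) / np.diff(midpoints)


# Time in seconds, positions in visual degrees (made-up values).
time = np.array([0.00, 0.01, 0.02, 0.03])
d_x = np.array([0.0, 0.3, 0.6, 1.2])
d_y = np.array([0.0, 0.4, 0.8, 1.6])
velocity = velocity_sketch(time, d_x, d_y)
acceleration = acceleration_sketch(time, d_x, d_y)
```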

classmethod VisualAngle(g_x, g_y, config)[source]

Convert pixel eye-coordinates to visual angle.

Parameters:
g_x,g_y : numpy.ndarray

List of gaze coordinates.

drift : dict

Counter of drift correct runs.

config : dict

Configuration data for data analysis, e.g. trial number, location.

Notes

  • Stimulus positions (g_x,g_y) are defined in x and y pixel units, with the origin (0,0) at the center of the display, to match the PsychoPy pix unit coordinate type.
  • The pix2deg method is vectorized, meaning that it will perform the pixel-to-angle calculations on all elements of the provided pixel position numpy arrays in one numpy call.
  • The conversion process can use either a fixed eye-to-calibration-plane distance or a numpy array of eye distances passed as eye_distance_mm. In the latter case, the eye distance array must be the same length as the g_x, g_y arrays.
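
A rough sketch of the pixel-to-degree conversion these notes describe, assuming square pixels and a flat screen. The function name pix2deg_sketch and the screen_width_mm and screen_width_px arguments are hypothetical stand-ins for whatever the config dictionary actually provides; only eye_distance_mm is named in the notes above.

```python
import numpy as np


def pix2deg_sketch(g_x, g_y, screen_width_mm, screen_width_px, eye_distance_mm):
    """Convert centered pixel coordinates to visual angle in degrees.

    eye_distance_mm may be a single number or an array with the same
    length as g_x and g_y (one eye distance per sample).
    """
    g_x = np.asarray(g_x, dtype=float)
    g_y = np.asarray(g_y, dtype=float)
    eye = np.asarray(eye_distance_mm, dtype=float)
    # Physical size of one pixel; assumes square pixels.
    mm_per_px = screen_width_mm / screen_width_px
    # Angle subtended by the offset from the screen center, one numpy call per axis.
    deg_x = np.degrees(np.arctan2(g_x * mm_per_px, eye))
    deg_y = np.degrees(np.arctan2(g_y * mm_per_px, eye))
    return deg_x, deg_y


# A 500 mm wide, 1000 px wide display viewed from 600 mm (made-up values).
deg_x, deg_y = pix2deg_sketch(
    g_x=[0, 1200], g_y=[0, -1200],
    screen_width_mm=500.0, screen_width_px=1000, eye_distance_mm=600.0,
)
```

With a 0.5 mm pixel, an offset of 1200 px is 600 mm, which at a 600 mm viewing distance corresponds to 45 degrees.
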
classmethod hmm(data, filter_type, config)[source]

Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.

Parameters:
data : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions.

filter_type : dict

Types of filters to use.

config : dict

Configuration data for data analysis, e.g. trial number, location.

Attributes:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : str

Data threshold.

Notes

Definitions
  • Saccade: The saccade is a ballistic movement, meaning it is pre-programmed and does not change once it has started. Saccades of amplitude 40° peak at velocities of 300–600°/s and last for 80–150 ms.
  • Fixation: The point between any two saccades, during which the eyes are relatively stationary and virtually all visual input occurs. Regular eye movement alternates between saccades and visual fixations, the notable exception being in smooth pursuit.
  • Smooth pursuit: Smooth pursuit movements are much slower tracking movements of the eyes designed to keep a moving stimulus on the fovea. Such movements are under voluntary control in the sense that the observer can choose whether or not to track a moving stimulus. (Neuroscience 2nd edition).

References

[1] Pekkanen, J., & Lappi, O. (2017). A new and general approach to signal denoising and eye movement classification based on segmented linear regression. Scientific Reports, 7(1). doi:10.1038/s41598-017-17983-x.
classmethod idt(data, dis_threshold, dur_threshold)[source]

Identification with Dispersion Threshold.

Parameters:
data : numpy.ndarray

The smoothed signal (or its n-th derivative).

dr_th : str

Fixation duration threshold in pix/msec

di_th : str

Dispersion threshold in pixels

Returns:
ys : numpy.ndarray

The smoothed signal (or its n-th derivative).

Notes

The I-DT algorithm has two parameters: a dispersion threshold and the length of a time window in which the dispersion is calculated. The length of the time window is often set to the minimum duration of a fixation, which is around 100-200 ms.
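
The two parameters can be seen at work in a minimal dispersion-threshold sketch. It is an illustration under stated assumptions rather than the method's actual code: idt_sketch is a hypothetical name, input is taken as separate x, y, and t arrays (pixels and milliseconds), and dispersion is measured as (max x - min x) + (max y - min y), one common choice.

```python
import numpy as np


def _dispersion(x, y, start, end):
    """Dispersion of samples start..end inclusive: x range plus y range."""
    xs, ys = x[start:end + 1], y[start:end + 1]
    return (xs.max() - xs.min()) + (ys.max() - ys.min())


def idt_sketch(x, y, t, dis_threshold, dur_threshold):
    """Return fixations as (start_time, end_time, mean_x, mean_y) tuples."""
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    n = len(t)
    fixations = []
    start = 0
    while start < n:
        # Grow the initial window until it spans the minimum fixation duration.
        end = start
        while end < n and t[end] - t[start] < dur_threshold:
            end += 1
        if end >= n:
            break  # not enough data left for a full window
        if _dispersion(x, y, start, end) <= dis_threshold:
            # Extend the window while the points stay within the threshold.
            while end + 1 < n and _dispersion(x, y, start, end + 1) <= dis_threshold:
                end += 1
            fixations.append(
                (t[start], t[end], x[start:end + 1].mean(), y[start:end + 1].mean())
            )
            start = end + 1
        else:
            # Window too spread out: drop its first sample and try again.
            start += 1
    return fixations


# Two clusters of ten samples each, sampled every 10 ms (made-up values).
t = np.arange(20) * 10.0
x = np.array([100, 101, 100, 99, 100, 101, 100, 99, 100, 101] + [300] * 10)
y = np.array([100] * 10 + [300] * 10)
fixations = idt_sketch(x, y, t, dis_threshold=10, dur_threshold=50)
```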

classmethod ivt(data, v_threshold, config)[source]

Identification with Velocity Threshold.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

Parameters:
data : numpy.ndarray, shape (N,)

the smoothed signal (or its n-th derivative).

v_threshold : str

Velocity threshold in pix/sec.

config : dict

Configuration data for data analysis, e.g. trial number, location.

Returns:
ys : numpy.ndarray, shape (N,)

the smoothed signal (or its n-th derivative).

Notes

From https://github.com/ecekt/eyegaze. Formula from: https://dl.acm.org/citation.cfm?id=355028
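
The per-sample rule described for I-VT can be sketched as follows. This is a simplified illustration, not the adapted code: ivt_sketch is a hypothetical name, input is taken as separate x, y, and t arrays (pixels and seconds), and the first sample, which has no predecessor, reuses the first computed velocity.

```python
import numpy as np


def ivt_sketch(x, y, t, v_threshold):
    """Label each sample as 'fixation' or 'saccade' by point-to-point velocity."""
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    # Velocity between consecutive samples, in pix/sec.
    velocity = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    # Pad so there is one velocity value per sample (assumption of this sketch).
    velocity = np.concatenate(([velocity[0]], velocity))
    # Below the threshold: fixation; otherwise: saccade.
    labels = np.where(velocity < v_threshold, "fixation", "saccade")
    return velocity, labels


# One fast jump of 50 px within 10 ms among slow drifts (made-up values).
t = np.array([0.00, 0.01, 0.02, 0.03, 0.04])
x = np.array([0.0, 1.0, 2.0, 52.0, 53.0])
y = np.zeros(5)
velocity, labels = ivt_sketch(x, y, t, v_threshold=1000.0)
```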

classmethod savitzky_golay(y, window_size, order, deriv=0, rate=1)[source]

Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

The Savitzky-Golay filter removes high-frequency noise from data. It has the advantage of preserving the original shape and features of the signal better than other types of filtering approaches, such as moving-average techniques.

Parameters:
y : numpy.ndarray, shape (N,)

the values of the time history of the signal.

window_size : int

the length of the window. Must be an odd integer number.

order : int

the order of the polynomial used in the filtering. Must be less than window_size - 1.

deriv : int

the order of the derivative to compute (default = 0 means only smoothing)

Returns:
ys : numpy.ndarray, shape (N)

the smoothed signal (or its n-th derivative).

Notes

The Savitzky-Golay filter is a type of low-pass filter, particularly suited for smoothing noisy data. The main idea behind this approach is to make, for each point, a least-squares fit with a high-order polynomial over an odd-sized window centered at the point. For more information, see: http://wiki.scipy.org/Cookbook/SavitzkyGolay.

Examples

>>> import numpy as np
>>> t = np.linspace(-4, 4, 500)
>>> y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)
>>> ysg = savitzky_golay(y, window_size=31, order=4)
>>> import matplotlib.pyplot as plt
>>> plt.plot(t, y, label='Noisy signal')
>>> plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
>>> plt.plot(t, ysg, 'r', label='Filtered signal')
>>> plt.legend()
>>> plt.show()

References

[1] A. Savitzky, M. J. E. Golay (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8), pp. 1627-1639.
[2] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press. ISBN-13: 9780521880688.
classmethod simple(df, missing, maxdist, mindur)[source]

Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:
df : pandas.DataFrame

Pandas dataframe of x,y and timestamp positions

missing : str

Value to be used for missing data (default = 0.0)

maxdist : str

Maximal inter-sample distance in pixels (default = 25)

mindur : str

Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (default = 100).

Returns:
Sfix : numpy.ndarray shape (N)

list of lists, each containing [starttime]

Efix : numpy.ndarray shape (N)

list of lists, each containing [starttime, endtime, duration, endx, endy]

Notes

From https://github.com/esdalmaijer/PyGazeAnalyser/blob/master/pygazeanalyser/detectors.py
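
The idea behind this detector can be sketched in a few lines. This is a simplified illustration in the spirit of the description above, not a copy of the PyGazeAnalyser code: simple_fixations_sketch is a hypothetical name, input is taken as separate x, y, and t arrays, distance is measured from the first sample of the current candidate, and the reported position is that first sample.

```python
import numpy as np


def simple_fixations_sketch(x, y, t, missing=0.0, maxdist=25, mindur=100):
    """Return fixations as [start_time, end_time, duration, x, y] lists."""
    x, y, t = (np.asarray(a, dtype=float) for a in (x, y, t))
    # Disregard samples where both coordinates carry the missing-data value.
    keep = ~((x == missing) & (y == missing))
    x, y, t = x[keep], y[keep], t[keep]

    fixations = []
    anchor = 0  # index of the first sample of the current candidate
    for i in range(1, len(t) + 1):
        # A candidate ends when the data ends or a sample strays too far.
        ended = i == len(t) or np.hypot(x[i] - x[anchor], y[i] - y[anchor]) > maxdist
        if ended:
            duration = t[i - 1] - t[anchor]
            # Candidates shorter than mindur are disregarded.
            if duration >= mindur:
                fixations.append([t[anchor], t[i - 1], duration, x[anchor], y[anchor]])
            anchor = i
    return fixations


# Timestamps in ms; the sample at t=60 is missing (made-up values).
t = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]
x = [100, 102, 101, 0, 100, 103, 101, 400, 402, 700]
y = [100, 101, 99, 0, 100, 100, 102, 400, 401, 100]
fixations = simple_fixations_sketch(x, y, t, missing=0.0, maxdist=25, mindur=100)
```

In this made-up trace only the first cluster lasts long enough to count; the 20 ms cluster near (400, 400) is dropped by the mindur check.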

class imhr.r33.Metadata(isLibrary=False)[source]

Bases: imhr.r33._metadata.Metadata

Process participants metadata for analysis and export.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

predict(df) Predicting screen size (cm), device (e.g. macbook 2018).
summary(df, path) Preparing data for use in analysis.
classmethod predict(df)[source]

Predicting screen size (cm), device (e.g. macbook 2018).

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

Returns:
df : pandas.DataFrame

Pandas dataframe of raw data.

classmethod summary(df, path)[source]

Preparing data for use in analysis.

Parameters:
df : pandas.DataFrame

Pandas dataframe of raw data.

path : str

The directory path of the subject data

Attributes:
path : str

Specific directory path used.

Returns:
df : pandas.DataFrame

Pandas dataframe of processed metadata.

Notes

You can either get data from all files within a directory (directory), or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df = getData(path=self.config['path'])
>>> #if getting data for single subject:
>>> df = getData(path=self.config['path'],subject_session=['1099','1', '0'])
class imhr.r33.Model(isLibrary=False)[source]

Bases: imhr.r33._model.Model

Run statistical models for analysis.

Parameters:
isLibrary : bool

Check if required libraries are available. Default False.

Methods

anova(config, y, f, df, csv, path, effects) Run analysis of variance model using rpy2, seaborn and pandas.
lmer(config, y, f, df, exclude, csv, path, …) Run linear mixed regression model, using rpy2, seaborn and pandas.
logistic(config, y, f, df, me, exclude, csv, …) Run logistic regression model, using rpy2, seaborn and pandas.
classmethod anova(config, y, f, df, csv, path, effects, is_html=True)[source]

Run analysis of variance model using rpy2, seaborn and pandas.

Parameters:
y : str

Response variable.

f : str

Formula to use for analysis.

df : pandas.DataFrame

Pandas dataframe of raw data.

f : str

R-compatible formula.

csv : str

Name of generated CSV file to run analysis in R.

path : str

The directory path to save the generated files.

effects : list

List of main effects.

is_html : bool

Whether html should be generated.

Returns:
model : rpy2.robjects.methods.RS4

Python representation of an R instance of class ‘S4’.

df_anova : pandas.DataFrame

Pandas dataframe of model output.

get_anova : str

R script to run model.

html : str

HTML output.

Notes

Resources
Definition:
A test that allows one to make comparisons between the means of multiple groups of data, where two independent variables are considered.
Assumptions of ANOVA
  1. Normal distribution (normality)
    • Short: Samples are drawn from a normally distributed population (Q-Q plot, Shapiro-Wilk test).
    • Detailed Definition: Residuals in data are normally distributed.
  2. Homogeneity of variance (homoscedasticity)
    • Short: Variances are equal (or similar).
    • Detailed: Variance for a DV is constant across the sample (residual vs. fitted plot, Scale-Location plot, Levene’s test).
  3. Independent observations
    • Samples have been drawn independently of each other. No analysis needed.
Hypothesis Interpretation
  • Null: The means of all levels of an IV are equal.
  • Alternative: The mean of at least one level of an IV is different.
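
The hypotheses above compare the means of the levels of an IV. As a rough numpy-only illustration of that comparison, the sketch below computes a one-way F statistic (between-group mean square divided by within-group mean square). It is not the rpy2 model this method runs and covers a single factor only; one_way_f is a hypothetical helper and the data are made up.

```python
import numpy as np


def one_way_f(*groups):
    """F statistic for a single factor: between-group MS / within-group MS."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Variation of the level means around the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own level mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Three levels of one IV, three observations each.
f_value = one_way_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
```

A large F value means the level means differ more than the within-level spread would suggest, which is evidence against the null hypothesis.
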
classmethod lmer(config, y, f, df, exclude, csv, path, effects, is_html=True)[source]

Run linear mixed regression model, using rpy2, seaborn and pandas.

Parameters:
y : str

Response variable.

f : list of str

Formula to use for analysis.

df : pandas.DataFrame

Pandas dataframe of raw data.

f : str

R-compatible formula.

exclude : list

List of participants to be excluded.

csv : str

Name of generated CSV file to run analysis in R.

path : str

The directory path to save the generated files.

effects : list

List of main effects.

is_html : bool

Whether html should be generated.

Returns:
model : rpy2.robjects.methods.RS4

Python representation of an R instance of class ‘S4’.

df_lmer : pandas.DataFrame

Pandas dataframe of model output.

get_lmer : str

R script to run model.

html : str

HTML output.

Notes

Resources
classmethod logistic(config, y, f, df, me, exclude, csv, path, is_html=True)[source]

Run logistic regression model, using rpy2, seaborn and pandas.

Parameters:
y : str

Response variable.

f : str

Formula to use for analysis.

df : pandas.DataFrame

Pandas dataframe of raw data.

me : list of str

List of main effects.

exclude : list

List of participants to be excluded.

csv : str

Name of generated CSV file to run analysis in R.

path : str

The directory path to save the generated files.

is_html : bool

Whether html should be generated.

Returns:
model : rpy2.robjects.methods.RS4

Python representation of an R instance of class ‘S4’.

df_logit : pandas.DataFrame

Pandas dataframe of model output.

get_logit : str

R script to run model.

html : str

HTML output.

Notes

Resources
class imhr.r33.Processing[source]

Bases: imhr.r33._processing.Processing

Hub for running processing and analyzing raw data.

Parameters:
**kwargs : str or None, optional

Optional properties to control how this class will run:

These properties control additional core parameters for the API:

Property Description
cores : int (if isMultiprocessing == True) Number of cores to use. Default is total available cores - 1.
isLibrary : bool Check if required packages have been installed. Default is False.
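
The stated default for cores (total available cores - 1) could be resolved along these lines. This is a hedged sketch: default_cores is a hypothetical helper, and treating a value of 0 as "use the default" is an assumption based on the cores=0 default in the preprocessing signature.

```python
import multiprocessing


def default_cores(cores=0):
    """Resolve the number of worker processes to use."""
    available = multiprocessing.cpu_count()
    if cores and cores > 0:
        # An explicit request is capped at what the machine offers.
        return min(cores, available)
    # Default: leave one core free, but never go below one worker.
    return max(available - 1, 1)


workers = default_cores()
```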

Methods

demographics(source, destination[, isHTML]) Output list of demographics for easy html viewing.
device(source, destination[, isHTML]) Output list of variables for easy html viewing.
html([df, raw_data, name, path, source, …]) Create HTML output.
preprocessing(source[, isMultiprocessing, cores]) Preprocessing data for formatting and initial calculations.
summary(source, destination[, metadata, isHTML]) Generate summary from online raw data.
variables(source, destination[, isHTML]) Output list of variables for easy html viewing.
classmethod demographics(source, destination, isHTML=True)[source]

Output list of demographics for easy html viewing.

Parameters:
source : str

Source path.

destination : str

Destination path.

isHTML : bool

Whether or not to export html.

Returns:
df_definitions : pandas.DataFrame

demographics output.

classmethod device(source, destination, isHTML=True)[source]

Output list of variables for easy html viewing.

Parameters:
source : str

Source path.

destination : str

Destination path.

isHTML : bool

Whether or not to export html.

Returns:
df_definitions : pandas.DataFrame

Variables output.

classmethod html(df=None, raw_data=None, name=None, path=None, source=None, figure_title=None, intro=None, footnote=None, script='', **kwargs)[source]

Create HTML output.

Parameters:
destination : str

Path to save file to.

df : pandas.DataFrame

Pandas dataframe of analysis results data.

raw_data : pandas.DataFrame

Pandas dataframe of raw data.

name : str

(If source is logit) The name of the CSV file created.

path : str

The directory path of the html file.

source : str

The type of data being received.

figure_title : str

The title of the table or figure.

intro : str

The introduction of the group of figures or tables.

footnote : str

The footnote of the table or figure.

metadata : dict

Additional data to be included.

**kwargs : str, int, or None, optional

Additional properties, relevant for specific content types. Here’s a list of available properties:

These properties control additional parameters for displaying figures:

Property Description
html_title : str HTML title. This is visible as a header for the html page.
metadata : str Additional data to be included.

While these properties control additional parameters for displaying bokeh plots:

Property Description
short, long : str Short (aoi) and long form (Area of Interest) label of html page. This is primarily used for constructing metadata tags in html.
display : str (For bokeh) The type of calibration/validation display.
trial : str (For bokeh) The trial number for the eyetracking task.
session : int (For bokeh) The session number for the eyetracking task.
day : str (For bokeh) The day the eyetracking task was run.
bokeh_type : str (If Bokeh) Control directory location. If trial, create trial plots.
Returns:
html : str

String of html code.

classmethod preprocessing(source, isMultiprocessing=False, cores=0)[source]

Preprocessing data for formatting and initial calculations. Steps include:

  • Dates converted to ISO format.
  • Variables formatted to camelCase.
  • Variable naming patterns made consistent (e.g. osFullName, browserFullName).
  • Calculations for stimulus onset error, DotLoc onset error, median response time, median stimulus onset error, and median DotLoc onset error.
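
A few of these steps can be sketched with the standard library alone. Everything here is illustrative: the helper names, the input date format, and the onset values are assumptions, and onset error is taken to mean actual minus expected onset time.

```python
import re
from datetime import datetime
from statistics import median


def to_camel_case(name):
    """Format a variable name as camelCase, e.g. 'os_full_name' -> 'osFullName'."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", name) if p]
    if not parts:
        return name
    first = parts[0][0].lower() + parts[0][1:]
    return first + "".join(p[0].upper() + p[1:] for p in parts[1:])


def to_iso(text, fmt="%m/%d/%Y %H:%M:%S"):
    """Convert a date string in the assumed input format to ISO 8601."""
    return datetime.strptime(text, fmt).isoformat()


columns = [to_camel_case(c) for c in ["os_full_name", "Browser Full Name"]]
iso_date = to_iso("05/01/2019 15:12:38")

# Onset error per trial in ms (made-up values), then its median.
expected_onset = [500, 500, 500]
actual_onset = [512, 498, 507]
onset_error = [a - e for a, e in zip(actual_onset, expected_onset)]
median_onset_error = median(onset_error)
```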

Parameters:
source : [type]

[description]

isMultiprocessing : bool, optional

[description], by default False

cores : int, optional

[description], by default 0

Returns:
df : pandas.DataFrame

Pandas dataframe output.

classmethod summary(source, destination, metadata=None, isHTML=True)[source]

Generate summary from online raw data.

Parameters:
source : str

Source path.

destination : str

Destination path.

metadata : str

Metadata path.

isHTML : bool

Whether or not to export html.

Returns:
df : pandas.DataFrame

Pandas dataframe output.

error : pandas.DataFrame

Pandas dataframe error output.

classmethod variables(source, destination, isHTML=True)[source]

Output list of variables for easy html viewing.

Parameters:
source : str

Source path.

destination : str

Destination path.

isHTML : bool

Whether or not to export html.

Returns:
df_definitions : pandas.DataFrame

Variables output.

class imhr.r33.Settings(isLibrary=False)[source]

Bases: imhr.r33._settings.Settings

Default settings for imhr.r33.Processing.

Parameters:
is_library : bool or list

Check if required libraries are available. Default False.

Methods

definitions(config) Store definitions.
classmethod definitions(config)[source]

Store definitions.

Parameters:
message : str

Log message.

source : str

Origin of call. Either debug or timestamp.

Returns:
config : dict

Returned dictionary

Examples

CESD Group
in-text: m_['short']['cesd_group'] = 'CESD Group'
title: m_['long']['cesd_group'] = 'CESD Group'
definition: m_['def']['cesd_group'] = "a binary measure of CESD score (between subjects; 'Low' (<16) and 'High' (≥16))"