imhr.r33

@purpose: Module designed for working with the R33 study.  
@date: Created on Sat May 1 15:12:38 2019  
@author: Semeon Risom  
@email: semeon.risom@gmail.com  
@url: https://semeon.io/d/imhr

Classes

`Classify`([isLibrary])	Analysis methods for imhr.processing.preprocesing.
`Metadata`([isLibrary])	Process participants metadata for analysis and export.
`Model`([isLibrary])	Run statistical models for analysis.
`Processing`()	Hub for running processing and analyzing raw data.
`Settings`([isLibrary])	Default settings for imhr.r33.Processing.

class imhr.r33.Classify(isLibrary=False)

Bases: imhr.r33._classify.Classify

Analysis methods for imhr.processing.preprocesing.

Parameters:	isLibrary : `bool` Check if required libraries are available. Default False.

Methods

`Acceleration`(time, data_x[, data_y])	Calculate the acceleration (deg/sec/sec) for data points in data_x and (optionally) data_y, using the time numpy array for time delta information.
`Velocity`(time, config, d_x[, d_y])	Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.
`VisualAngle`(g_x, g_y, config)	Convert pixel eye-coordinates to visual angle.
`hmm`(data, filter_type, config)	Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.
`idt`(data, dis_threshold, dur_threshold)	Identification with Dispersion Threshold.
`ivt`(data, v_threshold, config)	Identification with Velocity Threshold.
`savitzky_golay`(y, window_size, order[, …])	Smooth (and optionally differentiate) data with a Savitzky-Golay filter.
`simple`(df, missing, maxdist, mindur)	Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

classmethod Acceleration(time, data_x, data_y=None)

Calculate the acceleration (deg/sec/sec) for data points in data_x and (optionally) data_y, using the time numpy array for time delta information.

Parameters:	time : `numpy.ndarray` Timestamp of each coordinate. data_x,data_y : `numpy.ndarray` List of gaze coordinates.

classmethod Velocity(time, config, d_x, d_y=None)

Calculate the instantaneous velocity (degrees / second) for data points in d_x and (optionally) d_y, using the time numpy array for time delta information.

Parameters:	time : `numpy.ndarray` Timestamp of each coordinate. d_x,d_y : `numpy.ndarray` List of gaze coordinates. config : `dict` Configuration data for data analysis. i.e. trial number, location.

Notes

Numpy arrays time, d_x, and d_y must all be 1D arrays of the same length. If both d_x and d_y are provided, the Euclidean distance between each pair of consecutive points is calculated and used in the velocity calculation. Time must be in sec.msec format (seconds, with milliseconds as the fractional part), while d_x and d_y are expected to be in visual degrees. If the position traces are in pixel coordinate space, use the VisualAngle method to convert the data into degrees.
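
The calculation described in these notes can be sketched as follows. This is a minimal illustration, not the module's implementation: `velocity_sketch` is a hypothetical helper, it assumes timestamps in seconds and positions already in visual degrees, and it leaves out the `config` argument.

```python
import numpy as np

def velocity_sketch(time, d_x, d_y=None):
    """Hypothetical helper: velocity (degrees / second) between consecutive samples."""
    time = np.asarray(time, dtype=float)
    d_x = np.asarray(d_x, dtype=float)
    if d_y is None:
        # Only one axis supplied: use the absolute horizontal displacement.
        dist = np.abs(np.diff(d_x))
    else:
        # Euclidean distance between consecutive (x, y) samples.
        dist = np.hypot(np.diff(d_x), np.diff(np.asarray(d_y, dtype=float)))
    # Divide by the time delta; the result has len(time) - 1 entries.
    return dist / np.diff(time)

# Illustrative data: three samples, 10 ms apart, positions in degrees.
t = np.array([0.000, 0.010, 0.020])
x = np.array([0.0, 0.3, 0.3])
y = np.array([0.0, 0.4, 0.4])
v = velocity_sketch(t, x, y)
```

The output has one fewer element than the input because each value describes the interval between two samples.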

classmethod VisualAngle(g_x, g_y, config)

Convert pixel eye-coordinates to visual angle.

Parameters:	g_x,g_y : `numpy.ndarray` List of gaze coordinates. drift : `dict` Counter of drift correct runs. config : `dict` Configuration data for data analysis. i.e. trial number, location.

Notes

Stimulus positions (g_x,g_y) are defined in x and y pixel units, with the origin (0,0) at the center of the display, to match the PsychoPy pix unit coord type.
The pix2deg method is vectorized, meaning that it will perform the pixel-to-angle calculations on all elements of the provided pixel position numpy arrays in one numpy call.
The conversion process can use either a fixed eye-to-calibration-plane distance or a numpy array of eye distances passed as eye_distance_mm. In the latter case, the eye distance array must be the same length as the g_x and g_y arrays.
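
A vectorized pixel-to-degree conversion along these lines might look like the sketch below. It is an assumption-laden illustration rather than the method's actual code: `pix_to_deg_sketch` and its `mm_per_pix` argument are hypothetical, and each axis is converted independently with `arctan2`, which is one common approximation.

```python
import numpy as np

def pix_to_deg_sketch(g_x, g_y, mm_per_pix, eye_distance_mm):
    """Hypothetical helper: centered pixel coordinates to visual degrees."""
    g_x = np.asarray(g_x, dtype=float)
    g_y = np.asarray(g_y, dtype=float)
    # eye_distance_mm may be a scalar or an array the same length as g_x, g_y;
    # NumPy broadcasting handles both cases in a single vectorized call.
    dist = np.asarray(eye_distance_mm, dtype=float)
    deg_x = np.degrees(np.arctan2(g_x * mm_per_pix, dist))
    deg_y = np.degrees(np.arctan2(g_y * mm_per_pix, dist))
    return deg_x, deg_y

# Assumed display geometry: 0.25 mm per pixel, viewed from 600 mm.
deg_x, deg_y = pix_to_deg_sketch([0, 2400], [0, -2400], 0.25, 600.0)
```

With these assumed values, a point 2400 pixels from the center lies 600 mm off-axis, which at a 600 mm viewing distance corresponds to 45 degrees.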

classmethod hmm(data, filter_type, config)

Hidden Markov Model, adapted from https://gitlab.com/nslr/nslr-hmm.

Parameters:	data : `pandas.DataFrame` Pandas dataframe of x,y and timestamp positions. filter_type : `dict` Types of filters to use. config : `dict` Configuration data for data analysis. i.e. trial number, location.
Attributes:	data : `numpy.ndarray` The smoothed signal (or its n-th derivative). dr_th : `str` Data threshold.

Notes

Definitions

Saccade: The saccade is a ballistic movement, meaning it is pre-programmed and does not change once it has started. Saccades of amplitude 40° peak at velocities of 300–600°/s and last for 80–150 ms.
Fixation: The point between any two saccades, during which the eyes are relatively stationary and virtually all visual input occurs. Regular eye movement alternates between saccades and visual fixations, the notable exception being in smooth pursuit.
Smooth pursuit: Smooth pursuit movements are much slower tracking movements of the eyes designed to keep a moving stimulus on the fovea. Such movements are under voluntary control in the sense that the observer can choose whether or not to track a moving stimulus. (Neuroscience 2nd edition).

References

[1]	Pekkanen, J., & Lappi, O. (2017). A new and general approach to signal denoising and eye movement classification based on segmented linear regression. Scientific Reports, 7(1). doi:10.1038/s41598-017-17983-x.

classmethod idt(data, dis_threshold, dur_threshold)

Identification with Dispersion Threshold.

Parameters:	data : `numpy.ndarray` The smoothed signal (or its n-th derivative). dis_threshold : `str` Dispersion threshold in pixels. dur_threshold : `str` Fixation duration threshold in msec.
Returns:	ys : `numpy.ndarray` The smoothed signal (or its n-th derivative).

Notes

The I-DT algorithm has two parameters: a dispersion threshold and the length of a time window in which the dispersion is calculated. The length of the time window is often set to the minimum duration of a fixation, which is around 100-200 ms.
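
The two-parameter procedure described above can be sketched as follows. This is a generic illustration of dispersion-threshold identification, not this method's code: `idt_sketch` is a hypothetical helper that takes separate time, x, and y arrays, and it defines dispersion as (max x - min x) + (max y - min y).

```python
import numpy as np

def idt_sketch(t, x, y, dis_threshold, dur_threshold):
    """Hypothetical I-DT: returns a list of (start_time, end_time) fixations."""
    t, x, y = (np.asarray(a, dtype=float) for a in (t, x, y))

    def dispersion(i, j):
        # Dispersion of samples i..j inclusive.
        xs, ys = x[i:j + 1], y[i:j + 1]
        return (xs.max() - xs.min()) + (ys.max() - ys.min())

    fixations, start, n = [], 0, len(t)
    while start < n:
        # Smallest window beginning at `start` that spans the duration threshold.
        end = start
        while end < n and t[end] - t[start] < dur_threshold:
            end += 1
        if end >= n:
            break  # not enough data left to fill a window
        if dispersion(start, end) <= dis_threshold:
            # Grow the window while dispersion stays within the threshold.
            while end + 1 < n and dispersion(start, end + 1) <= dis_threshold:
                end += 1
            fixations.append((float(t[start]), float(t[end])))
            start = end + 1
        else:
            start += 1  # drop the first sample and try again
    return fixations

# Illustrative data: samples every 50 ms, with a jump in x after the fourth sample.
t = [0, 50, 100, 150, 200, 250, 300, 350]
x = [100, 101, 100, 102, 300, 301, 300, 302]
y = [100] * 8
fixations = idt_sketch(t, x, y, dis_threshold=10, dur_threshold=100)
```

Here the time window is set to a 100 ms minimum fixation duration, the lower end of the range mentioned above.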

classmethod ivt(data, v_threshold, config)

Identification with Velocity Threshold.

In the I-VT model, the velocity value is computed for every eye position sample. The velocity value is then compared to the threshold. If the sampled velocity is less than the threshold, the corresponding eye-position sample is marked as part of a fixation, otherwise it is marked as a part of a saccade.

Parameters:	data : `numpy.ndarray`, shape (N,) The smoothed signal (or its n-th derivative). v_threshold : `str` Velocity threshold in pix/sec. config : `dict` Configuration data for data analysis. i.e. trial number, location.
Returns:	ys : `numpy.ndarray`, shape (N,) The smoothed signal (or its n-th derivative).

Notes

From https://github.com/ecekt/eyegaze. Formula from: https://dl.acm.org/citation.cfm?id=355028
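
The thresholding rule described for I-VT can be sketched in a few lines. This is a simplified illustration, not this method's code: `ivt_sketch` is a hypothetical helper, it takes separate time (in seconds), x, and y arrays, and it labels each interval between consecutive samples rather than each sample.

```python
import numpy as np

def ivt_sketch(time, x, y, v_threshold):
    """Hypothetical I-VT: label each inter-sample interval by its velocity."""
    time, x, y = (np.asarray(a, dtype=float) for a in (time, x, y))
    # Point-to-point velocity (pixels / second) between consecutive samples.
    velocity = np.hypot(np.diff(x), np.diff(y)) / np.diff(time)
    # Below the threshold is a fixation; anything else is a saccade.
    return np.where(velocity < v_threshold, "fixation", "saccade")

labels = ivt_sketch(
    time=[0.00, 0.01, 0.02, 0.03],
    x=[100, 101, 160, 161],
    y=[100, 100, 100, 100],
    v_threshold=1000,  # pixels / second, an arbitrary illustrative value
)
```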

classmethod savitzky_golay(y, window_size, order, deriv=0, rate=1)

Smooth (and optionally differentiate) data with a Savitzky-Golay filter.

The Savitzky-Golay filter removes high frequency noise from data. It has the advantage of preserving the original shape and features of the signal better than other types of filtering approaches, such as moving averages techniques.

Parameters:	y : `numpy.ndarray`, shape (N,) The values of the time history of the signal. window_size : `int` The length of the window. Must be an odd integer number. order : `int` The order of the polynomial used in the filtering. Must be less than window_size - 1. deriv : `int` The order of the derivative to compute (default = 0 means only smoothing).
Returns:	ys : `numpy.ndarray`, shape (N,) The smoothed signal (or its n-th derivative).

Notes

The Savitzky-Golay filter is a type of low-pass filter, particularly suited for smoothing noisy data. The main idea behind this approach is to make, for each point, a least-squares fit with a polynomial of high order over an odd-sized window centered at the point. For more information, see: http://wiki.scipy.org/Cookbook/SavitzkyGolay.
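
The per-point least-squares idea can be illustrated directly with `numpy.polyfit`. This is a deliberately slow sketch for clarity, not the implementation used by this method: `local_polyfit_smooth` is a hypothetical helper, and it leaves the edge samples unsmoothed.

```python
import numpy as np

def local_polyfit_smooth(y, window_size, order):
    """Hypothetical helper: smooth interior points with a local polynomial fit."""
    y = np.asarray(y, dtype=float)
    half = window_size // 2
    offsets = np.arange(-half, half + 1)  # window positions relative to the center
    out = y.copy()                        # edge samples are copied through unchanged
    for i in range(half, len(y) - half):
        # Least-squares polynomial fit over the window centered at i.
        coeffs = np.polyfit(offsets, y[i - half:i + half + 1], order)
        # The constant term is the fitted value at the window center (offset 0).
        out[i] = coeffs[-1]
    return out

# A cubic is fitted exactly by an order-3 polynomial, so smoothing leaves it unchanged.
t = np.linspace(-1, 1, 21)
cubic = t**3 - t
smoothed = local_polyfit_smooth(cubic, window_size=7, order=3)
```

Because the fit reproduces any polynomial up to the chosen order, signal features of that shape are preserved while higher-frequency noise is averaged out.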

Examples

>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> t = np.linspace(-4, 4, 500)
>>> y = np.exp( -t**2 ) + np.random.normal(0, 0.05, t.shape)
>>> ysg = savitzky_golay(y, window_size=31, order=4)
>>> plt.plot(t, y, label='Noisy signal')
>>> plt.plot(t, np.exp(-t**2), 'k', lw=1.5, label='Original signal')
>>> plt.plot(t, ysg, 'r', label='Filtered signal')
>>> plt.legend()
>>> plt.show()

References

[1]	A. Savitzky, M. J. E. Golay (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36(8), pp 1627-1639.

[2]	W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery. Numerical Recipes 3rd Edition: The Art of Scientific Computing. Cambridge University Press. ISBN-13: 9780521880688.

classmethod simple(df, missing, maxdist, mindur)

Detects fixations, defined as consecutive samples with an inter-sample distance of less than a set amount of pixels (disregarding missing data).

Parameters:	df : `pandas.DataFrame` Pandas dataframe of x,y and timestamp positions. missing : `str` Value to be used for missing data (default = 0.0). maxdist : `str` Maximal inter-sample distance in pixels (default = 25). mindur : `str` Minimal duration of a fixation in milliseconds; detected fixation candidates will be disregarded if they are below this duration (default = 100).
Returns:	Sfix : `numpy.ndarray` shape (N) list of lists, each containing [starttime] Efix : `numpy.ndarray` shape (N) list of lists, each containing [starttime, endtime, duration, endx, endy]

Notes

From https://github.com/esdalmaijer/PyGazeAnalyser/blob/master/pygazeanalyser/detectors.py
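
The detection rule can be sketched as below. This is a simplified reading of the description above and not the PyGazeAnalyser code: `simple_fixations_sketch` is a hypothetical helper that takes a list of (time, x, y) tuples instead of a dataframe and compares each sample with the previous valid one.

```python
import math

def _close_run(run, mindur, fixations):
    # Keep the candidate only if it lasts at least `mindur` milliseconds.
    if run and run[-1][0] - run[0][0] >= mindur:
        start, end = run[0][0], run[-1][0]
        fixations.append([start, end, end - start, run[-1][1], run[-1][2]])

def simple_fixations_sketch(samples, missing=0.0, maxdist=25, mindur=100):
    """Hypothetical helper. samples: iterable of (time_ms, x, y) tuples.

    Returns a list of [starttime, endtime, duration, endx, endy].
    """
    # Disregard samples whose x and y both equal the missing-data value.
    valid = [s for s in samples if not (s[1] == missing and s[2] == missing)]
    fixations, run = [], []
    for sample in valid:
        if run:
            dist = math.hypot(sample[1] - run[-1][1], sample[2] - run[-1][2])
            if dist >= maxdist:
                # The gaze moved too far: close the current candidate fixation.
                _close_run(run, mindur, fixations)
                run = []
        run.append(sample)
    _close_run(run, mindur, fixations)
    return fixations

samples = [
    (0, 100, 100), (50, 102, 101), (100, 101, 99),
    (150, 0.0, 0.0),                 # missing sample, skipped
    (200, 400, 300), (250, 401, 301),
]
fixations = simple_fixations_sketch(samples, mindur=100)
```

In this illustrative data the first cluster lasts 100 ms and is kept, while the second lasts only 50 ms and is discarded.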

class imhr.r33.Metadata(isLibrary=False)

Bases: imhr.r33._metadata.Metadata

Process participants metadata for analysis and export.

Parameters:	isLibrary : `bool` Check if required libraries are available. Default False.

Methods

`predict`(df)	Predicting screen size (cm), device (e.g. MacBook 2018).
`summary`(df, path)	Preparing data for use in analysis.

classmethod predict(df)

Predicting screen size (cm), device (e.g. MacBook 2018).

Parameters:	df : `pandas.DataFrame` Pandas dataframe of raw data.
Returns:	df : `pandas.DataFrame` Pandas dataframe of raw data.

classmethod summary(df, path)

Preparing data for use in analysis.

Parameters:	df : `pandas.DataFrame` Pandas dataframe of raw data. path : `str` The directory path of the subject data.
Attributes:	path : `str` Specific directory path used.
Returns:	df : `pandas.DataFrame` Pandas dataframe of processed metadata.

Notes

You can either get data from all files within a directory (directory), or from a specific subject (subject_session).

Examples

>>> #if using path:
>>> df = getData(path=self.config['path'])

>>> #if getting data for single subject:
>>> df = getData(path=self.config['path'],subject_session=['1099','1', '0'])

class imhr.r33.Model(isLibrary=False)

Bases: imhr.r33._model.Model

Run statistical models for analysis.

Parameters:	isLibrary : `bool` Check if required libraries are available. Default False.

Methods

`anova`(config, y, f, df, csv, path, effects)	Run analysis of variance model using rpy2, seaborn and pandas.
`lmer`(config, y, f, df, exclude, csv, path, …)	Run linear mixed regression model, using rpy2, seaborn and pandas.
`logistic`(config, y, f, df, me, exclude, csv, …)	Run logistic regression model, using rpy2, seaborn and pandas.

classmethod anova(config, y, f, df, csv, path, effects, is_html=True)

Run analysis of variance model using rpy2, seaborn and pandas.

Parameters:	y : `str` Response variable. f : `str` R-compatible formula to use for analysis. df : `pandas.DataFrame` Pandas dataframe of raw data. csv : `str` Name of generated CSV file to run analysis in R. path : `str` The directory path to save the generated files. effects : `list` List of main effects. is_html : `bool` Whether html should be generated.
Returns:	model : rpy2.robjects.methods.RS4 Python representation of an R instance of class ‘S4’. df_anova : `pandas.DataFrame` Pandas dataframe of model output. get_anova : `str` R script to run model. html : `str` HTML output.

Notes

Resources

Definition:

A test that allows one to make comparisons between the means of multiple groups of data, where two independent variables are considered.

Assumptions of ANOVA

Normal distribution (normality)
- Short: Samples are drawn from a normally distributed population (Q-Q plot, Shapiro-Wilk test).
- Detailed: Residuals in data are normally distributed.
Homogeneity of variance (homoscedasticity)
- Short: Variances are equal (or similar).
- Detailed: Variance for a DV is constant across the sample (residual vs fitted plot, Scale-Location plot, Levene’s test).
Independent observations
- Samples have been drawn independently of each other. No analysis needed.

Hypothesis Interpretation

Null: The means of all levels of an IV are equal.
Alternative: The mean of at least one level of an IV is different.
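
To make the hypotheses concrete, the sketch below computes an F statistic for a single IV by hand. It is a simplified, one-way illustration with made-up numbers and is not what this method runs (the method fits its model in R through rpy2); `one_way_f_sketch` is a hypothetical helper.

```python
import numpy as np

def one_way_f_sketch(groups):
    """Hypothetical helper: F statistic for one IV with several levels."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.concatenate(groups).mean()
    n_total = sum(len(g) for g in groups)
    # Between-level variability: how far each level mean sits from the grand mean.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-level variability: spread of observations around their own level mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three levels of one IV, three observations each (illustrative values only).
f_stat = one_way_f_sketch([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom is evidence against the null hypothesis that all level means are equal.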

classmethod lmer(config, y, f, df, exclude, csv, path, effects, is_html=True)

Run linear mixed regression model, using rpy2, seaborn and pandas.

Parameters:	y : `str` Response variable. f : `str` or `list` of `str` R-compatible formula to use for analysis. df : `pandas.DataFrame` Pandas dataframe of raw data. exclude : `list` List of participants to be excluded. csv : `str` Name of generated CSV file to run analysis in R. path : `str` The directory path to save the generated files. effects : `list` List of main effects. is_html : `bool` Whether html should be generated.
Returns:	model : rpy2.robjects.methods.RS4 Python representation of an R instance of class ‘S4’. df_lmer : `pandas.DataFrame` Pandas dataframe of model output. get_lmer : `str` R script to run model. html : `str` HTML output.

Notes

Resources

classmethod logistic(config, y, f, df, me, exclude, csv, path, is_html=True)

Run logistic regression model, using rpy2, seaborn and pandas.

Parameters:	y : `str` Response variable. f : `str` Formula to use for analysis. df : `pandas.DataFrame` Pandas dataframe of raw data. me : `list` of `str` List of main effects. exclude : `list` List of participants to be excluded. csv : `str` Name of generated CSV file to run analysis in R. path : `str` The directory path to save the generated files. is_html : `bool` Whether html should be generated.
Returns:	model : rpy2.robjects.methods.RS4 Python representation of an R instance of class ‘S4’. df_logit : `pandas.DataFrame` Pandas dataframe of model output. get_logit : `str` R script to run model. html : `str` HTML output.

Notes

Resources

class imhr.r33.Processing

Bases: imhr.r33._processing.Processing

Hub for running processing and analyzing raw data.

Parameters:

**kwargs : str or None, optional

Optional properties that control how this class will run:

Property	Description
cores : `int`	(if isMultiprocessing == True) Number of cores to use. Default is total available cores - 1.
isLibrary : `bool`	Check if required packages have been installed. Default is False.
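
The default for `cores` described in the table can be sketched as below. This is an assumption about how such a default might be resolved, not the class's actual logic: `default_cores_sketch` is a hypothetical helper, and treating 0 as "use the default" is a guess.

```python
import multiprocessing

def default_cores_sketch(cores=0):
    """Hypothetical helper: resolve the number of worker processes."""
    if cores and cores > 0:
        # An explicit positive value is used as given.
        return cores
    # Leave one core free, but never drop below a single worker.
    return max(multiprocessing.cpu_count() - 1, 1)

n_workers = default_cores_sketch()
```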

Methods

`demographics`(source, destination[, isHTML])	Output list of demographics for easy html viewing.
`device`(source, destination[, isHTML])	Output list of variables for easy html viewing.
`html`([df, raw_data, name, path, source, …])	Create HTML output.
`preprocessing`(source[, isMultiprocessing, cores])	Preprocessing data for formatting and initial calculations.
`summary`(source, destination[, metadata, isHTML])	Generate summary from online raw data.
`variables`(source, destination[, isHTML])	Output list of variables for easy html viewing.

classmethod demographics(source, destination, isHTML=True)

Output list of demographics for easy html viewing.

Parameters:	source : `str` Source path. destination : `str` Destination path. isHTML : `bool` Whether or not to export html.
Returns:	df_definitions : `pandas.DataFrame` demographics output.

classmethod device(source, destination, isHTML=True)

Output list of variables for easy html viewing.

Parameters:	source : `str` Source path. destination : `str` Destination path. isHTML : `bool` Whether or not to export html.
Returns:	df_definitions : `pandas.DataFrame` Variables output.

classmethod html(df=None, raw_data=None, name=None, path=None, source=None, figure_title=None, intro=None, footnote=None, script='', **kwargs)

Create HTML output.

Parameters:

destination : str

Path to save file to.

df : pandas.DataFrame

Pandas dataframe of analysis results data.

raw_data : pandas.DataFrame

Pandas dataframe of raw data.

name : str

(If source is logit) The name of csv file created.

path : str

The directory path of the html file.

source : str

The type of data being received.

figure_title : str

The title of the table or figure.

intro : str

The introduction of the group of figures or tables.

footnote : str

The footnote of the table or figure.

metadata : dict

Additional data to be included.

**kwargs : str, int, or None, optional

Additional properties, relevant for specific content types. Here’s a list of available properties:

These properties control additional parameters for displaying figures:

Property	Description
html_title : `str`	HTML title. This is visible as a header for the html page.
metadata : `str`	Additional data to be included.

While these properties control additional parameters for displaying bokeh plots:

Property	Description
short, long : `str`	Short (aoi) and long form (Area of Interest) label of html page. This is primarily used for constructing metadata tags in html.
display : `str`	(For bokeh) The type of calibration/validation display.
trial : `str`	(For bokeh) The trial number for the eyetracking task.
session : `int`	(For bokeh) The session number for the eyetracking task.
day : `str`	(For bokeh) The day the eyetracking task was run.
bokeh_type : `str`	(If Bokeh) Control directory location. If trial, create trial plots.

Returns:

html : str: String of html code.

classmethod preprocessing(source, isMultiprocessing=False, cores=0)

Preprocessing data for formatting and initial calculations. Steps include:
- Dates converted to ISO format.
- Variables formatted to camelCase.
- Variable naming patterns made consistent (e.g. osFullName, browserFullName).
- Calculations for stimulus onset error, DotLoc onset error, median response time, median stimulus onset error, median DotLoc onset error.
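
A few of the formatting steps listed above can be sketched with the standard library alone. This is only an illustration of the kind of transformation involved: the helper names, the snake_case input columns, and the `%m/%d/%Y` input date format are all assumptions, not details of this method.

```python
import re
from datetime import datetime
from statistics import median

def to_camel_case(name):
    """Convert snake_case or space-separated names to camelCase."""
    parts = [p for p in re.split(r"[_\s]+", name.strip()) if p]
    return parts[0].lower() + "".join(p[:1].upper() + p[1:] for p in parts[1:])

def to_iso_date(value, fmt="%m/%d/%Y"):
    """Parse a date string in an assumed input format and return ISO 8601."""
    return datetime.strptime(value, fmt).date().isoformat()

# Hypothetical raw rows; the column names and values are made up.
rows = [
    {"os_full_name": "macOS", "date": "05/01/2019", "response_time": 512},
    {"os_full_name": "macOS", "date": "05/01/2019", "response_time": 430},
    {"os_full_name": "macOS", "date": "05/01/2019", "response_time": 498},
]
# Rename every column to camelCase and convert the date column to ISO format.
clean = [
    {to_camel_case(k): (to_iso_date(v) if k == "date" else v) for k, v in row.items()}
    for row in rows
]
# Median response time across the cleaned rows.
median_rt = median(r["responseTime"] for r in clean)
```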

Parameters:	source : `str` Source path. isMultiprocessing : `bool`, optional Whether to use multiprocessing. Default False. cores : `int`, optional (if isMultiprocessing == True) Number of cores to use. Default 0.
Returns:	df : `pandas.DataFrame` Pandas dataframe output.

classmethod summary(source, destination, metadata=None, isHTML=True)

Generate summary from online raw data.

Parameters:	source : `str` Source path. destination : `str` Destination path. metadata : `str` Metadata path. isHTML : `bool` Whether or not to export html.
Returns:	df : `pandas.DataFrame` Pandas dataframe output. error : `pandas.DataFrame` Pandas dataframe error output.

classmethod variables(source, destination, isHTML=True)

Output list of variables for easy html viewing.

Parameters:	source : `str` Source path. destination : `str` Destination path. isHTML : `bool` Whether or not to export html.
Returns:	df_definitions : `pandas.DataFrame` Variables output.

class imhr.r33.Settings(isLibrary=False)

Bases: imhr.r33._settings.Settings

Default settings for imhr.r33.Processing.

Parameters:	isLibrary : `bool` or list Check if required libraries are available. Default False.

Methods

`definitions`(config)	Store definitions.

classmethod definitions(config)

Store definitions.

Parameters:	message : `str` Log message. source : `str` Origin of call. Either debug or timestamp.
Returns:	config : `dict` Returned dictionary

Examples

CESD Group:
in-text: m_['short']['cesd_group'] = 'CESD Group'
title: m_['long']['cesd_group'] = 'CESD Group'
definition: m_['def']['cesd_group'] = "a binary measure of CESD score (between subjects; 'Low' (<16) and 'High' (≥16))"