Subsampling

Functions for subsampling datasets.

The functions in this module subsample either dHdl or u_nk datasets to obtain less correlated timeseries.

API Reference

alchemlyb.preprocessing.subsampling.slicing(df, lower=None, upper=None, step=None, force=False)

Subsample a DataFrame using simple slicing.

Parameters:
  • df (DataFrame) – DataFrame to subsample.
  • lower (float) – Lower time to slice from.
  • upper (float) – Upper time to slice to (inclusive).
  • step (int) – Step between rows to slice by.
  • force (bool) – If True, ignore checks that the DataFrame is in the proper form for expected behavior.
Returns:

df subsampled.

Return type:

DataFrame
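
A minimal pure-Python sketch of the slicing semantics described above, assuming each row is a (time, value) tuple standing in for a DataFrame row. `slice_rows` is a hypothetical helper, not part of alchemlyb. It treats `lower` as inclusive as well (an assumption, matching pandas label-based `.loc` slicing), and its sortedness check is only an illustrative stand-in for the library's own form checks that `force` disables.

```python
from typing import List, Optional, Sequence, Tuple

# A stand-in for one DataFrame row: (time, value).
Row = Tuple[float, float]


def slice_rows(
    rows: Sequence[Row],
    lower: Optional[float] = None,
    upper: Optional[float] = None,
    step: Optional[int] = None,
    force: bool = False,
) -> List[Row]:
    """Keep rows with lower <= time <= upper, then every step-th of those rows."""
    times = [t for t, _ in rows]
    # Illustrative check only (hypothetical): times must strictly increase.
    if not force and any(b <= a for a, b in zip(times, times[1:])):
        raise ValueError("rows must be sorted by strictly increasing time")
    window = [
        row
        for row in rows
        if (lower is None or row[0] >= lower) and (upper is None or row[0] <= upper)
    ]
    # step counts rows, not time units, so it is applied after the time window.
    return window[::step]
```

For example, with rows at times 0 through 9, `slice_rows(rows, lower=2, upper=7, step=2)` keeps the rows at times 2, 4 and 6.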

alchemlyb.preprocessing.subsampling.statistical_inefficiency(df, series=None, lower=None, upper=None, step=None)

Subsample a DataFrame based on the calculated statistical inefficiency of a timeseries.

If series is None, this function behaves the same as slicing().

Parameters:
  • df (DataFrame) – DataFrame to subsample according to the statistical inefficiency of series.
  • series (Series) – Series to use for calculating statistical inefficiency. If None, no statistical inefficiency-based subsampling will be performed.
  • lower (float) – Lower bound to pre-slice series data from.
  • upper (float) – Upper bound to pre-slice series to (inclusive).
  • step (int) – Step between series items to pre-slice by.
Returns:

df subsampled according to subsampled series.

Return type:

DataFrame

See also

pymbar.timeseries.statisticalInefficiency() – detailed background

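The sketch below illustrates what statistical-inefficiency subsampling does, using a simplified re-implementation of the estimator behind pymbar's statisticalInefficiency() (no FFT or fast-lag options). The function names are hypothetical, the series is assumed to be aligned position by position with the rows, and the lower/upper/step pre-slicing is omitted. It is not alchemlyb's code; the See also entry points to pymbar for the actual estimator.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def statistical_inefficiency_sketch(x: Sequence[float], mintime: int = 3) -> float:
    """Estimate g = 1 + 2 * sum_t (1 - t/N) * C(t) for a 1-D series.

    The sum stops at the first lag t > mintime where the normalized
    autocorrelation C(t) is <= 0; g is clipped to be at least 1.
    """
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    var = sum(d * d for d in dx) / n
    if var == 0.0:
        raise ValueError("series has zero variance")
    g = 1.0
    for t in range(1, n - 1):
        # Normalized fluctuation autocorrelation at lag t.
        c = sum(dx[i] * dx[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0 and t > mintime:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def subsample_by_inefficiency(rows: Sequence[T], series: Sequence[float]) -> List[T]:
    """Keep rows at positions round(k * g), k = 0, 1, ..., assuming rows[i] pairs with series[i]."""
    if len(rows) != len(series):
        raise ValueError("rows and series must be the same length")
    g = statistical_inefficiency_sketch(series)
    kept: List[T] = []
    last, k = -1, 0
    while int(round(k * g)) < len(rows):
        i = int(round(k * g))
        if i != last:  # skip duplicate positions produced by rounding
            kept.append(rows[i])
            last = i
        k += 1
    return kept
```

Rounding k·g, rather than striding by int(g), keeps the average spacing between kept rows close to g when g is not an integer.
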
alchemlyb.preprocessing.subsampling.equilibrium_detection(df, series=None, lower=None, upper=None, step=None)

Subsample a DataFrame using automated equilibrium detection on a timeseries.

If series is None, this function behaves the same as slicing().

Parameters:
  • df (DataFrame) – DataFrame to subsample according to equilibrium detection on series.
  • series (Series) – Series to detect equilibration on. If None, no equilibrium detection-based subsampling will be performed.
  • lower (float) – Lower bound to pre-slice series data from.
  • upper (float) – Upper bound to pre-slice series to (inclusive).
  • step (int) – Step between series items to pre-slice by.
Returns:

df subsampled according to subsampled series.

Return type:

DataFrame

See also

pymbar.timeseries.detectEquilibration() – detailed background
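
A simplified sketch of automated equilibrium detection in the spirit of pymbar.timeseries.detectEquilibration(): for each candidate start t0, estimate the statistical inefficiency g on the tail series[t0:], pick the t0 that maximizes the effective sample count (N - t0)/g, and then subsample the remaining region with stride g. The names are hypothetical, the exact effective-sample formula is an assumption of this sketch, and the block includes its own simplified estimator (no FFT, lag step 1) so it runs on its own. It is not alchemlyb's or pymbar's implementation.

```python
from typing import List, Sequence, Tuple


def _stat_inefficiency(x: Sequence[float], mintime: int = 3) -> float:
    # Simplified estimator: g = 1 + 2 * sum_t (1 - t/N) * C(t), clipped to >= 1.
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    var = sum(d * d for d in dx) / n
    if var == 0.0:
        raise ValueError("series has zero variance")
    g = 1.0
    for t in range(1, n - 1):
        c = sum(dx[i] * dx[i + t] for i in range(n - t)) / ((n - t) * var)
        if c <= 0.0 and t > mintime:
            break
        g += 2.0 * c * (1.0 - t / n)
    return max(g, 1.0)


def detect_equilibration_sketch(x: Sequence[float]) -> Tuple[int, float, List[int]]:
    """Return (t0, g, indices) where t0 maximizes N_eff = (N - t0) / g(t0)."""
    n = len(x)
    best_t0, best_g, best_neff = 0, 1.0, -1.0
    for t0 in range(n - 1):  # every tail keeps at least two samples
        try:
            g = _stat_inefficiency(x[t0:])
        except ValueError:  # constant tail: skip it in this sketch
            continue
        neff = (n - t0) / g
        if neff > best_neff:  # strict '>' keeps the earliest t0 on ties
            best_t0, best_g, best_neff = t0, g, neff
    # Subsample the production region [t0, N) at positions t0 + round(k * g).
    indices: List[int] = []
    k = 0
    while best_t0 + int(round(k * best_g)) < n:
        i = best_t0 + int(round(k * best_g))
        if not indices or i != indices[-1]:
            indices.append(i)
        k += 1
    return best_t0, best_g, indices
```

Scanning every candidate t0 makes this sketch expensive for long series; it is meant to show the selection rule, not to be efficient.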