Convergence API Reference
Functions for assessing convergence of free energy estimates and raw data.
The alchemlyb.convergence.convergence module contains building blocks that each perform a specific convergence analysis. They typically operate on lists of raw data and either run estimators on these data sets to obtain free energies as a function of the amount of data, or directly assess the convergence of the raw data.
Note
Read the original literature to learn the exact meaning of parameters and how to interpret the output of the convergence analysis.
All convergence functions are located in this submodule, but for convenience they are also made available from alchemlyb.convergence, as shown here:
- alchemlyb.convergence.forward_backward_convergence(df_list, estimator='MBAR', num=10, error_tol: float = 3, **kwargs)
Forward and backward convergence of the free energy estimate.
Generate the free energy estimate as a function of time in both directions, evaluated at the specified number of equally spaced points in time [Klimovich2015]. For example, setting num to 10 gives the forward convergence as the free energy estimates from the first 10%, 20%, 30%, … of the data, and the backward convergence as the estimates from the last 10%, 20%, 30%, … of the data.
- Parameters:
df_list (list) – List of DataFrames of either dHdl or u_nk, each representing a different value of lambda.
estimator ({'MBAR', 'BAR', 'TI'}) –
Name of the estimator.
Deprecated since version 1.0.0: Lower case input is also accepted until release 2.0.0.
num (int) – The number of blocks used to divide each DataFrame; blocks are added progressively to assess convergence. Note that if the DataFrames have different lengths, the number of samples contributed by each block will differ between them.
error_tol (float) –
The maximum tolerated analytic error. If the analytic error exceeds this tolerance, the bootstrap error is used instead.
Added in version 2.3.0.
Changed in version 2.4.0: Clarified docstring, removed incorrect estimation of std for cumulative result in bar and added check that only a single lambda state is represented in the indices of each df in df_list.
kwargs (dict) – Keyword arguments to be passed to the estimator.
- Returns:
The DataFrame with convergence data.
    Forward  Forward_Error  Backward  Backward_Error  data_fraction
0  3.016442       0.052748  3.065176        0.051036            0.1
1  3.078106       0.037170  3.078567        0.036640            0.2
2  3.072561       0.030186  3.047357        0.029775            0.3
3  3.048325       0.026070  3.057527        0.025743            0.4
4  3.049769       0.023359  3.037454        0.023001            0.5
5  3.034078       0.021260  3.040484        0.021075            0.6
6  3.043274       0.019642  3.032495        0.019517            0.7
7  3.035460       0.018340  3.036670        0.018261            0.8
8  3.042032       0.017319  3.046597        0.017233            0.9
9  3.044149       0.016405  3.044385        0.016402            1.0
- Return type:
pandas.DataFrame
Added in version 0.6.0.
Changed in version 1.0.0: The estimator accepts uppercase input. The default for using estimator='MBAR' was changed from MBAR to AutoMBAR.
Changed in version 2.0.0: Use pymbar.MBAR instead of the AutoMBAR option.
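The forward and backward fractions described above can be illustrated with a minimal, library-free sketch. It is not the alchemlyb implementation: the helper name forward_backward_fractions is hypothetical, the rounding of subset sizes is an assumption, and a plain mean stands in for the free energy estimator.

```python
from statistics import mean


def forward_backward_fractions(samples, num=10):
    """Return (fraction, forward_subset, backward_subset) for i/num of the data.

    Hypothetical helper for illustration only; a real analysis would pass
    each subset to an estimator instead of inspecting it directly.
    """
    n = len(samples)
    rows = []
    for i in range(1, num + 1):
        size = n * i // num            # samples in the first/last i/num of the data
        forward = samples[:size]       # first i/num of the series
        backward = samples[n - size:]  # last i/num of the series
        rows.append((i / num, forward, backward))
    return rows


samples = list(range(20))
rows = forward_backward_fractions(samples, num=4)
for fraction, fwd, bwd in rows:
    # The mean is only a stand-in for the free energy estimate.
    print(fraction, mean(fwd), mean(bwd))
```

At data_fraction 1.0 both directions use the full data set, which is why the last Forward and Backward values in the table above nearly coincide.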
- alchemlyb.convergence.fwdrev_cumavg_Rc(series, precision=0.01, tol=2)
Generate the convergence criteria \(R_c\) for a single simulation.
The input will be a pandas.Series generated by decorrelate_u_nk() or decorrelate_dhdl().
The output will be the float \(R_c\) [Fan2020] [Fan2021] and a pandas.DataFrame with the forward and backward cumulative average at precision fractional increments, as described below. \(R_c = 0\) indicates that the system is well equilibrated right from the beginning, while \(R_c = 1\) signifies that the whole trajectory is not equilibrated.
- Parameters:
series (pandas.Series) – The input energy array.
precision (float) – The precision of the output \(R_c\). To speed up the calculation, the data are block-averaged before \(R_c\) is computed; the block size is set by the desired precision.
tol (float) – Tolerance (or convergence threshold \(\epsilon\) in [Fan2021]) in \(kT\).
- Returns:
float – Convergence time fraction \(R_c\) [Fan2021].
pandas.DataFrame – The DataFrame with block average.
    Forward  Backward  data_fraction
0  3.016442  3.065176            0.1
1  3.078106  3.078567            0.2
2  3.072561  3.047357            0.3
3  3.048325  3.057527            0.4
4  3.049769  3.037454            0.5
5  3.034078  3.040484            0.6
6  3.043274  3.032495            0.7
7  3.035460  3.036670            0.8
8  3.042032  3.046597            0.9
9  3.044149  3.044385            1.0
Notes
This function computes \(R_c\) from equation 16 from [Fan2021]. The code is modified based on Shujie Fan’s (@VOD555) work. Zhiyi Wu (@xiki-tempula) improved the performance of the original algorithm.
Please cite [Fan2021] when using this function.
See also
Added in version 1.0.0.
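As a rough illustration of the idea behind \(R_c\), the sketch below finds the smallest time fraction after which both the forward and the backward cumulative averages stay within tol of the full-trajectory average. It is a simplified reading of the description above, not the library code: the names cumulative_average and rc_sketch are hypothetical, no block averaging is done, and the input is assumed to be already in units of \(kT\).

```python
from itertools import accumulate


def cumulative_average(values):
    """Running mean of the first 1, 2, ..., n values."""
    return [total / (i + 1) for i, total in enumerate(accumulate(values))]


def rc_sketch(values, tol=2.0):
    """Smallest fraction i/n from which both cumulative averages stay within tol.

    Hypothetical helper; assumes `values` is a plain list of energies in kT.
    """
    n = len(values)
    fwd = cumulative_average(values)        # averages growing from the start
    rev = cumulative_average(values[::-1])  # averages growing from the end
    final = fwd[-1]                         # average over the whole series
    within = [abs(f - final) < tol and abs(r - final) < tol
              for f, r in zip(fwd, rev)]
    for i in range(n):
        if all(within[i:]):
            return i / n
    return 1.0  # never settles within the tolerance


# A series with an initial spike needs some time before the forward average settles.
spike = [10.0] + [0.0] * 9
rc = rc_sketch(spike, tol=2.0)
print(rc)
```

A flat series gives 0.0 in this sketch, matching the interpretation that \(R_c = 0\) means the data are equilibrated from the beginning.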
- alchemlyb.convergence.A_c(series_list, precision=0.01, tol=2)
Generate the ensemble convergence criteria \(A_c\) for a set of simulations.
The input is a list of pandas.Series generated by decorrelate_u_nk() or decorrelate_dhdl().
The output will be the float \(A_c\) [Fan2020] [Fan2021]. \(A_c\) is a number between 0 and 1 that can be interpreted as the ratio of the total equilibrated simulation time to the whole simulation time for a full set of simulations. \(A_c = 1\) means that all simulation time frames in all windows can be considered equilibrated, while \(A_c = 0\) indicates that nothing is equilibrated.
- Parameters:
series_list (list) – A list of pandas.Series energy arrays.
precision (float) – The precision of the output \(A_c\). To speed up the calculation, the data are block-averaged before \(A_c\) is computed; the block size is set by the desired precision.
tol (float) – Tolerance (or convergence threshold \(\epsilon\) in [Fan2021]) in \(kT\).
- Returns:
The area \(A_c\) under the curve of the convergence time fraction.
- Return type:
float
Notes
This function computes \(A_c\) from equation 18 from [Fan2021].
Please cite [Fan2021] when using this function.
See also
Added in version 1.0.0.
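The interpretation of \(A_c\) as an area can be sketched without the library. The example assumes that the curve in question is the fraction of windows whose \(R_c\) is at or below a given time fraction, integrated from 0 to 1; under that assumption each window contributes a step of height 1/n that is switched on from its \(R_c\) to 1. The helper name ac_from_rc is hypothetical, and the \(R_c\) values are made up for illustration.

```python
def ac_from_rc(rc_values):
    """Area under the step curve x -> fraction of windows with R_c <= x on [0, 1].

    Hypothetical helper; takes precomputed R_c values, one per window.
    """
    n = len(rc_values)
    # Each window adds a rectangle of height 1/n and width (1 - R_c).
    return sum((1.0 - rc) / n for rc in rc_values)


all_equilibrated = ac_from_rc([0.0, 0.0, 0.0])  # every window has R_c = 0
none_equilibrated = ac_from_rc([1.0, 1.0])      # every window has R_c = 1
mixed = ac_from_rc([0.0, 0.5, 0.5, 1.0])
print(all_equilibrated, none_equilibrated, mixed)
```

Under this assumption the area reduces to one minus the mean \(R_c\), which is consistent with reading \(A_c\) as the equilibrated share of the total simulation time when all windows have the same length.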
- alchemlyb.convergence.block_average(df_list, estimator='MBAR', num=10, **kwargs)
Free energy estimate for portions of the trajectory.
Generate the free energy estimate for a series of consecutive blocks in time, where num sets the number of equally sized blocks. For example, setting num to 10 gives the free energy estimate from the first 10% of the data alone, then from the next 10% alone, and so on.
- Parameters:
df_list (list) – List of DataFrames of either dHdl or u_nk, each representing a different value of lambda.
estimator ({'MBAR', 'BAR', 'TI'}) – Name of the estimator.
num (int) – The number of blocks used to divide each DataFrame. Note that if the DataFrames have different lengths, the number of samples contributed to each block will differ between them.
kwargs (dict) – Keyword arguments to be passed to the estimator.
- Returns:
The DataFrame with estimate data.
         FE  FE_Error
0  3.016442  0.052748
1  3.078106  0.037170
2  3.072561  0.030186
3  3.048325  0.026070
4  3.049769  0.023359
5  3.034078  0.021260
6  3.043274  0.019642
7  3.035460  0.018340
8  3.042032  0.017319
9  3.044149  0.016405
- Return type:
pandas.DataFrame
Added in version 2.4.0.
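In contrast to the cumulative forward and backward subsets, block averaging uses disjoint pieces of the data. The sketch below shows one way such a split could look and why inputs of different lengths contribute different numbers of samples per block. The helper split_into_blocks is hypothetical and its boundary rounding is an assumption, not the alchemlyb implementation.

```python
def split_into_blocks(samples, num=10):
    """Split a sequence into `num` consecutive, non-overlapping blocks.

    Hypothetical helper; a real analysis would run an estimator on each block.
    """
    n = len(samples)
    return [samples[n * i // num: n * (i + 1) // num] for i in range(num)]


short_window = list(range(10))  # stand-in for a window with 10 samples
long_window = list(range(25))   # stand-in for a window with 25 samples

short_blocks = split_into_blocks(short_window, num=5)
long_blocks = split_into_blocks(long_window, num=5)

# Same number of blocks, but a different number of samples per block.
print([len(b) for b in short_blocks])
print([len(b) for b in long_blocks])
```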