The ABFE workflow¶
The Absolute binding free energy (ABFE) workflow provides a complete workflow that uses the energy files generated by MD engine as input and generates the binding free energy as well as the analysis plots.
Fully Automatic analysis¶
Absolute binding free energy (ABFE) calculations can be analyzed with
two lines of code in a fully automated manner (similar to
Alchemical Analysis).
In this case, any parameters are set when invoking ABFE
and reasonable defaults are chosen for any parameters not set explicitly. The two steps
are to
For a GROMACS ABFE simulation, executing the workflow would look similar to the following code (See how to configure the logger).
>>> from alchemtest.gmx import load_ABFE
>>> from alchemlyb.workflows import ABFE
>>> # Obtain the path of the data
>>> import os
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
>>> print(dir)
'alchemtest/gmx/ABFE/complex'
>>> workflow = ABFE(units='kcal/mol', software='GROMACS', dir=dir,
>>> prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
>>> workflow.run(skiptime=10, uncorr='dhdl', threshold=50,
>>> estimators=('MBAR', 'BAR', 'TI'), overlap='O_MBAR.pdf',
>>> breakdown=True, forwrev=10)
The workflow uses the parsing
to parse the data from the
energy files, remove the initial unequilibrated frames and decorrelate the data
with subsampling
. The decorrelated dataset
dHdl and u_nk are then passed to
estimators
for free energy estimation. The workflow will
also perform a set of analysis that allows the user to examine the quality of
the estimation.
File Input¶
This command expects the energy files to be structured in two common ways. It could either be
simulation
├── lambda_0
│ ├── prod.xvg
│ └── ...
├── lambda_1
│ ├── prod.xvg
│ └── ...
└── ...
Where dir='simulation/lambda_*', prefix='prod', suffix='xvg'
. Or
dhdl_files
├── dhdl_0.xvg
├── dhdl_1.xvg
└── ...
Where dir='dhdl_files', prefix='dhdl_', suffix='xvg'
.
Output¶
The workflow returns the free energy estimate using all of
TI
, BAR
,
MBAR
. For ABFE calculations, the alchemical
transformation is usually done is three stages, the bonded, coul and vdw
which corresponds to the free energy contribution from applying the
restraint to restrain the ligand to the protein, decouple/annihilate the
coulombic interaction between the ligand and the protein and
decouple/annihilate the protein-ligand lennard jones interactions. The result
will be stored in summary
as
pandas.Dataframe
.
MBAR MBAR_Error BAR BAR_Error TI TI_Error
States 0 -- 1 0.065967 0.001293 0.066544 0.001661 0.066663 0.001675
1 -- 2 0.089774 0.001398 0.089303 0.002101 0.089566 0.002144
2 -- 3 0.132036 0.001638 0.132687 0.002990 0.133292 0.003055
...
26 -- 27 1.243745 0.011239 1.245873 0.015711 1.248959 0.015762
27 -- 28 1.128429 0.012859 1.124554 0.016999 1.121892 0.016962
28 -- 29 1.010313 0.016442 1.005444 0.017692 1.019747 0.017257
Stages coul 10.215658 0.033903 10.017838 0.041839 10.017854 0.048744
vdw 22.547489 0.098699 22.501150 0.060092 22.542936 0.106723
bonded 2.374144 0.014995 2.341631 0.005507 2.363828 0.021078
TOTAL 35.137291 0.103580 34.860619 0.087022 34.924618 0.119206
Output Files¶
For quality assessment, a couple of plots were generated and written to the folder specified by outdirectory.
The overlay matrix for the MBAR estimator will be
plotted and saved to O_MBAR.pdf
, which examines the overlap between
different lambda windows.
The dHdl for TI will be plotted to
dhdl_TI.pdf
, allows one to examine if the lambda scheduling has
covered the change of the gradient in the lambda space.
The dF states will be plotted to dF_state.pdf
in
portrait model and dF_state_long.pdf
in landscape model, which
allows the user to example the contributions from each lambda window.
The forward and backward convergence will be plotted to dF_t.pdf
using
MBAR
and saved in
convergence
, which allows the user to
examine if the simulation time is enough to achieve a converged result.
Semi-automatic analysis¶
The same analysis could also performed in steps allowing access and modification to the data generated at each stage of the analysis.
>>> from alchemtest.gmx import load_ABFE
>>> from alchemlyb.workflows import ABFE
>>> # Obtain the path of the data
>>> import os
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
>>> print(dir)
'alchemtest/gmx/ABFE/complex'
>>> # Load the data
>>> workflow = ABFE(software='GROMACS', dir=dir,
>>> prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
>>> # Set the unit.
>>> workflow.update_units('kcal/mol')
>>> # Read the data
>>> workflow.read()
>>> # Decorrelate the data.
>>> workflow.preprocess(skiptime=10, uncorr='dhdl', threshold=50)
>>> # Run the estimator
>>> workflow.estimate(estimators=('mbar', 'bar', 'ti'))
>>> # Retrieve the result
>>> summary = workflow.generate_result()
>>> # Plot the overlap matrix
>>> workflow.plot_overlap_matrix(overlap='O_MBAR.pdf')
>>> # Plot the dHdl for TI
>>> workflow.plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf')
>>> # Plot the dF states
>>> workflow.plot_dF_state(dF_state='dF_state.pdf')
>>> # Convergence analysis
>>> workflow.check_convergence(10, dF_t='dF_t.pdf')
API Reference¶
- class alchemlyb.workflows.ABFE(T, units='kT', software='GROMACS', dir='.', prefix='dhdl', suffix='xvg', outdirectory='.')¶
Workflow for absolute and relative binding free energy calculations.
This workflow provides functionality similar to the
alchemical-analysis.py
script. It loads multiple input files from alchemical free energy calculations and computes the free energies between different alchemical windows using different estimators. It produces plots to aid in the assessment of convergence.- Parameters:
T (float) – Temperature in K.
units (str) – The unit used for printing and plotting results. {‘kcal/mol’, ‘kJ/mol’, ‘kT’}. Default: ‘kT’.
software (str) – The software used for generating input (case-insensitive). {‘GROMACS’, ‘AMBER’, ‘PARQUET’}. This option chooses the appropriate parser for the input file.
dir (str) – Directory in which data files are stored. Default: os.path.curdir.
prefix (str) – Prefix for datafile sets. This argument accepts regular expressions and the input files are searched using
Path(dir).glob("**/" + prefix + "*" + suffix)
. Default: ‘dhdl’.suffix (str) – Suffix for datafile sets. Default: ‘xvg’.
outdirectory (str) – Directory in which the output files produced by this script will be stored. Default: os.path.curdir.
- logger¶
The logging object.
- Type:
Logger
Added in version 1.0.0.
Changed in version 2.0.1: The dir argument expects a real directory without wildcards and wildcards will no longer work as expected. Use prefix to specify wildcard-based patterns to search under dir.
Changed in version 2.1.0: The serialised dataframe could be read via software=’PARQUET’.
- read(read_u_nk=True, read_dHdl=True)¶
Read the u_nk and dHdL data from the
file_list
- Parameters:
- u_nk_list¶
A list of
pandas.DataFrame
of u_nk.- Type:
- dHdl_list¶
A list of
pandas.DataFrame
of dHdl.- Type:
- run(skiptime=0, uncorr='dE', threshold=50, estimators=('MBAR', 'BAR', 'TI'), overlap='O_MBAR.pdf', breakdown=True, forwrev=None, *args, **kwargs)¶
The method for running the automatic analysis.
- Parameters:
skiptime (float) – Discard data prior to this specified time as ‘equilibration’ data. Units are specified by the corresponding MD Engine. Default: 0.
uncorr (str) – The observable to be used for the autocorrelation analysis; ‘dE’.
threshold (int) – Proceed with correlated samples if the number of uncorrelated samples is found to be less than this number. If 0 is given, the time series analysis will not be performed at all. Default: 50.
estimators (str or list of str) – A list of the estimators to estimate the free energy with. Default: (‘MBAR’, ‘BAR’, ‘TI’).
overlap (str) – The filename for the plot of overlap matrix. Default: ‘O_MBAR.pdf’.
breakdown (bool) – Plot the free energy differences evaluated for each pair of adjacent states for all methods, including the dH/dlambda curve for TI. Default:
True
.forwrev (int) – Plot the free energy change as a function of time in both directions, with the specified number of points in the time plot. The number of time points (an integer) must be provided. Specify as
None
will not do the convergence analysis. Default: None. By default, ‘MBAR’ estimator will be used for convergence analysis, as it is usually the fastest converging method. If the dataset does not contain u_nk, please run meth:~alchemlyb.workflows.ABFE.check_convergence manually with estimator=’TI’.
- summary¶
The summary of the free energy estimate.
- Type:
Dataframe
- convergence¶
The summary of the convergence results. See
forward_backward_convergence()
for further explanation.- Type:
DataFrame
- update_units(units=None)¶
Update the unit.
- Parameters:
units ({'kcal/mol', 'kJ/mol', 'kT'}) – The unit used for printing and plotting results.
- preprocess(skiptime=0, uncorr='dE', threshold=50)¶
Preprocess the data by removing the equilibration time and decorrelate the date.
- Parameters:
skiptime (float) – Discard data prior to this specified time as ‘equilibration’ data. Units are specified by the corresponding MD Engine. Default: 0.
uncorr (str) – The observable to be used for the autocorrelation analysis; ‘dE’.
threshold (int) – Proceed with correlated samples if the number of uncorrelated samples is found to be less than this number. If 0 is given, the time series analysis will not be performed at all. Default: 50.
- estimate(estimators=('MBAR', 'BAR', 'TI'), **kwargs)¶
Estimate the free energy using the selected estimator.
- Parameters:
- estimator¶
The dictionary of estimators. The keys are in [‘TI’, ‘BAR’, ‘MBAR’]. Note that the estimators are in their original form where no unit conversion has been attempted.
- Type:
Changed in version 2.1.0: DeprecationWarning for using analytic error for MBAR estimator.
- generate_result()¶
Summarise the result into a dataframe.
- Returns:
The DataFrame with convergence data.
MBAR MBAR_Error BAR BAR_Error TI TI_Error States 0 -- 1 0.065967 0.001293 0.066544 0.001661 0.066663 0.001675 1 -- 2 0.089774 0.001398 0.089303 0.002101 0.089566 0.002144 2 -- 3 0.132036 0.001638 0.132687 0.002990 0.133292 0.003055 3 -- 4 0.116494 0.001213 0.116348 0.002691 0.116845 0.002750 4 -- 5 0.105251 0.000980 0.106344 0.002337 0.106603 0.002362 5 -- 6 0.349320 0.002781 0.343399 0.006839 0.350568 0.007393 6 -- 7 0.402346 0.002767 0.391368 0.006641 0.395754 0.006961 7 -- 8 0.322284 0.002058 0.319395 0.005333 0.321542 0.005434 8 -- 9 0.434999 0.002683 0.425680 0.006823 0.430251 0.007155 9 -- 10 0.355672 0.002219 0.350564 0.005472 0.352745 0.005591 10 -- 11 3.574227 0.008744 3.513595 0.018711 3.514790 0.018078 11 -- 12 2.896685 0.009905 2.821760 0.017844 2.823210 0.018088 12 -- 13 2.223769 0.011229 2.188885 0.018438 2.189784 0.018478 13 -- 14 1.520978 0.012526 1.493598 0.019155 1.490070 0.019288 14 -- 15 0.911279 0.009527 0.894878 0.015023 0.896010 0.015140 15 -- 16 0.892365 0.010558 0.886706 0.015260 0.884698 0.015392 16 -- 17 1.737971 0.025315 1.720643 0.031416 1.741028 0.030624 17 -- 18 1.790706 0.025560 1.788112 0.029435 1.801695 0.029244 18 -- 19 1.998635 0.023340 2.007404 0.027447 2.019213 0.027096 19 -- 20 2.263475 0.020286 2.265322 0.025023 2.282040 0.024566 20 -- 21 2.565680 0.016695 2.561324 0.023611 2.552977 0.023753 21 -- 22 1.384094 0.007553 1.385837 0.011672 1.381999 0.011991 22 -- 23 1.428567 0.007504 1.422689 0.012524 1.416010 0.013012 23 -- 24 1.440581 0.008059 1.412517 0.013125 1.408267 0.013539 24 -- 25 1.411329 0.009022 1.419167 0.013356 1.411446 0.013795 25 -- 26 1.340320 0.010167 1.360679 0.015213 1.356953 0.015260 26 -- 27 1.243745 0.011239 1.245873 0.015711 1.248959 0.015762 27 -- 28 1.128429 0.012859 1.124554 0.016999 1.121892 0.016962 28 -- 29 1.010313 0.016442 1.005444 0.017692 1.019747 0.017257 Stages coul 10.215658 0.033903 10.017838 0.041839 10.017854 0.048744 vdw 22.547489 0.098699 22.501150 0.060092 22.542936 0.106723 bonded 2.374144 0.014995 2.341631 0.005507 2.363828 0.021078 TOTAL 35.137291 0.103580 34.860619 0.087022 34.924618 0.119206
- Return type:
DataFrame
- summary¶
The summary of the free energy estimate.
- Type:
Dataframe
- plot(*args, **kwargs)¶
The function for producing any plots.
- plot_overlap_matrix(overlap='O_MBAR.pdf', ax=None)¶
Plot the overlap matrix for MBAR estimator using
plot_mbar_overlap_matrix()
.- Parameters:
overlap (str) – The filename for the plot of overlap matrix. Default: ‘O_MBAR.pdf’
ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If
ax=None
, a new axes will be generated.
- Returns:
An axes with the overlap matrix drawn.
- Return type:
matplotlib.axes.Axes
- plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf', labels=None, colors=None, ax=None)¶
Plot the dHdl for TI estimator using
plot_ti_dhdl()
.- Parameters:
dhdl_TI (str) – The filename for the plot of TI dHdl. Default: ‘dhdl_TI.pdf’
labels (List) – list of labels for labelling all the alchemical transformations.
colors (List) – list of colors for plotting all the alchemical transformations. Default: [‘r’, ‘g’, ‘#7F38EC’, ‘#9F000F’, ‘b’, ‘y’]
ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If
ax=None
, a new axes will be generated.
- Returns:
An axes with the TI dhdl drawn.
- Return type:
matplotlib.axes.Axes
- plot_dF_state(dF_state='dF_state.pdf', labels=None, colors=None, orientation='portrait', nb=10)¶
Plot the dF states using
plot_dF_state()
.- Parameters:
dF_state (str) – The filename for the plot of dF states. Default: ‘dF_state.pdf’
labels (List) – list of labels for labelling different estimators.
colors (List) – list of colors for plotting different estimators.
orientation (string) – The orientation of the figure. Can be portrait or landscape
nb (int) – Maximum number of dF states in one row in the portrait mode
- Returns:
An Figure with the dF states drawn.
- Return type:
matplotlib.figure.Figure
- check_convergence(forwrev, estimator='MBAR', dF_t='dF_t.pdf', ax=None, **kwargs)¶
Compute the forward and backward convergence using
plot_convergence()
.- Parameters:
forwrev (int) – Plot the free energy change as a function of time in both directions, with the specified number of points in the time plot. The number of time points (an integer) must be provided.
estimator ({'TI', 'BAR', 'MBAR'}) – The estimator used for convergence analysis. Default: ‘MBAR’
dF_t (str) – The filename for the plot of convergence. Default: ‘dF_t.pdf’
ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If
ax=None
, a new axes will be generated.kwargs (dict) – Keyword arguments to be passed to the estimator.
- convergence¶
- Type:
DataFrame
- Returns:
An axes with the convergence drawn.
- Return type:
matplotlib.axes.Axes