The ABFE workflow

The Absolute binding free energy (ABFE) workflow provides a complete workflow that uses the energy files generated by MD engine as input and generates the binding free energy as well as the analysis plots.

Fully Automatic analysis

Absolute binding free energy (ABFE) calculations can be analyzed with two lines of code in a fully automated manner (similar to Alchemical Analysis). In this case, any parameters are set when invoking ABFE and reasonable defaults are chosen for any parameters not set explicitly. The two steps are to

  1. initialize an instance of the ABFE class

  2. invoke the run() method to execute complete workflow.

For a GROMACS ABFE simulation, executing the workflow would look similar to the following code (See how to configure the logger).

>>> from alchemtest.gmx import load_ABFE
>>> from alchemlyb.workflows import ABFE
>>> # Obtain the path of the data
>>> import os
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
>>> print(dir)
'alchemtest/gmx/ABFE/complex'
>>> workflow = ABFE(units='kcal/mol', software='GROMACS', dir=dir,
>>>                 prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
>>> workflow.run(skiptime=10, uncorr='dhdl', threshold=50,
>>>              estimators=('MBAR', 'BAR', 'TI'), overlap='O_MBAR.pdf',
>>>              breakdown=True, forwrev=10)

The workflow uses the parsing to parse the data from the energy files, remove the initial unequilibrated frames and decorrelate the data with subsampling. The decorrelated dataset dHdl and u_nk are then passed to estimators for free energy estimation. The workflow will also perform a set of analysis that allows the user to examine the quality of the estimation.

File Input

This command expects the energy files to be structured in two common ways. It could either be

simulation
├── lambda_0
│   ├── prod.xvg
│   └── ...
├── lambda_1
│   ├── prod.xvg
│   └── ...
└── ...

Where dir='simulation/lambda_*', prefix='prod', suffix='xvg'. Or

dhdl_files
├── dhdl_0.xvg
├── dhdl_1.xvg
└── ...

Where dir='dhdl_files', prefix='dhdl_', suffix='xvg'.

Output

The workflow returns the free energy estimate using all of TI, BAR, MBAR. For ABFE calculations, the alchemical transformation is usually done is three stages, the bonded, coul and vdw which corresponds to the free energy contribution from applying the restraint to restrain the ligand to the protein, decouple/annihilate the coulombic interaction between the ligand and the protein and decouple/annihilate the protein-ligand lennard jones interactions. The result will be stored in summary as pandas.Dataframe.

                      MBAR  MBAR_Error        BAR  BAR_Error         TI  TI_Error
States 0 -- 1     0.065967    0.001293   0.066544   0.001661   0.066663  0.001675
       1 -- 2     0.089774    0.001398   0.089303   0.002101   0.089566  0.002144
       2 -- 3     0.132036    0.001638   0.132687   0.002990   0.133292  0.003055
...
       26 -- 27   1.243745    0.011239   1.245873   0.015711   1.248959  0.015762
       27 -- 28   1.128429    0.012859   1.124554   0.016999   1.121892  0.016962
       28 -- 29   1.010313    0.016442   1.005444   0.017692   1.019747  0.017257
Stages coul      10.215658    0.033903  10.017838   0.041839  10.017854  0.048744
       vdw       22.547489    0.098699  22.501150   0.060092  22.542936  0.106723
       bonded     2.374144    0.014995   2.341631   0.005507   2.363828  0.021078
       TOTAL     35.137291    0.103580  34.860619   0.087022  34.924618  0.119206

Output Files

For quality assessment, a couple of plots were generated and written to the folder specified by outdirectory.

The overlay matrix for the MBAR estimator will be plotted and saved to O_MBAR.pdf, which examines the overlap between different lambda windows.

The dHdl for TI will be plotted to dhdl_TI.pdf, allows one to examine if the lambda scheduling has covered the change of the gradient in the lambda space.

The dF states will be plotted to dF_state.pdf in portrait model and dF_state_long.pdf in landscape model, which allows the user to example the contributions from each lambda window.

The forward and backward convergence will be plotted to dF_t.pdf using MBAR and saved in convergence, which allows the user to examine if the simulation time is enough to achieve a converged result.

Semi-automatic analysis

The same analysis could also performed in steps allowing access and modification to the data generated at each stage of the analysis.

>>> from alchemtest.gmx import load_ABFE
>>> from alchemlyb.workflows import ABFE
>>> # Obtain the path of the data
>>> import os
>>> dir = os.path.dirname(load_ABFE()['data']['complex'][0])
>>> print(dir)
'alchemtest/gmx/ABFE/complex'
>>> # Load the data
>>> workflow = ABFE(software='GROMACS', dir=dir,
>>>                 prefix='dhdl', suffix='xvg', T=298, outdirectory='./')
>>> # Set the unit.
>>> workflow.update_units('kcal/mol')
>>> # Read the data
>>> workflow.read()
>>> # Decorrelate the data.
>>> workflow.preprocess(skiptime=10, uncorr='dhdl', threshold=50)
>>> # Run the estimator
>>> workflow.estimate(estimators=('mbar', 'bar', 'ti'))
>>> # Retrieve the result
>>> summary = workflow.generate_result()
>>> # Plot the overlap matrix
>>> workflow.plot_overlap_matrix(overlap='O_MBAR.pdf')
>>> # Plot the dHdl for TI
>>> workflow.plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf')
>>> # Plot the dF states
>>> workflow.plot_dF_state(dF_state='dF_state.pdf')
>>> # Convergence analysis
>>> workflow.check_convergence(10, dF_t='dF_t.pdf')

API Reference

class alchemlyb.workflows.ABFE(T, units='kT', software='GROMACS', dir='.', prefix='dhdl', suffix='xvg', outdirectory='.')

Workflow for absolute and relative binding free energy calculations.

This workflow provides functionality similar to the alchemical-analysis.py script. It loads multiple input files from alchemical free energy calculations and computes the free energies between different alchemical windows using different estimators. It produces plots to aid in the assessment of convergence.

Parameters:
  • T (float) – Temperature in K.

  • units (str) – The unit used for printing and plotting results. {‘kcal/mol’, ‘kJ/mol’, ‘kT’}. Default: ‘kT’.

  • software (str) – The software used for generating input (case-insensitive). {‘GROMACS’, ‘AMBER’, ‘PARQUET’}. This option chooses the appropriate parser for the input file.

  • dir (str) – Directory in which data files are stored. Default: os.path.curdir.

  • prefix (str) – Prefix for datafile sets. This argument accepts regular expressions and the input files are searched using Path(dir).glob("**/" + prefix + "*" + suffix). Default: ‘dhdl’.

  • suffix (str) – Suffix for datafile sets. Default: ‘xvg’.

  • outdirectory (str) – Directory in which the output files produced by this script will be stored. Default: os.path.curdir.

logger

The logging object.

Type:

Logger

file_list

The list of filenames sorted by the lambda state.

Type:

list

New in version 1.0.0.

Changed in version 2.0.1: The dir argument expects a real directory without wildcards and wildcards will no longer work as expected. Use prefix to specify wildcard-based patterns to search under dir.

Changed in version 2.1.0: The serialised dataframe could be read via software=’PARQUET’.

read(read_u_nk=True, read_dHdl=True)

Read the u_nk and dHdL data from the file_list

Parameters:
  • read_u_nk (bool) – Whether to read the u_nk.

  • read_dHdl (bool) – Whether to read the dHdl.

u_nk_list

A list of pandas.DataFrame of u_nk.

Type:

list

dHdl_list

A list of pandas.DataFrame of dHdl.

Type:

list

run(skiptime=0, uncorr='dE', threshold=50, estimators=('MBAR', 'BAR', 'TI'), overlap='O_MBAR.pdf', breakdown=True, forwrev=None, *args, **kwargs)

The method for running the automatic analysis.

Parameters:
  • skiptime (float) – Discard data prior to this specified time as ‘equilibration’ data. Units are specified by the corresponding MD Engine. Default: 0.

  • uncorr (str) – The observable to be used for the autocorrelation analysis; ‘dE’.

  • threshold (int) – Proceed with correlated samples if the number of uncorrelated samples is found to be less than this number. If 0 is given, the time series analysis will not be performed at all. Default: 50.

  • estimators (str or list of str) – A list of the estimators to estimate the free energy with. Default: (‘MBAR’, ‘BAR’, ‘TI’).

  • overlap (str) – The filename for the plot of overlap matrix. Default: ‘O_MBAR.pdf’.

  • breakdown (bool) – Plot the free energy differences evaluated for each pair of adjacent states for all methods, including the dH/dlambda curve for TI. Default: True.

  • forwrev (int) – Plot the free energy change as a function of time in both directions, with the specified number of points in the time plot. The number of time points (an integer) must be provided. Specify as None will not do the convergence analysis. Default: None. By default, ‘MBAR’ estimator will be used for convergence analysis, as it is usually the fastest converging method. If the dataset does not contain u_nk, please run meth:~alchemlyb.workflows.ABFE.check_convergence manually with estimator=’TI’.

summary

The summary of the free energy estimate.

Type:

Dataframe

convergence

The summary of the convergence results. See forward_backward_convergence() for further explanation.

Type:

DataFrame

update_units(units=None)

Update the unit.

Parameters:

units ({'kcal/mol', 'kJ/mol', 'kT'}) – The unit used for printing and plotting results.

preprocess(skiptime=0, uncorr='dE', threshold=50)

Preprocess the data by removing the equilibration time and decorrelate the date.

Parameters:
  • skiptime (float) – Discard data prior to this specified time as ‘equilibration’ data. Units are specified by the corresponding MD Engine. Default: 0.

  • uncorr (str) – The observable to be used for the autocorrelation analysis; ‘dE’.

  • threshold (int) – Proceed with correlated samples if the number of uncorrelated samples is found to be less than this number. If 0 is given, the time series analysis will not be performed at all. Default: 50.

u_nk_sample_list

The list of u_nk after decorrelation.

Type:

list

dHdl_sample_list

The list of dHdl after decorrelation.

Type:

list

estimate(estimators=('MBAR', 'BAR', 'TI'), **kwargs)

Estimate the free energy using the selected estimator.

Parameters:
  • estimators (str or list of str) – A list of the estimators to estimate the free energy with. Default: [‘TI’, ‘BAR’, ‘MBAR’].

  • kwargs (dict) – Keyword arguments to be passed to the estimator.

estimator

The dictionary of estimators. The keys are in [‘TI’, ‘BAR’, ‘MBAR’]. Note that the estimators are in their original form where no unit conversion has been attempted.

Type:

dict

Changed in version 2.1.0: DeprecationWarning for using analytic error for MBAR estimator.

generate_result()

Summarise the result into a dataframe.

Returns:

The DataFrame with convergence data.

                      MBAR  MBAR_Error        BAR  BAR_Error         TI  TI_Error
States 0 -- 1     0.065967    0.001293   0.066544   0.001661   0.066663  0.001675
       1 -- 2     0.089774    0.001398   0.089303   0.002101   0.089566  0.002144
       2 -- 3     0.132036    0.001638   0.132687   0.002990   0.133292  0.003055
       3 -- 4     0.116494    0.001213   0.116348   0.002691   0.116845  0.002750
       4 -- 5     0.105251    0.000980   0.106344   0.002337   0.106603  0.002362
       5 -- 6     0.349320    0.002781   0.343399   0.006839   0.350568  0.007393
       6 -- 7     0.402346    0.002767   0.391368   0.006641   0.395754  0.006961
       7 -- 8     0.322284    0.002058   0.319395   0.005333   0.321542  0.005434
       8 -- 9     0.434999    0.002683   0.425680   0.006823   0.430251  0.007155
       9 -- 10    0.355672    0.002219   0.350564   0.005472   0.352745  0.005591
       10 -- 11   3.574227    0.008744   3.513595   0.018711   3.514790  0.018078
       11 -- 12   2.896685    0.009905   2.821760   0.017844   2.823210  0.018088
       12 -- 13   2.223769    0.011229   2.188885   0.018438   2.189784  0.018478
       13 -- 14   1.520978    0.012526   1.493598   0.019155   1.490070  0.019288
       14 -- 15   0.911279    0.009527   0.894878   0.015023   0.896010  0.015140
       15 -- 16   0.892365    0.010558   0.886706   0.015260   0.884698  0.015392
       16 -- 17   1.737971    0.025315   1.720643   0.031416   1.741028  0.030624
       17 -- 18   1.790706    0.025560   1.788112   0.029435   1.801695  0.029244
       18 -- 19   1.998635    0.023340   2.007404   0.027447   2.019213  0.027096
       19 -- 20   2.263475    0.020286   2.265322   0.025023   2.282040  0.024566
       20 -- 21   2.565680    0.016695   2.561324   0.023611   2.552977  0.023753
       21 -- 22   1.384094    0.007553   1.385837   0.011672   1.381999  0.011991
       22 -- 23   1.428567    0.007504   1.422689   0.012524   1.416010  0.013012
       23 -- 24   1.440581    0.008059   1.412517   0.013125   1.408267  0.013539
       24 -- 25   1.411329    0.009022   1.419167   0.013356   1.411446  0.013795
       25 -- 26   1.340320    0.010167   1.360679   0.015213   1.356953  0.015260
       26 -- 27   1.243745    0.011239   1.245873   0.015711   1.248959  0.015762
       27 -- 28   1.128429    0.012859   1.124554   0.016999   1.121892  0.016962
       28 -- 29   1.010313    0.016442   1.005444   0.017692   1.019747  0.017257
Stages coul      10.215658    0.033903  10.017838   0.041839  10.017854  0.048744
       vdw       22.547489    0.098699  22.501150   0.060092  22.542936  0.106723
       bonded     2.374144    0.014995   2.341631   0.005507   2.363828  0.021078
       TOTAL     35.137291    0.103580  34.860619   0.087022  34.924618  0.119206

Return type:

DataFrame

summary

The summary of the free energy estimate.

Type:

Dataframe

plot(*args, **kwargs)

The function for producing any plots.

plot_overlap_matrix(overlap='O_MBAR.pdf', ax=None)

Plot the overlap matrix for MBAR estimator using plot_mbar_overlap_matrix().

Parameters:
  • overlap (str) – The filename for the plot of overlap matrix. Default: ‘O_MBAR.pdf’

  • ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If ax=None, a new axes will be generated.

Returns:

An axes with the overlap matrix drawn.

Return type:

matplotlib.axes.Axes

plot_ti_dhdl(dhdl_TI='dhdl_TI.pdf', labels=None, colors=None, ax=None)

Plot the dHdl for TI estimator using plot_ti_dhdl().

Parameters:
  • dhdl_TI (str) – The filename for the plot of TI dHdl. Default: ‘dhdl_TI.pdf’

  • labels (List) – list of labels for labelling all the alchemical transformations.

  • colors (List) – list of colors for plotting all the alchemical transformations. Default: [‘r’, ‘g’, ‘#7F38EC’, ‘#9F000F’, ‘b’, ‘y’]

  • ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If ax=None, a new axes will be generated.

Returns:

An axes with the TI dhdl drawn.

Return type:

matplotlib.axes.Axes

plot_dF_state(dF_state='dF_state.pdf', labels=None, colors=None, orientation='portrait', nb=10)

Plot the dF states using plot_dF_state().

Parameters:
  • dF_state (str) – The filename for the plot of dF states. Default: ‘dF_state.pdf’

  • labels (List) – list of labels for labelling different estimators.

  • colors (List) – list of colors for plotting different estimators.

  • orientation (string) – The orientation of the figure. Can be portrait or landscape

  • nb (int) – Maximum number of dF states in one row in the portrait mode

Returns:

An Figure with the dF states drawn.

Return type:

matplotlib.figure.Figure

check_convergence(forwrev, estimator='MBAR', dF_t='dF_t.pdf', ax=None, **kwargs)

Compute the forward and backward convergence using plot_convergence().

Parameters:
  • forwrev (int) – Plot the free energy change as a function of time in both directions, with the specified number of points in the time plot. The number of time points (an integer) must be provided.

  • estimator ({'TI', 'BAR', 'MBAR'}) – The estimator used for convergence analysis. Default: ‘MBAR’

  • dF_t (str) – The filename for the plot of convergence. Default: ‘dF_t.pdf’

  • ax (matplotlib.axes.Axes) – Matplotlib axes object where the plot will be drawn on. If ax=None, a new axes will be generated.

  • kwargs (dict) – Keyword arguments to be passed to the estimator.

convergence
Type:

DataFrame

Returns:

An axes with the convergence drawn.

Return type:

matplotlib.axes.Axes