Using Various SelectionScheme Classes
Objective
The objective of this tutorial is to learn how to fully utilize set_method() and learn different SelectionSchemes that are preset in QMzyme. SelectionSchemes is an abstract base class designed to assist users in constructing chemically meaningful QMzymeRegion objects. Regions may be defined using the MDAnalysis selection string syntax, or predefined SelectionScheme subclasses. Importantly, selection metadata are stored in the QMzymeRegion creation_params attribute, allowing users to trace
how a region was generated and improving method transparency and reproducibility. The broader goal of the SelectionScheme module is to provide systematic and reproducible approaches for QM region selection in enzyme active site studies.
This workflow allows you to:
Learn and utilize various SelectionSchemes that are present in QMzyme.
In this specific example, we are using ketosteroid isomerase (KSI) as the model system. The structure for KSI is obtained from the PDB 1OH0 and MM-minimized prior to this tutorial.
Classes used in this example
Required Files
To start, you will need:
A fully prepped and protonated PDB
[ ]:
# Here are the necesary imports for this tutorial!
import QMzyme
import MDAnalysis
from QMzyme.data import PDB, CSA_pkl, CSA_holo, CSA_apo
from QMzyme.SelectionSchemes import DistanceCutoff, ChargeShiftAnalysis
Manual Region Selection
First, we can directly select the residues of our interest as our QM region. In here, the user needs to define which specific amino acid residues are included in the QM region with set_region(selection=). Be mindful that when selecting a region manually, you need to specify the name of the region. The selection nomenclature is derived from MDAnalysis.1,2 To select multiple residues, we use “resid A or resid B.” As odd as this may seem, this translates to “select atoms that are in residue A
or residue B,” which will ultimately result in the selection of these two amino acids.
[2]:
model = QMzyme.GenerateModel(PDB)
QMzyme.data.residue_charges.update({'EQU': -1})
model.set_region(selection="resid 263 or resid 16 or resid 40 or resid 103", name="region")
print(model.regions)
Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.
Nonconventional Residues Found
------------------------------
EQU --> Charge: UNK, defaulting to 0
You can update charge information for nonconventional residues by running
>>>QMzyme.data.residue_charges.update({'3LETTER_RESNAME':INTEGER_CHARGE}).
Note your changes will not be stored after you exit your session. It is recommended to only alter the residue_charges dictionary. If you alter the protein_residues dictionary instead that could cause unintended bugs in other modules (TruncationSchemes).
[<QMzymeRegion region contains 83 atom(s) and 4 residue(s)>]
You can also select an atom group using select_atoms() from the MDAnalysis Universe and create a region of it.1,2
[3]:
u = MDAnalysis.Universe(PDB)
atom_group = u.select_atoms('(byres around 3 resid 263) or resid 263')
model.set_region(selection=atom_group, name="MDA_selection")
print(model.regions)
[<QMzymeRegion region contains 83 atom(s) and 4 residue(s)>, <QMzymeRegion MDA_selection contains 275 atom(s) and 19 residue(s)>]
Distance Cutoff
The DistanceCutoff class selects residues based on their distance from a user-defined set_catalytic_center(). Specifically, the workflow identifies atoms within a specified cutoff distance of the catalytic center and determines which residues should be included in the resulting region. Users may choose whether the selection includes full-residue or partial-residue selection based only on atoms satisfying the cutoff criterion. Distance-based selection provides a straightforward method for
constructing QM regions. However, while the distance cutoff scheme is conceptually simple, this ad-hoc approach often requires large QM regions in systematic convergence studies.3
[4]:
model = QMzyme.GenerateModel(PDB)
QMzyme.data.residue_charges.update({'EQU': -1})
model.set_catalytic_center(selection='resid 263')
model.set_region(selection=DistanceCutoff, cutoff=3)
print(model.regions)
Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.
[<QMzymeRegion catalytic_center contains 37 atom(s) and 1 residue(s)>, <QMzymeRegion cutoff_3 contains 275 atom(s) and 19 residue(s)>]
ChargeShiftAnalysis
The ChargeShiftAnalysis class implements Charge Shift Analysis (CSA) for region residue selection.3,4 CSA identifies catalytically important residues by analyzing differences in partial atomic charges between holo and apo forms of the biomolecular system. The workflow in QMzyme proceeds in two stages:
An initial large-region selection (approximately 900-1000 atoms) is generated for both holo and apo systems using a distance-based cutoff (default range approximately 9-10 Å from the catalytic center). Quantum mechanical calculations are performed on both systems.
Partial charge differences are analyzed to identify residues that significantly perturb the electronic structure of the catalytic center.
The default selection size is based on recommendations from CSA development. Once the QM calculations are completed, the output files are supplied back into the CSA workflow to create the QMzymeRegion.
The first iteration of ChargeShiftAnalysis requires users to undergo set_catalytic_center() and QM_Method(). This will generate two QM input files for frequency calculation. For site-directed mutagenesis to alanine, user can define specific protein residues in alanine_mutation=.
[ ]:
model = QMzyme.GenerateModel(PDB)
qm_method = QMzyme.QM_Method(
basis_set='6-31G*',
functional='wB97XD',
qm_input='pop=hirshfeld',
program='gaussian'
)
model.set_catalytic_center('resid 263')
model.set_region(selection=ChargeShiftAnalysis, name=None, min_atoms=950, max_atoms=1050,
method=qm_method, alanine_mutation=None)
print(model.regions)
After running QM calculations on these two files, partial charge difference is calculated and assigned to QMzymeAtoms. This is used to calculate the partial charge difference between the residues. Then,ChargeShiftAnalysis SelectionScheme will create a region based on the residues that have a higher partial charge difference than the user-defined charge_threshold=. To get a .csv file containing the partial charge calculations, you can set charge_output_csv=True.
[6]:
model = QMzyme.GenerateModel(pickle_file=CSA_pkl)
qm_method = QMzyme.QM_Method(
basis_set='6-31G*',
functional='wB97X-D3',
qm_input='OPT FREQ',
program='orca'
)
model.set_region(selection=ChargeShiftAnalysis, method=qm_method, holo_output_files=CSA_holo,
apo_output_files=CSA_apo, pop="cm5", charge_threshold=0.05)
print(model.regions)
Calculating partial charge differences from provided QM output files.
Successfully retrieved CSA_holo_truncated, CSA_apo_truncated from the QMzymeModel.
Use of this selection scheme requires citing the following reference(s):
(1) Kulik, Heather J.; Zhang, Jianyu; Klinman, Judith P.; Martinez, Todd J.(2016) How Large Should the QM Region Be in QM/MM Calculations? The Case of Catechol O-Methyltransferase. Journal of Physical Chemistry B, 120(44). and (2) Karelina, M., & Kulik, H. J. (2017). Systematic quantum mechanical region determination in QM/MM simulation. Journal of chemical theory and computation, 13(2).
QMzymeRegion CSA_cutoff_region has an estimated charge of -1.
[<QMzymeRegion catalytic_center contains 37 atom(s) and 1 residue(s)>, <QMzymeRegion CSA_holo contains 1008 atom(s) and 88 residue(s)>, <QMzymeRegion CSA_holo_truncated contains 982 atom(s) and 88 residue(s)>, <QMzymeRegion CSA_apo contains 971 atom(s) and 87 residue(s)>, <QMzymeRegion CSA_apo_truncated contains 945 atom(s) and 87 residue(s)>, <QMzymeRegion CSA_cutoff_region contains 77 atom(s) and 5 residue(s)>]
References Cited
Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011), 2319–2327. doi:10.1002/jcc.21787
Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler, D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein. MDAnalysis: A Python package for the rapid analysis of molecular dynamics simulations. In S. Benthall and S. Rostrup, editors, Proceedings of the 15th Python in Science Conference, pages 98-105, Austin, TX, 2016. SciPy. doi:10.25080/Majora-629e541a-00e
Kulik, Heather J.; Zhang, Jianyu; Klinman, Judith P.; Martinez, Todd J.(2016) How Large Should the QM Region Be in QM/MM Calculations? The Case of Catechol O-Methyltransferase. Journal of Physical Chemistry B, 120(44).
Karelina, M., & Kulik, H. J. (2017). Systematic quantum mechanical region determination in QM/MM simulation. Journal of chemical theory and computation, 13(2))