Inspecting Model Architecture

Objective

The objective of this tutorial is to provide various methods to examine the QMzymeModel throughout the workflow.

This workflow allows you to:

  • Examine QMzymeModel and QMzymeResidue in various ways.

In this specific example, we are using ketosteroid isomerase (KSI) as the model system. The structure for KSI is obtained from the PDB 1OH0 and MM-minimized prior to this tutorial.

Classes used in this example

Required Files

To start, you will need:

  • A fully prepped and protonated PDB of the reference protein file with the ligand bound (if applicable)


[ ]:
import QMzyme
from QMzyme.SelectionSchemes import DistanceCutoff
from QMzyme.data import PDB
import pandas as pd

Before starting, here is the model system we will be using for this workflow.

[8]:
model = QMzyme.GenerateModel(PDB)
QMzyme.data.residue_charges.update({'EQU': -1})
qm_method = QMzyme.QM_Method(
    basis_set='6-31G*',
    functional='wB97XD',
    qm_input='OPT FREQ',
    program='gaussian'
)
model.set_catalytic_center(selection='resid 263')
model.set_region(selection='all', name='full_protein')
model.set_region(selection=DistanceCutoff, cutoff=3)
c_alpha_atoms = model.cutoff_3.get_atoms(attribute='name', value='CA')
model.cutoff_3.set_fixed_atoms(atoms=c_alpha_atoms)
qm_method.assign_to_region(region=model.cutoff_3)
model.truncate()

Charge information not present. QMzyme will try to guess region charges based on residue names consistent with AMBER naming conventions (i.e., aspartate: ASP --> Charge: -1, aspartic acid: ASH --> Charge: 0.). See QMzyme.data.residue_charges for the full set.
QMzymeRegion cutoff_3 has an estimated charge of -2.

Truncated model has been created and saved to attribute 'truncated' and stored in QMzyme.CalculateModel.calculation under key QM. This model will be used to write the calculation input.

Using Print Statements

One of the simplest ways to visualize the setup of your model is by directly printing its attributes. This provides a raw, immediate look at the system.

[3]:
# This outputs all QMzymeRegions currently registered in your QMzymeModel.
print(model.regions)

# This outputs all QMzymeAtoms currently registered in your QMzymeRegion.
print(model.catalytic_center.atoms)
[<QMzymeRegion catalytic_center contains 37 atom(s) and 1 residue(s)>, <QMzymeRegion full_protein contains 4258 atom(s) and 324 residue(s)>, <QMzymeRegion cutoff_3 contains 275 atom(s) and 19 residue(s)>, <QMzymeRegion cutoff_3_truncated contains 249 atom(s) and 19 residue(s)>]
[<QMzymeAtom 4005: C1 of resname EQU, resid 263>, <QMzymeAtom 4006: O1 of resname EQU, resid 263>, <QMzymeAtom 4007: C2 of resname EQU, resid 263>, <QMzymeAtom 4008: C3 of resname EQU, resid 263>, <QMzymeAtom 4009: C4 of resname EQU, resid 263>, <QMzymeAtom 4010: C5 of resname EQU, resid 263>, <QMzymeAtom 4011: C6 of resname EQU, resid 263>, <QMzymeAtom 4012: C7 of resname EQU, resid 263>, <QMzymeAtom 4013: C8 of resname EQU, resid 263>, <QMzymeAtom 4014: C9 of resname EQU, resid 263>, <QMzymeAtom 4015: C10 of resname EQU, resid 263>, <QMzymeAtom 4016: C11 of resname EQU, resid 263>, <QMzymeAtom 4017: C12 of resname EQU, resid 263>, <QMzymeAtom 4018: C13 of resname EQU, resid 263>, <QMzymeAtom 4019: C14 of resname EQU, resid 263>, <QMzymeAtom 4020: C15 of resname EQU, resid 263>, <QMzymeAtom 4021: C16 of resname EQU, resid 263>, <QMzymeAtom 4022: C17 of resname EQU, resid 263>, <QMzymeAtom 4023: O2 of resname EQU, resid 263>, <QMzymeAtom 4024: C18 of resname EQU, resid 263>, <QMzymeAtom 4025: H1 of resname EQU, resid 263>, <QMzymeAtom 4026: H2 of resname EQU, resid 263>, <QMzymeAtom 4027: H3 of resname EQU, resid 263>, <QMzymeAtom 4028: H4 of resname EQU, resid 263>, <QMzymeAtom 4029: H5 of resname EQU, resid 263>, <QMzymeAtom 4030: H6 of resname EQU, resid 263>, <QMzymeAtom 4031: H7 of resname EQU, resid 263>, <QMzymeAtom 4032: H8 of resname EQU, resid 263>, <QMzymeAtom 4033: H9 of resname EQU, resid 263>, <QMzymeAtom 4034: H10 of resname EQU, resid 263>, <QMzymeAtom 4035: H11 of resname EQU, resid 263>, <QMzymeAtom 4036: H12 of resname EQU, resid 263>, <QMzymeAtom 4037: H13 of resname EQU, resid 263>, <QMzymeAtom 4038: H14 of resname EQU, resid 263>, <QMzymeAtom 4039: H15 of resname EQU, resid 263>, <QMzymeAtom 4040: H16 of resname EQU, resid 263>, <QMzymeAtom 4041: H17 of resname EQU, resid 263>]

Using region.summarize with Pandas Dataframe

Next, we will examine the region using pandas and summarize() method. When using summarize() directly, it returns a list of attributes assigned to the specific region.

[4]:
model.catalytic_center.summarize()
[4]:
{'Resid': [np.int64(263)],
 'Resname': ['EQU'],
 'Charge': [-1],
 'Removed atoms': [[]],
 'Fixed atoms': [[]],
 'Segids': ['A']}

As region size gets larger, the list also gets larger. To create more approachable data sets, we can use a Python package Pandas, to transfer the list into a table!

[5]:
df = pd.DataFrame(model.catalytic_center.summarize())
df
[5]:
Resid Resname Charge Removed atoms Fixed atoms Segids
0 263 EQU -1 [] [] A

This is especially useful when looking at a truncated QMzymeRegion to examine their designated attributes and conditions.

[6]:
df = pd.DataFrame(model.cutoff_3_truncated.summarize())
df
[6]:
Resid Resname Charge Removed atoms Fixed atoms Segids
0 16 TYR 0 [N, H, C, O] [CA] QM
1 20 VAL 0 [N, H, C, O] [CA] QM
2 40 ASP -1 [N, H, C, O] [CA] QM
3 60 GLY 0 [N, H] [CA] QM
4 61 LEU 0 [C, O] [CA] QM
5 66 VAL 0 [N, H, C, O] [CA] QM
6 86 PHE 0 [N, H, C, O] [CA] QM
7 88 VAL 0 [N, H, C, O] [CA] QM
8 90 MET 0 [N, H, C, O] [CA] QM
9 99 LEU 0 [N, H, C, O] [CA] QM
10 101 VAL 0 [N, H, C, O] [CA] QM
11 103 ASH 0 [N, H, C, O] [CA] QM
12 118 ALA 0 [N, H, C, O] [CA] QM
13 120 TRP 0 [N, H, C, O] [CA] QM
14 263 EQU -1 [] [] QM
15 372 WAT 0 [] [] QM
16 373 WAT 0 [] [] QM
17 376 WAT 0 [] [] QM
18 378 WAT 0 [] [] QM

Using print_overview()

To get a more general overview of the system, you can use print_overview() method to examine both QMzymeModel and QMzymeRegion. This method acts as a diagnostic report for your entire workflow.

[7]:
model.print_overview()
-----------------------------
Model Overview: 1oh0
-----------------------------
  - total atoms: 4258
  - total residues: 324
  - total regions: 4
-----------------------------
Region Overview
-----------------------------
Region Name: catalytic_center
  - atoms: 37
  - residues: 1
  - method: None
  - selection_scheme: resid 263
-----------------------------
Region Name: full_protein
  - atoms: 4258
  - residues: 324
  - method: None
  - selection_scheme: all
-----------------------------
Region Name: cutoff_3
  - atoms: 275
  - residues: 19
  - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}
  - selection_scheme: DistanceCutoff
  - cutoff: 3
-----------------------------
Region Name: cutoff_3_truncated
  - atoms: 249
  - residues: 19
  - method: {'type': 'QM', 'qm_input': '6-31G* wB97XD OPT FREQ', 'basis_set': '6-31G*', 'functional': 'wB97XD', 'qm_end': '', 'program': 'gaussian', 'freeze_atoms': [2, 23, 39, 51, 58, 77, 93, 113, 129, 146, 165, 181, 194, 204], 'mult': 1, 'charge': -2}
  - selection_scheme: truncated from cutoff_3
-----------------------------