structure module contains the core class defining systems of
particles – the
Structure class. The API documentation makes this
class look intimidating, but its core features are the
(which is an
AtomList) and its
residues attribute (which is a
ResidueList). These two attributes are enough to write numerous files
(e.g., PDB and PDBx/mmCIF), and these are the only two attributes currently
populated through the PDB and mmCIF parsing routines.
A chemical structure composed of atoms, bonds, angles, torsions, and other topological features
Structure class may be instantiated directly, but is more often
created by one of the parsers (see below for
load_file) or extended to
support the structure files of various computational parmed programs.
Structure class has many attributes that store the various
topological features and force field parameters of the molecular model. These
include valence bonds, angles, torsions, and other more complex valence bond
terms (like coupled-torsion correction maps and a variety of other terms
describing common vibrational modes in the AMOEBA force field).
Each of these attributes is stored in a
TrackedList instance, which is a simple subclass of
list type that registers every time the list itself is changed
(e.g., by adding an item, deleting an item, swapping two items, etc.). In this
Structure can determine whether or not certain cached attributes
need to be updated. This is particularly important for various subclasses, like
AmberParm in which the topological
arrays and a dictionary of “raw data” need to be kept synchronized.
You should never delete or reassign any of these list attributes—instead you
should delete all of the items from the list using the idiom
If the list is empty, then no terms of that type exist in the model. All terms
stretch_bends to the bottom are exclusive to the Amoeba force field.
In addition to those attributes, there is a list of parameter types
corresponding to each of those valence parameters as well (except for the chiral
and multipole frame lists). For example, a
urey_bradley_types, … etc, that contain the parameter type objects
BondType objects, respectively, since
the Urey-Bradley term is identical in functional form to a simple bond). The
same warnings apply here—do not explicitly delete or reassign these lists.
There are also two special subclasses of
AtomList and the
ResidueList classes that have
extra functionality aiding in the automatic bookkeeping, as described below).
One of the strengths of the
Structure class and its corresponding
components (such as
Bond) is that it
automates the process of bookkeeping for your structure’s model. For instance,
Bond instance automatically
registers the pair of atoms involved in the bond as partners in the “bond
graph”, which is used to determine atom connectivity. So when you create a
Bond object, you should immediately add
it to that
bonds attribute. Likewise, if you give that
BondType, you should
immediately add that instance to the
bond_types attribute of the
Structure instance as well.
If you plan on creating your own models directly through the
API, you are encouraged to use existing parsers as examples, such as those for
the AMBER, GROMACS, and CHARMM topology files.
It is often of interest to store (and use) a particular conformation of a
molecule defined by the Cartesian coordinates of each atom. A
need not have an assigned set of atomic positions (many files do not provide
them by default, such as Amber prmtops, CHARMM PSF, and GROMACS topologies, just
to name a few). Others do define them, such as PDB and PDBx/mmCIF files.
ParmEd offers limited support for coordinate handling, and supports a handful of coordinate and trajectory file formats. However, it is not optimized for trajectory analysis, as there are far superior libraries for doing that (e.g., MDTraj and pytraj to name two). However, it is still often desirable to have access to atomic coordinates.
Structure instance, coordinates can be accessed as the
xz attributes on each atom individually, or they can be
accessed from the
coordinates attribute on the
coordinates attribute is a numpy array of shape
>>> import parmed as pmd >>> pdb = pmd.download_PDB('2koc') >>> pdb.atoms.xx, pdb.atoms.xy, pdb.atoms.xz (-8.886, -5.163, 9.647) >>> pdb.coordinates array([[ -8.886, -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]])
coordinates attribute is simply a copy of the position attributes on
each atom (and is set to
None if any of the atoms do not have defined
positions). As such, if you modify individual positions in the
array, they will not actually change the locations of that atom. In fact, if
you change the
coordinates array, all of the cached coordinates will be
discarded and replaced with a copy of the position attributes on each atom:
>>> pdb.coordinates = [0, 0, 0] >>> pdb.atoms.xx, pdb.atoms.xy, pdb.atoms.xz (-8.886, -5.163, 9.647) >>> pdb.coordinates array([[ -8.886, -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]])
Note if you change the position of the first atom directly, that does get
reflected in the coordinate array, but not in a reference to the old
coordinates – i.e., the
coordinates array is a cache that may be discarded
when it no longer matches the positions on each atom:
>>> coords = pdb.coordinates >>> pdb.atoms.xx = 0; pdb.atoms.xy = 0; pdb.atoms.xz = 0 >>> coords # A reference to the original cache! array([[ -8.886, -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]]) >>> pdb.coordinates # A new cache is created array([[ 0. , 0. , 0. ], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]])
This may seem surprising at first, but makes sense when you realize that
Structure.coordinates is simply a descriptor that reports on the positions
of each atom.
However, assigning to the
coordinates attribute has special meaning. You
may assign any iterable of floating point numbers (must be reshape-able into a
numpy array with the last two dimensions being
(natom, 3)), which will be
automatically converted into a numpy array and assigned to the individual
atoms. For example, using the
coords reference we stored in our previous
example and assigned to
pdb.coordinates will reset the first atoms’ position
to its original value:
>>> pdb.coordinates = coords >>> pdb.atoms.xx, pdb.atoms.xy, pdb.atoms.xz (-8.8859999999999992, -5.1630000000000003, 9.6470000000000002) >>> pdb.coordinates array([[ -8.886, -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]])
This brings us to our last note about working with coordinates. Several file
types, like PDB and PDBx/mmCIF, permit storing multiple conformations of the
coordinates attribute only returns the first one, and each
atom only stores a single x-, y-, and z-coordinate. However,
caches all of the conformations it reads (or is assigned). You can access all
conformers using the
Structure.get_coordinates() method, passing either a
conformer number or the word ‘all’ to get all coordinates.
As an example, consider the PDB 2KOC, which is an NMR structure with 20 conformers:
>>> pdb = pmd.download_PDB('2KOC') >>> pdb.get_coordinates(0) array([[ -8.886, -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]]) >>> pdb.get_coordinates(1) array([[-10.637, -4.586, 11.116], [-11.86 , -5.331, 10.426], [-13.048, -4.453, 10.55 ], ..., [ -9.703, 4.539, 3.584], [-10.407, 5.725, 1.585], [-11.594, 5.52 , -0.533]]) >>> pdb.get_coordinates(19) array([[-10.439, -4.998, 9.616], [-11.992, -4.669, 9.507], [-12.19 , -3.251, 9.891], ..., [-10.938, 4.441, 2.276], [-11.359, 5.537, 0.158], [-12.051, 5.205, -2.146]]) >>> pdb.get_coordinates().shape # default is 'all' (20, 451, 3)
Be careful, though! Anything you do that makes the first conformer differ from the positions on each of the atoms will invalidate the cache and delete all conformers, as shown below:
>>> pdb.atoms.xx = 0 # Invalidates cached coordinates! >>> pdb.coordinates array([[ 0. , -5.163, 9.647], [-10.305, -4.745, 10.229], [-10.282, -3.296, 10.528], ..., [ -9.693, 3.638, 2.337], [-10.552, 4.682, 0.317], [-11.877, 4.323, -1.696]]) >>> pdb.get_coordinates().shape # only one frame now... (1, 451, 3)
The way that ParmEd ensures that
Structure.coordinates is always correct is
to generate a new numpy array from the positions each time the attribute is
accessed. As a result, repeated access to
Structure.coordinates can be
very slow, and should be avoided. Iterating over the coordinates directly
requires only a single access, as does storing a reference to it and accessing
that instead. Below, I show some performance timings from IPython for a simple
task of finding the lowest x-coordinate value among all atoms:
In : import parmed as pmd In : def slow_min_x(struct): ...: return min([struct.coordinates[i] ...: for i in range(len(struct.atoms))]) ...: In : def iter_min_x(struct): ...: return min([x for x in struct.coordinates]) ...: In : def ref_min_x(struct): ...: coords = struct.coordinates ...: return min([coords[i] for i in range(len(struct.atoms))]) ...: In : pmd.download_PDB('4lzt') Out: <Structure 1164 atoms; 274 residues; 0 bonds; PBC (triclinic); NOT parametrized> In : struct = pmd.download_PDB('4lzt') In : %timeit slow_min_x(struct) 1 loops, best of 3: 1.08 s per loop In : %timeit iter_min_x(struct) 1000 loops, best of 3: 1.2 ms per loop In : %timeit ref_min_x(struct) 1000 loops, best of 3: 1.29 ms per loop
Notice – the
slow_min_x function is about 1000x slower than all of the
others! This is a contrived example, but it shows the danger of repeatedly
accessing elements of
Structure.coordinates instead of iterating over it or
working with a reference to the cached numpy array.
Structure manipulation: slicing, combining, replicating, and splitting¶
This section describes a number of simple, yet powerful manipulations you can do
Structure instances (and, by
extension, instances of their subclasses). In order, these are slicing,
combining (or merging), and replicating.
One of the things that ParmEd tries to make easy to do is to select a subset of
Structure instance as a new
Structure (complete with all
remaining atoms, residues, and parameters) by using the common Python idiom of
The slice, or selection syntax of
Structure is designed to be flexible, expressive, and
intuitive (although accomplishing all of these is a challenge!).
There are three many ways to select from a
By atom index
By Amber selection mask (see
By selection arrays, mask arrays, or slices of chains, residues, and/or atoms.
When selecting from a
instance, the return value can be one of two things:
Atominstance if the selection specified only a single atom either by atom index, atom index within a residue index, or an atom index within a residue index within a single chain.
Structurewith all of the selected atoms and all parameters that were present between the selected atoms. In this case, a copy is made of all selected atoms. If no atoms were selected, the resulting structure is empty, and will evaluate to boolean
False. Note that selections return the same type as the original object being selected, so the resulting object may be a subclass of
The return value of a selection—unless it is selecting a single atom—is a copy of the original structure, meaning that changes to the result of the slice will not change the structure from which you sliced.
This is not always desirable. In cases where you want the resulting structure to
contain the same atoms, residues, bonds, etc. as the original Structure so
that you can simplify the process of modifying a subset of the structure, you
want to use the
view descriptor of
Structure instead. This is described in more detail below.
Let’s look at the simplest form of the selection syntax – by atom index:
>>> from parmed import load_file >>> struct = load_file('4LZT.cif') # use the 4LZT.cif file in test/files/ >>> struct <Structure 1164 atoms; 274 residues; 1043 bonds; PBC; NOT parametrized> >>> struct <Atom N ; In LYS 0> >>> struct <Atom CA ; In VAL 1> >>> struct <Atom O ; In HOH 273> >>> struct[-1] # Negative indices also work! <Atom O ; In HOH 273> >>> struct Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/.../structure.py", line 1021, in __getitem__ return self.atoms[selection] File "/.../topologyobjects.py", line 3932, in __getitem__ retval = list.__getitem__(self, thing) IndexError: list index out of range
Oops! Our original structure only had 1164 atoms, so selecting the 1165th index
(remember, Python indices start from 0) results in an
IndexError. As with
Python slicing, though, slice and Amber mask selections simply return
if no atoms match the selection.
Now let’s look at the slightly more complex Amber mask selection. See Amber mask syntax for more details.
If you pass a single string, it will be interpreted as an Amber mask. The
example below continues from the same
struct we were using above:
>>> struct['@CA'] <Structure 129 atoms; 129 residues; 0 bonds; NOT parametrized> >>> struct[':1-10'] <Structure 75 atoms; 10 residues; 75 bonds; NOT parametrized> >>> struct[':1-10@CA,CB'] <Structure 19 atoms; 10 residues; 9 bonds; NOT parametrized> >>> struct[':10<@5'] <Structure 49 atoms; 13 residues; 43 bonds; NOT parametrized> >>> struct[':10<:5'] <Structure 88 atoms; 13 residues; 84 bonds; NOT parametrized>
Now let’s look at mask arrays.
I’ll define my terminology. A mask array is a boolean array (or it is
interpreted as a boolean array) with a Truthy-value (e.g.,
etc.) for atoms you want to select and a Falsey-value (e.g.,
etc.) for atoms you do not want to select. A selection array, described in
more detail below, is an array of atom indices (first index is 0) of the atoms
you want to select. These arrays can be any subscriptable iterable (e.g.,
A mask array must have the same number of elements as your system has atoms.
So if an array has the same number of elements as you have atoms, it is
interpreted as a mask array. A common use-case for a mask array is to convert a
Structure to a Pandas
for more information), and using Pandas/numpy to generate the mask array. An
example of this is shown below:
>>> df = struct.to_dataframe() >>> struct[df.name == 'CA'] <Structure 129 atoms; 129 residues; 0 bonds; NOT parametrized> >>> struct[(df.name == 'CA')&(df.resid < 10)] <Structure 10 atoms; 10 residues; 0 bonds; NOT parametrized>
A mask array cannot be used alongside a residue or chain selection. In fact, ParmEd will simply interpret that array as a selection array, which may lead to unexpected results, as seen below:
>>> struct['A',:,df.name == 'CA'] <Structure 409 atoms; 274 residues; 135 bonds; NOT parametrized>
While you might expect this to select only the
CA atoms (since all residues
are in chain A), the key is that
df.name == 'CA' is interpreted as a
selection array, and it contains 0s and 1s. So it selects the first two atoms of
every residue (not all residues have more than 1 atom, though, which is why the
number of atoms is not double the number of residues).
Now let’s look at selection arrays and slices of chains, residues, and/or atoms
Structure instances can take between 1
and 3 “slots” in their indexing scheme, corresponding to atom, residue, and
chain selections. If one slot is used, that selection applies to the list of
atoms. If two slots are used, the first slot applies to the list of residues
while the second slot refers to the list of atoms within the selected
residues. Note the difference between the atom selections when a residue
selection is given compared to when one is not. If all three slots are used, the
first is interpreted as a chain selection, the second as a selection of residues
within each selected chain and the third as a selection of each atom within
Slots can be assigned either an index, string, slice, or selection array. Each is defined below:
An index is a number between 0 and the number of elements in either the residue or atom list (depending on which slot is found in).
A string is interpreted as a single name that must match every chain, residue, or atom name (as determined by the slot it is found in).
A slice is a standard Python slice (e.g.,
iterable[10:20:2]) and can be used for either the residue or atom selections. A raw
:in any slot means a “full slice” that selects everything according to its slot.
A selection array is an array of indexes or strings (interpreted as names), and can have as many elements as you want (including negative numbers to count from the end of the atom list). but no atom index can be outside the range of the atom list (or you will get an
IndexError). So practically speaking, there is no use-case for a selection array to have the same length as a mask array, since that would either include duplicates or select every atom.
Let’s have a look at some examples, again continuing with the
we defined above:
>>> struct[10:20:2] <Structure 5 atoms; 2 residues; 0 bonds; NOT parametrized> >>> struct[0,10:20:2] # note, this is the 10th to 20th atom of the 1st residue! <Structure 0 atoms; 0 residues; 0 bonds; NOT parametrized> >>> struct[:,0] # First atom of every residue <Structure 274 atoms; 274 residues; 0 bonds; NOT parametrized> >>> struct['A',:,:] # All atoms in chain A <Structure 1164 atoms; 274 residues; 1043 bonds; NOT parametrized> >>> struct[:,'CA'] # All atoms named CA in all residues <Structure 129 atoms; 129 residues; 0 bonds; NOT parametrized>
There is so much flexibility in the Atom selection here that we can’t possibly cover everything. You are encouraged to try things out!
In the previous section, we alluded to a way of applying the selection syntax to obtain a view of a structure, rather than a full copy of the subset of selected atoms. You still need to familiarize yourself with the selection syntax, as it is the same when you are trying to take a view.
However, instead of selecting directly from the
Structure instance, you instead select from
Structure.view, as demonstrated below on a downloaded PDB:
>>> import parmed as pmd >>> pdb = pmd.download_PDB('4lzt') >>> pdb.residues <Residue LYS; chain=A> >>> # Changing a slice does NOT change the original ... pdb[:1,:].residues.name = 'MOL' >>> pdb.residues <Residue LYS; chain=A> >>> # However, changing a view DOES change the original ... pdb.view[:1,:].residues.name = 'MOL' >>> pdb.residues <Residue MOL; chain=A>
Two structures can be combined in the sense that a
Structure instance can be created by taking the atoms,
residues, and parameters of one
Structure and tacking it on to the end of
This can naturally be thought of as a sum of two
Structure instances, so
it was implemented via the addition operator. This can be done both in-place
(in a way that modifies the first
Structure) as well as creating a new copy
that is the sum of the originals. This is demonstrated below using AMBER
topology files of two small molecules, phenol and biphenyl, which can be found
test/files directory of the ParmEd distribution (
>>> phenol = load_file('phenol.prmtop') >>> biphenyl = load_file('biphenyl.prmtop') >>> phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> biphenyl <AmberParm 22 atoms; 1 residues; 23 bonds; parametrized> >>> phenol + biphenyl <AmberParm 35 atoms; 2 residues; 36 bonds; parametrized> >>> # Note that neither phenol or biphenyl have changed ... phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> biphenyl <AmberParm 22 atoms; 1 residues; 23 bonds; parametrized>
Note that the order of addition controls the order that the atoms are added to
Structure, as you would
>>> [len(residue) for residue in (phenol + biphenyl).residues] [13, 22] >>> [len(residue) for residue in (biphenyl + phenol).residues] [22, 13]
In-place addition is also supported, which can be noticeably more efficient than combining using standard addition, particularly for large systems. Note, if you are adding a large and a small structure together, adding the small one to the large one in-place is the most efficient way to do that. In-place combination is demonstrated below:
>>> phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> biphenyl <AmberParm 22 atoms; 1 residues; 23 bonds; parametrized> >>> phenol += biphenyl >>> phenol <AmberParm 35 atoms; 2 residues; 36 bonds; parametrized>
The addition preserves both the valence terms and the parameters from both
structures. All of the parameter type arrays (e.g.,
bond_types) will be
the sum of the type arrays from the two structures (you will see why this is
important in the next section about replicating structures).
One final comment to make is in regards to the type of the resulting
Structure instance. You can add any
Structure instances together,
including instances of subclasses (such as
AmberParm). The result will take the type of the
There are times when you also want to model several copies of the same
structure. Like with structure combining (described above), replicating is
implemented by overloading the natural mathematical operator – the
multiplication operator (
The only mode supported here is multiplying a
Structure instance by an integer, which indicates the
number of copies of the original structure will be added to the result. And
like with combination, replication can be done both in-place and not:
>>> phenol = load_file('phenol.prmtop') >>> phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> phenol * 2 <AmberParm 26 atoms; 2 residues; 26 bonds; parametrized> >>> 100 * phenol # multiplication can commute <AmberParm 1300 atoms; 100 residues; 1300 bonds; parametrized> >>> # phenol still hasn't changed ... phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> # In-place replicate... phenol WILL change now ... phenol *= 10 >>> phenol <AmberParm 130 atoms; 10 residues; 130 bonds; parametrized>
One comment about the parameter type arrays (e.g.,
bond_type) – unlike
structure combination, all replicates have the same parameters, so there is no
reason to enlarge the type arrays. As a result, all valence terms in each
replicate points to the same parameter type as that same valence term in the
As a result, adding a structure to itself will result in an equivalent
Structure instance (in that it will
have the same atom and residue order, valence terms and their order, and each
parameter will have the same type), but combining a structure with itself will
double the size of its type arrays, while replicating it will not.
Finally, replication is more efficient than combination arising from the simpler nature of replicating a structure than combining two different ones.
Sometimes you want to be able to manipulate individual molecules inside a
Structure instance individually. The
Structure.split method does just
this. It uses the bond graph in order to identify which atoms belong to which
molecules and then return a list of molecules along with how many times that
molecule occurs in the original
It only returns one copy of each molecule due to the cost of splitting off
potentially thousands of solvent molecules in larger solvated systems. As a
result, the return value is a list of
tuple instances where each tuple is
Structure (or subclass) instances
followed by the number of times that structure occurs. Using our phenol and
biphenyl examples from earlier:
>>> phenol = load_file('test/files/phenol.prmtop') >>> biphenyl = load_file('test/files/biphenyl.prmtop') >>> phenol <AmberParm 13 atoms; 1 residues; 13 bonds; parametrized> >>> biphenyl <AmberParm 22 atoms; 1 residues; 23 bonds; parametrized> >>> (phenol*10 + biphenyl*10).split() [(<AmberParm 13 atoms; 1 residues; 13 bonds; parametrized>, 10), (<AmberParm 22 atoms; 1 residues; 23 bonds; parametrized>, 10)]
Identifies the file format of the specified file and returns its parsed contents.
load_file() function automatically determines the format of the file
whose name is passed as an argument. The following formats are currently
recognized and result in the instantiation of either a
Structure or one of its subclasses:
Gromacs topology file
Amber topology file
CHARMM PSF file
CHARMM coordinate file
OpenMM Topology object
Tinker XYZ file
Here, we will focus on instantiating a
Structure instance from PDB and
mmCIF files. PDB files and mmCIF files downloaded from the RCSB Protein Data
Bank or the world wide Protein Data Bank often contain a large amount of
metadata describing the structure, such as the citation information,
experimental method (e.g., X-ray crystallography or NMR spectroscopy), authors,
and related database entries (such as BMRB entries for NMR-solved structures).
This information is extracted from both PDB and PDBx/mmCIF files when available,
along with anisotropic B-factors.
The following sections will briefly demonstrate parsing a PDB file and a mmCIF
file to a
For the purposes of this example, we will download the 4LZT structure as both a PDB file and a CIF file. These structures are both used in the ParmEd unittest suite, so you can get the files from there, or download them from here.
The first example demonstrates reading the PDB file:
>>> from parmed import load_file >>> lzt_pdb = load_file('4lzt.pdb') >>> len(lzt_pdb.atoms) 1164 >>> len(lzt_pdb.residues) 274 >>> lzt_pdb.experimental # See the experimental method 'X-RAY DIFFRACTION' >>> # See how many chains we have >>> chain_list = set() >>> for residue in lzt_pdb.residues: ... chain_list.add(residue.chain) ... >>> chain_list set(['A']) >>> # Only one chain. Now see how many atoms have alternate conformations >>> atoms_with_altconf = [atom for atom in lzt_pdb.atoms ... if atom.other_locations] >>> len(atoms_with_altconf) 19 >>> # Just for sanity's sake, make sure that the sum of all atoms in all >>> # residues is equal to the total number of atoms >>> sum(len(residue) for residue in lzt_pdb.residues) 1164 >>> len(lzt_pdb.atoms) 1164 >>> lzt_pdb.atoms.anisou # Look, we even get anisotropic B-factors! array([ 0.2066, 0.1204, 0.1269, 0.0044, 0.0126, 0.0191])
The second example demonstrates reading the CIF file:
>>> from parmed import load_file >>> lzt_cif = load_file('4LZT.cif') >>> len(lzt_cif.atoms) 1164 >>> len(lzt_cif.residues) 274 >>> lzt_cif.experimental # See the experimental method 'X-RAY DIFFRACTION' >>> # See how many chains we have >>> chain_list = set() >>> for residue in lzt_cif.residues: ... chain_list.add(residue.chain) ... >>> chain_list set(['A']) >>> # Only one chain. Now see how many atoms have alternate conformations >>> atoms_with_altconf = [atom for atom in lzt_cif.atoms ... if atom.other_locations] >>> len(atoms_with_altconf) 19 >>> # Just for sanity's sake, make sure that the sum of all atoms in all >>> # residues is equal to the total number of atoms >>> sum(len(residue) for residue in lzt_cif.residues) 1164 >>> len(lzt_pdb.atoms) 1164 >>> lzt_cif.atoms.anisou # Look, we even get anisotropic B-factors! array([ 0.2066, 0.1204, 0.1269, 0.0044, 0.0126, 0.0191])
Converting PDBx/mmCIF files to PDB files¶
If you noticed the
write_pdb methods attached to the
Structure class, you may have deduced
that you can very simply convert a PDBx/mmCIF file to a PDB file.
This is likely to be increasingly popular, since the PDB is moving to the mmCIF format, but many programs in the field of computational parmed and physics has decades worth of legacy code built around PDB files. Not to worry! A quick 1-liner will seamlessly convert PDBx to PDB:
from parmed import load_file load_file('4LZT.cif').write_pdb('4lzt_converted.pdb', write_anisou=True, renumber=False)
In this case, the metadata is not copied (i.e., the
AUTHOR records, to name a few). Only the coordinates, unit cell
CYRST1 record), and optionally anisotropic B-factor lines are translated.
renumber argument tells
Structure to use the original PDB numbers, rather than
its internal number scheme that numbers sequentially from 1 to N, where N is
the number of residues.
If your system has more than 99,999 atoms (and/or more than 9,999 residues), the numbering cycles back, such that the atom serial number after 99,999 is 0, and the numbering starts again.