File input/output with ParmEd¶
ParmEd has several convenience functions designed specifically to make file input/output easier for a wide range of file formats.
The API for parsing different file types is not always consistent due to the
technical needs of each class. For instance, the Amber topology file class
subclasses from Structure
, so it can be
instantiated directly from a filename. On the other hand, PDBFile
contains a parse
static method that returns a
Structure
instance directly rather than
subclassing.
Reading files with load_file
¶
|
Identifies the file format of the specified file and returns its parsed contents. |
To provide a single interface for parsing any type of file, the
load_file
function takes a filename
and any relevant arguments or keyword arguments for the supported file formats
and calls the appropriate parsing function. Supported file formats along with
the extra supported keyword arguments (if applicable) are shown in the table
below (the filename argument is omitted):
File type |
Supported arguments |
---|---|
Amber ASCII restart file |
None |
Amber prmtop |
|
Amber MDL (e.g., RISM) |
None |
Amber MDCRD trajectory |
|
Amber OFF library |
None |
Amber frcmod/parm.dat |
list of filenames |
PDBx/mmCIF file |
|
CHARMM coordinate file |
None |
CHARMM restart file |
None |
Gromacs GRO file |
|
Gromacs topology file |
|
Mol2 and Mol3 files |
|
NetCDF restart file |
None |
NetCDF trajectory file |
None |
PDB file |
|
PQR file |
|
PSF file |
None |
Serialized System XML |
None |
Serialized State XML |
None |
Serialized Integrator XML file |
None |
ForceField XML file |
None |
*These arguments are required when parsing the corresponding format file
The optional keyword arguments are described below:
xyz
– Either a file name for a coordinate file or an array with the coordinates. If the unit cell information is stored in the coordinate file, it is used (unless an explicitbox
argument is given; see below). If the file is a trajectory file, the first frame is used for the coordinates.box
– The unit cell dimensions in Angstroms (and angles in degrees).natom
– The number of atoms from which the file was writtenhasbox
– Whether unit cell information is stored in this trajectory filedefines
–dict
of preprocessor defines (order is respected if given via anOrderedDict
)parametrize
– IfTrue
, parameters are assigned from the parameter database. IfFalse
, they are not (default isTrue
).structure
– IfTrue
, return the Mol2/Mol3 file as aStructure
instance. Default isFalse
skip_bonds
– IfTrue
, no attempt is made to identify covalent bonds between any atoms for these formats. IfFalse
(default), bonds are first assigned by comparing to standard residue templates, then based on distance criteria for any atom not specified within the templates. This has a side-effect of improving element identification for PDB files that have no element columns specified as well as substantially improving element identification for ions in GRO files. You are suggested to keep the default (False
) unless you are only using the coordinates, in which case settingskip_bonds
toTrue
will result in significantly improved performance.
load_file()
automatically inspects the contents of the file with the given
name to determine what format the file is based on the first couple lines.
Except in rare, pathological cases, the file format detection mechanism is
fairly robust. If any files fail this detection, feel free to file an issue on
the Github issue tracker to improve file type prediction.
load_file()
has a number of helpful features. For instance, files ending
with the .gz
or .bz2
suffix will automatically be decompressed in-memory
using Gzip or Bzip2, respectively (except for some binary file formats, like
NetCDF). Furthermore, URLs beginning with http://
, https://
, or
ftp://
are valid file names and will result in the remote file being
downloaded and processed (again, in-memory).
Finally, to make it so that you can always retrieve a Structure
instance from file types that support returning
one by passing the structure=True
keyword to load_file
. If this argument
is not supported by the resulting file type, it is simply ignored, as is the
natom
, hasbox
, and skip_bonds
keywords.
Writing files with Structure.save
¶
Many of the file formats supported by ParmEd either parse directly to a
Structure
instance or subclass, and many
of the desired file type conversions that ParmEd is designed to facilitate are
between these formats (e.g., Amber topology, PDB, CHARMM PSF file, etc.).
To facilitate the required conversion and file writing, the base
Structure
class has a save
method that
will convert to the requested file format and write the output file. The desired
format is specified either explicitly or by file name extension (with explicit
format specifications taking precedence). Because Structure.save
is a convenience method, it will protect
against accidentally overwriting an existing file. The overwrite
argument,
when set to True
, will allow an existing file to be overwritten. If set to
False
(or left at its default), IOError
will be raised when attempting
to overwrite an existing file. The supported file formats, along with their
supported extra keyword arguments, are detailed in the following table.
File type |
Recognized extension(s) |
Format keyword |
Supported arguments |
---|---|---|---|
PDB |
|
|
|
PDBx/mmCIF |
|
|
|
PQR |
|
|
|
Amber prmtop |
|
|
None |
CHARMM PSF |
|
|
|
Gromacs topology |
|
|
|
Gromacs GRO |
|
|
|
Mol2 |
|
|
|
Mol3 |
|
|
|
Amber ASCII coordinates |
|
|
|
Amber NetCDF restart |
|
|
|
* PDB format only
The meanings and default values of each of the keywords is described in the next subsection.
Keywords¶
charmm
– IfTrue
, the SEGID will be printed in columns 73 to 76 of the PDB file (default isFalse
)renumber
– IfTrue
, atoms and residues will be numbered according to their absolute index (starting from 1) in the system. IfFalse
, the numbers will be retained from their original source (e.g., in the original PDB file). Default isTrue
coordinates
– A set of coordinates for one or multiple frames. If more than one frame is provided, the resulting PDB or PDBx/mmCIF file will have multiple models defined. Default isNone
, and the generated coordinates are the ones stored on the atoms themselves.altlocs
– Allowable values areall
(print all alternate locations for all conformers),first
(print only the first alternate conformer), andoccupancy
(print the conformer with the highest occupancy). Default isall
write_anisou
– IfTrue
, print anistropic B-factors for the various atoms (either as a separate CIF section or asANISOU
records in a PDB file). Default isFalse
standard_resnames
– IfTrue
, residue names will be regularlized from common alternatives back to the PDB standard. For example, ASH and GLH will be translated to ASP and GLU, respectively, as they often refer to different protomers of aspartate and glutamate.vmd
– IfTrue
, write a VMD-style PSF file. This is very similar to XPLOR format PSF files. Default isFalse
.combine
– Can beNone
to combine no molecules when writing a GROMACS topology file. A value ofall
will combine all of the molecules into a single moleculetype. Default isNone
.parameters
– Can beinline
(write the parameters inline in the GROMACS topology file). Can also be a string or a file-like object. If it is the same as the topology file name, it will be written in the previous sections of the GROMACS topology file. Other strings will be interpreted as filenames to print the parameters to as an include topology file. Default isinline
.precision
– The number of decimal places to print coordinates with in GRO files. Default is 3.nobox
– IfTrue
and theStructure
does not have a unit cell defined, no box is written to the bottom of the GRO file. Otherwise, an enclosing box (buffered by 5 angstroms) is writtensplit
– IfTrue
, all residues will be split into separate mol2 or mol3 entries in the same file (like the ZINC database, for example). IfFalse
, all residues will be part of the same mol2 or mol3 entry. Default isFalse
.title
– Purely cosmetic, it will specify the title that will be written to the coordinate filestime
– Also cosmetic, this is the time corresponding to the snapshot that will be written to the coordinate files
Examples¶
The following examples use various files from the ParmEd test suite, which can
be found in the test/files/
directory of the ParmEd source code:
>>> import parmed as pmd
>>> # Load a Mol2 file
... pmd.load_file('tripos1.mol2')
<ResidueTemplate DAN: 31 atoms; 33 bonds; head=None; tail=None>
>>> # Load a Mol2 file as a Structure
... pmd.load_file('tripos1.mol2', structure=True)
<Structure 31 atoms; 1 residues; 33 bonds; NOT parametrized>
>>> # Load an Amber topology file
... parm = pmd.load_file('trx.prmtop', xyz='trx.inpcrd')
>>> parm
<AmberParm 1654 atoms; 108 residues; 1670 bonds; parametrized>
>>> # Load a CHARMM PSF file
... psf = pmd.load_file('ala_ala_ala.psf')
>>> psf
<CharmmPsfFile 33 atoms; 3 residues; 32 bonds; NOT parametrized>
>>> # Load a PDB and CIF file
... pdb = pmd.load_file('4lzt.pdb')
>>> cif = pmd.load_file('4LZT.cif')
>>> pdb
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> cif
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> # Load a Gromacs topology file -- only works with Gromacs installed
... top = pmd.load_file('1aki.ff99sbildn.top')
>>> top
<GromacsTopologyFile 40560 atoms [9650 EPs]; 9779 residues; 30934 bonds; parametrized>
Any of the Structure
subclasses shown
above can be saved as any other kind of Structure
class or subclass for which the conversion is
supported. For instance, a raw PDB file has no defined parameters, so it cannot
be saved as an Amber topology file. An Amber topology file, on the other hand,
has all of the information required for a PDB, and so that conversion is
supported:
>>> parm.save('test_parm.pdb')
>>> # You can also convert CIF to PDB
... cif.save('test_cif.pdb')
>>> # Or you can convert PDB to CIF
... pdb.save('test_pdb.cif')
>>> # Check the resulting saved files
... pmd.load_file('test_parm.pdb')
<Structure 1654 atoms; 108 residues; 1670 bonds; NOT parametrized>
>>> pmd.load_file('test_cif.pdb')
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> pmd.load_file('test_pdb.cif')
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>