File input/output with ParmEd

ParmEd has several convenience functions designed specifically to make file input/output easier for a wide range of file formats.

The API for parsing different file types is not always consistent because of the technical needs of each class. For instance, the Amber topology file class is a subclass of Structure, so it can be instantiated directly from a filename. PDBFile, on the other hand, provides a static parse method that returns a Structure instance rather than subclassing Structure.
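The two construction patterns can be contrasted with a minimal sketch. The toy `Structure`, `TopologyFile`, and `CoordinateReader` classes below are hypothetical stand-ins (not ParmEd classes), and the one-atom-name-per-line "file format" is invented purely for illustration.

```python
class Structure:
    """Toy stand-in for a parsed molecular system (hypothetical, not ParmEd's class)."""

    def __init__(self):
        self.atoms = []


class TopologyFile(Structure):
    """Pattern 1: the file class *is* a Structure, so its constructor parses the file."""

    def __init__(self, filename=None):
        super().__init__()
        if filename is not None:
            with open(filename) as f:
                # Invented format: one atom name per non-blank line
                self.atoms = [line.strip() for line in f if line.strip()]


class CoordinateReader:
    """Pattern 2: a helper class whose static method returns a plain Structure."""

    @staticmethod
    def parse(filename):
        struct = Structure()
        with open(filename) as f:
            # Keep only the first whitespace-separated field of each non-blank line
            struct.atoms = [line.split()[0] for line in f if line.strip()]
        return struct
```

With pattern 1 the result is an instance of the file class itself (`TopologyFile(path)`), while pattern 2 hands back a base `Structure` (`CoordinateReader.parse(path)`); this difference is what a single dispatching entry point has to hide.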

Reading files with `load_file`

load_file(filename, *args, **kwargs)

Identifies the file format of the specified file and returns its parsed contents.

To provide a single interface for parsing any type of file, the load_file function takes a filename and any relevant arguments or keyword arguments for the supported file formats and calls the appropriate parsing function. Supported file formats along with the extra supported keyword arguments (if applicable) are shown in the table below (the filename argument is omitted):

File type	Supported arguments
Amber ASCII restart file	None
Amber prmtop	`xyz`, `box`
Amber MDL (e.g., RISM)	None
Amber MDCRD trajectory	`natom`*, `hasbox`*
Amber OFF library	None
Amber frcmod/parm.dat	list of filenames
PDBx/mmCIF file	`skip_bonds`
CHARMM coordinate file	None
CHARMM restart file	None
Gromacs GRO file	`skip_bonds`
Gromacs topology file	`defines`, `parametrize`, `xyz`, `box`
Mol2 and Mol3 files	`structure`
NetCDF restart file	None
NetCDF trajectory file	None
PDB file	`skip_bonds`
PQR file	`skip_bonds`
PSF file	None
Serialized System XML	None
Serialized State XML	None
Serialized Integrator XML file	None
ForceField XML file	None

* These arguments are required when parsing files of the corresponding format.

The keyword arguments are described below:

xyz – Either a file name for a coordinate file or an array with the coordinates. If the unit cell information is stored in the coordinate file, it is used (unless an explicit box argument is given; see below). If the file is a trajectory file, the first frame is used for the coordinates.
box – The unit cell dimensions in Angstroms (and angles in degrees).
natom – The number of atoms in the system from which the file was written.
hasbox – Whether unit cell information is stored in this trajectory file.
defines – dict of preprocessor defines (order is respected if given via an OrderedDict).
parametrize – If True, parameters are assigned from the parameter database. If False, they are not (default is True).
structure – If True, return the Mol2/Mol3 file as a Structure instance. Default is False.
skip_bonds – If True, no attempt is made to identify covalent bonds between any atoms for these formats. If False (default), bonds are first assigned by comparing to standard residue templates, then based on distance criteria for any atom not specified within the templates. As a side effect, this also improves element identification for PDB files that lack element columns and substantially improves element identification for ions in GRO files. It is recommended that you keep the default (False) unless you only need the coordinates, in which case setting skip_bonds to True results in significantly better performance.
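To show why skipping bond perception saves time, here is a minimal sketch of a generic distance-based bond guess (bonded if the separation is below the sum of covalent radii plus a tolerance). This is a common textbook heuristic, not ParmEd's actual criterion; the radii table, the 0.4 Angstrom tolerance, and the `guess_bonds` name are all assumptions for illustration.

```python
import itertools
import math

# Approximate covalent radii in Angstroms (illustrative values, assumed)
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def guess_bonds(elements, coords, tolerance=0.4):
    """Return (i, j) index pairs closer than the sum of their radii plus tolerance.

    Hypothetical helper: checks every atom pair, so the cost grows as O(N**2).
    """
    bonds = []
    for i, j in itertools.combinations(range(len(elements)), 2):
        cutoff = COVALENT_RADII[elements[i]] + COVALENT_RADII[elements[j]] + tolerance
        if math.dist(coords[i], coords[j]) < cutoff:
            bonds.append((i, j))
    return bonds
```

For a water molecule, `guess_bonds(["O", "H", "H"], [(0, 0, 0), (0.96, 0, 0), (-0.24, 0.93, 0)])` finds the two O-H bonds but not the longer H-H contact. Production code would typically use a spatial grid or neighbor list instead of the all-pairs loop, but the per-atom work is still skipped entirely when bond perception is turned off.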

load_file() determines the format of the named file by inspecting its first few lines. Except in rare, pathological cases, this detection mechanism is robust. If a file fails detection, please report it on the ParmEd GitHub issue tracker so that file type prediction can be improved.
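Content-based detection can be sketched as a list of format predicates applied to the first few lines, with the first match winning. The detector functions, their order, and `identify_format` below are hypothetical simplifications, not the checks ParmEd actually performs; they only rely on well-known markers (`%VERSION` at the top of modern Amber prmtop files, `@<TRIPOS>` section headers in Mol2 files, fixed-column record names in PDB files).

```python
def looks_like_prmtop(lines):
    # Modern Amber topology files begin with a %VERSION line
    return bool(lines) and lines[0].startswith("%VERSION")


def looks_like_mol2(lines):
    # Mol2 sections are introduced by @<TRIPOS> headers
    return any(line.startswith("@<TRIPOS>") for line in lines)


def looks_like_pdb(lines):
    # PDB records start with six-character record names
    return any(line.startswith(("ATOM  ", "HETATM", "CRYST1", "HEADER")) for line in lines)


# Hypothetical registry: checked in order, first match wins
DETECTORS = [("amber", looks_like_prmtop), ("mol2", looks_like_mol2), ("pdb", looks_like_pdb)]


def identify_format(filename, nlines=20):
    """Guess a file's format from (at most) its first nlines lines."""
    with open(filename) as f:
        head = [line for _, line in zip(range(nlines), f)]
    for name, detector in DETECTORS:
        if detector(head):
            return name
    raise ValueError(f"Could not identify the format of {filename}")
```

Ordering matters in a scheme like this: stricter signatures (such as the prmtop first line) are checked before looser ones that could match many files.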

load_file() has a number of helpful features. For instance, files with a .gz or .bz2 suffix are automatically decompressed in memory using gzip or bzip2, respectively (except for some binary formats, such as NetCDF). Furthermore, URLs beginning with http://, https://, or ftp:// are valid file names; the remote file is downloaded and processed (again, in memory).
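A minimal sketch of suffix-based decompression and URL handling using only the Python standard library follows. The `open_maybe_compressed` helper is hypothetical and is not how ParmEd is implemented internally; it simply shows that callers can receive a text file object regardless of compression or location.

```python
import bz2
import gzip
import io
import urllib.request


def open_maybe_compressed(filename):
    """Return a text-mode file object, decompressing .gz/.bz2 and fetching URLs.

    Hypothetical helper for illustration only.
    """
    if filename.startswith(("http://", "https://", "ftp://")):
        # Download the whole remote file into memory (no temporary file on disk)
        with urllib.request.urlopen(filename) as response:
            data = response.read()
        if filename.endswith(".gz"):
            data = gzip.decompress(data)
        elif filename.endswith(".bz2"):
            data = bz2.decompress(data)
        return io.StringIO(data.decode("utf-8"))
    # Local files: decompress on the fly while reading
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    if filename.endswith(".bz2"):
        return bz2.open(filename, "rt")
    return open(filename, "r")
```

Because every branch returns a readable text stream, a format parser built on top of it never needs to know whether its input was compressed or remote. Binary formats such as NetCDF would need a different path, which is consistent with the exception noted above.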

Finally, you can always retrieve a Structure instance from file types that support returning one by passing the structure=True keyword to load_file. If this argument is not supported by the resulting file type, it is simply ignored, as are the natom, hasbox, and skip_bonds keywords.
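One way to get this "ignore if unsupported" behavior is to inspect the selected parser's signature and drop only a fixed set of ignorable keywords it does not accept, so that genuine typos still raise `TypeError`. The `call_parser` helper, the `IGNORABLE` set, and the toy parsers in the usage note are hypothetical; this is a sketch of the idea, not ParmEd's code.

```python
import inspect

# Keywords that may be silently dropped when a parser does not accept them
IGNORABLE = frozenset({"structure", "natom", "hasbox", "skip_bonds"})


def call_parser(parser, filename, **kwargs):
    """Call parser(filename, **kwargs), dropping unsupported ignorable keywords."""
    params = inspect.signature(parser).parameters
    # A parser that takes **kwargs can receive everything unchanged
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return parser(filename, **kwargs)
    # Keep accepted keywords; drop ignorable ones; let anything else raise TypeError
    kept = {k: v for k, v in kwargs.items() if k in params or k not in IGNORABLE}
    return parser(filename, **kept)
```

For example, with a toy `parse_psf(filename)` and `parse_mol2(filename, structure=False)`, `call_parser(parse_psf, "a.psf", structure=True)` quietly discards `structure`, whereas `call_parser(parse_psf, "a.psf", bogus=1)` still fails loudly.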

Writing files with `Structure.save`

Many of the file formats supported by ParmEd parse directly into either a Structure instance or an instance of a Structure subclass, and many of the file type conversions that ParmEd is designed to facilitate are between these formats (e.g., Amber topology, PDB, and CHARMM PSF files).

To facilitate the required conversion and file writing, the base Structure class has a save method that converts to the requested file format and writes the output file. The desired format is specified either explicitly or by the file name extension (an explicit format specification takes precedence). As a safeguard, Structure.save does not overwrite existing files by default: if the overwrite argument is False (its default), IOError is raised when the output file already exists, while overwrite=True allows the existing file to be overwritten. The supported file formats, along with their supported extra keyword arguments, are detailed in the following table.
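The overwrite guard can be sketched in a few lines. The `safe_write` helper below is hypothetical and only mirrors the documented behavior (refuse by default, raise IOError, allow with `overwrite=True`); it uses exclusive-creation mode `"x"` so the existence check and the file creation happen in one step, which avoids a race between checking and writing.

```python
def safe_write(filename, text, overwrite=False):
    """Write text to filename, refusing to clobber an existing file unless overwrite=True.

    Hypothetical helper illustrating the guard; not ParmEd's implementation.
    """
    # "x" fails if the file already exists; "w" truncates it
    mode = "w" if overwrite else "x"
    try:
        with open(filename, mode) as f:
            f.write(text)
    except FileExistsError as err:
        # IOError is an alias of OSError in Python 3
        raise IOError(f"{filename} exists; not overwriting") from err
```

Failing before anything is written means an existing file is never truncated by an accidental second save.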

File type	Recognized extension(s)	Format keyword	Supported arguments
PDB	`.pdb`	`pdb`	`charmm`*, `renumber`, `coordinates`, `altlocs`, `write_anisou`, `standard_resnames`
PDBx/mmCIF	`.cif`, `.pdbx`	`cif`	`renumber`, `coordinates`, `altlocs`, `write_anisou`, `standard_resnames`
PQR	`.pqr`	`pqr`	`renumber`, `coordinates`, `standard_resnames`
Amber prmtop	`.parm7`, `.prmtop`	`amber`	None
CHARMM PSF	`.psf`	`charmm`	`vmd`
Gromacs topology	`.top`	`gromacs`	`combine`, `parameters`
Gromacs GRO	`.gro`	`gro`	`precision`, `nobox`
Mol2	`.mol2`	`mol2`	`split`
Mol3	`.mol3`	`mol3`	`split`
Amber ASCII coordinates	`.rst7`, `.inpcrd`, `.restrt`	`rst7`	`title`, `time`
Amber NetCDF restart	`.ncrst`	`ncrst`	`title`, `time`

* PDB format only
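Format selection with explicit precedence can be sketched as a dictionary lookup on the extension, consulted only when no format is given. The mapping is transcribed from the table above; the `resolve_format` name, its `fmt` argument (standing in for the `format` keyword), and the case-insensitive extension match are assumptions for illustration.

```python
import os

# Extension-to-format mapping transcribed from the table above
EXTENSION_FORMATS = {
    ".pdb": "pdb", ".cif": "cif", ".pdbx": "cif", ".pqr": "pqr",
    ".parm7": "amber", ".prmtop": "amber", ".psf": "charmm",
    ".top": "gromacs", ".gro": "gro", ".mol2": "mol2", ".mol3": "mol3",
    ".rst7": "rst7", ".inpcrd": "rst7", ".restrt": "rst7", ".ncrst": "ncrst",
}


def resolve_format(filename, fmt=None):
    """Return the output format: an explicit fmt wins, else infer from the extension."""
    if fmt is not None:
        return fmt.lower()
    # Assumption: extensions are matched case-insensitively
    ext = os.path.splitext(filename)[1].lower()
    try:
        return EXTENSION_FORMATS[ext]
    except KeyError:
        raise ValueError(f"Cannot infer a format from extension {ext!r}") from None
```

For instance, `resolve_format("system.inpcrd")` gives `"rst7"`, while `resolve_format("system.pdb", "cif")` gives `"cif"` because the explicit format overrides the extension.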

The meanings and default values of each of the keywords are described in the next subsection.

Keywords

charmm – If True, the SEGID will be printed in columns 73 to 76 of the PDB file. Default is False.
renumber – If True, atoms and residues will be numbered according to their absolute index (starting from 1) in the system. If False, the numbers will be retained from their original source (e.g., in the original PDB file). Default is True.
coordinates – A set of coordinates for one or more frames. If more than one frame is provided, the resulting PDB or PDBx/mmCIF file will have multiple models defined. Default is None, in which case the coordinates stored on the atoms themselves are written.
altlocs – Allowable values are all (print all alternate locations for all conformers), first (print only the first alternate conformer), and occupancy (print the conformer with the highest occupancy). Default is all.
write_anisou – If True, print anisotropic B-factors for the various atoms (either as a separate CIF section or as ANISOU records in a PDB file). Default is False.
standard_resnames – If True, residue names will be regularized from common alternatives back to the PDB standard. For example, ASH and GLH will be translated to ASP and GLU, respectively, as they often refer to different protomers of aspartate and glutamate.
vmd – If True, write a VMD-style PSF file. This is very similar to XPLOR format PSF files. Default is False.
combine – Can be None to combine no molecules when writing a GROMACS topology file. A value of all will combine all of the molecules into a single moleculetype. Default is None.
parameters – Can be inline (write the parameters inline in the GROMACS topology file), or a string or file-like object. If it is the same as the topology file name, the parameters are written in the preceding sections of that topology file. Any other string is interpreted as the name of a file to which the parameters are written as an include topology file. Default is inline.
precision – The number of decimal places to print coordinates with in GRO files. Default is 3.
nobox – If True and the Structure does not have a unit cell defined, no box is written to the bottom of the GRO file. Otherwise, an enclosing box (buffered by 5 Angstroms) is written.
split – If True, all residues will be split into separate mol2 or mol3 entries in the same file (like the ZINC database, for example). If False, all residues will be part of the same mol2 or mol3 entry. Default is False.
title – Purely cosmetic; specifies the title written to the coordinate file.
time – Also cosmetic; specifies the time corresponding to the snapshot written to the coordinate file.
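The three altlocs options amount to a simple selection rule over one atom's alternate conformers. The sketch below is a hypothetical illustration of that rule, not ParmEd code; it assumes each conformer is given as an `(altloc_id, occupancy)` tuple in file order, and on an occupancy tie it keeps the first conformer encountered (the behavior of Python's `max`).

```python
def select_altlocs(conformers, altlocs="all"):
    """Pick which alternate locations of one atom to write.

    conformers: list of (altloc_id, occupancy) tuples in file order (assumed layout).
    """
    if altlocs == "all":
        return list(conformers)
    if altlocs == "first":
        return conformers[:1]
    if altlocs == "occupancy":
        # max() returns the first conformer among equal occupancies
        return [max(conformers, key=lambda c: c[1])]
    raise ValueError(f"Unknown altlocs option: {altlocs!r}")
```

With `conformers = [("A", 0.4), ("B", 0.6)]`, `"all"` keeps both, `"first"` keeps only `("A", 0.4)`, and `"occupancy"` keeps only `("B", 0.6)`.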

Examples

The following examples use various files from the ParmEd test suite, which can be found in the test/files/ directory of the ParmEd source code:

>>> import parmed as pmd
>>> # Load a Mol2 file
... pmd.load_file('tripos1.mol2')
<ResidueTemplate DAN: 31 atoms; 33 bonds; head=None; tail=None>
>>> # Load a Mol2 file as a Structure
... pmd.load_file('tripos1.mol2', structure=True)
<Structure 31 atoms; 1 residues; 33 bonds; NOT parametrized>
>>> # Load an Amber topology file
... parm = pmd.load_file('trx.prmtop', xyz='trx.inpcrd')
>>> parm
<AmberParm 1654 atoms; 108 residues; 1670 bonds; parametrized>
>>> # Load a CHARMM PSF file
... psf = pmd.load_file('ala_ala_ala.psf')
>>> psf
<CharmmPsfFile 33 atoms; 3 residues; 32 bonds; NOT parametrized>
>>> # Load a PDB and CIF file
... pdb = pmd.load_file('4lzt.pdb')
>>> cif = pmd.load_file('4LZT.cif')
>>> pdb
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> cif
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> # Load a Gromacs topology file -- only works with Gromacs installed
... top = pmd.load_file('1aki.ff99sbildn.top')
>>> top
<GromacsTopologyFile 40560 atoms [9650 EPs]; 9779 residues; 30934 bonds; parametrized>

Any of the Structure subclasses shown above can be saved as any other kind of Structure class or subclass for which the conversion is supported. For instance, a raw PDB file has no defined parameters, so it cannot be saved as an Amber topology file. An Amber topology file, on the other hand, has all of the information required for a PDB, and so that conversion is supported:

>>> parm.save('test_parm.pdb')
>>> # You can also convert CIF to PDB
... cif.save('test_cif.pdb')
>>> # Or you can convert PDB to CIF
... pdb.save('test_pdb.cif')
>>> # Check the resulting saved files
... pmd.load_file('test_parm.pdb')
<Structure 1654 atoms; 108 residues; 1670 bonds; NOT parametrized>
>>> pmd.load_file('test_cif.pdb')
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
>>> pmd.load_file('test_pdb.cif')
<Structure 1164 atoms; 274 residues; 1043 bonds; PBC (triclinic); NOT parametrized>
