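pepstats reads one or more protein sequences and writes an output file with various statistics on the protein properties. These include molecular weight, absorption coefficients, the DayhoffStat value for each amino acid, and the probability of expression in inclusion bodies.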
Absorption coefficients use values read from the EMBOSS data file 'Eamino.dat'. Values in this file assume cysteines are reduced. If cysteines are in disulphide bridges, the values should be adjusted as documented at the top of the file, and a local copy used to override the default values.
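As a rough illustration of why the oxidation state of cysteine matters, the sketch below sums per-residue contributions to a molar absorption coefficient at 280 nm and adds a separate term for each disulphide bridge. It is not the pepstats implementation. The function name, its parameters and the numeric values are assumptions chosen for illustration (commonly quoted approximations), and they may differ from the values in Eamino.dat, which remains the authoritative source.

```python
# Assumed per-residue molar absorption at 280 nm (M^-1 cm^-1).
# Illustrative approximations only; pepstats reads its values from Eamino.dat.
ASSUMED_EPSILON = {"W": 5500.0, "Y": 1490.0}
# Assumed contribution of one disulphide bridge (cystine).
ASSUMED_CYSTINE_EPSILON = 125.0


def molar_extinction_280(sequence, disulphide_bridges=0,
                         epsilon=ASSUMED_EPSILON,
                         cystine_epsilon=ASSUMED_CYSTINE_EPSILON):
    """Estimate a molar absorption coefficient from residue counts.

    With disulphide_bridges=0 all cysteines are treated as reduced and
    contribute nothing; each bridge adds cystine_epsilon.
    """
    seq = sequence.upper()
    if disulphide_bridges < 0 or 2 * disulphide_bridges > seq.count("C"):
        raise ValueError("more disulphide bridges than cysteine pairs")
    reduced = sum(seq.count(res) * eps for res, eps in epsilon.items())
    return reduced + disulphide_bridges * cystine_epsilon


reduced_value = molar_extinction_280("WYYCC")
bridged_value = molar_extinction_280("WYYCC", disulphide_bridges=1)
```

With these assumed values the two results differ only by the cystine term, which is the adjustment the data file asks the user to make when bridges are present.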
Molecular weights are read from a local data file Emolwt.dat.
DayhoffStat is the amino acid's molar percentage divided by the Dayhoff statistic. The Dayhoff statistic is read from the EMBOSS data file Edayhoff.freq and is the amino acid's relative occurrence per 1000 aa normalised to 100.
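The definition above can be sketched in a few lines. This is a minimal illustration, not the pepstats source: the function name is hypothetical, and the reference table passed in below uses made-up round numbers rather than the real contents of Edayhoff.freq.

```python
from collections import Counter


def dayhoff_stats(sequence, dayhoff):
    """Return {residue: (count, molar_percent, dayhoff_stat)}.

    `dayhoff` maps one-letter residue codes to their Dayhoff statistic
    (relative occurrence normalised to 100).
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    total = len(seq)
    result = {}
    for residue, reference in dayhoff.items():
        count = counts.get(residue, 0)
        molar_percent = 100.0 * count / total
        # DayhoffStat = molar percentage / Dayhoff statistic
        result[residue] = (count, molar_percent, molar_percent / reference)
    return result


# Hypothetical reference values for three residues, for illustration only.
EXAMPLE_DAYHOFF = {"A": 8.0, "G": 8.0, "C": 2.0}
stats = dayhoff_stats("AAGC", EXAMPLE_DAYHOFF)
```

A DayhoffStat above 1 means the residue is more frequent in the sequence than the reference frequency; a value below 1 means it is rarer.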
The probability of expression in inclusion bodies is sometimes referred to as a type of solubility measure. It should be interpreted with care, however: a recombinant protein expressed in Escherichia coli can be expressed as soluble in the cytosol or as insoluble in inclusion bodies. If the Harrison model predicts that a given protein will probably be expressed in inclusion bodies, this does not mean that it is impossible to obtain it in soluble form in the cytosol. One example: Thermotoga maritima cell division protein FtsA with a C-terminal His-tag has a 58% Harrison probability of being expressed in inclusion bodies. However, there was plenty of soluble protein in the E. coli cytosol (F. van den Ent and J. Lowe, EMBO J. 19, 5300-5307, 2000). Whether or not the protein is expressed in inclusion bodies depends not only on the sequence but also on many other factors, such as E. coli strain, incubation temperature, type of expression vector, strength of promoter and medium.