Protein domains are difficult to define in terms that everyone accepts. Those who believe that a domain is simply a segment of a protein chain that has been shown to fold autonomously when removed from the sequence and structural context of its parent chain may be surprised to learn that the protein science and engineering community defines (or identifies) domains in well over ten different ways, and that new ways of defining domains and locating their boundaries are still being developed and discussed. Into this melee, I have thrown a new way of identifying and defining protein domains.

I actually did this work in the late 1990s, but did not publish it at the time: two bioinformatics journals found the work very interesting but asked for validation (especially statistical validation) on more than one hundred proteins, whereas I had tested the method on only three. Lacking the wherewithal and skills to test it on a much larger set of proteins, I let the manuscript lie unpublished for many years, until I gave it, as an invited contribution, to this new journal published by a group of scientists from India, just to let the idea appear in the literature.

Briefly, the method takes the entire structure of a multi-domain, single-subunit protein and determines the energy of all non-bonded interactions (ENBI) in it. One amino acid is then removed from the C-terminus of the chain, and the ENBI is recalculated. Proceeding in this manner, one removes amino acids one by one from the C-terminus, calculating and listing the change in ENBI at each step.
Then, one plots the ENBI as a function of the decreasing chain length, that is, across the series of truncated structures, each of which has one amino acid fewer than the previous one. I show that inflections in this curve provide excellent pointers to domain boundaries: for at least three different proteins, the boundaries match those identified by crystallographic analysis of C-alpha atom clustering, and some potential new domains and domain boundaries are identified as well. The method works mostly because the chain sections of a larger polypeptide that make up a single, autonomously folding structural domain tend to return (in space) towards the existing domain structure, reinforcing stability through more and more non-bonded interactions. When the chain sets off in a new direction to begin a new domain (tracing the chain’s trajectory within the structure), however, it stops adding non-bonded contacts for a few amino acids, or even a few tens of amino acids, because further contacts occur only when the chain ‘notionally’ returns to the same region of space after having struck out in a totally new direction. This gives rise, every time, to a noticeable inflection in the curve of ENBI against chain length. If you find this interesting, please read the paper for the full explanation. I was quite excited when I did this work, and wanted to put it out there for others to take up and test more rigorously, using data sets of several hundred proteins. I am still excited, even though this rigorous testing never got done.
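
For readers who think in code, the truncation scan described above can be sketched in a few lines of Python. This is only an illustration under loudly stated assumptions, not the calculation used in the paper. Each residue is reduced to a single pseudo-atom, the non-bonded energy is a toy 12-6 potential with arbitrary parameters (`r0`, `eps`, `cutoff`), near neighbours along the chain are skipped via a hypothetical `min_sep`, and the "protein" is made-up data: two compact clusters joined by a straight linker. The function names are invented for this sketch.

```python
import math

def pair_energy(r, r0=5.0, eps=1.0, cutoff=10.0):
    """Toy 12-6 energy between two pseudo-atoms; every parameter is an assumption."""
    if r >= cutoff:
        return 0.0
    s6 = (r0 / max(r, 1.0)) ** 6  # clamp r to avoid dividing by zero
    return eps * (s6 * s6 - 2.0 * s6)

def truncation_profile(coords, min_sep=3):
    """profile[n] = toy ENBI of the chain truncated to its first n residues."""
    profile = [0.0]
    for j in range(len(coords)):
        # Interactions residue j adds (equivalently, those lost when it is removed).
        gained = sum(pair_energy(math.dist(coords[i], coords[j]))
                     for i in range(j - min_sep + 1))
        profile.append(profile[-1] + gained)
    return profile

def flat_stretches(profile, tol=1e-3, min_len=3, skip=0):
    """Ranges of chain length over which the ENBI barely changes.

    Lengths up to `skip` are ignored, since the first few residues have
    no partners at all once near neighbours are excluded.
    """
    stretches, start = [], None
    for n in range(skip + 1, len(profile)):
        flat = abs(profile[n] - profile[n - 1]) < tol
        if flat and start is None:
            start = n
        elif not flat and start is not None:
            if n - start >= min_len:
                stretches.append((start, n - 1))
            start = None
    if start is not None and len(profile) - start >= min_len:
        stretches.append((start, len(profile) - 1))
    return stretches

def toy_chain():
    """Two compact 8-residue cubes joined by a straight 10-residue linker (made-up data)."""
    path = [(0, 0, 0), (5, 0, 0), (5, 5, 0), (0, 5, 0),
            (0, 5, 5), (5, 5, 5), (5, 0, 5), (0, 0, 5)]
    linker = [(0.0, 0.0, 5.0 + 3.8 * k) for k in range(1, 11)]
    second = [(x, y, z + 48.0) for x, y, z in path]
    return path + linker + second

if __name__ == "__main__":
    profile = truncation_profile(toy_chain())
    # Each reported stretch is a candidate domain boundary region.
    print(flat_stretches(profile, skip=3))
```

In this toy chain the profile goes flat while the linker runs away from the first cluster and starts changing again only once the second cluster begins to pack against itself, which is the behaviour the paragraph above attributes to real domain boundaries. Applying the idea to an actual protein would require coordinates read from a structure file and a proper non-bonded energy function, neither of which is attempted here.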
