Protein domains are difficult to define in terms acceptable to everyone. Those who believe that a domain is simply a piece of a protein’s chain that has been shown to fold autonomously when taken out of the sequence and structural context of its parent chain may be surprised to learn that the protein science and engineering community defines (or identifies) domains in well over ten different ways, and that new ways of defining domains and identifying their boundaries are still being developed and discussed. Into this melee, I have thrown a new way of identifying and defining protein domains. Actually, I did this work in the late 1990s, but did not publish it in the bioinformatics journals of that time: two such journals found the work very interesting but required validation (especially statistical validation) through testing on more than one hundred proteins, whereas I had tested the method on only three. Lacking the wherewithal and skills to test it on a much larger set of proteins, I let the manuscript lie unpublished for many years, until I gave it as an invited contribution to this new journal, published by a group of scientists from India, just to let the idea appear in the literature. Briefly, the method takes the entire structure of a multi-domain, single-subunit protein and determines the energy of all non-bonded interactions in it. One amino acid is then removed from the C-terminus of the chain, and the energy of non-bonded interactions (ENBI) is recalculated.
Amino acids are removed in this manner, one by one, from the C-terminus, and the ENBI is recalculated at each step, for structures that differ from one another only in that each has one amino acid fewer than the previous one. Plotting ENBI as a function of the shrinking chain length, I show that inflections in the resulting curve provide excellent pointers to the boundaries of domains. For three proteins, these boundaries match those identified by crystallographic analysis of C-alpha atom clustering; in fact, some potential new domains and domain boundaries are also identified. The method works because the chain sections of a larger polypeptide that constitute a single structural, or autonomously folding, domain mostly tend to return to the existing domain structure and reinforce its stability through non-bonded interactions. When the chain sets off in a new direction to begin a new domain (as one traces its trajectory within the structure), then for the length of a few amino acids, or even a few tens of amino acids, it stops adding non-bonded contacts, because such contacts arise only when the chain ‘notionally’ returns to that region of space after having struck out in a totally new direction. This gives rise to a noticeable inflection in the curve of ENBI against chain length. Please read the paper if you find this stuff interesting. I was quite excited when I did this work, and wanted to put it out there for others to take up and test even more rigorously, using data sets of several hundred proteins. I am still excited, even though this rigorous testing never got done.
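
For readers who want to experiment, the truncation scan described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the procedure used in the paper: it works on one point per residue (for example C-alpha positions), scores non-bonded contacts with a toy 12-6 Lennard-Jones term, and every name and parameter in it (`pair_energy`, `enbi`, `truncation_profile`, `quiet_stretches`, `sigma`, `epsilon`, `cutoff`, `min_separation`, `tol`, `min_run`) is hypothetical.

```python
import math

def pair_energy(r, sigma=4.0, epsilon=1.0):
    """Toy 12-6 Lennard-Jones term; sigma and epsilon are illustrative values."""
    ratio = (sigma / r) ** 6
    return 4.0 * epsilon * (ratio * ratio - ratio)

def enbi(coords, min_separation=3, cutoff=12.0):
    """Sum the toy pair term over residues at least `min_separation` apart
    in sequence and closer than `cutoff` in space (assumed values)."""
    total = 0.0
    for j in range(len(coords)):
        for i in range(j - min_separation + 1):
            r = math.dist(coords[i], coords[j])
            if r < cutoff:
                total += pair_energy(r)
    return total

def truncation_profile(coords, **kwargs):
    """Remove residues one at a time from the C-terminus and recompute ENBI.
    Returns (chain_length, energy) pairs, longest chain first."""
    return [(n, enbi(coords[:n], **kwargs)) for n in range(len(coords), 0, -1)]

def quiet_stretches(profile, tol=0.05, min_run=3):
    """Find runs of at least `min_run` consecutive residues (1-based) whose
    removal changes ENBI by less than `tol`: candidate boundary regions."""
    ordered = sorted(profile)  # ascending chain length
    runs, start = [], None
    for (_, e_prev), (n, e) in zip(ordered, ordered[1:]):
        quiet = abs(e - e_prev) < tol
        if quiet and start is None:
            start = n
        elif not quiet and start is not None:
            if n - start >= min_run:
                runs.append((start, n - 1))
            start = None
    if start is not None and ordered[-1][0] - start + 1 >= min_run:
        runs.append((start, ordered[-1][0]))
    return runs

if __name__ == "__main__":
    # Made-up geometry: two compact five-residue clusters joined by a
    # three-residue stretch that touches neither of them.
    cluster = [(0, 0, 0), (3.8, 0, 0), (3.8, 3.8, 0), (0, 3.8, 0), (0, 3.8, 3.8)]
    linker = [(50.0, 0, 0), (100.0, 0, 0), (150.0, 0, 0)]
    toy = cluster + linker + [(x + 200.0, y, z) for x, y, z in cluster]
    for length, energy in truncation_profile(toy):
        print(length, round(energy, 3))
    print(quiet_stretches(toy_profile := truncation_profile(toy)))
```

On the made-up geometry, the reported quiet stretch covers the linker and also the first few residues of the second cluster, which have no partners yet within that cluster. That is the same effect the argument above relies on: a chain that strikes out in a new direction adds no contacts until it returns to a region it has already visited. A real test would need all-atom coordinates and a proper force field, and the first few residues of any chain are trivially quiet, so runs at the N-terminus should be ignored.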
