by James Apgar, Mary Ross, Xiao Zuo, Sarah Dohle, Derek Sturtevant, Binzhang Shen, Humberto de la Vega, Philip Lessard, Gabor Lazar, R. Michael Raab
Inteins are intervening protein domains with self-splicing ability that can be used as molecular switches to control activity of their host protein. Successfully engineering an intein into a host protein requires identifying an insertion site that permits intein insertion and splicing while allowing for proper folding of the mature protein post-splicing. By analyzing sequence and structure based properties of native intein insertion sites we have identified four features that showed significant correlation with the location of the intein insertion sites, and therefore may be useful in predicting insertion sites in other proteins that provide native-like intein function. Three of these properties, the distance to the active site and dimer interface site, the SVM score of the splice site cassette, and the sequence conservation of the site showed statistically significant correlation and strong predictive power, with area under the curve (AUC) values of 0.79, 0.76, and 0.73 respectively, while the distance to secondary structure/loop junction showed significance but with less predictive power (AUC of 0.54). In a case study of 20 insertion sites in the XynB xylanase, two features of native insertion sites showed correlation with the splice sites and demonstrated predictive value in selecting non-native splice sites. Structural modeling of intein insertions at two sites highlighted the role that the insertion site location could play on the ability of the intein to modulate activity of the host protein. These findings can be used to enrich the selection of insertion sites capable of supporting intein splicing and hosting an intein switch.