IS Families/IS200-IS605 family: Difference between revisions
== Historical ==
One of the founding members of this group, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], was identified in ''[[wikipedia:Salmonella_enterica_subsp._enterica|Salmonella typhimurium]]'' <ref name=":11">{{#pmid:6313217}}</ref> as a mutation in ''hisD'' ([https://www.ncbi.nlm.nih.gov/nuccore/X56834.1 hisD984]) which mapped as a point mutation but which did not revert and was polar on the downstream ''hisC'' gene (see <ref name=":3">{{#pmid:15179601}}</ref>). [[wikipedia:Salmonella_enterica_subsp._enterica|''S. typhimurium'' LT2]] was found to contain six [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] copies and the IS was unique to ''[[wikipedia:Salmonella_enterica_subsp._enterica|Salmonella]]'' <ref name=":15">{{#pmid:6315530}}</ref>. Further studies <ref name=":13">{{#pmid:3009825}}</ref> showed that the IS did not carry repeated sequences, either '''direct''' or '''inverted''', at its ends, and that removal of 50 bp at the transposase proximal end (which includes a structure resembling a transcription terminator) removed the strong transcriptional block. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] elements from ''[[wikipedia:Salmonella_enterica_subsp._enterica|S. typhimurium]]'' and ''[[wikipedia:Salmonella_enterica_subsp._enterica|S. abortusovis]]'' revealed a highly conserved structure of 707–708 bp with a single open-reading-frame potentially encoding a 151 aa peptide and a putative upstream [[wikipedia:Ribosome-binding_site|ribosome-binding-site]] <ref name=":17">{{#pmid:9060429}}</ref>. | |||
It has been suggested that inefficient transcription, protection from impinging transcription by a transcriptional terminator, and repression of translation by a stem-loop mRNA structure all contribute to tight repression of transposase synthesis <ref name=":3" />. However, although [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] seems to be relatively inactive in transposition <ref>{{#pmid:2546038}}</ref>, it is involved in chromosome rearrangements in ''[[wikipedia:Salmonella_enterica_subsp._enterica|S. typhimurium]]'' by recombination between copies <ref>{{#pmid:8601470}}</ref>.
A second group of “founding” members of this family was, arguably, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] from the [[wikipedia:Thermophile|thermophilic bacterium PS3]] <ref name=":16">{{#pmid:7557457}}</ref>, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS891 IS''891''] from [[wikipedia:Anabaena|''Anabaena'' sp]]. M-131 <ref name=":8">{{#pmid:2553665}}</ref> and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1136 IS''1136''] from ''[[wikipedia:Saccharopolyspora_erythraea|Saccharopolyspora erythraea]]'' <ref>{{#pmid:8386127}}</ref>. The “transposases” of both types of element ([https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200'']-like and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like) were observed to be associated in a single IS, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''], from the gastric pathogen ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]]'' <ref name=":12">{{#pmid:9858724}}</ref>. It was identified in many independent isolates of ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' and is now considered to be a central member which defines this large family. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] was shown to possess unique, not inverted repeat, ends; did not duplicate target sequences during transposition; and inserted with its left ([https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200'']-homolog) end abutting 5'-'''TTTAA''' or 5'-'''TTTAAC''' target sequences <ref name=":12" />. Additionally, a second derivative, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS606 IS''606''], with only 25% amino acid identity in the two proteins (''orfA'' and ''orfB''), was also identified in many of the ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' isolates, including some which were devoid of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605'']. The Berg lab also identified another ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' IS, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] <ref name=":5">{{#pmid:10986230}}</ref>, which carried a similar [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like orf (''orfB'') but with another upstream orf with similarities to that of the mycobacterial [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1535 IS''1535''] <ref>{{#pmid:10220167}}</ref>, annotated as a resolvase due to the presence of a site-specific [[wikipedia:Site-specific_recombination|serine recombinase]] motif. Another [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] derivative, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''Hp608''], which appeared widely distributed in [[wikipedia:Helicobacter_pylori|''H. pylori'']], was shown to transpose in ''[[wikipedia:Escherichia_coli|E. coli]]'', required only ''orfA'' to transpose and inserted downstream from a 5’-'''TTAC''' target sequence <ref name=":6">{{#pmid:11807059}}</ref>.
== General == | |||
The IS''200''/IS''605'' family members transpose using obligatory '''s'''ingle-'''s'''trand (ss) DNA intermediates <ref name=":22">{{#pmid:26104715}}</ref> by a mechanism called “'''peel and paste'''”. They differ fundamentally in organization from classical IS: they have sub-terminal palindromic structures rather than terminal '''IRs''' ([[:File:FigIS200 605 1.png|Fig. IS200.1]]) and insert 3’ to specific AT-rich tetra- or penta-nucleotides without duplicating the target site.
[[File:FigIS200 605 1.png|alt=|center|thumb|640x640px|'''Fig. IS200.1.''' Genetic organization. '''Left''' (LE) and '''right''' (RE) ends carrying the subterminal hairpin (HP) are presented as red and blue boxes, respectively. Left and right cleavage sites (CL and CR) are presented as black and blue boxes respectively, where the black box also represents element-specific tetra-/pentanucleotide target site (TS). The cleavage positions are indicated by small vertical arrows. Gray arrows: ''tnpA'' and ''tnpB'' open reading frames (orfs); '''(i)''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group with ''tnpA'' alone; '''(ii)''' to '''(iv)''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group with ''tnpA'' and ''tnpB'' in different configurations; '''(v)''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group with ''tnpB'' alone.]] | |||
The transposase, TnpA, is a member of the HUH enzyme superfamily ([[wikipedia:Relaxase|Relaxases]], Rep proteins of RCR plasmids/ss phages, bacterial and eukaryotic transposases of [[IS Families/IS91-ISCR families|IS''91''/IS''CR'' and Helitrons]]<ref>{{#pmid:26350323}}</ref><ref name=":1">{{#pmid:23832240}}</ref>)([[:File:FigIS200 605 2rev.png|Fig. IS200.2]]) which all catalyze cleavage and rejoining of ssDNA substrates. | |||
[[File:FigIS200 605 2rev.png|alt=|center|thumb|720x720px|'''Fig. IS200.2.''' The IS''200''/IS''605'' family transposases are “minimal” and the smallest transposases presently known. They include the HUH and Y motifs and use Y as the attacking nucleophile to generate 5’ phosphotyrosine covalent intermediates. HUH transposases from other transposon families include additional domains.]]
[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], the founding member ([[:File:FigIS200 605 1.png|Fig. IS200.3]]), was identified 30 years ago in ''[[wikipedia:Salmonella_enterica_subsp._enterica|Salmonella typhimurium]] <ref name=":11" />'', but there has been renewed interest in these elements since the identification of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group in ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]] <ref name=":12" />''<ref>{{#pmid:9631304}}</ref><ref name=":6" />. Studies of two elements of this group, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] from ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] from the radiation-resistant ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]'', have provided a detailed picture of their mobility <ref name=":23">{{#pmid:16209952}}</ref><ref name=":24">{{#pmid:16163392}}</ref><ref name=":7">{{#pmid:18280236}}</ref><ref name=":0">{{#pmid:18243097}}</ref><ref name=":32">{{#pmid:20090938}}</ref><ref name=":25">{{#pmid:20691900}}</ref><ref name=":9">{{#pmid:20890269}}</ref>.
[[File:FigIS200 605 3.png|alt=|center|thumb|720x720px|'''Fig. IS200.3.''' '''Top''': [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] Secondary structures in '''LE ('''red) and '''RE''' (blue), promoter (pL), [[wikipedia:Ribosome-binding_site|'''R'''ibosome '''B'''inding '''S'''ite]] ('''RBS'''), and ''tnpA'' start and stop codons (AUG and UAA) are indicated. '''(i)''' DNA top strand with perfect palindromes at LE and RE in red and blue, interior stem-loop in black, '''(ii)''' RNA stem-loop structure in transcript originated from pL. '''Bottom:''' ''tnpA'' transcription originates at about nt 40, but promoter elements are not defined; the ‘left end’ contains two internal inverted repeats (opposing arrows), one of which acts as a transcription terminator (nts 12–34). The second, (nts 69–138) in the 5’UTR of the tnpA mRNA sequesters the [[wikipedia:Shine-Dalgarno_sequence|Shine-Dalgarno]] sequence. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] in ''[[wikipedia:Salmonella|Salmonella]]'' also expresses a 90 nt sRNA (asRNA, art200, or STnc490) perfectly complementary to the 5’UTR and the first three codons of ''tnpA''. The transcription start site and 3’ end for art200 in ''[[wikipedia:Salmonella|Salmonella]]'' (derived from RNA-Seq experiments) are shown, but promoter elements were not previously defined.]] | |||
== Distribution and Organization ==
The family is widely distributed in prokaryotes with more than 153 distinct members (89 are distributed over 45 genera and 61 species of [[wikipedia:Bacteria|bacteria]], and 64 are from [[wikipedia:Archaea|archaea]]). It is divided into three major groups based on the presence or absence and on the configuration of two genes: the transposase ''tnpA'' (https://www.ncbi.nlm.nih.gov/research/cog/cog/COG1943/), sufficient to promote IS mobility ''in vivo'' and ''in vitro'', and ''tnpB'' (https://www.ncbi.nlm.nih.gov/research/cog/cog/COG0675/) ([[:File:FigIS200 605 1.png|Fig. IS200.1]]), initially of unknown function and not required for transposition activity but now known to be an RNA-guided endonuclease (see [http://tnpedia.fcav.unesp.br/index.php/IS_Families/IS200-IS605_family#TnpB TnpB below]). These groups are: [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']. TnpB is also present in another [[IS Families/IS607 family|IS family, IS''607'']], which uses a [[wikipedia:Site-specific_recombination|serine-recombinase]] as a transposase. In the phylogeny of this group of IS ([[:File:FigIS200 605 4A.png|Fig. IS200.4A]]), ''tnpA'' and ''tnpB'' genes of bacterial and archaeal origin are intercalated, suggesting some degree of horizontal transfer between these two groups of organisms<ref name=":2">{{#pmid:17347521}}</ref>.
[[File:FigIS200 605 4A.png|center|thumb|720x720px|'''Fig. IS200.4.''' '''(i)''' Phylogeny based on ''tnpB'' of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200'']/[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605'']/[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] family. '''(ii)''' Phylogeny based on ''tnpA'' of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] family ([[wikipedia:Site-specific_recombination|serine recombinase]]). '''(iii)''' Phylogeny based on ''tnpA'' of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] family (HUH transposase). IS''608'' elements are underlined, single ''orfB'' elements are indicated between brackets, and the asterisk indicates the mosaic construction of the elements of this family (see the text). The various Archaea have been color-coded as follows for clarity: [[wikipedia:Sulfolobales|Sulfolobales]], red; [[wikipedia:Thermoplasmatales|Thermoplasmatales]], magenta; [[wikipedia:Halophile|halophiles]], green; [[wikipedia:Methanogen|methanogens]], blue; “other,” orange. Bacteria are indicated in black.]]
Isolated copies of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200'']-like ''tnpA'' can be identified in both bacteria and archaea<ref name=":2" />. Full-length copies of IS''605''-like elements are also found in bacteria and several archaea, and all have corresponding MITE ('''M'''iniature '''I'''nverted repeat '''T'''ransposable '''E'''lement) derivatives in their host genomes.
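The three-group division described above depends only on which of the two genes an element carries. The following is a minimal sketch of that rule, not an ISfinder procedure: the function name <code>classify_element</code> and its two arguments are hypothetical, and the encoding of the transposase type as the strings "HUH" and "serine" is an assumption made for illustration.

```python
from typing import Optional


def classify_element(transposase: Optional[str], has_tnpB: bool) -> str:
    """Assign an element to a group from its gene content.

    transposase: "HUH" for an IS200-like tnpA, "serine" for a serine
                 recombinase, or None when no transposase gene is present
                 (hypothetical encoding, chosen for this sketch only).
    has_tnpB:    True when a tnpB gene is present.
    """
    if transposase == "HUH":
        # tnpA alone -> IS200 group; tnpA plus tnpB -> IS605 group
        return "IS605 group" if has_tnpB else "IS200 group"
    if transposase == "serine" and has_tnpB:
        # tnpB associated with a serine recombinase belongs to another family
        return "IS607 family"
    if transposase is None and has_tnpB:
        # tnpB alone -> IS1341 group
        return "IS1341 group"
    return "unclassified"


print(classify_element("HUH", False))   # tnpA only
print(classify_element("HUH", True))    # tnpA and tnpB
print(classify_element(None, True))     # tnpB only
```

The sketch ignores the relative orientation of ''tnpA'' and ''tnpB'', which distinguishes configurations within the IS''605'' group but does not change group membership.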
====The IS''200'' group==== | |||
[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group members encode only ''tnpA'', and are present in gram-positive and gram-negative bacteria and certain archaea<ref name=":3" /><ref>{{#pmid:10418150}}</ref> ([[:File:FigIS200 605 1.png|Fig. IS200.1]] and [[:File:FigIS200 605 3.png|Fig. IS200.3]]). Alignment of TnpA from various members shows that they are highly conserved but may carry short C-terminal tails of variable length and sequence. Among approximately 400 entries in ISfinder (December 2023), about 50 are IS''200''-like derivatives.
They can occur in relatively high copy number (e.g. >50 copies of IS''1541'' in ''[[wikipedia:Yersinia_pestis|Yersinia pestis]]'') and are among the smallest known autonomous IS, with lengths generally between 600 and 700 bp. Some members such as [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISW1 IS''W1''] (from ''[[wikipedia:Wolbachia|Wolbachia]]'' sp.) or [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISPrp13 IS''Prp13''] (from ''[[wikipedia:Photobacterium_profundum|Photobacterium profundum]]'') are even shorter.
[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] was initially identified as an insertion mutation in the ''[[wikipedia:Salmonella_enterica_subsp._enterica|Salmonella typhimurium]]'' histidine operon <ref name=":11" />. It is abundant in different ''[[wikipedia:Salmonella|Salmonella]]'' strains and has now also been identified in a variety of other enterobacteria such as ''[[wikipedia:Escherichia|Escherichia]]'', ''[[wikipedia:Shigella|Shigella]]'' and ''[[wikipedia:Yersinia|Yersinia]]''. | |||
Different enterobacterial [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] copies have almost identical lengths of between 707 and 711 bp. Analysis of the ECOR (''[[wikipedia:Escherichia_coli|E. coli]]'') and SARA (''[[wikipedia:Salmonella|Salmonellae]]'') collections showed that the level of sequence divergence between [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] copies from these hosts is equivalent to that observed for chromosomally encoded genes from the same taxa<ref>{{#pmid:8253675}}</ref><ref>{{#pmid:8384142}}</ref>. This suggests that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] was present in the common ancestor of ''[[wikipedia:Escherichia_coli|E. coli]]'' and ''[[wikipedia:Salmonella|Salmonellae]]''.
In spite of its abundance, an enigma of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] behavior is its poor contribution to spontaneous mutation in its original ''[[wikipedia:Salmonella|Salmonella]]'' host: only very rare insertion events have been documented <ref name=":3" />. One reason for these rare insertions could be poor expression of the TnpA<sub>IS''200''</sub> gene from a weak promoter, pL, identified at the left IS end (LE)<ref name=":13" /><ref name=":17" /> ([[:File:FigIS200 605 3.png|Fig. IS200.3]]).
Besides the characteristic major subterminal palindromes <ref name=":13" />, presumed binding sites of the transposase at both '''LE''' and the right end ('''RE''') (Substrate recognition), [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] also carries a potential supplementary interior stem-loop structure ([[:File:Fig. IS200.3.png|Fig. IS200.3]]). The LE palindrome and the interior stem-loop both play a role in regulating [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] gene expression. The first (perfect palindrome at LE; nts 12–34) overlaps the TnpA<sub>IS''200''</sub> promoter pL, can act as a bi-directional transcription terminator upstream of TnpA<sub>IS''200''</sub> and terminates up to 80% of transcripts<ref name=":4">{{#pmid:10471738}}</ref> ([[:File:Fig. IS200.3.png|Fig. IS200.3]]). The second (interior stem-loop; nts 69–138), at the RNA level, can repress mRNA translation by sequestration of the [[wikipedia:Ribosome-binding_site|'''R'''ibosome '''B'''inding '''S'''ite (RBS)]] ([[:File:Fig. IS200.3.png|Fig. IS200.3]]). Experimental data suggested that the stem-loop is formed ''in vivo'' and that its removal by mutagenesis caused up to a 10-fold increase in protein production<ref name=":4" />. Recent deep sequencing analysis revealed another aspect of post-transcriptional regulation of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] expression: a small anti-sense RNA ('''asRNA''') that inhibits [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] transposase expression ([[:File:Fig. IS200.3.png|Fig. IS200.3]]) was identified as a substrate of [[wikipedia:Hfq_protein|Hfq]], an RNA chaperone involved in post-transcriptional regulation in numerous bacteria<ref>{{#pmid:18725932}}</ref>.
Interestingly, asRNA and [[wikipedia:Hfq_protein|Hfq]] independently inhibit [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] transposase expression: knock-out of both components resulted in a synergistic increase in transposase expression. Moreover, footprint data showed that [[wikipedia:Hfq_protein|Hfq]] binds directly to the 5’ part of the transposase transcript and blocks access to the [[wikipedia:Ribosome-binding_site|RBS]]<ref>{{#pmid:26044710}}</ref>. | |||
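Both regulatory structures discussed above are stem-loops, i.e. a short sequence followed, after a loop, by its reverse complement. As an illustration only, the sketch below scans a DNA string for such perfect inverted repeats. The function names, the parameter defaults and the toy sequence are hypothetical; the real IS''200'' coordinates (nts 12–34 and 69–138) are not reproduced here, and real stems may contain mismatches that this simple scan would miss.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_stem_loops(seq: str, stem_len: int = 5,
                    min_loop: int = 3, max_loop: int = 8):
    """List (start, stem_len, loop_len) for every perfect stem-loop.

    A hit is a window of stem_len bases that is followed, after a loop of
    min_loop..max_loop bases, by its exact reverse complement.
    """
    hits = []
    for start in range(len(seq) - 2 * stem_len - min_loop + 1):
        left_arm = seq[start:start + stem_len]
        wanted = reverse_complement(left_arm)
        for loop_len in range(min_loop, max_loop + 1):
            right_start = start + stem_len + loop_len
            right_arm = seq[right_start:right_start + stem_len]
            if len(right_arm) == stem_len and right_arm == wanted:
                hits.append((start, stem_len, loop_len))
    return hits


# Toy sequence (not from IS200): a GGGCC stem, a TTTT loop, then GGCCC.
print(find_stem_loops("GGGCCTTTTGGCCC"))
```

A window whose right arm pairs with the left arm in this way could, in a transcript, occlude any ribosome-binding site that lies within the stem, which is the kind of sequestration described for the interior stem-loop.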
In spite of its very low transposition activity, an increase in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] copy number was observed during strain storage in stab cultures<ref name=":11" /><ref name=":15" />. However, the factors triggering this activity remain unknown<ref name=":3" />. Transient high transposase expression leading to a burst of transposition was proposed to explain the high [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] copy number (>20) observed in various hosts and in stab cultures <ref name=":11" />.
Although regulatory structures similar to those observed in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] ([[:File:Fig. IS200.3.png|Fig. IS200.3]]) were predicted in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541 IS''1541''], another member of this group with 85% identity to [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], this element can be detected in higher copy number (> 50) in ''[[wikipedia:Salmonella|Salmonella]]'' and ''[[wikipedia:Yersinia|Yersinia]]'' genomes. However, no detailed analysis of its transposition is available, and since no de novo insertions have been experimentally documented and chromosomal copies appear stable in ''[[wikipedia:Yersinia_pestis|Y. pestis]]''<ref>{{#pmid:9422611}}</ref>, it remains possible that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541 IS''1541''] also behaves like [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''].
However, the regulatory structures are not systematically present in other [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group members and understanding of the control of transposase synthesis requires further study. | |||
====The IS''605'' group==== | |||
[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group members are generally longer (1.6-1.8 kb) due to the presence of a second ''orf'', ''tnpB'', in addition to ''tnpA''. Alignment of TnpA copies from this group indicated that although they do not form a separate clade from the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group TnpA, they generally carry the short C-terminal tail. The ''tnpA'' and ''tnpB'' orfs exhibit various configurations with respect to each other. They may be divergent ([[:File:Fig. IS200.1.png|Fig. IS200.1]] '''i''' top: e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS606 IS''606'']) or expressed in the same direction with ''tnpA'' upstream of ''tnpB''. In these latter cases, the orfs may be partially overlapping ([[:File:Fig. IS200.1.png|Fig. IS200.1]] '''ii'''; e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']) or separate ([[:File:FigIS200 605 1.png|Fig. IS200.1]] '''iii'''; e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSCpe2 IS''SCpe2''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISEfa4 IS''Efa4'']). ''tnpB'' is also sometimes associated with another transposase, a member of the S-transposases (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']''<ref name=":5" />''<ref name=":45">{{#pmid:24195768}}</ref>; see <ref name=":22" />). TnpB was not required for transposition of either [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] or [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''].
Three related IS, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS606 IS''606''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] ([[:File:Fig. IS200.1.png|Fig. IS200.1]]), have been identified in numerous strains of the gastric pathogen ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]] <ref name=":12" /><ref name=":6" />''. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] is involved in genomic rearrangements in various ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' isolates<ref>{{#pmid:9789049}}</ref>.
The ''[[wikipedia:Helicobacter_pylori|H. pylori]]'' elements transpose in ''[[wikipedia:Escherichia_coli|E. coli]]'' at detectable frequencies in a standard "mating-out" assay using a derivative of the conjugative [[wikipedia:Fertility_factor_(bacteria)|F plasmid]] as a target <ref name=":12" /><ref name=":6" />. | |||
[[File:FigIS200 605 4B.png|center|thumb|720x720px|'''Fig. IS200.4B. An IS''605'' Group Tree.''' Distribution based on Xiang et al. <ref name=":14">{{#pmid:37386294}}</ref>. The different colors represent the 8 TnpB clusters identified, layered onto the tree of life (A new view of the tree of life <ref name=":10">{{#pmid:27572647}}</ref>). Figure kindly provided by Yuanqing Li.]]
The two best characterized members of this family are [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and the closely related [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] from ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]''. Both have overlapping ''tnpA'' and ''tnpB'' genes ([[:File:FigIS200 605 1.png|Fig. IS200.1]] '''ii'''). Like other family members, insertion is sequence-specific: [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] inserts in a specific orientation with its left end 3’ to the tetranucleotide TTAC both ''in vivo'' and ''in vitro<ref name=":6" />'' while [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] inserts 3’ to the pentanucleotide TTGAT<ref>{{#pmid:14676423}}</ref>. Interestingly [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] transposition in its highly radiation resistant Deinococcal host is strongly induced by irradiation<ref name=":262">{{#pmid:17006450}}</ref> (Single strand DNA ''in vivo''). Their detailed transposition pathway has been deciphered by a combination of ''in vivo'' studies and ''in vitro'' biochemical and structural approaches ([[IS Families/IS200 IS605 family#Mechanism of IS200.2FIS605 single strand DNA transposition|Mechanism of IS''200''/IS''605'' single strand DNA transposition]]). | |||
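Because insertion occurs 3’ to a short element-specific sequence, candidate insertion points in a target DNA can be listed by a simple scan. The sketch below is an illustration under stated assumptions, not a published tool: the function name <code>candidate_insertion_sites</code>, the strand labels and the toy target sequence are hypothetical, and the scan reports every sequence match on either strand without modelling any strand or single-strand-DNA preference that the element may show ''in vivo''.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def candidate_insertion_sites(seq: str, target: str):
    """Return sorted (position, strand) pairs for insertion 3' to target.

    position is a 0-based top-strand coordinate between two bases:
      "+" : target found on the top strand, insertion right after it
      "-" : target found on the bottom strand, whose 3' side lies just
            before the match when read in top-strand coordinates
    """
    seq = seq.upper()
    target = target.upper()
    target_rc = reverse_complement(target)
    width = len(target)
    sites = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if window == target:
            sites.append((i + width, "+"))
        if window == target_rc:
            sites.append((i, "-"))
    return sorted(sites)


# Toy target DNA (hypothetical); TTAC is the IS608 target tetranucleotide.
print(candidate_insertion_sites("AATTACGGGTAACC", "TTAC"))
# The same scan with the ISDra2 pentanucleotide TTGAT.
print(candidate_insertion_sites("CCTTGATCC", "TTGAT"))
```

Since no target duplication occurs, each reported position is a single junction rather than a pair of flanking repeats.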
A more detailed and recent analysis of the distribution of 107 IS''605'' group elements in ISfinder is shown in [[:File:FigIS200 605 4B.png|Fig. IS200.4B]] <ref name=":14" />. The tree, based on TnpB sequences, could be divided into 8 clusters, which are overlaid onto the universal tree described by Hug et al., 2016 <ref name=":10" />.
====The IS''1341'' group==== | |||
Elements of the third group, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''], are devoid of ''tnpA'' and carry only ''tnpB'' ([[:File:FigIS200 605 1.png|Fig. IS200.1]] '''v'''). The IS occurs in three copies in [[wikipedia:Thermophile|Thermophilic bacterium]] PS3 <ref name=":16" />. Multiple presumed full-length elements (including ''tnpA'' and ''tnpB'') and closely related copies have been identified in other bacteria such as ''[[wikipedia:Geobacillus|Geobacillus]]''. On the other hand, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS891 IS''891''] from the cyanobacterium ''[[wikipedia:Anabaena|Anabaena]]'' is present in multiple copies on the chromosome and is thought to be mobile since a copy was observed to have inserted into a plasmid introduced in the strain<ref name=":8" />. | |||
Another isolated ''tnpB''-related gene, ''[https://www.uniprot.org/uniprot/Q50HS5 gipA]'', present in the ''[[wikipedia:Salmonella|Salmonella]]'' Gifsy-1 prophage, may be a virulence factor since a ''gipA'' null mutation compromised ''[[wikipedia:Salmonella|Salmonella]]'' survival in a Peyer's patch assay <ref>{{#pmid:10913072}}</ref>. While no mobility function has been suggested for ''gipA'', it is indeed bordered by structures characteristic of IS''200''/IS''605'' family ends and is closely related to ''[[wikipedia:Escherichia_coli|E. coli]]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISEc42 IS''Ec42''].
In spite of their presence in multiple copies, it is still unclear whether [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members are autonomous IS or are products of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group degradation that require TnpA supplied from a related IS in the same cell for transposition.
====IS decay==== | |||
Circumstantial evidence based on analysis of the [https://isfinder.biotoul.fr/ ISfinder database] suggests that IS carrying both ''tnpA'' and ''tnpB'' genes may be unstable. Thus, although members of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group are often present in high copy number in their host genomes, intact full-length [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group members are invariably found in low copy number ([https://scholar.google.com/citations?user=WHAtfqcAAAAJ&hl=pt-BR P. Siguier], unpublished) (See also [[IS Families/IS200-IS605 family#TnpB|TnpB]]). On the other hand, various truncated [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group derivatives appear quite frequently ([[:File:IS200.slide.show.1.A.png|Fig. IS200.slide show 1]], [[:File:IS200.slide.show.2.A.png|slide show 2]] ,[[:File:IS200.slide.show.3.A.png|slide show 3]], [[:File:IS200.slide.show.4.A.png|slide show 4]], and [[:File:IS200.slide.show.5.A.png|slide show 5]]).<gallery mode="slideshow"> | |||
File:IS200.slide.show.1.A.png|'''Fig. IS200.slide show 1.''' Decay of ''[[wikipedia:Campylobacter|Campylobacter coli]]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''] | |||
File:IS200.slide.show.1.B.png|'''Fig. IS200.slide show 1.''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''] Insertion Sites and '''LE''' and '''RE''' Cleavage | |||
File:IS200.slide.show.1.C.png|'''Fig. IS200.slide show 1.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''] Copies | |||
File:IS200.slide.show.1.D.png|'''Fig. IS200.slide show 1.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''] Copies | |||
File:IS200.slide.show.1.E.png|'''Fig. IS200.slide show 1.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''] Copies | |||
</gallery><gallery mode="slideshow"> | |||
File:IS200.slide.show.2.A.png|'''Fig. IS200.slide show 2.''' Decay of ''[[wikipedia:Cyanothece|Cyanothece]]'' sp. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp14 IS''Cysp14'']''.'' | |||
File:IS200.slide.show.2.B.png|'''Fig. IS200.slide show 2.''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp13 IS''Cysp13''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp14 IS''Cysp14''] '''LE''' and '''RE''' Cleavage Sites | |||
File:IS200.slide.show.2.C.png|'''Fig. IS200.slide show 2.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp14 IS''Cysp14''] Copies | |||
File:IS200.slide.show.2.D.png|'''Fig. IS200.slide show 2.''' Alignment of ''[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp14 ISCysp14]'' Copies | |||
</gallery><gallery mode="slideshow"> | |||
File:IS200.slide.show.3.A.png|'''Fig. IS200.slide show 3.''' Decay of [[wikipedia:Synechococcus|''Synechococcus'' sp.]] [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] | |||
File:IS200.slide.show.3.B.png|'''Fig. IS200.slide show 3.''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] '''LE''' and '''RE''' Cleavage Sites | |||
File:IS200.slide.show.3.C.png|'''Fig. IS200.slide show 3.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] Copies | |||
File:IS200.slide.show.3.D.png|'''Fig. IS200.slide show 3.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] Copies | |||
File:IS200.slide.show.3.E.png|'''Fig. IS200.slide show 3.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] Copies | |||
File:IS200.slide.show.3.F.png|'''Fig. IS200.slide show 3.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] Copies | |||
File:IS200.slide.show.3.G.png|'''Fig. IS200.slide show 3.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] Copies | |||
</gallery><gallery mode="slideshow"> | |||
File:IS200.slide.show.4.A.png|'''Fig. IS200.slide show 4.''' Decay of ''[[wikipedia:Synechococcus_elongatus|Thermosynechococcus elongatus]]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] | |||
File:IS200.slide.show.4.B.png|'''Fig. IS200.slide show 4.''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] '''LE''' and '''RE''' Cleavage Sites | |||
File:IS200.slide.show.4.C.png|'''Fig. IS200.slide show 4.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] Copies | |||
File:IS200.slide.show.4.D.png| '''Fig. IS200.slide show 4.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] Copies | |||
File:IS200.slide.show.4.E.png|'''Fig. IS200.slide show 4.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] Copies | |||
File:IS200.slide.show.4.F.png|'''Fig. IS200.slide show 4.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] Copies | |||
File:IS200.slide.show.4.G.png|'''Fig. IS200.slide show 4.''' Alignment of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''] Copies | |||
</gallery><gallery mode="slideshow"> | |||
File:IS200.slide.show.5.A.png| '''Fig. IS200.slide show 5.''' Decay of ''[[wikipedia:Synechococcus_elongatus|T. elongatus]]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel3 IS''Tel3''] Towards MICs | |||
File:IS200.slide.show.5.B.png|'''Fig. IS200.slide show 5.''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel3 IS''Tel3''] '''LE''' and '''RE''' Cleavage Sites | |||
</gallery>These forms seem to result from successive internal deletions and retain intact '''LE''' and '''RE''' copies. Sometimes, as in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3''] ([[:File:IS200.slide.show.3.A.png|slide show 3]]), orf inactivation appears to have occurred by successive insertion/deletion of short sequences (indels) generating frameshifts and truncated proteins. For some IS (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCco1 IS''Cco1''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCysp14 IS''Cysp14''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISSoc3 IS''Soc3'']) degradation can be precisely reconstituted and each successive step validated by the presence of several identical copies ([https://scholar.google.com/citations?user=WHAtfqcAAAAJ&hl=pt-BR P. Siguier], unpublished - [[:File:IS200.slide.show.1.A.png|Fig. IS200.slide show 1]], [[:File:IS200.slide.show.2.A.png|slide show 2]], [[:File:IS200.slide.show.3.A.png|slide show 3]], [[:File:IS200.slide.show.4.A.png|slide show 4]], and [[:File:IS200.slide.show.5.A.png|slide show 5]], respectively). This suggests that the degradation process is recent and that these derivatives are likely mobilized by TnpA supplied ''in trans'' by autonomous copies in the genome. | |||
Among the approximately 400 IS''200''/IS''605'' family entries in [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder] (December 2023), there are more than 200 examples of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like derivatives. It was suggested that the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like derivatives might undergo transposition using a resident ''tnpA'' gene to supply a Y1 transposase ''in trans.'' There is some circumstantial evidence for transposition of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like elements. For example, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS891 IS''891''], present in multiple copies in the cyanobacterium ''[[wikipedia:Anabaena|Anabaena]]'' sp. strain M-131 genome <ref name=":8" />, was observed to have inserted into a plasmid which had been introduced into the strain. More recently, it has been shown experimentally that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] derivatives can be mobilized by a resident ''tnpA'' gene <ref name=":19">{{#pmid:37758954}}</ref> (see [[IS Families/IS200 IS605 family#The IS1341 Conundrum: how do derivatives without their transposase transpose.3F|The IS''1341'' Conundrum]]). Decay can be followed from a full-length IS to the formation of MITEs (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel2 IS''Tel2'']; [[:File:IS200.slide.show.4.A.png|slide show 4]]) and MICs (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTel3 IS''Tel3'']; [[:File:IS200.slide.show.5.A.png|slide show 5]]). | |||
====ISC: A group of Elements Related to the IS''605'' Group==== | |||
Another group of potential IS of similar organisation, the ISC insertion sequence group, was defined by Kapitonov et al.<ref name=":29">{{#pmid:PMC4810608}}</ref> following identification of [[wikipedia:Cas9|Cas9]] homologues which occur outside the [[wikipedia:CRISPR|CRISPR]] structure, so called “stand-alone” homologues. While related to TnpB, they are more similar to [[wikipedia:Cas9|Cas9]] than to TnpB proteins. These genes were often flanked by short DNA sequences which, like '''LE''' and '''RE''' of the IS''200''/IS''605'' family, were capable of forming secondary structures. Moreover, it was reported that the ends of many ISC derivatives showed significant identity to those of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] derivatives identified by these authors in the same study ([[:File:FigIS200 605 5.png|Fig. IS200.5]]). These structures therefore resemble the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like group. | |||
[[File:FigIS200 605 5.png|center|thumb|720x720px|'''Fig. IS200.5.''' Potential secondary structures in IS''200''/IS''605''/ISC ends. For the IS''200''/IS''605'' members, the sequences of the left ('''LE''') and right ('''RE''') ends are shown in red and blue respectively. Note that these structures have only been verified for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''Dra2'']. The other sequences are from Kapitonov et al. <ref name=":29" />. The potential secondary structures are indicated by horizontal blue arrows and bold type face.]] | |||
These potential transposable elements were called '''ISC''' (Insertion Sequences Encoding [[wikipedia:Cas9|Cas9]]; not to be confused with [https://tncentral.ncc.unesp.br/TnPedia/index.php/IS_Families/IS91-ISCR_families#ISCR IS''CR'', IS with '''C'''ommon '''R'''egion]). The name IscB was coined for the [[wikipedia:Cas9|Cas9-like]] protein and IscA for an associated potential transposase protein which was identified in a very limited number of cases. Examples of ISC elements with both ''iscA'' and ''iscB'' genes are quite rare. Only 7 cases were identified by Kapitonov et al.<ref name=":29" /> ([[:File:FigIS200 605 6.png|Fig. IS200.6]]) and only 56 of 2811 ''iscB'' examples observed in a more extensive analysis were accompanied by an ''iscA'' copy <ref name=":30">{{#pmid:34591643}}</ref>. Most ISC identified were [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like with only the ''iscB'' (''tnpB''-like) gene. These stand-alone IscB copies were identified in a large number of bacterial and archaeal genomes, generally in low numbers (<10 copies), although some genomes contained more elevated numbers (e.g. 22 in ''[[wikipedia:Methanosarcina|Methanosarcina lacustris]]''; 25 in ''[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=64178 Coleofasciculus chthonoplastes]'' PCC 7420; 52 in ''[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=363277&lvl=3&lin=f&keep=1&srchmode=1&unlock Ktedonobacter racemifer]'')<ref name=":29" />. | |||
However, in contrast to the observations of Kapitonov et al.,<ref name=":29" /> more wide-ranging studies <ref name=":30" /> identified rare IscB proteins which were not “stand alone” but were associated with [[wikipedia:CRISPR|CRISPR]] arrays (31 examples in a sample of 2811). | |||
A tree of “full-length” elements ([[:File:FigIS200 605 6.png|Fig. IS200.6]]; <ref name=":29" />)(i.e. those with both ''tnpA'' and ''tnpB'' or ''iscB'' genes) based on TnpA/IscA sequences showed that full length [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and ISC examples carrying both ''tnpA''/''iscA'' and ''tnpB''/''iscB'' are interleaved. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] is among those family members with divergent ''tnpA'' and ''tnpB'' genes ([[:File:FigIS200 605 1.png|Fig. IS200.1]]) while other family members carry ''tnpA'' upstream of ''tnpB'' (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']). However, in contrast to all [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605'']-like derivatives, those full length ISC elements included in this tree all have the ''iscA'' gene downstream of and slightly overlapping with ''iscB''. | |||
[[File:FigIS200 605 6.png|center|thumb|720x720px|'''Fig. IS200.6.''' Phylogenetic tree of Y1 transposases encoded by IS''605'' (TnpA) and IS''C2Y'' (IscA). From Kapitonov et al. <ref name=":29" />. TnpA: RED; IscA: LIGHT RED; TnpB: GREY; IscB: BLUE. The arrowheads indicate the direction of expression. The IS were identified in: '''KR''', ''[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=363277 Ktedonobacter racemifer]'' DSM44963; '''CS''', ''[[wikipedia:Coprobacillus|Coprobacillus]]'' sp. 3_3_56FAA; '''EC''', ''[[wikipedia:Enterococcus|Enterococcus cecorum]]'' DSM 20682 (ATCC 43198); '''AA''', ''[[wikipedia:Anaeromusa_acidaminophila|Anaeromusa acidaminophila]]'' DSM 3853; '''CH''', ''[[wikipedia:Clostridium_novyi|Clostridium haemolyticum]]'' NCTC 9693; '''MA''', ''[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=313606 Microscilla marina]'' ATCC 23134; '''VB''', ''[https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=553239 Vibrio breoganii]''; '''BC''', ''[[wikipedia:Bacteroides|Bacteroides coprophilus]]''; '''MM''', ''[[wikipedia:Methanosarcina|Methanosarcina mazei]]''; '''MZ''', ''[[wikipedia:Methanosalsum|Methanosalsum zhilinae]]''; '''EH''', ''[[wikipedia:Anaerobutyricum_hallii|Eubacterium hallii]]'' DSM 3353; '''BSp''', ''[[wikipedia:Butyrivibrio|Butyrivibrio]]'' sp. MB2005; '''BMT2''', ''[[wikipedia:Bacillus|Bacillus]]'' sp. MT2; '''FP''', ''Francisella philomiragia''; '''HP''', ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]]'' Hp H-16; '''RI''', ''[[wikipedia:Roseburia_inulinivorans|Roseburia inulinivorans]]''.]] | |||
ISC have very similar transposases to those of the IS''200''/IS''605'' family and are therefore part of the same super family. | |||
An alignment of full length TnpA from the IS''200''/IS''605'' group ([[:File:FigIS200 605 7.png|Fig. IS200.7]]; [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder] November 2021) shows the highly conserved '''HuH''' triad, catalytic tyrosine ('''Y''') and important glutamine ('''Q''') residues, all central to the transposition chemistry ([[:File:FigIS200 605 7.png|Fig. IS200.7]], [[:File:FigIS200 605 11.png|Fig. IS200.11]] and [[:File:FigIS200 605 12.png|Fig. IS200.12]]), together with a number of other highly conserved amino acid positions. An alignment with the available IscA from the ISC group ([[:File:FigIS200 605 8.png|Fig. IS200.8]] '''Top''') shows that these also include all the highly conserved TnpA amino acid positions and are therefore very closely related to TnpA. However, the IscA and TnpA proteins appear to fall into separate clades ([[:File:FigIS200 605 8.png|Fig. IS200.8]] '''bottom''') with some overlap. | |||
[[File:FigIS200 605 7.png|center|thumb|650x650px|'''Fig. IS200.7.''' Alignment of TnpA proteins from the IS''200''/IS''605'' family. The data is drawn from ISfinder (November 2021). The alignment was performed with Clustal omega2 (https://www.ebi.ac.uk/Tools/msa/clustalw2/) and drawn using Jalview Version 2. The '''HuH''', '''Y''' and '''Q''' residues are indicated. A consensus sequence is included beneath.]] | |||
Since IS families are defined by their transposases rather than their accessory genes, and those of ISC and the IS''200''/IS''605'' family are so similar, it seems reasonable to include the ISC group as a subgroup of the IS''200''/IS''605'' family (or [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] super family<ref name=":29" />). For many of the archaeal elements, a small potential peptide of 40-45 amino acids is encoded upstream of the TnpB analogue. | |||
[[File:FigIS200 605 8.png|center|thumb|650x650px|'''Fig. IS200.8.''' Alignment of TnpA proteins from the IS''200''/IS''605'' family with IscA. The sequences for IS''200''/IS''605'' family members are from [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder] (November 2021) and the IscA proteins from Kapitonov et al.<ref name=":29" /> kindly supplied by [[wikipedia:Kira_Makarova|Kira Makarova]]. '''Top.''' The alignment was performed with Clustal omega2 and drawn using Jalview Version 2. The '''HuH''', '''Y''' and '''Q''' residues are indicated. A consensus sequence is included beneath. The IscA proteins from Kapitonov et al. are included. '''Bottom.''' Phylogenetic tree from the same alignment. The Sequences from Kapitonov et al. <ref name=":29" />are boxed.]] | |||
A tree based on the TnpB/IscB ([[:File:FigIS200 605 9.png|Fig. IS200.9]]) examples presented by Kapitonov, et al.,<ref name=":29" /> shows that the TnpB homologues form a clade separate from IscB and that the latter can be divided into two clades, IscB1 and IscB2. | |||
These considerations therefore reinforce the idea that the IS''200''/IS''605'' family and ISC group might be considered as a superfamily which includes a number of related accessory genes (''tnpB'', ''iscB1'', ''iscB2'' etc.), which carry flanking DNA sequences with secondary structure potential and in which a Y1 HuH transposase assures the chemistry of transposition. A similar conclusion was also reached by Altae-Tran et al.<ref name=":30" />. However, this picture is complicated by the identification of another group of transposable elements, the [https://tncentral.ncc.unesp.br/TnPedia/index.php/IS_Families/IS607_family IS''607'' family], in which ''tnpB'' is associated with a different type of transposase, in this case a serine site-specific recombinase. | |||
[[File:FigIS200 605 9.png|center|thumb|720x720px|'''Fig. IS200.9.''' Alignment of TnpA proteins from the IS''200''/IS''605'' family with IscA. The sequences for IS''200''/IS''605'' family members are from [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder] (November 2021) and the IscA proteins from Kapitonov et al.<ref name=":29" /> kindly supplied by [[wikipedia:Kira_Makarova|Kira Makarova]]. '''Top.''' The alignment was performed with Clustal omega2 and drawn using Jalview Version 2. The '''HuH''', '''Y''' and '''Q''' residues are indicated. A consensus sequence is included beneath. The IscA proteins from Kapitonov et al. are included. '''Bottom.''' Phylogenetic tree from the same alignment. The Sequences from Kapitonov et al. <ref name=":29" /> are boxed.]] | |||
== Mechanism of IS''200''/IS''605'' single strand DNA transposition == | |||
=== | === Early models === | ||
A number of alternative mechanisms were initially proposed to explain [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] transposition <ref name=":24" /> ([[:File:FigIS200 605 10.png|Fig. IS200.10]]). These all included the insertion of a double-strand circular transposon copy ([[:File:FigIS200 605 10.png|Fig. IS200.10]] '''D'''). One model ([[:File:FigIS200-4b.png|Fig. IS200.10]] '''A''') envisaged that simultaneous or consecutive cleavage at '''LE''' and '''RE''' and reciprocal strand transfer would generate a [[wikipedia:Holliday_junction|'''H'''olliday '''j'''unction (HJ)]] which could then be resolved into double-strand circular copies of the transposon. The second ([[:File:FigIS200 605 10.png|Fig. IS200.10]] '''B''') proposed cleavage at '''LE''' and replicative strand displacement using a 3’OH of the flanking donor DNA. This could assist formation of a single strand region accessible for cleavage of '''RE''' to generate a single-strand transposon circle which could be replicated into a double-strand copy. The third ([[:File:FigIS200 605 10.png|Fig. IS200.10]] '''C''') proposed cleavage at '''LE''' with displacement of the transposon strand to form a single strand loop. Subsequent ''in vitro'' and ''in vivo'' experiments (below) demonstrated not only that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] was capable of excision as a single-strand DNA circle but also that this circle could be inserted into a single strand target. | |||
[[File:FigIS200 605 10.png|center|thumb|720x720px|'''Fig. IS200.10.''' Proposed Models for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] Transposition. Donor and target replicons are indicated (full and dotted lines, respectively). Dashed lines indicate newly replicated DNA. The conserved target sequence '''TTAC''' is also indicated. '''A)''' Simultaneous or consecutive cleavage at '''LE''' and '''RE''' and reciprocal strand transfer would generate a [[wikipedia:Holliday_junction|'''H'''olliday '''j'''unction (HJ)]] which could be resolved into double-strand circular copies of the transposon. '''B)''' Cleavage at '''LE''' and replicative strand displacement using a 3’OH of the flanking donor DNA. This could assist the formation of a single strand region accessible for cleavage of '''RE''' to generate a single-strand transposon circle which could be replicated into a double-strand copy. '''C)''' Cleavage at '''LE''' with displacement of the transposon strand to form a single strand loop. '''D)''' Integration. From Ton-Hoang et al.<ref name=":24" />.]] | |||
<br /> | |||
=== General transposition pathway === | |||
The transposition pathway of IS''200''/IS''605'' family members is shown in [[:File:FigIS200 605 11.png|Fig. IS200.11]]. Much of the biochemistry was elucidated using an [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] cell-free ''in vitro'' system which recapitulates each step of the reaction. This requires purified TnpA<sub>IS''608''</sub> protein, single strand [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] DNA substrates and divalent metal ions such as Mg<sup>2+</sup> or Mn<sup>2+</sup> <ref name=":24" /><ref name=":7" /><ref name=":0" />. Similar and complementary results were also obtained with [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']<ref name=":32" /><ref name=":25" /><ref name=":9" />. The reactions are not only strictly dependent on single strand (ss) DNA substrates but are also strand-specific: only the “top” strand (defined as the strand carrying target sequence, TS, 5’ to the IS; [[:File:FigIS200 605 11.png|Fig. IS200.11]] '''top''') is recognized and processed whereas the “bottom” strand is refractory<ref name=":24" /><ref name=":7" />. Cleavage of the top strand at the left and right cleavage sites (TS/CL and CR; note that TS is also the left cleavage site CL) ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''B''') leads to excision as a circular ssDNA intermediate with abutted left and right ends (transposon joint) ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''C''' bottom left). This is accompanied by rejoining of the DNA originally flanking the excised strand (donor joint). | |||
[[File:FigIS200 605 11.png|center|thumb|720x720px|'''Fig. IS200.11.''' '''Top''': [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] organization. The left (LE) and right (RE) ends with a subterminal hairpin (HP) are in red and blue, left and right cleavage sites (CL/TS and CR) are represented by black and blue boxes, respectively. '''Bottom left''': Excision. '''(A)''' TnpA activity: top strand (active strand) structures are recognized and cleaved by TnpA (vertical arrows). '''(B)''' Upon cleavage, 5′ phosphotyrosine bonds (green cylinder) are formed with LE and with the RE 3′ flank, and 3′-OH groups (yellow circle) are formed at the left flank and at RE. '''(C)''' Excision of the IS''608'' single-strand circle intermediate with abutted LE and RE (RE–LE junction or transposon joint) accompanied by the formation of the donor joint retaining the target sequence. '''Bottom right''': Integration. '''(D)''' Transposon circle with the transposon joint and target DNA (black) with the target site. '''(E)''' TnpA catalyzes the cleavage of the transposon joint and the single-strand target. '''(F)''' Integration.]] | |||
The transposon joint is then cleaved ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''E''' bottom right) and integrated into a single strand conserved element-specific target sequence (TS) where the left end invariably inserts 3’ to TS ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''F'''). This target specificity is another unusual feature of IS''200''/IS''605'' transposition. The target sequence is characteristic of the particular family member and, although it is not part of the IS, it is essential for further transposition because it is also the left end cleavage site CL of the inserted IS <ref name=":24" /> (''The Single strand Transpososome and Cleavage site recognition'') and is therefore intimately involved in the transposition mechanism. | |||
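The bookkeeping of the pathway described above (excision 3’ to TS leaving a donor joint, then integration 3’ to a TS in the target) can be illustrated with a minimal string-based sketch. This is only a toy model under stated assumptions: the function names, the flanking sequences and the 10 nt "element" are hypothetical placeholders and not real IS''608'' sequence; only the TTAC target sequence is taken from the text. It models the top strand alone and ignores hairpins, TnpA and the chemistry.

```python
# Toy model of single-strand excision and integration (top strand only).
# ELEMENT and the flanks are hypothetical placeholders, not real IS608 sequence.

TS = "TTAC"              # target sequence / left cleavage site (from the text)
ELEMENT = "ATGCATGCAT"   # hypothetical stand-in for the IS, LE to RE


def excise(donor: str, ts: str, element: str) -> tuple[str, str]:
    """Cut the element out of the donor strand immediately 3' to ts.

    Returns (donor_joint, circle). The circle is written linearly from LE
    to RE; in the real intermediate RE is abutted to LE (transposon joint).
    """
    idx = donor.find(ts + element)
    if idx < 0:
        raise ValueError("element not found 3' to the target sequence")
    start = idx + len(ts)
    end = start + len(element)
    donor_joint = donor[:start] + donor[end:]   # flanks rejoined, ts retained
    return donor_joint, donor[start:end]


def integrate(target: str, ts: str, circle: str) -> str:
    """Insert the element so that its left end lies immediately 3' to ts."""
    idx = target.find(ts)
    if idx < 0:
        raise ValueError("no target sequence in target strand")
    pos = idx + len(ts)
    return target[:pos] + circle + target[pos:]


donor = "GGA" + TS + ELEMENT + "CCG"
donor_joint, circle = excise(donor, TS, ELEMENT)
product = integrate("CA" + TS + "GT", TS, circle)
```

In this sketch the donor joint keeps its copy of TS and the inserted element again sits 3’ to a TS, so no nucleotides are lost or gained at either step and the product is itself a substrate for a further round.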
=== | === TnpA, Y1 transposases and transposition chemistry === | ||
IS''200''/IS''605'' family transposases belong to the '''HUH enzyme superfamily.''' All contain a conserved amino-acid triad composed of Histidine (H)-bulky hydrophobic residue (U)-Histidine (H)<ref>{{#pmid:8374079}}</ref> providing two of three ligands required for coordination of a divalent metal ion that localizes and prepares the scissile phosphate for nucleophilic attack. HUH proteins catalyze ssDNA breakage and joining with a unique mechanism. They all catalyse DNA strand cleavage using a transitory covalent 5' phosphotyrosine enzyme-substrate intermediate and release a 3' OH group <ref name=":1" /> ([[General Information/Major Groups are Defined by the Type of Transposase They Use#Groups%20with%20HUH%20Enzymes|Groups with HUH Enzymes]]; [[:File:1.10.1.png|Fig.7.5]]). | |||
The | The HUH enzyme family also includes other transposases of the [[IS Families/IS91-ISCR families|IS''91''/IS''CR'' and Helitron families]] as well as proteins involved in DNA transactions essential for plasmid/virus rolling circle replication (Rep; not to be confused with the TnpA<sub>REP</sub>/REP system described in [[IS Families/IS200-IS605 family#Y1 transposase domestication|Domestication]]) and plasmid conjugation (Mob/relaxase) ([[General Information/Major Groups are Defined by the Type of Transposase They Use#Groups with HUH Enzymes|Groups with HUH Enzymes]]; [[:File:1.10.1.png|Fig.7.5]]). | ||
IS''200''/IS''605'' transposases are single-domain proteins containing a single catalytic tyrosine residue and are therefore called '''Y1 transposases'''. They use the tyrosine residue (Y127 for IS''608'') as a nucleophile to attack the phosphodiester link at the cleavage sites (vertical arrows in [[:File:FigIS200 605 11.png|Fig. IS200.11]] '''A''' and '''D'''). Since cleavages at both IS ends occur on the same strand, the polarity of the reaction implies that the enzyme forms a covalent 5’-phosphotyrosine bond with the IS at LE producing a 3’-OH on the DNA flank and a 5’-phosphotyrosine bond at the RE flank producing a 3’-OH on RE itself ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''B'''). The released 3′-OH groups then act as nucleophiles to attack the appropriate phosphotyrosine bond, resealing the DNA backbone in one case and generating a single-strand DNA transposon circle in the other ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''C'''). The same polarity is applied to the integration step ([[:File:FigIS200 605 11.png|Fig. IS200.11]] '''D''', '''E''' and '''F'''). As an important mechanistic consequence of this chemistry, IS''200''/IS''605'' transposition occurs without loss or gain of nucleotides. ''In vitro'', the reaction requires only TnpA and does not require host cell factors. | |||
IS''200''/IS''605'' | |||
=== TnpA overall structure === | |||
Crystal structures of Y1 transposases have been determined for three family members: [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] (TnpA<sub>IS''608''</sub>) from ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]]'' <ref name=":23" /><ref name=":0" />, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] (TnpA<sub>IS''Dra2''</sub>) from ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]'' <ref name=":9" /> and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISC1474 IS''C1474''] from ''[[wikipedia:Sulfolobus_solfataricus|Sulfolobus solfataricus]]''<ref name=":162">{{#pmid:16340015}}</ref>. In contrast to most characterised '''HUH enzymes''', which are usually monomeric and have two catalytic tyrosines, '''Y1 transposases''' form obligatory dimers with two active sites ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''A'''). The two monomers dimerize by merging their β-sheets into one large central β-sheet sandwiched between α-helices. Each catalytic site is constituted by the '''HUH motif''' from one TnpA monomer (H64 and H66 in the case of TnpA<sub>IS''608''</sub>) and a catalytic tyrosine residue (Y127) located in the C-terminal αD helix tail of the other monomer ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''A'''). This helix is joined to the body of the protein by a flexible loop (trans configuration; see ''Active site assembly and Catalytic activation'' and [[IS Families/IS200 IS605 family#Transposition cycle: the trans.2Fcis rotational model|Transposition cycle: the trans/cis rotational model]]). | |||
[[File:FigIS200 605 12.png|center|thumb|720x720px|'''Fig. IS200.12.''' '''(A)''' Crystallographic structure of TnpA alone. The two monomers of the TnpA dimer are colored green and orange, respectively. Positions of helix αD and catalytic residues are shown. '''(B)''' Co-crystal structure of TnpA with RE HP22. HP22 is shown in blue. The extrahelical T17 and the T located in the hairpin loop are indicated in red. Note that in the TnpA–HP22 co-structure, binding sites for the hairpins are located on the same face of the TnpA dimer whereas the two catalytic sites are formed on the opposite surface (A, C–F).]] | |||
The TnpA enzyme active sites are believed to adopt two functionally important conformations: the trans configuration described above ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''A'''), in which each active site is composed of the '''HUH motif''' supplied by one monomer with the tyrosine residue supplied by the other, and the cis configuration, in which both motifs are contributed by the same monomer (IS''200''/IS''605'' '''video 1''' '''below'''; kindly supplied by [https://www.embl.de/research/units/scb/barabas/ O. Barabas] and [https://www-mslmb.niddk.nih.gov/dyda/dydalab.html Fred Dyda]). | |||
The trans conformation is active during cleavage where Tyrosine acts as nucleophile whereas the cis conformation is thought to function during strand transfer where the 3’OH is the attacking nucleophile ([[IS Families/IS200 IS605 family#Transposition cycle: the trans.2Fcis rotational model|Transposition cycle: the trans/cis rotational model]]). Only the trans configuration of TnpA<sub>IS''608''</sub> and TnpA<sub>IS''Dra2''</sub> has yet been observed crystallographically <ref name=":23" /><ref name=":9" /> but the existence of the cis configuration is supported by biochemical data <ref name=":82">{{#pmid:23345619}}</ref>.<br /><center> | |||
<center> | |||
{| class="wikitable" | {| class="wikitable" | ||
![[File:IS200 S605-video-1.mp4|center|380x380px]]'''<small>IS''200''/IS''605'' video 1</small>''' | ![[File:IS200 S605-video-1.mp4|center|380x380px]]'''<small>IS''200''/IS''605'' video 1</small>''' | ||
| Line 103: | Line 146: | ||
</center> | </center> | ||
=== The Single strand Transpososome === | |||
The key machinery for transposition is the higher-order protein-DNA complex, the transpososome (or synaptic complex) which contains both transposase and two IS DNA ends with or without target DNA. Transpososome formation, stability, and the temporal changes in a configuration which occur during the transposition cycle have been characterized for | The key machinery for transposition is the higher-order protein-DNA complex, the transpososome (or synaptic complex) which contains both transposase and two IS DNA ends with or without target DNA. Transpososome formation, stability, and the temporal changes in a configuration which occur during the transposition cycle have been characterized for TnpA<sub>IS''608''</sub> by crystallographic and biochemical approaches. | ||
Although for technical reasons it was not possible to obtain structures with both LE and RE hairpins together, co-crystal structures with either LE or RE showed that a TnpA dimer binds two subterminal DNA hairpins, suggesting that it could bind both '''LE''' and '''RE''' ends simultaneously. Binding sites for the hairpins are located on the same face of the TnpA dimer while the two catalytic sites are formed on the opposite surface ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''A''' and '''B''') (IS''200''/IS''605'' '''video 2 below'''; kindly supplied by [https://www.embl.de/research/units/scb/barabas/ O. Barabas] and [https://www-mslmb.niddk.nih.gov/dyda/dydalab.html Fred Dyda]). The hairpin forms a distorted helix anchored by base interactions at the foot (IS''200''/IS''605'' '''video 2 below'''). | |||
<center> | <center> | ||
{| class="wikitable" | {| class="wikitable" | ||
![[File:IS200 S605-video-2 1.mp4|center|380x380px]]'''<small>IS''200''/IS''605'' video 2</small>''' | ![[File:IS200 S605-video-2 1.mp4|center|380x380px]]'''<small>IS''200''/IS''605'' video 2</small>''' | ||
|}</center><br /> | |}</center> | ||
<br /> | |||
==== Substrate recognition ==== | |||
A key feature of TnpA is that it is only active on one strand, the “top” strand. The [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] ends carry subterminal imperfect hairpins. In addition to specific sequences on the loops, the irregularities on the hairpins help the enzyme to distinguish between “top” and “bottom” strands <ref name=":23" /> <ref name=":9" />. The initial co-crystal structure was obtained with | A key feature of TnpA is that it is only active on one strand, the “top” strand. The [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] ends carry subterminal imperfect hairpins. In addition to specific sequences on the loops, the irregularities on the hairpins help the enzyme to distinguish between “top” and “bottom” strands <ref name=":23" /><ref name=":9" />. The initial co-crystal structure was obtained with TnpA<sub>IS''608''</sub> and a 22nt imperfect RE hairpin (HP22) including its characteristic extrahelical T17 located mid-way along the DNA stem ([[:File:FigIS200 605 12.png|Fig. IS200.12]] and [[:File:FigIS200 605 13.png|Fig. IS200.13]]). In addition to a number of backbone contacts with HP22, TnpA<sub>IS''608''</sub> also shows several base-specific contacts, in particular with T10 in the loop and the extrahelical T17<ref name=":23" /> ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''B'''). | ||
Although most members of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group, which includes [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], have imperfect palindromes with extrahelical bases or bulges, some members of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group (e.g [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541 IS''1541'']) include perfect hairpins. Whether base-specific interactions with the loop sequence is exclusively responsible for strand-specific activity of the corresponding transposase remains to be clarified.<center> | Exchange of T10 and neighboring T nucleotides in the loop abolished binding whereas the exchange of T17 for an A significantly reduced but did not eliminate binding <ref name=":102">{{#pmid:21745812}}</ref>. Similar studies with TnpA<sub>IS''Dra2''</sub> showed that it also recognises a similarly located T in the hairpin loop of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] and that this is essential for binding <ref name=":9" /> . Instead of an extrahelical T, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] '''LE''' and '''RE''' include a bulge caused by two mismatched nucleotides (G and T) in the hairpin stem. These unpaired nucleotides are specifically recognized and stabilized by the protein. Again, mutation of the T (to C which, in this case, eliminates the bulge to generate a GC base pair in the stem) greatly reduces binding (IS''200''/IS''605'' '''video 3A below'''; kindly supplied by [https://www.embl.de/research/units/scb/barabas/ O.Barabas] and [https://www-mslmb.niddk.nih.gov/dyda/dydalab.html Fred Dyda]). | ||
Although most members of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group, which includes [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], have imperfect palindromes with extrahelical bases or bulges, some members of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group (e.g. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541 IS''1541'']) include perfect hairpins. Whether base-specific interactions with the loop sequence are exclusively responsible for strand-specific activity of the corresponding transposase remains to be clarified. | |||
<center> | |||
{| class="wikitable" | {| class="wikitable" | ||
![[File:IS200 S605-video-3A.mp4|center|380x380px]]<small>IS''200''/IS''605'' '''video 3A'''</small> | ![[File:IS200 S605-video-3A.mp4|center|380x380px]]<small>IS''200''/IS''605'' '''video 3A'''</small> | ||
|}</center> | |}</center> | ||
The left (CL/TS) and right (CR) [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] cleavage sites (TTACl and TCAAl respectively, where l represents the point of cleavage) are located some distance from the subterminal recognition hairpins (19 nt at LE and 10 nt at RE) ([[:File: | ==== Cleavage site recognition ==== | ||
[[ | The left (CL/TS) and right (CR) [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] cleavage sites (TTACl and TCAAl respectively, where l represents the point of cleavage) are located some distance from the subterminal recognition hairpins (19 nt at '''LE''' and 10 nt at '''RE''') ([[:File:FigIS200 605 13.png|Fig. IS200.13]]). The system is asymmetric because the two distinct cleavage sites are separated from the hairpins by linkers of different lengths and the CL/TS sequence does not form part of IS while CR does. | ||
[[File:FigIS200 605 13.png|center|thumb|720x720px|'''Fig. IS200.13.''' Canonical and noncanonical base interactions in '''(A)''' '''left end''' (LE) and '''(B)''' '''right end''' (RE). LE and RE (red and blue). Cleavage sequences CL or CR (black or dark blue boxes); guide sequences GL and GR (pink or light blue, respectively). Two nucleotides at the 3′ foot of HPL, R involved in triplet formation are highlighted in bold and in a black frame. LE and RE and the base pairing within HPL and HPR are shown. Insets show interactions between cleavage and guide sequences. Filled lines: canonical base interactions, dotted lines: additional noncanonical base interactions.]] | |||
Structural studies revealed that the cleavage sites are recognized in a unique way that does not involve direct sequence recognition by TnpA. Instead, an internal part of the IS sequence is co-opted to recognize different cleavage sites allowing TnpA to catalyze both excision and integration of the element with a single DNA binding domain. | Structural studies revealed that the cleavage sites are recognized in a unique way that does not involve direct sequence recognition by TnpA. Instead, an internal part of the IS sequence is co-opted to recognize different cleavage sites allowing TnpA to catalyze both excision and integration of the element with a single DNA binding domain. | ||
Internal transposon sequences, the left (GL) and right (GR) tetranucleotide guide sequences, AAAG and GAAT, located 5’ to the foot of the hairpins ([[:File:Fig. IS200.7.png|Fig. IS200.7]]), recognize their respective cleavage sites by direct base interactions. These GL/CL and GR/CR interactions involve 3 of the 4 nt of GL and GR. They include both [[wikipedia:Base_pair|canonical Watson-Crick interactions]] and in the case of RE, non-canonical interactions resulting in base triplets ([[:File: | Internal transposon sequences, the left ('''GL''') and right ('''GR''') tetranucleotide guide sequences, AAAG and GAAT, located 5’ to the foot of the hairpins ([[:File:Fig. IS200.7.png|Fig. IS200.7]]), recognize their respective cleavage sites by direct base interactions. These GL/CL and GR/CR interactions involve 3 of the 4 nt of GL and GR. They include both [[wikipedia:Base_pair|canonical Watson-Crick interactions]] and in the case of '''RE''', non-canonical interactions resulting in base triplets ([[:File:FigIS200 605 13.png|Fig. IS200.13]] and [[:File:FigIS200 605 14.png|Fig. IS200.14]], bases joined by both regular and dotted lines respectively). In the case of '''LE''' and the transposon joint, base triples (dotted lines) are suggested from biochemical data <ref name=":102" /> (IS''200''/IS''605'' '''video 3B below'''; kindly supplied by [https://www.embl.de/research/units/scb/barabas/ O. Barabas] and [https://www-mslmb.niddk.nih.gov/dyda/dydalab.html Fred Dyda]). | ||
[[ | [[File:FigIS200 605 14.png|center|thumb|720x720px|'''Fig. IS200.14.''' Structure of the co-complex TnpA<sub>IS''608''</sub>–RE35 adapted from reference 8 showing the active site and the base pairs between '''CR''' ('''TCAA''', dark blue) and '''GR''' ('''GAAT''', light blue). The gray sphere is bound Mn2+. '''Right''': Two base triplets observed in the TnpA<sub>IS''608''</sub>–RE35 complex.]] | ||
<center> | <center> | ||
{| class="wikitable" | {| class="wikitable" | ||
![[File:IS200 S605-video-3B.mp4|center|381x381px]]<small>IS''200''/IS''605'' '''video 3B'''</small> | ![[File:IS200 S605-video-3B.mp4|center|381x381px]]<small>IS''200''/IS''605'' '''video 3B'''</small> | ||
|} | |} | ||
</center>These interactions place the scissile phosphate precisely into the two active sites of | </center> | ||
[[ | These interactions place the scissile phosphate precisely into the two active sites of TnpA<sub>IS''608''</sub> for nucleophilic attack by the catalytic Y127. Interestingly, the base-pairing patterns responsible for cleavage site recognition are similar at '''LE''', '''RE''' and the target site in spite of sequence differences ([[:File:FigIS200 605 13.png|Fig. IS200.13]], [[:File:FigIS200 605 14.png|Fig. IS200.14]], [[:File:FigIS200 605 15.png|Fig. IS200.15]]). Since TS is also CL, this type of recognition not only explains the requirement for the TS located at the left end of the inserted IS ([[:File:FigIS200 605 11.png|Fig. IS200.11]], [[:File:FigIS200 605 15.png|Fig. IS200.15]]) for further transposition, but also the target specificity. Upon integration, TS is presumably recognized by the GL present on the excised transposon joint. Note that the transposon joint contains only the '''LE''' guide sequence GL but not the '''LE''' cleavage site CL ([[:File:FigIS200 605 11.png|Fig. IS200.11]], [[:File:FigIS200 605 15.png|Fig. IS200.15]]). | ||
[[File:FigIS200 605 15.png|center|thumb|720x720px|'''Fig. IS200.15.''' Target recognition: single-strand transposon joint ('''RE'''–'''LE''' junction) and target TS are presented. For simplicity, only the recognition of the target cleavage site is indicated. '''LE''' and '''RE''' are shown in red and blue. Cleavage sequences '''C<sub>L</sub>''' or '''C<sub>R</sub>''' are placed in black or dark blue boxes; guide sequences '''G<sub>L</sub>''' and '''G<sub>R</sub>''' are framed in pink and light blue, respectively. Two nucleotides at the 3′ foot of the left and right hairpin structures '''HP<sub>L</sub>''' and '''HP<sub>R</sub>''' involved in triplet formation are highlighted in bold and are in a black frame. Nucleotide sequences of '''LE''' and '''RE''' and the base pairing within '''HP<sub>L</sub>''' and '''HP<sub>R</sub>''' are shown. The inset figures describe the interactions between the cleavage sequences and guide sequences. The filled lines indicate canonical base interactions and the dotted lines indicate additional noncanonical base interactions.]] | |||
Similar crystal structures were obtained with TnpA<sub>IS''Dra2''</sub> (see also Single strand DNA ''in vivo'') with a similar interaction network between the guide sequences and cleavage sites. | |||
The [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] transpososome is structurally very similar to that of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] despite only 34% sequence identity of the transposases. It is important to note that the target sequence in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] is a pentanucleotide instead of a tetranucleotide as in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608'']. The fifth nucleotide in the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] sequence is, however, not involved in DNA-DNA interactions but in DNA-protein interaction<ref name=":9" />. | |||
The potential cleavage site recognition mode (i.e. the canonical interaction network between CL,R and GL,R) is indeed well conserved throughout the family ([[:File:FigIS200 605 16.png|Fig. IS200.16]]). | |||
[[File:FigIS200 605 16.png|center|thumb|720x720px|'''Fig. IS200.16.''' Multiple sequence alignment of the cleavage sites and guide sequences using [http://weblogo.berkeley.edu/ Weblogo] was carried out on 38, 43 and 23 members of the IS''200'' '''(i)''', the IS''605'' '''(ii)''', and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] '''(iii)''' groups, respectively.]] | |||
This model has been validated ''in vitro'' and ''in vivo'' by showing that it is possible to modify cleavage sites by changing corresponding guide sequences. Moreover, in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''], modifications of GL in the transposon joint generate predictable changes in insertion site-specificity of the element <ref name=":52">{{#pmid:19524540}}</ref>. The [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] recognition system has also been modified to include additional sequences which assist more specific targeting of insertions<ref>{{#pmid:29635476}}</ref>. | |||
==== Active site assembly and Catalytic activation ==== | |||
Comparison of crystal structures of different TnpA protein-DNA complexes <ref name=":23" /><ref name=":0" /> <ref name=":162" /> revealed TnpA in both active and inactive configurations. In both the free TnpA<sub>IS''608''</sub> dimer and TnpA<sub>IS''608''</sub>-DNA complexes bound to a “minimal” HP22 hairpin (which does not include the guide sequence), the catalytic tyrosine residue (Y127) points away from the HUH motif (H64 and H66) and therefore cannot act as a nucleophile <ref name=":23" /> ([[:File:FigIS200 605 11.png|Fig. IS200.11]]). | |||
The enzyme is therefore in an inactive conformation. Binding to the appropriate substrate containing the 4 nucleotide guide sequence 5’ to the hairpin foot (compare [[:File:FigIS200 605 17.png|Fig. IS200.17]] '''left''' and '''right''') triggers a change in TnpA configuration that permits assembly of functional active sites. A single A (A+18, [[:File:FigIS200 605 13.png|Fig. IS200.13]] and [[:File:FigIS200 605 14.png|Fig. IS200.14]]) in the guide sequence present in both GL and GR does not participate in base interactions with the cleavage site. On formation of the CL(R)/GL(R) base interaction network, this single base penetrates the structure and forces the C-terminal αD helix carrying Y127 closer to the '''HuH motif''' placing it in the correct position poised for catalysis <ref name=":0" /> (compare [[:File:FigIS200 605 17.png|Fig. IS200.17]] '''left''' and '''right'''; [[:File:FigIS200 605 18.png|Fig. IS200.18]])(IS''200''/IS''605'' '''video 4 below'''; kindly supplied by [https://www.embl.de/research/units/scb/barabas/ O. Barabas] and [https://www-mslmb.niddk.nih.gov/dyda/dydalab.html Fred Dyda]). | |||
This movement also places a third amino acid (Q131 located at the C-terminal end of helix αD on the same face as Y127) in a position enabling it to function in conjunction with both H residues to complete the metal ion binding pocket. This movement is made possible by the fact that the αD helix is attached to the protein body by a flexible loop. This conformational change involving αD helix movement will be discussed below ([[IS Families/IS200-IS605 family#Transposition cycle: the trans.2Fcis rotational model|Transposition cycle: the trans/cis rotational model]]). | |||
<br /> | |||
[[File:FigIS200 605 17.png|alt=|center|thumb|720x720px|'''Fig. IS200.17.''' The presence of the guide sequence AAAG at the foot of IPL results in the movement of helices αD and places tyrosine Y127 in the correct position with respect to the HUH to form the active site.]] | |||
<br /> | |||
[[File:FigIS200 605 18.png|center|thumb|720x720px|'''Fig. IS200.18.''' '''(C)''' Configuration of the active site in the TnpA–RE HP22. HP22 is shown in blue. Note that in A, B and C, TnpA is in the inactive conformation. The arrow shows the presumed rotation of the αD helix to activate the protein. '''(D)''' Configuration of the active site in the TnpA–LE HP26 co-structure. LE HP26 is shown in red and the 5′ 4-nucleotide extension ('''GL''') in yellow. The base A+18 has displaced Y127 to activate the protein. (Adapted from references 6 and 8.)]] | |||
<br /> | |||
<center> | |||
{| class="wikitable" | |||
![[File:IS200 S605-video-4.mp4|center|381x381px]]<small>IS''200''/IS''605'' '''video 4'''</small> | |||
|} | |||
</center> | |||
==== | ==== Transpososome assembly and stability ==== | ||
Excision requires the assembly of a transpososome containing both '''LE''' and '''RE'''. However, it is technically difficult to generate crystallographically pure complexes of this type. Only crystal structures containing two '''LE''' or two '''RE''' were obtained. The excision transpososome was initially modelled using information obtained from the IS''608''LE-TnpA and RE-TnpA structures <ref name=":0" /> ([[:File:FigIS200 605 12.png|Fig. IS200.12]] '''B'''; [[:File:FigIS200 605 19.png|Fig. IS200.19]]). However, complexes containing both '''LE''' and '''RE''' have now been identified using a band shift assay and characterized biochemically <ref name=":102" />. | |||
[[File:FigIS200 605 19.png|center|thumb|720x720px|'''Fig. IS200.19.''' '''(E)''' TnpA–RE35 complex. Interaction of '''GR'''-'''CR''' (in light and dark blue, respectively) positions the cleavage site within the catalytic site of the protein. '''(F)''' Modeled TnpA–LE–RE complex. '''LE''', '''RE''', and flanking sequences in red, blue, and black, respectively.]] | |||
A TnpA co-complex with either '''LE''' or '''RE''' can be titrated by the addition of increasing quantities of the other end ('''RE''' or '''LE''') to obtain a transpososome containing both '''LE''' and '''RE'''. This can be easily detected in a gel shift assay. Such species proved to be catalytically active since they could be removed from the gel and, when incubated with the essential divalent metal ion, robust reaction products could be detected in a denaturing gel <ref name=":102" />. | |||
< | This approach was used to monitor both transpososome formation and stability using oligonucleotides carrying point mutations in GL,R and CL,R. Robust transpososome formation and cleavage activity requires much of the network of GL,R and CL,R interactions observed in the crystal structures <ref name=":102" /> (schematised in [[:File:FigIS200 605 13.png|Fig. IS200.13]]). Although base triplets in the original '''LE''' co-crystal structure were not detected since the '''LE''' substrate was too short <ref name=":0" />, the biochemical data suggested that such interactions probably exist (grey dotted lines in [[:File:Fig. IS200.7.png|Fig. IS200.13]]). | ||
| | |||
For example, the two nucleotides 3’ to the foot of the '''LE''' hairpin (''at equivalent positions to triplet forming bases in '''RE''''', [[:File:FigIS200 605 13.png|Fig. IS200.13]]) are required for robust synaptic complex formation and cleavage <ref name=":102" />. This further implies that these base triplets might also be involved in target DNA capture (grey dotted lines in [[:File:FigIS200 605 15.png|Fig. IS200.15]]). | |||
[[ | |||
Base changes in GL resulted in a predictable choice of target sequence <ref name=":52" />. However, large differences in insertion frequencies were observed. The influence of the presumed non-canonical interactions in LE would provide an explanation for this variability since these were not taken into account in the choice of '''LE''' guide sequence. | |||
In both [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], the extra-helical bases in the hairpin stem and nucleotides in the loop are also important for transpososome formation even in a context which includes both GL,R and CL,R<ref name=":9" /><ref name=":102" />. | |||
=== Transposition cycle: the trans/cis rotational model === | |||
Transpososome assembly is followed by two critical chemical steps: cleavage and strand transfer. These are thought to be accomplished by a series of large changes in transpososome configuration. A detailed model has been proposed for the dynamics of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] transpososome during the transposition reactions<ref name=":0" /> ([[:File:FigIS200 605 19.png|Fig. IS200.19]]; IS''200''/IS''605'' '''video 1'''). As described in TnpA overall structure (above), TnpA<sub>IS''608''</sub> could in principle assume two configurations: trans and cis. Switching between these two states would involve rotation of the two unconstrained flexible arms which join the αD helix to the protein body. | |||
The current model for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] transposition proposes that the strand transfer step involves rotation of these arms from the trans to the cis configuration: cleavage occurs while the enzyme is in the trans configuration. A trans to cis conformational change then occurs allowing strand transfer. The ground state of the IS''608'' and IS''Dra2'' transpososomes obtained from crystallography is the trans configuration. LE and RE binding and cleavage occur with the enzyme in its trans configuration ([[:File:FigIS200 605 19.png|Fig. IS200.19]]; IS''200''/IS''605'' '''video 1'''). | |||
This results in the formation of the 5’ phosphotyrosine bond with LE liberating a 3’-OH on the flanking DNA and the 5’ phosphotyrosine bond with the RE DNA flank liberating a 3’-OH on the RE transposon end. Rotation of the two arms would displace LE towards the sequestered 3’-OH of RE and the RE flank towards the 3’-OH of the LE flank ([[:File:FigIS200 605 19.png|Fig. IS200.19]]; IS''200''/IS''605'' '''video 1''') and position them so that both 3’-OH can attack the appropriate phosphodiester bond. This model is supported by several lines of indirect evidence from studies of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608'']. | |||
An initial piece of evidence concerns the length differences in the LE and RE “linker” (the distance between the hairpin foot and the cleavage site): this is only 10 nt for RE but 19 nt for LE ([[:File:FigIS200 605 15.png|Fig. IS200.15]]). The rotation model suggests that the longer LE linker may be required to provide sufficient length to rotate the 5’ LE phosphotyrosine bond to position it close to the immobile RE 3’-OH ([[:File:FigIS200 605 19.png|Fig. IS200.19]]; IS''200''/IS''605'' '''video 1'''). This would imply that LE linker length is critical for strand transfer. Indeed, sequential reduction in the length of the LE linker has a large effect on transposition frequency and excision ''in vivo''. ''In vitro'', it also had a somewhat larger effect on strand transfer than on cleavage <ref name=":102" />, supporting the idea that the linker is important for mechanical movement. | |||
However, transpososome formation and stability were also observed to be affected with the shortest linkers. This presumably reflects steric barriers to GL(R)/CL(R) interaction and supports the notion that these interactions are important in transpososome assembly. A survey of over 100 different IS from all three groups (35 from the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group; 47 from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and 24 from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']) in the public databases has shown that the asymmetry of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] ends is conserved across the entire family: the left linker is always longer than the right (15-16 nt versus 8 nt) <ref name=":82" /> ([[:File:FigIS200 605 20.png|Fig. IS200.20]]). | |||
[[File:FigIS200 605 20.png|center|thumb|640x640px|'''Fig. IS200.20.''' Linker length distribution of LE and RE from 76 (red) and 80 (blue) different IS, respectively.]] | |||
The second piece of evidence comes from the behaviour of TnpA<sub>IS''608''</sub> [[wikipedia:Protein_dimer|heterodimers]] carrying point mutations in the '''HuH or catalytic Y'''. These were expressed and assembled ''in vivo'' and purified based on two different C-terminal affinity tags (one for each monomer). This permitted [[wikipedia:Protein_dimer|heterodimers]] to be distinguished from [[wikipedia:Protein_dimer|homodimers]]. A [[wikipedia:Protein_dimer|heterodimer]] with a combination of mutations that enforce a trans-active TnpA site (in which the wildtype HuH motif and Y127 belong to different TnpA monomers) is proficient for cleavage but not for rejoining. In contrast, a heterodimer with a cis-active TnpA site (in which the wildtype HuH motif and Y127 belong to the same TnpA monomer) is proficient for rejoining but inactive in cleavage <ref name=":82" />. | |||
This implies that all chemical reactions involved in cleavage occur in the trans site while the chemical reactions for strand transfer occur in the cis site. This strongly supports the rotational model. | |||
A third piece of evidence comes from studies of the flexible arm that joins helix αD to the body of the protein and which is proposed to play a pivotal role in the rotation. This flexibility may be facilitated by two glycine residues (G117 and G118). Mutation of these two residues did not affect strand cleavage but led to inhibition of strand transfer suggesting that the two residues are required for achieving a cis configuration. The importance of these G residues is reflected in their conservation throughout the family <ref name=":82" />. | |||
A | Thus, while the cis configuration has not been observed crystallographically for these elements, its existence is strongly suggested by experimental data, supporting the trans/cis rotational model ([[:File:FigIS200 605 21.png|Fig. IS200.21]]). | ||
[[File:FigIS200 605 21.png|center|thumb|720x720px|'''Fig. IS200.21.''' Strand transfer and reset model of IS''608'' transpososome. '''(A)''' The inactive form of TnpA dimer in the absence of DNA (pale green, orange ovals, and dark green and orange cylinders represent the body and the αD helices of two monomers, respectively). At the ends, dotted red and blue lines represent linkers at the left end ('''LE''') and the right end ('''RE'''), light red and light blue boxes represent GL and GR, respectively. '''(B)''' Binding of a copy of LE and RE resulting in TnpA activation (catalytic sites in trans). '''(C)''' Cleavage of both ends forms a 5′ phosphotyrosine linkage between Y127 and '''LE''' on one αD helix (dark orange cylinders) and between Y127 and the RE flank on the other (dark green cylinders). 3′-OH groups are shown as yellow circles. Reciprocal rotation of both αD helices from trans to the cis configuration is indicated by large arrows. '''(D)''' Strand transfer takes place to reconstitute the joined donor backbone (donor's joint) and generate the '''RE'''–'''LE''' transposon junction at cis configuration. '''(E)''' Release of the donor's joint and transition from cis to trans configuration. '''(F)''' Reset to the trans form and target site engagement. '''(G)''' Cleavage of the '''RE'''–'''LE''' junction and target and transition from trans to cis configuration. '''(H)''' Regeneration of the left and right transposon ends.]] | |||
<br /> | |||
== Regulation of single strand transposition == | |||
====Single strand DNA ''in vivo''==== | ====Single strand DNA ''in vivo''==== | ||
The obligatory single-stranded nature of IS''200''/IS''605'' transposition ''in vitro'' suggests that it is limited in vivo by the availability of its ssDNA substrates inside the cells and processes that produce ssDNA may stimulate transposition. We describe below a link between the transposition of these elements and the replication fork. Moreover, in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], single strand DNA produced during re-assembly of the ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'' genome following irradiation results in stimulation of transposition<ref | The obligatory single-stranded nature of IS''200''/IS''605'' transposition ''in vitro'' suggests that it is limited ''in vivo'' by the availability of its ssDNA substrates inside the cells and processes that produce ssDNA may stimulate transposition. We describe below a link between the transposition of these elements and the [[wikipedia:DNA_replication#Replication_fork|replication fork]]. Moreover, in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], single strand DNA produced during re-assembly of the ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'' genome following irradiation results in stimulation of transposition<ref name=":32" /><ref name=":172">{{#pmid:16359337}}</ref>. Transcription or other processes leading to horizontal gene transfer such as transformation, conjugative transfer, or transduction with single strand [[wikipedia:Bacteriophage|phages]] might also favor their mobility. | ||
=====Replication fork===== | =====Replication fork===== | ||
The replication fork modulates the transposition of many transposable elements ([ | The [[wikipedia:DNA_replication#Replication_fork|replication fork]] modulates the transposition of many transposable elements ([https://tncentral.ncc.unesp.br/report/te/Tn7-NC_002525 Tn''7''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS903 IS''903''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS10R IS''10''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS50R IS''50''], [https://tncentral.ncc.unesp.br/report/te/Tn4430-X07651.1 Tn''4430''], [[wikipedia:P_element|P element]])<ref>{{#pmid:19703395}}</ref><ref>{{#pmid:9620951}}</ref><ref>{{#pmid:3000598}}</ref><ref>{{#pmid:2451025}}</ref><ref>{{#pmid:2546858}}</ref><ref>{{#pmid:21896744}}</ref>. For IS''200''/IS''605'' family members, the [[wikipedia:DNA_replication#Replication_fork|replication fork]], in particular the lagging strand template, is an important source of ssDNA substrates for both excision and integration. Transposition can be considered to follow a “Peel and Paste” mechanism ([[:File:FigIS200 605 22.png|Fig. IS200.22]]) where the IS excises or is “peeled” off as a single strand circle from the lagging strand template of the donor molecule and then integrates or is “pasted” in a ss target at the [[wikipedia:DNA_replication#Replication_fork|replication fork]]. | ||
[[ | [[File:FigIS200 605 22.png|center|thumb|720x720px|'''Fig. IS200.22.''' '''Top''': Excision of the single-strand circular intermediate (transposon joint) from the lagging strand template of a donor plasmid. Arrow tip: replication direction. '''Bottom''': Integration of right end ('''RE''')–left end ('''LE''') transposon joint into the single-strand target at the [[wikipedia:DNA_replication#Replication_fork|replication fork]].]] | ||
'''Excision:''' Excision of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name= | '''Excision:''' Excision of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] is sensitive to the direction of replication across the element: it is more frequent when the active strand (top strand) is on the lagging strand (discontinuous) template ([[:File:FigIS200 605 22.png|Fig. IS200.22]] '''top'''; [[:File:FigIS200 605 23.png|Fig. IS200.23]]) but difficult to detect when it is on the leading (continuous) strand <ref name=":25" />. Moreover, excision ''in vitro'' requires that both ends are in single strand form at the same time<ref name=":24" />. | ||
<br /> | |||
[[File:FigIS200 605 23.png|center|thumb|720x720px|'''Fig. IS200.23.''' Orientation with respect to replication direction. The disposition of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] active ('''top''') strand with respect to replication direction is shown when the fork approaches from one direction ('''left''') when it is part of the lagging-strand template or the other ('''right''') when it is part of the leading strand. [[wikipedia:Okazaki_fragments|Okazaki fragments]] on the lagging strand are indicated as short lines. The direction of DNA synthesis is indicated with half arrowheads.]] | |||
The | The length of ssDNA on the lagging-strand template depends on the initiation frequency of [[wikipedia:Okazaki_fragments|Okazaki fragment]] synthesis by the [[wikipedia:DnaG|DnaG primase]]<ref>{{#pmid:1531480}}</ref><ref>{{#pmid:1740453}}</ref>. Transient inactivation of [[wikipedia:DnaG|DnaG]] activity reduces this frequency and therefore increases the average length of ssDNA between [[wikipedia:Okazaki_fragments|Okazaki fragments]]; under these conditions, the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] excision frequency increased. Under permissive conditions for ''[[wikipedia:Escherichia_coli|E. coli]]'' carrying a dnaGts mutation, using a plasmid-based assay with [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] derivatives of different lengths, the excision frequency decreased strongly as IS length increased. In contrast, when DnaGts activity was reduced by growth under sub-lethal conditions, excision showed a much less pronounced length-dependence ([[:File:FigIS200 605 24.png|Fig. IS200.24]]). This length-dependence might also contribute to the difference in copy numbers observed in the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] groups (see "[[IS Families/IS200-IS605 family#Distribution and Organization|Distribution and Organization]]"). | ||
[[File:FigIS200 605 24.png|center|thumb|720x720px|'''Fig. IS200.24.''' '''Excision of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] as a function of IS length (in kilobases). Bottom panel:''' Shows the effect of IS length on the frequency of excision '''using [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] derivatives of''' 0.3; 0.5; 0.8; 1.1; 1.4; 1.9; 3 and 4 kb. Excision frequency falls steeply with increasing length and shows a weaker length dependence for IS longer than about 2 kb. In a dnaGts strain at the permissive temperature of 33°C, excision is significantly reduced as a function of length (X). Taken from Ton-Hoang et al. 2010.]]
'''Integration:''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] integration is oriented (with its left end 3’ to a TTAC target site) and it requires an ssDNA target ''in vitro''<ref name=":6" /><ref name=":7" />. The close link between transposition and the [[wikipedia:DNA_replication#Replication_fork|replication fork]] is also illustrated by the integration bias, which is consistent with a preference for an ssDNA target on the lagging strand template ([[:File:FigIS200 605 22.png|Fig. IS200.22]] '''bottom'''). This was indeed found to be the case in ''E. coli'' for both plasmid and chromosome targets <ref name=":25" />. As expected, the orientation of insertions into the ''E. coli'' chromosome was correlated with the direction of replication of each replicore and was consistent with integration into the lagging strand template.
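The relationship between insertion position, replicore and lagging strand template can be made explicit with a minimal sketch. It assumes a circular chromosome replicated bidirectionally from a single origin; the coordinates <code>ori</code> and <code>ter</code>, the function names and the '+'/'-' strand labels are hypothetical and are not taken from any of the genomes discussed here.

```python
def lagging_template_strand(pos, ori, ter, genome_len):
    """Return '+' or '-': the strand acting as lagging-strand template at pos.

    Assumption: circular chromosome of genome_len bp, replicated
    bidirectionally from ori to ter (hypothetical 0-based coordinates).
    On the replicore where the fork travels towards increasing coordinates,
    the '+' strand is the lagging-strand template; on the other replicore
    it is the '-' strand.
    """
    # Distances measured in the direction of increasing coordinates from ori
    to_pos = (pos - ori) % genome_len
    to_ter = (ter - ori) % genome_len
    return '+' if to_pos < to_ter else '-'


def on_lagging_template(pos, active_strand, ori, ter, genome_len):
    """True if the IS active strand lies on the lagging-strand template."""
    return active_strand == lagging_template_strand(pos, ori, ter, genome_len)
```

Counting the insertions for which <code>on_lagging_template</code> is true, separately for each replicore, is one simple way to express the orientation bias described above.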
The orientation bias is not restricted to [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']. An ''in silico'' analysis of a large number of bacterial genomes carrying copies of various family members revealed that most had a strong insertional bias consistent with the direction of replication<ref name=":25" /> ([[:File:FigIS200 605 25.png|Fig. IS200.25]]). Moreover, in certain cases, elements which did not follow the orientation pattern could be correlated with a genomic region that had undergone inversion or displacement ([[:File:FigIS200 605 26.png|Fig. IS200.26]]; [[:File:FigIS200 605 27.png|Fig. IS200.27]]), suggesting that, once they occur, insertions are quite stable. It seems possible that this type of genomic archaeology based on orientation patterns could be used to complement the study of bacterial genome evolution.
[[File:FigIS200 605 25.png|center|thumb|680x680px|'''Fig. IS200.25.''' The orientation of IS''200''/IS''605'' family members in different bacterial genomes. Overall GC skew (G – C / G + C) is indicated in blue and orange. '''Top.''' ''[[wikipedia:Salmonella_enterica|S. enterica]]'' (typhi) CT18; '''Middle.''' ''[[wikipedia:Yersinia_pseudotuberculosis|Y. pseudotuberculosis]]'' IP31758; '''Bottom'''. ''[[wikipedia:Photobacterium_profundum|P. profundum]]'' SS9. Replication is bidirectional from a single origin (red spot). Arrows top or bottom indicate the point and orientation of insertion.]]
[[File:FigIS200 605 26.png|center|thumb|680x680px|'''Fig. IS200.26.''' Orientation of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541 IS''1541''] in ''[[wikipedia:Yersinia_pestis|Yersinia pestis]]'' (Microtus). Overall GC skew (G – C / G + C) is indicated in blue and orange. Replication is bidirectional from a single origin (red spot). Arrows top or bottom indicate the point and orientation of insertion. The IS orientation adheres strictly to the GC skew, suggesting that there have been many chromosome rearrangements after IS insertion.]] | |||
[[File:FigIS200 605 27.png|center|thumb|680x680px|'''Fig. IS200.27.''' '''Comparison of ''S. enterica'' (typhi) CT18 and Ty2 genomes.''' The two ''[[wikipedia:Salmonella_enterica|S. enterica]]'' genomes are known to contain a large inversion generated by recombination between two rRNA operons. This is illustrated by the circular map at the bottom of the figure (from Deng et al., 2003). The '''top''' of the figure shows the positions of inversion with respect to the origin of replication. The two replicores are shown in '''blue''' and '''orange'''. The multiple copies of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] are shown as black vertical arrowheads. Those pointing upwards indicate [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] in one orientation, while those pointing downwards indicate the opposite orientation.]] | |||
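The GC skew used in these figures, written in the legends as (G – C / G + C) and read here as (G − C)/(G + C), can be computed along a sequence as in the following minimal sketch. The function names, window size and step are illustrative assumptions and do not reproduce the parameters used to draw the figures.

```python
def gc_skew(seq):
    """GC skew (G - C) / (G + C) of a DNA sequence; 0.0 if it has no G or C."""
    s = seq.upper()
    g, c = s.count('G'), s.count('C')
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window, step):
    """List of (window_start, skew) pairs along a linear sequence."""
    return [(i, gc_skew(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]
```

A change of sign in the windowed skew along a chromosome is commonly used to locate the origin and terminus of replication, which is why IS orientation can be compared directly with it.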
'''Stalled replication forks:''' Stalled [[wikipedia:DNA_replication#Replication_fork|replication fork]]<nowiki/>s appear to be preferential targets for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertion. In experiments using either the Tus/ter replication termination system or operator/repressor systems, [[wikipedia:DNA_replication#Replication_fork|replication fork]] arrest attracted [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertion <ref name=":25" />. Transient blockade of the unidirectional [[wikipedia:DNA_replication#Replication_fork|replication fork]] by the Tus protein at the ter site resulted in preferential [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertion into the array of target sequences behind the stalled forks on the lagging strand but not on the leading strand ([[:File:FigIS200 605 28.png|Fig. IS200.28]]). A similar result was obtained in the ''[[wikipedia:Escherichia_coli|E. coli]]'' chromosome using the ''lacI''/''lacO'' and ''tetR''/''tetO'' repressor/operator roadblock systems<ref>{{#pmid:12864855}}</ref><ref name=":34">{{#pmid:27466393}}</ref> ([[:File:FigIS200 605 29.png|Fig. IS200.29]]). Moreover, a significant number of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertions into the ''[[wikipedia:Escherichia_coli|E. coli]]'' chromosome were localized in the highly transcribed rrn operons. This suggests that high transcription levels might affect [[wikipedia:DNA_replication#Replication_fork|replication fork]] progression (fork arrest by collision with RNA polymerase, R-loop formation, etc.) and could account for targeting of the rrn operons.
Thus, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertions can be targeted to the stalled forks and this may well represent a major pathway for targeting transposition. | ||
[[File:FigIS200 605 28.png|center|thumb|680x680px|'''Fig. IS200.28.''' Map of insertions with ter in the permissive and non-permissive orientations. Replication from the origin (ori) is from left to right; * = target sequences TTAC close to ter; horizontal arrow heads = Ternp (red) and Terp (black); vertical black arrow heads = [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] insertions; vertical red arrow heads = multiple [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] insertions upstream of Ternp and within Terp. (Ton-Hoang et al., 2010).]]
[[File:FigIS200 605 29.png|center|thumb|680x680px|'''Fig. IS200.29.''' '''Top''': Position of the ''lacO'' and ''tetO'' arrays in ''E. coli'' WX45 and WX51. The replication origin, '''ori''', is shown as a red ellipse and the left and right replicores in blue and orange respectively. ''E. coli'' WX45 and WX51 contain arrays at different locations. '''Bottom:''' Insertions into ''E. coli'' WX45 and WX51; the left and right replicores have been separated for convenience. '''Above''': a detail of the ''lacO'' array (light orange or light green rectangles) on the left replicore. '''Below''': a detail of the ''tetO'' array (orange or green rectangles) on the right replicore. Black vertical arrows: insertions obtained in the absence of LacI (top) or TetR (bottom). Green or orange vertical arrows: insertions obtained in the presence of LacI ('''top''') or TetR ('''bottom''') in several independent experiments. The positions of the oligonucleotides (not to scale) used to localize the insertions are shown with half arrowheads. The [[wikipedia:Kanamycin_A|kanamycin]] and [[wikipedia:Gentamicin|gentamicin]] resistance cassettes used in the construction and insertion of the lac and tet operator arrays are also shown. * represents potential '''TTAC''' target sequences present in the region.]]
<br /> | |||
=====Genome re-assembly after irradiation in ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]''===== | |||
''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]'', arguably the most radiation-resistant organism known, has a remarkable capacity to survive the lethal effects of DNA-damaging agents, such as ionizing radiation, UV light and desiccation. After exposure to high irradiation doses, the ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'' chromosome which is present in multiple copies per cell<ref>{{#pmid:649572}}</ref><ref>{{#pmid:7309705}}</ref> is shattered and degraded, but can be very rapidly reassembled in a process called ESDSA ('''E'''xtended '''S'''ynthesis '''D'''ependent '''S'''trand '''A'''nnealing). This involves resection of the multiple dsDNA fragments to generate extensive ssDNA segments, reannealing of complementary DNA and reconstitution of the intact chromosome <ref name=":262" />. | |||
Mennecier et al.<ref name=":172" /> analyzed the mutational profile in the ''thyA'' gene following irradiation. The majority of mutants were due to the insertion of a single IS, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], which is present in a single copy in the genome of the laboratory ''D. radiodurans'' strain. Furthermore, using a tailored genetic system, both the excision and insertion efficiencies of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] were found to increase significantly following host cell irradiation<ref name=":32" />. A [[wikipedia:Polymerase_chain_reaction|PCR]]-based approach was used to follow irradiation-induced excision of the single genomic [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] copy and re-closure of flanking sequences. Remarkably, these events are temporally closely correlated with the start of ESDSA. The signal that triggers IS''Dra2'' transposition is likely the production of ssDNA intermediates generated during genome reassembly. Consistent with this, the requirement of ssDNA substrates for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], as for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''], was confirmed by ''in vitro'' studies of TnpAIS''Dra2''-catalysed cleavage and strand transfer<ref name=":32" />.
[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] excision also depends on the direction of replication and is consistent with a requirement for the active strand to be located on the lagging strand template in normally growing cells. However, this bias disappeared in irradiated ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]''<ref name=":25" />. Since no apparent strand bias was observed in generating ssDNA during ESDSA, the lack of orientation bias in irradiated [[wikipedia:Deinococcus_radiodurans|''D. radiodurans'']] suggests that ssDNA substrates are no longer limited to those rendered accessible during replication. This indicates that ssDNA sources are different in the contexts of vegetative replication and in genome reassembly.
=====Real-time transposition (excision) activity===== | |||
The dynamics of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] excision from a donor site have been examined at the colony and single-cell level in real-time using an artificial [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] derivative inserted between the -35 and -10 elements of a PlacIQ1 promoter<ref>{{#pmid:27298350}}</ref> driving expression of the blue fluorescent protein mCerulean<ref>{{#pmid:21479270}}</ref>. TnpAIS''608'', N-terminally tagged with the bright yellow reporter Venus<ref>{{#pmid:11753368}}</ref>, was supplied in trans driven by PLTetO1 and controllable over a 100x range. Excision rates were proportional to the transposase levels and, as expected, excision depended on the orientation of the IS derivative with respect to the direction of replication in the donor plasmid: IS in an orientation with the active IS strand in the lagging strand template excised more frequently and at lower (10x) TnpA levels than when inserted into the leading strand, demonstrating the validity of the experimental system. In this system, individual excision events appear as bright flashes of blue fluorescence. Following an initial activity in part of the population when cells are applied to a solid medium, activity decreases or ceases during “exponential” growth but increases again at a constant rate (in a sub-population) upon growth arrest in a random ([[wikipedia:Poisson_distribution|Poisson distributed]]) way. Moreover, the events do not occur randomly in the growing colonies and tend to be excluded from the colony edges. The study underlines the heterogeneity of TE activity rates in both space and time, possibly resulting from heterogeneous TnpA levels at the individual cell level in the population.
These studies are reminiscent of the early studies of [[wikipedia:James_A._Shapiro|Jim Shapiro]] on [[wikipedia:Bacteriophage_Mu|phage Mu]]-mediated rearrangements in growing bacterial colonies<ref>{{#pmid:2838063}}</ref><ref>{{#pmid:2553666}}</ref>. | |||
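As an illustration of what "Poisson distributed" implies for excision events occurring at a constant rate, the sketch below computes the probability of observing a given number of events in an interval. The rate and duration values in the usage note are hypothetical and are not taken from the study.

```python
import math


def poisson_pmf(k, rate, duration):
    """Probability of exactly k events in `duration` at a constant `rate`."""
    mean = rate * duration  # expected number of events in the interval
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least_one(rate, duration):
    """Probability that one or more events occur in `duration`."""
    return 1.0 - math.exp(-rate * duration)
```

For example, with an assumed rate of 0.5 events per hour, <code>poisson_pmf(0, 0.5, 2)</code> gives the probability that no excision is seen in a two-hour window.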
== TnpB and its Relatives: Guide RNA Endonucleases == | |||
TnpA alone can carry out both the cleavage and joining steps ''in vitro''. TnpB is encoded only by the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] groups and is not required for transposition of either [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''] or [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] in ''[[wikipedia:Escherichia_coli|Escherichia coli]]'' and ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]'' respectively <ref name=":6" /><ref name=":24" />. The full length TnpB is approximately 400 amino acids long. | |||
====IS''200''/IS''605'' and the ISC group==== | |||
An overview of TnpB organization was originally obtained by comparing the entire [https://isfinder.biotoul.fr/ ISfinder] collection of 85 ''tnpB'' copies with the [https://pfam.xfam.org/ Pfam domain database] ([[:File:FigIS200 605 30.png|Fig. IS200.30]]). This revealed three major domains: an N-terminal putative [[wikipedia:Helix-turn-helix|helix-turn-helix]], a longer and more variable central domain, OrfB_IS605, with a putative '''DDE motif''' and a C-terminal [[wikipedia:Zinc_finger|zinc finger (ZF) domain]] of the '''CPXCG''' type. Half of the analyzed TnpB copies including TnpB<sub>IS''Dra2''</sub> but not TnpB<sub>IS''608''</sub> contained all three domains, while only two did not include a [[wikipedia:Zinc_finger|zinc finger]]. | |||
TnpB<sub>IS''608''</sub> was missing the N-terminal [[wikipedia:Helix-turn-helix|HTH domain]] which would provide an explanation for its lack of activity in certain assays <ref name=":19" />. | |||
Pasternak et al.'''<ref name=":18">{{#pmid:23461641}}</ref>''' observed that TnpB<sub>ISDra2</sub> appears to have an inhibitory effect on [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] excision and insertion in its host, ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'', and on excision in ''[[wikipedia:Escherichia_coli|E. coli]]'', and that the integrity of its putative [[wikipedia:Zinc_finger|zinc finger motif]] is required for this effect. | |||
Relatives of TnpB have been identified in both prokaryotes and eukaryotes. TnpB is carried by members of the [https://tncentral.ncc.unesp.br/TnPedia/index.php/IS_Families/IS607_family IS''607'' family] found both in prokaryotes and in eukaryotes and their viruses but is dispensable for [https://tncentral.ncc.unesp.br/TnPedia/index.php/IS_Families/IS607_family IS''607'' transposition] in ''[[wikipedia:Escherichia_coli|E. coli]]'', as it is for IS''200''/IS''605'' transposition. TnpB analogues, known as '''Fanzor1''' and '''Fanzor2''' (see: [[IS Families/IS200 IS605 family#Fanzor1|Fanzor section below]]), have also been identified in diverse eukaryotic transposable elements.[[File:FigIS200 605 30.png|center|thumb|680x680px|'''Fig. IS200.30.''' Organization of TnpB protein and derivatives: putative N-terminal [[wikipedia:Helix-turn-helix|helix-turn-helix motif (HTH)]], central OrfB_IS''605'' domain with a putative DDE motif (Pfam), and C-terminal [[wikipedia:Zinc_finger|zinc finger motif (ZF)]] are shown. Numbers represent the occurrence of corresponding variants among 85 analyzed sequences: 46 carry all three domains (e.g., [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']), 33 lack the [[wikipedia:Helix-turn-helix|HTH motif]] (e.g., [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608'']), whereas others retain separate domains.]]<br />
====TnpB and IscB are Related to the RNA-guided nucleases [[wikipedia:CRISPR/Cas12a|Cas12]] and [[wikipedia:Cas9|Cas9]].==== | |||
More extensive analysis showed that TnpB shares some similarity with the RNA-guided nuclease [[wikipedia:CRISPR/Cas12a|Cas12]] while IscB showed greater similarity to [[wikipedia:Cas9|Cas9]]. Both, like [[wikipedia:Cas9|Cas9]] and [[wikipedia:CRISPR/Cas12a|Cas12]] themselves, exhibit split [[wikipedia:RuvABC|RuvC endonuclease domains]] <ref name=":29" /><ref>{{#pmid:24728998}}</ref> <ref>{{#pmid:PMC5851899}}</ref><ref>{{#pmid:31857715}}</ref><ref name=":37">{{#pmid:34619744}}</ref> ([[:File:FigIS200 605 31.png|Fig. IS200.31]]). While [[wikipedia:Cas9|Cas9]] and [[wikipedia:CRISPR/Cas12a|Cas12]] carry related functional domains, their architectures are somewhat different and the configuration of their guide RNAs also differs.
[[File:FigIS200 605 31.png|center|thumb|680x680px|'''Fig. IS200.31.''' Schematic of IscB and TnpB showing the relative positions of the different functional motifs. '''Top:''' IscB (Extracted from Altae-Tran et al.<ref name=":30" />). '''Bottom:''' TnpB from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] compared with the Cas12 derivative, Un1Cas12f1 (from Karvelis et al.<ref name=":37" />). [Green]: [[wikipedia:RuvABC|RuvC]] segments '''I''', '''II''' and '''III'''; [red]: [[wikipedia:Zinc_finger|Zinc Finger]] or [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH nuclease]; [blue]: Arginine rich helix; [yellow]: Wedge domain; [grey]: Helical bundle. As defined by Altae-Tran et al.<ref name=":30" /> (IscB) and Karvelis et al.<ref name=":37" /> (TnpB). Note that, compared to Cas9-like IscB, TnpB and Cas12 have an N-terminal extension before the first [[wikipedia:RuvABC|RuvC]] ('''i''') motif.]]
<br /> | |||
=====IscB and Cas9===== | |||
[[wikipedia:Cas9|Cas9]] (also called Cas5, Csn1, or Csx12) is an RNA-guided dual nuclease generally associated with [[wikipedia:CRISPR|CRISPR systems]] in bacteria and widely used in genome engineering. The [[wikipedia:RuvABC|RuvC DED catalytic]] triad is split into three sections (I, II and III) in which I and II are interrupted by the R-rich region and II and III by an [https://proteopedia.org/wiki/index.php/H-N-H_motif HNH nuclease] domain ([[:File:FigIS200 605 31.png|Fig. IS200.31]]). A region common to all [[wikipedia:Cas9|Cas9]] derivatives is located at the C-terminal end. | |||
The [[wikipedia:Cas9|Cas9]] structure has been determined ([[:File:FigIS200 605 32.png|Fig. IS200.32]]. '''B''' <ref name=":39">{{#pmid:24505130}}</ref>). The protein is a monomer in which the three [[wikipedia:RuvABC|RuvC]] segments I, II and III, carrying the D, E and D catalytic residues respectively, are assembled into the correct three-dimensional configuration to generate a [[wikipedia:RuvABC|RuvC-like]] catalytic pocket with the [https://proteopedia.org/wiki/index.php/H-N-H_motif HNH nuclease] domain extruded ([[:File:FigIS200 605 32.png|Fig. IS200.32]]. '''A'''). The [[wikipedia:Cas9|Cas9]] guide RNA (crRNA) is composed of a region containing secondary structure potential and a 5’ extension (spacer) of about 20 nts, complementary to the target sequence and which forms an RNA/DNA [[wikipedia:Heteroduplex|heteroduplex]] ([[:File:FigIS200 605 32.png|Fig. IS200.32]]. '''C'''). Activated [[wikipedia:Cas9|Cas9]] recognises a specific sequence, PAM ('''P'''rotospacer '''A'''djacent '''M'''otif), located next to the target sequence on the complementary strand downstream of the target sequence. This is necessary for binding of the [[wikipedia:Cas9|Cas9]]-crRNA complex and subsequent cleavage <ref name=":36">{{#pmid:22949671}}</ref>. Cleavage is catalysed by both the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH] nuclease (target strand) and the reconstituted [[wikipedia:RuvABC|RuvC nuclease]] (complementary strand). Cleavage is often “blunt” (i.e. occurs at the same position on both strands) and PAM proximal <ref name=":36" />.
[[File:FigIS200 605 32.png|center|thumb|680x680px|'''Fig. IS200.32.''' '''[[wikipedia:Cas9|Cas9]] Structure and Activity.''' '''A)''' Cartoon of [[wikipedia:Cas9|Cas9]] showing the “assembled” [[wikipedia:RuvABC|RuvC]] domains (colors as in legend to [[:File:FigIS200 605 31.png|Fig. IS200.31]]). '''B)''' [[wikipedia:Cas9|Cas9]] structure from Jinek et al. <ref name=":39" />. Taken from https://en.wikipedia.org/wiki/Cas9#/media/File:Cas9_Apo_Structure.png. The [[wikipedia:RuvABC|RuvC]] and HNH endonuclease domains are indicated. '''C)''' Mechanism of [[wikipedia:Cas9|Cas9]] action. The target DNA is invaded by the 5' spacer region of the guide RNA; cleavage of the RNA-bound target strand is catalysed by the HNH nuclease domain while cleavage of the complementary strand is accomplished by the reconstituted [[wikipedia:RuvABC|RuvC]] domain of Cas9.]]
IscB shares [[wikipedia:Cas9|Cas9]] sequence features such as the split [[wikipedia:RuvABC|RuvC]] and [https://proteopedia.org/wiki/index.php/H-N-H_motif HNH nuclease] domains and an arginine-rich domain (R-rich, also known as the bridge helix) ([[:File:FigIS200 605 31.png|Fig. IS200.31]] '''Top''') with a group of [[wikipedia:Cas9|Cas9]] derivatives, Cyan7822_6324 in particular <ref>{{#pmid:21756346}}</ref>. In addition, a more detailed investigation <ref name=":30" /> led to identification of an additional IscB N-terminal domain (called PLMP after its conserved amino acid residues) not present in [[wikipedia:Cas9|Cas9]] ([[:File:FigIS200 605 30.png|Fig. IS200.30]]. '''Top'''). These features appear in alignments of IscB sequences <ref name=":29" /> ([[:File:FigIS200 605 33.png|Fig. IS200.33]]).
[[File:FigIS200 605 33rev.png|alt=|center|thumb|780x780px|'''Fig. IS200.33.''' '''Alignment of IscB. Sequences from Kapitonov et al.''' <ref name=":29" />. The alignment was performed with Clustal Omega2 and drawn using Jalview Version 2. PLMP (Altae-Tran et al. <ref name=":30" />), [[wikipedia:RuvABC|RuvC]] I, II and III, arginine rich region (R-rich) and '''HN(H) motifs''' are indicated as well as a '''CXXC''' zinc finger. A consensus sequence is included below.]]
<br /> | |||
=====TnpB and Cas12=====
[[wikipedia:CRISPR/Cas12a|Cas12]] is also an RNA-guided nuclease. A number of subtypes have been described <ref>{{#pmid:31021231}}</ref> and the structures of several of these have been solved. They have similar C-terminal ends but carry (related) N-terminal ends of various lengths (see Karvelis et al.<ref name=":37" />). One of the shorter derivatives, [[wikipedia:CRISPR/Cas12a|Cas12F]] (also known as Cas14) <ref>{{#pmid:33764415}}</ref>, acts as a dimer. As in [[wikipedia:Cas9|Cas9]], the common C-terminal end is composed of a split [[wikipedia:RuvABC|RuvC]] (I, II and III) in which I and II are interrupted by the R/K-rich region. In this case, however, instead of the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH domain], [[wikipedia:RuvABC|RuvC]] segments II and III are separated by a [[wikipedia:Zinc_finger|zinc finger]] of the CPXCG type ([[:File:FigIS200 605 31.png|Fig. IS200.31]] '''bottom''').
For [[wikipedia:CRISPR/Cas12a|Cas12]], the guide RNA is composed of a region containing secondary structure potential and a 3’ extension (spacer) of about 20 nts, complementary to the target sequence ([[:File:FigIS200 605 34.png|Fig. IS200.34]]). The PAM sequence is located upstream of the target sequence. Cleavage is PAM distal and staggered. | |||
[[File:FigIS200 605 34.png|center|thumb|680x680px|'''Fig. IS200.34.''' '''Cas12 Structure and Activity. A)''' A comparison of the structure of Cas12f1 with the model of TnpB (kindly provided by Karvelis et al. <ref name=":37" />) showing the REC and WED domains (left) and the helical, [[wikipedia:RuvABC|RuvC]] and [[wikipedia:Zinc_finger|Zn-finger domains]] (right). '''B) Mechanism of Cas12 action.''' The target DNA is invaded by the 3' end of the guide RNA and cleavage of the PAM-carrying strand is accomplished by the [[wikipedia:RuvABC|RuvC]] segments of Cas12 while cleavage of the RNA-bound opposite strand is assisted by the [[wikipedia:Zinc_finger|zinc finger domain]].]]
Karvelis et al.<ref name=":37" /> describe the domain structure of TnpB and present evidence that it is related to [[wikipedia:CRISPR/Cas12a|Cas12]], another derivative of the Cas family ([[:File:FigIS200 605 34.png|Fig. IS200.34]] '''bottom'''). Like [[wikipedia:CRISPR/Cas12a|Cas12F]], it also carries a [[wikipedia:RuvABC|RuvC]] in which the '''D''' (I), '''E''' (II) and '''D''' (III) catalytic residues are split. Again, [[wikipedia:RuvABC|RuvCI]] and [[wikipedia:RuvABC|RuvCII]] are separated by an R-rich region and [[wikipedia:RuvABC|RuvCII]] and [[wikipedia:RuvABC|RuvCIII]] by a [[wikipedia:Zinc_finger|zinc finger]] with three modules ([[:File:FigIS200 605 31.png|Fig. IS200.31]] '''bottom'''). Moreover, the N-terminal region, which corresponds to the minimal common structural elements present in [[wikipedia:CRISPR/Cas12a|Cas12]] <ref name=":37" />, includes a three helical bundle Rec domain (labelled [[wikipedia:Helix-turn-helix|HTH]] in an earlier TnpB analysis; [[:File:FigIS200 605 31.png|Fig. IS200.31]] '''bottom'''), inserted into a [[wikipedia:Beta_barrel|β-barrel domain]], referred to as the “Wedge” domain in [[wikipedia:CRISPR/Cas12a|Cas12]]. It should be noted that the [[wikipedia:RuvABC|RuvC]] domain is used to cleave both DNA strands while the zinc finger domain simply assists this cleavage.
These features can be identified in an alignment of the entire TnpB library (349 examples from [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder]; November 2021) ([[:File:FigIS200 605 35i.png|Fig. IS200.35]] '''i''', '''ii''' and '''iii''') and in TnpB sequences provided by Kapitonov et al.<ref name=":29" /> ([[:File:FigIS200 605 36.png|Fig. IS200.36]]). <gallery mode="slideshow">
File:FigIS200 605 35i.png|'''Fig. IS200.35i.''' TnpB Alignment | |||
File:FigIS200 605 35ii.png|'''Fig. IS200.35ii.''' TnpB Alignment | |||
File:FigIS200 605 35iii.png|'''Fig. IS200.35iii.''' TnpB Alignment | |||
</gallery> | |||
The relationship between [[wikipedia:CRISPR/Cas12a|Cas12]] and TnpB has strong support from structural modelling <ref name=":37" />, for example by comparison with Un1Cas12f1 (Cas14a) from an uncultured archaeon <ref name=":46">{{#pmid:33333018}}</ref>, which functions as an asymmetric dimer and represents a minimal domain organization of the [[wikipedia:CRISPR/Cas12a|Cas12]] group <ref name=":37" />. However, TnpB from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] (see below) appears to be a monomer <ref name=":37" />.
[[File:FigIS200 605 36.png|center|thumb|680x680px|'''Fig. IS200.36.''' TnpB Alignment with [[wikipedia:RuvABC|RuvC]] and Other Domains. Sequences from Kapitonov et al. <ref name=":29" />. The alignment was performed with Clustal Omega2 and drawn using Jalview Version 2. The long N-terminal extension, [[wikipedia:RuvABC|RuvC]] I, II and III, arginine/lysine rich region (RK-rich) and [[wikipedia:Zinc_finger|zinc finger motifs]] are indicated <ref name=":37" />. The two yellow residues indicated by vertical blue arrows indicate the major differences in [[wikipedia:RuvABC|RuvC]] I and [[wikipedia:RuvABC|RuvC]] II between TnpB and '''Fanzors''' <ref name=":47">{{#pmid:37971304}}</ref>.]]
====Evolution of TnpB and IscB from an Ancestral RuvC?==== | |||
In view of the relationship between TnpB, IscB, [[wikipedia:RuvABC|RuvC]] and the Cas proteins, the important question of the evolutionary trajectory of these proteins arises. Using various analytic tools, it was concluded that all [[wikipedia:Cas9|Cas9]] examples identified to date are probably descended from a single IscB derivative ancestor <ref name=":30" />. This contention arose from the observation that the CRISPR-associated IscB derivatives do not form a single clade but are distributed over the IscB phylogenetic tree suggesting that they evolved independently from a single acquisition <ref name=":30" />. Additional IscB derivatives were also identified in this study which led to an evolutionary scenario involving successive acquisition of domains by an ancestral [[wikipedia:RuvABC|RuvC]] ([[:File:FigIS200 605 37.png|Fig. IS200.37]]). The additional species included a shorter derivative, IsrB, which carried the bridging helix but not the [https://proteopedia.org/wiki/index.php/H-N-H_motif HNH domain] and a longer derivative which had acquired a so-called REC domain <ref name=":30" />. | |||
TnpB appears to have followed an alternative evolutionary route towards [[wikipedia:CRISPR/Cas12a|Cas12]]. In addition, it is thought that TnpB was an ancestor of the eukaryotic '''Fanzor proteins''' <ref name=":35">{{#pmid:23548000}}</ref> (see: [[IS Families/IS200 IS605 family#Fanzor1|Fanzor section below]]) associated with diverse eukaryotic potential transposable elements. | |||
[[File:FigIS200 605 37.png|center|thumb|680x680px|'''Fig. IS200.37. Sequential Acquisition of Domains by an Ancestral [[wikipedia:RuvABC|RuvC]] from Altae-Tran et al.'''<ref name=":30" />. [green]: [[wikipedia:RuvABC|RuvC]] segments '''I''', '''II''' and '''III'''; [red]: [[wikipedia:Zinc_finger|Zinc-finger]] or [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH nuclease]; [blue]: Arginine rich helix; [yellow]: Wedge domain. As defined by Altae-Tran et al.<ref name=":30" /> (IscB) and Karvelis et al. <ref name=":37" /> (TnpB). Note that, compared to Cas9-like ''iscB'', TnpB and Cas12 have an N-terminal extension before the first [[wikipedia:RuvABC|RuvC]] (i) motif while iscB carries the N-terminal P domain [grey].]] | |||
<br /> | |||
====Functional analysis of TnpB and IscB==== | |||
Clearly, the relationship between TnpB and IscB and [[wikipedia:CRISPR/Cas12a|Cas12]] and [[wikipedia:Cas9|Cas9]] respectively suggested that TnpB and IscB might function as RNA guided nucleases which may, in some way, be involved in transposition <ref name=":30" /><ref name=":37" /> and this has been extensively tested. | |||
=====TnpB functions as an RNA-guided Endonuclease===== | |||
For TnpB, Karvelis, et al.<ref name=":37" /> used [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] as a model system. This has the advantage that its transposition behavior has been well characterized <ref name=":32" />'''<ref name=":18" />'''. | |||
In [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], the 3’ end of the upstream ''tnpA'' gene overlaps the 5’ end of ''tnpB''. The authors were unable to efficiently express TnpB as a fusion protein but observed that its yield was significantly increased when in its natural context but in which TnpA had been inactivated by mutation. Although the nature of the mutation is not specified in the article, its behavior could be explained if it were an in-frame deletion or other mutation which does not affect C-terminal translation since it seems likely that expression of TnpB involves translational coupling <ref>{{#pmid:7517937}}</ref><ref>{{#pmid:PMC6728339}}</ref> with TnpA suggested by their overlapping reading frames ([[:File:FigIS200 605 38.png|Fig. IS200.38]]). | |||
[[File:FigIS200 605 38.png|center|thumb|720x720px|'''Fig. IS200.38.''' Organization of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']. '''Top:''' Map of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] showing the left ('''LE''') and right ends ('''RE''') (red and blue respectively), the 5' pentanucleotide target sequence, '''TTGAT''', the position of cleavage indicated by the vertical arrowheads and the overlapping ''tnpA'' and ''tnpB'' genes. Above is shown the DNA and protein sequences at the position of the overlap, which are presumably involved in translational coupling of ''tnpA'' and ''tnpB''. '''Bottom:''' Sequence of the guide RNA (reRNA) derived from the right IS end including a few bases of the IS interior, the '''RE''' secondary structure (blue) and the IS flank which acts as the guide (green) from Karvelis et al. <ref name=":37" />.]] | |||
TnpB was found to purify with RNA of approximately 150 nts derived from the '''IS''' RE (reRNA). reRNA was complementary to the ''tnpB'' 3’ end, RE, and about 16 nt of (host) flanking DNA ([[:File:FigIS200 605 38.png|Fig. IS200.38]]). This RNA, with the secondary structure provided by the RE sequence and the 3’ extended flanking DNA is of the expected configuration for relatives of [[wikipedia:CRISPR/Cas12a|Cas12]] ([[:File:FigIS200 605 37.png|Fig. IS200.36]]). Previous studies had identified non coding RNA (ncRNA) from the 3’ end of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''], a related IS from ''[[wikipedia:Halobacterium_salinarum|Halobacterium salinarum]]'' NRC-1, called sense overlapping transcripts (sotRNAs) <ref name=":31">{{#pmid:PMC4615843}}</ref>.<br /><br /> | |||
== ncRNAs, sotRNAs and reRNAs == | |||
There has been much interest in [[wikipedia:Non-coding_RNA|non-coding RNA]] (ncRNA) and global searches in Archaea had revealed ncRNA expressed from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members which carry only a ''tnpB'' gene and are devoid of the TnpA transposase <ref>{{#pmid:15752202}}</ref><ref name=":20">{{#pmid:25127548}}</ref><ref>{{#pmid:21668986}}</ref>. | |||
During a detailed analysis of ncRNA produced from ''[[wikipedia:Halobacterium_salinarum|Halobacterium salinarum]]'' NRC-1 <ref name=":21">{{#pmid:25806405}}</ref><ref name=":43">{{#pmid:34209065}}</ref>, an ncRNA from the region encompassing the right end of these IS''200''/IS''605'' family members was identified. This was called sotRNA ('''s'''ense '''o'''verlapping '''t'''ranscript). The authors demonstrated from a publicly available transcriptome compendium <ref>{{#pmid:19536208}}</ref> that all 10 [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members in ''[[wikipedia:Halobacterium_salinarum|H. salinarum]]'' NRC-1 genome express a sotRNA ([[:File:FigIS200 605 39.png|Fig. IS200.39]]) and show condition-dependent differential regulation between sotRNAs and their cognate genes. sotRNA started within ''tnpB'' at approximately 1100 nt from its initiation codon, had an average size of 218 nt, and ended approximately 74 nt 3’ to the ''tnpB'' termination codon. The authors could not distinguish between the hypotheses that sotRNAs are generated by primary transcription or by processing from a full length transcript of the tnp gene (although they were unable to locate any potential promoter). | |||
[[File:FigIS200 605 39.png|center|thumb|720x720px|'''Fig. IS200.39. Identification of sotRNAs in 3 of the 10 IS''1341''-type transposases of [[wikipedia:Halobacterium_salinarum|''Halobacterium salinarum'']] NRC-1 (GenBank Accession: [https://www.ncbi.nlm.nih.gov/nuccore/AE004437.1 AE004437]). Top:''' Examples of expression data from three [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 '''IS''1341''''']-related IS taken from Gomes-Filho et al 2015<ref name=":21" />. Heatmaps are color-coded according to log10 expression ratios between each of the 13 time points relative to reference condition. ( B ) Tiling array signal in reference condition and expression profiles of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-type tnpB (arrows in yellow for genes in forward strand and in orange for genes on the reverse strand) and their sotRNAs (light blue arrows). This signature identifies a change in the expression signal inside the insertion sequence near the 3’ end, indicating the existence of sense overlapping transcripts (sotRNAs). '''Bottom: Mapping the 5’ end of sotRNAs in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] -type transposases''' RNA-seq data visualized as log2 of total reads aligned in each genomic position for VNG0042G and VNG_sot0042. Enrichment of 5’ ends of mapped reads are visualized as peaks immediately below small RNA-seq coverage. Light blue arrow: sotRNA. Dark orange arrows: genes annotated on the reverse strand.]] | |||
Such sotRNA transcripts, specific for ''tnpB'' genes, had previously been identified by Gomes-Filho et al. <ref name=":21" /> in a number of Archaea and Bacteria including ''[[wikipedia:Sulfolobus_acidocaldarius|S. acidocaldarius]]'', ''[[wikipedia:Methanopyrus|Methanopyrus kandleri]]'', ''[[wikipedia:Helicobacter_pylori|Helicobacter pylori]]'' and ''[[wikipedia:Escherichia_coli|E. coli]]'' K12. There has also been some indication of “transposase-related” sense overlapping transcripts of ''tnpB''-like genes from [[wikipedia:Thermococcus_kodakarensis|''T. kodakarensis'']] <ref>{{#pmid:PMC4247193}}</ref> and ''[[wikipedia:Pyrococcus_furiosus|P. furiosus]]'' <ref>{{#pmid:PMC124278}}</ref>. However, the possibility that these may represent guide RNAs had not been explicitly considered. | |||
Furthermore, sotRNA included what the authors called an RE-like tetraloop, resembling the RE DNA loop structure, as do sotRNAs from ''[[wikipedia:Pyrococcus_abyssi|P. abyssi]]'' and other thermococcal genomes <ref name=":20" />. | |||
====TnpB: mechanism of action==== | |||
Karvelis et al.<ref name=":37" /> demonstrated that TnpB, purified using a '''His''' tag, could cleave DNA. They argued that since the 3’ end of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] reRNA corresponds to the DNA target, it would vary according to the position of the IS insertion and the reRNA may (have) serve(d) as a guide RNA. If true, cleavage of the target DNA should occur within the 3’ extension sequence of the flank (the foot of '''RE''' in [[:File:FigIS200 605 38.png|Fig. IS200.38]]). In this context, it is interesting that the (DNA) structure of the right end was shown to form a base triple which is a characteristic of RNA <ref name=":7" />. | |||
To determine whether RNA-guided cleavage occurred, they constructed a system ([[:File:FigIS200 605 40.png|Fig. IS200.40]]) using a plasmid supplying TnpB together with an reRNA ([[:File:FigIS200 605 40.png|Fig. IS200.40]] '''A''') which included a defined 16 (or 20) nucleotide flank sequence and was terminated by a specific [[wikipedia:Hepatitis_delta_virus_ribozyme|Hepatitis delta virus ribozyme]] (HDV; <ref>{{#pmid:9288893}}</ref>) to produce a defined 3’ RNA end <ref>{{#pmid:9783582}}</ref>. A lysate from the host strain was then used in cleavage assays of a library of target plasmids each containing a specific defined 16 base pair sequence directly downstream from a 7 bp (7N) randomised sequence ([[:File:FigIS200 605 40.png|Fig. IS200.40]] '''B'''). This has previously been used to identify conserved PAM sequences <ref name=":37" /> <ref>{{#pmid:30691644}}</ref>. Specific double strand cleavage products were captured by adapter ligation (details in Karvelis et al.<ref name=":37" />) and the sequence of the resulting enriched 7N region was determined. This corresponded to the conserved [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] target pentanucleotide sequence '''TTGAT''' (with a higher enrichment for GA), which is essential for IS insertion and abuts '''LE''' in the integrated IS. By equivalence to PAM, this sequence was called TAM ('''T'''ransposon '''A'''djacent '''M'''otif) <ref name=":37" /> see also <ref name=":30" /> ([[:File:FigIS200 605 40.png|Fig. IS200.40]] '''C'''). | |||
[[File:FigIS200 605 40.png|center|thumb|680x680px|'''Fig. IS200.40.''' Defining TAM and the Position of Cleavage. '''A) Experimental design'''. A plasmid encoding TnpB [purple] and an reRNA with a 16 (or 20) defined flank sequence [green] terminated by a specific [[wikipedia:Hepatitis_delta_virus_ribozyme|Hepatitis delta virus ribozyme]], HDV [black], ('''left''') was used to produce the TnpB-reRNA complex and a lysate was used to treat a plasmid library containing the defined flank sequence with an upstream heptanucleotide of random sequence [red]. Double-strand cleavage products were captured by adapter ligation. '''B) Library DNA sequence.''' Randomized nucleotides are shown in red, guide sequence in green. '''C) Preferred TAM sequence and cleavage.''' The preferred TAM sequence, the observed pentanucleotide '''[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']''' target sequence TTGAT, is shown in red and the cleavages observed are shown by vertical arrowheads <ref name=":37" />.]] | |||
This cleavage specificity was confirmed using purified TnpB-RNP in which the protein and RNA components were produced by separate plasmids and a target plasmid carrying a 3’ flank, a 5’ TTGAT '''TAM''' pentanucleotide and a different guide sequence ([[:File:FigIS200 605 40.png|Fig. IS200.40]] '''C'''). The results showed a majority of double strand breaks in the supercoiled target plasmid to generate linear plasmids but also a significant level of nicked product. The TnpB-RNP was also active on a linear substrate (i.e. activity does not require supercoiling). In both cases, use of a TnpB D191A mutant, part of the conserved [[wikipedia:RuvABC|RuvC]] DED catalytic triad, eliminated the reaction. Robust TnpB-mediated cleavage activity was observed and required both '''TAM''' and guide RNA sequences. Further sequence analysis revealed that cleavage occurred distal to the '''TAM''' sequence at the guide sequence boundary and was specific for cleavage on the bottom strand but showed some variation on the top strand ([[:File:FigIS200 605 40.png|Fig. IS200.40]]). There are some differences however with Cas12. TnpB is a monomer and requires a single copy of reRNA <ref name=":37" />. | |||
A similar study by Altae-Tran et al.<ref name=":30" /> using purified TnpB from a less well characterised ''tnpB'' gene of ''[[wikipedia:Alicyclobacillus_macrosporangiidus|Alicyclobacillus macrosporangiidus]]'', (TnpB<sub>Ama</sub>), showed that the protein catalysed cleavage of both double- and single-stranded DNA targets in both a '''TAM'''-dependent and '''TAM'''-independent manner. As in the case of TnpB<sub>IS''Dra2''</sub>, ''[[wikipedia:Alicyclobacillus_macrosporangiidus|A. macrosporangiidus]]'' TnpB-associated guide RNA was identified and derived from the 3’ end of the ''tnpB'' gene. In this case, the '''TAM''' appeared to be the tetranucleotide TCAC. | |||
These studies therefore identify C<sub>L</sub> (which is outside the transposon but necessary for transposition by interacting with G<sub>L</sub> [[:File:FigIS200 605 13.png|Fig. IS200.13]]) as the '''TAM'''. | |||
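The targeting rule described above (a '''TAM''' lying immediately 5’ to a sequence matching the guide) can be illustrated with a minimal Python sketch. It is not taken from the cited studies: the function name, the example guide and the example target are hypothetical, only one DNA strand is scanned, and no attempt is made to predict the exact cleavage positions.

```python
def find_tam_guided_sites(target: str, tam: str, guide: str) -> list[int]:
    """Return 0-based positions where `tam` is directly followed by the guide match.

    `target` is the DNA strand with the same sense as the guide.
    `guide` may be given as RNA (U) or DNA (T); both are compared as DNA.
    This is an illustrative simplification: exact matching only, one strand only.
    """
    target = target.upper()
    motif = tam.upper() + guide.upper().replace("U", "T")
    sites = []
    start = target.find(motif)
    while start != -1:
        sites.append(start)  # position of the first TAM base
        start = target.find(motif, start + 1)
    return sites


# Hypothetical 16 nt guide placed 3' to the ISDra2 TAM (TTGAT).
example_guide = "ACGUUGCAAGCUAGCU"
example_target = "GGCC" + "TTGAT" + "ACGTTGCAAGCTAGCT" + "AACC"
example_sites = find_tam_guided_sites(example_target, "TTGAT", example_guide)
```

A target in which the '''TAM''' is altered returns no site, in line with the observation that both '''TAM''' and guide sequences are required.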
=====An explanation of the “inhibitory effect” reported for TnpB?===== | |||
Moreover, ''in vivo'', TnpB expression together with reRNA from one plasmid resulted in loss of a second plasmid carrying the reDNA target (interference), presumably as a result of cleavage at the target site and linearization of the plasmid. This of course may explain the inhibitory effect of TnpB originally observed by Pasternak et al. '''<ref name=":18" />'''. | |||
=====A system which functions in Eukaryotes===== | |||
Additionally, the authors were able to demonstrate that the system functions in eukaryotic cells opening the possibility that it could be suitably modified for gene editing. | |||
<br /> | |||
== RNA Nomenclature, Processing, Structure, Diversity and mode of function == | |||
IS''605'' group guide RNAs have been called both reRNA and ωRNA (OMEGA for obligate mobile element-guided activity). Here, to eliminate confusion, we will use the term re(ω)RNA (or ω (re)RNA) for that from both ''tnpB'' and ''iscB'' groups although they have different secondary structures and functions. | |||
====Generating re(ω)RNA: Processing==== | |||
The important question of how re(ω)RNA is generated was addressed by Nety et al. <ref name=":26">{{#pmid:37272862}}</ref>. Given that TnpB is thought to be an ancestor of Cas12 <ref name=":27">{{#pmid:27096362}}</ref><ref>{{#pmid:28431230}}</ref>, the ability of Cas12 to process RNA (e.g. <ref name=":27" />) may have originated from analogous functions in TnpB <ref name=":26" />. They demonstrated that a TnpB orthologue from the bacterium, ''[[wikipedia:Alicyclobacillus_macrosporangiidus|A. macrosporangiidus]]'' (AmaTnpB or TnpB<sub>Ama</sub>), has RNA processing activity and can generate an re(ω)RNA. | |||
The purified TnpB<sub>Ama</sub> (either wildtype or a [[wikipedia:RuvABC|RuvC]]-II catalytic mutant) was incubated with four different ''in vitro'' transcribed RNA substrates ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i''' and '''ii''') produced from [[wikipedia:Polymerase_chain_reaction|PCR]]-generated DNA templates: a “random” negative control of 1190 nt ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i1'''); a 166 nt RNA with the RNA guide very similar to that found to be associated with a TnpB<sub>Ama</sub> orthologue, a potential re(ω)RNA ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i2'''); a full length tnpB transcript extended to include the guide sequence of 1190 nt ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i3'''); and the potential re(ω)RNA with a 59 nt 3’ extension of 225 nt ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i4'''). | |||
[[File:FigIS200 605 41.png|center|thumb|680x680px|'''Fig. IS200.41. i) Substrates used in testing TnpB<sub>Ama</sub> RNA processing activity''': blue, probable TnpB<sub>Ama</sub> coding sequence; orange, RNA guide sequence; grey, “stuffer or padding” sequence containing coding sequence (blue), putative ωRNA scaffold (orange), guide (pink), and padding sequence (gray). '''ii) mapping the cis DNA cleavage inhibitor sequence.''' '''iii) The target joint showing the abutted TAM and guide sequences in red.''']] | |||
While substrate 1 was refractory to processing, both substrates 2 and 3 generated a 126 nt fragment. Substrate 4 generated a 185 nt fragment suggesting that, while it was processed correctly at the 5’ end, the 3’ extension was not processed. These conclusions were confirmed by RNAseq. All substrates were refractory to the TnpB<sub>Ama</sub> [[wikipedia:RuvABC|RuvC]]-II mutant. | |||
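The reasoning behind this interpretation can be checked with simple arithmetic, sketched below in Python. The substrate and product lengths are those quoted above; the 40 nt difference is derived here from those approximate sizes and is not a figure stated by the authors. The function names are illustrative only.

```python
def five_prime_trim(substrate_nt: int, product_nt: int, three_prime_retained: bool = True) -> int:
    """Nucleotides removed, assuming all processing is at the 5' end."""
    if not three_prime_retained:
        raise ValueError("this sketch only models 5' processing")
    return substrate_nt - product_nt


def expected_product(core_product_nt: int, extension_nt: int, extension_processed: bool) -> int:
    """Expected product size for a substrate carrying a 3' extension."""
    return core_product_nt if extension_processed else core_product_nt + extension_nt


# Substrate 2: 166 nt gives a 126 nt product.
trim_substrate_2 = five_prime_trim(166, 126)
# Substrate 4: 225 nt (166 nt plus a 59 nt 3' extension) gives a 185 nt product.
trim_substrate_4 = five_prime_trim(225, 185)
# If the 3' extension is left intact, the product should be 126 + 59 nt.
predicted_substrate_4 = expected_product(126, 59, extension_processed=False)
```

Both substrates lose the same number of nucleotides, and the substrate 4 product equals the substrate 2 product plus the intact 59 nt extension, which is what would be expected if processing occurs only at the 5’ end.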
DNA cleavage activities were assessed by including a 1221 nt dsDNA substrate containing the TnpB<sub>Ama</sub> TAM ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''i'''). RNA substrates 2, 3 and 4 all catalyzed TnpB-mediated DNA cleavage. These results are consistent with those obtained with TnpB<sub>Dra2</sub> (see below;<ref name=":28">{{#pmid:37020030}}</ref><ref name=":33">{{#pmid:37020015}}</ref>) showing that only the proximal 12 nt of the guide sequence is sufficient for DNA targeting. | |||
The cleavage activity of the three substrates was not identical. The activity of substrate 3, which carries a substantial 5’ extension, was significantly lower than the other two raising the question of whether the extension may include inhibitory sequences. | |||
To investigate this, RNA samples were prepared with different 3’ deletions ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''ii'''). When these RNA species were included in the cleavage reactions, a region between co-ordinates 825 and 875, which shows extensive complementarity to the re(ω)RNA scaffold, was observed to be responsible for the inhibitory effect. | |||
This suggests a cis-regulatory mechanism engaged in controlling re(ω)RNA activity <ref name=":26" />. | |||
Using [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] <ref name=":32" />, Nakagawa et al.<ref name=":28" /> observed that, although TnpB was co-expressed with a 247 nt re(ω)RNA in their purification system, it remained bound to only 100-160 nt of the RNA even in a denaturing gel. Further analysis revealed that the RNA was rapidly degraded in the absence of TnpB<sub>Dra2</sub> but, in its presence, three different RNAs of approximately 220, 160 and 130 nt were observed, the latter two including the guide sequence at the 3’ end. Very little of the 200nt species was observed in the purified RNP, suggesting degradation, but LC–MS analyses suggested that the 160nt species was cleaved between co-ordinates −150 and −149 or −138 and −137 by TnpB and/or endogenous RNases. They also provide evidence that the ~130-nt RNA is cleaved between −117U and −116G ([[:File:FigIS200 605 41.png|Fig. IS200.41]] '''ii'''). | |||
Furthermore, Sasnauskas et al., <ref name=":33" />, observed that an re(ω)RNA from between co-ordinates -130 and + 16 was active in DNA cleavage. Nakagawa et al.,<ref name=":28" /> also found that truncation of the 5′ region of the re(ω)RNA (−231G to −117U) had no effect on TnpB-mediated DNA cleavage. | |||
Thus re(ω)RNA of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] also appears to be processed at its 5′ end, and at least a 130 nt fragment including the 3’ guide is stably bound to the TnpB protein. | |||
====Structure of TnpB-reRNA in association with DNA==== | |||
Two studies addressed how TnpB interacts with its DNA template <ref name=":28" /> <ref name=":33" />. Both used TnpB<sub>Dra2</sub> ([[:File:FigIS200 605 42.png|Fig. IS200.42]]). One used an re(ω)RNA which included nucleotides -130 to +16 of the right end ([[:File:FigIS200 605 42.png|Fig. IS200.42]] ii) <ref name=":33" />, while Nakagawa et al. <ref name=":28" /> used a substrate which was slightly extended in the 5' direction. Both sets of results were essentially the same. | |||
In both the RNP structure and the ternary structure with the target sequence, TnpB could be divided into two “lobes” <ref name=":28" /><ref name=":33" />: an N-Terminal lobe (Recognition or Rec) comprising the wedge (WED) and REC domains and a nuclease lobe (Nuc) (insert in [[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii''') in which the three individual [[wikipedia:RuvABC|RuvC]] domains adopt an RNase H fold including D191 ([[wikipedia:RuvABC|RuvC]] I), E278 ([[wikipedia:RuvABC|RuvC]] II) and D361 ([[wikipedia:RuvABC|RuvC]] III). | |||
The results showed that in the RNP complex ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii''' '''left'''), the principal interactions are with the [[wikipedia:RuvABC|RuvC]] and WED domains whereas in the ternary structure with target DNA ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii right'''), not only does WED interact with '''TAM''' but the REC domain intervenes around the branch point and the [[wikipedia:RuvABC|RuvC]] domain interacts extensively with the target-guide RNA hybrid helix. Note that the C<sub>R</sub> ('''TAM''') sequence which interacts with G<sub>R</sub> as DNA during TnpA-mediated transposition ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''i''') also forms a short interaction with a sequence upstream which is identical to G<sub>R</sub> ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''ii''') to generate a pseudoknot. The scaffold core is formed by the RNA triplex region delimited by the pseudoknot while stem 1 and stem 2 protrude in opposite directions ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii'''). | |||
All five '''TAM''' positions ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii right''') are recognized directly by the WED domain and substitutions at any '''TAM''' position eliminate both target DNA binding and cleavage <ref name=":33" />. | |||
On the other hand, substitutions in the guide sequence do not prevent TnpB binding but prevent cleavage. The re(ω)RNA–target DNA heteroduplex ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''iii right''') is accommodated within a central channel formed by the WED, REC and [[wikipedia:RuvABC|RuvC]] domains <ref name=":28" /><ref name=":33" />. | |||
The authors conclude from the structural results that, for cleavage, the system senses formation of a (perfect) B-form RNA-DNA hybrid without any mismatches because of the effect of guide substitutions and that TnpB requires a 12–16-bp long target perfect DNA-guide RNA heteroduplex to initiate DNA cleavage. | |||
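The heteroduplex requirement can be expressed as a simple rule, sketched below in Python under stated assumptions: the target is supplied as the DNA strand of the same sense as the guide, read from the '''TAM'''-proximal end, and a threshold of 12 contiguous matches (the lower bound of the 12–16 bp range quoted above) is used. The function names and the threshold parameter are hypothetical and the sketch ignores any position-specific tolerance.

```python
def tam_proximal_match_length(guide: str, target: str) -> int:
    """Count contiguous matching positions starting at the TAM-proximal end."""
    guide_dna = guide.upper().replace("U", "T")
    matched = 0
    for guide_base, target_base in zip(guide_dna, target.upper()):
        if guide_base != target_base:
            break  # the first mismatch ends the perfect heteroduplex
        matched += 1
    return matched


def predicted_to_cleave(guide: str, target: str, min_duplex: int = 12) -> bool:
    """Illustrative rule: cleavage needs a perfect duplex of at least `min_duplex` bp."""
    return tam_proximal_match_length(guide, target) >= min_duplex


perfect = predicted_to_cleave("ACGUUGCAAGCUAGCU", "ACGTTGCAAGCTAGCT")
early_mismatch = predicted_to_cleave("ACGUUGCAAGCUAGCU", "ACGTAGCAAGCTAGCT")
late_mismatch = predicted_to_cleave("ACGUUGCAAGCUAGCU", "ACGTTGCAAGCTAACT")
```

In this simplified model a mismatch near the '''TAM''' abolishes the predicted cleavage, whereas a mismatch beyond position 12 does not.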
Additional information concerning activity was provided in a study principally exploring diversity in this system (see: [[IS Families/IS200 IS605 family#Exploring and defining TAM sequences|Exploring and defining '''TAM''' sequences]]). | |||
Xiang et al. <ref name=":14" /> analyzed re(ω)RNA activity requirements of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] and three additional IS: [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTfu1 IS''Tfu1''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDge10 IS''Dge10''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISAba30 IS''Aba30'']. In these experiments, the 3’ re(ω)RNA scaffold end was defined as the RE tip ([[:File:FigIS200 605 44.png|Fig. IS200.44]]). | |||
Activity was exquisitely sensitive to the integrity of C<sub>R</sub>. Deletion or mutation of all but the 3’ terminal C<sub>R</sub> base pair significantly reduced activity. | |||
Additionally, the length of the guide sequence was important, as was the extent of its match with the target. Optimal editing efficiency occurred with guide sequences between 16 and 20 nucleotides and subsequently decreased with increasing length but was observed to vary somewhat between the three IS ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''ii'''). | |||
Similarly, introduction of single and double base pair transversions into the target, especially in the '''TAM''' proximal region approximately up to base pair 12, severely reduced or eliminated activity ([[:File:FigIS200 605 42.png|Fig. IS200.42]] '''ii''') with some variation between the different IS. | |||
This is similar to results obtained with Cas9 and Cas12 systems themselves <ref>{{#pmid:23287718}}</ref><ref>{{#pmid:26422227}}</ref>. Finally, variation in 5’ length showed that the shortest active scaffolds were 120–140 nt long and that lengths of 300 nt were also active. | |||
[[File:FigIS200 605 42.png|center|thumb|720x720px|'''Fig. IS200.42. Overall interactions between TnpB, reRNA and target DNA. i) Structure of the right end of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']''' <ref name=":9" /> showing a cartoon of the secondary structure, the DNA sequence from -30 to -1 and the base pairing observed between G<sub>R</sub> and C<sub>R</sub>. '''ii)''' '''reRNA from -119 to +16''' showing detailed secondary structures. Note that the colors are those shown in ('''iii'''). The guide sequence is shown in red. The G<sub>R</sub> and C<sub>R</sub> sequence equivalents in reRNA are boxed. '''iii)''' '''two-dimensional representation of reRNA structures''' in the TnpB-RNP complex (left) and in the Ternary complex with target DNA (right). The dark green, yellow and grey circles surrounding each nucleotide indicate the interacting segments of TnpB (insert below). Note that in the target sequence, the 5 nucleotide sequence 3’ to TAM is shown as complementary, however, for technical reasons (to facilitate unpairing ready for interaction with the reRNA guide sequence), the sequence CTCAG was used <ref name=":33" />.]] | |||
For TnpB<sub>Dra2</sub>, the C-terminal domain (residues 376 to 408; [[:File:FigIS200 605 42.png|Fig. IS200.42]] '''bottom insert''') has relatively low sequence similarity among TnpB proteins and is disordered in the structures. The C-terminal truncation mutant (Δ376 to 408; ΔCTD) is efficient in target DNA cleavage but exhibits somewhat reduced protein stability. Thus the CTD is not required for RNA-guided target DNA cleavage. | |||
====TnpB-re(ω)RNA: Diversity and Activity==== | |||
Given the small size of the TnpB family of RNA-guided endonucleases, they may prove useful as targeting tools for biotechnological purposes. It is therefore of importance to determine the extent of their diversity and inherent activities. It had been reported that the TnpB family is an order of magnitude more diverse than the IscB family and an [[wikipedia:HMMER|HMMER search]] of prokaryotic genomes identified >10<sup>6</sup> ''tnpB'' loci <ref name=":30" />. | |||
At least two studies <ref name=":14" /><ref name=":26" /> have addressed this question in some detail. | |||
=====Exploring and defining TAM sequences===== | |||
To further explore TnpB diversity, the ''tnpB'' DNA sequences of the 107 [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] subgroup ISfinder entries ([[:File:FigIS200 605 4B.png|Fig. IS200.4B]]) were more extensively analyzed <ref name=":14" /> with a view to uncovering differences in activities and identifying highly active members. This analysis did not include the 244 [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] members which are flanked by typical IS''200''-IS''605'' family secondary structures but carry only a TnpB gene. | |||
Firstly, the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] subgroup members were used as a seed to search the [https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ non-redundant NCBI nucleotide sequence database]. Full length copies were extracted and their flanking sequences were examined to eliminate identical insertion events. | |||
To confirm the ISfinder validation, the right end of each multicopy IS was aligned and the tetranucleotide which forms C<sub>R</sub> and undergoes special base pairing with the tetranucleotide guide sequence (G<sub>R</sub>) within RE ([[:File:FigIS200 605 13.png|Fig. IS200.13]]) was identified, while the single copy IS were examined and compared to their ISfinder annotations. Additionally, the integrity of ''tnpB'' was confirmed. This is important because it has been observed that in IS containing ''tnpA'' and ''tnpB'', ''tnpB'' is often decayed (see He et al., <ref name=":38">{{#pmid:26350330}}</ref>). | |||
It should be noted that these procedures are always undertaken as a matter of course before any IS''200''/IS''605'' family entry is made in ISfinder. | |||
The collection was arranged into 64 bins using a 90% identity threshold and these were named after the IS with the highest copy number in each group ([[:File:FigIS200 605 43.png|Fig. IS200.43]]). Many of these groups consisted of only a single example although several included a few additional examples. | |||
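One way such binning could be performed is sketched below in Python. This is an assumption-laden illustration rather than the published procedure: the source does not state which clustering algorithm or identity measure was used, so a greedy scheme over pre-aligned, equal-length sequences is shown, and all names and example sequences are hypothetical.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def bin_by_identity(records: dict[str, tuple[str, int]], threshold: float = 0.90) -> dict[str, list[str]]:
    """Greedy binning; `records` maps IS name to (aligned sequence, copy number).

    IS are visited in order of decreasing copy number, so the founder of each
    bin (which gives the bin its name) is its highest copy number member.
    """
    bins: dict[str, list[str]] = {}
    by_copy_number = sorted(records.items(), key=lambda item: -item[1][1])
    for name, (sequence, _copies) in by_copy_number:
        for founder in bins:
            if identity(sequence, records[founder][0]) >= threshold:
                bins[founder].append(name)
                break
        else:
            bins[name] = [name]  # no existing bin is similar enough
    return bins


# Hypothetical toy data: ISb differs from ISa at 1 of 20 positions (95% identity).
toy_records = {
    "ISa": ("ACGTACGTACGTACGTACGT", 5),
    "ISb": ("ACGTACGTACGTACGTACGA", 2),
    "ISc": ("TTTTTTTTTTGGGGGGGGGG", 1),
}
toy_bins = bin_by_identity(toy_records)
```

With these toy data, ISa and ISb fall into one bin named after ISa, while ISc founds a bin of its own.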
[[File:FigIS200 605 43.png|center|thumb|680x680px| » character represents the cleavage site used in transposition. The horizontal blue arrows show the IS used in further activity analysis <ref name=":14" />. ]] | |||
To examine how the sequence identities between C<sub>L</sub> and TAM ([[:File:FigIS200 605 44.png|Fig. IS200.44]]) correlate over the range of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] group members distributed over the 64 TnpB bins ([[:File:FigIS200 605 43.png|Fig. IS200.43]]), activities were tested separately for each of the 64 using a 2 plasmid, TAM depletion assay ([[:File:FigIS200 605 44.png|Fig. IS200.44]] '''ii''') <ref name=":14" />. | |||
One plasmid included ~200 nucleotides of the 3’ IS ends including a 20nt abutting “guide” sequence cloned downstream of a ''tnpB'' gene which, when expressed together ([[:File:FigIS200 605 44.png|Fig. IS200.44]] '''ii'''), are capable of forming the re(ω)RNA complex. The second plasmid consisted of a library with five randomized base pairs (N5) located 5’ to a target sequence recognized by the guide sequence, an assay similar to that used by Karvelis et al., <ref name=":37" /> ([[:File:FigIS200 605 40.png|Fig. IS200.40]]). Both plasmids were introduced concomitantly into a host cell. Those that carry an N5 sequence susceptible to the corresponding re(ω)RNA complex will be depleted and underrepresented in the plasmid population (reduced level of [[wikipedia:Kanamycin_A|Km<sup>R</sup>]] colonies in the population). | |||
[[File:FigIS200 605 44.png|center|thumb|720x720px|'''Fig. IS200.44. [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] Group Organization. i) General Organisation of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']-like IS.''' The left (red) and right (blue) ends are shown with their DNA secondary structures and the C<sub>L</sub>, G<sub>L</sub>, G<sub>R</sub> and C<sub>R</sub> boxes, TnpA and TnpB genes and the reRNA scaffold <ref name=":52" /><ref name=":38" />. '''ii) Experimental System to determine TAM activities. A two plasmid system is used.''' One plasmid is designed to supply both TnpB (purple) and reRNA (blue) expressed independently and carries a [[wikipedia:Chloramphenicol|chloramphenicol]] resistance gene (red). The target plasmid includes a 5 bp '''TAM''' sequence (NNNNN) abutting a guide sequence and carries a [[wikipedia:Kanamycin_A|kanamycin]] resistance gene <ref name=":14" />. ]] | |||
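The read-out of such a depletion assay can be sketched in Python as follows. The scoring scheme (a log<sub>2</sub> ratio of N5 frequencies with a pseudocount) is a common generic approach and is an assumption here, not necessarily the analysis used in the cited study; the function name and the toy read counts are hypothetical.

```python
from collections import Counter
from math import log2


def depletion_scores(control_reads: list[str], selected_reads: list[str], pseudocount: float = 1.0) -> dict[str, float]:
    """log2(selected frequency / control frequency) for each N5 sequence.

    Negative scores indicate N5 sequences that are under-represented after
    exposure to the TnpB-re(omega)RNA complex, i.e. candidate TAMs.
    """
    control = Counter(control_reads)
    selected = Counter(selected_reads)
    all_n5 = set(control) | set(selected)
    control_total = sum(control.values()) + pseudocount * len(all_n5)
    selected_total = sum(selected.values()) + pseudocount * len(all_n5)
    scores = {}
    for n5 in all_n5:
        control_freq = (control[n5] + pseudocount) / control_total
        selected_freq = (selected[n5] + pseudocount) / selected_total
        scores[n5] = log2(selected_freq / control_freq)
    return scores


# Toy example: TTGAT-bearing plasmids are lost, AAAAA-bearing plasmids survive.
toy_control = ["TTGAT"] * 10 + ["AAAAA"] * 10
toy_selected = ["TTGAT"] * 1 + ["AAAAA"] * 19
toy_scores = depletion_scores(toy_control, toy_selected)
most_depleted = min(toy_scores, key=toy_scores.get)
```

In the toy example the TTGAT sequence receives a negative score and is reported as the most depleted N5 sequence.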
The corresponding TAM sequences ([[:File:FigIS200 605 43.png|Fig. IS200.43]]) showed a remarkable identity to the C<sub>L</sub> sequences with very few variations. For these variants, the authors propose alternative base pairings which would need to be confirmed experimentally. | |||
Further analysis based on a tree generated from TnpB alignments such as those shown in [[:File:FigIS200 605 35i.png|Fig. IS200.35]], revealed, perhaps not unexpectedly, that '''TAM''' sequences were more similar between closely related IS. | |||
The relative activities of the '''TAM''' sequences in each case were then assessed in ''[[wikipedia:Escherichia_coli|E. coli]]'' using a similar plasmid system to that of [[:File:FigIS200 605 44.png|Fig. IS200.44]], but in which the N5 sequence was substituted for the proposed '''TAM'''. | |||
A high proportion (25/64) of these '''TAM'''/TnpB derivatives were found to be active. | |||
=====Sequence requirements of the re(ω)RNA===== | |||
To explore re(ω)RNA sequence requirements in greater detail, three IS systems, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISTfu1 IS''Tfu1''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDge10 IS''Dge10''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISAba30 IS''Aba30''], in addition to [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], were analyzed for their guide RNA functions <ref name=":14" />. | |||
The relatively small TnpB protein had been demonstrated to function in gene targeting in human cells <ref name=":37" />. Since the interest of Xiang et al. <ref name=":14" /> was to optimize TnpB as a targeting tool in human cells, the assay was designed for use by transfection into the [[wikipedia:HEK_293_cells|HEK293T human cell line]]. It used a system in which an out-of-frame downstream [[wikipedia:Green_fluorescent_protein|GFP gene]] was brought back into frame only when the TnpB nuclease could act on its target and the DNA break was repaired by non-homologous end joining ([[:File:FigIS200 605 45.png|Fig. IS200.45]] '''i'''). | |||
[[File:FigIS200 605 45.png|center|thumb|680x680px|'''Fig. IS200.45. Assay for re(ω)RNA function in Eukaryotic Cells. i) Cartoon showing the [[wikipedia:Green_fluorescent_protein|GFP]] reporter assay'''. A promoter (marked as a blue arrow) drives a constitutively expressed red fluorescent protein (mRFP) gene (for monitoring transfection efficiency), followed by an intervening target sequence and an out-of-frame eGFP gene (pale green). eGFP is expressed following a double-strand break in the target and repair by non-homologous end joining (NHEJ) when repair introduces indels that bring eGFP into frame. '''ii) eGFP activation efficiencies''' of four TnpB systems quantified by flow cytometry.]] | |||
When this reporter plasmid and a TnpB/re(ω)RNA plasmid were co-transfected, all four TnpB systems were shown to function, yielding 10% to 34% GFP-positive transfected cells ([[:File:FigIS200 605 45.png|Fig. IS200.45]] '''ii'''). They each generated short deletions of various lengths, some of which place the [[wikipedia:Green_fluorescent_protein|GFP gene]] in frame, yielding [[wikipedia:Green_fluorescent_protein|GFP]]+ cells in the population. The overall organization of the IS including '''TAM''', scaffold and guide sequence is shown in [[:File:FigIS200 605 46.png|Fig. IS200.46]] '''i.''' | |||
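The reason only some deletions produce GFP+ cells is simple reading-frame arithmetic, sketched below. The frame offset of the eGFP gene in the actual reporter is not stated here, so `frame_offset` is a hypothetical parameter, as are the function names.

```python
def restores_frame(frame_offset, indel_length):
    """Return True if an indel brings the downstream gene back into frame.

    frame_offset: number of nucleotides (1 or 2) by which the reporter gene
                  is shifted out of frame (assumed value, not from the source).
    indel_length: net length change at the break; negative for deletions.
    """
    return (frame_offset + indel_length) % 3 == 0


def in_frame_fraction(frame_offset, indel_lengths):
    """Fraction of observed repair events that restore the reading frame."""
    if not indel_lengths:
        return 0.0
    hits = sum(restores_frame(frame_offset, n) for n in indel_lengths)
    return hits / len(indel_lengths)
```

With an assumed offset of +1, deletions of 1 or 4 nt restore the frame while deletions of 2 or 3 nt do not, which is why only a subset of repair events yields fluorescent cells.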
<br /> | |||
[[File:FigIS200 605 46.png|center|thumb|680x680px|'''Fig. IS200.46. Details of reRNA function.''' The RE is shown in blue with its DNA cleavage and guide sites (C<sub>R</sub> and G<sub>R</sub>) with the sequence of C<sub>R</sub> and neighboring nucleotides indicated above. The arrows above indicate nucleotides which when mutated severely reduce activity. The RNA scaffolds and guide sequence regions are shown by horizontal arrows below. The sealed target site with its TAM sequence (the C<sub>L</sub> sequence from the left IS end) and the RNA guide sequence is shown at the bottom. Either single or double transversions in the bracketed sequence severely affect reaction efficiency.]] | |||
Severely decreased re(ω)RNA guide activity was observed with mutation of either C<sub>R</sub> or the four proximal nucleotides ([[:File:FigIS200 605 46.png|Fig. IS200.46]]) and, in the target site, with single or double transversions in the '''TAM''' proximal region. | |||
It should be noted that these assays were carried out following transfection of human [[wikipedia:HEK_293_cells|HEK293T cells]] and it is possible that the results may vary in the appropriate bacterial hosts. | |||
=====Exploring and defining TAM sequences in a library extracted from NCBI===== | |||
In a second study to investigate whether the re(ω)RNAs were present across the widely diverse TnpB systems <ref name=":30" />, Nety et al. <ref name=":26" /> constructed a TnpB sequence library, extracted from NCBI data, which included those associated with Y1 (HUH; IS''200''-IS''605'' family) transposases, serine ([[IS Families/IS607 family|IS''607'' family]]) transposases or “non-mobile” orthologues. This generated 5 clades (<ref name=":26" />; background in Fig. IS200.47). The clades follow the configuration of the [[wikipedia:RuvABC|RuvC]] catalytic motif (Fig. IS200.47): [[wikipedia:RuvABC|RuvC]]-III DRDXN (typical), [[wikipedia:RuvABC|RuvC]]-III NADXN (derived) or “catalytic rearrangements” in the [[wikipedia:RuvABC|RuvC]]-II (RII-r3 and 5) or [[wikipedia:RuvABC|RuvC]]-III (RIII-r4) domain <ref name=":40">{{#pmid:37983496}}</ref> ([[:File:FigIS200 605 47.png|Fig. IS200.47]]). | |||
The authors chose 59 TnpB orthologs covering the diversity (background to [[:File:FigIS200 605 47.png|Fig. IS200.47]]; <ref name=":26" />) and varying in length between 353 and 550 aa. The TnpB-re(ω)RNA-encoding loci including a suitable promoter were expressed in an ''in vitro'' transcription/translation (IVTT) system and the 5’ ends were determined by RACE from the 3’ re(ω)RNA end lacking the guide sequence. | |||
This identified 30/59 orthologs with a defined 5’ end and lengths of between 79 and 466 nt. TnpB<sub>Ama</sub> generated a 106 nt scaffold, consistent with the processing found in the experiments of [[:File:FigIS200 605 41.png|Fig. IS200.41]]. Some orthologs, such as TnpB<sub>Dra2</sub>, showed multiple 5’ ends, consistent with previous observations suggesting either incomplete or promiscuous RNase activity <ref name=":37" /><ref name=":28" />. | |||
A screen for DNA nuclease activities of the IVTT-produced re(ω)RNAs revealed that 27/59 were active. They also defined the '''TAM''' sequences revealing only limited diversity of these sequences as was also found for the ISfinder collection <ref name=":14" />. The assay was validated by confirming both the TnpB<sub>Ama</sub> (TCAC) and TnpB<sub>Dra2</sub> (TTGAT) '''TAM''' sequences. | |||
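As an illustration of how a validated '''TAM''' constrains potential targets, the sketch below scans one DNA strand for a TAM and reports the sequence immediately 3’ of it. It assumes the layout described for the depletion assay (TAM located 5’ to the target) and a 20 nt target matching the guide length quoted earlier; the function name and defaults are hypothetical, and only the strand supplied is searched.

```python
def find_tam_targets(dna, tam="TTGAT", target_len=20):
    """Return (TAM position, target sequence) pairs on the given strand.

    dna:        DNA sequence as a string (top strand only is scanned).
    tam:        TAM motif; TTGAT is the TnpB(Dra2) TAM quoted in the text.
    target_len: length of the TAM-adjacent target (assumed 20 nt).
    """
    dna = dna.upper()
    hits = []
    start = dna.find(tam)
    while start != -1:
        target_start = start + len(tam)
        target = dna[target_start:target_start + target_len]
        # Skip TAMs too close to the end to carry a full-length target.
        if len(target) == target_len:
            hits.append((start, target))
        start = dna.find(tam, start + 1)
    return hits
```

Passing `tam="TCAC"` would apply the same scan to the TnpB<sub>Ama</sub> TAM; a fuller treatment would also scan the reverse complement.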
[[File:FigIS200 605 47.png|center|thumb|680x680px|'''Fig. IS200.47 reRNA and TnpB Diversity.''' The figure shows the diversity of [[wikipedia:RuvABC|RuvC]] catalytic sites observed in the [[wikipedia:RuvABC|RuvC]] III region. [[wikipedia:RuvABC|RuvC]] segment '''I''', '''II''' and '''III''' [Green] with the catalytic residues indicated above; Zinc Finger or [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH nuclease] [red]; Arginine rich helix [blue]; Wedge domain [yellow]; Helical bundle [grey]. Note, compared to Cas9-like IscB, TnpB and Cas12 have an N-terminal extension [dark grey] before the RuvC I motif. The amino acids in the variant catalytic sites are indicated below <ref name=":40" />.]] | |||
====re(ω)RNA and ''tnpB'' Co-evolution==== | |||
It was noted that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] re(ω)RNA includes the 3’ segment of ''tnpB'' (residues 335 to 408 and −231G to −10U) which suggests that TnpB and the guide sequence system might have co-evolved <ref name=":33" />. However, although re(ω)RNA expression and processing may require co-expression with the TnpB protein, Nakagawa et al. <ref name=":28" /> suggest that co-evolution might be less constrained than previously predicted because, they argue, functionally essential gene regions and those of re(ω)RNA do not overlap significantly: the structures imply that the TnpB C-terminus (residues 376 to 408 overlapping with −109G to −10U) is not involved in DNA cleavage, and the 5′ re(ω)RNA terminus (−231G to −117T, overlapping with residues 336 to 373) is not required for target DNA cleavage. | |||
The question of co-evolution is complex since it must also take into account the constraints imposed by the mechanism(s) involved in the DNA transposition process: the TAM sequence which abuts the left IS end ('''LE''') also serves as the sequence, C<sub>L</sub>, required for cleavage and insertion at the left end, and C<sub>L</sub> interacts in a complex way with a partially complementary sequence, G<sub>L</sub>, located at the foot of a stem loop (DNA) structure recognised by the TnpA transposase (see He et al. <ref name=":38" />). Moreover, changing the G<sub>L</sub> sequence leads to a change in the specificity of insertion – i.e. changes the C<sub>L</sub> sequence <ref name=":52" />. More importantly, the C<sub>R</sub> sequence, which is an integral part of the IS, plays a central role in both the RNA guide and TnpA-mediated DNA cleavage reactions: it interacts both with G<sub>R</sub>, a sequence at the foot of a secondary structure at the right end ('''RE'''), and within the re(ω)RNA, where it forms part of a pseudoknot ([[:File:FigIS200 605 42.png|Fig. IS200.42]]) <ref name=":28" /> <ref name=":33" />. | |||
====IscB, like TnpB, is also an RNA-guided Endonuclease==== | |||
Altae-Tran et al. <ref name=":30" /> also examined a very large number of rather dispersed IscB systems for their endonuclease properties, their association with RNA and their capacity as RNA-guided proteins. Initial studies concerned a [[wikipedia:CRISPR|CRISPR]]-associated IscB (marked in the article as Delaware Bay aquatic sample) which, when purified from a heterologous ''[[wikipedia:Escherichia_coli|Escherichia coli]]'' host, was associated with an RNA localised directly upstream of ''iscB''. This generated a signal in a PAM ('''TAM''') “discovery” assay and was able to generate cleavage products ''in vitro'' with the appropriate target. | |||
An alignment of over 500 (non-redundant) ''iscB'' genes revealed an upstream region of conserved sequence of about 300 bp which terminated at what the authors state is an IS''200''/IS''605''-like end. One specific example examined, present in the host ''[http://microbes.sites.haverford.edu/LaboratoryWiki/Ktedonobacter_racemifer K. racemifer]'' genome in nearly 50 copies, was associated in most cases with non-coding RNA species with significant secondary structure potential, which they called ΩRNA. An example of ''[http://microbes.sites.haverford.edu/LaboratoryWiki/Ktedonobacter_racemifer K. racemifer]'' IscB was investigated ''in vitro'' using a plasmid substrate and shown to use a target-adjacent pentanucleotide TAM, ATAAA; changing the complementary RNA extension (guide) showed that cleavage was reprogrammable. | |||
To further characterize IscB, the TAM sequences of 57 examples from a collection of 86 genes from a phylogenetically diverse set of bacteria could be determined; of those 57, 5 were reconstituted with their ωRNA and found to be active in target cleavage; and one, AwaIscB (or IscB<sub>Awa</sub>) from ''[[wikipedia:Allochromatium|Allochromatium warmingii]]'', was chosen for further study. | |||
Biochemically, IscB<sub>Awa</sub> could cleave double-stranded DNA in a magnesium-dependent, reprogrammable way with a temperature optimum of 35–40 °C and with RNA guide lengths of between 15 and 45 nts. A mutation of the [[wikipedia:RuvABC|RuvC]] E residue eliminated cleavage of the non-target strand while mutation of an '''H''' residue in the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ '''HNH''' motif] eliminated cleavage of the target strand (as expected for a Cas9-related enzyme; [[:File:FigIS200 605 32.png|Fig. IS200.32]]). Mutation of both residues eliminated cleavage altogether. Also, like Cas9, cleavage was TAM (PAM) proximal (3 nts from TAM for the target and 8 or 12 nts for the non-target strands); the RNP protected DNA from ExoIII digestion 19 nts upstream of the TAM on the target and 6 downstream on the non-target ([[:File:FigIS200 605 32.png|Fig. IS200.32]]); and truncation of the newly identified N-terminal PLMP domain (named after a cluster of conserved amino acids; [[:File:FigIS200 605 48.png|Fig. IS200.48]] '''top''') eliminated activity. | |||
[[File:FigIS200 605 48.png|center|thumb|680x680px|'''Fig. IS200.48. Updated IscB and IsrB Domain Organization.''' Schematic of IscB showing the relative positions of the different functional motifs and domains. '''Top''': IscB (extracted from Altae-Tran et al. <ref name=":30" /> and as modified by Kato et al. <ref name=":41">{{#pmid:36344504}}</ref>) and, below, a derivative deleted for the HNH nuclease domain. '''Bottom:''' IsrB, a related protein naturally lacking the nuclease domain <ref name=":42">{{#pmid:36224386}}</ref>. [[wikipedia:RuvABC|RuvC]] segment I, II and III [Green] with the catalytic residues indicated above; Zinc Finger or [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH nuclease] [red]; Arginine rich helix [blue]; Wedge domain [yellow]; Helical bundle [grey]. Note, compared to Cas9-like IscB, TnpB and Cas12 have an N-terminal extension [dark grey] before the [[wikipedia:RuvABC|RuvC]] I motif.]] | |||
=====The Structure of IscB–ωRNA ribonucleoprotein complex and the ternary complex containing target DNA.===== | |||
IscB associates with a 200–400 nt ωRNA, significantly longer than the 100 nt guide RNA of its probable offspring, Cas9 <ref name=":30" />. IscB proteins are much smaller than Cas9 and lack the α-helical nucleic-acid recognition domain but share the [[wikipedia:RuvABC|RuvC]] and [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH endonuclease domains] ([[:File:FigIS200 605 48.png|Fig. IS200.48]]). | |||
Kato et al. <ref name=":41" /> used an IscB protein derived from the human gut metagenome (IscB<sub>Ogeu</sub>) as a model while Hirano et al. <ref name=":42" /> used an IsrB (IsrB<sub>Dt</sub>) from ''[[wikipedia:Desulfovirgula|Desulfovirgula thermocuniculi]]''. IsrB proteins are related to IscB but lack the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH nuclease domain] ([[:File:FigIS200 605 48.png|Fig. IS200.48]]). Note that this is a more detailed description of the domain structure than shown in [[:File:FigIS200 605 37.png|Fig. IS200.37]]. A detailed study by Meer et al. <ref name=":19" /> found that IscB and IsrB formed clearly separate groups on a phylogenetic tree. | |||
For the structural cryo-EM studies, a catalytically inactivated IscB<sub>Ogeu</sub> E193A ([[wikipedia:RuvABC|RuvC]])/H247A ([https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH]) derivative was used. In the IscB<sub>Ogeu</sub> structure, the catalytic D61 ([[wikipedia:RuvABC|RuvC]] I), E193 ([[wikipedia:RuvABC|RuvC]] II), H340, and D343 ([[wikipedia:RuvABC|RuvC]] III) and a divalent Mg<sup>2+</sup> ion ([[:File:FigIS200 605 48.png|Fig. IS200.48]]) are configured similarly to those in Cas9, although the structure lacked the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH domain]. | |||
[[File:FigIS200 605 49.png|center|thumb|680x680px|'''Fig. IS200.49. IscB sequence ωRNA organization. Top: ωRNA sequence''' showing the various color-coded repeated elements; arrows show orientation of the structural elements. '''Bottom Left: secondary structure features.''' Color-coded as in the linear sequence ('''Top'''). Disordered nucleotides are shown as unfilled circles. Stem 3 and stem 4 contribute to the formation of a pseudoknot, Ψ. The outer circles and half circles show contacts between the RNA and the IscB ΔHNH used in this study. The colors indicate the domains of the protein involved. '''Right top: interaction between the RNA guide and DNA target.''' The IscB interactions are also indicated: wedge [yellow]; Helical bundle, B [blue]; and Rec [grey]. TS and NTS show the target strand and non-target strand respectively. '''Right bottom: Simplified cartoon''' showing the relative arrangement of RNA [red] and DNA [black] and the various functional IscB domains. Redrawn from Kato et al. <ref name=":41" />.]] | |||
The ωRNA structure is complex ([[:File:FigIS200 605 49.png|Fig. IS200.49]]) comprising a 27 nt guide sequence and a 206 nt scaffold with 5 stem loops, 4 stems and a linker. The guide adaptor, stem-loop 1 (yellow), connects the guide segment (dark red) and stem 1 (green; which the authors call the “nexus” stem widely conserved in the tracrRNA of Cas9s; <ref>{{#pmid:25373540}}</ref>). Stem 1, stem 2 (grey; the central stem), and stem-loop 3 (brown) form a three-way junction. Like TnpB ωRNA, IscB<sub>Ogeu</sub> ωRNA also includes a pseudoknot (??). Stem loop 2 (blue) stacks with the nexus pseudoknot hairpin (pink) which in turn interacts with the pseudoknot stem 4 (red). | |||
The cognate ωRNA and IscB<sub>Ogeu</sub> E193/H247 were expressed in ''[[wikipedia:Escherichia_coli|E.coli]],'' the IscB-ωRNA complex purified and the ternary complex assembled by mixing with target DNA. However, to improve resolution, it was found necessary to delete the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH domain] (residues 199 – 295) ([[:File:FigIS200 605 48.png|Fig. IS200.48]]), which is flexible in Cas9 <ref>{{#pmid:29127285}}</ref><ref>{{#pmid:26524520}}</ref>. The complex, composed of an IscB monomer and a single ωRNA was formed using the deletion derivative IscBω, an ωRNA of 233 nt including a 27 nt guide sequence and a partially double strand DNA target (Fig. IS200.49 '''right'''). | |||
In the ternary complex, the IscB ωRNA guide sequence forms a 14 bp heteroduplex with the target DNA ([[:File:FigIS200 605 49.png|Fig. IS200.49]] '''middle right''') and is recognized by IscB in a sequence-specific fashion using the short Rec region ([[:File:FigIS200 605 48.png|Fig. IS200.48]]) shown in grey in [[:File:FigIS200 605 49.png|Fig. IS200.49]] '''middle right'''. A simplified cartoon is shown in [[:File:FigIS200 605 49.png|Fig. IS200.49]] '''bottom right'''. This is somewhat different from Cas9, which forms a 20 bp heteroduplex with a much larger Rec domain. '''TAM''' is recognized by the CT domain and mismatches at positions 15 and 16 are tolerated for cleavage. The differences between a full complex with the [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH domain] and one with the ΔHNH IscB derivative are shown in [[:File:FigIS200 605 50.png|Fig. IS200.50]]. | |||
[[File:FigIS200 605 50.png|center|thumb|680x680px|'''Fig. IS200.50. Difference between IscB and IscBΔHNH''' '''Cleavage'''. '''Top:''' Cas9 cleavage configuration as shown in Fig. IS200.31. Cas9 cleavage was '''TAM''' (PAM) proximal (3 nts from TAM for the target and 8 and 12 nts for the non-target strands) and the RNP protected DNA from ExoIII digestion 19 nts upstream of the TAM on the target and 6 downstream on the non-target strand. '''Bottom:''' Configuration of IscB with its guide RNA (red), the neighboring stem-loop 1 (yellow) and complete target DNA (black) showing the TAM sequence, and of the IscBΔHNH derivative with the partial substrate used. Redrawn from Kato et al. <ref name=":41" />.]] | |||
<br /> | |||
=====The Structure of IsrB–ωRNA ribonucleoprotein complex and the ternary complex containing target DNA===== | |||
IsrB is short, about 350 amino acids, and lacks an [https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH domain] ([[:File:FigIS200 605 51.png|Fig. IS200.51]]) (and is therefore equivalent to the ΔHNH IscB derivative). It is associated with a long RNA guide of ~300 nt which guides IsrB to nick the non-target strand (NTS) of double-stranded (ds) DNA (see [[:File:FigIS200 605 51.png|Fig. IS200.51]] '''top''') containing a 5′-NTGA-3′ '''TAM''' <ref name=":30" />. | |||
The ''[[wikipedia:Desulfovirgula|Desulfovirgula thermocuniculi]]'' IsrB (IsrB<sub>Dt</sub>) ωRNA (284 nt) is longer than that of IscB<sub>Ogeu</sub>, and includes a 20 nt guide segment which forms a heteroduplex with the target DNA <ref name=":42" />. Like that of IscB<sub>Ogeu</sub>, the IsrB<sub>Dt</sub> ωRNA is structurally complex, including eight stem loops and four stems ([[:File:FigIS200 605 51.png|Fig. IS200.51]] '''middle'''). The structure includes 2 pseudoknots: one defined by two of the stem-loops (2 and 5, red boxes; [[:File:FigIS200 605 51.png|Fig. IS200.51]] '''middle''') and the other the “nexus” pseudoknot (blue boxes). | |||
[[File:FigIS200 605 51.png|center|thumb|680x680px|'''Fig. IS200.51.''' IsrB<sub>Dt</sub> '''sequence ωRNA organization. Top: IsrB domain organization. Middle: ωRNA sequence''' showing the various color-coded repeated elements with arrows indicating orientation of the structural elements. '''Bottom Left:''' '''interaction between the RNA guide and DNA target.''' TAM sequence is shown in red. '''Bottom Right:''' Simplified cartoon showing the relative arrangement of RNA [red] and DNA [black] and the various functional IscB domains (as shown in '''Top'''). Redrawn from Kato et al., <ref name=":41" />.]] | |||
IsrB<sub>Dt</sub> recognizes the TTGA '''TAM''' in the NTS by both hydrogen bonds and [[wikipedia:Van_der_Waals_force|van der Waals interactions]] and cleavage occurred 8–11 nt upstream of '''TAM''', further than the 2–5 nt of Cas9. '''TAM''' recognition was more specific at 60 °C for this thermophilic enzyme than at lower temperatures where NTGA was recognized <ref name=":30" />. | |||
=====IsrB diversity of structure and ωRNA architecture===== | |||
As in numerous publications in this field, Hirano et al. <ref name=":42" /> explored IsrB diversity and ωRNA ternary structure. They identified five orthologues and their cognate ωRNAs from: ''[[wikipedia:Crocosphaera_watsonii|Crocosphaera watsonii]]'' (IsrB<sub>Cw</sub>); ''[[wikipedia:Dolichospermum|Dolichospermum]]'' sp. (IsrB<sub>Ds</sub>); ''[[wikipedia:Calditerricola_satsumensis|Calditerricola satsumensis]]'' (IsrB<sub>Cs</sub>); [[wikipedia:Burkholderiales|Burkholderiales bacterium]] (IsrB<sub>Bb</sub>); and a viral metagenome assembly (IsrB<sub>K2</sub>). A standard '''TAM''' identification assay (such as that shown in [[:File:FigIS200 605 40.png|Fig. IS200.40]]) indicated that IsrB<sub>Bb</sub> recognizes NTGG while IsrB<sub>Cw</sub>, IsrB<sub>Cs</sub>, IsrB<sub>Ds</sub> and IsrB<sub>K2</sub> recognize NTG. In all cases, ''in vitro'' reconstituted IsrB-ωRNA RNPs promoted nicking of dsDNA substrates. | |||
The ωRNAs of the five orthologues and IsrB<sub>Dt</sub> retain the core domain composition: four stems (S1–4) and five stem loops (SL1/2/4/5/7) ([[:File:FigIS200 605 51.png|Fig. IS200.51]] '''middle'''). Inspection of the ωRNAs showed some significant architectural differences, however. For example, in one group, including IsrB<sub>Cs</sub>, IsrB<sub>K2</sub> and IsrB<sub>Bb</sub>, SL2 and SL4 form a pseudoknot, as do SL5 and the intermediate region between S2 and SL7, while in a second group, including IsrB<sub>Dt</sub>, IsrB<sub>Cw</sub> and IsrB<sub>Ds</sub>, SL2 and SL5 form a pseudoknot, as do SL4 and the intermediate region between S2 and SL7. | |||
<br /> | |||
== The IS''1341'' Conundrum: how do derivatives without their transposase transpose? == | |||
It had been noted that there are a large number of IS''200''-IS''605'' relatives which carry only the ''tnpB'' gene flanked by typical IS''200''-IS''605'' family secondary structures <ref name=":38" /> in a number of bacteria including the thermophilic ''[[wikipedia:Geobacillus|Geobacillus]]'' and the cyanobacterium ''[[wikipedia:Anabaena|Anabaena]]''. These are grouped into the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] subfamily, with nearly 250 entries in ISfinder (December 2023), and were not included in the study of TnpB/'''TAM''' diversity of Xiang et al. <ref name=":14" />. Since the multiple [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS891 IS''891''] copies such as those found in ''[[wikipedia:Anabaena|Anabaena]]'' imply that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members are mobile, the question arises as to how their mobility might be accomplished. One possibility is that this is assured by a ''tnpA'' copy in the cell; another is that ''tnpB'' itself is involved. | |||
====IS''1341'' Group Diversity: Mining the [https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ NCBI NR database]==== | |||
The entries in ISfinder do not necessarily reflect the abundance of the different IS''200''-IS''605'' derivatives in the prokaryotic kingdom and Meer et al., <ref name=":19" /> mined the [https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ NCBI NR] database for ''tnpB'' and ''iscB'' homologues and extracted their flanking genomic regions to provide some perspective of the proportion of ''tnpB'' genes associated with [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members. | |||
They found that only 25% of ''tnpB'' genes were associated with a ''tnpA'' copy. Note that nearly half of the IS''200''/IS''605'' members in ISfinder do not carry the ''tnpA'' gene. | |||
Moreover, in the same analysis, ''iscB'' genes were much less abundant than ''tnpB'' and only 1.5% of these were associated with ''tnpA''. Additionally, 8% of the ''tnpB'' collection were associated with a [[wikipedia:Site-specific_recombination|serine recombinase]] and are therefore probably members of the [[IS Families/IS607 family|IS''607'' family]] while none of the ''iscB'' genes were found associated with this type of enzyme. | |||
Both IscB and TnpB use transposon-encoded RNAs. For the IscB copies, a conserved intergenic region upstream of ''iscB'' that was bounded by the transposon '''RE''' was observed, which bore marked similarity to a non-coding RNA termed HEARO ([https://www.ebi.ac.uk/interpro/entry/InterPro/IPR003615/ HNH Endonuclease]-Associated RNA and ORF; <ref>{{#pmid:19956260}}</ref>). Those encoded downstream of ''tnpB'' have of course been known for some time, initially in [[wikipedia:Haloarchaea|Halobacteria]] (ncRNAs, sotRNAs and reRNAs; <ref name=":21" /><ref name=":43" />). | |||
====Conserved secondary structure motifs==== | |||
Covariation in the collection of ''tnpB'' and ''iscB'' re(ω)RNAs was analyzed separately to highlight the conserved secondary structure motifs ([[:File:FigIS200 605 52.png|Fig. IS200.52]]) which straddle the IS ends and flanking DNA. This means that the (external) guide sequences (NNN… in [[:File:FigIS200 605 52.png|Fig. IS200.52]] '''i''' '''and ii''') change with each transposition event into another target. | |||
[[File:FigIS200 605 52.png|center|thumb|680x680px|'''Fig. IS200.52. Co-variation in IS''1341'' Group re(ω)RNAs. i)''' Covariation/conservation model of TnpB group re(ω)RNA. '''ii)''' Covariation/conservation model of IscB Group re(ω)RNA. Note that these occur on opposite DNA strands as indicated by the cartoon of the TE. The circles indicate nucleotide conservation at a given site. Colors represent the extent of conservation. The boxes indicate co-conservation in the sequence library <ref name=":19" />. ]] | |||
<br /> | |||
====IS''1341'' group orientation suggests ''iscB'' re('''ω''')RNA but not ''tnpB'' re('''ω''')RNA is expressed in transcriptionally active environments.==== | |||
A strong correlation in orientation was noted between upstream genes and ''iscB'' copies but not for ''tnpB'' copies. This is an important observation since it suggests that the re(ω)RNA of ''iscB'' must be expressed from an outside promoter towards ''iscB'' ([[:File:FigIS200 605 52.png|Fig. IS200.52]] '''ii'''), thus favoring production when inserted into transcriptionally active regions, while that of ''tnpB'', as shown previously ([[IS Families/IS200 IS605 family#Generating re.28.CF.89.29RNA: Processing|Generating re(ω)RNA: Processing]]), is expressed by processing from the ''tnpB'' transcript ([[:File:FigIS200 605 52.png|Fig. IS200.52]] '''i'''). | |||
====IS''1341'' Group Function==== | |||
More detailed studies focused on ''tnpB''- and ''iscB''-carrying elements from ''[[wikipedia:Geobacillus_stearothermophilus|G. stearothermophilus]]'' representing 1% of the genome <ref name=":19" />. These could be divided into 5 families ([https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst2 IS''Gst2'']-''6'') based on ''tnpB'', '''RE''' and '''LE''' sequences. They have quite similar '''RE''' and '''LE''' boundaries and all exhibited clade-specific C<sub>L</sub> (=TAM) and C<sub>R</sub> (=TEM) (e.g. [[:File:FigIS200 605 44.png|Fig. IS200.44]] '''i'''; [[:File:FigIS200 605 53.png|Fig. IS200.53]]) with clade-specific co-varying mutations between both the TAM and TEM sequences and associated DNA guide sequences (e.g. [[:File:FigIS200 605 44.png|Fig. IS200.44]] '''i'''). Evidence from RNA-seq also showed that re(ω)RNA was expressed from multiple copies of these transposase-less IS (i.e. at different genomic positions and with different guide sequences). Derivatives lacking any protein-coding gene, '''PATES''' ('''P'''alindrome-'''A'''ssociated '''T'''ransposable '''E'''lements) <ref>{{#pmid:21701686}}</ref>, were also identified. | |||
====Does a Resident TnpA copy Drive IS''1341'' group Transposition?==== | |||
Importantly, the authors also identified a ''tnpA'' gene in the ''[[wikipedia:Geobacillus_stearothermophilus|G. stearothermophilus]]'' genome within [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst2 IS''Gst2''] which might serve to drive transposition of these IS. To assess this, a plasmid-based excision system was used which included a cloned copy of ''tnpA'' (TnpA<sub>Gst</sub>) to catalyze excision of a mini [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst2 IS''Gst2''] ([[:File:FigIS200 605 53.png|Fig. IS200.53]]). | |||
Excision ''in vivo'', as monitored by [[wikipedia:Polymerase_chain_reaction|PCR]], was robust not only with [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst2 IS''Gst2''] but also with [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst3 IS''Gst3''], [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst4 IS''Gst4''] and [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst5 IS''Gst5''] (while that of [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst6 IS''Gst6''] was weak) and all generated the expected donor junction sequence after excision. The reaction was dependent on an active TnpA catalytic site. However, a substrate derived from [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] was inactive, presumably because it lacks an upstream domain ([[:File:FigIS200 605 30.png|Fig. IS200.30]]). Interestingly, excision occurred when the mini-IS was present on the leading or lagging strand template but required both '''LE''' and '''RE'''. Mutation of the TAM (C<sub>L</sub> sequence) or the guide sequence (G<sub>L</sub>) reduced or eliminated activity but compensatory mutations which should restore the C<sub>L</sub>/G<sub>L</sub> interactions <ref name=":52" /> restored some excision activity ([[:File:FigIS200 605 53.png|Fig. IS200.53]]). Perhaps surprisingly, mutation of TEM (C<sub>R</sub>) did not eliminate excision since the system was able to select an alternative wildtype TEM sequence downstream to create an alternative IS end. It seems possible that, since '''LE''' is involved in both excision and targeted insertion, the requirement for correct interactions between G<sub>L</sub> and C<sub>L</sub> may be more stringent for activity. | |||
[[File:FigIS200 605 53.png|center|thumb|680x680px|'''Fig. IS200.53. ISGst Excision Assay. Top:''' Experimental setup. Left: plasmid expressing TnpA (purple) and carrying a mini IS (yellow). The left (red) and right (blue) ends are indicated. Right: plasmid following excision. '''Below left and right:''' illustrate the interactions between C<sub>L</sub> (TAM) and G<sub>L</sub> (red) and C<sub>R</sub> and G<sub>R</sub> (blue) respectively. The central table shows the relative excision activity of wildtype [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst3 IS''Gst3''] (line 1), a mutation in TAM (C<sub>L</sub>) (line 2), a mutation in G<sub>L</sub> (line 3), and a mutation in TAM (C<sub>L</sub>) combined with a compensatory mutation in G<sub>L</sub> (line 4).]] | |||
The authors also determined that, although not in single chromosome copy, the cloned ''tnpA'' could drive transposition at similar frequencies whether ''in cis'' or ''in trans'' thus reinforcing the idea that a single ''tnpA'' gene could drive transposition of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group members in the same cell as measured in a mating out assay ([[IS Families/IS200 IS605 family#TnpBGst and IscBGst proteins are active RNA-guided Nucleases.|TnpB<sub>Gst</sub> and IscB<sub>Gst</sub> proteins are active RNA-guided Nucleases]] below). | |||
However, it is puzzling that [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341'']-like elements of both the TnpB and IscB type have proliferated extensively compared to the “full length” IS. This raises the question of the way in which these genetic objects arose and their function in the cell. It would be interesting to undertake a reconstruction experiment using a single chromosomally located TnpA copy together with a single [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1341 IS''1341''] group IS to follow the kinetics of transposition over the long term to determine whether, as seems to be the case with ''[[wikipedia:Geobacillus_stearothermophilus|G. stearothermophilus]]'', an accumulation of these pared-down IS occurs. | |||
====TnpB<sub>Gst</sub> and IscB<sub>Gst</sub> proteins are active RNA-guided Nucleases==== | |||
The activity of the IS''Gst'' encoded proteins, TnpB<sub>Gst</sub> and IscB<sub>Gst</sub>, was also explored <ref name=":19" /> using a two-plasmid interference assay: the effector plasmid tagged with [[wikipedia:Spectinomycin|Spectinomycin resistance]] and which expresses ωRNA with a fused guide sequence together with a cloned copy of TnpB or IscB and a target plasmid tagged with [[wikipedia:Kanamycin_A|kanamycin]] resistance which carried the TAM sequence and DNA flanking '''RE''' ([[:File:FigIS200 605 54.png|Fig. IS200.54]] '''A'''). | |||
[[File:FigIS200 605 54.png|center|thumb|720x720px|'''Fig. IS200.54. Plasmid Interference and Replacement Assay. A) Plasmid Interference Assay.''' One plasmid is designed to supply both TnpB or IscB (purple) and ωRNA (green) with the guide sequence (black) expressed independently and carries a spectinomycin resistance gene (spc; pink). The target plasmid includes a 5 bp TAM sequence (NNNNN; red) abutting a guide sequence (blue) and carries a [[wikipedia:Kanamycin_A|kanamycin]] resistance gene (pink) <ref name=":14" /> <ref name=":19" />. '''B) Replacement Assay.''' A mini IS (yellow with left, red, and right, blue, ends) inserted into the chromosomal ''lac'' gene in the presence of '''Left:''' a plasmid expressing TnpA (purple) and carrying ampicillin resistance, ApR (red) or '''Right:''' the TnpA-expressing plasmid and a second plasmid expressing an appropriate ωRNA and TAM sequence (green) and TnpB (purple). Excision produces lac<sup>+</sup> colonies (Left) whereas TnpB-ωRNA replacement (Right) maintains the lac<sup>-</sup> phenotype.]] | |||
The target sequences were verified as the donor joint generated by IS excision. Successful targeting results in plasmid loss (loss of the [[wikipedia:Kanamycin_A|kanamycin resistance]]-carrying plasmid due to double strand breakage) and loss of cell viability under selective conditions. IscB<sub>Gst</sub> (IscB, [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst6 IS''Gst6'']) and four distinct TnpB<sub>Gst</sub> (TnpB1, [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst2 IS''Gst2'']; TnpB2, [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst3 IS''Gst3'']; TnpB3, [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst4 IS''Gst4'']; and TnpB4, [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst5 IS''Gst5'']) homologues were highly active for RNA-guided DNA cleavage of their native donor joints. | |||
TnpB of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] was highly active in this type of assay but that of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] was inactive, presumably because TnpB<sub>IS608</sub> lacks the N-terminal [[wikipedia:Helix-turn-helix|HTH domain]] ([[:File:FigIS200 605 30.png|Fig. IS200.30]]). | |||
Additionally, they cloned the native [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst3 IS''Gst3''] (TnpB2) and demonstrated that its TnpB2–ωRNA robustly cleaved not only the target plasmid but also a plasmid devoid of the donor joint. When a ''tnpA'' gene was inserted upstream of ''tnpB'' in [https://isfinder.biotoul.fr/scripts/ficheIS.php?name=ISGst3 IS''Gst3''], it proved competent for transposition in a mating out assay at levels “'''in cis'''” comparable to those “'''in trans'''”. | |||
Binding specificity was then examined using [[wikipedia:ChIP_sequencing|ChIP–seq]] with catalytically dead IscB and TnpB programmed with ''[[wikipedia:Lac_operon|lacZ]]''-specific ωRNAs (i.e. ωRNAs in which the guide sequence was complementary to a short sequence in the ''[[wikipedia:Lac_operon|lac]]'' gene) to map the chromosomal binding sites of the nuclease-dead proteins. These results revealed not only the “on-target” (''[[wikipedia:Lac_operon|lac]]'') site but also numerous off-target sites, indicating that TnpB and IscB, the evolutionary “parents” of Cas12 and Cas9 respectively, show less dependence on RNA–DNA complementarity for stable DNA binding than their Cas12 and Cas9 descendants. The results also suggested that they might show a higher reliance on a more extensive TAM motif. | |||
====TnpB is Required for Replacement of the Deleted IS Copy==== | |||
“Peel and Paste” transposition <ref name=":38" /> implies that transposition would result in loss of the IS from the lagging strand of the “donor” [[wikipedia:DNA_replication#Replication_fork|replication fork]] ([[:File:FigIS200 605 22.png|Fig. IS200.22]]). Excision of the IS creates a perfect donor joint. The TnpB<sub>Y</sub>/ωRNA endonuclease/guide RNA system could provide a solution to this by “intercepting” the donor joint and retargeting it, recopying the remaining IS copy back into the empty site <ref name=":37" />. The model, developed from data obtained with ''[[wikipedia:Deinococcus_radiodurans|Deinococcus radiodurans]]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], was elegantly confirmed by Meers et al. <ref name=":19" /> using an IS''Gst''-based IS. | |||
Using a mini-IS tagged with [[wikipedia:Kanamycin_A|kanamycin]] resistance and inserted into the [[wikipedia:Lac_operon|''lacZ'' gene]] in the ''[[wikipedia:Escherichia_coli|E. coli]]'' chromosome, excision could be measured by the appearance of [[wikipedia:Lac_operon|''lac''<sup>+</sup>]] colonies while transposition could be measured by retention of [[wikipedia:Kanamycin_A|kanamycin resistance]] in the presence of non-targeting or ''lac''-targeting ωRNA, TnpA (wild type or catalytic mutant) and/or TnpB (wild type or catalytic mutant) ([[:File:FigIS200 605 54.png|Fig. IS200.54]] '''B'''). The results showed that: a large fraction of colonies were [[wikipedia:Lac_operon|''lac''<sup>+</sup>]] in the presence of wild-type but not mutant TnpA; this was reduced 1000x when [[wikipedia:Kanamycin_A|kanamycin resistance]] was selected; TnpA+TnpB and ''[[wikipedia:Lac_operon|lacZ]]''-specific ωRNA completely eliminated [[wikipedia:Lac_operon|''lac''<sup>+</sup>]] colonies. This result is consistent with TnpB being responsible for retention (replacement) of the IS copy following “'''''peel and paste'''''”. Meers et al. <ref name=":19" /> coined the term “'''''peel and paste/cut and copy'''''” for the overall process. | |||
====The Copy Choice Model for TnpB Function During Transposition==== | |||
One of the important questions concerning IS''200''/IS''605'' transposition pathway and those of the related IscB-carrying elements had been the way in which these IS maintain their copy number in the donor site following peeling off the donor daughter “[[wikipedia:Chromatid|chromatid]]” to leave a transposon-less donor joint in the lagging strand of the [[wikipedia:DNA_replication#Replication_fork|replication fork]]. In the model shown in [[:File:FigIS200 605 16.png|Fig. IS200.16]], excision of a single stranded circular IS intermediate from the lagging strand leaves one double strand copy of the IS in the donor replicon carried by one of the daughter “[[wikipedia:Chromatid|chromatids]]” (on the leading strand) and a “donor joint” on the other. In this scenario, no increase in the IS copy number would occur. | |||
The results of Karvelis et al. with [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] provided a solution for this problem which has been extended and supported by a number of authors (see Meers et al.,<ref name=":19" />). Karvelis et al., <ref name=":37" /> and Meers et al., <ref name=":19" /> proposed a model in which, following excision of the single strand circular IS copy and formation of the donor joint ([[:File:FigIS200 605 55.png|Fig. IS200.55]] '''A'''), reRNA/TnpB targeted cleavage of the donor joint is used to initiate a copy-choice replacement of the IS from the remaining IS on the daughter [[wikipedia:Chromatid|chromatid]] ([[:File:FigIS200 605 55.png|Fig. IS200.55]] '''B''') (see, for example, Cox et al., <ref>{{#pmid:17364684}}</ref>). This would permit maintenance of donor replicon integrity while assuring an increase in IS copy number via transposition. They point out that, conceptually, this in some ways resembles [[wikipedia:Group_I_catalytic_intron|group I intron]] behavior. <br /> | |||
[[File:FigIS200 605 55.png|center|thumb|720x720px|'''Fig. IS200.55.''' '''TnpB-Facilitated Replacement of IS''Dra2'' at the Donor Joint by Copy-Choice. A)''' Model Proposed by Karvelis et al.,<ref name=":37" />. '''From left to right:''' The [[wikipedia:DNA_replication#Replication_fork|replication fork]] generates single-strand DNA on the lagging strand which exposes a single-strand IS copy, allowing the intervention of the TnpA transposase. This liberates a single strand circular IS copy and creates a donor joint in which the '''LE''' target sequence and the '''RE''' flanking DNA are joined. TnpB together with the guide RNA produced from the intact IS copy targets the donor joint and introduces a double-strand cleavage (red arrowheads). The broken strands then invade the intact IS and generate a new IS copy by copy choice. '''B) Possible copy-choice mechanism.''' Invasion of the sister [[wikipedia:Chromatid|chromatid]] carrying the intact IS by the 3' end created from the donor joint double-strand break, providing primers for replicative copying. LE and RE are shown in red and blue respectively, the RE flank in green. Newly replicated DNA is shown as a dotted line and the TAM sequence as a black box.]] | |||
TnpA-mediated transposition of the IS''200''/IS''605'' family is well documented. The transposition model is based on ''in vitro'' experiments using single-strand oligonucleotides, results from ''in vivo'' experiments which implicate [[wikipedia:DnaG|DnaG]] and the observation that the orientation of insertion is correlated with the direction of replication of the target chromosome (see <ref name=":38" />and references therein). The proposed function of TnpB in this process neatly completes the transposition model by offering an explanation of how IS copy number is maintained by replacement of the excised IS at the resulting donor joint.<br /> | |||
== IStrons == | |||
Another role for '''Y1 transposases''' was suggested by the identification of chimeric genetic elements widely distributed in the genome of ''[[wikipedia:Clostridium_difficile|Clostridium difficile]]''<ref name=":62">{{#pmid:10931294}}</ref>, the ''[[wikipedia:Bacillus_cereus|Bacillus cereus]]'' group and ''[[wikipedia:Fusobacterium_nucleatum|Fusobacterium nucleatum]]'' Subspecies Polymorphum <ref>{{#pmid:17668047}}</ref><ref>{{#pmid:16907808}}</ref><ref>{{#pmid:18587153}}</ref><ref name=":282">{{#pmid:16030238}}</ref> and many other bacterial species <ref name=":44">{{#pmid:38045383}}</ref>: '''IStrons'''. These combine functional and structural properties of [[wikipedia:Group_I_catalytic_intron|group I introns]] at their 5’-end with those of an IS element at their 3’-end ([[:File:FigIS200 605 56.png|Fig. IS200.56]] '''A''' and '''B'''). This 3' part contains an IS''200''/IS''605'' related sequence including two full length or truncated orfs, ''tnpA'' and ''tnpB'', very similar to those found in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] (''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'') and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISCpe2 IS''Cpe2''] (''[[wikipedia:Clostridium_perfringens|C. perfringens]]''). | |||
'''IStrons''' are present at several loci in the same genome, indicating that this element is mobile and may move as a complete genetic unit. All IStron copies analyzed so far are inserted 3’ to the pentanucleotide '''TTGAT'''. ''In vivo'', all variants can be efficiently and precisely excised signifying that components necessary for [[wikipedia:Ribozyme|ribozyme]] activity are present <ref name=":62" />. The data suggest that IS components could mediate the spread of IStron while the intron component could assure splicing. | |||
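The target preference described above can be illustrated with a short sketch that lists positions in a DNA sequence lying immediately 3’ to the pentanucleotide '''TTGAT''' on either strand. This is only a string search under stated assumptions: the function names and the coordinate convention (a 0-based boundary index on the top strand) are hypothetical choices made for illustration, and an occurrence of the motif does not by itself imply that an insertion takes place there.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def candidate_sites(seq: str, motif: str = "TTGAT"):
    """List (strand, boundary) pairs for junctions immediately 3' of the motif.

    'boundary' is a 0-based index on the top strand: the junction lies
    between seq[boundary - 1] and seq[boundary]. This convention is an
    assumption made for the sketch.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []

    # Top strand: the junction follows the last base of the motif.
    start = seq.find(motif)
    while start != -1:
        hits.append(("+", start + len(motif)))
        start = seq.find(motif, start + 1)

    # Bottom strand: search the reverse complement, then convert the
    # junction back to top-strand coordinates.
    rc = reverse_complement(seq)
    start = rc.find(motif)
    while start != -1:
        hits.append(("-", n - (start + len(motif))))
        start = rc.find(motif, start + 1)

    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical sequence with one motif on each strand.
    print(candidate_sites("AATTGATCCGGATCAACC"))
```

Overlapping occurrences are reported because the search restarts one base after each hit.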
[[File:FigIS200 605 56A.png|center|thumb|720x720px|'''Fig. IS200.56.''' '''A) Organization of IStron''' where Intron and IS parts are indicated. P1–P8 and IGS (the internal guide sequence) represent characteristic features of [[wikipedia:Group_I_catalytic_intron|group I Introns]]. '''LE''', '''RE''', '''TTGAT''' target site and two orfs of the IS part are indicated, as are the characteristic secondary structure features, P1–P9. The functional domains are outlined in color: pink (substrate), green (scaffold) and yellow (catalytic). The 5’ splice site is determined by an interaction with a sequence within the upstream intron and a guide sequence, to form the P1 loop.]] | |||
''In vitro'' oligonucleotide-based assays using purified IStron transposase confirmed that, at the DNA level, '''TTGAT''' is both the '''LE''' cleavage site in excision and the insertion target site ([https://www-lmgm.biotoul.fr/uk/equipes/grpieva/equipe.html Caumont-Sarcos], unpublished, cited in He et al.,<ref name=":38" />). At the RNA level, the same sequence is probably required in the splicing reaction<ref>{{#pmid:19667762}}</ref>. This would represent a novel type of intron invasion and transposition mechanism and provide a direct link between RNA and DNA worlds<ref name=":38" />. | |||
[[File:FigIS200 605 56B.png|center|thumb|720x720px|'''Fig. IS200.56. B) Splicing Mechanism is a two-step process.''' An exogenous GTP, G, molecule is bound to a site within P7. The GTP 3’OH attacks the 5’ splice site attaching G to the 5’ intron RNA end via a 3’-5’ phosphodiester bond followed by a conformational change in which the downstream exon 3’ G trades position permitting a 3’OH attack from the upstream exon 3’ end (3’ splice site) to form an exon-exon RNA joint and the release of the intron (From Hausner et al., <ref>{{#pmid:24612670}}</ref>). ]] | |||
It is interesting to note that related IStrons have recently been identified which include components of the [[IS Families/IS607 family|IS''607'' family]] <ref name=":282" /><ref>{{#pmid:25324310}}</ref>. These are characterized by a serine transposase together with a ''tnpB'' gene<ref name=":5" />. | |||
More recently attention has been focused on the various activities of these IStrons based mainly on the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-containing derivatives <ref name=":44" />. These studies have investigated the TnpB activity of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] itself and the interplay between DNA transposition, self-splicing intron mobility and RNA guide activity in both [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605'']- and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-based IStrons. | |||
Based on a [[:File:FigIS200 605 57.png|TnpB phylogenetic tree]] <ref name=":19" />, a bioinformatic analysis for each ''tnpB'' gene identified associations with ''tnpA''<sub>S</sub> ([[IS Families/IS607 family|IS''607'' family]]), tnpA<sub>Y</sub> ([[IS Families/IS200 IS605 family#General|IS''200''/IS''605'' family]]), [[wikipedia:Group_I_catalytic_intron|group I introns]] (IStron), and ωRNA loci ([[:File:FigIS200 605 57.png|Fig. IS200.57]]). | |||
As in the simple IS derivatives, ωRNA loci were invariably located at the right end, 3′ to ''tnpB'', in the same region critical for 3’SS (Splice Site) intron recognition. This analysis confirmed the previous observation that not all ''tnpB''-containing IStrons include an intact ''tnpA''. | |||
[[File:FigIS200 605 57.png|center|thumb|720x720px|'''Fig. IS200.57 Phylogenetic tree of TnpB homologs''' revealing associations with tnpA<sub>S</sub> ([[IS Families/IS607 family|IS''607'' – family]], outer circle – red), tnpA<sub>Y</sub> (IS''200''/IS''605'' – family; blue circle), [[wikipedia:Group_I_catalytic_intron|group I introns]] (IStron; purple circle), and ωRNA loci (white on grey circle). Large groups of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] - (pink sector) and IS''200''/IS''605'' - family (blue sector) IStrons were identified, and representative Cdi ([[wikipedia:Clostridioides_difficile|''Clostridioides'' ''difficile'']]) IStron, Cbo ([[wikipedia:Clostridium_botulinum|''Clostridium'' ''botulinum'']]) IStron, and Cse (''[[wikipedia:Clostridium|C. senegalense]]'') IStron members are indicated <ref name=":44" />. ]] | |||
====The IS''605''-based IStron: CdiIStron==== | |||
The IStron content of [[wikipedia:Clostridioides_difficile|''Clostridioides'' ''difficile'']] (Cdi) and [[wikipedia:Clostridium_botulinum|''Clostridium'' ''botulinum'']] (Cbo) was examined in detail: ''[[wikipedia:Clostridioides_difficile|C. difficile]]'' 630 carries 8 IS''605'' family IStrons ('''CdiIStron''') all without a full length TnpA<sub>Y</sub> (unlike CdISt1 which carries an intact tnpA<sub>Y</sub> gene <ref>{{#pmid:15060058}}</ref>), three freestanding [[wikipedia:Group_I_catalytic_intron|group I introns]] and three IS''605'' copies. IStron '''LE''' and '''RE''' corresponding to those of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] were identified using a [[wikipedia:Covariance|covariance model]] developed previously <ref name=":19" /> and revealed a '''TTGAT''' TAM sequence abutting '''LE''' similar to that identified as the C<sub>L</sub> earlier (see He et al.,<ref name=":38" />) ([[:File:FigIS200 605 60.png|Fig. IS200.60]] '''A'''). The overall organization resembled a so-called Twort group I [[wikipedia:Ribozyme|ribozyme]] <ref>{{#pmid:10359829}}</ref><ref>{{#pmid:15580277}}</ref>. | |||
Analysis of published RNA sequencing data <ref>{{#pmid:34908523}}</ref><ref>{{#pmid:34131082}}</ref> revealed expression from all intron and IS''605'' elements in the [[wikipedia:Clostridioides_difficile|''C. difficile'']] genome. Moreover, spliced and unspliced sequences for all but one of the '''CdiIStron''' were detected, demonstrating that the '''IStrons''' are active and defining the exon-intron boundaries. These corresponded perfectly to the predicted '''IS''605''''' '''LE/RE'''. An example is shown in [[:File:FigIS200 605 58.png|Fig. IS200.58]]. Here the '''IStron''' is inserted into a Toxin Glycosylating Gene, ''tcdA'', in one strain but not in another. | |||
[[File:FigIS200 605 58.png|center|thumb|680x680px|'''Fig. IS200.58. An IS''605'' family IStron from [[wikipedia:Clostridioides_difficile|''Clostridioides'' ''difficile'']]''.''CdiIStron''' with only a partial tnpA<sub>Y</sub> gene inserted into the Toxin Glycosylating Gene, ''tcdA'', in Cdi strain [https://www.ncbi.nlm.nih.gov/nuccore/NZ_CP037844.1 NZ_CP037844.1]. Codons are underlined in the upstream and downstream Exons. The TAM sequence is shown in red and the position of the ωRNA at the right end is indicated. '''Bottom:''' Sequence of a ''tcdA'' close homologue in Cdi strain [https://www.ncbi.nlm.nih.gov/nuccore/MN625142.1 MN625142.1] without the insertion. The percentage identities are indicated below <ref name=":44" />.]] | |||
=====IS''607''-based IStrons===== | |||
An IStron including full length ''tnpA<sub>S</sub>'' and ''tnpB'' genes ('''CboIStron'''; [[:File:FigIS200 605 59.png|Fig. IS200.59]]) was identified in ''[[wikipedia:Clostridium_botulinum|C. botulinum]]'' strain BKT015925 located on a large [[wikipedia:Botulinum_toxin|botulinum neurotoxin]]-encoding plasmid together with an IStron lacking ''tnpA'' and multiple stand-alone [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] elements. IS''607'' elements are difficult to identify since they do not have inverted repeat or palindromic ends and do not generate flanking target repeats ('''TSD''') on insertion. Žedaveinytė et al., <ref name=":44" /> were able to detect the '''LE''' and '''RE''' boundaries using a combination of comparative genomics (comparison of full and empty sites) and homology between '''CboIStrons'''. They could define a consensus TAM sequence as '''TGGG''' ([[:File:FigIS200 605 59.png|Fig. IS200.59]]). Moreover, the [[wikipedia:Covariance|covariance model]] (CM) used to define Cdi [[wikipedia:Group_I_catalytic_intron|group I intron]]<nowiki/>s was also used successfully to detect Cbo [[wikipedia:Group_I_catalytic_intron|group I intron]]<nowiki/>s and, as in the IS''605'' IStron, the splice sites overlapped the IS ends. An example of a '''CboIStron''' inserted into a phage antirepressor gene, ''ant'', is shown in [[:File:FigIS200 605 59.png|Fig. IS200.59]]. | |||
[[File:FigIS200 605 59.png|center|thumb|680x680px|'''Fig. IS200.59. An [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] family IStron from [[wikipedia:Clostridium_botulinum|''Clostridium'' ''botulinum'']]''.'' Top:''' '''CboIStron''' with a full-length tnpA<sub>S</sub> gene inserted into a phage antirepressor gene, ''ant'', in Cbo strain [https://www.ncbi.nlm.nih.gov/nuccore/NC_015417.1/ NC_015417.1]. Codons are underlined in the upstream and downstream Exons. The TAM sequence is shown in red and the position of the ωRNA at the right end is indicated. '''Bottom:''' Sequence of an ''ant'' close homologue in Cbo strain [https://www.ncbi.nlm.nih.gov/nuccore/NZ_JAAMYD010000011.1 NZ_JAAMYD010000011.1] without the insertion. The percentage identities are indicated below.]] | |||
=====IS''605'' and IS''607'' ωRNAs Share Common Structural Features===== | |||
While the transposition reactions of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] are quite different (TnpA<sub>Y</sub> recognizes secondary structures generated in single-strand DNA in the IS ends while TnpA<sub>S</sub> recognizes double strand DNA), it had been noted that the ends of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] and its relatives included a number of repeat sequences ([[:File:IS607.1.png|Fig. IS607.1]] and [[:File:IS607.2.png|Fig. IS607.2]]). Identification of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] ωRNA using the [[wikipedia:Covariance|covariance model]] showed that these share common secondary structure features ([[:File:FigIS200 605 60.png|Fig. IS200.60]], [[:File:FigIS200 605 61.png|Fig. IS200.61]] and [[:File:FigIS200 605 62.png|Fig. IS200.62]]) <ref name=":44" /> comprising three consecutive stem-loop features with a so-called '''''Nexus stem loop''''' (SL1) predicted to facilitate a pseudoknot structure ([[:File:FigIS200 605 65.png|Fig. IS200.65]] and [[:File:FigIS200 605 66.png|Fig. IS200.66]]) with the 3’ ωRNA end as has been demonstrated in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] ([[:File:FigIS200 605 42.png|Fig. IS200.42]]; <ref name=":33" />). This appears to be conserved in TnpB systems and in Cas12 guide RNA <ref name=":46" /><ref name=":28" /><ref name=":33" />. | |||
[[File:FigIS200 605 60.png|center|thumb|680x680px|'''Fig. IS200.60 Covariance models for the Cdi IStron (left) and Cbo IStron (right) ωRNA.''' Covariation/conservation model of TnpB group re(w)RNA. The circles indicate nucleotide conservation at a given site. Colors represent the extent of conservation. The boxes indicate co-conservation in the sequence library <ref name=":44" />. ]] | |||
<br /> | |||
[[File:FigIS200 605 61.png|center|thumb|680x680px|'''Fig. IS200.61 Comparison of ωRNAs from representative IS''605''-family IS''Dra2'' from ''[[wikipedia:Deinococcus_radiodurans|D. radiodurans]]'' (left) and the predicted structure of IS''605''-family CdiIStron (right).''' The color scheme is the same as in [[:File:FigIS200 605 42.png|Fig. IS200.42]], indicating specific secondary structure motifs. The position of the common pseudoknot (PK in blue) is indicated, and the RNA guide sequence is shown in red within a black outline. The triplex structure in the structural studies ([[:File:FigIS200 605 42.png|Fig. IS200.42]]) is shown in red <ref name=":44" />.]] | |||
<br /> | |||
[[File:FigIS200 605 62.png|center|thumb|680x680px|'''Fig. IS200.62 Comparison of predicted ωRNA secondary structures of IS''Xfa'' from ''[[wikipedia:Xylella_fastidiosa|Xylella fastidiosa]]'' (left) and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-family IStron from ''[[wikipedia:Clostridium_botulinum|C. botulinum]]'' (right).''' The color scheme is the same as that in [[:File:FigIS200 605 42.png|Fig. IS200.42]], indicating specific secondary structure motifs. The position of the common pseudoknot ('''PK''' in blue) is indicated, and the RNA guide sequence is shown in red within a black outline. The triplex structure revealed in the structural studies ([[:File:FigIS200 605 42.png|Fig. IS200.42]]) is shown in red. Note that '''CboIStron''' has an additional stem-loop, SL3, between SL1 and SL2 <ref name=":44" />. ]] | |||
<br /> | |||
====TnpA<sub>S</sub> IS''607'' Excision and Insertion Activity==== | |||
A strong excision reaction, detected by [[wikipedia:Polymerase_chain_reaction|PCR]], was observed in ''[[wikipedia:Escherichia_coli|E. coli]]'' using a “donor” plasmid carrying a mini [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-based '''CboIStron''' lacking both ''tnpA<sub>S</sub>'' and ''tnpB<sub>S</sub>'' and TnpA<sub>S</sub> expressed from a second plasmid. Circular transposon copies were also observed ([[:File:IS607.7.png|Fig. IS607.7]]). This depended on a functional TnpA<sub>S</sub> active site and the presence of both IStron (IS) ends. The excision products, which appeared at a frequency of 5% in overnight cultures (several orders of magnitude higher than that catalyzed by TnpA<sub>Y</sub> in an IS''605'' system under similar conditions; Meers et al., <ref name=":19" />), carried a precise donor joint. Internal end deletions showed a requirement for 40 ('''LE''') and 60 ('''RE''') bp for robust excision. In particular, these carry a subset of the repeated elements (3 in '''LE''' and 2 in '''RE''') of those identified by Boocock and Rice<ref name=":45" /> and Chen et al. <ref name=":48">{{#pmid:30289389}}</ref> ([[:File:IS607.2.png|Fig. IS607.2]]) and implicated in the cooperative assembly of multiple TnpA<sub>S</sub> dimers into a synaptic complex <ref name=":48" />. Mutation of these repeats individually resulted in reduced excision activity while multiple mutations eliminated excision entirely <ref name=":44" />. | |||
Thus, as in the case of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''], transposase activity of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-related transposons leads to loss from the donor site. | |||
To investigate [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-IStron insertion, a [[wikipedia:Chloramphenicol|chloramphenicol resistance]] tagged suicide (non-replicative) donor plasmid carrying abutted '''LE''' and '''RE''' and a ''pir''-dependent R6K replication origin, ''ori'', was used: grown in a ''pir''<sup>+</sup> strain and transformed into a ''pir''<sup>-</sup> strain expressing TnpA<sub>S</sub> ([[:File:IS607.8.png|Fig. IS607.8]]). Cell viability in the presence of [[wikipedia:Chloramphenicol|chloramphenicol]] was dependent on TnpA<sub>S</sub>. Integration specificity was investigated by genome-wide sequencing and found to strictly require a '''GG''' dinucleotide with a preference, but not an absolute requirement, for the predicted TAM sequence: TGGG. | |||
====IStron-encoded TnpB nucleases==== | |||
Sequencing of RNA in immunoprecipitated '''CboIStron''' TnpB<sub>S</sub> (as has been shown in other TnpB systems; <ref name=":28" />) revealed that it strongly enriched its expected RNA partner. However, there was also a strong signal located 42 nt downstream from the [[wikipedia:Covariance|covariance model]] ([[:File:FigIS200 605 60.png|Fig. IS200.60]], [[:File:FigIS200 605 62.png|Fig. IS200.62]]). Mutating the [[wikipedia:RuvABC|RuvC]] domain resulted in the disappearance of this species, suggesting that it is the product of TnpB<sub>S</sub>-catalyzed processing of a precursor transcript as found for other TnpB systems (e.g. Nety et al. <ref name=":26" />). | |||
Using a plasmid interference assay (see: [[IS Families/IS200 IS605 family#TnpBGst and IscBGst proteins are active RNA-guided Nucleases|TnpB<sub>Gst</sub> and IscB<sub>Gst</sub> proteins are active RNA-guided Nucleases]]''';''' [[:File:FigIS200 605 54.png|Fig. IS200.54]]) '''CboIStron''' TnpB<sub>S</sub> showed robust RNA-guided DNA cleavage and reduced colony formation (i.e. cell viability) by a factor of 10<sup>5</sup> in a process necessitating target DNA-guide RNA complementarity, a cognate TAM and a catalytically competent TnpB. The reaction was effective either in a natural configuration in which both '''CboIStron''' ωRNA and TnpB are expressed from a single transcript or when they are expressed independently. | |||
====Defining the CboIStron TAM Sequence: a double role in both nuclease and transposase recognition==== | |||
An assay similar to that shown in [[:File:FigIS200 605 40.png|Fig. IS200.40]] was used to precisely define the '''CboIStron''' TAM sequence required by TnpB during DNA recognition and cleavage. The target plasmid included a randomized 6N library together with the guide sequence and a [[wikipedia:Kanamycin_A|kanamycin resistance]] marker. Plasmids carrying the TAM sequence are strongly depleted under selective conditions. The results confirmed the predicted TAM was indeed '''TGGG''' (as shown in [[:File:FigIS200 605 59.png|Fig. IS200.59]]). | |||
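The logic of this depletion assay can be sketched as a comparison of library composition before and after selection. The example below is a minimal illustration and not the analysis used by the authors: the read counts, the pseudocount, the threshold and the function names are all hypothetical, and placing the TAM in the first four positions of the randomized 6-mer is an assumption made only to build the toy data.

```python
import math


def depletion_scores(input_counts, output_counts, pseudocount=1.0):
    """Return log2(output frequency / input frequency) for each library member.

    A pseudocount (an assumption of this sketch) avoids division by zero
    for sequences that disappear entirely after selection.
    """
    members = set(input_counts) | set(output_counts)
    in_total = sum(input_counts.get(m, 0) + pseudocount for m in members)
    out_total = sum(output_counts.get(m, 0) + pseudocount for m in members)
    scores = {}
    for m in members:
        f_in = (input_counts.get(m, 0) + pseudocount) / in_total
        f_out = (output_counts.get(m, 0) + pseudocount) / out_total
        scores[m] = math.log2(f_out / f_in)
    return scores


def depleted(scores, threshold=-2.0):
    """Return library members whose score is at or below the threshold."""
    return sorted(m for m, s in scores.items() if s <= threshold)


if __name__ == "__main__":
    # Hypothetical read counts for four members of a randomized 6N library.
    before = {"TGGGAA": 100, "TGGGCC": 100, "AAAAAA": 100, "CCCCCC": 100}
    after = {"TGGGAA": 1, "TGGGCC": 0, "AAAAAA": 150, "CCCCCC": 140}
    scores = depletion_scores(before, after)
    for member in depleted(scores):
        print(member, round(scores[member], 2))
```

Sequences carrying a functional TAM are cleaved and lost under selection, so they appear with strongly negative scores; aligning the depleted members then suggests the motif.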
Thus the TnpA<sub>S</sub>/TnpB system, like the TnpA<sub>Y</sub>/TnpB system (e.g. Karvelis et al.,<ref name=":37" />), has evolved to specify the same DNA motif for nuclease recognition and for transposase recognition. | |||
====CboIStron TnpB/wRNA promotes transposon copy number maintenance==== | |||
Given the similarities of TnpB activities including recognition of the TAM/C<sub>L</sub> sequence by both transposase and nuclease, it seemed probable that the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] Cbo TnpB performs the same function. | |||
This was tested using the retention/replacement assay of Meers et al.,<ref name=":19" /> ([[:File:FigIS200 605 54.png|Fig. IS200.54]] '''B''') where the '''CboIStron''' interrupted a plasmid-based [[wikipedia:Lac_operon|lacZ gene]] and was oriented opposite to the direction of transcription of the gene (to avoid the splicing reaction, as has been done in assays of transposition of retroelements). Expression of TnpA<sub>S</sub> alone resulted in transposon loss, evidenced by about half of the colonies being [[wikipedia:Lac_operon|lac<sup>+</sup>]], a frequency reduced about 10x by co-expression of wildtype but not mutant TnpB. The veracity of these results was confirmed by [[wikipedia:Polymerase_chain_reaction|PCR]] <ref name=":44" />. | |||
====Busy Ends: Functional interactions between IStron splicing, TnpB and ωRNA==== | |||
To investigate the interactions between peel and paste/cut and copy transposition and [[wikipedia:Group_I_catalytic_intron|group I intron]] splicing, a minimal '''CboIStron''' lacking both TnpA and TnpB was first examined for its ability to undergo self-splicing in ''[[wikipedia:Escherichia_coli|E. coli]]''. The splicing reaction uses exogenous GTP in a transesterification reaction at the 5’SS, generating a 3’ OH end on the upstream exon which then attacks the 3’SS to form an exon-exon joint and liberate the intron. [[wikipedia:Reverse_transcription_polymerase_chain_reaction|RT-PCR]] on extracted RNA revealed both spliced and unspliced products. The spliced product was a perfect joining of the two flanking exons with the exact sequence observed following TnpA<sub>S</sub>-mediated IStron excision. The reaction required the P7-P9 catalytic region ([[:File:FigIS200 605 56.png|Fig. IS200.56]] '''A''') and a wildtype 5’SS sequence. | |||
Self-splicing without the intervention of protein factors was confirmed since identical products were obtained with purified RNA. | |||
Coding mRNAs for TnpA and TnpB can be produced from both unspliced and spliced introns, but since the ωRNA scaffold is severed from the guide region, spliced introns are no longer capable of forming functional ωRNAs ([[:File:FigIS200 605 56.png|Fig. IS200.56]] '''B'''). Conversely, TnpB-mediated ωRNA processing would separate the 5′SS and 3′SS on two distinct molecules, allowing only ''trans''-splicing. TnpB-ωRNA binding would also likely obstruct physical interactions required at the 3’SS for splicing. | |||
Consecutive deletions in the first 180bp of '''CboIStron''' from the ωRNA 5′ end ([[:File:FigIS200 605 63.png|Fig. IS200.63]] '''Top''') dramatically increased splicing. These deletions removed most of the ωRNA stem-loops, which was shown to eliminate TnpB-mediated RNA-guided DNA cleavage. Single stem-loop deletions or combinations of them had similar effects, except for deletion of stem-loop 5, which is required for splicing. The large change in splicing activity (measured as the ratio of spliced to unspliced substrate by [[wikipedia:Polymerase_chain_reaction|PCR]]; [[:File:FigIS200 605 63.png|Fig. IS200.63]] '''Bottom''') for the 180bp deletion implies that sequence and/or structural features in this region inhibit splicing in the full-length wildtype IStron. The results suggest, perhaps not surprisingly, that the RNA structure alone influences splicing and that splicing and TnpB/ωRNA activity are negatively correlated. | |||
[[File:FigIS200 605 63.png|center|thumb|680x680px|'''Fig. IS200.63. Effect of Deletions in the 3’ end of the CboIStron ωRNA on Splicing activity. Top:''' Color scheme is the same as that in [[:File:FigIS200 605 66.png|Fig. IS200.66]] indicating specific secondary structure motifs. The position of the common pseudoknot ('''PK''' in blue) is indicated and the RNA guide sequence is shown in red within a black outline. The series of deletions from the 3’ end are marked in base pairs. The small red arrow indicates the position at which the stimulation in splicing activity no longer occurs. '''Bottom:''' ratio of spliced and unspliced substrates measured by PCR <ref name=":44" />.]] | |||
Additionally, the pseudoknot ([[:File:FigIS200 605 42.png|Fig. IS200.42]] and [[:File:FigIS200 605 63.png|Fig. IS200.63]]) which appears to be a common feature in ωRNAs and which is essential for TnpB/ωRNA guided DNA cleavage, was observed to play an important role in inhibiting splicing: individual point mutations in PK1 ([[:File:FigIS200 605 63.png|Fig. IS200.63]]) which destroy its formation (and eliminate guided DNA cleavage) greatly stimulate splicing activity although mutation in PK2 which is shared by the 3’SS site eliminated both guide activity and splicing. | |||
Moreover, expression of wildtype or catalytically inactive TnpB ''in trans'' greatly reduced splicing indicating that TnpB-facilitated ωRNA binding was sufficient for splicing repression. This was further confirmed since TnpB-dependent repression was only observed when most of the ωRNA scaffold was present. It did not occur if ωRNA was replaced by ''[[wikipedia:Lac_operon|lac]]'' RNA. | |||
=====Busy Ends===== | |||
There is therefore clearly an intricate balance between the IStron splicing activity and the mutually exclusive functions involved in the '''''cut-and-copy phase of transposition'''''. The end sequences have evolved to accommodate the transposase (either TnpA<sub>Y</sub> or TnpA<sub>S</sub>), the TnpB/ωRNA system and the self-splicing 5’SS and 3’SS. The advantage for the associated IS is that the intron has a wider target choice since it can occur, co-oriented, within coding sequences without consequences for expression of the interrupted gene. | |||
The intricate relationship is apparent from the fact that the ωRNA scaffold is contained within the 3’ intron end but the targeting sequence is contained within the downstream exon. Splicing therefore separates the guide and scaffold sequences. The balance between splicing and guided target DNA cleavage is modulated both by TnpB-ωRNA binding which, by occlusion, prevents the access of the upstream exon to the 3’ phosphate bond ([[:File:FigIS200 605 56.png|Fig. IS200.56]] '''B''') and by the ωRNA itself, whose pseudoknot formation competes for the 3’ splice site. An overview is provided in [[:File:FigIS200 605 64.png|Fig. IS200.64]] taken directly from Žedaveinytė et al.,<ref name=":44" />. | |||
<br />[[File:FigIS200 605 64.png|center|thumb|780x780px|'''Fig. IS200.64. Overall model for the balanced effects of intron splicing, TnpB/ωRNA, and TnpA<sub>S</sub> transposition activity in the maintenance and spread of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-family IStron elements.''' Similarly to IS''200''/IS''605''-family transposons, scarless TnpA<sub>S</sub>-mediated DNA excision of [[IS Families/IS607 family|IS''607''-family elements]] leads to transposon loss at the donor site and thus eventual transposon extinction, without the crucial function provided by TnpB/ωRNA in generating targeted DNA double-strand breaks and triggering homologous recombination to maintain the presence of the transposon ('''top'''). Unlike canonical [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] and [[IS Families/IS607 family|IS''607'' family]] transposons, [[wikipedia:Group_I_catalytic_intron|group I intron]]-containing IStrons mitigate their fitness costs on the host by splicing themselves out of interrupted transcripts at the RNA level, thereby restoring functional gene expression ('''bottom''', '''middle'''). Splicing and ωRNA maturation are mutually exclusive since splicing severs the ωRNA scaffold and guide sequences, and TnpB represses splicing through competitive binding of the 3′SS. The competition between intron splicing and TnpB/ωRNA activity thus regulates the dual objectives of maintaining transposon stealth and promoting transposon proliferation for IStron elements. A similar mechanism is hypothesized for IS''200''/IS''605'' - family IStrons. Taken from Žedaveinytė et al., <ref name=":44" />.|alt=]] | |||
== The Eukaryotic Connection: Fanzor eukaryotic TnpB relatives == | |||
Fanzor proteins are eukaryotic relatives of TnpB first identified in a bioinformatics search <ref name=":35" />. The first, SPu-1-1p (633-aa), was identified in the fungus ''[[wikipedia:Spizellomyces_punctatus|Spizellomyces punctatus]]''. The single Orf was flanked by 33-bp '''T'''erminal '''I'''nverted '''R'''epeats ('''TIRs''') and a putative 2 bp TSD ('''TA'''). The 2,100-bp long SPu 1 element was found in 17 full length copies and homologues were also identified in a number of other eukaryotes including metazoans, fungi, protists and dsDNA viruses infecting eukaryotes. They are very distantly related to TnpB from both the IS''200''/IS''605'' and [[IS Families/IS607 family|IS''607'' families]] with which they share 15% identity over the 300 aa C-terminus ([[:File:FigIS200 605 55.png|Fig. IS200.55]]). This comprises a number of highly conserved residues including the TnpB [[wikipedia:Zinc_finger|Zn finger]] ([[:File:FigIS200 605 30.png|Fig. IS200.30]], [[:File:FigIS200 605 31.png|Fig. IS200.31]], [[:File:FigIS200 605 36.png|Fig. IS200.36]], [[:File:FigIS200 605 47.png|Fig. IS200.47]]). | |||
[[File:FigIS200 605 65.png|center|thumb|680x680px|'''Fig. IS200.65. Alignments of Fanzor and TnpB proteins. Top:''' conserved amino acids are marked above the cartoon. The numbers above refer to residue positions in SPu-1-1p or TnpB_IS''608''. '''Bottom:''' alignment showing the conserved residues indicated with blue vertical arrows with conserved amino acids marked above in single letter code. Fanzor1 and TnpB protein names are shown on the left. Rearranged from Bao and Jurka <ref name=":35" />.]] | |||
A phylogenetic tree ([[:File:FigIS200 605 56.png|Fig. IS200.56]]) showed that '''Fanzor1''' formed a well-separated clade and that Fanzor2 was associated with some, but not all, TnpB proteins. | |||
<br /> | |||
[[File:FigIS200 605 66.png|center|thumb|680x680px|'''Fig. IS200.66 Phylogenetic Trees of Fanzor/TnpB proteins.''' Branch colors are: dsDNA viruses (blue), metazoa (yellow), fungi (green), chlorophyta (cyan), rhodophyta (red), stramenopiles (dark red), choanoflagellida (orange), and amoebozoa (pink). TnpB proteins are from the ISfinder database or GenBank (with accession number). The tree is based on the alignment of a region including most of the N-terminal and the C-terminal portions. Colored branches represent eukaryotic examples. [[wikipedia:Mimivirus|Mimivirus]] examples and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''], and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS608 IS''608''] positions are indicated. Rearranged from Bao and Jurka <ref name=":35" />. ]] | |||
====TnpB Clade==== | |||
This clade is restricted to prokaryotic elements, although a later analysis <ref name=":47" /> identified TnpB proteins which are closely related to Fanzor2 (pro-Fanzor) and are mainly found in [[wikipedia:Cyanobacteria|cyanobacteria]]. | |||
====Fanzor1==== | |||
Fanzor1, which is more distantly related to TnpB ([[:File:FigIS200 605 66.png|Fig. IS200.66]]), was observed to be associated with a number of different TEs including so-called IS''4''-type elements in the alga ''[[wikipedia:Ectocarpus_siliculosus|Ectocarpus siliculosus]]'' and its virus, virus 1 and Sola2 elements from the slime moulds ''[[wikipedia:Dictyostelium|Dictyostelium fasciculatum]]'' and ''[[wikipedia:Polysphondylium_pallidum|Polysphondylium pallidum]]'', Tc/mariner, Helitrons, MuDr relatives and several insect viruses. All these examples are from eukaryotes. | |||
====Fanzor2 and/or Fanzor1 are of bacterial origin==== | |||
On the other hand, Fanzor2 proteins are found associated with serine recombinases as in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']. It should be pointed out that most of these are from giant viruses or nucleocytoplasmic large DNA viruses ([[wikipedia:Nucleocytoviricota|NCLDVs]]) that infect algae ([[wikipedia:Phycodnaviridae|Phycodnaviruses]]) and amoebae ([[wikipedia:Mimivirus|Mimivirus]]). It has been demonstrated that such viruses acquire genetic information from ingested/infecting bacteria <ref name=":49">{{#pmid:17109990}}</ref><ref>{{#pmid:18572389}}</ref><ref>{{#pmid:20551687}}</ref>. Indeed, several of these had been identified earlier as [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] derivatives which may be of bacterial origin. These appear to form a different subclade from the majority of bacterial examples although a few prokaryotic elements are associated ([[wikipedia:Anabaena|''Anabaena'' sp]]. PCC 7120; [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISArma1 IS''Arma1'']; and [[wikipedia:Microcystis_aeruginosa|''Microcystis'' ''aeruginosa'']] NIES-843). | |||
A more extensive analysis based on a much larger sequence library <ref name=":51">{{#pmid:37380027}}</ref> identified more than 3000 representatives of the TnpB superfamily. These were identified by structural mining of an [[wikipedia:AlphaFold|AlphaFold database]] and sequence profiling of the [https://www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/ non-redundant NCBI database] and were grouped into a phylogenetic tree ([[:File:FigIS200 605 67.png|Fig. IS200.67]]). The overall topology is similar to that described by Bao and Jurka <ref name=":35" />: the eukaryotic examples fall into two major groups comprising Fanzor1 and Fanzor2. | |||
[[File:FigIS200 605 67.png|center|thumb|680x680px|'''Fig. IS200.67. A Fanzor Tree.''' A tree based on a library of over 3000 examples identified by structural mining ([[wikipedia:AlphaFold|AlphaFold]]) and sequence profiling. The overall topology is similar to that described by Bao and Jurka <ref name=":35" /> with eukaryotic examples falling into two major groups comprising '''Fanzor1''' (blue sector) and '''Fanzor2''' (pink sector). The positions of the giant viruses and of Fanzors from ''[[wikipedia:Guillardia|Guillardia theta]]'' (GtFz1), ''[[wikipedia:Spizellomyces_punctatus|Spizellomyces punctatus]]'' (SpuFz1), ''[[wikipedia:Hard_clam|Mercenaria mercenaria]]'' (MmeFz2), and ''[[wikipedia:Naegleria_lovaniensis|Naegleria lovaniensis]]'' (NlovFz2) are marked, as well as that of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] ''<ref name=":51" />''.]] | |||
'''Fanzor1''' is found in fungi, but also in protists, arthropods, plants and eukaryotic viruses, in particular giant viruses, where there is a close association with internalized bacteria since these viruses infect hosts living in symbiosis with bacteria (see Filee et al., <ref name=":49" />). '''Fanzor2''' is also found in several giant viruses and in ''[[wikipedia:Choanoflagellate|choanoflagellates]]'' which also feed on bacteria (and viruses) as well as in ''[[wikipedia:Stramenopile|Stramenopiles]], [[wikipedia:Alveolate|Alveolates]]'' and ''[[wikipedia:Rhizaria|Rhizaria]]'' which may also ingest bacteria. | |||
The authors manually examined eukaryotic branches (radiations), and sometimes simply single leaves, emerging from other TnpB branches around the tree <ref name=":51" />. This showed that these were also from hosts featuring lifestyles intimately connected to bacterial species (e.g. bacterivores or living with parasitic bacteria). | |||
These data were interpreted to suggest that the Fanzor proteins, '''FZ1''' and '''FZ2''', were originally acquired from bacterial hosts, possibly twice <ref name=":51" /><ref>{{#pmid:37756409}}</ref>. It should be noted that those '''Fanzor2''' proteins which have similar spacing to TnpB ([[:File:FigIS200 605 65.png|Fig. IS200.65]]) and were attributed a eukaryotic association are now thought to be misclassified prokaryotic proteins <ref name=":47" />. Moreover, Yoon et al.,<ref name=":47" /> could find no support for independent evolution of '''Fanzor1''' from prokaryotic '''Fanzor1-like''' derivatives. | |||
TnpB, Fanzor and Cas12a proteins show a similar domain arrangement with increasing complexity, from the closely related TnpB (such as that of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']) and '''Fanzor2''' proteins ([[:File:FigIS200 605 68.png|Fig. IS200.68]]), through an expansion of the [[wikipedia:Helix-turn-helix|HTH]] (Rec) region in the '''Fanzor1''' derivatives, to the extensive expansion found in the Cas12a proteins ([[:File:FigIS200 605 68.png|Fig. IS200.68]]). | |||
<br /> | |||
[[File:FigIS200 605 68.png|center|thumb|680x680px|'''Fig. IS200.68. Fanzor Domain Structure.''' Schematic of Fanzor proteins showing the relative positions of the different functional motifs and domains from Saito et al., <ref name=":51" />. '''Top:''' AsCas12a. '''Middle:''' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] and the related '''Fanzor2''' proteins. '''Bottom:''' '''Fanzor1''' proteins. [[wikipedia:RuvABC|RuvC]] segments '''I''', '''II''' and '''III''' [green] with the catalytic residues indicated above; [[wikipedia:Zinc_finger|Zinc Finger]] nuclease [red]; Arginine rich helix [blue]; Wedge domain [yellow]; Helical bundle [grey]. Note the N-terminal extension [white] in the '''FZ2''' proteins.]] | |||
<br /> | |||
====Fanzor2 and/or Fanzor1 may have evolved from an IS''607'' ancestor==== | |||
It has been proposed that Fanzor evolved from a clade of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-related elements <ref name=":47" /> with an unusual active site configuration. Interestingly, IS identified in the Mimi [[wikipedia:Nucleocytoviricota|NCLDV]], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISvMimi_1 ISvMimi_1] and [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISvMimi_2 ISvMimi_2] in ISfinder ([https://www.ncbi.nlm.nih.gov/nuccore/NC_006450.1/ NC_006450]), had already been identified as related to [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] based on their transposase, ''tnpA'', genes but carry a ''tnpB''-like downstream gene which is much longer than typical ''tnpB'' <ref name=":49" /><ref name=":50">{{#pmid:19036122}}</ref>. | |||
Two features distinguish TnpB and Fanzors in spite of their sharing similar domain organizations (<ref name=":47" />; summarized in [[:File:FigIS200 605 69A.png|Fig. IS200.69]] '''A'''): firstly, the Fanzor [[wikipedia:RuvABC|RuvC]]1 catalytic '''D''' is followed almost exclusively by a proline ('''DPG'''; [[:File:FigIS200 605 55.png|Fig. IS200.55]]) whereas in TnpB there is typically a hydrophobic residue, φ (where φ can be: I, L, F, W, Y and M), instead (DφG; [[:File:FigIS200 605 55.png|Fig. IS200.55]]; [[:File:FigIS200 605 35i.png|Fig. IS200.35]]; [[:File:FigIS200 605 36.png|Fig. IS200.36]]); secondly, TnpB [[wikipedia:RuvABC|RuvC]]2 typically contains a catalytic glutamate situated ∼50 residues upstream of the [[wikipedia:Zinc_finger|ZF motif]], which they call E<sub>can</sub> for canonical ([[:File:FigIS200 605 35i.png|Fig. IS200.35]]; [[:File:FigIS200 605 36.png|Fig. IS200.36]]; [[:File:FigIS200 605 58.png|Fig. IS200.58]]), whereas in the Fanzor [[wikipedia:RuvABC|RuvC]]2 this glutamate is six residues upstream of the [[wikipedia:Zinc_finger|ZF motif]], which they call E<sub>alt</sub> for alternative ([[:File:FigIS200 605 55.png|Fig. IS200.55]]; [[:File:FigIS200 605 59.png|Fig. IS200.59]]). Yoon et al., <ref name=":47" /> also suggest that Fanzors with the '''E''' spacing typical of TnpB and previously interpreted as novel Fanzor subtypes <ref name=":35" /><ref name=":51" /> appear to be prokaryotic TnpBs that had been mis-annotated as eukaryotic '''Fanzor2'''. | |||
[[File:FigIS200 605 69A.png|center|thumb|680x680px|'''Fig. IS200.69A. Alignment of the [[wikipedia:RuvABC|RuvC]] segments of Fanzor and TnpB. Top:''' Domain structure of TnpB<sub>Dra2</sub>. '''Bottom:''' Catalytic amino acids in each of the [[wikipedia:RuvABC|RuvC]] segments. [[wikipedia:RuvABC|RuvC]]II and [[wikipedia:RuvABC|RuvC]]III are shown separated by a [[wikipedia:Zinc_finger|Zn finger]] represented by a single C residue with a yellow background. The distances in residues between the [[wikipedia:Zinc_finger|Zn finger]] and the catalytic E in the [[wikipedia:RuvABC|RuvC]]II domain are shown below. Fanzors and TnpBs are differentiated by the DPG and DφG motifs (where φ is a hydrophobic residue). The E position in the TnpB is called E<sub>can</sub> while that in the Fanzors is called E<sub>alt</sub>. The names of the different elements are shown on the left and their respective groups on the right.]] | |||
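The two distinguishing RuvC features described above can be summarized in a short Python sketch. This is only an illustration of the stated criteria, not the procedure used by Yoon et al.: the function names, the labels and the distance cut-offs (<code>alt_max</code>, <code>can_min</code>) are assumptions chosen to bracket the "six residues" and "∼50 residues" spacings given in the text.

```python
# Hydrophobic residues (phi) listed in the text for the TnpB D-phi-G motif.
HYDROPHOBIC = frozenset("ILFWYM")


def ruvc1_motif(residue_after_d):
    """Name the RuvC-I motif from the residue that follows the catalytic D."""
    residue = residue_after_d.upper()
    if residue == "P":
        return "DPG"    # typical of Fanzors
    if residue in HYDROPHOBIC:
        return "DphiG"  # typical of TnpB
    return "other"


def ruvc2_glutamate(e_to_zf_distance, alt_max=10, can_min=30):
    """Name the RuvC-II glutamate position from its distance upstream of the ZF motif.

    alt_max and can_min are assumed cut-offs, not published values.
    """
    if 0 < e_to_zf_distance <= alt_max:
        return "E_alt"  # about six residues upstream in Fanzors
    if e_to_zf_distance >= can_min:
        return "E_can"  # about 50 residues upstream in TnpB
    return "ambiguous"


def describe(residue_after_d, e_to_zf_distance):
    """Return (RuvC-I motif, RuvC-II glutamate type, tentative label)."""
    motif = ruvc1_motif(residue_after_d)
    glutamate = ruvc2_glutamate(e_to_zf_distance)
    if motif == "DPG" and glutamate == "E_alt":
        label = "Fanzor-like"
    elif motif == "DphiG" and glutamate == "E_can":
        label = "TnpB-like"
    else:
        label = "unassigned"  # mixed or unusual profiles need a phylogenetic analysis
    return motif, glutamate, label


fanzor_example = describe("P", 6)
tnpb_example = describe("L", 50)
mixed_example = describe("L", 6)
```

Mixed profiles are deliberately left unassigned here, since placing such proteins requires the tree-based analysis rather than a motif lookup.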
<br /> | |||
====Fanzor1 may have evolved from Fanzor2==== | |||
However, when about 800 TnpB homologues obtained by database mining were used to create a [[wikipedia:Maximum_likelihood_estimation|maximum likelihood tree]] based on their [[wikipedia:RuvABC|RuvC]] features, Yoon et al., <ref name=":47" /> observed a clear separation between [[wikipedia:RuvABC|RuvC]] II E<sub>can</sub> (TnpB) and E<sub>alt</sub> (Fanzor) containing sequences ([[:File:FigIS200 605 69B.png|Fig. IS200.69]] '''B'''), with those carrying E<sub>alt</sub> further distributed into two clades with [[wikipedia:RuvABC|RuvC]] I DφG (TnpB) or DPG (Fanzor). A small number of TnpB-like proteins, mainly from cyanobacteria, were observed to be closely related to Fanzor2. They are associated with [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] TnpA proteins which are also closely related to those associated with Fanzor2, and were called ‘'''''pro-Fanzors'''''’ ([[:File:FigIS200 605 69B.png|Fig. IS200.69]] '''B'''). | |||
Yoon et al.,<ref name=":47" /> could not find compelling evidence that '''Fanzor1''' was acquired directly from a prokaryote and it seemed possible that it might have evolved from an acquired '''Fanzor2''' protein. Bao and Jurka <ref name=":35" /> observed that '''Fanzor1''' can be found in a number of eukaryotic transposons such as Tc/''mariner'' and Helitrons and, associated with what the authors call IS''4''-type Tpases, in ESvi1B and ESv2 (from the brown alga ''[[wikipedia:Ectocarpus_siliculosus|Ectocarpus siliculosus]]''; see Filee et al., <ref name=":49" /> <ref name=":50" /> for an early description of [[wikipedia:Nucleocytoviricota|NCLDV]]-associated IS). However, it was thought that '''Fanzor2''' was limited to prokaryotic [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-like elements <ref name=":35" />. To examine this further, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] derivatives encoding Fanzor1 and eukaryotic transposons carrying Fanzor2 were searched for in the updated library <ref name=":47" />. | |||
[[File:FigIS200 605 69B.png|center|thumb|680x680px|'''Fig. IS200.69B. Maximum likelihood tree of TnpBs and Fanzors annotated by their [[wikipedia:RuvABC|RuvC]] features.''' The segments indicate Fanzor1 (blue), Fanzor2 (green) and '''''pro-Fanzor''''' (red). TnpB are located on the left of the tree. The outer circle shows those examples which include a DPG motif in [[wikipedia:RuvABC|RuvC]]I (blue) or DφG (white). The second circle indicates those examples with the [[wikipedia:RuvABC|RuvC]] II E<sub>alt</sub> motif (orange) or E<sub>can</sub> motif (white). Yoon et al., <ref name=":47" />.]] | |||
[[File:FigIS200 605 69C.png|center|thumb|680x680px|'''Fig. IS200.69 C. Various TnpB and Fanzor encoding TE.''' Examples of TE of prokaryotic and eukaryotic origin carrying either TnpB or fanzor proteins together with an upstream transposase. '''Left''' (LE, red) and '''Right''' (RE, blue) ends are indicated. The ends were defined by the sequences of empty sites, and these also revealed the number and sequence of flanking direct repeats (green). Yoon et al., <ref name=":47" />]] | |||
As expected, the results identified a number of '''Fanzor1''' proteins associated with eukaryotic transposons but failed to identify [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']-associated Fanzor1. On the other hand, a number of '''Fanzor2''' proteins were found to be associated with non-[https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] eukaryotic elements of different families ([[:File:FigIS200 605 69C.png|Fig. IS200.69]] '''C''') from organisms such as the alga ''[[wikipedia:Chloropicon|Chloropicon primus]]'' and various mollusc species including ''[[wikipedia:Hard_clam|Mercenaria mercenaria]]''. These were called '''Fanzor2*''' to distinguish them from the “''prokaryotic''” '''Fanzor2'''. | |||
The authors proposed that an ancestral '''Fanzor2''' gave rise to '''Fanzor1''' based on: the conserved [[wikipedia:RuvABC|RuvC]] profiles; the absence of prokaryotic '''Fanzor1''' proteins; and the observation that distantly related '''Fanzor2''' proteins were found in different eukaryotic transposons, suggesting that they had been captured several times. Moreover, the absence of a detectable close evolutionary link between Fanzors and IS''200''/IS''605'' TnpBs is noteworthy since [[IS Families/IS200 IS605 family#General|IS''200''/IS''605'' family]] members appear to be more abundant than [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] derivatives <ref name=":19" />. This suggested that specific features of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] TnpB might have facilitated their evolution in eukaryotes <ref name=":47" /> (see [[IS Families/IS607 family#Is IS607 TnpB the Ancestor of Fanzor Proteins.3F|Is IS''607'' TnpB the Ancestor of Fanzor Proteins?]]). | |||
Although the Fanzor proteins are very widely distributed in the eukaryotic world ([[:File:FigIS200 605 67.png|Fig. IS200.67]]) and are sometimes found associated with potential transposable elements and in multicopy, their function has yet to be clearly established. | |||
====Fanzor Activity==== | |||
Saito et al.,<ref name=":51" /> used two FZ1 examples (from the soil fungus ''[[wikipedia:Spizellomyces_punctatus|S. punctatus]]'', SpuFz1, and the alga ''[[wikipedia:Guillardia|G. theta]]'', GtFz1) and two FZ2 examples (from ''[[wikipedia:Naegleria_lovaniensis|N. lovaniensis]]'', NlovFz2, and the marine mollusk ''[[wikipedia:Hard_clam|M. mercenaria]]'', MmeFz2) for functional studies. | |||
Comparison of [[wikipedia:AlphaFold|Alphafold]] structural predictions of FZ proteins with known structures of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] (PDB: [https://www.rcsb.org/structure/8H1J 8H1J]) and AsCas12a (PDB: [https://www.rcsb.org/structure/5B43 5B43]) showed that despite a large sequence and length variation ([[:File:FigIS200 605 58.png|Fig. IS200.58]]) all six proteins share a common “core” domain including a WED and [[wikipedia:RuvABC|RuvC]] region. | |||
A predicted active catalytic site formed by positively charged residues is found in the [[wikipedia:RuvABC|RuvC]] region in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] TnpB, SpuFz1, GtFz1, NlovFz2 and MmeFz2. However, the core regions include various family-specific insertions. The Cas12 protein, AsCas12a (1307aa), carries a 900aa insertion in the WED domain, the REC region, which forms a protective channel for the spacer–target RNA-DNA heteroduplex region and is likely to be involved in R-loop formation. It is reduced to three helices (100aa) in TnpB<sub>''Dra2''</sub>, NlovFz2 and MmeFz2 (see [[:File:FigIS200 605 34.png|Fig. IS200.34]]) and probably serves the same function. NlovFz2 and MmeFz2 are very similar to TnpB<sub>''Dra2''</sub>, but each harbors a unique amino-terminal disordered region, of 96aa in NlovFz2 and 61aa in MmeFz2. | |||
Characterization of the guide RNA system followed relatively established procedures: identification of the associated ωRNAs from the RE region of the FZ orthologues expressed in ''[[wikipedia:Saccharomyces_cerevisiae|S. cerevisiae]]'' by small [[wikipedia:RNA-Seq|RNA-seq]] for RNPs and secondary RNA structure prediction. | |||
Initially, SpuFz1 <ref name=":35" />, a single open reading frame (ORF) flanked by well-conserved 30bp terminal repeats was used for further functional studies. ''[[wikipedia:Spizellomyces_punctatus|S. punctatus]]'' DAOM carries 42 copies of this 2.1-kilobase pair Spu1 transposon: 19 with full length or remnants and 134 lacking the FZ ''orf'' which they call “'''''ghosts'''''” but which are equivalent to previously described '''MITES'''. [[wikipedia:RNA-Seq|RNA-seq]] revealed an 88–90-nt ncRNA species downstream of Fz in several ''[[wikipedia:Spizellomyces_punctatus|S. punctatus]]'' loci which could also be identified in pulldown experiments in [[wikipedia:Saccharomyces_cerevisiae|''Saccharomyces'' ''cerevisiae'']]. These included 14–15 nt of variable sequences beyond the conserved 75-nt region at the 3′ end. This was repeated with GtFz1 and with an Fz1 locus from ''G. theta'', four ''Fz2'' loci from ''[[wikipedia:Naegleria_lovaniensis|N. lovaniensis]]'', and two Fz2 loci from ''[[wikipedia:Hard_clam|M. mercenaria]]''. | |||
Purified complexes of all four proteins (SpuFz1, GtFz1, NlovFz2 and MmeFz2) with their ωRNA were used in an assay (e.g. [[:File:FigIS200 605 40.png|Fig. IS200.40]]) to define the associated TAM sequences (SpuFz1, CATA; GtFz1, TTAAN; NlovFz2, CCG; and MmeFz2, TAG) and to identify cleavage points on the non-target and target strands (NTS and TS) in a target DNA. These varied according to the protein, generating: 5’ overhangs (SpuFz1), 5’ or 3’ overhangs or blunt ends (GtFz1), blunt ends (NlovFz2) or 3’ overhangs (MmeFz2). Finally, the structure of the SpuFz1 RNP complex with its target DNA was obtained by cryo-EM and, as expected, was found to consist of an SpuFz1 monomer associated with a single ωRNA molecule together with the target DNA. | |||
====Functional Relationship Between Fanzor Evolution and IS''607'' TnpB==== | |||
Yoon et al., <ref name=":47" /> suggested that specific features of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] TnpB might have facilitated their evolution in eukaryotes. The most obvious major difference is that [[IS Families/IS200 IS605 family#General|IS''200''/IS''605'' family]] members use a single strand circular DNA intermediate whereas [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] uses a double strand circular intermediate. | |||
However, they also noted that while [[IS Families/IS200 IS605 family#General|IS''200''/IS''605'' family]] members insert downstream of a short motif ([[:File:FigIS200 605 3.png|Fig. IS200.3]], [[:File:FigIS200 605 5.png|Fig. IS200.5]], [[:File:FigIS200 605 7.png|Fig. IS200.7]]) <ref name=":0" /><ref name=":38" />, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] family members typically insert via recombination between matching dinucleotide motifs ([[:File:IS607.6.png|Fig. IS607.6]]) <ref name=":45" /><ref name=":48" />. This means that for IS''200''/IS''605'' members, the first nucleotide after the right end (or the ‘right-flanking nucleotide’) is variable whereas in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] members it is fixed ('''G''' in Fig. [[:File:IS607.6.png|IS607.6]]). | |||
The nucleotide abutting the IS''200''/IS''605'' right end corresponds to the start of the guide sequence in TnpBs <ref name=":37" /> and it was reasonable to determine whether this is also true for [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607'']. This was investigated using TAM depletion assays, which use a target plasmid carrying a library of potential TAM sequences abutting a guide sequence (e.g. [[:File:FigIS200 605 44.png|Fig. IS200.44]] '''ii''') and in which TnpB recognition of a TAM/target site results in depletion of the TAM-carrying bacterial sub-population. Using re(ω)RNA variable boundary mutants of IS''Xfa1'', an [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] member from ''[[wikipedia:Xylella_fastidiosa|Xylella fastidiosa]]'' (see also Žedaveinytė et al., <ref name=":44" />), it was observed that the RE-abutting G nucleotide behaved as part of the re(ω)RNA scaffold rather than as part of the guide, as has also been observed by Žedaveinytė et al. for an IStron [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] derivative <ref name=":47" /><ref name=":44" />. | |||
[[File:FigIS200 605 69D.png|center|thumb|680x680px|'''Fig. IS200.69D. [[IS Families/IS607 family|IS''607'' Family]], IS''Xfa1'' Flanking reRNA Nucleotide.''' Stem loop structure is taken from [[:File:FigIS200 605 62.png|Fig. IS200.62]]. The figure shows those nucleotides involved in pseudoknot ('''PK''') formation (blue circles). The arrow shows the flanking G nucleotide which is part of the “core” sequence involved in the integration process. Experiments in which this was removed were used to determine that it forms part of the re(ω)RNA scaffold rather than the guide sequence <ref name=":47" /><ref name=":44" />.]] | |||
Analysis of IS''Xfa1'' reRNA by Žedaveinytė et al., <ref name=":44" /> ([[:File:FigIS200 605 53.png|Fig. IS200.53]]) and by Yoon et al., <ref name=":47" /> gave similar results with potential stem-loops and, like other re(ω)RNAs, including a pseudoknot essential for activity. Based on similarities and differences between the reRNAs and from structural models, it was proposed ([[:File:FigIS200 605 59.png|Fig. IS200.59]] '''E''') that TnpB from an ancestor of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS607 IS''607''] gave rise to '''Fanzor1''' (for example, that found in [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISvMimi_1 IS''vMimi1'']) via an intermediate (Pro-fanzor) and that '''Fanzor2''' derived from a eukaryotic '''Fanzor1'''. | |||
[[File:FigIS200 605 69E.png|center|thumb|680x680px|'''Fig. IS200.69E. Proposed Pathway of Fanzor Evolution.''' The top of the figure shows the pathway with examples of microbial TnpB, Pro-Fanzor and Fanzor2 together with eukaryotic '''Fanzor2*''' and '''Fanzor1''' in the blue boxes. Below, the associated type of [[wikipedia:RuvABC|RuvC]]I and [[wikipedia:RuvABC|RuvC]]II motifs are indicated. The cartoon below shows a schematic domain structure of these proteins indicating the expansions, contractions and insertions occurring. The color scheme is that shown in [[:File:FigIS200 605 37.png|Fig. IS200.37]], [[:File:FigIS200 605 47.png|Fig. IS200.47]] and [[:File:FigIS200 605 58.png|Fig. IS200.58]]. From Yoon et al.,<ref name=":47" />.]] | |||
== Y1 transposase domestication == | |||
There are many examples of eukaryotic transposases whose activities have been appropriated to perform various cellular functions (see <ref>{{#pmid:16937363}}</ref><ref>{{#pmid:23935529}}</ref><ref>{{#pmid:24348275}}</ref>). However, the very few examples of this domestication for prokaryotic enzymes concern Y1 transposases. | |||
====TnpA<sub>REP</sub> and REP/BIME==== | |||
Recently, a new clade of Y1 transposases (TnpA<sub>REP</sub>) was found associated with REP/BIME sequences in structures called '''REPtrons''' <ref name=":192">{{#pmid:20085626}}</ref><ref name=":272">{{#pmid:22199259}}</ref> ([[:File:FigIS200 605 70.png|Fig. IS200.70]] '''A'''). In spite of their compact size, bacterial genomes carry many repetitive sequences, often important for genome function and evolution. Among them, '''R'''epetitive '''E'''xtragenic '''P'''alindromic sequences (or REPs) are short DNA repeats of 20-40 bp that can form stem-loop structures preceded by a conserved tetranucleotide ('''GTAG''' or '''GGAG''') ([[:File:FigIS200 605 71.png|Fig. IS200.71]]). REPs are found in intergenic regions in many bacterial species, particularly in proteobacteria, at high copy number <ref name=":192" /><ref name=":54">{{#pmid:20528935}}</ref><ref name=":202">{{#pmid:23758774}}</ref>. | |||
There are nearly 590 copies in ''[[wikipedia:Escherichia_coli|Escherichia coli]]'' K12<ref name=":402">{{#pmid:2092362}}</ref> ([[:File:FigIS200 605 42.png|Fig. IS200.42]]) and up to 2200 copies in ''[[wikipedia:Pseudomonas|Pseudomonas]]'' sp GM79<ref name=":202" />. REPs can exist as individual units but can cluster in more complex structures called '''B'''acterial '''I'''nterspersed '''M'''osaic '''E'''lements (BIME). These are composed of two individual REPs in inverse orientation (REP and iREP) separated by a short linker of variable length. BIME are often found in consecutive tandem copies ([[:File:FigIS200 605 70.png|Fig. IS200.70]]). Several roles have been attributed to these sequences including genome structuring, post-transcriptional regulation and genome plasticity. REPs are known to interact with protein partners such as [[wikipedia:Bacterial_DNA_binding_protein#IHF|Integration Host Factor]]<ref>{{#pmid:8262044}}</ref>, [[wikipedia:DNA_gyrase|DNA gyrase]]<ref>{{#pmid:9427406}}</ref> and [[wikipedia:DNA_polymerase#Pol_I|DNA polymerase I]]<ref>{{#pmid:2197600}}</ref>. | |||
REPs also increase mRNA stability and can act as transcriptional terminators <ref name=":54" /> or as targets for different IS <ref name=":22" /><ref>{{#pmid:16563168}}</ref>. REP sequences can also downregulate translation of upstream genes in a manner dependent on trans-translation, but only if they lie within 15 nt of a termination codon. It has been suggested that REPs can stall ribosomes, leading to mRNA cleavage and induction of the trans-translation process<ref>{{#pmid:25891074}}</ref>. Recombination at REP sequences has also been shown to be involved in the formation of F’ plasmid derivatives (the classic [[wikipedia:F-plasmid|F plasmid]] carrying various portions of the chromosome; [[:File:FigIS200 605 72.png|Fig. IS200.72]]) from Hfr strains<ref>{{#pmid:12511513}}</ref>. However, the origin of REPs and their dissemination mechanisms are poorly understood. | |||
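The sequence features described above (a conserved '''GTAG''' or '''GGAG''' tetranucleotide followed by a 20-40 bp unit able to form an imperfect stem-loop) can be turned into a simple candidate scan. The following is only an illustrative sketch: the function name, the minimum stem length and the mismatch tolerance are assumptions chosen for the example and do not reproduce the methods used in the cited REP surveys.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_rep_candidates(genome, tetras=("GTAG", "GGAG"),
                        min_len=20, max_len=40,
                        min_stem=8, max_mismatch=1):
    """Return (tetra_pos, unit_start, unit_end, mismatches) tuples.

    A candidate is a tetranucleotide followed by a unit of 20-40 nt whose
    two ends can pair as a stem; a small number of mismatches is tolerated
    because REP stems are imperfect.
    """
    hits = []
    for i in range(len(genome) - 3):
        if genome[i:i + 4] not in tetras:
            continue
        start = i + 4
        for length in range(min_len, max_len + 1):
            unit = genome[start:start + length]
            if len(unit) < length:
                break  # ran off the end of the sequence
            left_arm = unit[:min_stem]
            right_arm = revcomp(unit[-min_stem:])
            mismatches = sum(a != b for a, b in zip(left_arm, right_arm))
            if mismatches <= max_mismatch:
                hits.append((i, start, start + length, mismatches))
                break  # keep the shortest unit for this tetranucleotide
    return hits

# Synthetic test sequence (not a real REP): GTAG + 10 nt arm + loop + arm.
arm = "GCCTGATGCG"
genome = "ACACACAC" + "GTAG" + arm + "TTTT" + revcomp(arm) + "ACACACAC"
candidates = find_rep_candidates(genome)
```

A real survey would also need to score both strands and to distinguish REP from iREP orientation, which this sketch does not attempt.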
[[File:FigIS200 605 70.png|center|thumb|680x680px|'''Fig. IS200.70.''' '''Top''': '''(i)''' Representation of two categories of REP structures in ''[[wikipedia:Escherichia_coli|E. coli]]''/''[[wikipedia:Shigella|Shigella]]'' with mismatches in the hairpin stem in orange and light blue; the violet box represents the conserved tetranucleotide '''GTAG'''. Corresponding iREP structures are in red and dark blue, where the green box represents the complementary tetranucleotide CTAC. '''(ii)''' Structure of BIME: REP and iREP separated by linkers C or D. '''BIME''' are frequently found as consecutive copies. '''Bottom''': '''(iii)''' Examples of REPtrons from some representative ''[[wikipedia:Escherichia_coli|E. coli]]'' strains. TnpA<sub>REP</sub> is shown in gray, the flanking genes ''yafL'' and ''fhiA'' in green and in violet, respectively. Arrows represent the direction of transcription.]] | |||
<br /> | |||
[[File:FigIS200 605 71.png|center|thumb|680x680px|'''Fig. IS200.71. Rep/BIME distribution round the ''E. coli'' K12 chromosome.''' Data is from Bachellier et al., <ref name=":53">{{#pmid:10673002}}</ref> . The number of symbols at a given map position indicates the number of tandem copies of each element. Details on the sequence of the different BIMEs can be found in Bachellier et al., <ref name=":53" /> .]] | |||
Although more complex, REPtrons are reminiscent of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS200 IS''200''] group members ([[:File:FigIS200 605 70.png|Fig. IS200.70]]). However REPtrons do not appear to be mobile and, in general, a single copy of a given REPtron co-exists with numerous corresponding REP/BIME and genomes may harbor several distinct REPtrons<ref name=":192" /><ref name=":202" />. It has therefore been suggested that REP/BIMEs represent a special type of non-autonomous transposable element mobilizable by TnpA<sub>REP</sub>. | |||
In vitro analysis of REPtrons: Analysis of ''[[wikipedia:Escherichia_coli|E. coli]]'' REPtron activity in vitro has shown that, like TnpAIS''200''/IS''605'', TnpA<sub>REP</sub> strictly requires single stranded REP/BIME DNA substrates and is strand specific: only REP can be processed, whereas iREP is refractory to cleavage <ref name=":272" />. Purified ''[[wikipedia:Escherichia_coli|E. coli]]'' TnpA<sub>REP</sub> promotes ssREP cleavage (in the linker sequences either 3’ or 5’ to the REP structure) and rejoining, and this activity requires the conserved tetranucleotide '''GTAG''' and the bulge in the middle of the REP stem <ref name=":272" /><ref name=":212">{{#pmid:22885300}}</ref>. Cleavage in vitro is less specific than that of TnpAIS''200''/IS''605'' and occurs at a '''CT''' dinucleotide. | |||
In contrast to TnpAIS''608'' and TnpAIS''Dra2'', ''[[wikipedia:Escherichia_coli|E. coli]]'' TnpA<sub>REP</sub> is a monomer in solution and in the crystal structure<ref name=":212" />. Moreover, in the co-crystal structure, the short C-terminal tail is inserted into the active site blocking access to an ssDNA. It may, therefore, play a regulatory role in the activity. Indeed C-terminal truncation of TnpA<sub>REP</sub> resulted in increased cleavage activity relative to the full-length protein ''in vitro''. The biochemical and structural analysis suggested that the GTAG 5’ to the foot of the REP hairpin may play a similar role to the guide sequences GL/R in IS''200''/IS''605''. | |||
Moreover, structural data also highlighted numerous specific contacts between TnpA<sub>REP</sub> and '''GTAG''', explaining its importance in the activity and clearly distinguishing TnpA<sub>REP</sub> from TnpAIS''200''/IS''605'', which do not directly contact the guide sequences ('''''Cleavage site recognition'''''). The way by which TnpA<sub>REP</sub> promotes REP/BIME proliferation through their host genomes remains to be determined. | |||
[[File:FigIS200 605 72.png|center|thumb|680x680px|'''Fig. IS200.72. Rep sequences involved in the formation of F’ primes.''' Adapted from Slechta and Roth 2003. Integration of plasmid F and its excision to generate F’128. Filled arrows, genes; open arrows, insertion sequences. Gray arrows designate nested or recombinant insertion sequences. The arc inside the map at the right indicates the extent of pOX38, a minimal [[wikipedia:F-plasmid|F plasmid]] derivative still capable of both vegetative replication and conjugation. Insertion occurs by homologous recombination between an IS''3'' copy located on the chromosome and the resident IS''3'' copy on plasmid F. Excision occurs between two Rep sequences located downstream of the ''mhp'' locus and upstream of the ''dinB'' locus on the ''[[wikipedia:Escherichia_coli|E. coli]]'' chromosome.]]<br /> | |||
== IS''200'' Regulation and ''Salmonella'' Pathogenicity == | |||
Although IS''200'' is present in moderate to high copy number in certain bacteria (e.g. ''[https://pt.wikipedia.org/wiki/Salmonella Salmonella typhimurium]'' 5-12 copies and ''[https://pt.wikipedia.org/wiki/Salmonella Salmonella typhi]'' 26 copies; and the IS''200'' family member, [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541A IS''1541''], in ''[https://pt.wikipedia.org/wiki/Yersinia_pestis Yersinia pestis]'' >50 copies), it appears to be recalcitrant to transposition <ref name=":55">{{#pmid:15179601}}</ref> ([[IS Families/IS200-IS605 family#The IS200 group|The IS''200'' group]]) and exists in a “dormant” <ref name=":56">{{#pmid:28335027}}</ref> state. Although samples taken 30 years apart showed no changes in IS''200'' patterns <ref name=":55" /><ref>{{#pmid:2546037}}</ref><ref>{{#pmid:8779575}}</ref>, there is evidence that the closely related [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541A IS''1541''] element is active in facilitating mouse infection by ''[https://pt.wikipedia.org/wiki/Yersinia_pestis Y. pestis]'' <ref name=":56" />. The high copy number, low transposition frequency and little accumulation of mutations would imply the existence of a robust selective pressure to maintain the IS <ref name=":60">{{#pmid:29120256}}</ref>. | |||
====Transposase expression is regulated by an antisense RNA==== | |||
IS''200'' elements express two small RNA molecules, a tnpA-encoding mRNA, mRNA<sub>tnp</sub> ([[:File:FigIS200 605 73.png|Fig. IS200.73]]) and a small antisense RNA, asRNA. Regulation of transposase expression occurs at several levels: IS''200'' '''LE''' includes a pair of inverted repeats which constitute a strong, bidirectional, ρ-dependent terminator (Fig. IS200.73a and b), reducing the entry of impinging transcription into the IS by ~85%; in addition, '''LE'''-proximal mRNA secondary structure sequesters the [[wikipedia:Shine–Dalgarno_sequence|Shine-Dalgarno sequence (SD)]] ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''b''' and '''d'''), which inhibits ''tnpA'' translation by a factor of 20; and a small antisense RNA, asRNA (also called art200; <ref name=":57">{{#pmid:26044710}}</ref>), by pairing with mRNA<sub>tnp</sub>, reduces translation 15 fold. A promoter for asRNA, P<sub>A</sub>, was identified which, when mutated, reduced art200 expression in an ''E. coli'' host. Additionally, direct binding of the chaperone RNA binding protein Hfq to a region upstream of the ribosome binding site also occludes ribosome binding <ref name=":57" />. | |||
TnpA expression has been investigated using a ''lacZ'' translational fusion (codon 10) with ''tnpA'' (codon 60; [[:File:FigIS200 605 73.png|Fig. IS200.73]] '''c'''). Reducing art200 expression using a promoter mutant (P<sub>A-6</sub>) led to a significant increase in ''lacZ'' expression (~13 fold compared to the wildtype IS''200'' sequence). Also, when a ''lacZ'' fusion with a wildtype IS''200'' P<sub>A</sub> sequence was challenged with constructions carrying a 5’ ''tnpA'' mRNA segment (nts 45-298; tnpA<sub>trunc-wt</sub>; [[:File:FigIS200 605 73.png|Fig. IS200.73]] '''c''') complementary to asRNA and under control of a moderate (P<sub>tet</sub>) or a strong (P<sub>T7</sub>) promoter, a significant increase in ''lacZ'' expression occurred. This presumably resulted from titration and degradation of art200 by overproduction of its tnpA RNA complement. Use of a tnpA<sub>trunc-M1</sub> mutant, unable to pair with art200, failed to show this response, indicating that RNA-RNA pairing is necessary. Moreover, supplying art200 ''in trans'', produced from its own promoter, to the ''lacZ'' translational fusion with an IS''200'' P<sub>A-6</sub> mutation also reduced ''lacZ'' expression. | |||
[[File:FigIS200 605 73.png|center|thumb|720x720px|'''Fig. IS200.73. Organization and Expression of IS''200.'' a)''': Organization of IS''200''. Secondary structures in '''LE''' (red) and '''RE''' (blue), transposase, tnpA promoter (pL), ribosome binding site (RBS), and ''tnpA'' start and stop codons (AUG and UAA) are indicated as is an inverted repeat which constitutes a ρ-dependent transcription terminator. '''(i)''' DNA top strand with perfect palindromes at '''LE''' and '''RE''' in red and blue, interior stem-loop in black, '''(ii)''' RNA stem-loop structure in transcript originated from pL. '''b)''' Transcriptional organization. ''tnpA'' transcription (red) originates at about nt 40, but promoter elements are not defined; the ‘left end’ contains two internal inverted repeats (opposing arrows), one of which acts as a transcription terminator (nts 12–34). The second (nts 69–138), in the 5’UTR of the tnpA mRNA, sequesters the [[wikipedia:Shine–Dalgarno_sequence|Shine-Dalgarno sequence]]. IS''200'' in ''[[wikipedia:Salmonella|Salmonella]]'' also expresses a 90 nt sRNA (blue; called asRNA, art200, or STnc490) with perfect complementarity to the 5’UTR and the first three codons of ''tnpA''. The transcription start site and 3’ end for art200 in ''[[wikipedia:Salmonella|Salmonella]]'' (derived from RNA-Seq experiments) are shown. art200 promoter elements were predicted and their mutation was found to reduce or eliminate art200 expression. '''c)''' IS''200'' ''lacZ'' translational fusion (TLF) constructed to measure tnpA expression '''(i)'''. art200 RNA levels were manipulated by: introducing the P<sub>A-6</sub> mutations into the TLF. '''(ii)''' co-expression of a ''tnpA'' RNA segment (IS''200'' nts 45–298) under control of a constitutive, P<sub>tet</sub>, or strong, P<sub>T7</sub>, promoter to titrate art200. '''(iii)''' co-expression of art200 ''in trans'' from its own promoter. tnpA expression was under the control of its native regulatory elements. 
'''d)''' mFold was used to produce structures for art200 and tnpA<sub>1–173</sub> with structural constraints from footprinting. art200 residues with weak (green circle) or strong (green circle and asterisk) reduction in Pb<sup>2+</sup> sensitivity when mixed with tnpA<sub>1–173</sub>. Residues in tnpA<sub>1–173</sub> showing strong (red circles) decreases in RNase A or T1 reactivity, or strong increases (blue circles) in V1 reactivity on art200 addition. Two residues (−44 and −47) showed increased V1 sensitivity and decreased RNase A/T1 sensitivity (blue-red circles). Nucleotide changes present in M1 and M1’ versions of art200 and tnpA<sub>1–173</sub> respectively are shown in bold. Mutation M1 alters three nucleotides in the tip of ''tnpA'' preventing pairing with art200. Mutation M1’ in art200 restores complementarity. The positions of the '''LS''' and '''LS’''' mutations used in analysis of regulation of the ''invF'' virulence system are also shown. '''e)''' “''kissing''” and extension. Recognition of tnpA RNA by art200 is initiated at the tips of the secondary structures (left) and proceeds by base pairing to each side (right) leading to degradation. The sequestered Shine-Dalgarno sequence (SD) of ''tnpA'' is indicated with a box and the translation start codon (AUG) is shown. ]] | |||
====Transposase expression is regulated by an antisense RNA and Hfq==== | |||
The involvement of Hfq in art200 RNA/mRNA interaction was suspected from results obtained with asRNA (RNAout) of [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS10R IS''10''] from the transposon [https://tncentral.ncc.unesp.br/report/te/Tn10-AF162223 Tn''10''] in which Hfq was found to promote antisense pairing with the transposase RNA ([https://tncentral.ncc.unesp.br/report/te/Tn10-AF162223 Tn''10'']; <ref>{{#pmid:23510801}}</ref>). art200 RNA was identified from Hfq immunoprecipitation (Hfq-IP) data sets as an asRNA (called STnc490 by the authors) complementary to 90 nt of the 5’UTR (untranslated region upstream of the ''tnpA'' gene; [[:File:FigIS200 605 73.png|Fig. IS200.73]]) of IS''200'' in ''[https://pt.wikipedia.org/wiki/Salmonella Salmonella]'' <ref name=":58">{{#pmid:18725932}}</ref> <ref>{{#pmid:22538806}}</ref>. A similar RNA from the closely related ''[https://pt.wikipedia.org/wiki/Yersinia_pestis Yersinia pestis]'' [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS1541A IS''1541''] has also been identified <ref>{{#pmid:24040259}}</ref>. | |||
The involvement of Hfq in transposase (''lacZ'') expression ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''ci''') was confirmed by performing the ''[https://pt.wikipedia.org/wiki/Escherichia_coli E. coli]'' titration ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''cii''') and trans complementation experiments with art200 produced from its native promoter ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''ciii'''). Experiments carried out in isogenic ''hfq''<sup>+</sup> and ''hfq''<sup>-</sup> strains showed that: in the context of a wildtype P<sub>A</sub> sequence, LacZ expression was 5 fold higher in the absence of Hfq indicating that Hfq represses TnpA expression; the ''hfq''<sup>-</sup> and P<sub>A-6</sub> mutations act synergistically in derepressing TnpA expression; and art200 supplied ''in trans'' could repress the expression in the P<sub>A-6</sub> mutant independently of the Hfq status of the host <ref name=":57" />. | |||
Additionally, it was demonstrated that RNA pairing occurs ''in vitro''. Lead (Pb<sup>2+</sup>) acetate “footprinting” ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''d''') was carried out on a mixture of the two prefolded RNA molecules. The art200 secondary structure and its interaction with the 5’ fragment of transposase RNA were probed by RNA footprinting in the presence and absence of a ''tnpA'' RNA fragment (tnpA<sub>1-173</sub>; see [[:File:FigIS200 605 73.png|Fig. IS200.73]] '''d'''). Specific residues of each RNA became refractory to cleavage, indicating a transition from a single-stranded to a double-stranded state: a number of art200 residues showed reduced Pb<sup>2+</sup> sensitivity in the presence of tnpA<sub>1-173</sub> and, in tnpA<sub>1-173</sub>, certain residues showed strong decreases in reactivity to RNase A (which degrades single-stranded RNA at C and U residues) or T1 (specific for G residues in single-stranded RNA), or strong increases in reactivity to V1 (which cleaves base-paired nucleotides) on art200 addition. This occurred on the upper part of the art200 RNA stem loop ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''d''', left, green) on addition of the tnpA<sub>1-173</sub> RNA and on the upper part of the tnpA<sub>1-173</sub> stem loop ([[:File:FigIS200 605 73.png|Fig. IS200.73]] '''d''', right, red) on addition of art200 in a reciprocal experiment <ref name=":57" />. | |||
These observations are consistent with a model, shown in [[:File:FigIS200 605 73.png|Fig. IS200.73]] '''e''', in which the two RNA molecules initiate the interaction at the tips of the stem loops <ref name=":57" /> followed by propagation of base pairing to the left and right sides facilitated by Hfq. This base-pairing occludes the 30S ribosome binding site (as was demonstrated ''in vitro'') and inhibits TnpA expression ''in vivo'' <ref name=":56" /><ref name=":57" /> . | |||
It should be noted that even in the absence of asRNA, Hfq binds upstream of the ribosome binding site and prevents 30S binding directly to tnpA<sub>1-173</sub> RNA. | |||
====What is the impact of IS''200'' on its host genome?==== | |||
As a quiescent insertion sequence which carries no passenger genes, it was argued that IS''200'' probably does “not contribute transposition-dependent functions to the host” <ref name=":56" /> but its extreme stability over long time periods suggests that it might contribute important functions to its host. In a subsequent study, the [https://publish.uwo.ca/~haniford/ Haniford lab] raised the possibility that this function may be related to the RNA it expresses since small RNAs are known to play an important role in the control of bacterial cell processes, often facilitated by Hfq (e.g. <ref>{{#pmid:27057757}}</ref><ref>{{#pmid:26805574}}</ref><ref>{{#pmid:22922465}}</ref><ref>{{#pmid:25030700}}</ref><ref>{{#pmid:23666921}}</ref><ref>{{#pmid:27044921}}</ref><ref>{{#pmid:26609136}}</ref><ref>{{#pmid:25649688}}</ref><ref>{{#pmid:22965121}}</ref><ref>{{#pmid:21760622}}</ref>). | |||
Although there are multiple layers of regulation which lead to low levels of ''tnpA'' translation, ''tnpA'' expression is relatively significant. In addition to the maintenance of some IS''200''-driven transcription, art200 expression appears to be growth phase regulated, increasing during ''[https://pt.wikipedia.org/wiki/Salmonella S. typhimurium]'' transition into stationary phase in rich medium and in growth media which stimulate ''[https://pt.wikipedia.org/wiki/Salmonella '''S'''almonella]'' '''P'''athogenicity '''I'''sland (SPI) expression <ref name=":58" /><ref name=":57" />. Additionally, art200 expression increases in stationary-phase while tnpA RNA expression decreases ∼ 5-fold. | |||
====Which host genes might be regulated by IS''200'' RNA?==== | |||
One interesting possibility is that these RNAs are in some way involved in regulating various processes in the host cell <ref name=":56" /> and evidence was obtained from RNA-seq experiments that the tnpA 5’UTR RNA acts as a repressor of a number of host genes by base pairing. | |||
In these experiments, the levels of IS''200'' RNA in ''[https://pt.wikipedia.org/wiki/Salmonella S. typhimurium]'' were modified in various ways. By introducing a plasmid carrying a truncated tnpA mRNA derivative, tnpA<sub>trunc WT-255</sub> (including nt 1-255), highly expressed constitutively from a P<sub>tet</sub> promoter, the levels of art200 RNA could be reduced by titration and degradation of the base-paired tnpA mRNA and art200 RNA. RNA-seq revealed 187 genes whose transcription was altered under these conditions. To determine whether the effect was due to depletion of art200 or to high tnpA<sub>trunc WT-255</sub> levels, a mutant, tnpA<sub>trunc M1-255</sub>, which prevents initiation of pairing ('''M1''' in [[:File:FigIS200 605 73.png|Fig. IS200.73]] '''d''' and '''e'''), was used: art200 RNA-affected genes were expected to show differential expression with highly expressed tnpA<sub>trunc WT-255</sub> but not with tnpA<sub>trunc M1-255</sub>, while tnpA<sub>trunc</sub>-regulated genes would show differential expression with both tnpA<sub>trunc WT-255</sub> and tnpA<sub>trunc M1-255</sub> compared to the empty vector plasmid. | |||
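The logic used to separate the two classes of genes can be written as simple set operations. The sketch below is illustrative only: it assumes that lists of differentially expressed genes (each relative to the empty vector) are already available for the two constructs, and the function name and the placeholder gene names are hypothetical.

```python
def classify_genes(de_with_wt, de_with_m1):
    """Split differentially expressed genes by their likely regulator.

    de_with_wt: genes changed by the WT construct, which pairs with art200
    de_with_m1: genes changed by the M1 construct, which cannot pair
    Both lists are relative to the empty vector control.
    """
    wt, m1 = set(de_with_wt), set(de_with_m1)
    return {
        # Changed only when art200 can be titrated: attributed to art200.
        "art200_dependent": wt - m1,
        # Changed whether or not pairing is possible: attributed to tnpA RNA.
        "tnpA_dependent": wt & m1,
        # Changed only by the mutant: not explained by either model.
        "m1_only": m1 - wt,
    }

# Placeholder gene names, for illustration only.
result = classify_genes(
    de_with_wt=["geneA", "geneB", "geneC"],
    de_with_m1=["geneB", "geneC", "geneD"],
)
```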
Overexpression of tnpA had a far greater effect on gene expression than did depleting art200: 77 genes were differentially expressed in the presence of either tnpA<sub>trunc WT-255</sub> or tnpA<sub>trunc M1-255</sub> while 6 were repressed by art200 (''glnH'', ''gltI'', ''acs'', ''icdA'', ''hutU'' and ''fadR'')<ref name=":56" />. Among the tnpA<sub>trunc</sub> RNA-regulated genes, a number were located on the ''[https://pt.wikipedia.org/wiki/Salmonella '''S'''almonella]'' '''P'''athogenicity '''I'''sland (SPI-1) and are involved in virulence: in particular, the ''sipABC'' genes, encoding effector (translocon) proteins involved in cell invasion during ''[https://pt.wikipedia.org/wiki/Salmonella Salmonella]'' infection (e.g. <ref>{{#pmid:23544147}}</ref>), were repressed. The fact that these were affected by both tnpA<sub>trunc WT-255</sub> and tnpA<sub>trunc M1-255</sub> indicated that their expression is directly influenced by tnpA itself and does not depend on pairing with art200. | |||
====5’UTR RNA is processed.==== | |||
Since the model bacterium ''[[wikipedia:Salmonella|S. typhimurium]]'' strain LT2 is not virulent, Ellis and coworkers <ref name=":56" /> subsequently examined a virulent strain, SL1344, in some further studies. This carries 7 IS''200'' copies instead of the 6 carried by LT2. | |||
It was thought more likely that, instead of the entire 5’UTR of tnpA mRNA, the true regulator might be a processed form of this RNA. Total RNA was examined by Northern blot from the wildtype SL1344 strain and from a derivative in which one of the chromosomal ''tnpA'' genes had been fused to a ''tet'' promoter, P<sub>tet</sub> , providing constitutive expression. Three RNA species of ∼ 90 nt, ∼ 110 nt and > 310 nt were observed using a 5’UTR probe ([[:File:FigIS200 605 74.png|Fig. IS200.74]]). They could not be detected when 4 of the 7 IS''200'' tnpA copies were removed but overexpression of tnpA<sub>trunc WT-255</sub> resulted in the reappearance of both the 90 nt and 110 nt species indicating that both are located within the first 255 nucleotides of the tnpA mRNA <ref name=":56" />. | |||
Primer extension studies combined with TEX treatment (Terminator 5’ monophosphate dependent Exonuclease), which would remove processed RNA, revealed two processing sites ('''A''' and '''B''' in [[:File:FigIS200 605 74.png|Fig. IS200.74]] '''a''') at nts 19 and 108 ([[:File:FigIS200 605 74.png|Fig. IS200.74]] '''b'''). | |||
====The processed RNA represses SPI-1 genes by repressing ''invF'' transcriptional activator transcription.==== | |||
Cloned derivatives carrying the first 50, 200 and 250 nts of the 5’ region of the ''tnpA'' transcript (tnpA<sub>50</sub>, tnpA<sub>200</sub> and tnpA<sub>250</sub>) all repressed the translocon genes ''sicA'', ''sipB'' and ''sipC'' by a factor of ~2.5 but not expression of a control gene, ''thrS''. Interestingly, tnpA<sub>50</sub>, the only derivative unable to pair with art200, gave the strongest effect. All three tnpA RNAs but, in particular, tnpA<sub>50</sub>, were also found to reduce expression of ''invF'' mRNA, as did high tnpA<sub>trunc</sub> expression. InvF is an SPI-1-encoded transcription factor which activates the large SPI-1 T3SS ([[wikipedia:Type_III_secretion_system|Type III Secretion System]]) translocon operon which in turn promotes entry into the intestinal epithelium in the course of an infection. | |||
Examination of the effect of this on ''[https://pt.wikipedia.org/wiki/Salmonella S. typhimurium]'' SL1344 invasion of [[wikipedia:HeLa|HeLa cells]] showed that ''tnpA'' overexpression reduced invasion by a factor of 2 compared to the wildtype strain<ref name=":56" />. | |||
<br /> | |||
[[File:FigIS200 605 74.png|center|thumb|720x720px|'''Fig IS200.74. Organization and Expression of IS''200''. a)''' Full length ''tnpA'' mRNA with its transcription start site (TSS) and translation initiation codon is shown together with the two processed products tnpA-110 and tnpA-90 revealed by Northern blots using o429 (green) as a probe and confirmed by primer extension. The two primer binding positions used for primer extension are also shown. Processing sites were inferred from the 5’ template ends of the small RNAs and their size on Northern blots. The results indicate that tnpA RNA is processed at two sites, A and B, to produce the two stable small RNAs: tnpA-110 at site B and tnpA-90 at sites A+B. '''b) Proposed pathway for tnpA RNA processing.''' Full-length tnpA is first processed at site ’B’ to generate tnpA-110 with subsequent processing at site ’A’ on tnpA (red) to generate the most stable tnpA species, tnpA-90. The binding site for the northern probe (oDH429) is indicated in green. '''c) Predicted pairing interaction between tnpA 1-63 and invF 104-160 RNAs.''' The main transcription start site (TSS, +1) for invF is 132 nt upstream of the start codon and nucleotides upstream of the TSS are shown in grey. invF nucleotides shown experimentally to be involved in pairing with tnpA are indicated in red; tnpA LS and T1 mutations are shown in bold <ref name=":56" />.]] | |||
====Direct tnpA RNA-invF RNA Interaction ''in vivo'' and ''in vitro''.==== | |||
Although no complementarity was found between ''tnpA'' RNA and any sequences within the large [[wikipedia:Type_III_secretion_system|T3SS operon]], an extensive complementarity was identified between 5’ ''tnpA'' transcript nts 1-63 and nts 104-160 upstream of the ''invF'' initiation codon ([[:File:FigIS200 605 74.png|Fig. IS200.74]] '''c'''). A gel shift assay demonstrated that the two regions can interact to give a slow mobility complex whether the ''invF'' RNA or ''tnpA'' (nt 1-173) was labelled. This did not occur when using ''tnpA'' RNA mutant LS (black in [[:File:FigIS200 605 74.png|Fig. IS200.74]] '''c'''). | |||
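A search for this kind of antisense complementarity can be sketched as a longest-common-substring comparison between one RNA and the reverse complement of the other. This is a simplified illustration that assumes strict Watson-Crick pairing (no G-U wobble pairs, bulges or mismatches); the function names and the short test sequences are hypothetical and are not the ''tnpA'' or ''invF'' sequences.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp_rna(seq):
    """Reverse complement of an RNA string."""
    return seq.translate(RNA_COMPLEMENT)[::-1]

def longest_complementary_stretch(rna_a, rna_b):
    """Return (length, start_in_a, start_in_b) of the longest perfect duplex.

    Antiparallel pairing between rna_a and rna_b is equivalent to a shared
    substring between rna_a and the reverse complement of rna_b.
    """
    target = revcomp_rna(rna_b)
    best_len, best_a, best_t = 0, 0, 0
    prev = [0] * (len(target) + 1)
    for i in range(1, len(rna_a) + 1):
        cur = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            if rna_a[i - 1] == target[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_a = i - cur[j]
                    best_t = j - cur[j]
        prev = cur
    # Convert the position in the reverse complement back to rna_b coordinates.
    start_in_b = len(rna_b) - (best_t + best_len)
    return best_len, best_a, start_in_b

# Hypothetical sequences sharing one 8 nt complementary stretch.
rna_a = "AAAGGCUCAGUAAA"
rna_b = "GGGACUGAGCCGGG"
length, start_a, start_b = longest_complementary_stretch(rna_a, rna_b)
```

Dedicated RNA-RNA interaction predictors also take folding energies and accessibility into account, which this sketch deliberately ignores.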
Pb<sup>2+</sup> footprinting with <sup>32</sup>P-labelled ''invF'' RNA and unlabeled WT or LS ''tnpA'' RNA revealed substantial pairing at nts 17-23 in the case of WT ''tnpA'' RNA (red in [[:File:FigIS200 605 74.png|Fig. IS200.74]] '''c'''). This interaction was probed ''in vivo'' using a strain overexpressing chromosomal ''tnpA'' RNA with and without a T1 mutation which eliminates the interactions observed using Pb<sup>2+</sup> (black in [[:File:FigIS200 605 74.png|Fig. IS200.74]] '''c'''). RNA from late exponential phase SL1344 overexpressing WT ''tnpA'' showed reduced ''invF'' and ''sicA'' RNA levels whereas ''tnpA'' carrying the T1 mutation had no effect. | |||
It is not yet clear whether one or both processed ''tnpA'' small RNAs base-pair with ''invF'' mRNA to inhibit expression and induce rapid transcript turnover, nor is the role of art200 in tnpA regulation of SPI-1 gene expression understood <ref name=":60" />. | |||
Both ''tnpA'' RNA processing sites ([[:File:FigIS200 605 74.png|Fig. IS200.74]] '''a''') nts 19 (A) and 108 (B) are located approximately at the boundaries of the art200 pairing sequence ([[:File:FigIS200 605 74.png|Fig. IS200.74]] '''d''') raising the possibility that art200 might in some way be involved in ''tnpA'' RNA processing. There is a loose correlation of growth phase dependent expression of art200 and SPI-1 genes consistent with the notion that art200 might “silence” ''tnpA'' RNA to liberate ''invF'' expression <ref name=":60" />. | |||
====Growth phase dependence.==== | |||
InvF expression increases in late exponential/early stationary phase <ref name=":56" /> and it is possible that this is the result of changes in ''tnpA'' RNA levels. To examine this, expression of genes influenced by ''tnpA'' RNA (''invF'', ''sicA'', ''sipB'', ''sipC'' and ''prgH'') was monitored during different growth phases in WT and in the ''tnpA'' RNA over-expressing strain, both of which had identical growth rates. ''tnpA'' RNA over-expression had no effect during lag- or early exponential-phase but, in late-exponential phase, it reduced expression of ''invF'' (2-fold), ''sicA'' (5.5-fold), ''sipB'' (4-fold) and ''sipC'' (2-fold) but not of the ''invF''-independent SPI-1 encoded ''prgH''. | |||
Overall, the data indicated that ''tnpA'' RNA over-expression affects ''invF'' RNA levels only when expressed at lower levels than ''invF'' RNA, implying a stoichiometry between both transcripts and a direct ''tnpA''-''invF'' RNA interaction. Moreover, ''tnpA'' RNA over-expression only affected SPI-1 gene expression in early-exponential phase and late-exponential where the WT ''tnpA'' RNA levels are limiting relative to ''invF'' RNA. | |||
These results suggest that the native IS''200'' copies may be important in controlling expression of the pathogenicity island. This was tested by comparing ''invF'' RNA expression following deletion of 4 of the 7 IS''200'' copies (Δ tnpA<sub>4/7</sub>): in both early- and late-exponential phase, native ''tnpA'' RNA was reduced 2.5-fold, while ''invF'' RNA increased 2-fold in early- and 1.5-fold in late-exponential phase. Additionally, excising all 7 IS''200'' copies (Δ tnpA<sub>7/7</sub>) resulted in a reduction in growth rate and a 25-fold increase in SPI-1 expression. These effects were reversed on introducing a chromosomal module which overexpresses ''tnpA'' RNA (tnpA 7::kan-pTet, in which a kan-pTet cassette was placed in front of tnpA 7, in IS''200''#7, such that the Tet promoter drives transcription of tnpA). Additionally, in the [[wikipedia:HeLa|HeLa cell]] invasion assay, the ''tnpA''-overexpressing strain showed reduced invasiveness while the Δ tnpA<sub>7/7</sub> strain was between 5- and 10-fold more invasive. | |||
====TnpA controls expression of more than 200 host genes==== | |||
Further analysis of differential gene expression <ref name=":61">'''A small RNA derived from the 5’ end of the IS200 ''tnpA'' transcript regulates multiple virulence regulons in ''Salmonella'' Typhimurium.''' Ryan S. Trussler, Naomi-Jean Q. Scherba, Michael J. Ellis, Konrad U. Förstner, Matthew Albert, Alexander J. Westermann, David B. Haniford. bioRxiv 2024.06.26.600842; doi: https://doi.org/10.1101/2024.06.26.600842 </ref> using comparative RNA-seq between the invasive ''Salmonella'' SL1344 and the derivative strain lacking all seven IS''200'' copies revealed more than 200 genes affected by the ''tnpA'' 5’UTR. These included master regulators for the HilD (invasion) and FlhDC (flagellar) regulons, the cysteine biosynthesis regulon and ''phsABC'', a thiosulfate reductase operon. These effects resulted in an 80-fold increase in invasiveness in a [[wikipedia:HeLa|HeLa cell]] invasion assay. Some of these interactions are shown in [[:File:FigIS200 605 75.png|Fig. IS200.75]]. | |||
[[File:FigIS200 605 75.png|center|thumb|720x720px|'''Fig IS200.75. Model for Part of the ''Salmonella'' SPI-1 regulatory network <ref name=":61" />'''. The figure shows ''tnpA'' regulation of ''hilD'' and the impact of 5’tnpA deletion on the cysteine regulon and the ''phsABC'' operon. Red lettering indicates pathways predicted to be regulated by 5’tnpA: inhibition pathways are shown in red, activation pathways in green. Question marks indicate that the regulation of ''lrhA'', ''crp'' and ''sirA'' by 5’ ''tnpA'' RNA may not be direct. Dotted lines indicate genes that may be regulated by LrhA. There are additional components and interactions which are not shown including Lon inhibition of HilC and HilD and HNS inhibition of HilA, HilE and RtsA <ref name=":63">{{#pmid:31428589}}</ref>.]] | |||
These genes are central to ''[[wikipedia:Salmonella|Salmonella]]'' virulence: the HilD transcription factor is a central control element in a complex cascade of reactions and acts upstream of InvF. The ''phsABC'' operon is involved in growth under anaerobic conditions and its inactivation results in increased invasiveness. Other details can be found in Trussler et al. 2024 <ref name=":61" /> and Lou et al. <ref name=":63" />. | |||
====A model of the IS''200'' regulatory network==== | |||
A model for RNA-mediated metabolism involving IS''200'' RNA ([[:File:FigIS200 605 76.png|Fig. IS200.76]]; <ref name=":60" />) proposes that the 5’-UTR terminal segment of ''tnpA'' mRNA assumes a folded structure which is either recognized by art200 asRNA, in a process facilitated by Hfq which blocks TnpA translation and leads to degradation (bottom), or is processed and recognizes the 5’ end of the ''invF'' message, blocking InvF translation and provoking degradation. Thus deletion of native IS''200'' copies increases invasion (and reduces growth rate) <ref name=":60" />. One possibility is that art200 pairs with the ''tnpA'' 5’UTR to prevent its processing. | |||
[[File:FigIS200 605 76.png|center|thumb|720x720px|'''Fig IS200.76. A model for the activities of the IS200 5’UTR.''' IS200 is represented as a black-bordered box (left). The 5’-UTR terminal segment of ''tnpA'' mRNA (red line) is highly structured, sequestering the TnpA translational initiation signals. Binding of Hfq also directly inhibits translation. ''tnpA'' mRNA is processed to generate two highly structured small RNAs, ''tnpA'' 110 and ''tnpA'' 90 (red above). ''tnpA'' 90 is stable and is generated from ''tnpA'' 110. IS''200'' also produces a small antisense RNA, art200 (blue below), which also assumes a folded structure and recognizes ''tnpA'' mRNA/''tnpA'' 110/''tnpA'' 90 (bottom middle), a process facilitated by Hfq, leading to propagation of base pairing and degradation (bottom right). ''tnpA'' mRNA processing may be regulated by art200. The processed ''tnpA'' 110/''tnpA'' 90 is also involved in regulation of ''invF'', which encodes the transcriptional activator of effector proteins for the SPI-1 encoded Type 3 Secretion System (T3SS) (black-bordered box right). It recognizes the 5’ end of the ''invF'' message (dark blue), possibly facilitated by Hfq, blocking translation of InvF and provoking degradation. ''tnpA'' RNAs are also involved directly or indirectly in regulating a large number of other genes involved in ''Salmonella'' virulence <ref name=":56" /><ref name=":60" />.]] | |||
<br /> | |||
== TnpB co-option as transcription factors. == | |||
The notion that RuvC evolved to generate the [[IS Families/IS200-IS605 family#TnpB and IscB are Related to the RNA-guided nucleases Cas12 and Cas9.|TnpB and IscB families of guide endonucleases]], which maintain copy number of their associated transposable elements, and then into Cas12 and Cas9 proteins (TnpB; [[:File:FigIS200 605 37.png|Fig.IS200.37]]), which act in bacterial immunity to invading mobile elements, led to the question of whether they might have evolved to fulfill other functions. For example, type V-K CRISPR-associated transposases similarly rely on nuclease-inactivated Cas12k homologues that are still active for RNA-guided DNA binding, facilitating programmable sequence-specific targeted transposition. | |||
Atypical Cas12 homologues, Cas12c and Cas12m, have also been identified which have lost their cleavage functions but not their binding function and are capable of repressing gene transcription, preventing bacteriophage proliferation <ref>{{#pmid:35659325}}</ref> or plasmid invasion <ref>{{#pmid:36427491}}</ref>. In view of this, Wiegand et al. <ref name=":59">{{#pmid:38076855}}</ref><ref name=":64">{{#pmid:38926585}}</ref> sought to determine whether some members of the TnpB group had also been domesticated and had assumed transposition-independent functions. | |||
====Repurposing TnpB proteins==== | |||
Truncated copies of TnpB had been noted early in the identification of the IS''200''/IS''605'' family, using a sample of only 85 [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=IS605 IS''605''] derivatives in [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder]. They had been believed to be decay products but with hindsight should be considered as repurposed derivatives ([[IS Families/IS200-IS605 family#IS decay|IS Decay]]; [[:File:FigIS200 605 30.png|Fig. IS200.30]]; see <ref>{{#pmid:24499397}}</ref><ref name=":65">{{#pmid:26350330}}</ref>). Wiegand et al. <ref name=":59" /><ref name=":64" /> used a library of nearly 96,000 TnpB-related proteins extracted from public databases. They identified over 500 nuclease-inactive variants containing at least 2 mutations in the DED catalytic nuclease triad. Doubly inactivated catalytic mutants were chosen since it had been shown that one of the three RuvC catalytic amino acids can occur at an alternative position <ref>{{#pmid:37756409}}</ref>. These were obtained from “'''''diverse genetic neighborhoods'''''” including examples which were not associated with ''tnpA''. | |||
In view of their distribution across the phylogenetic tree ([[:File:FigIS200 605 77.png|Fig. IS200.77]]), Wiegand et al. <ref name=":59" /><ref name=":64" /> suggest that they may have arisen independently over time from different ''tnpB'' genes. These showed different degrees of mutation, ranging from examples with one or more mutated catalytic site residues to homologues with C-terminal truncations removing the RuvC and zinc finger domains ([[:File:FigIS200 605 77.png|Fig. IS200.77]]). Interestingly, among the mutated TnpB examples in the original [https://tncentral.ncc.unesp.br/ISfinder/index.php ISfinder] sample, 2 had lost their C-terminal zinc finger domains <ref name=":65" /> and, since they remain associated with a ''tnpA'' gene, might represent examples of a ''tnpB'' on an evolutionary path to alternative functions. | |||
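The selection criterion described above (at least 2 mutations in the DED triad) can be illustrated with a minimal Python sketch. This is not the pipeline used in the cited work: the function names and the example residue triplets are hypothetical, and the sketch assumes that the three residues aligned to the RuvC catalytic positions have already been extracted from a multiple sequence alignment.

```python
# Minimal sketch (hypothetical helper names; not the published pipeline).
# Assumption: the three residues aligned to the RuvC catalytic positions
# have already been extracted from a multiple sequence alignment.

EXPECTED_TRIAD = ("D", "E", "D")  # canonical RuvC catalytic triad (Asp, Glu, Asp)


def count_triad_mutations(residues):
    """Count catalytic positions whose residue differs from D-E-D.

    `residues` is a sequence of three one-letter amino acid codes;
    "-" marks a missing residue, for example after a C-terminal truncation.
    """
    if len(residues) != 3:
        raise ValueError("expected exactly three catalytic positions")
    return sum(
        1
        for observed, expected in zip(residues, EXPECTED_TRIAD)
        if observed.upper() != expected
    )


def is_candidate_nuclease_dead(residues, min_mutations=2):
    """Apply the 'at least 2 mutated triad residues' threshold."""
    return count_triad_mutations(residues) >= min_mutations


if __name__ == "__main__":
    # Invented example triplets, for illustration only.
    examples = {
        "intact": ("D", "E", "D"),
        "single_mutant": ("D", "E", "A"),
        "double_mutant": ("A", "Q", "D"),
        "truncated": ("-", "-", "-"),
    }
    for name, triad in examples.items():
        print(name, count_triad_mutations(triad), is_candidate_nuclease_dead(triad))
```

Requiring two mutated positions rather than one mirrors the reasoning in the text: a single apparent substitution may reflect a catalytic residue located at an alternative position rather than true inactivation.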
[[File:FigIS200 605 77.png|center|thumb|720x720px|'''Fig. IS200.77. Phylogenetic tree of TnpB-related proteins.''' The figure shows the positions of previously studied homologues [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISHp608 IS''608''], [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''], IS''Gst2'' and Kra (blue circles) and examples of the three newly identified TldR groups associated with ''csrA'', ''oppF'' and the phage-associated fliCP (green circles) highlighted. Circles: '''1)''' protein size. '''2)''' tyrosine-family or serine-family TnpA transposase association and '''3)''' RuvC DED active site intactness <ref name=":59" /><ref name=":64" />.]] | |||
Wiegand et al. <ref name=":59" /><ref name=":64" /> also used [https://alphafold.ebi.ac.uk/ AlphaFold] predictions which provided supporting structural evidence of sequential mutation of the TnpB nuclease catalytic site. However, in each case, the TnpB RNA-binding interface, which determines TnpB DNA targeting functions, had been retained. | |||
Further studies revealed that several of these TnpB derivatives with inactive nucleases function as repressors of expression of a number of genes: they were called '''TldRs''' (for TnpB-Like nuclease-Dead Repressors).[[File:FigIS200 605 78.png|center|thumb|720x720px|'''Fig. IS200.78.''' '''Multiple sequence alignment of representative TnpB and TldR homologues.''' '''a)''' The three core RuvC regions (black bold text) carrying the remaining catalytic acidic amino acid DED triad (red text) are shown with their accompanying co-ordinates in the protein. Gray background shows highly conserved residues. Dashes indicate missing residues. The left column includes the GenBank ID. The domain structure of the [https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2''] TnpB is shown as a cartoon: '''Below:''' RuvC segments I, II and III [green]; [[wikipedia:Zinc_finger|Zinc-finger]], HNH nuclease [red]; Arginine rich helix [blue]; Wedge domain [yellow]; C-terminal tail [white] <ref name=":30" />. The bottom panel highlights the decay of RuvC active site motifs and loss of the C-terminal zinc-finger (ZnF)/RuvC domain <ref name=":59" /><ref name=":64" />.]] | |||
====TldR Genetic Context==== | |||
Many TldRs were found neighboring non-IS-related genes. These associations were frequently clade-specific ([[:File:FigIS200 605 77.png|Fig. IS200.77]]): one group was consistently associated with [[wikipedia:ABC_transporter|ABC transporter systems genes]] including ''oppF'', mainly present in [[wikipedia:Enterococcus|Enterococci]] and located downstream ([[:File:FigIS200 605 79B.png|Fig. IS200.79A b]]); another with ''fliC'', encoding the flagellin subunit of flagellar assemblies in [[wikipedia:Enterobacteriaceae|Enterobacteriaceae]] and associated with a prophage (called ''fliC<sub>p</sub>'' to distinguish it from the genomic copy; note that the ''fliC<sub>p</sub>''-associated TldR was identified in nearly 30 prophages), also located downstream; and a third group from [[wikipedia:Clostridia|Clostridia]] also associated with flagellin genes and with the carbon storage regulator, ''csrA'', involved in flagellar subunit regulation and generally located downstream. Such strong associations suggested that the TldRs may have a functional role. | |||
=====TldR Guide RNA Identification and binding===== | |||
It was of considerable interest to determine whether small guide RNAs (gRNA) are associated with TldRs. Generally, these are composed of a “''scaffold''” domain followed by a guide sequence produced from the flanking sequence at the right end ('''RE''') of the IS (see: [[IS Families/IS200-IS605 family#Structure of TnpB-reRNA in association with DNA|Structure of TnpB-reRNA in association with DNA]]). However, since there is no '''RE''' associated with the TldRs to define potential guide sequences, Wiegand et al. <ref name=":59" /><ref name=":64" /> used a co-variance approach previously used for gRNA identification (see: [[IS Families/IS200-IS605 family#Conserved secondary structure motifs|Conserved secondary structure motifs]]; <ref name=":66">{{#pmid:37758954}}</ref>) combined with BLAST. This identified the '''LE/RE''' boundaries and potential guide RNAs associated with active TnpB homologues closely related to fliC-associated and oppF-associated TldRs ([[:File:FigIS200 605 79B.png|Fig. IS200.79A c]]) and, from these, they deduced the potential gRNA sequences of the fliC-associated and oppF-associated TldRs. In addition, RNA-seq datasets from [[wikipedia:Enterococcus|Enterococci]] carrying fliC–tldR or oppF–tldR <ref name=":67">{{#pmid:33324581}}</ref> revealed reads covering the TldR ''orfs'' and the proposed RNA predicted from the co-variance, thus confirming their expression. | |||
TldR gRNA expression was investigated by cloning and expressing, in ''[[wikipedia:Escherichia_coli|E. coli]]'', FLAG-tagged fliC<sub>P</sub>-associated TldR (''[[wikipedia:Enterobacter|Enterobacter hormaechei]]'', EhoTldR; [[:File:FigIS200 605 79B.png|Fig. IS200.79 '''A''' and '''B''']]) and oppF-associated TldR (''[[wikipedia:Enterococcus_faecalis|Enterococcus faecalis]],'' Efa1TldR) on a 240 bp DNA segment including their putative guide RNA scaffold and 20-bp guide sequence. RNA immunoprecipitation ('''IP''') on total RNA, analyzed by sequencing and mapping, revealed a 113-nt EhoTldR gRNA comprising a 97-nt scaffold and a downstream 16-nt guide sequence, and a 109-nt Efa1TldR gRNA comprising a 100-nt scaffold and an approximately 9-nt guide. A shorter guide of 11 nt was also identified from a homologue in publicly available RNA-seq data <ref name=":67" />. | |||
Although TnpB has been shown to process its transcript to generate the final gRNA (see: [[IS Families/IS200-IS605 family#Generating re.28.CF.89.29RNA: Processing|Generating re(ω)RNA: Processing]]; <ref>{{#pmid:37272862}}</ref>) using its RuvC activity, the TldRs are RuvC-defective. It was suggested that the mature gRNA may simply be a structure which is protected from other cellular ribonucleases. | |||
[[File:FigIS200 605 79A.png|center|thumb|720x720px|'''Fig.IS200.79. A)''' '''Guide RNA Position and sequence.''' '''a)''' Genomic architecture of well-studied tnpB (lilac) encoding insertion sequences with ([https://tncentral.ncc.unesp.br/ISfinder/scripts/ficheIS.php?name=ISDra2 IS''Dra2'']) and without (IS''Gst2'') a ''tnpA'' (lilac) gene showing the location of the gRNA (orange). '''b)''' Novel regions encode TldR (bottom) in association with: (i) prophage-encoded fliCP; (ii) the ABC transporter gene oppF; and (iii) a transcriptional regulator (csrA) of an accompanying flagellin gene. Genomic accession numbers are shown. The TldR are shown in lilac. '''c)''' Alignment of a representative fliCP–tldR locus from E. cloacae (Ecl) and related tnpB loci (Pmi, Eco2, Sen, Bub) at a region near the 3′ end of covariance model (CM) coverage (yellow box) reveals conservation of 3′ scaffold sequences together with putative guide sequences (red box) <ref name=":59" /><ref name=":64" />.]] | |||
[[File:FigIS200 605 79B.png|center|thumb|720x720px|'''Fig.IS200.79. B) A larger data set of potential gRNA.''' Additional examples of fliCP-associated (top) and oppF-associated (bottom) TldR potential gRNA are shown together with their equivalent gRNA from related active TnpB copies <ref name=":59" /><ref name=":64" />. ]] | |||
=====What do TldR gRNA target?===== | |||
To determine the targets of the phage-associated fliC<sub>P</sub> TldR, a library of gRNA assembled from bioinformatic and RIP-seq analyses <ref name=":59" /><ref name=":64" /> was used as queries in a BLAST search. This yielded a strong match for the prophage-associated gRNA in a genomic region carrying flagellar genes from ''[[wikipedia:Enterobacter_cloacae|E. cloacae]]'' AR_1054 ([[:File:FigIS200 605 80.png|Fig. IS200.80a]]), located between the genomic ''fliC'' and ''fliD'' genes ([[:File:FigIS200 605 80.png|Fig.IS200.80a]] '''top''') at a position distinct from the prophage. | |||
The putative gRNA target overlapped a potential σ28 promoter ([[:File:FigIS200 605 80.png|Fig.IS200.80a]] '''middle''') which, in ''[[wikipedia:Escherichia_coli|E. coli]]'', is recognized by FliA/σ28 and drives ''fliC'' expression. Moreover, the putative target sequence was flanked on its 5’ side by the sequence GTTAT. This is conserved in a number of prophage genomes identified in several Enterobacter species ([[:File:FigIS200 605 80.png|Fig.IS200.80a]] '''bottom''') and resembles the TAM sequences recognized by active TnpB nucleases similar to the TldRs. | |||
It was suggested that phage TldR gRNA might repress expression of the host FliC while maintaining its own FliC<sub>p</sub> synthesis. | |||
Further studies used the short 9-nt guide associated with oppF TldRs together with the [[IS Families/IS200-IS605 family#Exploring and defining TAM sequences|TAM sequence]], TTTAA/TTTAT, of related TnpB and uncovered a potential target upstream of the initiation codon of a chromosomal ''oppA'' [[wikipedia:ABC_transporter|ABC transporter]] gene in ''[[wikipedia:Enterococcus_faecalis|E. faecalis]]'' ([[IS Families/IS200-IS605 family#Exploring and defining TAM sequences|TAM sequence]], TTAAA; [[:File:FigIS200 605 80.png|Fig.IS200.80b]]) and ''[[wikipedia:Enterococcus|E. cecorum]]'' ([[IS Families/IS200-IS605 family#Exploring and defining TAM sequences|TAM sequence]], TTTAA; [[:File:FigIS200 605 80.png|Fig.IS200.80c]]). In both cases, the gRNA sequence (9 nt for ''[[wikipedia:Enterococcus_faecalis|E. faecalis]]'' and 7 nt for ''[[wikipedia:Enterococcus|E. cecorum]]'') was complementary to sequences overlapping the promoter, again suggesting that the TldR/gRNA would repress expression of the associated ''opp'' operon by competition with RNA polymerase. Interestingly, the analysis of ''[[wikipedia:Enterococcus|E. cecorum]]'' identified a significant number of additional potential targets ([[:File:FigIS200 605 81.png|Fig.IS200.81]]), all with a 7-nt complementary core and a 5’TTTAA [[IS Families/IS200-IS605 family#Exploring and defining TAM sequences|TAM sequence]]. This suggests that the oppF-TldR may be involved in an extended regulatory network. | |||
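The kind of search described above, a TAM followed immediately by a short core matching the guide, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the method of the cited work: the function names are hypothetical, the 7-nt core <code>GATTACA</code> and the toy genome are invented placeholders (the real core sequence is not given here), and only exact matches on either strand are reported.

```python
# Minimal sketch (hypothetical names and placeholder sequences).
# Finds exact occurrences of TAM + core on both strands of a DNA string.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_tam_core_sites(genome, tam, core):
    """Return sorted (position, strand) tuples for exact TAM+core matches.

    `position` is the 0-based leftmost coordinate of the match on the
    forward strand, whichever strand carries the motif.
    """
    genome = genome.upper()
    motif = (tam + core).upper()
    hits = []
    for strand, template in (("+", genome), ("-", reverse_complement(genome))):
        start = template.find(motif)
        while start != -1:
            if strand == "+":
                position = start
            else:
                # Convert a reverse-strand offset to a forward coordinate.
                position = len(genome) - start - len(motif)
            hits.append((position, strand))
            start = template.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    tam = "TTTAA"          # TAM reported in the text for E. cecorum
    core = "GATTACA"       # invented 7-nt placeholder, NOT the real core
    toy_genome = (
        "GG" + tam + core + "CCC" + reverse_complement(tam + core) + "GG"
    )
    for position, strand in find_tam_core_sites(toy_genome, tam, core):
        print(position, strand)
```

A real analysis would need to tolerate mismatches outside the core and to consider the alternative TAM (TTTAT); exact matching is used here only to keep the sketch short.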
<br /> | |||
[[File:FigIS200 605 80.png|center|thumb|720x720px|'''Fig. IS200.80. Identification of guide RNAs. a)''' fliCp (phage-associated) gRNA. '''Top:''' the putative Ecl TldR-associated gRNA revealed a putative genomic target near the (lilac) predicted promoter of a distinct (host) copy of fliC located approximately 1 Mb away. '''Middle:''' predicted TAM (yellow box) and gRNA–target DNA (red) base-pairing interactions (blue), relative to the fliC coding sequence at left. The direction of transcription is shown by the large blue arrow. The putative -10 and -35 promoter elements are boxed (green). '''Bottom:''' [https://weblogo.berkeley.edu/logo.cgi WebLogos] of predicted guides and genomic targets associated with diverse fliCP-associated TldRs from a number of prophages from various [[wikipedia:Enterobacter|Enterobacter]] species showing the variation in the target sequence and the conserved TAM-like pentanucleotide, GTTAT. '''b)''' oppA (genomic) gRNA from ''[[wikipedia:Enterococcus_faecalis|E. faecalis]]''. A potential Efa1 TldR-associated RNA target located upstream of Efa1 TldR (lilac) in the same operon and spanning the predicted opp operon promoter, with a TAM sequence, TTTAA. '''c)''' oppA (genomic) gRNA from ''[[wikipedia:Enterococcus|E. cecorum]]''. A potential Ece TldR-associated RNA target located upstream of Ece TldR (lilac) in the same operon and spanning the predicted opp operon promoter, with a TAM sequence, TTTAA. Note that the two opp gRNAs in '''b)''' and '''c)''' target opposite strands <ref name=":59" /><ref name=":64" />.]] | |||
====Functional Analysis: TldR/gRNA target their cognate target sites==== | |||
Fifteen TldR/gRNA examples were chosen for functional analysis: several fliC<sub>P</sub>-TldR ([[:File:FigIS200 605 78.png|Fig. IS200.78B]] '''top''') and ''oppF''-associated ([[:File:FigIS200 605 78.png|Fig. IS200.78B]] '''bottom''') loci were cloned together with their putative gRNA and expressed in an ''[[wikipedia:Escherichia_coli|E. coli]]'' derivative carrying the predicted target site integrated into the chromosome. Their genome-wide binding specificity was then ascertained by [[wikipedia:ChIP_sequencing|chromatin immunoprecipitation]] of FLAG-tagged TnpB and TldR followed by sequencing ([[wikipedia:ChIP_sequencing|ChIP–seq]]). For the majority of the examples, the results revealed strong peaks corresponding to the expected target site: the nuclease-inactive TldR therefore retained the ability to bind to specific target sites in genomic DNA. | |||
=====Functional Analysis: extent of target complementarity required for TldR/gRNA binding.===== | |||
The results also included a significant proportion of “'''''off-target'''''” sites. When analyzed in more detail, 3 prominent off-target peaks were observed for the ''fliC''-associated TldR homologues: Kpi, Eco, Eko1, Eko2 and Eho. One of these was found to be the ''[[wikipedia:Escherichia_coli|E. coli]]'' host chromosomal ''fliC'' and ''fliD'' intergenic region, which differs from the ''[[wikipedia:Enterobacter_cloacae|Enterobacter cloacae]]'' sp. AR_15 fliC-TldR target by 5 of the core complementary nucleotides ([[:File:FigIS200 605 80.png|Fig. IS200.80a]]; [[:File:FigIS200 605 82.png|Fig. IS200.82]]). A similar analysis of off-target binding by ''oppF''-associated TldRs (Eca, Emu, Efa, Tos and Ece) ([[:File:FigIS200 605 82.png|Fig. IS200.82]]) also indicated a rather relaxed recognition. These data are consistent with the approximately 6-nt “seed” sequence found to be sufficient for certain Cas12a homologues <ref>{{#pmid:28431230}}</ref> and correspond to the length of the core sequence complementarity found for the multiple potential TAM-associated ''EceTldR'' targets identified in the [[wikipedia:Enterococcus|''E. cecorum'']] genome ([[:File:FigIS200 605 81.png|Fig. IS200.81]]). | |||
Systematic analysis of all ChIP–seq peaks for enriched motifs (see <ref name=":66" />) revealed that fliCp-associated TldRs enriched for GTTAT, identical to that flanking ''fliC'' promoters ([[:File:FigIS200 605 80.png|Fig.IS200.80a]]), while oppF-associated TldR homologues enriched TTTAA motifs, the TAM specificity predicted for closely related TnpB relatives (TTTAA and TTTAT) ([[:File:FigIS200 605 80.png|Fig.IS200.80b]]). | |||
[[File:FigIS200 605 81.png|center|thumb|720x720px|'''Fig. IS200.81. Identification of multiple potential targets for oppF-TldR guide RNA.''' The ''[[wikipedia:Enterococcus|Enterococcus cecorum]]'' genome is shown at the top with the Ece TldR (lilac) and the associated opp operon including the orfs and its genomic position (indicated as #4). The positions of additional putative gRNA targets are numbered (1-7). Below (Column 1: Target) shows each numbered locus and its genomic co-ordinates. The column to the right (Column 2: Context) shows the open reading frames, direction of expression and position of the potential target. Column 3 (Orientation) indicates the relative orientation of the TAM (yellow box) and guide sequence (red box). Column 4 (Putative target Sequence 5’-3’) shows the nucleotide sequence of the TAM (red text, yellow box) and target (red text). The guide sequence is shown below in bold text. Those nucleotides in the target common to both target and guide are marked in bold and indicated by a blue line. The core 7nt complementary target and guide sequences are boxed in red. There is complete conservation of 12 nts (TAM+core) in all 7 potential target sites <ref name=":59" /><ref name=":64" />.]] | |||
=====Functional Analysis: TldR are inactive in nuclease functions.===== | |||
To confirm that the TldR derivatives identified in the study were truly nuclease-deficient, they were tested, together with the related active TnpB derivatives, using a plasmid interference assay ([[:File:FigIS200 605 54.png|Fig.IS200.54a]]; TnpB<sub>Gst</sub> and IscB<sub>Gst</sub> proteins are active RNA-guided nucleases; <ref name=":66" />). All 4 nuclease-proficient TnpB homologues related to the fliC TldRs ([[:File:FigIS200 605 79B.png|Fig. IS200.79B]] '''top''') reduced the [[wikipedia:Colony-forming_unit|CFU (colony forming units)]] in this assay whereas there was no effect with the 7 TldR proteins ([[:File:FigIS200 605 79B.png|Fig. IS200.79B]] '''top'''). Similarly, all 4 nuclease-proficient TnpB homologues related to the oppF TldRs ([[:File:FigIS200 605 79B.png|Fig. IS200.79B]] '''bottom''') reduced the CFU whereas there was no effect with the 8 oppF TldR proteins ([[:File:FigIS200 605 79B.png|Fig. IS200.79B]] '''bottom'''). | |||
It was concluded therefore that the TldRs are RNA-guided DNA-binding proteins without nuclease activity <ref name=":59" /><ref name=":64" />. | |||
[[File:FigIS200 605 82.png|center|thumb|720x720px|'''Fig IS200.82. Identification of off-target binding sites in ''[[wikipedia:Escherichia_coli|E. coli]]''.''' Genome-wide ChIP–seq profiles of fliCP-associated TldR homologs (Kpi, Eco, Eko1, Eko2 and Eho) exhibited off-target binding. A shared example was located in the intergenic region between fliC and fliD in the ''[[wikipedia:Escherichia_coli|E. coli]]'' host chromosome, which differs from the ''[[wikipedia:Enterobacter_cloacae|Enterobacter cloacae]]'' sp. AR_15 fliC-TldR target by 5 of the core complementary nucleotides ('''top'''). A second off-target fliCp binding site, #1, together with the engineered inserted target are also shown. A third off-target peak occurred at a tRNA locus (not shown) without a recognizable TAM sequence. For the oppF TldR (Eca, Emu, Efa2, Tos and Ece), four off-target binding sites were identified including the tRNA locus (not shown) ('''bottom''') <ref name=":59" /><ref name=":64" />.]] | |||
=====Functional Analysis: target DNA binding by TldR modulates gene expression.===== | |||
To determine whether the TldR systems modulate gene expression by target binding, Wiegand et al. <ref name=":59" /><ref name=":64" /> used an RFP/GFP assay in which ''gfp'' chromosomal expression would act as a standard control while TldR binding would be expected to repress ''rfp'' expression ([[:File:FigIS200 605 83.png|Fig. IS200.83]]). The assay involved two plasmids: one which supplies gRNA and the TldR and another which carries the target sequence upstream of the ''rfp'' gene. gRNAs were designed either to target promoter sequences, occluding transcription initiation, or to target the 5′ UTR to block transcription elongation. | |||
Using promoter-targeting gRNAs, fliCp(Eho)- and oppF(Efa1)-associated TldR strongly repressed RFP when targeting the sense (top) strand. This is the native target orientation in the fliCp promoter ([[:File:FigIS200 605 83.png|Fig. IS200.83]] '''top'''). Shorter stretches of complementarity between target and gRNA were tested and a 6-nt sequence showed repression similar to, but a little lower than, the 20-nt guide sequence. Removal of the short sequence 5’ to the guide had little effect ([[:File:FigIS200 605 83.png|Fig.IS200.83]] '''bottom'''). | |||
Repression was largely unaffected when the target was placed in the 5’UTR (i.e. downstream of the promoter) on the top strand. When placed on the bottom strand, some repression could be detected for a small subgroup of both fliC- and oppF-associated TldRs. | |||
Thus nuclease-deficient TldRs can efficiently repress downstream genes in a position- and orientation-dependent way. | |||
<br /> | |||
[[File:FigIS200 605 83.png|center|thumb|720x720px|'''Fig.IS200.83. RFP assay for TldR binding. Top Left:''' pTldR plasmid supplying TldRs (lilac) and gRNA (green) under control of separate promoters (black arrows). '''Top Center:''' pRFP plasmids carrying a reporter RFP gene (red) with an upstream target sequence (bright red) and associated TAM sequence (yellow). These are oriented in such a way that in one (pRFPb) the gRNA is complementary to the bottom strand while in the other (pRFPt) it is complementary to the top strand. '''Right:''' details of the target sequence location between the -10 and -35 promoter components (green boxes) and orientation. The lilac structure represents the TldR protein associated with the gRNA. The gRNA is shown as a blue line. The large arrows on the right indicate RFP levels. '''Bottom:''' figure showing target complementarity (red) to the gRNA derivative used. Yellow indicates the target TAM sequence, red squares indicate nts complementary to the target sequence. NT: non-targeting guide. The right-hand column shows the results for the Eho TldR. They indicate that the NT guide had no effect (i.e. did not repress) while all other gRNAs repressed strongly. Slightly less repression was observed for the 6nt “seed” sequence <ref name=":59" /><ref name=":64" />.]] | |||
====FliC<sub>p</sub>-TldR from prophage helps supplant host FliC in flagella structures.==== | |||
Inspection of the phage and bacterial FliC structures indicated that, although the surface-exposed structures were very different, the protomer-protomer interface surface was well conserved, suggesting that the phage and host proteins are interchangeable in flagella assembly. To test the notion that prophage FliC<sub>P</sub> replaces the host FliC, allowing the phage to assume control of host flagella composition via TldR host gene repression, total RNA-seq of three lysogenic strains carrying fliC<sub>P</sub>-TldR and one which is devoid of the prophage was undertaken. | |||
This demonstrated that strong expression of gRNA with the expected 5’ and 3’ boundaries occurred in the strains carrying the fliC<sub>P</sub>-associated TldR. In these strains, expression of host ''fliC'', compared to that of the host ''fliD'', was nearly undetectable whereas the phage fliC<sub>P</sub> gene was strongly expressed. In the prophage-free strain, the host ''fliC'' was strongly expressed. To confirm that this effect was due to TldR repression, strains deleted for tldR, tldR–gRNA, the entire fliC<sub>P</sub>–TldR–gRNA and the entire prophage were created. All led to about a 100-fold increase in host ''fliC'' expression. Additionally, substitution of the guide segment of the gRNA by a non-targeting sequence had the same effect. Moreover, the de-repression of the host ''fliC'' gene could be reversed in the tldR–gRNA deletion mutant by trans-complementation with a plasmid-encoded fliC<sub>P</sub>-TldR/gRNA cassette. | |||
The data therefore show that the prophage modifies host flagella composition by coupling host ''fliC'' repression with increased incorporation of the phage FliC<sub>P</sub> product into the host flagella. | |||
It will now be of interest to examine the impact of other TldRs on their bacterial hosts. | |||
== Acknowledgements == | |||
We are grateful to [https://www.niddk.nih.gov/about-niddk/staff-directory/biography/dyda-frederick Fred Dyda] and [https://www-mslmb.niddk.nih.gov/dyda/alison.html Alison Hickman] for advice concerning transposition mechanism, to [https://molbio.unige.ch/en/research-group/orsolya-barabas Orsolya Barabas] for certain figures and videos of structures, and to [[wikipedia:Kira_Makarova|Kira Makarova]], [[wikipedia:Virginijus_Šikšnys|Virginijus Šikšnys]], and [https://www.sternberglab.org/sam-sternberg Sam Sternberg] for advice concerning the RNA guide endonucleases. The Siksnys group also kindly supplied the Cas12 structural panel. Thanks also to [https://publish.uwo.ca/~haniford/ David Haniford] for comments on the impact of IS''200'' on expression of the SPI-1 ''Salmonella'' virulence genes. We are also grateful to all the above for permission to use derivatives of their published figures. | |||
==Bibliography== | |||
{{Reflist|32em}} | |||
<br /> | |||
== How to Cite? == | |||
TnPedia Team. (2025). TnPedia: IS''200''/IS''605'' Family of Prokaryotic Insertion Sequences. Zenodo. https://doi.org/10.5281/zenodo.15640112 | |||
[[File:IS200-zenodo.15640112.png|link=https://doi.org/10.5281/zenodo.15640112|DOI badge]] | |||
<hr> | |||
{{TnPedia}} | |||
Latest revision as of 08:53, 11 June 2025
Historical
One of the founding members of this group, IS200, was identified in Salmonella typhimurium [1] as a mutation in hisD (hisD984) which mapped as a point mutation but which did not revert and was polar on the downstream hisC gene (see [2]). S. typhimurium LT2 was found to contain six IS200 copies and the IS was unique to Salmonella [3]. Further studies [4] showed that the IS did not carry repeated sequences, either direct or inverted, at its ends, and that removal of 50 bp at the transposase proximal end (which includes a structure resembling a transcription terminator) removed the strong transcriptional block. IS200 elements from S. typhimurium and S. abortusovis revealed a highly conserved structure of 707–708 bp with a single open-reading-frame potentially encoding a 151 aa peptide and a putative upstream ribosome-binding-site [5].
It has been suggested that a combination of inefficient transcription, protection from impinging transcription by a transcriptional terminator, and repression of translation by a stem-loop mRNA structure all contribute to tight repression of transposase synthesis [2]. However, although IS200 seems to be relatively inactive in transposition [6], it is involved in chromosome arrangements in S. typhimurium by recombination between copies [7].
A second group of “founding” members of this family was, arguably, IS1341 from the thermophilic bacterium PS3 [8], IS891 from Anabaena sp. M-131 [9] and IS1136 from Saccharopolyspora erythraea [10]. The “transposases” of both elements were observed to be associated in a single IS, IS605, from the gastric pathogen Helicobacter pylori [11]. It was identified in many independent isolates of H. pylori and is now considered to be a central member which defines this large family. IS605 was shown to possess unique, not inverted repeat, ends; did not duplicate target sequences during transposition; and inserted with its left (IS200-homolog) end abutting 5'-TTTAA or 5'-TTTAAC target sequences [11]. Additionally, a second derivative, IS606, with only 25% amino acid identity in the two proteins (orfA and orfB) was also identified in many of the H. pylori isolates including some which were devoid of IS605. The Berg lab also identified another H. pylori IS, IS607 [12], which carried a similar IS1341-like orf (orfB) but with another upstream orf with similarities to that of the mycobacterial IS1535 [13] annotated as a resolvase due to the presence of a site-specific serine recombinase motif. Another IS605 derivative, ISHp608, which appeared widely distributed in H. pylori, was shown to transpose in E. coli, required only orfA to transpose and inserted downstream from a 5’-TTAC target sequence [14].
General
The IS200/IS605 family members transpose using obligatory single-strand (ss) DNA intermediates [15] by a mechanism called “peel and paste”. They differ fundamentally in organization from classical IS. They have sub-terminal palindromic structures rather than terminal IRs (Fig. IS200.1) and insert 3’ to specific AT-rich tetra- or penta-nucleotides without duplicating the target site.

The transposase, TnpA, is a member of the HUH enzyme superfamily (Relaxases, Rep proteins of RCR plasmids/ss phages, bacterial and eukaryotic transposases of IS91/ISCR and Helitrons[16][17])(Fig. IS200.2) which all catalyze cleavage and rejoining of ssDNA substrates.

IS200, the founding member (Fig. IS200.3), was identified 30 years ago in Salmonella typhimurium [1] but there has been renewed interest in these elements since the identification of the IS605 group in Helicobacter pylori [11][18][14]. Studies of two elements of this group, IS608 from H. pylori and ISDra2 from the radiation resistant Deinococcus radiodurans, have provided a detailed picture of their mobility [19][20][21][22][23][24][25].

Distribution and Organization
The family is widely distributed in prokaryotes with more than 153 distinct members (89 are distributed over 45 genera and 61 species of bacteria, and 64 are from archaea). It is divided into three major groups based on the presence or absence and on the configuration of two genes: the transposase tnpA (https://www.ncbi.nlm.nih.gov/research/cog/cog/COG1943/), sufficient to promote IS mobility in vivo and in vitro, and tnpB (https://www.ncbi.nlm.nih.gov/research/cog/cog/COG0675/) (Fig. IS200.1), initially of unknown function and not required for transposition activity but now known to be an RNA-guided endonuclease (see TnpB below). These groups are: IS200, IS605 and IS1341. TnpB is also present in another IS family, IS607, which uses a serine-recombinase as a transposase. In the phylogeny of this group (Fig. IS200.4A) of IS, both tnpB and tnpA of bacterial or archaeal origin are intercalated, suggesting some degree of horizontal transfer between these two groups of organisms[26].

Isolated copies of IS200-like tnpA can be identified in both bacteria and archaea[26]. Full length copies of IS605-like elements are also found in bacteria and several archaea and all have corresponding MITE (Miniature Inverted repeat Transposable Element) derivatives in their host genomes.
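The three-group division described above depends only on which of the two genes an element carries. A minimal Python sketch of that rule follows. The function name `classify_group` and its boolean inputs are illustrative assumptions and do not correspond to any ISfinder tool. Gene content alone would not separate these elements from IS607 family members, in which tnpB accompanies a serine recombinase.

```python
def classify_group(has_tnpA: bool, has_tnpB: bool) -> str:
    """Assign an element to a major group from its gene content alone."""
    if has_tnpA and has_tnpB:
        return "IS605"
    if has_tnpA:
        return "IS200"
    if has_tnpB:
        return "IS1341"
    # Neither gene present, e.g. a MITE-like derivative
    return "unclassified"


# Gene content (tnpA, tnpB) of three elements as described in the text
examples = {
    "IS200": (True, False),
    "IS608": (True, True),
    "IS1341": (False, True),
}
groups = {name: classify_group(*genes) for name, genes in examples.items()}
print(groups)
```

The configuration of the two genes, which also distinguishes family members, is not captured by this rule.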
The IS200 group
IS200 group members encode only tnpA, and are present in gram-positive and gram-negative bacteria and certain archaea[2][27] (Fig. IS200.1 and Fig. IS200.3). Alignment of TnpA from various members shows that they are highly conserved but may carry short C-terminal tails of variable length and sequence. Among approximately 400 entries in ISfinder (December 2023), there are about 50 examples of IS200-like derivatives.
They can occur in relatively high copy number (e.g. >50 copies of IS1541 in Yersinia pestis) and are among the smallest known autonomous IS with lengths generally between 600 and 700 bp. Some members such as ISW1 (from Wolbachia sp.) or ISPrp13 (from Photobacterium profundum) are even shorter.
IS200 was initially identified as an insertion mutation in the Salmonella typhimurium histidine operon [1]. It is abundant in different Salmonella strains and has now also been identified in a variety of other enterobacteria such as Escherichia, Shigella and Yersinia.
Different enterobacterial IS200 copies have almost identical lengths of between 707 and 711 bp. Analysis of the ECOR (E. coli) and SARA (Salmonellae) collections showed that the level of sequence divergence between IS200 copies from these hosts is equivalent to that observed for chromosomally encoded genes from the same taxa[28][29]. This suggests that IS200 was present in the common ancestor of E. coli and Salmonellae.
In spite of its abundance, an enigma of IS200 behavior is its poor contribution to spontaneous mutation in its original Salmonella host: only very rare insertion events have been documented [2]. One reason for these rare insertions could be poor expression of the TnpAIS200 gene from a weak promoter pL identified at the left IS end (LE)[4][5] (Fig. IS200.3).
Besides the characteristic major subterminal palindromes [4], presumed binding sites of the transposase at both LE and the right end (RE) (Substrate recognition), IS200 also carries a potential supplementary interior stem-loop structure (Fig. IS200.3). Two of these structures play a role in regulating IS200 gene expression. The first (perfect palindrome at LE; nts 12–34) overlaps the TnpAIS200 promoter pL, can act as a bi-directional transcription terminator upstream of TnpAIS200 and terminates up to 80% of transcripts[30] (Fig. IS200.3). The second (interior stem-loop; nts 69–138) (Fig. IS200.3), at the RNA level, can repress mRNA translation by sequestration of the Ribosome Binding Site (RBS) (Fig. IS200.3). Experimental data suggested that the stem-loop is formed in vivo and its removal by mutagenesis caused up to a 10-fold increase in protein production[30]. Recent deep sequencing analysis revealed another aspect in post-transcriptional regulation of IS200 expression: a small anti-sense RNA (asRNA) that inhibits IS200 transposase expression (Fig. IS200.3) was identified as a substrate of Hfq, an RNA chaperone involved in post-transcriptional regulation in numerous bacteria[31]. Interestingly, asRNA and Hfq independently inhibit IS200 transposase expression: knock-out of both components resulted in a synergistic increase in transposase expression. Moreover, footprint data showed that Hfq binds directly to the 5’ part of the transposase transcript and blocks access to the RBS[32].
In spite of its very low transposition activity, an increase in IS200 copy number was observed during strain storage in stab cultures[1][3]. However, the factors triggering this activity remain unknown[2]. Transient high transposase expression leading to a burst of transposition was proposed to explain the observed high IS200 (>20) copy number in various hosts and in stab cultures [1].
Although regulatory structures similar to that observed in IS200 (Fig. IS200.3) were predicted in IS1541, another member of this group with 85% identity to IS200, this element can be detected in higher copy number (> 50) in Salmonella and Yersinia genomes. However, no detailed analysis of its transposition is available and since no de novo insertions have been experimentally documented and chromosomal copies appear stable in Y. pestis[33], it remains possible that IS1541 also behaves like IS200.
However, the regulatory structures are not systematically present in other IS200 group members and understanding of the control of transposase synthesis requires further study.
The IS605 group
IS605 group members are generally longer (1.6-1.8 kb) due to the presence of a second orf, tnpB, in addition to tnpA. Alignment of TnpA copies from this group indicated that although they do not form a separate clade from the IS200 group TnpA, they generally carry the short C-terminal tail. The tnpA and tnpB orfs exhibit various configurations with respect to each other. They may be divergent (Fig. IS200.1 i top: e.g. IS605, IS606) or expressed in the same direction with tnpA upstream of tnpB. In these latter cases, the orfs may be partially overlapping (Fig. IS200.1 ii; e.g. IS608, ISDra2) or separate (Fig. IS200.1 iii; e.g. ISSCpe2, ISEfa4). tnpB is also sometimes associated with another transposase, a member of the S-transposases (e.g. IS607[12][34], see [15]). TnpB was not required for transposition of either IS608 or ISDra2.
Three related IS, IS605, IS606 and IS608 (Fig. IS200.1), have been identified in numerous strains of the gastric pathogen Helicobacter pylori [11][14]. IS605 is involved in genomic rearrangements in various H. pylori isolates[35].
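The tnpA/tnpB arrangements listed above (divergent, same direction and overlapping, same direction and separate) can be told apart from orf coordinates and strands. The following is a minimal sketch under stated assumptions: the `Orf` record, the `configuration` function and all coordinates are hypothetical and are not taken from any annotated element.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    start: int   # leftmost coordinate (1-based)
    end: int     # rightmost coordinate (inclusive)
    strand: str  # "+" or "-"


def configuration(tnpA: Orf, tnpB: Orf) -> str:
    """Classify the relative arrangement of tnpA and tnpB."""
    if tnpA.strand != tnpB.strand:
        # Divergent: the "-" strand orf lies to the left of the "+" strand orf,
        # so the two are transcribed away from each other.
        minus, plus = (tnpA, tnpB) if tnpA.strand == "-" else (tnpB, tnpA)
        return "divergent" if minus.end < plus.start else "other"
    if tnpA.strand == "+":
        a_upstream = tnpA.start < tnpB.start
        overlap = tnpA.end >= tnpB.start
    else:
        a_upstream = tnpA.end > tnpB.end
        overlap = tnpA.start <= tnpB.end
    if not a_upstream:
        return "other"
    return "overlapping" if overlap else "separate"


# Hypothetical coordinates chosen only to exercise each branch
print(configuration(Orf(100, 500, "-"), Orf(600, 1800, "+")))  # divergent
print(configuration(Orf(100, 560, "+"), Orf(550, 1700, "+")))  # overlapping
print(configuration(Orf(100, 500, "+"), Orf(600, 1700, "+")))  # separate
```

Arrangements not described in the text, such as convergent orfs, are simply reported as "other".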
The H. pylori elements transpose in E. coli at detectable frequencies in a standard "mating-out" assay using a derivative of the conjugative F plasmid as a target [11][14].

The two best characterized members of this family are IS608 and the closely related ISDra2 from Deinococcus radiodurans. Both have overlapping tnpA and tnpB genes (Fig. IS200.1 ii). Like other family members, insertion is sequence-specific: IS608 inserts in a specific orientation with its left end 3’ to the tetranucleotide TTAC both in vivo and in vitro[14] while ISDra2 inserts 3’ to the pentanucleotide TTGAT[38]. Interestingly ISDra2 transposition in its highly radiation resistant Deinococcal host is strongly induced by irradiation[39] (Single strand DNA in vivo). Their detailed transposition pathway has been deciphered by a combination of in vivo studies and in vitro biochemical and structural approaches (Mechanism of IS200/IS605 single strand DNA transposition).
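The sequence specificity described above can be illustrated by scanning a single strand for the target tetra- or pentanucleotide and reporting the position immediately 3’ to each occurrence. This is a minimal sketch: the function name `insertion_points` and the toy strand are assumptions made for illustration, and only the strand supplied is examined.

```python
# Target sequences as given in the text
TARGETS = {"IS608": "TTAC", "ISDra2": "TTGAT"}


def insertion_points(top_strand: str, element: str) -> list[int]:
    """Return 0-based indices immediately 3' to each target occurrence."""
    target = TARGETS[element]
    strand = top_strand.upper()
    points = []
    start = strand.find(target)
    while start != -1:
        points.append(start + len(target))
        start = strand.find(target, start + 1)
    return points


toy = "GGTTACCATTGATCCTTACA"  # arbitrary toy sequence
print(insertion_points(toy, "IS608"))   # [6, 19]
print(insertion_points(toy, "ISDra2"))  # [13]
```

Each reported index is the position at which the left end of the element would abut the target sequence on that strand.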
A more detailed and recent analysis of the distribution of 107 IS605 group elements in ISfinder is shown in Fig. IS200.4B [36]. The tree, based on TnpB sequences could be divided into 8 clusters which are overlaid onto the universal tree described by Hug et al., 2016 [37].
The IS1341 group
Elements of the third group, IS1341, are devoid of tnpA and carry only tnpB (Fig. IS200.1 v). The IS occurs in three copies in Thermophilic bacterium PS3 [8]. Multiple presumed full-length elements (including tnpA and tnpB) and closely related copies have been identified in other bacteria such as Geobacillus. On the other hand, IS891 from the cyanobacterium Anabaena is present in multiple copies on the chromosome and is thought to be mobile since a copy was observed to have inserted into a plasmid introduced in the strain[9].
Another isolated tnpB-related gene, gipA, present in the Salmonella Gifsy-1 prophage may be a virulence factor since a gipA null mutation compromised Salmonella survival in a Peyer's patch assay [40]. While no mobility function has been suggested for gipA, it is indeed bordered by structures characteristic of IS200/IS605 family ends and closely related to E. coli ISEc42.
In spite of their presence in multiple copies, it is still unclear whether IS1341 group members are autonomous IS or products of IS605 group degradation and require TnpA supplied from a related IS in the same cell for transposition.
IS decay
Circumstantial evidence based on analysis of the ISfinder database suggests that IS carrying both tnpA and tnpB genes may be unstable. Thus, although members of the IS200 group are often present in high copy number in their host genomes, intact full-length IS605 group members are invariably found in low copy number (P. Siguier, unpublished) (See also TnpB). On the other hand, various truncated IS605 group derivatives appear quite frequently (Fig. IS200.slide show 1, slide show 2, slide show 3, slide show 4, and slide show 5).
These forms seem to result from successive internal deletions and retain intact LE and RE copies. Sometimes, as in the case of ISSoc3 (slide show 3), orf inactivation appears to have occurred by successive insertion/deletion of short sequences (indels) generating frameshifts and truncated proteins. For some IS (e.g. ISCco1, ISTel2, ISCysp14, ISSoc3) degradation can be precisely reconstituted and each successive step validated by the presence of several identical copies (P. Siguier, unpublished - Fig. IS200.slide show 1, slide show 2, slide show 3, slide show 4, and slide show 5, respectively). This suggests that the degradation process is recent and that these derivatives are likely mobilized by TnpA supplied in trans by autonomous copies in the genome.
Among the approximately 400 IS200/IS605 family entries in ISfinder (December 2023), there are more than 200 examples of IS1341-like derivatives. It was suggested that the IS1341-like derivatives might undergo transposition using a resident tnpA gene to supply a Y1 transposase in trans. There is some circumstantial evidence for transposition of IS1341-like elements. For example, IS891, present in multiple copies in the cyanobacterium Anabaena sp. strain M-131 genome [9], was observed to have inserted into a plasmid which had been introduced into the strain and more recently it has been shown experimentally that IS1341 derivatives can be mobilized by a resident tnpA gene [41] (see The IS1341 Conundrum). This can be followed from a full length IS to the formation of MITEs (e.g. ISTel2; slide show 4) and MICs (e.g. ISTel3; slide show 5).
ISC: A group of Elements Related to the IS605 Group
Another group of potential IS of similar organisation, the ISC insertion sequence group, was defined by Kapitonov et al.[42] following identification of Cas9 homologues which occur outside the CRISPR structure, so called “stand-alone” homologues. While related to TnpB, they are more similar to Cas9 than to TnpB proteins. These genes were often flanked by short DNA sequences which, like LE and RE of the IS200/IS605 family, were capable of forming secondary structures. Moreover, it was reported that the ends of many ISC derivatives showed significant identity to those of IS605 derivatives identified by these authors in the same study (Fig. IS200.5). These structures therefore resemble the IS1341-like group.

These potential transposable elements were called ISC (Insertion Sequences Encoding Cas9; not to be confused with ISCR, IS with Common Region). The name IscB was coined for the Cas9-like protein and IscA for an associated potential transposase protein which was identified in a very limited number of cases. Examples of ISC elements with both iscA and iscB genes are quite rare. Only 7 cases were identified by Kapitonov et al.[42] (Fig. IS200.6) and only 56 of 2811 iscB examples observed in a more extensive analysis were accompanied by an iscA copy [43]. Most ISC identified were IS1341-like with only the iscB (tnpB-like) gene. Stand-alone IscB copies were identified in a large number of bacterial and archaeal genomes, generally in low numbers (<10 copies), although some genomes contained more (e.g. 22 in Methanosarcina lacustris; 25 in Coleofasciculus chthonoplastes PCC 7420; 52 in Ktedonobacter racemifer)[42].
However, in contrast to the observations of Kapitonov et al.,[42] more wide-ranging studies [43] identified rare IscB proteins which were not “stand alone” but were associated with CRISPR arrays (31 examples in a sample of 2811).
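To put the counts quoted above in proportion, the short calculation below converts them to percentages of the 2811 iscB examples. The variable names are illustrative assumptions; the numbers are those given in the text.

```python
total_iscB = 2811   # iscB examples in the more extensive analysis [43]
with_iscA = 56      # accompanied by an iscA copy
with_crispr = 31    # associated with CRISPR arrays

frac_iscA = with_iscA / total_iscB
frac_crispr = with_crispr / total_iscB

print(f"iscB with iscA:         {frac_iscA:.1%}")    # 2.0%
print(f"iscB with CRISPR array: {frac_crispr:.1%}")  # 1.1%
```

Both associations therefore concern roughly one to two percent of the iscB examples in that sample.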
A tree of “full-length” elements (Fig. IS200.6; [42])(i.e. those with both tnpA and tnpB or iscB genes) based on TnpA/IscA sequences showed that full length IS605 and ISC examples carrying both tnpA/iscA and tnpB/iscB are interleaved. IS605 is among those family members with divergent tnpA and tnpB genes (Fig. IS200.1) while other family members carry tnpA upstream of tnpB (e.g. ISDra2). However, in contrast to all IS605-like derivatives, those full length ISC elements included in this tree all have the iscA gene downstream of and slightly overlapping with iscB.

ISC have very similar transposases to those of the IS200/IS605 family and are therefore part of the same super family.
An alignment of full length TnpA from the IS200/IS605 group (Fig. IS200.7; ISfinder November 2021) shows the highly conserved HuH triad, catalytic tyrosine (Y) and important glutamine (Q) residues all central to the transposition chemistry (Fig. IS200.7, Fig. IS200.11 and Fig. IS200.12) together with a number of other highly conserved amino acid positions. An alignment with the available IscA from the ISC group (Fig. IS200.8 Top) shows that these also include all the highly conserved TnpA amino acid positions and are therefore very closely related to TnpA. However, the IscA and TnpA proteins appear to fall into separate clades (Fig. IS200.8 bottom) with some overlap.

Since IS families are defined by their transposases rather than their accessory genes, and those of ISC and the IS200/IS605 family are so similar, it seems reasonable to include the ISC group as a subgroup of the IS200/IS605 family (or IS605 super family [42]). For many of the archaeal elements, there is a small potential peptide of 40-45 amino acids located upstream of the TnpB analogue.

A tree based on the TnpB/IscB (Fig. IS200.9) examples presented by Kapitonov, et al.,[42] shows that the TnpB homologues form a clade separate from IscB and that the latter can be divided into two clades, IscB1 and IscB2.
These considerations therefore reinforce the idea that the IS200/IS605 family and ISC group might be considered as a superfamily which includes a number of related accessory genes (tnpB, iscB1, iscB2 etc), which carry flanking DNA sequences with secondary structure potential and in which a Y1 HuH transposase assures the chemistry of transposition. A similar conclusion was also reached by Altae-Tran et al.[43]. However, this picture is complicated by the identification of another group of transposable elements, the IS607 family, in which tnpB is associated with a different type of transposase, in this case a serine site-specific recombinase (IS607 family).

Mechanism of IS200/IS605 single strand DNA transposition
Early models
A number of alternative mechanisms were initially proposed to explain IS608 transposition [20] (Fig. IS200.10). These all included the insertion of a double-strand circular transposon copy (Fig. IS200.10 D). One model (Fig. IS200.10 A) envisaged that simultaneous or consecutive cleavage at LE and RE and reciprocal strand transfer would generate a Holliday junction (HJ) which then could be resolved into double-strand circular copies of the transposon. The second (Fig. IS200.10 B) involved cleavage at LE and replicative strand displacement using a 3’OH of the flanking donor DNA. This could assist formation of a single strand region accessible for cleavage of RE to generate a single-strand transposon circle which could be replicated into a double-strand copy. The third (Fig. IS200.10 C) proposed cleavage at LE with displacement of the transposon strand to form a single strand loop. Subsequent in vitro and in vivo experiments (below) demonstrated that not only was IS608 capable of excision as a single-strand DNA circle but that this could be inserted into a single strand target.

General transposition pathway
The transposition pathway of IS200/IS605 family members is shown in Fig. IS200.11. Much of the biochemistry was elucidated using an IS608 cell-free in vitro system which recapitulates each step of the reaction. This requires purified TnpAIS608 protein, single strand IS608 DNA substrates and divalent metal ions such as Mg2+ or Mn2+ [20][21][22]. Similar and complementary results were also obtained with ISDra2[23][24][25]. The reactions are not only strictly dependent on single strand (ss) DNA substrates but are also strand-specific: only the “top” strand (defined as the strand carrying target sequence, TS, 5’ to the IS; Fig. IS200.11 top) is recognized and processed whereas the “bottom” strand is refractory[20] [21]. Cleavage of the top strand at the left and right cleavage sites (TS/CL and CR, note that TS is also the left cleavage site CL) (Fig. IS200.11 B) leads to excision as a circular ssDNA intermediate with abutted left and right ends (transposon joint) (Fig. IS200.11 C bottom left). This is accompanied by rejoining of the DNA originally flanking the excised strand (donor joint).

The transposon joint is then cleaved (Fig. IS200.11 E bottom right) and integrated into a single strand conserved element-specific target sequence (TS) where the left end invariably inserts 3’ to TS (Fig. IS200.11 F). This target specificity is another unusual feature of IS200/IS605 transposition. The target sequence is characteristic of the particular family member and, although it is not part of the IS, it is essential for further transposition because it is also the left end cleavage site CL of the inserted IS [20] (The Single strand Transpososome and Cleavage site recognition) and is therefore intimately involved in the transposition mechanism.
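The excision and integration steps described above can be followed on plain strings standing for the top strand. The sketch below is a toy model under stated assumptions: the functions `excise` and `integrate` and all sequences are hypothetical, the circular intermediate is represented by a linear string whose two ends are understood to be abutted, and CR is located as the first occurrence after the left cut, whereas the real enzyme locates its cleavage sites through the subterminal hairpins and guide sequences. The IS608 sites TTAC (TS/CL) and TCAA (CR) are taken from the text.

```python
def excise(donor_top: str, ts: str, cr: str) -> tuple[str, str]:
    """Cut 3' to TS/CL and 3' to CR; return (donor joint, transposon)."""
    ts_at = donor_top.find(ts)
    if ts_at == -1:
        raise ValueError("no TS/CL found on this strand")
    left_cut = ts_at + len(ts)            # TS stays with the donor flank
    cr_at = donor_top.find(cr, left_cut)  # toy rule: first CR after the cut
    if cr_at == -1:
        raise ValueError("no CR found 3' to TS/CL")
    right_cut = cr_at + len(cr)           # CR is part of the IS
    transposon = donor_top[left_cut:right_cut]
    donor_joint = donor_top[:left_cut] + donor_top[right_cut:]
    return donor_joint, transposon


def integrate(target_top: str, ts: str, transposon: str) -> str:
    """Insert the transposon immediately 3' to TS on the target strand."""
    ts_at = target_top.find(ts)
    if ts_at == -1:
        raise ValueError("no TS found on the target strand")
    cut = ts_at + len(ts)
    return target_top[:cut] + transposon + target_top[cut:]


TS, CR = "TTAC", "TCAA"
toy_is = "AAAGCGCGGAATTCAA"            # arbitrary toy element ending in CR
donor = "GG" + TS + toy_is + "CCGT"
target = "ACGTTTACGGA"

donor_joint, transposon = excise(donor, TS, CR)
product = integrate(target, TS, transposon)

# No nucleotides are gained or lost at either step
assert len(donor_joint) + len(transposon) == len(donor)
assert len(product) == len(target) + len(transposon)
print(donor_joint, transposon, product)
```

Calling `excise(product, TS, CR)` on the integration product returns the original target and the same transposon, which mirrors the point made above that TS doubles as the left cleavage site CL of the inserted copy.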
TnpA, Y1 transposases and transposition chemistry
IS200/IS605 family transposases belong to the HUH enzyme superfamily. All contain a conserved amino-acid triad composed of Histidine (H)-bulky hydrophobic residue (U)-Histidine (H)[44] providing two of three ligands required for coordination of a divalent metal ion that localizes and prepares the scissile phosphate for nucleophilic attack. HUH proteins catalyze ssDNA breakage and joining with a unique mechanism. They all catalyse DNA strand cleavage using a transitory covalent 5' phosphotyrosine enzyme-substrate intermediate and release a 3' OH group [17] (Groups with HUH Enzymes; Fig.7.5).
The HUH enzyme family also includes other transposases of the IS91/ISCR and Helitron families as well as proteins involved in DNA transactions essential for plasmid/virus rolling circle replication (Rep; not to be confused with the TnpAREP/REP system described in Domestication) and plasmid conjugation (Mob/relaxase) (Groups with HUH Enzymes; Fig.7.5).
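The Histidine-bulky hydrophobic residue-Histidine triad described above can be searched for in a protein sequence with a simple pattern. This is a minimal sketch: the set of residues counted as "bulky hydrophobic" (V, I, L, M, F, W, Y) is an assumption made here for illustration, and the function name `find_huh` and the toy peptide are hypothetical. A match indicates only a candidate motif, not a functional HUH enzyme.

```python
import re

# Assumed definition of "bulky hydrophobic" (U); adjust as needed
BULKY_HYDROPHOBIC = "VILMFWY"

# Lookahead so that overlapping candidates (e.g. HVHIH) are all reported
HUH = re.compile(rf"(?=H[{BULKY_HYDROPHOBIC}]H)")


def find_huh(protein: str) -> list[int]:
    """Return 1-based positions of the first H of each candidate HUH triad."""
    return [m.start() + 1 for m in HUH.finditer(protein.upper())]


print(find_huh("MSTHVHLLAKY"))  # [4] for this toy peptide
```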
IS200/IS605 transposases are single-domain proteins containing a single catalytic tyrosine residue and are called Y1 transposases. They use the tyrosine residue (Y127 for IS608) as a nucleophile to attack the phosphodiester link at the cleavage sites (vertical arrows in Fig. IS200.11 A and D). Since cleavages at both IS ends occur on the same strand, the polarity of the reaction implies that the enzyme forms a covalent 5’-phosphotyrosine bond with the IS at LE producing a 3’-OH on the DNA flank and a 5’-phosphotyrosine bond at the RE flank producing a 3’-OH on RE itself (Fig. IS200.11 B). The released 3′-OH groups then act as nucleophiles to attack the appropriate phospho-tyrosine bond resealing the DNA backbone in one case and generating a single-strand DNA transposon circle in the other (Fig. IS200.11 C). The same polarity is applied to the integration step (Fig. IS200.11 D, E and F). As an important mechanistic consequence of this chemistry, IS200/IS605 transposition occurs without loss or gain of nucleotides. In vitro, the reaction requires only TnpA and does not require host cell factors.
TnpA overall structure
Crystal structures of Y1 transposases have been determined for three family members: IS608 (TnpAIS608) from Helicobacter pylori [19][22], ISDra2 (TnpAISDra2) from Deinococcus radiodurans [25] and ISC1474 from Sulfolobus solfataricus[45]. In contrast to most characterised HUH enzymes, which are usually monomeric and have two catalytic tyrosines, Y1 transposases form obligatory dimers with two active sites (Fig. IS200.12 A). The two monomers dimerize by merging their β-sheets into one large central β-sheet sandwiched between α-helices. Each catalytic site is constituted by the HUH motif from one TnpA monomer (H64 and H66 in the case of TnpAIS608) and a catalytic tyrosine residue (Y127) located in the C-terminal αD helix tail of the other monomer (Fig. IS200.12 A). This is joined to the body of the protein by a flexible loop (trans configuration, Active site assembly and Catalytic activation and Transposition cycle: the trans/cis rotational model).

The TnpA enzyme active sites are believed to adopt two functionally important conformations: the trans configuration described above (Fig. IS200.12 A), in which each active site is composed of the HUH motif supplied by one monomer with the tyrosine residue supplied by the other, and the cis configuration, in which both motifs are contributed by the same monomer (IS200/IS605 video 1 below; kindly supplied by O. Barabas and Fred Dyda).
The trans conformation is active during cleavage where Tyrosine acts as nucleophile whereas the cis conformation is thought to function during strand transfer where the 3’OH is the attacking nucleophile (Transposition cycle: the trans/cis rotational model). Only the trans configuration of TnpAIS608 and TnpAISDra2 has yet been observed crystallographically [19][25] but the existence of the cis configuration is supported by biochemical data [46].
The Single strand Transpososome
The key machinery for transposition is the higher-order protein-DNA complex, the transpososome (or synaptic complex), which contains both transposase and two IS DNA ends with or without target DNA. Transpososome formation, stability, and the temporal changes in configuration which occur during the transposition cycle have been characterized for TnpAIS608 by crystallographic and biochemical approaches.
Although for technical reasons it was not possible to obtain structures with both LE and RE hairpins together, co-crystal structures with either LE or RE showed that a TnpA dimer binds two subterminal DNA hairpins suggesting that it could bind both LE and RE ends simultaneously. Binding sites for the hairpins are located on the same face of the TnpA dimer while the two catalytic sites are formed on the opposite surface (Fig. IS200.6 A and B) (IS200/IS605 video 2 below; kindly supplied by O. Barabas and Fred Dyda). The hairpin forms a distorted helix anchored by base interactions at the foot (IS200/IS605 video 2 below; kindly supplied by O. Barabas and Fred Dyda).
Substrate recognition
A key feature of TnpA is that it is only active on one strand, the “top” strand. The IS608 and ISDra2 ends carry subterminal imperfect hairpins. In addition to specific sequences on the loops, the irregularities on the hairpins help the enzyme to distinguish between “top” and “bottom” strands [19][25]. The initial co-crystal structure was obtained with TnpAIS608 and a 22nt imperfect RE hairpin (HP22) including its characteristic extrahelical T17 located mid-way along the DNA stem (Fig. IS200.12 and Fig. IS200.13). In addition to a number of backbone contacts with HP22, TnpAIS608 also shows several base-specific contacts, in particular with T10 in the loop and the extrahelical T17[19] (Fig. IS200.12 B).
Exchange of T10 and neighboring T nucleotides in the loop abolished binding whereas the exchange of T17 for an A significantly reduced but did not eliminate binding [47]. Similar studies with TnpAISDra2 showed that it also recognises a similarly located T in the hairpin loop of ISDra2 and that this is essential for binding [25]. Instead of an extrahelical T, ISDra2 LE and RE include a bulge caused by two mismatched nucleotides (G and T) in the hairpin stem. These unpaired nucleotides are specifically recognized and stabilized by the protein. Again, mutation of the T (to C which, in this case, eliminates the bulge to generate a GC base pair in the stem) greatly reduces binding (IS200/IS605 video 3A below; kindly supplied by O. Barabas and Fred Dyda).
Although most members of the IS605 group, which includes IS608 and ISDra2, have imperfect palindromes with extrahelical bases or bulges, some members of the IS200 group (e.g. IS200, IS1541) include perfect hairpins. Whether base-specific interactions with the loop sequence are exclusively responsible for strand-specific activity of the corresponding transposase remains to be clarified.
Cleavage site recognition
The left (CL/TS) and right (CR) IS608 cleavage sites (TTACl and TCAAl respectively, where l represents the point of cleavage) are located some distance from the subterminal recognition hairpins (19 nt at LE and 10 nt at RE) (Fig. IS200.13). The system is asymmetric because the two distinct cleavage sites are separated from the hairpins by linkers of different lengths and the CL/TS sequence does not form part of the IS while CR does.

Structural studies revealed that the cleavage sites are recognized in a unique way that does not involve direct sequence recognition by TnpA. Instead, an internal part of the IS sequence is co-opted to recognize different cleavage sites allowing TnpA to catalyze both excision and integration of the element with a single DNA binding domain.
Internal transposon sequences, the left (GL) and right (GR) tetranucleotide guide sequences, AAAG and GAAT, located 5’ to the foot of the hairpins (Fig. IS200.7), recognize their respective cleavage sites by direct base interactions. These GL/CL and GR/CR interactions involve 3 of the 4 nt of GL and GR. They include both canonical Watson-Crick interactions and in the case of RE, non-canonical interactions resulting in base triplets (Fig. IS200.13 and Fig. IS200.14, bases joined by both regular and dotted lines respectively). In the case of LE and the transposon joint, base triples (dotted lines) are suggested from biochemical data [47] (IS200/IS605 video 3B below; kindly supplied by O. Barabas and Fred Dyda).

These interactions place the scissile phosphate precisely into the two active sites of TnpAIS608 for nucleophilic attack by the catalytic Y127. Interestingly, the base-pairing patterns responsible for cleavage site recognition are similar at LE, RE and the target site in spite of sequence differences (Fig. IS200.13, Fig. IS200.14, Fig. IS200.15). Since TS is also CL, this type of recognition not only explains the requirement for the TS located at the left end of the inserted IS (Fig. IS200.11, Fig. IS200.15) for further transposition, but also the target specificity. Upon integration, TS is presumably recognized by the GL present on the excised transposon joint. Note that the transposon joint contains only the LE guide sequence GL but not the LE cleavage site CL (Fig. IS200.11, Fig. IS200.15).

Similar crystal structures were obtained with TnpAISDra2 (see also Single strand DNA in vivo) with a similar interaction network between the guide sequences and cleavage sites.
The ISDra2 transpososome is structurally very similar to that of IS608 despite only 34% sequence identity between the transposases. It is important to note that the target sequence in ISDra2 is a pentanucleotide instead of a tetranucleotide as in IS608. The fifth nucleotide in the ISDra2 sequence is, however, not involved in DNA-DNA interactions but in a DNA-protein interaction [25].
The potential cleavage site recognition mode (i.e. the canonical interaction network between CL,R and GL,R) is indeed well conserved throughout the family (Fig. IS200.16).

This model has been validated in vitro and in vivo by showing that it is possible to modify cleavage sites by changing corresponding guide sequences. Moreover, in the case of IS608, modifications of GL in the transposon joint generate predictable changes in insertion site-specificity of the element [48]. The IS608 recognition system has also been modified to include additional sequences which assist more specific targeting of insertions[49].
Active site assembly and Catalytic activation
Comparison of crystal structures of different TnpA protein-DNA complexes [19][22] [45] revealed TnpA in both active and inactive configurations. In both the free TnpAIS608 dimer and TnpAIS608-DNA complexes bound to a “minimal” HP22 hairpin (which does not include the guide sequence), the catalytic tyrosine residue (Y127) points away from the HUH motif (H64 and H66) and therefore cannot act as a nucleophile [19] (Fig. IS200.11).
The enzyme is therefore in an inactive conformation. Binding to the appropriate substrate containing the 4 nucleotide guide sequence 5’ to the hairpin foot (compare Fig. IS200.17 left and right) triggers a change in TnpA configuration that permits assembly of functional active sites. A single A (A+18, Fig. IS200.13) in the guide sequence present in both GL and GR does not participate in base interactions with the cleavage site. On formation of the CL(R)/GL(R) base interaction network, this single base penetrates the structure and forces the C-terminal αD helix carrying Y127 closer to the HUH motif, placing it in the correct position for catalysis [22] (compare Fig. IS200.17 left and right; Fig. IS200.18) (IS200/IS605 video 4 below; kindly supplied by O. Barabas and Fred Dyda).
This movement also places a third amino acid (Q131 located at the C-terminal end of helix αD on the same face as Y127) in a position enabling it to function in conjunction with both H residues to complete the metal ion binding pocket. This movement is made possible by the fact that the αD helix is attached to the protein body by a flexible loop. This conformational change involving αD helix movement will be discussed below (Transposition cycle: the trans/cis rotational model).


Transpososome assembly and stability
Excision requires the assembly of a transpososome containing both LE and RE. However, it is technically difficult to generate crystallographically pure complexes of this type. Only crystal structures containing two LE or two RE were obtained. The excision transpososome was initially modelled using information obtained from the IS608LE-TnpA and RE-TnpA structures [22] (Fig. IS200.12 B; Fig. IS200.19). However, complexes containing both LE and RE have now been identified using a band shift assay and characterized biochemically [47].

A TnpA co-complex with either LE or RE can be titrated by the addition of increasing quantities of the other end (RE or LE) to obtain a transpososome containing both LE and RE. This can be easily detected in a gel shift assay. Such species proved to be catalytically active since, when they were excised from the gel and incubated with the essential divalent metal ion, robust reaction products could be detected in a denaturing gel [47].
This approach was used to monitor both transpososome formation and stability using oligonucleotides carrying point mutations in GL,R and CL,R. Robust transpososome formation and cleavage activity require much of the network of GL,R and CL,R interactions observed in the crystal structures [47] (schematised in Fig. IS200.13). Although base triplets in the original LE co-crystal structure were not detected since the LE substrate was too short [22], the biochemical data suggested that such interactions probably exist (grey dotted lines in Fig. IS200.13).
For example, the two nucleotides 3’ to the foot of the LE hairpin (at equivalent positions to triplet forming bases in RE, Fig. IS200.13) are required for robust synaptic complex formation and cleavage [47]. This further implies that these base triplets might also be involved in target DNA capture (grey dotted lines in Fig. IS200.15).
Base changes in GL resulted in a predictable choice of target sequence [48]. However, large differences in insertion frequencies were observed. The influence of the presumed non-canonical interactions in LE would provide an explanation for this variability since these were not taken into account in the choice of LE guide sequence.
In both IS608 and ISDra2, the extra-helical bases in the hairpin stem and nucleotides in the loop are also important for transpososome formation even in a context which includes both GL,R and CL,R[25][47].
Transposition cycle: the trans/cis rotational model
Transpososome assembly is followed by two critical chemical steps: cleavage and strand transfer. These are thought to be accomplished by a series of large changes in transpososome configuration. A detailed model has been proposed for the dynamics of the IS608 transpososome during the transposition reactions[22] (Fig. IS200.19; IS200/IS605 video 1). As described in TnpA overall structure (above), TnpAIS608 could in principle assume two configurations: trans and cis. Switching between these two states would involve rotation of the two unconstrained flexible arms which join the αD helix to the protein body.
The current model for IS608 and ISDra2 transposition proposes that the strand transfer step involves rotation of these arms from the trans to the cis configuration: cleavage occurs while the enzyme is in the trans configuration. A trans to cis conformational change then occurs allowing strand transfer. The ground state of the IS608 and ISDra2 transpososomes obtained from crystallography is the trans configuration. LE and RE binding and cleavage occur with the enzyme in its trans configuration (Fig. IS200.19; IS200/IS605 video 1).
This results in the formation of a 5’ phosphotyrosine bond with LE, liberating a 3’-OH on the flanking DNA, and of a 5’ phosphotyrosine bond with the RE DNA flank, liberating a 3’-OH on the RE transposon end. Rotation of the two arms would displace LE towards the sequestered 3’-OH of RE and the RE flank towards the 3’-OH of the LE flank (Fig. IS200.19; IS200/IS605 video 1) and position them so that both 3’-OH can attack the appropriate phosphodiester bond. This model is supported by several lines of indirect evidence from studies of IS608.
An initial piece of evidence concerns the length differences in the LE and RE “linker” (the distance between the hairpin foot and the cleavage site): this is only 10 nt for RE but 19 nt for LE (Fig. IS200.15). The rotation model suggests that the longer LE linker may be required to provide sufficient length to rotate the 5’ LE phosphotyrosine bond and position it close to the immobile RE 3’-OH (Fig. IS200.19; IS200/IS605 video 1). This would imply that LE linker length is critical for strand transfer. Indeed, sequential reduction in the length of the LE linker has a large effect on transposition frequency and excision in vivo. In vitro, it also had a somewhat larger effect on strand transfer than on cleavage [47], supporting the idea that the linker is important for mechanical movement.
However, transpososome formation and stability were also observed to be affected with the shortest linkers. This presumably reflects steric barriers to GL(R)/CL(R) interaction and supports the notion that these interactions are important in transpososome assembly. A survey of over 100 different IS from all three groups (35 from the IS200 group; 47 from IS605 and 24 from IS1341) in the public databases has shown that the asymmetry of the IS608 ends is conserved across the entire family: the left linker is always longer than the right (15-16 nt versus 8 nt) [46] (Fig. IS200.20).

The second piece of evidence comes from the behaviour of TnpAIS608 heterodimers carrying point mutations in the HUH motif or the catalytic Y. These were expressed and assembled in vivo and purified based on two different C-terminal affinity tags (one for each monomer). This permitted heterodimers to be distinguished from homodimers. A heterodimer with a combination of mutations that enforce a trans-active TnpA site (in which the wildtype HUH motif and Y127 belong to different TnpA monomers) is proficient for cleavage but not for rejoining. In contrast, a heterodimer with a cis-active TnpA site (in which the wildtype HUH motif and Y127 belong to the same TnpA monomer) is proficient for rejoining but inactive in cleavage [46].
This implies that all chemical reactions involved in cleavage occur in the trans site while the chemical reactions for strand transfer occur in the cis site. This strongly supports the rotational model.
A third piece of evidence comes from studies of the flexible arm that joins helix αD to the body of the protein and which is proposed to play a pivotal role in the rotation. This flexibility may be facilitated by two glycine residues (G117 and G118). Mutation of these two residues did not affect strand cleavage but led to inhibition of strand transfer suggesting that the two residues are required for achieving a cis configuration. The importance of these G residues is reflected in their conservation throughout the family [46].
Thus, while the cis configuration has not been observed crystallographically for these elements, its existence is strongly suggested by experimental data, supporting the trans/cis rotational model (Fig. IS200.21).

Regulation of single strand transposition
Single strand DNA in vivo
The obligatory single-stranded nature of IS200/IS605 transposition in vitro suggests that it is limited in vivo by the availability of its ssDNA substrates inside the cells and processes that produce ssDNA may stimulate transposition. We describe below a link between the transposition of these elements and the replication fork. Moreover, in the case of ISDra2, single strand DNA produced during re-assembly of the D. radiodurans genome following irradiation results in stimulation of transposition[23][50]. Transcription or other processes leading to horizontal gene transfer such as transformation, conjugative transfer, or transduction with single strand phages might also favor their mobility.
Replication fork
The replication fork modulates the transposition of many transposable elements (Tn7, IS903, IS10, IS50, Tn4430, P element) [51][52][53][54][55][56]. For IS200/IS605 family members, the replication fork, in particular the lagging strand template, is an important source of ssDNA substrates for both excision and integration. Transposition can be considered to follow a “Peel and Paste” mechanism (Fig. IS200.22) where the IS excises or is “peeled” off as a single strand circle from the lagging strand template of the donor molecule and then integrates or is “pasted” in a ss target at the replication fork.

Excision: Excision of IS608 is sensitive to the direction of replication across the element: it is more frequent when the active strand (top strand) is on the lagging strand (discontinuous) template (Fig. IS200.22 top; Fig. IS200.23) but difficult to detect when it is on the leading (continuous) strand [24]. Moreover, excision in vitro requires that both ends are in single strand form at the same time[20].

The length of ssDNA on the lagging-strand template depends on the initiation frequency of Okazaki fragment synthesis by the DnaG primase[57][58]. Transient inactivation of DnaG reduces this frequency and therefore increases the average length of ssDNA between Okazaki fragments; under these conditions, the IS608 excision frequency increased. In E. coli carrying a dnaGts mutation grown under permissive conditions, a plasmid-based assay with IS608 derivatives of different lengths showed that the excision frequency decreased strongly as IS length increased. In contrast, when DnaGts activity was reduced by growth under sub-lethal conditions, excision showed a much less pronounced length-dependence (Fig. IS200.24). This length-dependence might also contribute to the difference in copy numbers observed in the IS200 and IS605 groups (see "Distribution and Organization").
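
The qualitative relationship between ssDNA length and IS length can be illustrated with a toy calculation. The sketch below is not taken from the cited work: it assumes, purely for illustration, that single-stranded stretches on the lagging-strand template have exponentially distributed lengths and that excision requires the whole element to lie within one such stretch, so that the chance of an element of length L being fully single-stranded falls off as exp(-L / mean stretch length). The function names, IS lengths and mean stretch lengths are hypothetical values chosen only to show the trend.

```python
import math


def relative_excision(is_length_nt, mean_ss_stretch_nt):
    """Toy probability that an IS of the given length is entirely single-stranded.

    Assumes (illustration only) exponentially distributed ssDNA stretch lengths.
    """
    return math.exp(-is_length_nt / mean_ss_stretch_nt)


def length_dependence(short_nt, long_nt, mean_ss_stretch_nt):
    """Ratio of long-IS to short-IS excision; values nearer 1 mean weaker length-dependence."""
    return (relative_excision(long_nt, mean_ss_stretch_nt)
            / relative_excision(short_nt, mean_ss_stretch_nt))


if __name__ == "__main__":
    # Hypothetical mean ssDNA stretch lengths: shorter (normal primase activity)
    # versus longer (reduced primase activity).
    for mean_stretch in (1000, 4000):
        ratio = length_dependence(500, 3000, mean_stretch)
        print(f"mean ssDNA stretch {mean_stretch} nt: long/short excision ratio = {ratio:.3f}")
```

Under these assumptions, a longer mean single-stranded stretch brings the ratio closer to 1, which is the direction of the effect reported for reduced DnaG activity; the model says nothing about absolute frequencies.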

Integration: IS608 integration is oriented (with its left end 3’ to a TTAC target site) and it requires an ssDNA target in vitro [14][21]. The close link between transposition and the replication fork is also illustrated by the integration bias, consistent with a preference for an ssDNA target on the lagging strand template (Fig. IS200.22 bottom). This was indeed found to be the case in E. coli for both plasmid and chromosome targets [24]. As expected, the orientation of insertions into the E. coli chromosome was correlated with the direction of replication of each replicore and was consistent with integration into the lagging strand template.
The orientation bias is not restricted to [3]IS608 and ISDra2. An in silico analysis of a large number of bacterial genomes carrying copies of various family members revealed that most had a strong insertional bias consistent with the direction of replication[24] (Fig. IS200.25). Moreover, in certain cases, elements which did not follow the orientation pattern could be correlated to the genomic region that had undergone inversion or displacement (Fig. IS200.26; Fig. IS200.27) suggesting that, once they occur, insertions are quite stable. It seems possible that this type of genomic archaeology based on orientation patterns could be used to complement the study of bacterial genome evolution.
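
The logic of such an orientation analysis can be sketched in a few lines. The example below is a minimal illustration, not the pipeline used in the cited study: it scans a sequence for the IS608 target tetranucleotide TTAC on both strands and asks whether the strand carrying each site is the lagging-strand template, using the standard relationship that the lagging-strand template is the strand whose 5’ to 3’ direction coincides with fork movement. The function names, the 0-based coordinate convention, the `ori`/`ter` parameters and the toy sequence are all assumptions made for the example, and matches spanning the junction of a circular sequence are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_all(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) motif hits."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def find_target_sites(seq, target="TTAC"):
    """List (position, strand) for each target site; positions use forward-strand coordinates."""
    seq = seq.upper()
    plus = [(i, "+") for i in find_all(seq, target)]
    # A site on the '-' strand appears as the reverse complement on the '+' strand.
    minus = [(i, "-") for i in find_all(seq, reverse_complement(target))]
    return sorted(plus + minus)


def lagging_template_strand(pos, ori, ter, genome_len):
    """Strand acting as lagging-strand template at pos on a circular replicon.

    Assumes bidirectional replication from ori to ter. In the replicore running
    from ori towards increasing coordinates the fork moves 'rightwards', so the
    '+' strand is the lagging-strand template; in the other replicore it is '-'.
    """
    clockwise_from_ori = (pos - ori) % genome_len
    ori_to_ter = (ter - ori) % genome_len
    return "+" if clockwise_from_ori < ori_to_ter else "-"


def on_lagging_template(pos, strand, ori, ter, genome_len):
    """True if the strand carrying the site is the lagging-strand template."""
    return strand == lagging_template_strand(pos, ori, ter, genome_len)


if __name__ == "__main__":
    toy = "AATTACGGGGGTAACC"  # hypothetical 16-nt circular replicon
    for pos, strand in find_target_sites(toy):
        flag = on_lagging_template(pos, strand, ori=0, ter=8, genome_len=len(toy))
        label = "lagging-strand template" if flag else "leading-strand template"
        print(pos, strand, label)
```

The same classification applied to the annotated position and strand of resident IS copies, rather than to potential target sites, gives the kind of orientation pattern discussed above.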



Stalled replication forks: Stalled replication forks appear to be preferential targets for IS608 insertion. In experiments using the Tus/ter replication termination system or operator/repressor systems, replication fork arrest attracted IS608 insertion [24]. Transient blockade of the unidirectional replication fork by the Tus protein at the ter site resulted in preferential IS608 insertion into the array of target sequences behind the stalled forks on the lagging strand but not on the leading strand (Fig. IS200.28). A similar result was obtained in the E. coli chromosome using the lacI/lacO and tetR/tetO repressor/operator roadblock systems[59][60] (Fig. IS200.29). Moreover, a significant number of IS608 insertions into the E. coli chromosome were localized in the highly transcribed rrn operons. This suggests that high transcription levels might affect replication fork progression (fork arrest by collision with RNA polymerase, R-loop formation, etc.) and could account for targeting the rrn operons. Thus, IS608 insertions can be targeted to stalled forks and this may well represent a major pathway for targeting transposition.


Genome re-assembly after irradiation in Deinococcus radiodurans
Deinococcus radiodurans, arguably the most radiation-resistant organism known, has a remarkable capacity to survive the lethal effects of DNA-damaging agents, such as ionizing radiation, UV light and desiccation. After exposure to high irradiation doses, the D. radiodurans chromosome which is present in multiple copies per cell[61][62] is shattered and degraded, but can be very rapidly reassembled in a process called ESDSA (Extended Synthesis Dependent Strand Annealing). This involves resection of the multiple dsDNA fragments to generate extensive ssDNA segments, reannealing of complementary DNA and reconstitution of the intact chromosome [39].
Mennecier et al.[50] analyzed the mutational profile in the thyA gene following irradiation. The majority of mutants were due to the insertion of a single IS, ISDra2, which is present in a single copy in the genome of the laboratory D. radiodurans strain. Furthermore, using a tailored genetic system, the efficiency of both ISDra2 excision and insertion was found to increase significantly following host cell irradiation[23]. A PCR-based approach was used to follow irradiation-induced excision of the single genomic ISDra2 copy and re-closure of flanking sequences. Remarkably, these events are temporally closely correlated with the start of ESDSA. The signal that triggers ISDra2 transposition is likely the production of ssDNA intermediates generated during genome reassembly. Consistent with this, the requirement of ssDNA substrates for ISDra2, as for IS608, was confirmed by in vitro studies of TnpAISDra2-catalysed cleavage and strand transfer[23].
ISDra2 excision also depends on the direction of replication and is consistent with a requirement for the active strand to be located on the lagging strand template in normally growing cells. However, this bias disappeared in irradiated D. radiodurans [24]. Since no apparent strand bias was observed in generating ssDNA during ESDSA, the lack of orientation bias in irradiated D. radiodurans suggests that ssDNA substrates are no longer limited to those rendered accessible during replication. This indicates that ssDNA sources are different in the contexts of vegetative replication and in genome reassembly.
Real-time transposition (excision) activity
The dynamics of IS608 excision from a donor site has been examined at the colony and single-cell level in real-time using an artificial IS608 derivative inserted between the -35 and -10 elements of a PlacIQ1 promoter[63] driving expression of the blue fluorescent protein mCerulean[64]. TnpAIS608, N-terminally tagged with the bright yellow reporter Venus[65], was supplied in trans driven by PLTetO1 and controllable over a 100x range. Excision rates were proportional to the transposase levels and, as expected, excision depended on the orientation of the IS derivative with respect to the direction of replication in the donor plasmid: an IS oriented with its active strand in the lagging strand template excised more frequently and at lower (10x) TnpA levels than when inserted into the leading strand, demonstrating the validity of the experimental system. In this system, individual excision events appear as bright flashes of blue fluorescence. Following an initial activity in part of the population when cells are applied to a solid medium, activity decreases or ceases during “exponential” growth but increases again at a constant rate (in a sub-population) upon growth arrest in a random (Poisson distributed) way. Moreover, the events do not occur randomly in the growing colonies and tend to be excluded from the colony edges. The study underlines the heterogeneity of TE activity rates in both space and time, possibly resulting from heterogeneous TnpA levels at the individual cell level in the population. These studies are reminiscent of the early studies of Jim Shapiro on phage Mu-mediated rearrangements in growing bacterial colonies[66][67].
TnpB and its Relatives: Guide RNA Endonucleases
TnpA alone can carry out both the cleavage and joining steps in vitro. TnpB is encoded only by the IS1341 and IS605 groups and is not required for transposition of either IS608 or ISDra2 in Escherichia coli and Deinococcus radiodurans respectively [14][20]. The full length TnpB is approximately 400 amino acids long.
IS200/IS605 and the ISC group
An overview of TnpB organization was originally obtained by comparing the entire ISfinder collection of 85 tnpB copies with the Pfam domain database (Fig. IS200.30). This revealed three major domains: an N-terminal putative helix-turn-helix, a longer and more variable central domain, OrfB_IS605, with a putative DDE motif and a C-terminal zinc finger (ZF) domain of the CPXCG type. Half of the analyzed TnpB copies including TnpBISDra2 but not TnpBIS608 contained all three domains, while only two did not include a zinc finger.
TnpBIS608 was missing the N-terminal HTH domain which would provide an explanation for its lack of activity in certain assays [41].
Pasternak et al.[68] observed that TnpBISDra2 appears to have an inhibitory effect on ISDra2 excision and insertion in its host, D. radiodurans, and on excision in E. coli, and that the integrity of its putative zinc finger motif is required for this effect.
Relatives of TnpB have been identified in both prokaryotes and eukaryotes. TnpB is carried by members of the IS607 family, found in prokaryotes and in eukaryotes and their viruses, but is dispensable for IS607 transposition in E. coli, as it is for IS200/IS605 transposition. TnpB analogues, known as Fanzor1 and Fanzor2 (see: Fanzor section below), have also been identified in diverse eukaryotic transposable elements.

TnpB and IscB are Related to the RNA-guided nucleases Cas12 and Cas9.
More extensive analysis showed that TnpB shares some similarity with the RNA-guided nuclease Cas12 while IscB showed greater similarity to Cas9. Both, like Cas9 and Cas12 themselves, exhibit split RuvC endonuclease domains [42][69][70][71][72] (Fig. IS200.31). While Cas9 and Cas12 carry related functional domains, their architectures are somewhat different and the configuration of their guide RNAs also differs.

IscB and Cas9
Cas9 (also called Cas5, Csn1, or Csx12) is an RNA-guided dual nuclease generally associated with CRISPR systems in bacteria and widely used in genome engineering. The RuvC DED catalytic triad is split into three sections (I, II and III) in which I and II are interrupted by the R-rich region and II and III by an HNH nuclease domain (Fig. IS200.31). A region common to all Cas9 derivatives is located at the C-terminal end.
The Cas9 structure has been determined (Fig. IS200.32 B) [73]. The protein is a monomer in which the three RuvC segments I, II and III, carrying the D, E and D catalytic residues respectively, are assembled into the correct three-dimensional configuration to generate a RuvC-like catalytic pocket with the HNH nuclease domain extruded (Fig. IS200.32 A). The Cas9 guide RNA (crRNA) is composed of a region containing secondary structure potential and a 5’ extension (spacer) of about 20 nts, complementary to the target sequence and which forms an RNA/DNA heteroduplex (Fig. IS200.32 C). Activated Cas9 recognises a specific sequence, PAM (Protospacer Adjacent Motif), located next to the target sequence on the complementary strand downstream of the target sequence. This is necessary for binding of the Cas9-crRNA complex and subsequent cleavage [74]. Cleavage is catalysed by both the HNH nuclease (target strand) and the reconstituted RuvC nuclease (complementary strand). Cleavage is often “blunt” (i.e. occurs at the same position on both strands) and PAM proximal [74].

IscB shares Cas9 sequence features such as the split RuvC and HNH nuclease domains and an arginine-rich (R-rich, also known as a bridge helix) domain (Fig. IS200.31 Top) with a group of Cas9 derivatives, Cyan7822_6324 in particular [75]. In addition, a more detailed investigation [43] led to identification of an additional IscB N-terminal domain (called PLMP after its conserved amino acid residues) not present in Cas9 (Fig. IS200.30. Top). These features appear in alignments of IscB sequences [42] (Fig. IS200.33).

TnpB and Cas12
Cas12 is also an RNA-guided nuclease. A number of subtypes have been described [76] and the structures of several of these have been solved. They have similar C-terminal ends but carry (related) N-terminal ends of various lengths (see Karvelis, et al.[72]). One of the shorter derivatives, Cas12F (AKA Cas14) [77], acts as a dimer. Like Cas9, the common C-terminal end is composed of a split RuvC (I, II and III) in which I and II are interrupted by the R/K-rich region. In this case, however, instead of the HNH domain, RuvC segments II and III are separated by a zinc finger of the CPXCG type (Fig. IS200.31 bottom).
For Cas12, the guide RNA is composed of a region containing secondary structure potential and a 3’ extension (spacer) of about 20 nts, complementary to the target sequence (Fig. IS200.34). The PAM sequence is located upstream of the target sequence. Cleavage is PAM distal and staggered.

Karvelis, et al.[72] describe the domain structure of TnpB and present evidence that it is related to Cas12, another derivative of the Cas family (Fig. IS200.34 bottom). Like Cas12F, it also carries a RuvC in which the D (I), E (II) and D (III) catalytic residues are split. Again, RuvCI and RuvCII are separated by an R-rich region and RuvCII and RuvCIII by a zinc finger with three modules (Fig. IS200.31 bottom). Moreover, the N-terminal region, which corresponds to the minimal common structural elements present in Cas12 [72], includes a three helical bundle Rec domain (labelled HTH in an earlier TnpB analysis; Fig. IS200.31 bottom), inserted into a β-barrel domain, referred to as the “Wedge” domain in Cas12. It should be noted that the RuvC domain is used to cleave both DNA strands while the Z domain simply assists this cleavage.
These features can be identified in an alignment of the entire TnpB library (349 examples from ISfinder; November 2021) (Fig. IS200.35 i, ii and iii) and in TnpB sequences provided by Kapitonov et al. [42] (Fig. IS200.36).
The relationship between Cas12 and TnpB has strong support from structural modelling [72], for example with Un1Cas12f1 (Cas14a) from an uncultured archaeon [78], which functions as an asymmetric dimer and represents a minimal domain organization of the Cas12 group [72]. However, TnpB from ISDra2 (see below) appears to be a monomer [72].

Evolution of TnpB and IscB from an Ancestral RuvC?
In view of the relationship between TnpB, IscB, RuvC and the Cas proteins, the important question of the evolutionary trajectory of these proteins arises. Using various analytic tools, it was concluded that all Cas9 examples identified to date are probably descended from a single IscB derivative ancestor [43]. This contention arose from the observation that the CRISPR-associated IscB derivatives do not form a single clade but are distributed over the IscB phylogenetic tree suggesting that they evolved independently from a single acquisition [43]. Additional IscB derivatives were also identified in this study which led to an evolutionary scenario involving successive acquisition of domains by an ancestral RuvC (Fig. IS200.37). The additional species included a shorter derivative, IsrB, which carried the bridging helix but not the HNH domain and a longer derivative which had acquired a so-called REC domain [43].
TnpB appears to have followed an alternative evolutionary route towards Cas12. In addition, it is thought that TnpB was an ancestor of the eukaryotic Fanzor proteins [80] (see: Fanzor section below) associated with diverse eukaryotic potential transposable elements.

Functional analysis of TnpB and IscB
Clearly, the relationship between TnpB and IscB and Cas12 and Cas9 respectively suggested that TnpB and IscB might function as RNA guided nucleases which may, in some way, be involved in transposition [43][72] and this has been extensively tested.
TnpB functions as an RNA-guided Endonuclease
For TnpB, Karvelis, et al.[72] used ISDra2 as a model system. This has the advantage that its transposition behavior has been well characterized [23][68].
In ISDra2, the 3’ end of the upstream tnpA gene overlaps the 5’ end of tnpB. The authors were unable to efficiently express TnpB as a fusion protein but observed that its yield was significantly increased when in its natural context but in which TnpA had been inactivated by mutation. Although the nature of the mutation is not specified in the article, its behavior could be explained if it were an in-frame deletion or other mutation which does not affect C-terminal translation since it seems likely that expression of TnpB involves translational coupling [81][82] with TnpA suggested by their overlapping reading frames (Fig. IS200.38).

TnpB was found to purify with RNA of approximately 150 nts derived from the IS RE (reRNA). reRNA was complementary to the tnpB 3’ end, RE, and about 16 nt of (host) flanking DNA (Fig. IS200.38). This RNA, with the secondary structure provided by the RE sequence and the 3’ extended flanking DNA, is of the expected configuration for relatives of Cas12 (Fig. IS200.36). Previous studies had identified non-coding RNAs (ncRNA), called sense overlapping transcripts (sotRNAs), from the 3’ end of IS1341, a related IS from Halobacterium salinarum NRC-1 [83].
ncRNAs, sotRNAs and reRNAs
There has been much interest in non-coding RNA (ncRNA) and global searches in Archaea had revealed ncRNA expressed from IS1341 group members which carry only a tnpB gene and are devoid of the TnpA transposase [84][85][86].
During a detailed analysis of ncRNA produced from Halobacterium salinarum NRC-1 [87][88], an ncRNA from the region encompassing the right end of these IS200/IS605 family members was identified. This was called sotRNA (sense overlapping transcript). The authors demonstrated from a publicly available transcriptome compendium [89] that all 10 IS1341 group members in H. salinarum NRC-1 genome express a sotRNA (Fig. IS200.39) and show condition-dependent differential regulation between sotRNAs and their cognate genes. sotRNA started within tnpB at approximately 1100 nt from its initiation codon, had an average size of 218 nt, and ended approximately 74 nt 3’ to the tnpB termination codon. The authors could not distinguish between the hypotheses that sotRNAs are generated by primary transcription or by processing from a full length transcript of the tnp gene (although they were unable to locate any potential promoter).
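
As a rough consistency check, the approximate sotRNA coordinates quoted above can be combined to estimate where the tnpB reading frame ends. The short calculation below uses only the three approximate figures given in the text; the variable names are illustrative, and since all inputs are averages or approximations the result is indicative only.

```python
# Approximate values quoted in the text (all are averages or estimates).
SOT_START_FROM_INIT_CODON_NT = 1100   # sotRNA start, measured from the tnpB initiation codon
SOT_MEAN_LENGTH_NT = 218              # average sotRNA size
SOT_END_PAST_STOP_CODON_NT = 74       # sotRNA end, measured 3' of the tnpB termination codon

# Position of the sotRNA 3' end relative to the tnpB initiation codon.
sot_end_from_init = SOT_START_FROM_INIT_CODON_NT + SOT_MEAN_LENGTH_NT

# Implied length of the tnpB reading frame and the corresponding codon count.
implied_tnpb_nt = sot_end_from_init - SOT_END_PAST_STOP_CODON_NT
implied_tnpb_codons = implied_tnpb_nt // 3

# Portion of the sotRNA that overlaps the tnpB coding sequence.
overlap_with_tnpb_nt = SOT_MEAN_LENGTH_NT - SOT_END_PAST_STOP_CODON_NT

print(sot_end_from_init)      # 1318
print(implied_tnpb_nt)        # 1244
print(implied_tnpb_codons)    # 414
print(overlap_with_tnpb_nt)   # 144
```

The implied reading frame of roughly 1240 nt (about 414 codons) agrees with the full-length TnpB size of approximately 400 amino acids mentioned earlier, and indicates that about 144 nt of the average sotRNA overlaps the 3’ end of tnpB.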

Such sotRNA transcripts, specific for tnpB genes, had previously been identified by Gomes-Filho et al. [87] in a number of Archaea and Bacteria including S. acidocaldarius, Methanopyrus kandleri, Helicobacter pylori and E. coli K12. There has also been some indication of “transposase-related” sense overlapping transcripts of tnpB-like genes from T. kodakarensis [90] and P. furiosus [91]. However, the possibility that these may represent guide RNAs had not been explicitly considered.
Furthermore, sotRNA included what the authors called an RE-like tetraloop resembling the RE DNA loop structure, as do sotRNA from P. abyssi and other thermococcal genomes [85].
TnpB: mechanism of action
Karvelis et al.[72] demonstrated that TnpB, purified using a His tag, could cleave DNA. They argued that since the 3’ end of the ISDra2 reRNA corresponds to the DNA target, it would vary according to the position of the IS insertion and the reRNA may (have) serve(d) as a guide RNA. If true, cleavage of the target DNA should occur within the 3’ extension sequence of the flank (the foot of RE in Fig. IS200.38). In this context, it is interesting that the (DNA) structure of the right end was shown to form a base triple which is a characteristic of RNA [21].
To determine whether RNA-guided cleavage occurred, they constructed a system (Fig. IS200.40) using a plasmid supplying TnpB together with an reRNA (Fig. IS200.40 A) which included a 16 (or 20) defined nucleotide flank sequence and was terminated by a specific Hepatitis delta virus ribozyme (HDV; [92]) to produce a defined 3’ RNA end [93]. A lysate from the host strain was then used in cleavage assays of a library of target plasmids each containing a specific defined 16 base pair sequence directly downstream from a 7 bp (7N) randomised sequence (Fig. IS200.40 B). This has previously been used to identify conserved PAM sequences [72] [94]. Specific double strand cleavage products were captured by adapter ligation (details in Karvelis et al.[72]) and the sequence of the resulting enriched 7N region was determined. This corresponded to the conserved ISDra2 target pentanucleotide sequence TTGAT (with a higher enrichment for GA), which is essential for IS insertion and abuts LE in the integrated IS. By equivalence to PAM, this sequence was called TAM (Transposon Adjacent Motif) [72], see also [43] (Fig. IS200.40 C).

This cleavage specificity was confirmed using purified TnpB-RNP in which the protein and RNA components were produced by separate plasmids and a target plasmid carrying a 3’ flank, a 5’ TTGAT TAM pentanucleotide and a different guide sequence (Fig. IS200.40 C). The results showed a majority of double strand breaks in the supercoiled target plasmid to generate linear plasmids but also a significant level of nicked product. The TnpB-RNP was also active on a linear substrate (i.e. activity does not require supercoiling). In both cases, use of a TnpB D191A mutant, part of the conserved RuvC DED catalytic triad, eliminated the reaction. Robust TnpB-mediated cleavage activity was observed and required both TAM and guide RNA sequences. Further sequence analysis revealed that cleavage occurred distal to the TAM sequence at the guide sequence boundary and was specific for cleavage on the bottom strand but showed some variation on the top strand (Fig. IS200.40). There are some differences however with Cas12. TnpB is a monomer and requires a single copy of reRNA [72].
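The requirement for both a TAM and a matching guide can be expressed as a simple sequence scan. The sketch below is a deliberate simplification: it searches one strand only, writes the guide as its DNA equivalent, and uses an arbitrary 16 nt guide and toy plasmid; `find_targets` and its parameters are hypothetical names, not part of any published tool.

```python
def find_targets(dna, tam, guide, max_mismatches=0):
    """Return 0-based TAM start positions where the TAM is immediately
    followed by a sequence matching the guide (given as DNA)."""
    hits = []
    span = len(tam) + len(guide)
    for i in range(len(dna) - span + 1):
        if dna[i:i + len(tam)] != tam:
            continue  # no TAM here, so no cleavage is expected
        target = dna[i + len(tam):i + span]
        mismatches = sum(a != b for a, b in zip(target, guide))
        if max_mismatches >= mismatches:
            hits.append(i)
    return hits


tam = "TTGAT"                # ISDra2 TAM from the text
guide = "ACGTTGCAAGCTTGCA"   # arbitrary 16 nt guide (assumption)
# Toy plasmid: one site with the TAM, one with a mutated TAM (CCGAT).
plasmid = "GGCC" + tam + guide + "AATT" + "CCGAT" + guide + "GG"
hits = find_targets(plasmid, tam, guide)
print(hits)
```

Only the first site is reported, since the second carries the guide match without the TAM.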
A similar study by Altae-Tran et al.[43] using purified TnpB from a less well characterised tnpB gene of Alicyclobacillus macrosporangiidus, (TnpBAma), showed that the protein catalysed cleavage of both double- and single-stranded DNA targets in both a TAM-dependent and TAM independent manner. As in the case of TnpBISDra2, A. macrosporangiidus TnpB-associated guide RNA was identified and derived from the 3’ end of the tnpB gene. In this case, the TAM appeared to be the tetranucleotide TCAC.
These studies therefore identify CL (which is outside the transposon but necessary for transposition by interacting with GL Fig. IS200.13) as the TAM.
An explanation of the “inhibitory effect” reported for TnpB?
Moreover, in vivo, TnpB expression together with reRNA from one plasmid resulted in loss of a second plasmid carrying the reDNA target (interference), presumably as a result of cleavage at the target site and linearization of the plasmid. This of course may explain the inhibitory effect of TnpB originally observed by Pasternak et al. [68].
A system which functions in Eukaryotes
Additionally, the authors were able to demonstrate that the system functions in eukaryotic cells opening the possibility that it could be suitably modified for gene editing.
RNA Nomenclature, Processing, Structure, Diversity and mode of function
IS605 group guide RNAs have been called both reRNA and ωRNA (OMEGA for obligate mobile element-guided activity). Here, to eliminate confusion, we will use the term re(ω)RNA (or ω (re)RNA) for that from both tnpB and iscB groups although they have different secondary structures and functions.
Generating re(ω)RNA: Processing
The important question of how re(ω)RNA is generated was addressed by Nety et al. [95]. Given that TnpB is thought to be an ancestor of Cas12 [96][97], the ability of Cas12 to process RNA (e.g. [96]) may have originated from analogous functions in TnpB [95]. They demonstrated that a TnpB orthologue from the bacterium A. macrosporangiidus (AmaTnpB or TnpBAma) has RNA processing activity and can generate an re(ω)RNA.
The purified TnpBAma (either wildtype or a RuvC-II catalytic mutant) was incubated with four different in vitro transcribed RNA substrates (Fig. IS200.41 i and ii) produced from PCR-generated DNA templates: a “random” negative control of 1190 nt (Fig. IS200.41 i1); a 166 nt RNA with the RNA guide, very similar to that found to be associated with a TnpBAma orthologue, a potential re(ω)RNA (Fig. IS200.41 i2); a full length tnpB transcript of 1190 nt, extended to include the guide sequence (Fig. IS200.41 i3); and the potential re(ω)RNA with a 59 nt 3’ extension, giving 225 nt in total (Fig. IS200.41 i4).

While substrate 1 was refractory to processing, both substrates 2 and 3 generated a 126 nt fragment. Substrate 4 generated a 185 nt fragment suggesting that, while it was processed correctly at the 5’ end, the 3’ extension was not processed. These conclusions were confirmed by RNAseq. All substrates were refractory to the TnpBAma RuvC-II mutant.
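The fragment sizes quoted above are mutually consistent, as the following arithmetic sketch shows. The 40 nt figure for the 5’ trim is not stated in the text; it is inferred here by subtracting the product length from the substrate length, and the function name is hypothetical.

```python
def processed_length(substrate_len, five_prime_trim, three_prime_trim=0):
    """Length remaining after removing nucleotides from either end."""
    return substrate_len - five_prime_trim - three_prime_trim


rerna_len = 166       # substrate 2 (from the text)
product_len = 126     # fragment produced from substrates 2 and 3
extension_len = 59    # 3' extension carried by substrate 4

# Inferred, not stated in the source: nucleotides removed from the 5' end.
five_prime_trim = rerna_len - product_len

substrate4_len = rerna_len + extension_len
# If only the 5' end of substrate 4 is processed, the 3' extension is retained.
substrate4_product = processed_length(substrate4_len, five_prime_trim)
print(five_prime_trim, substrate4_len, substrate4_product)
```

Removing the same 5’ segment from substrate 4 while leaving its 3’ extension intact gives the 185 nt fragment reported.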
DNA cleavage activities were assessed by including a 1221 nt dsDNA substrate containing the TnpBAma TAM (Fig. IS200.41 i). RNA substrates 2, 3 and 4 all catalyzed TnpB-mediated DNA cleavage. These results are consistent with those obtained with TnpBDra2 (see below;[98][99]) showing that only the proximal 12 nt of the guide sequence is sufficient for DNA targeting.
The cleavage activity of the three substrates was not identical. The activity of substrate 3, which carries a substantial 5’ extension, was significantly lower than the other two raising the question of whether the extension may include inhibitory sequences.
To investigate this, RNA samples were prepared with different 3’ deletions (Fig. IS200.41 ii). When these RNA species were included in the cleavage reactions, a region between co-ordinates 825 and 875, which shows extensive complementarity to the re(ω)RNA scaffold, was observed to be responsible for the inhibitory effect.
This suggests a cis-regulatory mechanism engaged in controlling re(ω)RNA activity [95].
Using ISDra2 [23], Nakagawa et al. [98] observed that, although TnpB was co-expressed with a 247 nt re(ω)RNA in their purification system, it remained bound to only 100-160 nt of the RNA even in a denaturing gel. Further analysis revealed that the RNA was rapidly degraded in the absence of TnpBDra2 but, in its presence, three different RNAs of approximately 220, 160 and 130 nt were observed, the latter two including the guide sequence at the 3’ end. Very little of the ~220 nt species was observed in the purified RNP, suggesting degradation, but LC–MS analyses suggested that the 160 nt species was cleaved between co-ordinates −150 and −149 or −138 and −137 by TnpB and/or endogenous RNases. They also provide evidence that the ~130-nt RNA is cleaved between −117U and −116G (Fig. IS200.41 ii).
Furthermore, Sasnauskas et al., [99], observed that an re(ω)RNA from between co-ordinates -130 and + 16 was active in DNA cleavage. Nakagawa et al.,[98] also found that truncation of the 5′ region of the re(ω)RNA (−231G to −117U) had no effect on TnpB-mediated DNA cleavage.
Thus the re(ω)RNA of ISDra2 also appears to be processed at its 5′ end, and at least a 130 nt fragment including the 3’ guide is stably bound to the TnpB protein.
Structure of TnpB-reRNA in association with DNA
Two studies addressed how TnpB interacts with its DNA template [98][99]. Both used TnpBDra2 (Fig. IS200.42) and an re(ω)RNA which included nucleotides -130 to +16 of the right end (Fig. IS200.42 ii) [99]; Nakagawa et al. [98] used a substrate which was slightly extended in the 5' direction. Both sets of results were essentially the same.
In both the RNP structure and the ternary structure with the target sequence, TnpB could be divided into two “lobes” [98][99]: an N-terminal recognition lobe (Rec) comprising the wedge (WED) and REC domains, and a nuclease lobe (Nuc) (insert in Fig. IS200.42 iii) in which the three individual RuvC domains adopt an RNase H fold including D191 (RuvC I), E278 (RuvC II) and D361 (RuvC III).
The results showed that in the RNP complex (Fig. IS200.42 iii left), the principal interactions are with the RuvC and WED domains whereas in the ternary structure with target DNA (Fig. IS200.42 iii right), not only does WED interact with TAM but the REC domain intervenes around the branch point and the RuvC domain interacts extensively with the target-guide RNA hybrid helix. Note that the CR (TAM) sequence which interacts with GR as DNA during TnpA-mediated transposition (Fig. IS200.42 i) also forms a short interaction with a sequence upstream which is identical to GR (Fig. IS200.42 ii) to generate a pseudoknot. The scaffold core is formed by the RNA triplex region delimited by the pseudoknot while stem 1 and stem 2 protrude in opposite directions (Fig. IS200.42 iii).
All five TAM positions (Fig. IS200.42 iii right) are recognized directly by the WED domain and substitutions at any TAM position eliminate both target DNA binding and cleavage [99].
On the other hand, substitutions in the guide sequence do not prevent TnpB binding but prevent cleavage. The re(ω)RNA–target DNA heteroduplex (Fig. IS200.42 iii right) is accommodated within a central channel formed by the WED, REC and RuvC domains [98][99].
The authors conclude, from the structural results and from the effect of guide substitutions, that for cleavage the system senses formation of a (perfect) B-form RNA-DNA hybrid without any mismatches, and that TnpB requires a perfect target DNA-guide RNA heteroduplex 12–16 bp long to initiate DNA cleavage.
Additional information concerning activity was provided in a study principally exploring diversity in this system (see: Exploring and defining TAM sequences).
Xiang et al. [36] analyzed the re(ω)RNA activity requirements of ISDra2 and three additional IS: ISTfu1, ISDge10 and ISAba30. In these experiments, the 3’ re(ω)RNA scaffold end was defined as the RE tip (Fig. IS200.44).
Activity was exquisitely sensitive to the integrity of CR. Deletion or mutation of all but the 3’ terminal CR base pair significantly reduced activity.
Additionally, both the length of the guide sequence and its match to the target were important. Optimal editing efficiency occurred with guide sequences between 16 and 20 nucleotides and decreased with increasing length, although this varied somewhat between the IS (Fig. IS200.42 ii).
Similarly, introduction of single and double base pair transversions into the target, especially in the TAM proximal region approximately up to base pair 12, severely reduced or eliminated activity (Fig. IS200.42 ii) with some variation between the different IS.
This is similar to results obtained with Cas9 and Cas12 systems themselves [100][101]. Finally, variation in 5’ length showed that the shortest active scaffolds were 120–140 nt long and that lengths of 300 nt were also active.

For TnpBDra2, the C-terminal domain (residues 376 to 408; Fig. IS200.42 bottom insert) has relatively low sequence similarity among TnpB proteins and is disordered in the structures. The C-terminal truncation mutant (Δ376 to 408; ΔCTD) is efficient in target DNA cleavage but exhibits somewhat reduced protein stability. Thus the CTD is not required for RNA-guided target DNA cleavage.
TnpB-re(ω)RNA: Diversity and Activity
Because TnpB family guided endonucleases are so small, they may prove useful as targeting tools for biotechnological purposes. It is therefore important to determine the extent of their diversity and their inherent activities. It had been reported that the TnpB family is an order of magnitude more diverse than the IscB family, and an HMMER search of prokaryotic genomes identified more than 10^6 tnpB loci [43].
At least two studies [36][95] have addressed this question in some detail.
Exploring and defining TAM sequences
To further explore TnpB diversity, the tnpB DNA sequences of the 107 IS605 subgroup ISfinder entries (Fig. IS200.4B) were more extensively analyzed [36] with a view to uncovering differences in activities and identifying highly active members. This analysis did not include the 244 IS1341 members which are flanked by typical IS200-IS605 family secondary structures but carry only a tnpB gene.
Firstly, the IS605 subgroup members were used as a seed to search the non-redundant NCBI nucleotide sequence database. Full length copies were extracted and their flanking sequences were examined to eliminate identical insertion events.
To confirm the ISfinder validation, the right end of each multicopy IS was aligned and the tetranucleotide which forms CR and undergoes special base pairing with the tetranucleotide guide sequence (GR) within RE (Fig. IS200.13) was identified, while the single copy IS were examined and compared to their ISfinder annotations. Additionally, the integrity of tnpB was confirmed. This is important because it has been observed that in IS containing tnpA and tnpB, tnpB is often decayed (see He et al., [102]).
It should be noted that these procedures are always undertaken as a matter of course before any IS200/IS605 family entry is made in ISfinder.
The collection was arranged into 64 bins using a 90% identity threshold and these were named after the IS with the highest copy number in each group (Fig. IS200.43). Many of these groups consisted of only a single example although several included a few additional examples.
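How a 90% identity threshold partitions a collection can be illustrated with a minimal greedy binning sketch. The clustering procedure actually used in [36] is not described here, so this is a generic stand-in: it assumes pre-aligned sequences of equal length, uses simple positional identity, and takes the first member of each bin as its representative. The sequences and names are invented.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_bins(seqs, threshold=0.9):
    """Assign each sequence to the first bin whose representative it matches
    at or above the threshold; otherwise open a new bin."""
    bins = []  # each bin is a list of names; the first name is the representative
    for name, seq in seqs.items():
        for members in bins:
            if identity(seq, seqs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])
    return bins


# Toy, invented sequences (not real tnpB genes).
toy = {
    "IS_a": "ACGTACGTAC",
    "IS_b": "ACGTACGTAA",  # 9/10 positions identical to IS_a
    "IS_c": "TTTTACGTAC",  # 7/10 positions identical to IS_a
}
bins = greedy_bins(toy)
print(bins)
```

Naming each bin after its highest copy number member, as was done for the 64 bins, would require copy number data and is not attempted in this sketch.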

To examine how the sequence identities between CL and TAM (Fig. IS200.44) correlate over the range of IS605 group members distributed over the 64 TnpB bins (Fig. IS200.43), activities were tested separately for each of the 64 using a 2 plasmid, TAM depletion assay (Fig. IS200.44 ii) [36].
One plasmid included ~200 nucleotides of the 3’ IS ends including a 20nt abutting “guide” sequence cloned downstream of a tnpB gene which, when expressed together (Fig. IS200.44 ii), are capable of forming the re(ω)RNA complex. The second plasmid consisted of a library with five randomized base pairs (N5) located 5’ to a target sequence recognized by the guide sequence, an assay similar to that used by Karvelis et al., [72] (Fig. IS200.40). Both plasmids were introduced concomitantly into a host cell. Those that carry an N5 sequence susceptible to the corresponding re(ω)RNA complex will be depleted and underrepresented in the plasmid population (reduced level of KmR colonies in the population).
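The read-out of such a depletion assay can be sketched as a comparison of N5 representation with and without the nuclease. The counts, the pseudocount and the log2 cut-off of -3 below are all invented for illustration, and `depletion_scores` is a hypothetical helper, not the analysis pipeline of [36].

```python
import math


def depletion_scores(control, treated, pseudocount=1.0):
    """log2 ratio of normalized N5 frequencies (treated / control).
    Strongly negative values indicate depletion, i.e. a candidate TAM."""
    c_total = sum(control.values())
    t_total = sum(treated.values())
    scores = {}
    for seq, c_count in control.items():
        c = (c_count + pseudocount) / c_total
        t = (treated.get(seq, 0) + pseudocount) / t_total
        scores[seq] = math.log2(t / c)
    return scores


# Invented read counts for four of the possible N5 sequences.
control = {"TTGAT": 1000, "TTGAC": 1000, "AAAAA": 1000, "CCGCG": 1000}
treated = {"TTGAT": 10, "TTGAC": 900, "AAAAA": 1100, "CCGCG": 990}

scores = depletion_scores(control, treated)
depleted = sorted(s for s, v in scores.items() if -3.0 >= v)
print(depleted)
```

Only the sequence whose plasmids are lost in the presence of the re(ω)RNA complex falls below the cut-off.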

The corresponding TAM sequences (Fig. IS200.43) showed a remarkable identity to the CL sequences with very few variations. For these variants, the authors propose alternative base pairings which would need to be confirmed experimentally.
Further analysis based on a tree generated from TnpB alignments such as those shown in Fig. IS200.35, revealed, perhaps not unexpectedly, that TAM sequences were more similar between closely related IS.
The relative activities of the TAM sequences in each case were then assessed in E. coli using a similar plasmid system to that of Fig. IS200.44, but in which the N5 sequence was substituted for the proposed TAM.
A high proportion (25/64) of these TAM/TnpB derivatives were found to be active.
Sequence requirements of the re(ω)RNA
To explore re(ω)RNA sequence requirements in greater detail, three IS systems, ISTfu1, ISDge10 and ISAba30, in addition to ISDra2, were analyzed for their guide RNA functions [36].
The relatively small TnpB protein had been demonstrated to function in gene targeting in human cells [72]. Since the interest of Xiang et al. [36] was to optimize TnpB as a targeting tool in human cells, the assay was designed for use by transfection into the HEK293T human cell line. It used a system in which an out-of-frame downstream GFP gene was reframed only when the TnpB nuclease could act on its target and the DNA break was repaired by non-homologous end joining (Fig. IS200.45 i).

When this reporter plasmid and a TnpB/re(ω)RNA plasmid were co-transfected, all four TnpB systems were shown to function, yielding 10% to 34% GFP+ cells among the transfected cells (Fig. IS200.45 ii). Each generated short deletions of various lengths, some of which placed the GFP gene in frame, yielding GFP+ cells in the population. The overall organization of the IS including TAM, scaffold and guide sequence is shown in Fig. IS200.46 i.
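Why only some deletions give GFP+ cells follows from reading-frame arithmetic, sketched below. The size of the frameshift in the reporter is not given in the text, so the 1 nt offset is a stated assumption, and the sketch ignores other effects such as deletions that remove essential sequence or create stop codons.

```python
def restores_frame(frame_offset, indel):
    """True if an indel brings the downstream ORF back into frame.
    frame_offset: nucleotides by which the ORF is shifted.
    indel: positive for an insertion, negative for a deletion."""
    return (frame_offset + indel) % 3 == 0


offset = 1  # hypothetical: GFP assumed to sit 1 nt out of frame
in_frame_deletions = [d for d in range(1, 11) if restores_frame(offset, -d)]
print(in_frame_deletions)
```

Under this assumption one deletion length in three restores the frame, which is consistent with only a fraction of repair events producing fluorescent cells.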

Severely decreased re(ω)RNA guide activity was observed with mutation of either CR or the four proximal nucleotides (Fig. IS200.46), and with single or double transversions in the TAM proximal region of the target site.
It should be noted that these assays were carried out following transfection of human HEK293T cells and it is possible that the results may vary in the appropriate bacterial hosts.
Exploring and defining TAM sequences in a library extracted from NCBI
In a second study to investigate whether the re(ω)RNAs were present across the widely diverse TnpB systems [43], Nety et al. [95] constructed a TnpB sequence library, extracted from NCBI data, which included those associated with Y1 (HUH; IS200-IS605 family) or serine (IS607 family) transposases and “non-mobile” orthologues. This generated 5 clades ([95]; background in Fig. IS200.47). The clades follow the configuration of the RuvC catalytic motif (Fig. IS200.47) (RuvC-III DRDXN, typical; RuvC-III NADXN, derived) or “catalytic rearrangements” (RuvC-II (RII-r3 and 5) or RuvC-III (RIII-r4) domain) [103] (Fig. IS200.47).
The authors chose 59 TnpB orthologs covering the diversity (background to Fig. IS200.47; [95]) and varying in length between 353 and 550 aa. The TnpB-re(ω)RNA-encoding loci including a suitable promoter were expressed in an in vitro transcription/translation (IVTT) system and the 5’ ends were determined by RACE from the 3’ re(ω)RNA end lacking the guide sequence.
This identified 30/59 orthologs with a defined 5’ end and lengths of between 79 and 466 nt. TnpBAma generated a 106 nt scaffold, and its processing is thus identical to that found in the experiments of Fig. IS200.41. Some orthologs, such as TnpBDra2, showed multiple 5’ ends, consistent with previous observations suggesting either incomplete or promiscuous RNase activity [72][98].
A screen for DNA nuclease activities of the IVTT-produced re(ω)RNAs revealed that 27/59 were active. They also defined the TAM sequences revealing only limited diversity of these sequences as was also found for the ISfinder collection [36]. The assay was validated by confirming both the TnpBAma (TCAC) and TnpBDra2 (TTGAT) TAM sequences.

re(ω)RNA and tnpB Co-evolution
It was noted that ISDra2 re(ω)RNA includes the 3’ segment of tnpB (residues 335 to 408 and −231G to −10U) which suggests that TnpB and the guide sequence system might have co-evolved [99]. However, although re(ω)RNA expression and processing may require co-expression with the TnpB protein, Nakagawa et al. [98] suggest that co-evolution might be less constrained than previously predicted because, they argue, functionally essential gene regions and those of re(ω)RNA do not overlap significantly: the structures imply that the TnpB C-terminus (residues 376 to 408 overlapping with −109G to −10U) is not involved in DNA cleavage, and the 5′ re(ω)RNA terminus (−231G to −117T, overlapping with residues 336 to 373) is not required for target DNA cleavage.
The question of co-evolution is complex since it must also take into account the constraints imposed by the mechanism(s) involved in the DNA transposition process: the TAM sequence which abuts the left IS end (LE) also serves as the sequence, CL, required for cleavage and insertion at the left end, and CL interacts in a complex way with a partially complementary sequence, GL, located at the foot of a stem loop (DNA) structure recognised by the TnpA transposase (see He et al., [102]). Moreover, changing the GL sequence leads to a change in the specificity of insertion – i.e. changes the CL sequence [48]. More importantly, the CR sequence, which is an integral part of the IS, plays a central role in both the RNA guide and TnpA-mediated DNA cleavage reactions: it interacts with GR, a sequence at the foot of a secondary structure at the right end (RE), and, in the re(ω)RNA, forms part of a pseudoknot (Fig. IS200.42) [98] [99].
IscB, like TnpB, is also an RNA-guided Endonuclease
Altae-Tran et al. [43] also examined a very large number of rather disperse IscB systems for their endonuclease properties, their association with RNA and their capacity as RNA guide proteins. Initial studies concerned a CRISPR associated IscB (marked in the article as Delaware Bay aquatic sample) which, when purified from a heterologous Escherichia coli host, was associated with an RNA localised directly upstream of iscB. It generated a signal in a PAM (TAM) “discovery” assay and was able to generate cleavage products in vitro with the appropriate target.
An alignment of over 500 (non-redundant) iscB genes revealed an upstream region of conserved sequence of about 300 bp which terminated at what the authors state is an IS200/IS605-like end. One specific example examined, present in the host K. racemifer genome in nearly 50 copies, was associated in most cases with non-coding RNA species, which they called ΩRNA, with significant secondary structure potential. An example of K. racemifer IscB was investigated in vitro using a plasmid substrate and shown to use a target adjacent pentanucleotide TAM, ATAAA; cleavage was reprogrammable by changing the complementary RNA extension (guide).
To further characterize IscB, the TAM sequences of 57 examples from a collection of 86 genes from a phylogenetically diverse set of bacteria could be determined; of those 57, 5 were reconstituted with their omega RNA and found to be active in target cleavage; and one, from Allochromatium warmingii (AwaIscB or IscBAwa), was chosen for further study.
Biochemically, IscBAwa could cleave double strand DNA in a magnesium dependent reprogrammable way with a temperature optimum of 35-40°C and with RNA guide lengths of between 15 and 45 nts. A mutation of the RuvC E residue eliminated cleavage of the non-target strand while mutation of an H residue in the HNH motif eliminated cleavage of the target strand (as expected for a Cas9-related enzyme; Fig. IS200.32). Mutation of both residues eliminated cleavage altogether. Also, like Cas9, cleavage was: TAM (PAM) proximal (3 nts from TAM for the target and 8 or 12 nts for the non-target strands); the RNP protected DNA from ExoIII digestion 19 nts upstream of the TAM on the target and 6 downstream on the non-target (Fig. IS200.32); and truncation of the newly identified N-terminal PLMP domain (named after a cluster of conserved amino acids; Fig. IS200.48 top) eliminated activity.

The Structure of IscB–ωRNA ribonucleoprotein complex and the ternary complex containing target DNA.
IscB associates with a 200-400nt ωRNA, significantly longer than the 100nt guide RNA of its probable offspring, Cas9 [43]. IscB are much smaller than Cas9 and lack the α-helical nucleic-acid recognition domain but share the RuvC and HNH endonuclease domains (Fig. IS200.48).
Kato et al. [104] used an IscB protein derived from the human gut metagenome (IscBOgeu) as a model while Hirano et al. [105] used an IsrB (IsrBDt) from Desulfovirgula thermocuniculi. IsrB are related to IscB but lack the HNH nuclease domain (Fig. IS200.48). Note that this is a more detailed description of the domain structure than shown in Fig. IS200.37. A detailed study by Meers et al. [41] found that the IscB and IsrB formed clearly separate groups on a phylogenetic tree.
For the structural cryo-EM studies, a catalytically inactivated IscBOgeu E193A (RuvC)/H247A (HNH) derivative was used. In the IscBOgeu structure, the catalytic D61 (RuvC I), E193 (RuvC II), H340, and D343 (RuvC III) and a divalent Mg2+ ion (Fig. IS200.48) are configured similarly to those in Cas9 although the structure lacked the HNH domain.

The ωRNA structure is complex (Fig. IS200.49) comprising a 27 nt guide sequence and a 206 nt scaffold with 5 stem loops, 4 stems and a linker. The guide adaptor, stem-loop 1 (yellow), connects the guide segment (dark red) and stem 1 (green; which the authors call the “nexus” stem widely conserved in the tracrRNA of Cas9s; [106]). Stem 1, stem 2 (grey; the central stem), and stem-loop 3 (brown) form a three-way junction. Like TnpB ωRNA, IscBOgeu ωRNA also includes a pseudoknot (??). Stem loop 2 (blue) stacks with the nexus pseudoknot hairpin (pink) which in turn interacts with the pseudoknot stem 4 (red).
The cognate ωRNA and IscBOgeu E193A/H247A were expressed in E. coli, the IscB-ωRNA complex purified and the ternary complex assembled by mixing with target DNA. However, to improve resolution, it was found necessary to delete the HNH domain (residues 199 – 295) (Fig. IS200.48), which is flexible in Cas9 [107][108]. The complex, composed of an IscB monomer and a single ωRNA, was formed using the ΔHNH IscB deletion derivative, an ωRNA of 233 nt including a 27 nt guide sequence and a partially double strand DNA target (Fig. IS200.49 right).
In the ternary complex, the IscB ωRNA guide sequence forms a 14 bp heteroduplex with the target DNA (Fig. IS200.49 middle right) and is recognized by IscB in a sequence-specific fashion using the short Rec region (Fig. IS200.48) shown in grey in Fig. IS200.49 middle right. A simplified cartoon is shown in Fig. IS200.49 bottom right. This is somewhat different from Cas9 which forms a 20 bp heteroduplex with a much larger Rec domain. TAM is recognized by the CT domain and mismatches at positions 15 and 16 are tolerated for cleavage. The differences between a full complex with the HNH domain and that with the ΔHNH IscB derivative are shown in Fig. IS200.50.

The Structure of IsrB–ωRNA ribonucleoprotein complex and the ternary complex containing target DNA
IsrB is short, about 350 amino acids, and lacks an HNH domain (Fig. IS200.51) (it is therefore equivalent to the ΔHNH IscB derivative). It is associated with a long RNA guide of ~300 nt which guides IsrB to nick the non-target strand (NTS) of double-stranded (ds) DNA (see Fig. IS200.51 top) containing a 5′-NTGA-3′ TAM [43].
The Desulfovirgula thermocuniculi IsrB (IsrBDt) ωRNA (284 nt) is longer than that of IscBOgeu, and includes a 20 nt guide segment which forms a heteroduplex with the target DNA [105]. Like IscBOgeu, IsrBDt ωRNA is structurally complex including eight stem loops and four stems (Fig. IS200.51 middle). The structure includes 2 pseudoknots: one defined by two of the stem-loops (2 and 5, red boxes; Fig. IS200.51 middle) and the other the “nexus” pseudoknot (blue boxes).

IsrBDt recognizes the TTGA TAM in the NTS by both hydrogen bonds and van der Waals interactions and cleavage occurred 8–11 nt upstream of TAM, further than the 2–5 nt of Cas9. TAM recognition was more specific at 60 °C for this thermophilic enzyme than at lower temperatures where NTGA was recognized [43].
IsrB diversity of structure and ωRNA architecture
As in numerous publications in this field, Hirano et al. [105] explored IsrB diversity and ωRNA ternary structure. They identified five orthologues and their cognate ωRNAs from: Crocosphaera watsonii (IsrBCw); Dolichospermum sp. (IsrBDs); Calditerricola satsumensis (IsrBCs); Burkholderiales bacterium (IsrBBb); and a viral metagenome assembly (IsrBK2). A standard TAM identification assay (such as that shown in Fig. IS200.40) indicated that IsrBBb recognizes NTGG while IsrBCw, IsrBCs, IsrBDs and IsrBK2 recognize NTG. All were active in in vitro reconstituted IsrB-ωRNA RNP-promoted nicking of dsDNA substrates.
ωRNAs of the five orthologues and IsrBDt retain the core domain composition: four stems (S1–4) and five stem loops (SL1/2/4/5/7) (Fig. IS200.51 middle). Inspection of the ωRNAs showed some significant architectural differences, however. For example, in one group, including IsrBCs, IsrBK2 and IsrBBb, SL2 and SL4 form pseudoknots, and SL5 and the intermediate region between S2 and SL7 form pseudoknots, while in a second group, including IsrBDt, IsrBCw and IsrBDs, SL2 and SL5 form pseudoknots, and SL4 and the intermediate region between S2 and SL7 form pseudoknots.
The IS1341 Conundrum: how do derivatives without their transposase transpose?
It had been noted that there are a large number of IS200-IS605 relatives which carry only the tnpB gene flanked by typical IS200-IS605 family secondary structures [102] in a number of bacteria including the thermophilic Geobacillus and the cyanobacterium Anabaena. These are grouped into the subfamily IS1341, with nearly 250 entries in ISfinder (December 2023), and were not included in the study of TnpB/TAM diversity of Xiang et al. [36]. Since the multiple IS891 copies such as those found in Anabaena imply that IS1341 group members are mobile, the question arises as to how their mobility might be accomplished. One possibility is that this is assured by a tnpA copy in the cell or that tnpB itself is involved.
IS1341 Group Diversity: Mining the NCBI NR database
The entries in ISfinder do not necessarily reflect the abundance of the different IS200-IS605 derivatives in the prokaryotic kingdom and Meers et al. [41] mined the NCBI NR database for tnpB and iscB homologues and extracted their flanking genomic regions to provide some perspective of the proportion of tnpB genes associated with IS1341 group members.
They found that only 25% of tnpB were associated with a tnpA copy. Note that nearly half of the IS200/IS605 members in ISfinder do not carry the tnpA gene.
Moreover, in the same analysis, iscB genes were much less abundant than tnpB and only 1.5% of these were associated with tnpA. Additionally, 8% of the tnpB collection were associated with a serine recombinase and are therefore probably members of the IS607 family while none of the iscB genes were found associated with this type of enzyme.
Both IscB and TnpB use transposon-encoded RNAs. For the IscB copies, a conserved intergenic region upstream of iscB, bounded by the transposon RE, was observed; it bore marked similarity to a non-coding RNA termed HEARO (HNH Endonuclease-Associated RNA and ORF; [109]). Those encoded downstream of tnpB have of course been known for some time, initially in Halobacteria (ncRNAs, sotRNAs and reRNAs; [87][88]).
Conserved secondary structure motifs
Covariation in the collection of tnpB and iscB re(ω)RNAs was analyzed separately to highlight the conserved secondary structure motifs (Fig. IS200.52) which straddle the IS ends and flanking DNA. This means that the (external) guide sequences (NNN… in Fig. IS200.52 i and ii) change with each transposition event into another target.
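The consequence of the guide lying outside the IS can be made concrete with a toy model in which the same element is inserted at two different sites. Everything here is illustrative: the IS and target sequences are invented, the guide is taken as the 16 nt of DNA immediately following the right end and is given in DNA rather than RNA form, and `insert_is` is a hypothetical helper.

```python
def insert_is(target, tam, is_element, guide_len=16):
    """Insert the IS immediately 3' of the first TAM in the target and return
    the new sequence plus the flank that follows the IS right end."""
    tam_start = target.find(tam)
    if tam_start == -1:
        raise ValueError("no TAM in target")
    insert_pos = tam_start + len(tam)
    genome = target[:insert_pos] + is_element + target[insert_pos:]
    flank_start = insert_pos + len(is_element)
    guide = genome[flank_start:flank_start + guide_len]
    return genome, guide


TAM = "TTGAT"
TOY_IS = "GCGCATATATATGCGC"  # invented stand-in for the element

site_a = "GG" + TAM + "ACGTACGTACGTACGTAA"
site_b = "CC" + TAM + "TTTTCCCCGGGGAAAATT"

_, guide_a = insert_is(site_a, TAM, TOY_IS)
_, guide_b = insert_is(site_b, TAM, TOY_IS)
print(guide_a, guide_b)
```

The element is identical in both cases, yet the sequence that would be transcribed as the guide differs because it is supplied by the flank at each new insertion site.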

IS1341 group orientation suggests iscB re(ω)RNA but not tnpB re(ω)RNA is expressed in transcriptionally active environments.
A strong correlation in orientation was noted between upstream genes and iscB copies but not for tnpB copies. This is an important observation since it suggests that the re(ω)RNA of iscB must be expressed from an outside promoter towards iscB (Fig. IS200.52 ii), thus favoring production when inserted into transcriptionally active regions, while that of tnpB, as shown previously (Generating re(ω)RNA: Processing), is expressed by processing from the tnpB transcript (Fig. IS200.52 i).
IS1341 Group Function
More detailed studies focused on tnpB- and iscB-carrying elements from G. stearothermophilus representing 1% of the genome [41]. These could be divided into 5 families (ISGst2-6) based on tnpB, RE and LE sequences. They have quite similar RE and LE boundaries and all exhibited clade-specific CL (=TAM) and CR (=TEM) (e.g. Fig. IS200.44 i; Fig. IS200.53) with clade-specific co-varying mutations between both the TAM and TEM sequences and associated DNA guide sequences (e.g. Fig. IS200.44 i). Evidence from RNA-seq also showed that re(ω)RNA was expressed from multiple copies of these transposase-less IS (i.e. at different genomic positions and with different guide sequences). Derivatives lacking any protein-coding gene, PATEs (Palindrome-Associated Transposable Elements) [110], were also identified.
Does a Resident TnpA copy Drive IS1341 group Transposition?
Importantly, the authors also identified a tnpA gene in the G. stearothermophilus genome within ISGst2 which might serve to drive transposition of these IS. To assess this, a plasmid-based excision system was used which included a cloned copy of tnpA (TnpAGst) to catalyze excision of a mini ISGst2 (Fig. IS200.53).
Excision in vivo, as monitored by a PCR reaction, was robust not only with ISGst2 but also with ISGst3, ISGst4 and ISGst5 (while that of ISGst6 was weak), and all generated the expected donor junction sequence after excision. The reaction was dependent on an active TnpA catalytic site. However, a substrate derived from IS608 was inactive presumably because it lacks an upstream domain (Fig. IS200.30). Interestingly, excision occurred when the mini-IS was present on the leading or lagging strand template but required both LE and RE. Mutation of the TAM (CL sequence) or the guide sequence (GL) reduced or eliminated activity but compensatory mutations which should restore the CL/GL interactions [48] restored some excision activity (Fig. IS200.53). Perhaps surprisingly, mutation of TEM (CR) did not eliminate excision since the system was able to select an alternative wildtype TEM sequence downstream to create an alternative IS end. It seems possible that, since LE is involved in both excision and targeted insertion, the correct interactions between GL and CL may be more stringently required for activity.

The authors also determined that the cloned tnpA, although not present as a single chromosomal copy, could drive transposition at similar frequencies whether in cis or in trans, as measured in a mating out assay (see TnpBGst and IscBGst proteins are active RNA-guided Nucleases below). This reinforces the idea that a single tnpA gene could drive transposition of IS1341 group members in the same cell.
However, it is puzzling that IS1341-like elements of both the TnpB and IscB type have proliferated extensively compared to the “full length” IS. This raises the question of the way in which these genetic objects arose and their function in the cell. It would be interesting to undertake a reconstruction experiment using a single chromosomally located TnpA copy together with a single IS1341 group IS to follow the kinetics of transposition over the long term to determine whether, as seems to be the case with G. stearothermophilus, an accumulation of these pared-down IS occurs.
TnpBGst and IscBGst proteins are active RNA-guided Nucleases
The activity of the ISGst-encoded proteins, TnpBGst and IscBGst, was also explored [41] using a two-plasmid interference assay: an effector plasmid, tagged with spectinomycin resistance, which expresses ωRNA with a fused guide sequence together with a cloned copy of TnpB or IscB, and a target plasmid, tagged with kanamycin resistance, which carries the TAM sequence and DNA flanking RE (Fig. IS200.54 A).

The target sequences were verified as the donor joint generated by IS excision. Successful targeting results in plasmid loss (loss of the kanamycin resistance-carrying plasmid due to double strand breakage) and loss of cell viability under selective conditions. IscBGst (IscB, ISGst6) and four distinct TnpBGst (TnpB1, ISGst2; TnpB2, ISGst3; TnpB3, ISGst4; and TnpB4, ISGst5) homologues were highly active for RNA-guided DNA cleavage of their native donor joints.
TnpB of ISDra2 was highly active in this type of assay but that of IS608 was inactive, presumably because TnpBIS608 lacks the N-terminal HTH domain (Fig. IS200.30).
Additionally, they cloned the native ISGst3 (TnpB2) and demonstrated that its TnpB2–ωRNA robustly cleaved not only the target plasmid but also a plasmid devoid of the donor joint. When a tnpA gene was inserted upstream of tnpB in ISGst3, it proved competent for transposition in a mating out assay at levels in cis comparable to those in trans.
Binding specificity was then examined using ChIP–seq with catalytically dead IscB and TnpB programmed with lacZ-specific ωRNAs (i.e. ωRNAs in which the guide sequence was complementary to a short sequence in the lacZ gene) to map chromosomal binding sites of the nuclease-dead proteins. These results revealed not only the “on-target” (lacZ) site but also numerous off-target sites, indicating that the Cas12 (TnpB) and Cas9 (IscB) evolutionary “parents” show less dependence on RNA–DNA complementarity for stable DNA binding than their Cas12 and Cas9 descendants. The results also suggested that they might show a higher reliance on a more extensive TAM motif.
TnpB is Required for Replacement of the Deleted IS Copy
“Peel and Paste” transposition [102] implies that transposition would result in loss of the IS from the lagging strand of the “donor” replication fork (Fig. IS200.22). Excision of the IS creates a perfect donor joint. The TnpBY/ωRNA endonuclease/guide RNA system could provide a solution to this by “intercepting” the donor joint and retargeting it, recopying the remaining IS copy back into the empty site [72]. The model, developed from data obtained with Deinococcus radiodurans ISDra2, was elegantly confirmed by Meers et al., [41] using an ISGst-based IS.
Using a mini-IS tagged with kanamycin resistance and inserted into the lacZ gene in the E. coli chromosome, excision could be measured by the appearance of lac+ colonies while transposition could be measured by retention of kanamycin resistance in the presence of non-targeting or lac-targeting ωRNA, TnpA (wild type or catalytic mutant) and/or TnpB (wild type or catalytic mutant) (Fig. IS200.54 B). The results showed that: a large fraction of colonies were lac+ in the presence of wild-type but not mutant TnpA; this was reduced 1000x when kanamycin resistance was selected; and TnpA+TnpB and lacZ-specific ωRNA completely eliminated lac+ colonies. This result is consistent with TnpB being responsible for retention (replacement) of the IS copy following “peel and paste”. Meers et al., [41] coined the term “peel and paste/cut and copy” for the overall process.
The Copy Choice Model for TnpB Function During Transposition
One of the important questions concerning the IS200/IS605 transposition pathway and those of the related IscB-carrying elements had been the way in which these IS maintain their copy number in the donor site following peeling off the donor daughter “chromatid” to leave a transposon-less donor joint in the lagging strand of the replication fork. In the model shown in Fig. IS200.16, excision of a single stranded circular IS intermediate from the lagging strand leaves one double strand copy of the IS in the donor replicon carried by one of the daughter “chromatids” (on the leading strand) and a “donor joint” on the other. In this scenario, no increase in the IS copy number would occur.
The results of Karvelis et al. with ISDra2 provided a solution for this problem which has been extended and supported by a number of authors (see Meers et al., [41]). Karvelis et al., [72] and Meers et al., [41] proposed a model in which, following excision of the single strand circular IS copy and formation of the donor joint (Fig. IS200.55 A), reRNA/TnpB-targeted cleavage of the donor joint is used to initiate a copy choice replacement of the IS from the remaining IS on the daughter chromatid (Fig. IS200.55 B; see, for example, Cox et al., [111]). This would permit maintenance of donor replicon integrity while assuring an increase in IS copy number via transposition. They point out that, conceptually, this in some ways resembles group I intron behavior.
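The copy-number argument can be made explicit with a small bookkeeping sketch in Python. It is only an illustration under simplifying assumptions that are not taken from the cited work: a single excision event per replication cycle, the excised IS reinserting once on the lagging-strand daughter, and complete restoration of the donor site whenever TnpB is active. The function name and dictionary keys are hypothetical.

```python
def copies_after_one_round(tnpb_active: bool) -> dict:
    """Count IS copies on each daughter chromatid after one transposition event.

    Simplified bookkeeping (assumptions, not a kinetic model):
    - the leading-strand daughter keeps the IS at the donor site;
    - the lagging-strand daughter loses it by excision (donor joint) and
      receives the excised copy at a new target site;
    - if TnpB/omega-RNA cleaves the donor joint, the donor site on the
      lagging-strand daughter is re-copied from its sister.
    """
    leading = {"donor_site": 1, "new_site": 0}
    lagging = {"donor_site": 0, "new_site": 1}
    if tnpb_active:
        lagging["donor_site"] = 1  # "cut and copy" restores the donor copy
    return {"leading": sum(leading.values()), "lagging": sum(lagging.values())}


if __name__ == "__main__":
    print(copies_after_one_round(tnpb_active=False))
    print(copies_after_one_round(tnpb_active=True))
```

Without TnpB each daughter ends up with a single copy, so the IS has moved but not multiplied; with TnpB the lagging-strand daughter carries two copies, which is the net gain the model is meant to explain.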

TnpA-mediated transposition of the IS200/IS605 family is well documented. The transposition model is based on in vitro experiments using single-strand oligonucleotides, results from in vivo experiments which implicate DnaG and the observation that the orientation of insertion is correlated with the direction of replication of the target chromosome (see [102] and references therein). The proposed function of TnpB in this process neatly completes the transposition model by offering an explanation of how IS copy number is maintained by replacement of the excised IS at the resulting donor joint.
IStrons
Another role for Y1 transposases was suggested by the identification of chimeric genetic elements, IStrons, widely distributed in the genome of Clostridium difficile [112], the Bacillus cereus group and Fusobacterium nucleatum subspecies polymorphum [113][114][115][116] and many other bacterial species [117]. These combine functional and structural properties of group I introns at their 5’-end with those of an IS element at their 3’-end (Fig. IS200.56 A and B). This 3' part contains an IS200/IS605 related sequence including two full length or truncated orfs, tnpA and tnpB, very similar to those found in ISDra2 (D. radiodurans) and ISCpe2 (C. perfringens).
IStrons are present at several loci in the same genome, indicating that this element is mobile and may move as a complete genetic unit. All IStron copies analyzed so far are inserted 3’ to the pentanucleotide TTGAT. In vivo, all variants can be efficiently and precisely excised signifying that components necessary for ribozyme activity are present [112]. The data suggest that IS components could mediate the spread of IStron while the intron component could assure splicing.

In vitro oligonucleotide-based assays using purified IStron transposase confirmed that, at the DNA level, TTGAT is both the LE cleavage site in excision and the target site in insertion (Caumont-Sarcos, unpublished, cited in He et al., [102]). At the RNA level, the same sequence is probably required in the splicing reaction [118]. This would represent a novel type of intron invasion and transposition mechanism and provide a direct link between RNA and DNA worlds [102].
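As an illustration of the target-site rule described above, the following Python sketch lists the positions in a DNA sequence that lie immediately 3’ of TTGAT on either strand. The function names and the coordinate convention (a boundary index `p` meaning "between base `p-1` and base `p` of the top strand") are assumptions made for this example and do not come from the cited studies.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_all(seq: str, motif: str):
    """Yield every (possibly overlapping) start index of motif in seq."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def candidate_insertion_points(seq: str, motif: str = "TTGAT"):
    """Return sorted (boundary, strand) pairs located 3' of the motif."""
    seq = seq.upper()
    motif = motif.upper()
    points = []
    # Top strand: the insertion point is just after the last motif base.
    for i in find_all(seq, motif):
        points.append((i + len(motif), "+"))
    # Bottom strand: the motif appears as its reverse complement on the top
    # strand, and "3' of the motif" is just before its first top-strand base.
    for i in find_all(seq, reverse_complement(motif)):
        points.append((i, "-"))
    return sorted(points)


if __name__ == "__main__":
    print(candidate_insertion_points("GGTTGATCC"))
    print(candidate_insertion_points("CCATCAAGG"))
```

Such a scan only enumerates sequence-compatible sites; it says nothing about which of them are actually used in vivo.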

It is interesting to note that related IStrons have recently been identified which include components of the IS607 family [116][120]. These are characterized by a serine transposase together with a tnpB gene [12].
More recently attention has been focused on the various activities of these IStrons based mainly on the IS607-containing derivatives [117]. These studies have investigated the TnpB activity of IS607 itself and the interplay between DNA transposition, self-splicing intron mobility and RNA guide activity in both IS605- and IS607-based IStrons.
Based on a TnpB phylogenetic tree [41], a bioinformatic analysis for each tnpB gene identified associations with tnpAS (IS607 family), tnpAY (IS200/IS605 family), group I introns (IStron), and ωRNA loci (Fig. IS200.57).
As in the simple IS derivatives, ωRNA loci were invariably located at the right end, 3′ to tnpB, in the same region critical for 3’SS (Splice Site) intron recognition. This analysis confirmed the previous observation that not all tnpB-containing IStrons include an intact tnpA.

The IS605-based IStron: CdiIStron
The IStron content of Clostridioides difficile (Cdi) and Clostridium botulinum (Cbo) was examined in detail: C. difficile 630 carries 8 IS605 family IStrons (CdiIStron), all without a full length TnpAY (unlike CdISt1 which carries an intact tnpAY gene [121]), three freestanding group I introns and three IS605 copies. IStron LE and RE corresponding to those of IS605 were identified using a covariance model developed previously [41] and revealed a TTGAT TAM sequence abutting LE similar to that identified as the CL earlier (see He et al., [102]) (Fig. IS200.60 A). The overall organization resembled a so-called Twort group I ribozyme [122][123].
Analysis of published RNA sequencing data [124][125] revealed expression from all intron and IS605 elements in the C. difficile genome. Moreover, spliced and unspliced sequences for all but one of the CdiIStron were detected demonstrating that the IStrons are active and defining the exon-intron boundaries. These corresponded perfectly to the predicted IS605 LE/RE. An example is shown in Fig. IS200.62. Here the IStron is inserted into a Toxin Glycosylating Gene, tcdA, in one strain but not in another.

IS607-based IStrons
An IStron including full length tnpAS and tnpB genes (CboIStron; Fig. IS200.59) was identified in C. botulinum strain BKT015925 located on a large botulinum neurotoxin-encoding plasmid together with an IStron lacking tnpA and multiple stand-alone IS607 elements. IS607 elements are difficult to identify since they do not have inverted repeat or palindromic ends and do not generate flanking target repeats (TSD) on insertion. Žedaveinytė et al., [117] were able to detect the LE and RE boundaries using a combination of comparative genomics (comparison of full and empty sites) and homology between CboIStrons. They could define a consensus TAM sequence as TGGG (Fig. IS200.59). Moreover, the covariance model (CM) used to define Cdi group I introns was also used successfully to detect Cbo group I introns and, as in the IS605 IStron, the splice sites overlapped the IS ends. An example of a CboIStron inserted into a phage antirepressor gene, ant, is shown in Fig. IS200.59.

While the transposition reactions of IS605 and IS607 are quite different (TnpAY recognizes secondary structures generated in single-strand DNA in the IS ends while TnpAS recognizes double strand DNA), it had been noted that the ends of IS607 and its relatives included a number of repeat sequences (Fig. IS607.1 and Fig. IS607.2). Identification of IS605 and IS607 ωRNA using the covariance model showed that these share common secondary structure features (Fig. IS200.60, Fig. IS200.61 and Fig. IS200.62) [117] comprising three consecutive stem-loop features with a so-called Nexus stem loop (SL1) predicted to facilitate a pseudoknot structure (Fig. IS200.65 and Fig. IS200.66) with the 3’ ωRNA end, as has been demonstrated in the case of ISDra2 (Fig. IS200.42; [99]). This appears to be conserved in TnpB systems and in Cas12 guide RNA [78][98][99].



TnpAS IS607 Excision and Insertion Activity
A strong excision reaction, detected by PCR, was observed in E. coli using a “donor” plasmid carrying a mini IS607-based CboIStron lacking both tnpAS and tnpBS and TnpAS expressed from a second plasmid. Circular transposon copies were also observed (Fig. IS607.7). This depended on a functional TnpAS active site and the presence of both IStron (IS) ends. The excision products, which appeared at a frequency of 5% in overnight cultures (several orders of magnitude higher than that catalyzed by TnpAY in an IS605 system under similar conditions; Meers et al., [41]), carried a precise donor joint. Internal end deletions showed a requirement for 40 (LE) and 60 (RE) bp for robust excision. In particular, these carry a subset of the repeated elements (3 in LE and 2 in RE) identified by Boocock and Rice [34] and Chen et al. [126] (Fig. IS607.2) and implicated in the cooperative assembly of multiple TnpAS dimers into a synaptic complex [126]. Mutation of these repeats individually resulted in reduced excision activity while multiple mutations eliminated excision entirely [117].
Thus, as in the case of IS605, transposase activity of IS607-related transposons leads to loss from the donor site.
To investigate IS607-IStron insertion, a chloramphenicol resistance tagged suicide (non-replicative) donor plasmid carrying abutted LE and RE and a pir-dependent R6K replication origin, ori, was used: grown in a pir+ strain and transformed into a pir- strain expressing TnpAS (Fig. IS607.8). Cell viability in the presence of chloramphenicol was dependent on TnpAS. Integration specificity was investigated by genome-wide sequencing and found to strictly require a GG dinucleotide with a preference, but not an absolute requirement, for the predicted TAM sequence: TGGG.
IStron-encoded TnpB nucleases
Sequencing of RNA in immunoprecipitated CboIStron TnpBS (as has been shown in other TnpB systems; [98]) revealed that it strongly enriched its expected RNA partner. However, there was also a strong signal located 42 nt downstream from the covariance model (Fig. IS200.60, Fig. IS200.62). Mutating the RuvC domain resulted in the disappearance of this species, suggesting that it is the product of TnpBS-catalyzed precursor transcript processing as found for other TnpB systems (e.g. Nety et al. [95]).
Using a plasmid interference assay (see: TnpBGst and IscBGst proteins are active RNA-guided Nucleases; Fig. IS200.54), CboIStron TnpBS showed robust RNA-guided DNA cleavage and reduced colony formation (i.e. cell viability) by a factor of 10^5 in a process necessitating target DNA-guide RNA complementarity, a cognate TAM and a catalytically competent TnpB. The reaction was effective using either a natural configuration in which both CboIStron ωRNA and TnpB are expressed from a single transcript or when they are expressed independently.
Defining the CboIStron TAM Sequence: a double role in both nuclease and transposase recognition
An assay similar to that shown in Fig. IS200.40 was used to precisely define the CboIStron TAM sequence required by TnpB during DNA recognition and cleavage. The target plasmid included a randomized 6N library together with the guide sequence and a kanamycin resistance marker. Plasmids carrying the TAM sequence are strongly depleted under selective conditions. The results confirmed the predicted TAM was indeed TGGG (as shown in Fig. IS200.59).
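The logic of such a depletion assay can be sketched in a few lines of Python. This is a minimal illustration and not the analysis pipeline of the cited study: the read counts are invented toy values (using 4-mers rather than the 6N library for brevity), and the pseudocount and the log2 threshold are arbitrary assumptions.

```python
import math


def depletion_scores(control: dict, selected: dict, pseudocount: float = 1.0) -> dict:
    """Log2 ratio of normalized frequency (selected / control) per motif.

    Strongly negative values indicate motifs that were lost under selection,
    i.e. candidate TAM sequences.
    """
    motifs = set(control) | set(selected)
    control_total = sum(control.values()) + pseudocount * len(motifs)
    selected_total = sum(selected.values()) + pseudocount * len(motifs)
    scores = {}
    for motif in motifs:
        c = (control.get(motif, 0) + pseudocount) / control_total
        s = (selected.get(motif, 0) + pseudocount) / selected_total
        scores[motif] = math.log2(s / c)
    return scores


def depleted_motifs(scores: dict, threshold: float = -3.0) -> list:
    """Motifs whose log2 ratio is at or below the (assumed) threshold."""
    return sorted(m for m, v in scores.items() if v <= threshold)


if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    control = {"TGGG": 1000, "AAAA": 1000, "CCCC": 1000}
    selected = {"TGGG": 1, "AAAA": 1000, "CCCC": 1000}
    print(depleted_motifs(depletion_scores(control, selected)))
```

In a real experiment the counts would come from deep sequencing of the library before and after selection, and the depleted motifs would typically be summarized as a sequence logo.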
Thus the TnpAS/TnpB system, like the TnpAY/TnpB systems (e.g. Karvelis et al., [72]), has evolved to specify the same DNA motif for nuclease recognition and for transposase recognition.
CboIStron TnpB/ωRNA promotes transposon copy number maintenance
Given the similarities of TnpB activities, including recognition of the TAM/CL sequence by both transposase and nuclease, it seemed probable that the IS607 Cbo TnpB performs the same function.
This was tested using the retention/replacement assay of Meers et al., [41] (Fig. IS200.54 B) in which the CboIStron interrupted a plasmid-based lacZ gene and was oriented opposite to the direction of transcription of the gene (to avoid the splicing reaction, as has been done in assays of transposition of retroelements). Expression of TnpAS alone resulted in transposon loss, evidenced by about half of the colonies being lac+, a frequency reduced about 10x by co-expression of wildtype but not mutant TnpB. These results were confirmed by PCR [117].
Busy Ends: Functional interactions between IStron splicing, TnpB and ωRNA
To investigate the interactions between peel and paste/cut and copy transposition and group I intron splicing, a minimal CboIStron lacking both TnpA and TnpB was first examined for its ability to undergo self-splicing in E. coli. The splicing reaction uses exogenous GTP in a transesterification reaction at the 5’SS, generating a 3’ OH end on the upstream exon which then attacks the 3’SS to form an exon-exon joint and liberate the intron. RT-PCR on extracted RNA revealed both spliced and unspliced products. The spliced product was a perfect joining of the two flanking exons with the exact sequence observed following TnpAS-mediated IStron excision. The reaction required the P7 - P9 catalytic region (Fig. IS200.56 A) and a wildtype 5’SS sequence.
Self-splicing without the intervention of protein factors was confirmed since identical products were obtained with purified RNA.
Coding mRNAs for TnpA and TnpB can be produced from both unspliced and spliced introns, but since the ωRNA scaffold is severed from the guide region, spliced introns are no longer capable of forming functional ωRNAs (Fig. IS200.56 B). Conversely, TnpB-mediated ωRNA processing would separate 5′SS and 3′SS on two distinct molecules, allowing only trans-splicing. TnpB-ωRNA binding would also likely obstruct physical interactions required at the 3’SS for splicing.
Consecutive deletions in the first 180bp of CboIStron from the ωRNA 5′ end (Fig. IS200.63 Top) dramatically increased splicing. This region included most of the ωRNA stem loops, deletion of which was shown to eliminate TnpB-mediated RNA-guided DNA cleavage. Single or combined stem-loop deletions, except for stem-loop 5 which is required for splicing, had similar effects. The large change in splicing activity (measured as the ratio of spliced to unspliced substrate by PCR; Fig. IS200.63 Bottom) for the 180bp deletion implies that sequence and/or structural features in this region inhibit splicing in the full length wildtype IStron. The results suggest, perhaps not surprisingly, that the RNA structure alone influences splicing and that splicing and TnpB/ωRNA activity are negatively correlated.

Additionally, the pseudoknot (Fig. IS200.42 and Fig. IS200.63) which appears to be a common feature in ωRNAs and which is essential for TnpB/ωRNA guided DNA cleavage, was observed to play an important role in inhibiting splicing: individual point mutations in PK1 (Fig. IS200.63) which destroy its formation (and eliminate guided DNA cleavage) greatly stimulate splicing activity although mutation in PK2 which is shared by the 3’SS site eliminated both guide activity and splicing.
Moreover, expression of wildtype or catalytically inactive TnpB in trans greatly reduced splicing indicating that TnpB-facilitated ωRNA binding was sufficient for splicing repression. This was further confirmed since TnpB-dependent repression was only observed when most of the ωRNA scaffold was present. It did not occur if ωRNA was replaced by lac RNA.
Busy Ends
There is therefore clearly an intricate balance between the IStron splicing activity and the mutually exclusive functions involved in the cut-and-copy phase of transposition. The end sequences have evolved to accommodate the transposase (either TnpAY or TnpAS), the TnpB/ωRNA systems and the self-splicing 5’ and 3’SS. The advantage for the associated IS is that the intron gives it a wider target choice since it can insert, co-oriented, within coding sequences without consequences for expression of the interrupted gene.
The intricate relationship is apparent from the fact that the ωRNA scaffold is contained within the 3’ intron end but the targeting sequence is contained within the downstream exon. Splicing therefore separates the guide and scaffold sequences. The balance between splicing and guided target DNA cleavage is modulated both by TnpB-ωRNA binding which, by occlusion, prevents the access of the upstream exon to the 3’ phosphate bond (Fig. IS200.56 B) and by the ωRNA itself through its pseudoknot formation which competes for the 3’ splice site. An overview is provided in Fig. IS200.64 taken directly from Žedaveinytė et al., [117].

The Eukaryotic Connection: Fanzor eukaryotic TnpB relatives
Fanzor proteins are eukaryotic relatives of TnpB first identified in a bioinformatics search [80]. The first, SPu-1-1p (633-aa), was identified in a fungus, Spizellomyces punctatus. The single Orf was flanked by 33-bp Terminal Inverted Repeats (TIRs) and a putative 2 bp TSD (TA). The 2,100-bp long SPu 1 element was found in 17 full length copies and homologues were also identified in a number of other eukaryotes including metazoans, fungi, protists and dsDNA viruses infecting eukaryotes. They are very distantly related to TnpB from both the IS200/IS605 and IS607 families with which they share 15% identity over the 300 aa C-terminus (Fig. IS200.55). This comprises a number of highly conserved residues including the TnpB Zn finger (Fig. IS200.30, Fig. IS200.31, Fig. IS200.36, Fig. IS200.47).

A phylogenetic tree (Fig. IS200.56) showed that Fanzor1 formed a well separated clade and that Fanzor2 was associated with some, but not all, TnpB proteins.

TnpB Clade
This is restricted to prokaryotic elements although a later analysis [79] identified TnpB which are closely related to Fanzor2 (pro-Fanzor) and mainly found in cyanobacteria.
Fanzor1
Fanzor1, which is more distantly related to TnpB (Fig. IS200.66), was observed to be associated with a number of different TE including so-called IS4-type elements in the alga Ectocarpus siliculosus and its virus (E. siliculosus virus 1), Sola2 elements from the slime moulds Dictyostelium fasciculatum and Polysphondylium pallidum, Tc/mariner, Helitrons, MuDr relatives and several insect viruses. All these examples are from eukaryotes.
Fanzor2 and/or Fanzor1 are of bacterial origin
On the other hand, Fanzor2 proteins are found associated with serine recombinases as in IS607. It should be pointed out that most of these are from Giant viruses or nucleocytoplasmic large DNA viruses (NCLDVs) that infect algae (Phycodnaviruses) and amoebae (Mimivirus). It has been demonstrated that such viruses acquire genetic information from ingested/infecting bacteria [127][128][129]. Indeed, several of these had been identified earlier as IS607 derivatives which may be of bacterial origin. These appear to be a different subclade to the majority of bacterial examples although a few prokaryotic elements are associated (Anabaena sp. PCC 7120; ISArma1; and Microcystis aeruginosa NIES-843).
A more extensive analysis based on a much larger sequence library [130] identified more than 3000 representatives of the TnpB superfamily. These were chosen based on structural mining of an AlphaFold database and sequence profiling of the non-redundant NCBI database and were grouped into a phylogenetic tree (Fig. IS200.67). The overall topology is similar to that described by Bao and Jurka [80]: the eukaryotic examples fall into two major groups comprising Fanzor1 and Fanzor2.

Fanzor1 is found in fungi, but also in protists, eukaryotic viruses, in particular giant viruses where there is a close association with internalized bacteria since these infect hosts living in symbiosis with bacteria (see Filee et al., [127]), arthropods and plants. Fanzor2 is also found in several giant viruses and in choanoflagellates which also feed on bacteria (and viruses) as well as in Stramenopiles, Alveolates and Rhizaria which may also ingest bacteria.
The authors manually examined eukaryotic branches (radiations), and sometimes simply single leaves, emerging from other TnpB branches around the tree [130]. These showed that they were also from hosts featuring lifestyles intimately connected to bacterial species (e.g. bacterivores or living with parasitic bacteria).
These data were interpreted to suggest that the Fanzor proteins, FZ1 and FZ2, were originally acquired from bacterial hosts, possibly twice [130][131]. It should be noted that those Fanzor2 proteins which have similar spacing to TnpB (Fig. IS200.65) and were attributed a eukaryotic association are now thought to be misclassified prokaryotic proteins [79]. Moreover, Yoon et al., [79] could find no support for independent evolution of Fanzor1 from prokaryotic Fanzor1-like derivatives.
They show a similar domain arrangement with increasing complexity from the closely related TnpB such as that of ISDra2 and the Fanzor2 proteins (Fig. IS200.68), through an expansion of the HTH (Rec) region in the Fanzor1 derivatives to the extensive expansion found in the Cas12a proteins (Fig. IS200.68).

Fanzor2 and/or Fanzor1 may have evolved from an IS607 ancestor
It has been proposed that Fanzor evolved from a clade of IS607-related elements [79] with an unusual active site configuration. Interestingly, IS identified in the Mimi NCLDV, ISvMimi_1 and ISvMimi_2 in ISfinder (NC_006450), had already been identified as related to IS607 based on their transposase, tnpA, genes but carry a tnpB-like downstream gene which is much longer than typical tnpB [127][132].
Two features distinguish TnpB and Fanzors in spite of their sharing similar domain organizations ([79]; summarized in Fig. IS200.69 A): firstly, the Fanzor RuvC1 catalytic D is followed almost exclusively by a proline (DPG; Fig. IS200.55) whereas in TnpB there is typically a hydrophobic residue, φ (where φ can be I, L, F, W, Y or M), instead (DφG; Fig. IS200.55; Fig. IS200.35; Fig. IS200.36); secondly, TnpB RuvC2 typically contains a catalytic glutamate situated ∼ 50 residues upstream of the ZF motif, which they call Ecan for canonical (Fig. IS200.35; Fig. IS200.36; Fig. IS200.58), whereas in the Fanzor RuvC2 this glutamate is six residues upstream of the ZF motif, which they call Ealt for alternative (Fig. IS200.55; Fig. IS200.59). Yoon et al., [79] also suggest that Fanzors with the E spacing typical of TnpB and previously interpreted as novel Fanzor subtypes [80][130] appear to be prokaryotic TnpBs that had been mis-annotated as eukaryotic Fanzor2.

Fanzor1 may have evolved from Fanzor2
However, when about 800 TnpB homologues obtained by database mining were used to create a maximum likelihood tree based on their RuvC features, Yoon et al., [79] observed a clear separation between RuvC II Ecan (TnpB) and Ealt (Fanzor) containing sequences (Fig. IS200.69 B) with those carrying Ealt further distributed into two clades with RuvC I DφG (TnpB) or DPG (Fanzor). A small number of TnpB-like proteins, mainly from cyanobacteria, were observed to be closely related to Fanzor2 and to be associated with IS607 TnpA also closely related to those associated with Fanzor2. These were called ‘pro-Fanzors’ (Fig. IS200.69 B).
Yoon et al., [79] could not find compelling evidence that Fanzor1 was acquired directly from a prokaryote and it seemed possible that it might have evolved from an acquired Fanzor2 protein. Bao and Jurka [80] observed that Fanzor1 can be found in a number of eukaryotic transposons such as Tc/mariner and Helitrons, and associated with what the authors call IS4-type Tpases in ESvi1B and ESv2 (brown alga Ectocarpus siliculosus; see Filee et al., [127][132] for an early description of NCLDV-associated IS). However, it was thought that Fanzor2 was limited to prokaryotic IS607-like elements [80]. To examine this further, IS607 derivatives encoding Fanzor1 and eukaryotic transposons carrying Fanzor2 were searched in the updated library [79].
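The two RuvC criteria can be combined into a simple decision rule, sketched here in Python. This is only a schematic reading of the features described in the text, not the procedure used by Yoon et al.: the inputs (the residue following the RuvC I catalytic D and the number of residues between the RuvC II glutamate and the ZF motif) are assumed to have been extracted from an alignment beforehand, and the cutoff separating Ealt (about 6 residues) from Ecan (about 50 residues) is an arbitrary assumption.

```python
HYDROPHOBIC = set("ILFWYM")  # the phi residues listed in the text


def classify_ruvc(residue_after_d: str, e_to_zf: int, ealt_cutoff: int = 10) -> str:
    """Assign a coarse label from two RuvC features.

    residue_after_d: residue following the RuvC I catalytic aspartate.
    e_to_zf: residues between the RuvC II glutamate and the ZF motif.
    ealt_cutoff: hypothetical threshold; at or below it the glutamate is
                 treated as Ealt, above it as Ecan.
    """
    residue = residue_after_d.upper()
    glutamate = "Ealt" if e_to_zf <= ealt_cutoff else "Ecan"
    if glutamate == "Ecan" and residue in HYDROPHOBIC:
        return "TnpB (DphiG, Ecan)"
    if glutamate == "Ealt" and residue == "P":
        return "Fanzor (DPG, Ealt)"
    if glutamate == "Ealt" and residue in HYDROPHOBIC:
        return "TnpB-like with Ealt (DphiG, Ealt)"
    return "unclassified"


if __name__ == "__main__":
    print(classify_ruvc("L", 50))
    print(classify_ruvc("P", 6))
    print(classify_ruvc("I", 6))
```

The third label corresponds to the Ealt clade that retains a TnpB-type RuvC I motif; whether a given protein in that group is a pro-Fanzor would still need phylogenetic support.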


As expected, the results identified a number of Fanzor1 associated with a number of eukaryotic transposons but failed to identify IS607-associated Fanzor1. On the other hand, a number of Fanzor2 were found to be associated with non-IS607 eukaryotic elements of different families (Fig. IS200.69 C) from organisms such as the alga Chloropicon primus and various mollusc species including Mercenaria mercenaria. These were called Fanzor2* to distinguish them from the “prokaryotic” Fanzor2.
The authors proposed that an ancestral Fanzor2 gave rise to Fanzor1 based on: the conserved RuvC profiles; the absence of prokaryotic Fanzor1 proteins; and the finding that distantly related Fanzor2 occurred in different eukaryotic transposons, suggesting that it had been captured several times. Moreover, the absence of a detectable close evolutionary link between Fanzors and IS200/IS605 TnpBs is noteworthy since IS200/IS605 family members appear to be more abundant than IS607 derivatives [41]. This suggested that specific features of IS607 TnpB might have facilitated their evolution in eukaryotes [79] (see Is IS607 TnpB the Ancestor of Fanzor Proteins?).
Although the Fanzor proteins are very widely distributed in the eukaryotic world (Fig. IS200.67) and are sometimes found associated with potential transposable elements and in multicopy, their function has yet to be clearly established.
Fanzor Activity
Saito et al., [130] used two FZ1 examples (from the soil fungus S. punctatus, SpuFz1, and the alga G. theta, GtFz1) and two FZ2 examples (from N. lovaniensis, NlovFz2, and the marine mollusk M. mercenaria, MmeFz2) for functional studies.
Comparison of AlphaFold structural predictions of FZ proteins with known structures of ISDra2 (PDB: 8H1J) and AsCas12a (PDB: 5B43) showed that, despite a large sequence and length variation (Fig. IS200.58), all six proteins share a common “core” domain including a WED and RuvC region.
A predicted active catalytic site formed by positively charged residues is found in the RuvC region in ISDra2 TnpB, SpuFz1, GtFz1, NlovFz2 and MmeFz2. However, the core regions include various family-specific insertions. The Cas12 protein, AsCas12a (1307aa), carries a 900aa insertion in the WED domain, the REC region, which forms a protective channel for the spacer–target RNA-DNA heteroduplex region and is likely to be involved in this R-loop formation. It is reduced to three helices (100aa) in TnpBDra2, NlovFz2 and MmeFz2 (see Fig. IS200.34) and probably serves the same function. NlovFz2 and MmeFz2 are very similar to TnpBDra2, but each harbors a unique amino-terminal disordered region, with NlovFz2 featuring a 96aa and MmeFz2 a 61aa segment.
Characterization of the guide RNA system followed relatively established procedures: identification of the associated ωRNAs from the RE region of the FZ orthologues expressed in S. cerevisiae by small RNA-seq for RNPs and secondary RNA structure prediction.
Initially, SpuFz1 [80], a single open reading frame (ORF) flanked by well-conserved 30bp terminal repeats, was used for further functional studies. S. punctatus DAOM carries 42 copies of this 2.1-kilobase pair Spu1 transposon: 19 with full length or remnants and 134 lacking the FZ orf which they call “ghosts” but which are equivalent to previously described MITEs. RNA-seq revealed an 88–90-nt ncRNA species downstream of Fz in several S. punctatus loci which could also be identified in pulldown experiments in Saccharomyces cerevisiae. These included 14–15 nt of variable sequences beyond the conserved 75-nt region at the 3′ end. This was repeated with GtFz1 and with an Fz1 locus from G. theta, four Fz2 loci from N. lovaniensis, and two Fz2 loci from M. mercenaria.
Purified complexes of all four proteins (SpuFz1, GtFz1, NlovFz2 and MmeFz2) with their ωRNA were used in an assay (e.g. Fig. IS200.40) to define the associated TAM sequences (SpuFz1, CATA; GtFz1, TTAAN; NlovFz2, CCG; and MmeFz2, TAG) and to identify cleavage points on the non-target and target strands (NTS and TS) in a target DNA. These varied according to the protein, generating: 5’ overhangs (SpuFz1), 5’ or 3’ overhangs or blunt ends (GtFz1), blunt ends (NlovFz2) or 3’ overhangs (MmeFz2). Finally, the structure of the SpuFz1 RNP complex with its target DNA was obtained by cryo-EM and, as expected, was found to consist of an SpuFz1 monomer associated with a single ωRNA molecule together with the target DNA.
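To make the combined TAM and guide requirement concrete, the following Python sketch searches a DNA strand for a TAM immediately followed by a sequence matching the guide. It is a simplified illustration under stated assumptions: the TAM is taken to lie directly 5’ of the guide-matching sequence on the strand being scanned, the guide is supplied as its DNA-equivalent sequence, only the IUPAC codes A, C, G, T and N are supported, and perfect matching is required. The function name and example sequences are hypothetical.

```python
import re

# Minimal IUPAC table: only the codes needed for the TAMs quoted in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_targets(dna: str, tam: str, guide_dna: str) -> list:
    """Return 0-based start positions of TAM + guide matches on this strand."""
    tam_pattern = "".join(IUPAC[base] for base in tam.upper())
    pattern = tam_pattern + re.escape(guide_dna.upper())
    # A lookahead lets overlapping candidate sites be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", dna.upper())]


if __name__ == "__main__":
    guide = "ACGTACG"  # invented guide-matching sequence
    print(find_targets("GGCATAACGTACGTTT", "CATA", guide))   # SpuFz1-type TAM
    print(find_targets("TTAAGACGTACG", "TTAAN", guide))      # GtFz1-type TAM
```

Scanning the reverse complement of the sequence with the same function would cover the opposite strand; tolerance to mismatches, which the ChIP–seq data suggest can be considerable for binding, is deliberately left out of this sketch.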
Functional Relationship Between Fanzor Evolution and IS607 TnpB
Yoon et al. [79] suggested that specific features of IS607 TnpB might have facilitated their evolution in eukaryotes. The most obvious major difference is that IS200/IS605 family members use a single-strand circular DNA intermediate whereas IS607 uses a double-strand circular intermediate.
However, they also noted that while IS200/IS605 family members insert downstream of a short motif (Fig. IS200.3, Fig. IS200.5, Fig. IS200.7) [22][102], IS607 family members typically insert via recombination between matching dinucleotide motifs (Fig. IS607.6) [34][126]. This means that for IS200/IS605 members, the first nucleotide after the right end (or the ‘right-flanking nucleotide’) is variable whereas in IS607 members it is fixed (G in Fig. IS607.6).
The nucleotide abutting the IS200/IS605 right end corresponds to the start of the guide sequence in TnpBs [72] and it was reasonable to determine whether this is also true for IS607. TAM depletion assays were used to investigate this. These use a target plasmid carrying a library of potential TAM sequences abutting a guide sequence (e.g. Fig. IS200.44 ii); TnpB recognition of a TAM/target site results in depletion of the TAM-carrying bacterial sub-population. Using re(ω)RNA variable boundary mutants of ISXfa1, an IS607 member from Xylella fastidiosa (see also Žedaveinytė et al. [117]), it was observed that the RE-abutting G nucleotide behaved as part of the re(ω)RNA scaffold rather than as part of the guide, as has also been observed by Žedaveinytė et al. for an IStron IS607 derivative [79][117].

Analysis of ISXfa1 reRNA by Žedaveinytė et al. [117] (Fig. IS200.53) and by Yoon et al. [79] gave similar results, with potential stem-loops and, like other re(ω)RNAs, a pseudoknot essential for activity. Based on similarities and differences between the reRNAs and from structural models, it was proposed (Fig. IS200.59 E) that TnpB from an ancestor of IS607 gave rise to Fanzor1 (for example, that found in ISvMimi1) via an intermediate (Pro-fanzor) and that Fanzor2 derived from a eukaryotic Fanzor1.

Y1 transposase domestication
There are many examples of eukaryotic transposases whose activities have been appropriated to perform various cellular functions (see [133][134][135]). However, the very few examples of this domestication for prokaryotic enzymes concern Y1 transposases.
TnpAREP and REP/BIME
Recently, a new clade of Y1 transposases (TnpAREP) was found associated with REP/BIME sequences in structures called REPtrons [136][137] (Fig. IS200.70 A). In spite of their compact size, bacterial genomes carry many repetitive sequences, often important for genome function and evolution. Among them, Repetitive Extragenic Palindromic sequences (or REPs) are short DNA repeats of 20-40 bp that can form stem-loop structures preceded by a conserved tetranucleotide (GTAG or GGAG) (Fig. IS200.71). REPs are found in intergenic regions in many bacterial species, particularly in proteobacteria, at high copy number [136][138][139].
There are nearly 590 copies in Escherichia coli K12[140] (Fig. IS200.42) and up to 2200 copies in Pseudomonas sp GM79[139]. REPs can exist as individual units but can cluster in more complex structures called Bacterial Interspersed Mosaic Elements (BIME). These are composed of two individual REPs in inverse orientation (REP and iREP) separated by a short linker of variable length. BIME are often found in consecutive tandem copies (Fig. IS200.70). Several roles have been attributed to these sequences including genome structuring, post-transcriptional regulation and genome plasticity. REPs are known to interact with protein partners such as Integration Host Factor[141], DNA gyrase[142] and DNA polymerase I[143].
REPs also increase mRNA stability and can act as transcriptional terminators [138] or as targets for different IS [15][144]. It has also been suggested that REP sequences can downregulate translation of upstream genes in a manner dependent on trans-translation. This occurs only if they are within 15 nt of a termination codon. It has been suggested that REPs can stall ribosomes, leading to mRNA cleavage and induction of the trans-translation process[145]. Recombination at REP sequences has also been shown to be involved in the formation of F’ plasmid derivatives (the classic F plasmid carrying various portions of the chromosome; Fig. IS200.72) from Hfr strains[146]. However, the origin of REPs and their dissemination mechanisms are poorly understood.


Although more complex, REPtrons are reminiscent of IS200 group members (Fig. IS200.70). However REPtrons do not appear to be mobile and, in general, a single copy of a given REPtron co-exists with numerous corresponding REP/BIME and genomes may harbor several distinct REPtrons[136][139]. It has therefore been suggested that REP/BIMEs represent a special type of non-autonomous transposable element mobilizable by TnpAREP.
In vitro analysis of REPtrons: Analysis of E. coli REPtron activity in vitro has shown that, like TnpAIS200/IS605, TnpAREP strictly requires single stranded REP/BIME DNA substrates and is strand specific: only REP can be processed, whereas iREP is refractory to cleavage [137]. Purified E. coli TnpAREP promotes ssREP cleavage (in the linker sequences either 3’ or 5’ to the REP structure) and rejoining, and this activity requires the conserved tetranucleotide GTAG and the bulge in the middle of the REP stem [137][148]. Cleavage in vitro is less specific than that of TnpAIS200/IS605 and occurs at a CT dinucleotide.
In contrast to TnpAIS608 and TnpAISDra2, E. coli TnpAREP is a monomer in solution and in the crystal structure[148]. Moreover, in the co-crystal structure, the short C-terminal tail is inserted into the active site blocking access to an ssDNA. It may, therefore, play a regulatory role in the activity. Indeed C-terminal truncation of TnpAREP resulted in increased cleavage activity relative to the full-length protein in vitro. The biochemical and structural analysis suggested that the GTAG 5’ to the foot of the REP hairpin may play a similar role to the guide sequences GL/R in IS200/IS605.
Moreover, structural data also highlighted numerous specific contacts between TnpAREP and GTAG, explaining its importance in the activity and clearly distinguishing TnpAREP from TnpAIS200/IS605, which do not directly contact the guide sequences (Cleavage site recognition). The way by which TnpAREP promotes REP/BIME proliferation through their host genomes remains to be determined.

IS200 Regulation and Salmonella Pathogenicity
Although IS200 is present in moderate to high copy number in certain bacteria (e.g. Salmonella typhimurium 5-12 copies and Salmonella typhi 26 copies; and the IS200 family member, IS1541, in Yersinia pestis >50 copies), it appears to be recalcitrant to transposition [149] (The IS200 group) and exists in a “dormant” [150] state. Although samples taken 30 years apart showed no changes in IS200 patterns [149][151][152], there is evidence that the closely related IS1541 element is active in facilitating mouse infection by Y. pestis [150]. The high copy number, low transposition frequency and little accumulation of mutations would imply the existence of a robust selective pressure to maintain the IS [153].
Transposase expression is regulated by an antisense RNA
IS200 elements express two small RNA molecules, a tnpA-encoding mRNA, mRNAtnp (Fig. IS200.73) and a small antisense RNA, asRNA. Regulation of transposase expression occurs at several levels: IS200 LE includes a pair of inverted repeats which constitute a strong, bidirectional, ρ-dependent terminator (Fig. IS200.73a and b), reducing impinging transcription entering the IS by ~85%; in addition, LE-proximal mRNA secondary structure sequesters the Shine-Dalgarno sequence (SD) (Fig. IS200.73 b and d), which inhibits tnpA translation by a factor of 20; and a small antisense RNA, asRNA (also called art200 [154]), by pairing with mRNAtnp, reduces translation 15 fold. A promoter for asRNA, PA, was identified which, when mutated, reduced art200 expression in an E. coli host. Additionally, direct binding of the RNA-binding chaperone protein Hfq to a region upstream of the ribosome binding site also occludes ribosome binding [154].
TnpA expression has been investigated using a lacZ translational fusion (codon 10) with tnpA (codon 60; Fig. IS200.73 c). Reducing art200 expression using a promoter mutant (PA-6) led to a significant increase in lacZ expression (~13 fold compared to the wildtype IS200 sequence). Also, when a lacZ fusion with a wildtype IS200 PA sequence was challenged with constructions carrying a 5’ tnpA mRNA segment (nts 45-298; tnpAtrunc-wt; Fig. IS200.73 c) complementary to asRNA and under control of a moderate (Ptet) or a strong (PT7) promoter, a significant increase in lacZ expression occurred. This presumably resulted from titration and degradation of art200 by overproduction of its tnpA RNA complement. Use of a tnpAtrunc-M1 mutant, unable to pair with art200, failed to show this response, indicating that RNA-RNA pairing is necessary. Moreover, supplying art200 in trans, produced from its own promoter, to the lacZ translational fusion with an IS200 PA-6 mutation also reduced lacZ expression.
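The figures quoted above mix a percent reduction (~85%) with fold reductions (20 fold, 15 fold). The short sketch below puts them on a common scale. It is illustrative only: the function names are hypothetical, and multiplying the two translational effects assumes that they act independently, which is not stated in the source.

```python
def percent_reduction_to_fold(percent):
    """Convert a percent reduction (e.g. 85) into the equivalent fold reduction."""
    remaining = 1.0 - percent / 100.0
    if remaining <= 0:
        raise ValueError("a reduction of 100% or more has no finite fold value")
    return 1.0 / remaining


def combined_fold(*folds):
    """Multiply fold reductions together (ASSUMES independent, multiplicative effects)."""
    product = 1.0
    for fold in folds:
        product *= fold
    return product


# ~85% block of impinging transcription by the LE terminator.
terminator_fold = percent_reduction_to_fold(85)

# SD sequestration (20 fold) and art200 pairing (15 fold); independence is an assumption.
translation_fold = combined_fold(20, 15)

print(round(terminator_fold, 1), translation_fold)
```

An 85% reduction therefore corresponds to a little under a 7-fold reduction. The combined translational value is an upper-bound style estimate, since the two mechanisms act on the same region of the mRNA and may well not be independent.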

Transposase expression is regulated by an antisense RNA and Hfq
The involvement of Hfq in art200 RNA/mRNA interaction was suspected from results obtained with asRNA (RNAout) of IS10 from the transposon Tn10 in which Hfq was found to promote antisense pairing with the transposase RNA (Tn10; [155]). art200 RNA was identified from Hfq immunoprecipitation (Hfq-IP) data sets as an asRNA (called STnc490 by the authors) complementary to 90nt of the 5’UTR (untranslated region upstream of the tnpA gene; Fig. IS200.73) of IS200 in Salmonella [156] [157]. A similar RNA from the closely related Yersinia pestis IS1541 has also been identified [158].
The involvement of Hfq in transposase (lacZ) expression (Fig. IS200.73 ci) was confirmed by performing the E. coli titration (Fig. IS200.73 cii) and trans complementation experiments with art200 produced from its native promoter (Fig. IS200.73 ciii). Experiments carried out in isogenic hfq+ and hfq- strains showed that: in the context of a wildtype PA sequence, LacZ expression was 5 fold higher in the absence of Hfq indicating that Hfq represses TnpA expression; the hfq- and PA-6 mutations act synergistically in derepressing TnpA expression; and art200 supplied in trans could repress the expression in the PA-6 mutant independently of the Hfq status of the host [154].
Additionally, it was demonstrated that RNA pairing occurs in vitro. Lead (Pb2+) acetate “footprinting” (Fig. IS200.73 d) was carried out on a mixture of the two prefolded RNA molecules. The art200 secondary structure and interaction with the 5’ fragment of transposase RNA was probed by RNA footprinting in the presence and absence of a tnpA RNA fragment (tnpA1-173; see Fig. IS200.73 d). Specific residues of each RNA became refractory to cleavage, indicating a transition from a single-stranded to a double-stranded state: a number of art200 residues showed reduced Pb2+ sensitivity in the presence of tnpA1-173 and, in tnpA1-173, certain residues showed strong decreases in reactivity to RNase A (which degrades single-stranded RNA at C and U residues) or T1 (specific for G residues in single-stranded RNA), or strong increases in reactivity to V1 (which cleaves base-paired nucleotides) on art200 addition. This occurred on the upper part of the art200 RNA stem loop (Fig. IS200.73 d left, green) on addition of the tnpA1-173 RNA and on the upper part of the tnpA1-173 stem loop (Fig. IS200.73 d right, red) on addition of art200 in a reciprocal experiment [154].
These observations are consistent with a model, shown in Fig. IS200.73 e, in which the two RNA molecules initiate the interaction at the tips of the stem loops [154] followed by propagation of base pairing to the left and right sides facilitated by Hfq. This base-pairing occludes the 30S ribosome binding site (as was demonstrated in vitro) and inhibits TnpA expression in vivo [150][154] .
It should be noted that even in the absence of asRNA, Hfq binds upstream of the ribosome binding site and prevents 30S binding directly to tnpA1-173 RNA.
What is the impact of IS200 on its host genome?
As a quiescent insertion sequence which carries no passenger genes, it was argued that IS200 probably does “not contribute transposition-dependent functions to the host” [150] but its extreme stability over long time periods suggests that it might contribute important functions to its host. In a subsequent study, the Haniford lab raised the possibility that this function may be related to the RNA it expresses, since small RNAs are known to play an important role in the control of bacterial cell processes, often facilitated by Hfq (e.g. [159][160][161][162][163][164][165][166][167][168]).
Although there are multiple layers of regulation which lead to low levels of tnpA translation, tnpA expression is relatively significant. In addition to the maintenance of some IS200-driven transcription, art200 expression appears to be growth phase regulated, increasing during S. typhimurium transition into stationary phase in rich medium and in growth media which stimulate Salmonella Pathogenicity Island (SPI) expression [156][154]. Additionally, art200 expression increases in stationary-phase while tnpA RNA expression decreases ∼ 5-fold.
Which host genes might be regulated by IS200 RNA?
One interesting possibility is that these RNAs are in some way involved in regulating various processes in the host cell [150] and evidence was obtained from RNA-seq experiments that the tnpA 5’UTR RNA acts as a repressor of a number of host genes by base pairing.
In these experiments, the levels of IS200 RNA in S. typhimurium were modified in various ways. By introducing a plasmid carrying a truncated tnpA mRNA derivative, tnpAtrunc WT-255 (including nt 1-255), highly expressed constitutively from a Ptet promoter, the levels of art200 RNA could be reduced by titration and degradation of the base-paired tnpA mRNA and art200 RNA. RNA-seq revealed 187 genes whose transcription was altered under these conditions. To determine whether the effect was due to depletion of art200 or to high tnpAtrunc WT-255 levels, a mutant, tnpAtrunc M1-255, which prevents initiation of pairing (M in Fig. IS200.73 d and e), was used: art200 RNA-affected genes were expected to show differential expression with highly expressed tnpAtrunc WT-255 but not with tnpAtrunc M1-255, while tnpAtrunc-regulated genes would show differential expression with both tnpAtrunc WT-255 and tnpAtrunc M1-255 compared to the empty vector plasmid.
Overexpression of tnpA had a far greater effect on gene expression than did depleting art200: 77 genes were differentially expressed in the presence of either tnpAtrunc WT-255 or tnpAtrunc M1-255 while 6 were repressed by art200 (glnH, gltI, acs, icdA, hutU and fadR)[150]. Among the tnpAtrunc RNA-regulated genes, a number were located on the Salmonella Pathogenicity Island (SPI-1) and are involved in virulence: in particular, the genes for the sipABC effector (translocon) proteins involved in cell invasion during Salmonella infection (e.g. [169]) were repressed. The fact that these were affected by both tnpAtrunc WT-255 and tnpAtrunc M1-255 indicated that their expression is directly influenced by tnpA itself and does not depend on pairing with art200.
5’UTR RNA is processed.
Since the model bacterium S. typhimurium strain LT2 is not virulent, Ellis and coworkers [150] subsequently examined a virulent strain, SL1344, in some further studies. This carries 7 IS200 copies instead of the 6 carried by LT2.
It was thought more likely that, instead of the entire 5’UTR of tnpA mRNA, the true regulator might be a processed form of this RNA. Total RNA was examined by Northern blot from the wildtype SL1344 strain and from a derivative in which one of the chromosomal tnpA genes had been fused to a tet promoter, Ptet , providing constitutive expression. Three RNA species of ∼ 90 nt, ∼ 110 nt and > 310 nt were observed using a 5’UTR probe (Fig. IS200.74). They could not be detected when 4 of the 7 IS200 tnpA copies were removed but overexpression of tnpAtrunc WT-255 resulted in the reappearance of both the 90 nt and 110 nt species indicating that both are located within the first 255 nucleotides of the tnpA mRNA [150].
Primer extension studies combined with TEX treatment (Terminator 5’ monophosphate dependent Exonuclease), which would remove processed RNA, revealed two processing sites (A and B in Fig. IS200.74a) at nts 19 and 108 (Fig. IS200.74b).
The processed RNA represses SPI-1 genes by repressing invF transcriptional activator transcription.
Cloned derivatives carrying the first 5’ NTR 50, 200 and 250 nts (tnpA50, tnpA200 and tnpA250) all repressed the translocon genes sicA, sicB and sicC by a factor of ~2.5 but not expression of a control gene, thrS. Interestingly, tnpA50 gave the strongest effect. It is the only derivative unable to pair with art200. All three tnpA RNAs but, in particular tnpA50, were also found to reduce expression of invF mRNA as did high tnpAtrunc expression. InvF is an SPI-1-encoded transcription factor which activates the large SPI-1 T3SS (Type III Secretion System) translocon operon which in turn promotes entry into the intestinal epithelium in the course of an infection.
The effect of this on S. typhimurium SL1344 invasion of HeLa cells showed that tnpA overexpression resulted in a reduction in invasion by a factor of 2 compared to the wildtype strain[150].

Direct tnpA RNA-invF RNA Interaction in vivo and in vitro.
Although no complementarity was found between tnpA RNA and any sequences within the large T3SS operon, an extensive complementarity was identified between 5’ tnpA transcript nts 1- 63 and nts 104-160 upstream of the invF initiation codon (Fig. IS200.74 c). A gel shift assay demonstrated that the two regions can interact to give a slow mobility complex whether the invF RNA or tnpA (nt 1-173) were labelled. This did not occur when using tnpA RNA mutant LS (black in Fig. IS200.74 c).
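A search for complementarity of this kind can be sketched as a sliding-window count of antiparallel base pairs between two RNAs. This is a simplified illustration rather than the method used in the cited work: the function names and sequences are hypothetical, G-U wobble pairs are counted as pairs by assumption, and no gaps, bulges or folding energies are considered.

```python
# Watson-Crick pairs plus G-U wobble pairs (counting wobble is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_pairs(rna_a, rna_b):
    """Count base pairs when two equal-length RNAs (both 5'->3') align antiparallel."""
    if len(rna_a) != len(rna_b):
        raise ValueError("sequences must have the same length")
    # Reversing rna_b places its 3' end opposite the 5' end of rna_a.
    return sum((x, y) in PAIRS for x, y in zip(rna_a.upper(), reversed(rna_b.upper())))


def best_pairing_window(short, long_rna):
    """Slide `short` along `long_rna`; return (start, pairs) of the first best window."""
    best = (0, -1)
    for start in range(len(long_rna) - len(short) + 1):
        window = long_rna[start:start + len(short)]
        n = count_pairs(short, window)
        if n > best[1]:
            best = (start, n)
    return best


# Hypothetical sequences for illustration only (not tnpA or invF sequences).
query = "AUGGCA"
target = "CCUGCCAUGG"
print(best_pairing_window(query, target))
```

Candidate regions found in this way would still need experimental confirmation, for example by the gel shift and footprinting assays described in this section.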
Pb2+ footprinting with P32-labelled invF RNA and unlabeled WT or LS tnpA RNA revealed substantial pairing at nts 17-23 in the case of WT tnpA RNA (red in Fig. IS200.74 c). This interaction was probed in vitro using an overexpressing chromosomal tnpA RNA with and without a T1 mutation which eliminates the interactions observed using Pb2+ (black in Fig. IS200.74 c). SL1344 WT RNA from late exponential phase showed reduced invF and sicA RNA levels while the T1 mutation had no effect.
It is not yet clear whether one or both processed tnpA small RNAs base-pair with invF mRNA to inhibit expression and induce rapid transcript turnover, nor is the role of art200 in tnpA regulation of SPI-1 gene expression clear [153].
Both tnpA RNA processing sites (Fig. IS200.74 a) nts 19 (A) and 108 (B) are located approximately at the boundaries of the art200 pairing sequence (Fig. IS200.74 d) raising the possibility that art200 might in some way be involved in tnpA RNA processing. There is a loose correlation of growth phase dependent expression of art200 and SPI-1 genes consistent with the notion that art200 might “silence” tnpA RNA to liberate invF expression [153].
Growth phase dependence.
InvF expression increases in late exponential/early stationary phase [150] and it is possible that this is the result of changes in tnpA RNA levels. To examine this, expression of genes influenced by tnpA RNA (invF, sicA, sipB, sipC and prgH) was monitored during different growth phases in WT and in the tnpA RNA over-expressing strain, both of which had identical growth rates. TnpA RNA over-expression had no effect during lag- or early exponential-phase but, in late-exponential phase, tnpA RNA over-expression reduced expression of invF (2-fold), sicA (5.5-fold), sipB (4-fold) and sipC (2-fold) but not of the invF-independent SPI-1 encoded prgH.
Overall, the data indicated that tnpA RNA over-expression affects invF RNA levels only when expressed at lower levels than invF RNA, implying a stoichiometry between both transcripts and a direct tnpA-invF RNA interaction. Moreover, tnpA RNA over-expression only affected SPI-1 gene expression in early-exponential phase and late-exponential where the WT tnpA RNA levels are limiting relative to invF RNA.
These results suggest that the native IS200 copies may be important in controlling expression of the pathogenicity island. This was tested by comparing invF RNA expression following deletion of 4 of the 7 IS200 copies (Δ tnpA4/7) where, in both early- and late-exponential phase, native tnpA RNA was reduced 2.5 fold and invF RNA increased 2 fold in early- and 1.5 fold in late-exponential phase. Additionally, excising all 7 IS200 copies (Δ tnpA7/7), resulted in a reduction in growth rate and a 25 fold increase in SPI-1 expression. These effects were reversed on introducing a chromosomal module (tnpA 7::kan-pTet, in which a kan pTet cassette was placed in front of tnpA 7 – in IS200#7- such that the Tet promoter drives transcription of tnpA) which overexpresses tnpA RNA. Additionally, in the HeLa cell invasion assay, the overexpressing tnpA strain showed reduced invasiveness while the Δ tnpA7/7 strain was between 5 and 10 fold more invasive.
TnpA controls expression of more than 200 host genes
Further analysis of differential gene expression [170] using comparative RNA-seq between the invasive Salmonella SL1344 and the derivative strain lacking all seven IS200 copies revealed more than 200 genes affected by the tnpA 5’UTR. These included master regulators for HilD (invasion) and FlhDC (flagellar) regulons, the cysteine biosynthesis regulon and phsABC, a thiosulfate reductase operon. These effects resulted in an 80-fold increase in a HeLa cell invasion assay. Some of these interactions are shown in Fig. IS200.75.

These genes are central to Salmonella virulence: the HilD transcription factor is a central control element in a complex cascade of reactions. It acts upstream of InvF. The phsABC operon is involved in growth under anaerobic conditions and its inactivation results in increased invasiveness. Other details can be found in Trussler et al. 2024 [170] and Lou et al. [171].
A model of the IS200 regulatory network
A model for RNA-mediated metabolism involving IS200 RNA (Fig. IS200.76; [153]) proposes that the 5’UTR terminal segment of tnpA mRNA either assumes a folded structure recognized by art200 asRNA, facilitated by Hfq, which blocks TnpA translation and leads to degradation (bottom), or is processed and recognizes the 5’ end of the invF message, blocking InvF translation and provoking degradation. Thus deletion of native IS200 copies increases invasion (and reduces growth rate) [153]. One possibility is that art200 pairs with the tnpA 5’UTR to prevent its processing.

TnpB co-option as transcription factors.
The notion that RuvC evolved to generate the TnpB and IscB families of guide endonucleases, which maintain copy number of their associated transposable elements, and then into Cas12 and Cas9 proteins (TnpB; Fig. IS200.37), which act in bacterial immunity to invading mobile elements, led to the question of whether they might have evolved to fulfill other functions. For example, type V-K CRISPR-associated transposases rely on nuclease-inactivated Cas12k homologues that are still active for RNA-guided DNA binding, facilitating programmable sequence-specific targeted transposition.
In view of the identification of atypical Cas12 homologues, Cas12c and Cas12m, which have lost their cleavage functions but not their binding function and are capable of repressing gene transcription, preventing bacteriophage proliferation [172] or plasmid invasion [173], Weigand et al. [174][175] sought to determine whether some members of the TnpB group had also been domesticated and had assumed transposition-independent functions.
Repurposing TnpB proteins
Truncated copies of TnpB had been noted early in the identification of the IS200/IS605 family, using a sample of only 85 IS605 derivatives in ISfinder. They had been believed to be decay products but with hindsight should be considered as repurposed derivatives (IS Decay; Fig. IS200.30; see [176][177]). Weigand et al. [174][175] used a library of nearly 96,000 TnpB-related proteins extracted from public databases. They identified over 500 nuclease-inactive variants containing at least 2 mutations in the DED catalytic nuclease triad. Doubly inactivated catalytic mutants were chosen since it had been shown that one of the three RuvC catalytic amino acids can occur at an alternative position [178]. These were obtained from “diverse genetic neighborhoods” including examples which were not associated with tnpA.
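The selection criterion described above (at least 2 mutations in the DED triad) can be expressed as a simple filter. This is a hedged sketch, not the pipeline used by Weigand et al.: the function names are hypothetical, and it assumes that the residues occupying the three catalytic positions have already been extracted from a sequence alignment.

```python
# Expected RuvC catalytic residues, in order (the DED triad named in the text).
CATALYTIC_TRIAD = ("D", "E", "D")


def count_triad_mutations(residues):
    """Count how many of the three aligned catalytic positions differ from D, E, D."""
    if len(residues) != 3:
        raise ValueError("expected exactly three catalytic-position residues")
    return sum(obs.upper() != exp for obs, exp in zip(residues, CATALYTIC_TRIAD))


def is_candidate_inactive(residues, min_mutations=2):
    """Flag a homologue as a nuclease-inactive candidate (default: >= 2 mutations)."""
    return count_triad_mutations(residues) >= min_mutations


# Hypothetical examples: intact triad, single mutant, double mutant.
for triad in ("DED", "AED", "AEN"):
    print(triad, count_triad_mutations(triad), is_candidate_inactive(triad))
```

Requiring two mutations rather than one follows the reasoning in the text: a single apparent mutation may simply reflect a catalytic residue located at an alternative position.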
In view of their distribution across the phylogenetic tree (Fig. IS200.77), Weigand et al., [174][175] suggest that they may have arisen independently over time from different tnpB genes. These showed different degrees of mutation ranging from examples with one or more mutated catalytic site residues to homologues with C-terminal truncated domains removing RuvC and the zinc finger domains (Fig. IS200.77). Interestingly, among the mutated TnpB examples in the original ISfinder sample, 2 had lost their C-terminal zinc finger domains [177] and, since they remain associated with a tnpA gene, might represent examples of a tnpB on an evolutionary path to alternative functions.

Weigand et al., [174][175] also used AlphaFold predictions which provided supporting structural evidence of sequential mutation of the TnpB nuclease catalytic site. However, in each case, the TnpB RNA-binding interface, which determines TnpB DNA targeting functions, had been retained.
Further studies revealed that several of these TnpB derivatives with inactive nucleases function as repressors of expression of a number of genes: they were called TldRs (for TnpB-Like nuclease- Dead Repressors).

TldR Genetic Context
Many TldRs were found neighboring non-IS-related genes. These were found to be frequently clade-specific (Fig. IS200.77): one group was consistently associated with ABC transporter system genes including oppF, mainly present in Enterococci and located downstream (Fig. IS200.79A b); another with fliC, encoding the flagellin subunit of flagellar assemblies in Enterobacteriaceae and associated with a prophage (called fliCp to distinguish it from the genomic copy – note that the fliCp-associated TldR was identified in nearly 30 prophages), also located downstream; and a third group from Clostridia also associated with flagellin genes and with the carbon storage regulator, csrA, involved in flagellar subunit regulation and generally located downstream. Such strong associations suggested that the TldRs may have a functional role.
TldR Guide RNA Identification and binding
It was of considerable interest to determine whether small guide RNAs (gRNA) are associated with TldRs. Generally, these are composed of a “scaffold” domain followed by a guide sequence produced from the flanking sequence at the right end (RE) of the IS (see: Structure of TnpB-reRNA in association with DNA). However, since there are no RE associated with the TldRs to define potential guide sequences, Weigand et al. [174][175] used a co-variance approach previously used for gRNA identification (see: Conserved secondary structure motifs; [179]) combined with BLAST. This identified the LE/RE boundaries and potential guide RNAs associated with active TnpB homologues closely related to fliC-associated and oppF-associated TldRs (Fig. IS200.79A c) and from these, they deduced the potential gRNA sequences of the fliC-associated and oppF-associated TldRs. In addition, RNA-seq datasets from Enterococci carrying fliC–tldR or oppF–tldR [180] revealed reads covering the TldR orfs and the proposed RNA predicted from the co-variance, thus confirming their expression.
TldR gRNA expression was investigated by cloning and expressing FLAG-tagged fliCP-associated TldR (Enterobacter hormaechei, EhoTldR; Fig. IS200.79 A and B) and oppF-associated TldR (Enterococcus faecalis, Efa1TldR) on a 240 bp DNA segment including their putative guide RNA scaffold and 20-bp guide sequence in E. coli. RNA immunoprecipitation (IP) on total RNA, analyzed by sequencing and mapping, revealed a 113-nt EhoTldR gRNA comprising a 97-nt scaffold and a downstream 16-nt guide sequence, and a 109-nt Efa1TldR gRNA comprising a 100-nt scaffold and an approximately 9-nt guide. A shorter guide of 11 nt was also identified from a homologue in publicly available RNA-seq data [180].
Although TnpB has been shown to process its transcript to generate the final gRNA (see: Generating re(ω)RNA: Processing; [181]) using its RuvC activity, the TldRs are RuvC-defective. It was suggested that the mature gRNA may simply be a structure which is protected from other cell ribonucleases.


What do TldR gRNA target?
To determine the targets of the prophage fliCP-associated TldR, a library of gRNA assembled from bioinformatic and RIP-seq analyses [174][175] was used as queries in a BLAST search. This yielded a strong match of the prophage-associated gRNA to a genomic region carrying flagellar genes from E. cloacae AR_1054 (Fig. IS200.80a), located between the genomic fliC and fliD genes (Fig.IS200.80a top) at a position distinct from the prophage.
The putative gRNA target overlapped a potential σ28 promoter (Fig.IS200.80a middle) which, in E. coli, is recognized by FliA/σ28 and drives fliC expression. Moreover, the putative target sequence was flanked on its 5’ side by the sequence GTTAT. This is conserved in a number of prophage genomes identified in several Enterobacter species (Fig.IS200.80a bottom) and resembles the TAM sequences recognized by active TnpB nucleases similar to the TldRs.
It was suggested that phage TldR gRNA might repress expression of the host FliC while maintaining its own FliCp synthesis.
Further studies used the short 9nt guide associated with oppF TldRs together with the TAM sequence, TTTAA/TTTAT, of related TnpB and uncovered a potential target upstream of the initiation codon of a chromosomal oppA ABC transporter gene in E. faecalis (TAM sequence, TTAAA; Fig.IS200.80b) and E. cecorum (TAM sequence, TTTAA; Fig.IS200.80c). In both cases, the gRNA sequence (9nt for E. faecalis and 7nt E. cecorum) was complementary to sequences overlapping the promoter, again suggesting that the TldR/gRNA would repress expression of the associated opp operon by competition with RNA polymerase. Interestingly the analysis of E. cecorum identified a significant number of additional potential targets (Fig.IS200.81), all with a 7nt complementary core and a 5’TTTAA TAM sequence. This suggests that the oppF-TldR may be involved in an extended regulatory network.
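A genome scan for potential targets of this kind (a TAM immediately followed by a short core matching the guide) might be sketched as follows. This is an illustration under stated assumptions, not the published analysis: the function names, the 7-nt core and the test sequence are hypothetical, and the core is supplied as the DNA sequence expected directly 3' of the TAM on the TAM-bearing strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_targets(dna, tam, core):
    """Return (strand, position) for every exact TAM+core match on either strand.

    Position is the 0-based start of the TAM on the strand scanned; for "-",
    it is counted along the reverse-complemented sequence.
    """
    motif = (tam + core).upper()
    hits = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        start = seq.find(motif)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(motif, start + 1)  # step by 1 to keep overlapping hits
    return hits


# Hypothetical sequence with one site on each strand; the core is invented.
demo_dna = "CCTTTAAGATTACAGGTGTAATCTTAAACC"
print(find_targets(demo_dna, "TTTAA", "GATTACA"))
```

Only exact matches are reported here. Allowing mismatches in the core, as the relaxed recognition discussed in this section would suggest, would require a different comparison than plain string search.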

Functional Analysis: TldR/gRNA target their cognate target sites
Fifteen TldR/gRNA examples were chosen for functional analysis: several fliCP-TldR (Fig. IS200.78B top) and oppF-associated (Fig. IS200.78B bottom) loci were cloned together with their putative gRNA and expressed in an E. coli derivative carrying the predicted target site integrated into the chromosome. Their genome-wide binding specificity was then ascertained by chromatin immunoprecipitation of FLAG-tagged TnpB and TldR followed by sequencing (ChIP-seq). For the majority of the examples, the results revealed strong peaks corresponding to the expected target site: the nuclease-inactive TldR therefore retained the ability to bind to specific target sites in genomic DNA.
Functional Analysis: extent of target complementarity required for TldR/gRNA binding.
The results also included a significant proportion of “off-target” sites. When analyzed in more detail, three prominent off-target peaks were observed for the fliC-associated TldR homologues: Kpi, Eco, Eko1, Eko2 and Eho. One of these was found to be the E. coli host chromosomal fliC and fliD intergenic region, which differs from the Enterobacter cloacae sp. AR_15 fliC-TldR by 5 of the core complementary nucleotides (Fig. IS200.80a; Fig. IS200.82). A similar analysis of off-target oppF-associated TldR insertions (Eca, Emu, Efa, Tos and Ece) (Fig. IS200.82) also indicated a rather relaxed recognition. These data are consistent with the approximately 6 nt “seed” sequence found to be sufficient for certain Cas12a homologues [182] and correspond to the length of the core sequence complementarity found for the multiple potential TAM-associated EceTldR targets identified in the E. cecorum genome (Fig. IS200.81).
Systematic analysis of all ChIP–seq peaks for enriched motifs (see [179]) revealed that fliCP-associated TldRs were enriched for GTTAT motifs, identical to those flanking fliC promoters (Fig.IS200.80a), while oppF-associated TldR homologues were enriched for TTTAA motifs, the TAM specificity predicted for closely related TnpB relatives (TTTAA and TTTAT) (Fig.IS200.80b).
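As an illustration of what "enriched motifs" means here, the following sketch ranks k-mers by how much more often they occur in peak sequences than in background sequences. It is a deliberately simplified stand-in and not the motif-discovery method used in the cited work. The function names, the pseudocount and all sequences are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def kmer_presence(seqs: Iterable[str], k: int) -> Counter:
    """Count, for each k-mer, the number of sequences that contain it."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        # A set ensures each k-mer is counted once per sequence.
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(peaks: Iterable[str], background: Iterable[str],
                   k: int = 5, pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """Rank k-mers by the ratio of peak frequency to background frequency.

    A pseudocount (an assumption of this sketch) avoids division by zero for
    k-mers that are absent from the background set.
    """
    peaks, background = list(peaks), list(background)
    in_peaks = kmer_presence(peaks, k)
    in_bg = kmer_presence(background, k)
    scores = []
    for kmer, n in in_peaks.items():
        peak_frac = (n + pseudo) / (len(peaks) + pseudo)
        bg_frac = (in_bg.get(kmer, 0) + pseudo) / (len(background) + pseudo)
        scores.append((kmer, peak_frac / bg_frac))
    # Highest ratio first; ties broken alphabetically for reproducibility.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Made-up peak and background sequences for demonstration only.
toy_peaks = ["CCTTTAAGG", "GATTTAACA", "ACTTTAAGT"]
toy_background = ["CCGGCCGGA", "GATCGATCG", "ACGTACGTA"]
ranking = enriched_kmers(toy_peaks, toy_background, k=5)
```

With these toy inputs the only 5-mer shared by every peak is TTTAA, so it ranks first. Real analyses use position weight matrices and statistical significance rather than a simple ratio.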

Functional Analysis: TldR are inactive in nuclease functions.
To confirm that the TldR derivatives identified in the study were truly nuclease-deficient, they were tested, together with the related active TnpB derivatives, using a plasmid interference assay (Fig.IS200.54a; TnpBGst and IscBGst proteins are active RNA-guided nucleases; [179]). All 4 fliC TldR-related nuclease-proficient TnpB homologues reduced the CFU (colony-forming units) in this assay, whereas there was no effect with the 7 TldR proteins (Fig. IS200.79B top). Similarly, all 4 oppF TldR-related nuclease-proficient TnpB homologues reduced the CFU, whereas there was no effect with the 8 oppF TldR proteins (Fig. IS200.79B bottom).
It was therefore concluded that the TldRs are RNA-guided DNA-binding proteins without nuclease activity [174][175].

Functional Analysis: target DNA binding by TldR modulates gene expression.
To determine whether the TldR systems modulate gene expression by target binding, the authors [174][175] used an RFP/GFP assay in which chromosomal gfp expression acts as a standard control while TldR binding is expected to repress rfp expression (Fig. IS200.83). The assay involved two plasmids: one that supplies the gRNA and the TldR, and another that carries the target sequence upstream of the rfp gene. gRNAs were designed either to target promoter sequences, occluding transcription initiation, or to target the 5′ UTR to block transcription elongation.
Using promoter-targeting gRNAs, fliCP(Eho)- and oppF(Efa1)-associated TldR strongly repressed RFP when targeting the sense (top) strand. This is the native target orientation in the fliCP promoter (Fig. IS200.83 top). Shorter stretches of complementarity between target and gRNA were tested, and a 6 nt sequence showed repression similar to, but a little lower than, the 20 nt guide sequence. Removal of the short sequence 5′ to the guide had little effect (Fig.IS200.83 bottom).
Repression was largely unaffected when the target was placed in the 5′ UTR (i.e. downstream of the promoter) on the top strand. When the target was placed on the bottom strand, some repression could be detected for a small subgroup of both fliC- and oppF-associated TldR.
Thus nuclease-deficient TldR can efficiently repress downstream genes in a position- and orientation-dependent way.
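A two-colour reporter readout of this kind is commonly summarised as a fold repression, which can be sketched as follows. This is an assumed, generic calculation and not the formula stated in the cited studies: RFP is normalised to the GFP control, and the normalised value with a non-targeting guide is divided by that obtained with the targeting guide. The function names and fluorescence values are hypothetical.

```python
from typing import Tuple


def normalized_rfp(rfp: float, gfp: float) -> float:
    """Return RFP fluorescence normalised to the GFP internal control."""
    if gfp <= 0:
        raise ValueError("GFP control signal must be positive")
    return rfp / gfp


def fold_repression(targeting: Tuple[float, float],
                    non_targeting: Tuple[float, float]) -> float:
    """Fold repression from (rfp, gfp) pairs for targeting and control guides.

    Values above 1 indicate that the targeting guide lowers RFP relative
    to the non-targeting control.
    """
    target_ratio = normalized_rfp(*targeting)
    if target_ratio == 0:
        raise ValueError("normalised RFP of the targeting sample is zero")
    return normalized_rfp(*non_targeting) / target_ratio


# Made-up fluorescence readings in arbitrary units.
example = fold_repression(targeting=(50.0, 1000.0), non_targeting=(1000.0, 1000.0))
```

Normalising to GFP corrects for differences in cell density or global expression between samples, which is the role of the chromosomal gfp control described above.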

FliCp-TldR from prophage helps supplant host FliC in flagella structures.
Inspection of the phage and bacterial FliC structures indicated that, although the surface-exposed structures were very different, the protomer-protomer interface surface was well conserved, suggesting that the phage and host proteins are interchangeable in flagella assembly. To test the notion that prophage FliCP replaces the host FliC, allowing the phage to assume control of host flagella composition via TldR-mediated host gene repression, total RNA-seq was undertaken on three lysogenic strains carrying FliC-TldR and on one strain devoid of the prophage.
This demonstrated that strong expression of gRNA with the expected 5′ and 3′ boundaries occurred in the strains carrying fliCP-associated TldR. In these strains, expression of host fliC compared to that of the host fliD was nearly undetectable, whereas the phage fliCP gene was strongly expressed. In the prophage-free strain, the host fliC was strongly expressed. To confirm that this effect was due to TldR repression, strains deleted for tldR, tldR–gRNA, the entire fliCP–TldR–gRNA and the entire prophage were created. All led to about a 100-fold increase in host fliC expression. Additionally, replacement of the guide segment of the gRNA with a non-targeting sequence had the same effect. Moreover, the de-repression of the host fliC gene could be reversed in the tldR–gRNA deletion mutant by trans-complementation, introducing a plasmid-encoded fliCP-TldR/gRNA cassette.
The data therefore show that the prophage modifies host flagella composition by coupling host fliC repression with increased incorporation of the phage FliCP product into the host flagella.
It will now be of interest to examine the impact of other TldR on their bacterial host.
Acknowledgements
We are grateful to Fred Dyda and Alison Hickman for advice concerning transposition mechanism, to Orsolya Barabas for certain figures and videos of structures, and to Kira Makarova, Virginijus Šikšnys, and Sam Sternberg for advice concerning the RNA-guided endonucleases. The Šikšnys group also kindly supplied the Cas12 structural panel. Thanks also to David Haniford for comments on the impact of IS200 on expression of the SPI-1 Salmonella virulence genes. We are also grateful to all the above for permission to use derivatives of their published figures.
Bibliography
- ↑ 1.0 1.1 1.2 1.3 1.4 Lam & Roth. IS200: a Salmonella-specific insertion sequence. Cell. 1983. 34. pp. 951-60. doi: 10.1016/0092-8674(83)90552-4. PMID: 6313217.
- ↑ 2.0 2.1 2.2 2.3 2.4 Beuzón et al.. IS200: an old and still bacterial transposon. International microbiology : the official journal of the Spanish Society for Microbiology. 2004. 7. pp. 3-12. PMID: 15179601.
- ↑ 3.0 3.1 Lam & Roth. Genetic mapping of IS200 copies in Salmonella typhimurim strain LT2. Genetics. 1983. 105. pp. 801-11. doi: 10.1093/genetics/105.4.801. PMID: 6315530.
- ↑ 4.0 4.1 4.2 Lam & Roth. Structural and functional studies of insertion element IS200. Journal of molecular biology. 1986. 187. pp. 157-67. doi: 10.1016/0022-2836(86)90225-1. PMID: 3009825.
- ↑ 5.0 5.1 Beuzón & Casadesús. Conserved structure of IS200 elements in Salmonella. Nucleic acids research. 1997. 25. pp. 1355-61. doi: 10.1093/nar/25.7.1355. PMID: 9060429.
- ↑ Casadesus & Roth. Absence of insertions among spontaneous mutants of Salmonella typhimurium. Molecular & general genetics : MGG. 1989. 216. pp. 210-6. doi: 10.1007/BF00334358. PMID: 2546038.
- ↑ Haack & Roth. Recombination between chromosomal IS200 elements supports frequent duplication formation in Salmonella typhimurium. Genetics. 1995. 141. pp. 1245-52. doi: 10.1093/genetics/141.4.1245. PMID: 8601470.
- ↑ 8.0 8.1 Murai et al.. A novel insertion sequence (IS)-like element of the thermophilic bacterium PS3 promotes expression of the alanine carrier protein-encoding gene. Gene. 1995. 163. pp. 103-7. doi: 10.1016/0378-1119(95)00384-i. PMID: 7557457.
- ↑ 9.0 9.1 9.2 Bancroft & Wolk. Characterization of an insertion sequence (IS891) of novel structure from the cyanobacterium Anabaena sp. strain M-131. Journal of bacteriology. 1989. 171. pp. 5949-54. doi: 10.1128/jb.171.11.5949-5954.1989. PMID: 2553665.
- ↑ Donadio & Staver. IS1136, an insertion element in the erythromycin gene cluster of Saccharopolyspora erythraea. Gene. 1993. 126. pp. 147-51. doi: 10.1016/0378-1119(93)90604-2. PMID: 8386127.
- ↑ 11.0 11.1 11.2 11.3 11.4 Kersulyte et al.. Novel sequence organization and insertion specificity of IS605 and IS606: chimaeric transposable elements of Helicobacter pylori. Gene. 1998. 223. pp. 175-86. doi: 10.1016/s0378-1119(98)00164-4. PMID: 9858724.
- ↑ 12.0 12.1 12.2 Kersulyte et al.. Functional organization and insertion specificity of IS607, a chimeric element of Helicobacter pylori. Journal of bacteriology. 2000. 182. pp. 5300-8. doi: 10.1128/JB.182.19.5300-5308.2000. PMID: 10986230.
- ↑ Gordon et al.. New insertion sequences and a novel repeated sequence in the genome of Mycobacterium tuberculosis H37Rv. Microbiology (Reading, England). 1999. 145 ( Pt 4). pp. 881-892. doi: 10.1099/13500872-145-4-881. PMID: 10220167.
- ↑ 14.0 14.1 14.2 14.3 14.4 14.5 14.6 Kersulyte et al.. Transposable element ISHp608 of Helicobacter pylori: nonrandom geographic distribution, functional organization, and insertion specificity. Journal of bacteriology. 2002. 184. pp. 992-1002. doi: 10.1128/jb.184.4.992-1002.2002. PMID: 11807059.
- ↑ 15.0 15.1 15.2 Siguier et al.. Everyman's Guide to Bacterial Insertion Sequences. Microbiology spectrum. 2015. 3. pp. MDNA3-0030-2014. doi: 10.1128/microbiolspec.MDNA3-0030-2014. PMID: 26104715.
- ↑ Thomas & Pritham. Helitrons, the Eukaryotic Rolling-circle Transposable Elements. Microbiology spectrum. 2015. 3. doi: 10.1128/microbiolspec.MDNA3-0049-2014. PMID: 26350323.
- ↑ 17.0 17.1 Chandler et al.. Breaking and joining single-stranded DNA: the HUH endonuclease superfamily. Nature reviews. Microbiology. 2013. 11. pp. 525-38. doi: 10.1038/nrmicro3067. PMID: 23832240.
- ↑ Höök-Nikanne et al.. DNA sequence conservation and diversity in transposable element IS605 of Helicobacter pylori. Helicobacter. 1998. 3. pp. 79-85. doi: 10.1111/j.1523-5378.1998.08011.x. PMID: 9631304.
- ↑ 19.0 19.1 19.2 19.3 19.4 19.5 19.6 Ronning et al.. Active site sharing and subterminal hairpin recognition in a new class of DNA transposases. Molecular cell. 2005. 20. pp. 143-54. doi: 10.1016/j.molcel.2005.07.026. PMID: 16209952.
- ↑ 20.0 20.1 20.2 20.3 20.4 20.5 20.6 20.7 Ton-Hoang et al.. Transposition of ISHp608, member of an unusual family of bacterial insertion sequences. The EMBO journal. 2005. 24. pp. 3325-38. doi: 10.1038/sj.emboj.7600787. PMID: 16163392.
- ↑ 21.0 21.1 21.2 21.3 21.4 Guynet et al.. In vitro reconstitution of a single-stranded transposition mechanism of IS608. Molecular cell. 2008. 29. pp. 302-12. doi: 10.1016/j.molcel.2007.12.008. PMID: 18280236.
- ↑ 22.0 22.1 22.2 22.3 22.4 22.5 22.6 22.7 22.8 Barabas et al.. Mechanism of IS200/IS605 family DNA transposases: activation and transposon-directed target site selection. Cell. 2008. 132. pp. 208-20. doi: 10.1016/j.cell.2007.12.029. PMID: 18243097.
- ↑ 23.0 23.1 23.2 23.3 23.4 23.5 23.6 Pasternak et al.. Irradiation-induced Deinococcus radiodurans genome fragmentation triggers transposition of a single resident insertion sequence. PLoS genetics. 2010. 6. pp. e1000799. doi: 10.1371/journal.pgen.1000799. PMID: 20090938.
- ↑ 24.0 24.1 24.2 24.3 24.4 24.5 24.6 Ton-Hoang et al.. Single-stranded DNA transposition is coupled to host replication. Cell. 2010. 142. pp. 398-408. doi: 10.1016/j.cell.2010.06.034. PMID: 20691900.
- ↑ 25.0 25.1 25.2 25.3 25.4 25.5 25.6 25.7 25.8 Hickman et al.. DNA recognition and the precleavage state during single-stranded DNA transposition in D. radiodurans. The EMBO journal. 2010. 29. pp. 3840-52. doi: 10.1038/emboj.2010.241. PMID: 20890269.
- ↑ 26.0 26.1 Filée et al.. Insertion sequence diversity in archaea. Microbiology and molecular biology reviews : MMBR. 2007. 71. pp. 121-57. doi: 10.1128/MMBR.00031-06. PMID: 17347521.
- ↑ Devalckenaere et al.. Characterization of IS1541-like elements in Yersinia enterocolitica and Yersinia pseudotuberculosis. FEMS microbiology letters. 1999. 176. pp. 229-33. doi: 10.1111/j.1574-6968.1999.tb13666.x. PMID: 10418150.
- ↑ Bisercić & Ochman. The ancestry of insertion sequences common to Escherichia coli and Salmonella typhimurium. Journal of bacteriology. 1993. 175. pp. 7863-8. doi: 10.1128/jb.175.24.7863-7868.1993. PMID: 8253675.
- ↑ Bisercić & Ochman. Natural populations of Escherichia coli and Salmonella typhimurium harbor the same classes of insertion sequences. Genetics. 1993. 133. pp. 449-54. doi: 10.1093/genetics/133.3.449. PMID: 8384142.
- ↑ 30.0 30.1 Beuzón et al.. Repression of IS200 transposase synthesis by RNA secondary structures. Nucleic acids research. 1999. 27. pp. 3690-5. doi: 10.1093/nar/27.18.3690. PMID: 10471738.
- ↑ Sittka et al.. Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS genetics. 2008. 4. pp. e1000163. doi: 10.1371/journal.pgen.1000163. PMID: 18725932.
- ↑ Ellis et al.. A cis-encoded sRNA, Hfq and mRNA secondary structure act independently to suppress IS200 transposition. Nucleic acids research. 2015. 43. pp. 6511-27. doi: 10.1093/nar/gkv584. PMID: 26044710.
- ↑ Odaert et al.. Molecular characterization of IS1541 insertions in the genome of Yersinia pestis. Journal of bacteriology. 1998. 180. pp. 178-81. doi: 10.1128/JB.180.1.178-181.1998. PMID: 9422611.
- ↑ 34.0 34.1 34.2 Boocock & Rice. A proposed mechanism for IS607-family serine transposases. Mobile DNA. 2013. 4. pp. 24. doi: 10.1186/1759-8753-4-24. PMID: 24195768.
- ↑ Akopyants et al.. PCR-based subtractive hybridization and differences in gene content among strains of Helicobacter pylori. Proceedings of the National Academy of Sciences of the United States of America. 1998. 95. pp. 13108-13. doi: 10.1073/pnas.95.22.13108. PMID: 9789049.
- ↑ 36.00 36.01 36.02 36.03 36.04 36.05 36.06 36.07 36.08 36.09 36.10 36.11 36.12 Xiang et al.. Evolutionary mining and functional characterization of TnpB nucleases identify efficient miniature genome editors. Nature biotechnology. 2024. 42. pp. 745-757. doi: 10.1038/s41587-023-01857-x. PMID: 37386294.
- ↑ 37.0 37.1 Hug et al.. A new view of the tree of life. Nature microbiology. 2016. 1. pp. 16048. doi: 10.1038/nmicrobiol.2016.48. PMID: 27572647.
- ↑ Islam et al.. Characterization and distribution of IS8301 in the radioresistant bacterium Deinococcus radiodurans. Genes & genetic systems. 2003. 78. pp. 319-27. doi: 10.1266/ggs.78.319. PMID: 14676423.
- ↑ 39.0 39.1 Zahradka et al.. Reassembly of shattered chromosomes in Deinococcus radiodurans. Nature. 2006. 443. pp. 569-73. doi: 10.1038/nature05160. PMID: 17006450.
- ↑ Stanley et al.. Tissue-specific gene expression identifies a gene in the lysogenic phage Gifsy-1 that affects Salmonella enterica serovar typhimurium survival in Peyer's patches. Journal of bacteriology. 2000. 182. pp. 4406-13. doi: 10.1128/JB.182.16.4406-4413.2000. PMID: 10913072.
- ↑ 41.00 41.01 41.02 41.03 41.04 41.05 41.06 41.07 41.08 41.09 41.10 41.11 41.12 41.13 41.14 41.15 41.16 Meers et al.. Transposon-encoded nucleases use guide RNAs to promote their selfish spread. Nature. 2023. 622. pp. 863-871. doi: 10.1038/s41586-023-06597-1. PMID: 37758954.
- ↑ 42.00 42.01 42.02 42.03 42.04 42.05 42.06 42.07 42.08 42.09 42.10 42.11 42.12 42.13 42.14 42.15 42.16 42.17 Kapitonov et al.. ISC, a Novel Group of Bacterial and Archaeal DNA Transposons That Encode Cas9 Homologs. Journal of bacteriology. 2015. 198. pp. 797-807. doi: 10.1128/JB.00783-15. PMID: 26712934.
- ↑ 43.00 43.01 43.02 43.03 43.04 43.05 43.06 43.07 43.08 43.09 43.10 43.11 43.12 43.13 43.14 43.15 43.16 43.17 43.18 43.19 43.20 43.21 43.22 Altae-Tran et al.. The widespread IS200/IS605 transposon family encodes diverse programmable RNA-guided endonucleases. Science (New York, N.Y.). 2021. 374. pp. 57-65. doi: 10.1126/science.abj6856. PMID: 34591643.
- ↑ Koonin & Ilyina. Computer-assisted dissection of rolling circle DNA replication. Bio Systems. 1993. 30. pp. 241-68. doi: 10.1016/0303-2647(93)90074-m. PMID: 8374079.
- ↑ 45.0 45.1 Lee et al.. Crystal structure of a metal ion-bound IS200 transposase. The Journal of biological chemistry. 2006. 281. pp. 4261-6. doi: 10.1074/jbc.M511567200. PMID: 16340015.
- ↑ 46.0 46.1 46.2 46.3 He et al.. IS200/IS605 family single-strand transposition: mechanism of IS608 strand transfer. Nucleic acids research. 2013. 41. pp. 3302-13. doi: 10.1093/nar/gkt014. PMID: 23345619.
- ↑ 47.0 47.1 47.2 47.3 47.4 47.5 47.6 47.7 He et al.. Reconstitution of a functional IS608 single-strand transpososome: role of non-canonical base pairing. Nucleic acids research. 2011. 39. pp. 8503-12. doi: 10.1093/nar/gkr566. PMID: 21745812.
- ↑ 48.0 48.1 48.2 48.3 48.4 Guynet et al.. Resetting the site: redirecting integration of an insertion sequence in a predictable way. Molecular cell. 2009. 34. pp. 612-9. doi: 10.1016/j.molcel.2009.05.017. PMID: 19524540.
- ↑ Morero et al.. Targeting IS608 transposon integration to highly specific sequences by structure-based transposon engineering. Nucleic acids research. 2018. 46. pp. 4152-4163. doi: 10.1093/nar/gky235. PMID: 29635476.
- ↑ 50.0 50.1 Mennecier et al.. Mutagenesis via IS transposition in Deinococcus radiodurans. Molecular microbiology. 2006. 59. pp. 317-25. doi: 10.1111/j.1365-2958.2005.04936.x. PMID: 16359337.
- ↑ Parks et al.. Transposition into replicating DNA occurs through interaction with the processivity factor. Cell. 2009. 138. pp. 685-95. doi: 10.1016/j.cell.2009.06.011. PMID: 19703395.
- ↑ Hu & Derbyshire. Target choice and orientation preference of the insertion sequence IS903. Journal of bacteriology. 1998. 180. pp. 3039-48. doi: 10.1128/JB.180.12.3039-3048.1998. PMID: 9620951.
- ↑ Roberts et al.. IS10 transposition is regulated by DNA adenine methylation. Cell. 1985. 43. pp. 117-30. doi: 10.1016/0092-8674(85)90017-0. PMID: 3000598.
- ↑ Yin et al.. Effect of dam methylation on Tn5 transposition. Journal of molecular biology. 1988. 199. pp. 35-45. doi: 10.1016/0022-2836(88)90377-4. PMID: 2451025.
- ↑ Dodson & Berg. Factors affecting transposition activity of IS50 and Tn5 ends. Gene. 1989. 76. pp. 207-13. doi: 10.1016/0378-1119(89)90161-3. PMID: 2546858.
- ↑ Spradling et al.. Drosophila P elements preferentially transpose to replication origins. Proceedings of the National Academy of Sciences of the United States of America. 2011. 108. pp. 15948-53. doi: 10.1073/pnas.1112960108. PMID: 21896744.
- ↑ Zechner et al.. Coordinated leading- and lagging-strand synthesis at the Escherichia coli DNA replication fork. III. A polymerase-primase interaction governs primer size. The Journal of biological chemistry. 1992. 267. pp. 4054-63. PMID: 1531480.
- ↑ Wu et al.. Coordinated leading- and lagging-strand synthesis at the Escherichia coli DNA replication fork. V. Primase action regulates the cycle of Okazaki fragment synthesis. The Journal of biological chemistry. 1992. 267. pp. 4074-83. PMID: 1740453.
- ↑ Lau et al.. Spatial and temporal organization of replicating Escherichia coli chromosomes. Molecular microbiology. 2003. 49. pp. 731-43. doi: 10.1046/j.1365-2958.2003.03640.x. PMID: 12864855.
- ↑ Lavatine et al.. Single strand transposition at the host replication fork. Nucleic acids research. 2016. 44. pp. 7866-83. doi: 10.1093/nar/gkw661. PMID: 27466393.
- ↑ Hansen. Multiplicity of genome equivalents in the radiation-resistant bacterium Micrococcus radiodurans. Journal of bacteriology. 1978. 134. pp. 71-5. doi: 10.1128/jb.134.1.71-75.1978. PMID: 649572.
- ↑ Harsojo et al.. Genome multiplicity and radiation resistance in Micrococcus radiodurans. Journal of biochemistry. 1981. 90. pp. 877-80. doi: 10.1093/oxfordjournals.jbchem.a133544. PMID: 7309705.
- ↑ Kim et al.. Real-time transposable element activity in individual live cells. Proceedings of the National Academy of Sciences of the United States of America. 2016. 113. pp. 7278-83. doi: 10.1073/pnas.1601833113. PMID: 27298350.
- ↑ Markwardt et al.. An improved cerulean fluorescent protein with enhanced brightness and reduced reversible photoswitching. PloS one. 2011. 6. pp. e17896. doi: 10.1371/journal.pone.0017896. PMID: 21479270.
- ↑ Nagai et al.. A variant of yellow fluorescent protein with fast and efficient maturation for cell-biological applications. Nature biotechnology. 2002. 20. pp. 87-90. doi: 10.1038/nbt0102-87. PMID: 11753368.
- ↑ Shapiro & Higgins. Variation of beta-galactosidase expression from Mudlac elements during the development of Escherichia coli colonies. Annales de l'Institut Pasteur. Microbiology. 1988. 139. pp. 79-103. doi: 10.1016/0769-2609(88)90098-1. PMID: 2838063.
- ↑ Shapiro & Higgins. Differential activity of a transposable element in Escherichia coli colonies. Journal of bacteriology. 1989. 171. pp. 5975-86. doi: 10.1128/jb.171.11.5975-5986.1989. PMID: 2553666.
- ↑ 68.0 68.1 68.2 Pasternak et al.. ISDra2 transposition in Deinococcus radiodurans is downregulated by TnpB. Molecular microbiology. 2013. 88. pp. 443-55. doi: 10.1111/mmi.12194. PMID: 23461641.
- ↑ Chylinski et al.. Classification and evolution of type II CRISPR-Cas systems. Nucleic acids research. 2014. 42. pp. 6091-105. doi: 10.1093/nar/gku241. PMID: 24728998.
- ↑ Shmakov et al.. Diversity and evolution of class 2 CRISPR-Cas systems. Nature reviews. Microbiology. 2017. 15. pp. 169-182. doi: 10.1038/nrmicro.2016.184. PMID: 28111461.
- ↑ Makarova et al.. Evolutionary classification of CRISPR-Cas systems: a burst of class 2 and derived variants. Nature reviews. Microbiology. 2020. 18. pp. 67-83. doi: 10.1038/s41579-019-0299-x. PMID: 31857715.
- ↑ 72.00 72.01 72.02 72.03 72.04 72.05 72.06 72.07 72.08 72.09 72.10 72.11 72.12 72.13 72.14 72.15 72.16 72.17 72.18 72.19 72.20 72.21 72.22 72.23 72.24 72.25 72.26 72.27 72.28 Karvelis et al.. Transposon-associated TnpB is a programmable RNA-guided DNA endonuclease. Nature. 2021. 599. pp. 692-696. doi: 10.1038/s41586-021-04058-1. PMID: 34619744.
- ↑ 73.0 73.1 Jinek et al.. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science (New York, N.Y.). 2014. 343. pp. 1247997. doi: 10.1126/science.1247997. PMID: 24505130.
- ↑ 74.0 74.1 Gasiunas et al.. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proceedings of the National Academy of Sciences of the United States of America. 2012. 109. pp. E2579-86. doi: 10.1073/pnas.1208507109. PMID: 22949671.
- ↑ Makarova et al.. Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems. Biology direct. 2011. 6. pp. 38. doi: 10.1186/1745-6150-6-38. PMID: 21756346.
- ↑ Swarts. Stirring Up the Type V Alphabet Soup. The CRISPR journal. 2019. 2. pp. 14-16. doi: 10.1089/crispr.2019.29044.dcs. PMID: 31021231.
- ↑ Xiao et al.. Structural basis for substrate recognition and cleavage by the dimerization-dependent CRISPR-Cas12f nuclease. Nucleic acids research. 2021. 49. pp. 4120-4128. doi: 10.1093/nar/gkab179. PMID: 33764415.
- ↑ 78.0 78.1 Takeda et al.. Structure of the miniature type V-F CRISPR-Cas effector enzyme. Molecular cell. 2021. 81. pp. 558-570.e3. doi: 10.1016/j.molcel.2020.11.035. PMID: 33333018.
- ↑ 79.00 79.01 79.02 79.03 79.04 79.05 79.06 79.07 79.08 79.09 79.10 79.11 79.12 79.13 79.14 79.15 79.16 79.17 Yoon et al.. Eukaryotic RNA-guided endonucleases evolved from a unique clade of bacterial enzymes. Nucleic acids research. 2023. 51. pp. 12414-12427. doi: 10.1093/nar/gkad1053. PMID: 37971304.
- ↑ 80.00 80.01 80.02 80.03 80.04 80.05 80.06 80.07 80.08 80.09 Bao & Jurka. Homologues of bacterial TnpB_IS605 are widespread in diverse eukaryotic transposable elements. Mobile DNA. 2013. 4. pp. 12. doi: 10.1186/1759-8753-4-12. PMID: 23548000.
- ↑ Rex et al.. The mechanism of translational coupling in Escherichia coli. Higher order structure in the atpHA mRNA acts as a conformational switch regulating the access of de novo initiating ribosomes. The Journal of biological chemistry. 1994. 269. pp. 18118-27. PMID: 7517937.
- ↑ Huber et al.. Translational coupling via termination-reinitiation in archaea and bacteria. Nature communications. 2019. 10. pp. 4006. doi: 10.1038/s41467-019-11999-9. PMID: 31488843.
- ↑ Gomes-Filho et al.. Sense overlapping transcripts in IS1341-type transposase genes are functional non-coding RNAs in archaea. RNA biology. 2015. 12. pp. 490-500. doi: 10.1080/15476286.2015.1019998. PMID: 25806405.
- ↑ Zago et al.. The expanding world of small RNAs in the hyperthermophilic archaeon Sulfolobus solfataricus. Molecular microbiology. 2005. 55. pp. 1812-28. doi: 10.1111/j.1365-2958.2005.04505.x. PMID: 15752202.
- ↑ 85.0 85.1 Jäger et al.. Primary transcriptome map of the hyperthermophilic archaeon Thermococcus kodakarensis. BMC genomics. 2014. 15. pp. 684. doi: 10.1186/1471-2164-15-684. PMID: 25127548.
- ↑ Phok et al.. Identification of CRISPR and riboswitch related RNAs among novel noncoding RNAs of the euryarchaeon Pyrococcus abyssi. BMC genomics. 2011. 12. pp. 312. doi: 10.1186/1471-2164-12-312. PMID: 21668986.
- ↑ 87.0 87.1 87.2 87.3 Gomes-Filho et al.. Sense overlapping transcripts in IS1341-type transposase genes are functional non-coding RNAs in archaea. RNA biology. 2015. 12. pp. 490-500. doi: 10.1080/15476286.2015.1019998. PMID: 25806405.
- ↑ 88.0 88.1 Ibrahim et al.. Halobacterium salinarum and Haloferax volcanii Comparative Transcriptomics Reveals Conserved Transcriptional Processing Sites. Genes. 2021. 12. doi: 10.3390/genes12071018. PMID: 34209065.
- ↑ Koide et al.. Prevalence of transcription promoters within archaeal operons and coding sequences. Molecular systems biology. 2009. 5. pp. 285. doi: 10.1038/msb.2009.42. PMID: 19536208.
- ↑ Jäger et al.. Primary transcriptome map of the hyperthermophilic archaeon Thermococcus kodakarensis. BMC genomics. 2014. 15. pp. 684. doi: 10.1186/1471-2164-15-684. PMID: 25127548.
- ↑ Klein et al.. Noncoding RNA genes identified in AT-rich hyperthermophiles. Proceedings of the National Academy of Sciences of the United States of America. 2002. 99. pp. 7542-7. doi: 10.1073/pnas.112063799. PMID: 12032319.
- ↑ Been & Wickham. Self-cleaving ribozymes of hepatitis delta virus RNA. European journal of biochemistry. 1997. 247. pp. 741-53. doi: 10.1111/j.1432-1033.1997.00741.x. PMID: 9288893.
- ↑ Ferré-D'Amaré et al.. Crystal structure of a hepatitis delta virus ribozyme. Nature. 1998. 395. pp. 567-74. doi: 10.1038/26912. PMID: 9783582.
- ↑ Karvelis et al.. A pipeline for characterization of novel Cas9 orthologs. Methods in enzymology. 2019. 616. pp. 219-240. doi: 10.1016/bs.mie.2018.10.021. PMID: 30691644.
- ↑ 95.0 95.1 95.2 95.3 95.4 95.5 95.6 95.7 Nety et al.. The Transposon-Encoded Protein TnpB Processes Its Own mRNA into ωRNA for Guided Nuclease Activity. The CRISPR journal. 2023. 6. pp. 232-242. doi: 10.1089/crispr.2023.0015. PMID: 37272862.
- ↑ 96.0 96.1 Fonfara et al.. The CRISPR-associated DNA-cleaving enzyme Cpf1 also processes precursor CRISPR RNA. Nature. 2016. 532. pp. 517-21. doi: 10.1038/nature17945. PMID: 27096362.
- ↑ Swarts et al.. Structural Basis for Guide RNA Processing and Seed-Dependent DNA Targeting by CRISPR-Cas12a. Molecular cell. 2017. 66. pp. 221-233.e4. doi: 10.1016/j.molcel.2017.03.016. PMID: 28431230.
- ↑ 98.00 98.01 98.02 98.03 98.04 98.05 98.06 98.07 98.08 98.09 98.10 98.11 Nakagawa et al.. Cryo-EM structure of the transposon-associated TnpB enzyme. Nature. 2023. 616. pp. 390-397. doi: 10.1038/s41586-023-05933-9. PMID: 37020030.
- ↑ 99.00 99.01 99.02 99.03 99.04 99.05 99.06 99.07 99.08 99.09 99.10 99.11 Sasnauskas et al.. TnpB structure reveals minimal functional core of Cas12 nuclease family. Nature. 2023. 616. pp. 384-389. doi: 10.1038/s41586-023-05826-x. PMID: 37020015.
- ↑ Cong et al.. Multiplex genome engineering using CRISPR/Cas systems. Science (New York, N.Y.). 2013. 339. pp. 819-23. doi: 10.1126/science.1231143. PMID: 23287718.
- ↑ Zetsche et al.. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell. 2015. 163. pp. 759-71. doi: 10.1016/j.cell.2015.09.038. PMID: 26422227.
- ↑ 102.00 102.01 102.02 102.03 102.04 102.05 102.06 102.07 102.08 102.09 He et al.. The IS200/IS605 Family and "Peel and Paste" Single-strand Transposition Mechanism. Microbiology spectrum. 2015. 3. doi: 10.1128/microbiolspec.MDNA3-0039-2014. PMID: 26350330.
- ↑ 103.0 103.1 Altae-Tran et al.. Diversity, evolution, and classification of the RNA-guided nucleases TnpB and Cas12. Proceedings of the National Academy of Sciences of the United States of America. 2023. 120. pp. e2308224120. doi: 10.1073/pnas.2308224120. PMID: 37983496.
- ↑ 104.0 104.1 104.2 104.3 104.4 Kato et al.. Structure of the IscB-ωRNA ribonucleoprotein complex, the likely ancestor of CRISPR-Cas9. Nature communications. 2022. 13. pp. 6719. doi: 10.1038/s41467-022-34378-3. PMID: 36344504.
- ↑ 105.0 105.1 105.2 105.3 Hirano et al.. Structure of the OMEGA nickase IsrB in complex with ωRNA and target DNA. Nature. 2022. 610. pp. 575-581. doi: 10.1038/s41586-022-05324-6. PMID: 36224386.
- ↑ Briner et al.. Guide RNA functional modules direct Cas9 activity and orthogonality. Molecular cell. 2014. 56. pp. 333-339. doi: 10.1016/j.molcel.2014.09.019. PMID: 25373540.
- ↑ Shibata et al.. Real-space and real-time dynamics of CRISPR-Cas9 visualized by high-speed atomic force microscopy. Nature communications. 2017. 8. pp. 1430. doi: 10.1038/s41467-017-01466-8. PMID: 29127285.
- ↑ Sternberg et al.. Conformational control of DNA target cleavage by CRISPR-Cas9. Nature. 2015. 527. pp. 110-3. doi: 10.1038/nature15544. PMID: 26524520.
- ↑ Weinberg et al.. Exceptional structured noncoding RNAs revealed by bacterial metagenome analysis. Nature. 2009. 462. pp. 656-9. doi: 10.1038/nature08586. PMID: 19956260.
- ↑ Dyall-Smith et al.. Haloquadratum walsbyi: limited diversity in a global pond. PloS one. 2011. 6. pp. e20968. doi: 10.1371/journal.pone.0020968. PMID: 21701686.
- ↑ Cox. Regulation of bacterial RecA protein function. Critical reviews in biochemistry and molecular biology. 2007. 42. pp. 41-63. doi: 10.1080/10409230701260258. PMID: 17364684.
- ↑ 112.0 112.1 Braun et al.. A chimeric ribozyme in clostridium difficile combines features of group I introns and insertion elements. Molecular microbiology. 2000. 36. pp. 1447-59. doi: 10.1046/j.1365-2958.2000.01965.x. PMID: 10931294.
- ↑ Karpathy et al.. Genome sequence of Fusobacterium nucleatum subspecies polymorphum - a genetically tractable fusobacterium. PloS one. 2007. 2. pp. e659. doi: 10.1371/journal.pone.0000659. PMID: 17668047.
- ↑ Tourasse et al.. The Bacillus cereus group: novel aspects of population structure and genome dynamics. Journal of applied microbiology. 2006. 101. pp. 579-93. doi: 10.1111/j.1365-2672.2006.03087.x. PMID: 16907808.
- ↑ Tourasse & Kolstø. Survey of group I and group II introns in 29 sequenced genomes of the Bacillus cereus group: insights into their spread and evolution. Nucleic acids research. 2008. 36. pp. 4529-48. doi: 10.1093/nar/gkn372. PMID: 18587153.
- ↑ 116.0 116.1 Tourasse et al.. Unusual group II introns in bacteria of the Bacillus cereus group. Journal of bacteriology. 2005. 187. pp. 5437-51. doi: 10.1128/JB.187.15.5437-5451.2005. PMID: 16030238.
- ↑ 117.00 117.01 117.02 117.03 117.04 117.05 117.06 117.07 117.08 117.09 117.10 117.11 117.12 117.13 117.14 117.15 117.16 117.17 Žedaveinytė et al.. Antagonistic conflict between transposon-encoded introns and guide RNAs. bioRxiv : the preprint server for biology. 2023. doi: 10.1101/2023.11.20.567912. PMID: 38045383.
- ↑ Nielsen & Johansen. Group I introns: Moving in new directions. RNA biology. 2009. 6. pp. 375-83. doi: 10.4161/rna.6.4.9334. PMID: 19667762.
- ↑ Hausner et al.. Bacterial group I introns: mobile RNA catalysts. Mobile DNA. 2014. 5. pp. 8. doi: 10.1186/1759-8753-5-8. PMID: 24612670.
- ↑ Tourasse et al.. Survey of chimeric IStron elements in bacterial genomes: multiple molecular symbioses between group I intron ribozymes and DNA transposons. Nucleic acids research. 2014. 42. pp. 12333-51. doi: 10.1093/nar/gku939. PMID: 25324310.
- ↑ Hasselmayer et al.. Clostridium difficile IStron CdISt1: discovery of a variant encoding two complete transposase-like proteins. Journal of bacteriology. 2004. 186. pp. 2508-10. doi: 10.1128/JB.186.8.2508-2510.2004. PMID: 15060058.
- ↑ Landthaler & Shub. Unexpected abundance of self-splicing introns in the genome of bacteriophage Twort: introns in multiple genes, a single gene with three introns, and exon skipping by group I ribozymes. Proceedings of the National Academy of Sciences of the United States of America. 1999. 96. pp. 7005-10. doi: 10.1073/pnas.96.12.7005. PMID: 10359829.
- ↑ Golden et al. Crystal structure of a phage Twort group I ribozyme-product complex. Nature structural & molecular biology. 2005. 12. pp. 82-9. doi: 10.1038/nsmb868. PMID: 15580277.
- ↑ Weiss et al. Clostridioides difficile strain-dependent and strain-independent adaptations to a microaerobic environment. Microbial genomics. 2021. 7. doi: 10.1099/mgen.0.000738. PMID: 34908523.
- ↑ Fuchs et al. An RNA-centric global view of Clostridioides difficile reveals broad activity of Hfq in a clinically important gram-positive bacterium. Proceedings of the National Academy of Sciences of the United States of America. 2021. 118. doi: 10.1073/pnas.2103579118. PMID: 34131082.
- ↑ Chen et al. Multiple serine transposase dimers assemble the transposon-end synaptic complex during IS607-family transposition. eLife. 2018. 7. doi: 10.7554/eLife.39611. PMID: 30289389.
- ↑ Filée et al. I am what I eat and I eat what I am: acquisition of bacterial genes by giant viruses. Trends in genetics : TIG. 2007. 23. pp. 10-5. doi: 10.1016/j.tig.2006.11.002. PMID: 17109990.
- ↑ Filée & Chandler. Convergent mechanisms of genome evolution of large and giant DNA viruses. Research in microbiology. 2008. 159. pp. 325-31. doi: 10.1016/j.resmic.2008.04.012. PMID: 18572389.
- ↑ Filée & Chandler. Gene exchange and the origin of giant viruses. Intervirology. 2010. 53. pp. 354-61. doi: 10.1159/000312920. PMID: 20551687.
- ↑ Saito et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature. 2023. 620. pp. 660-668. doi: 10.1038/s41586-023-06356-2. PMID: 37380027.
- ↑ Jiang et al. Programmable RNA-guided DNA endonucleases are widespread in eukaryotes and their viruses. Science advances. 2023. 9. pp. eadk0171. doi: 10.1126/sciadv.adk0171. PMID: 37756409.
- ↑ Filée et al. Phylogenetic evidence for extensive lateral acquisition of cellular genes by Nucleocytoplasmic large DNA viruses. BMC evolutionary biology. 2008. 8. pp. 320. doi: 10.1186/1471-2148-8-320. PMID: 19036122.
- ↑ Volff. Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. BioEssays : news and reviews in molecular, cellular and developmental biology. 2006. 28. pp. 913-22. doi: 10.1002/bies.20452. PMID: 16937363.
- ↑ Vogt et al. Transposon domestication versus mutualism in ciliate genome rearrangements. PLoS genetics. 2013. 9. pp. e1003659. doi: 10.1371/journal.pgen.1003659. PMID: 23935529.
- ↑ Vogt & Mochizuki. A domesticated PiggyBac transposase interacts with heterochromatin and catalyzes reproducible DNA elimination in Tetrahymena. PLoS genetics. 2013. 9. pp. e1004032. doi: 10.1371/journal.pgen.1004032. PMID: 24348275.
- ↑ Nunvar et al. Identification and characterization of repetitive extragenic palindromes (REP)-associated tyrosine transposases: implications for REP evolution and dynamics in bacterial genomes. BMC genomics. 2010. 11. pp. 44. doi: 10.1186/1471-2164-11-44. PMID: 20085626.
- ↑ Ton-Hoang et al. Structuring the bacterial genome: Y1-transposases associated with REP-BIME sequences. Nucleic acids research. 2012. 40. pp. 3596-609. doi: 10.1093/nar/gkr1198. PMID: 22199259.
- ↑ Rocco et al. A giant family of short palindromic sequences in Stenotrophomonas maltophilia. FEMS microbiology letters. 2010. 308. pp. 185-92. doi: 10.1111/j.1574-6968.2010.02010.x. PMID: 20528935.
- ↑ Nunvar et al. Evolution of REP diversity: a comparative study. BMC genomics. 2013. 14. pp. 385. doi: 10.1186/1471-2164-14-385. PMID: 23758774.
- ↑ Gilson et al. Palindromic unit highly repetitive DNA sequences exhibit species specificity within Enterobacteriaceae. Research in microbiology. 1990. 141. pp. 1103-16. doi: 10.1016/0923-2508(90)90084-4. PMID: 2092362.
- ↑ Boccard & Prentki. Specific interaction of IHF with RIBs, a class of bacterial repetitive DNA elements located at the 3' end of transcription units. The EMBO journal. 1993. 12. pp. 5019-27. doi: 10.1002/j.1460-2075.1993.tb06195.x. PMID: 8262044.
- ↑ Espéli & Boccard. In vivo cleavage of Escherichia coli BIME-2 repeats by DNA gyrase: genetic characterization of the target and identification of the cut site. Molecular microbiology. 1997. 26. pp. 767-77. doi: 10.1046/j.1365-2958.1997.6121983.x. PMID: 9427406.
- ↑ Gilson et al. DNA polymerase I and a protein complex bind specifically to E. coli palindromic unit highly repetitive DNA: implications for bacterial chromosome organization. Nucleic acids research. 1990. 18. pp. 3941-52. doi: 10.1093/nar/18.13.3941. PMID: 2197600.
- ↑ Tobes & Pareja. Bacterial repetitive extragenic palindromic sequences are DNA targets for Insertion Sequence elements. BMC genomics. 2006. 7. pp. 62. doi: 10.1186/1471-2164-7-62. PMID: 16563168.
- ↑ Liang et al. A role for REP sequences in regulating translation. Molecular cell. 2015. 58. pp. 431-9. doi: 10.1016/j.molcel.2015.03.019. PMID: 25891074.
- ↑ Kofoid et al. Formation of an F' plasmid by recombination between imperfectly repeated chromosomal Rep sequences: a closer look at an old friend (F'(128) pro lac). Journal of bacteriology. 2003. 185. pp. 660-3. doi: 10.1128/JB.185.2.660-663.2003. PMID: 12511513.
- ↑ Bachellier et al. Short palindromic repetitive DNA elements in enterobacteria: a survey. Research in microbiology. 1999. 150. pp. 627-39. doi: 10.1016/s0923-2508(99)00128-x. PMID: 10673002.
- ↑ Messing et al. The processing of repetitive extragenic palindromes: the structure of a repetitive extragenic palindrome bound to its associated nuclease. Nucleic acids research. 2012. 40. pp. 9964-79. doi: 10.1093/nar/gks741. PMID: 22885300.
- ↑ Beuzón et al. IS200: an old and still bacterial transposon. International microbiology : the official journal of the Spanish Society for Microbiology. 2004. 7. pp. 3-12. PMID: 15179601.
- ↑ Ellis et al. A transposon-derived small RNA regulates gene expression in Salmonella Typhimurium. Nucleic acids research. 2017. 45. pp. 5470-5486. doi: 10.1093/nar/gkx094. PMID: 28335027.
- ↑ Casadesus & Roth. Transcriptional occlusion of transposon targets. Molecular & general genetics : MGG. 1989. 216. pp. 204-9. doi: 10.1007/BF00334357. PMID: 2546037.
- ↑ Schiaffino et al. Strain typing with IS200 fingerprints in Salmonella abortusovis. Applied and environmental microbiology. 1996. 62. pp. 2375-80. doi: 10.1128/aem.62.7.2375-2380.1996. PMID: 8779575.
- ↑ Ellis et al. Silent but deadly: IS200 promotes pathogenicity in Salmonella Typhimurium. RNA biology. 2018. 15. pp. 176-181. doi: 10.1080/15476286.2017.1403001. PMID: 29120256.
- ↑ Ellis et al. A cis-encoded sRNA, Hfq and mRNA secondary structure act independently to suppress IS200 transposition. Nucleic acids research. 2015. 43. pp. 6511-27. doi: 10.1093/nar/gkv584. PMID: 26044710.
- ↑ Ross et al. Hfq restructures RNA-IN and RNA-OUT and facilitates antisense pairing in the Tn10/IS10 system. RNA (New York, N.Y.). 2013. 19. pp. 670-84. doi: 10.1261/rna.037747.112. PMID: 23510801.
- ↑ Sittka et al. Deep sequencing analysis of small noncoding RNA and mRNA targets of the global post-transcriptional regulator, Hfq. PLoS genetics. 2008. 4. pp. e1000163. doi: 10.1371/journal.pgen.1000163. PMID: 18725932.
- ↑ Kröger et al. The transcriptional landscape and small RNAs of Salmonella enterica serovar Typhimurium. Proceedings of the National Academy of Sciences of the United States of America. 2012. 109. pp. E1277-86. doi: 10.1073/pnas.1201061109. PMID: 22538806.
- ↑ Yan et al. Determination of sRNA expressions by RNA-seq in Yersinia pestis grown in vitro and during infection. PloS one. 2013. 8. pp. e74495. doi: 10.1371/journal.pone.0074495. PMID: 24040259.
- ↑ Hershko-Shalev et al. Gifsy-1 Prophage IsrK with Dual Function as Small and Messenger RNA Modulates Vital Bacterial Machineries. PLoS genetics. 2016. 12. pp. e1005975. doi: 10.1371/journal.pgen.1005975. PMID: 27057757.
- ↑ Chao & Vogel. A 3' UTR-Derived Small RNA Provides the Regulatory Noncoding Arm of the Inner Membrane Stress Response. Molecular cell. 2016. 61. pp. 352-363. doi: 10.1016/j.molcel.2015.12.023. PMID: 26805574.
- ↑ Chao et al. An atlas of Hfq-bound transcripts reveals 3' UTRs as a genomic reservoir of regulatory small RNAs. The EMBO journal. 2012. 31. pp. 4005-19. doi: 10.1038/emboj.2012.229. PMID: 22922465.
- ↑ Guo et al. MicL, a new σE-dependent sRNA, combats envelope stress by repressing synthesis of Lpp, the major outer membrane lipoprotein. Genes & development. 2014. 28. pp. 1620-34. doi: 10.1101/gad.243485.114. PMID: 25030700.
- ↑ Jørgensen et al. Dual function of the McaS small RNA in controlling biofilm formation. Genes & development. 2013. 27. pp. 1132-45. doi: 10.1101/gad.214734.113. PMID: 23666921.
- ↑ Holmqvist et al. Global RNA recognition patterns of post-transcriptional regulators Hfq and CsrA revealed by UV crosslinking in vivo. The EMBO journal. 2016. 35. pp. 991-1011. doi: 10.15252/embj.201593360. PMID: 27044921.
- ↑ van Nues et al. Ribonucleoprotein particles of bacterial small non-coding RNA IsrA (IS61 or McaS) and its interaction with RNA polymerase core may link transcription to mRNA fate. Nucleic acids research. 2016. 44. pp. 2577-92. doi: 10.1093/nar/gkv1302. PMID: 26609136.
- ↑ Ellis et al. Hfq binds directly to the ribosome-binding site of IS10 transposase mRNA to inhibit translation. Molecular microbiology. 2015. 96. pp. 633-50. doi: 10.1111/mmi.12961. PMID: 25649688.
- ↑ Jäger et al. An archaeal sRNA targeting cis- and trans-encoded mRNAs via two distinct domains. Nucleic acids research. 2012. 40. pp. 10964-79. doi: 10.1093/nar/gks847. PMID: 22965121.
- ↑ Vogel & Luisi. Hfq and its constellation of RNA. Nature reviews. Microbiology. 2011. 9. pp. 578-89. doi: 10.1038/nrmicro2615. PMID: 21760622.
- ↑ Myeni et al. SipB-SipC complex is essential for translocon formation. PloS one. 2013. 8. pp. e60499. doi: 10.1371/journal.pone.0060499. PMID: 23544147.
- ↑ Trussler et al. A small RNA derived from the 5' end of the IS200 tnpA transcript regulates multiple virulence regulons in Salmonella Typhimurium. bioRxiv : the preprint server for biology. 2024. doi: 10.1101/2024.06.26.600842.
- ↑ Lou et al. Salmonella Pathogenicity Island 1 (SPI-1) and Its Complex Regulatory Network. Frontiers in cellular and infection microbiology. 2019. 9. pp. 270. doi: 10.3389/fcimb.2019.00270. PMID: 31428589.
- ↑ Huang et al. A naturally DNase-free CRISPR-Cas12c enzyme silences gene expression. Molecular cell. 2022. 82. pp. 2148-2160.e4. doi: 10.1016/j.molcel.2022.04.020. PMID: 35659325.
- ↑ Wu et al. The miniature CRISPR-Cas12m effector binds DNA to block transcription. Molecular cell. 2022. 82. pp. 4487-4502.e7. doi: 10.1016/j.molcel.2022.11.003. PMID: 36427491.
- ↑ Wiegand et al. Emergence of RNA-guided transcription factors via domestication of transposon-encoded TnpB nucleases. bioRxiv : the preprint server for biology. 2023. doi: 10.1101/2023.11.30.569447. PMID: 38076855.
- ↑ Wiegand et al. TnpB homologues exapted from transposons are RNA-guided transcription factors. Nature. 2024. 631. pp. 439-448. doi: 10.1038/s41586-024-07598-4. PMID: 38926585.
- ↑ Siguier et al. Bacterial insertion sequences: their genomic impact and diversity. FEMS microbiology reviews. 2014. 38. pp. 865-91. doi: 10.1111/1574-6976.12067. PMID: 24499397.
- ↑ He et al. The IS200/IS605 Family and "Peel and Paste" Single-strand Transposition Mechanism. Microbiology spectrum. 2015. 3. doi: 10.1128/microbiolspec.MDNA3-0039-2014. PMID: 26350330.
- ↑ Jiang et al. Programmable RNA-guided DNA endonucleases are widespread in eukaryotes and their viruses. Science advances. 2023. 9. pp. eadk0171. doi: 10.1126/sciadv.adk0171. PMID: 37756409.
- ↑ Meers et al. Transposon-encoded nucleases use guide RNAs to promote their selfish spread. Nature. 2023. 622. pp. 863-871. doi: 10.1038/s41586-023-06597-1. PMID: 37758954.
- ↑ Michaux et al. Single-Nucleotide RNA Maps for the Two Major Nosocomial Pathogens Enterococcus faecalis and Enterococcus faecium. Frontiers in cellular and infection microbiology. 2020. 10. pp. 600325. doi: 10.3389/fcimb.2020.600325. PMID: 33324581.
- ↑ Nety et al. The Transposon-Encoded Protein TnpB Processes Its Own mRNA into ωRNA for Guided Nuclease Activity. The CRISPR journal. 2023. 6. pp. 232-242. doi: 10.1089/crispr.2023.0015. PMID: 37272862.
- ↑ Swarts et al. Structural Basis for Guide RNA Processing and Seed-Dependent DNA Targeting by CRISPR-Cas12a. Molecular cell. 2017. 66. pp. 221-233.e4. doi: 10.1016/j.molcel.2017.03.016. PMID: 28431230.
How to Cite?
TnPedia Team. (2025). TnPedia: IS200/IS605 Family of Prokaryotic Insertion Sequences. Zenodo. https://doi.org/10.5281/zenodo.15640112