Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin
Abstract
The Enteritidis and Dublin serovars of Salmonella enterica are closely related, yet they differ significantly in pathogenicity and epidemiology. S. Enteritidis is a broad host range serovar that commonly causes gastroenteritis and infrequently causes invasive disease in humans. S. Dublin mainly colonizes cattle but, upon infecting humans, often results in invasive disease. To gain a broader view of the extent of these differences we conducted microarray-based comparative genomics between several field isolates from each serovar. Genome degradation has been correlated with host adaptation in Salmonella; we therefore also compared the available genome sequences of both serovars at whole-genome scale to evaluate pseudogene composition within each serovar.
Microarray analysis revealed 3771 CDS shared by both serovars while 33 were only present in Enteritidis and 87 were exclusive to Dublin. Pseudogene evaluation showed 177 inactive CDS in S. Dublin which correspond to active genes in S. Enteritidis, nine of which are also inactive in the host adapted S. Gallinarum and S. Choleraesuis serovars. Sequencing of these 9 CDS in several S. Dublin clinical isolates revealed that they are pseudogenes in all of them, indicating that this feature is not peculiar to the sequenced strain. Among these CDS, shdA (Peyer´s patch colonization factor) and mglA (galactoside transport ATP binding protein), appear also to be inactive in the human adapted S. Typhi and S. Paratyphi A, suggesting that functionality of these genes may be relevant for the capacity of certain Salmonella serovars to infect a broad range of hosts.
1. INTRODUCTION
Infection with non-typhoidal Salmonella enterica is a major cause of food-borne disease in humans worldwide [1-3]. Animals and their products are regarded as the main sources of this pathogen, although it may also be present in other potential sources, such as fresh vegetables [4-6]. From over 2500 different serovars of Salmonella enterica (defined by their surface antigenic properties, both somatic O antigen and flagellar H antigens) about 50 are significant pathogens of animals and humans. Acute infections in humans can develop in one of four ways: enteric fever, gastroenteritis, bacteremia, or extraintestinal focal infection [7]. As with other infectious diseases, the course and outcome of the infection depend on a variety of factors, including the inoculating dose, the immune status of the host, and the genetic background of both the host and the infecting organism.
Although S. enterica serovars are genetically very similar, they differ significantly in host range and disease spectrum. S. enterica serovars may be classified as ubiquitous, host-restricted or host-specific. Ubiquitous serovars, which include Typhimurium and Enteritidis, most commonly produce self-limiting gastrointestinal infections in a wide range of hosts. Host-specific serovars, such as Typhi in humans or Gallinarum in fowl, cause severe systemic diseases in their specific hosts. A few Salmonella serovars, such as Choleraesuis and Dublin, have a narrow host range and are classified as host-restricted [8].
Host-restricted and host-specific serovars are generally more prone to cause invasive disease than ubiquitous serovars [9, 10]. Globally, human extra-intestinal salmonellosis is generally associated with those serovars that are also associated with gastroenteritis, as is the case with S. Enteritidis and S. Typhimurium. However, certain serovars are more prone to cause invasive infections than others, as is clear when the percentage of isolates from bacteremia related to total cases (invasive index) is calculated [7, 11]. For S. Typhimurium and S. Enteritidis, the invasive index ranges from 1 to 7% [11, 12], while for S. Dublin different reports indicate that the invasive index ranges from 50% to 70% [7, 11, 13-15]. Loss of gene function through pseudogene accumulation has been indicated as a hallmark of host-specific pathogenic bacteria as compared to their host-generalist relatives [16-22].
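The invasive index described above is a simple ratio of bacteremia isolates to total isolates, expressed as a percentage. A minimal sketch of that calculation follows; the function name and the counts are hypothetical and chosen only for illustration, not taken from the cited reports.

```python
def invasive_index(bacteremia_isolates: int, total_isolates: int) -> float:
    """Percentage of all isolates of a serovar that were recovered from bacteremia."""
    if total_isolates <= 0:
        raise ValueError("total_isolates must be positive")
    if not 0 <= bacteremia_isolates <= total_isolates:
        raise ValueError("bacteremia_isolates must lie between 0 and total_isolates")
    return 100.0 * bacteremia_isolates / total_isolates


# Hypothetical counts, used only to show the arithmetic.
example_index = invasive_index(bacteremia_isolates=6, total_isolates=120)
print(f"{example_index:.1f}%")
```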
The Enteritidis (O: 1, 9, 12: gm: -) and Dublin (O: 1, 9, 12: gp: -) serovars share antigenic properties and are phylogenetically closely related, yet they seem to differ significantly in pathogenic potential [23, 24]. S. Enteritidis commonly causes gastroenteritis but rarely causes invasive disease in humans. S. Dublin usually infects cattle, causing abortion and systemic infection, but can occasionally be found infecting other hosts such as pigs and humans. On the rare occasions when it infects humans it often results in bacteraemia with severe disease and high mortality [25-27]. Characterization of the mechanisms underlying these differences is central to a more general understanding of the invasiveness of salmonellae. To date only one complete genome of an S. Enteritidis strain (P125109, hereafter referred to as PT4) and two S. Dublin isolates (CT_02021853 and 3246) have been sequenced and annotated and are publicly available [28], [http://www.ncbi.nlm.nih.gov/genomeprj/19467] [29].
To gain new insights into genetic differences that could help to explain such markedly different pathogenic behaviors, we describe here a comparative study between S. Enteritidis and S. Dublin. We conducted microarray-based comparative genomics between four S. Dublin clinical isolates and the core genome resulting from the comparative genome analysis of 29 S. Enteritidis isolates previously reported by us [30]. Further, the pseudogene content of each serovar was evaluated using the available genome sequences.
2. MATERIALS AND METHODS
2.1. Bacterial Strains
Twenty-nine S. enterica serovar Enteritidis isolates from diverse origins in Uruguay were previously characterized by microarray and phenotypic assays [30, 31]. Seven S. enterica serovar Dublin isolates from human infections in Uruguay were used in this study (Table 1).
+ : tested.
- : non-tested.
a Correspond to human samples.
b Comparative genomic hybridization.
c Nucleotide sequence of CDS as described in text and Table 2.
Gene in S. Enteritidis | Primer Sequence (5´-3´) | Primer |
---|---|---|
SEN0042 | TATTCAAAACTTGCTTAGAAAGTAGAG | Forward |
SEN0042 | CGGGTCTTGTTGCATAAATGG | Reverse |
SEN0042 | GGAAAGTAATGTTGTCCGCTG | Reverse2 |
SEN0784 | GTGGTAAACATATTGTAATGTTATTTTC | Forward |
SEN0784 | AATGTGATTCAGGCTGTGCT | Reverse |
SEN2182 | AGACCGGATAACGTATTTCTTTTGCC | Forward |
SEN2182 | ATTCCGCCCTCTTTCAGCCAGGTC | Reverse |
SEN2182 | GTGATTGTCCCGGACGACTTCTC | Reverse2 |
SEN2493 | TCCAGTTTGCTTCGTGAACG | Forward |
SEN2493 | CACTGGCGATGTGACGATT | Forward2 |
SEN2493 | CAATTTCGGCGTAATGACGTT | Forward3 |
SEN2493 | ATCAACCGGTTTGTCATTCG | Reverse |
SEN2493 | TACCGTCCCAGTCGCCGTTG | Reverse2 |
SEN2783 | GTGAGGTATATCAACAAAAAAGACCA | Forward |
SEN2783 | TCCAGAGGCAATCCAGGA | Forward2 |
SEN2783 | TGTGCAGGCGCCGTTG | Forward3 |
SEN2783 | ACGGACGGGGAGCCAGG | Reverse |
SEN2783 | CAACCTCTTTGCGTGTATCAACC | Reverse2 |
SEN2806 | GTGCTGGTAGGCGATATTAAG | Forward |
SEN2806 | CTTCCCGGACGCGCGTAT | Forward2 |
SEN2806 | AACCTGCATTTCAGTCACTACAG | Reverse |
SEN3461 | TTTGGCACGGCTGGCGACAT | Forward |
SEN3461 | GAATGCCCTGCTGGTGGATT | Forward2 |
SEN3461 | CGTGCCGGGAACTATAACAG | Forward3 |
SEN3461 | AGCACCGACCCGCCCAACA | Reverse |
SEN3461 | GCCGCGCAAACCGTAGTTCA | Reverse2 |
SEN3672 | GGCCTGGTCACGTCTGTAAC | Forward |
SEN3672 | CTCTCTTTTGTCTTCGGTATCC | Forward2 |
SEN3672 | TATGACGGTTTGATGACAATGG | Reverse |
SEN4290 | AACGCTTGAGGATTTAATAGAA | Forward |
SEN4290 | CTGATTCAGTACCGTCAGTG | Reverse |
Region/Gene | Gene Range | Homologous | Function/Gene Prediction |
---|---|---|---|
Reg En1 | SEN0083-0085 | CT18, TY2, LT2, DT104, SL1344, SBG, SPA, SGAL | probable secreted proteins, sulfatase |
Reg En2 | SEN1379-1395 (1387 present) | STY (SOME) | part of PHAGE SE14, ligA, B, C, D, F, ydaD |
Reg En3 | SEN1432-1435 | SGAL | ROD13 genomic island, idonate and gluconate dehydrogenase, sugar transport |
Reg En4 | SEN1500-SEN1506 | LT2, SL1344, (CT18 and SBG some) | part of ROD14 genomic island |
Sing En1 | SEN0196 | SBG | fhuA, ferrichrome iron receptor |
Sing En2 | SEN0281 | NO | safA, fimbrial subunit |
Sing En3 | SEN0356 | SGAL | putative autotransporter |
Sing En4 | SEN1515 | CT18, TY2, LT2, DT104, SL1344, SBG, SPA, SGAL | Ni/Fe-hydrogenase 1 b-type cytochrome subunit HyaC2 |
Sing En5 | SEN1539 | CT18, TY2, LT2, DT104, SL1344, SBG, SPA, SGAL | dcp, dipeptidyl carboxypeptidase II |
Sing En6 | SEN2167 | CT18, TY2, LT2, DT104, SL1344, SBG, SPA, SGAL | conserved hypothetical protein |
Sing En7 | SEN2420 | SGAL | putative exported protein |
CT18: S. Typhi CT18, TY2: S. Typhi Ty2, LT2: S. Typhimurium LT2, DT104: S. Typhimurium DT104, SL1344: S. Typhimurium SL1344, SBG: S. bongori, SPA: S. Paratyphi A, SGAL: S. Gallinarum.
Region/Gene | Gene Range | Homologous | Gene Description |
---|---|---|---|
Reg Du1 | SG1032-1044 | NO | clpB, Rhs proteins, conserved hypothetical proteins |
Reg Du2a | SG1182-1195 | SOME SDT, SOME STY | Gifsy-2-like prophage, phage proteins and cell division inhibitor kil |
Reg Du2b | SG1211-1219 | STM, SDT, SL | Gifsy-2-like prophage, phage proteins |
Reg Du3a | STY0289-0294 | STM, SDT, SL, SPA, TY2, SOME GAL | SPI6, hypothetical and clpB heat shock protease like protein |
Reg Du3b | STY0302-0310 | STM, SDT, SL, SPA, TY2 | SPI6, hypothetical conserved, membrane and lipoproteins |
Reg Du3c | STY0320-0323 | STM, SDT, SL, SPA, TY2 | SPI6, hypothetical and RHS proteins |
Reg Du4 | STY1020-1036 | TY2, SOME STM, SDT, SL | S. Typhi prophage 10, DNA binding and phage proteins, methyltransferase |
Reg Du5 | STY2043-2045 | SOME SDT | S. Typhi degenerate bacteriophage, putative endolysin |
Reg Du6 | STY3662-3671 | TY2, SOME STM | Phage proteins, regulatory protein CII, DNA adenine methylase |
Sing Du1 | SG1227 | STM, SDT, SL | phage tail protein |
Sing Du2 | SG3368 | STY, STM, SDT, SL, SBG, SPA | possible membrane transport protein |
Sing Du3 | STY0602 | SDT, SBG, SPA | phage integrase |
Sing Du4 | STY1444 | TY2, STM, SDT, SL, SBG, SPA | putative glycolate oxidase |
Sing Du5 | STY2690 | TY2, STM, SDT, SL | hypothetical protein |
Sing Du6 | STY3029 | NO | transposase |
CT18: S. Typhi CT18, TY2: S. Typhi Ty2, LT2: S. Typhimurium LT2, DT104: S. Typhimurium DT104, SL1344: S. Typhimurium SL1344, SBG: S. bongori, SPA: S. Paratyphi A, SGAL: S. Gallinarum.
a : distribution of the 83 S. Enteritidis specific pseudogenes.
b : distribution of the 177 S. Dublin specific pseudogenes.
Gene | Choleraesuis Pseu/Absent (a) | Gene Description |
---|---|---|
SEN0042 | YES | putative transport protein |
SEN0325 | NO | possible transmembrane regulator |
SEN0621 | NO | putative sigma54 dependent transcriptional regulator |
SEN0784 | YES | hypothetical protein |
SEN1194 | NO | putative membrane transport protein |
SEN1331 | NO | conserved hypothetical protein |
SEN1335 | NO | putative membrane protein |
SEN1524 | NO | putative membrane protein |
SEN2173 | NO | putative transcriptional regulator |
SEN2182 (b) | YES | mglA, galactoside transport ATP binding protein |
SEN2493 (b) | YES | shdA, Peyer´s patch colonization and shedding factor |
SEN2611 | NO | putative type I secretion protein, SPI9 ATP-binding protein |
SEN2783 | YES | conserved hypothetical protein |
SEN2806 | YES | ygcY probable glucarate dehydratase |
SEN3461 | YES | lpfC, outer membrane usher protein |
SEN3537 | NO | rfaZ (waaZ) LPS core biosynthesis protein |
SEN3571 | NO | yicJ sodium galactoside family symporter |
SEN3672 | YES | probable PTS system permease |
SEN3954 | NO | nfi, putative endonuclease V |
SEN4259 | NO | hypothetical protein |
SEN4290 | YES | Type I restriction-modification system methyltransferase |
a YES indicates that the corresponding gene is a pseudogene or is absent in the genome of S. Choleraesuis SC-B67. NO indicates that it corresponds to an active gene.
b Indicates that the gene corresponds to a pseudogene in the sequences of S. Typhi CT18 and Ty2 as well as in S. Paratyphi A ATCC 9150 and S. Paratyphi A AKU_12601, as analyzed by Holt et al. [22].
Isolates were maintained frozen at -80°C in LB containing 25% glycerol. Bacteria were cultured in LB broth, or on LB containing 1.6% agar, or Tryptic Soy Agar. All isolates were identified as Salmonella enterica using standard biochemical tests and microbiological methods. Serovar was determined by the slide agglutination test for O antigen and the tube agglutination test for H antigen, using commercially available anti-O and anti-H antisera (Difco, France). Differentiation between S. Enteritidis and S. Dublin was confirmed by PCR for the detection of genetic regions specific for Enteritidis [32] and by sequencing the fliC gene, which differs between these serovars.
2.2. Comparative Genomic Hybridization Analysis (CGH)
Four S. Dublin strains were analyzed by CGH using the Salmonella generation IV microarray [30, 33, 34] with PT4 DNA [28] as reference. The array is non-redundant and contains coding sequences from the following eight genomes: S. enterica serovar Typhi (S. Typhi) CT18, S. Typhi Ty2, S. Typhimurium LT2 (ATCC 700220), S. Typhimurium DT104 (NCTC 13348), S. Typhimurium SL1344 (NCTC 13347), S. Enteritidis PT4 P125109 (NCTC 13349), S. Gallinarum 287/91 (NCTC 13346), and S. bongori 12419 (ATCC 43975). Total DNA (including plasmid DNA) was extracted from each strain using a Genome DNA extraction kit (Promega) and quantified by agarose gel electrophoresis. Labeled DNA from S. Enteritidis PT4 (control sample) and one of the query Salmonella strains (experimental sample) were mixed in equal volumes and concentrations and hybridized to the microarray slides as previously described [30]. Data were normalized to the median value, and the total list of 6,871 genes was filtered by removing those spots with a high background and those without data in at least one of the replicates (three slides per strain, duplicate features per slide). After filtering, a list of 5,695 genes was obtained that corresponded to genes that presented a valid signal in at least one of the strains analyzed. Data analysis was performed on Excel files, following criteria previously described [30].
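The normalization and filtering steps described above can be sketched as follows. This is only an illustration under stated assumptions: each slide is represented as a mapping from gene to a test/reference ratio, spots already rejected for high background are encoded as `None`, and normalization is taken to mean division by the slide median. The function names and the toy ratios are hypothetical; the criteria actually applied are those described in [30].

```python
from statistics import median


def median_normalize(ratios):
    """Divide every valid test/reference ratio by the median ratio of the slide."""
    valid = [r for r in ratios.values() if r is not None]
    centre = median(valid)
    return {gene: (r / centre if r is not None else None) for gene, r in ratios.items()}


def filter_genes(replicates):
    """Keep only genes that have a valid value in every replicate slide."""
    genes = set.intersection(*(set(rep) for rep in replicates))
    return sorted(g for g in genes if all(rep[g] is not None for rep in replicates))


# Hypothetical ratios for three replicate slides of one strain.
# None marks a spot discarded for high background or missing data.
slides = [
    {"geneA": 1.0, "geneB": 0.2, "geneC": 2.0},
    {"geneA": 1.1, "geneB": None, "geneC": 2.2},
    {"geneA": 0.9, "geneB": 0.3, "geneC": 1.8},
]
normalized = [median_normalize(s) for s in slides]
retained = filter_genes(normalized)
print(retained)
```

In this toy input, geneB is dropped because it lacks a valid value on the second slide.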
Genes assigned as absent/divergent in all S. Dublin isolates were compared to the core genome of S. Enteritidis as defined in our previous study [30]. Genes detected as present in all S. Dublin isolates but absent in S. Enteritidis PT4 were compared with the S. Enteritidis dispensable genome as well as with the fully sequenced Salmonella isolates available in the NCBI database. Genes encoded in plasmids were not considered in this analysis.
2.3. Web Based Comparative Genomics
The sequences and annotations of the Salmonella genomes analyzed here were obtained from the data available at NCBI [http://www.ncbi.nlm.nih.gov/]. Nucleotide sequences were analyzed using the sequence visualization and annotation tool Artemis version 10 [35]. The search for homologous genes and regions was performed using Blast-n and Blast-p online at the NCBI website.
2.4. Pseudogene Screening in S. Dublin Isolates
The sequences of nine CDS detected as pseudogenes in the S. Dublin, S. Gallinarum and S. Choleraesuis sequenced strains were evaluated in all 7 S. Dublin isolates included in this work. Genomic DNA was extracted from the bacterial strains using DNeasy blood and tissue kit (Qiagen). Specific primers for amplification and sequencing were designed based on the sequences of the corresponding regions in the genomes of S. Enteritidis PT4 and S. Dublin CT_02021853 (Table 2). PCRs were conducted using a 10:1 mix, in terms of units, of Taq Polymerase and Pfu Polymerase (Fermentas) and the PCR products were sequenced. Sequences were analyzed and aligned using BioEdit Sequence Alignment editor version 7.0.9.0, 2007.
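Once a CDS has been sequenced, a first-pass check for inactivation is to look for an in-frame stop codon before the end of the reading frame, or for a length that is not a multiple of three. The sketch below illustrates that idea only; it is not the procedure used in this work, where sequences were aligned against the reference genomes in BioEdit. The function names and the short toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon
    found before the final codon, or None if there is none."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    for i in range(n_codons - 1):  # the final codon is the expected stop
        if seq[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def looks_inactivated(cds):
    """Flag a CDS with a frame-breaking length or a premature stop codon."""
    return len(cds) % 3 != 0 or first_premature_stop(cds) is not None


# Toy sequences, not real Salmonella genes.
intact = "ATGGCTGAATAA"      # ATG GCT GAA TAA
nonsense = "ATGGCTTAATAA"    # ATG GCT TAA TAA (stop at codon index 2)
frameshift = "ATGGCGAATAA"   # one base deleted, length 11

for name, seq in [("intact", intact), ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(name, looks_inactivated(seq))
```

A length check of this kind cannot distinguish a frameshift from an in-frame deletion of one or two codons, so any real call would still need an alignment against the intact homologue.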
3. RESULTS
3.1. Microarray-based Comparative Genomics of S. Enteritidis and S. Dublin Isolates
The genetic content of the 4 S. Dublin isolates was evaluated by microarray and a core genome (i.e. genes present in all strains) was defined. To explore the genetic determinants underlying the phenotypic differences between S. Dublin and S. Enteritidis, we compared the core genome of S. Dublin with the previously defined core genome of S. Enteritidis [30]. We found 3771 genes shared by both serovars, whereas 33 genes were only present in S. Enteritidis strains (Table 3) and 87 genes were only present in S. Dublin isolates (Table 4). The regions of difference found by CGH analysis are similar to the regions of difference obtained from comparison of the genomes of the two sequenced strains PT4 and CT_02021853 (results not shown). Of these 120 (33 + 87) genes that are exclusive to one serovar or the other, 53 are bacteriophage-encoded.
As shown in Table 3, four DNA regions and seven single genes were present only in S. Enteritidis. Region En1 (SEN0083-SEN0085) encodes two putative secreted proteins and one sulphatase. BLAST analysis revealed that this region has homologues in several fully sequenced serovars of Salmonella, including S. Gallinarum, S. Typhi, S. Paratyphi A, S. Paratyphi B, S. Choleraesuis, S. Typhimurium, S. Agona, S. Newport and S. Heidelberg. Region En2 (SEN1379-1395) corresponds to phage SE14 [28], which includes genes encoding DNA nucleases and membrane proteins, and was previously postulated to be a region of difference between S. Enteritidis and all other Salmonella serovars [30, 36]. Region En3 (SEN1432-1435) corresponds to a genomic island previously described as ROD13 [28] that encodes idonate dehydrogenase, gluconate dehydrogenase, proteins involved in sugar transport, and proteins similar to those required for hexonate uptake. This genomic island is present in the S. Gallinarum genome sequence, but is absent from all other salmonellae sequenced to date. Region En4 (SEN1500-1506) corresponds to part of another genomic region, named ROD14 [28], and encodes a putative transcriptional regulator akin to the LacI family, and other regulatory proteins probably involved in drug efflux. This region is present in the genome sequences from various S. Typhimurium strains, but is degraded in the S. Gallinarum and PT4 genome sequences.
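The comparison above amounts to set operations on gene presence calls: a core genome is the intersection of the genes present in every isolate, and the serovar-specific genes are the set differences between the two cores. A minimal sketch, assuming presence calls are already available as Python sets; the function names and the `geneX`, `geneY` and `geneZ` placeholders are hypothetical, while the SEN, SG and STY identifiers are taken from Tables 3 and 4.

```python
def core_genome(isolates):
    """Genes detected as present in every isolate (set intersection)."""
    return set.intersection(*(set(genes) for genes in isolates))


def compare_cores(core_a, core_b):
    """Split two core genomes into shared, A-only and B-only gene sets."""
    return core_a & core_b, core_a - core_b, core_b - core_a


# Toy presence calls for two S. Dublin isolates; geneZ is present in only one
# of them and therefore falls outside the S. Dublin core genome.
dublin_isolates = [
    {"geneX", "geneY", "SG3368", "STY1444", "geneZ"},
    {"geneX", "geneY", "SG3368", "STY1444"},
]
enteritidis_core = {"geneX", "geneY", "SEN0281", "SEN1539"}

dublin_core = core_genome(dublin_isolates)
shared, enteritidis_only, dublin_only = compare_cores(enteritidis_core, dublin_core)
print(sorted(shared), sorted(enteritidis_only), sorted(dublin_only))
```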
Six regions and six isolated genes are present only in S. Dublin (Table 4). Region Du1 comprises thirteen genes previously annotated within the genome of S. Gallinarum (SG1032-1044) which include proteins that are members of the Rhs family, Clp proteases and exported proteins. Region Du2 (SG1182-1195 and SG1211-1219) corresponds to part of the Gifsy-2-like prophage remnant present in the genome of S. Gallinarum [28]. Region Du3 corresponds to genes found in SPI-6 from S. Typhi CT18.
Regions Du4, Du5 and Du6 correspond to prophages found in the genome sequence of S. Typhi CT18 [16]. Single genes present only in S. Dublin strains include a membrane transport protein (SG3368), a putative glycolate oxidase (STY1444) and several phage-related proteins.
Microarray methodology allowed us to detect only presence or absence/divergence of genes, but not small variations in gene sequences. Considering that pseudogene accumulation has been postulated to be involved in host restriction and adaptation, we decided to compare the pseudogene content among the available genomic sequences of both serovars and then evaluate if the Uruguayan S. Dublin clinical isolates harbour a particular set of these pseudogenes.
3.2. Pseudogene Analysis
Analysis of the genomes available in the NCBI database for S. Dublin CT_02021853 and S. Enteritidis PT4 shows that they have 289 and 111 CDS annotated as pseudogenes, respectively. Of the 289 S. Dublin pseudogenes, 7 have no homologues in the S. Enteritidis sequence, and 32 correspond to intergenic regions. Among the others, 38 are homologous with 29 pseudogenes in S. Enteritidis, whereas the other 212 pseudogenes in S. Dublin correspond to 177 active genes in S. Enteritidis. Conversely, there are 83 S. Enteritidis pseudogenes that appear to be functional in S. Dublin CT_02021853. We analyzed the pseudogenes specific to each serovar, and grouped them in different classes according to their homology with functional CDS (Table 5).
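The four classes of S. Dublin pseudogenes listed above should account for all 289 annotated pseudogenes. The short check below simply restates the counts given in the text and confirms that they add up; the variable names and category labels are ours.

```python
# Counts as reported in the text for S. Dublin CT_02021853.
dublin_pseudogenes_total = 289
dublin_partition = {
    "no homologue in S. Enteritidis": 7,
    "corresponding to intergenic regions": 32,
    "homologous to S. Enteritidis pseudogenes": 38,
    "corresponding to active S. Enteritidis genes": 212,
}

partition_total = sum(dublin_partition.values())
print(partition_total, partition_total == dublin_pseudogenes_total)
```

Note that the 212 pseudogenes in the last class map onto 177 active S. Enteritidis genes, so the pseudogene count and the gene count are not expected to match.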
S. Enteritidis, S. Dublin and S. Gallinarum form a related cluster of serovars but with marked differences in host-specificity, thus we also included S. Gallinarum in the pseudogene analysis. There is a single annotated genome sequence for this serovar that contains 309 pseudogenes [28] and among them only 21 are also annotated as pseudogenes in S. Dublin but not in S. Enteritidis (Table 6). This group of CDS includes nine that are also inactive (7) or completely absent (2) in the other host-restricted serovar S. Choleraesuis [37] and are described in Table 6.
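The selection of the nine candidate CDS can be reproduced directly from Table 6 by keeping the entries marked YES in the S. Choleraesuis column. The sketch below transcribes that column into a dictionary (True for YES, False for NO); the variable names are ours.

```python
# S. Choleraesuis status from Table 6: True = pseudogene or absent (YES),
# False = active gene (NO).
table6_choleraesuis = {
    "SEN0042": True,  "SEN0325": False, "SEN0621": False, "SEN0784": True,
    "SEN1194": False, "SEN1331": False, "SEN1335": False, "SEN1524": False,
    "SEN2173": False, "SEN2182": True,  "SEN2493": True,  "SEN2611": False,
    "SEN2783": True,  "SEN2806": True,  "SEN3461": True,  "SEN3537": False,
    "SEN3571": False, "SEN3672": True,  "SEN3954": False, "SEN4259": False,
    "SEN4290": True,
}

# CDS inactive in S. Dublin and S. Gallinarum that are also inactive or
# absent in S. Choleraesuis.
shared_with_choleraesuis = sorted(
    gene for gene, inactive in table6_choleraesuis.items() if inactive
)
print(len(table6_choleraesuis), len(shared_with_choleraesuis))
print(shared_with_choleraesuis)
```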
Overall, the presence of these nine pseudogenes could be regarded as a potential distinguishing marker of host-restricted serovars; we therefore decided to evaluate their sequences in all Uruguayan S. Dublin isolates obtained from human infections (4 strains analyzed by CGH as described above plus 3 other isolates, Table 1). We found that all 7 isolates have these 9 CDS inactivated as pseudogenes, either by the same point mutations that are present in the fully sequenced S. Dublin CT_02021853 strain (7 of the 9 CDS) or by a different deletion, as is the case for the CDS homologous to SEN2493 and SEN4290. Recently the genome sequence of another S. Dublin strain (S. Dublin 3246) was publicly released (GenBank: CM001151) [29]. We found that all 9 CDS are also pseudogenes in this strain. Further, in all but one of them the inactivation is due to the same changes as in S. Dublin CT_02021853. Interestingly, the exception is the CDS corresponding to SEN4290, which possesses the same deletion as the Uruguayan strains analyzed here.
4. DISCUSSION
S. Enteritidis and S. Dublin are two closely related serovars with marked differences in pathogenic traits and epidemiological behavior, thus it is reasonable to assume that genomic comparison between them could shed some light on the molecular basis of these differences. A single previous report described a microarray-based genome comparison [38], and here we conducted a similar analysis using a different set of field isolates and microarray chip. Further, we now report a comparison of the full genome sequences of S. Enteritidis and S. Dublin, looking particularly at differences in pseudogene composition between them.
Our comparative genome hybridization study predicted 33 genes specific to S. Enteritidis and 87 specific to S. Dublin. The analysis revealed four genetic regions and seven single genes that seem to be exclusive to the S. Enteritidis core genome, as well as six regions and six single genes specific to S. Dublin. These results corroborate and extend the previous report in which 3 S. Dublin and 24 S. Enteritidis strains were compared [38]. That report described the same four regions specific to Enteritidis but only one of the six S. Dublin regions found by us. This particular region, which we designated Du3, corresponds to regions B24, B25_a and B25_b of the earlier report. Region Du3 corresponds to genes found in SPI-6 from S. Typhi CT18. This region encodes a ClpB heat-shock protease-like protein, as well as different membrane proteins and lipoproteins that belong to the T6SS encoded in this island. Interestingly, this region includes a gene in the rhs family (STY0321) that has no homologue in the CT_02021853 genome sequence.
Among the other regions specific to S. Dublin described here, Region Du1 was recently proposed to be a pathogenicity island (SPI-19) identified in S. Gallinarum, S. Dublin, S. Weltevreden and S. Agona that encodes a type-6 secretion system (T6SS) [39]. In S. Enteritidis, an internal deletion has eliminated most of the island. Region Du2 includes various bacteriophage regulatory proteins, recombinases, transposases, and structural proteins. It also includes one gene (SG1186) previously annotated as encoding a putative phage-encoded cell division inhibitor protein belonging to the kil super-family and associated with the capacity to inhibit the essential ftsZ cell-division gene [40]. ftsZ expression is altered during the intracellular phase of infection with S. enterica, a process that is independent of sulA, a known inhibitor of ftsZ [41]. Genes encoding proteins belonging to the same super-family are also present in several S. Typhi genome sequences, as well as in other enterobacteria (e.g. different STEC strains, Shigella flexneri, Shigella dysenteriae and others) as revealed by Blast-p analysis, suggesting a possible role for these proteins in pathogenesis. Regions Du1 and Du2 were not represented in the microarray used by Porwollik and collaborators [38], thus we cannot exclude that these regions were also present in the strains studied there, but simply not found because of the particular microarray used. Regions Du4, Du5 and Du6 correspond to prophages found in the genome sequence of S. Typhi CT18 [16]. Region Du4 comprises 17 genes from a lambdoid bacteriophage that include several CDS encoding DNA binding proteins. Region Du5 includes 3 genes that are part of a degenerate bacteriophage; one of these (STY2044) encodes a putative endolysin similar to several lysozymes from E. coli and Shigella strains. Region Du6 spans 10 genes including a DNA adenine methylase (STY3667), regulatory proteins and endonucleases.
These 3 regions of difference were not found in the earlier report, despite the CDS being present in the microarray. Instead, that work reported differences in other prophage-derived genes. Thus, it could be that the genomes of the particular sets of strains used in the two studies possess different prophage composition. The analysis of Du4, Du5 and Du6 in both sequenced S. Dublin isolates revealed that regions Du5 and Du6 are highly conserved in both strains whereas region Du4 is almost complete in CT_02021853 but incomplete and less conserved in strain 3246, supporting the hypothesis of different phage gene content among S. Dublin isolates.
Among the seven single genes that are described here for the first time as absent in S. Dublin strains, safA (SEN0281) and dcp (SEN1539) are of special interest. safA is the first gene of the saf fimbrial operon and encodes a lipoprotein. The operon forms part of the degraded pathogenicity island SPI-6 in the S. Enteritidis chromosome. This operon is not annotated in the S. Dublin genome sequences available. However, Blast analysis revealed that this region is highly conserved at the nucleotide level between PT4 and both sequenced S. Dublin isolates. There are several stop codons in the S. Dublin sequence homologous to safA, suggesting that this gene is in the process of degradation. The fact that we could not detect safA by CGH in the Uruguayan S. Dublin isolates may be related to this. The dcp gene encodes dipeptidyl-carboxypeptidase II, which is highly conserved among the Enterobacteriaceae. This gene has been described previously as a frequent site for SNPs in S. Enteritidis [42], and it is absent from the CT_02021853 sequence.
Overall, the CGH analyses did not detect clear differences in genes that have been previously reported as required for virulence to explain the differences in pathogenicity of both serovars. However, the presence/absence of a gene, as detected by this methodology, does not inform about its expression, thus these results should be interpreted with caution.
The high number of pseudogenes detected in CT_02021853 suggests that this mechanism might be relevant in the process of host adaptation of this serovar, as well as in the different epidemiological and pathogenic behavior of S. Dublin when compared with S. Enteritidis. As we describe in Table 5, we observed a differential distribution of functionality amongst the CDS inactive in S. Enteritidis and S. Dublin. More than 40% of the pseudogenes specific to S. Enteritidis correspond to CDS related to phages or transposases but only 12% to those involved in metabolism and regulatory proteins. Conversely, among the pseudogenes specific to S. Dublin, 33% correspond to CDS encoding proteins involved in central metabolism or regulatory proteins and 37% to CDS related to surface structures but only 3% to phages and transposases. These observations may be relevant to understanding the host restriction of S. Dublin.
We found 21 CDS that appear to be active genes in the broad host-range S. Enteritidis but pseudogenes in the host-restricted S. Dublin and in the host-specific S. Gallinarum. Of this set of CDS, 9 are also inactive or absent in the other host-restricted serovar S. Choleraesuis, suggesting that their inactivation could be relevant as a genetic determinant of host adaptation. These nine CDS correspond to two hypothetical proteins (SEN0784 and SEN2783), one putative transport protein (SEN0042), the gene encoding the outer membrane usher protein LpfC (SEN3461), one probable phosphotransferase system permease (SEN3672), one gene encoding a putative Type I restriction modification system protein (SEN4290), and the gene encoding a probable glucarate dehydratase 2 (SEN2806 or ygcY). The other two genes that complete this list are mglA (SEN2182) and shdA (SEN2493), which are pseudogenes in S. Typhi CT18 and Ty2 as well as in S. Paratyphi A ATCC 9150 and S. Paratyphi A AKU_12601 [22]. ShdA is involved in colonization of Peyer’s patches by S. Typhimurium and in shedding of the bacteria after infection [43-45]. MglA is a galactoside transport ATP binding protein. The roles of these genes in the broad host-range of S. Enteritidis remain to be established.
All nine of these CDS are pseudogenes in the seven S. Dublin clinical isolates evaluated in this work, as well as in the other fully sequenced isolate S. Dublin 3246, suggesting that the loss of their functionality is not a consequence of random mutation. Two of these 9 pseudogenes in the Uruguayan isolates have lost their functionality by mutations that are different from those seen in the sequenced strain CT_02021853, suggesting that this loss of functionality involves a process of convergent evolution.
In conclusion, our results show several genetic differences that may help to explain why such closely related organisms can nevertheless behave so differently. Comparison of larger numbers of field strains at full genome scale is becoming increasingly feasible, and may provide new insights into the genetic basis of host adaptation.
CONFLICT OF INTEREST
None declared.
ACKNOWLEDGMENTS
This work was jointly supported by a project grant from the Wellcome Trust (078168/Z/05/Z) and by the Central Research Committee (CSIC) of the Universidad de la República, Uruguay. We would like to thank Gordon Dougan and Derek Pickard for their helpful advice.