
Genome analysis reveals Erwinia amylovora in China is closely related to those of Kyrgyzstan and Kazakhstan

Abstract

Erwinia amylovora, the causal agent of fire blight, is a destructive bacterial pathogen of plants in the Rosaceae family. The disease was first described in North America and has since spread to more than 50 countries. It has recently been reported in countries neighboring China, including Russia, South Korea, Kyrgyzstan, and Kazakhstan. In 2016, typical fire blight symptoms on pear shoots and leaves were observed in Xinjiang province. The disease later spread eastward to Gansu province, posing a threat to the national pear and apple industries. In this study, the fire blight pathogen was isolated from diseased plants, and whole genome sequencing was performed for 14 isolates from China (12 from Xinjiang and 2 from Gansu), 3 from Kyrgyzstan, and 2 from Kazakhstan. Phylogenomic analysis showed that the Chinese isolates are closely related to those from Kyrgyzstan and Kazakhstan. Genome annotation revealed that most gene families and metabolic pathways are conserved among E. amylovora isolates. These results provide useful information for understanding how fire blight emerged in China and what measures should be taken to eradicate the disease.

Background

Fire blight, caused by the Gram-negative bacterium Erwinia amylovora, is a devastating disease of rosaceous plants (e.g., apple and pear). Since its discovery, fire blight has been considered the most destructive disease of pome fruit trees and a major threat to apple and pear production. Typical symptoms begin with water soaking, followed by wilting and rapid necrosis, which leave infected tissues with a scorched, blackened appearance (Eastgate 2000). These symptoms can appear on all above-ground tissues, including blossoms, fruits, shoots, branches, and limbs, as well as on the rootstock near the graft union on the lower trunk (Zhao et al. 2019). Quarantine of E. amylovora imposes additional economic costs through phytosanitary measures (Krissoff and Calvin 1998).

Fire blight is indigenous to North America and has spread to more than 50 countries and regions across North America, Europe, North Africa, the Middle East, Oceania, and Asia. It has now reached Central and East Asia, the native region of apple germplasm resources, including Kyrgyzstan, Kazakhstan, and South Korea (Doolotkeldieva and Bobusheva 2016; Park et al. 2016). In 2016, fire blight was first observed on pear and apple trees in Yili, Xinjiang province, China (Wang et al. 2022). Since then, it has spread to most pear-producing regions of Xinjiang and to adjacent Gansu province, causing considerable yield reductions and economic losses to the apple and pear industries (Sun et al. 2023).

As of August 2024, 253 assembled E. amylovora genomes, at various assembly levels, were publicly available in GenBank. Comparative analysis of Erwinia genomes sequenced from different hosts enables the identification of virulence, host-specificity, and metabolic determinants involved in pathogenicity (Kamber et al. 2012). Comparative genomic analysis of Spiraeoideae- and Rubus-infecting strains of E. amylovora identified a pan-genome with a large, conserved core genome (Mann et al. 2013). The genomic features of Korean E. amylovora strains have also been investigated, and their evolutionary pathways were inferred by comparative genome analysis and phylogenomic reconstruction (Song et al. 2021). The first genome of a Chinese E. amylovora isolate was published in 2023 (Fei et al. 2023). In this study, we conducted a comparative genomic analysis of nineteen isolates from China, Kyrgyzstan, and Kazakhstan, providing clues to the origin and outbreak of fire blight in China.

Results

Overview of the genomic features

Fourteen Chinese E. amylovora isolates (twelve from Xinjiang and two from Gansu province), three Kyrgyz isolates, and two Kazakh isolates were obtained from symptomatic plant tissues and confirmed as E. amylovora by PCR with the primers p29A/B, which target the plasmid pEA29 (Additional file 1: Figure S1). The plasmid profiles of the 19 isolates were similar, except for that of XJ15 (Additional file 1: Figure S2). The genomes of these isolates were obtained by whole genome sequencing and deposited in the National Center for Biotechnology Information (NCBI) GenBank database under the accession numbers listed in Table 1.

Table 1 List of sequenced isolates from China and other countries

The basic genome features of the nineteen strains are shown in Additional file 2: Table S1 and Fig. 1. Genome sizes and gene numbers were highly similar among the isolates. The circular chromosomes of the nineteen genomes range from 3.83 to 3.99 Mb, with an average GC content of 53.5%. GS01 has the smallest genome, whereas HZ01 from Kazakhstan has the largest. The average number of predicted protein-coding genes was 3866 in the Chinese isolates, compared with 3839 and 4101 in the Kyrgyz and Kazakh isolates, respectively. The number of tRNA genes ranges from 67 to 90, and the number of sRNAs from 25 to 54.

Fig. 1

Circular representation of genomes of 19 Erwinia amylovora isolates. From outside to the center, circles represent CDS on forward strand, CDS on reverse strand, G + C content, GC skew and genome size
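The GC content (53.5% on average) and the GC skew track in Fig. 1 are simple per-window base counts. The Python sketch below shows one way to compute them, taking GC skew as (G − C)/(G + C). The window size, step, and toy sequence are hypothetical choices, not the settings used to draw Fig. 1, which are not stated.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def windowed(seq: str, window: int, step: int):
    """Yield (start, GC content, GC skew) per window; window and step are assumed values."""
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    toy = "GGGCCATATAGCGC" * 3  # hypothetical toy sequence, not E. amylovora data
    for start, gc, skew in windowed(toy, window=14, step=14):
        print(start, round(gc, 3), round(skew, 3))
```

On a real chromosome, a cumulative sum of per-window GC skew is commonly used to locate the replication origin and terminus, where the cumulative curve reaches its extremes.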

Phylogenetic analysis of E. amylovora isolates

A phylogenetic tree was constructed from the core genomes of the nineteen strains and previously reported E. amylovora isolates (Fig. 2). The tree showed close relationships and high sequence similarity among the isolates. The isolates from China, Kazakhstan, and Kyrgyzstan formed separate subbranches within one large branch, suggesting that, although their genome sequences are not identical, these isolates may have originated from a single source. In addition, the large branch containing the nineteen isolates was closest to isolates from Britain, Switzerland, and France, and next to those from North America (Fig. 2).

Fig. 2

Phylogenetic analysis of E. amylovora strains. The phylogenomic relationships were inferred with a core genome alignment of 31 E. amylovora strains
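The phylogeny was inferred from a core genome alignment with MEGA7, and the tree-building method is not specified here. As a hedged illustration of the general distance-based approach, the Python sketch below computes pairwise p-distances from aligned core sequences and joins them with UPGMA. UPGMA is a simple clustering method swapped in for clarity and is not necessarily the method used for Fig. 2; the four-sequence alignment in the test is hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(alignment: dict) -> str:
    """UPGMA tree as a Newick topology string (no branch lengths) from aligned sequences."""
    sizes = {name: 1 for name in alignment}  # cluster label -> number of leaves
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of clusters; ties are broken by label order.
        pair = min(dist, key=lambda k: (dist[k], sorted(k)))
        a, b = sorted(pair)
        na, nb = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the merged cluster is the size-weighted average of its parts.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                dist[frozenset((a, other))] * na + dist[frozenset((b, other))] * nb
            ) / (na + nb)
        # Drop distances that involve the two clusters just merged.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        sizes[merged] = na + nb
    return next(iter(sizes)) + ";"
```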

To further confirm the genetic relationships among these isolates, synteny plots were generated using MUMmer version 3.22 (Kurtz et al. 2004). Synteny between KG01 and six other genomes among the 19 isolates (KG02, KG03, XJ10, XJ12, XJ14, and GS01) was high (Fig. 3). These results further suggest that the 19 E. amylovora isolates are very closely related at the genomic level.

Fig. 3

Comparison of genomic sequences between isolates
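MUMmer detects synteny by anchoring alignments on maximal unique matches between two genomes. The Python sketch below is a much simplified stand-in, not MUMmer's algorithm: it uses fixed-length k-mers that occur exactly once in each sequence as anchors and reports how often consecutive anchors stay in order, a rough proxy for collinearity. Reverse-strand matches are ignored, and k and the example sequences are hypothetical.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict:
    """Map each k-mer occurring exactly once in seq to its start position."""
    starts = range(len(seq) - k + 1)
    counts = Counter(seq[i:i + k] for i in starts)
    return {seq[i:i + k]: i for i in starts if counts[seq[i:i + k]] == 1}


def synteny_anchors(a: str, b: str, k: int = 5) -> list:
    """(pos_in_a, pos_in_b) for k-mers unique in both sequences, sorted by pos_in_a."""
    ua, ub = unique_kmers(a, k), unique_kmers(b, k)
    return sorted((ua[kmer], ub[kmer]) for kmer in ua.keys() & ub.keys())


def collinear_fraction(anchors: list) -> float:
    """Fraction of consecutive anchors whose position in b also increases."""
    if len(anchors) < 2:
        return 0.0
    forward = sum(b2 > b1 for (_, b1), (_, b2) in zip(anchors, anchors[1:]))
    return forward / (len(anchors) - 1)
```

A fraction near 1.0 indicates largely collinear genomes, while rearrangements lower it; a real synteny plot would also chain anchors and inspect breakpoints.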

Gene function analysis

Between 3801 and 4187 protein-coding sequences were predicted in the 19 isolates. Gene functions were annotated by BLAST searches against several databases, including COG (Clusters of Orthologous Groups of proteins), KEGG (Kyoto Encyclopedia of Genes and Genomes), and GO (Gene Ontology).

Based on GO annotation, genes were assigned to the three GO categories: cellular component, molecular function, and biological process. In the cellular component category, genes are mainly assigned to cell part, membrane, macromolecular complex, and organelle. In the molecular function category, genes are mainly concentrated in catalytic activity, binding, transporter activity, nucleic acid binding transcription factor activity, and structural molecule activity (Fig. 4a). In the biological process category, genes are primarily assigned to cellular process, metabolic process, single-organism process, localization, and regulation of biological process.

Fig. 4

Genome function annotation of 19 E. amylovora strains. a Distribution of GO annotation; b Distribution of KEGG annotation; c Frequency of cluster genes within 19 genomes; d COG functional categories

Figure 4b shows the distribution of KEGG annotations. E. amylovora genes were most frequently assigned to membrane transport, amino acid, carbohydrate, energy, and nucleotide metabolism, cell motility, signal transduction, and cellular community. Cluster analysis further grouped the genes of the nineteen isolates into 4016 gene clusters, each representing a set of orthologous genes. As shown in Fig. 4c, 3095 gene clusters are present in all nineteen genomes and form the core genome. Roughly 92% of the genes in the genomes are identical.
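The core genome here is the set of gene clusters present in all nineteen genomes (3095 of 4016 clusters, or about 77% of clusters). A minimal Python sketch of this partition, and of the cluster-frequency histogram plotted in Fig. 4c, assuming cluster membership is available as a mapping from cluster ID to the set of genomes that carry it (a hypothetical input format):

```python
def partition_clusters(presence: dict) -> dict:
    """
    presence: cluster ID -> set of genome IDs carrying a member of that cluster.
    Core = in every genome; unique = in exactly one; accessory = everything else.
    """
    genomes = set().union(*presence.values())
    n = len(genomes)
    counts = {"core": 0, "accessory": 0, "unique": 0}
    for carriers in presence.values():
        if len(carriers) == n:
            counts["core"] += 1
        elif len(carriers) == 1:
            counts["unique"] += 1
        else:
            counts["accessory"] += 1
    return counts


def frequency_histogram(presence: dict) -> dict:
    """Number of clusters found in exactly k genomes, for each observed k."""
    hist = {}
    for carriers in presence.values():
        hist[len(carriers)] = hist.get(len(carriers), 0) + 1
    return dict(sorted(hist.items()))
```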

Within the core genome, COG annotation showed that amino acid transport and metabolism was the largest category (average 7.5% of genes), followed by cell wall/membrane/envelope biogenesis (average 5.8%), translation, ribosomal structure and biogenesis (average 5.7%), carbohydrate transport and metabolism (average 5.4%), and transcription (average 5.3%) (Fig. 4d).
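The category shares above are averages of per-genome percentages. The Python sketch below shows that arithmetic under two assumptions not stated in the text: each gene carries a single COG category letter (E, M, J, G, and K correspond to the five categories listed), and the denominator is the number of COG-assigned genes in each genome. The gene assignments in the test are hypothetical.

```python
from collections import Counter
from statistics import mean


def cog_percentages(assignments: list) -> dict:
    """Percentage of genes per COG category letter within one genome (one letter per gene assumed)."""
    counts = Counter(assignments)
    total = len(assignments)
    return {cat: 100 * n / total for cat, n in counts.items()}


def average_percentages(per_genome: list) -> dict:
    """Mean percentage per category across genomes, counting 0 where a genome lacks it."""
    cats = set().union(*per_genome)
    return {cat: mean(p.get(cat, 0.0) for p in per_genome) for cat in sorted(cats)}
```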

Secretory systems

Plant pathogenic bacteria may possess type I to type VI secretion systems, which play important roles in pathogenesis and environmental adaptation. Gram-negative bacteria such as Erwinia, Xanthomonas, Pectobacterium, and Ralstonia species usually have a type II secretion system (T2SS), an important virulence factor (Smits et al. 2010). The T2SS is mainly composed of the outCDEFHJKLMNOS-chiV proteins. The T2SS is present in all nineteen isolates, and its gene sequences are identical among them, indicating that the T2SS is highly conserved.

The type III secretion system (T3SS) is one of the most important pathogenicity factors of E. amylovora, enabling it to infect host plants. Bacteria translocate a large number of effectors into plant cells through the T3SS (Khokhani et al. 2013; Vrancken et al. 2013). The T3SS of plant pathogenic bacteria is mainly composed of Hrc and Hrp proteins. Of these, hrpW, hrpN, hrpS, hrpY, hrpX, hrpJ, hrpK, hrcC, hrcJ, hrcN, and hrcS are conserved among the isolates (Fig. 5a, Additional file 2: Table S2). All nineteen isolates were also virulent on young pear leaves and immature pear fruits (Additional file 1: Figure S3).

Fig. 5

a Physical map of type III secretion system; b Distribution of regulators of cyclic di-GMP; c Physical map of amylovoran biosynthesis gene cluster; d Gene clusters of secondary metabolites

Cell wall degrading enzymes

Plant pathogens use cell wall degrading enzymes, such as cellulases and pectinases, to break down the cellulose and pectin of plant cell walls. A large number of plant cell wall degrading enzyme genes were identified in the nineteen genomes (Additional file 2: Table S3 and Table S4), including polygalacturonase (GH28) and pectin lyase (PL1, PL3) genes. In addition, an arabinan endo-1,5-α-L-arabinosidase (GH43) gene is present in fourteen E. amylovora isolates but absent from the other five (Additional file 2: Table S5).

Flagellum synthesis and chemotactic gene analysis

E. amylovora is a nonobligate plant pathogen that can disseminate by various means and survive under unfavorable conditions (Santander et al. 2014). Motility has been shown to be closely related to bacterial transmission and host adhesion, and flagella-mediated motility and chemotaxis are important for colonization of plants. We systematically analyzed the flagellum synthesis genes in the nineteen genomes and identified complete flagellum synthesis gene clusters and 10–11 chemotaxis proteins in each of the nineteen E. amylovora isolates.

Signal transduction system

Two-component systems (TCSs) are the principal signal transduction systems mediating bacterial responses to environmental signals. The enterobacteria-specific Rcs phosphorelay system has been reported to be essential for pathogenicity and amylovoran production in E. amylovora (Lee et al. 2018). The Rcs system is an unusually complex TCS composed of three core proteins, RcsB, RcsC, and RcsD, and one auxiliary protein, RcsA, which does not carry a phosphorylation site (Ancona et al. 2015). The Rcs system is present and highly conserved in all nineteen E. amylovora genomes.

Signaling mediated by the second messenger cyclic di-GMP (c-di-GMP) plays a key role in the establishment and development of plant infections (Edmunds et al. 2013). C-di-GMP is synthesized by diguanylate cyclases (DGCs), which contain GGDEF domains, and degraded by phosphodiesterases (PDEs), which contain either an EAL or an HD-GYP domain. All three types of proteins were identified in the genomes of the nineteen isolates (Fig. 5b): seven GGDEF proteins, which are mainly involved in c-di-GMP synthesis, and eight EAL and six HD-GYP proteins, which are mainly involved in c-di-GMP degradation.
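The grouping above follows domain content. A hedged Python sketch of that logic, assuming domain names have already been assigned to each protein by a domain search (the input format and protein IDs are hypothetical). Proteins carrying both GGDEF and a PDE domain are labeled hybrid here, and domain presence alone does not establish catalytic activity.

```python
def classify_cdg_protein(domains: set) -> str:
    """
    Crude role assignment from domain names:
    GGDEF -> synthesis (DGC); EAL or HD-GYP -> degradation (PDE); both -> hybrid.
    """
    has_dgc = "GGDEF" in domains
    has_pde = bool(domains & {"EAL", "HD-GYP"})
    if has_dgc and has_pde:
        return "hybrid"
    if has_dgc:
        return "DGC"
    if has_pde:
        return "PDE"
    return "none"


def tally(proteome: dict) -> dict:
    """Count proteins per role; proteome maps protein ID -> set of domain names."""
    counts = {}
    for domains in proteome.values():
        role = classify_cdg_protein(domains)
        counts[role] = counts.get(role, 0) + 1
    return counts
```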

Exopolysaccharide biosynthesis

Bacterial exopolysaccharides (EPSs) are known to protect bacteria from hostile conditions (Ordax et al. 2010). Amylovoran and levan are two important exopolysaccharides produced by E. amylovora. Amylovoran contributes to pathogenesis and biofilm formation (Koczan et al. 2011). It is a heteropolymer of glucuronic acid and galactose synthesized by the products of the ams operon, which comprises twelve genes (amsA to amsL) located in a large gene cluster (Geider 2000). These genes are conserved in the nineteen genomes (Fig. 5c). The genes rcsB, rcsC, and yojN, which control amylovoran synthesis, are also highly conserved. Furthermore, rlsA and rlsB, the two primary regulatory genes of levan production, are conserved in all nineteen strains.

Determinant of iron intake

Iron serves as a cofactor for many proteins and is an essential nutrient for all living organisms. In bacteria, siderophore production and uptake, including the iron uptake receptors, are globally controlled in response to iron availability (Dellagi et al. 1998). The ferric uptake regulator (Fur) is present and conserved in all nineteen isolates, as are the dfoJAC genes for biosynthesis of the hydroxamate siderophore desferrioxamine E (DFO E) (Additional file 2: Table S6).

Uptake of siderophores depends on TonB-dependent receptors, which transport siderophores into the periplasmic space. Four TonB-dependent receptor genes are present in the genomes, including OprC, which encodes a copper receptor protein, and FoxR, which encodes the ferrioxamine receptor protein. The functions of the other two receptor proteins are unknown.

Comparative analysis of secondary metabolites

Plant pathogenic bacteria produce a variety of toxins and other secondary metabolites that help them infect host plants, synthesized via the nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) pathways. A total of twelve distinct NRPS/PKS gene clusters were predicted in the nineteen genomes using antiSMASH (Fig. 5d). These gene clusters showed varying degrees of similarity to previously identified secondary metabolite gene clusters (Additional file 2: Table S7).

Gene clusters 1, 2, and 3 were present in all nineteen strains and showed 75%, 9%, and 13% similarity to the desferrioxamine E, basiliskamide A/B, and anthelvencin A/B/C gene clusters, respectively. Gene cluster 4 was present in 11 isolates and showed 100% similarity to the gamexpeptide C gene cluster. Gene cluster 5 was identified in eight isolates and showed 100% similarity to the kolossin gene cluster. Gene cluster 6 was present in four isolates and showed 100% similarity to the rhizomide A/B/C gene cluster. Gene cluster 7 occurred only in XJ14 and showed 100% similarity to the xenotetrapeptide gene cluster. Gene cluster 8 was found only in KG03 and showed 13% similarity to the chitinimide A-G gene cluster. Gene cluster 9 occurred only in strain XJ19 and showed 20% similarity to the xenoamicin A/B gene cluster. Gene cluster 10 was present only in strain XJ23 and showed 33% similarity to the endopyrrole A/B gene cluster. Gene cluster 11 was found only in HZ01 and showed 50% similarity to the tolaasin F/I gene cluster. Gene cluster 12 was found only in HZ02 and showed 42% similarity to the gacamide A gene cluster (Additional file 2: Table S7).
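As we understand the antiSMASH documentation, the similarity percentages above (for example, 75% for desferrioxamine E) reflect the fraction of genes in the known reference cluster that have a significant BLAST hit among the genes of the predicted region. A minimal Python sketch of that ratio, with hypothetical gene names; rounding down is chosen here for simplicity and may differ from antiSMASH's own rounding.

```python
def known_cluster_similarity(reference_genes: list, genes_with_hits: set) -> int:
    """
    Percentage (rounded down) of genes in a known reference cluster that have a
    significant hit among the genes of the query region; 0 for an empty reference.
    """
    if not reference_genes:
        return 0
    hits = sum(1 for gene in reference_genes if gene in genes_with_hits)
    return 100 * hits // len(reference_genes)
```

Because the score counts genes rather than sequence identity, a low value such as 9% means only a small share of the reference cluster's genes has a match, not that the matching genes are dissimilar.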

Discussions

E. amylovora was included in the “Catalogue 366 of quarantine pests for import plants to the People's Republic of China” in 2007, and in 2020 fire blight was listed in the “National Class I pest list”. China has consistently implemented stringent quarantine measures against E. amylovora, including inspection of imported pear, apple, hawthorn, and other host plant seedlings and scions. Nonetheless, fire blight was first observed in China in 2016.

E. amylovora, one of the top ten plant pathogenic bacteria, is native to North America, where it was described in the 1780s, and spread to Eurasia in the latter half of the twentieth century. By the beginning of the twenty-first century, it had spread to countries neighboring China, including Russia (2003), Kazakhstan (2008), Kyrgyzstan (2008), and South Korea (2015). In China, fire blight was first observed in 2016 in Yili Autonomous Prefecture, Xinjiang province, which borders Kazakhstan and Russia. Within three years, the disease had spread to most regions of Xinjiang and to the adjacent Gansu province. However, the source of the outbreak in China and the route by which the disease spread remain unclear (Sun et al. 2023). To address these questions, we sequenced 19 E. amylovora isolates: 12 from Xinjiang province, two from Gansu province, three from Kyrgyzstan, and two from Kazakhstan. Comparative genomic analysis was performed on these isolates together with others, including isolates from Korea (Song et al. 2021).

Genome comparison showed that the genome sequences of the 19 isolates are highly similar, with about 92% of genes identical, and that most gene families and metabolic pathways are highly conserved. The phylogenomic tree showed that isolates from China are more closely related to those from Kyrgyzstan and Kazakhstan than to those from Europe and North America. Moreover, when the genomes of the 19 isolates were compared with the strains analyzed by Zeng et al. (2018), they fell within the “Widely-Prevalent” clade. We therefore speculate that the fire blight pathogen originated in North America, spread to Europe, and subsequently reached China via Kyrgyzstan and Kazakhstan, although this hypothesis requires further verification. Furthermore, because E. amylovora genomes are highly homogeneous, the genetic diversity among these isolates should be investigated further in the future.

Conclusion

In summary, our genome sequencing and annotation results should help in understanding the specific mechanisms of interaction between E. amylovora and its host plants, and in studying its metabolic pathways and molecular mechanisms at the genomic level. The genome sequences and annotations of the fire blight pathogen also provide clues about its epidemic in China.

Methods

Bacterial isolates and growth conditions

The E. amylovora isolates used in this study were isolated from fruits, trunk exudates, and branches of symptomatic trees in Xinjiang and Gansu provinces, China, and in Kyrgyzstan and Kazakhstan (Table 1). Tissues with typical symptoms were surface sterilized with 75% ethanol and air dried. Small pieces of tissue from lesion margins were ground in sterile water and left at room temperature for 5 min. Aliquots of the macerated tissue were streaked onto Luria–Bertani (LB) medium, incubated at 28°C for 24 h, and then subcultured onto LB medium to obtain single colonies. Pure isolates were tested for induction of a hypersensitive response by infiltrating fully developed leaves of Nicotiana tabacum cv. Samsun plants with bacterial suspensions (approximately 1 × 10⁸ colony-forming units (CFU)/mL) of 16-h-old cultures. Bacterial cells were cryopreserved in 25% glycerol at −80°C. All strains were routinely cultured in LB medium at 28°C.
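Adjusting a suspension to approximately 1 × 10⁸ CFU/mL is a C1V1 = C2V2 dilution. The text does not state how the stock concentration was determined, so the Python sketch below simply assumes a known stock concentration; the 1 × 10⁹ CFU/mL stock in the test is hypothetical.

```python
def dilution_volumes(stock_cfu_per_ml: float, target_cfu_per_ml: float,
                     final_volume_ml: float) -> tuple:
    """
    Return (stock_ml, diluent_ml) satisfying C1 * V1 = C2 * V2, where C1 is the
    (assumed known) stock concentration and V2 the desired final volume.
    """
    if target_cfu_per_ml > stock_cfu_per_ml:
        raise ValueError("stock is less concentrated than the target")
    stock_ml = target_cfu_per_ml * final_volume_ml / stock_cfu_per_ml
    return stock_ml, final_volume_ml - stock_ml
```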

Whole genome sequencing and assembly

Whole genome sequences of the nineteen E. amylovora isolates (Table 1) were determined on an Illumina HiSeq 4000 system (BGI, Shenzhen, China). Genomic DNA was fragmented with a Bioruptor ultrasonicator (Diagenode, Denville, NJ, USA) and physicochemical methods to construct two libraries with insert sizes of 300–400 bp and 5–6 kb. Paired-end libraries were sequenced according to the Illumina HiSeq 4000 protocol. Low-quality paired-end reads were discarded, namely reads containing a certain proportion of low-quality bases (quality ≤ 20) or of Ns (10% by default), and reads with adapter or duplication contamination. The filtered reads were assembled using SOAPdenovo v1.05 (http://soap.genomics.org.cn/soapdenovo.html).
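The read-filtering rules above can be sketched in Python as follows. The source gives the N threshold (10% by default) and the low-quality cutoff (quality ≤ 20) but not the allowed proportion of low-quality bases, so that threshold is a required, caller-supplied parameter. Treating a read exactly at a threshold as failing, and decoding qualities as Phred+33, are also assumptions; adapter and duplicate removal are not shown.

```python
def phred33(qual_string: str) -> list:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in qual_string]


def keep_read(seq: str, quals: list, max_low_q_frac: float,
              max_n_frac: float = 0.10, low_q: int = 20) -> bool:
    """
    Keep a read only if its fraction of low-quality bases (Q <= low_q) and its
    fraction of N bases are both below their thresholds. max_low_q_frac has no
    default because the source does not give one; reads exactly at a threshold
    are discarded (assumption).
    """
    n = len(seq)
    if n == 0 or n != len(quals):
        return False
    low_frac = sum(q <= low_q for q in quals) / n
    n_frac = seq.upper().count("N") / n
    return low_frac < max_low_q_frac and n_frac < max_n_frac


def keep_pair(read1: tuple, read2: tuple, **thresholds) -> bool:
    """Paired-end mates are kept or discarded together; each read is (seq, quals)."""
    return keep_read(*read1, **thresholds) and keep_read(*read2, **thresholds)
```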

Genome annotation

Protein-coding genes in the assembled E. amylovora genomes were predicted using RAST (Rapid Annotation using Subsystem Technology) (Aziz et al. 2008). Gene prediction was also performed on the assembled genomes with glimmer3 (http://www.cbcb.umd.edu/software/glimmer/), which uses interpolated Markov models. tRNA, rRNA, and sRNA genes were identified with tRNAscan-SE, RNAmmer, and the Rfam database, respectively. For functional annotation, the best BLAST hit for each gene was retained. General function annotation used the KEGG, COG, NR (Non-Redundant Protein Database), Swiss-Prot, GO, TrEMBL, and EggNOG databases.
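Best-hit annotation can be sketched as selecting, for each query gene, the highest-scoring line of BLAST tabular output (-outfmt 6, whose twelve default columns end with evalue and bitscore). This Python sketch assumes ties in bit score are broken by the lower e-value and applies no e-value cutoff, since none is given in the text; the hit lines in the test are hypothetical.

```python
def best_hits(lines) -> dict:
    """
    Best hit per query from BLAST tabular (-outfmt 6) lines. Default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    Returns query -> (subject, evalue, bitscore); highest bitscore wins, then lower e-value.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        current = best.get(query)
        if current is None or (bitscore, -evalue) > (current[2], -current[1]):
            best[query] = (subject, evalue, bitscore)
    return best
```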

Gene families were constructed from reference genes by integrating several tools: protein sequences were aligned with BLAST, redundancy was eliminated with Solar, and the alignment results were clustered into gene families with Hcluster_sg. The antiSMASH program (Medema et al. 2011) was used to identify secondary metabolite gene clusters. The dbCAN software was used to predict glycoside hydrolase, two-component system, lipopolysaccharide biosynthesis, flagellum, and chemotaxis genes.
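Hcluster_sg clusters a BLAST similarity graph hierarchically. As a simplified, hedged stand-in, the Python sketch below groups genes into families as connected components of that graph (single linkage) using a union-find structure. It assumes the hit pairs were already filtered by identity and coverage thresholds, which are not stated in the text, and the gene IDs in the test are hypothetical.

```python
def cluster_families(genes: list, hit_pairs: list) -> list:
    """
    Single-linkage gene families: connected components of the graph whose edges
    are (already filtered) BLAST hit pairs. Every gene in hit_pairs must appear
    in genes. Returns families as sorted lists, themselves sorted.
    """
    parent = {gene: gene for gene in genes}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hit_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two components

    families = {}
    for gene in genes:
        families.setdefault(find(gene), []).append(gene)
    return sorted(sorted(members) for members in families.values())
```

Single linkage can chain distantly related genes through intermediates, which is one reason dedicated tools add weighting or hierarchical cutoffs.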

Whole genome alignments analysis

The genomes of the nineteen E. amylovora isolates were compared with those of previously reported E. amylovora strains (Table 1), including CFBP 1430 from France, ATCC49946 from the USA, E-2 from Belarus, CFBP 1232 from England, Ea266 from Canada, ATCCBAA-2158 from the USA, ACW56400 from Switzerland, LA637 from Mexico, and Ea356 from Germany. Conserved genes of all strains were compared pairwise using BLAST, and a whole-genome phylogenetic tree was generated from the genome sequences using MEGA7.

Synteny between each of the nineteen isolates and E. amylovora strain CFBP 1430 was analyzed using MUMmer v3.22. Core and pan genes were clustered with CD-HIT (rapid clustering of similar proteins) using a threshold of 50% pairwise amino acid identity and a length difference cutoff of 0.7.
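CD-HIT clusters greedily: sequences are processed from longest to shortest, and each joins an existing cluster if it meets the identity threshold against that cluster's representative, with the length difference cutoff requiring the shorter sequence to be at least 70% of the representative's length here. The Python sketch below mimics that procedure with the 0.5 and 0.7 thresholds from the text, but approximates identity as longest-common-subsequence length divided by the shorter length instead of CD-HIT's word filtering and alignment. The sequences in the test are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0]
        for j, ch_b in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def greedy_cluster(seqs: dict, identity: float = 0.5, length_cutoff: float = 0.7) -> dict:
    """
    CD-HIT-style greedy clustering (identity approximated by LCS / shorter length).
    Sequences are visited longest first; each joins the first representative it
    matches, otherwise it becomes a new representative.
    Returns representative ID -> list of member IDs.
    """
    clusters = {}
    for sid in sorted(seqs, key=lambda s: (-len(seqs[s]), s)):
        seq = seqs[sid]
        for rep in clusters:
            rep_seq = seqs[rep]
            if len(seq) / len(rep_seq) < length_cutoff:
                continue  # too short relative to this representative
            if lcs_length(seq, rep_seq) / len(seq) >= identity:
                clusters[rep].append(sid)
                break
        else:
            clusters[sid] = [sid]  # no match: start a new cluster
    return clusters
```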

Availability of data and materials

The datasets generated and/or analysed during the current study are available in the NCBI repository, https://dataview.ncbi.nlm.nih.gov/object/PRJNA1087690?reviewer=4du0h9s0pan9dvrj71gk6alluk.

Abbreviations

COG:

Cluster of Orthologous Groups of proteins

GO:

Gene Ontology

KEGG:

Kyoto Encyclopedia of Genes and Genomes

References


Acknowledgements

Not applicable.

Funding

This work was supported by Major Science and Technology Projects of Xinjiang (2023A02006).

Author information

Authors and Affiliations

Authors

Contributions

YT and BH designed the research; YZ, YT, and YZ prepared the materials, JW, SW, XL, and XG analyzed the data and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yanli Tian or Baishi Hu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1: Figure S1.

Identification of Erwinia amylovora isolates by PCR using the specific primer pair p29A/B. Figure S2. Plasmid profiling of Erwinia amylovora isolates. Figure S3. Virulence assay results.

Additional file 2:  Table S1.

Basic genomic features of the sequenced isolates. Table S2. Analysis of the T3SS components derived from 19 Erwinia amylovora isolates. Table S3. Analysis of carbohydrate esterases and polysaccharide lyases derived from 19 Erwinia amylovora isolates. Table S4. Analysis of glycoside hydrolases derived from 19 Erwinia amylovora isolates. Table S5. Distribution of plant cell wall degrading enzymes. Table S6. Analysis of hydroxamate siderophore desferrioxamine E biosynthesis genes derived from 19 Erwinia amylovora isolates. Table S7. Distribution of gene clusters of secondary metabolites among the 19 genomes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Wang, J., Zhao, Y., Wu, S. et al. Genome analysis reveals Erwinia amylovora in China is closely related to those of Kyrgyzstan and Kazakhstan. Phytopathol Res 7, 31 (2025). https://doi.org/10.1186/s42483-025-00316-6



  • DOI: https://doi.org/10.1186/s42483-025-00316-6

Keywords