Data paper
Issue: № 3 (21), 2023


Pseudomonas shirazica (a synonym of Pseudomonas asiatica) is a highly diverse bacterial species with important phenotypic traits relevant to human activities. We sequenced and assembled the genome of P. shirazica strain HY376 using long-read sequencing. The assembly resulted in a circular chromosome of 5,683,422 bp with 62.6% G+C content. The annotation revealed 5,181 genes, including 4,951 coding genes, 130 pseudogenes, 22 rRNAs, 74 tRNAs, and four non-coding RNAs. Additionally, it uncovered gene multiplicity and intragenomic variations in key rRNA genes, which are important for molecular taxonomy. This study adds the eighth complete-level genome of the species to GenBank, joining two complete-level P. shirazica and five P. asiatica genomes.

1. Announcement

The genus Pseudomonas, formally described by Migula in 1894, is one of the most diverse and ubiquitous bacterial genera. Its species have been isolated worldwide from all types of environments, and the genus encompassed 313 validly described species as of June 2023. The species Pseudomonas shirazica was delineated from the P. putida clade in 2019 by Keshavarz-Tohid et al., and strain VM14 was proposed as the type strain. Pseudomonas shirazica Keshavarz-Tohid et al. 2020 was included in Validation List No. 194 in 2020. Independently, P. asiatica was delineated from the same clade in 2019 by Tohya et al. Pseudomonas shirazica Keshavarz-Tohid et al. 2020 was assigned as a later heterotypic synonym of Pseudomonas asiatica Tohya et al. 2019. P. shirazica/P. asiatica encompasses strains that were isolated from various environments and exhibit a variety of phenotypes. P. shirazica is characterized by Gram-negative cells forming white colonies, approximately 3 mm in diameter, on King's B medium after 24 h of growth. The cells are motile rods, 1.6–5 µm in length, with a growth temperature optimum of 28 °C and an optimum pH of 7.0.

P. shirazica strain HY376 was grown in 50 ml flasks containing 10 ml of PYE medium at 37 °C. Cells in the exponential growth phase were collected by centrifugation (20 min at 4500 rpm), washed with saline and used for DNA isolation. High molecular weight genomic DNA for long-read sequencing with the ONT MinION was extracted using guanidinium thiocyanate-Triton X-100 lysis followed by proteinase K and RNase A enzymatic treatments. Subsequently, a Circulomics Nanobind Plant Nuclei Big DNA Kit was used, and the DNA was further size selected with the Circulomics SRE XL kit. The library was prepared using the ONT SQK-LSK110 kit following the manufacturer's instructions, with a final selection of long DNA fragments using the Long Fragment Buffer (LFB). Sequencing was performed on a MinION R9.4.1 flow cell. Basecalling was performed with Guppy v6.0.1+652ffd179 (GPU mode) using the super-accurate model (sup) and the --config dna_r9.4.1_450bps_sup.cfg option. A total of 73318 "passed" reads were obtained (790400113 bases in total; minimum length: 131 bp; maximum length: 254026 bp; average length: 10780.44 bp). The reads were filtered with NanoFilt using the parameters -q 14 -l 30000 --headcrop 30 --tailcrop 30 to select the best reads (Q ≥ 14, length ≥ 30 kb) while retaining an expected coverage of about 40×. This yielded 4926 reads, totaling 247722674 bases (minimum length: 29950 bp, maximum length: 253966 bp, average length: 52284.23 bp). De novo genome assembly was performed using the Flye v2.9-b1774 assembler. The final assembly consisted of a single circular contig with an average coverage of 42×.
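The read-selection step described above can be illustrated with a minimal Python sketch. It is not NanoFilt's implementation: the `Read` type and the `mean_quality` and `filter_reads` helpers are hypothetical names introduced here, the mean quality is assumed to be derived from per-base error probabilities, and the length threshold is assumed to be applied before cropping. The default values mirror the reported parameters (-q 14 -l 30000 --headcrop 30 --tailcrop 30).

```python
import math
from typing import Iterable, Iterator, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical in-memory read: name, bases, per-base Phred scores."""
    name: str
    seq: str
    quals: List[int]


def mean_quality(quals: List[int]) -> float:
    """Mean Phred score computed from the average per-base error probability."""
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def filter_reads(
    reads: Iterable[Read],
    min_quality: float = 14.0,
    min_length: int = 30_000,
    headcrop: int = 30,
    tailcrop: int = 30,
) -> Iterator[Read]:
    """Keep long, high-quality reads and trim both ends of each kept read."""
    for read in reads:
        if not read.quals:
            continue  # skip empty records
        # Assumption: thresholds are checked on the untrimmed read.
        if len(read.seq) < min_length:
            continue
        if mean_quality(read.quals) < min_quality:
            continue
        end = len(read.seq) - tailcrop
        yield Read(read.name, read.seq[headcrop:end], read.quals[headcrop:end])


if __name__ == "__main__":
    # Toy reads with a lowered length threshold, for demonstration only.
    toy = [
        Read("keep", "A" * 100, [20] * 100),
        Read("too_short", "A" * 50, [20] * 50),
        Read("low_quality", "A" * 100, [5] * 100),
    ]
    for kept in filter_reads(toy, min_length=80):
        print(kept.name, len(kept.seq))
```

Under the assumed order of operations, a read of exactly 30000 bp would be trimmed to 29940 bp, which is compatible with the reported minimum length of 29950 bp after filtering.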

The assembled genome was annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) version 6.1. The annotation method was best-placed reference protein set, with GeneMarkS-2+ as the gene prediction tool.


The genome assembly consists of a circular chromosome of 5,683,422 bp with a G+C content of 62.6%. Of the 5,181 annotated genes, 4,951 are coding genes, 130 are pseudogenes, and 100 are RNA genes: 22 rRNA genes (8 copies of 5S, 7 copies of 16S, and 7 copies of 23S), 74 tRNA genes, and 4 non-coding RNA genes. Whole-genome sequencing revealed intragenomic variations among the copies of the ribosomal RNA genes. Of particular interest is the 16S ribosomal RNA gene, whose first three copies are identical. We designated the first copy (locus 343032..344568) as the reference. The fourth and sixth copies share the same SNP, g.343488C>T, while the fifth copy has the SNP g.344061A>G. The seventh and last copy carries two SNPs, g.343488C>T and g.343163G>A. Similarly, the 23S ribosomal RNA genes exhibit multiple intragenomic variations. Among the eight copies of the 5S ribosomal RNA gene, seven are identical, while the eighth copy has two SNPs.
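Because the 16S SNPs are reported in the genomic coordinates of the reference copy, it can be convenient to convert them to positions within the gene. The Python sketch below does this arithmetic. The `gene_position` helper is a hypothetical name, and the conversion assumes that positions are counted from the start coordinate of the locus in the direction of the genomic coordinates, which the text does not state explicitly.

```python
import re
from typing import Tuple

# Coordinates of the reference 16S rRNA copy, as given in the text.
REF_START, REF_END = 343_032, 344_568

# SNPs reported for the 16S rRNA copies, in genomic coordinates.
SNPS = ["g.343488C>T", "g.344061A>G", "g.343163G>A"]


def gene_position(
    snp: str, start: int = REF_START, end: int = REF_END
) -> Tuple[int, str, str]:
    """Convert 'g.<pos><ref>><alt>' to a 1-based position within the locus."""
    match = re.fullmatch(r"g\.(\d+)([ACGT])>([ACGT])", snp)
    if match is None:
        raise ValueError(f"unrecognized SNP notation: {snp}")
    pos = int(match.group(1))
    if not start <= pos <= end:
        raise ValueError(f"{snp} lies outside {start}..{end}")
    # Assumption: offsets are counted from the locus start coordinate.
    return pos - start + 1, match.group(2), match.group(3)


if __name__ == "__main__":
    print("locus length:", REF_END - REF_START + 1)  # 1537
    for snp in SNPS:
        offset, ref, alt = gene_position(snp)
        print(f"{snp} -> position {offset} ({ref}>{alt})")
    # g.343488C>T -> position 457 (C>T)
    # g.344061A>G -> position 1030 (A>G)
    # g.343163G>A -> position 132 (G>A)
```

The reference locus spans 1537 bp, and all three reported SNP positions fall inside it.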

As mentioned above, the species P. asiatica/P. shirazica exhibits broad phenotypic diversity, including traits that are highly relevant to practical human activities, such as pathogenicity and hydrocarbon degradation. To understand this diversity comprehensively and to harness it for biotechnology, a pangenomic approach is employed, which involves sequencing the genomes of multiple strains of a single species and generating high-quality assemblies. Currently, the GenBank repository includes two complete-level genomes of P. shirazica and five genomes of P. asiatica. The genome presented in this article adds to this collection and represents the eighth genome of the species.

2. Data availability

The genome sequence of P. shirazica strain HY376 has been deposited in the GenBank database under accession number CP127845 (version CP127845.1). The corresponding BioProject accession number is PRJNA982540, and the BioSample accession number is SAMN35709901. The raw sequencing data can be accessed through the Sequence Read Archive (SRA) under accession number SRR25007542.
