Research Article
Corresponding authors: Zachary Lahey (zachary.lahey@usda.gov), Norman F. Johnson (baeus2@yahoo.com). Academic editor: Elijah Talamas
© 2023 Zachary Lahey, Huayan Chen, Mark Dowton, Andrew D. Austin, Norman F. Johnson.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Lahey Z, Chen H, Dowton M, Austin AD, Johnson NF (2023) The genome of the egg parasitoid Trissolcus basalis (Wollaston) (Hymenoptera, Scelionidae), a model organism and biocontrol agent of stink bugs. Journal of Hymenoptera Research 95: 31-44. https://doi.org/10.3897/jhr.95.97654
Trissolcus basalis (Wollaston) is a minute parasitic wasp that develops in the eggs of stink bugs. Over the past 30 years, Tr. basalis has become a model organism for studying host finding, patch defense behavior, and chemical ecology. As an entry point to better understand the molecular basis of these factors, and to fill a critical gap in the genomic resources available for parasitic Hymenoptera, we sequenced and assembled the genome of Tr. basalis using short-read (454, Illumina) and long-read (Oxford Nanopore) sequencing technologies. The three sequencing methods produced 32 million reads (4.10 Gb; 27.9×), which were assembled into 7,586 scaffolds. The 147 Mb (N50: 42.8 kb) assembly contains complete sequences for 93.1% of the insect BUSCO dataset, and an extensive annotation protocol resulted in 14,158 protein-coding gene models, 12,197 (86%) of which had a BLAST hit in GenBank. Repetitive elements comprised 13.8% of the genome, and a phylogenomic analysis recovered Tr. basalis as sister to Chalcidoidea, a result in line with other studies. We identified 174 rapidly evolving gene families in Tr. basalis, including olfactory receptors and pheromone/general odorant binding proteins. These genetic elements are an obligatory portion of the parasitoid-host relationship, and the draft genome of Tr. basalis has been, and will continue to be, useful in elucidating these relationships at finer resolution.
Keywords: assembly, biological control, insect genomics, nanopore, Telenominae
Trissolcus basalis (Wollaston) (Hymenoptera: Scelionidae) is a minute, solitary parasitoid of stink bug eggs (Hemiptera: Pentatomoidea), principally the cosmopolitan pest Nezara viridula (L.) (Pentatomidae). This parasitoid is found primarily in tropical and subtropical regions, where it has been used effectively in the biological control of its host (
454 Life sciences
Sequencing followed the protocol of
To correct homopolymer errors in the 454 reads, an Illumina sequencing library was prepared from five female Tr. basalis from the same culture. The DNA extract was prepared for Illumina sequencing using a Nextera DNA Sample Preparation Kit (Epicentre Biotechnologies, Madison, Wisconsin, USA). Sequencing was conducted on an Illumina Genome Analyzer IIx (Illumina, San Diego, California, USA) at the Nucleic Acid Shared Resource (College of Medicine, The Ohio State University, Columbus, Ohio, USA). In total, 29,780,645 51-bp reads (1,518,812,895 bp) were generated.
High molecular weight DNA was extracted from approximately 100 unsexed Tr. basalis using a Gentra Puregene Tissue Kit (Qiagen, Hilden, Germany) following the manufacturer’s protocol. DNA quality was assessed using an Agilent Bioanalyzer. The DNA library was prepared using a Ligation Sequencing Kit 1D. Sequencing was performed on an R9.5 flow cell using an Oxford Nanopore MinION (Oxford Nanopore, Oxford, United Kingdom). The 48-hour MinION sequencing run generated 341,751 reads (1,047,061,835 bp). All steps, excluding DNA extraction, were conducted at The Molecular and Cellular Imaging Center (MCIC; The Ohio State University, Wooster, Ohio, USA).
Pyrosequencing reads are particularly susceptible to the accumulation of homopolymer errors (
The Tr. basalis genome was assembled following a hybrid approach that utilized short (454, Illumina) and long read (Oxford Nanopore) sequencing technologies. 454 and nanopore reads were assembled with SPAdes (version 3.11.1;
Genome statistics were calculated with QUAST (version 4.5;
The Tr. basalis genome was annotated following the protocol of Daren Card (Department of Organismic & Evolutionary Biology, Harvard University), with modifications (https://gist.github.com/zjlahey/3c400c3039eef674e335d3d850ad595f).
Repetitive elements were identified and annotated with RepeatModeler (version open-2.0.1;
Protein-coding genes were annotated in an iterative fashion with MAKER (version 3.01.03;
We followed the protocol on Rfam (https://docs.rfam.org/en/latest/genome-annotation.html) to identify and annotate non-coding RNAs with Infernal (version 1.1.3;
To estimate gene gains, losses, and rapidly evolving gene families within Tr. basalis, we conducted a gene family analysis using the Tr. basalis proteome and the protein sequences of six additional hymenopterans. Taxa were chosen based on the availability of hymenopteran proteomes and included three members of Proctotrupomorpha [Belonocnema kinseyi Weld (Cynipidae), Nasonia vitripennis (Walker) (Pteromalidae), and Trichogramma pretiosum Riley (Trichogrammatidae)]; one member of Ichneumonoidea [Microplitis demolitor Wilkinson (Braconidae)]; one member of Orussoidea [Orussus abietinus (Scopoli) (Orussidae)]; and the turnip sawfly, Athalia rosae (L.) (Tenthredinidae). Protein sequences of A. rosae, O. abietinus, M. demolitor, N. vitripennis, and T. pretiosum were downloaded from OrthoDB v10 (
Orthogroup inference was conducted with OrthoFinder (version 2.5.2;
Rates of gene gain and loss (λ) were estimated with CAFE (version 4.2.1;
This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession JAAMPD000000000. Raw DNA sequencing reads (454, Illumina, Nanopore) are available at the Sequence Read Archive by searching for BioProject Accession PRJNA49235.
We assembled the genome of Tr. basalis de novo using sequence data from second- and third-generation sequencing technologies. The combined read output from all sequencing platforms totaled 4.10 Gb (27.9× coverage). These reads were assembled into 7,568 scaffolds, totaling 147 Mb in length (34.7% GC content). The scaffold N50 was 42.8 kb, and the longest scaffold measured 349,262 bp. Given low read coverage, we were unable to estimate genome size in silico. However, the Tr. basalis draft assembly size falls within the range of genome size estimates of other platygastroids, which are typically between 200 and 400 Mb (data not shown), in addition to the average genome size range of other hymenopterans. We assessed genome assembly completeness with BUSCO, using the Insecta odbv10 database (N = 1,367) in genome mode and with the ‘long’ flag enabled to perform a more thorough search. We recovered 93.1% complete, 1.8% duplicated, 1.4% fragmented, and 3.7% missing Insecta BUSCOs in the Tr. basalis genome. These values compare favorably with other parasitoid Hymenoptera with more contiguous genome assemblies (Fig.
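The N50 and coverage figures reported above follow from simple definitions, which the sketch below illustrates in Python. It is not the QUAST implementation; the function names and the toy scaffold lengths are hypothetical, and only the 4.10 Gb and 147 Mb totals are taken from the text.

```python
def n50(lengths):
    """Return the N50 of a set of scaffold lengths.

    The N50 is the length L such that scaffolds of length >= L
    together contain at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def mean_coverage(total_read_bases, assembly_bases):
    """Mean sequencing depth: total sequenced bases divided by assembly size."""
    return total_read_bases / assembly_bases


# Toy scaffold lengths (hypothetical): total 150, so half is 75.
# 50 + 40 = 90 reaches the halfway point, giving an N50 of 40.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))

# Totals from the text: 4.10 Gb of reads over a 147 Mb assembly.
print(round(mean_coverage(4.10e9, 147e6), 1))
```

With the totals from the text, the second call reproduces the reported 27.9× coverage.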
Morphological and genomic traits of Trissolcus basalis. A. Head of female Tr. basalis reared from BMSB eggs in Tuscaloosa, Alabama, USA (FSCA 00090269). B. Repeat landscape plot of different TE classes within the Tr. basalis genome. Nucleotide sequence divergence in each TE copy was calculated as the Kimura distance between the annotated TE copies in the genome and the consensus sequence of each TE family. C. Ultrametric timetree depicting the position of Tr. basalis relative to six other hymenopterans inferred from a phylogenetic analysis of 4,510 single-copy protein-coding genes identified by OrthoFinder. Numbers above branches (left to right, separated by forward slashes) indicate gene family expansions, gene family contractions, and the number of rapidly evolving gene families in each lineage. Each branch received 100% SH-aLRT and UFBoot2 support values. D. Genome assembly completeness comparison based on the proportion of BUSCOs recovered in each genome using the Insecta odbv10 dataset (N = 1367). Abbreviations: BMSB, brown marmorated stink bug; C, complete; D, duplicated; DNA, DNA transposon; F, fragmented; LINE, long interspersed nuclear element; LTR, long terminal repeat; M, missing; mya, million years ago; RC, rolling circle transposon; S, single-copy; SINE, short interspersed nuclear element; TE, transposable element; Aros, Athalia rosae; Bkin, Belonocnema kinseyi; Mdem, Microplitis demolitor; Nvit, Nasonia vitripennis; Oabi, Orussus abietinus; Tbal, Trissolcus basalis; Tpre, Trichogramma pretiosum.
RepeatMasker annotated 13.8% of the Tr. basalis genome as composed of repeats, approximately half of which were unclassified repeats (7.1%). The most abundant classified repetitive elements were various LINE and LTR retroelements (3.0%); DNA transposons (1.3%); and simple repeats (1.5%). The repeat landscape of Tr. basalis shows a relatively uniform distribution of repeat classes, with a gradual decline in the proportion of LTR retroelements and an increase in the proportion of DNA transposons (Fig.
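Repeat landscapes of this kind plot the divergence of each TE copy from its family consensus as a Kimura distance. The Python sketch below shows the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. It assumes two pre-aligned sequences of equal length; the function name is hypothetical, and any additional corrections applied by the tools used here (for example, at CpG sites) are not reproduced.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites: P = 0.1, Q = 0.
print(kimura_2p("AAAAAAAAAA", "GAAAAAAAAA"))
```

Under this model, a single transition in ten aligned sites yields a distance slightly above the raw 10% difference, because the formula corrects for unobserved multiple substitutions.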
The MAKER genome annotation pipeline resulted in 14,158 protein-coding gene models. Approximately 95% (13,507) of the 14,158 gene models have an annotation edit distance (AED) score of less than 0.5, and 70% (9,915) contain at least one recognizable InterPro domain. AED is a quality control metric that measures how well the gene annotations produced by MAKER match external evidence (i.e., proteomes from other species); it ranges from 0 (complete agreement) to 1 (no supporting evidence). The AED values and proportion of gene annotations with a recognizable InterPro domain for the Tr. basalis genome are indicative of a well-annotated assembly (Holt & Yandell, 2011). In addition, nearly half of the protein set (6,929 or 48.9%) was assigned at least one gene ontology (GO) term. To determine how well our annotated protein set compares with external protein databases, we queried our protein annotations against those of the metazoan portion of the Swiss-Prot/UniProt database and all Hymenoptera protein sequences deposited in GenBank (last accessed March 17, 2021). A total of 9,303 (65%) and 12,197 (86%) of the annotated proteins in Tr. basalis were supported by a best BLASTp hit in the Swiss-Prot/UniProt database and GenBank, respectively. A table of the most frequently recovered InterPro domains, GO terms, and Pfam entries associated with the Tr. basalis protein set is available in the Suppl. material
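As a rough illustration of the AED metric, the Python sketch below computes it at the nucleotide level as one minus the mean of sensitivity and specificity, the definition commonly given for AED. This is a simplified sketch and not MAKER's own code: the function names and the representation of exons and evidence as 1-based inclusive (start, end) intervals are assumptions made for the example.

```python
def covered_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def annotation_edit_distance(annotation_exons, evidence_blocks):
    """Nucleotide-level AED between a gene model and its aligned evidence.

    AED = 1 - (sensitivity + specificity) / 2, where
      sensitivity = shared bases / evidence bases
      specificity = shared bases / annotated bases
    0 means perfect agreement; 1 means no overlap with evidence.
    """
    annotated = covered_positions(annotation_exons)
    evidence = covered_positions(evidence_blocks)
    if not annotated:
        raise ValueError("annotation has no bases")
    if not evidence:
        return 1.0  # nothing supports the model
    shared = len(annotated & evidence)
    sensitivity = shared / len(evidence)
    specificity = shared / len(annotated)
    return 1.0 - (sensitivity + specificity) / 2.0


# Hypothetical gene model and evidence alignment overlapping by half.
print(annotation_edit_distance([(1, 100)], [(51, 150)]))
```

In this hypothetical case the model and the evidence share half of their bases, which places the model exactly at the 0.5 threshold used above to summarize annotation quality.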
Seventy-five different RNA families were annotated in the Tr. basalis genome. The five most common families were tRNA (RF00005), Histone3 (RF00032), 5S_rRNA (RF00001), SSU_rRNA_eukarya (RF01960), and LSU_rRNA_eukarya (RF02543). We also identified both conserved regions of the Sphinx long non-coding RNA gene, which plays a role in the regulation of male mating behavior in the fruit fly Drosophila melanogaster Meigen (
We compared the annotated proteome of Tr. basalis with those of six other hymenopterans with well-annotated genomes. Orthogroup clustering performed with OrthoFinder assigned 81,474 (93.4%) of the 87,222 protein sequences to 11,205 orthogroups. The number of orthogroups with all species present was 6,295, and 4,510 of these were identified as single-copy orthologues. Regarding Tr. basalis, 81.8% (11,582) of its genes were assigned to an orthogroup, and 76.7% (8,599) of orthogroups contained Tr. basalis. The number of orthogroups specific to Tr. basalis was 173, and the number of genes within these 173 species-specific orthogroups was 1,026 (7.2% of all 14,158 Tr. basalis genes). The number of unassigned genes in Tr. basalis was much higher than in the taxa with which it was compared. Potential explanations for this discrepancy are (1) the fragmentary nature of the Tr. basalis draft assembly leading to truncated protein models and (2) inaccurate gene annotations. Increasing genome contiguity using additional long-read sequencing technologies and chromosome conformation capture would decrease the incidence of truncated protein models, and manual curation of the gene models would aid in the identification of false positives.
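The orthogroup categories summarized above (all species present, single-copy, species-specific) can be derived from a table of per-species gene counts. The Python sketch below illustrates that classification under stated assumptions: the function name, the dictionary layout, and the toy counts are hypothetical and do not reproduce OrthoFinder's output files; the species labels follow the abbreviations used in the figure.

```python
def summarize_orthogroups(counts, species, focal):
    """Classify orthogroups from per-species gene counts.

    counts  : dict mapping orthogroup ID -> {species: gene count}
    species : list of all species in the analysis
    focal   : species for which species-specific orthogroups are reported
    """
    all_species = []      # orthogroups with every species present
    single_copy = []      # subset with exactly one gene per species
    focal_specific = []   # orthogroups containing only the focal species
    focal_specific_genes = 0
    for og, per_species in counts.items():
        present = [s for s in species if per_species.get(s, 0) > 0]
        if len(present) == len(species):
            all_species.append(og)
            if all(per_species.get(s, 0) == 1 for s in species):
                single_copy.append(og)
        elif present == [focal]:
            focal_specific.append(og)
            focal_specific_genes += per_species[focal]
    return {
        "all_species": all_species,
        "single_copy": single_copy,
        "focal_specific": focal_specific,
        "focal_specific_genes": focal_specific_genes,
    }


# Toy counts (hypothetical), using three of the figure's abbreviations.
toy_counts = {
    "OG1": {"Tbal": 1, "Nvit": 1, "Aros": 1},
    "OG2": {"Tbal": 3, "Nvit": 1, "Aros": 2},
    "OG3": {"Tbal": 4},
    "OG4": {"Nvit": 2, "Aros": 1},
}
print(summarize_orthogroups(toy_counts, ["Tbal", "Nvit", "Aros"], "Tbal"))
```

In the toy table, OG1 and OG2 contain all species, only OG1 is single-copy, and OG3 is a species-specific orthogroup contributing four genes for the focal species.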
The orthogroup count data and ultrametric timetree produced by OrthoFinder were used to estimate the rate of gene family evolution with CAFE. We estimated the rate of gene family evolution (gains and losses) in this group of Hymenoptera at 0.0008, after accounting for possible genome assembly/annotation error. This result is in line with a recent multi-order gene family analysis that reported the rate of gene family gain and loss in 24 hymenopteran taxa at 0.0009 (
We identified 174 (99 expansions and 75 contractions) rapidly evolving gene families in Tr. basalis, with most (91) rapidly expanding families containing at least one member with an InterPro, PANTHER, or Pfam annotation (Suppl. material
While this manuscript was in preparation,
We thank Dr. Malte Petersen (Max Planck Institute of Immunobiology and Epigenetics) for sharing the R script used to generate the repeat landscape plot. Thanks to Dr. Jason Mottern (USDA-APHIS) and an anonymous reviewer for their careful and thoughtful review of the manuscript. This material is based upon work supported in part by the National Science Foundation under grant No. DEB-0614764 to N.F. Johnson and A.D. Austin and by funding from The Ohio State University, and the National Natural Science Foundation of China (31900346) to Huayan Chen.
Genome of the egg parasitoid Trissolcus basalis (Wollaston) (Hymenoptera, Scelionidae), a model organism and biocontrol agent of stink bugs
Data type: genomic (Excel document)
Explanation note: Bioinformatic data associated with the annotated Trissolcus basalis genome assembly.