Research Article
Corresponding author: Darren Ward (wardda@landcareresearch.co.nz). Academic editor: Jose Fernandez-Triana
© 2023 Darren Ward, Brent Martin.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ward D, Martin B (2023) Trialling a convolution neural network for the identification of Braconidae in New Zealand. Journal of Hymenoptera Research 95: 95-101. https://doi.org/10.3897/jhr.95.95964
Computer vision approaches, such as deep learning, potentially offer a range of benefits to entomology, particularly for the image-based identification of taxa. An experiment was conducted to gauge the ability of a convolutional neural network (CNN) to identify genera of Braconidae from images of forewings. A deep learning CNN was trained via transfer learning on a small set of 488 images representing 57 genera. Three-fold cross-validation achieved an accuracy of 96.7%, demonstrating that forewings are highly predictive for identification to genus. Further work is needed to extend coverage to species level and to increase the number of images available.
Braconidae, computer vision, diagnostics, identification, model
Insect populations are challenging to study. One of the main problems is identifying species, which is especially difficult in hyperdiverse groups such as Hymenoptera and is compounded by uneven knowledge of biodiversity around the world (
Recent studies on image-based insect identification are showing that deep learning models can extract features from images and learn to differentiate species to an accuracy approaching, or exceeding, human expertise (
Beyond identification and diagnostics, images are also being combined with further automation and/or robotics to undertake field sampling, process routine laboratory samples, or extract data from images (
In this paper, we test the ability of a convolutional neural network to classify genera of Braconidae that are present in New Zealand using images of the forewing.
All specimens are from the New Zealand Arthropod Collection (
Pinned specimens were selected to represent genera of Braconidae that have been recorded from New Zealand. These include genera that are endemic (restricted to New Zealand), native (present in New Zealand but also occurring naturally elsewhere), accidentally introduced through human trade, or intentionally introduced for biological control.
Taxa (and the number of images) are: Aleiodes (10); Alysia manducator (Panzer, 1799) (10); Apanteles (12); Aphaereta aotea Hughes & Woolcock, 1976 (8); Aphidius colemani Viereck, 1912 (11); Ascogaster elongata Lyle, 1923 (10); Asobara persimilis (Papp, 1977) (10); Aspicolpus (10); Aspilota parecur Berry, 2007 (8); Austrohormius (10); Bracon phylacteophagus Austin, 1989 (4); Bracon variegator Spinola, 1808 (10); Caenophanes sp5 (11); Choeras helespas Walker, 1996 (9); Chorebus rodericki Berry, 2007 (10); Cotesia (10); Cryptoxilos thorpei Shaw & Berry, 2005 (10); Dacnusa areolaris (Nees, 1811) (10); Diaeretiella rapae (McIntosh, 1855) (10); Dinocampus coccinellae (Schrank, 1802) (10); Dinotrema longworthi Berry, 2007 (10); Diolcogaster (10); Dolichogenidea tasmanica (Cameron, 1912) (10); Doryctomorpha antipoda Ashmead, 1900 (10); Eadya daenerys Ridenbaugh, 2018 (2); Eubazus (10); Glyptapanteles (10); Habrobracon hebetor (Say, 1836) (10); Kauriphanes (6); Kiwigaster variabilis Fernandez-Triana & Ward, 2011 (9); Lysiphlebus testaceipes (Cresson, 1880) (5); Macrocentrus rubromaculata (Cameron, 1901) (10); Metaspathius (7); Meteorus pulchricornis (Wesmael, 1835) (9); Microctonus hyperodae Loan, 1974 (9); Microplitis (10); Monolexis fuscicornis Förster, 1862 (3); Neptihormius (10); Notogaster charlesi Fernandez-Triana & Ward, 2020 (10); Ontsira antica (Wollaston, 1858) (10); Opius sp2 (10); Pauesia nigrovaria (Provancher, 1888) (8); Pholetesor (5); Pronkia sp4 (9); Pseudosyngaster pallidus (Gourlay, 1928) (10); Rasivalva (2); Rhyssaloides (9); Sathon sp1 (7); Schauinslandia (10); Shireplitis bilboi Fernandez-Triana & Ward, 2013 (2); Shireplitis frodoi Fernandez-Triana & Ward, 2013 (3); Spathius exarator (Linnaeus, 1758) (10); Syntretus (10); Taphaeus (10); Therophilus (5); Trioxys (10); Venanides (10); and Xynobius (10).
Some genera were not included because they are wingless or have greatly reduced wings, or because too few specimens were available.
The aim was to obtain 10 specimens from each genus, but this was not always possible. The average number of forewings removed per genus was 8.6 (range 5–12, median 10). To remove a wing, the specimen was placed in a specimen manipulator and a micropin was used to gently move the tegula up and down until the forewing detached. Wings were not pulled off because the membrane tears easily. Static electricity caused the wing to cling to the micropin and forceps, which made it easy to transfer into a gelatin capsule. Once all wings had been removed, they were slide-mounted in Euparal.
The specimen records, all images (zip folder), and one representative image of each genus are freely available via the datastore repository (https://doi.org/10.7931/xftx-6w25).
Images of the slide-mounted wings were taken with a Nikon DS-Ri2 camera (16.25 megapixels) on a Nikon SMZ25 stereomicroscope. No focus stacking was used. Images were cropped and edited in Adobe Photoshop (Fig.
The following pre-processing corrections were applied to each image (Fig.
A few images were excluded from the analyses because the wing had torn during slide mounting or because the image was judged to be of poor quality (e.g. colouration problems or debris on the wing), problems that had not been noticed when the wings were first removed.
Transfer learning was used to train an Xception network that had been pre-trained on the ImageNet image set (www.image-net.org). The images were split into three folds by stratified round-robin cross-validation, with each split using 2/3 of the images for training and 1/3 for testing. The fully connected classification layers were first trained for 200 epochs, followed by a further 200 epochs of fine-tuning of all parameters. The learning rate was fixed at 0.0001, and the Adam optimiser was used to adjust the update magnitude automatically; for this dataset, this scheme produced a very smooth learning curve that plateaued at around 200 epochs, reducing the need for validation sets to determine the optimal cut-off. Images were randomly augmented during training to reduce the risk of overfitting and to allow for variation in image conditions that may arise in future use. Augmentation was conservative because the images were highly standardised. The augmentations were: random horizontal and vertical shifts of up to 10%; random zoom of up to ±10%; and random rotation of up to ±25 degrees.
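The fold construction and augmentation ranges described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`stratified_round_robin_folds`, `train_test_splits`, `sample_augmentation`) are hypothetical, images are represented only by integer indices and genus labels, and shuffling each genus before dealing is an assumption, since the text does not say how images within a genus were ordered. Each genus is dealt across the three folds in turn, so every fold holds roughly a third of every genus, and each fold serves once as the test set while the other two form the training set.

```python
import random
from collections import defaultdict


def stratified_round_robin_folds(labels, n_folds=3, seed=0):
    """Assign image indices to folds, dealing each genus out in turn
    so that every fold receives a near-equal share of every genus."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # assumption: random order within a genus
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)  # round-robin deal
    return folds


def train_test_splits(folds):
    """Yield (train, test) index lists: each fold is the test set once
    (about 1/3 of images); the remaining folds form the training set."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def sample_augmentation(rng, max_shift=0.10, max_zoom=0.10, max_rotation_deg=25.0):
    """Draw one random set of augmentation parameters within the stated ranges."""
    return {
        "shift_x": rng.uniform(-max_shift, max_shift),  # fraction of image width
        "shift_y": rng.uniform(-max_shift, max_shift),  # fraction of image height
        "zoom": rng.uniform(1.0 - max_zoom, 1.0 + max_zoom),  # scale factor
        "rotation_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
    }
```

Because this sketch starts dealing every genus at the first fold, that fold collects any remainder images and tends to be the largest; rotating the starting fold between genera would even out fold sizes. If the network were built with Keras (also an assumption), comparable augmentation ranges could be expressed as `ImageDataGenerator(width_shift_range=0.1, height_shift_range=0.1, zoom_range=0.1, rotation_range=25)`.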
A total of 488 wings, representing 57 genera, were used. Cross-validation gave an overall accuracy of 96.7% (472/488; Table
Cross-validation fold | Correct images / total images | Accuracy |
---|---|---|
1 | 182/188 | 96.81% |
2 | 152/156 | 97.44% |
3 | 138/144 | 95.83% |
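As a check on the arithmetic in the table above, the short Python sketch below recomputes the per-fold percentages and the overall figure. It assumes, consistent with the 472/488 reported in the text, that the overall accuracy pools correct predictions across folds rather than averaging the three fold percentages; the function names are illustrative only.

```python
# (correct, total) per cross-validation fold, copied from the table above
FOLD_RESULTS = [(182, 188), (152, 156), (138, 144)]


def percent(correct, total):
    """Accuracy as a percentage."""
    return 100 * correct / total


def pooled_accuracy(results):
    """Overall accuracy: total correct divided by total images across all folds."""
    correct = sum(c for c, _ in results)
    total = sum(t for _, t in results)
    return correct, total, percent(correct, total)


def mean_fold_accuracy(results):
    """Unweighted mean of the per-fold percentages (differs when fold sizes differ)."""
    return sum(percent(c, t) for c, t in results) / len(results)


for fold, (c, t) in enumerate(FOLD_RESULTS, start=1):
    print(f"fold {fold}: {c}/{t} = {percent(c, t):.2f}%")
correct, total, pct = pooled_accuracy(FOLD_RESULTS)
print(f"pooled: {correct}/{total} = {pct:.1f}%")
print(f"mean of folds: {mean_fold_accuracy(FOLD_RESULTS):.2f}%")
```

The pooled value (472/488, about 96.72%) rounds to the reported 96.7%; the unweighted mean of the three fold percentages is slightly lower (about 96.69%) because the folds differ in size.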
Misclassified images, giving the correct genus of each specimen and the genus predicted by the model. Confidence scores represent the model's confidence that its prediction is correct; rows are sorted from highest to lowest score.
Catalog number | Correct | Predicted | Confidence score |
---|---|---|---|
NZAC02012114 | Glyptapanteles | Dolichogenidea | 0.997 |
NZAC02011921 | Doryctomorpha | Caenophanes | 0.964 |
NZAC02012115 | Glyptapanteles | Sathon | 0.765 |
NZAC02011668 | Aphaereta | Asobara | 0.65 |
NZAC02012085 | Shireplitis | Venanides | 0.649 |
NZAC02012113 | Glyptapanteles | Dolichogenidea | 0.597 |
NZAC02012063 | Pholetesor | Sathon | 0.567 |
NZAC02012084 | Shireplitis | Venanides | 0.56 |
NZAC02012039 | Sathon | Glyptapanteles | 0.545 |
NZAC02011790 | Caenophanes | Doryctomorpha | 0.535 |
NZAC02011933 | Neptihormius | Metaspathius | 0.525 |
NZAC02012117 | Glyptapanteles | Dolichogenidea | 0.508 |
NZAC02011792 | Caenophanes | Doryctomorpha | 0.497 |
NZAC02012038 | Sathon | Shireplitis | 0.471 |
NZAC02012088 | Shireplitis | Venanides | 0.395 |
NZAC02011984 | Aleiodes | Doryctomorpha | 0.293 |
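The text does not say how the confidence scores were computed. A common convention for CNN classifiers such as Xception, assumed here, is a softmax output layer: the predicted class is the one with the highest probability, and the confidence is that probability. The sketch below illustrates that convention in plain Python; the logits and the three-genus class list are invented for illustration and are not taken from the model.

```python
import math


def softmax(logits):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names):
    """Return the highest-probability class and its probability (the 'confidence')."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical logits for three genera, for illustration only
genera = ["Glyptapanteles", "Dolichogenidea", "Sathon"]
label, confidence = predict([1.2, 2.0, 0.3], genera)
print(label, round(confidence, 3))
```

Under this convention the scores are relative to the classes the model already knows, so a high value does not guarantee a correct answer: the two highest-scoring errors in the table above were predicted with confidences of 0.997 and 0.964.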
This small experiment suggests that forewings are highly predictive for genus-level identification. The model's accuracy is particularly notable given the very small number of images, as hundreds or even thousands of images are often needed to build such models. For example,
Two main questions need to be addressed in future work. First, how well does a single species (or morphospecies) represent a genus? Several of the genera above are monotypic, and in some genera the forewing morphology will differ very little between species, but for genera with higher species diversity this is unlikely to hold. However, this was an initial trial of the technology, and as more species-level image sets become available, genus-level identification will become less relevant. Second, how well will the model perform when additional species or genera are added? Adding more ‘classes’ (taxa) is likely to increase the morphological variability in the dataset, which may reduce model accuracy and require more source images to compensate (
Machine learning methods, particularly convolutional neural networks (CNNs), are fast becoming valuable tools for insect identification (
At present, the major hurdle is the shortage of images (
Funded by the Ministry of Business, Innovation and Employment (MBIE) through the Strategic Science Investment Fund (SSIF) for Nationally Significant Collections and Databases (NSCDs) at Manaaki Whenua-Landcare Research (MWLR) via the Biota Portfolio (BIO) within the Research Priority Area (RPA) for Collections and Databases. Thanks to S. Malysheva for slide-mounting the wings. Many thanks to the taxonomic experts who have identified specimens and worked on the New Zealand braconid fauna over the years, especially Sergey Belokobylskij, Donald Quicke, and Jose Fernandez-Triana.
Funded by the Ministry of Business, Innovation and Employment (MBIE) through the Strategic Science Investment Funding (SSIF) for Nationally Significant Collections and Databases.