[orcid=0000-0003-0624-4877]
[1] College of Science and Engineering, James Cook University, Townsville, QLD 4818, Australia
[2] Agriculture Technology and Adoption Centre, James Cook University, Townsville, QLD 4814, Australia
[orcid=0000-0001-9718-4464]
[orcid=0000-0001-7975-3985]
Ethan Kane Waters (ethan.waters@jcu.edu.au), Carla Chia-Ming Chen (carla.ewels@jcu.edu.au), Mostafa Rahimi Azghadi (mostafa.rahimiazghadi@jcu.edu.au)
Abstract
Disease detection in sugarcane, particularly the identification of asymptomatic infectious diseases such as Ratoon Stunting Disease (RSD), is critical for effective crop management. This study employed various machine learning techniques to detect the presence of RSD in different sugarcane varieties, using vegetation indices derived from freely available satellite-based spectral data. Our results show that the Support Vector Machine with a Radial Basis Function Kernel (SVM-RBF) was the most effective algorithm, achieving classification accuracy between 85.64% and 96.55%, depending on the variety. Gradient Boosting and Random Forest also demonstrated high performance, achieving accuracy between 83.33% and 96.55%, while Logistic Regression and Quadratic Discriminant Analysis showed variable results across different varieties. The inclusion of sugarcane variety and vegetation indices was important in the detection of RSD, in agreement with the current literature. Our study highlights the potential of satellite-based remote sensing as a cost-effective and efficient method for large-scale sugarcane disease detection and as an alternative to traditional manual laboratory testing methods.
Keywords: Sugarcane; Health Monitoring System; Remote Sensing; Satellite-based Spectroscopy; Machine Learning; Vegetation Indices; Disease & Pests
1 Introduction
Disease presents a formidable hurdle to maximising yield in the sugarcane industry, with Ratoon Stunting Disease (RSD) emerging as a key contributor globally (Chakraborty et al., 2024). The bacterium Leifsonia xyli subsp. xyli (Lxx) is the primary causal agent of RSD, which is primarily spread through contaminated cutting implements (Davis et al., 1984). RSD infection can result in substantial yield losses (up to 60%), depending on sugarcane variety and water availability during periods of growth (Bailey and Bechet, 1986; Chakraborty et al., 2024; Davis and Bailey, 2000; Croft et al., 2000). This results in pronounced economic repercussions, with an estimated annual loss of $25 million observed across 87,000 hectares monitored in Australia in 2019 (Magarey et al., 2021). The lack of external symptoms (Chakraborty et al., 2024; Davis and Bailey, 2000; Croft et al., 2000; Sugar Research Australia, 2021) has made the detection and management of RSD a significant challenge. Detection relies heavily on laboratory diagnostic techniques such as PCR, qPCR, LAMP, or LSB-qPCR (Carvalho et al., 2016; Fegan et al., 1998; Ghai et al., 2014; Young et al., 2016). Collecting the field samples required for these techniques is both time-consuming and costly, especially for large-area detection. This underscores the need to develop a more efficient method for large-scale detection of RSD.
Previous research has demonstrated success in using spectroscopy to detect disease and pests in sugarcane by leveraging subtle differences often imperceptible to the naked eye (Bao et al., 2021). Spectroscopy measures the electromagnetic radiation reflected from an object, providing insights into its chemical composition, molecular structure, and physical properties (Lu et al., 2020) that may indicate the presence of disease. However, most studies in the literature utilised handheld spectrometers (Bao et al., 2024; Grisham et al., 2010; Ong et al., 2023; Bao et al., 2021; Vargas et al., 2016; Abdel-Rahman et al., 2010; Soca-Muñoz et al., 2020), which are challenging to scale up for large field detection. Hence, recent studies have shifted towards drone or satellite-based observations (Moriya et al., 2017; Apan et al., 2004; Narmilan et al., 2022; Simões and Rios do Amaral, 2023; Johansen et al., 2014, 2018), acknowledging the need for larger-scale diagnostic methods. Despite this, no studies to date have utilised satellite-based multispectral imaging for disease detection in sugarcane. Although three previous studies have explored the diagnosis of asymptomatic sugarcane diseases with machine learning and spectroscopy, all three relied on handheld spectrometers (Bao et al., 2021, 2024; Grisham et al., 2010). No current research has applied large-scale remote sensing techniques to detect asymptomatic diseases in sugarcane.
Reference | Health Condition | Spectroscopy | Category | ML Algorithm | Best Classification Accuracy |
Apan et al. (2004, 2003) | Orange Rust | Hyperspectral | Satellite | LDA | 96.90% |
Moriya et al. (2017) | Mosaic | Hyperspectral | Drone | SID | 92.50% |
Grisham et al. (2010) | SCYLV | Hyperspectral | Handheld Spectrometer | LDA | 73% |
Narmilan et al. (2022) | White Leaf Disease | Multispectral | Drone | RF, DT, KNN, XGB | 92%, 91%, 92%, 92% |
Simões and Rios do Amaral (2023) | Orange & Brown Rust | Multispectral | Drone | RF, KNN, SVM | 90%, 90%, 90%; 86%, 83%, 88% |
Ong et al. (2023) | Brown Stripe & Ring Spot | Hyperspectral | Handheld Spectrometer | RF, SVM, NB | 95%, 85%, 77% |
Johansen et al. (2014) | Cane Grub | Multispectral | Satellite | GEOBIA | 79% |
Johansen et al. (2018) | Cane Grub | Multispectral | Satellite | GEOBIA | 98.7% |
Bao et al. (2021, 2024) | Smut & Mosaic | Hyperspectral | Handheld Spectrometer | CNN | >90% |
Vargas et al. (2016) | Diatraea saccharalis | Hyperspectral & Multispectral | Handheld Spectrometer & Satellite | N/A | 79.8% & 85.5% |
Abdel-Rahman et al. (2010) | Thrips | Hyperspectral | Handheld Spectrometer | N/A | N/A |
Soca-Muñoz et al. (2020) | Orange & Brown Rust | Hyperspectral & Multispectral | Handheld Spectrometer & Drone | N/A | N/A |
Freely available satellite-based remote sensing offers a cost-effective and efficient alternative to the traditional, resource-intensive methods of identifying and managing disease in sugarcane (Waters et al., 2024). In particular, free, publicly accessible multispectral satellite data can alleviate the financial burden of purchasing expensive spectrometers or imagery, and promote the widespread adoption of this advanced technology.
Spectroscopy data can be converted into a vegetation index to accentuate the characteristics of vegetation that matter for a specific task (Xue and Su, 2017; Fang and Liang, 2014). The suitability of a vegetation index for a particular application depends on both the spectrometer and the project objective (Xue and Su, 2017). The majority of the sugarcane disease and pest detection studies (Narmilan et al., 2022; Apan et al., 2004; Simões and Rios do Amaral, 2023; Vargas et al., 2016; Soca-Muñoz et al., 2020; Johansen et al., 2014, 2018) leverage vegetation indices to improve performance. This was particularly prominent in the study by Apan et al. (2004), which developed several new vegetation indices for sugarcane disease detection with a focus on water and vegetation stress.
The application of machine learning (ML) algorithms to spectroscopy data has emerged as a promising approach to aid disease management in sugarcane (Table 1). A wide range of algorithms has been employed in this field, including XGBoost (XGB), Random Forest (RF), Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Linear Discriminant Analysis (LDA), and Neural Networks (NN) (Narmilan et al., 2022; Grisham et al., 2010; Bao et al., 2021; Apan et al., 2004, 2003; Ong et al., 2023; Simões and Rios do Amaral, 2023). Despite the diversity in approaches, only a handful of studies have systematically compared ML performance (Narmilan et al., 2022; Ong et al., 2023; Simões and Rios do Amaral, 2023). Among the algorithms, RF is the most common and consistently achieves high classification accuracy (Narmilan et al., 2022; Ong et al., 2023; Simões and Rios do Amaral, 2023). Similarly, SVM has demonstrated strong performance (Narmilan et al., 2022; Simões and Rios do Amaral, 2023). However, each method has been explored too sparsely to draw robust conclusions about the relative effectiveness of these algorithms. This highlights the need for continued comprehensive comparisons across multiple datasets and conditions to better understand their relative effectiveness in sugarcane disease management. Additionally, few studies have accounted for the impact of different sugarcane varieties on model performance, which warrants further exploration given the increased variation introduced by classifying several varieties (Grisham et al., 2010; Abdel-Rahman et al., 2010; Simões and Rios do Amaral, 2023).
The aim of this research is therefore to investigate the efficacy of various machine learning algorithms in classifying asymptomatic RSD across several sugarcane varieties using freely available satellite data, and to identify which vegetation indices, if any, are important for the diagnosis of RSD. To date, this study represents the first effort to classify an asymptomatic disease in sugarcane through satellite-based spectroscopy (Waters et al., 2024). It is also the first instance of classifying RSD with machine learning and spectroscopy (Waters et al., 2024).
2 Methodology
2.1 Overall Process Summary
The project comprises three stages: ground-truth data collection, data preprocessing, and ML development. In stage 1, RSD data is collected from sugarcane blocks. In stage 2, Sentinel-2 images with pre-applied atmospheric correction undergo preprocessing before pixelwise values are extracted and labelled from raw spectral bands and vegetation indices. In stage 3, the dataset is divided into an 80:20 train-test split, with 10-fold cross-validation applied to the training set for hyperparameter tuning. The top-performing models are subsequently evaluated on the test set by bootstrapping with 5000 samples. Permutation testing is then performed to ensure model performance is significantly better than a null distribution. The steps are detailed in the following sections.
2.2 Study Location & Ground Truth Data Collection
Sampling occurred between February and March 2022 across 72 sugarcane blocks situated in the Herbert region of Queensland, Australia. The acquisition of a ground-truth dataset detailing the disease status and variety of sugarcane was conducted by trained field agronomists at Herbert Cane Productivity Services Limited (HCPSL).
Two types of sugarcane blocks were sampled: farmer-owned blocks and HCPSL-owned seed production blocks. HCPSL-owned seed production blocks exist to give farmers the opportunity to purchase disease-free sugarcane seed, and are therefore subject to strict hygiene and disease testing requirements. HCPSL followed their standard RSD sampling protocol for both types of blocks. In seed production plots, samples were collected every 20m in a grid pattern. In farmer-owned blocks, 12 samples were collected from the four corners of the block, and four from randomly selected locations within the block. The juice was then extracted from the samples and underwent qPCR testing at Sugar Research Australia (SRA) laboratories. A block was classified as RSD positive if any sample within that block returned an RSD-positive result. To avoid potential false negatives in farmer-owned blocks due to sparse sampling, only HCPSL seed production blocks were used for RSD-negative samples.
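The block-level labelling rule can be illustrated with a short Python sketch. The block names, sample results and helper functions below are hypothetical and are not taken from the HCPSL dataset; the sketch only mirrors the two rules stated above (any positive sample makes a block positive, and negative blocks are retained only if they are seed production blocks).

```python
from typing import Dict, List


def classify_block(sample_results: List[bool]) -> str:
    """Label a block RSD positive if any qPCR sample tested positive."""
    return "RSD Positive" if any(sample_results) else "RSD Negative"


def keep_block(label: str, is_seed_block: bool) -> bool:
    """Keep every positive block; keep negative blocks only from seed production plots."""
    return label == "RSD Positive" or is_seed_block


# Hypothetical qPCR outcomes per block (True = Lxx detected in that sample).
blocks: Dict[str, dict] = {
    "farmer_1": {"seed_block": False, "qpcr": [False, True, False, False]},
    "farmer_2": {"seed_block": False, "qpcr": [False, False, False, False]},
    "seed_1": {"seed_block": True, "qpcr": [False, False, False, False]},
}

block_labels = {name: classify_block(info["qpcr"]) for name, info in blocks.items()}
retained = [
    name
    for name, info in blocks.items()
    if keep_block(block_labels[name], info["seed_block"])
]
```

In this toy example the negative farmer-owned block is discarded, because sparse sampling cannot rule out an undetected infection.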
The shapefile describing the sampled blocks and their sugarcane varieties was provided by HCPSL. In total, data were collected from 72 blocks, which comprised five sugarcane varieties: Q200, Q208, Q240, Q253 and SRA14.
2.3 Multispectral Satellite Image Acquisition and Data Labelling
This study utilized the European Space Agency’s (ESA) Sentinel-2 satellite series, which offers freely available products for the wavelengths necessary to calculate the vegetation indices listed in Table 3. “Level-2A” products are provided for all required bands at a 20m resolution and have had atmospheric correction applied by ESA with the Sen2Cor process (European Space Agency, ). To streamline data processing, a Python script was developed as a pipeline to extract all spectral bands from the “Level-2A” satellite image data products provided by Sentinel-2 via the “Sentinel Hub Process API.”
The multispectral data used in the study were captured by Sentinel-2 on 27th February 2022, as it had the least cloud coverage during the sampling period. Since RSD is transmitted through contaminated tools during planting or harvesting, and the study period was during the plant growth period, we assumed that the infection status remained unchanged during the study period, hence only one day of image data was used in this study.
The Sentinel-2 data processing pipelines used the QGIS Python API to transform the coordinate reference system (CRS) of the Sentinel-2 products to match the CRS of the ground truth data shapefile representing the farms of interest. All Sentinel-2 pixels within the geometry of the 72 sampled blocks described in the shapefile were labelled with the disease status and variety of the block. Each labelled pixel within the shapefile geometries was utilised as a separate observation. Table 2 lists the different sugarcane varieties, with their respective RSD positive/negative pixel numbers.
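As an illustration of the pixel labelling step, the following Python sketch assigns a block's disease status and variety to every pixel centre that falls inside the block geometry. The study performed this step with the QGIS Python API; the pure-Python ray-casting test, the polygon coordinates and the pixel grid below are stand-ins chosen for illustration and assume that pixels and polygons already share a projected CRS in metres.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if the point (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Check whether a horizontal ray from (x, y) crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical block geometries with their ground-truth attributes.
blocks = [
    {"variety": "Q208", "rsd": 1, "polygon": [(0, 0), (60, 0), (60, 40), (0, 40)]},
    {"variety": "Q240", "rsd": 0, "polygon": [(100, 0), (140, 0), (140, 40), (100, 40)]},
]

# Centres of 20 m pixels covering a 160 m x 40 m extent.
pixel_centres = [(x + 10, y + 10) for x in range(0, 160, 20) for y in range(0, 40, 20)]

labelled = []
for px, py in pixel_centres:
    for block in blocks:
        if point_in_polygon(px, py, block["polygon"]):
            labelled.append(
                {"x": px, "y": py, "variety": block["variety"], "rsd": block["rsd"]}
            )
            break  # each pixel belongs to at most one block
```

Pixels that fall outside every block are dropped, and each retained pixel becomes one observation, as in Table 2.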
Variety | RSD Positive | RSD Negative | Total |
Q200 | 145 | 389 | 534 |
Q208 | 869 | 649 | 1518 |
Q240 | 766 | 573 | 1339 |
Q253 | 886 | 1769 | 2655 |
SRA14 | 88 | 89 | 177 |
Total | 2754 | 3469 | 6223 |
2.4 Vegetation Indices
The causal bacteria of RSD infect the xylem vessels responsible for water transport in sugarcane, reducing water absorption and resulting in vegetation stress (Davis et al., 1984; Chakraborty et al., 2024). Previous research has demonstrated that Sentinel-2’s bands 11 and 12 in the Short-wave Infrared Region (SWIR) are sensitive to moisture absorption in vegetation, making them potential predictors for RSD (Fensholt and Sandholt, 2003; Liu et al., 2021; Wang et al., 2017). Additionally, Near Infrared Region (NIR) bands and vegetation indices derived from them have previously been utilised as indicators of vegetation health or stress (Apan et al., 2004, 2003; Narmilan et al., 2022). An independent t-test with a significance level of 0.05 was performed for each variety and spectral band, with a Bonferroni correction applied to control the type I error rate, to provide an indication of which bands and vegetation indices may be effective predictors of RSD.
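The band screening step can be sketched as follows. This is a minimal example on synthetic reflectance values, not the study data: the band list, the sample sizes and the choice to divide the significance level by the number of bands tested are all assumptions, since the exact number of comparisons used for the correction is not stated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
bands = ["B05", "B8A", "B11", "B12"]  # hypothetical subset of Sentinel-2 bands

# Synthetic reflectance for RSD-positive and RSD-negative pixels of one variety.
positive = rng.normal(0.30, 0.05, size=(200, len(bands)))
negative = rng.normal(0.30, 0.05, size=(200, len(bands)))
negative[:, 3] += 0.05  # inject a real difference into B12 only

alpha = 0.05
bonferroni_alpha = alpha / len(bands)  # Bonferroni-adjusted threshold

results = {}
for j, band in enumerate(bands):
    # Independent two-sample t-test on one band.
    t_stat, p_value = stats.ttest_ind(positive[:, j], negative[:, j])
    results[band] = (p_value, bool(p_value < bonferroni_alpha))
```

Each entry of `results` holds the p-value and whether the band remains significant after correction; in practice the loop would be repeated for each variety.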
Although the statistically significant bands varied between varieties, bands B8A and B12 were significant across a majority of varieties. This supports the notion that vegetation moisture and stress are key indicators for diagnosing RSD. Accordingly, the vegetation indices selected for this study (Table 3 in Appendix A) focus on vegetation moisture and stress, complemented by several general health indicators. The DWSIs are of particular interest given their demonstrated effectiveness for detecting sugarcane disease with satellite-based remote sensing (Apan et al., 2003; Dutia et al., 2006) by focusing on vegetation moisture. The original indices were formulated with hyperspectral satellite imagery, specifically targeting R1600 to measure water absorption. In this study, these indices were replicated as closely as possible with Sentinel-2 data. For example, Sentinel-2’s band 11, which is approximately centred at R1600, was selected for its ability to capture similar moisture-related signals. Given that bands 11 and 12 are both sensitive to moisture (Fensholt and Sandholt, 2003; Liu et al., 2021; Wang et al., 2017), and given the statistical significance observed for band 12, the original DWSIs were adapted to include band 12. This led to the creation of several new indices (DWSI-6, DWSI-7 and DWSI-8) that integrate both bands to enhance their sensitivity to RSD. To our knowledge, these modified indices have not been previously utilised. Additionally, Sentinel-2’s band 5, a red-edge band known to improve vegetation classification accuracy (Kussul et al., 2017; Qiu et al., 2017), was incorporated into DWSI-8 in an attempt to further improve its predictive performance.
The spectral bands of the satellite image captured on the 27th February 2022 were utilised to compute the 19 vegetation indices listed in Table 3 with the QGIS Python API for each pixel in the dataset.
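A per-pixel index calculation of this kind can be sketched with NumPy. The study used the QGIS Python API; the example below is only an illustration with hypothetical reflectance values, and it covers two widely used normalised-difference indices. The band pairings (B08 and B04 for NDVI, B8A and B11 for NDMI) are common Sentinel-2 conventions and are assumptions here, not a statement of the exact formulas used in Table 3.

```python
import numpy as np


def normalised_difference(a, b):
    """Return (a - b) / (a + b) per pixel, with NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.full_like(denom, np.nan), where=denom != 0)


# Hypothetical surface reflectance for three pixels (the last one has no data).
bands = {
    "B04": np.array([0.10, 0.08, 0.00]),  # red
    "B08": np.array([0.40, 0.32, 0.00]),  # near infrared
    "B8A": np.array([0.42, 0.30, 0.00]),  # narrow near infrared
    "B11": np.array([0.21, 0.20, 0.00]),  # short-wave infrared
}

ndvi = normalised_difference(bands["B08"], bands["B04"])
ndmi = normalised_difference(bands["B8A"], bands["B11"])
```

Guarding the division avoids runtime warnings and infinite values for masked or empty pixels, which would otherwise propagate into feature scaling.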
2.5 Machine Learning Algorithm Development
In this study, we compared five machine learning algorithms for predicting whether a pixel is RSD positive: Random Forest (RF), Logistic Regression (LR), Quadratic Discriminant Analysis (QDA), Gradient Boosting (GB) and Radial Basis Function Support Vector Machine (SVM-RBF). Given that the sugarcane variety may be unknown to users, we developed two sets of models, one for the scenario in which the sugarcane variety is known and one for the scenario in which it is unknown. The final dataset shown in Table 2 is imbalanced; therefore, the dataset was down-sampled to achieve an equal class distribution and mitigate potential bias.
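Down-sampling to an equal class distribution can be sketched as below. The helper name and random seed are hypothetical, and the example uses the Q200 pixel counts from Table 2 purely as an illustration; whether balancing was applied per variety or to the pooled dataset is an assumption of this sketch.

```python
import numpy as np


def downsample_to_balance(X, y, seed=0):
    """Randomly drop samples so that every class has as many rows as the rarest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_keep, replace=False) for c in classes]
    )
    keep.sort()  # preserve the original row order
    return X[keep], y[keep]


# Q200 in Table 2: 145 RSD-positive and 389 RSD-negative pixels.
y = np.array([1] * 145 + [0] * 389)
X = np.arange(len(y), dtype=float).reshape(-1, 1)  # placeholder feature column

X_balanced, y_balanced = downsample_to_balance(X, y)
```

After balancing, both classes contain 145 pixels, so a classifier that always predicts one class would score 50% accuracy.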
The dataset was partitioned into 80% for training and 20% for testing. Standard scaling was applied to the entire dataset, with the scaling parameters fitted exclusively on the training set. Hyperparameters were tuned using 10-fold cross-validation with "Halving Grid Search" on the 80% training set. HalvingGridSearchCV is a resource-efficient version of GridSearchCV from the scikit-learn library for hyperparameter tuning. It uses successive halving, which starts with many candidate parameter combinations and progressively eliminates the least promising ones, allocating more resources to the remaining candidates at each iteration (Scikit-learn Developers, 2024).
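The split, scaling and tuning steps can be sketched with scikit-learn as follows. The synthetic data, the parameter grid and the random seeds are assumptions made for illustration and are not the values used in the study. Placing the scaler inside a pipeline is one way to guarantee that scaling parameters are estimated from training data only.

```python
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the import below)
from sklearn.model_selection import HalvingGridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labelled pixel dataset (assumption).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=0)

# 80:20 train-test split, stratified to keep the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler is fitted on training data only because it lives inside the pipeline.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical search grid for the SVM-RBF hyperparameters.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}

search = HalvingGridSearchCV(pipeline, param_grid, cv=10, factor=3, random_state=0)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
```

With `factor=3`, roughly one third of the candidates survive each round while the number of training samples given to each survivor grows threefold.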
Model performance was evaluated by bootstrapping the unseen test set with 5000 bootstrap samples, yielding distributions for accuracy, precision, and recall. After model evaluation, each machine learning algorithm was retrained with 10-fold cross-validation on the entire dataset to find the most appropriate hyperparameters for the final models.
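Bootstrapping the test set can be sketched as below. The function name and the toy predictions are hypothetical; the sketch resamples test pixels with replacement and recomputes accuracy, positive-class precision and positive-class recall for each resample.

```python
import numpy as np


def bootstrap_metrics(y_true, y_pred, n_boot=5000, seed=0):
    """Return bootstrap distributions of accuracy, precision and recall (positive class = 1)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    acc, prec, rec = [], [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample test rows with replacement
        t, p = y_true[idx], y_pred[idx]
        tp = np.sum((t == 1) & (p == 1))
        fp = np.sum((t == 0) & (p == 1))
        fn = np.sum((t == 1) & (p == 0))
        acc.append(np.mean(t == p))
        prec.append(tp / (tp + fp) if tp + fp else np.nan)
        rec.append(tp / (tp + fn) if tp + fn else np.nan)
    return np.array(acc), np.array(prec), np.array(rec)


# Toy test set of 58 pixels with two false positives (hypothetical predictions).
y_true = np.array([1] * 29 + [0] * 29)
y_pred = y_true.copy()
y_pred[[29, 30]] = 1

accuracy, precision, recall = bootstrap_metrics(y_true, y_pred)
median_accuracy = np.median(accuracy)
```

Summary statistics such as the median or percentile intervals can then be read directly from the three returned arrays.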
To assess the significance of the ML algorithms’ ability to detect RSD, a permutation test was conducted. The pixel labels were shuffled, and the models were retrained using the previously determined hyperparameters. This process was repeated 1000 times to generate a null distribution of accuracy, precision, and recall, reflecting model performance when there is no true association between the features and disease status. By comparing the original performance distribution to the null distribution, the statistical significance of the observed model performance can be evaluated.
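A permutation test of this form can be sketched as follows. The synthetic data and fixed hyperparameters are assumptions, the number of permutations is reduced from 1000 to 200 to keep the example fast, and the sketch shuffles only the training labels before scoring against the true test labels, which is one possible reading of the procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labelled pixel dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, class_sep=2.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def fit_and_score(train_labels):
    """Train with fixed (hypothetical) hyperparameters and return test accuracy."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
    model.fit(X_train, train_labels)
    return model.score(X_test, y_test)


observed_accuracy = fit_and_score(y_train)

rng = np.random.default_rng(0)
n_permutations = 200  # the study used 1000 repetitions
null_accuracy = np.array(
    [fit_and_score(rng.permutation(y_train)) for _ in range(n_permutations)]
)

# Proportion of null scores at least as large as the observed score (with +1 correction).
p_value = (1 + np.sum(null_accuracy >= observed_accuracy)) / (1 + n_permutations)
```

A small `p_value` indicates that the observed accuracy is unlikely under the null hypothesis of no association between features and labels.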
2.6 Software and Libraries
This study was conducted with several software tools and libraries for data processing, analysis, and model development. QGIS 3.36.0 was employed for examining satellite imagery and, through its Python API, for calculating vegetation indices, performing spatial joins, and extracting tabular data from the spatial layers. Visual Studio Code version 1.89.1 was utilised as the programming IDE. Python 3.9.18 was utilised with the following libraries: matplotlib 3.7.2, seaborn 0.12.2, numpy 1.24.3, pandas 2.0.3, scikit-learn 1.3.0 and scipy 1.11.1.
3 Results
The classification and permutation testing results (Figure 1) reveal a clear separation between the null distribution and the model accuracy distribution for all machine learning algorithms and varieties, with the exception of SRA14. This demonstrates that the models significantly outperformed random chance. In contrast, the overlap of distributions for SRA14 across all models suggests that its performance was not substantially different from what would be expected by chance, indicating a weaker predictive power compared to models for the other varieties.
Three machine learning algorithms demonstrated over 80% accuracy across all varieties for RSD detection. SVM-RBF performed the best overall achieving between 85.64% and 96.55% accuracy depending on the variety (Figure 1). SVM-RBF achieved the highest classification accuracy when variety was not considered (85.64%) and for several specific varieties including Q208 (91.15%), Q240 (96.96%), and Q253 (85.35%). Similarly, GB and RF demonstrated consistently high performance across all varieties achieving 83.33% to 94.83% and 83.33% to 96.55% accuracy respectively, depending on the variety. All three models performed particularly well for Q200, Q208 and Q240 achieving between 91.15% and 96.55% accuracy depending on the specific variety and model. GB and RF tied for the highest median overall accuracy on Q208, though both models demonstrated less balanced performance in terms of precision and recall when compared to SVM-RBF. Additionally, GB matched the best median overall accuracy for Q200. A complete table of results for all models and varieties can be seen in Table 4 in Appendix B.
In contrast, LR and QDA showed varying performance depending on the specific variety. These two models performed particularly poorly for variety Q253 and underperformed compared to the other models on Q240. However, they excelled with other varieties. QDA, in particular, emerged as the top-performing model for variety SRA14, achieving a median overall accuracy of 88.89%. Additionally, for Q200, QDA, SVM-RBF, and GB all tied as the best-performing models, each yielding identical classification reports with a median overall accuracy of 96.55%.
Variety-specific models for all machine learning algorithms performed as well as, or better than, the variety-agnostic models. Models specific to Q200, Q208 and Q240 outperformed the variety-agnostic model, whereas those specific to Q253 and SRA14 showed no significant improvement. RSD in varieties Q200 and Q240 was consistently classified with higher accuracy compared to the other varieties. Conversely, Q253 consistently showed lower classification accuracy, while SRA14 exhibited significant variability in performance. Notably, a closer analysis revealed that, for 4 out of 5 algorithms, the models for varieties Q200 and Q208, as well as the variety-agnostic model, yielded lower precision for the positive class than for the negative class.
The hyperparameters and overall accuracies of the final models trained on the entire dataset are reported in Table 5 in Appendix C. The outcomes largely align with the previously observed accuracies, which is expected if the test set utilised was representative of the sample distribution. In this final evaluation, there were no ties for best performance across varieties. SVM-RBF maintained its position as the top-performing model for most varieties and QDA continued to outperform other models for SRA14.
The feature importance analysis for diagnosing RSD revealed some variation between the GB and RF classifiers; however, both generally agreed upon several key findings. Notably, when sugarcane variety was not considered, both classifiers identified DWSI-6 as the most influential feature, and identified DWSI-7, DWSI-2, and spectral bands 2, 5, 6, and 11 among the top 10 most important features. Analysing feature importance with separate models for each sugarcane variety showed variation between both varieties and classifiers. No single feature appeared in the top 10 importance rankings across all varieties for both classifiers. However, DWSI-6, DWSI-3 and Bands 2, 5, 6, 7 and 11 were the features that occurred most frequently in the top 10. A complete description of feature importance can be seen in Figure 2 in Appendix D.
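Comparing the top-ranked features of two tree-based classifiers can be sketched as follows. The synthetic data and generic feature names are placeholders, and the sketch assumes the impurity-based `feature_importances_` attribute of scikit-learn estimators; the importance measure used in the study is not stated, so this is only one plausible choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic stand-in for the band and vegetation index features (assumption).
feature_names = [f"feature_{i}" for i in range(15)]
X, y = make_classification(n_samples=400, n_features=15, n_informative=5, random_state=0)


def top_features(model, k=10):
    """Fit the model and return the names of its k most important features."""
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1][:k]
    return [feature_names[i] for i in order]


rf_top = top_features(RandomForestClassifier(n_estimators=100, random_state=0))
gb_top = top_features(GradientBoostingClassifier(random_state=0))

# Features that both classifiers rank in their top 10.
shared_top = sorted(set(rf_top) & set(gb_top))
```

Repeating this per variety and counting how often each feature appears in `shared_top` gives a frequency-based summary similar to the one reported above.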
4 Discussion
This study demonstrates that ML algorithms can effectively classify RSD across several varieties with freely available satellite-based spectroscopy data. Consistent with previous studies, RF, SVM, and GB variants demonstrated strong performance (Narmilan et al., 2022; Simões and Rios do Amaral, 2023; Ong et al., 2023).
Vegetation indices (VIs) and spectral bands sensitive to vegetation moisture and stress were consistently the most influential features, reinforcing the established understanding that RSD diminishes water uptake and heightens plant stress (Bailey and Bechet, 1986; Chakraborty et al., 2024; Davis and Bailey, 2000; Croft et al., 2000). The consistent importance of DWSI-7, DWSI-6, DWSI-3 and Bands 5, 6 and 7 reflects this. The trend was particularly pronounced in the variety-agnostic models, potentially because the variation introduced by the different sugarcane varieties forced the model to focus on the most generalised disease signals common across all varieties. In contrast, feature importance varied among variety-specific models, reflecting the distinct biophysical characteristics of each variety and their unique responses to disease presence. The importance of DWSI-7 and DWSI-6 across the different models highlights that adapting the existing DWSIs to include the other statistically significant moisture absorption band contributed to the success of these models. Interestingly, Band 2, which is centred in the blue part of the visible spectrum, was consistently important, despite RSD being universally regarded by experts as an asymptomatic disease. While water and stress-related indices proved valuable, popular general health indicators such as the Normalized Difference Vegetation Index (NDVI) performed poorly, potentially due to the asymptomatic nature of RSD.
Across all machine learning algorithms, variety-specific models were as effective as, or more effective than, variety-agnostic ones. Performance of RSD detection varied between sugarcane varieties, potentially arising from the physical differences between the varieties or from external factors correlated with variety selection. Variables such as soil type, rainfall, and land topology may influence variety choice and introduce variability in the observed spectral reflectance. Consequently, future studies should incorporate soil data, rainfall, and land topology as predictors to assess their impact on RSD diagnosis and determine whether these factors explain variation that affects class separability. Notably, variety SRA14 typically exhibited lower classification rates and larger variation compared to other varieties, with model performance that was not significantly greater than random chance. This may be attributed to its smaller sample size (see Table 2), and future studies with a larger dataset may find a significant performance difference.
Despite the high classification accuracies achieved by the models, there were several limitations of the study. Positive class precision was consistently lower than negative class precision for models classifying RSD in Q200 and Q208, indicating a higher rate of false positives across the algorithms. This may have arisen from the assumption that RSD infection in a block was uniformly distributed by labeling all pixels within a block with the overall disease status of that block. This could result in regions with little or no RSD being mislabeled as ’RSD Positive’, thereby affecting model training. Future studies should avoid this assumption where possible and implement a higher resolution sampling method. Interestingly, this observation coincides with the only two varieties identified by Sugar Research Australia as partially resistant to RSD (Sugar Research Australia, 2021, 2024). Partial resistance might reduce the spread of RSD sufficiently to create patches of non-infected areas within otherwise infected paddocks. However, to verify this hypothesis, further research with higher resolution sampling methods would be necessary. Additionally, this study focused on developing models and a prototype system for RSD detection within the Herbert region in February of 2022. Future work should explore the influence of other predictors that could not be accounted for in this study to ensure its reliability and broader applicability in other regions or at other points in time. This includes an investigation into the impacts of temperature, humidity, sunlight duration, flowering, other varieties, simultaneous disease infections and precipitation. RSD detection should be trialed at different stages of the sugarcane growth with time series data to improve its applicability to the industry.
5 Conclusion
In this study, we demonstrated that RSD can be classified for several varieties with ML algorithms from freely available satellite-based spectroscopy data. The best-performing machine learning algorithm was the Support Vector Machine with a Radial Basis Function Kernel, achieving between 85.64% and 96.55% accuracy depending on the variety. The performance of all machine learning algorithms was found to be significantly better than the null distribution for all varieties except SRA14. In line with current literature, the inclusion of sugarcane variety information and vegetation indices improved classification rates of disease. Additionally, several varieties were found to be more challenging to classify for RSD compared to others. These promising initial results, coupled with the efficiency of classifying 72 blocks within minutes as opposed to months, indicate the potential benefits of implementing a large-scale health monitoring system using satellites and machine learning. Given that RSD is one of the most impactful diseases in sugarcane cultivation, this research significantly advances sugarcane disease management.
6 Acknowledgements
We extend our sincere appreciation to HCPSL for their invaluable contribution in collecting and providing the ground-truth data essential for this study. Special thanks to Lawrence Di Bella, Rod Nielson, Adam Royle and Rhiannan Harragon from HCPSL for their industry knowledge and support in the ground truth data collection process.
7 Funding
This work was supported by Australia’s Economic Accelerator Seed grant provided by the Australian Government, Department of Education.
Appendix A
Reference | Vegetation Index | Formula |
Rouse Jr et al. (1973) | Normalized Difference Vegetation Index (NDVI) | |
Kaufman and Tanre (1992) | Atmospherically Resistant Vegetation Index (ARVI) | |
Genc et al. (2008) | Simple Ratio Index (SRI) | |
Merzlyak et al. (1999) | Plant Senescence Reflectance Index (PSRI) | |
Jordan (1969) | Ratio Vegetation Index (RVI) | |
McFeeters (1996) | Normalized Difference Water Index (NDWI) | |
Gao (1996) | Normalized Difference Moisture Index (NDMI) | |
Tucker (1979) | Normalized Green Red Difference Index (NGRDI) | |
Gitelson et al. (2002) | Visible Atmospherically Resistant Index (VARI) | |
Datt (1998) | Simple Ratio 860/550 | |
Apan et al. (2004, 2003) | Disease-Water Stress Index 1 (DWSI-1) | |
Apan et al. (2004, 2003) | Disease-Water Stress Index 2 (DWSI-2) | |
Apan et al. (2004, 2003) | Disease-Water Stress Index 3 (DWSI-3) | |
Apan et al. (2004, 2003) | Disease-Water Stress Index 4 (DWSI-4) | |
Apan et al. (2004, 2003) | Disease-Water Stress Index 5 (DWSI-5) | |
Rouse Jr et al. (1973) | Green Blue NDVI (GBNDVI) | |
This Study | Disease-Water Stress Index 6 (DWSI-6) | |
This Study | Disease-Water Stress Index 7 (DWSI-7) | |
This Study | Disease-Water Stress Index 8 (DWSI-8) | |
Appendix B
Model | Variety | Class | Precision | Recall | Accuracy |
SVM: RBF | Q200 | Positive | 93.33% | 100% | 96.55% |
SVM: RBF | Q200 | Negative | 100% | 93.39% | |
SVM: RBF | Q208 | Positive | 87.5% | 95.31% | 91.15% |
SVM: RBF | Q208 | Negative | 95.31% | 87.41% | |
SVM: RBF | Q240 | Positive | 97.69% | 96.95% | 96.96% |
SVM: RBF | Q240 | Negative | 96.22% | 97.14% | |
SVM: RBF | Q253 | Positive | 85.95% | 84.97% | 85.35% |
SVM: RBF | Q253 | Negative | 84.86% | 85.88% | |
SVM: RBF | SRA14 | Positive | 85.00% | 89.47% | 86.11% |
SVM: RBF | SRA14 | Negative | 88.88% | 84.21% | |
SVM: RBF | Not Considered | Positive | 84.94% | 87.06% | 85.64% |
SVM: RBF | Not Considered | Negative | 86.45% | 84.24% | |
GB | Q200 | Positive | 93.33% | 100.00% | 96.55% |
GB | Q200 | Negative | 100.00% | 93.39% | |
GB | Q208 | Positive | 90.48% | 92.59% | 91.15% |
GB | Q208 | Negative | 92.06% | 89.78% | |
GB | Q240 | Positive | 96.90% | 94.74% | 95.22% |
GB | Q240 | Negative | 93.39% | 96.12% | |
GB | Q253 | Positive | 85.96% | 83.59% | 84.51% |
GB | Q253 | Negative | 83.23% | 85.63% | |
GB | SRA14 | Positive | 90.00% | 81.25% | 83.33% |
GB | SRA14 | Negative | 76.92% | 87.5% | |
GB | Not Considered | Positive | 81.67% | 88.03% | 84.73% |
GB | Not Considered | Negative | 88.05% | 81.71% | |
RF | Q200 | Positive | 93.33% | 96.43% | 94.83% |
RF | Q200 | Negative | 96.77% | 93.94% | |
RF | Q208 | Positive | 88.98% | 93.94% | 91.15% |
RF | Q208 | Negative | 93.66% | 88.57% | |
RF | Q240 | Positive | 94.57% | 93.13% | 93.04% |
RF | Q240 | Negative | 91.43% | 93.13% | |
RF | Q253 | Positive | 82.01% | 83.43% | 82.82% |
RF | Q253 | Negative | 83.82% | 82.42% | |
RF | SRA14 | Positive | 85.00% | 85.00% | 83.33% |
RF | SRA14 | Negative | 83.33% | 83.33% | |
RF | Not Considered | Positive | 80.58% | 88.67% | 84.59% |
RF | Not Considered | Negative | 88.92% | 80.99% | |
QDA | Q200 | Positive | 93.33% | 100.00% | 96.55% |
QDA | Q200 | Negative | 100.00% | 93.39% | |
QDA | Q208 | Positive | 87.59% | 88.89% | 87.69% |
QDA | Q208 | Negative | 88.00% | 86.61% | |
QDA | Q240 | Positive | 76.38% | 93.39% | 83.91% |
QDA | Q240 | Negative | 93.33% | 76.19% | |
QDA | Q253 | Positive | 92.18% | 70.54% | 76.90% |
QDA | Q253 | Negative | 61.80% | 88.89% | |
QDA | SRA14 | Positive | 100.00% | 83.33% | 88.89% |
QDA | SRA14 | Negative | 76.92% | 100.00% | |
QDA | Not Considered | Positive | 87.79% | 66.74% | 71.01% |
QDA | Not Considered | Negative | 52.99% | 80.17% | |
LR | Q200 | Positive | 93.33% | 93.10% | 93.10% |
LR | Q200 | Negative | 93.55% | 93.55% | |
LR | Q208 | Positive | 81.69% | 87.60% | 84.23% |
LR | Q208 | Negative | 87.29% | 81.20% | |
LR | Q240 | Positive | 95.39% | 95.35% | 94.78% |
LR | Q240 | Negative | 94.29% | 94.29% | |
LR | Q253 | Positive | 70.12% | 74.37% | 72.96% |
LR | Q253 | Negative | 75.90% | 71.88% | |
LR | SRA14 | Positive | 95.00% | 82.61% | 86.11% |
LR | SRA14 | Negative | 77.27% | 93.33% | |
LR | Not Considered | Positive | 65.33% | 72.03% | 68.90% |
LR | Not Considered | Negative | 72.73% | 66.13% | |
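For readers less familiar with the per-class metrics in Appendix B, the sketch below shows how precision, recall and accuracy follow from a binary confusion matrix under their standard definitions. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name `binary_metrics` is likewise an assumption.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Per-class precision and recall, plus overall accuracy.

    tp/fp/fn/tn are counts with 'Positive' (RSD detected) as the positive class.
    Assumes every denominator is non-zero, i.e. each class is both present
    and predicted at least once.
    """
    total = tp + fp + fn + tn
    return {
        "precision_positive": tp / (tp + fp),
        "recall_positive": tp / (tp + fn),
        # For the negative class the roles of the counts are mirrored.
        "precision_negative": tn / (tn + fn),
        "recall_negative": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }


# Hypothetical confusion-matrix counts (illustrative only).
metrics = binary_metrics(tp=28, fp=2, fn=0, tn=28)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Note that the negative-class recall equals the specificity of the positive class, which is why a single accuracy figure per variety can hide an imbalance between the two rows.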
Appendix C
Model | Variety | Optimal hyper-parameters | Accuracy |
SVM: RBF | Q200 | 'C': 1000, 'gamma': 0.001 | 99.17% |
SVM: RBF | Q208 | 'C': 1000, 'gamma': 0.01 | 90.14% |
SVM: RBF | Q240 | 'C': 10, 'gamma': 0.1 | 96.72% |
SVM: RBF | Q253 | 'C': 10, 'gamma': 1 | 85.39% |
SVM: RBF | SRA14 | 'C': 1000, 'gamma': 0.01 | 89.02% |
SVM: RBF | Not Considered | 'C': 10, 'gamma': 0.1 | 84.59% |
GB | Q200 | 'learning rate': 0.1, 'max depth': 3, 'estimators': 200 | 95.00% |
GB | Q208 | 'learning rate': 0.2, 'max depth': 3, 'estimators': 500 | 88.68% |
GB | Q240 | 'learning rate': 0.3, 'max depth': 3, 'estimators': 1000 | 95.17% |
GB | Q253 | 'learning rate': 0.1, 'max depth': 5, 'estimators': 500 | 82.46% |
GB | SRA14 | 'learning rate': 0.3, 'max depth': 3, 'estimators': 500 | 87.04% |
GB | Not Considered | 'learning rate': 0.1, 'max depth': 5, 'estimators': 1500 | 84.17% |
RF | Q200 | 'max depth': None, 'estimators': 200 | 95.83% |
RF | Q208 | 'max depth': 30, 'estimators': 1000 | 89.75% |
RF | Q240 | 'max depth': 30, 'estimators': 200 | 94.81% |
RF | Q253 | 'max depth': 20, 'estimators': 1000 | 85.19% |
RF | SRA14 | 'max depth': None, 'estimators': 200 | 88.79% |
RF | Not Considered | 'max depth': 20, 'estimators': 200 | 84.26% |
QDA | Q200 | N/A | 97.24% |
QDA | Q208 | N/A | 89.13% |
QDA | Q240 | N/A | 87.09% |
QDA | Q253 | N/A | 76.58% |
QDA | SRA14 | N/A | 92.71% |
QDA | Not Considered | N/A | 69.46% |
LR | Q200 | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 93.33% |
LR | Q208 | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 84.31% |
LR | Q240 | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 95.24% |
LR | Q253 | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 74.34% |
LR | SRA14 | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 82.19% |
LR | Not Considered | 'C': 100, 'penalty': 'l1', 'solver': 'liblinear' | 68.73% |
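The reference list cites scikit-learn's `HalvingGridSearchCV`, so a search of the following shape could plausibly produce per-variety optima like those in Appendix C. This is a minimal sketch under stated assumptions rather than the study's pipeline: the synthetic data from `make_classification`, the feature scaling step, the candidate grid, and the `cv`, `factor` and `random_state` settings are all illustrative choices, and in practice one such search would be run separately on each variety's vegetation-index features.

```python
# HalvingGridSearchCV is experimental and must be enabled explicitly.
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.datasets import make_classification
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for one variety's vegetation-index features (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Scale features before the RBF kernel, which is sensitive to feature ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical candidate grid; the study's actual grid is not stated here.
param_grid = {
    "svc__C": [10, 100, 1000],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

# Successive halving: all candidates start on a small sample, and only the
# best-scoring fraction (1/factor) advances to each larger sample.
search = HalvingGridSearchCV(
    model,
    param_grid,
    cv=5,
    factor=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.2%}")
```

Successive halving trades some search thoroughness for speed, which matters most for the larger grids such as the Gradient Boosting one, where each candidate is expensive to fit.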
Appendix D


References
- Abdel-Rahman et al. (2010) Abdel-Rahman, E., Ahmed, F., van den Berg, M., Way, M., 2010. Potential of spectroscopic data sets for sugarcane thrips (Fulmekiola serrata Kobus) damage detection. International Journal of Remote Sensing 31, 4199–4216. doi:10.1080/01431160903241981.
- Apan et al. (2003) Apan, A., Held, A., Phinn, S., Markley, J., 2003. Formulation and assessment of narrow-band vegetation indices from EO-1 Hyperion imagery for discriminating sugarcane disease. Proceedings of the Spatial Sciences Conference.
- Apan et al. (2004) Apan, A., Held, A., Phinn, S., Markley, J., 2004. Detecting sugarcane ‘orange rust’ disease using EO-1 Hyperion hyperspectral imagery. International Journal of Remote Sensing 25, 489–498. doi:10.1080/01431160310001618031.
- Bailey and Bechet (1986) Bailey, R., Bechet, G., 1986. Effect of ratoon stunting disease on the yield and components of yield of sugarcane under rainfed conditions. Proceedings of the South African Sugar Technologists Association 60, 204–210.
- Bao et al. (2024) Bao, D., Zhou, J., Bhuiyan, S.A., Adhikari, P., Tuxworth, G., Ford, R., Gao, Y., 2024. Early detection of sugarcane smut and mosaic diseases via hyperspectral imaging and spectral-spatial attention deep neural networks. Journal of Agriculture and Food Research 18, 101369. URL: https://www.sciencedirect.com/science/article/pii/S266615432400406X, doi:10.1016/j.jafr.2024.101369.
- Bao et al. (2021) Bao, D., Zhou, J., Bhuiyan, S.A., Zia, A., Ford, R., Gao, Y., 2021. Early detection of sugarcane smut disease in hyperspectral images. 2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ), 1–6. doi:10.1109/IVCNZ54163.2021.9653386.
- Carvalho et al. (2016) Carvalho, G., da Silva, T., Munhoz, A., Monteiro-Vitorello, C., Azevedo, R., Melotto, M., Camargo, L., 2016. Development of a qPCR for Leifsonia xyli subsp. xyli and quantification of the effects of heat treatment of sugarcane cuttings on Lxx. Crop Protection 80, 51–55. doi:10.1016/j.cropro.2015.10.029.
- Chakraborty et al. (2024) Chakraborty, M., Soda, N., Strachan, S., Ngo, C.N., Bhuiyan, S.A., Shiddiky, M.J.A., Ford, R., 2024. Ratoon stunting disease of sugarcane: A review emphasizing detection strategies and challenges. Phytopathology® 114, 7–20. URL: https://doi.org/10.1094/PHYTO-05-23-0181-RVW, doi:10.1094/PHYTO-05-23-0181-RVW. PMID: 37530477.
- Croft et al. (2000) Croft, B., Magarey, R., Whittle, P., 2000. Manual of Canegrowing. BSES.
- Datt (1998) Datt, B., 1998. Remote sensing of chlorophyll a, chlorophyll b, chlorophyll a+b, and total carotenoid content in eucalyptus leaves. Remote Sensing of Environment 66, 111–121. URL: https://www.sciencedirect.com/science/article/pii/S0034425798000467, doi:10.1016/S0034-4257(98)00046-7.
- Davis and Bailey (2000) Davis, M.J., Bailey, R.A., 2000. A guide to sugarcane diseases. CIRAD and ISSCT.
- Davis et al. (1984) Davis, M.J., Gillaspie, A.G., Vidaver, A.K., Harris, R.W., 1984. Clavibacter: a new genus containing some phytopathogenic coryneform bacteria, including Clavibacter xyli subsp. xyli sp. nov., subsp. nov. and Clavibacter xyli subsp. cynodontis subsp. nov., pathogens that cause ratoon stunting disease of sugarcane and bermudagrass stunting disease. International Journal of Systematic and Evolutionary Microbiology 34, 107–117. doi:10.1099/00207713-34-2-107.
- Dutia et al. (2006) Dutia, S., Bhatiacharya, B., Rajak, D., Chattopadhyay, C., and, N., Parihar, J., 2006. Disease detection in mustard crop using EO-1 Hyperion satellite data. Journal of the Indian Society of Remote Sensing (Photonirvachak) 34.
- (14) European Space Agency. Sentinel-2 overview. https://sentinels.copernicus.eu/web/sentinel/missions/sentinel-2/overview. Accessed: 01/05/2022.
- Fang and Liang (2014) Fang, H., Liang, S., 2014. Leaf area index models. doi:10.1016/B978-0-12-409548-9.09076-X.
- Fegan et al. (1998) Fegan, M., Croft, B.J., Teakle, D.S., Hayward, A.C., Smith, G.R., 1998. Sensitive and specific detection of Clavibacter xyli subsp. xyli, causal agent of ratoon stunting disease of sugarcane, with a polymerase chain reaction-based assay. Plant Pathology 47, 495–504. doi:10.1046/j.1365-3059.1998.00255.x.
- Fensholt and Sandholt (2003) Fensholt, R., Sandholt, I., 2003. Derivation of a shortwave infrared water stress index from MODIS near- and shortwave infrared data in a semiarid environment. Remote Sensing of Environment 87, 111–121. URL: https://www.sciencedirect.com/science/article/pii/S0034425703001895, doi:10.1016/j.rse.2003.07.002.
- Gao (1996) Gao, B.C., 1996. NDWI—a normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sensing of Environment 58, 257–266. doi:10.1016/S0034-4257(96)00067-3.
- Genc et al. (2008) Genc, H., Genc, L., Turhan, H., Smith, S., Nation, J., 2008. Vegetation indices as indicators of damage by the sunn pest (Hemiptera: Scutelleridae) to field grown wheat. African Journal of Biotechnology 7.
- Ghai et al. (2014) Ghai, M., Singh, V., Martin, L., McFarlane, S., van Antwerpen, T., Rutherford, R., 2014. A rapid and visual loop-mediated isothermal amplification assay to detect Leifsonia xyli subsp. xyli targeting a transposase gene. Letters in Applied Microbiology 59, 648–657. doi:10.1111/lam.12327.
- Gitelson et al. (2002) Gitelson, A.A., Kaufman, Y.J., Stark, R., Rundquist, D., 2002. Novel algorithms for remote estimation of vegetation fraction. Remote Sensing of Environment 80, 76–87.
- Grisham et al. (2010) Grisham, M.P., Johnson, R.M., Zimba, P.V., 2010. Detecting sugarcane yellow leaf virus infection in asymptomatic leaves with hyperspectral remote sensing and associated leaf pigment changes. J Virol Methods 167, 140–5. doi:10.1016/j.jviromet.2010.03.024.
- Johansen et al. (2014) Johansen, K., Robson, A., Samson, P., Sallam, N., Chandler, K., Eaton, A., Derby, L., Jennings, J., 2014. Mapping canegrub damage from high spatial resolution satellite imagery, in: Proceedings of the 36th Conference of the Australian Society of Sugar Cane Technologists, ASSCT 2014, pp. 62–70.
- Johansen et al. (2018) Johansen, K., Sallam, N., Robson, A., Samson, P., Chandler, K., Derby, L., Eaton, A., Jennings, J., 2018. Using GeoEye-1 imagery for multi-temporal object-based detection of canegrub damage in sugarcane fields in Queensland, Australia. GIScience & Remote Sensing 55, 285–305. doi:10.1080/15481603.2017.1417691.
- Jordan (1969) Jordan, C.F., 1969. Derivation of leaf-area index from quality of light on the forest floor. Ecology 50, 663–666. doi:10.2307/1936256.
- Kaufman and Tanre (1992) Kaufman, Y.J., Tanre, D., 1992. Atmospherically resistant vegetation index (ARVI) for EOS-MODIS. IEEE Transactions on Geoscience and Remote Sensing 30, 261–270. doi:10.1109/36.134076.
- Kussul et al. (2017) Kussul, N., Lavreniuk, M., Skakun, S., Shelestov, A., 2017. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geoscience and Remote Sensing Letters 14, 778–782.
- Liu et al. (2021) Liu, Y., Qian, J., Yue, H., 2021. Comprehensive evaluation of Sentinel-2 red edge and shortwave-infrared bands to estimate soil moisture. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 7448–7465. doi:10.1109/JSTARS.2021.3098513.
- Lu et al. (2020) Lu, B., Dao, P.D., Liu, J., He, Y., Shang, J., 2020. Recent advances of hyperspectral imaging technology and applications in agriculture. Remote Sensing 12, 2659. doi:10.3390/rs12162659.
- Magarey et al. (2021) Magarey, R., McHardie, R., Hession, M., Cripps, G., Burgess, D., Spannagle, B., Sutherland, P., DiBella, L., Milla, R., Millar, F., Schembri, A., Baxter, D., Hetherington, M., Turner, M., Jakins, A., Quinn, B., Kalkhoran, S., Gibbs, L., Ngo, C., 2021. Incidence and economic effects of ratoon stunting disease on the Queensland sugarcane industry: ASSCT peer-reviewed paper. Proceedings of the Australian Society of Sugar Cane Technologists 42, 520–526.
- McFeeters (1996) McFeeters, S.K., 1996. The use of the normalized difference water index (NDWI) in the delineation of open water features. International Journal of Remote Sensing 17, 1425–1432. doi:10.1080/01431169608948714.
- Merzlyak et al. (1999) Merzlyak, M.N., Gitelson, A.A., Chivkunova, O.B., Rakitin, V.Y., 1999. Non-destructive optical detection of pigment changes during leaf senescence and fruit ripening. Physiologia Plantarum 106, 135–141. doi:10.1034/j.1399-3054.1999.106119.x.
- Moriya et al. (2017) Moriya, E.A.S., Imai, N.N., Tommaselli, A.M.G., Miyoshi, G.T., 2017. Mapping mosaic virus in sugarcane based on hyperspectral images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 10, 740–748. doi:10.1109/JSTARS.2016.2635482.
- Narmilan et al. (2022) Narmilan, A., Gonzalez, F., Salgadoe, A.S.A., Powell, K., 2022. Detection of white leaf disease in sugarcane using machine learning techniques over UAV multispectral images. Drones 6, 230. doi:10.3390/drones6090230.
- Ong et al. (2023) Ong, P., Jian, J., Li, X., Zou, C., Yin, J., Ma, G., 2023. New approach for sugarcane disease recognition through visible and near-infrared spectroscopy and a modified wavelength selection method using machine learning models. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 302, 123037. doi:10.1016/j.saa.2023.123037.
- Qiu et al. (2017) Qiu, S., He, B., Yin, C., Liao, Z., 2017. Assessments of Sentinel-2 vegetation red-edge spectral bands for improving land cover classification. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W7, 871–874. URL: https://isprs-archives.copernicus.org/articles/XLII-2-W7/871/2017/, doi:10.5194/isprs-archives-XLII-2-W7-871-2017.
- Rouse Jr et al. (1973) Rouse Jr, J.W., Haas, R.H., Schell, J., Deering, D., 1973. Monitoring the vernal advancement and retrogradation (green wave effect) of natural vegetation. Report. Remote Sensing Center Texas A&M University.
- Scikit-learn Developers (2024) Scikit-learn Developers, 2024. HalvingGridSearchCV. URL: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.HalvingGridSearchCV.html. Accessed: 2024-09-26.
- Simões and Rios do Amaral (2023) Simões, I.O., Rios do Amaral, L., 2023. UAV-based multispectral data for sugarcane resistance phenotyping of orange and brown rust. Smart Agricultural Technology 4, 100144. doi:10.1016/j.atech.2022.100144.
- Soca-Muñoz et al. (2020) Soca-Muñoz, J.L., Rodrìguez-Machado, E., Aday-Dìaz, O., Hernàndez-Santana, L., Orozco-Morales, R., 2020. Spectral signature of brown rust and orange rust in sugarcane. Revista Facultad de Ingeniería Universidad de Antioquia, 9–20. doi:10.17533/udea.redin.20191042.
- Sugar Research Australia (2021) Sugar Research Australia, 2021. Ratoon Stunting Disease. Sugar Research Australia. URL: https://sugarresearch.com.au/sugar_files/2017/03/RSD-Info-Sheet_2021_May-2021.pdf.
- Sugar Research Australia (2024) Sugar Research Australia, 2024. Variety Guide. Sugar Research Australia. URL: https://sugarresearch.com.au/sugar_files/2024/06/SRA_Variety-Guide-2024-25-Herbert.pdf.
- Tucker (1979) Tucker, C.J., 1979. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sensing of Environment 8, 127–150. doi:10.1016/0034-4257(79)90013-0.
- Vargas et al. (2016) Vargas, L.A.O., Mendoza, G.G., Gómez, R.A., Rivero, N.A., Espinosa, L.Y., 2016. Characterization of Diatraea saccharalis in sugarcane (Saccharum officinarum) with field spectroradiometry. International Journal of Environmental & Agriculture Research (IJOEAR).
- Wang et al. (2017) Wang, C., Chen, J., Wu, J., Tang, Y., Shi, P., Black, T.A., Zhu, K., 2017. A snow-free vegetation index for improved monitoring of vegetation spring green-up date in deciduous ecosystems. Remote Sensing of Environment 196, 1–12. URL: https://www.sciencedirect.com/science/article/pii/S0034425717301906, doi:10.1016/j.rse.2017.04.031.
- Waters et al. (2024) Waters, E.K., Chen, C.C.M., Azghadi, M.R., 2024. Sugarcane health monitoring with satellite spectroscopy and machine learning: A review. URL: https://arxiv.org/abs/2404.16844, arXiv:2404.16844.
- Xue and Su (2017) Xue, J., Su, B., 2017. Significant remote sensing vegetation indices: A review of developments and applications. Journal of Sensors 2017, 1353691. doi:10.1155/2017/1353691.
- Young et al. (2016) Young, A.J., Kawamata, A., Ensbey, M.A., Lambley, E., Nock, C.J., 2016. Efficient diagnosis of ratoon stunting disease of sugarcane by quantitative PCR on pooled leaf sheath biopsies. Plant Disease 100, 2492–2498. doi:10.1094/PDIS-06-16-0848-RE. PMID: 30686165.