The performance of a plant is determined by three major factors:
- its genes
- the environment in which it is grown
- the interaction between genes and environment.
These three factors are explained below.
Genes are the building blocks of all living things. The genes present in a plant affect its productivity, influence how tall or short it is, and may protect the plant from a particular disease.
In addition to genes, a plant’s health and productivity are also directly impacted by the environment (weather and soil) in which it is grown. Plants need water and sunlight. However, too much rain can cause disease or flooding, and too much heat, especially in the absence of rainfall, can decrease productivity. The type of soil also has an effect on a plant. For example, a plant grown in soil that holds more water than average will better withstand an extended period of low rainfall. By characterizing the environments in which plants are grown, we can better understand how plants react to different environments. Scientists do this by precisely measuring the weather and soil in all growing locations.
A particular plant is adapted to grow best in a particular region due to many factors, including the length of the growing season (determined roughly by the time between the last frost in the spring and the first frost in the fall), expected rainfall, temperature, solar radiation, soil types and others. Some plants may tolerate drought better than others. Some plants may prefer a soil that is sandy, while others prefer clay. This is what is called a genotype-by-environment (GxE) interaction. The environment activates certain genes that allow the plant to thrive (or not) in that particular environment.
Plant breeders work to develop high-yielding plants for growers across a wide range of environments. Not all environments are productive growing environments; however, scientists are working to better understand GxE and to breed plants that can perform in highly stressed environments. Success could lead to crops that make marginal cropland more productive, potentially reducing hunger in arid regions of the world.
Corn is one of the world’s most important crops. Each year, breeders create several new corn products, known as experimental hybrids. Corn breeders work to create corn hybrids that can maintain high yield across a wide range of environments. Historically, the best hybrids have been identified by trial and error, with breeders testing their experimental hybrids in a diverse set of locations and measuring their performance to select the highest-yielding hybrids. This process can take many years. Corn breeders would benefit from accurate models that can predict performance across a range of environmental scenarios.
One way of modeling corn yield is that any particular hybrid (experimental cross of corn varieties) has a maximum yield potential, which then decreases depending on the environment in which it is grown. Every environment will have certain characteristics, or limiting factors, that are suboptimal for any hybrid, causing the actual yield to be less than the yield potential.
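One possible way to write this yield-potential view down, purely as an illustrative sketch: let $Y^{\max}_{h}$ be the yield potential of hybrid $h$, let $s_{e,k}$ be the level of limiting factor $k$ in environment $e$, and let $\ell_{h,k}$ be the yield loss that hybrid $h$ suffers from that factor. All of these symbols, and the additive form itself, are assumptions introduced here and are not part of the problem statement.

```latex
Y_{h,e} \;=\; Y^{\max}_{h} \;-\; \sum_{k=1}^{K} \ell_{h,k}\!\left(s_{e,k}\right),
\qquad \ell_{h,k}(\cdot) \ge 0 .
```

Under this reading, a hybrid that is tolerant to factor $k$ is one whose loss function $\ell_{h,k}$ grows slowly as $s_{e,k}$ increases.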
Can environmental data be aggregated into useful metrics representing stresses encountered by corn throughout a growing season? Can these metrics be used to discriminate between hybrids tolerant and susceptible to the stresses they represent?
Some potential environmental stresses that can have a negative effect on yield are poor weather (heat, drought, cold, etc.), soil lacking nutrients, insect damage or pathogens. The degree of each stress and how resistant a particular hybrid is to the stresses encountered will determine how much the yield is impacted. In addition, certain stresses, when faced at the same time, can have a stronger impact than the combined individual stresses.
A strong understanding of how a hybrid reacts when facing certain stresses (and combined stresses) could be a powerful tool for developing hybrids for regions that are less hospitable for corn, allowing farmers the potential to productively grow corn where currently it is challenging. Furthermore, individual farmers benefit from having access to this type of information because they can better manage risk across their acres.
Using feature engineering on environmental data (daily weather, soil, plant/harvest dates, any other available data), develop metrics representing the amount of stress that corn would face in any particular environment across a growing season. The objective is to individually model heat stress, drought stress, and stress due to the combination of heat and drought. Each stress depends on the weather at each location, but its impact can also vary with soil type and with when the stress occurs during the growing season. These stresses are not the only factors affecting yield but, typically, the higher the stress, the lower the yield tends to be.
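As a starting point, the kind of metric described above might be sketched as follows. This is a minimal illustration and not a recommended definition: the weather field names `TMAX` and `PRCP`, the 30 °C heat threshold, and the 1 mm dry-day threshold are all assumptions made for the example (only `DAY_NUM` comes from the dataset key), and soil properties and stress timing are ignored.

```python
from datetime import date

# Assumed thresholds, chosen only for illustration.
HEAT_THRESHOLD_C = 30.0   # daily max temperature above this counts as heat stress
DRY_DAY_MM = 1.0          # daily precipitation below this counts as a dry day


def day_of_year(d):
    """Convert a date to the day number within its year (January 1 is day 1)."""
    return d.timetuple().tm_yday


def stress_metrics(weather_rows, plant_date, harvest_date):
    """Aggregate daily weather between planting and harvest into three metrics.

    weather_rows: iterable of dicts with DAY_NUM plus the hypothetical
    numeric fields TMAX (deg C) and PRCP (mm/day) for one environment.
    """
    start, end = day_of_year(plant_date), day_of_year(harvest_date)
    heat = 0.0       # accumulated degrees above the heat threshold
    dry_days = 0     # number of days with almost no precipitation
    combined = 0.0   # heat excess accumulated on dry days only
    for row in weather_rows:
        if not (start <= row["DAY_NUM"] <= end):
            continue  # outside the growing season
        excess = max(0.0, row["TMAX"] - HEAT_THRESHOLD_C)
        is_dry = row["PRCP"] < DRY_DAY_MM
        heat += excess
        if is_dry:
            dry_days += 1
            combined += excess
    return {"heat": heat, "drought": dry_days, "heat_x_drought": combined}


# Tiny made-up example: four days in season, one day before planting.
weather = [
    {"DAY_NUM": 119, "TMAX": 35.0, "PRCP": 0.0},
    {"DAY_NUM": 121, "TMAX": 32.0, "PRCP": 0.0},
    {"DAY_NUM": 122, "TMAX": 28.0, "PRCP": 5.0},
    {"DAY_NUM": 123, "TMAX": 33.5, "PRCP": 2.0},
    {"DAY_NUM": 124, "TMAX": 25.0, "PRCP": 0.5},
]
example = stress_metrics(weather, date(2017, 5, 1), date(2017, 5, 4))
print(example)
```

A fuller metric would likely weight days by growth stage and adjust the drought term for soil properties such as AWC, since the text notes that both timing and soil type change the impact of a stress.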
A sub-analysis that can be done at this step is measuring the impact of the interaction of heat stress and drought stress. Can the yield loss due to these stresses be explained by the individual contributions of heat and drought stress, or does the interaction of the two stresses significantly contribute to yield loss?
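One common way to frame this question is a regression with an interaction term. The form below is only a sketch: $H_e$ and $D_e$ stand for whatever heat and drought metrics are defined for environment $e$, and the coefficients and the linear form are assumptions introduced here.

```latex
Y_{h,e} \;=\; \beta_0 \;+\; \beta_H H_e \;+\; \beta_D D_e \;+\; \beta_{HD}\, H_e D_e \;+\; \varepsilon_{h,e}
```

If the estimate of $\beta_{HD}$ is significantly below zero, the combined stress reduces yield by more than the sum of the individual heat and drought effects; if it is indistinguishable from zero, the individual contributions are sufficient to explain the loss.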
Using the stress metrics developed in Objective #1, classify hybrids as either tolerant or susceptible to each type of stress using the hybrid’s yield across different environments. One possible way of doing this is to regress yield linearly against each stress and classify hybrids based on the slope of that regression line. You are encouraged to use more complex or non-linear models in order to build a better classifier.
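The slope-based idea can be sketched as follows. This is a minimal illustration under stated assumptions: the observations are made-up `(hybrid, stress, yield)` triples, and the cutoff (the median slope across hybrids) is an arbitrary choice made for the example, not something specified by the challenge.

```python
import statistics


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_hybrid(observations):
    """Fit one yield-vs-stress line per hybrid.

    observations: iterable of (hybrid_id, stress, yield) tuples.
    Hybrids observed at a single stress level are skipped because
    their slope is undefined.
    """
    grouped = {}
    for hybrid_id, stress, yld in observations:
        xs, ys = grouped.setdefault(hybrid_id, ([], []))
        xs.append(stress)
        ys.append(yld)
    return {
        h: ols_slope(xs, ys)
        for h, (xs, ys) in grouped.items()
        if len(set(xs)) > 1
    }


def classify_by_slope(slopes):
    """Label hybrids whose yield drops least with stress as tolerant.

    The median slope is used as an assumed, illustrative cutoff.
    """
    cutoff = statistics.median(slopes.values())
    return {
        h: "tolerant" if s >= cutoff else "susceptible"
        for h, s in slopes.items()
    }


# Made-up observations: (hybrid, stress level, yield).
observations = [
    ("H1", 0, 200), ("H1", 1, 195), ("H1", 2, 190),
    ("H2", 0, 200), ("H2", 1, 180), ("H2", 2, 160),
    ("H3", 0, 210), ("H3", 1, 200), ("H3", 2, 190),
]
slopes = slopes_by_hybrid(observations)
labels = classify_by_slope(slopes)
print(slopes, labels)
```

In practice it may help to regress yield relative to ENV_YIELD_MEAN rather than raw yield, so that a hybrid is not penalized simply for being tested in low-yielding environments.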
- Each stress does not necessarily need to be represented by a single metric. The analysis will become more complex as more variables are added.
- Objective #1 can be completed using supervised methods. Increased stress should correlate with decreased average yield across locations.
- Objective #2 must be completed with unsupervised methods. No dataset will be provided that classifies any set of hybrids as tolerant or susceptible to any stress, though we will be using internal data to evaluate your classifications.
- An example of a similar analysis can be found in this paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5061753/). It only covers drought tolerance and uses supervised methods on a much smaller labeled dataset. Some of the techniques used are not generally applicable to the case presented here, but it does provide some context to understand the problem.
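Because no tolerance labels are provided, one simple unsupervised option is to cluster a per-hybrid response score into two groups. The sketch below runs a one-dimensional two-means clustering; the scores are hypothetical yield-versus-stress slopes invented for the example, and both the two-cluster assumption and the choice of score are illustrative rather than prescribed.

```python
def two_means_1d(values, max_iterations=100):
    """Cluster numbers into two groups; return (low_center, high_center)."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("need at least two distinct values to form two clusters")
    for _ in range(max_iterations):
        # Assign every value to its nearest center (ties go to the low group).
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):
            break  # assignments no longer change
        low, high = new_low, new_high
    return low, high


def label_hybrids(scores):
    """Label each hybrid by the cluster its score falls into.

    scores: dict of hybrid_id -> slope of yield against stress.
    A less negative slope means less yield loss per unit of stress,
    so the higher cluster is labeled tolerant.
    """
    low, high = two_means_1d(list(scores.values()))
    return {
        h: "tolerant" if abs(s - high) < abs(s - low) else "susceptible"
        for h, s in scores.items()
    }


# Hypothetical slopes for five hybrids.
scores = {"H1": -4.0, "H2": -5.0, "H3": -19.0, "H4": -21.0, "H5": -6.0}
centers = two_means_1d(list(scores.values()))
tolerance = label_hybrids(scores)
print(centers, tolerance)
```

With more than one metric per stress, the same idea extends to multivariate clustering, at the cost of the added complexity noted in the first hint.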
- Definition and interpretation of stress metrics (heat, drought, combined heat and drought)
- Classifications of stress tolerance (heat, drought, combined heat and drought) for all hybrids.
Additionally, following the standards for academic publication, entries should include:
- Quantitative results to justify your modeling and classification techniques
- A clear description of the methodology and theory used
- References or citations as appropriate
The entries will be evaluated based on:
- Novel ideas used to define stress metrics and classify hybrids for stress tolerance
- How well your classifications agree with Syngenta’s internal knowledge of hybrid stress tolerance
- Simplicity and intuitiveness of the solution
- Evaluation of factors included in the decision process
- Clarity in the explanation
- The quality and clarity of the finalist’s presentation at the 2019 INFORMS Conference on Business Analytics and Operations Research
You are provided with the following training datasets to create stress models and classify hybrids.
- Performance Dataset: This dataset contains the observed yields from the tests (trials) of hybrids. Each row represents one observation for one hybrid at a given location and year. Performance data for 2452 hybrids in 1560 locations is provided from 2008 to 2017. In addition, plant date, harvest date, and irrigation status are included for each observation, along with information about the location such as average yield and soil properties (sourced from ISRIC). The ‘performance dataset’ needs to be aligned with the ‘weather dataset’ by ENV_ID (which is a unique identifier combining latitude, longitude and year). (performance_data.csv)
- Weather Dataset: This dataset (sourced from Daymet) contains the recorded weather for each environment in which any hybrids were tested. Across the growing region, differences in weather conditions and soil types will cause variation in a hybrid’s observed performance, as well as a difference in the observed average yield of all hybrids tested in a location. Weather data is included in daily increments, labeled by the day number within the year (e.g. January 1 is day 1, December 31 is day 365 in non-leap years). This dataset needs to be aligned with the ‘performance dataset’ by ENV_ID (which is a unique identifier combining latitude, longitude and year). (weather_data.csv)
- Key for Datasets: This table provides the meaning of each variable in the two datasets.
|Dataset|Variable|Description|
|---|---|---|
|Performance Dataset|HYBRID_ID|ID for each hybrid in dataset|
||ENV_ID|ID for each environment in dataset|
||HYBRID_MG|Maturity group of hybrid – a higher number indicates a longer growing season needed to reach maturity|
||ENV_MG|Typical maturity group of environment – a higher number indicates a longer growing season with more growing degree days; this can vary due to weather in any given year|
||YIELD|Yield of hybrid in environment|
||PLANT_DATE|Plant date for this observation|
||HARVEST_DATE|Harvest date for this observation|
|||Whether field was irrigated: NULL – unknown irrigation; NONE or DRY – no irrigation; ECO – very light irrigation; LIRR – light irrigation; IRR – normal irrigation|
||ENV_YIELD_MEAN|Mean yield for ENV_ID|
||ENV_YIELD_STD|Standard deviation of yield for ENV_ID|
||ELEVATION|Elevation of field|
||CLAY|% of clay in soil|
||SILT|% of silt in soil|
||SAND|% of sand in soil|
||AWC|Available water capacity in soil|
||PH|pH of soil|
||OM|Organic matter in soil|
||CEC|Cation exchange capacity of soil|
||KSAT|Saturated hydraulic conductivity of soil|
|Weather Dataset|ENV_ID|ID for each environment in dataset|
||DAY_NUM|Day number within year of weather variables|
||SWE|Snow water equivalent|
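The alignment of the two datasets by ENV_ID can be sketched with the standard library alone. The example below is a minimal illustration that reads from small in-memory CSV strings; the sample rows and the `Env_1`/`Env_2` identifiers are made up, and only column names that appear in the key (ENV_ID, HYBRID_ID, YIELD, DAY_NUM, SWE) are used.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for performance_data.csv and weather_data.csv.
PERFORMANCE_CSV = """HYBRID_ID,ENV_ID,YIELD
H1,Env_1,190.5
H2,Env_1,175.0
H1,Env_2,160.2
"""

WEATHER_CSV = """ENV_ID,DAY_NUM,SWE
Env_1,2,0
Env_1,1,4
Env_2,1,0
"""


def index_weather_by_env(weather_rows):
    """Group daily weather rows by ENV_ID, sorted by day number."""
    by_env = defaultdict(list)
    for row in weather_rows:
        by_env[row["ENV_ID"]].append(row)
    for rows in by_env.values():
        # csv yields strings, so convert DAY_NUM before sorting.
        rows.sort(key=lambda r: int(r["DAY_NUM"]))
    return by_env


def attach_weather(performance_rows, weather_by_env):
    """Pair each performance observation with its environment's daily weather."""
    return [
        (perf, weather_by_env.get(perf["ENV_ID"], []))
        for perf in performance_rows
    ]


# With the real files, replace io.StringIO(...) by open(path, newline="").
performance = list(csv.DictReader(io.StringIO(PERFORMANCE_CSV)))
weather = list(csv.DictReader(io.StringIO(WEATHER_CSV)))

weather_by_env = index_weather_by_env(weather)
joined = attach_weather(performance, weather_by_env)
for perf, days in joined:
    print(perf["HYBRID_ID"], perf["ENV_ID"], len(days))
```

Indexing the weather once and looking it up per observation avoids duplicating the daily rows for every hybrid tested in the same environment.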
- January 18, 2019: Deadline for submissions
- April 14-16, 2019: Finalist presentations and winner announcement
- Adee, E., Roozeboom, K., Balboa, G. R., Schlegel, A., & Ciampitti, I. A. (2016). Drought-Tolerant Corn Hybrids Yield More in Drought-Stressed Environments with No Penalty in Non-stressed Environments. Frontiers in Plant Science, 7, 1534. http://doi.org/10.3389/fpls.2016.01534
- Hengl T, Mendes de Jesus J, Heuvelink GBM, Ruiperez Gonzalez M, Kilibarda M, Blagotić A, et al. (2017) SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 12(2): e0169748. https://doi.org/10.1371/journal.pone.0169748
- Thornton, P.E., M.M. Thornton, B.W. Mayer, N. Wilhelmi, Y. Wei, R. Devarakonda, and R.B. Cook. (2014). Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 2. ORNL DAAC, Oak Ridge, Tennessee, USA. http://dx.doi.org/10.3334/ORNLDAAC/1219