Management sensitivity, repeatability, and consistency of interpretation of soil health indicators on organic farms in southwestern Ontario

Abstract: Assessment tools are needed to evaluate the effect of farming practices on soil health, as there is increasing interest from growers to improve the health of their soils. However, there is limited information on the efficacy of different soil health indicators on commercial farms and perhaps less so on organic farms. To assess efficacy, three organic growers in cooperation with the Ecological Farmers Association of Ontario’s Farmer-Led Research Program tested management sensitivity, measurement repeatability, and consistency of interpretation of different soil health indicators. On each farm, we compared permanganate-oxidizable carbon (active carbon), organic matter, wet aggregate stability, phospholipid fatty acid analysis, Haney soil health test, and Haney nutrient test on one field of grower-perceived high productivity, one field of grower-perceived low productivity, and one reference site (undisturbed, permanent cover). Our results were consistent with previous research that showed grower perception of productivity and soil health associated with management-sensitive soil health indicators. Of the indicators tested, active carbon was the only indicator that was sensitive, repeatable, and consistent across the three farms, and soil organic matter was highly repeatable and consistent to detect differences greater than 0.5% organic matter. This study highlights differences among soil health indicators on commercial farms, and it concludes that active carbon and organic matter were the most useful soil health indicators for these organic farms. Participating growers intend to use results to benchmark current soil status and to help guide land management decisions towards improved soil health.


Introduction
Farmers, with their growing understanding of the role that soil plays in productivity, resilience to weather extremes, increased water quality, and mitigation of soil erosion, are interested and motivated to adopt practices that increase soil health (Roesch-McNally et al. 2017). To meet this demand, researchers have developed and refined field and laboratory methods that go beyond the standard chemical tests for nutrient status assessment and estimate the biological and physical status of the soil. With these new soil health indicators identified, the challenge now is to bring this knowledge out of the laboratory and to the farm.
On organic compared with conventional farms, differences in nutrient and soil management likely affect the efficacy of soil health indicators (Delate et al. 2013; Fine et al. 2017). For example, nutrient reserves from manure application in organic systems may be less readily available than synthetic fertilizer inputs while contributing more to microbial biomass carbon (C) and nitrogen (N) (Kallenbach and Grandy 2011) and soil C stocks (Stockdale and Watson 2009). Other organic amendments like compost and mulch addition can increase microbial biomass and activity (Cong Tu et al. 2006) and result in lower net mineralization rates. Full-inversion tillage can decrease soil C stocks (Van Eerd et al. 2014) and aggregate stability (e.g., Kasper et al. 2009), and frequent tillage is often used in organic systems for weed control. This negative effect of tillage, however, can be offset with diversified crop rotations (Davis et al. 2012; Congreves et al. 2017) that are common in organic systems (Watson et al. 2006) and support higher levels of microbial biomass, greater production of available N, and agronomic performance (Cong Tu et al. 2006; Davis et al. 2012; Congreves et al. 2017; King and Hofmockel 2017). Soil health indicators, therefore, should be assessed on grower fields and, for growers using organic practices, on fields that are under organic management.
To select the most useful soil health indicators for their organic farms, three growers in cooperation with the Ecological Farmers Association of Ontario's Farmer-Led Research Program assessed how well different soil health indicators identified, or benchmarked, low-productivity fields relative to high-productivity fields, with the goal of helping to guide land management decisions with respect to improving soil health. They compared soil health indicators in fields of grower-perceived high productivity, grower-perceived low productivity, and a reference area (i.e., treed fence row).
Growers based these perceptions on observations of biomass and species composition in pastures, and yield in hay fields and vegetable fields. Previous studies show that growers' perception of soil quality correlates with management-sensitive soil health indicators (Gruver and Weil 2007). Our goal was not to validate whether the sites were healthy per se but to use grower expertise to identify sites with differing soil health to assess soil health indicators and characterize different indicators for use in soil health benchmark studies in the future.
Specifically, growers' research questions were as follows: What soil health indicator is most useful on my organic farm? What indicator is worth spending money on? What will help me track progress from a "bad" field to a "good" field? To answer these questions and determine overall usefulness, we assessed different soil health indicators for (i) effectiveness, including management sensitivity (i.e., how well an indicator distinguishes between fields); (ii) measurement repeatability, including level of precision (i.e., how reliably an indicator distinguishes between fields); and (iii) consistency of interpretation (e.g., more is better). These are three of four criteria used by the United States Department of Agriculture (USDA) to recommend soil health indicators and associated laboratory procedures (USDA 2018). We did not assess the fourth USDA criterion, production readiness, because all of the soil health indicators used in this study are currently available from accredited soil-testing laboratories. We hypothesized that soil health indicators would differ in their ability to detect growers' perceived differences in productivity. More specifically, we hypothesized that measurements associated with microbial activity (i.e., active C, respiration, and mineralization) would be more sensitive but less repeatable than measurements of C pool size (e.g., organic matter). To our knowledge, this is the first study to test the effectiveness, repeatability, and consistency of interpretation of different soil health indicators on grower fields and under organic management.

Field selection
In June 2016, we measured soil health parameters on three organic farms located near Dundalk (lat 44.18), Lucknow (lat 43.90), and St. Thomas (lat 42.71) in southwestern Ontario, Canada. Each farm had two or three sites: an undisturbed reference site (R), a field of grower-perceived high productivity (H; all farms), and a field of grower-perceived low productivity (L; Dundalk and Lucknow only). At all three farms, R sites were located adjacent to the H site and consisted of a treed fence row with perennial grass that had been uncultivated for at least two decades. The R site served as an approximation of the soil health potential at each farm.
At Dundalk, the two fields sampled were certified organic hay under similar management. At Lucknow, both fields were certified organic for 29 yr. The L site was used as pasture and hay for more than 20 yr, and the H site was used as pasture. Both pasture sites were rotationally grazed with cattle, draft horses, swine, and poultry. At St. Thomas, we sampled only one organic vegetable field (H) and a treed fence row (R). In 2016 and 2014, the vegetable field was planted to a variety of vegetables to supply a weekly CSA (community-supported agriculture). In 2015 and 2013, it was planted to a series of cover crops (2015: mix of oat/barley/pea, followed by soybean/millet/sunn hemp/sunflower, then the field was split into three sections of daikon radish, oat/barley/pea, and winter cereal rye/hairy vetch for the fall and winter; 2013: mustard followed by one section of buckwheat and another section of winter cereal rye undersown with red clover). These different cover crops were selected to complement specific following vegetable crops, and the sections were randomly sampled to evaluate soil health indicators on the entire H field and not specific cover crops or land management.

Soil sampling
We sampled soil on 22 June 2016 at St. Thomas and 23 June 2016 at Dundalk and Lucknow. Soil at St. Thomas was loam to sandy loam (Brunisolic; Gray Brown Luvisol), soil at Dundalk was a silt loam (Podzol; Gray Brown), and soil at Lucknow was a loam to silt loam (Podzol; Gray Brown). To be reflective of the site, we sampled 10 random but representative (i.e., avoided headlands, depressions, etc.) locations in each site to form five composite samples. To account for the variability of each soil parameter tested, at each location, we took five replicate soil cores (2.2 cm diameter) to 15 cm depth within 1 m of each other and pooled over 10 locations in each site (Fig. 1). This resulted in a total of 15 soil samples [three sites (H, L, and R) by five replicates] each at Dundalk and Lucknow and 10 samples (only two sites H and R) at St. Thomas. To avoid contamination between sites, a soil core was taken at the new site but discarded. All composite soil samples were hand homogenized, bagged, placed on ice in a cooler while in the field, and stored at 4°C. Less than 5 d after sampling, samples were hand homogenized, divided into two, and shipped on ice to the laboratories using overnight courier service.
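The sampling arithmetic described above can be checked with a short sketch. The dictionary layout and variable names are illustrative assumptions, not part of the study's workflow; the counts themselves are taken from the text.

```python
# Sites sampled at each farm, as described in the text.
sites_per_farm = {
    "Dundalk": ["R", "H", "L"],
    "Lucknow": ["R", "H", "L"],
    "St. Thomas": ["R", "H"],
}

LOCATIONS_PER_SITE = 10  # random but representative locations per site
CORES_PER_LOCATION = 5   # replicate cores taken within 1 m of each other

# Core k from every location goes into composite bag k, so each site yields
# as many composites as there are cores per location, and each composite
# holds one core from every location.
composites_per_site = CORES_PER_LOCATION
cores_per_composite = LOCATIONS_PER_SITE

samples_per_farm = {
    farm: len(sites) * composites_per_site
    for farm, sites in sites_per_farm.items()
}
total_samples = sum(samples_per_farm.values())

print(samples_per_farm)
print(total_samples)
```

Under this design every composite integrates spatial variation across the whole site, so differences among the five composites mostly reflect within-location and analytical variability.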

Soil health indicators
For all 40 soil samples, Ward Laboratories, Inc. (Kearney, NE, USA) analyzed for the Haney soil health test and nutrient test (Haney et al. 2012). These consist of many parameters including soil health indicators: Solvita® CO2-Burst (soil respiration as a proxy for general microbial activity), water-extractable organic C (WEO C; readily available C pool as a proxy for C food source), water-extractable organic N (WEO N; available organic N pool as a proxy for bioavailable N), soil organic matter (%; soil organic C content as a proxy for organic matter cycling), as well as weak-acid-extracted nutrients (Haney et al. 2010). Briefly, Ward Laboratories, Inc. dried the soil at 50°C and sieved it (2 mm). The Solvita® CO2-Burst was a 24 h incubation to quantify respiration from 40 g soil in a 50 mL container with a perforated bottom to allow rewetting with 25 mL distilled water in a 250 mL glass jar sealed with a Solvita® paddle. After 24 h, CO2-C was quantified with a Solvita® digital reader and expressed as mg CO2-C kg−1. Ward Laboratories, Inc. quantified WEO C and WEO N by shaking a 4 g soil subsample with 40 mL of distilled water for 10 min and filtering with a Whatman 2V. The laboratory used a weak acid (H3A) extraction as described by Haney et al. (2010) to quantify NH4-N, NO3-N, and PO4-P by flow injection analysis (Lachat QuikChem 8000, Milwaukee, WI, USA) and analyzed concentrations of phosphorus (P), potassium (K), calcium (Ca), iron (Fe), and aluminum (Al) via inductively coupled plasma atomic emission spectroscopy (Thermo Scientific Inc., Waltham, MA, USA). Other soil parameters quantified included pH (1:1 v/v) and organic matter via the loss-on-ignition method. Finally, the laboratory calculated additional indicators as part of the Haney soil health test and nutrient test, including traditional value, nutrient value, N difference, and N savings (Ward Laboratories, Inc. 2019).
Fig. 1. (a) At each site [i.e., field of grower-perceived high (H) or low (L) productivity or reference site (R)], we sampled five cores from 10 random but representative locations. (b) At a sampling location, we placed each core into one of five bags to form five composite samples such that every composite sample comprised a core from each of the 10 sampling locations. In total, we had 15 composite samples from Dundalk, 15 composite samples from Lucknow, and 10 composite samples from St. Thomas.
Ward Laboratories, Inc. also analyzed St. Thomas soil for phospholipid fatty acid analysis (PLFA; microbial community structure as a proxy for microbial diversity) using the chloroform extraction method (Bligh and Dyer 1959; Hamel et al. 2006). The laboratory assigned biomarkers to microbial groups accordingly.
Cornell Nutrient Analysis Laboratory (Ithaca, NY, USA) analyzed all 40 samples for active C (readily available C pool as a proxy for C food source) and wet aggregate stability (aggregation as a proxy for soil structural stability) (Idowu et al. 2009). Briefly, the Cornell Nutrient Analysis Laboratory quantified active C using 5 g dry soil via a permanganate oxidation method adapted from Weil et al. (2003). With 40 g dry soil, the laboratory determined wet aggregate stability of peds of 0.25-2 mm using a rainfall simulation method (Ogden et al. 1997).
The aforementioned indicators of soil health were chosen because previous Ontario studies showed that they were sensitive to management (Van Eerd et al. 2014; Congreves et al. 2015; Chahal and Van Eerd 2018). They are commonly used tests that measure aspects of the soil microbial community (e.g., activity, structure, and function), which was of interest to the farmers in this study. As well, these indicators are readily available to farmers in Canada and the USA. As farmer-led research, this project was designed around the growers' questions. In this case, only one grower was interested in specifically testing microbial community size and structure (i.e., PLFA), and due to budget constraints, we could test only one field (H) compared with the R site.

Statistical analysis
Statistical analysis was conducted using R version 3.0.2 (R Core Team 2013). Each farm was analyzed separately and interpreted for consistency. To assess management sensitivity at Dundalk and Lucknow, we used a one-way analysis of variance for each soil health indicator. With only two sites at the St. Thomas farm, we tested management sensitivity using a paired t-test on individual parameters. Differences were considered significant at P < 0.05.
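As an illustration of the one-way analysis of variance used for each indicator, the following minimal sketch computes the F statistic for three sites with five composite samples each. It is not the authors' R code; the function name and the data values are hypothetical and chosen only to show the calculation.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per site."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of site means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own site mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical indicator values (mg kg-1) for R, H, and L sites, n = 5 each.
reference = [900, 920, 880, 910, 890]
high = [760, 740, 750, 770, 730]
low = [640, 660, 650, 630, 670]

f_stat, df_b, df_w = one_way_anova_f([reference, high, low])
print(round(f_stat, 2), df_b, df_w)
```

Converting F to a P value requires the F distribution with (2, 12) degrees of freedom, which a statistics package would supply; that step is omitted here.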
To assess the consistency of interpretation of each soil health indicator, we used Tukey's honestly significant difference test to rank the fields on each farm. We assumed that soil health indicators would be highest in the reference site, followed by the field of grower-perceived high productivity and the field of grower-perceived low productivity.
To assess the measurement repeatability of each soil health indicator, we conducted power analyses for each field. As a further measure of variability, we calculated coefficients of variation (CV) for each site at each farm. We estimated effect size and standard deviation from each farm, which we used to calculate the sample size needed to achieve a power of 0.95 and 0.9 for soil health indicators included in the Haney test, wet aggregate stability, active C, and markers from the PLFA. For Dundalk and Lucknow sites, we conducted power analysis between H and L sites. For the St. Thomas site, which did not have an L site, we conducted power analysis between H and R sites.
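The two repeatability measures can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a normal approximation for a two-sided, two-sample comparison rather than the authors' (unspecified) R routine, the function names are hypothetical, and the example effect size and standard deviation are invented for demonstration. A t-based calculation would typically return somewhat larger numbers when few replicates are needed.

```python
from math import ceil
from statistics import NormalDist, mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def replicates_needed(delta, sd, power=0.95, alpha=0.05):
    """Approximate replicates per site to detect a difference `delta`
    between two sites with common standard deviation `sd`."""
    if delta <= 0 or sd <= 0:
        raise ValueError("delta and sd must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * sd / delta) ** 2
    return max(2, ceil(n))


# Hypothetical example: a 0.5 percentage-point difference in organic matter
# between two fields, with a within-field standard deviation of 0.3.
n_95 = replicates_needed(delta=0.5, sd=0.3, power=0.95)
n_90 = replicates_needed(delta=0.5, sd=0.3, power=0.90)
cv = coefficient_of_variation([10, 12, 8])
print(n_95, n_90, round(cv, 2))
```

The formula makes the trade-off explicit: the required number of replicates grows with the square of the ratio of variability to effect size, so an indicator can need many replicates either because it is noisy or because the two fields are genuinely similar.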
At Dundalk and Lucknow, we used non-metric multidimensional scaling for all parameters to visualize how the soil health indicators separated by site (R, H, and L), which we tested statistically using a permutational multivariate analysis of variance (PERMANOVA). There were not enough data points to ordinate samples at St. Thomas. To compare relationships among indicators, Spearman's correlation coefficients were calculated for all indicators using the full dataset.
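For readers unfamiliar with Spearman's correlation, the sketch below computes it as the Pearson correlation of average ranks and also checks the number of indicator pairs. The helper names and the example vectors are hypothetical; the figure of 406 pairings comes from the Results and corresponds to 29 indicators, since 29 × 28 / 2 = 406.

```python
from math import comb, sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


n_pairs = comb(29, 2)  # unordered pairs among 29 indicators
rho_up = spearman([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
rho_down = spearman([1, 2, 3], [3, 2, 1])
print(n_pairs, rho_up, rho_down)
```

Because it works on ranks, the coefficient captures any monotonic association and is less affected by the skewed distributions common in soil data.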

Sensitivity to management
As a component of its effectiveness, a soil health indicator must be sensitive to changes in soil and crop management systems. When interpreted together, the combination of all soil health indicators differentiated among the two hay fields and reference site at Dundalk, among the two pastures and reference site at Lucknow (PERMANOVA: P < 0.01 for both; Fig. 2), and between the vegetable field and R site at St. Thomas (PERMANOVA: P < 0.01). Thus, at all three farms, the sampled sites (R, H, and, where present, L) differed in soil health. Because it is generally cost prohibitive for a grower to pay for multiple soil health indicators, we also assessed the sensitivity of individual soil health indicators.
When analyzed separately, active C, nitrate, and total P and K differentiated between production fields and reference area at all three farms (Tables 1-3). For PLFA, which was measured only at St. Thomas, the PLFA diversity index; total PLFA biomass; biomass of total bacteria, Actinomycetes, Gram-negative (Gram−) bacteria, Gram-positive (Gram+) bacteria, total fungi, arbuscular mycorrhizal fungi, saprophytic fungi, and protozoa; undifferentiated PLFA markers; the Gram+:Gram− ratio; and the ratio of total saturated to unsaturated PLFA markers all differed between the R site and the vegetable production field (H; Table 4).

Indicator repeatability
Using power analysis, we estimated the number of replicates needed to produce data that would be repeatable, which is important for guiding land management decisions. At Dundalk and Lucknow, we conducted power analysis between H and L sites. For these, active C, WEO C/N, Haney test N, nitrate, Haney available N, organic N release, inorganic P, total P, Haney available P, Haney available K, traditional value, nutrient value, N difference, N savings, K, Ca, Fe, and Al all required only two to six replicates to achieve power of 95% and two to five replicates to achieve a power of 90% (Table 5).
Organic matter required three replicates at Lucknow and 15 at Dundalk to achieve a power of 95% (two and 13 replicates, respectively, at 90%). The low statistical power for organic matter at Dundalk was a result of similar averages between H and L sites (6.54 and 6.08, respectively) and not a result of high variation among replicates (see standard error values in Table 1). Furthermore, at Dundalk, the CV of organic matter was <0.08, whereas all other indicators at all farms had CV >0.08 for at least one field (data not shown). All other tested indicators required 6-100+ replicates on one of the farms to achieve a power of 90%.
At St. Thomas, where we compared H and R sites, active C, organic matter, wet aggregate stability, soil respiration, WEO C, WEO N, WEO C/N, Haney test N, nitrate, Haney available N, inorganic P, total P, Haney available P, and Ca required two to six replicates to achieve power of 95%. For PLFA, all biomass indicators (except rhizobia) as well as % Gram+, Gram+:Gram− ratio, and total saturated : total unsaturated ratio required only two to five replicates to achieve power of 95% and two to four replicates to achieve a power of 90% (Table 4). All other PLFA indicators required >16 and >13 replicates to achieve a power of 95% and 90%, respectively.

Consistency of interpretation
To turn differences in soil health indicators into land management decisions, the interpretation of a result should be unambiguously consistent (i.e., more is better such as with organic matter, less is better such as Na concentration, or optimal range such as pH). Of the indicators tested, active C ranked the reference area and production fields consistently on all three farms (R > H > L) (Tables 1-3). Organic matter also correctly ranked the fields on all three farms, but at Dundalk, organic matter in the production fields was too similar to distinguish statistically.
At Dundalk, Haney available K and K also ranked the sites from R to H and L. Soluble salts, Haney soil health indicator, WEO N, Haney test N, nitrate, N mineralization, Haney available N, organic N release, inorganic P, total P, Haney available P, organic P reserve, traditional value, nutrient value, N difference, N savings, and Ca were all greater at the H site than the L site, with the R site inconsistently or indistinguishably ranked (Table 1). Soil pH, wet aggregate stability, soil respiration, and WEO C were indistinguishable between H and L with R greater than or equal to H, except for pH, which was lower in the R site (Table 1). Ammonium, WEO C/N, and Al were greater in the L site compared with the H site, with R equal to H. Fe was greater in the L site compared with the H site, which was equal to the R site (Table 1).
At Lucknow, Haney soil health indicator, WEO N, Haney test N, nitrate, Haney available N, and organic N release also ranked the sites from R to H and L. Wet aggregate stability, WEO C, inorganic P, total P, Haney available P, organic P reserve, Haney available K, traditional value, nutrient value, N difference, N savings, K, and Fe were all greater at the H site than the L site, with the R site inconsistently or indistinguishably ranked (Table 2). Soil pH, soluble salts, Solvita® respiration, WEO C/N, ammonium, and N mineralization were indistinguishable between H and L sites at Lucknow with R greater or equal to the H site. Aluminum and Ca were greater in the L site compared with the H site, which was equal to the R site (Table 2).
At St. Thomas, wet aggregate stability, Solvita® respiration, WEO C/N, WEO C, N mineralization, organic N release, N savings, all biomass indicators (except rhizobia) as well as PLFA diversity index, % Gram+, Gram+:Gram− ratio, and total saturated : total unsaturated ratio were all greater in the reference area than the vegetable production field (Tables 3 and 4). Soluble salts, Haney test N, nitrate, organic P, total P, Haney available P, organic P reserve, traditional value, nutrient value, K, and Fe were all greater in the H site compared with the R site. Soil pH, Haney soil health indicator, WEO N, ammonium, Haney available N, Haney available K, N difference, Ca, and Al as well as all PLFA markers expressed as a percentage, rhizobia biomass, fungal to bacterial ratio, predator to prey ratio, monounsaturated to polyunsaturated PLFA marker ratio, and cyclopropyl (Pre18.cw.7c.cy19.0) markers were indistinguishable between R and H sites. In general, we were unable to detect differences at St. Thomas for other cyclopropyl (Pre161.w7.ccy.170) markers (data not shown).
Fig. 2. Non-metric multidimensional scaling indicating relationship of soil parameters of three sites sampled at Dundalk (top) and Lucknow (bottom) farms (at each farm n = 15; P < 0.01). H, field of high productivity (top left cluster in each plot); L, field of low productivity (cluster on right for Dundalk; cluster at bottom for Lucknow); R, undisturbed reference (treed fence row). Note: There were not enough data points to ordinate samples at the St. Thomas farm.

Associations among indicators
We computed Spearman's correlation coefficients for every pair of soil health indicators to better understand relationships among indicators (Supplementary Table S1). Out of the 406 pairings, 107 were significantly correlated (P < 0.05). Strong positive correlations occurred between active C and the following: organic matter (r = 0.853), water-stable aggregates (r = 0.706), soil respiration (r = 0.654), N mineralization (r = 0.654), WEO C (r = 0.652), WEO N (r = 0.615), N release (r = 0.615), Haney soil health test score (r = 0.502), N difference (r = 0.625), and N savings (r = 0.627). All of these indicators were also correlated with each other (Supplementary Table S1). In general, we observed nonsignificant or weak correlations between chemical indicators (i.e., nitrate, ammonium, K, Ca, Fe, and Al) and both physical and biological indicators.
Note: Mean and standard error (in brackets) are shown. For each site, indicators not sharing a lowercase letter differ significantly at the P < 0.05 level. C, carbon; N, nitrogen; P, phosphorus; K, potassium; Ca, calcium; Fe, iron; Al, aluminum. NA, not available because all values were at limit of detection.
a Units for each parameter were mg kg−1, unless stated otherwise.

Discussion
A useful soil health indicator that is effective for guiding land management decisions is one that responds to differences in soil health with sensitivity and repeatability and has consistency of interpretation.
With these considerations, we tested and compared multiple soil health indicators on three organic farms in southwestern Ontario, Canada. Our data demonstrate that soil health of each field was different and consistent with the growers' perceptions of soil health. Our data also support our general hypothesis that soil health indicators differ in their ability to detect differences in grower-perceived productivity and, therefore, provide validity to our approach to further investigate the sensitivity, repeatability, and consistency of interpretation of specific indicators.
Of the soil health indicators measured in this study, permanganate-oxidizable C (active C) was the only indicator that was sensitive, repeatable, and consistent, and it was also highly correlated with biological, physical, and some chemical indicators. Active C measures the labile and easily oxidizable pool of organic matter that is readily available as a microbial food source and reflects soil C stabilization practices (Hurisso et al. 2016). From this mechanistic perspective, active C should be consistently interpretable (i.e., more is better). Indeed, for all farms in this study, R sites had the greatest concentration of active C followed by H and L sites. Active C was also sensitive enough to distinguish among fields and required relatively few replicates (<5) to achieve statistical power of 95%. This result was consistent with assessments of active C across diverse agroecosystems in North America that showed that active C predicted agronomic performance better than other soil C fractions (Culman et al. 2012; Hurisso et al. 2016).
The usefulness of active C supports our hypothesis that measurements associated with microbial activity would be sensitive and further supports the growing body of literature showing that rhizosphere interactions related to labile C control soil health (Sokol et al. 2019). One potential downside to active C is that it may not be repeatable due to the inherent heterogeneity of substrate availability in soil, which results in localized microsites and dynamic process rates (i.e., "hotspots" and "hot moments"; Kuzyakov and Blagodatskaya 2015). In our study, however, active C was sensitive to differences among all fields on all three farms. This sensitivity was apparent even with a relatively high background soil C pool and a relatively high active C pool. Mean organic matter was >6% for all sites except the H site at St. Thomas, which was 3.34% organic matter. Active C was >600 mg kg−1, which corresponds to "high" or "very high" on the relative ranking of the Cornell soil health assessment (CASH).
Organic matter, which comprises active, labile C along with intermediate and stable forms of C, is related to C storage, water and nutrient retention, and biogeochemical cycling. In this study, organic matter consistently ranked the fields (i.e., R > H > L) at all farms. Although we could not detect a difference in organic matter between hay fields at Dundalk because they were very similar, this indicator was sensitive and repeatable for differences greater than 0.5%. Organic matter was also highly correlated with biological, physical, and some chemical indicators. The usefulness of organic matter as an indicator of soil health supports our hypothesis that measurements of C pool size would be repeatable, and our data identified a threshold of sensitivity of 0.5% organic matter.
Similar to the findings in our study, organic matter and active C were useful indicators in other studies examining CASH. In an assessment of over 5000 samples across the USA, active C, organic matter, and penetration resistance (not measured in our study) were the most useful soil health indicators (Fine et al. 2017). Across this broad group of sites, Fine et al. (2017) found that active C was the single best predictor of soil health and accounted for 45% of the variation in the dataset, whereas organic matter was also highly predictive (43%) and correlated with most biological indicators.
Active C and organic matter also had large weighting factors in an Ontario Soil Health Assessment done on conventional farms across southern Ontario (Congreves et al. 2015). Out of 13 soil health indicators in a principal components analysis, organic matter had the largest weighting factor (0.83) and active C the fourth largest (0.62), and both were also sensitive to crop rotation, with rotations including winter wheat and alfalfa scoring highest (Congreves et al. 2015).
Contrary to active C and organic matter, the Haney soil health test, chemical (i.e., pH and nutrient analyses), physical (i.e., wet aggregate stability), and calculated indicators from Haney nutrient test (e.g., nutrient value and N savings) measured in this study were not sensitive enough for us to discern consistent and repeatable differences among the fields tested. Although previous studies have documented the usefulness of water-stable aggregates (Fine et al. 2017), mineralization, and soil respiration (Campbell et al. 1997), in our study, these indicators had high variation among replicates that resulted in poor statistical power (CV > 0.08 in one or more fields; data not shown). Other than active carbon and organic matter, none of the other indicators tested were consistent in their interpretation across farms. For example, nitrate was ranked H > R > L at Dundalk, R > H > L at Lucknow, and H > R at St. Thomas.
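The idea of consistent interpretation can be expressed as a simple check of whether an indicator ranks the sites in the expected order (R > H > L) on every farm. The sketch below uses the orderings reported in the text for active C and nitrate; the function names are hypothetical, and the check deliberately ignores statistical ties (sites sharing a Tukey letter), which a fuller treatment would need to handle.

```python
EXPECTED_ORDER = ["R", "H", "L"]  # assumed ranking: reference > high > low


def matches_expected(observed, expected=EXPECTED_ORDER):
    """True if the observed ranking (highest first) agrees with the expected
    order, restricted to the sites that were actually sampled."""
    sampled = [site for site in expected if site in observed]
    return list(observed) == sampled


def consistent_across_farms(rankings_by_farm):
    return all(matches_expected(r) for r in rankings_by_farm.values())


# Rankings as reported in the text (St. Thomas had no L site).
active_c = {
    "Dundalk": ["R", "H", "L"],
    "Lucknow": ["R", "H", "L"],
    "St. Thomas": ["R", "H"],
}
nitrate = {
    "Dundalk": ["H", "R", "L"],
    "Lucknow": ["R", "H", "L"],
    "St. Thomas": ["H", "R"],
}

print(consistent_across_farms(active_c))
print(consistent_across_farms(nitrate))
```

An indicator that passes this check on every farm can be read the same way everywhere (more is better), which is what makes it usable for benchmarking.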
It is unexpected that weak-acid-extracted nutrients (Haney et al. 2010) were not consistent or sensitive enough to detect differences between H and L sites because these fields differed in grower perception of soil health as indicated by crop health, species composition of hay fields, and yield. Our research on organic farms using organic fertilizers supports the view that gross rates, but not net pools of nutrients, may differ with organic amendments compared with inorganic fertilizers (Flavel and Murphy 2006) and partially explains why the weak-acid-extracted nutrient pools neither accurately nor reliably reflected soil health. Similar to our finding, Fine et al. (2017) observed that CASH chemical indicators exhibited little correlation with biological and physical indicators. They attributed their finding to the fact that nutrient status of most agricultural soils in their study was independently managed through synthetic fertilizer and lime additions. There is much debate in the literature and within the agricultural community, including farmers, as to the usefulness (including sensitivity, repeatability, and consistency of interpretation) of commercial soil health tests such as the Haney soil health test, the CASH, and the Solvita® tests (Roper et al. 2017; Chahal and Van Eerd 2018). In part, it was this debate that motivated the growers involved in this farmer-led research project. Although we only evaluated the Haney soil health test, our combined results of sensitivity, repeatability, and consistency of interpretation suggest that active C and organic matter are useful indicators of soil health, whereas the extracted nutrients from the Haney nutrient test and the Haney soil health test are not. The lack of performance of these Haney tests may be due to the organic sources of nutrients used on the tested organic farms, but further research is needed.
Table 4. Average soil (0-15 cm) microbial community size and structure via phospholipid fatty acid analysis (PLFA) and power analysis to determine the soil sample number needed to achieve significance between reference treed fence row (R) and organic vegetable production field (H) at St. Thomas, ON, Canada. Note: Mean and standard error (in brackets) are shown. a Mono:poly ratio is the ratio of monounsaturated fatty acids to polyunsaturated fatty acids.
At the St. Thomas farm, the grower chose to analyze PLFA at a reference site compared with a vegetable field of high grower-perceived productivity. The PLFA reflects the metabolically active fraction of the microbial community and estimates the absolute abundance (size) and relative abundance (structure) of microbial groups as identified by fatty acid markers. Because data from this study are limited to one farm, we cannot draw conclusions about consistency of interpretation among farms. In terms of sensitivity and repeatability, results across all microbial markers showed that size was greater in the reference site compared with the production field. At the same time, there were only two detectable differences in microbial structure: Gram+:Gram− bacteria and the ratio of saturated to unsaturated biomarkers were both greater in the production field. Taken together, microbial community size at the St. Thomas farm was more useful as a soil health indicator than microbial community structure. This is consistent with previous research that found that microbial abundance was best predicted by soil C and nutrients (e.g., Kallenbach and Grandy 2011), whereas edaphic (e.g., pH, soil texture) and environmental factors (e.g., latitude, elevation, and soil surface temperature), which were not tested on the St. Thomas farm, best predict microbial community structure (Lauber et al. 2008; Xue et al. 2018). Given the differences observed on one farm, it would be worthwhile pursuing PLFA and other measurements of microbial abundance on other farms.

Conclusions
To empower farmers to build soil health, it is necessary to benchmark current status with an appropriate indicator of soil health. In this context, we analyzed different soil health indicators to answer growers' research questions around the most useful soil health indicator that will help them track progress from a "bad" field to a "good" field. Of the over 20 soil health indicators tested in this study, permanganate-oxidizable C (active C) was sensitive, repeatable, and consistent with respect to interpretation and, therefore, was the most useful indicator for these growers to use to track changes in soil health. Organic matter was also sensitive, repeatable, and consistent to detect differences greater than 0.5% organic matter. Although data on soil microbial communities were limited to one farm in this study, size but not structure was sensitive and repeatable. This is the first study to focus on soil health indicators on working organic farms. Results are limited to three farms from a single region in Ontario but are consistent with studies across diverse agroecosystems throughout North America, further supporting that the efficacy of active C and organic matter can be generalized across management systems and may be useful soil health indicators for future benchmark studies.