Spatial disaggregation clarifies the inequity in distributional outcomes of household solar PV installation

Global installations of household rooftop solar photovoltaics (PVs) are rapidly increasing, driven in many cases by government incentives. We show a direct correlation between economic resources and both PV penetration and average PV system size. Using datasets with a high areal resolution for socioeconomic indicators and household PV installations, we create an Index of Economic Resources for Owner-Occupiers for the Australian Capital Territory, Australia. Our analysis confirms the inaccuracy of using highly aggregated datasets to explore the socioeconomic outcomes of renewable energy policy, an inaccuracy long known in other disciplines. Analyses using such datasets are likely to overlook vulnerable areas, which could increase perverse policy outcomes. Published under license by AIP Publishing. https://doi.org/10.1063/1.5097424


I. INTRODUCTION
Global installations of small-scale solar photovoltaic (PV) systems on household rooftops (PV hereafter) have expanded rapidly. In most cases, this expansion has been supported by consumer incentives in the form of renewable generation certificates and/or feed-in tariffs (REN21, 2018), with fourteen high-income countries in Europe and the United States currently offering multiple incentives (IEA-PVPS, 2018). PVs create a variety of social benefits, both for the consumers who install them and for society more broadly, including emission reductions, less need for electricity system upgrades, and reductions in wholesale electricity prices (Passey et al., 2018).
However, some benefits of PVs accrue only to the households that install them, raising social equity concerns about the outcomes of government promotion of PV. These can include inequitable take-up of subsidies (Macintosh and Wilkinson, 2011); cross-subsidisation by other electricity consumers through increases in electricity prices (Chapman et al., 2016); and welfare costs to households that do not or cannot install PVs (Farrell and Lyons, 2016). There are, however, few analyses that quantitatively assess the socioeconomic and/or distributional outcomes of policy drivers of PVs (see, for example, Coffman et al., 2016; Poruschi and Ambrey, 2019; Grover and Daniels, 2017). Instead, much of the literature explores the socioeconomic correlates of installation, in order to understand what enables some households but not others to install PVs (see review by Karakaya and Sriwannawit, 2015), but which factors most strongly promote or prevent installation remains contentious.
Recent analyses (Grover and Daniels, 2017; Poruschi and Ambrey, 2019) have suggested that income or socioeconomic status is positively correlated with the degree of installation of PVs, here called "penetration" and defined as the percentage of households in a given spatial unit with PVs installed; others, however, suggest a negative correlation (Graziano et al., 2019; Sommerfeld et al., 2017). Barriers to installation can include a lack of home-ownership and a smaller house size (Sommerfeld et al., 2017), although a larger house size could be another facet of socioeconomic advantage rather than a separate explanatory factor (Balcombe et al., 2013). It has also been suggested that areas with high proportions of rental properties exhibit lower penetration (Graziano and Gillingham, 2015). Finally, Dharshing (2017) suggests that the large initial capital outlay on a PV system could explain the link found between socioeconomic status and penetration.
Nonetheless, two important limitations may affect the conclusions of many of these analyses, which are primarily "ecological" in their design (summarised in Table I). The first relates to the areal unit of analysis: the extent to which data for large population groups are used to infer findings that may be incorrect for smaller groups within that population, a problem of aggregation. A common error in spatial analyses, known as the "ecological fallacy" (Tranmer and Steel, 1998), is to assume that the average attributes of an area reflect those of the residents within it. A related effect, the modifiable areal unit problem, refers to the "statistical bias resulting from the sensitivity of the analytical results of spatial areal data to levels of aggregation" (Nelson and Brewer, 2017). Although several studies use such aggregated data to analyze the distributional impacts of PVs (for example, Coffman et al., 2016; Poruschi and Ambrey, 2019; Sommerfeld et al., 2017; van der Kam et al., 2018), analysis of large, internally heterogeneous spatial units is problematic. As the Australian Bureau of Statistics (ABS) (2018b; 31) notes, the use of aggregated census data "will mask some diversity at finer levels of disaggregation." Problems with analyzing correlations in highly aggregated data have been noted in the social science and public health literature for many decades (Robinson, 1950) and remain important today (Anselin et al., 2007).
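To make the ecological fallacy concrete, the following Python sketch uses invented numbers (not data from this study) for three hypothetical postcodes, each made up of three Mesh Blocks assumed to hold equal numbers of households. Within every postcode, penetration rises with the index score, yet the postcode averages line up in the opposite direction, so the sign of the correlation depends on the level of aggregation.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical Mesh Blocks grouped by postcode: (index score, PV penetration %).
# Every Mesh Block is assumed to contain the same number of households.
postcodes = {
    "A": [(900, 2.0), (1000, 6.0), (1100, 10.0)],
    "B": [(1000, 1.0), (1100, 5.0), (1200, 9.0)],
    "C": [(1100, 0.0), (1200, 4.0), (1300, 8.0)],
}

# Fine-scale analysis: pool all Mesh Blocks.
blocks = [mb for mbs in postcodes.values() for mb in mbs]
r_mesh = pearson_r([s for s, _ in blocks], [p for _, p in blocks])

# Aggregated analysis: one averaged point per postcode.
pc_scores = [mean(s for s, _ in mbs) for mbs in postcodes.values()]
pc_pens = [mean(p for _, p in mbs) for mbs in postcodes.values()]
r_postcode = pearson_r(pc_scores, pc_pens)

print(f"Mesh Block r = {r_mesh:.2f}, postcode r = {r_postcode:.2f}")
# Expected: Mesh Block r = 0.51, postcode r = -1.00
```

The same hypothetical households yield a positive correlation when analyzed as Mesh Blocks and a perfectly negative one when averaged into postcodes.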
Second, income is only one of a number of economic variables previously found to explain installations. Its use as a proxy for wealth or economic resources more generally (Graziano et al., 2019; van der Kam et al., 2018) is problematic because, as Graziano et al. (2019) themselves suggest, a negative relationship between income and installations could be due in part to income and wealth having become dissociated. Moreover, analyses that find a negative relationship between income and installation rates typically use areal units with a large population (Coffman et al., 2016; Sommerfeld et al., 2017; van der Kam et al., 2018). In those studies that use higher-resolution data (for example, the analysis by Graziano et al. (2019) at the block group level), income is the only variable related to socioeconomic advantage or disadvantage. There are, of course, exceptions. A recent Australian analysis that used highly aggregated data shows a positive relationship with income (Poruschi and Ambrey, 2019). Likewise, Grover and Daniels (2017) found a positive relationship between installations and socioeconomic status in the UK, although the authors acknowledge that a more disaggregated analysis would have been possible.
Consequently, this paper compares individual household PV installation data with a more comprehensive and spatially disaggregated index of economic resources (IER), in order to better understand the distributional outcomes of household PV. As Carley et al. (2018) suggest, "it is important to also document adverse effects of policies, not in an attempt to undermine their credibility or efficacy, but to better understand their limitations and unintended consequences." The paper does not, however, seek to further identify individual drivers, given the issues with their use, particularly income, outlined above. Our analysis is for the Australian Capital Territory (ACT, also known as Canberra; 2358 km²) (Fig. 1), a city of about 400 000 people with a PV penetration over 15%, one of the highest in the world (Australian Energy Council, 2016; Inderberg et al., 2018). In analyzing the effects of promoting renewable energy, we overcome the above limitations of data aggregation by using data at finer spatial scales than previous studies have used. Our aim is to add to a small but important literature on the distributional outcomes of PV installation. As Coffman et al. (2016; 1042) suggest, the "distributional impacts of policies supporting distributed power generation are of great concern."

II. DATA AND METHODS
Data for this study were of two types: socioeconomic data obtained from the Australian Bureau of Statistics (ABS), and PV installation data for individual households obtained from a private energy company. Installation data are not publicly available below the postcode level in Australia but were made available in this instance through a larger Australian Renewable Energy Agency project on solar forecasting, which involves 11 of the 15 Distribution Network Service Providers in Australia (ARENA, project: G00854). Census data from August 2016 were compared with the total installations in the ACT up to May 2016.
After removing Mesh Blocks (the smallest spatial unit used by the ABS, described below) with fewer than ten households, and one Mesh Block that included a very large retirement village, our analysis included 12 362 (70%) of the 17 556 systems installed by the end of 2016 (Clean Energy Regulator, 2019). The shortfall arises because the data supplied by the distribution company record fewer installations than the official figures. The mean installation size was 2.59 kW, with considerable variation in the size/capacity of installations (sd = 8.89 kW; range = 0.99-9.12 kW). The standard deviation is high, reflecting the very large variation in individual installation capacity.
Given previous Australian studies' use of postcode-level analysis, we first replicated the standard method by analyzing the spatial data at the postcode level. We tabulated the number of PV installations against the ABS Socio-Economic Indexes for Areas (SEIFA) Index of Economic Resources (IER) (ABS, 2018) for each postcode in the ACT. Since its creation in 1986 (ABS, 2018), the IER has been widely used in socioeconomic analyses (see, for example, Jablensky et al., 2000); the full methodology is given in ABS (2018). Here, the IER is employed as a postcode-level measure of poverty and prosperity. It is a composite measure of various indicators of wealth and income calculated from the individuals within a given area, so it provides a more complete picture of economic status than a single indicator (ABS, 2018). Because the index scores are an ordinal measure, they cannot be compared directly; for example, an area with a score of 1200 is not twice as well-resourced as an area with a score of 600. Instead, the index enables areas to be ordered hierarchically. The IER is standardized to a mean of 1000 and a standard deviation of 100, with higher scores indicating greater economic advantage.
We then improved on previous analyses by constructing our own index of socio-economic status, which is enhanced in two key ways. First, to mitigate the potential effects of the ecological fallacy and the modifiable areal unit problem, we constructed the new index at the level of the Mesh Block, the smallest building block from which the ABS constructs other geographical units, including postal areas. A Mesh Block contains 30-60 households and attempts to delineate neighborhoods that respect road and suburban boundaries and, where possible, contain dwellings with a homogeneous structural composition (ABS, 2016).
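The rescaling of an index to a mean of 1000 and a standard deviation of 100, and the reason such scores are read as ranks rather than ratios, can be illustrated with a short Python sketch. The raw scores are hypothetical, and the use of the population standard deviation is an assumption; the ABS procedure is documented in ABS (2018) and may differ in detail.

```python
from statistics import mean, pstdev

def standardize_index(raw_scores, target_mean=1000.0, target_sd=100.0):
    """Linearly rescale raw composite scores to a chosen mean and SD.

    Assumption: the population SD is used; the ABS procedure may differ.
    """
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [target_mean + target_sd * (x - m) / s for x in raw_scores]

def rank_order(scores):
    """Return area positions sorted from least to most advantaged."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

raw = [-1.8, -0.4, 0.1, 0.9, 1.2]  # hypothetical raw composite scores
scaled = standardize_index(raw)

# The rescaling is monotonic, so the ordering of areas is unchanged...
assert rank_order(raw) == rank_order(scaled)
# ...but ratios of scores carry no meaning: 1200 is not "twice" 600.
print([round(x) for x in scaled])
```

Only the ordering survives the transformation unchanged, which is why the scores support hierarchical ranking but not proportional comparison.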
The use of these microspatial units for socio-economic analysis has only been possible since the release of tabular census data for Mesh Blocks by the ABS in 2017. While the ABS defined 6393 Mesh Blocks in the ACT in 2016, 3805 Mesh Blocks were in scope for this analysis, each selected as having at least ten owner-occupied dwellings. The vast majority (85.0%) of the excluded Mesh Blocks contain nonresidential areas, mostly parkland and commercial areas.
Second, we removed rental dwellings from the analysis, because areas with many rental properties have been found to have lower penetration (Graziano and Gillingham, 2015). The usual explanation is the split-incentive problem: the risk and up-front costs of PV installation are borne by the landlord, while the benefits accrue to the tenant (Bird and Hernández, 2012). Consequently, households in either private rentals or social housing are systematically excluded from access to PVs, and they were likewise excluded from our analysis. Including this group in measures of postcode socio-economic status may skew that indicator for the area, especially where a "salt and pepper" social housing policy has distributed social housing dwellings widely, including in areas of relative affluence (see Musterd and Andersson, 2005). While some rented dwellings may have solar installations, their inhabitants will not be the consumers who installed them. In the ACT, 66.6% of households in the 2016 Census were owner occupied. The variables used to construct our index are listed in Table II. These variables are very similar to those used by the ABS for all households to calculate the existing Index of Economic Resources (IER) (ABS, 2018). As well as excluding Mesh Blocks with fewer than ten owner-occupied dwellings, we excluded records with missing data on specific variables (i.e., "Not stated" or "Not applicable" values) from both the numerator and the denominator when calculating percentages.
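The treatment of missing values and the ten-dwelling threshold described above can be sketched in Python as follows. Apart from "Not stated" and "Not applicable", the category labels, counts, and helper names are hypothetical; ABS table layouts differ, so this illustrates only the arithmetic.

```python
from typing import Optional

# Responses treated as missing: dropped from both numerator and denominator.
MISSING = {"Not stated", "Not applicable"}
# Mesh Blocks with fewer owner-occupied dwellings than this are out of scope.
MIN_OWNER_OCCUPIED = 10

def indicator_percentage(counts: dict, favourable: set) -> Optional[float]:
    """Percentage of valid responses that fall in the `favourable` categories.

    `counts` maps a census response category to a count for one Mesh Block.
    Returns None when no valid responses remain after removing missing values.
    """
    valid = {k: v for k, v in counts.items() if k not in MISSING}
    denominator = sum(valid.values())
    if denominator == 0:
        return None
    numerator = sum(v for k, v in valid.items() if k in favourable)
    return 100.0 * numerator / denominator

def in_scope(owner_occupied_dwellings: int) -> bool:
    """True if a Mesh Block has enough owner-occupied dwellings to be analyzed."""
    return owner_occupied_dwellings >= MIN_OWNER_OCCUPIED

# Hypothetical Mesh Block: motor-vehicle ownership among owner-occupied dwellings.
vehicles = {"None": 3, "One": 12, "Two or more": 20, "Not stated": 5}
if in_scope(40):
    pct = indicator_percentage(vehicles, favourable={"One", "Two or more"})
    print(f"{pct:.1f}% of owner-occupied dwellings own a motor vehicle")
```

Here the five "Not stated" dwellings leave the denominator at 35 rather than 40, so missing responses neither inflate nor deflate the indicator.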
Using this table of percentages (supplementary material, Table S1), we conducted a principal component analysis (PCA) to calculate our Index of Economic Resources for Owner-Occupiers (IEROO). All variables were scaled to unit variance before the analysis was undertaken. The PCA was conducted using the "prcomp" function in R (R Core Team). All variables loaded moderately onto the first component (see supplementary material Table S1), suggesting that no single variable drove the results and that the IEROO is a composite of all its constituent variables. Business ownership was only modestly correlated with the index and was considered for exclusion, but was ultimately retained for comparability with the ABS IER. When the IEROO is aggregated to the postcode level, only a weak correlation with the IER is evident (r = 0.48, n = 24, p = 0.014).
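The steps described above (scaling each variable to unit variance, then taking scores on the first principal component) can be sketched in dependency-free Python. This is an illustration, not the authors' R code: in R the comparable call is `prcomp(x, scale. = TRUE)`, whereas here the leading eigenvector of the correlation matrix is found by power iteration, which assumes the first eigenvalue is well separated from the rest. The sign-orientation rule and the data are assumptions.

```python
from statistics import mean, stdev

def standardize_columns(rows):
    """Centre each column and scale it to unit (sample) variance."""
    cols = list(zip(*rows))
    stats = [(mean(c), stdev(c)) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def correlation_matrix(z):
    """For standardized columns, the covariance matrix is the correlation matrix."""
    n, k = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(k)] for i in range(k)]

def leading_eigenvector(c, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric matrix."""
    k = len(c)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def pc1_scores(rows):
    """Return (loadings, scores) on the first principal component."""
    z = standardize_columns(rows)
    loadings = leading_eigenvector(correlation_matrix(z))
    # Components have arbitrary sign; assume most inputs are coded so that
    # larger = more advantaged, and orient so the loadings sum to a positive value.
    if sum(loadings) < 0:
        loadings = [-x for x in loadings]
    scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]
    return loadings, scores

# Hypothetical percentages for five Mesh Blocks and three indicators
# (e.g. vehicle ownership, no mortgage, four or more bedrooms).
rows = [
    [95.0, 30.0, 60.0],
    [98.0, 45.0, 70.0],
    [90.0, 20.0, 40.0],
    [99.0, 60.0, 85.0],
    [93.0, 25.0, 55.0],
]
loadings, scores = pc1_scores(rows)
print("loadings:", [round(x, 3) for x in loadings])
print("scores:  ", [round(s, 2) for s in scores])
```

Because all three invented indicators rise together, every loading is positive, echoing the observation above that all variables loaded moderately onto the first component.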
Statistical analysis was then performed in Excel using individual IEROO scores for Mesh Blocks in the ACT. The number and capacity of installations were aggregated to the Mesh Block level. Following the Australian Bureau of Statistics (2018), the in-scope Mesh Blocks were ranked by IEROO score and grouped into deciles, and the average IEROO score was calculated for each decile. The penetration rate and average installation capacity were then calculated for each decile and plotted. Regression analysis was performed on individual Mesh Block IEROO scores against penetration and capacity (n = 3804), and on decile averages of IEROO, penetration, and capacity (n = 10). All returned p values were less than 0.001.

III. RESULTS
Figure 2 plots a subregion of the study area to show the difference in spatial resolution between the Index of Economic Resources as calculated for postcodes and as calculated for Mesh Blocks. Of note is the difference in the range of index scores: 910-1234 for postcodes versus 838-1625 for Mesh Blocks. Because aggregation homogenizes index scores, the lowest postcode score is 910, whereas some Mesh Blocks score much lower. Figure 2 also highlights the heterogeneity of the Index of Economic Resources across Mesh Blocks compared with postcodes.
Figure 3 shows how PV penetration varies with the Index of Economic Resources at the postcode level. There is little correlation (R² = 0.25) between a postcode's penetration and its IER score. Postcode 2615 (IER = 1100), the seventh most disadvantaged as measured by the IER, has the highest penetration, 13.16%. Postcode 2601 (IER = 1234), the most advantaged by IER, has the lowest penetration, 0.12%. A regression analysis performed on these data did, however, return a p value below 0.001, although the relationship is negative: as the IER increases, penetration decreases.
By contrast, Fig. 4 shows how penetration at the Mesh Block level varies across deciles of the IEROO. There is a much stronger correlation (R² = 0.77, p < 0.001) between penetration and IEROO when measured for owner-occupied dwellings at this much smaller Mesh Block scale. Penetration rises with IEROO, more than tripling from the 1st decile (1.94%) to the 8th (5.96%).
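The decile procedure from the Data and Methods section can be sketched in Python. This is a hedged illustration rather than the authors' Excel workbook: the dictionary keys are hypothetical, deciles are formed by rank with near-equal group sizes, and decile penetration is assumed to pool installations and dwellings across Mesh Blocks (averaging Mesh Block rates would be an equally plausible reading).

```python
from statistics import mean

def assign_deciles(scores):
    """Rank units by score and split them into ten near-equal groups (1 = lowest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(scores)
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

def decile_summary(mesh_blocks):
    """Per-decile mean index score, pooled penetration (%) and mean capacity (kW).

    Each Mesh Block is a dict with hypothetical keys 'ieroo', 'dwellings'
    and 'capacities_kw' (one entry per installed PV system).
    """
    if len(mesh_blocks) < 10:
        raise ValueError("need at least ten Mesh Blocks so no decile is empty")
    deciles = assign_deciles([mb["ieroo"] for mb in mesh_blocks])
    summary = {}
    for d in range(1, 11):
        group = [mb for mb, g in zip(mesh_blocks, deciles) if g == d]
        caps = [c for mb in group for c in mb["capacities_kw"]]
        summary[d] = {
            "mean_ieroo": mean(mb["ieroo"] for mb in group),
            "penetration_pct": 100.0 * len(caps) / sum(mb["dwellings"] for mb in group),
            "mean_capacity_kw": mean(caps) if caps else None,
        }
    return summary

# Twenty invented Mesh Blocks of 40 owner-occupied dwellings each.
blocks = [
    {"ieroo": 900 + 20 * i, "dwellings": 40, "capacities_kw": [2.0 + 0.05 * i] * (1 + i // 4)}
    for i in range(20)
]
for d, row in decile_summary(blocks).items():
    print(d, round(row["mean_ieroo"]), f"{row['penetration_pct']:.1f}%", round(row["mean_capacity_kw"], 2))
```

Pooling gives each decile's penetration as total systems over total owner-occupied dwellings, so larger Mesh Blocks carry proportionally more weight.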
Turning to the mean capacity of installations, the correlation with IEROO is even stronger (R² = 0.91, p < 0.001). Figure 5 shows a nearly linear relationship between a decile's economic resources and the average capacity of its installations. The mean installation capacity rose from 2.1 kW in the poorest decile of owner-occupiers to 2.9 kW in the wealthiest.
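The R² values reported above come from simple linear regression. A minimal ordinary least squares fit in Python is sketched below; the decile means are invented for illustration and are not the values plotted in Fig. 5. With a single predictor, R² equals the square of Pearson's r.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

# Invented decile means: index score and mean installation capacity (kW).
decile_ieroo = [880, 935, 975, 1010, 1045, 1080, 1120, 1165, 1225, 1340]
decile_capacity_kw = [2.15, 2.20, 2.31, 2.38, 2.40, 2.52, 2.58, 2.66, 2.74, 2.86]
a, b, r2 = ols_fit(decile_ieroo, decile_capacity_kw)
print(f"capacity = {a:.3f} + {b * 100:.3f} kW per 100 index points; R^2 = {r2:.2f}")
```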

IV. DISCUSSION
Our results highlight major differences in the relationship between solar penetration and socioeconomic status depending on whether the analysis uses highly aggregated data or, as here, higher-resolution data specific to owner-occupiers. Using owner-occupier-specific data aggregated only at very fine spatial scales, we have shown a strong, positive relationship between economic resources and penetration, in contrast to the weak, negative correlation found at the postcode level. Households in Mesh Blocks with greater economic resources are also installing higher-capacity PV systems. It is unclear, though, why penetration decreases in the 9th and 10th deciles, as shown in Fig. 4. A recent review of energy consumption behavior (Frederiks et al., 2015) has, however, highlighted research identifying an unwillingness among high-income households to reduce electricity consumption, which could go some way toward explaining the decrease, but this result needs further investigation. Our results also show that households in areas with the lowest scores on the Index of Economic Resources for Owner-Occupiers have, by far, the lowest penetration of PVs. Where installations are present, they have lower capacities than those in areas with more economic resources. Consequently, a higher proportion of households with fewer economic resources have been unable to install sizeable PV systems, and thus to benefit significantly, or at all, from feed-in tariffs or renewable generation certificates, as has previously been highlighted in the literature. This inequity is unlikely to be overcome without direct intervention, given that premium feed-in tariffs have been "grandfathered" in all Australian jurisdictions; in some cases, feed-in tariffs for new installations are a tenth of previous levels (Poruschi et al., 2018). Renewable generation certificates continue to be paid to consumers that install solar in Australia, though the Renewable Portfolio Standard to which they apply will end in 2020 (CER, 2018).

TABLE II. Variables used to construct the Index of Economic Resources for Owner-Occupiers (IEROO).
Variable
Percentage of owner-occupied dwellings owning a motor vehicle
Percentage of owner-occupied dwellings whose household type was not a group house
Percentage of owner-occupied households with four or more bedrooms
Percentage of owner-occupied dwellings that are not mortgaged
Percentage of owner-occupied dwellings whose monthly mortgage repayments were less than $2800
Percentage of adults in owner-occupied dwellings whose personal income was at least $650 per week
Percentage of owner-occupied dwellings occupied by a lone person
Percentage of owner-occupied dwellings not overcrowded according to the Canadian National Occupancy Standard
Percentage of employed persons in owner-occupied dwellings who own the business in which they work
Journal of Renewable and Sustainable Energy ARTICLE scitation.org/journal/rse

Previous analyses (Stock and Stock, 2016; Sommerfeld et al., 2017) are likely to have masked this inequity, because their highly aggregated analyses of PV installations are susceptible to the ecological fallacy. It is vital that our findings are tested in other jurisdictions at the same level of data disaggregation. Although some studies use solar irradiance to estimate the extent of payments under incentive schemes (for example, Grover and Daniels, 2017), data limitations did not allow us to do so.
This study has two important implications, one for policy-makers and one for analysts. The first, policy-relevant implication relates to households unable to install PV. Zimmermann and Pye (2018; 594) propose that when designing renewable energy policies, "[t]he potential for disproportionate impacts on the most vulnerable groups in society require that distributional impacts receive particular attention by policymakers." In this case, the disadvantage of households unable to install PVs is compounded by the ability of households with solar to reduce their consumption of grid electricity, in a process known as the "Electricity Death Spiral":

"Once losing sales to [distributed generation], utilities will try to recover lost revenues by increasing their rates to a fewer number of customers. This attempt to regain lost profits will aggravate the problem of yet more customers leaving the utility system for [distributed generation]" (Costello and Hemphill, 2014;8).
The implication of this dynamic is that inequitable outcomes in PV penetration are structural and are likely to be exacerbated as PV penetration increases across the grid. Simpson and Clifton (2016) suggest the main outcome is that households without PV systems pay higher network charges, as households with solar reduce consumption from the grid. This is concerning because all distribution network businesses in Australia, and many in the US and EU, are bound by network cap regulation, meaning that "customer actions that decrease or increase their electricity bills, or network costs, may impact on the revenue that will be collected from all customers" (Passey et al., 2018; 198). Hence, households unable to install solar and reduce demand are likely to pay ever-higher electricity costs. This could be further exacerbated by wealthier consumers installing behind-the-meter batteries, which allow further reductions in grid demand (Chesser et al., 2018) and are being incentivised similarly to PVs in many jurisdictions globally (see, for example, Tidemann et al., 2018).
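The revenue-cap mechanism described above can be illustrated with a deliberately simple Python toy model, offered as an assumption-laden sketch rather than a model used in this study: the network's allowed revenue is fixed, it is recovered through a single per-kWh charge, and every PV household cuts its grid consumption by the same fraction. All parameter values are hypothetical.

```python
def non_pv_bills(allowed_revenue, households, base_kwh, pv_cut, pv_shares):
    """Annual network charge paid by a household without PV at each PV share.

    allowed_revenue: fixed revenue the network may recover ($/year)
    households: number of customers
    base_kwh: annual grid consumption of a household without PV
    pv_cut: fraction by which a PV household reduces its grid consumption
    pv_shares: fractions of households with PV to evaluate
    """
    bills = []
    for share in pv_shares:
        pv_homes = households * share
        grid_kwh = (households - pv_homes) * base_kwh + pv_homes * base_kwh * (1 - pv_cut)
        tariff = allowed_revenue / grid_kwh  # $/kWh needed to recover the capped revenue
        bills.append(tariff * base_kwh)
    return bills

shares = [0.0, 0.15, 0.30, 0.45]
bills = non_pv_bills(100e6, 150_000, 6_000, 0.4, shares)
for share, bill in zip(shares, bills):
    print(f"PV share {share:.0%}: non-PV household pays ${bill:,.0f}/year")
```

Because total revenue is fixed in this sketch, each household that reduces its grid demand shifts part of the network cost onto those that cannot. Real tariffs include fixed charges and other components that the toy model ignores.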
The second implication is a confirmation of the risks of using overly aggregated data when analyzing the outcomes of different renewable energy policies. As already noted, problems with the use of highly aggregated data have long been known in the public health literature. Geronimus and Bound (1998;475) state: "aggregate measures cannot be interpreted as if they were microlevel variables nor should a specific aggregate measure be interpreted to represent the effects of what it is labeled." This is particularly problematic in our case, since our study has confirmed Bouzarovski and Simcock's (2017) suggestion that data aggregation could lead to vulnerable areas not being identified, and thus missing out on targeted assistance or incentives for reductions in energy use. The accuracy of data used in mapping these areas is therefore imperative, particularly for those areas of lower socioeconomic advantage that may lie within a larger, more affluent area as a result of the salt and pepper social housing policy, or vice versa. The same risk applies to aggregating the target population (i.e., owner-occupiers) with another population unlikely to be eligible for a program (i.e., rental tenants), which can obscure relationships between the socioeconomic status and program uptake.
The use of overly aggregated data is concerning more broadly, given that studies exploring equity issues in the renewable energy transition continue to use such data, which homogenize large population groups. Exploring the equitable placement of renewable generation technologies in Germany, Drechsler et al. (2017) assess equity using Gini coefficients, a measure of income inequality, at the level of the Länder, or states, of which there are sixteen. The population of the Länder ranges from 671 489 (Bremen) to 17 865 516 (North Rhine-Westphalia). Along with the very large populations of the authors' spatial units, there is no mention of differing "ability" to pay when measuring willingness to pay for the distance between settlements and power stations. The authors also acknowledge that the benefits and burdens of renewable generation placement were not equally taken into account and that the demographics of the interview participants were skewed toward higher-than-average incomes. Similarly, Carley et al. (2018), developing a vulnerability index for the USA based on Renewable Portfolio Standard outcomes, use county-level socioeconomic data. Similar to Australian Mesh Blocks, the US Census Bureau collects Census data in "Census blocks," although the smallest geographic area for which socioeconomic data are available is a collection of Census blocks known as a "block group" (average population = 1000) (Federal Register, 2018). In comparison, counties have an average population of 103 666; the largest is Los Angeles County (10 163 507) and the smallest is Kalawao County in Hawaii (88) (U.S. Census Bureau, 2018). The disparity between block group and county-level measures is apparent when comparing Fig. 6, which shows an Area Deprivation Index calculated at the Census block group level in Pennsylvania (UWSMPH, 2018), with Fig. 7, Carley et al.'s (2018) calculation of vulnerability. Although the indexes use different variables, Fig. 6 shows a range of disadvantage within Pennsylvania, whereas Carley et al.'s (2018) analysis shows little difference in vulnerability at the county level, which is understandable given the major disparity in mean population between the areal units.
Given the contrast in our results between levels of data aggregation (postcode vs Mesh Block), it is likely that these analyses would furnish different results had they also been performed at finer scales. Recent work has however highlighted the improvements in the accuracy and availability of finer scale census data globally (Doxsey-Whitfield et al., 2015), as well as the possibility for analysis at fine spatial scales where reliable census data does not exist (Stevens et al., 2015;Clarke et al., 2018).
Our analysis has three limitations. First, although we use the most disaggregated data available, aggregating household PV installation data to Mesh Blocks will still create modest inaccuracies due to the ecological fallacy. These are, however, much smaller than those arising from spatial units with far more households and people. Second, we have removed rental properties from the IEROO; because we could not identify rental properties that have PVs installed, this will skew results. Third, apartments have not been accounted for in the analysis, so Mesh Blocks with higher proportions of apartments will have much lower penetration, also skewing results.

V. CONCLUSIONS
Using socioeconomic and PV installation data at an unusually high areal resolution in the Australian Capital Territory, which has a household PV penetration greater than 15%, we showed positive correlations between economic resources, as measured by a new, more inclusive index, and both penetration and the average capacity of installations. Our new Index of Economic Resources for Owner-Occupiers (IEROO) removes bias related to the inclusion of socioeconomic characteristics of households that cannot install PVs. The results show more than a tripling of penetration from the lowest decile of the IEROO to the eighth. This contradicts some previous studies, which either failed to show a relationship or showed a negative relationship between penetration and socioeconomic indicators when analyzing highly aggregated data and/or using income as the sole economic variable. Our results have two major implications. First, the distributional outcomes of PV penetration are inequitable, meaning that the incentives available to consumers have been distributed inequitably, though the extent of this would need to be tested with more detailed data. As a result, areas with the lowest economic resources are unable to reduce their demand for grid-supplied electricity and may be forced to pay an ever-increasing share of the cost of electricity supply. This is likely to be compounded by some consumers installing behind-the-meter batteries. Second, our results suggest that, owing to the ecological fallacy and the modifiable areal unit problem, analyses that use aggregated data could lead to the most vulnerable communities or consumers being overlooked. This may cause, or further entrench, perverse outcomes. More disaggregated analysis is needed to ensure that the renewable energy transition does not leave individuals, households, and communities of lower socioeconomic status behind.
Previous aggregated studies could, however, be used to identify areal units of interest to test findings with more disaggregated data and multivariable indexes of the socioeconomic status. However, our research was made possible only by installation data being supplied by a private electricity sector company and fine-scale socioeconomic data being given to researchers by the government. Data sharing relationships between the government, industry, and academia are therefore imperative for carrying out further research in this important area.

SUPPLEMENTARY MATERIAL
See the supplementary material for the scree plot and table of variable loadings mentioned in the data and methods section.