Impact Statement
This article presents an optimization tool to automatically find efficient deep neural networks to forecast aggregated wind power generation at the level of a region or a country. These models are based on wind speed maps from numerical weather prediction (NWP) forecasts and take advantage of their spatio-temporal aspect. These methods could play a crucial role in the smooth operation of power grids in the context of massive renewable energy integration.
1. Introduction
1.1. Global context
To meet the 2050 net zero scenario envisaged by the Paris Agreement (United Nations Framework Convention on Climate Change, 2015), wind power stands out as a critical energy source for the future. Remarkable progress has been made since 2010, when global electricity generation from wind power was 342 TWh, rising to 2100 TWh in 2022 (International Energy Agency, IEA, 2023). The IEA targets approximately 7400 TWh of wind-generated electricity by 2030 to meet the net zero emissions scenario. However, to realize the full potential of this intermittent energy source, accurate forecasts of wind power generation are needed to integrate it efficiently into the power grid.
1.2. Regional wind power forecasting
Most of the work in the literature on wind power forecasting is done at a local scale, that is, an individual wind farm or turbine. In this article, we focus on a more global scale: the aggregated production of a country or a large region. Regional wind power generation forecasting is critical in the context of the European electricity market for several reasons. (i) First, a short-term forecast of up to 48 h is useful for the spot (day-ahead) market, which sets the "final" price of electricity hour by hour according to supply and demand. (ii) Second, short-term forecasts are useful for the TSO (Transmission System Operator), which has to ensure the balance between supply and demand on the transmission network within its perimeter. (iii) Finally, over longer horizons, up to a few days, regional wind power forecasts can be used to anticipate downturn situations, in which a large amount of renewable energy is fed into the grid at the same time. Renewable energies indeed have market priority over, for example, nuclear or coal, which are more expensive to produce.
Wind power generation forecasting at such an aggregated scale can be done in two ways: either by forecasting each farm in the region (or even each wind turbine) and then summing these forecasts, or by directly forecasting the aggregated signal. The first method is impractical for the majority of operators, as it requires production data for each farm, which is confidential. Moreover, even in cases where the data is available, Wang et al. (2017) pointed out that maintaining a forecast system for each wind farm in the considered region can be too costly for some forecast service providers. In this article, we therefore focus on directly forecasting the aggregated signal.
1.3. Contributions
In this study, we propose to leverage the spatial information in NWP wind speed maps for national wind power forecasting by exploiting the capabilities of Deep Learning (DL) models. The overall methodology is illustrated in Figure 1. To fully exploit the potential of DL mechanisms, we introduce WindDragon, an automated deep-learning framework that uses the tools developed in the DRAGON package (Keisler et al., 2024b). WindDragon automatically designs well-performing neural networks for short-term wind power forecasting from NWP wind speed maps. WindDragon's performance is benchmarked against conventional computer vision models such as Convolutional Neural Networks (CNNs) as well as standard baselines in wind power forecasting. The contributions of this study can be summarized as follows:
• We develop a novel automated deep learning framework specifically tailored to forecast aggregated wind power generation from wind speed maps.
• The proposed framework, named WindDragon, is designed to fully leverage the spatial information embedded in wind speed maps and can accommodate increases in installed capacity, making it adaptable and reusable.
• We conduct extensive experiments demonstrating that WindDragon, combined with NWP wind speed maps, significantly outperforms both traditional and state-of-the-art deep learning models in wind power forecasting.

Figure 1. Global scheme for wind power forecasting. Every 6 h, the NWP model produces hourly forecasts. Each map is processed independently by the regressor which maps the grid to the wind power corresponding to the same timestamp.
2. State-of-the-art
Wind power forecasting at the level of a single wind farm is a mature discipline (Jonkers et al., 2024) on forecast horizons ranging from the next minutes to the next days (see Kariniotakis, 2017, for a book on the subject). However, regional forecasting remains largely unexplored in the literature (Higashiyama et al., 2018).
2.1. Regional wind power forecasting
2.1.1. Transfer strategy
Some studies have attempted to take advantage of the wealth of research at the turbine or wind farm scale to forecast regional wind energy. The general idea is to apply a forecasting model to wind turbines or farms whose data are available within the region and to use a transfer function to move from local to regional production. For instance, Pinson et al. (2003) mentioned a model based on online persistence scaled by the ratio of the total installed capacity in the region to the capacity of the wind farms for which online measurements are available. Camal et al. (2024) forecasted the production of any wind farm in the control area of a TSO, taking into account the information collected from other wind farms. The method combines feature selection, regularization, and local learning via conditioning on recent production levels or expected weather conditions.
2.1.2. Input dimension reduction
Approaches that have attempted to forecast regional wind production directly from meteorological data such as NWP maps, or by incorporating operational variables from the (potentially numerous) wind farms in the region, have quickly run into the problem of the large size of the input data. Camal et al. (2024) noticed that at the scale of a region or a country, the number of explanatory variables grows linearly with the number of explanatory sites or the number of variables considered per site. In this case, both statistical and machine learning models face the curse of dimensionality. Therefore, regularization and feature selection have been investigated to mitigate the high dimensionality of the input features. Siebert (2008) used a clustering algorithm based on k-means and a mutual-information-based feature selection algorithm to determine the best set of features for the forecast model. Lobo and Sanchez (2012) searched for samples with similar weather conditions. Davò et al. (2016) leveraged principal component analysis (PCA) to reduce the dimension of the data sets when forecasting regional wind power and solar irradiance. Wang et al. (2017) reduced the dimension of the NWP grid with minimum redundancy maximum relevance (mRMR) feature selection and PCA, and then applied a weighted average learning strategy to forecast the production of a Chinese region. In the study by Wang et al. (2018), the spatio-temporal weather data is represented using a distance-weighted kernel density estimation model (DWKDE), which is the basis for an mRMR-based feature selection method. Finally, Wang et al. (2019) performed probabilistic forecasts with regular vine copulas to reduce the weather dataset.
Although this input reduction is necessary for most Machine Learning models, deep learning models have demonstrated high capacities for extracting complex features from high-dimensional data.
2.2. Deep learning for wind power forecasting
Deep learning models have been extensively investigated for wind power forecasting, both at the turbine level and at the regional aggregation level. A large variety of architectures have been used, depending on the available input data and the features to be extracted.
Yu et al. (2021) recognized the abilities of deep learning models for non-linear mapping and massive data handling and used a feedforward neural network based on historical wind power and NWP information for regional wind power forecasting. To model the time dependencies of the wind power time series, many works leveraged recurrent neural networks and their variants (long short-term memory or gated recurrent units), such as Liu et al. (2021) or Alkabbani et al. (2023). The interactions between several wind farms have been investigated using the Transformer model by Lima et al. (2022) and using graph neural networks by Qiu et al. (2024). The direct use of deep neural networks on wind speed maps has been tackled with convolutional neural networks (CNNs), which have shown strong capabilities for extracting relevant features from image data. Higashiyama et al. (2018) used 3-dimensional CNNs to forecast the production of a single wind farm based on NWP grids. Bosma and Nazari (2022) and Jonkers et al. (2024) proposed day-ahead regional wind power forecasting CNNs whose architecture was inspired by computer vision models such as ResNet (see He et al., 2016).
The challenge of wind power forecasting is that it combines dependencies on weather variables while remaining a time series. Therefore, architectures mixing various types of layers have been investigated to capture these different dependencies. Miele et al. (2023) compared the performance of a CNN-LSTM with a multi-modal neural network with two branches, one for the NWP grid and one for past data, for a single wind farm. Zhou and Lu (2023) combined convolution, LSTM, and attention layers to forecast the production of a wind farm. Given this large variety of possible architectures, one might want to use automated tools to find the best one for the dataset at hand.
2.3. Automated deep learning
2.3.1. Main concepts
The research field related to the automation of deep neural network design is called Automated Deep Learning (AutoDL). It belongs to a broader research area called Automated Machine Learning (AutoML), which studies the automatic design of high-performance machine learning models. As with any AutoML approach, AutoDL systems consist of three main components: the search space, the search strategy, and the performance evaluation. The search space contains all the considered neural network architectures and hyperparameters, that is, the set of all available design choices, such as the number and type of layers in the neural network, the connections between the layers, or the training parameters, like the learning rate. The search strategy determines how to navigate the search space to select promising configurations. The larger the search space, the more sophisticated the search strategy should be for effective exploration. The performance evaluation assesses the candidate configurations until the search strategy finds a suitable neural network (usually the best configuration found after a given number of evaluations).
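To make these three components concrete, the following minimal sketch shows how they typically interact in a basic AutoML loop. It is purely illustrative (a random-search strategy over a toy search space, with a placeholder evaluation); all names are ours and not tied to any specific library.

```python
import random

# Hypothetical search space: each configuration is a dict of design choices.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3, 4],
    "layer_type": ["conv", "mlp", "attention"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
}

def sample_configuration():
    """Search strategy (here: pure random search) proposes a candidate."""
    return {name: random.choice(choices) for name, choices in SEARCH_SPACE.items()}

def evaluate(configuration):
    """Performance evaluation: build, train, and score the candidate.
    In a real framework this would train a neural network and return a
    validation loss; here a random number stands in for that loss."""
    return random.random()

def automl_loop(budget=50):
    best_config, best_loss = None, float("inf")
    for _ in range(budget):
        config = sample_configuration()
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

if __name__ == "__main__":
    print(automl_loop())
```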
2.3.2. AutoDL for wind power forecasting
A few works have applied AutoDL to wind power forecasting, such as Tu et al. (2022) or Jalali et al. (2022). However, these approaches are limited to optimizing the hyperparameters of one type of architecture, possibly integrating a few architectural hyperparameters such as the number of layers. The AutoDL community has developed a large number of tools to optimize neural network architectures more broadly, but as Tu et al. (2022) point out, the search spaces used by these approaches are tailored to Computer Vision and Natural Language Processing tasks. For example, Hutter et al. (2019) reviewed many approaches based on (hierarchical) cell-based search spaces, where the neural networks are represented as a sequence of small iterated Directed Acyclic Graphs (DAGs) called cells. The architecture of the cell is optimized and the pattern is then repeated throughout the network. Such an approach is efficient for Computer Vision tasks, where models that repeat sequences of convolutional and pooling layers and skip connections are very powerful. Another popular approach is DARTS, proposed by Liu et al. (2018), which uses a meta-architecture designed to include all possible architectures. The general structure of the network is fixed, and for each layer several candidate operations are possible. Each is associated with a probability of being chosen, which is optimized by gradient descent. This approach, which is effective for generating architectures based on $3 \times 3$ or $5 \times 5$ convolutions, has a very limited search space and assumes that the subgraph obtained by keeping only the operation with the highest probability for each layer is the optimal graph. More diverse tasks have been tackled by the AutoDL framework Auto-PyTorch, which offers a version for tabular data, described in Zimmer et al. (2020), and one for time series forecasting, see Deng et al. (2022), providing search spaces of MLPs and residual connections for the tabular version, and various encoder/decoder blocks for the time series version to cover several state-of-the-art architectures in time series (e.g., TFT from Lim et al., 2021, N-BEATS from Oreshkin et al., 2019, or DeepAR from Salinas et al., 2020). All the search spaces of the above AutoDL approaches have been restricted to allow effective searching. This observation is shared more generally by recent reviews such as White et al. (2023) on AutoDL and Baratchi et al. (2024) on AutoML. In the case of wind production forecasting, as indicated by Tu et al. (2022), we would like a search space for designing architectures that combine different types of layers, such as MLPs, CNNs, or attention, that have computational graphs more complex than a linearly sequential architecture, and whose hyperparameters can be optimized, as they are crucial in this type of task. The AutoDL package DRAGON, recently introduced in Keisler et al. (2024b), provides tools for designing such search spaces. The package has already been used to create EnergyDragon (see Keisler et al., 2024a), an AutoDL framework for load consumption forecasting.
2.4. DRAGON package
DRAGON, or DiRected Acyclic Graphs optimizatioN, is an open-source Python package offering tools to design Automated Deep Learning frameworks for diverse tasks. The package is based on three main elements: building bricks for search space design, search operators over those bricks, and search algorithms.
2.4.1. Search space
DRAGON offers several building bricks to encode deep neural network architectures and hyperparameters. The network structures are represented as Directed Acyclic Graphs, where the nodes represent the layers and the edges the connections between them. Each layer is encoded by a succession of three elements: a combiner, an operation, and an activation function. As no constraint is placed on the graph structure, each node may receive an arbitrary number of incoming inputs of various sizes; they are gathered into a single input by the combiner. The operation can be any PyTorch building block parametrized by a set of hyperparameters. The DRAGON user has to specify which kinds of building blocks the search space should contain and, for each, the associated hyperparameters. Besides the DAGs, the user can choose to optimize other hyperparameters such as the learning rate, the output shape of the last layer, etc. The hyperparameters may be numerical or categorical. The graph encoding can be used to represent the entire structure, but it is also possible to design more specific search spaces for certain applications. For example, different graphs can be combined into a Transformer-type structure (see Vaswani et al., 2017, for an introduction to the Transformer model), with one graph for the encoder part and another for the decoder part, in order to impose a two-part structure. When creating an AutoDL framework based on DRAGON, the selection of appropriate building blocks from the package is essential for generating a suitable search space.
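As a schematic illustration of this encoding (our own simplified data structures, not DRAGON's actual API), a layer can be viewed as a combiner/operation/activation triple and an architecture as a DAG over such layers:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One layer of the graph: combiner + operation + activation."""
    combiner: str            # e.g. "add" or "concat", merges incoming inputs
    operation: str           # e.g. "conv2d", "mlp", "attention"
    hyperparameters: dict    # e.g. {"kernel_size": 3, "out_channels": 16}
    activation: str          # e.g. "relu", "gelu"

@dataclass
class DAG:
    """Directed acyclic graph: nodes plus directed edges (source -> target)."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # list of (source_idx, target_idx)

# A tiny candidate architecture: two convolutions whose outputs feed one MLP.
candidate = DAG(
    nodes=[
        Node("concat", "conv2d", {"kernel_size": 3, "out_channels": 8}, "relu"),
        Node("concat", "conv2d", {"kernel_size": 5, "out_channels": 8}, "relu"),
        Node("add", "mlp", {"out_features": 32}, "gelu"),
    ],
    edges=[(0, 2), (1, 2)],
)
```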
2.4.2. Performance evaluation
The search space has been designed for a specific performance evaluation strategy, which assesses the score of a given configuration from the search space. DRAGON does not provide a default performance evaluation, as it depends on the task at hand; it should therefore be implemented within the created AutoDL framework. Given an element from the search space, the performance evaluation should at least build a model and perform some type of training/validation process on the data.
2.4.3. Search operators
Each building block from DRAGON comes with a neighbor attribute that defines how to create a neighboring value from a given representation. These operators can be seen as mutations in the case of an evolutionary algorithm, or as neighborhood operators for simulated annealing or local search. In the case of an integer, for example, the neighbor attribute picks the new value in a range surrounding the current one. For the DAGs, it is possible to add or delete nodes, or to modify the edges and the nodes' contents.
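For illustration, and building on the toy Node/DAG classes sketched above (again, these are not the package's actual operators), such neighborhood operators could look like this:

```python
import copy
import random

def integer_neighbor(value, lower, upper, step=2):
    """Draw a new integer in a range surrounding the current value."""
    return max(lower, min(upper, value + random.randint(-step, step)))

def dag_neighbor(dag, node_factory):
    """Return a mutated copy of a DAG: add a node, delete a node,
    or modify the hyperparameters of an existing node (illustrative only).
    Assumes the Node and DAG dataclasses from the previous sketch."""
    new_dag = copy.deepcopy(dag)
    move = random.choice(["add_node", "delete_node", "modify_node"])
    if move == "add_node" or len(new_dag.nodes) <= 1:
        parent = random.randrange(len(new_dag.nodes))
        new_dag.nodes.append(node_factory())
        new_dag.edges.append((parent, len(new_dag.nodes) - 1))
    elif move == "delete_node":
        idx = random.randrange(1, len(new_dag.nodes))
        new_dag.nodes.pop(idx)
        # Drop edges touching the removed node and shift later indices down.
        new_dag.edges = [
            (s - (s > idx), t - (t > idx))
            for (s, t) in new_dag.edges
            if s != idx and t != idx
        ]
    else:  # modify_node: perturb a hyperparameter of a random node
        node = random.choice(new_dag.nodes)
        if "kernel_size" in node.hyperparameters:
            node.hyperparameters["kernel_size"] = integer_neighbor(
                node.hyperparameters["kernel_size"], 1, 7)
    return new_dag

# Example: mutate the `candidate` DAG from the previous sketch.
mutant = dag_neighbor(
    candidate,
    node_factory=lambda: Node("add", "mlp", {"out_features": 16}, "relu"),
)
```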
2.4.4. Search algorithms
The package implements several search strategies, which may use the search operators and can be distributed in a high-performance computing (HPC) environment. Besides random search, Hyperband (Li et al., 2018), an evolutionary algorithm, and Mutant-UCB, presented in Brégère and Keisler (2024), are available. They take as input the search space and the performance evaluation designed by the user and return the best configuration.
For more information on the DRAGON package, see the original article (Keisler et al., 2024b) or the online documentation.
3. WindDragon
We used the tools provided by DRAGON to create WindDragon, an AutoDL framework for regression on wind speed maps toward regional wind power forecasting. The framework takes as input two datasets $\mathcal{D}_{\mathrm{train}}$ and $\mathcal{D}_{\mathrm{valid}}$. Each dataset $\mathcal{D}$ is made up of pairs $(X_t, Y_t)$ for several time steps $t$, where $X_t \in \mathbb{R}^2$ is a wind speed map and $Y_t \in \mathbb{R}^R$ are the associated wind production values, one for each of the $R$ regions. First, the framework creates wind speed maps by region $r$: $X_t^r$. Two datasets $\mathcal{D}_{\mathrm{train}}^r = (X^r, Y^r)$ and $\mathcal{D}_{\mathrm{valid}}^r = (X^r, Y^r)$ are put together for each region $r$ from these regional wind speed maps and the associated regional production. WindDragon aims at finding, for each region $r$, the optimal model $\hat{f}^r$ from a search space $\Omega$ with respect to a loss function $\ell$ such that:

$$ \hat{f}^r \in \operatorname*{arg\,min}_{f \in \Omega} \; \ell\!\left( f_{\hat{\delta}}, \mathcal{D}_{\mathrm{valid}}^r \right), \qquad (1) $$

where the model $f_{\hat{\delta}}$ corresponds to the model $f \in \Omega$ trained on $\mathcal{D}_{\mathrm{train}}^r$.
3.1. Search space and performance evaluation
3.1.1. Data processing
The input data $X_t$ contains the wind speed map corresponding to the whole country and has to be divided into regional data. As shown in Figure 2 for a specific region (here Auvergne-Rhône-Alpes), wind turbines are not evenly distributed across the administrative regions. Therefore, instead of using the administrative boundaries, we draw areas around each wind farm in the region and take the convex hull of all the considered points. The result is a seamless map $X_t^r \subset X_t \in \mathbb{R}^2$ that includes the local wind turbines with no gaps to disrupt the models. The areas surrounding the wind farms are drawn according to a distance parametrized by a parameter $g \in \mathbb{N}^{\star}$: the higher $g$, the larger the convex hull. Installed capacity data (corresponding to the maximum wind power a region can produce) is available for each region and each time step $t$ and is updated every 3 months. It was collected and used to scale the wind power target when training the models. Training the model $f$ on region $r$ with respect to the training loss $\ell_{\mathrm{train}}$ means finding the optimal model weights $\hat{\delta} \in \Delta$ such that:

$$ \hat{\delta} \in \operatorname*{arg\,min}_{\delta \in \Delta} \; \ell_{\mathrm{train}}\!\left( f_{\delta}(X^r), \frac{Y^r}{c^r} \right), \qquad (2) $$

where $c^r \in \mathbb{R}$ is the installed capacity for region $r$ and $\mathcal{D}_{\mathrm{train}}^r = (X^r, Y^r)$. The evaluation of the model $f$ on $\mathcal{D}_{\mathrm{valid}}^r$ is made on the denormalized values $Y^r$.

Figure 2. Data preparation for the region Auvergne-Rhône-Alpes. The wind farms are represented in red. The first image shows the distribution of wind farms across the administrative region.
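As a rough illustration of this data preparation step (our own sketch with hypothetical variable names and a simplified square buffer; the exact preprocessing may differ), the regional map can be obtained by masking the national grid with the convex hull of buffered wind farm locations, and the target is scaled by the installed capacity as in Equation (2):

```python
import numpy as np
from matplotlib.path import Path
from scipy.spatial import ConvexHull

def regional_mask(grid_lats, grid_lons, farm_lats, farm_lons, g=2, cell_deg=0.09):
    """Boolean mask over the national grid: points inside the convex hull of
    the wind farm locations, each buffered by g grid cells (simplified)."""
    half_width = g * cell_deg
    # Surround each farm with the four corners of a square of half-width `half_width`.
    pts = []
    for la, lo in zip(farm_lats, farm_lons):
        pts += [(la - half_width, lo - half_width), (la - half_width, lo + half_width),
                (la + half_width, lo - half_width), (la + half_width, lo + half_width)]
    pts = np.array(pts)
    hull = Path(pts[ConvexHull(pts).vertices])          # convex hull polygon
    lon2d, lat2d = np.meshgrid(grid_lons, grid_lats)
    inside = hull.contains_points(np.c_[lat2d.ravel(), lon2d.ravel()])
    return inside.reshape(lat2d.shape)

def scale_target(wind_power_mw, installed_capacity_mw):
    """Scale the regional production by the installed capacity (Equation 2)."""
    return wind_power_mw / installed_capacity_mw

# Toy usage on a small fictitious grid with three wind farms:
mask = regional_mask(np.linspace(44.0, 46.0, 23), np.linspace(3.0, 7.0, 45),
                     farm_lats=[45.1, 45.4, 44.8], farm_lons=[4.2, 4.9, 5.3], g=2)
```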
3.1.2. Search space
Each model $f \in \Omega$ has to forecast a one-dimensional output $Y_t^r \in \mathbb{R}$ from a two-dimensional input: the wind speed map $X_t^r \in \mathbb{R}^2$. Therefore, each neural network from $\Omega$ is made of two Directed Acyclic Graphs, as represented in Figure 3. A first graph $\Gamma_1$ processes the 2D data and can be composed of convolution, pooling, normalization, dropout, and attention layers. It is followed by a flattening layer and a second graph $\Gamma_2$, composed of MLP, self-attention, convolution, and pooling layers. A final MLP layer is added at the end of the model to convert the latent vector to the desired output format. The operations and hyperparameters available within WindDragon are detailed in Table 1. Among the parameters external to the architecture, the weather map size parameter $g$ is also optimized. The search space is then $[\Gamma_1, \Gamma_2, o, g]$, where $o$ represents the final MLP layer, which is kept constant.

Figure 3. WindDragon’s meta-model for wind power forecasting.
Table 1. Layers available and their associated hyperparameters in the WindDragon search space (for the first and the second graph)
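The meta-model of Figure 3 can be summarized by the following PyTorch-style sketch (schematic only; here $\Gamma_1$ and $\Gamma_2$ are placeholders for arbitrary DAGs drawn from the search space):

```python
import torch
import torch.nn as nn

class WindMetaModel(nn.Module):
    """Two-graph meta-model: a 2D graph, a flattening step, a 1D graph,
    and a final MLP producing the scalar wind power forecast."""
    def __init__(self, graph_2d: nn.Module, graph_1d: nn.Module, latent_dim: int):
        super().__init__()
        self.graph_2d = graph_2d                 # Gamma_1: convolutions, pooling, ...
        self.flatten = nn.Flatten()
        self.graph_1d = graph_1d                 # Gamma_2: MLPs, self-attention, ...
        self.head = nn.Linear(latent_dim, 1)     # final MLP layer (the constant o)

    def forward(self, x):                        # x: (batch, 1, H, W) wind speed map
        z = self.graph_2d(x)
        z = self.flatten(z)
        z = self.graph_1d(z)
        return self.head(z).squeeze(-1)          # (batch,) scalar forecast per map

# Example with placeholder graphs (a single convolution and a single MLP):
gamma_1 = nn.Sequential(nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d((8, 8)))
gamma_2 = nn.Sequential(nn.Linear(4 * 8 * 8, 32), nn.ReLU())
model = WindMetaModel(gamma_1, gamma_2, latent_dim=32)
forecast = model(torch.randn(16, 1, 20, 30))     # 16 maps of size 20 x 30
```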

3.1.3. Performance evaluation
The performance evaluation takes as input a region $r$ and a configuration from the search space and will:

• Construct the datasets $\mathcal{D}_{\mathrm{train}}^r$ and $\mathcal{D}_{\mathrm{valid}}^r$ from $\mathcal{D}_{\mathrm{train}}$ and $\mathcal{D}_{\mathrm{valid}}$ according to the grid-size parameter $g$ from the configuration.

• Build the model $f^r$ with the elements from the configuration and train the model on $\mathcal{D}_{\mathrm{train}}^r$ according to Equation (2).

• Evaluate the performance of $f_{\hat{\delta}}^r$ on $\mathcal{D}_{\mathrm{valid}}^r$ according to Equation (1).

A minimal sketch of this procedure is given below.
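The sketch below (our own simplification, reusing the two-graph meta-model sketched in Section 3.1.2; the MAE-style training loss and the toy data are illustrative assumptions) shows a minimal training/validation routine consistent with Equations (2) and (1): the target is scaled by the installed capacity during training, and the validation error is computed on denormalized values.

```python
import torch
import torch.nn as nn

def train_and_score(model, train_maps, train_power, valid_maps, valid_power,
                    capacity, epochs=20, lr=1e-3):
    """Train on the capacity-scaled target (Equation 2), then score the model
    on denormalized validation values (Equation 1). MAE is used here for
    illustration; any error function could be substituted."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(train_maps).reshape(-1)               # normalized prediction
        loss = nn.functional.l1_loss(pred, train_power / capacity)
        loss.backward()
        optimizer.step()
    with torch.no_grad():                                  # denormalized validation error
        valid_pred = model(valid_maps).reshape(-1) * capacity
        return nn.functional.l1_loss(valid_pred, valid_power).item()

# Standalone usage with a toy model and random data shaped like (batch, 1, H, W):
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(20 * 30, 16), nn.ReLU(),
                          nn.Linear(16, 1))
score = train_and_score(toy_model,
                        torch.randn(64, 1, 20, 30), torch.rand(64) * 500.0,
                        torch.randn(32, 1, 20, 30), torch.rand(32) * 500.0,
                        capacity=500.0)
```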
3.2. Search algorithm
Regarding the search algorithm, four are available within DRAGON: random search, HyperBand (Li et al., 2018), an evolutionary algorithm, and Mutant-UCB. In Brégère and Keisler (2024), which introduces this last algorithm, the four are compared and Mutant-UCB appears to be the most efficient.
3.2.1. Mutant-UCB
This algorithm combines a multi-armed bandit approach with evolutionary operators. Each model $f \in \Omega$ corresponds to an arm, and choosing an arm corresponds to a partial training of the model. Indeed, training a neural network takes a lot of time, and many algorithms, such as random search or evolutionary algorithms, give the same amount of resources to all evaluated configurations, which wastes time and computational resources on bad configurations. Resource allocation strategies, used for example by HyperBand, allow resources to be gradually attributed to the most promising solutions. A partial training can then be, for example, a training on a small subset of the data or with a small number of epochs. In short, Mutant-UCB generates a population of $K \in \mathbb{N}^{\star}$ random configurations. For each arm $k$ from this population, a partial training is made to get a first loss $\ell_k$. Then, at each iteration $i$, an arm $I_i$ from the population is drawn following an Upper-Confidence-Bound strategy:

$$ I_i \in \operatorname*{arg\,min}_{k} \left( \hat{\ell}_k - \sqrt{\frac{E}{N_k}} \right), $$

where $\hat{\ell}_k$ is the average loss over all previous partial trainings of the model associated with arm $k$, $E$ is the exploration parameter, and $N_k$ is the number of times arm $k$ has been picked. Once the arm $I_i$ is chosen, the model is mutated with probability $1 - \overline{N}_{I_i}/N$; otherwise, a new partial training is done. The value $N$ corresponds to the maximum number of partial trainings a model can have (to prevent overfitting) and $\overline{N}_{I_i}$ corresponds to the number of times the model associated with $I_i$ has been trained. In the case of a mutant creation, the number of arms $K$ increases, and the new model is partially trained for the first time. For more information on Mutant-UCB, please refer to Brégère and Keisler (2024).
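The following sketch is a schematic re-implementation of the loop described above (not the actual DRAGON code); `partial_train` and `mutate` are hypothetical callables standing for one partial training of a configuration and for the mutation operators of Section 2.4.3.

```python
import math
import random

def mutant_ucb(initial_configs, partial_train, mutate, iterations=1000, E=0.01, N=10):
    """Schematic Mutant-UCB loop. Each arm stores the configuration, the sum of
    losses over its partial trainings, the number of picks (N_k) and the
    number of partial trainings (Nbar_k)."""
    arms = []
    for cfg in initial_configs:                       # initial population of size K
        arms.append({"cfg": cfg, "sum_loss": partial_train(cfg),
                     "picks": 1, "trainings": 1})

    for _ in range(iterations):
        # UCB-style selection: minimize mean loss minus an exploration bonus.
        i = min(range(len(arms)),
                key=lambda k: arms[k]["sum_loss"] / arms[k]["trainings"]
                - math.sqrt(E / arms[k]["picks"]))
        arm = arms[i]
        arm["picks"] += 1
        if random.random() < 1 - arm["trainings"] / N:
            # Mutate (probability as stated above): the new model joins the
            # population and is partially trained for the first time.
            new_cfg = mutate(arm["cfg"])
            arms.append({"cfg": new_cfg, "sum_loss": partial_train(new_cfg),
                         "picks": 1, "trainings": 1})
        else:
            # Otherwise, run one more partial training of the selected model.
            arm["sum_loss"] += partial_train(arm["cfg"])
            arm["trainings"] += 1

    best = min(arms, key=lambda a: a["sum_loss"] / a["trainings"])
    return best["cfg"]

# Toy usage: configurations are floats, "training" returns a noisy loss.
best = mutant_ucb([random.random() for _ in range(20)],
                  partial_train=lambda c: abs(c - 0.3) + 0.05 * random.random(),
                  mutate=lambda c: c + random.gauss(0, 0.1),
                  iterations=200)
```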
3.2.2. Partial training
In the original article, the partial trainings were done over a small number of epochs. For WindDragon, we changed this to a small number of epochs on a given region. Instead of running one instance of Mutant-UCB per region, we performed a single optimization for all regions. We indeed make the assumption that a similar architecture will fit all regions, even if some layers or hyperparameters might change from one region to another. The input $X^r$ might have different shapes for different regions. This shape change is handled by DRAGON when building the neural network $f$: the layers and DAGs from the package may be adapted by weight cropping or padding to any new shape during network initialization. Splitting the training between different regions follows the spirit of Mutant-UCB, where the loss minimized to pick the next arm relies on the empirical mean over the various partial trainings of a model $f$. The performance may differ across regions, and taking this empirical mean drives the search towards a model that is generally good over all regions. To reduce the variance between regions, the loss $\ell$ used to evaluate a model $f$ on a given region is an error function (such as the mean squared error, the mean absolute error, or a variant) of $f$, divided by the same error function of a reference model. See Section 4 for more information.
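For illustration, such a normalized regional loss could be computed as follows (a sketch; MAE and the CNN baseline of Section 4 are used here as the error function and the reference, which are possible choices rather than a prescription):

```python
import numpy as np

def normalized_regional_loss(y_true, y_pred, y_pred_reference):
    """Error of the candidate model divided by the error of a reference model
    on the same region (here MAE; the reference could be the CNN baseline)."""
    mae_model = np.mean(np.abs(y_true - y_pred))
    mae_reference = np.mean(np.abs(y_true - y_pred_reference))
    return mae_model / mae_reference
```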
4. Experiments
4.1. Datasets
The wind speed maps used are forecasts of the wind speed at 100 m height, at a 9 km resolution, provided by the HRES model from the European Centre for Medium-Range Weather Forecasts (ECMWF). The maps are provided at an hourly time step and there are four forecast runs per day (every 6 h). Only the six most recent forecasts are used here, as the forecasting horizon of interest is 6 h. The hourly French regional and national wind power generation data, as well as the French TSO hourly forecasts and the installed capacity values, come from the ENTSO-E Transparency Platform.
4.2. Baselines
We use the following baselines to compare hourly forecasts for a horizon $h \in \{1, \dots, 6\}$:

• Persistence: Since forecasts are issued every 6 h from the ground truth situation, the observed wind power value is also available at the same interval. Persistence replicates this value for the subsequent 6 h: the model predicts wind power generation at future times $t+h$ as equal to the observed generation at the current time $t$.

• XGB on wind speed mean: Forecasts wind power at $t+h$ using a two-step approach, as depicted in Figure 4: (i) compute the mean wind speed for the considered region at $t+h$ using NWP forecasts; (ii) apply an XGBoost regressor (Chen and Guestrin, 2016) to predict power generation from this mean wind speed.

• Convolutional Neural Networks (CNNs): Use the same training setup as WindDragon and forecast wind power at $t+h$ from the NWP forecasted wind speed map. CNNs can efficiently regress a structured map onto a numerical value by learning local spatial patterns (LeCun and Bengio, 1995). In addition, the weight sharing induced by the convolutional mechanism reduces the number of learned weights compared to alternative deep learning mechanisms such as dense (Haykin, 1994) or self-attention layers (Vaswani et al., 2017). This makes CNNs particularly effective when dealing with relatively small amounts of data. Figure 5 shows the architecture of the CNN baseline we implemented. We used a simple grid search to optimize its hyperparameters (e.g., the number of layers, the kernel sizes, the activation functions).

• French TSO (RTE): European TSOs have to provide Current, Intraday, and Day-Ahead wind and solar forecasts. We used the Current forecast as a baseline to put the results into perspective with operational values. The forecasting methods and horizons are not detailed; the regulatory article only states that the published "Current" forecast is the latest update of the forecast, regularly updated and published during intra-day trading. It is the setup closest to our experiments.

A minimal sketch of the persistence and mean-wind-speed XGB baselines is given after Figure 5.

Figure 4. Visual illustration of the XGB two-step approach on the Auvergne-Rhône-Alpes region.

Figure 5. CNN architecture applied to the Grand Est region.
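As announced above, here is a minimal sketch of the persistence and mean-wind-speed XGB baselines (illustrative code with toy data; the XGBoost hyperparameter values are arbitrary choices, not those used in the paper):

```python
import numpy as np
import xgboost as xgb

def persistence_forecast(last_observed_power, horizon=6):
    """Repeat the last observed wind power value for the next `horizon` hours."""
    return np.full(horizon, last_observed_power)

def xgb_mean_baseline(train_maps, train_power, test_maps):
    """Two-step baseline: (i) reduce each regional NWP map to its mean wind
    speed, (ii) regress wind power on that mean with XGBoost."""
    x_train = train_maps.mean(axis=(1, 2)).reshape(-1, 1)   # mean wind speed
    x_test = test_maps.mean(axis=(1, 2)).reshape(-1, 1)
    model = xgb.XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
    model.fit(x_train, train_power)
    return model.predict(x_test)

# Example with random data shaped like (n_samples, height, width):
rng = np.random.default_rng(0)
maps = rng.random((200, 20, 30)) * 15.0           # wind speeds in m/s
power = 100.0 * maps.mean(axis=(1, 2)) ** 2       # toy power signal
pred = xgb_mean_baseline(maps[:150], power[:150], maps[150:])
```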
4.3. Experimental setup
We used the years 2018 and 2019 to train the models, and the data from 2020 to evaluate how the models perform. All the neural networks were trained using the Adam optimizer. The CNN was trained for 200 epochs. Mutant-UCB was parametrized with $N = 10$, $K = 600$, $E = 0.01$, and 20 epochs per partial training. The CNN model was given as input to the search algorithm: among the first $K$ models initialized, 10 had the CNN architecture, with values of $g$ ranging from 1 to 10. The CNN losses were used to scale the regional errors for WindDragon. Mutant-UCB was distributed over 20 V100 GPUs and ran for 72 h.
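For reference, these settings can be gathered in a small summary configuration (our own representation of the values reported above, not the package's configuration format):

```python
# Hypothetical summary of the WindDragon search run described above.
SEARCH_CONFIG = {
    "search_algorithm": "Mutant-UCB",
    "N": 10,                           # maximum number of partial trainings per model
    "K": 600,                          # size of the initial population
    "E": 0.01,                         # exploration parameter
    "epochs_per_partial_training": 20,
    "optimizer": "Adam",
    "seeded_models": 10,               # CNN baseline architectures with g in 1..10
    "hardware": "20 x V100 GPUs",
    "wall_clock_budget_hours": 72,
}
```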
4.4. Results
We computed two scores: the Mean Absolute Error (MAE) in Megawatts (MW), showing the absolute difference between ground truth and forecast, and the Normalized Mean Absolute Error (NMAE), a percentage obtained by dividing the MAE by the average wind power generation over the test year. The MAE gives an idea of the amount of energy contained in the errors, while the NMAE enables performance to be compared between regions. We run experiments for each of the 12 French metropolitan regions and then aggregate the forecasts to derive national results. Let $\hat{y}_{t,m}^r$ be the forecast of baseline $m$ on region $r$ at time $t$. We get the national forecast $\hat{Y}_m = \{\hat{y}_{t,m}\}_{t=1}^N$ by aggregating the forecasts of the 12 French metropolitan regions:

$$ \hat{y}_{t,m} = \sum_{r=1}^{12} \hat{y}_{t,m}^r. $$

Then, the national metrics for each baseline $m$ are computed between the national value $Y$ and the national forecast of this baseline, $\hat{Y}_m$. The national results are presented in Table 2, while detailed regional results can be found in Table 3. It is interesting to note that the sum of the regional errors is greater than the national error for each model. This is because the regional errors partly offset each other when the signals are aggregated.
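A minimal sketch of these metrics and of the national aggregation (our own code on randomly generated placeholder data, with 8784 hourly steps for the 2020 test year) is given below.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error in MW."""
    return np.mean(np.abs(y_true - y_pred))

def nmae(y_true, y_pred):
    """Normalized MAE (%): MAE divided by the average generation over the test period."""
    return 100.0 * mae(y_true, y_pred) / np.mean(y_true)

# Regional forecasts of one baseline, shape (n_regions=12, n_timesteps=8784):
regional_forecasts = np.random.rand(12, 8784) * 400.0
national_forecast = regional_forecasts.sum(axis=0)      # aggregation over regions
national_truth = np.random.rand(8784) * 4000.0          # placeholder ground truth
print(mae(national_truth, national_forecast), nmae(national_truth, national_forecast))
```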
Table 2. National results: metrics computed on the aggregation of the regional forecasts for each model. The best results are highlighted in bold and the second-best results are underlined

Table 3. Regional results. The best results are highlighted in bold and the second-best results are underlined

The results in Table 2 highlight three key findings:
i. Improved performance with aggregated NWP statistics. Using the average of the NWP-predicted wind speed map coupled with an XGB regressor significantly outperforms the naive persistence baseline. It shows that the signal is closer to a regression problem than to a time series forecasting one. It is also interesting to note that this simple model already outperforms the forecast produced by the French TSO.

ii. Gains from full NWP map utilization. More complex patterns can be captured by using the full predicted wind speed map, as opposed to just its average, thereby improving forecast accuracy. In this context, the CNN regressor applied to full maps yielded gains of 47 MW (11.5%) over the mean-based XGB.

iii. WindDragon's superior performance. WindDragon outperforms all baselines, showing an improvement of 69 MW (19%) over the CNN. On an annual basis, this corresponds to approximately 600 GWh. The average French citizen consumes between 2500 and 3000 kWh of electricity per year; 600 GWh per year is therefore equivalent to the consumption of around 200,000 French inhabitants. These results underscore WindDragon's effectiveness in autonomously discovering optimal deep-learning configurations for wind power regression. Moreover, Table 3 indicates that the improvement holds in all regions: during optimization, WindDragon managed to find, for each region, a model that outperformed every baseline. The architectures found vary slightly from one region to another; examples of the models produced by WindDragon for various regions can be found in Figures A1–A5. The architectures mix various layers such as convolution, pooling, and normalization layers. Most structures are composed of a large two-dimensional graph, efficiently extracting spatial information from the input wind speed map, and a small one-dimensional graph. The hyperparameters are, however, unique to each model.
4.5. Forecasts comparison
In Figure 6, we present the aggregated national wind power forecasts from WindDragon and the CNN baseline for a given week. While both models deliver highly accurate forecasts, WindDragon demonstrates superior accuracy, particularly during the high production level at the end of the week. Figure A6 shows visual comparisons of all baselines on the same week. The models perform well at different times: for example, the RTE forecast is best for the small production spike in the middle of the day on 11 January, but worst for the production dip during the night of 10 January. These differences in performance open the way to mixtures of models to further improve forecasts.

Figure 6. Wind power forecasts for a week in January 2020. The figure displays the ground truth as dotted lines, and the forecasts from the two top-performing models, WindDragon and CNN.
4.6. Performance analysis
We compared the performance of the two best baselines, the CNN and WindDragon, in more detail. Figure 7 shows the absolute errors and the normalized absolute errors by hour of the day and by month. In general, WindDragon is significantly better than the CNN at all times of the day and for all months. In Figure 7a,b, the dotted lines mark the hours when a new NWP forecast arrives (every 6 h). For the first two forecasts of the day (at midnight and 6 a.m.), the performance of both models decreases as the forecast horizon increases. This is much more marked for the CNN, whose performance deteriorates dramatically, particularly at 6 a.m. (when the forecast horizon is 6 h). This observation holds less for the later hours of the day. As for the months, the differences are more pronounced in summer, when wind power production is lower. Finally, we have plotted in Figure 8a the mean absolute errors of the CNN and WindDragon per quantile of the wind power distribution. The two curves diverge particularly at the first quantile, where production values are extremely low, and at the last quantile, where they are extremely high. The two curves never cross, demonstrating the homogeneous superiority of WindDragon over the CNN. Figure 8b shows the skill score between the MAE of WindDragon and the MAE of the reference model, the CNN, which confirms the impression given by Figure 8a.
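Assuming the usual convention for a skill score relative to a reference forecast (the exact definition is not restated here), the quantity shown in Figure 8b can be read as

$$ \mathrm{SS}(q) = 1 - \frac{\mathrm{MAE}_{\mathrm{WindDragon}}(q)}{\mathrm{MAE}_{\mathrm{CNN}}(q)}, $$

so that positive values indicate that WindDragon improves on the CNN reference in quantile bin $q$.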

Figure 7. Error comparison between WindDragon and the CNN. The dotted vertical lines in Figure 7a,b represent the beginning of a new NWP forecast.

Figure 8. Comparison of the CNN and WindDragon performance over 20 quantiles. The two figures show WindDragon's superiority over the CNN across the entire distribution, and particularly over the distribution tails.
4.7. WindDragon search algorithm (Mutant-UCB) time convergence
Mutant-UCB ran for 72 h on 20 GPUs. However, we saved the losses of the models found by the algorithm as it ran, which allows us to analyze its convergence time. Figure 9a shows the best NMAE found per time step for each region. The performance converges very quickly during the first 2 h of the algorithm before stabilizing. Only a few regions, such as Île-de-France, Auvergne-Rhône-Alpes, and Centre-Val de Loire, show improvements in the last hours. Figure 9b zooms in on the first 3 h of the algorithm. Except for PACA and Île-de-France, most regions fall below an NMAE of 15% in about an hour. Thus, although Mutant-UCB ran for a long time to achieve very good performance, correct models could be obtained in just 1 h.

Figure 9. WindDragon search algorithm (Mutant-UCB) convergence: NMAE through time for each region.
5. Conclusion and impact statement
5.1. Summary
This article presents WindDragon, an Automated Deep Learning framework for forecasting regional wind power. WindDragon automates the creation of high-performing deep neural networks that leverage Numerical Weather Prediction wind speed maps to deliver wind production forecasts. We demonstrate on French national and regional wind production data that WindDragon can find deep neural networks outperforming both traditional models and state-of-the-art deep learning models in regional wind power forecasting. Compared to the handcrafted deep learning model inspired by the state of the art in computer vision, WindDragon finds models that perform particularly well in winter and at high wind power values, which is all the more interesting in the context of wind power forecasting.
5.2. Limitations
WindDragon, like many AutoML systems, is limited by its high running time compared to handcrafted baselines. However, this duration should be compared to the time spent creating powerful models by hand, which is often hard to measure. Besides, once the model has been found, its inference speed remains competitive with other deep learning models. Future work could focus on reducing this search time through even more efficient search algorithms or a reduced search space. Efficiency could also be gained by reducing the dimension of the input weather maps, for example, using unsupervised representation techniques. The large number of model trainings and evaluations could also be leveraged by creating a mixture of models instead of identifying only the best one per region. Section 4 highlighted that the baseline models produce quite different forecasts; if these differences are complementary, a mixture of models could achieve better performance.
5.3. Future study
Finally, with the rise of data-driven weather forecasting tools, the accuracy of weather forecasts has increased across forecast horizons (Ben Bouallègue et al., 2024) and for multiple weather variables. Since it does not depend on past production data, our methodology could easily be applied to longer forecast horizons (for other industrial use cases), but also to regional photovoltaic (PV) forecasting, by applying it to solar radiation maps generated by NWP models.
Acknowledgments
We are grateful for the technical advice and careful proofreading of Ghislain Agua and Yannig Goude.
Author contribution
Conceptualization: J. K.; E. L. N., Methodology: J. K.; E. L. N., Data curation: J. K.; E. L. N., Data visualization: J. K.; E. L. N., Writing original draft: J. K.; E. L. N., All authors approved the final submitted draft.
Competing interest
The authors declare none.
Data availability statement
We use open-source data for wind power generation given by the French TSO: https://www.rte-france.com/eco2mix. However, NWP maps are not open source.
Ethics statement
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
This research was supported by grants from EDF (Electricité de France).
A. Appendix
A.1. Models found by WindDragon for various regions

Figure A1. Architecture found by WindDragon on Grand Est.

Figure A2. Architecture found by WindDragon on Auvergne-Rhône-Alpes.

Figure A3. Architecture found by WindDragon on Hauts-de-France.

Figure A4. Architecture found by WindDragon on Île-de-France.

Figure A5. Architecture found by WindDragon on Occitanie.
A.2. Forecasts comparison

Figure A6. Weekly comparative visuals.