Energy Data
The energy data for this project was collected through the API from Energidataservice.dk. The exact dataset can be found here or downloaded from my GitHub repository here.
import pandas as pd
# Load the data
Data = pd.read_csv("Production and Consumption - Settlement.csv")
# Convert 'HourDK' to datetime format and set it as the index
Data['HourDK'] = pd.to_datetime(Data['HourDK'])
Data.set_index('HourDK', inplace=True)
Data.tail()
HourDK | HourUTC | PriceArea | CentralPowerMWh | LocalPowerMWh | CommercialPowerMWh | LocalPowerSelfConMWh | OffshoreWindLt100MW_MWh | OffshoreWindGe100MW_MWh | OnshoreWindLt50kW_MWh | OnshoreWindGe50kW_MWh | ... | ExchangeNO_MWh | ExchangeSE_MWh | ExchangeGE_MWh | ExchangeNL_MWh | ExchangeGreatBelt_MWh | GrossConsumptionMWh | GridLossTransmissionMWh | GridLossInterconnectorsMWh | GridLossDistributionMWh | PowerToHeatMWh |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2023-04-10 02:00:00 | 2023-04-10T00:00:00 | DK2 | 542.716430 | 56.011137 | 195.179447 | 4.254726 | 4.103300 | 125.771092 | 0.091373 | 122.521582 | ... | NaN | 1218.86 | -923.75 | NaN | -199.1 | 1148.563281 | 28.741000 | 13.879016 | 55.044378 | 7.319560 |
2023-04-10 01:00:00 | 2023-04-09T23:00:00 | DK1 | 295.084760 | 174.085155 | 67.325744 | 11.743112 | 110.835594 | 607.133481 | 3.237114 | 855.111878 | ... | 828.20 | 718.94 | -1641.44 | -464.64 | 158.6 | 1726.785529 | 66.019140 | 28.708100 | 73.265991 | 9.637616 |
2023-04-10 01:00:00 | 2023-04-09T23:00:00 | DK2 | 523.548447 | 56.455218 | 192.695424 | 4.105949 | 5.253700 | 126.193216 | 0.078260 | 121.382998 | ... | NaN | 1252.83 | -945.39 | NaN | -160.8 | 1179.818996 | 29.380099 | 13.627976 | 55.872573 | 6.537930 |
2023-04-10 00:00:00 | 2023-04-09T22:00:00 | DK1 | 365.348642 | 184.294252 | 67.073400 | 12.139897 | 108.930271 | 580.453773 | 2.587651 | 872.101825 | ... | 1313.43 | 682.06 | -2128.03 | -450.36 | 150.0 | 1762.614751 | 77.556467 | 26.627500 | 74.698764 | 9.421748 |
2023-04-10 00:00:00 | 2023-04-09T22:00:00 | DK2 | 577.949918 | 59.068309 | 193.441627 | 4.415682 | 5.439600 | 114.727084 | 0.072489 | 116.494591 | ... | NaN | 1221.67 | -945.16 | NaN | -152.3 | 1200.284632 | 29.300249 | 13.956008 | 56.731284 | 2.712530 |
5 rows × 26 columns
These are the last 5 observations of the dataset, which contains 319726 observations. There are 2 observations for each time point, one for each of the two price areas (DK1 and DK2). Summing them gives the consumption for all of Denmark for every hour from 2005-03-25 23:00:00 to 2023-04-10 00:00:00. All variables except HourDK and GrossConsumptionMWh are then dropped.
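The aggregation described above can be sketched roughly as follows. This is a minimal illustration on a small made-up frame, not the real file; the names `raw` and `denmark` are hypothetical, and the sketch assumes HourDK is a regular column rather than the index.

```python
import pandas as pd

# Tiny stand-in for the real dataset: two price areas per hour (values are made up).
raw = pd.DataFrame({
    "HourDK": pd.to_datetime([
        "2023-04-10 01:00", "2023-04-10 01:00",
        "2023-04-10 00:00", "2023-04-10 00:00",
    ]),
    "PriceArea": ["DK1", "DK2", "DK1", "DK2"],
    "GrossConsumptionMWh": [1650.0, 1150.0, 1700.0, 1200.0],
})

# Sum DK1 and DK2 for each hour; selecting a single column drops all other variables.
denmark = (
    raw.groupby("HourDK")["GrossConsumptionMWh"]
       .sum()
       .reset_index()          # turn HourDK back into a regular column
       .sort_values("HourDK")  # oldest observation first
       .reset_index(drop=True)
)
print(denmark)
```

Grouping on the timestamp instead of assuming exactly two rows per hour keeps the result correct even if one price area is missing for some hour.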
Index | HourDK | GrossConsumptionMWh |
---|---|---|
0 | 2005-01-01 00:00:00 | 3370.256592 |
1 | 2005-01-01 01:00:00 | 3237.832763 |
2 | 2005-01-01 02:00:00 | 3101.580811 |
3 | 2005-01-01 03:00:00 | 2963.392211 |
4 | 2005-01-01 04:00:00 | 2854.805420 |
... | ... | ... |
159840 | 2023-05-30 19:00:00 | 3935.964505 |
159841 | 2023-05-30 20:00:00 | 3764.163099 |
159842 | 2023-05-30 21:00:00 | 3655.639568 |
159843 | 2023-05-30 22:00:00 | 3663.715933 |
159844 | 2023-05-30 23:00:00 | 3308.564927 |
159845 rows × 2 columns
We can now take a closer look at the energy data by plotting it:
Plotting the timeseries
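A plot of this kind could be produced along these lines. The sketch uses matplotlib and a synthetic hourly series as a stand-in for the real data; the variable names, the figure size, and the axis labels are assumptions, not the settings used for the figure in this post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the real consumption data (made-up values).
hours = pd.date_range("2022-01-01", periods=24 * 60, freq="h")
values = 3500 + 500 * np.sin(2 * np.pi * np.asarray(hours.hour) / 24)
consumption = pd.Series(values, index=hours, name="GrossConsumptionMWh")

# Plot the whole series as a thin line; with many years of hourly data the line gets dense.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(consumption.index, consumption.values, linewidth=0.5)
ax.set_xlabel("HourDK")
ax.set_ylabel("Gross consumption (MWh)")
ax.set_title("Hourly gross electricity consumption")
fig.tight_layout()
```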
It is difficult to see much beyond the annual seasonality, where consumption rises in the winter and falls in the summer. Let’s take a closer look at a year, a month, a week, and a day:
Here the annual seasonality is easier to see.
Here we can see the weekly seasonality, where energy consumption falls on the weekends.
Zoomed in to a single week, the difference between weekdays and the weekend becomes even more apparent.
Zoomed in to a single day, the daily seasonality becomes apparent.
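The four zoom levels above can be selected with pandas partial-string indexing on a DatetimeIndex. The following is a sketch under assumptions: the series is synthetic, the chosen dates are arbitrary examples rather than the windows shown in the figures, and the index is assumed to be sorted in ascending order (for the real data, calling `sort_index()` first would be needed).

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering one year (stand-in for the real data).
hours = pd.date_range("2022-01-01 00:00", "2022-12-31 23:00", freq="h")
series = pd.Series(np.arange(len(hours), dtype=float), index=hours,
                   name="GrossConsumptionMWh")

# A partial date string selects every hour that falls inside that period.
one_year = series.loc["2022"]
one_month = series.loc["2022-03"]
one_week = series.loc["2022-03-07":"2022-03-13"]  # Monday through Sunday, both days included
one_day = series.loc["2022-03-09"]

for label, window in [("year", one_year), ("month", one_month),
                      ("week", one_week), ("day", one_day)]:
    print(label, len(window), "hourly observations")
```

Each window is an ordinary Series, so it can be passed to the same plotting call as the full series.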
Let’s take a closer look at this seasonality with some boxplots.
Seasonality
As the plots above show, the annual, weekly, and daily seasonality is not a coincidence of the particular windows we zoomed in on, but persists throughout the entire time series. This is valuable information when building a model to forecast the time series.
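The grouping behind seasonal boxplots of this kind can be sketched as follows. The data is synthetic with an artificial weekend dip, and the column names `hour`, `weekday`, and `month` are hypothetical helpers introduced for illustration; they are not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Two full weeks of hourly timestamps, starting on a Monday.
hours = pd.date_range("2022-01-03", periods=24 * 14, freq="h")

# Made-up consumption values: lower on Saturdays and Sundays.
is_weekend = np.asarray(hours.dayofweek) >= 5
df = pd.DataFrame({
    "HourDK": hours,
    "GrossConsumptionMWh": np.where(is_weekend, 3000.0, 4000.0),
})

# Calendar features that the boxplots would be grouped by.
df["hour"] = df["HourDK"].dt.hour
df["weekday"] = df["HourDK"].dt.dayofweek  # 0 = Monday, 6 = Sunday
df["month"] = df["HourDK"].dt.month

# Median per weekday; a boxplot would show the full distribution of each group.
weekday_median = df.groupby("weekday")["GrossConsumptionMWh"].median()
print(weekday_median)
```

With matplotlib installed, `df.boxplot(column="GrossConsumptionMWh", by="weekday")` would draw one box per weekday from the same frame, and swapping `by` to `hour` or `month` gives the daily and annual views.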