Energy Data

The energy data for this project was collected through the API from Energidataservice.dk. The exact dataset can be found here or downloaded from my GitHub repository here.

import pandas as pd
# Load the data
Data = pd.read_csv("Production and Consumption - Settlement.csv")

# Convert 'HourDK' to datetime format and set it as the index
Data['HourDK'] = pd.to_datetime(Data['HourDK'])
Data.set_index('HourDK', inplace=True)

Data.tail()
HourUTC PriceArea CentralPowerMWh LocalPowerMWh CommercialPowerMWh LocalPowerSelfConMWh OffshoreWindLt100MW_MWh OffshoreWindGe100MW_MWh OnshoreWindLt50kW_MWh OnshoreWindGe50kW_MWh ... ExchangeNO_MWh ExchangeSE_MWh ExchangeGE_MWh ExchangeNL_MWh ExchangeGreatBelt_MWh GrossConsumptionMWh GridLossTransmissionMWh GridLossInterconnectorsMWh GridLossDistributionMWh PowerToHeatMWh
HourDK
2023-04-10 02:00:00 2023-04-10T00:00:00 DK2 542.716430 56.011137 195.179447 4.254726 4.103300 125.771092 0.091373 122.521582 ... NaN 1218.86 -923.75 NaN -199.1 1148.563281 28.741000 13.879016 55.044378 7.319560
2023-04-10 01:00:00 2023-04-09T23:00:00 DK1 295.084760 174.085155 67.325744 11.743112 110.835594 607.133481 3.237114 855.111878 ... 828.20 718.94 -1641.44 -464.64 158.6 1726.785529 66.019140 28.708100 73.265991 9.637616
2023-04-10 01:00:00 2023-04-09T23:00:00 DK2 523.548447 56.455218 192.695424 4.105949 5.253700 126.193216 0.078260 121.382998 ... NaN 1252.83 -945.39 NaN -160.8 1179.818996 29.380099 13.627976 55.872573 6.537930
2023-04-10 00:00:00 2023-04-09T22:00:00 DK1 365.348642 184.294252 67.073400 12.139897 108.930271 580.453773 2.587651 872.101825 ... 1313.43 682.06 -2128.03 -450.36 150.0 1762.614751 77.556467 26.627500 74.698764 9.421748
2023-04-10 00:00:00 2023-04-09T22:00:00 DK2 577.949918 59.068309 193.441627 4.415682 5.439600 114.727084 0.072489 116.494591 ... NaN 1221.67 -945.16 NaN -152.3 1200.284632 29.300249 13.956008 56.731284 2.712530

5 rows × 26 columns

Shown above are the last 5 observations of the dataset, which contains 319726 observations. There are 2 observations for each time point, one for each electricity network (DK1 and DK2). Summing these gives the consumption for all of Denmark for every hour from 2005-03-25 23:00:00 to 2023-04-10 00:00:00. All variables other than HourDK and GrossConsumptionMWh are then dropped.
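A minimal sketch of that summing step, assuming a pandas groupby over HourDK. The sample frame reuses the five rows from the tail() output, reduced to three columns, and `total_consumption` is a hypothetical helper name, not necessarily the code used for the result below. In this small sample the 02:00 hour has only a DK2 row, so its sum covers DK2 alone.

```python
import pandas as pd

# Stand-in for the full dataset: the five rows from the tail() output,
# reduced to the columns that matter for this step.
sample = pd.DataFrame(
    {
        "HourDK": pd.to_datetime(
            [
                "2023-04-10 02:00:00",
                "2023-04-10 01:00:00",
                "2023-04-10 01:00:00",
                "2023-04-10 00:00:00",
                "2023-04-10 00:00:00",
            ]
        ),
        "PriceArea": ["DK2", "DK1", "DK2", "DK1", "DK2"],
        "GrossConsumptionMWh": [
            1148.563281, 1726.785529, 1179.818996, 1762.614751, 1200.284632,
        ],
    }
).set_index("HourDK")


def total_consumption(df):
    """Sum GrossConsumptionMWh over the price areas for each hour (hypothetical helper)."""
    hourly = df.groupby("HourDK")["GrossConsumptionMWh"].sum()
    # Back to a two-column frame (HourDK, GrossConsumptionMWh), oldest hour first.
    return hourly.sort_index().reset_index()


national = total_consumption(sample)
print(national)
```

Grouping by name works whether HourDK is a column or, as after `set_index`, the index, so the helper can be applied directly to `Data`.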

HourDK GrossConsumptionMWh
0 2005-01-01 00:00:00 3370.256592
1 2005-01-01 01:00:00 3237.832763
2 2005-01-01 02:00:00 3101.580811
3 2005-01-01 03:00:00 2963.392211
4 2005-01-01 04:00:00 2854.805420
... ... ...
159840 2023-05-30 19:00:00 3935.964505
159841 2023-05-30 20:00:00 3764.163099
159842 2023-05-30 21:00:00 3655.639568
159843 2023-05-30 22:00:00 3663.715933
159844 2023-05-30 23:00:00 3308.564927

159845 rows × 2 columns

We can now take a closer look at the energy data by plotting it:

Plotting the time series

[Figure: the full time series]

It is difficult to see much beyond the annual seasonality in the data, where consumption rises in the winter and falls in the summer. Let’s take a closer look at a year, a month, a week, and a day:

[Figure: one year of the time series]

Here the annual seasonality is easier to see.

[Figure: one month of the time series]

Here we can see the weekly seasonality: energy consumption falls on the weekends.

[Figure: one week of the time series]

Zoomed in to a single week of the time series, the difference between weekdays and the weekend is even more apparent.

[Figure: one day of the time series]

Zoomed in to a single day, the daily seasonality becomes apparent.
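The zoomed views can be obtained by slicing the series with partial date strings. The sketch below is illustrative only: the synthetic series and the chosen dates in 2022 are assumptions made so the example runs on its own, and they are not the windows used in the figures.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering 2022, used only to make the example runnable.
# In the project this would be the summed GrossConsumptionMWh series.
index = pd.date_range("2022-01-01 00:00", "2022-12-31 23:00", freq=pd.Timedelta(hours=1))
consumption = pd.Series(
    np.arange(len(index), dtype=float), index=index, name="GrossConsumptionMWh"
)

# Partial-string indexing on a DatetimeIndex selects every hour in the period.
one_year = consumption.loc["2022"]
one_month = consumption.loc["2022-03"]
one_week = consumption.loc["2022-03-07":"2022-03-13"]  # Monday through Sunday
one_day = consumption.loc["2022-03-09"]

for label, part in [("year", one_year), ("month", one_month),
                    ("week", one_week), ("day", one_day)]:
    print(label, len(part), "hourly observations")
```

With matplotlib installed, each slice can be drawn with its `.plot()` method.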

Let’s take a closer look at this seasonality with some boxplots.

Seasonality
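The grouping behind such boxplots can be sketched as follows. This is a rough illustration under stated assumptions: the two-week synthetic series (a daily cycle plus a lower weekend level), the column names `month`, `weekday` and `hour`, and the helper `box_stats` are all made up for the example. The quartiles it computes are the values a boxplot draws as its box and median line.

```python
import numpy as np
import pandas as pd

# Synthetic two-week hourly series, for illustration only: a daily cycle plus
# a lower level on weekends. The real input would be the hourly consumption.
index = pd.date_range("2022-03-07", periods=14 * 24, freq=pd.Timedelta(hours=1))
hours = index.hour.to_numpy()
weekdays = index.dayofweek.to_numpy()  # Monday=0, Sunday=6
values = 3500 + 300 * np.sin(2 * np.pi * hours / 24) + np.where(weekdays >= 5, -400.0, 0.0)
frame = pd.DataFrame({"GrossConsumptionMWh": values}, index=index)

# Calendar features that define the boxplot groups.
frame["month"] = frame.index.month
frame["weekday"] = frame.index.dayofweek
frame["hour"] = frame.index.hour


def box_stats(df, by):
    """Quartiles of GrossConsumptionMWh per group (hypothetical helper)."""
    return df.groupby(by)["GrossConsumptionMWh"].quantile([0.25, 0.5, 0.75]).unstack()


weekday_stats = box_stats(frame, "weekday")
hour_stats = box_stats(frame, "hour")
print(weekday_stats)
```

With matplotlib installed, `frame.boxplot(column="GrossConsumptionMWh", by="weekday")` draws the corresponding boxplot directly. Grouping by `month` or `hour` gives the annual and daily views.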

[Figures: boxplots of the annual, weekly, and daily seasonality]

As the plots above show, the annual, weekly, and daily seasonality was not a coincidence of the particular windows we zoomed in to, but continues throughout the entire time series. This gives valuable information when it comes to building a prediction model to fit to the time series.