This project is a wind data analysis with some feature engineering.

Data description, as found at https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset

Context

In wind turbines, SCADA systems measure and save data such as wind speed, wind direction and generated power at 10-minute intervals. This file was taken from the SCADA system of a wind turbine that is operating and generating power in Turkey.

Content

The data in the file are:

1) Date/Time (at 10-minute intervals)
2) LV ActivePower (kW): The power generated by the turbine at that moment
3) Wind Speed (m/s): The wind speed at the hub height of the turbine (the wind speed that the turbine uses for electricity generation)
4) TheoreticalPowerCurve (KWh): The theoretical power the turbine generates at that wind speed, as given by the turbine manufacturer
5) Wind Direction (°): The wind direction at the hub height of the turbine (wind turbines turn to this direction automatically)

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 
#Installing windrose to have the wind direction overview
!pip install windrose
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
# Use seaborn style defaults and set the default figure size
sns.set(rc={'figure.figsize':(11, 4)})
from windrose import WindroseAxes

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

Collecting windrose
  Downloading windrose-1.6.7-py2.py3-none-any.whl (20 kB)
Requirement already satisfied: numpy in /opt/conda/lib/python3.6/site-packages (from windrose) (1.18.1)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.6/site-packages (from windrose) (3.0.3)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.6/site-packages (from matplotlib->windrose) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->windrose) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->windrose) (2.4.6)
Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.6/site-packages (from matplotlib->windrose) (2.8.1)
Requirement already satisfied: six in /opt/conda/lib/python3.6/site-packages (from cycler>=0.10->matplotlib->windrose) (1.14.0)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.6/site-packages (from kiwisolver>=1.0.1->matplotlib->windrose) (45.2.0.post20200210)
Installing collected packages: windrose
Successfully installed windrose-1.6.7
/kaggle/input/wind-turbine-scada-dataset/T1.csv

# Importing the data needed for the analysis into a pandas DataFrame

df = pd.read_csv('../input/wind-turbine-scada-dataset/T1.csv')

#Checking the first 5 rows of the dataframe
df.head()

#Checking if the dataframe contains null values
df.isna().sum()

Date/Time                        0
LV ActivePower (kW)              0
Wind Speed (m/s)                 0
Theoretical_Power_Curve (KWh)    0
Wind Direction (°)               0
dtype: int64

#Convert Date/Time to the index and drop the Date/Time column
df.index=df['Date/Time']
df.drop(['Date/Time'], axis=1, inplace=True)

#New DataFrame after dropping column Date/Time
df.head()
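
The index set above still holds plain strings. The following is a minimal sketch, not part of the original notebook, of how the timestamps could be parsed into a DatetimeIndex so that time-based operations become available. It assumes the "DD MM YYYY HH:MM" layout visible in the first rows of the data; the `sample` frame and `hourly_mean` name are hypothetical and stand in for the real dataframe.

```python
import pandas as pd

# Hypothetical three-row sample in the same "DD MM YYYY HH:MM" layout as T1.csv
sample = pd.DataFrame({
    "Date/Time": ["01 01 2018 00:00", "01 01 2018 00:10", "01 01 2018 00:20"],
    "Wind Speed (m/s)": [5.311336, 5.672167, 5.216037],
})

# Parse with an explicit format so that day and month cannot be swapped
sample["Date/Time"] = pd.to_datetime(sample["Date/Time"], format="%d %m %Y %H:%M")
sample = sample.set_index("Date/Time")

# With a DatetimeIndex, the 10-minute records can be resampled, e.g. to hourly means
hourly_mean = sample["Wind Speed (m/s)"].resample("60min").mean()
print(hourly_mean)
```

The same two lines (`pd.to_datetime` followed by `set_index`) should apply to the full dataframe, assuming every row follows the same timestamp layout.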

#Plotting each column against time
cols_plot = ['LV ActivePower (kW)', 'Wind Speed (m/s)', 'Theoretical_Power_Curve (KWh)','Wind Direction (°)']
axes = df[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)

# Plot the data distributions
plt.figure(figsize=(10, 8))
for i in range(4):
    plt.subplot(2, 2, i+1)
    sns.kdeplot(df.iloc[:,i], shade=True)
    plt.title(df.columns[i])
plt.tight_layout()
plt.show()

# Draw a wind rose from the wind direction and wind speed columns
ax = WindroseAxes.from_ax()
ax.bar(df['Wind Direction (°)'], df['Wind Speed (m/s)'], normed=True, opening=0.8, edgecolor='white')
ax.set_legend()

The wind rose plot above shows that the wind blows mostly from the north-east, while a significant share also comes from the south-west.
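
A reading taken from a wind rose can be cross-checked numerically by counting how often the wind falls into each direction sector. The sketch below is an illustration under stated assumptions: the `directions` array is a small hypothetical sample (in practice it could be `df['Wind Direction (°)'].to_numpy()`), and the 45-degree sectors start at 0°, so they are not centred on the compass points.

```python
import numpy as np

# Hypothetical directions in degrees; replace with the real column values
directions = np.array([10.0, 50.0, 60.0, 70.0, 200.0, 210.0, 350.0, 44.9])

# Eight 45-degree sectors covering 0 to 360 degrees
edges = np.arange(0, 361, 45)
counts, _ = np.histogram(directions, bins=edges)

# Share of observations per sector
share = counts / counts.sum()

for lower, upper, s in zip(edges[:-1], edges[1:], share):
    print(f"[{lower}-{upper}]: {s:.1%}")
```

Note that this counts frequency only; it says nothing about how strong the wind is in each sector.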

#Checking for maximum and minimum value of the wind direction to help in choosing the right binning value
print(df['Wind Direction (°)'].max())
print(df['Wind Direction (°)'].min())

359.99758911132795
0.0

#Continuous variable bins; qcut vs cut: https://stackoverflow.com/questions/30211923/what-is-the-difference-between-pandas-qcut-and-pandas-cut
#Frequency bins using qcut: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.qcut.html
#df['Wind Speed (m/s) Bin'] = pd.qcut(df['Wind Speed (m/s)'], 4)

#Value bins using cut: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.cut.html
#df['Wind Direction (°)'] = pd.cut(df['Wind Direction (°)'].astype(int), 45)

#df

#Binning the data by wind direction (45-degree sectors)
bins_range = np.arange(0,375,45)

print(bins_range)

[  0  45  90 135 180 225 270 315 360]

#Map a value to the label of the bin that contains it
def binning(x, bins):
    kwargs = {}
    if x == max(bins):
        #Put a value equal to the largest edge into the last bin instead of past the end
        kwargs['right'] = True
    idx = np.digitize([x], bins, **kwargs)[0]
    return '[{0}-{1}]'.format(bins[idx-1], bins[idx])

df['Bin'] = df['Wind Direction (°)'].apply(binning, bins=bins_range)

#group the binned data by mean and std
grouped = df.groupby('Bin')
grouped_std = grouped.std()
grouped_mean = grouped.mean()
grouped_mean.head()

The analysis above shows that the highest average wind speed was recorded in the 180°-225° sector.

Contrary to the impression given by the wind rose plot, the south to south-west sector looks like a good direction for the wind turbine because it has the highest average wind speed. This sector also has the highest theoretical power and LV active power.
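
The per-sector statistics can also be computed without a row-by-row `apply`. The following is a sketch of a vectorised alternative using `pd.cut`, not the method used in this notebook. The `demo` frame, the `Direction Bin` column and `sector_stats` are hypothetical names, and the bins are half-open (`right=False`), which assumes no direction is exactly 360°.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; in practice use the real direction and speed columns
demo = pd.DataFrame({
    "Wind Direction (°)": [10.0, 30.0, 100.0, 190.0, 200.0, 359.9],
    "Wind Speed (m/s)": [4.0, 6.0, 5.0, 10.0, 12.0, 3.0],
})

# 45-degree sector edges and labels in the same '[lower-upper]' style
edges = np.arange(0, 361, 45)
labels = [f"[{lo}-{hi}]" for lo, hi in zip(edges[:-1], edges[1:])]

# Half-open bins [lower, upper): every value from 0 up to (not including) 360 gets a sector
demo["Direction Bin"] = pd.cut(
    demo["Wind Direction (°)"], bins=edges, labels=labels, right=False
)

# Mean, standard deviation and sample count of wind speed per observed sector
sector_stats = (
    demo.groupby("Direction Bin", observed=True)["Wind Speed (m/s)"]
    .agg(["mean", "std", "count"])
)
print(sector_stats)
```

Including the count next to the mean helps judge whether a high average rests on many records or only a few. Because the labels are an ordered categorical, the result is already sorted by sector.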

#Checking for maximum and minimum value of the windspeed to help in choosing the right binning value
print(df['Wind Speed (m/s)'].max())
print(df['Wind Speed (m/s)'].min())

25.2060108184814
0.0

#Binning the data by wind speed (0.5 m/s steps)
bins_range_ws = np.arange(0,26,0.5)

df['Bin'] = df['Wind Speed (m/s)'].apply(binning, bins=bins_range_ws)

#Group by windspeed bin
grouped = df.groupby('Bin')
grouped_std = grouped.std()
grouped_mean = grouped.mean()
grouped_mean

#Let's rearrange the index for proper visualisation
step = bins_range_ws[1]-bins_range_ws[0]
new_index = ['[{0}-{1}]'.format(x, x+step) for x in bins_range_ws]
new_index.pop(-1) #We don't need the last label, [25.5-26.0]; no bin starts at the final edge
grouped_mean = grouped_mean.reindex(new_index)

#Rearranged index: visualizing the mean of each wind speed bin
grouped_mean
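
The reindexing above is needed because the bin labels are strings, so they sort alphabetically ("[10.0-10.5]" before "[2.0-2.5]"). As an alternative sketch, the labels can be sorted by their numeric lower edge. The `lower_edge` helper and the `labels` list are hypothetical, and the parsing assumes non-negative edges in the '[lower-upper]' format.

```python
# Hypothetical labels in the same '[lower-upper]' format, in string-sorted order
labels = ["[0.0-0.5]", "[10.0-10.5]", "[2.0-2.5]", "[9.5-10.0]"]

def lower_edge(label):
    """Return the numeric lower edge of a '[lower-upper]' label (non-negative edges only)."""
    return float(label.strip("[]").split("-")[0])

# Sort by the numeric lower edge instead of alphabetically
ordered = sorted(labels, key=lower_edge)
print(ordered)
```

A table indexed by such labels could then be reordered with `reindex(ordered)`.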

Looking at the table above, it can be assumed that the cut-in wind speed is 3.0-3.5 m/s, the rated wind speed is 12.5-13.0 m/s and the cut-out wind speed is around 25 m/s. This analysis will help us determine better filter conditions for the power curve analysis.
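
The cut-in and rated bins can also be read off programmatically rather than by eye. The sketch below is illustrative: the `theoretical` series is a hypothetical name holding a few binned Theoretical_Power_Curve means copied from the table, keyed by the lower bin edge, and the 99% threshold used to define "rated" is an assumption, not a value from the dataset.

```python
import pandas as pd

# Excerpt of binned theoretical power means: lower bin edge (m/s) -> mean power
theoretical = pd.Series({
    2.5: 0.0,
    3.0: 29.953759,
    3.5: 86.474149,
    12.0: 3552.269674,
    12.5: 3591.080794,
    13.0: 3600.0,
})

rated_power = theoretical.max()

# Cut-in: first bin where the theoretical power is above zero
cut_in_bin = theoretical[theoretical > 0].index.min()

# Rated: first bin reaching at least 99% of the maximum power (assumed threshold)
rated_bin = theoretical[theoretical >= 0.99 * rated_power].index.min()

print(cut_in_bin, rated_bin)
```

With this excerpt the result is the 3.0 m/s bin for cut-in and the 12.5 m/s bin for rated, in line with the values assumed above.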

#Power Curve Analysis
#Theoretical power curve
plt.scatter(df['Wind Speed (m/s)'],df['Theoretical_Power_Curve (KWh)'])
plt.ylabel('Theoretical_Power (KWh)')
plt.xlabel('Wind speed (m/s)')
plt.grid(True)
plt.legend([' Theoretical_Power_Curve'], loc='upper left')
plt.show()

# LV ActivePower (kW) CP_CURVE
plt.scatter(df['Wind Speed (m/s)'],df['LV ActivePower (kW)'])
plt.ylabel('LV ActivePower (kW)')
plt.xlabel('Wind speed (m/s)')
plt.grid(True)
plt.legend([' LV ActivePower (kW) CP_CURVE'], loc='upper left')
plt.show()

Using the information gathered above, we can now set filter conditions for the LV ActivePower (kW) power curve.

Note: The filter conditions are set manually, based on the knowledge gathered above.

#Condition 1
#The first step is the removal of downtime events, which can be identified as near-zero power at high wind speeds.

#Eliminate rows where wind speed is above 3.5 m/s and active power is zero.
new_df=df[((df["LV ActivePower (kW)"]!=0)&(df["Wind Speed (m/s)"]>3.5)) | (df["Wind Speed (m/s)"]<=3.5)]

#Condition 2
#At wind speeds of 12.5 m/s and above, keep only rows with at least 3000 kW
new_1 = (new_df[ (new_df['Wind Speed (m/s)'] < 12.5)  | (new_df['LV ActivePower (kW)'] >= 3000) ])

#Condition 3
#At wind speeds of 9.5 m/s and above, keep only rows with at least 1500 kW
new_2 = (new_1[ (new_1['Wind Speed (m/s)'] < 9.5)  | (new_1['LV ActivePower (kW)'] >= 1500) ])

#Condition 4
#At wind speeds of 6.5 m/s and above, keep only rows with at least 500 kW
new_3 = (new_2[ (new_2['Wind Speed (m/s)'] < 6.5)  | (new_2['LV ActivePower (kW)'] >= 500) ])

#Theoretical_Power_Curve and Filtered LV ActivePower (kW) CP_CURVE Visualisation
#Label each series so the legend identifies both of them
plt.scatter(new_3['Wind Speed (m/s)'],new_3['LV ActivePower (kW)'], label='Filtered LV ActivePower (kW)')
plt.scatter(df['Wind Speed (m/s)'],df['Theoretical_Power_Curve (KWh)'], label='Theoretical_Power_Curve (KWh)')
plt.ylabel('Power (kW)')
plt.xlabel('Wind speed (m/s)')
plt.grid(True)
plt.legend(loc='lower right')
plt.show()

The filtered power curve can still be improved. Suggestions for better filter conditions are welcome.
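
One possible direction, offered only as a sketch and not as the method used in this notebook, is a data-driven filter: within each wind speed bin, drop points that lie far from the bin median, measured with the median absolute deviation (MAD). The `demo` data is synthetic, the column names are hypothetical, and the cut-off `k = 3` is an assumed tuning value.

```python
import numpy as np
import pandas as pd

# Synthetic data: three wind speeds, 20 records each, with small deterministic scatter
speed = np.repeat([5.25, 7.25, 9.25], 20)
power = np.repeat([300.0, 1000.0, 1900.0], 20) + np.tile(np.arange(-10, 10), 3)

# Simulated downtime: zero power despite sufficient wind
power[[0, 25, 45]] = 0.0

demo = pd.DataFrame({"speed": speed, "power": power})

# Assign each record to a 0.5 m/s bin by its lower edge
demo["bin"] = np.floor(demo["speed"] / 0.5) * 0.5

# Per-bin median and median absolute deviation, broadcast back to every row
grouped_power = demo.groupby("bin")["power"]
median = grouped_power.transform("median")
mad = grouped_power.transform(lambda s: (s - s.median()).abs().median())

# Keep points within k scaled MADs of the bin median (1.4826 scales MAD to a std estimate)
k = 3
keep = (demo["power"] - median).abs() <= k * 1.4826 * mad
filtered = demo[keep]

print(len(demo), len(filtered))
```

Compared with fixed thresholds, this adapts to each bin, but it would need checking on the real data: in bins where most records are downtime, the median itself is no longer a reliable reference.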

Feature Engineering

Generating more features from the limited data given.

#Function to create an operational category feature from the wind speed
#Half-open ranges make sure boundary values (3.5, 10, 15) do not fall through to Region_3
def CP_group(val):
    if val < 3.5:
        return 'Region_1'
    elif val < 10:
        return 'Region_1.5'
    elif val < 15:
        return 'Region_2'
    elif val < 23:
        return 'Region_2.5'
    else:
        return 'Region_3'
df['Operational Category']=df['Wind Speed (m/s)'].apply(CP_group)

df.head(5)

The features generated are the operational category and the wind speed bin, which can be converted into dummy variables for further ML prediction. For the operational category, regions are derived from the operational state of the wind turbine according to the data provided:

Region_1: Non-operational
Region_1.5: Max rotor efficiency
Region_2: Rated
Region_2.5: Reduced rotor efficiency
Region_3: Cut-out
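
The same region assignment can be sketched in vectorised form with `pd.cut`, which also makes the handling of boundary values explicit. This is an illustrative alternative, not the notebook's own code; the `speeds` sample is hypothetical, and the half-open bins (`right=False`) assume that a boundary speed such as 10 m/s belongs to the higher region.

```python
import numpy as np
import pandas as pd

# Hypothetical wind speeds, including values exactly on the region boundaries
speeds = pd.Series([0.0, 3.5, 9.9, 10.0, 15.0, 22.9, 23.0, 25.2])

# Region edges in m/s; infinite outer edges catch everything below and above
edges = [-np.inf, 3.5, 10, 15, 23, np.inf]
labels = ["Region_1", "Region_1.5", "Region_2", "Region_2.5", "Region_3"]

# right=False gives half-open bins [lower, upper)
category = pd.cut(speeds, bins=edges, labels=labels, right=False)
print(category.tolist())
```

Keeping the edges and labels in two lists makes the region definitions easy to adjust in one place.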

Converting the categorical data into dummy variables

#Checking the data type for better understanding
df.dtypes

LV ActivePower (kW)              float64
Wind Speed (m/s)                 float64
Theoretical_Power_Curve (KWh)    float64
Wind Direction (°)               float64
Bin                               object
Operational Category              object
dtype: object

#Splitting the data into categorical data and float
df_float = df[df.dtypes[df.dtypes == "float"].index]

df_Cat = df[df.dtypes[df.dtypes == "object"].index]

df_float.head(5)

df_Cat.head(5)

#Converting the categorical data into dummy variable for easy analysis
df_Cat = pd.get_dummies(df_Cat)

df_Cat.head(5)

#Concatenating the two data types together
Result=df_float.join([df_Cat])

Result.head(5)
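
To make the split, encode and join steps easier to follow, here is a compact sketch of the same idea on a three-row frame. The `demo` frame and variable names are hypothetical; `select_dtypes` is used here as an alternative way of separating numeric and non-numeric columns, and `dtype=int` is chosen so the dummy columns show 0/1.

```python
import pandas as pd

# Hypothetical miniature of the dataframe: one numeric and one categorical column
demo = pd.DataFrame({
    "Wind Speed (m/s)": [2.0, 6.0, 12.0],
    "Operational Category": ["Region_1", "Region_1.5", "Region_2"],
})

# Split into numeric and non-numeric columns
numeric = demo.select_dtypes(include="number")
categorical = demo.select_dtypes(exclude="number")

# One indicator column per category value
dummies = pd.get_dummies(categorical, dtype=int)

# Join on the shared index to get one fully numeric table
result = numeric.join(dummies)
print(result)
```

Each dummy column is named after the source column and the category value, for example `Operational Category_Region_1.5`.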

	Date/Time	LV ActivePower (kW)	Wind Speed (m/s)	Theoretical_Power_Curve (KWh)	Wind Direction (°)
0	01 01 2018 00:00	380.047791	5.311336	416.328908	259.994904
1	01 01 2018 00:10	453.769196	5.672167	519.917511	268.641113
2	01 01 2018 00:20	306.376587	5.216037	390.900016	272.564789
3	01 01 2018 00:30	419.645905	5.659674	516.127569	271.258087
4	01 01 2018 00:40	380.650696	5.577941	491.702972	265.674286

	LV ActivePower (kW)	Wind Speed (m/s)	Theoretical_Power_Curve (KWh)	Wind Direction (°)
Date/Time
01 01 2018 00:00	380.047791	5.311336	416.328908	259.994904
01 01 2018 00:10	453.769196	5.672167	519.917511	268.641113
01 01 2018 00:20	306.376587	5.216037	390.900016	272.564789
01 01 2018 00:30	419.645905	5.659674	516.127569	271.258087
01 01 2018 00:40	380.650696	5.577941	491.702972	265.674286

	LV ActivePower (kW)	Wind Speed (m/s)	Theoretical_Power_Curve (KWh)	Wind Direction (°)
Bin
[0-45]	1138.411865	7.284140	1425.913297	28.846724
[135-180]	1023.474388	6.438602	1105.973131	162.328981
[180-225]	2080.714602	10.367445	2238.586793	201.573860
[225-270]	711.123621	5.549333	847.241618	246.004417
[270-315]	364.405454	4.081395	428.531340	290.525356

	LV ActivePower (kW)	Wind Speed (m/s)	Theoretical_Power_Curve (KWh)	Wind Direction (°)
Bin
[0.0-0.5]	0.000000	0.365804	0.000000	162.505971
[0.5-1.0]	0.000000	0.778586	0.000000	170.854432
[1.0-1.5]	0.000411	1.260064	0.000000	175.189300
[1.5-2.0]	0.010590	1.759065	0.000000	176.404131
[10.0-10.5]	2337.270905	10.253631	2937.008035	96.637178
[10.5-11.0]	2622.546737	10.748151	3173.479505	95.600305
[11.0-11.5]	2858.359373	11.250353	3351.673813	104.367578
[11.5-12.0]	3142.711254	11.746023	3474.949240	109.737906
[12.0-12.5]	3311.716786	12.235626	3552.269674	114.318559
[12.5-13.0]	3399.230946	12.734817	3591.080794	119.538932
[13.0-13.5]	3461.541262	13.234760	3600.000000	119.903896
[13.5-14.0]	3437.329427	13.735134	3600.000000	123.675148
[14.0-14.5]	3346.402636	14.244160	3600.000000	134.325467
[14.5-15.0]	3277.812776	14.743941	3600.000000	134.458114
[15.0-15.5]	3450.968504	15.246197	3600.000000	133.546871
[15.5-16.0]	3429.492375	15.755537	3600.000000	148.060855
[16.0-16.5]	3472.428529	16.245875	3600.000000	159.263057
[16.5-17.0]	3490.019377	16.744082	3600.000000	160.154239
[17.0-17.5]	3493.432480	17.245462	3600.000000	180.663106
[17.5-18.0]	3547.503648	17.740810	3600.000000	185.310190
[18.0-18.5]	3546.267684	18.247225	3600.000000	187.996120
[18.5-19.0]	3546.643004	18.737283	3600.000000	191.774631
[19.0-19.5]	3553.650078	19.250298	3600.000000	195.528997
[19.5-20.0]	3572.176268	19.720386	3600.000000	196.803280
[2.0-2.5]	0.147161	2.257685	0.000000	165.121134
[2.5-3.0]	1.573017	2.748982	0.000000	158.107974
[20.0-20.5]	3563.737667	20.245513	3600.000000	195.783813
[20.5-21.0]	3570.665316	20.721941	3600.000000	197.231952
[21.0-21.5]	3561.570438	21.229620	3600.000000	198.271231
[21.5-22.0]	3561.487630	21.816660	3600.000000	198.876234
[22.0-22.5]	3571.854932	22.225620	3600.000000	200.274921
[22.5-23.0]	3601.627384	22.782828	3600.000000	197.338250
[23.0-23.5]	3601.182861	23.264395	3600.000000	198.663809
[23.5-24.0]	3601.594330	23.723268	3600.000000	199.374050
[24.0-24.5]	3601.124186	24.090772	3600.000000	188.537265
[24.5-25.0]	3602.022949	24.587030	3600.000000	192.731796
[25.0-25.5]	3600.780029	25.206011	3600.000000	202.970200
[3.0-3.5]	11.050656	3.240103	29.953759	144.551439
[3.5-4.0]	52.400688	3.748835	86.474149	139.208026
[4.0-4.5]	122.285245	4.251673	171.736466	137.531497
[4.5-5.0]	217.627315	4.745835	276.645981	123.819726
[5.0-5.5]	324.317718	5.259840	403.378587	120.212117
[5.5-6.0]	449.559879	5.755264	546.536159	112.437549
[6.0-6.5]	584.742626	6.251456	713.280130	106.344603
[6.5-7.0]	768.078805	6.752764	909.683267	104.912647
[7.0-7.5]	967.532189	7.253347	1135.709619	102.989673
[7.5-8.0]	1198.022103	7.750484	1391.760518	99.941991
[8.0-8.5]	1413.798709	8.250909	1677.695173	97.342031
[8.5-9.0]	1641.452914	8.747404	1983.173994	97.570784
[9.0-9.5]	1867.102063	9.248591	2305.570885	96.043959
[9.5-10.0]	2100.245163	9.748743	2622.305939	92.813551

	LV ActivePower (kW)	Wind Speed (m/s)	Theoretical_Power_Curve (KWh)	Wind Direction (°)
Bin
[0.0-0.5]	0.000000	0.365804	0.000000	162.505971
[0.5-1.0]	0.000000	0.778586	0.000000	170.854432
[1.0-1.5]	0.000411	1.260064	0.000000	175.189300
[1.5-2.0]	0.010590	1.759065	0.000000	176.404131
[2.0-2.5]	0.147161	2.257685	0.000000	165.121134
[2.5-3.0]	1.573017	2.748982	0.000000	158.107974
[3.0-3.5]	11.050656	3.240103	29.953759	144.551439
[3.5-4.0]	52.400688	3.748835	86.474149	139.208026
[4.0-4.5]	122.285245	4.251673	171.736466	137.531497
[4.5-5.0]	217.627315	4.745835	276.645981	123.819726
[5.0-5.5]	324.317718	5.259840	403.378587	120.212117
[5.5-6.0]	449.559879	5.755264	546.536159	112.437549
[6.0-6.5]	584.742626	6.251456	713.280130	106.344603
[6.5-7.0]	768.078805	6.752764	909.683267	104.912647
[7.0-7.5]	967.532189	7.253347	1135.709619	102.989673
[7.5-8.0]	1198.022103	7.750484	1391.760518	99.941991
[8.0-8.5]	1413.798709	8.250909	1677.695173	97.342031
[8.5-9.0]	1641.452914	8.747404	1983.173994	97.570784
[9.0-9.5]	1867.102063	9.248591	2305.570885	96.043959
[9.5-10.0]	2100.245163	9.748743	2622.305939	92.813551
[10.0-10.5]	2337.270905	10.253631	2937.008035	96.637178
[10.5-11.0]	2622.546737	10.748151	3173.479505	95.600305
[11.0-11.5]	2858.359373	11.250353	3351.673813	104.367578
[11.5-12.0]	3142.711254	11.746023	3474.949240	109.737906
[12.0-12.5]	3311.716786	12.235626	3552.269674	114.318559
[12.5-13.0]	3399.230946	12.734817	3591.080794	119.538932
[13.0-13.5]	3461.541262	13.234760	3600.000000	119.903896
[13.5-14.0]	3437.329427	13.735134	3600.000000	123.675148
[14.0-14.5]	3346.402636	14.244160	3600.000000	134.325467
[14.5-15.0]	3277.812776	14.743941	3600.000000	134.458114
[15.0-15.5]	3450.968504	15.246197	3600.000000	133.546871
[15.5-16.0]	3429.492375	15.755537	3600.000000	148.060855
[16.0-16.5]	3472.428529	16.245875	3600.000000	159.263057
[16.5-17.0]	3490.019377	16.744082	3600.000000	160.154239
[17.0-17.5]	3493.432480	17.245462	3600.000000	180.663106
[17.5-18.0]	3547.503648	17.740810	3600.000000	185.310190
[18.0-18.5]	3546.267684	18.247225	3600.000000	187.996120
[18.5-19.0]	3546.643004	18.737283	3600.000000	191.774631
[19.0-19.5]	3553.650078	19.250298	3600.000000	195.528997
[19.5-20.0]	3572.176268	19.720386	3600.000000	196.803280
[20.0-20.5]	3563.737667	20.245513	3600.000000	195.783813
[20.5-21.0]	3570.665316	20.721941	3600.000000	197.231952
[21.0-21.5]	3561.570438	21.229620	3600.000000	198.271231
[21.5-22.0]	3561.487630	21.816660	3600.000000	198.876234
[22.0-22.5]	3571.854932	22.225620	3600.000000	200.274921
[22.5-23.0]	3601.627384	22.782828	3600.000000	197.338250
[23.0-23.5]	3601.182861	23.264395	3600.000000	198.663809
[23.5-24.0]	3601.594330	23.723268	3600.000000	199.374050
[24.0-24.5]	3601.124186	24.090772	3600.000000	188.537265
[24.5-25.0]	3602.022949	24.587030	3600.000000	192.731796
[25.0-25.5]	3600.780029	25.206011	3600.000000	202.970200

	Bin_[0.0-0.5]	Bin_[0.5-1.0]	Bin_[1.0-1.5]	Bin_[1.5-2.0]	Bin_[10.0-10.5]	Bin_[10.5-11.0]	Bin_[11.0-11.5]	Bin_[11.5-12.0]	Bin_[12.0-12.5]	Bin_[12.5-13.0]	...	Bin_[7.5-8.0]	Bin_[8.0-8.5]	Bin_[8.5-9.0]	Bin_[9.0-9.5]	Bin_[9.5-10.0]	Operational Category_Region_1	Operational Category_Region_1.5	Operational Category_Region_2	Operational Category_Region_2.5	Operational Category_Region_3
Date/Time
01 01 2018 00:00	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	1	0	0	0
01 01 2018 00:10	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	1	0	0	0
01 01 2018 00:20	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	1	0	0	0
01 01 2018 00:30	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	1	0	0	0
01 01 2018 00:40	0	0	0	0	0	0	0	0	0	0	...	0	0	0	0	0	0	1	0	0	0