Due date: 5pm, Friday 4 April (Week 7).
Late submissions will be penalised at a rate of 5% per day. This penalty applies to the initial mark you receive.
Submissions will generally not be accepted after 5pm, Wednesday 9 April (Week 8).
The use of generative AI is forbidden for this assignment. All work submitted must be your own. You may be asked to discuss your code with your tutor. Your final mark for this assignment will be based on how well you can explain your answer for the assignment and other related problems.
Version: v1.06 on 02 April 2025
irradiance_time_series in the sections titled (Averaging the irradiance data) and (Dealing with Different Amounts of Data) should only be applied in fault_detection_main. All other functions will be tested only with valid/complete input.
Automatic detection of faults can be found in many engineering systems. There are systems to automatically diagnose faults in engines, chemical plants, power generation plants, robotic arms and so on.
This assignment is inspired by a fault detection system in a photovoltaic (PV) plant [1]. A PV plant (Wikipedia page on PV power station) is a collection of solar panels which converts solar energy into
electrical power. However, sometimes the plant does not work correctly which results in, for example, less
electrical power being generated than it should be. If this is the case, the plant technicians should be alerted
automatically so that they can fix the faults as soon as possible.
In this assignment, you will write Python programs to perform fault detection. The aim of your program is to
process data sequences of solar irradiance and power to determine whether there are faults and if so, when they
have occurred.
Note that we chose the word inspired earlier because we have adapted the fault detection problem in [1] as a programming assignment by simplifying and liberally changing a few aspects of the original problem. In particular, we have made changes so that, in this assignment, you will have to use the various Python constructs that you have learnt. This means a few details of this assignment may not be realistic in engineering terms, but on the whole, you will still get a taste of how programming can be used to perform automatic fault detection.
By completing this assignment, you will learn:
The algorithm uses two sets of measurements. The first is the amount of solar irradiance which is the quantity
of solar radiation falling on the solar panels. The second is the amount of electrical power generated by the solar
panels; we will simply refer to that as power or power generated.
The key idea of the fault detection algorithm is to use the measured irradiance and power to determine whether a fault has occurred. For a given amount of irradiance, the algorithm uses a model (which in this case is a formula) to predict the amount of power the PV plant is expected to generate. After that, the algorithm compares the power predicted by the formula against the measured power. If the difference between these two quantities is too big then the algorithm will decide that a fault has occurred.
We begin by describing the data that the algorithm will operate on. We will use the following Python code as an example and refer to it as the sample code. Note that the data and parameter values in the sample code are for illustration only; your code should work with any allowed input data and parameter values.
```python
# Data: irradiance and power
# Irradiance measurements in W/m^2
irradiance_time_series = [
    240.2, 220.1, 260.2, 280.7, 256.5, 320.3, 300.7, 267.1,
    321.2, 234.5, 421.7, 476.2, 321.6, 329.7, 323.4, 407.9,
    456.7, 489.3, 521.5, 534.6, 543.7, 567.5]
# Generated power measured in kW
power_time_series = [31.2, 27.5, 55.5, 44.2, 58.38, 53.52]

# Parameters for the fault detection algorithm
# Data sampling times in minutes
irradiance_sampling_time = 12
power_sampling_time = 60

# Parameters of the model to predict the power generated for
# a given level of irradiance
a0 = 0.086
a1 = 3.44e-5
a2 = 3e-3
model_para = [a0, a1, a2]

# Margin in power measurement to decide whether it is a fault or not
margin = 10.0  # in kW

# Call the fault detection function
import fault_detection_main as fd
fault_status_output = fd.fault_detection_main(irradiance_time_series,
                                              power_time_series,
                                              irradiance_sampling_time,
                                              power_sampling_time,
                                              model_para, margin)
```
In the sample code, there are two data series which contain, respectively, the irradiance and power
measurements. Both series are Python lists whose entries are of the float type. Their variable names are
irradiance_time_series and power_time_series. The irradiance is
measured in Watts per square metre and power generated is measured in kilowatts.
In the sample code, the irradiance and power measurements were collected once every 12 and 60 minutes
respectively. These values are stored in the variables irradiance_sampling_time and
power_sampling_time.
(Remark: In [1], the irradiance measurements were taken once every 5s, which is a more
realistic sampling time. We have chosen a sampling time of 12 minutes for irradiance so that the length of the list
irradiance_time_series will not be exceedingly long in this example.)
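(Averaging the irradiance data)
Since irradiance is sampled every 12 minutes and power every 60 minutes, each power measurement corresponds to 60 / 12 = 5 consecutive irradiance measurements. For example, power_time_series[0] corresponds to irradiance_time_series[0] to irradiance_time_series[4], and power_time_series[1] corresponds to irradiance_time_series[5] to irradiance_time_series[9]. Similarly for power_time_series[2] and power_time_series[3].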
Although we can make correspondence between power_time_series[4] and the last two irradiance measurements, the correspondence is incomplete and therefore these data are not usable. Also, there are no irradiance measurements corresponding to the last power measurement, which means this power measurement is not usable.
Since we can only make (complete) correspondences between the first 4 power measurements and the first 20
irradiance measurements, we will only use these measurements for fault detection.
We will divide the first 20 irradiance measurements into non-overlapping segments of 5 data points and compute
the average of each segment. This is so that each segment of irradiance measurements corresponds to one power
measurement. The table below illustrates the computation. We have added a segment number so that we can refer to
them later on. Note that the segment number also corresponds to the indices in the variable power_time_series.
Segment number | Data in the segment from irradiance_time_series | Average | Value |
0 | 240.2, 220.1, 260.2, 280.7, 256.5 | (240.2 + 220.1 + 260.2 + 280.7 + 256.5) / 5 | 251.54 |
1 | 320.3, 300.7, 267.1, 321.2, 234.5 | (320.3 + 300.7 + 267.1 + 321.2 + 234.5) / 5 | 288.76 |
2 | 421.7, 476.2, 321.6, 329.7, 323.4 | (421.7 + 476.2 + 321.6 + 329.7 + 323.4) / 5 | 374.52 |
3 | 407.9, 456.7, 489.3, 521.5, 534.6 | (407.9 + 456.7 + 489.3 + 521.5 + 534.6) / 5 | 482.00 |
For the irradiance_time_series given in the sample code, we can summarize this averaging
as returning a list whose entries are [251.54, 288.76, 374.52, 482.00]. For ease of reference,
we will refer to this list by using the name irradiance_time_series_average
later on.
Note that we rounded the numbers in the last column to 2 decimal places for display only. You should not be rounding any of your calculations in this assignment.
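The averaging step above can be illustrated with a minimal sketch. The function name segment_averages is hypothetical (it is not one of the provided templates), and the sketch assumes that the input list has already been truncated so that its length is an exact multiple of the segment length.

```python
def segment_averages(time_series, segment_length):
    """Average consecutive, non-overlapping segments of a list.

    Hypothetical helper for illustration only. Assumes that
    len(time_series) is an exact multiple of segment_length.
    """
    averages = []
    for start in range(0, len(time_series), segment_length):
        segment = time_series[start:start + segment_length]
        averages.append(sum(segment) / segment_length)
    return averages


# First 20 irradiance measurements from the sample code (W/m^2)
irradiance_used = [
    240.2, 220.1, 260.2, 280.7, 256.5,
    320.3, 300.7, 267.1, 321.2, 234.5,
    421.7, 476.2, 321.6, 329.7, 323.4,
    407.9, 456.7, 489.3, 521.5, 534.6,
]

irradiance_time_series_average = segment_averages(irradiance_used, 5)

# Rounding here is for display only; the stored averages are not rounded.
print([round(value, 2) for value in irradiance_time_series_average])
```

The printed values should agree with the last column of the table above, up to floating-point error.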
(Use the average irradiance and model to predict the expected power generated)
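The next step is to use the average irradiance in each segment to predict the expected amount of power generated. To do that, we use a model (which in this case is a formula) to calculate the expected power from irradiance. We first define some notation: G denotes the average irradiance in a segment (in W/m^2), P denotes the predicted power generated (in kW), and a0, a1 and a2 are the model parameters stored in model_para.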
The formula is:
P = G (a0 + a1 G + a2 log(G))
where log is the natural logarithm.
By using the values of a0, a1 and a2 from the sample code, and the average irradiance calculated earlier, we can
calculate the expected power generated for each time segment:
Segment number | Average irradiance | Predicted power generated (rounded to 2 decimal places for display only) |
0 | 251.54 | 27.98 |
1 | 288.76 | 32.61 |
2 | 374.52 | 43.69 |
3 | 482.00 | 58.38 |
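As a rough check of the table above, the formula can be evaluated directly. This is a sketch only: the function name predict_power is hypothetical, and the averages used here are the rounded display values rather than the unrounded ones your code should work with.

```python
import math


def predict_power(irradiance, model_para):
    """Evaluate P = G (a0 + a1 G + a2 log(G)) for one irradiance value G.

    Hypothetical helper for illustration only; log is the natural logarithm.
    """
    a0, a1, a2 = model_para
    return irradiance * (a0 + a1 * irradiance + a2 * math.log(irradiance))


# Model parameters from the sample code
model_para = [0.086, 3.44e-5, 3e-3]

# Average irradiance per segment (rounded display values from the table)
average_irradiance = [251.54, 288.76, 374.52, 482.00]

predicted_power = [predict_power(g, model_para) for g in average_irradiance]

# Rounding is for display only
print([round(p, 2) for p in predicted_power])
```

Note that math.log with a single argument is the natural logarithm, which matches the definition of log in the formula.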
The next step is to compare the predicted power against the measured power. We will use the algorithmic parameter margin which is defined in the sample code.
If the value of measured power minus predicted power is less than or equal to margin and
bigger than or equal to -margin, then the decision is that there is no fault because the
measured power is sufficiently close to the predicted power; otherwise, there is a fault. For example, by using the
value of margin from the sample code, we have:
(Performing fault detection for a time-series)
The above examples show how the fault detection is to be performed for two power measurements. The following
table summarizes the result of fault detection for the time series.
Segment number | Average irradiance | Predicted power generated | Measured power | Measured power minus predicted power | Fault (True if it is a fault) |
0 | 251.54 | 27.98 | 31.2 | 3.22 | False |
1 | 288.76 | 32.61 | 27.5 | -5.11 | False |
2 | 374.52 | 43.69 | 55.5 | 11.81 | True |
3 | 482.00 | 58.38 | 44.2 | -14.18 | True |
We will use a list to indicate when the faults occurred. For the above example, we will represent the faults in the data series using [2,3] because the measurements power_time_series[2] and power_time_series[3] are determined to be faults.
In the case where there are no faults, we will indicate that by an empty list [ ].
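The decision rule and the resulting fault list can be sketched as follows. The names is_fault and fault_indices are hypothetical and chosen for illustration; the predicted values are the rounded ones from the table, which is enough to reproduce the decisions in this example.

```python
def is_fault(measured_power, predicted_power, margin):
    """Return True if measured power is outside predicted power +/- margin.

    Hypothetical helper for illustration only. A difference exactly
    equal to margin or -margin is not a fault.
    """
    difference = measured_power - predicted_power
    return not (-margin <= difference <= margin)


def fault_indices(measured, predicted, margin):
    """Return the indices of the measurements that are judged to be faults."""
    faults = []
    for index, (m, p) in enumerate(zip(measured, predicted)):
        if is_fault(m, p, margin):
            faults.append(index)
    return faults


measured = [31.2, 27.5, 55.5, 44.2]        # first 4 power measurements (kW)
predicted = [27.98, 32.61, 43.69, 58.38]   # rounded values from the table (kW)
margin = 10.0                              # kW, from the sample code

print(fault_indices(measured, predicted, margin))  # [2, 3]
```

When no measurement falls outside the margin, the same sketch returns an empty list.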
The following figure illustrates the fault detection decision making. The solid blue dots show the predicted power generated for the average irradiance. The vertical lines are centred at the predicted power and have a height of 2*margin. The power measurements are plotted with crosses. If the cross is within the vertical line, then it is not a fault; otherwise, it is.
(Determining the false alarms)
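After a fault detection algorithm has been designed, the engineers will want to check how good the algorithm is
at catching the faults. One way that they can do that is to monitor the PV plant manually to determine whether
actual faults have occurred. There are two possible types of error: a missed detection, where a real fault is not detected by the algorithm, and a false alarm, where the algorithm reports a fault that did not actually occur.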
Let us follow on from the above example. The fault detection algorithm says the power measurements [2,3] are faults. Let us, for the sake of illustration, say that the real faults are [1,2]. In this case, the real fault at 1 is a missed detection because it is
not detected by the detection algorithm. On the other hand, the fault detection algorithm claims that there is a
fault at 3 but it is in fact a false alarm. Suppose we store the results from the fault
detection in a list called your_fault_list and the real faults in a list called
real_fault_list. For this example, your_fault_list is [2,3] and real_fault_list is [1,2].
A task for this assignment is to determine the false alarms from the given your_fault_list and real_fault_list. For this assignment, you will store the false alarms in a list. In this example, it is [3]. In the case where there are no false alarms, that should be indicated by an empty list [ ].
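One possible way to express this comparison is sketched below. The function name false_alarms is hypothetical, and the sketch assumes both lists contain integer indices with no repeated entries.

```python
def false_alarms(your_fault_list, real_fault_list):
    """Return the detected faults that are not real faults.

    Hypothetical helper for illustration only. The order of
    your_fault_list is preserved in the result.
    """
    real = set(real_fault_list)  # set membership tests are fast
    return [index for index in your_fault_list if index not in real]


your_fault_list = [2, 3]
real_fault_list = [1, 2]

print(false_alarms(your_fault_list, real_fault_list))  # [3]
print(false_alarms([2], [1, 2]))                       # []
```

Swapping the two arguments gives the missed detections instead, which is [1] for this example.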
Note that the engineers should also be interested in missed detection, but the calculation is very similar to false alarms, so we will not ask you to do that.
The description above shows how the data (irradiance_time_series, power_time_series) and algorithmic parameters (irradiance_sampling_time, power_sampling_time, model_para, margin) are used to determine when the faults occur. Note that the algorithmic parameters must be valid so that the computation can be carried out. We require that your code performs a number of validity checks before determining if there are any faults. For example, the algorithmic parameter irradiance_sampling_time is required to be a strictly positive integer. The following table states the requirements for the algorithmic parameters to be valid and what assumptions you can make when testing.
Algorithmic parameters | Requirements for the parameter to be valid | Assumptions you can make when testing or further explanation |
irradiance_sampling_time | Data type must be int and its value is strictly positive. Hint: The Python expression (type(x) is int) will return True if variable x is of the type int; False otherwise. | Examples of invalid parameter values are -5, -5.2, 5.7. You can assume that, when we test your code, irradiance_sampling_time is always a number |
power_sampling_time | Data type must be int and its value is strictly positive | You can assume that, when we test your code, power_sampling_time is always a number |
irradiance_sampling_time, power_sampling_time | The value of power_sampling_time must be an integral multiple of the value of irradiance_sampling_time | For example, if power_sampling_time is 12 and irradiance_sampling_time is 7, then the given parameters are invalid because 12 is not an integral multiple of 7. You can also assume that power_sampling_time and irradiance_sampling_time are given in the same unit. |
model_para | Must have exactly 3 entries in the list | You can assume that the given model_para is always a list and its entries are always numbers (int or float). For example, if the given model_para has four entries, then it is invalid. |
margin | Must be a strictly positive number | You can assume that the given margin is always a number (int or float). |
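The requirements in the table can be collected into a single check, sketched below. The function name parameters_are_valid and its True/False return value are assumptions made for illustration; how the result is reported is up to fault_detection_main.

```python
def parameters_are_valid(irradiance_sampling_time, power_sampling_time,
                         model_para, margin):
    """Return True if all algorithmic parameters meet the table's requirements.

    Hypothetical helper for illustration only.
    """
    # Both sampling times must be of type int and strictly positive
    if type(irradiance_sampling_time) is not int or irradiance_sampling_time <= 0:
        return False
    if type(power_sampling_time) is not int or power_sampling_time <= 0:
        return False
    # power_sampling_time must be an integral multiple of irradiance_sampling_time
    if power_sampling_time % irradiance_sampling_time != 0:
        return False
    # The model needs exactly three parameters: a0, a1 and a2
    if len(model_para) != 3:
        return False
    # The margin must be strictly positive
    if margin <= 0:
        return False
    return True


model_para = [0.086, 3.44e-5, 3e-3]
print(parameters_are_valid(12, 60, model_para, 10.0))  # True
print(parameters_are_valid(7, 12, model_para, 10.0))   # False
```

The type checks come first so that the modulo test is only ever applied to two positive integers.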
The above sample code shows the situation where the overall duration of power measurements (6 samples times 60
minutes = 360 minutes) is more than that of the irradiance measurements (22 samples times 12 minutes = 264
minutes). The above example shows that we should only be using the first 4 power measurements and the first 20
irradiance measurements.
Another situation is when the overall duration of power measurements is less than that of the irradiance
measurements. Consider the following code:
```python
# Irradiance measurements in W/m^2
irradiance_time_series = [
    240.2, 220.1, 260.2, 280.7, 320.3, 300.7, 267.1, 321.2, 421.7,
    476.2, 321.6, 329.7, 407.9, 456.7, 489.3, 521.5, 543.7, 567.5]
# Generated power measured in kW
power_time_series = [31.2, 27.5]

# Data sampling times in minutes
irradiance_sampling_time = 15
power_sampling_time = 60
```
From the sampling times, we know that 1 power measurement corresponds to 4 irradiance measurements. In this case, all the power measurements and the first 8 irradiance measurements should be used to determine the faults.
When the overall duration of power measurement is equal to that of irradiance measurements, you should use all the measurements.
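The three situations can be handled with the same arithmetic, as the following sketch suggests. The function name usable_counts is hypothetical, and the sketch assumes the sampling times have already passed the validity checks.

```python
def usable_counts(num_irradiance, num_power,
                  irradiance_sampling_time, power_sampling_time):
    """Return (number of power measurements, number of irradiance
    measurements) that have complete correspondences.

    Hypothetical helper for illustration only.
    """
    # Irradiance measurements per power measurement
    per_power = power_sampling_time // irradiance_sampling_time
    # Only complete segments of irradiance data can be used
    complete_segments = num_irradiance // per_power
    num_segments = min(num_power, complete_segments)
    return num_segments, num_segments * per_power


# Sample code: 22 irradiance and 6 power measurements, 12 and 60 minutes
print(usable_counts(22, 6, 12, 60))  # (4, 20)

# Second example: 18 irradiance and 2 power measurements, 15 and 60 minutes
print(usable_counts(18, 2, 15, 60))  # (2, 8)
```

When the two overall durations are equal, the minimum is reached by both series and all measurements are used.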
In order for the fault detection algorithm to run, there must be enough power and irradiance measurements. The
requirements are:
You can assume that, when we test your assignment, both irradiance_time_series and power_time_series are lists, and their entries are always of the float type. You can assume that the entries in irradiance_time_series are bigger than or equal to 1.
You need to implement the following six functions. The first five functions working together will implement
the fault detection algorithm. The sixth function finds the false alarms.
The requirement is that you implement each function in a separate file. This is so that we can test them independently and we will explain this point here. We have provided template files, see Getting Started.
1. def calc_average(time_series, segment_length):
Additional requirements: In order to facilitate testing, you need to make sure that within each submitted file, you only have the code required for that function. Do not include test code in your submitted file.
Test your functions thoroughly before submission.
You can use the provided Python programs (files like test_calc_average.py etc.) to test your functions. Please
note that each file covers a limited number of test cases. We have purposely not included all the cases
because we want you to think about how you should be testing your code. You are welcome to use the forum to
discuss additional tests that you should use to test your code.
We will test each of your files independently. Let us give you an example. Let us assume we are testing three files: prog_a.py, prog_b.py and prog_c.py. These files contain one function each and they are: prog_a(), prog_b() and prog_c(). Let us say prog_b() calls prog_a(); and prog_c() calls both prog_b() and prog_a(). We will test your files as follows:
You need to submit the following six files. Do not submit any other files. For example, you do not need to submit your modified test files.
To submit this assignment, go to the Assignment 1 page and click the tab named "Make Submission".
Criteria | Nominal Marks |
calc_average.py | 3 |
power_prediction.py | 3 |
fault_detection_one_sample.py | 3 |
fault_detection_time_series.py | 3 |
Case 1 for fault_detection_main.py: Expected output is the string 'Corrupted input' | 2 |
Case 2 for fault_detection_main.py: Expected output is the string 'Not enough data' | 1 |
Case 3 for fault_detection_main.py: Expected output is a list or an empty list | 3 |
find_false_alarms.py | 3 |
You are reminded that work submitted for assessment must be your own. It's OK to discuss approaches to solutions with other students, and to get help from tutors, but you must write the Python code yourself. Sophisticated software is used to identify submissions that are unreasonably similar, and marks will be reduced or removed in such cases.
Note that some aspects of this assignment are not realistic. We mentioned the sampling time of irradiance earlier. Also, we have neglected the dependence on temperature, which is in [1].
[1] R. Platon et al., Online Fault Detection in PV Systems. IEEE Transactions on Sustainable Energy, Vol. 6, No. 4, Pages 1200-1207, October 2015. https://ieeexplore.ieee.org/document/7098398