Version: v1.02, at 10 am 20 June 2024
Change Log
This assignment is inspired by the diagnosis of a medical condition called hypopnea. The word hypopnea is derived from the Greek roots hypo meaning under normal and pnea meaning breathing. Informally, hypopnea is sometimes referred to as overly shallow breathing. Hypopnea can be diagnosed by measuring the air flow rate into and out of the lungs together with other measurements. Here, we will only look at the air flow rate. The figure below, which is taken from [1], shows the air flow rate into and out of the lungs of a subject over a duration of about 120 seconds.
Three episodes of hypopnea, as well as their duration, have been highlighted in the figure above. An observation that can be made from the figure is that during an episode of hypopnea, the air flow rate hovered around zero or was much smaller than normal, which means that the subject was breathing a lot less than normal.
In this assignment, you will write Python programs to perform automatic diagnosis inspired by the above example on hypopnea. The aim of your programs is to process a data sequence (given as a Python list of numbers) to determine the starting time and duration of the episodes within the data. We chose the word inspired because you will not be using the actual medical criteria for diagnosing hypopnea. We have adapted the diagnostic problem so that you will have to use the various Python constructs that you have learnt, while also getting a taste of how programming can be used to perform diagnosis automatically.
Although the above example comes from biomedical engineering, there are plenty of examples of automatic diagnosis in all other branches of engineering and science, e.g. diagnosing engine performance, quality control in chemical reactors etc.
By completing this assignment, you will learn:
We begin by describing the data that the algorithm will operate on, using the following Python code as an example. We will refer to this code as the sample code. Note that the data and parameter values in the sample code are for illustration only; your code should work with any allowed input data and parameter values.
# Flow rate
flow_rate = [-4.5, 0.5, 4.5, -0.1, -4.3,
             -4.1, 0.1, 4.1, 0.4, -4.9,
             -1.3, 0.2, 1.1, 0.4, 1.1,
             -1.7, 0.3, 3.1, 0.8, -2.6,
             -1.5, -0.2, 1.2, 0.6, -4.1,
             -4.1, 0.1, 4.1, 0.4, -4.9,
             -1.2, -0.1, 1.2, 0.7, -1.9,
             -3.9, 0.1, 2.9, 0.5, -2.2,
             -2.0, 0.5, 1.7, 4.6, 4.7,
             -3.4, 0.2]

# Parameters for the diagnostic algorithm (Algorithmic parameters)
segment_len = 5          # Number of data points in a segment
interval = [-2.6,3.1]    # For determining whether a segment has the symptom
threshold = 0.8          # For determining whether a segment has the symptom
min_segment = 2          # Minimum number of segments to form an episode

# Call the functions (which you will write in the assignment) to determine the episodes
episodes = diag.run_diagnostic(flow_rate,segment_len,interval,threshold,min_segment)
In the sample code, the data for the diagnostic algorithm are stored in a list called flow_rate. There are also four algorithmic parameters segment_len, interval, threshold and min_segment; we will explain their meaning later.
The data are shown as the blue line in the following plot.
For this example, there are two episodes where the flow rate is lower than normal and they have been highlighted by the magenta rectangles. The aim of the diagnosis is to determine all the episodes in the given flow rate data. We will now describe the requirements.
(Divide the flow rate data into segments and determine whether each segment has the symptom) We first divide the given flow rate data into a number of non-overlapping segments. The number of data points in each segment is given by the variable segment_len which has the value of 5 in the sample code. Because of this value of segment_len, the first segment will contain the data points:
flow_rate[0], flow_rate[1], flow_rate[2], flow_rate[3], flow_rate[4].
The second segment will contain the data points:
flow_rate[5], flow_rate[6], flow_rate[7], flow_rate[8], flow_rate[9],
and so on. The list flow_rate given in the sample code contains 47 data points, so we will get 9 complete segments. The two remaining data points (flow_rate[-2] and flow_rate[-1]) will be discarded and will not be used. When we typeset the sample code above, we purposely put 5 elements in each row for flow_rate, so that each row (other than the last one) is a complete segment.
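The segmentation step above can be sketched as follows. This is a minimal illustration only: the helper name split_into_segments is hypothetical (it is not one of the functions you are asked to write), and the data are simply the index positions 0 to 46, standing in for the 47 flow rate values.

```python
def split_into_segments(data, segment_len):
    """Return the complete, non-overlapping segments of data.

    Hypothetical helper for illustration; not part of the assignment.
    """
    # Integer division gives the number of complete segments;
    # any leftover points at the end are discarded.
    num_segments = len(data) // segment_len
    return [data[k * segment_len:(k + 1) * segment_len]
            for k in range(num_segments)]


# Stand-in data: the list indices 0..46 of a 47-point flow_rate list
positions = list(range(47))
segments = split_into_segments(positions, 5)

print(len(segments))   # 9 complete segments
print(segments[0])     # [0, 1, 2, 3, 4]
print(segments[-1])    # [40, 41, 42, 43, 44]; indices 45 and 46 are discarded
```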
The next step is to determine whether each segment has the symptom that we are looking for. Intuitively, we will say that a segment has the symptom if most of the data points in the segment have a smaller amplitude than normal. We will use the algorithmic parameters interval and threshold to determine whether a segment has the symptom. The parameter interval is used to determine whether a data point has a smaller amplitude than normal, and the parameter threshold is used to determine whether most points in a segment are small in amplitude.
The parameter interval is a list with 2 elements, and the parameter threshold is a scalar. In the sample code above, interval is the list [-2.6,3.1] and threshold is 0.8. We will use these values in an example to explain how you should use them. With the given values of interval and threshold, we say that a segment has the symptom if a fraction of 0.8 or more of the data points in a segment are between -2.6 and 3.1, inclusive of the end-points. The following table shows the calculation to determine whether the 9 segments in flow_rate have the symptoms or not.
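To make the rule concrete, here is one possible sketch of the check for a single segment. The function name and parameters follow the has_symptom() signature used in this assignment, but the body is only one way of expressing the rule under the assumption that the parameters are valid; it is not a reference solution.

```python
def has_symptom(data_segment, interval, threshold):
    """Return True if the segment has the symptom (illustrative sketch)."""
    low, high = interval
    # Count the data points inside the interval, end-points included
    num_in_range = sum(1 for x in data_segment if low <= x <= high)
    # The segment has the symptom if the fraction reaches the threshold
    return num_in_range / len(data_segment) >= threshold


# Fifth segment of the sample data: 4 of 5 points are within [-2.6, 3.1]
print(has_symptom([-1.5, -0.2, 1.2, 0.6, -4.1], [-2.6, 3.1], 0.8))  # True
# First segment of the sample data: only 2 of 5 points are within the interval
print(has_symptom([-4.5, 0.5, 4.5, -0.1, -4.3], [-2.6, 3.1], 0.8))  # False
```

Note that the fourth segment of the sample data contains both 3.1 and -2.6; these count as inside the interval because the end-points are included.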
Note that the algorithmic parameters segment_len, interval and threshold may take on different values in different tests.
After computing whether each complete segment has the symptom, we can summarise the results in a Python list of Boolean values. We will refer to this list using the variable name disorder_status where disorder means the symptom is present. For the flow_rate data in the sample code, the variable disorder_status is:
disorder_status = [False, False, True, True, True, False, True, True, False]
Note that there are 9 elements in disorder_status and they correspond to the 9 complete segments in the given flow_rate. Note also that you can obtain disorder_status from the right-most column in the table above.
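The following sketch reproduces this list from the sample data. The function name flow_rate_to_disorder_status comes from this assignment, but its parameter list and body here are assumptions made for illustration, and the parameters are assumed to be valid.

```python
def flow_rate_to_disorder_status(flow_rate, segment_len, interval, threshold):
    """Return one Boolean per complete segment (illustrative sketch)."""
    low, high = interval
    num_segments = len(flow_rate) // segment_len  # incomplete tail is ignored
    disorder_status = []
    for k in range(num_segments):
        segment = flow_rate[k * segment_len:(k + 1) * segment_len]
        # Count points within the interval, end-points included
        num_in_range = sum(1 for x in segment if low <= x <= high)
        disorder_status.append(num_in_range / segment_len >= threshold)
    return disorder_status


flow_rate = [-4.5, 0.5, 4.5, -0.1, -4.3,
             -4.1, 0.1, 4.1, 0.4, -4.9,
             -1.3, 0.2, 1.1, 0.4, 1.1,
             -1.7, 0.3, 3.1, 0.8, -2.6,
             -1.5, -0.2, 1.2, 0.6, -4.1,
             -4.1, 0.1, 4.1, 0.4, -4.9,
             -1.2, -0.1, 1.2, 0.7, -1.9,
             -3.9, 0.1, 2.9, 0.5, -2.2,
             -2.0, 0.5, 1.7, 4.6, 4.7,
             -3.4, 0.2]

disorder_status = flow_rate_to_disorder_status(flow_rate, 5, [-2.6, 3.1], 0.8)
print(disorder_status)
# [False, False, True, True, True, False, True, True, False]
```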
The next part of the computation is to determine the episodes from the variable disorder_status.
An episode is formed by consecutive segments that have symptoms and an episode must have a minimum number of segments. The algorithmic parameter min_segment specifies the minimum number of segments an episode must have. The value of min_segment is 2 in the sample code but its value can change from test to test.
The determination of the episodes requires only two variables: disorder_status and min_segment. When min_segment equals 2, the variable disorder_status given above has two episodes, formed by the segments at list indices 2 to 4 and 6 to 7:
[False, False, True, True, True, False, True, True, False]
The first episode starts in the third segment (corresponding to a Python list index of 2) and has a duration of 3 segments. The second episode starts in the seventh segment (corresponding to a Python list index of 6) and has a duration of 2 segments. We will summarise the information on the episodes by using a list of lists as follows:
[[2,3],[6,2]]
The first list [2,3] corresponds to the first episode. The first element 2 in [2,3] is the Python list index of the segment at which the episode begins, and the second element 3 is the number of segments in the episode. The second list [6,2] describes the second episode in the same way. The variable episodes in the last line of the sample code above is expected to take on the value of this list of lists.
Let us now consider the case where the variable min_segment has the value of 3 instead. In this case, the variable disorder_status given above has only one episode, formed by the segments at list indices 2 to 4:
[False, False, True, True, True, False, True, True, False]
This is because each episode is now required to have at least 3 segments. We will summarise the information on the episodes by using a list of lists as follows:
[[2,3]]
If we further increase the variable min_segment to the value of 4, then there are no episodes in the variable disorder_status given above. In this case, we summarise the information on the episodes by using an empty list, i.e.
[].
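The three cases above (min_segment equal to 2, 3 and 4) can be checked with a sketch like the one below. The function name find_episodes comes from this assignment; the parameter order and the approach (scanning for runs of True values, with an extra False appended to close a run that reaches the last segment) are assumptions made for illustration, not a prescribed method.

```python
def find_episodes(disorder_status, min_segment):
    """Return [start_index, num_segments] for each episode (illustrative sketch)."""
    episodes = []
    run_start = None  # index where the current run of True values began
    # Appending False guarantees that a run ending at the last segment is closed
    for index, status in enumerate(disorder_status + [False]):
        if status and run_start is None:
            run_start = index                 # a new run begins here
        elif not status and run_start is not None:
            run_length = index - run_start    # the run has just ended
            if run_length >= min_segment:
                episodes.append([run_start, run_length])
            run_start = None
    return episodes


disorder_status = [False, False, True, True, True, False, True, True, False]
print(find_episodes(disorder_status, 2))  # [[2, 3], [6, 2]]
print(find_episodes(disorder_status, 3))  # [[2, 3]]
print(find_episodes(disorder_status, 4))  # []
```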
The description above shows how the data (flow_rate) and algorithmic parameters (segment_len, interval, threshold, min_segment) are used to compute the episodes. Note that the algorithmic parameters must be valid so that the computation can be carried out. We require that your code performs a number of validity checks before computing the episodes. For example, the algorithmic parameter segment_len must be an integer greater than or equal to 1 to be valid. The following table states the requirements for the algorithmic parameters to be valid and what assumptions you can make when testing.
Algorithmic parameters | Requirements for the parameter to be valid | Assumptions you can make when testing |
segment_len | A positive integer greater than or equal to 1 | You can assume that, when we test your code, the given segment_len is always a number (int or float). In other words, the given segment_len cannot be of data type str, list etc. For example, when we test your code, we may give segment_len a value from say 1, 5, -6, -7.3, 2.7. Out of these, 1 and 5 are valid, while the others are not. |
interval | interval[0] must be strictly less than interval[1] | You can assume that the given interval is always a list with 2 numbers (int or float). For example, when we test your code, we may give interval the values of say [-10,-5.7], [10, 5.7]. Out of these, [-10,-5.7] is valid, while [10, 5.7] is not. |
threshold | A float strictly between 0 and 1, i.e. 0 and 1 not included. | You can assume that the given threshold is always a number (int or float). |
min_segment | A positive integer greater than or equal to 1 | You can assume that the given min_segment is always a number (int or float). |
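The requirements in the table can be sketched as a single check. The helper name parameters_are_valid is hypothetical and not one of the four required functions. The sketch also assumes that only values of type int count as integers; the table does not say how a float such as 5.0 should be treated, so that choice is an assumption.

```python
def parameters_are_valid(segment_len, interval, threshold, min_segment):
    """Return True if all four algorithmic parameters are valid.

    Hypothetical helper for illustration; inputs are assumed to have
    the data types stated in the table (numbers, or a list of 2 numbers).
    """
    def is_positive_int(value):
        # Assumption: only type int is accepted, so 2.7 and 5.0 are rejected
        return isinstance(value, int) and value >= 1

    return (is_positive_int(segment_len)
            and interval[0] < interval[1]      # strictly increasing end-points
            and 0 < threshold < 1              # 0 and 1 are not included
            and is_positive_int(min_segment))


print(parameters_are_valid(5, [-2.6, 3.1], 0.8, 2))    # True (sample code values)
print(parameters_are_valid(2.7, [-2.6, 3.1], 0.8, 2))  # False: segment_len not an integer
print(parameters_are_valid(5, [10, 5.7], 0.8, 2))      # False: interval not increasing
```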
You can assume that the given flow_rate is always a list. This list can be empty. If the list is not empty, then its elements are either int or float. In order for the computation described above to be carried out, the number of elements in flow_rate must be greater than or equal to the product of the algorithmic parameters segment_len and min_segment; you should only carry out the computation if there are enough data in flow_rate.
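As a small worked example of the data-length requirement, the sample code has 47 data points, and segment_len times min_segment is 5 times 2, which equals 10, so there are enough data. The helper name has_enough_data below is hypothetical and used only to illustrate the comparison.

```python
def has_enough_data(flow_rate, segment_len, min_segment):
    """Return True if flow_rate is long enough for the computation.

    Hypothetical helper for illustration; assumes valid parameters.
    """
    return len(flow_rate) >= segment_len * min_segment


print(has_enough_data([0.0] * 47, 5, 2))  # True: 47 >= 10
print(has_enough_data([0.0] * 9, 5, 2))   # False: 9 < 10
print(has_enough_data([], 5, 2))          # False: an empty list never has enough data
```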
You need to implement the following four functions. Working together, these four functions will implement the automatic diagnosis.
The requirement is that you implement each function in a separate file. This is so that we can test them independently and we will explain this point here. We have provided template files, see Getting Started.
1. def has_symptom(data_segment, interval, threshold):
Additional requirements: In order to facilitate testing, you need to make sure that within each submitted file, you only have the code required for that function. Do not include test code in your submitted file.
Clarification: Since run_diagnostic() will only proceed to determine the episodes if all the parameters are valid and there are enough data, you are allowed to assume that when we test the correctness of has_symptom(), flow_rate_to_disorder_status() and find_episodes(), all the algorithmic parameters are valid.
Test your functions thoroughly before submission.
You can use the provided Python programs (files like test_has_symptom.py etc.) to test your functions. Please note that each file covers a limited number of test cases. We have purposely not included all the cases because we want you to think about how you should be testing your code.
Note that the file test_2_data.txt contains the flow rate data for the test file test_run_diagnostic_2.py.
We will test each of your files independently. Let us give you an example. Let us assume we are testing three files: prog_a.py, prog_b.py and prog_c.py. These files contain one function each and they are: prog_a(), prog_b() and prog_c(). Let us say prog_b() calls prog_a(); and prog_c() calls both prog_b() and prog_a(). We will test your files as follows:
You need to submit the following four files. Do not submit any other files. For example, you do not need to submit your modified test files.
To submit this assignment, go to the Assignment 1 page and click the tab named "Make Submission".
Criteria | Nominal marks |
Function has_symptom.py | 4 |
Function flow_rate_to_disorder_status.py | 5 |
Function find_episodes.py (Case 1: One or more episodes but none of the episodes include the first or last complete segment) | 3 |
Function find_episodes.py (Case 2: no episodes) | 1 |
Function find_episodes.py (Case 3: One or more episodes but some of the episodes include the first and/or last complete segment) | 3 |
Function run_diagnostic.py (Case 1: Expected output is the string 'Corrupted input') | 2 |
Function run_diagnostic.py (Case 2: Expected output is the string 'Not enough data') | 1 |
Function run_diagnostic.py (Case 3: Expected output is a list of lists or an empty list) | 1 |
You need to properly explain your submission to a tutor. Your ability to properly explain your answers/solutions will determine your final grade for this assignment.
You are reminded that work submitted for assessment must be your own. It's OK to discuss approaches to solutions with other students, and to get help from tutors, but you must write the Python code yourself. Sophisticated software is used to identify submissions that are unreasonably similar, and marks will be reduced or removed in such cases.
[1] Jennifer Accardo and Jennifer Reesman, "Can you hear me snore?", Journal of Clinical Sleep Medicine, Vol. 9, Number 11. http://jcsm.aasm.org/ViewAbstract.aspx?pid=29203