I remember, from the Organizational Behaviour course I attended long back, a discussion around the inherent limitations of any screening examination. It is one of those many things I remember from the wonderful lectures of Prof. Anantharaman. He was such a nice teacher. I want to blog about this issue now and see how far I can get it right.
It is now a well-accepted hypothesis that the mental abilities of a large population (as measured, e.g., by an IQ test) follow a normal distribution. Let us consider an examination where the examinees are tested for a particular kind of such mental ability (more an analytical aptitude than mere information about a subject) that presumably also follows a bell curve. For simplicity, we will call the quality being tested for the ability. Though an examination is the only way one can learn about the actual ability of an examinee, let us take it for granted that there is such a thing as actual ability and that we have access to it for the sake of analysis. The marks obtained in the examination will then be called the measured ability.
A perfect exam is one that shows a perfect correlation between the actual and measured abilities. If we were to plot the actual ability along the x-axis and the measured ability along the y-axis, every examinee would lie on the 45 degree line. The perfect exam captures the actual ability of every examinee and can sequence them all perfectly. Figure 1 shows such a plot. The size of the points shows the number of examinees in that range. The exam is designed to use the entire range of marks to capture the fine differences between all the examinees, both at the top-right (the clever ones) and at the bottom-left (the dumb ones). The marks will show the same bell shape as the actual ability.
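To make the discussion concrete, here is a minimal simulation sketch of such a perfect exam, assuming a normally distributed actual ability. The population size, mean and spread below are arbitrary illustration values of mine, not anything taken from the figures.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                        # size of the examinee population (illustrative)
actual = rng.normal(50, 15, N)     # actual ability, bell-shaped by assumption

# A perfect exam reproduces the actual ability exactly, so every point
# (actual, measured) falls on the 45 degree line of figure 1.
measured = actual.copy()

print(np.corrcoef(actual, measured)[0, 1])   # 1.0: perfect correlation
```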
Obviously, no such thing as a perfect exam exists. An exam must be completed before the examinees drop dead and must be evaluated in a reasonable time frame and as objectively as possible. This puts a limit on the range of questions (and therefore marks) available for the measurement and defines the measurement window shown in figure 2. M1 and M0 are the highest and lowest possible marks that could be obtained. There will then be examinees who cluster at the boundaries (shown by the arrows), their number depending on where the boundaries of this window lie within the range of variation of actual ability.
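In the terms of the sketch above, the measurement window amounts to clipping the marks to the range M0 to M1; the particular window limits below are again values I have assumed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
actual = rng.normal(50, 15, 100_000)   # actual ability, as before

# The measurement window: a real exam can only award marks between M0 and M1.
M0, M1 = 20, 80                        # illustrative window limits
measured = np.clip(actual, M0, M1)     # marks outside the window are impossible

# Fractions of examinees piled up at the boundaries (the arrows of figure 2).
print((measured == M1).mean())   # cluster of clever ones at the top
print((measured == M0).mean())   # cluster at the bottom
```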
A public/finishing exam is usually conducted for a very large population at that level. The measurement window is chosen to spread across a wide range. If the exam is designed and conducted properly, and in the absence of negative marks, the percentage of marks obtained and the percentile will be similar and will have a reasonable correlation with the actual ability. However, the unavoidable clustering at the top due to the finite size of the measurement window could be a problem. For example, if one needs to resolve and sequence the clever examinees for, say, admission to a course that has a limited number of seats - so limited that only the top 2% are to be picked. Selecting a small fraction and sequencing them reliably is the perceived need. The problem arises because of the inherent scatter in the data.
The scatter in the data arises for the following reasons. Certain examinees could obtain marks well above their actual ability because (a) they were lucky that the questions asked happened to be close to what they understood; (b) they undertook targeted training; (c) they cheated; or just that (d) their marks were influenced.
While even a well-designed exam, due to the finite time and effort constraints imposed on it, cannot entirely avoid (a) and (b), meticulous conduct and evaluation adhering to good norms can surely avoid (c) and (d).
Certain examinees could obtain marks well below their actual ability because they were (e) not calm and composed (last-minute rush, bereavement in the family, and many other genuine as well as not-so-genuine reasons), (f) not in the best of moods (family pressure, health and what not) during the exam, or (g) just plain unlucky that what they understood well was not tested enough in the exam.
The converse of (d) - marks being unfairly pulled down - can be taken care of by the proper conduct of the exam. (g) is the converse of the limitation that led to (a) and can only be minimized, not eliminated fully. (e) and (f) are reality. And reality bites.
What this scatter does is to spread the examinees on either side of the 45 degree line, as shown in figure 3. In the worst case, the points could look like a circular cloud at the center. If the range M0 to M1 is to cover most of the examinee population around the average, then the cutoff MX (the mark above which only the chosen top fraction lies) is going to be too close to M1. 2% is such a small fraction.
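A rough way to see this numerically is to lump all of (a) to (g) into a single random noise term added to the actual ability - a crude simplification I am making only for illustration, with an assumed noise spread.

```python
import numpy as np

rng = np.random.default_rng(0)
actual = rng.normal(50, 15, 100_000)     # actual ability
noise = rng.normal(0, 8, 100_000)        # combined effect of (a)-(g), assumed spread

M0, M1 = 20, 80                          # window sized for the average examinee
measured = np.clip(actual + noise, M0, M1)

# Correlation with actual ability is now noticeably below 1 (the cloud of figure 3).
print(np.corrcoef(actual, measured)[0, 1])

# The cutoff MX for the top 2% lands essentially on the ceiling M1,
# because the clever examinees are all piled up there.
MX = np.quantile(measured, 0.98)
print(MX, M1)
```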
Contradiction of purpose
The finite nature of the measurement window leads to clustering at the ends (the upper end near M1 being our interest). The vagaries of reality and the constraints on the exam lead to the scatter. Together, these make an exam designed to cover most of the students around the average unsuitable for selecting and sequencing a very small percentage at the top, defined by those between MX and M1 in figure 3.
The solutions could be the following:
- Reduce the scatter: through constant improvements in exam design and meticulous conduct and evaluation. This is what the organizers of the selection test are aiming at. Hopefully.
- Shift the limit M1 up and M0 down. This needs an increase in the duration and marks of the exam: not practicable due to the time and effort constraints on the examinees as well as on the evaluation system.
- Shift the limit M1 up and M0 up as well. This means a loss of resolution at the bottom end in exchange for increased resolution at the top. Since we are interested only in selection at the very top, this is one possible option.
Selection at the top
Assume we have pushed both M1 and M0 up and have zoomed in on the plot at the top right. It would perhaps look like figure 4. The bell curve on the x-axis shows which section of the population is being looked at. We can make the following observations (a small numerical sketch follows the list).
- The upper limit of measurement M1 is designed and set so high that no one from the finite population of examinees can cross it. The clustering at the top end is avoided and the first few ranks are reliable.
- If M0 is already near the ability level of the average examinee, there is going to be heavy clustering at the bottom-left. In addition, if negative marks are used to keep the window M0-M1 large, the percentage of marks obtained is no longer correlated with the percentile. It does not make sense to compare it with the percentage of marks obtained in the broad kind of public exam discussed above.
- Remember that MX is set such that the number of examinees with marks between MX and M1 equals the limited number of seats for the course. Because the finite-sized measurement window now sits at the higher end, the resolution at the top improves and the scatter at the top-right is reduced. Due to this, the cutoff MX is no longer close to M1. This minimizes the number of examinees in the II quadrant. Why is that important? Read on.
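Here is how the shifted window looks in the same simulation sketch, keeping the same assumed scatter and merely moving the illustrative window limits up.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
actual = rng.normal(50, 15, N)
noise = rng.normal(0, 8, N)              # same assumed scatter as before

# Shifted window: M0 near the population average, M1 far out of anyone's reach.
M0, M1 = 50, 140                         # illustrative values
measured = np.clip(actual + noise, M0, M1)

print((measured == M1).mean())           # ~0: no clustering at the top
print((measured == M0).mean())           # heavy clustering at the bottom-left

# MX is set so that the number of examinees above it equals the seats (top 2%).
seats = int(0.02 * N)
MX = np.sort(measured)[-seats]
print(MX, M1)                            # MX now sits well below the ceiling M1
```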
Paradoxical II quadrant
In figure 4, you see four quadrants labelled. From the point where the cutoff line MX meets the 45 degree line, we drop a vertical. This vertical line is the cutoff of actual ability that the selection into the course is really aiming at. The meaning and implication of the quadrants are given below, followed by a small sketch that counts them.
- The I quadrant shows all the examinees who have scored more marks than the cutoff MX and also have actual ability above the intended cutoff. These are the ones who really deserve to be selected and are also selected.
- The III quadrant shows the examinees who have scored fewer marks than the cutoff and also have lower actual ability. These do not deserve to be selected and are not selected.
- The IV quadrant shows those examinees who have scored marks more than MX but have an actual ability less than the intended cutoff (the vertical line). These are the lucky fellows who made it in, thanks to the scatter.
- Now, the II quadrant shows all those poor examinees who have scored less than the cutoff MX but have actual ability above the intended cutoff. They could not make it in, because of the scatter.
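Counting the four quadrants in the same sketch looks like this; the quadrant sizes it prints depend entirely on my assumed numbers and are only meant to show that the II and IV quadrants are not empty.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
actual = rng.normal(50, 15, N)
measured = np.clip(actual + rng.normal(0, 8, N), 50, 140)   # shifted window, as above

seats = int(0.02 * N)
MX = np.sort(measured)[-seats]      # marks cutoff matching the available seats
ability_cutoff = MX                 # the vertical line: where MX meets the 45 degree line

selected = measured >= MX
deserving = actual >= ability_cutoff

print("I  :", (selected & deserving).sum())     # deserve it and get in
print("II :", (~selected & deserving).sum())    # deserve it but are left out
print("III:", (~selected & ~deserving).sum())   # neither deserve it nor get in
print("IV :", (selected & ~deserving).sum())    # the lucky ones who slip in
```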
In the case of selecting astronauts, for example, no risk can be taken with regard to those from the IV quadrant. So the limit MX is kept so high that this quadrant is practically eliminated.
But the aim of any good selection test for higher education should be to minimize the II quadrant. Looking at the nature of the scatter, it can be minimized only if the limit M1 goes up and the scatter is reduced. Think about it: these two cannot go together all that well. A tougher exam means more coaching to crack it and more pressure to make it, and these are precisely what lie behind the scatter. Now do you see the paradox?
The II quadrant can also be minimized if MX can be lowered. See it?
- If there is no need to pick only a small fraction at the top, then there is no issue at all! The craze or necessity to be at the top should go.
- If there is no need to sequence the examinees, then again life is easy. Let bunches of examinees get into groups of courses - the finer sequencing can be an internal matter for those groups of courses later.