The Instigator
Pro (for)
0 Points
The Contender
Con (against)
12 Points

Confidence assessment marking

The voting period for this debate does not end.
Voting Style: Open Point System: 7 Point
Started: 10/4/2011 Category: Education
Updated: 6 years ago Status: Voting Period
Viewed: 3,706 times Debate No: 18614
Debate Rounds (3)
Comments (11)
Votes (3)




Resolved: Confidence assessment marking is a better test marking system than the one currently in place


People often discuss the merits of standardised tests, how to improve them, etc. One thing most people forget to consider is whether an ideal marking scheme is currently in place, as how something is marked affects how student understanding is evaluated. While it would be entertaining to compare 2 idealistic systems, this debate will focus on the preliminary step of establishing faults in the current system.

A few points and rules before arguments:


  1. ‘Better’ refers to being on the whole superior.
  2. ‘Confidence assessment marking’ and ‘the one currently in place’ will refer to the systems Pro illustrates.
  3. If there are issues, ask in the comments prior to the debate.
  4. No ridiculous semantics!

Other points

  1. Confidence assessment marking is only being assessed on its merits for the following types of questions:

a) Multiple choice

b) 1 word answer (say, a science definition)

c) Questions like those in maths – e.g. ‘What is the square root of 196?’ – where the student supplies their own answer

This is because of problems in marking things like essays, where deducting marks isn’t very fair given the way essays are scored. To repeat: only the above types of questions are what we are debating on. Clarify in the comments if needed.

Background reading

This is not essential reading, yet I recommend it. (where I first got this idea/ explored other ideas, useful for understanding the proposed system in 2.3) (previous debate of mine on a slightly different topic; I’m using similar arguments so this might be a useful read)

The 2 Systems

Current marking system

To get an idea of the 2 systems, we’ll start with the standard system. Say there are 100 one-mark questions for a maximum score of 100. Let’s also say that the student gets 50 correct and 50 wrong.

50 correct questions = 50 marks

50 wrong answers = 0

Total = 50/100

Note that the current system rewards guessing, and does not effectively distinguish between differing levels of student belief in their answer.

Confidence Assessment

Testing occurs as usual in that students answer questions and so on. Also as usual, questions are given a rank of some sort, or, put simply, a number of marks which the question is worth (the difference between a 1 and a 4 mark question indicates complexity, for example). Say a student answers 4 to ‘2 + 2’ on a test using the above system and the question is worth 1 mark. The student is awarded 1 mark.

What differs in confidence assessment is that a student’s faith in their answer is also tested. This is done via another mark (a ‘confidence level’) attached to their answer: 1, 2 or 3. For a correct answer, the student is awarded the confidence level (times whatever the question is worth), while 0, -2 and -6 are awarded (respectively) for a wrong answer.
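The rubric just described can be sketched in a few lines (illustrative Python; the function names are mine, and all questions are assumed to be worth 1 mark for simplicity):

```python
# Penalty for a wrong answer at each confidence level, per the rubric above.
WRONG_PENALTY = {1: 0, 2: -2, 3: -6}

def confidence_score(answers):
    """Score a confidence-assessed test.

    answers: list of (correct, confidence) pairs, where correct is a bool
    and confidence is 1, 2 or 3. A correct answer earns its confidence
    level; a wrong one earns 0, -2 or -6 respectively. For questions worth
    more than 1 mark, each term would be multiplied by the question value.
    """
    return sum(conf if correct else WRONG_PENALTY[conf]
               for correct, conf in answers)

# A correct answer at confidence 3 (+3) and a wrong one at confidence 2 (-2):
print(confidence_score([(True, 3), (False, 2)]))  # 1
```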

Why confidence assessment is superior

Now that everything is (hopefully) cleared up, I’ll begin to try and demonstrate why confidence assessment is superior. The main points to consider are the problems with rewarding guessing, the benefits of assessing student belief in their answers (as opposed to not doing so), and how feedback to both students and teachers is affected by the marking schemes. Where necessary, other points will be raised.

Problems with rewarding guessing

Put simply, the reason rewarding guessing is bad is mark inflation. I.e., if it’s multiple choice with options A, B, C and D and you don’t know the answer to 4 questions, you’ll likely get 1 right through pure guesswork. Scale this up to a full-blown test: to get 40% one need only know 20%, since on average (with 4 choices) guesswork lifts the score to 40%. Obviously, students in reality normally make informed guesses, but this demonstrates the point at its worst.
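The 20%-known-yields-40% arithmetic is easy to verify; a quick sketch under the same assumptions (4 options, 100 one-mark questions; illustrative Python, names mine):

```python
def expected_mark(known_fraction, choices=4, questions=100):
    """Expected score under conventional marking when a student knows
    `known_fraction` of the questions and guesses the rest uniformly
    among `choices` options."""
    lucky_guesses = (1 - known_fraction) / choices
    return (known_fraction + lucky_guesses) * questions

print(expected_mark(0.20))  # 40.0: knowing 20% scores 40% on average
print(expected_mark(0.0))   # 25.0: pure guessing still averages 25%
```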

First point is that mark inflation skews feedback. While I cover this in greater depth later, put simply this means that rewarding guessing creates skewed results for teachers when they are trying to assess student ability, and also can make students overconfident in their own abilities.

Secondly, a mindset of immediate guessing should not be promoted. That’s not to say that guessing should be discouraged. Say you had a 20% chance to win a million dollars for $10. You’d do it. It’s also not to say that one shouldn’t do anything unless they are 100% sure. Rather, it is best to refrain from taking some actions when one’s chances can be improved. Let’s demonstrate this:

David wants to buy shares. Due to various reasons he can only be 2/3 sure he is making a sound financial investment. Let’s also say that due to financial hardship David can only make one investment; ie. He can’t throw caution to the wind and invest with these odds in 3 stocks hoping to win on 2 and lose on 1 while still making money.

Now, the question is: should David invest? No. Why? Because in real life David could instead improve his chances of making a sound investment (say, research deeper, learn terms he doesn’t understand, etc.). It’s much better for testing to cultivate a mindset of learning more as opposed to instinctive guessing.

Benefits with testing confidence

Confidence is key in everyday life, affecting whether we do something, whether we get help, research etc. Therefore, it makes sense to diagnose a student’s confidence.

First, a medical student. Let’s say the student gives a very confident yet incorrect diagnosis. This could have ramifications if done on a patient (trying the wrong drugs for treatment), so they are penalised heavily. Contrast this with the student being very unsure of their answer: a doctor who is unsure of their diagnosis won’t operate, and will consult others for help or think things over again, as opposed to blindly operating.

Secondly, confidence assessment helps differentiate bright from average students. In the case of an easy test where many would do well, it is more likely that the brighter students would have a higher confidence level, resulting in higher scores. This would also apply in more difficult tests where worse students might tentatively put a ‘1’ on a guess of theirs, yet a good student would put a ‘3’. It seems only fair to award certainty in decision making.

Thirdly, confidence assessment is very helpful in teaching. As A. R. Gardner-Medwin of the Department of Physiology, University College London, states:

“In many situations, university teachers are faced with a wide range of student backgrounds and abilities. It is important to identify and assist students early on, where knowledge is weak or absent. Because of the diversity of student problems, this is an area where automated techniques can help a great deal. One of the merits of confidence assessment is that it is possible to give students relatively easy basic tests without seeming condescending to good students who get them mostly correct. These students get a kick out of getting the answers right at maximum confidence, while the weak students can answer questions initially with low confidence and then increase confidence as they repeat exercises in an area. We use the system in this way, for example, to help with the basic mathematical skills necessary for a BSc degree.”


Testing is typically done to assess student understanding of the various things taught to them. To succeed in this, teachers must be able to get accurate results. Mark inflation skews these results and so degrades the quality of feedback.

Students, meanwhile, are given a false indication of their own abilities, which lets them down when content is revisited (as is often done in high school, for instance): they think they know something, don’t practise, and then do badly.

More on this point in later rounds when I have more characters.


For the reasons above, confidence assessment marking is a superior choice.



I thank my opponent for instigating this interesting topic.

Despite his anti-guessing rhetoric, my opponent wants to reward guessers even more than I do. Take, for instance, a multi-choice quiz with 5 options and 20 questions. Guessing randomly, a student should receive an average mark of 4/20, or 20%, under the status quo. Obviously some students will be lucky and get a higher mark, and some unlucky and get a lower mark, but 20% would be the average. Now let's add in confidence ratings. The maximum mark is now 60, but the minimum is -120, giving a marking range of 180 points. Assume the student puts a confidence rating of '2' for every question (+2 or -2 on my opponent's round 1 rubric). 4 questions were correct, giving the student 8 marks, and 16 questions were wrong, giving the student -32 marks. The student ends up with -24 marks, which sits 96 points above the -120 floor, or 96/180. Therefore, my opponent proposes that randomly guessing students should receive an average mark of not 20%, but 53.3%!

Now let us suppose you are a clever student who studies hard and gets above-average grades. You score 16/20 in the quiz. Under the status quo, your mark would be 80%. Again, let's add in confidence ratings to see the difference. The marking range is still 180 points. Assume the student puts a confidence rating of '2' for every question (+2 or -2). 16 questions were correct (32 marks) but 4 questions were wrong (-8 marks) giving a sum total of 24 marks, or 144/180. Put another way, your score would still be 80%. Therefore, you do not benefit at all from my opponent's model. The people who benefit from my opponent's motion are dumb, not clever.

Since more people are getting higher grades (even guessing randomly gives you more than 50%), the average mark will be higher under my opponent's model. That's mark inflation. If the problem he wants to solve is mark inflation, he should suggest a solution to the problem, not a way to make the problem worse.

The second problem he identifies is that confidence is important in real life. Sometimes, however, taking risks is also important in real life. For a doctor, the time it takes to find and call a specialist may be the difference between life and death for a patient. Medical students are already taught that taking risks is a bad idea except in emergencies. Their understanding of when to take risks and when not to is already being assessed. Currently the education system is doing a fair job of this - we do not have an epidemic of new doctors taking un-necessary risks with their patients. Therefore this is not a problem with the status quo we need to solve.

Third, my opponent states that smarter students will choose higher confidence intervals. Come back to my original example. If the guessing student who scored 4/20 had chosen '1' for every confidence rating, he would have gotten 4 marks correct (+4) and 16 marks wrong (according to my opponent's rubric, -0). His score is 124/180, or 68.9% (a 15.6% improvement). Now let's assume that the clever student who scored 16/20 answered everything with a '3' confidence rating. He scored 16 correct (48 marks) yet 4 wrong (-24 marks). He ends up with 24 marks - exactly the same as last time (80%). Again, only the stupid improve.
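Con's percentages come from mapping raw scores onto the full -120 to +60 range; his figures in the examples above can be reproduced as follows (illustrative Python, names mine):

```python
def percent_of_range(raw_score, questions=20, max_conf=3, worst_penalty=-6):
    """Express a raw confidence-marked score as a percentage of the full
    mark range (here -120..+60, i.e. 180 points, for a 20-question quiz)."""
    top = questions * max_conf          # +60
    bottom = questions * worst_penalty  # -120
    return 100 * (raw_score - bottom) / (top - bottom)

# Guesser: 4 right, 16 wrong, all confidence 2 -> 4*2 + 16*(-2) = -24
print(round(percent_of_range(-24), 1))  # 53.3
# Strong student: 16 right, 4 wrong, all confidence 2 -> 16*2 + 4*(-2) = 24
print(round(percent_of_range(24), 1))   # 80.0
# Guesser, all confidence 1 -> 4*1 + 16*0 = 4
print(round(percent_of_range(4), 1))    # 68.9
```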

If my opponent was planning to argue that smarter students will better know which answers they know well and which answers they don't, he should think again. Stupid students frequently only study that portion of their courses which is required to "pass". The questions that they have studied, they will know well and thus answer with a confidence of '3'. Other questions they will guess and answer with '1' since they don't know them at all. Smart students, however, attempt to study the whole course. Since my opponent's own evidence notes that smart students will likely answer with the "maximum" level, they will answer 3 for everything, whether they actually know it or not (because they think they know it, because they have studied for it). In this way, students who don't know so much can gain an extra advantage, since they know when to take the risks.

The other alternative would be to score all negative marks a zero (not suggesting that this is what my opponent is doing, it's just an alternative way of looking at the mark range). The problem with this is that others assessing the student's quality cannot know how badly they failed. That model would essentially be a really complicated way of changing the present 'D+' 'D' 'D-' and 'F' grades into a single 'F' grade. The biggest injustice of that system is that while a lazy guessing student can get a positive grade by choosing only '1' for confidence, a hard-working yet dimwitted student who bravely took the risk to choose '2' can get a negative mark, despite the fact that the hard-working student got many more questions right. The guessers pass, but many who make a good attempt will, unfortunately, be lumped together with the worst of the failures under this system.

It just doesn't add up.

The model that my opponent has proposed will only exacerbate the harms he has given you. It gets worse, however. The model that he has proposed will compound additional harms to our education system.

First, the harm of putting extra pressure on students. Science and Medicine are actually tough degrees, taking many years. Only the best of the best survive. Exams, therefore, are suitably difficult already. Asking students to assess their confidence only adds to their in-exam nervousness. It's perfectly normal to be not confident in an exam because ... well, it's an exam. Every college student will know the feeling of walking into an exam room after months of study, only to forget everything. No student deserves to be penalised for doing something perfectly normal - freaking out during an exam of one of the hardest courses at the university. Not only will they be punished through having less of a confidence "multiplier," but they will lose valuable time and concentration on deciding how confident they are on a 1-3 scale. That means that either students won't do the exam so well (in other words, standards at the university drop) or the exam is made easier to accommodate the new marking scheme (in other words, the university's standards drop). Both are harmful for both the university and the wider student body.

Second, the harm of complicating the system. Whenever there are additional steps for markers (human or machine) to process, there is also additional room for marking error. Since the process is also more complicated, it is also less likely that students will check their answers against their result to see if there has been an error. That's harmful to students because it means they don't get the grades they deserve, but moreover it's harmful to the faculty staff who will suffer the consequences of a reputation for lax standards. Clever people in high schools may therefore be less inclined to study things such as science and medicine, the result of which (in the long term) is an even worse shortage of qualified scientists and doctors than what many countries around the world are facing already. Furthermore, the increased administrative costs of filing twice the number of responses are significant, costing students more money for a worse service.

There are other harms that I will be glad to elaborate on in later rounds. Good luck to my opponent for round 2!
Debate Round No. 1


Firstly, I thank Larz for taking this debate. His debate record is quite strong, and I’d always rather lose to a strong opponent than win easily. With that said, I don’t intend to lose!

Con Case

Con raises a few main arguments: my system increases mark inflation rather than decreasing it (he agrees mark inflation is bad) and can lead to good students being grouped with bad ones; there is no current problem to fix (the doctor argument); smarter students will answer ‘3’ to everything; it puts extra pressure on students; and it complicates the system.

Mark Inflation

I was quite interested when my opponent brought up this point, as it had never come up in any of the articles I read. I kept researching and still couldn’t find a note on whether scores go into the negative, so I’ll take my own path and state that there is a floor at 0. This means that one can’t get a mark below 0 (although a note would be taken of how negative the score is, as teachers/computers have to mark to a test’s end). The other solution would be to change pass marks and such, but that’s a very messy process which I’m not advocating.

So, returning to my opponent’s 4/20 example at confidence 1 (effectively a guess): his guessing doesn’t give him 20%, it gives him 6.66%. Essentially, the benefits of guessing are mostly eliminated, and mark inflation is lower under confidence assessment, making many of my points (such as feedback) hold true.
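With a floor at 0, the percentage is taken out of the maximum score only (20 questions at confidence 3 = 60 marks); Pro's 6.66% figure can be sketched as follows (illustrative Python, names mine):

```python
def floored_percent(raw_score, questions=20, max_conf=3):
    """Pro's variant: negative raw scores floor at 0, and the percentage
    is taken out of the maximum possible score (questions * max_conf)."""
    return 100 * max(raw_score, 0) / (questions * max_conf)

# Guesser: 4/20 right, all confidence 1 -> raw score 4 out of a max of 60
print(round(floored_percent(4), 2))  # 6.67 (Pro's 6.66%, truncated)
# A negative raw score simply floors at zero
print(floored_percent(-24))          # 0.0
```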

On regrading, there would be no change, as grading is done on percentages and percentages aren’t negative. On the ‘good attempt’ people: they clearly aren’t so good, and the system is based on risk and return. Also, there is strong evidence that students, as a whole, don’t rate their confidence highly enough, but more on that later. Effectively, these people are rare and, thanks to confidence assessment, actually better realise their lack of knowledge (again, later elaboration).

System Complications

Put simply, this is a near non-issue. There are no complications with adding a multiplier in computer programs; my friend could easily write such a program in a few minutes (he’s written a program to work through the preferential voting process). Furthermore, as testing increasingly moves towards computers (particularly for the questions we are debating), issues with teacher time are minor and temporary, and a reasonable concession for the system’s other positives.

Extra Pressure?

This is based on Con’s second last paragraph.

A few points. First, universities aren’t harmed by voluntary decisions; plus, it was the physiology department itself that chose this type of assessment! Second, while exams may be stressful, it’s necessary to have such pressure. If a person can’t pass an exam with a reasonable amount of preparation, then are they really the right person for the job when confronted with sudden problems (with no preparation)? Will they make errors under stressful situations in real life when mistakes matter? Furthermore, one could always allow 5 minutes’ extra time in a test for confidence.

And here’s an entertaining statistic for my opponent to puzzle over: how come students who had used confidence assessment marking, when asked about keeping it in exams, voted 52% in favour to 35% against (13% no reply or no preference)?

If my marking scheme creates extra pressure, yet students support it, then pressure isn’t really much of an issue.

No problem to fix?

Con’s 4th paragraph (not including his 1 line intro) countered.

Firstly, I never justified never taking risks. I clearly stated that I still support risk-taking when necessary (see R1). If a patient is going to die while you go and consult somebody, then you’ll take your chances trying to save them without help.

Secondly, Con cites risk-taking not being a ‘problem’ as a reason not to change systems. That’s like me saying ‘the current speed of modern computers is not a problem, so we needn’t improve their speed’. The lack of a ‘problem’ is no reason not to support improvement.

Statistical Reliability

Con earlier raises a point about teachers having to do more marking, this taking more time, and that being bad. Fair point, but if the tests they were marking were more statistically reliable, and so could be made significantly shorter, would that not make up the deficit? Now, readers may be asking ‘but why is this true?’ Well, it’s due to the confidence factor. Let’s demonstrate this:

“Reliability analysis of exams (GardnerMedwin, 2006b) suggests in any case that the number of questions needed for equally reliable assessment data can fall by a third or more with CBM compared with conventional marking.” -

“An improvement in the ability to discriminate between students means that confidence-based data can give results of equivalent statistical reliability to conventional marking with fewer questions, by a factor of 2.0 for the lowest third of our groups and 2.2 for the top third” -

“It corresponds to an increase in a test length of about 58%” (see the first source of this round)

Effectively, confidence assessment marking allows for either a halving of the marking load or a doubling of statistical reliability.

More on Guessing

There are many things to cover here. Firstly, my opponent fears that smart students will just blindly answer ‘3’. I can say that intelligent people know when they are right and when they are wrong. Furthermore:

“In the exams, 41% of answers were entered at C=3 with 95% correct, 19% at C=2 with 79% correct and 40% at C=1 with 62% correct. The percentages correct were within the optimal ranges but for C=2 and C=3 were near the top of these ranges, reflecting caution or under-confidence. Only 2 students (1F, 1M; both weak) were over-confident” -

It should be noted that the ‘optimal ranges’ are over 80% for C=3, 67-80% for C=2 and under 67% for C=1. Students don’t think in probabilities like this but pick up the idea of the system very quickly and intuitively (in sources already mentioned).

Furthermore, confidence assessment marking is helpful for learning as it promotes understanding as opposed to rote learning. As evidenced:

“By failing to think critically and identify points of weakness, students lose the opportunity to embed their learning deeply and to find connections between different elements of their knowledge”. Given that this is exactly what confidence assessment encourages, through students thinking about their understanding, this is a positive. Educationally, the issue is to teach students to distinguish between confident and unconfident answers rather than to handle marking schemes with optimally calibrated answers.

Another point is the psychological jolt this system provides in destroying bad habits. Because normal marking gives no consideration to how you answer (you just answer), students don’t pick up on their bad habits. However, when one answers with a ‘3’, indicating roughly 80% or higher confidence (on average), and is wrong (-6), it signals a serious misconception in their thinking, prompting a rethink in future (vs. just getting another question wrong and doing nothing), as shown in some student comments from an evaluation study (same source as the one directly above):

‘(Issroff & Gardner-Medwin, 1998) 4 : "It .. stops you making rush answers.", "You can assess how well you really understand a topic.", "It makes one think .. it can be quite a shock to get a -6 .. you are forced to concentrate".’


The above disproves Con’s contentions, expands upon the benefits of the proposed system and reinforces R1 contentions. Resolution affirmed.

(apologies if this is disjointed, low char. left)



I thank pro for his rebuttals.

Mark Inflation
The harms of mark inflation are not that marks are higher, but that marks become more concentrated. For instance, higher marks don't make it harder to tell the good from the bad, or send people false signals, but marking a good student with a mark closer to a poor student will. The problem is that under my opponent's model, you can't tell the difference between a student who didn't want to "effectively guess" but got 50%, and a student who got 0% (in other words, my opponent is giving poor students a head start of at least 50%)! That means that marks from poor students are artificially inflated to non-zero grades, while students who don't "effectively guess" are artificially deflated to be nearer to the poor students, if not given a zero grade. Indeed, until you get more than about two-thirds of the questions right, it makes sense to "effectively guess" by picking '1', and until you get more than about three-quarters right, it makes more sense to pick '1' than '3'. If you don't get above roughly 80% right, picking '3' is a very foolish decision. The problem is that unless students pick, on average, above '2', they cannot reach more than 66% on their final grade: students are FORCED to take big risks. In addition, my opponent proposes a minimum mark at two-thirds of the mark range. Introducing a minimum mark like this, and forcing all other grades to tend towards it, is precisely what causes grade inflation. The poor students' marks really do increase, but with a minimum mark and a skewed mark range to trick people into believing that they don't. That's not solving the problem. It also ignores the fact that good students do no better, as I told you.
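Con's break-even figures follow from the expected mark per question at each confidence level; a quick check (illustrative Python, names mine), where p is the probability an answer is correct:

```python
# Expected marks per one-mark question at probability p of being correct:
#   C=1: p*1 + (1-p)*0  = p        (break-even with C=2 at p = 2/3)
#   C=2: p*2 + (1-p)*-2 = 4p - 2   (break-even with C=3 at p = 4/5)
#   C=3: p*3 + (1-p)*-6 = 9p - 6   (break-even with C=1 at p = 3/4)
def best_confidence(p):
    """Confidence level with the highest expected mark at probability p."""
    expected = {1: p, 2: 4 * p - 2, 3: 9 * p - 6}
    return max(expected, key=expected.get)

for p in (0.6, 0.7, 0.9):
    print(p, "->", best_confidence(p))  # 0.6 -> 1, 0.7 -> 2, 0.9 -> 3
```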

There is a problem with recklessly changing the mark range, however. As with the harms of grade inflation, people cannot tell how badly they have failed. You couldn't get, for instance, a D- if every fail score was a 0. That's bad for students, not to mention the fact that with their 50-66% head start, students who "effectively guess" get much higher average grades than students who don't "effectively guess," making the exam much easier for the guessers.

As a rebuttal to this, my opponent said it's OK because the system is based on risk and return. First, basing your system on risk and return does not offset the harm of mark inflation, and second, even if it did, forcing people to wager on all of their answers is terribly unfair for the reasons I have described already. Then he said smart people better know their lack of knowledge. I've already told you that dumb people don't need to study the whole course to do well, but if you try to study the whole course you are less likely to know what you have and have not missed. Therefore, the message my opponent wants to send to students is "it's OK to not be diligent and learn the hard stuff, so long as you can do the easy part." Imagine if doctors had that attitude.

System Complications
Most of the time, the errors creep in with inputting the data (and occasionally exporting, as different systems can sometimes be unexpectedly incompatible). If you use a scanning system to input your data, there are issues with recognising odd forms of handwriting (short answer exclusive problem, but a problem nonetheless), sheets inputted backwards or back-to-front, sheets appearing smudged because the scanner is slightly out of calibration, and so on.

Even so, adding extra steps to the process always introduces the potential for more errors in the program. This is well-known to teachers. Recently in England, switching to a system with computer and Internet based marking to ensure fairness led to a drastic increase in marking errors, despite the best efforts of the smartest computer programmers in the country. Today, nobody has been able to fix the marking errors, which has led to top students missing out on university placements. It's wrong to assume that computers are always more reliable than humans just because your "friend" said so.

Extra Pressure
It's no wonder students would need extra time to compensate for the pressure. When you're faced with double the number of responses on your paper, and only five minutes of extra time to calculate your "risk and return" for each one, you're faced with a daunting task. Second, while unprepared pressures exist in real life, that's not what exams are for. Exams are for testing your understanding to make you as prepared as you can be. If students are incentivised to study less of the course, they can get much higher marks for being less prepared. It is better to have a correct but uncertain doctor than an utterly wrong but confident one, but the exam results reflect the opposite sentiment! Finally, it is wrong to assume that every graduate will be under that pressure. People who study maths, for instance, may end up as engineering consultants, with ample time to check and revise answers. "One size fits all" doesn't work in education.

On the survey, they did it after five years of running confidence assessments. People who don't like those kinds of assessments are not exactly going to sign up for UCL, are they? They therefore surveyed only those people who willingly signed up to do confidence assessments.

No problem to fix
I wonder why my opponent wants to introduce an assessment that heavily penalises taking risks? One of his rationales was to stop doctors and scientists from taking too many risks - they don't, and his system wouldn't fix that problem anyway. It's not an "improvement," it's a step back from a system that's already working. If it ain't broke, don't fix it.

Statistical Reliability
The fewer questions you have, the more the distribution of grades is skewed towards zero. Therefore, what your sources are effectively saying is that the most "reliable" test marks a greater number of students as stupid on the basis of fewer questions. It doesn't take a genius to work out that you can more easily guess 10 questions than 20 questions, so effectively they want to make it even easier for guessers and more demoralising for hard-working students. Why? UCL is an exclusive, prestigious university (only 12,000 undergraduates; my own university has almost twice that number, and I'm from New Zealand!), so they have to fail most of their students. They want to admit lots of students, however, because universities are funded for each student. This marking system makes average students just as good as guessers, so the average student looks bad, meaning less graduation, maximising funding for elitist professors. It isn't fair for average students, but the university will never tell you that.

More on Guessing
The reason why smart students don't over-estimate is that they pick up on the dumb people's system and don't study the whole course. Students absolutely do think in probabilities if that will maximise their chances of passing - and by guessing on a proportion of the questions, they can do so. That's bad, because it masks the problem of students' weaknesses.

Rote learning is absolutely encouraged over critical thinking! Thinking critically is more likely to make you less confident that an answer is correct (because you critique your answer), which leads to lower grades.

Stops rush answers
Only if guessing isn't a "rush answer"! We're not making students think about their answer, we're making them wager on it. Students who get -6 won't improve their answer, they'll just put a 1 by it next time and study something else! Besides, who is dumb enough to rush an answer with confidence '3'? Students under the status quo can see where they need to improve - assessing self-esteem does not improve this process.

The resolution is negated.
Debate Round No. 2


I thank Con for replying. Unfortunately, we don’t see eye to eye and seem to be talking past each other a little.

Mark Inflation

Being a key point of the debate and other points being secondary (in the sense that if Con was right my system looks rather pathetic), we’ll spend a while on this point.

First, Con states that I’m inflating marks by setting a floor of 0. This is done to keep percentages in line. I already said last round that educationally the issue is to distinguish between various degrees of confidence as opposed to handling marking schemes with optimal precision.

On grades, I addressed that last round as they are percentage based (percentages are still based on marks from 0 to the max score).

Furthermore, Con blames my system for not making good students' marks rise ('ignores the fact that good students do no better'). Well, isn't that shocking! By getting rid of inflated marks that aren't reflective of true understanding, scores drop! I take this as proof that this system is making an impact.

On Con’s ‘it’s okay to not learn the hard stuff’, he’s wrong. If a student just coasts on ‘1’ level answers they lose 2/3 of the potential marks. There are still marks for the ‘hard stuff’, as he puts it - the exact same percentage of marks as in the current system.

The key point to consider is that the scheme encourages people to accurately report their confidence. If people inflate confidence they will end up losing marks, same with undervaluing it. I also remind people that while people will often go 1 in this system there’s NO reason not to guess under a traditional scheme.
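The incentive to report confidence honestly can be sketched numerically. Assuming the mark values implied elsewhere in this debate (correct answers earn +1/+2/+3 at confidence levels 1/2/3; wrong ones score 0/-2/-6), the expected mark per question is maximised by accurate self-assessment: level 2 only pays off above roughly 67% certainty, and level 3 only above the 80% cut-off Pro cites. A minimal Python sketch under those assumptions:

```python
# Expected mark per question under the confidence-based scheme discussed
# in this debate (assumed marks: correct = +1/+2/+3, wrong = 0/-2/-6
# for confidence levels 1/2/3 respectively).

MARKS = {1: (1, 0), 2: (2, -2), 3: (3, -6)}  # level: (if correct, if wrong)

def expected_mark(level, p):
    """Expected mark at a confidence level, given probability p of being right."""
    right, wrong = MARKS[level]
    return p * right + (1 - p) * wrong

def best_level(p):
    """The confidence level that maximises the expected mark."""
    return max(MARKS, key=lambda lvl: expected_mark(lvl, p))

# Break-even points fall out of the arithmetic: level 2 beats level 1
# above p = 2/3, and level 3 beats level 2 above p = 0.8.
assert best_level(0.60) == 1
assert best_level(0.70) == 2
assert best_level(0.90) == 3
```

For a student who is right 90% of the time, level 3 yields an expected 2.1 marks per question versus 1.6 at level 2, so both overstating and understating confidence cost marks on average.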

System Complications

I already conceded that there would be slight additional complications. However, as computers improve and exams are done on computers (meaning no problems with scanning) these complications will diminish. Note this was the first year of a new system, so improvements will swiftly be made. This minor inconvenience is well worth the advantages of confidence assessment.

Also, here’s a quote from Con’s first source: ‘the Qualifications and Curriculum Authority (QCA), which oversees exams, denied that the use of the new technology is "the main issue" ’

Puzzle that one.

Extra Pressure

I’ve previously shown evidence that students intuitively pick up the system and are comfortable with it. I gave evidence of this and Con simply tried to dismiss it. Con should face the idea that students can prefer the system. Test marking systems don’t make people leave university. Here’s more evidence of students being extremely comfortable with the extra ‘pressure’:

“96% of the students found the system either easy or very easy to use”

I’ve shown the pressure to be nearly non-existent, and extra time can be added to cope with any extra pressure if needed.

No Problem?

I don’t want to penalise risks, I want to support knowledge, judgement and sound decisions. I figure improving things is good.

I’ve said this in various places. I particularly urge readers to remember my guessing example from R1 involving investment.

Statistical Reliability

Con was bordering on irrelevant here. Yes, the fewer questions one has, the more the distribution of grades is skewed to 0. However, because confidence is assessed, this is somewhat counteracted.

I provided ample evidence last round of confidence assessment of say, 100 questions being equivalent to current tests of easily over 150 questions.

I’ll even give an example for readers. For every 79 questions in a current test, just 50 questions under confidence assessment give teachers and students the same level of feedback! Confidence assessment therefore allows for major reductions in marking time, test creation time (a bigger time issue) and so on, for the exact same level of insight into student knowledge (the main purpose of testing, although the system has other advantages)! Or, we can have the same number of questions and a far deeper insight into student understanding.

Look at the evidence, not Con’s assertions. I personally quite like the dilemma of the two appealing choices above (fewer resources spent on tests, or greater test reliability and accuracy).

On the off topic point of UCL, it seems rather fitting that one of the world’s top universities utilises a marking scheme that is superior, doesn’t it?

More on guessing

Yet again, I provide evidence contrary to Con’s assertions, and Con merely claims his assertions are right, believing them regardless of the evidence to the contrary. I use numbers below to indicate a direct reply to that sentence of Con’s (1 means the first sentence, and so forth).

1- Same percentages, so no reason not to study the whole course; plus, undervaluing their confidence LOST them marks relative to where their percentages should be (my source shows a person utilising ‘2’ while being right 90% of the time; they could get 50% extra marks by using ‘3’ - the cut-off for 3 is 80%)

2- Contrary to evidence... Also, guessing is even more encouraged in the current system. This quote (see below source under ‘Rote Learning’) directly contradicts Con here: “Students rarely describe their choice of confidence level in terms of explicit probabilities, even after the principles are explained”

3- The reverse occurs through the exposure of student lack of knowledge.

Rote Learning

Again, we’re speaking past each other. If students do want to leave questionable answers at higher confidence levels (as opposed to realising they are bad from critiquing) then they’ll just lose marks. In the words of my source (which starts off describing the traditional scheme):

“Experience with students suggests that many of the stronger ones find they can do well by relying on superficial associations, with little incentive (on conventional right/wrong mark schemes) to think rigorously or understand the issues thoroughly: the first thing they think of is usually good enough. At the same time, weaker students try to emulate this with diligent rote learning, rejecting deeper learning as unnecessarily challenging. A properly designed scheme for CBM ensures that in order to get the best marks students must discriminate between responses based on sound knowledge or understanding, and those where there is a significant risk of error.”

The system is inclined to reward correct indications of confidence.

Stops rush answers

Con’s words here are quite instructive, if only to demonstrate to readers common misconceptions. I’d reply to everything, but I don’t have space, so here’s a smaller part of Con’s words:

“Besides, who is dumb enough to rush an answer with confidence '3'? Students under the status quo can see where they need to improve - assessing self-esteem does not improve this process.”

This is key. Anybody putting down 3 is fairly sure of their answer, so the times when one is wrong really show up under confidence assessment (because they affect the score a lot). This means that students are more likely to realise their conceptual errors and fix them. I showed student testimony of this last round, and the logic is sound.

On the status quo being able to do this, I agree. I merely state that confidence assessment is far more likely to correct bad student habits, as mistakes are less likely to be glossed over.


I ask readers to consider not just what I’ve stated this round, but also what Con has left unanswered, and what I’ve shown to be true, in previous rounds. Together, I believe the resolution to be affirmed.

I thank Con for being a serious and strong opponent.

Also, if anybody wants to learn more on the subject (after the debate) read my sources or contact me.


I don’t think voters have any reason to give spelling/grammar or conduct either way. People should consider the issue of sources though, as I’ve used what I believe to be more and higher quality sources, although this isn’t a significant issue. As for arguments, I’m not about to endorse the Con position; I suggest a Pro vote.

Thanks again to Con and readers.



It's a shame we're speaking past each other. My opponent has been very articulate and has upheld a high standard of conduct, though of course that's no protection against being dead wrong. In this round I'll summarise my position.

When you're able to set the mark value of the questions yourself, then you can make the easy questions worth three times as much as the hard questions. Since the easy questions are now worth the most marks, guess what people aren't going to study? That's the wrong message to be sending the young people of today - we should be telling students to put in the hard yards, not that the easy part counts the most.

When you're pushing all marks down because, unless a student puts a confidence of '3' for every single question and gets nothing wrong, their score will include deductions for wrong answers, lack of confidence, or (most likely) both, you're going to have lower scores for top students. When poor students have head starts of up to 66%, you're going to get higher scores for poor students. Couple that with a minimum mark and a single fail grade, and what you end up with is more concentrated marks. That's really bad, for all the reasons my opponent so well articulated in round 1. No wonder he dropped all responses to this point last round, other than to reassert that "percentage based grades" somehow make the single fail grade OK. No, they make it worse, as I explained last round.

When you're complicating a non-broken system that will introduce more marking errors, students and teachers will get angry. It's not a "minor inconvenience" like my opponent would have you believe - people's lives and futures are at stake here! The system we have now isn't perfect, but that's no excuse to make the system worse. My opponent states improvements will be made in the system, but as I said, in England it's 4 years later and there are still no "improvements" - only steps in the wrong direction. He adds that the QCA in England denied that technology was the "main issue", but what do you expect them to say? "Oh, we just wasted millions of taxpayer dollars in return for a system that didn't work and which has resulted in many more mistakes being made"? Hardly.

When you're basing your whole case on a single survey, you know you're a desperate debater. The UCL survey was only conducted on students after 5 years - more than the time it takes to complete a degree. Therefore, while I agree tests don't make most students leave the university, the only new students to join are those happy with the format. This happens in New Zealand with our three forms of high school assessment - parents pick their kid's high school based on whether they want an NCEA, Cambridge or IB exam. My opponent is deeply naive if he thinks people don't think about exams when they go to uni; indeed, that's almost all that people think about! Time and time again, my opponent has asserted students prefer the system based on the survey, not logic. The fact is that when you're in an exam for a top paper at a top university, it's stressful enough as it is. When you double the number of responses they have to give, and force them to gamble on every single question, they will be more stressed out!

When you're advocating not penalising risks, you shouldn't advocate a system that has penalties attached to risks. In round 1 my opponent gave the example of a doctor unsure of his diagnosis. The greater harm is if he's wrong, as my opponent admits. With my opponent's examination system, he is more likely to be wrong, AND more likely to take risks. More likely to take risks because you have to have an average confidence higher than 2 to even get 66% in your medical exams, and more likely to be wrong because there is less need to study the difficult aspects of medicine with my opponent's model.

When you make a point about statistical reliability only on the basis of some comments by professors, and I point out that the professors have a vested interest in misrepresenting how well the system works, my point is absolutely relevant. I also showed a prima facie case why the professors are wrong - reliability comes from knowing more about what people know, not how well they perform at the same type of unfair, biased test a year later. It's utterly illogical and contrary to common sense to call a test that I have proven to make the smart dumber and the dumb smarter, using fewer questions, a more reliable assessment. Despite my opponent's constant assertions, it isn't. Confidence assessment does not assess "understanding" as my opponent constantly asserts, but rather "confidence," which is at best of questionable importance in the real world, yet has the largest impact on the final test score. Then he instructs voters to look at the evidence, not my assertions. Right back at you! I haven't made a single assertion under this heading that I have not somewhere justified. And no, some professors who gain massive financial benefits from lying about the power of confidence assessment do not count as "justification."

When you give guessers a head start, you encourage guessers. When rote learners can triple the marks they get on questions they know, you encourage rote learning. When students critically think about their answer, they are less confident about it (that's just the nature of critique), and thus get much lower marks under your scheme - asking the professors who I've proven are heavily biased by their monetary incentives is not good evidence. Some of the best scientists and doctors were skeptical of the answers they came up with at first, but years later they turned back and agreed that they had, in fact, stumbled on the correct solution. Is it right to cut the reward of unconfident people by a third? Is it right to encourage guessers? Is it right to encourage rote learners, while discouraging critical thought? Pro seems to think so.

When you agree the status quo is already working to ensure answers are not rushed, and that your system actually makes rush answers less unappealing because they're worth much less than under the status quo, you fail to show that your system has any benefit whatsoever. I agree with my opponent's logic that scoring a -6 will deeply affect students, but when a 6-point improvement is just a '1'-rating away, I don't buy that the system will motivate students to study any more than the status quo. Rather it will motivate them to reduce their confidence and score much higher. The status quo is exactly the same, just without the skewed mark range to create that perverse incentive.

When you do all that, your model has failed.

When your model has failed, you lose the debate. You also lose sources if I've discredited all of your sources and you've discredited none of mine. Vote based on what my opponent and I said, not what you hoped we'd say. Vote based on argument, not blind assertion - including the blind assertions in my opponent's sources. Vote to keep our qualifications systems fair and unbiased. I've been involved in my fair share of student activism over the years, and this is one thing no student or teacher should support. The system didn't add up in round one. It still doesn't.

Vote con.
Debate Round No. 3
11 comments have been posted on this debate. Showing 1 through 10 records.
Posted by F-16_Fighting_Falcon 6 years ago
RFD: The main problem I had with Pro's system was the floor being set to 0. I initially thought the plan was very good, but after Con pointed out its errors, Pro set the floor to 0, which Con correctly points out makes no difference between a D and an F. Unfortunately, inflation was in fact the most important point and lost the debate for Pro.

Pro however was more convincing that errors will be minimal. Con's point about errors wasn't strong enough. Pro showed that they were computerized. Con contends that the more data to enter, the more errors there will be. But should we not opt for a better system just because of a risk of computer error?

Pro shows that smart students are more likely to know how confident they are in answering questions. It was more convincing than Con's assertion that dumb students study specific parts of the syllabus. There is an intuitive advantage gained by studying everything which cannot be substituted by only studying specific chapters so Pro was more convincing.

Con says in round 3 that exams are for testing your understanding to make sure of how prepared you can be. However, Pro's method clearly fit the bill better, as Con never denies that Pro's system better helps students gauge their level of confidence.

Overall, Pro doesn't prove that the system eliminates or reduces guessing enough for it to be worth the added complication of implementing it. Con's strongest arguments were the refutations he put forth showing that guessing will take place whatever system is followed. Overall, it was competitive, and Pro was more convincing on many of the supporting points.
Posted by larztheloser 6 years ago
I wouldn't doubt that confidence assessment has some merits (that's why I didn't contest some of your points!), I just doubt that the merits are greater than the harms. The biggest harm is the apparent unfairness. If some genius manages to fix that loophole, then I agree confidence assessment could be great for these kinds of tests.

Also, don't worry about the horrible spacing, it motivates me (because it looks like you always write more so I feel the need to make up for it - lol).
Posted by Logic_on_rails 6 years ago
Done! I doubt I'll convince you Larz, but I hope you realise the merits of confidence assessment in marking. Also, apologies for horrible spacing.
Posted by larztheloser 6 years ago
Oh, that's right, I forgot you Aussies don't have holidays during the Rugby World Cup. Don't worry, I can wait!
Posted by Logic_on_rails 6 years ago
May have to wait another day Larz. I'm trying to get this done before school holidays end (today), but I've got to do a proper job. My apologies for the delay.
Posted by Logic_on_rails 6 years ago
drafterman and larz, I was also intrigued by this point as it hadn't come up in any of my research! Nevertheless, there is evidence that shows it is in use by many universities, so it clearly works out mathematically. I'll likely be guessing that they set a floor at 0. I'll also probably demonstrate evidence that shows that students don't guess higher than they should (in fact, they typically slightly undervalue their answers). It should be a good second round.
Posted by larztheloser 6 years ago
Worse still, if you make it non-relative (0-100), then as I said a lazy guessing student can get a positive grade by choosing only '1' for confidence, a hard-working yet dimwitted student who bravely took the risk to choose '2' can get a "negative" mark (and thus end up with a 0), despite the fact that the hard-working student got many more questions right - and there would be no way of differentiating the hard-working from a guesser other than that the guesser got a HIGHER score!
Posted by drafterman 6 years ago
I'm not sure I understand how Confidence Assessment tests are graded. Con suggests that it is graded relatively, based upon the minimum and maximum scores (-120 to 60 for a 20 question test - a range of 180). This doesn't make sense as simply marking each question with a confidence of 1 will net you, at the very least, 0 points, which is 67%. How can I get 67% for getting no questions correct? On the flip side, if I got every question correct at Confidence 1, I'd only get 20 points which is only 78%! So an 11% difference between no questions and all questions correct!

The only alternative I see is that it isn't relative; it's still based on a 0 to 100 scale. Any score below 0 is treated as a 0 and any score above 100 is treated as a 100. This also seems to have flaws as I only have to get 7 out of 20 questions right at a level 3 confidence (21 points out of 20) to get 100%. That means I can technically get a perfect score based on getting less than half the answers correct!

I've done some superficial google searching, but all I can find is how the individual questions are scored, I can't seem to find anything that shows how the overall tests are graded.
Posted by GaryBacon 6 years ago
If this is still open in a few days, I may take it.
Posted by wjmelements 6 years ago
I totally agree with the Affirmative position.
3 votes have been placed for this debate. Showing 1 through 3 records.
Vote Placed by GaryBacon 6 years ago
Agreed with before the debate:--Vote Checkmark0 points
Agreed with after the debate:-Vote Checkmark-0 points
Who had better conduct:-Vote Checkmark-1 point
Had better spelling and grammar:--Vote Checkmark1 point
Made more convincing arguments:-Vote Checkmark-3 points
Used the most reliable sources:-Vote Checkmark-2 points
Total points awarded:06 
Reasons for voting decision: I did not know much about this topic before reading the debate. But I believe that Con effectively showed why confidence assessment marking should not be implemented on exams.
Vote Placed by drafterman 6 years ago
Agreed with before the debate:--Vote Checkmark0 points
Agreed with after the debate:--Vote Checkmark0 points
Who had better conduct:--Vote Checkmark1 point
Had better spelling and grammar:--Vote Checkmark1 point
Made more convincing arguments:-Vote Checkmark-3 points
Used the most reliable sources:--Vote Checkmark2 points
Total points awarded:03 
Reasons for voting decision: This was a tough one. I think the problem on Pro's side was a lack of complete information regarding implementation and tangible benefits.
Vote Placed by F-16_Fighting_Falcon 6 years ago
Agreed with before the debate:--Vote Checkmark0 points
Agreed with after the debate:--Vote Checkmark0 points
Who had better conduct:--Vote Checkmark1 point
Had better spelling and grammar:--Vote Checkmark1 point
Made more convincing arguments:-Vote Checkmark-3 points
Used the most reliable sources:--Vote Checkmark2 points
Total points awarded:03 
Reasons for voting decision: The core of the argument was to not reward guessing. Con shows that Pro's argument does not eliminate that problem as it still rewards guessing albeit in different and more complicated ways. (See comments section for analysis of major points.)