Techniques hiding in plain sight: check digits and human error

Computers are good at doing things correctly, as long as the software is correct. People, however, make plenty of mistakes (including writing bad software and breaking computers). In quality circles there is a Japanese term, poka-yoke, which roughly means “mistake-proofing.” The idea is to prevent mistakes by making them impossible, or at least making them obvious. Consider the SIM card in a mobile phone: the small cut-off corner means it only fits one way. If you try to insert it the wrong way, it is obvious that something is wrong.

To succeed at poka-yoke, you have to imagine what the user might do wrong and then come up with a way to make that mistake clearly visible. There are examples of this all around us, and sometimes we don’t even notice them. For instance, what do your credit card number, your car’s VIN, and the UPC code on a can of beans have in common?

The answer is that they are all long identifiers made up mostly of digits, and, as we all know, humans have a hard time entering those correctly. People skip digits or transpose them. So whoever writes an application that accepts such numbers usually wants to verify that the person entering them did not make a mistake.

Of course, a number is a number, right? If I ask you to enter a five-digit postal code, I can tell if you typed four digits or six, but I can’t tell whether you meant 77508 or 77580. That is why long, important numbers carry one or more check digits.

A check digit works like a checksum or a CRC: you calculate it from the other digits of the number, and if your result does not match the check digit that was entered, something is wrong.

A simple example

To keep things simple, suppose you have a four-digit PIN between 0000 and 9999 and you want to turn it into a five-digit code that ends in a check digit. One simple method is to add all the digits together and keep only the last digit of the sum (that is, the remainder after dividing by 10, or the sum modulo 10).

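For example, 0052 becomes 00527 (0 + 0 + 5 + 2 = 7), and 9522 becomes 95228 (9 + 5 + 2 + 2 = 18). Simple, right? Now you know that 10118 is not a valid code, since 1 + 0 + 1 + 1 = 3, not 8. Of course, 00527 is valid, but so are 00257 and 52007: a plain sum ignores the order of the digits, so transposed digits slip through. Maybe we can do better.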
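Here is a minimal Python sketch of that naive sum-modulo-10 scheme. The function names are hypothetical and exist only for illustration; the point is to show that it reproduces the examples above and that reordered digits still pass.

```python
def simple_check_digit(digits):
    """Naive scheme: sum of the digits, modulo 10."""
    return sum(int(d) for d in digits) % 10

def make_code(pin):
    # Append the check digit to the four-digit PIN.
    return pin + str(simple_check_digit(pin))

def is_valid(code):
    # The last character is the check digit; everything before it is the PIN.
    return simple_check_digit(code[:-1]) == int(code[-1])

print(make_code("0052"))  # 00527
print(make_code("9522"))  # 95228
print(is_valid("10118"))  # False: 1 + 0 + 1 + 1 = 3, not 8
# Reordering the digits leaves the sum unchanged, so all of these pass:
print([is_valid(c) for c in ("00527", "00257", "52007")])  # [True, True, True]
```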

Real life

Real-world algorithms try to take the position of each digit into account. There are several ways to do this, and, as you might expect, a good deal of mathematics goes into deciding which is best. Many systems use a weighted sum, where each digit position gets a different weight, usually 1, 3, 7, or 9, with no two adjacent positions sharing the same weight. Because these weights are relatively prime to 10, changing any single digit always changes the check digit. Transpositions are a different story. Swapping two adjacent digits changes the sum by the difference between the digits times the difference between their weights, and since the weights are all odd, that weight difference is even. If the two digits differ by 5, the change is a multiple of 10 (5 × 2 = 10) and goes unnoticed. Every other swap is caught, so these weights detect roughly 90% of adjacent transpositions.
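Both claims can be checked by brute force. The sketch below assumes a plain weighted sum modulo 10 and looks only at how one changed digit, or one swapped pair of neighbours with weights `w1` and `w2`, shifts the total; the helper names are made up for this illustration.

```python
from itertools import product

# All ordered pairs of distinct digits (a, b).
DIGIT_PAIRS = [(a, b) for a, b in product(range(10), repeat=2) if a != b]

def catches_all_single_errors(w):
    # Changing digit a to b shifts the weighted sum by w * (b - a);
    # the error is missed only if that shift is a multiple of 10.
    return all((w * (b - a)) % 10 != 0 for a, b in DIGIT_PAIRS)

def transpositions_caught(w1, w2):
    # Swapping neighbours a (weight w1) and b (weight w2)
    # shifts the sum by (a - b) * (w2 - w1).
    caught = sum(1 for a, b in DIGIT_PAIRS if ((a - b) * (w2 - w1)) % 10 != 0)
    return caught, len(DIGIT_PAIRS)

print([w for w in range(1, 10) if catches_all_single_errors(w)])  # [1, 3, 7, 9]
print(transpositions_caught(3, 1))  # (80, 90)

# Which swaps slip past alternating 3/1 weights?
missed = [(a, b) for a, b in DIGIT_PAIRS if ((a - b) * (1 - 3)) % 10 == 0]
print({abs(a - b) for a, b in missed})  # {5}
```

Any two distinct weights drawn from 1, 3, 7, and 9 give the same result: 80 of the 90 ordered pairs of distinct digits, about 89%, with only the pairs that differ by 5 getting through.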

For example, the ubiquitous UPC code alternates weights of 3 and 1. Position 1 is the rightmost digit (not counting the check digit), position 2 is the next digit to the left, and so on; odd positions get weight 3 and even positions get weight 1. The algorithm is:

Ignoring the check digit and starting from the right, add together all the digits in odd positions
Multiply that sum by 3 (the odd-position weight)
Add the digits in even positions to the running total
Take the last digit of the total (the remainder after dividing by 10); if it is not zero, subtract it from 10 to get the check digit
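The four steps translate almost line for line into code. This is a sketch assuming a 12-digit UPC-A code given as a string (11 data digits plus the check digit); the function names are illustrative and not taken from any barcode library.

```python
def upc_check_digit(payload):
    """Check digit for the 11 data digits of a 12-digit UPC-A code."""
    digits = [int(ch) for ch in payload]
    # Position 1 is the rightmost data digit, so slice from the right.
    odd_sum = sum(digits[-1::-2])   # positions 1, 3, 5, ...
    even_sum = sum(digits[-2::-2])  # positions 2, 4, 6, ...
    total = odd_sum * 3 + even_sum
    last = total % 10
    return 0 if last == 0 else 10 - last

def upc_is_valid(code):
    # Recompute the check digit from the first 11 digits and compare.
    return len(code) == 12 and code.isdigit() and upc_check_digit(code[:-1]) == int(code[-1])

print(upc_check_digit("68113130951"))  # 6
print(upc_is_valid("681131309516"))    # True
```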

For example, there is a can of compressed air on my desk whose UPC is 681131309516. The first six digits identify the company, the next five are the product ID, and the last digit is the check digit. Reading from the right and skipping the check digit, the odd-position digits are 1, 9, 3, 3, 1, and 6, and the even-position digits are 5, 0, 1, 1, and 8. The odd digits sum to 23, and multiplying by 3 gives 69. The even digits sum to 15, for a total of 84, so the preliminary check digit is 4. Since that is not zero, the actual check digit is 10 - 4 = 6, which matches. Try changing any single digit, or swapping two adjacent digits, and see what you get.
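That experiment is easy to automate. This sketch (the helper names are hypothetical) uses an equivalent form of the UPC rule: weight the check digit 1, then 3, 1, 3, ... moving left, and require the total to be a multiple of 10. It then tries every single-digit change and every swap of two adjacent, different digits in the example code.

```python
CODE = "681131309516"

def upc_is_valid(code):
    # Weight 1 on the check digit, then 3, 1, 3, ... moving left;
    # a valid code has a weighted total that is a multiple of 10.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(reversed(code)))
    return total % 10 == 0

def single_digit_errors(code):
    # Every code that differs from the original in exactly one position.
    for i, original in enumerate(code):
        for d in "0123456789":
            if d != original:
                yield code[:i] + d + code[i + 1:]

def adjacent_swaps(code):
    # Every code produced by swapping two neighbouring, different digits.
    for i in range(len(code) - 1):
        if code[i] != code[i + 1]:
            yield code[:i] + code[i + 1] + code[i] + code[i + 2:]

missed_single = [c for c in single_digit_errors(CODE) if upc_is_valid(c)]
missed_swaps = [c for c in adjacent_swaps(CODE) if upc_is_valid(c)]
print(upc_is_valid(CODE))  # True
print(len(missed_single))  # 0
print(missed_swaps)        # ['681131309561']
```

All 108 single-digit changes are rejected. The one swap that slips through exchanges the final 1 and 6, two digits that differ by 5: with weights 3 and 1, that swap shifts the total by 5 × 2 = 10.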

Better

ISBN-10 is more robust. It uses ten digits, weights them 1 through 10, and requires the weighted sum to be divisible by 11, so it works modulo 11 instead of modulo 10. Because 11 is prime, this catches every single-digit error and every transposition of two digits, but it means the check digit can come out as 10, which is written as X.
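A sketch of the ISBN-10 rule, assuming the weights 1 through 10 are applied from left to right (weighting 10 down to 1 instead gives an equivalent test, since 11 - i ≡ -i modulo 11). Hyphens are simply stripped, the function names are illustrative, and 0-306-40615-2 is just a sample number.

```python
def isbn10_is_valid(isbn):
    """True if 1*d1 + 2*d2 + ... + 10*d10 is divisible by 11 (X counts as 10)."""
    s = isbn.replace("-", "")
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in "0123456789X":
        return False
    values = [int(ch) for ch in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(i * v for i, v in enumerate(values, start=1)) % 11 == 0

def isbn10_check_digit(payload):
    """Check character for the first nine digits.

    The check digit's weight is 10, which is -1 modulo 11, so it must equal
    the weighted sum of the first nine digits modulo 11.
    """
    r = sum(i * int(ch) for i, ch in enumerate(payload, start=1)) % 11
    return "X" if r == 10 else str(r)

print(isbn10_check_digit("030640615"))  # 2
print(isbn10_is_valid("0-306-40615-2"))  # True
print(isbn10_is_valid("0-306-40651-2"))  # False: two digits swapped
print(isbn10_check_digit("080442957"))  # X
```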

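There are other algorithms too, including the Damm and Verhoeff algorithms, which catch every single-digit error and every adjacent transposition, and the Luhn algorithm used for credit card numbers. You can also add more check digits for better protection, just as a CRC with more bits is generally more reliable.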
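Since credit cards came up at the start, here is a minimal sketch of the Luhn check in Python. The function names are made up for illustration, and 79927398713 is just a sample number that satisfies the check. Luhn has one well-known blind spot, the swap of 09 and 90, which the last line shows.

```python
def luhn_is_valid(number):
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # the check digit is at i == 0 and is never doubled
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of d
        total += d
    return total % 10 == 0

def luhn_check_digit(payload):
    # Exactly one of the ten candidates makes the whole number valid.
    return next(str(c) for c in range(10) if luhn_is_valid(payload + str(c)))

print(luhn_check_digit("7992739871"))  # 3
print(luhn_is_valid("79927398713"))    # True
print(luhn_is_valid("79927398731"))    # False: adjacent digits swapped
print(luhn_is_valid("091"), luhn_is_valid("901"))  # True True
```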

Significance

Check digits are not a security measure. The algorithms are generally well known and easy to compute, so a check digit does nothing to stop a bad guy from producing a plausible-looking fake credit card number. It just provides a little poka-yoke, so a program can immediately catch the most common mistakes in these numbers. That is worth keeping in mind the next time you design a user interface, or anything else that is prone to human error.

Of course, if you are guarding against errors introduced by computers or transmission channels rather than people, a CRC is usually the better tool. And if nobody will ever need to compute the check digit by hand, there are other, stronger ways to catch errors.
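To make the contrast concrete, this sketch uses CRC-32 from Python’s standard `zlib` module on the UPC from the earlier example and on a copy with its last two digits swapped. The swapped copy still satisfies the UPC check digit, because the swapped digits, 1 and 6, differ by 5. The CRC values differ, however: CRC-32 is guaranteed to detect any error confined to a burst of 32 bits or fewer, and swapping two adjacent bytes touches at most 16. The `upc_is_valid` helper is a hypothetical name for this illustration.

```python
import zlib

def upc_is_valid(code):
    # Weight 1 on the check digit, then 3, 1, 3, ... moving left;
    # valid codes have a weighted total that is a multiple of 10.
    return sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(reversed(code))) % 10 == 0

original = "681131309516"
swapped = "681131309561"  # last two digits exchanged

print(upc_is_valid(original), upc_is_valid(swapped))  # True True
print(zlib.crc32(original.encode()) != zlib.crc32(swapped.encode()))  # True
```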