Arbitrary Precision and Big Numbers in PHP
Mathematical questions and applications always fascinate me. I recently finished Stephen Hawking’s book “God Created the Integers”, and being able to “talk” to those great mathematicians in history thrills me. This is the main reason I decided to write about this topic.
In this article, we will review PHP’s capability for arbitrary precision / big integer calculation by looking at 3 PHP modules: GMP, BC Math and php-bignumbers. We will work through two real-world examples to see the strengths and limitations of each. The first one will be calculating PI to arbitrary precision – well, for the sake of the article, we will restrict the precision to, say, 1000 digits; the second will be a simple demonstration of RSA encryption/decryption.
Let’s get started.
sudo apt-get install php5-gmp
php-bignumbers is distributed via Composer. Create a composer.json file and, after adding the requirement, run php composer.phar install. The php-bignumbers GitHub site has detailed information on how to write that file.
BC is usually pre-installed and enabled.
After installation, we can test that all are working fine by writing a simple PHP script and running it in the terminal:
<?php
// Below is to test the php-bignumbers lib
require_once 'vendor/autoload.php'; // Required for autoload

use \Litipk\BigNumbers\Decimal as Decimal;

$one   = Decimal::fromInteger(1);
$two   = Decimal::fromInteger(2);
$three = Decimal::fromInteger(3);
$seven = Decimal::fromString('7.0');

// Should display something like 0.142857142857... and the last digit is 9,
// because the 101st digit is rounded up.
$a = $one->div($seven, 100);
$b = $two->div($seven, 100);
$c = $three->div($seven, 100);

echo ($c->sub(($a->add($b)))); // Displays 0.00000... to the last digit
echo "\n";

// Now we test GMP (it only accepts integers, so we pass "1" and "7")
echo gmp_strval(gmp_div("1", "7")); // Displays 0. Not really useful!
echo "\n";

// Now we test BC
// Should display something like 0.142857142857... but note the last digit is 8, not 9.
$oneseven   = bcdiv('1', '7', 100);
$twoseven   = bcdiv('2', '7', 100);
$threeseven = bcdiv('3', '7', 100);

echo bcsub(bcadd($oneseven, $twoseven, 100), $threeseven, 100); // Displays 0.000000000... to the last digit
echo "\n";
The code above produces output like the following:
The first and third outputs (from php-bignumbers and BC, respectively) are identical and accurate (1/7+2/7-3/7=0).
GMP only handles big integers and does not provide arbitrary precision floating point calculation, so it says 1/7=0, the result we would expect when dividing one integer by another. For this reason, I won’t use GMP for the arbitrary precision demo (PI calculation) and will only use it in the RSA demo.
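The “last digit is 8, not 9” remark in the test script comes down to BC truncating at the requested scale instead of rounding. The sketch below illustrates this at a small scale. It assumes the bcmath extension is loaded, and bcdiv_rounded() is a hypothetical helper written for this illustration, not a BC function.

```php
<?php
// Sketch: BC Math cuts the result off at the requested scale; it does not round.
// Assumes the bcmath extension is loaded.

// 1/7 = 0.142857142857..., so at scale 4 the digits are simply truncated.
$truncated = bcdiv('1', '7', 4);

// Hypothetical helper (not part of BC): round a non-negative quotient half-up
// by computing one extra digit and adding 5 in that extra position.
function bcdiv_rounded(string $a, string $b, int $scale): string
{
    $extra = bcdiv($a, $b, $scale + 1);
    $half  = '0.' . str_repeat('0', $scale) . '5';
    return bcadd($extra, $half, $scale); // bcadd truncates back to $scale digits
}

echo $truncated, "\n";                  // 0.1428
echo bcdiv_rounded('1', '7', 4), "\n";  // 0.1429
```

The helper only covers non-negative values; it is meant to show where the difference between the two libraries comes from, not to replace a proper rounding routine.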
A brief comparison of three libraries
php-bignumbers
- New, under development;
- Object Oriented;
- More typing to create a number;
- Limited operators and functions (at the time of writing, it is missing the pow function and all other important mathematical function families like trigonometry, factorial, etc);
- Provides arbitrary precision capability, which is what we needed.
BC Math
- Mature, pre-installed and enabled;
- Not Object Oriented;
- 10 functions available; missing the same functions as php-bignumbers, except pow, and also missing log/ln;
- Provides arbitrary precision capability, which is what we needed.
GMP
- Mature, easy to install;
- Not Object Oriented;
- Over 40 functions available, but all focused on integer manipulation;
- Provides arbitrary precision for integers, not fully what we want.
Theoretically, with +-*/ provided, there is no limit to the other operations we can build on top of them. It may be easier to extend php-bignumbers than the other two, so I am looking forward to the author of php-bignumbers adding more functions in the future.
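As a small illustration of building a missing function from the basic operators, here is a sketch of a factorial that uses nothing but bcmul(). It assumes the bcmath extension is loaded; bc_factorial() is a hypothetical helper name chosen for this example.

```php
<?php
// Sketch: a factorial built only from multiplication.
// Assumes the bcmath extension; bc_factorial() is a hypothetical helper.
function bc_factorial(int $n): string
{
    if ($n < 0) {
        throw new InvalidArgumentException('n must be non-negative');
    }
    $result = '1';
    for ($i = 2; $i <= $n; $i++) {
        // Scale 0: we are multiplying integers, so no decimals are needed.
        $result = bcmul($result, (string) $i, 0);
    }
    return $result;
}

echo bc_factorial(20), "\n";          // 2432902008176640000
echo strlen(bc_factorial(100)), "\n"; // 158 (number of digits in 100!)
```

The same pattern (a loop over the basic operators) would work with php-bignumbers or GMP; only the call syntax changes.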
If we are only dealing with integers, and not doing much division (or at least, as we will see in the RSA demo, we are more interested in the quotient and remainder), all 3 libs are good candidates and GMP may be even better as it provides the largest number of functions. But if we need a floating point result with arbitrary precision, GMP is out of consideration and we will rely on BC and php-bignumbers.
Now, let’s get into the real world demonstration.
PI is an irrational number, so its value can be calculated to any number of digits. The calculation of PI is also useful for testing the speed of a PC and the optimization of an algorithm.
As this calculation involves floating numbers, we will use BC and php-bignumbers. For the two programs, we will use the simplest way to calculate PI:
This is not the fastest way to calculate PI as its convergence speed is a bit slow, but never mind.
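The formula itself is not reproduced in the text above. The figures quoted later in this article (3.1415916986595… at n = 999,999) are consistent with the Basel series, π²/6 = 1/1² + 1/2² + … + 1/n², so the sketch below assumes that series. It also assumes the bcmath extension is loaded; bc_pi_basel() is a hypothetical helper name, and microtime() is used here only to get a finer stopwatch. This is an illustration of the approach, not the benchmark code itself.

```php
<?php
// Sketch only. Assumptions: bcmath is loaded, and the series is the Basel series
//   pi^2 / 6 = 1/1^2 + 1/2^2 + ... + 1/n^2
// bc_pi_basel() is a hypothetical helper name.
function bc_pi_basel(int $n, int $scale): string
{
    $sum = '0';
    for ($k = 1; $k <= $n; $k++) {
        $square = bcmul((string) $k, (string) $k, 0); // k^2 as an exact integer
        $sum    = bcadd($sum, bcdiv('1', $square, $scale), $scale);
    }
    // pi = sqrt(6 * sum)
    return bcsqrt(bcmul($sum, '6', $scale), $scale);
}

$start   = microtime(true);
$pi      = bc_pi_basel(10000, 50); // small n so the sketch finishes quickly
$elapsed = microtime(true) - $start;

echo $pi, "\n";
printf("%.3f seconds\n", $elapsed);
```

With this series the error shrinks only in proportion to 1/n, which is why even millions of terms yield just a handful of correct digits, no matter how large the scale is.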
To benchmark the speed of these two libraries, we will set a stopwatch. As I know it is a lengthy process, I will use the PHP built-in time function to get the timestamps and calculate the difference – I can tell you now that the difference is obvious. We will also check whether the two results are identical (not close enough, but identical).
n used in the above formula will be set to 1,999,999. This is to avoid too much time spent on finishing the computation on my PC.
Also, to avoid extra function call overhead, we will not use any extra user functions. This may create some redundant lines but let’s bear with it.
I have done some basic optimization in the calculation. More optimization suggestions are more than welcome!
The code (and the RSA code shown later) has been uploaded to Github.
I have tested the performance (both in terms of speed and accuracy) with different values of n, and the table below shows the summary:
Frankly speaking, this result really disappoints me. With a max N set to 2 million, we only achieve accuracy to the 6th decimal place (3.141592) compared to the real PI. But this is due to the convergence speed of the algorithm itself.
Both methods (BC and php-bignumbers) produce nearly identical results: for 1000+ digits, they only differ after the 990th digit. This is a difference on the order of 10^-990, which is small enough for us to treat their outputs as identical. In other words, given the same algorithm, these two libs give the “same” result.
However, there is a huge gap in speed. Roughly, BC takes only 1/3 of the time that php-bignumbers takes. I believe this is because the php-bignumbers lib spends a lot of overhead on OO-related processing.
Both libs represent arbitrary precision numbers as strings at the PHP level, formatted to the number of digits we specified (GMP, by contrast, keeps its integers in its own native format). String manipulation will also slow down the performance. There’s also the fact that BC is written and executed at a low level, in C, while php-bignumbers is just a high level PHP library.
If we decrease the precision to 100 digits, the time consumed drops dramatically. When n=999,999, BC and php-bignumbers take 15s and 85s, respectively, and the accuracy of the output is still preserved. The results from the two libs differ from the 95th digit out of 100+ digits. We are still comfortable saying they are identical.
So the suggestion is that, at this stage, BC is still recommended. It provides the same accuracy but is much faster. To save time when doing such arbitrary precision calculations, it is highly recommended to set a lower scale of precision. For BC, the time is roughly 1/10 if the scale is 1/10 (1000 digits costs 128s while 100 digits costs 15s). For php-bignumbers, it is only 1/5 though.
Just to bring our discussion to the extreme, I have inserted a few lines of code to see what the result will be if we don’t rely on any libraries and just use the built-in floating numbers.
The result is astonishing. When n is set to 999,999, the built-in floating point number gives a result of 3.1415916986586, almost identical to what we get from BC (3.1415916986595…). But guess what? It finishes almost instantaneously!
Thus my conclusion is this: if we don’t really care about the digits at and beyond the 10th digit, we probably shouldn’t use any of the arbitrary precision libs at all.
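The built-in float version can be sketched in a few lines. The series is an assumption here: the formula is not reproduced in the text, but the Basel series π²/6 = 1/1² + … + 1/n² is consistent with the 3.1415916986… figure quoted above. float_pi_basel() is a hypothetical helper name.

```php
<?php
// Sketch: the (assumed) Basel series evaluated with PHP's built-in floats.
// float_pi_basel() is a hypothetical helper; no extension is required.
function float_pi_basel(int $n): float
{
    $sum = 0.0;
    for ($k = 1; $k <= $n; $k++) {
        $sum += 1.0 / ($k * $k);
    }
    return sqrt(6.0 * $sum);
}

$start = microtime(true);
$pi    = float_pi_basel(999999);

printf("%.13f (%.3f seconds)\n", $pi, microtime(true) - $start);
```

A double carries roughly 15 to 16 significant digits, and rounding errors accumulate over a million additions, so the last few printed digits should not be trusted.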
An RSA demo
RSA encryption used to be very much trusted, but a recent piece of news put it under the spotlight, alleging that RSA (the company) cooperated with the NSA to plant a backdoor in one of its encryption products. Nevertheless, the algorithm itself is still worth discussing here.
A detailed description of RSA is beyond the scope of this article. Interested readers can refer to its Wikipedia page for a detailed explanation. We will just walk through the code we write to implement RSA encryption and decryption.
Note: The code and the step-by-step illustration below are inspired by this article, written in Chinese by a Chinese programmer, Mr Ruan.
The preparation work for our RSA demo is illustrated as below:
We won’t cover the algorithm that is behind the generation of these numbers. So far, we just need to know that there are 6 numbers generated/selected after these steps: p, q, n, φ(n), e and d. We also know the public key is generated as (3233, 17) and private key generated as (3233, 2753).
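The relationships between these six numbers can be checked with ordinary PHP integers. In the sketch below, p = 61 and q = 53 are supplied as an assumption (they are the prime factors of 3233; the text above does not list them), and gcd() is a small helper written for this example.

```php
<?php
// Sketch: sanity-check the six RSA numbers with plain integers.
// Assumption: p = 61 and q = 53, the prime factors of n = 3233.
$p = 61;
$q = 53;
$n   = $p * $q;             // modulus, shared by both keys
$phi = ($p - 1) * ($q - 1); // Euler's totient of n
$e = 17;                    // public exponent
$d = 2753;                  // private exponent

// Euclid's algorithm; helper written for this example.
function gcd(int $a, int $b): int
{
    while ($b !== 0) {
        [$a, $b] = [$b, $a % $b];
    }
    return $a;
}

echo "n = $n\n";                                // n = 3233
echo "phi(n) = $phi\n";                         // phi(n) = 3120
echo "gcd(e, phi) = ", gcd($e, $phi), "\n";     // gcd(e, phi) = 1
echo "e*d mod phi = ", ($e * $d) % $phi, "\n";  // e*d mod phi = 1
```

The last two lines are the conditions that make the key pair work: e must share no factor with φ(n), and d must be the inverse of e modulo φ(n).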
Encrypting a number (strings should be converted to ASCII or Unicode character by character) means finding c in the calculation below:
m^e = c (mod n), and m<n
That is, to find the remainder of m^e divided by n.
Let’s say we would like to encrypt the letter ‘A’; its ASCII code is 65, then:
65^17 = 2790 (mod 3233)
c=2790. We can send 2790 to my partner and he will use the private key (3233, 2753) to decrypt it. (In practice, we will receive the public key from my partner and he will keep the private key to himself.)
The decryption will be like solving the following:
c^d=m (mod n)
That is, to find the remainder (m) of c^d divided by n:
2790^2753 = 65 (mod 3233)
That’s it! My friend decrypted my message (2790) and got 65, which is exactly the ASCII code for ‘A’!
The PHP program to do this process is shown below. I am using BC first and then GMP:
<?php
$letter = 'A';
$m = ord($letter); // ASCII code of letter 'A'
$d = 2753;
$e = 17;
$n = 3233;

echo "Encryption / Decryption using BC:\n";
$c = bcmod(bcpow($m, $e, 40), $n);
$res = bcmod(bcpow($c, $d, 40), $n);

echo "Original data: $letter\n";
echo "Encrypted data: $c\n";
echo "Decryption:\n";
echo "Input: $c\n";
echo "Decrypted: $res\n";
echo "The letter is: ".chr($res)."\n";

echo "=========================\n";

echo "Encryption / Decryption using GMP:\n";
$c = gmp_powm($m, $e, $n);
$res = gmp_powm($c, $d, $n);

echo "Original data: $letter\n";
echo "Encrypted data:".gmp_strval($c)." \n";
echo "Decryption:\n";
echo "Input: ".gmp_strval($c)."\n";
echo "Decrypted: ".gmp_strval($res)."\n";
echo "The letter is: ".chr(gmp_strval($res))."\n";
The results are shown in below screen shot:
BC and GMP perform equally well in this demo. The encryption and decryption are equally fast and finish instantaneously. As the RSA algorithm only involves big integers, the accuracy is well preserved.
Something not shown here is that both libraries can handle really big numbers. For example, in our decryption process, we calculated a very big number (2790^2753, which has 9486 digits)!
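It is worth noting that the huge intermediate number is not strictly necessary. Reducing modulo n after every multiplication keeps all values small, which is what the libraries’ modular power functions (bcpowmod, gmp_powm) are for. The sketch below shows the idea with plain PHP integers; mod_pow() is a hypothetical helper written for this illustration and is only safe while the square of the modulus fits in a PHP integer.

```php
<?php
// Sketch: modular exponentiation by repeated squaring (square-and-multiply).
// mod_pow() is a hypothetical helper; it is only safe while $mod * $mod
// fits in a PHP integer, which is true for the toy modulus 3233.
function mod_pow(int $base, int $exp, int $mod): int
{
    if ($mod === 1) {
        return 0;
    }
    $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        if ($exp & 1) {                        // current lowest bit of the exponent is set
            $result = ($result * $base) % $mod;
        }
        $base = ($base * $base) % $mod;        // square for the next bit
        $exp >>= 1;
    }
    return $result;
}

$c = mod_pow(65, 17, 3233);   // encrypt 'A'
$m = mod_pow($c, 2753, 3233); // decrypt

echo $c, "\n";      // 2790
echo chr($m), "\n"; // A
```

For real key sizes the intermediate products no longer fit in a native integer, so a big number library is still needed; the point is only that no 9486-digit number ever has to be built.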
The demo above, of course, can’t be used as a serious RSA program. The length of the key (the length of 3233 expressed in binary) is only 12 bits. In actual usage, the length should normally be set to 1024 bits, or 2048 bits for particularly sensitive information.
In this article, we reviewed the 3 libraries that provide big integer and/or arbitrary precision calculation capabilities: BC Math, GMP, php-bignumbers.
The author’s recommendation is to use BC Math as it supports both integers and floating numbers and it also runs pretty fast. Or, we can choose to rely on PHP’s built-in floating numbers if we only care about the first 10 digits or less.
To conclude this article, I will display an encrypted message below using the above RSA configuration. Let me know if you can decrypt it! (The code to decrypt the below sequence is also included on Github)
The message goes:
1486 1992 745 2185 2578 1313 1992 884 2185 1992 281 2185 2235 884 2412 3179 2570 2160 884 1313 1992 884 2185 1992 2680 3179 884 1313 612 2185 3179 2235 884 1853