End of Project Report
Ellen Preiss
Undergraduate Research Assistant
Cryptography
William McCallum
This summer’s research in cryptography was aimed at finding
ways to keep the public key of the RSA cryptosystem secure. There were four students working on
different ways to break the public key, and therefore find what needs to be done
to make it stronger against attacks.
One topic studied was producing large primes, another was attacking the
exponents in the RSA algorithm, and another student and I looked at factoring
algorithms.
RSA works by picking two large prime numbers and multiplying them together to get
a number n. Then you pick a small encryption exponent a. To encrypt, you raise your message
to the encryption exponent, modulo n, to get a coded message. The public key
includes n and a. One natural way to break
the code would be to factor n. This, however, is not easy with the
large numbers we are dealing with.
So I looked at some algorithms which make factoring a bit more
efficient. As a result of this work, we
hoped to see what kind of prime numbers are the best to choose to make a safe
public key.
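As a rough illustration of the scheme just described, here is a toy sketch in Python; the primes and message are tiny made-up values, and the decryption exponent d (the inverse of a modulo (p-1)(q-1), not discussed above) is included only to check the round trip:

```python
# Toy RSA key setup and encryption (illustration only; real RSA
# uses primes of a hundred digits or more).
p, q = 61, 53                # two "large" primes (tiny here)
n = p * q                    # public modulus n = 3233
a = 17                       # small public encryption exponent
phi = (p - 1) * (q - 1)
d = pow(a, -1, phi)          # private decryption exponent (needs Python 3.8+)

m = 42                       # the message, encoded as a number < n
c = pow(m, a, n)             # encrypt: c = m^a mod n
assert pow(c, d, n) == m     # decrypt: m = c^d mod n
```

Factoring n = 3233 back into 61 and 53 would immediately reveal phi and hence d, which is why the rest of this report is about factoring.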
Before we could begin any of the research, we learned some
modular arithmetic and other aspects of number theory. Then we learned about public key
cryptosystems, particularly the RSA encryption system. We sent coded messages to each other and
practiced decoding messages we received, to get an idea of what this type of
cryptography is all about. I then read parts of a textbook about cryptography
called “Making, Breaking Codes,” by Paul Garrett, where I was introduced to a
number of factoring algorithms.
Pollard’s Rho and p-1 methods are simple examples of algorithms which
find a random mapping of numbers and test them for factors. I studied these factoring algorithms, and
then implemented them to see what types of numbers need to be avoided so the
key would still be hard to factor.
After I had a basic understanding of each factoring algorithm,
I would program it in Mathematica and use that to factor numbers from one to
thirty digits, or as far as I could go in a reasonable amount of time. Then I would look for a function of n for
the running time and also for the number of steps through each loop. This gave an idea of how useful the methods
would be for larger numbers. I made
tables of the data I found, then used Excel to fit a function to the data. For example, I used the formula r(n) = a*n^b,
and then took the log of that to get log(r(n)) = log(a) + b*log(n). This is in the form y = mx + b, so the slope
of log(r(n)) versus log(n) was b, and the y-intercept was log(a). This method gave some fairly close formulas,
though it did not always work because of the type of data I had.
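The fitting procedure described above was done in Excel, but a minimal sketch of the same log-log fit can be written in Python (the data here is synthetic, generated from an assumed power law a = 0.5, b = 0.25, purely to show the fit recovering those values):

```python
import math

# Synthetic (n, running-time) pairs generated from r(n) = a * n^b
# with assumed values a = 0.5, b = 0.25.
data = [(10 ** k, 0.5 * (10 ** k) ** 0.25) for k in range(1, 6)]

# Taking logs: log(r(n)) = log(a) + b * log(n), i.e. a line y = m*x + c.
xs = [math.log(n) for n, r in data]
ys = [math.log(r) for n, r in data]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)

# Least-squares slope (b) and intercept (log a) of the log-log line.
b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
a = math.exp(my - b * mx)
print(round(b, 3), round(a, 3))   # → 0.25 0.5
```

On real timing data the recovered a and b are only approximate, which matches the observation above that the fitted formulas did not always work.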
For
Pollard’s Rho Method, I changed the algorithm 13 times to see how the running
time and number of steps needed to factor would be altered, so there are 13
tables for this. For Pollard’s P-1
Method, all I changed was the value of b, so there are two tables. (Each group of tables is in one file, with
separate sheets.)
As you can see when you
view the Rho tables, the functions r(n) and a(n) are fairly close to the real
values, but as the value of n grows larger, the functions are less
accurate. The functions in the p-1
tables are the same way, but r(n) and c(n) are close to the correct values.
Pollard’s Rho Method generates two sequences of
random numbers, and takes the differences of those sequences to generate a
quadratic map. Then it tests the
differences against n for a greatest
common divisor (GCD). This is better
than trial division because instead of just dividing one number into n to see if it is a factor, it looks for
a greatest common divisor between the numbers, which allows it to test a larger
number of candidates at once. Instead of
just trying to divide the number, it tries to divide the number and its
factors. This method starts with one
small number, and uses it to generate a random map, and with this map it tests
for the greatest common divisor with respect to the number being factored. So say you start with x=2, and y=x^2+1=5. If the GCD of x-y and n is between one and n,
the GCD is the factor. If it’s one, you
have to take x=x^2+1 and y=(y^2+1)^2+1, until the
GCD isn’t one anymore. If the GCD is n, you have to start over, and change
the formula for x to something of the form x^2+c, where c is not 0 or
-2. I decided to study Pollard’s Rho
Method in more detail by writing the algorithm in a computer program and
looking at how long it took to find a factor, and how many steps it required. The following is a program I wrote in
Mathematica:
n = (composite number);
f[v_] := {Mod[v[[1]]^2 + 1, n], Mod[(v[[2]]^2 + 1)^2 + 1, n]}
g[v_] := GCD[v[[1]] - v[[2]], n]
w = f[{2, 5}];
a = 1;
Timing[Do[t = g[w]; ++a; If[t > 1, Break[]]; w = f[w], {i, 10}]]
Print[t]
Print[a]
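The Mathematica program above can also be sketched in Python for readers without Mathematica; this follows the description of the method rather than the exact session above, and the test number 8051 is just a small example:

```python
from math import gcd

def pollard_rho(n, c=1, x0=2, max_steps=10 ** 6):
    """Pollard's Rho with the map x -> x^2 + c (mod n).
    Returns (factor, steps), or (None, steps) if no factor was found."""
    x = x0
    y = (x0 * x0 + c) % n             # y runs the map twice per step
    for steps in range(1, max_steps + 1):
        g = gcd(x - y, n)
        if 1 < g < n:
            return g, steps           # proper factor found
        if g == n:
            return None, steps        # failure: restart with another c
        x = (x * x + c) % n
        y = (y * y + c) % n
        y = (y * y + c) % n
    return None, max_steps

print(pollard_rho(8051))              # → (97, 3), since 8051 = 83 * 97
```

As in the Mathematica version, x advances one step of the map per loop while y advances two, so their difference eventually falls into a cycle modulo a prime factor.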
This prints the running
time, the factor (t), and the number of steps (a) to find the factor. The number 10 is changed according to how
many steps the algorithm will go through (Mathematica will not go to one
billion steps), which is said to be usually about n^(1/4). I used this program for n up to 31 digits (too much CPU time is required to look at numbers
much larger), and then looked for formulas relating the running time to n, and the number of steps to n.
After this I started changing the formula in the algorithm, to see why
Pollard wrote it the way he did. Looking
at how many steps you could go through before the run time got to 100
seconds, linear functions went 13 steps, cubic functions went 16 steps, and Rho
functions went 24 steps. This shows
that the Rho formulas take the least amount of time to run through. Formulas of the form x=x^2+c
worked the quickest, also with the least number of steps needed, overall. So his random mapping appears to be just
right. I also looked at how the method
works with prime numbers that are very far apart, making one much smaller, and
then multiplied together to get a different type of n. Of course, this was a
much faster algorithm, because Pollard’s Rho Method is fastest for small prime
factors up to about 7 digits. This
means that it is not very efficient for the RSA cryptosystem, where we could
easily be working with factors with up to 100 digits. The research with this method is useful, however, in finding ways
to make the public key stronger.
Pollard’s Rho Method shows that the prime factors need to be about the
same size, but other factoring algorithms work best precisely when the two
factors are about the same size. So there is a
balance in there somewhere.
When
I changed the algorithm to a linear function (y=x+c), I saw that it was simply
trial division. This is because the
differences of the two random sequences found generated a map which was
actually the sequence of integers in order from c and up. This means it was testing for a GCD the same
way as you test for a factor in trial division, and the map was not
random. So the factor was equal to the
number of steps it took to find a factor.
When I found a formula for the number of steps with regards to n, I saw
that this was close to the actual factor.
It was about the square root of n, which is what is said to be close to
the factor of n when the prime numbers are about the same size. Of course, this only worked because I was
using prime numbers close together, so the formula would change for different
numbers.
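The degeneration to trial division described above is easy to check with a small Python sketch (n = 91 and the starting values are made up for illustration):

```python
from math import gcd

# With a linear map x -> x + 1, the difference x - y grows by one each
# step, so the GCD test just walks through 1, 2, 3, ... -- trial division.
n = 91                        # 7 * 13, a made-up example
x, y = 2, 3                   # y starts one linear step ahead of x
for steps in range(1, n):
    g = gcd(x - y, n)         # |x - y| equals the step count
    if 1 < g < n:
        break
    x = x + 1                 # x takes one linear step
    y = (y + 1) + 1           # y takes two linear steps
print(g, steps)               # → 7 7: the factor equals the step count
```

This matches the observation above: with a linear map the number of steps equals the factor found, exactly as in trial division.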
Pollard’s P-1 Method
This
method works well to find prime factors p so that p-1 is divisible by only
relatively small factors. This tells us
that another way to make a RSA public key secure is to make sure the factors of
the prime factor, of n, minus one are large.
In the p-1 method, you choose b where 2 ≤ b ≤ n-1. Then you start off by taking the greatest
common divisor of b and n, and if that is greater than or equal to 2, you’re
done and the GCD is the proper factor.
If not, you take the first possible prime factor p=2, and calculate
l=Floor(ln(n)/ln(p)), and b = b^(p^l) mod n, and then take g=GCD(b-1,n). While g=1, you continue going back through
this loop, making p the next greater prime number, until 1<g<n. If g=n, there’s a failure and b will have to
be changed. The following is the p-1
method in Mathematica:
n = (composite number);
Timing[r = 0; b = 2; g = GCD[b, n]; c = 1;
  If[g >= 2, {Print[g]},
    {t = 1,
     While[g < 2,
       {p = 1,
        While[p < 2, {++t, If[PrimeQ[t], p = t, p = 1], r++}],
        l = Mod[Floor[Log[n]/Log[p]], n],
        b = PowerMod[b, p^l, n],
        g = GCD[b - 1, n],
        c++}]}]]
Print[g]
Print[c]
Print[r]
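A rough Python sketch of the same p-1 procedure follows the loop structure above; the helper is_prime and the test number 299 = 13 * 23 are my own additions for illustration:

```python
from math import gcd, log

def is_prime(t):
    # simple trial-division primality test (fine for small t)
    if t < 2:
        return False
    d = 2
    while d * d <= t:
        if t % d == 0:
            return False
        d += 1
    return True

def pollard_p_minus_1(n, b=2, bound=10 ** 5):
    """Pollard's p-1: succeeds when some prime factor p of n has p - 1
    built only from small primes. Returns a factor of n, or None."""
    g = gcd(b, n)
    if g >= 2:
        return g                        # lucky: b already shares a factor
    for p in range(2, bound + 1):       # the "prime loop"
        if not is_prime(p):
            continue
        l = int(log(n) / log(p))        # highest useful power of p
        b = pow(b, p ** l, n)           # b = b^(p^l) mod n
        g = gcd(b - 1, n)
        if 1 < g < n:
            return g                    # proper factor found
        if g == n:
            return None                 # failure: choose a different b
    return None

print(pollard_p_minus_1(299))           # → 13, since 299 = 13 * 23
```

Here 13 - 1 = 12 = 2^2 * 3 is built from very small primes, so the method finds 13 after only the primes 2 and 3, which is exactly the weakness the method exploits.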
This program gives the
time used and prints the factor (g), the number of times through the GCD loop
(c), and the number of times through the prime loop (r). I used this program for numbers of up to 28 digits and
waited a couple of hours on a 29-digit number before stopping.
An
interesting thing I found when experimenting with different values of b was
that when b was set to the sixth root of n (b = n^(1/6)), it went the same number of times through each
of the loops, but the time used was at least cut in half. I tried a number of other values for b and
this was the fastest. This was an
interesting find, but for our purposes it only adds to the fact that the value
of n should be large. There could be
other tricks to make this method faster, but until they’re found, it is also
not very useful when trying to factor very large RSA numbers.
The
two main methods I studied, Pollard’s Rho and p-1 methods, were good tools to
study factorization and some cryptography.
No huge breakthroughs were found, but they were interesting to look at,
and they gave us guidelines to follow in order to choose a strong public key,
which was our goal. Pollard’s Rho
method shows us that p and q should be large, and his p-1 method points out how
the factors of p-1 and q-1 should also be large, when p*q=n. These two tips protect the public key from
two easy attacks on the RSA Cryptosystem.
This is a good beginning, and combined with the efforts of the other
students and their work this summer, our team could make a fairly secure public
key.
According
to RSA Laboratories, the best factoring method for large numbers is the general
number field sieve, although it is very complex. Other good ones are the quadratic sieve algorithm and
the elliptic curve method. These would
take more knowledge of number theory and a lot more time to look over. So this project has been a good
introduction to the basics of cryptology and number theory, and to the way research
works.