Impractical Experiments #1: Representing Numbers as Polynomials

posted by Craig Gidney on September 24, 2013

In this post: trying to represent numbers like \sqrt[3]{2 + \sqrt{3} \cdot 5} in a nice way.

Roots and Numbers

I like arbitrary precision arithmetic. I don’t end up using it that much, but I like having the option of doing things with perfect accuracy. Not having to care about overflow, not having to care about rounding error, just… being correct by default.

That is, until I have to compute a square root and am invariably forced to sacrifice my nice correctness properties. I want all the following properties, but square roots come along and ruin everything:

  • Exact: Most square roots are irrational. This makes it difficult to represent them exactly, without introducing any error. You’ll never find integers n and d such that \frac{n}{d} = \sqrt{2}, so forget about using a BigRational.
  • Closed: It’s not enough to represent simple roots, you have to be able to add and multiply without moving to a different type. For example, a naive representation like { base: Rational, root: Int } is exact but not closed. It can represent 1 as { base:1, root:1 } and \sqrt{2} as { base:2, root:2 }, but it can’t represent 1 + \sqrt{2}.
  • Rigid Equality: Number representations flexible enough to represent square roots tend to be so flexible that you can’t actually compute equality. For example, suppose you’re representing numbers as functions that output more and more digits of their corresponding number. Given functions f and g, corresponding to numbers x_f and x_g, how do you determine if x_f = x_g? If they’re not equal you can just find a digit that differs, but showing they’re equal requires showing all of the digits are the same. This will either get you stuck in a loop or require a halting oracle to analyze the functions (i.e. the problem is incomputable). I’d prefer to wait not forever to discover that, yes, \sqrt{3} = \sqrt{5 + \sqrt{24}} - \sqrt{2}.
  • Minimal: One way to represent roots is to use a full computer algebra system. For the purposes of this post, I’m going to consider that massive overkill. Is there a simpler representation, that can handle roots?

A couple months ago, I came up with an idea for a number representation that seemed to satisfy each of these properties: the roots of polynomials with integer coefficients. Consider: if you solve for x in x^3 - 2 = 0, what do you get? The only real solution is the cube root of 2. That’s kind of interesting: we can think of x^3 - 2 = 0 as an indirect representation of \sqrt[3]{2}.

In fact, a wide class of numbers can appear as the roots of a polynomial and this has been investigated mathematically by plenty of people. The so-called algebraic numbers include not just roots but “Any expression formed using any combination of the basic arithmetic operations and extraction of nth roots gives an algebraic number” (and more). That sounds like exact and closed to me.

You can also check if polynomials share a root using polynomial gcd (rigid equality!). It’s a bit disappointing that transcendental numbers like euler’s constant can’t appear as polynomial roots, but from another point of view that’s just being minimal.

Roots of integer polynomials seem to have the properties I want. On the other hand, because computing polynomials roots involves approximating, I’ll need to somehow work with the roots without actually computing them. Time to experiment.

Basic Arithmetic

Let’s start by figuring out how to represent simple numbers, and do arithmetic on them involving other simple numbers.

Conversion

How do we represent an integer n as the root of a polynomial? Easy: it’s the root of x - n = 0. If we’re representing polynomials as a little-endian list of integer coefficients, that means our representation of 5 will be [-5, 1].

Representing a rational number \frac{n}{d} is one small step further. The polynomial x-\frac{n}{d} = 0 doesn’t have integer coefficients, but we can multiply both sides by d to get the equivalent d \cdot x - n = 0 which does. That’s [-n, d].

Shifting

How do we increase the root(s) of a polynomial by 1? We use the fact that, to shift a function p(x) to the right, you just use p'(x) = p(x-1).

More precisely, to increase the roots of c_n x^n + ... + c_2 x^2 + c_1 x + c_0 by 1 we must subtract 1 from each x like so: c_n (x-1)^n + ... + c_2 (x-1)^2 + c_1 (x-1) + c_0. We then need to expand the terms (using the binomial theorem) and simplify the result (I didn’t say we had to be fast…).

For example, x^2 - 3x + 2 has roots 1 and 2. To add 1 we expand (x-1)^2 - 3(x-1) + 2 to x^2 - 5x + 6 and find that it does in fact have the roots 1+1 and 2+1.

Actually, the technique of “do the opposite to x” lets us do a whole bunch of operations. To add c to the roots of p we use p(x-c). To subtract c from the roots of p we use p(x+c).

Scaling

To multiply the roots of p by c we just do the opposite to x and use p(x/c). This just divides the coefficient of x^i by c^i for each i. The resulting coefficients may no longer be integers, but we can fix that by scaling the coefficients by the least common multiple of the denominators.

For example, multiplying the roots of x^2 - 3x + 2 by 5 gives \frac{1}{25}x^2 - \frac{3}{5}x + 2. We cancel the denominators by multiplying by 25, giving x^2 - 15x + 50 which has roots 1 \cdot 5 and 2 \cdot 5.

When the scaling factor is -1, meaning we’re just negating our input, dividing x^i by c^i corresponds to just flipping the coefficients of odd powers of x.

Division

The multiplicative inverse of x is \frac{1}{x}. When we try to get the multiplicative inverse roots of p(x) by doing the opposite to x, we end up with p(\frac{1}{x}). That’s not a polynomial, but we can fix it.

For example, x^2 - 3x + 2 turns into \frac{1}{x^2} - \frac{3}{x} + 2. We fix it not being a polynomial by multiplying both sides by x^2. The result is 1 - 3x + 2x^2, which has roots that are the multiplicative inverses of 1 and 2: 1 and \frac{1}{2}.

The multiplicative inverse of roots operation has a much simpler formulation. Did you catch it? Just reverse the coefficients.

Division works exactly like multiplication when p(x) is used for the numerator. When p(x) is used in the denominator, we just use the multiplicative inverse operation before multiplying.

For example, to represent the result of dividing 5 by the roots of x^2 - 3x + 2 we just take the multiplicative inverse, which is 2x^2 - 3x + 1, and multiply its roots by 5 to get 2x^2 - 15x + 25. The result has the right roots: \frac{5}{1} and \frac{5}{2}.

N’th Roots

To take the n‘th root of the roots of p we do the opposite to x and use p(x^n). This is equivalent to scaling each exponent by a factor of n.

That’s right, to cube root the roots of x^2 - 3x + 2 we only need to multiply the exponents by 3 to get x^6 - 3x^3 + 2. It has roots 1 and \sqrt[3]{2}, which is exactly what we wanted.

Equality

To determine if a rational \frac{n}{d} is a root of p(x), we just evaluate p(\frac{n}{d}) and check if the result is 0. You can use a BigRational to do the evaluation, since all the relevant quantities (the input and the coefficients) are integers or rationals.

Summary of Operations

When only one of the arguments to an operation is represented implicitly as the roots of a polynomial, you can get an implicit polynomial result as follows:

  • Conversion: A rational \frac{n}{d} becomes the polynomial d \cdot x - n = 0.
  • Equality: Check p(c) = 0.
  • Addition: Expand and simplify p(x - c).
  • Subtraction: Expand and simplify p(x + c).
  • Negation: Flip coefficients of odd powers.
  • Multiplicative Inverse: Reverse coefficients.
  • Multiplication: Scale coefficient of x^i by c^{-i} for each i. Cancel common denominator.
  • N’th root: Scale exponents by n.
  • Division (p as numerator): Scale coefficient of x^i by c^i for each i.
  • Division (p as denominator): Combine the multiplicative inverse and multiplication operations.
  • Exponentation: Unsure…

I hope this has given you a sense of how operations work on the roots, without actually computing the roots. Things are about to get more complicated.

Operations on Roots of Two Polynomials

The real challenge to using polynomials to represent numbers is operating on two at once. I know, because I filled pages and pages of notebook trying to rewrite the coefficients of the polynomial you get by adding or multiplying the roots of two polynomials in terms of the coefficients of the other two. This turned out to be the wrong approach, but I’ll go over it briefly.

A polynomial can be written as the product of its roots r_i like this: (x-r_1)(x-r_2)...(x-r_n). More compactly, given the list of roots R, you can write \prod_{r \in R}(x-r).

A polynomial’s coefficient c_i, which would be multiplied by x_i in the standard sum-of-powers form, can be computed by adding up all the ways you can choose i factors of x when expanding the product form of the polynomial. More exactly, c_i = \sum_{S \in \binom{R}{|R| - i}} (\prod S), where \binom{R}{n} is the list of all the ways you can choose n elements from the list of roots R.

If the roots of our input polynomials are R_1 and R_2 respectively, and we want to add the roots, then the resulting polynomial must have the root r_3 = r_1+r_2 whenever r_1 \in R_1 and r_2 \in R_2. Formally, R_3 = \biguplus_{r_1,r_2 \in R_1 \times R_2}(r1 + r2), where \biguplus is “make a list of these items” in the same sense that \sum is “sum up these items”.

Given R_3, we can compute its coefficients c_i = \sum_{S \in \binom{R_3}{i}} (\prod S). All we need to do is rewrite that equation for c_i into a form that only mentions the coefficients derived from R_1 and R_2, instead of directly using R_1, R_2, or R_3.

Easy!… Not so much.

I did make progress, using this approach. I managed to separate the effects of the two lists of roots, but got stuck at a point where I was dot-producting all the decreasing sequences that sum up to whatever into the exponents of …

You know what, nevermind the details of how I was failing. The point is that eventually I asked for help from the math stack exchange. Within an hour, the problem was solved. Three different ways.

The solution I implemented is based on creating and solving a linear system. Unfortunately, despite implementing it and even tweaking it to work for addition, I don’t understand it well enough to explain it. The answer I was given is quite good, though.

Normally I’d work at understanding the code until I could know it was correct. In this case there are more serious issues that make the point a bit moot.

In Practice

I realized ahead of time that representing numbers as polynomials was going to have issues. That’s why it was an impractical experiment. Regardless, once I finished implementing the code, I could finally try out my numbers-as-polynomials and confirm that the expected problems were present.

The approach works fine on simple cases:

PolyNumber n = 5; // x - 5 = 0
PolyNumber v = n.Root(3); // x^3 - 5 = 0
double[] roots = v.Approximates(); // { 1.7099757194519043 }

But things can get quite slow, as the degrees of polynomials multiply together as you operate:

PolyNumber n = 5; // x - 5 = 0
PolyNumber v = n.Root(5) + n.Root(7); // takes MULTIPLE SECONDS
// x^35 + -35x^30 + -25x^28 + 525x^25 + -56875x^23 + 250x^21 + -4375x^20 + -2996875x^18 
// + -1242500x^16 + 21875x^15 + -1250x^14 + -24609375x^13 + 48365625x^11 + -65625x^10 
// + -2800000x^9 + -37734375x^8 + 3125x^7 + -64421875x^6 + 109375x^5 + -9406250x^4 
// + -6562500x³ + -328125x² + 546875x + -81250
double[] roots = v.Approximates(); // { 2.6382288932800293 }

And don’t forget that numbers have two square roots, meaning we end up tracking multiple solutions a lot:

PolyNumber n = 2; // x - 2 = 0
PolyNumber v = n.Root(2) + 1; // x² + -2x + -1
double[] roots = v.Approximates(); // { 0.41421365737915039, 2.4142136573791504 }

It’s also possible to end up with multiple solutions despite never using an operation with two real results. For example, computing \sqrt[3]{2} + \sqrt[9]{8}:

var two = (PolyNumber)2; // x - 2
var eight = (PolyNumber)8; // x - 8
var p = two.Root(3) + eight.Root(9); // x^24 + -20x^21 + 184x^18 + -1064x^15 + -3344x^12 + -146528x^9 + -304064x^6 + -25472x³ + -2048
var roots = p.Approximates() // { -1.2599220275878906, 2.5198402404785156 }

Where did that second root come from? Well, the root diagram shown on wolfram alpha gives a clue:

complex roots

Oh right, polynomials have complex roots and our addition operation adds roots together. This is inconvenient, despite being kind of magical since I didn’t try to deal with the complex case at any point. Sometimes adding two complex roots cancels their imaginary parts and poof you get a brand new real root you didn’t want!

Well, so much for that idea.

Summary

It’s possible to represent numbers implicitly as the roots of polynomials, and you get a lot of nice properties doing so, but ultimately the result is slow and has a nasty tendency to introduce “bonus” solutions.

You can browse my implementation on github.

Update: Commenters have mentioned Holonomic functions, which generalize this idea to differential equations, as well as the Cyclotomic fields. Other people have even had the same idea.

Discuss on Reddit

My Twitter: @CraigGidney

  • chrisdew

    Have you thought of representing variables as the list of operations performed on them, with a simplification pass on each operation, and a final calculation, done only at display time? Analogous to how some ORMs merge multiple filters and merge and eliminate redundant sorts.

    • CraigGidney

      Yes, but that’s overkill and very complicated to get right. When you go to simplify an expression it’s hard to ensure you have all the necessary rules, and hard to know when to stop simplifying. Exact equality also remains very difficult without some sort of structure beyond “do these operations”.


Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or Give us a shout if you have anything you think some really rad engineers should help you with.

Archive

More interesting posts (7 of 29 articles)

Or check out our Portfolio.