posted by Craig Gidney on March 26, 2013

Yesterday, I read the post “Of intersecting linked lists” by one Paul Masurel. In it, he discusses the problem of determining the first node that is common to two linked lists. He ends with a solution that takes O(d) time and O(d) space, where d is the maximum distance to the first common node from the head of either list. I would inline the solution here, but it’s fifty lines of carefully chosen branches (euphemism: “branch noise”).

There is actually a solution much simpler than Paul’s: one just as time efficient, but requiring only constant space.

### From n to d

The comments about the post on reddit almost came up with the right algorithm. People realized that if you computed the lists’ lengths then you could line up the heads so they advanced to the common node in sync. This removes the need for any sort of collection of seen nodes, reducing the space requirements to constant.
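The reddit approach can be sketched in a few lines. This is a Python rendering (the post’s own code is C#), with a hypothetical `Node` class holding a `next` field:

```python
class Node:
    def __init__(self, next=None):
        self.next = next

def length(head):
    """Count the nodes from head to the terminating None."""
    n = 0
    while head is not None:
        head = head.next
        n += 1
    return n

def first_common_node_via_lengths(h0, h1):
    """O(n) time, O(1) space: line the heads up, then advance in sync."""
    n0, n1 = length(h0), length(h1)
    # advance the head of the longer list until both heads are
    # equally far from the end (a negative range is simply empty)
    for _ in range(n0 - n1):
        h0 = h0.next
    for _ in range(n1 - n0):
        h1 = h1.next
    # the heads are now the same distance from the end, so advancing
    # them together must land them on the first common node (or None)
    while h0 is not h1:
        h0 = h0.next
        h1 = h1.next
    return h0
```

If the lists never intersect, both heads reach the terminating `None` together and `None` is returned.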

The problem is that computing the lengths of the lists takes O(n) time, instead of O(d) time. As people realized: although O(n) is good, O(d) is better. d may be as large as n, but it may also be significantly smaller. For example, we might be dealing with a class of lists that satisfy d ≪ n because they tend to intersect very quickly.

(Even worse, we might be dealing with implicit lists that never end, so n = ∞. I’d prefer not to wait an infinite amount of time to discover that [1, 2, 3, 4, …] starts intersecting [2, 3, 4, 5, …] at 2.)

Luckily, we don’t need to know the lengths of the lists. Any pair of distances to any common node will work, because we only care about the difference between the two distances. If we can find any common node then we can compute the difference in distance and easily find the earliest common node. Realize this, and you’re halfway there.

### Algorithm: Alternating Anchors

A strategy I often use when dealing with linked lists is “dropping anchors”, and it applies here. The idea is to assign a local variable (the anchor) to be the current node, and watch for the current node matching the anchor again as you progress. Each time you drop the anchor you multiply the number of steps to take, before dropping it again, by a constant factor.

I recommend trying dropping anchors on the classic “Does this linked list end in a cycle?” problem. The result (apparently called Brent’s Algorithm) is three times as fast as the more commonly known Tortoise and Hare solution.
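As a sketch of what dropping anchors looks like on that problem, here is my rendering of Brent’s Algorithm in Python (not the post’s code), again assuming a hypothetical `Node` class with a `next` field:

```python
class Node:
    def __init__(self, next=None):
        self.next = next

def has_cycle(head):
    """Brent's Algorithm: drop an anchor, walk up to step_size steps
    watching for the anchor, then pick it up, double step_size, and
    drop it at the current node."""
    anchor = head
    node = head
    step_size = 1
    while node is not None:
        for _ in range(step_size):
            if node is None:
                return False  # hit the end of the list: no cycle
            node = node.next
            if node is anchor:
                return True   # walked back onto the anchor: a cycle
        anchor = node          # pick the anchor up and drop it here
        step_size *= 2
    return False
```

Because the anchor eventually lands inside the cycle, and the step size eventually exceeds the cycle’s length, the walk is guaranteed to run back into the anchor.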

In the case of linked list intersection, we need two anchors: one for each list head. We’ll alternate between the two lists, advancing further and further while watching for the anchor of the other list. Once the other anchor is spotted, we’ve found a common node. We’ll also track the number of steps on both lists as we go, so that we know the distances to the common node without having to retrace our steps. Given the distances, it’s just a matter of synchronizing the heads and then advancing them at the same rate.

Here’s a diagram of the algorithm: In the diagram, the number of steps between alternations keeps increasing by a factor of 1.5. As soon as the number of steps exceeds d, a common node is found. (In general, when using a growth factor of 1.5, two more alternations will be sufficient after the step count exceeds d.) The distances to the node are then used to line the heads up, so they can advance in sync until the first common node is found.

Notice that one of the linked lists is so large it runs off the side of the diagram, but its long length (whether it be finite or infinite) doesn’t affect the running time. It takes O(log d) alternations for the number of steps to become large enough to guarantee a common node is found, and the last alternation dominates the running time because the number of steps per alternation is increasing exponentially. Thus the running time is O(d), independent of n, and we’ve only used a constant amount of space.

This algorithm (which I call ‘alternating anchors’) is actually somewhat similar to Paul Masurel’s. The main differences are that his keeps adding anchors to a set, instead of picking them up and dropping them again, and that he tries to do some sort of backtracking, instead of using the relative distances to sync up and rescan the lists. Not having to deal with hash sets and such makes alternating anchors faster, although it has the same asymptotic time complexity.

### Code

Assuming you don’t have cycles in your linked lists, this well-tested C# code will find the first common node:

```csharp
///<summary>Finds the first common node between two non-cyclical linked lists.
///All lists are treated as if they end with a null node.
///The null node is treated as a valid node.</summary>
public static Node FindEarliestIntersection(Node h0, Node h1) {
    // find *any* common node, and the distances to it
    var node = new[] { h0, h1 };
    var dist = new[] { 0, 0 };
    var stepSize = 1;
    while (node[0] != node[1]) {
        // advance each node progressively farther, watching for the other node
        foreach (var i in Enumerable.Range(0, 2)) {
            foreach (var repeat in Enumerable.Range(0, stepSize)) {
                if (node[i] == null) break;
                if (node[0] == node[1]) break;
                node[i] = node[i].Next;
                dist[i] += 1;
            }
            stepSize *= 2;
        }
    }

    // align heads to be an equal distance from the first common node
    var r = dist[1] - dist[0];
    while (r < 0) {
        h0 = h0.Next;
        r += 1;
    }
    while (r > 0) {
        h1 = h1.Next;
        r -= 1;
    }

    // advance the heads in sync until they meet at the first common node
    while (h0 != h1) {
        h0 = h0.Next;
        h1 = h1.Next;
    }

    return h0;
}
```

If you do have cycles, things are more complicated.

### Dealing with Cycles

When cycles are present the above code might compute the wrong relative distance, due to one side going around a cycle more times than the other. It can even go into an infinite loop, if the lists share no common nodes and one (or both) has a cycle.

Why didn’t I bother with cycles? Because they’re not well behaved with respect to the problem, especially if we want to keep that O(d) time bound.

Suppose we have the cycle [1, 2, 1, 2, …]. What’s the first common node of [1, 2, 1, …] and [2, 1, 2, …]? Both? Neither? Either? An error? How are we going to consistently detect that error, without spending Ω(n) time proving/disproving that there’s a cycle?

That being said, let’s assume that when the linked lists only intersect inside a cycle you’re willing to get back some arbitrary unspecified node from the cycle. A node that might change haphazardly under trivial changes like swapping the argument order or decreasing the growth factor. If you’re willing to bite that bullet, then two simple changes will make the algorithm “work” with cycles.

First, you need to detect getting stuck in a cycle. It’s pretty easy to inline Brent’s Algorithm into the existing code, so I recommend doing that. If both sides have detected a cycle, or one side has detected a cycle and the other has ended, then the lists are disjoint and the algorithm should terminate with failure.

Second, once a common node is found, you need to retrace the distances to that node in order to be sure no nodes were double counted due to cycling. This ensures you get the right result if the lists meet before the cycle, or if they enter the cycle at the same place.
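That retracing step can be sketched as follows (a Python sketch with a hypothetical `Node` class, not the post’s code; it assumes some common node `found` has already been located and is reachable from both heads):

```python
class Node:
    def __init__(self, next=None):
        self.next = next

def distance_to(head, target):
    """Steps taken before target is reached for the first time.
    This gives the true minimal distance even if the walk that
    originally found the node went around a cycle extra times."""
    d = 0
    while head is not target:
        head = head.next
        d += 1
    return d

def earliest_common_node(h0, h1, found):
    """Retrace to the known common node, align the heads so they are
    equally far from it, then advance them in sync until they meet."""
    d0 = distance_to(h0, found)
    d1 = distance_to(h1, found)
    for _ in range(d0 - d1):  # empty range if d0 <= d1
        h0 = h0.next
    for _ in range(d1 - d0):
        h1 = h1.next
    while h0 is not h1:
        h0 = h0.next
        h1 = h1.next
    return h0
```

As stated above, this gives the right answer when the lists meet before the cycle or enter it at the same place; otherwise the returned node is just some node of the cycle.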

### Summary

- You can find the first common node in two linked lists in time proportional to the distance to the common node, using only a constant amount of space.
- When the lists end in a cycle, the result may not be well defined.
- Dropping anchors is a good strategy to keep in mind when working with linked lists.
- I’m an easy target for snipers.

### 5 Responses to “Intersecting Linked Lists Faster”

1. matan says:

Hi there,
I’m having a hard time understanding how your algorithm works in O(d) when you have lists that look like the one in the picture I added to my comment.
I’ve tried to use your algorithm, but it just doesn’t work in cases like this.

• CraigGidney says:

I generated the animation using the code I posted (though I slightly tweaked a constant), so I’m pretty sure it works. Could you give code for a test case that fails? (I assume you know that O(d) means “proportional to d” and not “at most d”, so the scan going past the intersection by an amount proportional to the prefix length is expected and allowed.)

• m.t says:

Perhaps I misunderstood your explanation of why it takes O(log_{1.5} d) alternations for the number of steps to become large enough to guarantee a common node.
Can you add some kind of proof?

• CraigGidney says:

The main things to understand are that 1) eventually the difference between number of steps will be larger than d, 2) once that happens the algorithm will terminate either on the current run of steps or the run of steps after it (depending on which side started off behind), and 3) exponential processes have their runtimes dominated by the largest step.

Since the number of steps keeps getting larger, eventually it will exceed d. Because the increase is exponential and we stop within two steps of exceeding d, the last step will exceed d by at most a constant factor. Because the increase is exponential, we get the same asymptotic complexity by simply considering the final step (which was at most ~d*1.5*1.5 steps if we used a 50% growth rate).

• m.t says:

thank you very much for the explanation.

the idea is brilliant !
