## Intersecting Linked Lists Faster

Yesterday, I read the post “Of intersecting linked lists” by one Paul Masurel. In it, he discusses the problem of determining the first node that is common to two linked lists. He ends with a solution that takes O(d) time and O(log d) space, where d is the maximum distance to the first common node from the head of either list. I would inline the solution here, but it’s fifty lines of carefully chosen branches (euphemism: “branch noise”).

There is actually a solution much simpler than Paul’s: one just as time efficient, but requiring only constant space.

### From n to d

The reddit comments on the post almost came up with the right algorithm. People realized that if you computed the lists’ lengths then you could line up the heads so they advanced to the common node in sync. This removes the need for any sort of collection of seen nodes, reducing the space requirement to constant.
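The length-based approach is easy to state concretely. Here is a minimal Python sketch of it (the `Node` class and function names are illustrative, not from the original post):

```python
class Node:
    """Minimal singly linked list node (illustrative stand-in for any node type)."""
    def __init__(self, next=None):
        self.next = next

def length(head):
    """Number of nodes from head to the terminating None."""
    n = 0
    while head is not None:
        head = head.next
        n += 1
    return n

def intersection_by_lengths(h0, h1):
    """Find the first common node by lining up the heads: Theta(n) time, O(1) space."""
    d0, d1 = length(h0), length(h1)
    # advance the head of the longer list until both heads are
    # equally far from the end of their lists
    for _ in range(d0 - d1):
        h0 = h0.next
    for _ in range(d1 - d0):
        h1 = h1.next
    # advance in sync; the first shared node is the intersection (possibly None)
    while h0 is not h1:
        h0 = h0.next
        h1 = h1.next
    return h0
```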

The problem is that computing the lengths of the lists takes Θ(n) time, instead of O(d) time. As people realized: although O(n) is good, O(d) is better. d may be as large as n, but may also be significantly less. For example, we might be dealing with a class of lists that satisfy d ≪ n because they tend to intersect very quickly.

(Even worse, we might be dealing with implicit lists that never end, so n = ∞. I’d prefer not to wait an infinite amount of time to discover where two never-ending lists start intersecting.)

Luckily, we don’t need to know the lengths of the lists. Any pair of distances to any common node will work, because we only care about the difference between the two distances. If we can find *any* common node then we can compute the difference in distance and easily find the earliest common node. Realize this, and you’re halfway there.

### Algorithm: Alternating Anchors

A strategy I often use when dealing with linked lists is “dropping anchors”, and it applies here. The idea is to assign a local variable (the anchor) to be the current node, and watch for the current node matching the anchor again as you progress. Each time you drop the anchor you multiply the number of steps to take, before dropping it again, by a constant factor.

I recommend trying dropping anchors on the classic “Does this linked list end in a cycle?” problem. The result (apparently called Brent’s Algorithm) is three times as fast as the more commonly known Tortoise and Hare solution.
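For concreteness, here is a minimal Python sketch of Brent’s Algorithm as applied to that cycle problem, assuming nodes expose a `next` reference (the names are illustrative):

```python
def has_cycle(head):
    """Brent's algorithm: detect a cycle by repeatedly dropping an anchor,
    doubling the number of steps taken before each new drop."""
    anchor = head
    node = head
    step_size = 1
    while node is not None:
        for _ in range(step_size):
            node = node.next
            if node is None:
                return False  # reached the end: no cycle
            if node is anchor:
                return True   # walked back onto the anchor: a cycle
        anchor = node       # pick the anchor up and drop it here
        step_size *= 2      # double the steps before the next drop
    return False
```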

In the case of linked list intersection, we need two anchors: one for each list head. We’ll alternate between the two lists, advancing further and further while watching for the anchor of the other list. Once the other anchor is spotted, we’ve found a common node. We’ll also track the number of steps on both lists as we go, so that we know the distances to the common node without having to retrace our steps. Given the distances, it’s just a matter of synchronizing the heads and then advancing them at the same rate.

Here’s a diagram of the algorithm:

In the above diagram, the number of steps between alternations keeps increasing by a factor of 1.5. As soon as the number of steps exceeds the distance d to a common node, a common node is found. (In general, when using a growth factor of 1.5, at most two more alternations are needed after the step count exceeds d.) The distances to the node are then used to line the heads up, so they can advance in sync until the first common node is found.

Notice that one of the linked lists is so large it runs off the side of the diagram, but its long length (whether finite or infinite) doesn’t affect the running time. It takes O(log d) alternations for the number of steps to become large enough to guarantee a common node is found, and the last alternation dominates the running time because the number of steps per alternation increases exponentially. Thus the running time is O(d), independent of n, and we’ve only used a constant amount of space.
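The “last alternation dominates” argument can be sanity-checked numerically. This hypothetical sketch sums step sizes growing by 1.5×, including the two extra alternations taken after the step size first exceeds d, and the total stays within a small constant factor of d:

```python
def total_steps_until_exceeding(d, growth=1.5):
    """Sum the step sizes 1, 1.5, 1.5^2, ... until a single step size
    exceeds d, then add the two further alternations the scan may need."""
    step, total = 1.0, 0.0
    while step <= d:
        total += step
        step *= growth
    # two more alternations after the step size first exceeds d
    total += step + step * growth
    return total

# the ratio total/d stays bounded by a constant, so the scan is O(d)
ratios = [total_steps_until_exceeding(d) / d for d in (10, 1000, 10**6)]
```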

This algorithm (which I call ‘alternating anchors’) is actually somewhat similar to Paul Masurel’s. The main differences are that his keeps adding anchors to a set, instead of picking them up and dropping them again, and that he tries to do some sort of backtracking, instead of using the relative distances to sync up and rescan the lists. Not having to deal with hash sets and such makes alternating anchors faster, although it has the same asymptotic time complexity.

### Code

Assuming you don’t have cycles in your linked lists, this well-tested C# code will find the first common node:

```
/// Finds the first common node between two non-cyclical linked lists.
/// All lists are treated as if they end with a null node.
/// The null node is treated as a valid node.
public static Link<T> FindEarliestIntersection<T>(this Link<T> h0, Link<T> h1) {
    // find *any* common node, and the distances to it
    var node = new[] {h0, h1};
    var dist = new[] {0, 0};
    var stepSize = 1;
    while (node[0] != node[1]) {
        // advance each node progressively farther, watching for the other node
        foreach (var i in Enumerable.Range(0, 2)) {
            foreach (var repeat in Enumerable.Range(0, stepSize)) {
                if (node[i] == null) break;
                if (node[0] == node[1]) break;
                node[i] = node[i].Next;
                dist[i] += 1;
            }
            stepSize *= 2;
        }
    }

    // align heads to be an equal distance from the first common node
    var r = dist[1] - dist[0];
    while (r < 0) {
        h0 = h0.Next;
        r += 1;
    }
    while (r > 0) {
        h1 = h1.Next;
        r -= 1;
    }

    // advance heads until they meet at the first common node
    while (h0 != h1) {
        h0 = h0.Next;
        h1 = h1.Next;
    }
    return h0;
}
```
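For readers more comfortable with Python, here is a rough transliteration of the same routine (a sketch assuming nodes with a `next` attribute; the C# above is the tested version):

```python
def find_earliest_intersection(h0, h1):
    """Find the first common node of two non-cyclical linked lists.
    Both lists are treated as ending in a shared None node, and None
    counts as a valid (common) node."""
    node = [h0, h1]
    dist = [0, 0]
    step_size = 1
    # find *any* common node, and the distances to it
    while node[0] is not node[1]:
        for i in range(2):
            # advance side i progressively farther, watching for the other side
            for _ in range(step_size):
                if node[i] is None or node[0] is node[1]:
                    break
                node[i] = node[i].next
                dist[i] += 1
            step_size *= 2
    # align heads to be an equal distance from the first common node
    r = dist[1] - dist[0]
    for _ in range(-r):
        h0 = h0.next
    for _ in range(r):
        h1 = h1.next
    # advance heads in sync until they meet at the first common node
    while h0 is not h1:
        h0 = h0.next
        h1 = h1.next
    return h0
```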

If you *do* have cycles, things are more complicated.

### Dealing with Cycles

When cycles are present the above code might compute the wrong relative distance, due to one side going around a cycle more times than the other. It can even go into an infinite loop, if the lists share no common nodes and one (or both) has a cycle.

Why didn’t I bother with cycles? Because they’re not well behaved with respect to the problem, especially if we want to keep that O(d) time bound.

Suppose both lists eventually enter the same cycle, but at different nodes. What’s the first common node? The node where one list enters the cycle? The other? Both? Neither? An error? How are we going to consistently detect that error, without spending Ω(n) time proving/disproving that there’s a cycle?

That being said, let’s assume that when the linked lists only intersect inside a cycle you’re willing to get back some arbitrary unspecified node from the cycle. A node that might change haphazardly under trivial changes like swapping the argument order or decreasing the growth factor. If you’re willing to bite that bullet, then two simple changes will make the algorithm “work” with cycles.

First, you need to detect getting stuck in a cycle. It’s pretty easy to inline Brent’s Algorithm into the existing code, so I recommend doing that. If both sides have detected a cycle, or one side has detected a cycle and the other has ended, then the lists are disjoint and the algorithm should terminate with failure.

Second, once a common node is found, you need to retrace the distances to that node in order to be sure no nodes were double counted due to cycling. This ensures you get the right result if the lists meet before the cycle, or if they enter the cycle at the same place.

### Summary

You can find the first common node in two linked lists in time proportional to the distance to the common node, using only a constant amount of space.

When the lists end in a cycle, the result may not be well defined.

Dropping anchors is a good strategy to keep in mind when working with linked lists.

I’m an easy target for snipers.


## 5 Responses to “Intersecting Linked Lists Faster”


Hi there,

I’m having a hard time understanding how your algorithm works in O(d) when you have lists that look like the one in the picture I added to my comment.

I’ve tried to use your algorithm but it just doesn’t work in cases like this.

I generated the animation using the code I posted (though I slightly tweaked a constant), so I’m pretty sure it works. Could you give code for a test case that fails? (I assume you know that O(d) means “proportional to d” and not “at most d”, so the scan going past the intersection by an amount proportional to the prefix length is expected and allowed.)

Perhaps I misunderstood your explanation of why it takes O(log_{1.5} d) alternations for the number of steps to become large enough to guarantee a common node.

Can you add some kind of proof?

The main things to understand are that 1) eventually the difference between number of steps will be larger than d, 2) once that happens the algorithm will terminate either on the current run of steps or the run of steps after it (depending on which side started off behind), and 3) exponential processes have their runtimes dominated by the largest step.

Since the number of steps keeps getting larger, eventually it will exceed d. Because the increase is exponential and we stop within two steps of exceeding d, the last step will exceed d by at most a constant factor. Because the increase is exponential, we get the same asymptotic complexity by simply considering the final step (which was at most ~d*1.5*1.5 steps if we used a 50% growth rate).

Thank you very much for the explanation.

The idea is brilliant!