Don’t Treat Paths Like Strings

posted by Craig Gidney on July 9, 2013

File paths are represented by strings, but they don’t act like strings. People often manipulate paths by using simple string operations, but this is a mistake.

In this short post I’ll justify not treating paths like strings by showing that two ‘simple’ path operations, combining paths and equating paths, do not correspond to the string operations (concatenation and equality) that I so often see used in their place.

Path Combining is not String Concatenation

Perhaps the most common operation performed on paths is combining them: given a root path and a relative path, return the path that the relative path refers to when it is interpreted relative to the root. For example, combining the root path C:\ProgramFiles with the relative path SomeApp\Defaults.dat gives the resulting path C:\ProgramFiles\SomeApp\Defaults.dat.

The naive way to do path combining is via string concatenation:

string Combine(string root, string relative) {
    return root + @"\" + relative; // NO. BAD.
}

The above solution has three problems:

  • Creating invalid paths. Suppose I try to combine the paths C:\test1 and C:\test2. What should combining these paths do? I see two reasonable interpretations: throw an error, or let the absolute second path override the first. Instead of either of those, the naive method returns an invalid path (C:\test1\C:\test2) that can cause a crash at some future, inconvenient time.
  • Accumulating separators. If the root path already ends in a \, the resulting path will be separated by a \\ instead of a \. This is inelegant and potentially confusing: users may wonder if the path will work (it should), and programmers may incorrectly think \\ is an escape sequence for \.
  • Hardcoded separators. The biggest problem with the above solution is that it assumes \ is a valid directory separator. This is wrong on Linux, which uses /. It’s also questionable on Windows machines whose default locale is set to Japanese or Korean, in which case the directory separator is sorta-kinda the yen sign or the won sign (the backslash character is displayed as ¥ or ₩).
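To make the three failure modes concrete, here is a small sketch that runs the naive concatenation on a few inputs. `NaiveCombine` is just the `Combine` method from above, renamed; because it is pure string manipulation, the outputs are the same on every OS. It assumes C# 9 or later for top-level statements.

```csharp
using System;

// The naive Combine method from above, renamed for clarity. Do not use this.
static string NaiveCombine(string root, string relative) =>
    root + @"\" + relative; // NO. BAD.

// Problem 1: two absolute paths produce an invalid path.
Console.WriteLine(NaiveCombine(@"C:\test1", @"C:\test2"));        // C:\test1\C:\test2

// Problem 2: a root that already ends in a separator gets a doubled separator.
Console.WriteLine(NaiveCombine(@"C:\ProgramFiles\", "SomeApp"));  // C:\ProgramFiles\\SomeApp

// Problem 3: the hardcoded backslash is wrong for Unix-style paths.
Console.WriteLine(NaiveCombine("/usr/share", "someapp"));         // /usr/share\someapp
```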

A proper path combiner doesn’t have the above problems. The .Net framework actually includes such a method: Path.Combine. Use it. (Perhaps even on hardcoded paths, to avoid any issues related to assuming a particular directory separator character.)
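A minimal sketch of how `System.IO.Path.Combine` behaves on the same kinds of inputs. The folder names are made up for illustration, and the expected separator depends on the OS, which is why the code reads `Path.DirectorySeparatorChar` instead of hardcoding one. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;

char sep = Path.DirectorySeparatorChar; // '\' on Windows, '/' on Linux and macOS

// Separators are inserted for you, using the platform's separator.
string combined = Path.Combine("SomeRoot", "SomeApp", "Defaults.dat");
Console.WriteLine(combined); // SomeRoot\SomeApp\Defaults.dat on Windows, SomeRoot/SomeApp/Defaults.dat elsewhere

// A root that already ends in a separator does not get a second one.
string noDouble = Path.Combine("SomeRoot" + sep, "SomeApp");
Console.WriteLine(noDouble); // SomeRoot\SomeApp on Windows, SomeRoot/SomeApp elsewhere

// If the second path is rooted, the first path is discarded.
string rooted = Path.Combine("SomeRoot", "/etc/someapp.conf");
Console.WriteLine(rooted); // /etc/someapp.conf
```

Path.Combine takes the second interpretation from the list above: a rooted second argument wins, so on Windows combining C:\test1 with C:\test2 returns C:\test2. It does not resolve "." or ".." segments; it only deals with separators and rootedness.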

Path Equality is not String Equality

Needing to determine if two paths are equivalent (i.e. resolve to the same local file) is not as common an operation as combining, but it still shows up now and then.

The naive solution for equating paths is to use string equality:

bool ArePathsEqual(string path1, string path2) {
    return path1 == path2; // Not even CLOSE to correct.
}

The above code is horribly, horribly wrong. Allow me to list why:

  • Case sensitivity. Windows file paths are case-insensitive, while Unix file paths are case-sensitive, so whether Doc.txt and DOC.TXT refer to the same file depends on the environment.
  • Alternative separators. Windows typically allows both \ and / as directory separators, and forgives placing multiple separators adjacent to each other. The path Documents\Doc.txt is equivalent to the path Documents\/\/\/\/\////\\\Doc.txt.
  • Relative directories. File paths may contain the special . and .. directories to ‘stay’ or ‘go up’. The path Documents\Doc.txt is equivalent to the path Documents\.\..\.\Documents\.\.\.\Doc.txt.
  • Macros. Depending on what will be consuming the file path, paths may contain macros that expand into normal paths. For example, %ProgramData%\YourApp\Defaults.dat may expand into C:\ProgramData\YourApp\Defaults.dat, so in cases where the path will be expanded in that way, you should consider those two paths to be equivalent.
  • Absolute vs Relative. The path Relative.txt resolves to the same file as C:\Relative.txt when the current directory is C:\, but not when the current directory is elsewhere. You need to decide whether relative and absolute paths should be sometimes equivalent (when the current directory happens to make them match) or never equivalent (since they are affected differently by changing the current directory).
  • Hard links. Totally distinct file paths may point at the same file on disk.
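A quick sketch of how far the standard `Path.GetFullPath` gets you on the issues above. It resolves "." and ".." and makes relative paths absolute against the current directory (the ‘sometimes equivalent’ choice from the Absolute vs Relative item), but it leaves case, macros, and hard links alone. The file names are illustrative, and it assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;

string a = "Documents/Doc.txt";
string b = "Documents/./.././Documents/././Doc.txt";

// Plain string equality says these differ...
Console.WriteLine(a == b); // False

// ...but once GetFullPath resolves "." and ".." (against the current
// directory), the two paths become identical strings.
Console.WriteLine(Path.GetFullPath(a) == Path.GetFullPath(b)); // True

// GetFullPath preserves case, so these still differ as strings,
// even though on Windows both name the same file.
Console.WriteLine(Path.GetFullPath("Doc.txt") == Path.GetFullPath("DOC.TXT")); // False
```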

If you try to determine path equality via string equality, expect things to go badly when any of those six issues comes into play.

Unfortunately, because different use cases require different variants of path equality (Does it make sense for relative paths to be considered equivalent to absolute paths? What sorts of macros are there? Do we care about hard links?), there’s no standard method for equating paths. At least, none that I’m aware of in the languages I use day to day. If you need to compare paths, you’re likely stuck implementing your own utility method. Determine what the semantics should be, and make sure to include tests covering the issues listed above.
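Here is one possible sketch of such a utility, under explicitly chosen semantics. The name `ArePathsEquivalent`, its `ignoreCase` parameter, and the choices listed in its comment are all assumptions for illustration, not a standard API; hard links are deliberately not handled, since detecting them needs OS-specific file-identity checks. It assumes .NET 5 or later (top-level statements, `OperatingSystem.IsWindows`, `Path.TrimEndingDirectorySeparator`).

```csharp
using System;
using System.IO;

// Hypothetical helper implementing ONE possible choice of path-equality semantics:
//   - %VAR%-style macros are expanded from environment variables,
//   - relative paths are resolved against the current directory
//     (so relative and absolute paths are "sometimes equivalent"),
//   - "." and ".." segments are resolved by Path.GetFullPath,
//   - a single trailing separator is ignored,
//   - case is ignored only if the caller asks for it,
//   - hard links are NOT detected.
static bool ArePathsEquivalent(string path1, string path2, bool ignoreCase)
{
    StringComparison comparison =
        ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
    return string.Equals(Normalize(path1), Normalize(path2), comparison);
}

static string Normalize(string path)
{
    string expanded = Environment.ExpandEnvironmentVariables(path);
    string full = Path.GetFullPath(expanded);
    // Trims one trailing separator, but never strips the root itself ("/" or "C:\").
    return Path.TrimEndingDirectorySeparator(full);
}

// Rough default: Windows paths are case-insensitive, most Unix filesystems are not.
bool caseInsensitiveFs = OperatingSystem.IsWindows();

Console.WriteLine(ArePathsEquivalent(
    "Documents/Doc.txt", "Documents/./../Documents/Doc.txt", caseInsensitiveFs)); // True
Console.WriteLine(ArePathsEquivalent("Documents", "Documents/", caseInsensitiveFs)); // True
```

The tests the paragraph above recommends would exercise each bullet: macros, "." and "..", trailing separators, case, and relative versus absolute paths.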

Summary

Paths are represented by strings, but it can be misleading to think of them as strings. When working with paths you should use methods designed to work on paths, not strings. There are string operations that appear to correspond to path operations, such as combining paths via string concatenation and comparing paths via string equality, but they miss important corner cases.

Note that, if you happen to be working in a programming language with a structured type for paths, you should consider using that instead of raw strings wherever possible. For example, in Java you can use File or Path, and in .Net you can use FileInfo and related classes.
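As a small illustration of the .Net side, `FileInfo` normalizes its path when constructed and exposes the pieces as properties, so you rarely need to slice strings yourself. Constructing a `FileInfo` does not require the file to exist; the Documents/Doc.txt path here is purely illustrative. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.IO;

// Building a FileInfo does not touch the disk: the file need not exist.
var info = new FileInfo(Path.Combine("Documents", ".", "Doc.txt"));

Console.WriteLine(info.Name);            // Doc.txt
Console.WriteLine(info.Extension);       // .txt
Console.WriteLine(info.Directory!.Name); // Documents
Console.WriteLine(info.FullName);        // absolute path with "." already resolved
```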



Twisted Oak Studios offers consulting and development on high-tech interactive projects. Check out our portfolio, or give us a shout if you have anything you think some really rad engineers should help you with.
