Efficient Disasters: Remove-Item In the Pipeline

Today’s post is about one of my major mistakes from last year. I’m hoping y’all can learn from it.

tl;dr: be extra careful with Remove-Item. Also, catching all errors can mask problems.

Remember: Disasters are often just a series of small mistakes.

At work I found a package that was failing on a sizeable portion of our fleet. When I dug into it, I realized the package was using Get-ChocolateyUnzip to unzip to a directory. The problem was that the directory belonged to a program that was in use, so the unzip command would fail.

Mistake #1: Instead of finding the package owner and convincing them to use a proper install format (e.g. msi/x) I decided I could be clever.

This pattern of attempting to overwrite in-use files is a common mistake for those who aren’t familiar with Windows. When a file is opened, Windows locks it to ensure you don’t have conflicting writes. This means that nothing can change the file while it’s being used. One approach is to prompt the user to stop the app/service using it, but that doesn’t work when it’s a package manager working behind the scenes. An alternative approach is to move the files.

File Locks and Moves

Move!?! Yes. An overwrite is really just a delete and create on Windows. But a move is considered valid because the system understands that it needs to follow the file while the application is still running.

While moving solves the issue of writing the new version to disk, it does leave us with a new problem. We need to clean up the files from the existing version, and those are likely still locked.

In comes our primary antagonist, Remove-Item. We know it can be dangerous to use programmatically, but sometimes it’s necessary. One common mistake is to pass it the recurse flag, which can be disastrous if the path is not explicit. Another reason we don’t want to use Remove-Item -Recurse on this folder is that it’ll fail on the first file lock.

To avoid letting Remove-Item blindly remove files, we will first get a list of files with Get-ChildItem. This returns a series of objects which represent the files, including the full path, name, permissions, etc. For files, these objects are of type System.IO.FileInfo (folders come back as System.IO.DirectoryInfo), and they have been around since the early days of PowerShell.
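To make that concrete, here is a small sketch of what Get-ChildItem hands back. The temp folder and the file names in it are made up for the demo; the point is the difference between Name and FullName on each object.

```powershell
# Build a throwaway folder so the example is self-contained (hypothetical names).
$demoRoot = Join-Path ([System.IO.Path]::GetTempPath()) ("gci-demo-" + [guid]::NewGuid())
$subDir   = Join-Path $demoRoot 'sub'
New-Item -ItemType Directory -Path $subDir -Force | Out-Null
Set-Content -LiteralPath (Join-Path $subDir 'foo.txt') -Value 'hello'

# -Recurse returns both the folder and the file inside it.
$items = Get-ChildItem -LiteralPath $demoRoot -Recurse

# Show the .NET type plus the short and fully qualified names of each item.
$items | ForEach-Object {
    [pscustomobject]@{
        Type     = $_.GetType().Name   # FileInfo or DirectoryInfo
        Name     = $_.Name             # e.g. foo.txt
        FullName = $_.FullName         # e.g. <temp>\gci-demo-...\sub\foo.txt
    }
}
```

Name alone is ambiguous because it only means something relative to the current directory. FullName is the property you want anywhere near a delete.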

System.IO.FileInfo, Current Object, and PS5.1

Now with a list of files we can iterate and attempt to Remove-Item. We also expect some files to be locked, so we need to catch Remove-Item errors so that we can try to remove them with another method.

$oldFiles = Get-ChildItem -Recurse $oldPath
$oldFiles | ForEach-Object {
  try {
    $filename = $_.FullName
    Remove-Item $_ -Force -ErrorAction Stop
  } catch {
    # Code to clean up on reboot
  }
}

What about $filename? In the catch block we need the object as a single string. Having been bitten by a bad Remove-Item command in the past, I wanted a property that was fully qualified (full path).

Mistake #2: The goal was to actually use $filename with the Remove-Item but it was missed during development.

But guess what? This is valid PowerShell. This should work because we’re passing the full object…

Mistake #3: Unlike PowerShell 7, Remove-Item in PowerShell 5.1 uses the name property.

So this means if we’re trying to remove foo.exe it would essentially be doing Remove-Item foo.exe instead of Remove-Item C:\tools\foo\foo.exe.

Mistake #4: Because we were expecting locked files that were going to throw, we were capturing the errors (instead of just specific errors).
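For contrast, here is a sketch of what I meant to write: hand Remove-Item the fully qualified path through -LiteralPath, so nothing depends on how the object gets converted to a string. The temp folder standing in for $oldPath and the $failed list are made up for illustration; the list is only a placeholder for the reboot cleanup.

```powershell
# Throwaway folder standing in for $oldPath (hypothetical, for illustration only).
$oldPath = Join-Path ([System.IO.Path]::GetTempPath()) ("old-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $oldPath -Force | Out-Null
Set-Content -LiteralPath (Join-Path $oldPath 'foo.exe') -Value 'stub'

# Paths Remove-Item could not delete; placeholder for the reboot cleanup step.
$failed = [System.Collections.Generic.List[string]]::new()

Get-ChildItem -LiteralPath $oldPath -Recurse -File | ForEach-Object {
    # Fully qualified path, the same on Windows PowerShell 5.1 and PowerShell 7.
    $filename = $_.FullName
    try {
        # -LiteralPath takes the string as-is: no wildcards, no relative lookup.
        Remove-Item -LiteralPath $filename -Force -ErrorAction Stop
    } catch {
        # Broad catch kept only to mirror the original loop; see Mistake #4.
        $failed.Add($filename)
    }
}
```

As far as I know, piping the object instead ($_ | Remove-Item -Force) is also safe, because the pipeline binds -LiteralPath from the object's PSPath property rather than from its string form.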

Why didn’t this show up in testing?

I prefer PowerShell 7 as my daily driver. That’s what I default to in VS Code, which is where I write most of my code. Because of my paranoia around Remove-Item I tested regularly on my machine.

Let this be a reminder that “my machine” will not always match prod!

One scenario I didn’t account for was the difference between package versions. I was testing version A to C when the issue was actually limited to going from B to C.

Mistake #5: Going from A to C on my machine, I didn’t see any folders getting removed. When I checked the second removal method, I saw that the $filename variable was used further down, which gave me a false sense of confidence.

Once this landed, we monitored the fleet for errors. The problem is that we only monitor our specific environment, and this package was used by many.

Mistake #6: Not having a good way to track packages being rolled out across different environments.

Takeaways

While this resulted in pain outside of my group, I am still responsible for not accounting for other environments/use cases. Like I mentioned at the top, a disaster is rarely a single large event and more a series of smaller ones. If you’re curious about how we recovered, you can check out Force Reinstalling Many Choco Packages.

Let’s go over each mistake and call out some alternatives that could have saved us all this headache.

Mistakes

Instead of finding the package owner and convincing them to use a proper install format (e.g. msi/x) I decided I could be clever.

I should have found the package owner and figured out how to use a better install format. This would help them level up their skills and possibly call out some environments I failed to account for. Note to self: You’re not that clever! 😆

The goal was to actually use $filename with the Remove-Item but it was missed during development.

This one is tough. You could rely on static analysis, but because the variable was used later it wouldn’t show up as unused. I could write a linter that looks for Remove-Item $_ and throws a warning for myself.
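A rough sketch of what such a linter could look like, using PowerShell's built-in parser to walk the AST. The function name Find-RemoveItemCurrentObject is my own invention, and the check is deliberately narrow: it only flags a bare $_ or $PSItem handed to Remove-Item or one of its aliases.

```powershell
function Find-RemoveItemCurrentObject {
    param([Parameter(Mandatory)][string]$ScriptText)

    $tokens = $null
    $parseErrors = $null
    $ast = [System.Management.Automation.Language.Parser]::ParseInput(
        $ScriptText, [ref]$tokens, [ref]$parseErrors)

    # Search nested script blocks too ($true) for matching command calls.
    $ast.FindAll({
        param($node)
        if ($node -isnot [System.Management.Automation.Language.CommandAst]) { return $false }
        if ($node.GetCommandName() -notin 'Remove-Item', 'ri', 'rm', 'rmdir', 'del', 'erase', 'rd') {
            return $false
        }
        # Element 0 is the command name; look for a bare $_ / $PSItem after it.
        $bare = $node.CommandElements | Select-Object -Skip 1 | Where-Object {
            $_ -is [System.Management.Automation.Language.VariableExpressionAst] -and
            $_.VariablePath.UserPath -in '_', 'PSItem'
        }
        [bool]$bare
    }, $true)
}

# Usage: warn on the exact pattern from this post.
$sample = 'Get-ChildItem | ForEach-Object { Remove-Item $_ -Force }'
Find-RemoveItemCurrentObject -ScriptText $sample | ForEach-Object {
    Write-Warning "Line $($_.Extent.StartLineNumber): '$($_.Extent.Text)' passes the current object directly; consider .FullName"
}
```

Remove-Item $_.FullName is a member expression rather than a bare variable, so it passes cleanly. A quoted "$_" would slip through too, so treat this as a starting point and not a complete rule.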

Unlike PowerShell 7, Remove-Item in PowerShell 5.1 uses the name property.

When working on code that will be executed in Windows PowerShell, make sure you’re testing in that version!
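One cheap guard, sketched below, is to have the test script announce which engine it is running on. $PSVersionTable.PSEdition reports Desktop for Windows PowerShell 5.1 and Core for PowerShell 6 and later. The warning text and the script name in the comment are placeholders of mine.

```powershell
# 'Desktop' = Windows PowerShell 5.1, 'Core' = PowerShell 6+.
$edition = $PSVersionTable.PSEdition
$version = $PSVersionTable.PSVersion

if ($edition -ne 'Desktop') {
    Write-Warning ("Running on PowerShell $version ($edition). " +
        "The fleet runs Windows PowerShell 5.1, so results here may differ.")
}

# To run the same test under Windows PowerShell from any shell on Windows
# (Test-Package.ps1 is a hypothetical script name):
#   powershell.exe -NoProfile -File .\Test-Package.ps1
```

Had my tests printed that warning, the gap between my machine and prod would at least have been visible.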

Because we were expecting locked files that were going to throw, we were capturing the errors (instead of just specific errors).

As hinted, we should really be looking for a specific error. Capturing all errors is prone to mask problems. I highly recommend Kevin Marquette’s article: PowerShell: Everything you wanted to know about exceptions.
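Here is a minimal sketch of a narrower catch. It assumes the expected failure, a file that is still in use, surfaces as System.IO.IOException, which is what I would expect from Remove-Item on Windows but worth confirming in your own environment. Remove-FileIfUnlocked is a made-up helper name.

```powershell
function Remove-FileIfUnlocked {
    param([Parameter(Mandatory)][string]$LiteralPath)

    try {
        Remove-Item -LiteralPath $LiteralPath -Force -ErrorAction Stop
        return $true    # removed
    } catch [System.IO.IOException] {
        # Assumed "file is in use" case: report it so the caller can defer cleanup.
        return $false
    }
    # Any other error (for example a path that does not exist) is not caught
    # here, so it propagates to the caller instead of being silently swallowed.
}

# Usage with a throwaway file.
$tempFile = Join-Path ([System.IO.Path]::GetTempPath()) ("unlocked-" + [guid]::NewGuid() + ".txt")
Set-Content -LiteralPath $tempFile -Value 'stub'
$removed = Remove-FileIfUnlocked -LiteralPath $tempFile
```

With this shape, a wrong path would have failed loudly as a not-found error instead of being treated like a locked file.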

Going from A to C on my machine, I didn’t see any folders getting removed.

Not having a good way to track packages being rolled out across different environments.

I think both of those really are solved by having better data on what packages are being used and where. Had I known that B was actually in use, I could have tested for that scenario.

I literally saw this coming since I was cautious about using Remove-Item, and I still stepped on this land mine. By sharing my mistakes I hope that you come to realize that no one is perfect, or at least that you had a good laugh!

Thanks to Steven Judd for reviewing this post! If only I had you review my PR…

Photo by Sigmund on Unsplash