PowerShell Grep and FindStr Equivalent Using the Select-String Cmdlet

This post explores a PowerShell Grep and FindStr equivalent using the Select-String cmdlet.

Grep (Unix) and FindStr (Windows) are command-line utilities used to search for text patterns within input strings and files.  By using PowerShell’s Select-String cmdlet with regular expressions, we can simulate the same behaviour.
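
As a rough illustration of the grep-like behaviour, here is a minimal sketch that filters an array of strings.  The sample lines are made up for illustration.  It assumes the default behaviour of Select-String, which treats the pattern as a case-insensitive regular expression unless -CaseSensitive or -SimpleMatch is supplied:

```powershell
# Sample input lines (made up for illustration)
$lines = 'INFO start', 'ERROR disk full', 'INFO done', 'error: retry'

# Default: regular expression search, ignoring case
$hits = $lines | Select-String -Pattern 'error'

# -CaseSensitive restricts the search to an exact-case match
$caseSensitive = $lines | Select-String -Pattern 'error' -CaseSensitive

# -SimpleMatch treats the pattern as literal text, so the dot is not a wildcard
$literal = 'cost is 3.50', 'cost is 3x50' | Select-String -Pattern '3.50' -SimpleMatch

$hits.Line            # both lines containing "error" in any case
$caseSensitive.Line   # only the lower case "error" line
$literal.Line         # only the line containing the literal text 3.50
```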

We discussed previously how we can use PowerShell’s Substring method to extract part of a string.  However, this is only useful when the string of text we are searching has a consistent format.  In our previous post, we gave the example of a product reference such as ALK-3242334UK.

But what if this same product reference was hidden in a string of random text such as:

Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK).  We value your custom and look forward to seeing you in future.

To extract the product references we must use the Select-String cmdlet with a regular expression.

What is a Regular Expression?

In basic terms, a regular expression (often abbreviated as “regex”) is a powerful and flexible way to match patterns in text.  A full treatment of regular expression syntax is well beyond the scope of this post, but as a simple example look at the following:

[A-Za-z]+

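Here, the regex search pattern is stating that we want to identify matches in a string of text that have one or more (+) characters that are either upper case (A-Z) or lower case (a-z) letters.  Note that a pattern can match anywhere inside the text, so the examples below consider whether the whole string matches, which is equivalent to anchoring the pattern as ^[A-Za-z]+$.

Strings that match in full include:

  • John
  • BaNaNa
  • car

Strings that do not match in full include:

  • House! (exclamation is not an upper or lower case letter, although “House” on its own would match)
  • 2Bike (2 is not an upper or lower case letter, although “Bike” on its own would match)
  • Peter Pan (a space is not an upper or lower case letter, although “Peter” and “Pan” would each match)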
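
To try the pattern out, here is a minimal sketch using the sample strings from the lists above.  It assumes we want to test whole strings, so the pattern is anchored with ^ and $, and it uses -cmatch because PowerShell’s regex operators ignore case by default.  Without the anchors, the pattern simply finds each run of letters inside the text:

```powershell
# Sample strings taken from the lists above
$samples = 'John', 'BaNaNa', 'car', 'House!', '2Bike', 'Peter Pan'

# Anchored: the whole string must consist of letters only
$wholeMatches = $samples | Where-Object { $_ -cmatch '^[A-Za-z]+$' }

# Unanchored: find every run of letters inside a string
$letterRuns = ('Peter Pan' | Select-String -Pattern '[A-Za-z]+' -AllMatches).Matches.Value

$wholeMatches   # John, BaNaNa, car
$letterRuns     # Peter, Pan
```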

PowerShell Grep and FindStr Equivalent using Select-String

By using Select-String with the -Pattern parameter, we can search a string of text using regular expressions.

In our previous post explaining how to use PowerShell’s substring method, we mentioned how the format of an example product reference might be:

ALK-[7 Digits]UK

We can translate this pattern to a regular expression like so:

ALK-\d{7}UK

In other words, search for a string starting with “ALK” followed by a hyphen, 7 digits and ending with “UK”.  As an example, we can find all instances of product references in a string of text like so:

$alkaneMessage = "Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK).  We value your custom and look forward to seeing you in future."

#pipe text into Select-String and return all matches as an array of Match objects
$allMatches = ($alkaneMessage | Select-String -Pattern "ALK-\d{7}UK" -AllMatches).Matches

#loop through the Match objects and output the matched text
foreach ($match in $allMatches) {
    Write-Host $match.Value
}

Here we take our string of text called $alkaneMessage and pipe it into the Select-String cmdlet, specifying our search pattern with -Pattern.  The -AllMatches switch is needed because, without it, Select-String returns only the first match in each line of input.

We then loop through each match and output the match using write-host.
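
To see what -AllMatches changes, here is a minimal sketch that runs the same search with and without the switch.  The shortened $text string is made up for illustration:

```powershell
# Shortened sample text (made up for illustration)
$text = 'Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK)'

# Without -AllMatches only the first match on each input line is returned
$first = ($text | Select-String -Pattern 'ALK-\d{7}UK').Matches

# With -AllMatches every match on the line is returned
$all = ($text | Select-String -Pattern 'ALK-\d{7}UK' -AllMatches).Matches

$first.Count   # 1
$all.Count     # 2
$all.Value     # the matched text of each product reference
```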

Of course, in a real-world example the text contained in $alkaneMessage would more likely come from a log file, or even multiple log files!  In this example, let’s say we have a folder called C:\alkane\logs containing multiple log files from all our orders, and we want to extract all the product references from each log file:

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-\d{7}UK" -AllMatches

#loop through lines containing matches
foreach ($match in $allMatches) {
    #loop through each match in the line
    foreach ($submatch in $match.Matches) {
        $matchedValue = $submatch.Groups[0].Value
        Write-Host "File path: $($match.Path)"
        Write-Host "$matchedValue found in: $($match.Line)"
        Write-Host "Line Number: $($match.LineNumber)"
        Write-Host ""
    }
}
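
If we only need a de-duplicated list of product references rather than per-line details, the results can be flattened in a pipeline.  Here is a minimal, self-contained sketch of that idea.  It creates a throwaway folder in the temp directory instead of using C:\alkane\logs, and the file names and log contents are made up for illustration:

```powershell
# Create a throwaway folder with two small sample log files (contents are made up)
$logDir = Join-Path ([System.IO.Path]::GetTempPath()) ("alkane-logs-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $logDir | Out-Null
Set-Content -Path (Join-Path $logDir 'order1.log') -Value 'Shipped ALK-3242334UK and ALK-9876352UK'
Set-Content -Path (Join-Path $logDir 'order2.log') -Value 'Shipped ALK-3242334UK', 'Refunded ALK-6622553UK'

# Search every .log file, then flatten the matches into a sorted list of unique references
$references = Select-String -Path (Join-Path $logDir '*.log') -Pattern 'ALK-\d{7}UK' -AllMatches |
    ForEach-Object { $_.Matches.Value } |
    Sort-Object -Unique

$references   # three unique references, sorted

# Tidy up the sample folder
Remove-Item -Path $logDir -Recurse -Force
```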

Regular Expression Match Groups

At the risk of complicating things too much, you’ll note that each match can contain matching groups.  In the example above, Groups[0] represents the full match – i.e. the whole of our product reference.

But we can specify groups within a match by using round brackets (parentheses) in our regular expression.  Each bracket pairing, counted from left to right, would then be represented by Groups[1], Groups[2] and so on.

For example, if we just want to extract the 7 digits from our product reference we can enclose the digits in brackets in our regular expression like so:

ALK-(\d{7})UK

And then access it using Groups[1] like so:

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-(\d{7})UK" -AllMatches

#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        Write-Host "Full match: " $submatch.Groups[0].Value
        Write-Host "Digits only: " $submatch.Groups[1].Value
    }
}

We could even go bonkers and have three matching groups for “ALK”, the digits and the “UK” like so (notice three bracket pairings in the regular expression):

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "(ALK)-(\d{7})(UK)" -AllMatches

#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        Write-Host "Full match: " $submatch.Groups[0].Value
        Write-Host "ALK only: " $submatch.Groups[1].Value
        Write-Host "Digits only: " $submatch.Groups[2].Value
        Write-Host "UK only: " $submatch.Groups[3].Value
    }
}
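
Once a pattern has several groups, counting bracket pairings can become error-prone.  As an optional variation, .NET regular expressions also allow a group to be given a name with (?<name>...).  Here is a minimal sketch of that approach.  The group names prefix, digits and suffix, and the shortened $text string, are made up for illustration:

```powershell
# Shortened sample text (made up for illustration)
$text = 'Laptop (ALK-3242334UK), Mouse (ALK-6622553UK)'

# (?<name>...) gives a capture group a name as well as a number
$pattern = '(?<prefix>ALK)-(?<digits>\d{7})(?<suffix>UK)'

$digits = foreach ($m in ($text | Select-String -Pattern $pattern -AllMatches).Matches) {
    # Look the group up by name rather than by position
    $m.Groups['digits'].Value
}

$digits   # 3242334, 6622553
```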

Hopefully that provides a taster of PowerShell Grep and FindStr functionality using Select-String and regular expressions.