PowerShell Grep and Findstr Equivalent Using the Select-String Cmdlet

This post explores a PowerShell Grep and FindStr equivalent using the Select-String cmdlet.

Grep (Unix) and FindStr (Windows) are command line utilities used for searching text patterns within input strings and files.  By using PowerShell’s Select-String cmdlet with regular expressions, we can simulate the same behaviour.
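As a rough orientation, the sketch below shows how a few common grep options map onto Select-String parameters.  The sample lines are made up for illustration, and the grep commands in the comments are approximate equivalents rather than exact ones.

```powershell
# Hypothetical sample input: four lines of log-like text
$lines = "INFO start", "ERROR disk full", "info done", "Error: retry"

# Roughly like: grep -i "error"  (Select-String ignores case by default)
$anyCase = @($lines | Select-String -Pattern "error")

# Roughly like: grep "ERROR"  (case-sensitive search)
$exactCase = @($lines | Select-String -Pattern "ERROR" -CaseSensitive)

# Roughly like: grep -v -i "error"  (lines that do NOT match)
$notMatching = @($lines | Select-String -Pattern "error" -NotMatch)

# Roughly like: grep -i -F "Error:"  (literal text, not a regular expression)
$literal = @($lines | Select-String -Pattern "Error:" -SimpleMatch)

# Show the matching line text for each search
$anyCase.Line
$exactCase.Line
$notMatching.Line
$literal.Line
```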

We discussed previously how we can use PowerShell’s Substring method to extract part of a string.  However, this is only useful when the text we are searching has a consistent format, so that the part we want always sits at a known position.  In our previous post, we gave the example of a product reference such as ALK-3242334UK.

But what if this same product reference was hidden in a string of random text such as:

Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK).  We value your custom and look forward to seeing you in future.

To extract the product references we must use the Select-String cmdlet with a regular expression.

What is a Regular Expression?

In basic terms, a regular expression (often abbreviated as “regex”) is a powerful and flexible way to match patterns in text.  The teaching of regular expression syntax extends well beyond this post, but as a simple example look at the following:

[A-Za-z]+

Here, the regex search pattern is stating that we want to identify matches in a string of text that have one or more (+) characters that are either upper case (A-Z) or lower case (a-z) letters.

Matches in this example might include:

  • John
  • BaNaNa
  • car

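Strings that do not match in their entirety might include:

  • House! (exclamation mark is not an upper or lower case letter)
  • 2Bike (2 is not an upper or lower case letter)
  • Peter Pan (a space is not an upper or lower case letter)

Note that an unanchored pattern is searched for anywhere in the text, so it would still find “House”, “Bike”, “Peter” and “Pan” inside these strings.  To require the whole string to consist of letters, anchor the pattern to the start and end of the string: ^[A-Za-z]+$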
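To see the difference between finding a match somewhere inside a string and matching the whole string, a quick check with PowerShell’s -match operator can help.  This is a minimal sketch: the words come from the two lists in this section, and the anchored pattern ^[A-Za-z]+$ is added here purely for comparison.

```powershell
# Sample words taken from the two lists in this section
$words = "John", "BaNaNa", "car", "House!", "2Bike", "Peter Pan"

$results = foreach ($word in $words) {
    [PSCustomObject]@{
        Word          = $word
        # True if one or more letters appear anywhere in the string
        ContainsMatch = $word -match '[A-Za-z]+'
        # True only if the whole string is letters (^ = start, $ = end)
        WholeMatch    = $word -match '^[A-Za-z]+$'
    }
}

$results | Format-Table -AutoSize
```

Every word contains at least one run of letters, so ContainsMatch is true for all six, while WholeMatch is true only for John, BaNaNa and car.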

PowerShell Grep and FindStr Equivalent using Select-String

By using Select-String with the -pattern parameter, we can search a string of text using regular expressions.

In our previous post explaining how to use PowerShell’s substring method, we mentioned how the format of an example product reference might be:

ALK-[7 Digits]UK

We can translate this pattern to a regular expression like so:

ALK-\d{7}UK

In other words, search for a string starting with “ALK” followed by a hyphen, 7 digits and ending with “UK”.  As an example, we can find all instances of product references in a string of text like so:

$alkaneMessage = "Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK).  We value your custom and look forward to seeing you in future."

#pipe text into select-string and return all matches as a collection of Match objects
$allMatches = ($alkaneMessage | Select-String -Pattern "ALK-\d{7}UK" -AllMatches).Matches

#loop through the Match objects and output each matched value
foreach ($match in $allMatches) {
    write-host $match.Value
}

Here we take our random string of text called $alkaneMessage and pipe it into the Select-String cmdlet, specifying our search pattern as an argument, and finding -AllMatches.

We then loop through each match and output the match using write-host.
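If the matched values are needed for further processing rather than just displayed, they can be collected into an array of strings.  This is a minimal sketch that reuses a shortened version of the message; the variable name $productRefs is illustrative only.

```powershell
$alkaneMessage = "Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK)."

# .Matches holds one Match object per hit; .Value is the matched text
$productRefs = ($alkaneMessage | Select-String -Pattern "ALK-\d{7}UK" -AllMatches).Matches |
    ForEach-Object { $_.Value }

# Number of references found, followed by the references themselves
$productRefs.Count
$productRefs -join ", "
```

Select-String ignores case by default, so this pattern would also find alk-3242334uk.  Adding the -CaseSensitive switch restricts matches to upper case “ALK” and “UK”.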

Of course, in a real-world example the text contained in $alkaneMessage would likely be the text contained in a log file or even multiple log files!  In this example, let’s say we have a folder called c:\alkane\logs containing multiple log files from all our orders.  And we want to extract all the product references from each log file:

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-\d{7}UK" -AllMatches

#loop through lines containing matches
foreach ($match in $allMatches) {
    #loop through each match in the line
    foreach ($submatch in $match.Matches) {
        $matchedValue = $submatch.Groups[0].Value
        write-host "File path: " $match.Path
        write-host $matchedValue "found in: " $match.Line
        write-host "Line Number: " $match.LineNumber
        write-host ""
    }
}
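Because write-host only prints text, the results above cannot easily be sorted, filtered or exported.  One possible alternative, sketched below, is to emit one object per match.  The temporary folder, file names and log contents are hypothetical and exist only so the example can run on its own; in practice the path would be C:\alkane\logs\*.log.

```powershell
# Create a throwaway folder with two sample logs (hypothetical content for this demo)
$logDir = Join-Path ([System.IO.Path]::GetTempPath()) "alkane-logs-demo"
New-Item -ItemType Directory -Path $logDir -Force | Out-Null
Set-Content -Path (Join-Path $logDir "order1.log") -Value "Laptop ALK-3242334UK and Keyboard ALK-9876352UK"
Set-Content -Path (Join-Path $logDir "order2.log") -Value "no products here", "Mouse ALK-6622553UK"

# Build one object per match instead of writing formatted text
$found = Select-String -Path (Join-Path $logDir "*.log") -Pattern "ALK-\d{7}UK" -AllMatches |
    ForEach-Object {
        $line = $_
        foreach ($m in $line.Matches) {
            [PSCustomObject]@{
                File       = $line.Filename
                LineNumber = $line.LineNumber
                Reference  = $m.Value
            }
        }
    }

$found | Sort-Object File, LineNumber | Format-Table -AutoSize

# Tidy up the demo folder
Remove-Item -Path $logDir -Recurse -Force
```

The objects in $found could equally be piped to Export-Csv or Group-Object, which is the main reason for preferring objects over write-host here.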

Regular Expression Match Groups

At the risk of complicating things, you’ll note that each match can contain matching groups.  In the example above, Groups[0] represents the full match, i.e. the whole product reference.

But we can specify groups within a match by using rounded brackets in our regular expression.  Each bracket pairing, counted from left to right by its opening bracket, then becomes Groups[1], Groups[2] and so on.

For example, if we just want to extract the 7 digits from our product reference we can enclose the digits in brackets in our regular expression like so:

ALK-(\d{7})UK

And then access it using Groups[1] like so:

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-(\d{7})UK" -AllMatches

#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        write-host "Full match: " $submatch.Groups[0].Value
        write-host "Digits only: " $submatch.Groups[1].Value
    }
}

We could even go bonkers and have three matching groups for “ALK”, the digits and the “UK” like so (notice three bracket pairings in the regular expression):

$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "(ALK)-(\d{7})(UK)" -AllMatches

#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        write-host "Full match: " $submatch.Groups[0].Value
        write-host "ALK only: " $submatch.Groups[1].Value
        write-host "Digits only: " $submatch.Groups[2].Value
        write-host "UK only: " $submatch.Groups[3].Value
    }
}
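Numbered groups can become hard to follow as a pattern grows.  .NET regular expressions, which Select-String uses, also support named groups written as (?<name>...).  The sketch below is one way to apply them to the product reference; the group names prefix, digits and suffix are chosen for illustration, and the text is piped in rather than read from log files so that the example is self-contained.

```powershell
$alkaneMessage = "Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK)"

# (?<name>...) gives a group a name in addition to its number
$pattern = '(?<prefix>ALK)-(?<digits>\d{7})(?<suffix>UK)'

$named = ($alkaneMessage | Select-String -Pattern $pattern -AllMatches).Matches |
    ForEach-Object {
        [PSCustomObject]@{
            FullMatch = $_.Value
            Prefix    = $_.Groups["prefix"].Value
            Digits    = $_.Groups["digits"].Value
            Suffix    = $_.Groups["suffix"].Value
        }
    }

$named | Format-Table -AutoSize
```

Looking a group up by name means the code keeps working if another bracket pairing is later added earlier in the pattern, whereas numbered lookups would shift.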

Hopefully that provides a taster on PowerShell Grep and FindStr functionality using Select-String and regular expressions.

 

Using PowerShell Select-String to Match Regular Expressions

This post provides a simple example of using PowerShell Select-String to match regular expressions.

Select-string and regular expressions come in really useful when we want to extract specific data from a string that matches a particular pattern.  Consider this URL:

https://www.alkanesolutions.co.uk/2023/05/05/another-great-blog/

We can see that it contains the date in year/month/day format.  But how can we extract it easily?  We can use regular expressions.  And the expression we’ll use in this example is:

\d{4}/\d{2}/\d{2}

Splitting this up, we are simply searching for a pattern that matches 4 ({4}) digits (\d) followed by a forward slash, then 2 ({2}) digits (\d) followed by a forward slash, then 2 ({2}) digits (\d).

We need to expand on this though, because when we search our string we want to be able to extract the day, the month or the year separately.  And to do this we need to make each part a matching group.  We can do this by simply enclosing each matching group in a rounded bracket like so:

(\d{4})/(\d{2})/(\d{2})

So if we find a match in our string, the first group (group 0) will be the text matched by the whole pattern (\d{4})/(\d{2})/(\d{2}), the second (group 1) will be the year (\d{4}), the third (group 2) the month (\d{2}) and finally the fourth (group 3) the day (\d{2}).

Here’s a quick example:

$text = "https://www.alkanesolutions.co.uk/2023/05/05/another-great-blog/"

$text | Select-String -Pattern "(\d{4})/(\d{2})/(\d{2})" | Select -Expand Matches | Select -Expand Groups

and the result is:

Groups   : {0, 1, 2, 3}
Success  : True
Name     : 0
Captures : {0}
Index    : 34
Length   : 10
Value    : 2023/05/05

Success  : True
Name     : 1
Captures : {1}
Index    : 34
Length   : 4
Value    : 2023

Success  : True
Name     : 2
Captures : {2}
Index    : 39
Length   : 2
Value    : 05

Success  : True
Name     : 3
Captures : {3}
Index    : 42
Length   : 2
Value    : 05

We can clearly see that there are 4 groups captured in total as mentioned above, and you can see the value of each matching group.  This is handy because it makes it really easy to extract the data we want like so:

$text = "https://www.alkanesolutions.co.uk/2023/05/05/another-great-blog/"

$matchingGroups = $text | Select-String -Pattern "(\d{4})/(\d{2})/(\d{2})" | Select -Expand Matches | Select -Expand Groups

$wholeDate = $matchingGroups | Where Name -eq 0 | Select -Expand Value
$justYear = $matchingGroups | Where Name -eq 1 | Select -Expand Value
$justMonth = $matchingGroups | Where Name -eq 2 | Select -Expand Value
$justDay = $matchingGroups | Where Name -eq 3 | Select -Expand Value

write-host $wholeDate
write-host $justYear
write-host $justMonth
write-host $justDay
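Once the date text has been extracted, it may be more useful as a real date than as a string.  The sketch below is one possible way to do that with [datetime]::ParseExact; it assumes the URL always contains a valid yyyy/MM/dd date, and the variable name $postDate is illustrative.

```powershell
$text = "https://www.alkanesolutions.co.uk/2023/05/05/another-great-blog/"

# Take the first (and here only) match; assumes the URL contains a date
$match = ($text | Select-String -Pattern "(\d{4})/(\d{2})/(\d{2})").Matches[0]

# Groups[0] is the whole match, i.e. the date text in year/month/day order
$postDate = [datetime]::ParseExact(
    $match.Groups[0].Value,
    "yyyy/MM/dd",
    [System.Globalization.CultureInfo]::InvariantCulture
)

# The result is a DateTime, so date arithmetic and formatting are available
$postDate.Year
$postDate.AddDays(30).ToString("yyyy-MM-dd")
```

ParseExact throws an error if the text is not a real date (for example 2023/13/45), which the regular expression alone would happily match.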