This post explores a PowerShell Grep and FindStr equivalent using the Select-String cmdlet.
Grep (Unix) and FindStr (Windows) are command line utilities used for searching text patterns within input strings and files. By using PowerShell’s Select-String cmdlet with regular expressions, we can simulate the same behaviour.
We discussed previously how we can use PowerShell’s Substring method to extract part of a string. However, this is only useful when the string of text we are searching has a consistent format. In our previous post, we gave the example of a product reference such as ALK-3242334UK.
But what if this same product reference was hidden in a string of random text such as:
Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK). We value your custom and look forward to seeing you in future.
To extract the product references we must use the Select-String cmdlet with a regular expression.
What is a Regular Expression?
In basic terms, a regular expression (often abbreviated as “regex”) is a powerful and flexible way to match patterns in text. The teaching of regular expression syntax extends well beyond this post, but as a simple example look at the following:
[A-Za-z]+
Here, the regex search pattern states that we want to identify matches in a string of text that consist of one or more (+) characters that are either upper case (A-Z) or lower case (a-z) letters.
Matches in this example might include:
- John
- BaNaNa
- car
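Strings that are not matched in their entirety might include the following (the pattern still matches the runs of letters inside them unless it is anchored with ^ and $):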
- House! (exclamation is not an upper or lower case letter)
- 2Bike (2 is not an upper or lower case letter)
- Peter Pan (a space is not an upper or lower case letter)
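To see the difference between a match anywhere in a string and a match of the whole string, here is a minimal sketch using PowerShell's -match operator. The $samples list, the property names and the ^ and $ anchors are illustrative additions, not part of the original example.

```powershell
# Sample strings taken from the two lists above (illustrative only)
$samples = 'John', 'BaNaNa', 'car', 'House!', '2Bike', 'Peter Pan'

$results = foreach ($sample in $samples) {
    [pscustomobject]@{
        Text       = $sample
        # True if the pattern matches anywhere in the string
        Contains   = $sample -match '[A-Za-z]+'
        # True only if the whole string consists of letters (anchored with ^ and $)
        WholeMatch = $sample -match '^[A-Za-z]+$'
    }
}

$results | Format-Table -AutoSize
```

Every sample contains at least one run of letters, so the unanchored pattern finds something in all of them. Only the anchored pattern separates the first list from the second.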
PowerShell Grep and FindStr Equivalent using Select-String
By using Select-String with the -Pattern parameter, we can search a string of text using regular expressions.
In our previous post explaining how to use PowerShell’s substring method, we mentioned how the format of an example product reference might be:
ALK-[7 Digits]UK
We can translate this pattern to a regular expression like so:
ALK-\d{7}UK
In other words, search for a string starting with “ALK” followed by a hyphen, 7 digits and ending with “UK”. As an example, we can find all instances of product references in a string of text like so:
$alkaneMessage = "Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK). We value your custom and look forward to seeing you in future."
#pipe text into Select-String and return all matches as a collection of Match objects
$allMatches = ($alkaneMessage | Select-String -Pattern "ALK-\d{7}UK" -AllMatches).Matches
#loop through the matches and output the matched text
foreach ($match in $allMatches) {
    Write-Host $match.Value
}
Here we take our random string of text called $alkaneMessage and pipe it into the Select-String cmdlet, specifying our search pattern with -Pattern and adding -AllMatches so that every match is returned rather than only the first.
We then loop through each match and output it using Write-Host.
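As a quick sketch of why -AllMatches matters, the snippet below runs the same search with and without it. The shortened message and the variable names $firstOnly, $everyMatch and $references are illustrative.

```powershell
$alkaneMessage = "Thank you for purchasing the Alkane Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK)."

# Without -AllMatches, only the first match on each input line is recorded
$firstOnly = ($alkaneMessage | Select-String -Pattern 'ALK-\d{7}UK').Matches

# With -AllMatches, every match on the line is recorded
$everyMatch = ($alkaneMessage | Select-String -Pattern 'ALK-\d{7}UK' -AllMatches).Matches

Write-Host "Without -AllMatches: $($firstOnly.Count)"
Write-Host "With -AllMatches: $($everyMatch.Count)"

# Collect just the matched text as plain strings
$references = $everyMatch | ForEach-Object { $_.Value }
$references
```

Note that Select-String is case-insensitive by default. If the product references must be upper case, adding the -CaseSensitive switch would enforce that.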
Of course, in a real-world example the text contained in $alkaneMessage would likely come from a log file, or even multiple log files. In this example, let’s say we have a folder called C:\alkane\logs containing multiple log files from all our orders, and we want to extract all the product references from each log file:
$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-\d{7}UK" -AllMatches
#loop through lines containing matches
foreach ($match in $allMatches) {
    #loop through each match in the line
    foreach ($submatch in $match.Matches) {
        $matchedValue = $submatch.Groups[0].Value
        Write-Host "File path: " $match.Path
        Write-Host $matchedValue "found in: " $match.Line
        Write-Host "Line Number: " $match.LineNumber
        Write-Host ""
    }
}
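If the results need to be sorted, filtered or exported rather than just displayed, one option is to emit an object per match instead of calling Write-Host. The following is a minimal sketch under some assumptions: it creates a temporary folder with two made-up log files so that it can run anywhere, and the folder name, file names, file contents and property names are all hypothetical.

```powershell
# Create a temporary folder with two sample log files (hypothetical content)
$logFolder = Join-Path ([System.IO.Path]::GetTempPath()) 'alkane-logs-demo'
New-Item -ItemType Directory -Path $logFolder -Force | Out-Null
Set-Content -Path (Join-Path $logFolder 'order1.log') -Value 'Ordered ALK-3242334UK and ALK-9876352UK'
Set-Content -Path (Join-Path $logFolder 'order2.log') -Value 'No products here', 'Ordered ALK-6622553UK'

# Emit one object per match instead of writing to the host
$found = Select-String -Path (Join-Path $logFolder '*.log') -Pattern 'ALK-\d{7}UK' -AllMatches |
    ForEach-Object {
        $line = $_
        foreach ($m in $line.Matches) {
            [pscustomobject]@{
                File       = $line.Filename
                LineNumber = $line.LineNumber
                Reference  = $m.Value
            }
        }
    }

$found | Sort-Object Reference | Format-Table -AutoSize

# Tidy up the temporary folder
Remove-Item -Path $logFolder -Recurse -Force
```

Because $found holds objects, it could equally be piped to Export-Csv or Where-Object instead of Format-Table.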
Regular Expression Match Groups
At the risk of overcomplicating things, you’ll note that each match can contain matching groups. In the example above, Groups[0] represents the full match, i.e. the whole of our product reference.
But we can specify groups within a match by using round brackets (parentheses) in our regular expression. Each bracket pairing then becomes Groups[1], Groups[2] and so on, numbered from left to right.
For example, if we just want to extract the 7 digits from our product reference we can enclose the digits in brackets in our regular expression like so:
ALK-(\d{7})UK
And then access it using Groups[1] like so:
$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "ALK-(\d{7})UK" -AllMatches
#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        Write-Host "Full match: " $submatch.Groups[0].Value
        Write-Host "Digits only: " $submatch.Groups[1].Value
    }
}
We could even go bonkers and have three matching groups for “ALK”, the digits and the “UK” like so (notice three bracket pairings in the regular expression):
$allMatches = Select-String -Path "C:\alkane\logs\*.log" -Pattern "(ALK)-(\d{7})(UK)" -AllMatches
#find lines containing matches
foreach ($match in $allMatches) {
    #find each match in the line
    foreach ($submatch in $match.Matches) {
        Write-Host "Full match: " $submatch.Groups[0].Value
        Write-Host "ALK only: " $submatch.Groups[1].Value
        Write-Host "Digits only: " $submatch.Groups[2].Value
        Write-Host "UK only: " $submatch.Groups[3].Value
    }
}
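Numbered groups can become hard to follow once a pattern has several of them. .NET regular expressions also support named groups, written as (?<name>...), which can then be looked up by name. The sketch below is an illustrative variation on the example above. The group names prefix, digits and suffix are my own choice, and it searches a shortened string rather than log files so that it can run on its own.

```powershell
$alkaneMessage = "Laptop (ALK-3242334UK), Keyboard (ALK-9876352UK) and Mouse (ALK-6622553UK)."

# (?<name>...) gives a capture group a name as well as a number
$pattern = '(?<prefix>ALK)-(?<digits>\d{7})(?<suffix>UK)'

$allMatches = ($alkaneMessage | Select-String -Pattern $pattern -AllMatches).Matches

$digits = foreach ($match in $allMatches) {
    Write-Host "Full match: $($match.Value)"
    Write-Host "Digits only: $($match.Groups['digits'].Value)"
    # Return the digits so they are collected in $digits
    $match.Groups['digits'].Value
}
```

Named groups keep the code readable if the pattern later gains or loses a bracket pairing, since the names do not shift the way the numbers do.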
Hopefully that provides a taster of PowerShell Grep and FindStr functionality using Select-String and regular expressions.