PowerShell - Get a SubString out of a String using RegEx
Last week one of my colleagues asked me if I could help him with a Regular Expression (RegEx) to select some text inside a string.
I don’t work a lot with RegEx, but when I do, I use tools like PowerRegex from Sapien, RegExr, the TechNet help for about_Regular_Expressions, or RegExlib.com. To be honest, most of the time I try to avoid it and look for a solution the “PowerShell Way” before reaching for RegEx…
Problem
So here is what he asked:
Out of the following string, which is a Distinguished Name:
OU=MTL1,OU=CORP,DC=FX,DC=LAB
he wanted to get the name MTL1 (the site code for Montreal).
Solutions
I came up with the following solutions:
Using PowerShell
("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ",")[0].substring(3)
Using RegEx
("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',*..=')[1]
Note: Please leave a comment if you know a better way, I would be curious to learn more.
Steps to solution: Using PowerShell
First let’s check the methods and properties available using Get-Member
PS C:\> "OU=MTL1,OU=CORP,DC=FX,DC=LAB" | get-member
TypeName: System.String
Name MemberType Definition
---- ---------- ----------
Clone Method System.Object Clone(), System.Object ICloneable.Clone()
CompareTo Method int CompareTo(System.Object value), int CompareTo(string ...
Contains Method bool Contains(string value)
CopyTo Method void CopyTo(int sourceIndex, char[] destination, int dest...
EndsWith Method bool EndsWith(string value), bool EndsWith(string value, ...
Equals Method bool Equals(System.Object obj), bool Equals(string value)...
GetEnumerator Method System.CharEnumerator GetEnumerator(), System.Collections...
GetHashCode Method int GetHashCode()
GetType Method type GetType()
GetTypeCode Method System.TypeCode GetTypeCode(), System.TypeCode IConvertib...
IndexOf Method int IndexOf(char value), int IndexOf(char value, int star...
IndexOfAny Method int IndexOfAny(char[] anyOf), int IndexOfAny(char[] anyOf...
Insert Method string Insert(int startIndex, string value)
IsNormalized Method bool IsNormalized(), bool IsNormalized(System.Text.Normal...
LastIndexOf Method int LastIndexOf(char value), int LastIndexOf(char value, ...
LastIndexOfAny Method int LastIndexOfAny(char[] anyOf), int LastIndexOfAny(char...
Normalize Method string Normalize(), string Normalize(System.Text.Normaliz...
PadLeft Method string PadLeft(int totalWidth), string PadLeft(int totalW...
PadRight Method string PadRight(int totalWidth), string PadRight(int tota...
Remove Method string Remove(int startIndex, int count), string Remove(i...
Replace Method string Replace(char oldChar, char newChar), string Replac...
Split Method string[] Split(Params char[] separator), string[] Split(c...
StartsWith Method bool StartsWith(string value), bool StartsWith(string val...
Substring Method string Substring(int startIndex), string Substring(int st...
ToBoolean Method bool IConvertible.ToBoolean(System.IFormatProvider provider)
ToByte Method byte IConvertible.ToByte(System.IFormatProvider provider)
ToChar Method char IConvertible.ToChar(System.IFormatProvider provider)
ToCharArray Method char[] ToCharArray(), char[] ToCharArray(int startIndex, ...
ToDateTime Method datetime IConvertible.ToDateTime(System.IFormatProvider p...
ToDecimal Method decimal IConvertible.ToDecimal(System.IFormatProvider pro...
ToDouble Method double IConvertible.ToDouble(System.IFormatProvider provi...
ToInt16 Method int16 IConvertible.ToInt16(System.IFormatProvider provider)
ToInt32 Method int IConvertible.ToInt32(System.IFormatProvider provider)
ToInt64 Method long IConvertible.ToInt64(System.IFormatProvider provider)
ToLower Method string ToLower(), string ToLower(cultureinfo culture)
ToLowerInvariant Method string ToLowerInvariant()
ToSByte Method sbyte IConvertible.ToSByte(System.IFormatProvider provider)
ToSingle Method float IConvertible.ToSingle(System.IFormatProvider provider)
ToString Method string ToString(), string ToString(System.IFormatProvider...
ToType Method System.Object IConvertible.ToType(type conversionType, Sy...
ToUInt16 Method uint16 IConvertible.ToUInt16(System.IFormatProvider provi...
ToUInt32 Method uint32 IConvertible.ToUInt32(System.IFormatProvider provi...
ToUInt64 Method uint64 IConvertible.ToUInt64(System.IFormatProvider provi...
ToUpper Method string ToUpper(), string ToUpper(cultureinfo culture)
ToUpperInvariant Method string ToUpperInvariant()
Trim Method string Trim(Params char[] trimChars), string Trim()
TrimEnd Method string TrimEnd(Params char[] trimChars)
TrimStart Method string TrimStart(Params char[] trimChars)
Chars ParameterizedProperty char Chars(int index) {get;}
Length Property int Length {get;}
So how can I get MTL1? Notice that the elements are separated by commas.
Let’s try to split them. There is a Split() method!
("OU=MTL1,OU=CORP,DC=FX,DC=LAB").split(',')
OU=MTL1
OU=CORP
DC=FX
DC=LAB
Awesome… but instead of the Split() method, let’s use the PowerShell -split operator.
PS C:\> "OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ','
OU=MTL1
OU=CORP
DC=FX
DC=LAB
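A small sketch of how the two approaches differ may help here: the Split() method treats its argument as literal characters, while the -split operator treats it as a regular expression. The $fqdn value below is a hypothetical example, not part of the original problem.

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# The Split() method treats ',' as a literal character.
$viaMethod = $dn.Split(',')

# The -split operator treats ',' as a regular expression (here it is the same thing).
$viaOperator = $dn -split ','

# Both give the same four elements for a plain comma.
$viaMethod.Count
$viaOperator.Count

# With a regex metacharacter the two differ: '.' is a literal dot for Split(),
# but means "any character" for -split, so it has to be escaped there.
$fqdn = 'mtl1.fx.lab'              # hypothetical value
$fromMethod   = $fqdn.Split('.')
$fromOperator = $fqdn -split '\.'
```

This is also why the RegEx solution further down can hand a pattern straight to -split.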
Now, out of the 4 items, we want to select the first one. The index [0] will do it.
PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',')[0]
OU=MTL1
Finally, we can use the Substring() method to select the piece of text we want.
The first letter of the site code comes right after the = sign, so it sits at index 3 (indexes are zero-based: O is 0, U is 1, = is 2).
PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',')[0].substring(3)
MTL1
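The steps above can be wrapped in a small helper. This is only a sketch: the function name Get-FirstRdnValue is hypothetical, and it looks up the position of = instead of hard-coding 3, on the assumption that the prefix might not always be two letters long. It does not handle escaped commas inside a Distinguished Name.

```powershell
function Get-FirstRdnValue {
    param(
        [Parameter(Mandatory)]
        [string]$DistinguishedName
    )

    # Take the first comma-separated element, e.g. 'OU=MTL1'.
    $first = ($DistinguishedName -split ',')[0]

    # Locate the '=' sign; return nothing if the element has none.
    $eq = $first.IndexOf('=')
    if ($eq -lt 0) { return $null }

    # Everything after the '=' is the value we want.
    $first.Substring($eq + 1)
}

$site = Get-FirstRdnValue 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'
$site   # MTL1
```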
Steps to solution: Using Regex
A RegEx is a sequence of characters that forms a search pattern, mainly used for pattern matching against strings (for example, validating an email format). RegEx lets you search on positioning, character matching, number of matches, grouping, either/or matching, and backreferences. It is important to note that you can also use RegEx to replace substrings or split your strings.
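As a rough illustration of those three uses (matching, replacing, splitting) in PowerShell, here is a minimal sketch on the same string. The patterns are my own examples chosen for this string, not the only way to write them.

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# Match: does the string start with an OU component? Gives $true or $false.
$startsWithOu = $dn -match '^OU='

# Replace: the pattern covers the whole string, and '$1' keeps only
# what the first capture group ([^,]*) matched.
$siteCode = $dn -replace '^OU=([^,]*).*$', '$1'

# Split: break the string on every comma.
$parts = $dn -split ','

$startsWithOu   # True
$siteCode       # MTL1
$parts.Count    # 4
```

Note the single quotes around '$1': they stop PowerShell from expanding it as a variable before the regex engine sees it.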
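In my solution I used the following pattern: ,*..=
The first part, ,* matches the preceding element (here, the comma) zero or more times.
The second part, ..= matches any two characters followed by =
A period matches one instance of any character (except a newline).
Because the string itself starts with a match (OU=), the first element returned by -split is empty, which is why the site code is at index [1].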
PS C:\> ("OU=MTL1,OU=CORP,DC=FX,DC=LAB" -split ',*..=')[1]
MTL1
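To see why the index is [1] and not [0], it can help to look at everything the split returns. A quick sketch:

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# Split on "optional commas, any two characters, then =".
$parts = $dn -split ',*..='

$parts.Count            # 5
$parts[0].Length        # 0 (empty string, because the input starts with a match)
$parts[1]               # MTL1

# Optional: drop the empty leading element so the values start at index 0.
$values = $parts | Where-Object { $_ }
$values -join ' '       # MTL1 CORP FX LAB
```

Keep in mind that ..= assumes every attribute name is exactly two characters long, which holds for OU, DC and CN.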
Other solutions from the readers
Jay
'OU=MTL1,OU=CORP,DC=FX,DC=LAB' -match '(?<=(^OU=))\w*(?=(,))'
$matches[0]
Robert Westerlund
"OU=MTL1,OU=CORP,DC=FX,DC=LAB" -match "^OU=(?<MTL1>[^,]*)"
$matches["MTL1"]
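Both reader solutions rely on the -match operator filling the automatic $Matches variable. The sketch below restates them with comments. I simplified Jay's pattern by removing the extra capture groups, and the group name site is a hypothetical replacement for MTL1 so that the name describes the content rather than one specific value.

```powershell
$dn = 'OU=MTL1,OU=CORP,DC=FX,DC=LAB'

# Lookaround version: (?<=^OU=) requires 'OU=' at the start, just before the match,
# and (?=,) requires a comma just after it. Neither is part of the match itself.
if ($dn -match '(?<=^OU=)\w*(?=,)') {
    $viaLookaround = $Matches[0]
}

# Named-group version: capture everything up to the first comma into 'site'.
if ($dn -match '^OU=(?<site>[^,]*)') {
    $viaNamedGroup = $Matches['site']
}

$viaLookaround   # MTL1
$viaNamedGroup   # MTL1
```

Testing the result of -match before reading $Matches avoids picking up a stale value left over from an earlier successful match.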
Resources
- TechNet - about_Operators
- TechNet - about_Comparison_Operators
- TechNet - about_Split
- TechNet - about_Join
- PowerShell Cookbook - Appendix B - Regular Expression Reference
- MSDN - Regular Expression Language - Quick Reference (Thanks Jay)
- Scripting Guys Blog - How Can I Create a Phone Directory from Files with Varying Text Formats?
- Scripting Guys Blog - How Can I Convert a Tab-Delimited File to a Comma-Separated Value File?
- Scripting Guys Blog - How Can I See Which Packets Are Being Dropped by Windows Firewall?
- Scripting Guys Blog - Use PowerShell Regular Expressions to Format Numbers
- Scripting Guys Blog - Articles about RegEx