Parse URLs with RegEx
Today I struggled with a RegEx expression I needed to incorporate into Crazyegg.com. I needed it to be inclusive of several criteria (below), and I think I figured out something that is universal and isn’t already somewhere else on the internet.
What I’m trying to do is be inclusive of all “normal” variations of the beginning of a URL: HTTP, HTTPS, WWW and the naked domain.
- Has to parse if it’s HTTP or HTTPS
- Has to parse if it has WWW (with or without http or https)
- Neither of the above has to be present (so it can be a naked domain – example.com)
So here is the expression I came up with. It’s just regular JS regex:
^((http(s?):\/\/)?(\/)?(www\.|\/)?yoursite\.com)
Let me explain what is happening here.
- (http(s?):\/\/)? – This capturing group matches http://, and the (s?) makes the “s” optional, so https:// matches too – Cool… Check! The ? at the end makes the whole group optional, so the URL doesn’t have to start with a protocol at all.
- (\/)? – This capturing group matches one optional extra slash. It’s there because we use SharePoint and sometimes we get weird URLs being reported, so you can probably delete it unless you’re having the same issue or using a regex in GA.
- (www\.|\/)? – This capturing group matches either www. or a /, and the trailing ? makes it optional.
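To see the three groups working together, here is a minimal sketch that runs the expression against a handful of URLs. The sample URLs and the matchesSite helper are made up for this illustration, and yoursite.com is still just the placeholder domain. Two things worth noticing when you run it: the expression is case-sensitive as written (add the i flag if you expect HTTP:// in capitals), and it only anchors the start of the string, so anything after yoursite.com is ignored.

```javascript
// The expression from this post, with "yoursite.com" as the placeholder domain.
const siteRegex = /^((http(s?):\/\/)?(\/)?(www\.|\/)?yoursite\.com)/;

// Hypothetical helper: true when the URL starts with an accepted variation.
function matchesSite(url) {
  return siteRegex.test(url);
}

// Hypothetical sample URLs, invented for this illustration.
const samples = [
  "http://yoursite.com",              // protocol, no www
  "https://yoursite.com",             // optional "s"
  "http://www.yoursite.com",          // protocol plus www
  "https://www.yoursite.com/pricing", // trailing path is ignored
  "www.yoursite.com",                 // www without a protocol
  "yoursite.com",                     // naked domain
  "//yoursite.com",                   // the stray-slash case
  "https://blog.yoursite.com",        // subdomain: not covered
  "https://othersite.com",            // different domain
];

for (const url of samples) {
  console.log(url, "->", matchesSite(url));
}
```

Everything up to and including //yoursite.com should print true, and the last two should print false.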
The great thing about this expression is that it pretty much covers every possible combination with the exception of subdomains. To address subdomains we could easily just add them to the third group (the www one) as another OR option.
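As a rough sketch of the subdomain idea, the helper below rebuilds the same expression with extra OR options in the www group. The function name buildSiteRegex, the escapeRegex helper and the "blog" and "shop" subdomains are all assumptions made for this example, not part of the original expression.

```javascript
// Escape characters that have a special meaning in a regex (e.g. the dot in ".com").
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: same shape as the post's expression, but the third
// group gets one extra "name\." option per subdomain passed in.
function buildSiteRegex(domain, subdomains) {
  const options = ["www", ...subdomains]
    .map((name) => escapeRegex(name) + "\\.")
    .concat("\\/"); // keep the stray-slash option from the original
  return new RegExp(
    "^((http(s?):\\/\\/)?(\\/)?(" + options.join("|") + ")?" + escapeRegex(domain) + ")"
  );
}

// "blog" and "shop" are made-up subdomains for illustration.
const withSubdomains = buildSiteRegex("yoursite.com", ["blog", "shop"]);

console.log(withSubdomains.source);
console.log(withSubdomains.test("https://blog.yoursite.com")); // listed subdomain
console.log(withSubdomains.test("https://mail.yoursite.com")); // unlisted subdomain
```

Listing subdomains explicitly keeps the match strict. If any subdomain should count, a wildcard option would be shorter, but it would also let in subdomains you may not want in a report.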
I think this is one of the most complete regex expressions that tackles the problem of HTTP, HTTPS and WWW or nothing that I’ve seen… Believe me, I’ve looked.
Why use RegEx in Analytics?
RegEx is a simple language that gives me higher levels of granularity and control over what to include or not include in my reports.
regexr.com – This is a great validation tool that highlights matches in realtime
regex101.com – This tool is pretty good too, but I like it because it also gives you a complete realtime explanation of what you’re creating AND a better quick reference guide.
When building expressions I normally use both of these tools at the same time in separate windows, and that beats the pants off any paid-for version.
You can also try:
regexpal.com – a simplified version if you’re in need of validation with no tips (unless you buy the paid version)
debuggex.com – has a pre-created library that you can access if you sign in
Regular Expression iOS app – This app is pretty good and comes with a lot of great resources… Best of all, it’s free!
regxlib.com – A library of various regex expressions you can use. Most are for specific purposes.