Filtering Yahoo Mail and Live Mail in Google Analytics

[Note: Please leave a comment if there is another type of filter that you would like to see, or an issue you have with GA data.]

One of the time-consuming and annoying things about Google Analytics is that it handles the sub-domains of Yahoo Mail and Live Mail as separate referral sources. Google's documentation does not adequately explain how to condense these mail programs into a single source, so I will show you how.

First, if you don’t already have one, you should create a Test Profile. If anything goes wrong, you don’t want to screw up your existing profile’s data.

Now make a Custom Advanced Filter:

[Image: Google Analytics Mail Filter]

Extract Campaign Source. In this filter I am extracting anything that ends in mail.yahoo.com and overwriting the source with “mail.yahoo.com”.

  1. Field A -> Extract A: Campaign Source: (.*)\.mail\.yahoo\.com$
  2. Output To -> Constructor: Campaign Source: mail.yahoo.com
  3. Field A Required: Yes
  4. Override Output Field: Yes
  5. Case Sensitive: No

This filter will collect all referral sources that end in .mail.yahoo.com and condense them into the single source mail.yahoo.com. You can create a similar filter for Live or any other e-mail service that shows up as multiple referrers, so your goal conversion tab will be more accurate and useful.
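To see what the filter does before you touch a profile, here is a minimal Python sketch of the same idea. It is only an approximation: Python's re module is not the engine Google Analytics uses, the function name make_mail_filter is made up for this example, the sample sources are hypothetical, and mail.live.com is my assumption for what a Live filter would condense to.

```python
import re

def make_mail_filter(canonical_source):
    """Build a function that condenses sub-domains of one webmail provider."""
    # Same shape as Field A: anything, a literal dot, then the provider host at the end.
    # IGNORECASE mirrors "Case Sensitive: No".
    pattern = re.compile(r"(.*)\." + re.escape(canonical_source) + r"$", re.IGNORECASE)

    def apply(source):
        # Mirrors "Override Output Field: Yes": replace the whole source on a match.
        return canonical_source if pattern.search(source) else source

    return apply

yahoo = make_mail_filter("mail.yahoo.com")
live = make_mail_filter("mail.live.com")  # assumed host for Live Mail

# Hypothetical referral sources, for illustration only.
samples = [
    "us.mc301.mail.yahoo.com",
    "UK.F45.Mail.Yahoo.com",
    "by101w.bay101.mail.live.com",
    "google.com",
]
for source in samples:
    print(source, "->", live(yahoo(source)))
```

The Yahoo and Live sub-domains collapse to one source each, and google.com passes through untouched because neither pattern matches it.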

Why a Custom Filter is Necessary

Google Analytics has a pre-made filter called Search and Replace, but because it does not accept Regular Expression commands, you would need to create a separate filter for every webmail account, rather than this one filter that handles the problem at the provider level.

A note on regular expressions: The extract fields give special meaning to some characters (see below). Make sure that you use a backslash (\) before any period that is supposed to be read as a literal period, otherwise you may get bad results.
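Here is a quick Python sketch of what "bad results" can look like. The source webmailxyahoo.com is a made-up example, and Python's re module stands in for the Google Analytics engine, so treat this as an illustration rather than a test of GA itself.

```python
import re

escaped = re.compile(r"\.mail\.yahoo\.com$")   # periods are literal periods
unescaped = re.compile(r".mail.yahoo.com$")    # each period matches ANY character

lookalike = "webmailxyahoo.com"  # hypothetical source that is not Yahoo Mail

escaped_hit = bool(escaped.search(lookalike))
unescaped_hit = bool(unescaped.search(lookalike))

print("escaped pattern matches:", escaped_hit)
print("unescaped pattern matches:", unescaped_hit)
```

The unescaped pattern matches the lookalike because each bare period is happy to stand in for "b", "x" or a real period, while the escaped pattern rejects it.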

From Google FAQ

Regular Expression Characters


Wildcards

. Matches any single character (letter, number or symbol). Example: goo.gle matches gooogle, goodgle, goo8gle
* Matches zero or more of the previous item. The default previous item is the previous character. Example: goo*gle matches gooogle, goooogle
+ Just like a star, except that a plus sign must match at least one previous item. Example: gooo+gle matches goooogle, but never google
? Matches zero or one of the previous item. Example: labou?r matches both labor and labour
| Lets you do an “or” match. Example: a|b matches a or b
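If you want to try the wildcard examples yourself, the following Python sketch checks each one. It assumes Python's re module behaves like the Google Analytics engine for these basic characters, which is a reasonable but unverified assumption.

```python
import re

# Pattern -> strings the table says it should match.
examples = {
    r"goo.gle": ["gooogle", "goodgle", "goo8gle"],
    r"goo*gle": ["gooogle", "goooogle"],
    r"gooo+gle": ["goooogle"],
    r"labou?r": ["labor", "labour"],
    r"a|b": ["a", "b"],
}

# True for a pattern when every listed string matches it.
results = {
    pattern: all(re.search(pattern, text) for text in texts)
    for pattern, texts in examples.items()
}
for pattern, ok in results.items():
    print(pattern, "matches all of its examples:", ok)

# The plus sign needs at least one more "o" after "goo", so plain google fails.
plus_matches_google = bool(re.search(r"gooo+gle", "google"))
print("gooo+gle matches google:", plus_matches_google)
```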

Anchors

^ Requires that your data be at the beginning of its field. Example: ^site matches site but not mysite
$ Requires that your data be at the end of its field. Example: site$ matches site but not sitescan
Note: to understand why anchors are necessary, please read Tips for Regular Expressions at the bottom of this page.

Grouping

() Use parentheses to create an item, instead of accepting the default. Example: Thank(s|you) will match both Thanks and Thankyou
[] Use brackets to create a list of items to match to. Example: [abc] creates a list with a, b and c in it
- Use dashes with brackets to extend your list. Example: [A-Z] creates a list for the uppercase English alphabet
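The grouping characters are easier to understand when you see what they pick out of a string. This is a small Python sketch with made-up test strings, again assuming Python's re module is close enough to the Google Analytics engine for these basics.

```python
import re

# Parentheses group alternatives so the "or" applies only inside them.
thanks_hits = [w for w in ["Thanks", "Thankyou", "Thank"] if re.search(r"Thank(s|you)", w)]
print(thanks_hits)

# Brackets match any ONE character from the list.
abc_hits = re.findall(r"[abc]", "cabbage")
print(abc_hits)

# A dash inside brackets expresses a range of characters.
upper_hits = re.findall(r"[A-Z]", "Google Analytics")
print(upper_hits)
```

Note that Thank on its own is not matched, because the group requires either s or you to follow it.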

Other

\ Turns a regular expression character into an everyday character. Example: mysite\.com keeps the dot from being a wildcard

Tips for Regular Expressions

  1. Make the regular expression as simple as possible so that you and your colleagues can work with them easily in the future.
  2. Be sure to use a backslash if you have characters like “?” or “.” and you wish to match those literal characters — otherwise, they will be interpreted as special regular expression characters.
  3. Not all regular expressions include special characters. For example, you can specify that a Google Analytics goal be a regular expression, and even if you don’t have any special characters, your goal will be interpreted according to the rules of regular expressions.

Regular expressions are greedy. For example, site matches mysite and yoursite and sitescan. If site is your regular expression, it is the equivalent of asking to match to all strings that contain site. Therefore, you should use anchors whenever necessary, to get a more accurate match. ^site$, which uses both a beginning ^ and ending $ anchor, will ensure that the expression has to start with site and end with site and include nothing else. Notice, too, that there were no special characters in the regular expression site – it is interpreted as a regular expression only if it is in a regular expression-sensitive field.
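The effect of anchors on the site example can be sketched in a few lines of Python. The helper name matching is made up for this illustration, and re.search is used because it looks for the pattern anywhere in the string, which is the behavior the paragraph above describes.

```python
import re

candidates = ["site", "mysite", "yoursite", "sitescan"]

def matching(pattern):
    """Return the candidates that contain a match for the pattern."""
    return [c for c in candidates if re.search(pattern, c)]

unanchored = matching(r"site")     # matches anywhere in the string
start_only = matching(r"^site")    # must begin with site
end_only = matching(r"site$")      # must end with site
exact = matching(r"^site$")        # must be exactly site

print("site   ->", unanchored)
print("^site  ->", start_only)
print("site$  ->", end_only)
print("^site$ ->", exact)
```

Only the pattern with both anchors narrows the list down to site alone.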
