After being stumped by an earlier question (SO: google-analytics-domain-data-without-filtering),
I've been experimenting with a very basic analytics system of my own.
MySQL table:
hit_id, subsite_id, timestamp, ip, url
The subsite_id lets me drill down to a folder (as explained in the previous question).
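For illustration, the table and the act of logging a hit might look roughly like the sketch below. It is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL; the column types, the ISO 8601 UTC timestamp format, and the helper names `open_db` and `log_hit` are assumptions, not part of the original setup.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical stand-in for the MySQL hits table; column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hits (
    hit_id     INTEGER PRIMARY KEY AUTOINCREMENT,
    subsite_id INTEGER NOT NULL,  -- maps a hit to a folder / subsite
    timestamp  TEXT    NOT NULL,  -- ISO 8601 string, UTC (assumed format)
    ip         TEXT    NOT NULL,
    url        TEXT    NOT NULL
)
"""

def open_db(path=":memory:"):
    """Open a connection and make sure the hits table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_hit(conn, subsite_id, ip, url, when=None):
    """Record one page view; the timestamp defaults to the current UTC time."""
    when = when or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO hits (subsite_id, timestamp, ip, url) VALUES (?, ?, ?, ?)",
        (subsite_id, when.isoformat(), ip, url),
    )
    conn.commit()
```

Usage would be along the lines of `conn = open_db()` followed by `log_hit(conn, 1, "198.51.100.7", "/folder-a/")` on each request; with MySQL the same idea applies, using `AUTO_INCREMENT` and the driver's own placeholder style.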
I can now get the following metrics:
- Page Views - Grouped by subsite_id and date
- Unique Page Views - Grouped by subsite_id, date, url, IP (not necessarily how Google does it!)
- The usual "most visited page", "likely time to visit", etc.
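The groupings in the list above can be sketched in plain Python over rows in the table's column order `(hit_id, subsite_id, timestamp, ip, url)`. This is an illustrative sketch, assuming `timestamp` has already been parsed into a `datetime`; the function names and the sample rows are made up for the example. "Unique" here means distinct (url, IP) pairs per subsite per day, following the grouping described above rather than Google's definition.

```python
from collections import Counter
from datetime import datetime

def page_views(rows):
    """Page views: every hit counts, grouped by (subsite_id, date)."""
    return Counter((sub, ts.date()) for _hit, sub, ts, _ip, _url in rows)

def unique_page_views(rows):
    """Unique page views: distinct (url, ip) pairs per (subsite_id, date)."""
    seen = {(sub, ts.date(), url, ip) for _hit, sub, ts, ip, url in rows}
    return Counter((sub, day) for sub, day, _url, _ip in seen)

def most_visited(rows, subsite_id, n=5):
    """Top-n URLs by raw hit count for one subsite."""
    urls = Counter(url for _hit, sub, _ts, _ip, url in rows if sub == subsite_id)
    return urls.most_common(n)

# Made-up sample rows, for illustration only.
SAMPLE = [
    (1, 1, datetime(2010, 3, 1, 9, 0), "198.51.100.7", "/a/"),
    (2, 1, datetime(2010, 3, 1, 9, 5), "198.51.100.7", "/a/"),   # repeat view
    (3, 1, datetime(2010, 3, 1, 10, 0), "198.51.100.8", "/a/"),
    (4, 1, datetime(2010, 3, 1, 11, 0), "198.51.100.7", "/b/"),
]
```

On the sample, subsite 1 gets 4 page views but 3 unique page views on that day, because the second hit repeats the same (url, IP) pair. In MySQL the equivalent would be a `COUNT(*)` versus a `COUNT(DISTINCT url, ip)` grouped by `subsite_id` and `DATE(timestamp)`.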
I've now compared my data with that in Google Analytics and found that Google reports lower values for each metric, i.e. my own setup is counting more hits than Google.
So I've started discounting IPs from various web crawlers (Google, Yahoo and Dotbot so far).
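Discounting crawler IPs can be sketched with Python's standard `ipaddress` module by matching each hit's IP against a list of networks. The ranges below are placeholders taken from the RFC 5737 documentation blocks, not real crawler addresses; an actual list would have to be collected and kept up to date separately. The helper names are hypothetical.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT real crawler IPs.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

def is_crawler(ip, networks=CRAWLER_NETWORKS):
    """True if the address falls inside any listed crawler network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

def without_crawlers(rows, networks=CRAWLER_NETWORKS):
    """Drop hits whose ip (column index 3 in hit_id, subsite_id,
    timestamp, ip, url order) belongs to a crawler network."""
    return [row for row in rows if not is_crawler(row[3], networks)]
```

Matching on networks rather than single addresses means one entry can cover a crawler's whole published block, so the list stays short even when individual bot IPs rotate.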
Short Questions:
- Is it worth compiling a list of all major crawlers to discount? Is any such list likely to change regularly?
- Are there any other obvious filters that Google will be applying to GA data?
- What other data would you collect that might be of use further down the line?
- What variables does Google use to work out entrance search keywords to a site?
The data is only going to be used internally for our own "subsite ranking system", but I would like to show my users some basic data (page views, most popular pages, etc.) for their reference.