Everything on the internet lives on a server. If servers are the machines responsible for hosting and delivering all of the content you see in your browser, why do so few analytics platforms ask them for statistics on the content they’re serving?
“Server-side” and “client-side” analytics refer to where traffic data is collected: at the origin (server-side) or at the destination, in the user’s browser (client-side). Nearly all contemporary analytics platforms rely on data collected within the browser. That includes self-hosted solutions, even though it’s common to run them on the same physical server that hosts the website they’re tracking.
Come along as we explore some of the reasons for the near-total ascendancy of client-side analytics.
server-side analytics
If you host your websites on servers that you control, it’s theoretically possible to analyze your website traffic by parsing your server access logs. This was the dominant approach to website analytics in the internet’s early days, and it hasn’t entirely faded away: GoAccess is a modern log analyzer built around the same concept, and Grafana can chart access-log data when paired with a log aggregator such as Loki.
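To give a sense of what parsing access logs involves, here is a minimal Python sketch that counts successful page requests per path. It assumes logs in the Combined Log Format (the predefined `combined` format in nginx); `parse_line` and `count_page_views` are hypothetical helper names, not part of GoAccess or any other tool.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident authuser [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the line's fields as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def count_page_views(lines):
    """Count successful (2xx) GET requests per path, ignoring query strings."""
    views = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["method"] != "GET":
            continue
        if not entry["status"].startswith("2"):
            continue
        views[entry["path"].split("?", 1)[0]] += 1
    return views


sample = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /blog/?ref=rss HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"',
]
print(count_page_views(sample))  # Counter({'/blog/': 2})
```

A real setup would also need to filter out requests for CSS, JavaScript, and images, skip malformed lines, and cope with custom log formats, which is where much of the configuration effort described in the shortcomings below comes from.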
server-side analytics: advantages
Tracking user traffic by processing server logs is not without its advantages:
- Server access logs provide a complete and authoritative accounting of how often the resources that make up your site have been fetched by remote machines. This allows you to record traffic by bots and crawlers that don’t execute the JavaScript necessary for client-side analytics.
- Server-side analytics are unaffected by client-side blocking. Many ad blockers and privacy-focused browsers ship with filters that block analytics scripts by default, but a browser can’t fetch a page without the server logging the request.
- Server-side analytics can track how often users download non-HTML resources like PDFs and media files, which a JavaScript-based solution can’t do directly.
- Server-side analytics have access to the client’s IP address and to request headers such as User-Agent, which describe the user’s browser and operating system. Headers can be spoofed, though, and IP addresses can be masked by proxies and VPNs. Storing IP addresses may also run afoul of GDPR and other privacy regulations.
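Several of these advantages come down to inspecting fields that already appear in each log line. The Python sketch below flags probable bots by User-Agent, recognizes downloads by file extension, and truncates IP addresses before storage, which is one common way to reduce how much personal data you keep. The bot pattern, extension list, and prefix lengths are illustrative assumptions, not a standard.

```python
import ipaddress
import re

# Illustrative (not exhaustive) substrings found in crawler User-Agents.
# Any client can send any User-Agent, so this only catches honest bots.
BOT_PATTERN = re.compile(r"bot|crawler|spider|curl|wget|python-requests", re.IGNORECASE)

# Extensions treated as downloads rather than page views (an assumption).
DOWNLOAD_EXTENSIONS = (".pdf", ".zip", ".mp3", ".mp4")


def is_probable_bot(user_agent: str) -> bool:
    return BOT_PATTERN.search(user_agent) is not None


def is_download(path: str) -> bool:
    # Ignore any query string, and match extensions case-insensitively.
    return path.split("?", 1)[0].lower().endswith(DOWNLOAD_EXTENSIONS)


def anonymize_ip(ip: str) -> str:
    """Zero the host part: keep the /24 for IPv4 and the /48 for IPv6."""
    prefix = 24 if ipaddress.ip_address(ip).version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(is_probable_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(is_download("/files/report.PDF?v=2"))                       # True
print(anonymize_ip("203.0.113.7"))                                 # 203.0.113.0
```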
server-side analytics: shortcomings
Server-side analytics also has some major shortcomings:
- Server-side analytics becomes very complex if you have a distributed architecture with multiple servers, because every machine writes its own logs that must be collected and merged. It’s best suited to situations where all traffic flows through a single, central server, like a reverse proxy. This limitation rules out server-side analytics for most contemporary cloud hosting solutions.
- If you deploy a static website that’s served over a CDN or cached by intermediaries, traffic stats from the origin server may not be representative of actual user traffic, because requests answered from a cache never reach the origin.
- Sourcing server-side analytics data is more difficult. Ingesting and parsing access logs is non-trivial and depends on the specific architecture of your servers. You’ll need to understand log rotation and cron, and you’ll spend more time acting as a system administrator than building your website.
- Reporting tools are less mature and require more configuration. There’s a decent chance that regular expressions will become involved at some point. You’ll need to understand the log format of each server you want to monitor, if only to debug the integration issues that arise.
- The server log analysis tools that do exist nowadays aren’t built with web analytics as a primary use case. GoAccess is one of the most popular web log analyzers, but it’s aimed more at system administrators monitoring infrastructure performance than at website publishers.
- Single-Page Applications (SPAs) built on frameworks like React, Vue, or Svelte make it difficult to correlate server requests with actual user browsing behavior. These frameworks typically intercept browser navigation and simulate page loads by fetching new resources from the server and rendering them in the browser. Because JavaScript and CSS resources are often shared across pages, requests for them don’t reveal which page a user is viewing. Moreover, some frameworks preload page data when a user hovers over a link, which can result in server requests for pages that are never actually visited.
- Server-side analytics can’t track how long users spend on a page, or what they’re doing on it. You can’t set up custom events to record when a user clicks on a button or scrolls to the bottom of a page. The scope of server-side tools is much more limited than what you can achieve with client-side analytics.
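To make the multi-server problem concrete: each server writes its own access log, and those logs have to be gathered (including rotated, possibly gzip-compressed files) and merged into a single timeline before any analysis can happen. This hypothetical Python sketch merges per-server logs by the timestamp in each Combined Log Format line. It assumes each input is already in chronological order and that the servers’ clocks agree.

```python
import gzip
import heapq
from datetime import datetime

# Timestamp format inside the [...] field of Common/Combined Log Format lines.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def read_log(path):
    """Yield lines from a log file, transparently decompressing rotated .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        yield from f


def timestamp(line: str) -> datetime:
    """Parse the bracketed, timezone-aware timestamp from a log line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], LOG_TIME_FORMAT)


def merge_logs(*per_server_logs):
    """Lazily merge several chronologically sorted logs into one stream."""
    return heapq.merge(*per_server_logs, key=timestamp)


server_a = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2023:13:55:40 +0000] "GET /about HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
# This server logs in UTC+2, so 15:55:38 here is 13:55:38 UTC.
server_b = [
    '10.0.0.2 - - [10/Oct/2023:15:55:38 +0200] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
for line in merge_logs(server_a, server_b):
    print(line)
```

Timestamps are compared as timezone-aware datetimes, so servers logging in different time zones still merge correctly. Clock drift between machines, on the other hand, silently corrupts the order, which is one more reason this approach favors a single central server.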
Now you can answer the question “why don’t we just parse the server access logs?” Even in my setup, which has a single reverse proxy server whose logs record all my traffic, I’m not going to bother with server-side analytics, because being able to record bot traffic isn’t worth the headache of moving logs around and configuring reporting tools.
client-side analytics
“Client-side analytics” refers to analytics whose data is gathered by a script that runs in the user’s browser.[1] Client-side analytics are the norm these days, and have been since the days of 1990s hit counters.
client-side analytics: advantages
- Easier setup. If you don’t mind complicity in surveillance capitalism, getting started with Google Analytics is as easy as signing up and adding a script tag to your pages. And even if you’re a conspiracy-theorizing masochist who insists on running analytics from your own server, you can get a self-hosted analytics platform up and running in minutes.
- Broader scope. Even lightweight client-side analytics platforms like Umami can record how long users spend on a page. Nearly all tools in this space allow you to configure custom events that trigger when users perform actions like filling out a form or using a mortgage calculator. These additional features mean you can track goal completions rather than just page views, even when the underlying interactions don’t send any requests to your web server.
- Publisher focus. More heavyweight platforms like PostHog can record heatmaps of user interactions within a page and run A/B tests, features that target marketers more than sysadmins. Client-side analytics tools are better suited to tracking user engagement than to tracking the technical details of content delivery.
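Time on page is a good example of a metric that needs an in-browser script. One simple way to estimate it, sketched here in Python under the assumption that the script reports a timestamped page view for each page a visitor opens, is to measure the gap until that visitor’s next page view. This is an illustration of the general idea, not necessarily how Umami or any other tool computes it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PageView:
    path: str
    timestamp: float  # seconds, as reported by the (hypothetical) tracking script


def time_on_page(views: List[PageView]) -> List[Tuple[str, Optional[float]]]:
    """Estimate time on page within one visitor's session.

    Each page's duration is the gap until the next page view. The last page
    has no following event, so its duration is None (unknown).
    """
    ordered = sorted(views, key=lambda v: v.timestamp)
    durations: List[Tuple[str, Optional[float]]] = []
    for i, view in enumerate(ordered):
        if i + 1 < len(ordered):
            durations.append((view.path, ordered[i + 1].timestamp - view.timestamp))
        else:
            durations.append((view.path, None))
    return durations


session = [PageView("/", 0.0), PageView("/pricing", 42.0), PageView("/signup", 100.5)]
print(time_on_page(session))  # [('/', 42.0), ('/pricing', 58.5), ('/signup', None)]
```

The last page of a session has no following page view, so its duration stays unknown. Tracking scripts can fill that gap by sending a final event when the page is hidden, for example with the browser’s `navigator.sendBeacon()`.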
client-side analytics: shortcomings
- Blockability. The main disadvantage of client-side analytics is that many clients block the data-collection script from reporting traffic back to the analytics server.
- Bot blindness. Client-side analytics don’t collect statistics on bots and crawlers that don’t execute JavaScript or fully load linked resources like tracking pixels.
- Invasiveness. Client-side analytics tools with an emphasis on user privacy do exist, but they’re still the exception rather than the rule. Many tools, Google Analytics among them, rely on tracking cookies, which require user consent in some jurisdictions. As a website creator you can choose a platform that respects user privacy, but you’ll have to do more research to find one.
- Third-party data ownership. Sending traffic data to a third-party analytics server necessarily means relinquishing some control over your traffic data. You can get around this by running an analytics platform on your own server, but that’s not something everyone will be comfortable with.
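One way a privacy-focused tool can count unique visitors without cookies or stored IP addresses is to derive a short-lived identifier from request data and a secret salt that rotates daily. The Python sketch below illustrates that idea; the class name, the choice of SHA-256, and the salt size are assumptions rather than any particular product’s implementation, and the day is passed in explicitly only to keep the sketch testable.

```python
import hashlib
import secrets
from datetime import date


class DailyVisitorHasher:
    """Derive cookieless visitor IDs that can't be linked across days.

    The ID hashes a secret salt together with the site's domain and the
    visitor's IP address and User-Agent. A fresh random salt is generated
    for each new day and the previous one is discarded; the raw IP address
    is never stored.
    """

    def __init__(self) -> None:
        self._day = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        if day != self._day:
            self._day = day
            self._salt = secrets.token_bytes(16)  # previous salt is dropped
        return self._salt

    def visitor_id(self, domain: str, ip: str, user_agent: str, day: date) -> str:
        digest = hashlib.sha256(self._salt_for(day))
        for part in (domain, ip, user_agent):
            digest.update(part.encode("utf-8"))
            digest.update(b"\x00")  # separator so adjacent fields can't blur together
        return digest.hexdigest()[:16]


hasher = DailyVisitorHasher()
first = hasher.visitor_id("example.com", "203.0.113.7", "Mozilla/5.0", date(2024, 5, 1))
again = hasher.visitor_id("example.com", "203.0.113.7", "Mozilla/5.0", date(2024, 5, 1))
next_day = hasher.visitor_id("example.com", "203.0.113.7", "Mozilla/5.0", date(2024, 5, 2))
print(first == again, first == next_day)  # True False
```

Because the salt is random and thrown away when the day changes, the same visitor gets a new identifier the next day, and yesterday’s hashes can no longer be recomputed from an IP address and User-Agent.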
client-side analytics: architecture
The architecture of a typical self-hosted analytics platform with client-side data collection looks like this:
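As a rough illustration of the server half of that setup, here is a minimal Python collection endpoint built only on the standard library. It assumes a tracking script in the browser (not shown) POSTs JSON events such as `{"url": ..., "referrer": ...}` to `/api/collect`; the endpoint path, payload fields, and table layout are hypothetical. Events land in SQLite, where a dashboard could later query them.

```python
import json
import sqlite3
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def open_db(path: str = "analytics.db") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(received_at REAL, name TEXT, url TEXT, referrer TEXT)"
    )
    return db


def record_event(db: sqlite3.Connection, payload) -> None:
    """Validate one event sent by the tracking script and store it."""
    if not isinstance(payload, dict) or not isinstance(payload.get("url"), str):
        raise ValueError("event must be a JSON object with a string 'url'")
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (time.time(), str(payload.get("name", "pageview")), payload["url"],
         str(payload.get("referrer", ""))),
    )
    db.commit()


class CollectHandler(BaseHTTPRequestHandler):
    db: sqlite3.Connection  # assigned in serve() before requests arrive

    def do_POST(self) -> None:
        if self.path != "/api/collect":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            record_event(self.db, json.loads(self.rfile.read(length)))
        except ValueError:  # also covers malformed JSON (JSONDecodeError)
            self.send_error(400)
            return
        self.send_response(204)  # success, nothing to send back
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the collector until interrupted (single-threaded, for illustration)."""
    CollectHandler.db = open_db()
    HTTPServer(("127.0.0.1", port), CollectHandler).serve_forever()
```

Calling `serve()` starts the collector. The sketch assumes the tracking script and the endpoint share an origin (for example, both behind one reverse proxy); a cross-origin setup would also need CORS headers, and anything public-facing would want rate limiting and request size limits.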
conclusion
Unless your primary concerns are server load balancing and uptime monitoring, you’re going to end up using a client-side analytics platform (if you use analytics at all). But fear not: there are plenty of options that let you run an analytics platform on your own server and keep your data private. Hopefully you also now have a better understanding of the components that make up a typical client-side analytics platform and how they work together, which will prepare you for the task of setting up your own self-hosted analytics solution.
Footnotes
[1] Data may also be collected through a request made by the user’s browser, such as when a user loads a tracking pixel. Tracking pixels are served from API endpoints that record user data directly, unlike server-side analytics based on access logs.