How Semrush Turns Traffic Data Into Traffic Intelligence

You may have been wondering where the traffic intelligence you see within our Traffic & Market toolkit comes from.

This article walks through the core processes, from raw data collection to the ready-to-use insights visible within the tools.

All the data goes through four key stages:

Data collection
Data cleaning
Data modeling
Data delivery

Infographic: How Semrush turns traffic data into traffic intelligence

Data Collection

Every one to two days, we receive terabytes of data from various third-party data providers. This is called clickstream data: it offers an aggregated view of the online journeys of millions of real but anonymized internet users.

Clickstream data lets us identify general user behavior statistics and trends.

Data Cleaning

In the traffic analytics system, all the data is aggregated and converted to a common format.
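As a rough illustration of what converting data to a common format can involve, here is a minimal Python sketch. It assumes two hypothetical providers that deliver the same kind of page-visit record with different field names and timestamp formats. The field names, the `Event` type, and the `normalize_provider_*` helpers are invented for illustration and are not Semrush's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical common record: one anonymized user visiting one URL."""
    user_id: str
    url: str
    timestamp: datetime  # always timezone-aware UTC
    country: str         # upper-case country code
    device: str          # lower-case device type


def normalize_provider_a(raw: dict) -> Event:
    # Hypothetical provider A: Unix epoch seconds, lower-case country codes.
    return Event(
        user_id=raw["uid"],
        url=raw["page"],
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
        country=raw["geo"].upper(),
        device=raw["platform"].lower(),
    )


def normalize_provider_b(raw: dict) -> Event:
    # Hypothetical provider B: ISO 8601 strings with an explicit UTC offset.
    return Event(
        user_id=raw["visitor"],
        url=raw["url"],
        timestamp=datetime.fromisoformat(raw["time"]).astimezone(timezone.utc),
        country=raw["country_code"].upper(),
        device=raw["device_type"].lower(),
    )


# The same moment and attributes, reported in two different shapes.
a = normalize_provider_a({"uid": "u1", "page": "https://example.com/", "ts": 1700000000,
                          "geo": "us", "platform": "Mobile"})
b = normalize_provider_b({"visitor": "u2", "url": "https://example.com/",
                          "time": "2023-11-14T22:13:20+00:00",
                          "country_code": "us", "device_type": "MOBILE"})
```

Once every record shares one shape, the later stages (cleaning, normalization, modeling) only need to handle a single format.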

We then use our proprietary machine learning model to clean the data of various anomalies.

As the model keeps learning, it gets better at recognizing patterns, which lets it pinpoint anomalies and separate questionable data from representative data more reliably.
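Semrush's anomaly model is proprietary and not described here. To give a feel for what "pinpointing anomalies" means in practice, the sketch below swaps in a simple, well-known statistical technique instead: the modified z-score, which measures how far each value lies from the median in units of the median absolute deviation (MAD). The function name and the 3.5 threshold are assumptions for illustration only.

```python
from statistics import median


def robust_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of values whose modified z-score exceeds `threshold`.

    The median and the median absolute deviation (MAD) are used instead of
    the mean and standard deviation because they are far less distorted by
    the very outliers we are trying to find.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Most values are identical; treat anything different as anomalous.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data is normally distributed.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# A week of daily visit counts with one suspicious spike.
daily_visits = [1020, 980, 1005, 995, 15000, 1010, 990]
print(robust_outliers(daily_visits))  # [4]
```

A learned model can take far more context into account (country, device, site category), but the goal is the same: flag values that do not fit the surrounding pattern before they distort the metrics.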

We also cross-check the data against Semrush’s backlinks and organic SERP positions databases to verify that it matches the specifics of each country and device.

Once our algorithm has reviewed the data, we have a more realistic picture of typical users’ sessions. This is the dataset on which we build our engagement metrics.

Data Modeling and Delivery

At this stage, the clickstream data and our proprietary data are stored together in a big data repository.

Before this data enters our machine learning model, it goes through one more check: we normalize it, taking into account each domain’s popularity as well as “typical” user behavior across countries, demographics, devices, and industries.

For instance, a US user who goes online only once a month is more likely to visit Google (a popular domain) than the FDA’s website (a less visited domain). We therefore remove the share of users with very weak activity patterns to obtain more accurate data for both popular and less visited websites.
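The filtering step described above can be sketched as follows. This is a minimal illustration under an assumed definition: a user counts as having "very weak activity" if they were active on fewer than `min_active_days` distinct days in the period. Both that definition and the default threshold of 3 are hypothetical; the article does not say how Semrush draws the line.

```python
from collections import defaultdict
from datetime import date

Visit = tuple[str, date, str]  # (user_id, day, domain)


def drop_weak_users(visits: list[Visit], min_active_days: int = 3) -> list[Visit]:
    """Keep only visits from users active on at least `min_active_days` distinct days.

    The threshold is a hypothetical stand-in for whatever definition of
    "very weak activity" a real pipeline would use.
    """
    active_days: dict[str, set[date]] = defaultdict(set)
    for user_id, day, _domain in visits:
        active_days[user_id].add(day)
    return [v for v in visits if len(active_days[v[0]]) >= min_active_days]


visits = [
    ("heavy", date(2024, 3, 1), "google.com"),
    ("heavy", date(2024, 3, 2), "fda.gov"),
    ("heavy", date(2024, 3, 5), "google.com"),
    ("rare", date(2024, 3, 20), "google.com"),  # a once-a-month user
]
kept = drop_weak_users(visits)
print(sorted({user for user, _, _ in kept}))  # ['heavy']
```

Without this step, the once-a-month user's single visit would add weight to an already popular domain while telling us little about how people use less visited sites.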

This way, the data we feed into our machine learning model is more meaningful.

The model is trained through supervised learning, meaning it learns from labeled examples, and it keeps improving as new data arrives every day.
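The article does not describe the model itself, so the following is only a hypothetical, deliberately tiny example of the supervised-learning idea: learn from domains whose real traffic is known (the labels) how panel-observed visits relate to total visits, then apply what was learned to domains seen only in the panel. It fits a one-parameter linear model by least squares; real traffic models are far richer, and the function name and numbers here are invented for illustration.

```python
def fit_scale(panel_visits: list[float], true_visits: list[float]) -> float:
    """Least-squares slope through the origin, minimizing sum((y - a*x)**2).

    x is the number of visits observed in a (hypothetical) panel and y is the
    known total traffic for the same domain, i.e. the training label.
    """
    numerator = sum(x * y for x, y in zip(panel_visits, true_visits))
    denominator = sum(x * x for x in panel_visits)
    if denominator == 0:
        raise ValueError("need at least one non-zero panel observation")
    return numerator / denominator


# Labeled training examples: domains whose real traffic is known.
scale = fit_scale([100, 250, 40], [52_000, 128_000, 19_500])

# Estimate total traffic for a domain observed only in the panel.
estimate = scale * 75
```

Retraining on each day's new labeled data is what would let such a model keep improving over time.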

Daily and Weekly Traffic Data

Semrush offers daily and weekly data in the Traffic & Market dashboards. This feature comes with a new AI model that provides finer granularity, higher accuracy, and greater stability for traffic data.

Previously, we processed data only on a monthly basis; the new model processes data daily, which lets us provide daily and weekly traffic metrics for competitor domains.
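Once metrics exist at daily granularity, weekly figures can be derived from them. The sketch below shows one straightforward way to do this for an additive metric such as visits: sum the daily counts per ISO week. The function name and input shape are assumptions for illustration, not how Semrush's dashboards are implemented.

```python
from collections import defaultdict
from datetime import date


def weekly_totals(daily: dict[date, int]) -> dict[tuple[int, int], int]:
    """Sum daily visit counts into ISO weeks keyed by (ISO year, ISO week)."""
    weeks: dict[tuple[int, int], int] = defaultdict(int)
    for day, visits in daily.items():
        iso_year, iso_week, _weekday = day.isocalendar()
        weeks[(iso_year, iso_week)] += visits
    return dict(weeks)


daily = {
    date(2024, 1, 1): 120,  # Monday, ISO week 1 of 2024
    date(2024, 1, 7): 80,   # Sunday, same ISO week
    date(2024, 1, 8): 95,   # Monday, ISO week 2
}
print(weekly_totals(daily))  # {(2024, 1): 200, (2024, 2): 95}
```

Summing works only for additive metrics like visits. Weekly unique visitors, for example, cannot be obtained by adding up daily unique visitors, because the same person may visit on several days of the week.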

Infographic: How data collection has changed for the .Trends tools

Because the improved AI model offers higher-fidelity data, it brings our prior estimates into sharper focus, which may cause some metrics to shift.

On Semrush’s Traffic Data Coverage

Data quality can always be improved, so we are constantly adding new data to our tools while our AI and big data technology keep refining their algorithms.

We’ve recently updated our data processing model for gathering traffic insights, which allowed us to expand our traffic data coverage by 20%.

The infographic below shows exactly what changed.

Infographic: How the data processing model has changed

*An event records that a user visited a certain webpage.

**Sessions are a set of actions a user makes with a given website during a limited timeframe. In Semrush Traffic & Market, we refer to sessions as visits.
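To make the difference between events and sessions concrete, here is a minimal sketch of one common way to group a user's events on a website into sessions: consecutive events belong to the same session as long as the gap between them stays within an inactivity timeout. The 30-minute timeout is a widespread web-analytics convention used here as an assumption; the article does not state which timeframe Semrush uses.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps: list[datetime],
                   timeout: timedelta = timedelta(minutes=30)) -> int:
    """Count sessions (visits) for one user on one website.

    Events stay in the same session while the gap between consecutive
    events is at most `timeout`; a longer gap starts a new session.
    """
    if not timestamps:
        return 0
    ordered = sorted(timestamps)
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > timeout:
            sessions += 1
    return sessions


events = [
    datetime(2024, 5, 1, 9, 0),
    datetime(2024, 5, 1, 9, 10),  # 10 minutes later: same session
    datetime(2024, 5, 1, 9, 25),  # 15 minutes later: same session
    datetime(2024, 5, 1, 14, 0),  # hours later: a new session
]
print(count_sessions(events))  # 2
```

Here four events collapse into two sessions, which is why event counts and visit counts for the same website can differ substantially.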