These big data problems probably affect many search engines. They also suggest that there is still room for new start-ups to invent superior search engines. These problems can be fixed with improved analytics and data science.
Here are the problems, and the solutions:
1. Outdated search results. Google does not do a good job of surfacing new or recently updated web pages. Of course, new does not mean better, and Google's algorithm favors old pages with a good ranking on purpose, perhaps because rankings for new pages are less reliable and have less history (that's why we created statistical scores to rank web pages with no history). To work around this, the user can add 2013 to Google searches, and Google could do that too, by default. For instance, compare search results for the query data science with those for data science 2013. Which ones do you like best? Better yet, Google should allow you to choose between "recent" and "permanent" search results when you run a query.
The issue here is correctly dating web pages, a difficult problem since webmasters can use fake time stamps to fool Google. But since Google indexes most pages every couple of days, it can maintain its own time stamps and keep two dates for each (static) web page: the date it was first indexed and the date it was last modified. You also need to store a 128-bit signature (in addition to related keywords) for each web page, to easily detect when it has been modified. The problem is harder for web pages generated on the fly.
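Here is a minimal sketch of that bookkeeping in Python, assuming the crawler keeps one record per URL; the field names, and MD5 as the 128-bit signature, are illustrative choices, not Google's actual implementation:

```python
import hashlib
from datetime import datetime, timezone

# In-memory stand-in for the index's page store: url -> record.
# A real crawler would persist this; the structure is illustrative.
page_store = {}

def signature(html):
    """128-bit content signature (MD5 here, purely as an example)."""
    return hashlib.md5(html.encode("utf-8")).hexdigest()

def record_crawl(url, html, now=None):
    """Maintain first-indexed / last-modified dates for a static page."""
    now = now or datetime.now(timezone.utc)
    sig = signature(html)
    rec = page_store.get(url)
    if rec is None:
        # First time the page is indexed: both dates start at crawl time.
        rec = {"first_indexed": now, "last_modified": now, "signature": sig}
    elif rec["signature"] != sig:
        # Content changed since the last crawl: refresh the modified date.
        rec["last_modified"] = now
        rec["signature"] = sig
    # If the signature is unchanged, both dates stay as they are.
    page_store[url] = rec
    return rec
```

Comparing the stored signature with the one computed at crawl time is enough to tell whether the page changed, without keeping full copies of previous versions.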
2. Wrongly attributed articles. You write an article on your blog. It then gets picked up by another media outlet, say the New York Times. Google displays the New York Times version at the top, and sometimes does not display the original version at all, even if the search query is the title of the article, using exact match. One might argue that the New York Times is more trustworthy than your little unknown blog, or that your blog has a poor page rank. But this has two implications:
One easy way for Google to fix the problem is again to correctly identify the first version of an article, as described in the previous paragraph.
3. Favoring irrelevant web pages. Google generates a certain number of search result impressions per week for every website, and this number is extremely stable. It is probably based on the number of pages, the keywords, and the popularity (page rank) of the website in question, as well as a bunch of other metrics (time to load, proportion of original content, niche vs. generic website, etc.). If Google shows exactly 10,000 impressions per week for your website, which page / keyword matches should Google favor?
Answer: Google should favor pages with low bounce rate. In practice, it does the exact opposite.
However, one might argue that if the bounce rate is high, maybe the user found the answer to his question right away on your landing page, and the user experience is actually great. In our case (regarding our websites) we disagree, as each page displays links to similar articles and typically leads to subsequent page views. Indeed, our worst bounce rate is associated with Google organic searches. More problematic is the fact that the bounce rate from Google organic is getting worse (while it's getting better for all other traffic sources), as if Google's algorithm lacks machine learning capabilities, or is doing a poor job with the new pages we add daily. In the future, we will write longer articles broken down into 2 or 3 pages. Hopefully, this will improve our bounce rate from Google organic (and from other sources as well).
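To make the metric concrete: a bounce is a single-page visit (entry page = exit page), so the bounce rate per traffic source can be recomputed from a raw session log and tracked over time. Below is a small sketch in Python with pandas, run on a hypothetical export with one row per visit; the column names are illustrative and do not correspond to an actual Google Analytics schema.

```python
import pandas as pd

# Hypothetical session-level export: one row per visit.
sessions = pd.DataFrame({
    "source":    ["google_organic", "google_organic", "linkedin",
                  "linkedin", "reddit", "google_organic"],
    "month":     ["2013-05", "2013-06", "2013-05",
                  "2013-06", "2013-06", "2013-06"],
    "pageviews": [1, 1, 3, 2, 1, 2],
})

# A bounce is a single-page visit: one pageview, entry page = exit page.
sessions["bounce"] = sessions["pageviews"] == 1

# Bounce rate per traffic source and per month, to see which sources
# are trending better or worse over time.
bounce_rate = (
    sessions.groupby(["source", "month"])["bounce"]
            .mean()
            .unstack("month")
)
print(bounce_rate)
```

Ranking the rows of that table by their most recent column gives the same kind of comparison across traffic sources that comes up in the comments below.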
Comments
Another area of concern is SEO companies that deliberately damage your Google page rank with bad SEO practices, then contact you, "offering" to fix your poor rankings for a fee.
Oleg: A bounce rate of 50% is far better than average, if you compute the average over thousands of websites (via Google Analytics; the Alexa computation gives very different numbers, but I think they are not as accurate). Besides bounce rate, other metrics measure user interest, such as "time spent", "visit depth" (pages per visit), "number of actions by user", etc.
Of course, all these metrics have limitations: if you split a long page into two pages, suddenly all your metrics improve (except the number of users and the number of visits), but the improvement is purely artificial. However, page splitting does genuinely improve one thing: the chance that the visitor clicks on a banner ad, especially if banner ads are rotating. So it does indirectly increase revenue.
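A quick worked illustration of that artificial improvement, with made-up numbers and the optimistic assumption that every reader clicks through to the second page:

```python
# 100 visitors, each reading one long article and then leaving.
visits = 100

# Before the split: every visit is a single-page visit.
pageviews_before = visits * 1
bounce_rate_before = 1.0                             # every visit is a bounce
pages_per_visit_before = pageviews_before / visits   # 1.0

# After the split: the same reading behavior now produces two pageviews
# per visit (page 1 -> page 2), so no visit counts as a bounce.
pageviews_after = visits * 2
bounce_rate_after = 0.0
pages_per_visit_after = pageviews_after / visits     # 2.0

# Users and visits are unchanged; only the per-page metrics moved.
print(bounce_rate_before, pages_per_visit_before)    # 1.0 1.0
print(bounce_rate_after, pages_per_visit_after)      # 0.0 2.0
```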
Vincent, don't you think that the current definition of bounce rate is a bit illogical? For instance, someone visits the main page of a company, sees the 'jobs' page and clicks on it, thus moving to another page, while still remaining on pages of the same organization. Is such behavior treated as a bounce or not?
Thank you, Vincent. But why do you consider a bounce rate of 55% very good? I think a really good bounce rate needs to be 25-30%, i.e., close to its actual minimum.
@Oleg: A bounce is a user visiting a webpage and then leaving the website right away. In other words, it's a single-page visit to a website, with entry page = exit page.
By "worse", I mean Google.com (organic) bounce rate went from (say) 76% to 79%, over several months, while the number of clicks-in remained very stable. I ranked our traffic sources by bounce rate, and clearly, Google has the worst among all large traffic sources. The only one that was worse was Google paid traffic. Stumbleupon and Reddit have a worse bounce rate (above 90%), due to the way it works - they deliver traffic spikes very infrequently to non-targeted users. LinkedIn, Google+ and our internal properties have very good bounce rates, some below 55%.
Hello Vincent,
How would you count a bounce? As far as I understood from your blog, you take into account not only the time spent on a web page but also whether the visitor clicked on one of the links on that page, right?
What did you mean by saying that the bounce rate from Google organic search is getting worse? I think Google returns information relevant to the search on the first few result pages (of course, this depends a lot on the ability of the visitor to formulate the right query). If a user is inexperienced or doesn't know precisely what to look for, his queries are vague, and it is not Google's fault if irrelevant pages are returned, leading to a high bounce rate.