There is a lot to say about this topic.
I recently shared at SearchFest how large-scale site search and dynamic content can be problematic for SEO. Not surprisingly, many people asked me for more specific information, both at the conference and online afterwards. This article is an attempt to bring some clarity to the subject.
It's a difficult topic to write about. For one, many of the trends SEO professionals see are the result of industry happenings that are not publicly known. A lot happens in the world of enterprise SEO that will never make front-page news on Search Engine Land, and there are good reasons to keep it confidential. RKG is not in the business of "outing" anyone else in the industry; we are in the business of taking care of our clients and staying on the cutting edge of search. While it's tempting to talk openly about some of what we know, discretion is the prudent course.
But there are things we can learn, and share, from what happens inside the world of SEO. In fact, it's often learning about the inside stuff that gives us glimpses into trends and the future.
"Search Results In Our Search Results"
Google is famous for saying, "We don't want search results in our search results." A few years ago at SMX Advanced, I showed examples of what I felt were great site search SEO strategies that Epicurious was using. A representative from Google (someone I very much respect) was on the panel with me and invoked the infamous quote. Yet only in the last several months have we seen Google become more proactive in discounting certain types of search results and dynamic content. The phrase "certain types of search results" is important here: not all search result pages are 'bad' for SEO. Jamey Barlow, RKG's SEO extraordinaire, says it best:
"Are these pages truly relevant and helpful to a user? Otherwise you are creating these Potemkin pages that are obviously designed to fool the search engines. That's a bad business to be in and it ignores the first rule of SEO: that value to the user is value to the web."
It makes sense that Google would not want blatantly low-quality search result pages ranking well, especially if its data show lower overall user engagement on them. Some of this action may be attributable to Panda-related algorithm updates. When a search result page has thin content and isn't particularly relevant to the terms the site is "going for" in its title and headings, for example, Google's classifiers should be able to discount it automagically.
It may be more than that in the cases we've seen, however. The first example is a Fortune 10 website and household name that has made use of search results to a large extent. While it's known within the organization that this isn't a sustainable strategy, their reality has been formed more passively than aggressively. Within large enterprise companies, projects are not easily implemented (especially big projects). Often a current strength or weakness in SEO is the culmination of months or years of not getting the right things done, rather than a proactive agenda to drive SEO in a particular way.
When we investigated this site several months ago, we found that it depended heavily on site search pages for its non-branded organic traffic. While product pages performed well (a great sign), category pages were stunningly weak. That combination of heavy dependence on site search pages and weak category pages left the site in a highly unsustainable position.
Looking at the data today, we see this site and one of its Fortune 50 competitors both losing ground precipitously in the number of organic keywords they rank for in Google. The data comes from SEMrush, which at best should be considered directional: it is not precise, and it could even be completely wrong. However, based on what we know about the industry and what these sites were doing, it seems unlikely to be a coincidence that their traffic dropped around the time we heard Google was looking more closely at site search and dynamic content.
It's Not Just Site Search
Site search isn't the only potential problem. Dynamic content (pages generated on the fly) can also cause trouble. This is not a blanket rule. Every company and SEO must weigh their experience, the site's strengths and weaknesses, and their appetite for risk when deciding whether to use dynamically generated pages. Done right, they can be powerful tools. Done wrong, they can create large issues for websites.
Dynamic content and site search are both SEO techniques that have been heavily relied on in the past. Four or five years ago this was a fairly novel approach and fewer sites were using it. Today, it's all too common to see it done poorly. There are good technologies available, but I suspect Google is taking a close look at any form of dynamic content that is designed with SEO in mind. It's a slippery slope and a controversial topic, to be sure.
How To Do Site Search and Dynamic Content Right
In our experience, site search pages tend to convert higher than conventional category landing pages on ecommerce sites. Certainly there will be exceptions, but we have seen the patterns too frequently to ignore. If the pages are fast, it makes intuitive sense: shoppers want to see everything a store offers right in front of them, rather than a curated list that may not include what they're looking for. This introduces a bit of tension: high-quality category pages can rank very well, but shoppers convert higher on search result pages. What should you do?
Think about methods to add quality - and relevance - to your site search pages. A site search that is highly relevant, optimized in all the right places (URL, title, headings), and has unique content is no longer a poor quality page; it now has potential to be a targeted and valuable page. Maile Ohye writes,
"...if your site design surfaces category pages similarly to search result pages, adding valuable content to the page makes the content more helpful to the searcher (and no longer just search results)."
The key goes back to what Jamey Barlow said: it's about making a relevant, quality experience. Too many SEO strategies rely on site search because it's easy and it scales well, but they forget to think about the overall user experience.
I'm not saying every site should create unique content for each of its major search result pages. But sites that rely heavily on site search (especially where it already powers category and sub-category navigation) can get more out of those pages by adding content and other quality signals.
Dynamic content is harder to get right. Anything automated involves trade-offs: compared to human-created content, dynamically generated pages usually won't be as high quality, as relevant, or as original. With automated tools you gain exactly that: automation, along with scale and efficiency. What you lose is quality, and potentially relevance and the user experience. To me the most important question to ask is: are these pages high quality, and will our users love them?
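To make "optimized in all the right places (URL, title, headings)" a little more concrete, here is a minimal sketch that derives a clean URL, title and H1 from a search query and attaches optional hand-written intro copy. The `/s/` path, the `SearchPageMeta` fields, and the rule that only pages with an intro are indexable are all hypothetical choices for illustration, not a prescribed scheme.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchPageMeta:
    url: str
    title: str
    h1: str
    intro: Optional[str]
    indexable: bool


def slugify(phrase: str) -> str:
    """Lowercase the phrase and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", phrase.lower()).strip("-")


def headline_case(phrase: str) -> str:
    """Capitalize the first letter of each word without mangling apostrophes."""
    return " ".join(word[:1].upper() + word[1:] for word in phrase.split())


def build_search_page_meta(query: str, site_name: str,
                           intro: Optional[str] = None) -> SearchPageMeta:
    # Collapse stray whitespace so "red   shoes" and "red shoes" map to one page.
    phrase = " ".join(query.split())
    heading = headline_case(phrase)
    return SearchPageMeta(
        url=f"/s/{slugify(phrase)}",  # hypothetical keyword-friendly path
        title=f"{heading} | {site_name}",
        h1=heading,
        intro=intro,
        # Hypothetical rule: only pages with unique, hand-written copy are indexable.
        indexable=bool(intro and intro.strip()),
    )
```

For example, `build_search_page_meta("  Red   running shoes ", "Example Store")` yields the URL `/s/red-running-shoes` and the title `Red Running Shoes | Example Store`, but stays non-indexable until someone writes an intro for it. Pages that never get that copy could fall back to meta noindex.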
The other major issue with site search and dynamic content is that they both can introduce duplication. Site search is infamous for creating infinite variations of pages with the same content but slightly different URLs. Dynamic content can cannibalize a site's 'natural' pages by creating slightly overlapping topical themes and keyword targets that compete with each other. We've seen both cause problems.
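One common way to rein in site search duplication is to normalize every search URL to a single form, which can then serve as the rel canonical target or as a key for spotting duplicate variants in a crawl. The sketch below uses only Python's standard library and assumes a hypothetical list of "presentation-only" parameters (sort, view, session and tracking IDs); which parameters actually change the result set is site-specific.

```python
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change presentation or tracking, not the result set.
IGNORED_PARAMS = {"sort", "view", "sessionid", "utm_source", "utm_medium", "utm_campaign"}


def normalize_search_url(url: str) -> str:
    """Collapse presentation-only variants of a site search URL into one form."""
    parts = urlsplit(url)
    # Keep only result-changing parameters (parse_qsl drops blank values), in a stable order.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in IGNORED_PARAMS
    )
    # Lowercase scheme and host, drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(params), "")
    )


def group_duplicates(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Map each normalized URL to the raw variants that collapse onto it (2+ only)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[normalize_search_url(url)].append(url)
    return {canon: variants for canon, variants in groups.items() if len(variants) > 1}
```

Fed a crawl export, `group_duplicates` would report that `/search?q=red+shoes&sort=price`, `/search?sort=rating&q=red+shoes` and `/search?q=red+shoes&view=grid` all collapse onto `/search?q=red+shoes`, which is the kind of "same content, slightly different URL" bloat described above.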
Technical Methods for Handling Site Search
Keep in mind the following tools:
- Rel Canonical
- Meta Noindex (Follow and Nofollow)
- Robots.txt Disallow
- Nofollow (link attribute)
- Webmaster Tools Parameter Handling
- Rel Prev, Next
Let's look at the particular benefits of each.
- Rel Canonical: this is your go-to for all things duplicate content. Rel canonical tags work much like a "soft 301": they pass equity appropriately while removing the duplicate URL from Google's index. In our experience Bing follows them clumsily, and it still doesn't support them cross-domain. On the downside, a URL annotated with rel canonical must be crawled for the tag to be counted, so this does nothing to make search engine crawling more efficient.
- Meta Noindex: think of this as a method to noindex a URL at the meta level, rather than the link level with nofollow, which we'll cover below. URLs marked with meta noindex will still get crawled, and unless the annotation specifies "nofollow" as well, the links within a noindex'd page will also be crawled. Internal PageRank can still flow through the links on pages marked with 'noindex, follow'. This can be an effective tool and we continue to recommend it in certain cases. However, like the rel canonical tag, meta noindex'd URLs must be crawled to be counted.
- Robots.txt: the sledgehammer of SEO, disallow rules here put a brick wall between your content and Googlebot. This can be a very good thing, but proceed with extreme caution: it is not a subtle tool. Robots.txt is quite effective at keeping Bing, Google, and any other crawler that honors the robots exclusion standard from crawling, but it is much weaker as an indexing signal. Robots.txt-excluded pages don't pass any equity and don't get crawled, and if they're already indexed they may stay in the index or fall out slowly over time. More frequently they become what we term "suppressed listings" in Google's index: a bare URL with no title or snippet. This happens when Googlebot finds a link (usually on another site) to a robots.txt-excluded URL that it cannot crawl.
- Nofollow: the nofollow link attribute is a strange little animal. It does many things: it discounts links that "aren't trusted" or that are paid, it stops equity from passing, and it generally (with exceptions) stops Googlebot and Bingbot from crawling the linked URL. It is a very fine tool, however, and its greatest strength is that it works at the link level rather than the meta level. For links to cart pages, certain overhead facets, sorts, tags and the like, nofollow is still a tremendously useful little tool.
- Parameter Handling: entire posts could be written on the Google and Bing parameter handling tools. They are fantastic, especially Google's, and can work quite effectively. They are focused entirely on how the engines crawl, not on indexation, but because crawling determines what the engines get to see, they can still have great influence over indexing. See this useful article by RKG's own Ben Goodsell for more details.
- Rel Prev, Next: a specific annotation for a series of paginated URLs and quite handy. Please see The Latest & Greatest on SEO Pagination for the nitty gritty details.
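Several of these annotations usually work together on the same page. As a hedged sketch, here is how a template might emit them for one page of a paginated site search result: a self-referencing canonical on pages you want indexed, meta `noindex, follow` (and no canonical, to avoid sending mixed signals) on pages you don't, rel prev/next for the series, and nofollow on sort links. The `page` query parameter, the function names and the example URLs are hypothetical. Google has since said it no longer uses rel prev/next as an indexing signal, so treat that part as optional.

```python
from html import escape
from typing import List


def page_url(base_url: str, page: int) -> str:
    """Return the URL for page N of a search result; page 1 is the bare base URL."""
    if page == 1:
        return base_url
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}page={page}"


def search_page_head(base_url: str, page: int, last_page: int, indexable: bool) -> List[str]:
    """Build head annotations for one page of a paginated site search result."""
    tags: List[str] = []
    if indexable:
        # Self-referencing canonical: each page in the series is its own canonical URL.
        tags.append(f'<link rel="canonical" href="{escape(page_url(base_url, page))}">')
    else:
        # Keep this page out of the index but let equity flow through its links.
        tags.append('<meta name="robots" content="noindex, follow">')
    if page > 1:
        tags.append(f'<link rel="prev" href="{escape(page_url(base_url, page - 1))}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{escape(page_url(base_url, page + 1))}">')
    return tags


def sort_link(href: str, label: str) -> str:
    """Render a sort-order link marked rel="nofollow" to discourage crawling it."""
    return f'<a href="{escape(href)}" rel="nofollow">{escape(label)}</a>'
```

Calling `search_page_head("https://www.example.com/search?q=red+shoes", 2, 3, indexable=False)` returns the noindex tag plus prev/next links; `html.escape` turns the `&` in query strings into `&amp;`, which is correct inside an HTML attribute.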
How To Handle Site Search Pages
All that said, what's the right approach for site search pages? Typically you'll want to use a combination of tools. Robots.txt is the most emphatic and easiest method (if you can live with the PageRank vacuum). However, if you already have tens of thousands of site search pages indexed, you'll want to use meta noindex instead: a robots.txt block would stop the engines from crawling those pages and ever seeing the noindex, so they could linger in the index (just keep the crawl bloat in mind). Parameter handling can be very effective, too, provided your URL query strings are encoded as standard field-value pairs (for example, ?q=shoes&sort=price).
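If you go the robots.txt route, it's worth sanity-checking the rules before they ship. Python's built-in `urllib.robotparser` can do a rough check; the `/search` and `/cart` paths below are hypothetical. This parser does plain prefix matching and does not understand the `*` and `$` wildcards Googlebot supports, so it is only a first pass for rule sets that rely on wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of internal search results and the cart.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, agent: str = "Googlebot") -> bool:
    """True if the given user agent is allowed to crawl the URL under these rules."""
    return parser.can_fetch(agent, url)
```

With these rules, `/products/red-shoes` stays crawlable while `/search?q=red+shoes` and `/cart/checkout` are blocked. Because matching is by prefix, `Disallow: /search` would also block an unrelated page such as `/searchlight-guide`, a good example of why robots.txt is not a subtle tool. If search results live under a `/search/` directory, adding the trailing slash narrows the rule.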
In every case, I would look specifically at the site in question and make a recommendation based on its particular situation. Unfortunately, "it depends" is the only responsible answer here. Further reading:
- Robots.txt Best Practices for SEO, me
- SEO Tips for Ecommerce Sites, Maile Ohye
- Video: Parameter Handling, Maile Ohye
- Crawling to De-Index, Dr. Pete Myers