Here are 9 juicy takeaways from Joachim Kupke's presentation at SMX East in NYC this month. Overall it was a terrific conference, other than the cursed Javits Center constantly causing issues with the wifi (or freezing us, or creating AV headaches). Danny Sullivan (the conference organizer, for those living under rocks) repeatedly said things like, "Javits sucks!" and "Blame Javits, don't blame us!" We blame Javits, Danny. Notably absent from SMX East this year were regular search engine reps like Matt Cutts and Nathan Buggia, but it was great to hear from lesser-known Google and Microsoft folks like Maile Ohye (Google) and Sasi Parthasarathy (Bing). As an SEO, I'm particularly interested in what the search engines have to say about specific technical issues such as indexation, duplicate content, crawling and redirects, and this conference had a couple of great sessions where a lot of that information was discussed. There were a few surprises (elaborated below) and a couple of new announcements, but overall the information shared by Joachim and the other search reps was very specific, and likely too subtle for anyone outside the 'inner realms' of search engine optimization. I love me some inner realms. Let's get to it -- here are my 9 SEO takeaways from Joachim's contributions at SMX East.
Joachim Kupke's Presentation on Duplicate Content

Joachim is on the indexing team at Google. He shared some juicy tidbits on how Google handles duplicate content, along with a lot of insight into how Google 'sees' the web and indexes URLs. Here are the points that stood out to me.
1. Impressions & Clicks

Joachim repeatedly used the terms 'impressions' and 'clicks' in the context of a URL in Google's index. He mentioned that if they see a URL with very few impressions (or none), it will likely take a very long time to be updated in the index (no surprise there). However, URLs with a lot of impressions and clicks (or on domains that are important and crawled frequently) will be updated quickly. This makes sense, but it's interesting to hear a search engineer reinforce it. Those 301s or noindex tags on pages that aren't being re-crawled and updated in Google? Probably because they're very low priority for the engine (yet another reason why big brands rule in SEO).
2. Infrastructure for Handling Duplicate Content

Google is said to have "a ton of infrastructure for duplicate elimination," some of which includes:
- Detection of recurring URL patterns
- Analysis of the contents of a page
- The rel=canonical link tag (if all else fails)
3. Historical Record of URLs

Google keeps a sort of Archive.org of the web, with older versions of content (not really like that at all, but you get the idea: a historical record of pages), giving it the ability to compare the most recently-crawled version of a page with an earlier version. Content that changes can be separated from content that doesn't change within a site. This may also give Google the ability to ascertain where global elements, shingles, and content stubs appear within a site, separately from the definitive, unique and changing content.
4. Google + rel=canonical = Love

Google loves the rel=canonical link tag. It has been, in Joachim's words, "tremendously successful" and has seen exponential adoption on the web. They are treating the tag very seriously; it is a "strong hint," as Maile Ohye told us at SMX Advanced in June of this year, and this was reinforced by both Maile and Joachim at SMX East. It has "huge impact" on Google's canonicalization decisions: 2 out of 3 times, rel=canonical alters the organic decision. This is big, folks.
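As a quick illustration, here's a minimal Python sketch of the pattern being described: several duplicate variants of a page all declaring the same preferred URL via rel=canonical. All URLs and the helper function are my own invented examples, not from the presentation.

```python
# Hypothetical example: duplicate variants of one product page, all hinting
# the same preferred URL to the engines via rel=canonical.
PREFERRED = "http://www.example.com/widgets/blue-widget/"

DUPLICATE_VARIANTS = [
    "http://www.example.com/widgets/blue-widget/?sessionid=123",
    "http://www.example.com/widgets/blue-widget/?sort=price",
    "http://example.com/widgets/blue-widget/",
]

def canonical_tag(preferred_url: str) -> str:
    """The <link> element each duplicate variant would emit in its <head>."""
    return '<link rel="canonical" href="%s" />' % preferred_url

for variant in DUPLICATE_VARIANTS:
    # Every variant points at the same preferred URL -- a strong hint about
    # which version should win canonicalization in the index.
    print(variant, "->", canonical_tag(PREFERRED))
```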
5. 302s are Just Fine for Canonical Targets

302 redirects are fine canonical targets. This was explained at least twice by Joachim, and actually has 2 parts:
- Because of an internal method for handling the trailing slash on URLs, Google needs to have (and recommends all web developers deploy) a trailing slash on canonical targets and internal links. Without the trailing slash, Google will actually add the slash and update the URL in its index. Now, I've found multiple examples of pages where this doesn't happen, but Joachim was pretty firm that it's a web problem in general that Google is forced to work around.
- The takeaway is that you should always add the trailing slash to the absolute URL in the canonical target. If you don't, Google will add it anyway, but adding it proactively should speed up server response times (which may have impact on very large sites).
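To make the trailing-slash advice concrete, here is a rough Python sketch (my own invention, with made-up example URLs) of the kind of normalization described above -- appending the slash unless the path already has one or ends in what looks like a filename:

```python
from urllib.parse import urlsplit, urlunsplit

def ensure_trailing_slash(url: str) -> str:
    """Append a trailing slash to the URL path, unless it already ends in one
    or the last path segment looks like a filename (contains a dot)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

print(ensure_trailing_slash("http://www.example.com/category/page"))
# -> http://www.example.com/category/page/
print(ensure_trailing_slash("http://www.example.com/page.html"))
# -> http://www.example.com/page.html (unchanged -- looks like a file)
```

Doing this proactively in your templates keeps Google from having to rewrite the URL on its end.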
6. How 302 Canonical Targets Could be Abused

302 redirects are fine as canonical targets. Yes, I know I just repeated myself. Here's the interesting part for SEOs: if 302s are OK to use here, I can think of a method to use the rel=canonical tag for SEO purposes without having to do any heavy lifting on URL structure improvement. How? Read on for a theoretical example: A site with very poor URL structure (how about this example) would like to improve its URLs for SEO and usability reasons. However, the developers are swamped, the technical platform is wonky, they don't have enough money for quality SEO, or they simply don't believe it matters enough to change. An SEO comes to them with the following proposition:
- Create a table with search-friendly URL versions of every URL to be improved.
- Add these search-friendly URLs as rel=canonical targets in source code.
- 302 the canonical target to the existing (crappy) URL on the site.
- Presto! Pretty URLs in search results.
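The steps above could be sketched roughly like this in Python. This is a hypothetical illustration only: the URLs, the lookup table, and the handler names are all invented, and a real deployment would live in the site's templating and routing layers.

```python
# Hypothetical lookup table mapping existing (crappy) URLs to
# search-friendly canonical targets. All URLs are invented examples.
PRETTY_URLS = {
    "/product.php?id=42&cat=7": "/widgets/blue-widget/",
    "/product.php?id=43&cat=7": "/widgets/red-widget/",
}
# Reverse mapping, used to 302 the pretty URL back to the real page.
REAL_URLS = {pretty: ugly for ugly, pretty in PRETTY_URLS.items()}

def canonical_tag_for(ugly_url: str) -> str:
    """rel=canonical tag to emit in the <head> of the existing (crappy) page."""
    pretty = PRETTY_URLS.get(ugly_url, ugly_url)
    return '<link rel="canonical" href="http://www.example.com%s" />' % pretty

def handle_request(pretty_url: str):
    """What the server would return when the pretty (canonical) URL is
    requested: a 302 back to the existing page that actually serves content."""
    return 302, REAL_URLS[pretty_url]
```

If the engines honored the pattern as described, the pretty URLs would show in search results while the site keeps serving its existing URLs untouched.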
7. Don't Disallow Your Duplicate Content (?)

Google says "please do not use robots.txt to annotate duplicate content." Content Google can't get to, Google can't know about, and they don't like that. Their preference seems to be "put it all out there, and we'll decide what's best"; anytime content is excluded from search engines, they lose that ability. My personal preference is to take more control, not less, but I understand the thinking behind this and why they'd want to say it.
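For context, here's a small Python sketch (the robots.txt rules and URLs are invented examples) of the kind of exclusion Google is advising against: a duplicate 'print' version blocked in robots.txt, which Google then can't crawl and collapse on its own.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that hides a duplicate "print" version of pages.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /print/",
])

# Googlebot can fetch the main article, but not its duplicate print version --
# so Google never sees the duplicate and can't make the canonicalization call.
print(rp.can_fetch("Googlebot", "http://www.example.com/article-1"))        # True
print(rp.can_fetch("Googlebot", "http://www.example.com/print/article-1"))  # False
```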
8. Indexing May Take Very Long for "Unpopular" URLs

Joachim stated that indexing takes time (as I mentioned previously), but especially for "obscure or unpopular" URLs. And while indexing takes time, cleaning up an "existing part of the index" takes even longer. There are of course ways to request a crawl from Google (which is separate from an index update, of course), but by and large lesser-known sites don't get the same love popular sites do.
9. Cross-domain Support for the Link Canonical

Google will soon be bringing cross-domain support to the canonical tag. This is fairly huge. Yahoo! and Bing both said that they're still working on supporting rel=canonical at all.
Other great stuff at SMX East

There was plenty of other great stuff, too, especially David Mihm, Will Scott, Andrew Shotland, Mike Blumenthal and Mary Bowling on Local SEO Ranking Factors. Local is such an exciting area right now for search marketers, and this crew brought together an amazing session. It got me so fired up that I came back very excited to delve deeper into local.

Update: Laurent Bourrelly (@laurent8) has graciously translated this post into French and posted it here: Redirection 302, contenu dupliqué et autres infos sur l'indexation. Thanks, Laurent!