In Part I: SEO Secrets of AskApache.com we talked about content and building a website to be your "SEO Base". This article discusses some advanced SEO concepts for getting a site indexed, moving your best pages higher in search results, and controlling the pagerank/seo-juice of your site. These are advanced but relatively easy ways to control and tweak WHAT urls on your site are indexed, and HOW. I've heard some people refer to this as "controlling pagerank flow" and "controlling pagerank juice". Basically, we want our best pages to rank higher in the search engine results.

Big Picture: Going from no website to AskApache.com in less than a year can be accomplished by anyone with unique content and a resolve to avoid any shortcuts and take it one step at a time.

We want what Google wants: to provide the most relevant content for someone who is doing a search. Basically, you want every url on your site that has unique content to be included in the index. In other words, you have to think like a search engine!


The Goal of Google

Here's what I mean: Google approaches search with the overwhelming goal of bringing a searcher the content that is most likely to be what that searcher is looking for. Another way of looking at it is something I read on Google...

Google's goal is to get you off of their site as fast as possible by providing you with exactly what you are looking for.

Think about SEO like this

If you search for htaccess tutorial for seo on Google, would you be more likely to visit a tutorial about using .htaccess for seo or a category page for htaccess articles? AskApache.com has both of those urls included in the index, but the article ranks higher than the category page, as it very well should.


Get Your URLS Indexed

Before I explain how I am able to help Google and other search engines rank my article pages higher than my category pages, we need to get the urls in the index or nothing will show up. There are many well-discussed methods for getting included in the index, so I'll just list a few that I use.

  1. Provide High-Quality original content, people will link to it
  2. Get a sitemap and use Google Webmaster Tools
  3. Publish an RSS/Atom Feed and ping the net when you publish a post
  4. Use robots.txt and robots meta tags
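
If you don't have a sitemap yet, here is a rough sketch of generating a bare-bones one in Python. It assumes the standard sitemaps.org XML format with only the required loc element, and the URLs are placeholders rather than real pages on this site. Most blog software has a plugin that does this for you, so treat it as an illustration of what the file contains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical URLs, for illustration only.
example_urls = [
    "http://www.example.com/",
    "http://www.example.com/about/",
]
sitemap_xml = build_sitemap(example_urls)
print(sitemap_xml)
```

Save the output as sitemap.xml in the root of your site and submit it through Google Webmaster Tools.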

Your URLs in the index

Here's how to find out which of your pages are indexed.

  • Indexed pages in your site: site:
  • Pages that link to your site's front page: link:
  • The current cache of your site: cache:
  • Information about your site: info:
  • Pages that are similar to your site: related:
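
Each operator in the list above is followed directly by your domain or url, with no space after the colon. A tiny sketch that builds the five queries for a domain (the function name is made up for this example):

```python
def index_queries(domain):
    """Build one search query per operator from the list above."""
    operators = ["site", "link", "cache", "info", "related"]
    return {op: f"{op}:{domain}" for op in operators}

queries = index_queries("www.askapache.com")
for name, query in queries.items():
    # Paste each query into the search box to see the result.
    print(f"{name:8} {query}")
```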

Break It Down

Yo homeslice! I didn't mean break dance.. I mean let's simplify AskApache in the context of getting our urls indexed high/low. Here are the stats: 1 Homepage, 206 Articles, 19 Pages, 31 Categories

1 Homepage

This page is generally the highest-ranking page in the index. It should contain links to your best urls and provide easy navigation.

206 Specific Topic Articles

These are the articles (like this one) of AskApache.com, and they are the main source of search engine traffic. You want each url (if it's a good article) to be ranked as high as possible. The key is to make each article really specific to a topic by using best-practice (X)HTML.

19 Static Pages

Most of these are pages like the online-tools hosted on this site, or other basic pages like about, contact us, site-map, etc. Some of these you may want to rank very high (like the /about/ page) and some you may not even want included in the index.

31 Specific Topic Category Pages

These are tricky because they are generally just lists of articles from each category, which isn't specific enough to get much search engine traffic, but is very useful to site visitors. I beefed up my category pages by adding information about the category topic in addition to excerpts of the articles.


Higher Pagerank = Higher Up in Search Results

So Googlebot and other search engine robots have these crazy complicated algorithms (many patented) that SEO Industry types may get caught up in trying to analyze technically. I'm sure you've seen/read/heard the complicated advice that will always be pushed by many... advice like:

  • Analyze the number of words in the description/title/1st paragraph/etc.
  • Make sure your "keyword" is sprinkled throughout the text every 10th-30th word.
  • Other equally unexciting technical analysis

Now if you've had success with that then props to you, success is success, but I personally choose to completely ignore all that. The number 1 thing that the top search engines advise is to design your page for a Human Visitor, not a computer. The golden rule for me is how I would rank the page, not how some algorithm would.


Designing for a Human Visitor

This is a major factor in your site being at the top vs. nowhere. Design your HTML to be as minimal as possible (see the source code for my homepage) and contain ONLY the necessary elements. Above all, use semantically sound XHTML markup. (view the source of W3C)

External CSS/Javascript

Get your javascript and CSS out of your HTML and use external files (like this site) ALWAYS! You should start with just the HTML, no css, no colors, no javascript, and THEN you add the .css and then you add the javascript.

Site Accessibility

Say your browser didn't have a mouse, didn't support images, css, javascript, or even colors! Your HTML should be structured such that your page is still easily readable and easy to navigate. You can use lynx, links, and many other terminal-based browsers to test for this... please see the Web Accessibility Initiative (WAI) for detailed info.

Web accessibility refers to the practice of making websites usable by people of all abilities and disabilities. When sites are correctly designed, developed and edited, all users can have equal access to information and functionality. For example, when a site is coded with semantically meaningful HTML, with textual equivalents provided for images and with links named meaningfully, this helps blind users using text-to-speech software and/or text-to-Braille hardware.
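
One of the easiest accessibility checks to automate is the "textual equivalents provided for images" part. Here is a minimal sketch using Python's built-in HTML parser that lists every img tag with no alt attribute at all. The class and function names are invented for this example, and the sample markup is made up. It is no replacement for actually browsing your site in lynx.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # An empty alt="" is left alone: it is valid for decorative images.
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))

def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing

# Made-up markup for illustration.
sample = (
    '<p><img src="logo.png" alt="AskApache logo" />'
    '<img src="spacer.gif" />'
    '<img src="divider.png" alt="" /></p>'
)
missing = images_missing_alt(sample)
print(missing)
```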


Controlling a URL's Pagerank

A few tools and techniques are available for controlling the "juice" or "pagerank" of your urls.

  1. Robots.txt
  2. Robots Meta Tags
  3. Links

Robots.txt

I've done quite a bit of research and experimentation with robots.txt, a file located in the root of your website (for this site, http://www.askapache.com/robots.txt). It is downloaded by all legitimate search engine spiders/bots and used as a Blacklist to prevent certain urls from being crawled. Here are a few of the articles on this site, which you may skip if you like as they don't illustrate the big-picture that I am going to discuss now.

How To Use Robots.txt

Even though robots.txt files are for whitelisting and blacklisting urls, I have found that they should only be used as an extreme form of blacklisting. When you Disallow a url in your robots.txt file, that means most search engine bots won't even LOOK at the url. As you can see in the example below, I only disallow urls that shouldn't ever be LOOKED at. The real powertool is the robots meta tag.

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content

Sitemap: http://www.askapache.com/sitemap.xml
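
If you want to double-check what a robots.txt file actually blocks before you publish it, Python's standard urllib.robotparser module can read the rules and answer "may a bot fetch this url?". This is just a sketch: it feeds in the rules shown above as a string instead of downloading them, the is_crawlable helper is invented for the example, and /htaccess/ is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the robots.txt example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content

Sitemap: http://www.askapache.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_crawlable(path, agent="*"):
    """Return True if the rules allow the given user-agent to fetch the path."""
    return parser.can_fetch(agent, "http://www.askapache.com" + path)

# A disallowed admin url versus a hypothetical article path.
for path in ["/wp-admin/index.php", "/htaccess/"]:
    print(path, "crawlable" if is_crawlable(path) else "blocked")
```

Note that Disallow matches by prefix, so /wp-admin blocks everything underneath it as well.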

Robots Meta Tag

Ok, I'm really trying to simplify, because what you should understand is the big-picture. Every page can have a robots meta tag in its head section, and this robots meta tag can tell the search engine to index/not-index AND follow/not-follow. Here are some examples:

<meta name="robots" content="index" />
<meta name="robots" content="noindex" />
<meta name="robots" content="noindex,follow" />
<meta name="robots" content="index,nofollow" />

content="index" / noindex

index means the search engine is free to index, archive, cache, and follow the page whereas noindex means DO NOT include this page in the search engine results.

content="follow" / nofollow

follow means the search engine is free to LOOK at the page and follow the links on the page whereas nofollow means DO NOT follow the links on the page.
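
To make the combinations concrete, here is a small sketch that turns a content value into two yes/no answers. It assumes that a directive which is left out defaults to the permissive choice (index and follow), and it only handles the four directives discussed here. The function name is made up for the example.

```python
def parse_robots_content(content):
    """Turn a robots meta content string into (index, follow) booleans.

    Assumption: anything not explicitly forbidden is allowed, so a missing
    directive counts as "index" or "follow".
    """
    directives = {part.strip().lower() for part in content.split(",")}
    index = "noindex" not in directives
    follow = "nofollow" not in directives
    return index, follow

# The four example tags from above.
for content in ["index", "noindex", "noindex,follow", "index,nofollow"]:
    index, follow = parse_robots_content(content)
    print(f"{content:16} index={index} follow={follow}")
```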


WordPress Auto-Robots meta tag code

Just add this to any plugin file and it will add the right robots meta tag to your site. Tweak to taste.

<?php
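// Print a robots meta tag in the page head, chosen by the type of page being served.
function askapache_robots_header() {
  // Default for anything not matched below: keep it out of the index.
  $robot = '<meta name="robots" content="noindex,follow,nocache,noarchive" />';

  if ( is_paged() || is_search() || is_404() || is_author() || is_tag() ) {
    // Paged archives, search results, 404s, author and tag archives: follow links, don't index.
    $robot = '<meta name="robots" content="noindex,follow" />';
  } elseif ( is_home() || is_front_page() || is_single() ) {
    // Homepage and single articles: index and follow.
    $robot = '<meta name="robots" content="follow,index" />';
  } elseif ( is_category() || is_page() ) {
    // Category pages and static pages: follow links.
    $robot = '<meta name="robots" content="follow" />';
  }

  echo $robot . "\n";
}
add_action( 'wp_head', 'askapache_robots_header' );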

Links

External and Internal Links are the crux of SEO. It's important to start FIRST on your Internal Links and linking structure... Once you are satisfied that the correct pages are indexed and ranked appropriately, then you can begin to look at external links.

  • The fewer links on a page, generally the better.
  • If every page of your site points to the same url on your site, that url's pagerank goes up with the number of internal links.
  • Semantic use of the rel, title, and alt attributes is very helpful. (rel values like next, prev, index)
  • Put your best links higher up in the XHTML, and put helpful/solid links at the end.
  • You can add rel="nofollow" to links that you don't want followed.
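
Before tweaking your linking structure it helps to see what a page links to right now. The sketch below, which uses only Python's standard library, sorts the links on a page into internal, external, and nofollow buckets. The LinkAudit class is invented for this example, the sample markup is made up, and "internal" simply means "same hostname as the page", which is an assumption you may want to loosen for subdomains.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkAudit(HTMLParser):
    """Sort the <a href> links of one page into internal/external/nofollow."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).netloc
        self.internal, self.external, self.nofollow = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        url = urljoin(self.page_url, href)  # resolve relative links
        rel_values = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_values:
            self.nofollow.append(url)
        elif urlparse(url).netloc == self.host:
            self.internal.append(url)
        else:
            self.external.append(url)

# Made-up markup for illustration.
sample = """
<a href="/htaccess/">htaccess articles</a>
<a href="http://www.askapache.com/about/" title="About">About</a>
<a href="http://www.w3.org/WAI/">WAI</a>
<a href="http://example.com/" rel="nofollow">some link</a>
"""
audit = LinkAudit("http://www.askapache.com/")
audit.feed(sample)
print(len(audit.internal), "internal,", len(audit.external), "external,",
      len(audit.nofollow), "nofollow")
```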

The web has gotten to be so full of malicious/non-helpful SEO activity that I recommend developing your content, NOT external links. If you want to do this right and provide great content that makes search engine users happy and makes the web better, then explore this blog and develop content until the next article in this series, where I'll show you how to make your site explode.

Stay tuned for Part III, which will dive deeper into the pipeworks of AskApache.com
