Some tools I use include page caching, a sweeper, an observer, a custom logger, and an xml builder. (My expensive SEO consultant says I gotta get that keyword density up.)
The Sitemap Protocol
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>      <!-- required -->
    <lastmod>2005-01-01</lastmod>           <!-- optional -->
    <changefreq>monthly</changefreq>        <!-- optional -->
    <priority>0.8</priority>                <!-- optional -->
  </url>
  <url>
    ...loop through your URLs...
  </url>
</urlset>

Building the sitemap
1. The route
The conventional location for a simple sitemap seems to be domain.com/sitemap.xml. I default params[:format] to “xml” and map the route to the index action of sitemaps_controller.rb.
# config/routes.rb
Grinch::Application.routes.draw do
  get "sitemap.xml" => "sitemaps#index", as: "sitemap", defaults: { format: "xml" }
  ...
end

2. The controller (+ caching the sitemap)
If you’re unfamiliar with Rails caching, check out: Rails Guides: Caching with Rails
Note: Rails doesn’t cache by default in development mode. If you want to test page caching locally, you’ll need to enable it:
# config/environments/development.rb
config.action_controller.perform_caching = true

But here’s a quick crash course:
Page caching means the full response is written to a static file in public/. Future requests won’t even hit Rails because, generally, servers are configured to look in public/ before passing the request to Rails. This is why you have to remove public/index.html in a new Rails app before you can see your root action.
caches_page :index will tell Rails to write to a sitemap.xml if it doesn’t exist. The route tells Rails where to write it hierarchically in the public directory:
get "sitemap.xml" --> public/sitemap.xml
get "sitemaps/sitemap.xml" --> public/sitemaps/sitemap.xml
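If that mapping feels abstract, here’s a rough plain-Ruby sketch of the idea, no Rails involved. The helper names cache_file_for and handle are made up for illustration, and handle only imitates what a web server and caches_page do between them. It is not how Rails implements page caching.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: where a page-cached response for request_path
# would live under the public directory.
def cache_file_for(request_path, public_dir)
  File.join(public_dir, request_path.sub(%r{\A/}, ""))
end

# Hypothetical stand-in for the server's decision: serve the static file
# if it exists, otherwise let the "app" (the block) render the response
# and write it to disk the way caches_page would.
def handle(request_path, public_dir)
  file = cache_file_for(request_path, public_dir)
  return [:static, File.read(file)] if File.file?(file)

  body = yield
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, body)
  [:app, body]
end

puts cache_file_for("/sitemaps/sitemap.xml", "public") # public/sitemaps/sitemap.xml

Dir.mktmpdir do |public_dir|
  renders = 0
  first  = handle("/sitemap.xml", public_dir) { renders += 1; "<urlset/>" }
  second = handle("/sitemap.xml", public_dir) { renders += 1; "<urlset/>" }
  p first   # [:app, "<urlset/>"]
  p second  # [:static, "<urlset/>"]
  p renders # 1
end
```

The second request never reaches the block, which is the whole point: once the file exists, the app isn’t asked again until the file is removed.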
# app/controllers/sitemaps_controller.rb
class SitemapsController < ApplicationController
  caches_page :index

  def index
    @static_paths = [about_me_path, projects_path]
    @posts = Post.all

    respond_to do |format|
      format.xml
    end
  end
end

Like any other controller action, it’ll pass those instance variables to the view it expects to find at app/views/sitemaps/index.xml.builder.
3. The builder
Builder ships with Rails. It’s a template handler like the others you may be used to, such as .css.scss and .html.erb.
Builder initializes an xml object that you can just add on to.
# app/views/sitemaps/index.xml.builder
xml.urlset(xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9") do
  @static_paths.each do |path|
    xml.url do
      xml.loc "#{GRINCH['root_url']}#{path}"
      xml.changefreq("monthly")
    end
  end
  @posts.each do |post|
    xml.url do
      xml.loc "#{GRINCH['root_url']}#{url_for(post)}"
      xml.lastmod post.updated_at.strftime("%F")
      xml.changefreq("monthly")
    end
  end
end

4. Expiring the cache when posts are created/destroyed.
Rails needs to be told when to expire (delete) a cached page. We can do that with a Sweeper.
Sweepers are the terminators of the caching world and responsible for expiring caches when model objects change. They do this by being half-observers, half-filters and implementing callbacks for both roles.
# app/sweepers/sitemap_sweeper.rb
class SitemapSweeper < ActionController::Caching::Sweeper
  observe :post

  def sweep(post)
    expire_page(sitemap_path)
  end
  alias_method :after_create, :sweep
  alias_method :after_destroy, :sweep
end

expire_page() is the opposite of the caches_page() we used in the controller. Any time a Post is created/destroyed, Rails will delete the cached file for our sitemap_path route. A fresh sitemap.xml will be cached on the next request.
For an app with a large sitemap, you’d want to start rebuilding it immediately so a search engine doesn’t have to wait. Please, think of the crawlers.
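One way to be kind to the crawlers, sketched in plain Ruby: render the new sitemap to a temporary file and rename it over the old one, so there’s never a moment without a sitemap.xml. rebuild_sitemap is a hypothetical helper, not a Rails API, and the block stands in for whatever renders your sitemap.

```ruby
require "tmpdir"

# Hypothetical helper: replace the cached sitemap in one step instead of
# deleting it and waiting for the next request to regenerate it.
def rebuild_sitemap(cache_path)
  tmp_path = "#{cache_path}.tmp"
  File.write(tmp_path, yield)       # render the fresh sitemap off to the side
  File.rename(tmp_path, cache_path) # swap it in (atomic on POSIX filesystems)
end

Dir.mktmpdir do |public_dir|
  path = File.join(public_dir, "sitemap.xml")
  File.write(path, "<urlset>old</urlset>")

  rebuild_sitemap(path) { "<urlset>new</urlset>" }
  puts File.read(path) # <urlset>new</urlset>
end
```

For a small blog the delete-and-regenerate approach above is plenty. This only starts to matter when rendering the sitemap takes long enough for a crawler to notice.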
Simples
Great. Now a fresh sitemap.xml is located at domain.com/sitemap.xml even as we create and destroy posts.
But now let’s tell search engines about it when an updated sitemap is available.
Pinging search engines
1. The pinger
# app/models/sitemap_pinger.rb
require "cgi"
require "net/http"

class SitemapPinger
  SEARCH_ENGINES = {
    google: "http://www.google.com/webmasters/tools/ping?sitemap=%s",
    ask: "http://submissions.ask.com/ping?sitemap=%s",
    bing: "http://www.bing.com/webmaster/ping.aspx?siteMap=%s"
  }

  def self.ping
    SitemapLogger.info Time.now
    SEARCH_ENGINES.each do |name, url|
      # Fill the %s placeholder with the URL-escaped sitemap address.
      request = url % CGI.escape("#{GRINCH['root_url']}/sitemap.xml")
      SitemapLogger.info " Pinging #{name} with #{request}"
      # Only make real HTTP requests in production.
      if Rails.env == "production"
        response = Net::HTTP.get_response(URI.parse(request))
        SitemapLogger.info " #{response.code}: #{response.message}"
        SitemapLogger.info " Body: #{response.body}"
      end
    end
  end
end

You might want to round off your search engine list with the big boys like AltaVista, HotBot, and Lycos, but I wasn’t about to get cocky.
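The url % CGI.escape(...) line does the interesting work in the pinger, so here it is on its own, with no network involved. This is only a sketch: ping_url is a helper name I made up, and www.example.com stands in for your real root URL.

```ruby
require "cgi"

# Same template style as the SEARCH_ENGINES hash above.
PING_TEMPLATE = "http://www.google.com/webmasters/tools/ping?sitemap=%s"

# Hypothetical helper: drop the escaped sitemap URL into the %s slot.
def ping_url(template, sitemap_url)
  template % CGI.escape(sitemap_url)
end

puts ping_url(PING_TEMPLATE, "http://www.example.com/sitemap.xml")
# http://www.google.com/webmasters/tools/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```

Escaping matters because the sitemap address is itself a URL riding inside another URL’s query string. Left unescaped, its own : and / characters would be ambiguous.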
2. The logger
To confirm my pinger actually does anything, I log the responses to log/sitemap.log. I also don’t want to ping the search engines during development/testing any time I mess with Posts, so I tell it to only ping in production.
To set up a new logger, create an initializer.
log_path = File.join(Rails.root, 'log/sitemap.log')
log_file = File.open(log_path, 'a')
log_file.sync = true
SitemapLogger = Logger.new(log_file)

3. Triggering the ping
We have our pinging mechanism set up, but now we need to actually invoke the pinger when we have some fresh intel for our search engine friends.
I use an Observer.
Observer classes respond to life cycle callbacks to implement trigger-like behavior outside the original class. This is a great way to reduce the clutter that normally comes when the model class is burdened with functionality that doesn’t pertain to the core responsibility of the class.
# app/observers/sitemap_observer.rb
class SitemapObserver < ActiveRecord::Observer
  observe :post

  def ping(post)
    SitemapPinger.ping
  end
  alias_method :after_create, :ping
  alias_method :after_destroy, :ping
end

However, unlike sweepers, we must register our observer before it will be invoked.
# config/application.rb
config.active_record.observers = :sitemap_observer

Boom
There we have it. Our observer will call SitemapPinger.ping whenever posts are created/destroyed (which is also when our cached sitemap.xml is expired).
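If the observer wiring still feels like magic, the moving parts can be imitated in plain Ruby. Everything here is a stand-in I made up: FakePost plays the model, FakeSitemapObserver plays our observer, and counting pings replaces the real SitemapPinger.ping call. ActiveRecord’s actual machinery is more involved than this.

```ruby
# Hypothetical stand-in for a model that notifies registered observers.
class FakePost
  @observers = []

  class << self
    attr_reader :observers
  end

  def self.create
    post = new
    observers.each { |observer| observer.after_create(post) }
    post
  end

  def destroy
    self.class.observers.each { |observer| observer.after_destroy(self) }
  end
end

# Mirrors the shape of SitemapObserver: one method, aliased to both callbacks.
class FakeSitemapObserver
  attr_reader :pings

  def initialize
    @pings = 0
  end

  def ping(_post)
    @pings += 1 # the real observer would call SitemapPinger.ping here
  end
  alias_method :after_create, :ping
  alias_method :after_destroy, :ping
end

observer = FakeSitemapObserver.new
FakePost.observers << observer # the "registration" step

post = FakePost.create
post.destroy
puts observer.pings # 2
```

The alias_method lines are why a single ping method can answer both callbacks. Skip the registration step and the count stays at zero.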
To verify it all works in development:
- Remove the if Rails.env == "production" condition in the SitemapPinger
- Ensure config.action_controller.perform_caching = true
- Create/destroy a few posts
- Check log/sitemap.log for updates
- Ensure rooturl.com/sitemap.xml works
