According to Wikipedia, a sitemap is "a list of pages of a web site accessible to crawlers or users." Sitemaps are completely optional, but Google uses the one on your site to learn about its structure. This allows Google and other search engines to potentially increase their crawling coverage.
While you can build one manually with XML Builder or by handcrafting an XML file, I prefer using the sitemap_generator gem. The greatest benefit of the gem is that it is built to adhere to the Sitemap 0.9 protocol. Not only does it handle regular links, it also supports news, video, image, mobile, and geo sitemaps. SitemapGenerator also provides Ruby on Rails integration out of the box.
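To give a sense of what the gem saves you from, here is a minimal sketch of handcrafting a Sitemap 0.9 file in plain Ruby. The page list and the sitemap_xml helper are hypothetical names made up for illustration, and this is not how sitemap_generator builds its files internally.

```ruby
require 'time' # adds Time#iso8601

# Hypothetical page list; the URLs and timestamp are placeholders.
PAGES = [
  { loc: 'http://www.yoursite.com/', changefreq: 'daily' },
  { loc: 'http://www.yoursite.com/search?q=ruby&page=2',
    changefreq: 'weekly',
    lastmod: Time.utc(2016, 1, 1) }
].freeze

# Builds a minimal Sitemap 0.9 document as a string.
def sitemap_xml(pages)
  entries = pages.map do |page|
    # encode(xml: :text) escapes &, < and > so URLs with query strings stay valid XML.
    tags = ["    <loc>#{page[:loc].encode(xml: :text)}</loc>"]
    tags << "    <lastmod>#{page[:lastmod].getutc.iso8601}</lastmod>" if page[:lastmod]
    tags << "    <changefreq>#{page[:changefreq]}</changefreq>" if page[:changefreq]
    "  <url>\n#{tags.join("\n")}\n  </url>"
  end

  <<~XML
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    #{entries.join("\n")}
    </urlset>
  XML
end

puts sitemap_xml(PAGES)
```

Even this small version has to deal with escaping and timestamp formats, and it says nothing about compression, index files, or the news, video, and image extensions, which is where the gem earns its keep.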
To get started, add the sitemap_generator gem to your Gemfile. After you bundle, install it into your Ruby on Rails project via the following rake task:
bundle exec rake sitemap:install
Creating your Sitemap configuration
SitemapGenerator requires that you specify a configuration file in config/sitemap.rb. Here is a breakdown:
The search engines reading your sitemap need to know what website they are dealing with. Set default_host to your root website URL.
SitemapGenerator::Sitemap.default_host = 'http://www.yoursite.com'
SitemapGenerator comes with multiple adapters that will more than likely suit your needs. If you already have CarrierWave set up in your project, SitemapGenerator::WaveAdapter uses your existing settings. If CarrierWave is not being used, you can always fall back to SitemapGenerator::S3Adapter. Set your adapter through the adapter configuration setting.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
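If you go the S3Adapter route instead, the configuration might look like the sketch below. It assumes the fog-aws gem is in your bundle. The option names are believed to match the gem's README, but treat them as assumptions and check them against the version you install. The environment variable names are placeholders.

```ruby
# config/sitemap.rb (fragment)
# Assumes fog-aws is installed; the ENV names below are placeholders.
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  fog_directory: ENV['FOG_DIRECTORY'], # bucket name
  fog_region: ENV['FOG_REGION']        # e.g. 'us-east-1'
)
```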
Since we are hosting our sitemap remotely, we need to set sitemaps_host. An example of this would be "http://YOUR_BUCKET.s3.amazonaws.com/". I personally set this from an environment variable:
SitemapGenerator::Sitemap.sitemaps_host = ENV['SITEMAP_HOST']
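Because ENV['SITEMAP_HOST'] simply returns nil when the variable is unset, it can help to fail fast when reading it. The following is a small sketch of that idea; sitemaps_host_from is a hypothetical helper, not part of SitemapGenerator.

```ruby
require 'uri'

# Hypothetical helper: reads SITEMAP_HOST and checks that it looks like a URL.
def sitemaps_host_from(env = ENV)
  host = env.fetch('SITEMAP_HOST') # raises KeyError if the variable is unset
  unless URI.parse(host).is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
    raise ArgumentError, "SITEMAP_HOST must be an http(s) URL, got #{host.inspect}"
  end
  host
end

# In config/sitemap.rb this would be used as:
#   SitemapGenerator::Sitemap.sitemaps_host = sitemaps_host_from
```

A misconfigured environment then stops sitemap generation with a clear error rather than uploading files that point at the wrong host.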
Set public_path to tmp so that our sitemap files are written there before uploading. This example assumes you are using Heroku.
SitemapGenerator::Sitemap.public_path = 'tmp/'
To specify the directory in which your sitemaps are stored, set sitemaps_path:
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
Once that is set up, you will need to specify the structure of your site. The following example demonstrates a couple of options, such as specifying the change frequency of a page and indicating when the page was last modified.
SitemapGenerator::Sitemap.create do
  add '/contact_us', changefreq: 'weekly'

  Article.find_each do |article|
    add article_path(article), lastmod: article.updated_at
  end
end
Finally, SitemapGenerator can ping search engines to indicate they should crawl the site again by calling SitemapGenerator::Sitemap.ping_search_engines.
Here is the completed config/sitemap.rb:
# config/sitemap.rb
SitemapGenerator::Sitemap.default_host = 'http://www.yoursite.com'
SitemapGenerator::Sitemap.adapter = SitemapGenerator::WaveAdapter.new
SitemapGenerator::Sitemap.sitemaps_host = ENV['SITEMAP_HOST']
SitemapGenerator::Sitemap.public_path = 'tmp/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'

SitemapGenerator::Sitemap.create do
  add '/contact_us', changefreq: 'weekly'

  Article.find_each do |article|
    add article_path(article), lastmod: article.updated_at
  end
end

SitemapGenerator::Sitemap.ping_search_engines
In your robots.txt, set Sitemap to the URL of your remote sitemap endpoint.
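As a sketch of what that endpoint URL looks like, the snippet below assembles the robots.txt Sitemap line from the host and path configured earlier. It assumes the index file keeps the name sitemap.xml.gz, which is understood to be the gem's default; robots_sitemap_line is a hypothetical helper for illustration only.

```ruby
# Hypothetical helper: builds the robots.txt Sitemap directive from the
# sitemaps_host and sitemaps_path values used in config/sitemap.rb.
# The default filename is an assumption; adjust it if you renamed the index.
def robots_sitemap_line(sitemaps_host, sitemaps_path, filename = 'sitemap.xml.gz')
  # File.join avoids doubled slashes where the pieces meet.
  "Sitemap: #{File.join(sitemaps_host, sitemaps_path, filename)}"
end

puts robots_sitemap_line('http://YOUR_BUCKET.s3.amazonaws.com/', 'sitemaps/')
# prints: Sitemap: http://YOUR_BUCKET.s3.amazonaws.com/sitemaps/sitemap.xml.gz
```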
Once a day during your slowest traffic period, trigger a refresh via the included rake task:
bundle exec rake sitemap:refresh
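On Heroku, the Scheduler add-on can run that command for you. If you deploy somewhere with cron available instead, one option is the whenever gem. The sketch below assumes whenever is in your bundle, and the 4:00 am slot is a placeholder for your own quiet period.

```ruby
# config/schedule.rb (read by the whenever gem, which this sketch assumes)
every 1.day, at: '4:00 am' do
  rake 'sitemap:refresh'
end
```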