Monday, September 08th, 2008 | Author:

I need to pick up an XML file from a server every 30 minutes and process it. I’ve done similar things before, and using Hpricot it is a pleasure:

#! /usr/bin/env ruby
require 'rubygems'
require 'hpricot'
require 'open-uri'

doc = Hpricot(open("http://example.com/the_file.xml"))
(doc / :person).each { |person| ... }

Couldn’t be simpler. This time, though, there is a snag: the file is sensitive, so the connection is encrypted with HTTPS. For this article, let’s say we’re talking about CERT’s list of new vulnerabilities, which can be found at https://www.cert.org/blogs/vuls/rss.xml. open-uri supports HTTPS, so this shouldn’t be a problem, but it is:

doc = Hpricot(open("https://www.cert.org/blogs/vuls/rss.xml"))
# =>
/usr/lib/ruby/1.8/net/http.rb:590:in `connect': certificate verify failed (OpenSSL::SSL::SSLError)

OpenSSL, which open-uri uses behind the scenes, fails to verify CERT’s certificate and raises an exception, which halts the script.

Solution 1: skip verification

Let’s assume that I don’t care much about the verification; all I want is the data, and it just so happens that it is only available over HTTPS. open-uri doesn’t let me turn off verification, so I have to dig deeper.

open-uri is just a clever wrapper around Ruby’s comprehensive, but insufficiently documented, networking library that handles a variety of protocols, including HTTPS. To fetch a web page over a secure connection, you can use something like this sample client (from net/https.rb):

#! /usr/bin/env ruby
require 'net/https'
require 'uri'

uri = URI.parse(ARGV[0] || 'https://localhost/')
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true if uri.scheme == "https"  # enable SSL/TLS
http.start do
  # request_uri is the path plus any query string, and is "/" (not "") for a bare host
  http.request_get(uri.request_uri) { |res| print res.body }
end

There are three things to note in the sample client:

  1. You should require net/https, not net/http.
  2. You create the client with Net::HTTP.new, not Net::HTTPS.new. (There is no HTTPS class despite the fact that you require 'net/https'.)
  3. You need to set use_ssl = true explicitly. The URI library is clever enough to set its port attribute to 443 when it parses a URI that starts with https, but Net::HTTP isn’t quite as clever.
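Points 2 and 3 can be checked without touching the network, because Net::HTTP.new only records the host and port; no connection is opened until start is called. A minimal sketch, assuming a current Ruby (it has not been tried on 1.8), using CERT’s feed URL from above:

```ruby
require 'net/https'  # loads net/http plus OpenSSL support
require 'uri'

uri = URI.parse("https://www.cert.org/blogs/vuls/rss.xml")

# URI knows the default port for the https scheme...
puts uri.port          # 443

# ...but a fresh Net::HTTP object does not switch on SSL/TLS by itself.
http = Net::HTTP.new(uri.host, uri.port)
puts http.use_ssl?     # false

# Enable it explicitly, based on the scheme. Still no connection at this point;
# that only happens when http.start is called.
http.use_ssl = (uri.scheme == "https")
puts http.use_ssl?     # true
```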

If you put the above code in webclient.rb and run it, you’ll see this:

$ ruby webclient.rb https://www.cert.org/blogs/vuls/rss.xml
warning: peer certificate won't be verified in this SSL session
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
    <channel>
        <title>Vulnerability Analysis Blog</title>
[...]

Yes, it will fetch and print the RSS XML, but it will also warn you that it doesn’t verify the host’s certificate. Let’s turn off the warning by telling Net::HTTP that we don’t expect it to perform any verification:

uri = URI.parse(ARGV[0] || 'https://localhost/')
http = Net::HTTP.new(uri.host, uri.port)
if uri.scheme == "https"  # enable SSL/TLS
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_NONE
end
http.start { ... }

Run this, and you get the same result without the warning.

Solution 2: add verification

Solution 1 is not enough for my current needs. I want encryption, but I also want to know that I’m talking to the right server. To turn on verification, I change VERIFY_NONE to VERIFY_PEER and run again. Now I’m back to square one with OpenSSL::SSL::SSLError: certificate verify failed. Uh-huh. So what’s wrong with that one? It works in my browser without problems.

I’m not going to go into how HTTPS and certificate validation work. Suffice it to say that my browser is more trusting than OpenSSL. And it’s not blind trust either; the browser knows more Certificate Authorities. So how do I add them to Ruby and OpenSSL? I looked around and found a solution to a similar problem, “Connecting to POP3 servers over SSL with Ruby”. Adapting that to my HTTPS problem, it becomes a two-step solution:

  1. Download the CA Root Certificates bundle from haxx.se, home of the creators of curl. Store the file in the same directory as webclient.rb and make sure that it’s called cacert.pem. (But please see the discussion of “Too much trust” below.)
  2. Make webclient.rb use this file instead of whatever is bundled with OpenSSL.

Now we can tell Net::HTTP to use this CA file:

uri = URI.parse(ARGV[0] || 'https://localhost/')
http = Net::HTTP.new(uri.host, uri.port)
if uri.scheme == "https"  # enable SSL/TLS
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = File.join(File.dirname(__FILE__), "cacert.pem")
end
http.start { ... }
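Roughly speaking, ca_file hands OpenSSL a set of trusted roots, and VERIFY_PEER makes it check that the server’s certificate chains up to one of them and is within its validity period. The sketch below reproduces that chain check offline with OpenSSL::X509::Store, which is an illustration of the idea rather than what Net::HTTP literally executes. The make_cert helper, the “Example Root CA” and the www.example.com certificate are hypothetical throwaway fixtures generated on the spot, and a temporary file plays the role of cacert.pem:

```ruby
require 'openssl'
require 'tempfile'

# Hypothetical helper: build an X.509 v3 certificate for +subject+,
# signed by +issuer_key+ (self-signed when no issuer certificate is given).
def make_cert(subject, key, issuer_cert: nil, issuer_key: key, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version    = 2                                # X.509 v3
  cert.serial     = rand(1..2**32)
  cert.subject    = OpenSSL::X509::Name.parse(subject)
  cert.issuer     = issuer_cert ? issuer_cert.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after  = Time.now + 3600                  # valid for one hour
  if ca
    ef = OpenSSL::X509::ExtensionFactory.new
    ef.subject_certificate = cert
    ef.issuer_certificate  = cert
    cert.add_extension(ef.create_extension("basicConstraints", "CA:TRUE", true))
    cert.add_extension(ef.create_extension("keyUsage", "keyCertSign, cRLSign", true))
  end
  cert.sign(issuer_key, OpenSSL::Digest::SHA256.new)
  cert
end

ca_key   = OpenSSL::PKey::RSA.new(2048)
ca_cert  = make_cert("/CN=Example Root CA", ca_key, ca: true)
srv_key  = OpenSSL::PKey::RSA.new(2048)
srv_cert = make_cert("/CN=www.example.com", srv_key,
                     issuer_cert: ca_cert, issuer_key: ca_key)

# Write the root to a PEM file, standing in for cacert.pem.
bundle = Tempfile.new(["cacert", ".pem"])
bundle.write(ca_cert.to_pem)
bundle.flush

trusting = OpenSSL::X509::Store.new
trusting.add_file(bundle.path)   # loads the trusted roots from the PEM file
bundle.close!
empty = OpenSSL::X509::Store.new # knows no CAs at all

puts trusting.verify(srv_cert)   # true: the chain ends at a root in the bundle
puts empty.verify(srv_cert)      # false: the issuer is unknown
puts empty.error_string          # OpenSSL's reason for the failure
```

Note that nothing here looks at host names: a certificate for www.example.com passes the chain check no matter which server presented it.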

Look, it works! It gives the expected output, and it is verifying… something. But what? Time to look under the hood again. It turns out that with these settings, OpenSSL checks that the server certificate is signed by a known CA and has not expired, which is good, but not everything I’m looking for. I also want it to check that the certificate belongs to the server that I’m talking to. To see an example, go to https://google.com/. In Firefox 3, you should get an iconic policeman telling you it’s a Page Load Error. The certificate belongs to www.google.com, not google.com. But our script is not quite as discerning:

$ ruby webclient.rb https://google.com/
hostname was not match with the server certificate
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="http://www.google.com">here</A>.
</BODY></HTML>

Note the warning on the first line of output. Apparently Net::HTTP checks to see if the certificate belongs to the host, but it’s not a fatal error. To change this, we need to enable the “post-connection check”. So here is the final version of the script:

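#! /usr/bin/env ruby
require 'net/https'
require 'uri'

uri = URI.parse(ARGV[0] || 'https://localhost/')
http = Net::HTTP.new(uri.host, uri.port)
if uri.scheme == "https"  # enable SSL/TLS
  http.use_ssl = true
  # Make a host name mismatch fatal. Newer Rubies removed this method because
  # they always perform the check when the peer is verified (see the comments).
  http.enable_post_connection_check = true if http.respond_to?(:enable_post_connection_check)
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  # Trust the CA roots in cacert.pem, stored next to this script.
  http.ca_file = File.join(File.dirname(__FILE__), "cacert.pem")
end
http.start do
  # request_uri keeps any query string and is never empty
  http.request_get(uri.request_uri) { |res| print res.body }
end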

Now it will fail for https://google.com/ but succeed for https://www.google.com/. Done!
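What turns https://google.com/ into a failure is a comparison between the host name you asked for and the names in the certificate. That comparison can be tried offline with OpenSSL::SSL.verify_certificate_identity, which is what OpenSSL::SSL::SSLSocket#post_connection_check calls. The sketch below, assuming a current Ruby and its bundled openssl library, builds a throwaway self-signed certificate issued to www.google.com (a hypothetical fixture, not Google’s real certificate) and asks whether it matches each name:

```ruby
require 'openssl'

# Hypothetical fixture: a self-signed certificate whose only name is the
# common name www.google.com (no subjectAltName extension).
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = OpenSSL::X509::Name.parse("/CN=www.google.com")
cert.issuer     = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now - 60
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest::SHA256.new)

# The host name check applied after the TLS handshake.
%w[www.google.com google.com].each do |host|
  ok = OpenSSL::SSL.verify_certificate_identity(cert, host)
  puts "#{host}: #{ok ? 'matches' : 'does NOT match'}"
end
# www.google.com: matches
# google.com: does NOT match
```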

Too much trust

OK, I should admit that downloading a file from someplace called haxx.se doesn’t seem like the best way to raise security. If you really want to know who you will be trusting, you should download the Root Certificates from each of the CAs that you trust. That’s way too much work for the application I’m working on right now, but it might be a requirement for you. If you don’t want to go mad, though, try doing it the same way the haxx people did. They wrote a little tool to extract the Root Certificates from the Mozilla source files, and they even have a tool for extracting them from your binary installation. Check out their documentation for a full description and links to the tools (source code).
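Whichever way you get the bundle, it is worth looking at what you are about to trust. A small sketch, assuming the bundle is a plain concatenation of PEM certificates (as cacert.pem is): it prints each root’s SHA-256 fingerprint (shortened), expiry date and subject, so you can compare them with what the CAs publish. The script name list_roots.rb and the output layout are illustrative choices, not part of any existing tool:

```ruby
require 'openssl'
require 'digest'

PEM_BLOCK = /-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----/m

# Parse every certificate in a PEM bundle, ignoring comments between blocks.
def certificates_in(pem_text)
  pem_text.scan(PEM_BLOCK).map { |block| OpenSSL::X509::Certificate.new(block) }
end

# One line per root: shortened SHA-256 fingerprint, expiry date and subject.
def describe(cert)
  fingerprint = Digest::SHA256.hexdigest(cert.to_der)
  "#{fingerprint[0, 16]}...  #{cert.not_after.strftime('%Y-%m-%d')}  #{cert.subject}"
end

# Usage: ruby list_roots.rb cacert.pem
if ARGV[0]
  certificates_in(File.read(ARGV[0])).each { |cert| puts describe(cert) }
end
```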

[Update: John, in comment 16, has written up instructions for fetching the certificates file over HTTPS. Turtles all the way down.]

Category: Ruby

19 Responses

  1. The method “enable_post_connection_check” is no longer available as of Ruby 1.8.6-p129.

    Log Message:
    merge revision(s) 13657:
    * lib/net/http.rb, lib/open-uri.rb: remove
    Net::HTTP#enable_post_connection_check. [ruby-dev:31960]

    See http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/tags/v1_8_6_129/ChangeLog

    Stephan

  2. David Vrensk

    @Stephan: Thanks, that is really valuable information, especially since the app where I developed this solution is running on a managed host. I’ll see if I can find time to work out something for more modern Ruby versions. If you beat me to it, please tell me!

  3. I think “more modern Ruby versions” are more secure. How much more secure, I can’t tell, sorry. I don’t think they include the “CA Root Certificates bundle”.

  4. … I forgot: the enable_post_connection_check method was removed because the check is now performed by default. You turn it off with Net::HTTP#verify_mode = OpenSSL::SSL::VERIFY_NONE.

  5. David Vrensk

    Ah! So basically I should be good with

    http.enable_post_connection_check = true if http.respond_to? :enable_post_connection_check

    if I understand you right? Many thanks!

  6. Yes, I am thinking the same, however I am not sure.

    For example, are there earlier Ruby versions that didn’t have this method (then it was added, then it was removed again)?

  7. David Vrensk

    Good question. I suppose the best way to find out is to build a nice test suite, but that would require having hosts that respond in the right way (like https://google.com/ and https://www.google.com/, which of course cannot be trusted to behave like they did when I wrote this post).
    I have checked out the diff and it’s nice and simple and a step in the right direction.

  8. If you look at the tests that come with httpclient, http://raa.ruby-lang.org/project/httpclient/ they run their own test server to get a handle on https security.

    Do you think the original code of your blog post verifies the server certificate? The implementation of validate_certificate at http://dev.ctor.org/soap4r/wiki/SslCertificateVerification does this for the case that one already has the file from the CA.

    Stephan

  9. If you look at the tests that come with httpclient, http://raa.ruby-lang.org/project/httpclient/ they run their own test server to get a handle on https security.

    Do you think the original code of your blog post verifies the server certificate? The implementation of validate_certificate at http://dev.ctor.org/soap4r/wiki/SslCertificateVerification does this for the case that one already has the file from the CA.

    Here might be another way,

    http.cert=OpenSSL::X509::Certificate.new(IO.read(path-to-already-available-cert))

    Stephan

  10. David Vrensk

    @Stephan: your last two comments were caught in the spam filter which I didn’t cull until today. Sorry, my bad.
    I’ll get back to you later; have to run now.

  11. When it’s too tricky and if possible, I switch to curl or wget inside a queue. But thanks for posting these tips!

  12. grosser

    thanks for the writeup, just saved me a lot of time :)

  13. Hello,
    Do you have an example of a SOAP request with certificate authentication?

    thanks

  14. David Vrensk

    @Flavio: No, I haven’t used this with SOAP. Actually, I rarely use Soap4r, but instead I create an XML request and parse the XML response using something like HappyMapper. It doesn’t really scale, but sometimes it’s good enough.

  15. @David

    ok, thanks

  16. John

    Your blog post helped/inspired me to come up with this, which is a more secure way to acquire the cert list:

    https://gist.github.com/996292

    And then this, which sets ruby to use the cert list library-wide:

    https://gist.github.com/996510

    Thanks!

  17. David Vrensk

    Hey John, that’s really nice! I’ll add a note to the original post.

  1. [...] thought this would be straight-forward but it turned out to be slightly tricky. Thankfully I found this post that outlines the basics of setting up HTTPS in Ruby. Most people probably take the first method of [...]

  2. [...] can manually install the root certs, but first you have to get them from somewhere. This article gives a nice description of how to do that. The source of the cert files it points to is hosted [...]