Archive for » August, 2008 «

Wednesday, August 20th, 2008

[This is a re-write of a project note I wrote in project over a year ago. Published here to make the world a better place.]

There has been some commotion recently regarding XHTML and HTML, where standardistas have reiterated their arguments about XHTML: it’s a nice standard, but since is isn’t actually treated as XHTML by browsers, it’s better to use HTML.

The “standard” text on this (at least for me) is http://www.hixie.ch/advocacy/xhtml from which I’d like to quote the executive summary:

If you use XHTML, you should deliver it with the application/xhtml+xml MIME type. If you do not do so, you should use HTML4 instead of XHTML. The alternative, using XHTML but delivering it as text/html, causes numerous problems that are outlined below. Unfortunately, IE6 does not support application/xhtml+xml (in fact, it does not support XHTML at all).

I have come back to that text on several occasions, but it never quite convinced me. I didn’t want to lose the well-formedness of XHTML. Then I ran across http://www.webdevout.net/articles/beware-of-xhtml which made me change my mind. The article uses numerous examples to show that XHTML is not just a syntactically nicer version of HTML, but it actually has slightly differing rules for how to apply CSS. So basically, in all projects that I have used XHTML, I have been relying on bugs in present browsers. Ouch.

Time to change—back

They promised us that XHTML was the future, so I haven’t worried too much about HTML lately. Are there any downsides to using HTML instead? Well, not per se. But the tools and platforms I use are slightly XHTML-centric.

All the standard Rails helpers (form helpers, JS helpers, etc.) produce xhtml. The simple solution is to drop this in a file in lib/:

module ActionView::Helpers::TagHelper
  def tag_with_html_syntax(name, options = nil, open = true, escape = true)
    tag_without_html_syntax(name, options, open, escape)
  end
  alias_method_chain :tag, :html_syntax
end

TextMate snippets produce XHTML. Most of them can be controlled by an environment variable, but there are exceptions, like C-ret which produces produces an XHTML self-closed br element.
External tools, data sources and parsers are usually xml-centric.
WordPress, which I have used for a few client projects lately, spits out XHTML. I’ll see if I can create a patch for that and have it accepted.

Oh well. XHTML, I’ll miss you. HTML, can we be friends again?

Category: Web development | Tags: html, xhtml | 2 Comments

Escaping single quotes in Ruby harder than expected

Wednesday, August 06th, 2008

So I was writing this tool to create a bunch of SQL statements from a data dump. Simple enough, right. And as always when you generate SQL statements, you have to make sure that the data doesn’t interfere with the SQL syntax by escaping the single quotes (and generally any binary data, but I didn’t have that). Any database gem/module/library has that built-in, of course, but I didn’t want to use that. So I said [Note: this doesn't work. Read on for the solution.]

def quote (str)
  str.gsub('\\','\\\\').gsub('\'','\\\'')
end

I read this as “replace all backslashes with double backslashes, and then replace all single quotes with a backslash and a single quote”. I added a simple test for it (yay TestUnit!):

def setup
  @m = Migrate.new
end

def test_quote
  assert_equal("I\\'m home", @m.quote("I'm home"))
end

But imagine my surprise when I got

  1) Failure:
test_quote(TestMigrate) [migration/test_migrate.rb:29]:
< "I\\'m home"> expected but was
< "Im homem home">.

Ooookay. What’s wrong here? Have I misunderstood the rules for escaping the escape sequence? It’s supposed to be easier with single quotes, but maybe I got it wrong. So I tried with double quotes:

def quote (str)
  str.gsub("\\","\\\\").gsub("'","\\'")
end

Surely this would work? Nope, it gives the exact same error. Time to look up gsub in the manual:

str.gsub(pattern, replacement) => new_str
str.gsub(pattern) {|match| block } => new_str

[…] If a string is used as the replacement, special variables from the match (such as $& and $1) cannot be substituted into it, as substitution into the string occurs before the pattern match starts. However, the sequences \1, \2, and so on [my emphasis] may be used to interpolate successive groups in the match.

“And so on”? Oh, so obviously \' (escaped \\' in the string literal) is the replacement string equivalent of $', which means everything afther the match (as all regexp hackers know). So I need to escape the backslash for regexp engine too:

def quote (str)
  str.gsub("\\","\\\\").gsub("'","\\\\'")
end

OK, the tests pass. But the code looks wrong. Four backslashes can’t work for both cases, can they? Let’s add a test case:

def test_quote
  assert_equal("I\\'m home", @m.quote("I'm home"))
  assert_equal("S\\\\N", @m.quote("S\\N"))
end

Nope, that fails. So we need this:

def quote (str)
  str.gsub("\\","\\\\\\\\").gsub("'","\\\\'")
end

Eight backslashes. Yes, the test passes, but is it worth it? Is it understandable? I don’t want comments to explain my code. Comments are good to provide a raison d’être for something, but not to explain its looks. Let’s switch to the other form of gsub:

def quote (str)
  str.gsub(/\\|'/) { |c| "\\#{c}" }
end

“If you see a backslash or a single quote, replace it with a backslash and whatever you saw.” That’s what I wanted to say anyway.

Good. But I wrote this in Markdown, so now I have to generate the HTML and the go through it and make sure that I restore whatever backslashes Markdown ate. (It turns out it didn’t eat any. TextMate has a Markdown Preview function that ate a lot of backslashes, but when I said “Convert to HTML” it didn’t eat any at all. Go figure.)

Category: Ruby | Tags: gotcha, regexp, Ruby, SQL escape | 9 Comments

The survey

Wednesday, August 06th, 2008

It’s time for the yearly survey for web professionals:

But that’s hardly a note to self. Looks like I’m turning this into a blog.

Category: The daily grind | Comments off

Archive for » August, 2008 «

Time to change—back

Tag Cloud

Recent Comments

Categories

Archives

Pages

Meta