Patch to allow (escaped) HTML in Textpattern comments

filed under: , ,

28 November 2006

(In case the title didn’t warn you, non-geeks probably won’t be interested in this article.)

Textpattern has a “feature” whereby HTML tags are completely stripped out of visitor comments, presumably to close a common security hole. I’ve created a patch for Textpattern 4.0.4 that changes it such that HTML tags in comments are allowed, but the relevant characters (<, >, &, ‘, “) are converted to entity codes and therefore aren’t treated as HTML delimiters and therefore aren’t a security concern.

Net effect: it’s possible to post HTML code examples in article comments. I’ll post such a comment right after I post this article, just to prove the point.

I’ll be announcing the patch in the txp-dev mailing list and on the forums, and inviting Textpattern developers to use this article as a place to test such comments. Hopefully it will be merged into the next official Textpattern release, so I don’t have to maintain this patch for more than a few months.

Comments
  1. Nathan Arthur says:

    <table><tr><td>
    See how this didn’t become a table?
    </td><td>
    But see how it can still use Textile to modify the formatting?
    </td></tr></table>

  2. Nathan Arthur says:

    I’ve already discovered my first bug – if I use the admin-side to edit a comment, the entity codes are lost and the HTML ends up being applied. Ideally, the admin-side would do the same thing the visitor-side does, and preserve the input text as written.

    Hrm.

  3. Nathan Arthur says:

    I’ll test with this comment by putting “bq” tags around the following paragraph:

    This line should have escaped “bq” tags around it.

    ...and then I’ll ‘edit’ (without modifying) and save the comment on the admin-side.

  4. Nathan Arthur says:

    Yep, it broke. If you look at the source, you can see that the “bq” tags ended up in the HTML, rather than displayed on the page.

  5. Nathan Arthur says:

    I may have it fixed – here’s another paragraph wrapped in “bq” tags:

    <bq>This line should have escaped “bq” tags around it</bq>

    I’ll edit this momentarily (without modifying) and save the comment, on the admin side. Lets see what happens…

  6. Nathan Arthur says:

    Hey – it worked! The other test case is to see what happens if I just write ‘<’ and ‘>’ characters, like this: > < & ‘ “

  7. Nathan Arthur says:

    Still working! It’s a bit ugly in the textbox on the admin side, but it does the right thing :)

  8. Matt says:

    So, if I have a tag in a post, do the angle brackets get stored in the database or do the escaped characters? I typically am in favor of storing the text as-is in the database and escaping it on display. That gives you a lot of flexibility to change what tags you escape without a database cleanup. But, in this case, you might be better off escaping the data before it hits the database. If you do that for your users, then your admin edits would see that the users submitted escaped stuff and the admin could change that to whatever because admin submissions aren’t escaped. Thoughts?

  9. Nathan Arthur says:

    Matt:

    1) In standard Textpattern 4.0.4, anything that looks like html (e.g. <sometext>) is completely removed from the comment before it is saved to the database. With my patch, the angle brackets are converted to entity codes before they’re stored in the database.

    2) I agree that the text should be stored as-is in the database, and adjusted on display. TXP isn’t designed around that behavior, though, and it wasn’t necessary to completely change it just to fix this bug, so I didn’t. But I sure wish they’d change it.

    3) The data is escaped before it hits the database. The problem is that textareas in browsers convert escaped characters back to regular characters, so (by default) admin users don’t see that the brackets were escaped. That’s why I have to do the second round of escaping.

  10. Markus Merz says:

    Hmm, letting text pass through unfiltered to the (external) database … Isn’t that a big security hole by design if the database doesn’t do its own in/out filtering?

  11. Nathan Arthur says:

    I’m not sure what you mean by the “external database”. Also, as near as I can tell, no text is passed unfiltered anywhere; it’s filtered when it is submitted, and then again when it is edited again on the admin side.

  12. Markus Merz says:

    The database is external by definition because it is not inside Textpattern. I was referencing to “the text should be stored as-is in the database”. Maybe it was a misunderstanding because your concept involves the changing of the ‘dangerous’ letters before they hit the database?

    Never mind, I think that having a whitelist of allowed characters and escaping everything else will be a good thing.

Add a comment

(will not be displayed)

(not required)