Anne van Kesteren

Introduction

Anne van Kesteren lives in Zeist in The Netherlands, where he runs his own web development company, Limpid. However, Anne is more widely known through his Weblog about markup style, where he explores the depths of XML, XHTML and CSS.

Interview

Q1. Let's start by setting the record straight. When should you send XHTML as 'text/html', and when should you send it as 'application/xhtml+xml'?

When possible, send it with the (specific) XML content-type, 'application/xhtml+xml'. That way you are sure that your document is well-formed. This is quite important, since most people (including me) still rely on browser results and not on the validator. What I mean is, they do trust the validator, but don't validate their pages when the browser does not have any problems displaying their pages.

Using the correct content-type, the better browsers will give you a not well-formed error, which automatically leads to updating that page (having errors on your page is not that cool).

Another problem with 'text/html' is that your document is treated as HTML, not as XHTML. There are some small differences in the handling of CSS that you might not understand the first time you use 'application/xhtml+xml'. To name an example: in HTML documents, the 'background' property applied to the body element will be "re-applied" by the browser to the 'viewport' so that the image or color covers the entire screen and not just the body element. As you might have guessed, in XML this is different. The body element doesn't get any special treatment at all in such documents and behaves as if it were a div with some default styling ('padding', 'margin').

To conclude, I think it is OK to send your XHTML 1.0 documents as 'text/html' as long as you are aware of the problems it might cause in the future.

Q2. If you choose to serve 'application/xhtml+xml', how do you work around browsers that don't support it?

There are multiple options. The most extensive one I have seen is written in PHP. This one takes into account the Vary header, which is too complicated to explain here. Other solutions were provided long ago in a column by Mark Pilgrim: "The Road to XHTML 2.0: MIME Types". A real-world example can be found at the W3C.
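As a rough illustration of the idea (not the PHP script or the Mark Pilgrim column mentioned above), the following Python sketch picks a content type from the browser's Accept header. The function names `choose_content_type` and `response_headers` are hypothetical, and the rule used here is an assumption: serve 'application/xhtml+xml' only when the browser lists it explicitly, with a quality value at least as high as the one for 'text/html'.

```python
def choose_content_type(accept_header: str) -> str:
    """Pick the media type for an XHTML page from an Accept header value."""
    xhtml_q = None  # quality value for application/xhtml+xml, if listed
    html_q = None   # quality value for text/html, if listed
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type == "application/xhtml+xml":
            xhtml_q = q
        elif media_type == "text/html":
            html_q = q
    # Assumption: a wildcard such as "*/*" does not count as XHTML support,
    # so only an explicit mention of the XML content type is honoured.
    if xhtml_q is not None and xhtml_q > 0 and xhtml_q >= (html_q or 0.0):
        return "application/xhtml+xml"
    return "text/html"


def response_headers(accept_header: str) -> dict:
    """Build the response headers; Vary tells caches the body depends on Accept."""
    return {
        "Content-Type": choose_content_type(accept_header),
        "Vary": "Accept",
    }


print(response_headers("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(response_headers("*/*"))
```

Sending "Vary: Accept" along with the negotiated type matters because a cache could otherwise hand the XML version to a browser that asked for 'text/html'.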

Q3. Why use XHTML instead of HTML? What are the advantages and is HTML still appropriate in some cases?

Probably the most difficult question. I think people will have to choose this for themselves, what they like best. Personally, I like XHTML, because the syntax is very easy compared to the SGML syntax.

Sometimes, however, I really like the way you can minimize everything in HTML (and I don't mean writing crappy code): leaving out both the start and end tags of an element (the html, head and body elements), or leaving out the end tags of li, p, etc.

Q4. You have talked about pages that are not well-formed being a bigger problem than invalid pages. Why is that?

Most of the pages that are invalid are not well-formed. Using the correct content-type on these pages will make the browser return an error (not well-formed). Trying to retrieve some information from a page (without using HTML Tidy) with XSLT and your parser will return an error. If a page is well-formed, but invalid you won't have these problems.

One of the major advantages of XML (and therefore XHTML) is that it does not need to validate. It needs to conform to a set of easy rules and that's it. (Of course, rules get more complicated if you want to do more specific things, but those are mostly never possible with SGML anyway.)

HTML, however, needs to be valid; otherwise, it can't be parsed. Well, it can, thanks to quirks mode, which solves a lot of problems older web pages have (and some newer pages have those problems as well, unfortunately).

Currently, we can't take advantage of invalid, well-formed XHTML since there is at least one browser that doesn't support it. It will parse the XHTML just as HTML, and therefore the XHTML needs to be valid. In the future, we could create documents where we skip some elements that are unnecessary for the document. Simple example:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <title>Webstandards Group!</title>
    <p>An example of minimalized markup.</p>
</html>

(Note: Mozilla supports this really well)
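To make the difference between "well-formed" and "valid" concrete, here is a minimal Python sketch using the standard library's XML parser. The helper name `is_well_formed` and the `BROKEN` document are made up for illustration; `MINIMAL` is the example above, which has no head or body element (so it is not valid XHTML) but still parses as XML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """Return True if the string parses as XML; validity is not checked."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# The minimalized example from the interview: invalid XHTML, yet well-formed.
MINIMAL = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <title>Webstandards Group!</title>
    <p>An example of minimalized markup.</p>
</html>"""

# Hypothetical broken variant: the p element is never closed.
BROKEN = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <title>Webstandards Group!</title>
    <p>An unclosed paragraph.
</html>"""

print(is_well_formed(MINIMAL))  # True
print(is_well_formed(BROKEN))   # False
```

This is the same kind of failure a browser reports when a page that is not well-formed is served as 'application/xhtml+xml', and the same error an XSLT processor would raise before it could extract anything.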

Q5. Recently you wrote that XHTML is not a step between now and the future. Could you elaborate?

Some books I bought had a little "preview" of what was coming after HTML: XML. Between now (HTML) and then (XML), web authors could play with XHTML to get used to the XML syntax.

Of course, XHTML showed us how XML would be and what we could do with it. (The specification tells about XSLT and related advantages.) However, that doesn't mean that every author will use his own custom version of XML in the future. An important aspect of modern web design is the semantics of a web page.

If we have two examples:

<h1>Header</h1>
<header>Header</header>

In the first example it might be less obvious to a human being what the content is. The second example tells that in more detail: the content is a 'header' and the value is 'Header'. From that point of view, we could say that the second example, which is a pure XML example, has more semantics.

The first example however, is understood by far more software (and some HTML authors who really used elements, rather than some <p><b>Header</b></p> construct). Software can't read the content, but it can sort content by value. Most search engines recognize HTML elements, like h1, which is a quite important element on the page. Google (to pick a search engine), doesn't recognize HEADER, it can't read like you and me.
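A small Python sketch of what "recognizing HTML elements" can look like in software. This is not how any search engine actually works; `HeadingCollector`, `find_headings` and the `HEADING_TAGS` set are hypothetical names, and the only assumption is a program that knows the HTML heading elements and nothing else.

```python
from html.parser import HTMLParser

# The element names this hypothetical program has been taught to treat as headings.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect the text of known heading elements and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of (tag, text) pairs
        self._current = None  # text pieces of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._current is not None:
            self.headings.append((tag, "".join(self._current).strip()))
            self._current = None


def find_headings(markup: str) -> list:
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return collector.headings


print(find_headings("<h1>Header</h1>"))          # [('h1', 'Header')]
print(find_headings("<header>Header</header>"))  # []
```

The custom element carries a more descriptive name for a human reader, but a program built around a fixed vocabulary simply does not see it.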

So until someone invents a language that can describe the semantics of XML elements and that language would be understood by software, XHTML will be your first choice.

Q6. When creating markup, you often start by studying the content first (then taking your dog for a walk)? Why is that?

Well, the dog needs fresh air now and then... I think that content is the most important aspect of a web page. I should not generalize that, I think that for informative web sites (like a book store, a news site, etc.) content is the most important thing.

The reason people visit your page is that they want to find information as quickly as possible. To give them such an experience, you need to know something about the content that will eventually end up on the site, so you can (in co-operation with the company) organize it in the most optimal form.

Markup comes after that and personally, I think it is quite important that every piece of content has the correct element. If that is done and it works together with the design (obviously, document order affects the layout), the site is as compact as possible and ready for Google.

Q7. You have said in the past 'Markup is all about personal opinions'. There seems to be many ways to mark up content. How do you decide the right element for the job? And is there a perfectly marked up website?

For most of the document content, there are HTML elements available that cover the semantics the content needs. Of course, everything could be more specific, like nl from XHTML 2.0 instead of ul, but with the currently available and implemented markup most of the 'content items' can be described. (It is debatable if navigation is actually a part of the content.)

In some situations this gets quite tricky. For example, sometimes the content would really benefit from a definition list construct, but it would be "bad" from a semantic point of view (the structure is nice, but the element has the wrong name). Personally, I choose (most of the time) elements with the structure that is needed, over elements like DIV, which might be better from a semantic point of view.

A perfectly marked up website? No, I don't think there is any. For me, when I think about this subject, I see multiple markup languages, each designed for a specific job, combined together using namespaces in a single document.

On the current web, leaving the design of a site aside, I like web pages with a minimum of markup. I don't have a specific site I like most.

Q8. Is perfect accessibility possible on a website?

Tricky question. To answer it we have to know what we understand by accessibility. For me, this is something along the lines of: "all document content including images and other multi-media must be available for everyone".

In that case, accessibility isn't that difficult. Actually, every site could be accessible if people knew how to use HTML (that is for today's web, where all pages are delivered in some form of HTML). Most designers don't understand it, they think it is a tool for displaying their layout. They don't understand that they give the wrong semantics to a web document using tables. They don't understand how to use the img element and that an ALT attribute is required. (It must be said though, that img is quite poorly designed.)
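The point about the required alt attribute is easy to check mechanically. The sketch below is an illustrative Python helper (the names `MissingAltFinder` and `images_missing_alt` are hypothetical) that lists the src of every img element without an alt attribute. It treats an empty alt="" as present, and it says nothing about whether the alternative text is any good.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Record the src of every img element that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <img ... />.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src"))


def images_missing_alt(markup: str) -> list:
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


page = '<p><img src="logo.png" alt="Company logo" /><img src="photo.png" /></p>'
print(images_missing_alt(page))  # ['photo.png']
```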

Q9. Is it true you develop in Mozilla then test in Internet Explorer? Do you think it is better to start with a standards compliant browser and work backwards?

Both are indeed correct. Actually, for my personal site, I don't even bother to test in IE anymore (It must be said that every browser has some kind of strange behaviour on that site).

I used to develop with Internet Explorer when I started. I mean, why not, it is the browser most people use and those people will be your audience most of the time. The problem was, however, that I was coding for IE and all its specific behaviours and quirks. When I tested it in Mozilla afterwards, everything was different.

That is not an example that standards don't work; they do. It is an example of a developer who thinks the browser is right and codes to the browser. Now that I'm working in Mozilla, I will still code to some browser quirks (although I know most of them, since I follow lots of HTML/CSS bugs in Bugzilla), but Mozilla is *much* more standards compliant than IE is. Since most people who create a couple of sites know what a browser supports and what it does not, and most people have a list of workarounds, it is fairly easy to create a standards-compliant site.

Q10. Finally, following on from a recent post of yours, do divs and spans have semantic meaning?

I was of the opinion that they didn't have any semantic meaning at all, but Liorean changed my mind:

And I want to point out that neither div nor span is without semantic meaning, and neither do they play a different role in XHTML2 than they do in XHTML1. A div has the role of a generic division of a document, a section of content. There's very few documents on the web where there is no place it's semantically correct to place a div to wrap sections. The span has a little weaker meaning, only meaning a generic grouping of text. Note that their explicit semantic meaning is a generic one, in difference to the specific semantics of all other HTML elements. The XHTML2 section element differs in meaning to div in that it is not a generic section of a document, but a generic section of text/content.

http://annevankesteren.nl/archives/2004/03/re-the-myth-of-css#comment-913

Interview taken from the Web Standards Group, 4th May 2004 http://webstandardsgroup.org/features/anne-van-kesteren.cfm
