FAQ in Webserver Programming

Which HTTP request headers are expected when a browser sends a GET?

According to the HTTP specification, very few headers are strictly required in a GET request. In HTTP/1.1 the only header that MUST be sent is Host, for example Host: www.example.org. It allows a server to distinguish among the virtual hosts (i.e. multiple websites) it serves on a single IP address.
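As a rough illustration, the smallest valid HTTP/1.1 GET request can be serialized by hand. This is only a sketch: the function name build_get_request and the path /index.html are made up for the example.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Serialize a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line
        f"Host: {host}",         # the only mandatory request header in HTTP/1.1
        "",                      # together with the next entry this yields the
        "",                      # empty line that ends the header section
    ]
    return "\r\n".join(lines).encode("ascii")


request = build_get_request("www.example.org", "/index.html")
```

Real browsers send many more headers (Accept, User-Agent and so on), but none of them is required by the protocol.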

Which HTTP response headers are mandatory?

Few response headers have to be returned under all circumstances. The one header a server really should send is Date, which holds the timestamp at which the response message was generated. This is not to be confused with Last-Modified, which holds the timestamp at which the requested resource was last modified. Bear in mind that Last-Modified is optional.

Apart from that, there are situations when a webserver SHOULD or even MUST send particular headers. So the real answer is "it depends". A good starting point is to look up the specification of the concrete response code that's going to be sent.
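For the Date header, the value has to be in the HTTP date format. A minimal sketch using Python's standard library follows; the helper name http_date is an assumption made for this example.

```python
from email.utils import formatdate
from typing import Optional


def http_date(timestamp: Optional[float] = None) -> str:
    """Format a Unix timestamp (default: now) as an HTTP date in GMT."""
    return formatdate(timeval=timestamp, usegmt=True)


# Header line a server could emit with every response.
date_header = f"Date: {http_date()}"
```

Calling http_date(0) gives "Thu, 01 Jan 1970 00:00:00 GMT", which shows the expected layout of the value.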

Where can I find the specification of HTTP?

The specification of the still predominant version 1.1 of HTTP can be found in RFC 7230 Message Syntax and Routing and the subsequent documents, namely RFC 7231 Semantics and Content, RFC 7232 Conditional Requests, RFC 7233 Range Requests, RFC 7234 Caching and RFC 7235 Authentication.

What does implementation-independent in context of HTTP mean?

It means that HTTP doesn't prescribe which technology is used to produce requests and responses. The only thing that matters when writing a webserver is that it follows the interface specification of HTTP.

What is meant by HTTP being a stateless protocol?

It means that the webserver must be able to understand each request in isolation. HTTP has no notion of connection state, so the semantics of a request must not depend on previous requests. A good counterexample is FTP, which keeps connection state such as the current directory: operations like get are interpreted relative to that directory.

What are resources?

A resource is a piece of data served by a webserver. Resources are identified by their URIs (Uniform Resource Identifiers). URIs act like global IDs; in fact, they form a global namespace. Note also that a resource can be associated with multiple URIs.

What are representations of resources?

One resource can have multiple so-called representations. A representation delivers the resource state in a specific data format. As an example, think of an order that is served as text/html for human users and as application/json for external systems. The most important thing to note is that both representations are likely to be served from the same URI. However, clients need to set content negotiation headers, like Accept, to ensure they get the representation they desire.
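How a server might pick a representation from the Accept header can be sketched as follows. This is a simplified illustration, not a complete implementation of content negotiation: the function names are invented for the example, and partial wildcards such as text/* are deliberately ignored.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media type, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their order from the header
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_representation(accept: str, available: List[str]) -> Optional[str]:
    """Pick the most preferred available media type, or None if nothing matches."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type == "*/*":
            return available[0]  # the server's default representation
        if media_type in available:
            return media_type
    return None
```

With available set to ["text/html", "application/json"], an Accept value of "text/html;q=0.5, application/json" selects application/json. When the function returns None, a server would typically answer with 406 (Not Acceptable) or fall back to a default representation.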

What does idempotent mean?

Idempotent means that sending the same request multiple times has the same effect as sending it just once. As per the specification, several HTTP methods like GET, PUT and DELETE need to be implemented in an idempotent way. HTTP system components like web browsers, caches, proxies and even web crawlers count on that guarantee. A practical example is a payment operation: if it is implemented idempotently, sending the payment request multiple times does not cause the customer to be charged multiple times.

This means more work for a backend developer, so why is it a good thing? Idempotency is a common concept in distributed computing for dealing with typical failures like broken network connections, server unavailability and so on. With idempotent interfaces, HTTP clients are allowed to resend a request when they are in doubt whether the server processed it successfully.
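The difference can be made concrete with a toy in-memory store. The class OrderStore and its methods are hypothetical and only mimic PUT-like and POST-like behaviour; no HTTP is involved.

```python
class OrderStore:
    """In-memory stand-in for a backend that stores orders."""

    def __init__(self):
        self.orders = {}
        self._next_id = 1

    def put(self, order_id, order):
        """PUT-like: the client chooses the ID, so a repeat overwrites with the same state."""
        self.orders[order_id] = order
        return order_id

    def post(self, order):
        """POST-like: the server assigns a fresh ID each time, so a repeat creates a duplicate."""
        order_id = str(self._next_id)
        self._next_id += 1
        self.orders[order_id] = order
        return order_id


store = OrderStore()
store.put("order-a", {"item": "book"})
store.put("order-a", {"item": "book"})  # retry: still exactly one order
store.post({"item": "pen"})
store.post({"item": "pen"})             # retry: a second, duplicate order
```

After these calls the store holds three orders: one from the repeated put and two from the repeated post. That is why a client can safely retry the first kind of request but not the second.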

Which HTTP methods are idempotent?

The main idempotent methods are GET, HEAD, PUT and DELETE. POST, however, is not idempotent. Remember that POST is the only method supported by HTML forms that is meant to create, update or delete a resource.

Which HTTP methods are considered safe?

Safe methods don't cause any change of the resource state. In HTTP, the main safe methods are GET, HEAD and OPTIONS. Bear in mind that clients count on that. As a consequence, it is a bad idea to implement a delete operation through a GET. Web crawlers would happily visit such "delete" links, causing your resources to be deleted accidentally.
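The two properties can be captured in a small lookup, for instance to decide whether a client library may retry a request automatically. The set and function names are assumptions for this sketch; TRACE is included because RFC 7231 also defines it as safe.

```python
# Methods that RFC 7231 defines as safe (read-only from the client's view).
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})

# All safe methods are idempotent; PUT and DELETE are idempotent but not safe.
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def is_safe(method: str) -> bool:
    """True if the method must not change resource state."""
    return method in SAFE_METHODS


def may_retry(method: str) -> bool:
    """True if repeating the request has the same effect as sending it once."""
    return method in IDEMPOTENT_METHODS
```

Method names are case-sensitive in HTTP, so the lookup intentionally does not normalize its input.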

When to use the 302 (Found) response code?

A good example is a resource that requires the user to log in. The user wants to browse to such a resource and the web server responds like "well, I've found what you're looking for. But first please authenticate yourself via the resource I reference in the Location response header." This StackOverflow answer captures this scenario quite well.

Which HTTP response code to use when implementing the Post-Redirect-Get pattern?

In my opinion the best fitting response code is 303 (See Other) since, according to the specification, it's "an indirect response to the original request". Along with the response code 303, the response should set a Location header to tell the client under which URI it finds the result of the operation requested with POST.
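A minimal sketch of the redirect step in Post-Redirect-Get follows. The function name see_other and the URI /orders/42 are made up for this example.

```python
def see_other(location: str) -> bytes:
    """Serialize a 303 response that sends the client to the result of a POST."""
    lines = [
        "HTTP/1.1 303 See Other",
        f"Location: {location}",  # where the client should GET the result
        "Content-Length: 0",      # no body is needed for the redirect
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# After processing a POST that created order 42 (hypothetical):
response = see_other("/orders/42")
```

The browser follows the Location with a GET, so reloading the resulting page does not resubmit the form.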

Can HTTP reason phrases in HTTP responses be overridden?

Yes. For the response code 200, for example, the recommended reason phrase is "OK". That's only a recommendation, as stated in Section 6.1 of RFC 7231, and it can be overridden without affecting the protocol.

What is the rationale behind the `Transfer-Encoding: chunked` response header?

Without chunked encoding, a server that wants to keep the connection open has to send a Content-Length header, which requires it to know the length of the body in advance. With Transfer-Encoding: chunked, introduced in HTTP/1.1, servers can omit Content-Length and start sending the payload before its eventual size is known. Moreover, chunked encoding allows the server to send header fields after the payload, as so-called trailers. That's advantageous when headers are calculated from the payload, e.g. when signing the payload and transferring the signature in a header field.
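The wire format of a chunked body is simple enough to sketch by hand: each chunk is prefixed with its size in hexadecimal, and a zero-sized chunk ends the body, optionally followed by trailing header fields. The function name encode_chunked and the X-Checksum field are assumptions for this example.

```python
from typing import Dict, Iterable, Optional


def encode_chunked(chunks: Iterable[bytes],
                   trailers: Optional[Dict[str, str]] = None) -> bytes:
    """Encode body chunks (and optional trailers) in chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would look like the terminating chunk
        out += f"{len(chunk):X}\r\n".encode("ascii")  # chunk size in hex
        out += chunk + b"\r\n"
    out += b"0\r\n"  # last chunk: size zero
    for name, value in (trailers or {}).items():
        out += f"{name}: {value}\r\n".encode("ascii")  # header fields after the payload
    out += b"\r\n"  # empty line terminates the message
    return bytes(out)


body = encode_chunked([b"Hello, ", b"world!"], {"X-Checksum": "abc"})
```

In a real server each chunk would be written to the socket as soon as it is available, which is the whole point of the encoding.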

How to deal with caching of personalized pages?

Suppose a website has a small personalized element on each page, for example the username in the top right corner. There are two possible solutions:

load the personalized content with JavaScript
use advanced techniques like Edge Side Includes in reverse proxies (e.g. Varnish, Squid)

Should the Last-Modified response header be sent?

According to Section 2.4 of RFC 7232 the Last-Modified header SHOULD be sent. User agents take the timestamp sent in Last-Modified as the value for the request validation header If-Modified-Since. Note that Last-Modified found its way into HTTP/1.0 whereas ETags were introduced in HTTP/1.1. According to the specification a webserver should send both a Last-Modified and an ETag header.

Does the `ETag` header have any advantage over `Last-Modified`?

One reason to send both Last-Modified and ETag is that Last-Modified based caching can be harmed by time synchronization problems between client and server.
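Because an ETag is an opaque value that the client simply echoes back, validation does not involve comparing clocks at all. A simplified sketch, assuming the ETag is derived from a hash of the body and ignoring weak validators (the function names are invented for this example):

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload), answering 304 if the client's copy is current."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""  # client may reuse its cached copy
    return 200, {"ETag": etag}, body


status, headers, payload = respond(b"hello")                  # first request
revalidated = respond(b"hello", headers["ETag"])              # unchanged resource
after_update = respond(b"hello, again", headers["ETag"])      # resource has changed
```

The second call yields status 304 with an empty payload, the third yields 200 with the new body.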

What is a user agent or UA?

User agent is another term for an HTTP client. In the HTTP specification, "user agent" is the canonical name for the client, just as "origin server" is the canonical name for the server.

Are URIs that only differ in a trailing slash considered to be two different URIs?

Yes, they are considered to be different URIs. However, they can still point to the same resource, and in practice that's usually the case. It might be a good idea to designate a primary URI for a resource and respond with 301 (Moved Permanently) to requests that use a non-primary URI for the resource.
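One possible policy is to treat the variant without the trailing slash as the primary URI and redirect the other one permanently. This is a sketch under that assumption; the function name canonical_redirect is made up, and the opposite convention works just as well.

```python
from typing import Dict, Optional, Tuple


def canonical_redirect(path: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a permanent redirect, or None if path is primary."""
    if len(path) > 1 and path.endswith("/"):
        primary = path.rstrip("/") or "/"  # keep the root path intact
        return 301, {"Location": primary}
    return None  # already the primary URI: serve the resource normally
```

For example, /articles/ is redirected to /articles, while /articles and / are served directly.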

Is HTML5 valid XHTML?

No, HTML5 is not XHTML, nor is it XML. In XHTML even minor syntax errors cause incomplete rendering of pages, while HTML5 is more forgiving. For most use cases HTML5 is recommended. See the introduction of the HTML5 specification for more information. A good discussion of whether XHTML is dead can be found in the Software Engineering Stack Exchange question "Is XHTML5 dead or is it just an synonym of HTML5?".

What does the `Vary` header do?

First of all, it's a header that controls caching and content negotiation. But how? Vary is a response header telling clients and intermediaries which request header fields are used by the origin server to select a unique representation. So if the server serves an article with the URI http://example.org/some-blog-article and uses the Accept-Language header to determine the language of the article to be sent, then Vary should have the value Vary: accept-language. If a second request header is significant in choosing a representation, then the header field names in Vary are separated with commas like this: Vary: accept-encoding, accept-language. The header field names are case-insensitive, by the way. Read more about the Vary header in Section 7.1.4 of RFC 7231 and Section 4.1 of RFC 7234.
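From a cache's point of view, Vary extends the key under which a response is stored. The following sketch shows the idea; the function name cache_key is an assumption, header values are compared verbatim, and the special value * is not handled.

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header field names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url,) + tuple((name, normalized.get(name, "")) for name in names)


url = "http://example.org/some-blog-article"
key_en = cache_key(url, {"Accept-Language": "en"}, "accept-language")
key_de = cache_key(url, {"Accept-Language": "de"}, "accept-language")
```

The English and the German variant end up under different keys, while request headers that are not listed in Vary have no influence on the key.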

Do I have to set a `Vary` header?

No, a webserver is not required to send a Vary header. If it is absent, the URI of the resource alone determines the uniqueness of a resource to be cached. Caution needs to be taken when Vary has a value of *. In this case Section 4.1 of RFC 7234 says that testing whether a request can be satisfied by a cached version always fails. A good article on best practices for using Vary can be found on the Fastly blog.

What happens when a server constructs varying responses based on some header field and `Vary` does not include this header field?

The article "Best Practices for Using the Vary Header" on the Fastly blog does a good job pointing out possible consequences. The author uses the Accept-Encoding request header as an example. To summarize, if an origin server generates either a compressed (gzip) or an uncompressed response based on the received Accept-Encoding request header and Vary is not defined, a cache could deliver a gzip-compressed response to a client that does not understand gzip. The other way round is also possible: if Accept-Encoding is significant to determine the response and Vary is not set accordingly (should be Vary: accept-encoding), a cache could end up with the uncompressed version of the response. As a result, clients that do understand a gzip response would be served the uncompressed one from the cache, and thus bandwidth and performance would be wasted.

When a POST results in 200 (OK), what is supposed to be sent in the response body?

In Section 6.3.1 of RFC 7231 the specification of the 200 response code says: "POST a representation of the status of, or results obtained from, the action;". Should the new representation of an updated resource (e.g. the shipping address in an online shop) be enclosed in the message body? As far as I understand: in some circumstances yes, in particular when you want the response to a POST to be cacheable, see Section 4.3.3 of RFC 7231. But the sentence cited above probably means that something simple like "It worked!" can be enclosed in the body of a 200 OK response. For a hint see Section 3.1.4.2 of RFC 7231, which says "... thereby distinguishing it from representations that might only report about the action (e.g., "It worked!")". There's even more to read about what to put into a response body in Section 3.3 Payload Semantics of RFC 7231.

Is a `500 Internal Server Error` cacheable?

The 500 status code is specified to be non-cacheable by default. See Section 6.1 of RFC 7231 and Section 6.6.1 500 Internal Server Error of RFC 7231. However, if caching headers explicitly mark a response as cacheable, a 500 can be stored by a cache. See Section 3 of RFC 7234, which says "Note that any of the requirements listed above can be overridden by a cache-control extension;". I've written a more detailed answer on StackOverflow.

How to know if a response with a certain response code is cacheable?

Each response code (a list can be found in Section 6 of RFC 7231) that is cacheable by default is declared as such in an explicit way as pointed out by Section 6.1 of RFC 7231.

Is including a `Content-Type` header mandatory when sending a request with data enclosed in the message body, like in the case of PUT and POST?

No, it's not mandatory. The sender should include the Content-Type header to indicate the format and encoding of the enclosed data in the body. If Content-Type is missing, the server may assume the content type is application/octet-stream or apply content sniffing to determine its type. For more info see Section 3.1.1.5. Content-Type of RFC 7231.
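Setting the header explicitly is cheap. The sketch below builds a PUT request with Python's standard library without sending it; the URI and the payload are made up for the example.

```python
import json
import urllib.request

# Serialize the payload ourselves so we know exactly which format the body has.
payload = json.dumps({"street": "1 Example Road"}).encode("utf-8")

request = urllib.request.Request(
    "http://example.org/orders/42/shipping-address",  # hypothetical URI
    data=payload,
    headers={"Content-Type": "application/json"},      # tells the server how to parse the body
    method="PUT",
)

# urllib.request.urlopen(request) would actually send it; omitted here.
```

Without the explicit header, the server would have to guess the format, which is exactly the situation the specification advises senders to avoid.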

How to know which headers to send when sending an HTTP request with a certain method?

One might assume that the respective method definition in RFC 7231 is the place to search for that information. Unfortunately not. Instead, this information is sprinkled across the header specifications. It's impractical to read through the specifications of all existing headers. A good strategy might be to observe the HTTP traffic sent by your favorite web browser, since browsers tend to be pretty standards-compliant when it comes to HTTP.

Is it allowed to send multiple header fields with the same header name?

No and yes. In general Section 3.2.2 of RFC 7230 specifies that

A sender MUST NOT generate multiple header fields with the same field name

But there are exceptions and the specification goes on with

unless either the entire field value for that header field is defined as a comma-separated list [i.e., #(values)] or the header field is a well-known exception

One of those exceptions is the Set-Cookie header field that can appear multiple times in an HTTP response.
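A recipient can therefore merge repeated fields into one comma-separated value, as long as it keeps Set-Cookie apart. The following is a sketch of that rule; the function name combine_fields and the sample header values are assumptions for the example.

```python
from typing import Dict, List, Tuple


def combine_fields(fields: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Merge repeated header fields, keeping each Set-Cookie value separate."""
    combined: Dict[str, str] = {}
    cookies: List[str] = []
    for name, value in fields:
        key = name.lower()  # field names are case-insensitive
        if key == "set-cookie":
            cookies.append(value)  # must not be joined: values may contain commas
        elif key in combined:
            combined[key] += ", " + value  # list-valued fields join with a comma
        else:
            combined[key] = value
    return combined, cookies


headers, cookies = combine_fields([
    ("Cache-Control", "no-cache"),
    ("cache-control", "no-store"),
    ("Set-Cookie", "a=1; Expires=Wed, 21 Oct 2015 07:28:00 GMT"),
    ("Set-Cookie", "b=2"),
])
```

The two Cache-Control fields collapse into "no-cache, no-store", whereas the cookies stay a list of two values; joining them would make the comma inside the Expires date ambiguous.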

Why is it allowed to send multiple Set-Cookie header fields in an HTTP response?

The cookie specification (RFC 6265) states the following in Section 3:

An origin server can include multiple Set-Cookie header fields in a single response.

And further

Origin servers SHOULD NOT fold multiple Set-Cookie header fields into a single header field. The usual mechanism for folding HTTP headers fields (i.e., as defined in [RFC2616]) might change the semantics of the Set-Cookie header field because the %x2C (",") character is used by Set-Cookie in a way that conflicts with such folding.

While I can't tell how this syntax incompatibility came to be, I suppose it's because browsers and servers were working that way before the respective aspects of the web had been standardized.
