SEO: How to Specify Character Encoding in HTML Web Pages

There are several things you can do to optimize your web pages so that they load faster in web browsers. A fast page load time helps with SEO and can give you an advantage in search engine rankings over time. One common recommendation is to specify the correct character encoding early, before the browser gets far into rendering the page.

Although a character set and a character encoding are technically different things, we will use the two terms interchangeably here, since within the scope of this post they refer to essentially the same thing. Strictly speaking, what you specify is the character encoding rather than the character set.

A character encoding tells the browser how to map the bytes of a web page to characters, so that the entire content renders correctly and legibly on a screen such as a computer or smartphone. Specifying it correctly and early ensures that web browsers can display the page correctly on all client machines.

This is also one of the recommendations made by several web page optimization tools, including Google PageSpeed. When the browser knows the character encoding up front, it can start rendering the page without first parsing the page code to work out the character set, which can reduce page load times.

There are three main ways to specify the character encoding when creating web pages.

Response Headers: The character set is declared in the HTTP response headers, usually site-wide. Every response, or at least a large subset of them, then carries the same character encoding, which matches how most websites work.

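Web Page: Each web page sets its own character set, usually added by the web developer while creating the page. In this case a meta tag inside the HTML head element specifies the character encoding, for example <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> or, in HTML5, the shorter <meta charset="UTF-8">.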

XML Tag: Also set inside the code of the web page, this is relevant when pages are served as XHTML rather than plain HTML. In this scenario the XML declaration carries the encoding in its encoding attribute, for example <?xml version="1.0" encoding="UTF-8"?>.

While each of the above is valid, they are not equivalent when it comes to rendering in the client-side web browser, especially in terms of page load time. The browser has several ways to determine the content type and character encoding of the document it is trying to render, and it typically checks them in the following order:

Response Header: It checks the response headers for a Content-Type header that includes a charset parameter.

XML Tag: If the page is an XHTML document, it parses the XML declaration and reads its encoding attribute.

Meta Tag: It parses the head of the page for a meta tag whose http-equiv attribute is set to Content-Type (or a meta tag with a charset attribute).

Heuristics: As a last resort, it applies heuristic algorithms that guess the character set from the page content.
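
The lookup order above can be sketched in Python. This is a simplified model under stated assumptions, not how any particular browser works: the function names are hypothetical, the regular expressions only approximate the byte-level prescan in the HTML specification (which, among other things, also checks for a byte order mark first), the meta search is limited to roughly the first 1024 bytes as browsers do, and the heuristic step is reduced to "valid UTF-8, otherwise windows-1252".

```python
import re
from email.message import Message

# Simplified patterns; browsers follow the byte-level prescan in the HTML spec.
XML_DECL = re.compile(rb'<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']')
META_CHARSET = re.compile(
    rb'<meta[^>]+charset=["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the lower-cased charset of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def detect_charset(content_type, body, is_xhtml=False):
    """Return (charset, source), checking sources in the order listed above."""
    # 1. Response header: no need to look at the page at all.
    charset = charset_from_header(content_type)
    if charset:
        return charset, "header"
    # 2. XML declaration, only considered for XHTML documents.
    if is_xhtml:
        match = XML_DECL.match(body)
        if match:
            return match.group(1).decode("ascii").lower(), "xml"
    # 3. Meta tag, searched for near the start of the document only.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    # 4. Heuristic fallback: valid UTF-8, otherwise assume windows-1252.
    try:
        body.decode("utf-8")
        return "utf-8", "heuristic"
    except UnicodeDecodeError:
        return "windows-1252", "heuristic"
```

The first branch is the point of the ordering: when the header carries a charset, none of the page body has to be examined.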

Looking at this order, the first place the browser checks is the response header. If you put the character set there, the browser saves that extra work. If it does not find the character encoding in the headers, it moves down the list, and the remaining steps all involve parsing the page code or content, which is more resource intensive.

Below are the various ways to set the character encoding in the response headers. The method mostly depends on the platform your website runs on.

Apache HTTP Server

Apache HTTP Server is one of the most commonly used web servers. To set the character encoding on Apache, you can modify the .htaccess file, provided the main server configuration allows these directives in .htaccess files (AllowOverride FileInfo).

Open the .htaccess file in a text editor and add one or both of the following lines:

AddDefaultCharset UTF-8

Or

AddType 'text/html; charset=UTF-8' html

Because each directory can have its own .htaccess file, you can set a different character set on a per-directory basis. Usually, you only need to modify the .htaccess file in the topmost directory to set the value site-wide.

Sometimes you might want to force a character set different from the one defined in the .htaccess file for a single file or a small set of files. You can do this in the .htaccess file as well. For example, to override the character set of a file named example.html:

<Files example.html>
    ForceType text/html;charset=ISO-8859-1
</Files>
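
The precedence described above can be pictured with a toy model: a per-file override beats the directory setting, and a subdirectory's .htaccess beats its parent's. The function effective_charset and its dictionary inputs are hypothetical; this is not how Apache is implemented, and it ignores details such as a <Files> section only applying within the directory tree of the .htaccess file that contains it.

```python
from pathlib import PurePosixPath


def effective_charset(file_path, dir_charsets, file_overrides):
    """Toy model of which charset the .htaccess settings give a file.

    dir_charsets: directory -> charset from AddDefaultCharset in that
    directory's .htaccess, e.g. {"/": "UTF-8"} (hypothetical input format).
    file_overrides: file name -> charset forced by a <Files> section.
    """
    path = PurePosixPath(file_path)
    # An explicit per-file override takes precedence over directory defaults.
    if path.name in file_overrides:
        return file_overrides[path.name]
    # Otherwise the closest directory with a setting wins, since a
    # subdirectory's .htaccess is applied after its parents'.
    for directory in (path.parent, *path.parent.parents):
        charset = dir_charsets.get(str(directory))
        if charset is not None:
            return charset
    # Nothing configured: the browser has to look inside the page.
    return None
```

With a site-wide UTF-8 default in the top directory, only files in a subdirectory that sets its own value, or files named in a <Files> override, end up with a different charset.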

Nginx Server

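Nginx is another popular web server. To add the character set to the Content-Type response header in Nginx, put the following directive in the configuration file, which is usually named nginx.conf and located in your conf/ directory (the exact path depends on how Nginx was installed). It can be placed inside the server{…} section of the file; the directive is also accepted in http and location blocks. Reload Nginx afterwards for the change to take effect.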

charset UTF-8;
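
After changing the Apache or Nginx configuration, it is worth confirming which charset the server actually sends. Here is a minimal sketch using only Python's standard library; served_charset is a hypothetical helper name, not part of any library, and it sends a HEAD request so that only the headers are transferred.

```python
from urllib.request import Request, urlopen


def served_charset(url, timeout=10):
    """Return the charset declared in the Content-Type header of `url`.

    Returns None when the response has no charset parameter.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # response.headers is an email.message.Message (sub)class, whose
        # get_content_charset() returns the lower-cased charset or None.
        return response.headers.get_content_charset()
```

For example, served_charset("https://www.example.com/") returns whatever charset that server declares, or None if it declares none. Some servers answer HEAD requests differently from GET, so if the result looks wrong, repeat the check with a GET request.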

PHP Website

If your website is written in PHP, you have an additional option to set the response header regardless of the web server. Call the following header function before the script produces any output, since PHP normally cannot change headers once output has started:

header("Content-Type: text/html; charset=utf-8");
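
The same rule, sending the header before any of the body, applies outside PHP too. As a point of comparison only, here is a minimal sketch using Python's standard-library http.server; the handler name CharsetHandler, the sample page, the serve helper, and port 8000 are illustrative assumptions, and this development server is not meant for production use.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PAGE = "<!DOCTYPE html><html><head><title>Café</title></head>" \
       "<body>Déjà vu</body></html>"


class CharsetHandler(BaseHTTPRequestHandler):
    """Serves a single HTML page with the charset declared in the header."""

    def do_GET(self):
        body = PAGE.encode("utf-8")  # encode with the charset we announce
        self.send_response(200)
        # As with PHP's header(), headers must go out before the body.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(port=8000):
    """Run the example server on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), CharsetHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8000/ in a browser should display the accented characters correctly, because the browser learns the encoding from the header before reading the page.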

Sometimes you may not be able to configure the server to send a proper response header, for example when you do not have access to modify the .htaccess files, or when a business requirement forces you to set the value on a per-page basis. In those cases, declare the encoding in the page itself with the meta tag (or the XML declaration for XHTML). For most websites and blogs, however, this is not the case: all pages share the same encoding, so it can be set site-wide at the server level.

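Also note that response headers only come into play when the files are served over HTTP (or HTTPS). If a file is opened directly from disk, there are no response headers, and the browser has to rely on the declarations inside the page.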