Much thought from the finest brains all over the world has gone into the making of Unicode. Its main purpose is to serve all mankind. If you are a developer, do not make the mistake of ignoring this encoding standard.

Some things that nobody teaches us are learnt the hard way. Every developer goes through the process of learning new best practices while unlearning old, die-hard ones. Did any programming book tell you that software should not be developed using Visual Studio while it is running with administrator privileges? Did your computer course teach you to acquire resources as late as possible and release them early?

If you haven’t read the article titled ‘The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)’, written in 2003 by Joel Spolsky (co-founder of ), I suggest you read it before writing another line of code.

Unicode is a unified text scheme for representing characters of over 150 languages, the list of which also includes Braille, sign language, Esperanto, musical notations, hieroglyphs, cuneiform, chemical symbols, emoticons and dingbats. Microsoft had a big font named Arial Unicode MS that was supposed to support all of Unicode. However, the tech giant gave that up as Unicode continued to grow.

Figure 1: The uber way of beautifying comments in source code

When learning programming for the first time with a language such as C, it is easy to assume that a character is the same as a byte or that a byte comprises only 8 bits. If your computer course rushed through such details, please spend some time on the basics with a proper C language book. Spolsky’s article mentions how the creators of PHP initially wrote the Web scripting language without support for Unicode!

Unicode represents a paradigm shift from the days when boxes were drawn in the headers of C code using characters in the extended ASCII set (128 extra codes added to the 128-code ASCII). That technique looks better than traditional ASCII art even today. In Unicode, the box-drawing characters (⊢ ⊣ ⊤ ⊥ ⊦ ⊧ ⊨ ⊩ ⊪ ⊫ ═ ║ ╒ ╓ ╔ ╕ ╖ ╗ ╘ ╙ ╚ ╛ ╜ ╝ ╞ ╟ ╠ ╡ ╢ ╣ ╤ ╥ ╦ ╧ ╨ ╩ ╪ ╫ ╬) were moved further up into the stratosphere.

In the early days of computing, I worked primarily on DOS. Windows was a GUI program that ran on it. On a DOS keyboard, you could type the copyright symbol (©) by holding down the Alt key and typing 0169 on the numeric keypad. On lab computers that did not have floppy drives, I created undeletable directories on the hard disk by suffixing the directory names with an undetectable space symbol (Alt+255). A more subversive trick was to use memory utilities and change the name of a directory to ‘CON’ in the file allocation table (FAT). Windows, too, would not allow you to touch a file or directory named CON because it was a reserved file descriptor for the console in DOS. For many years, Linux would let you create a directory named CON in a Windows partition if you wanted to.

Figure 2: The spartan way of highlighting comments in source code

When I started learning HTML, I found that the copyright symbol could be written as &copy;. The registered symbol (®) could be written as &reg; and the trademark symbol (™) could be written as &trade;. HTML has several such character entity references. However, any character can be written using its Unicode codepoint, which is known as a numeric entity reference. The copyright symbol, for example, can be written as &#169;. It can also be written as &#xA9;, with the x signifying that the codepoint is hexadecimal. Similarly, the numeric entity reference for the registered symbol is written as &#174; or as &#xAE;.

If your OS is Windows, then it is likely that you save all your source code files in the default Windows encoding, Latin-1 or Western European. These encodings do not work well with non-English Unicode strings. For maximum resilience and portability, save all your source files in UTF-8 encoding. (Ideally, you should be storing all UI strings in a resource file so that file encoding is never a problem.) If your file turns into gibberish when you change its encoding, first copy everything in the file to the system clipboard (Ctrl+A and Ctrl+C), then change the encoding, and finally paste the text back (Ctrl+V). If your editor allows you to change the encoding only in the ‘Save as’ dialog box, then paste the text after saving the file. With UTF-8 encoding, you will eliminate a whole heap of trouble.
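To make the difference between characters, codepoints and bytes concrete, here is a minimal Python sketch. It is only an illustration: the helper names (numeric_entities, utf8_length, misread_as_latin1) are hypothetical and not from any library, and it assumes Python 3. It prints the decimal and hexadecimal numeric entity references for a few symbols, shows how many bytes each one occupies in UTF-8, and simulates the gibberish you get when UTF-8 text is read as Latin-1.

```python
import html

# Hypothetical helper names, chosen for this illustration only.

def numeric_entities(ch):
    """Return the decimal and hexadecimal HTML numeric entity references for one character."""
    codepoint = ord(ch)  # the character's Unicode codepoint as an integer
    return f"&#{codepoint};", f"&#x{codepoint:X};"

def utf8_length(ch):
    """Return how many bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

def misread_as_latin1(text):
    """Simulate a UTF-8 file being opened by a program that assumes Latin-1."""
    return text.encode("utf-8").decode("latin-1")

if __name__ == "__main__":
    for ch in "A©®™":
        decimal, hexadecimal = numeric_entities(ch)
        # html.unescape turns an entity reference back into the character
        assert html.unescape(decimal) == ch
        print(f"{ch} U+{ord(ch):04X} {decimal} {hexadecimal} "
              f"{utf8_length(ch)} byte(s) in UTF-8")
    # One character, two UTF-8 bytes, read back as two Latin-1 characters
    print(misread_as_latin1("©"))
```

The loop shows that a character is not the same thing as a byte: the letter A takes one byte in UTF-8, while the trademark symbol takes three. The last line shows why a file can turn into gibberish when the editor guesses the wrong encoding, since the bytes on disk are unchanged and only their interpretation differs.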