Write RTL Better With These Unicode Characters

Reading Time: 7 mins

Writing in RTL languages (Abjad writing systems) comes with a few challenges, especially if your editor’s support for those languages isn’t very good, which is the case for most web-based editors (e.g. comment sections). The issues you may run into range from text that mixes RTL & LTR characters showing up in the wrong direction to end-of-sentence punctuation jumping to the beginning of the sentence. In this post I’ll show how to solve issues of this kind using specific Unicode characters.


An Example

Let’s take a look at this sentence:

        ‪سلام! من یک متن فارسی هستم!

And now this one:

        ‫سلام! من یک متن فارسی هستم!

The first sentence has its exclamation mark at the beginning, but the second one has it correctly at the end. Why? Because the second sentence has an extra U+202B (Right-to-Left Embedding) character at the beginning. Characters like this are called non-printing characters (for obvious reasons 😁). There are also characters that can switch a paragraph to RTL or LTR, reverse a word, change the order of character blocks, and so on. The aim of this post is to get you familiar with some of them; I wrote it after coming across this question on Stack Overflow.

Intro & Terminology

1- Directionality:

All characters have an intrinsic “directionality”. For example, Latin characters are Left-to-Right, Persian/Arabic/Urdu characters are Right-to-Left, and some characters, such as the dot (.) and the comma (,), are neutral. When we type a series of RTL characters the text is RTL, and when we type a series of LTR characters the text is LTR. (Oh really? 😂)
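
If you want to check a character’s directionality yourself, Python’s standard `unicodedata` module exposes the Unicode bidirectional class of every character. The sketch below uses a hypothetical helper, `describe`, to list those classes: `L` is strong left-to-right, `AL` (Arabic letter) and `R` are strong right-to-left, and classes such as `CS` (common separator) and `ON` (other neutral) are the weak or neutral ones.

```python
import unicodedata

def describe(text: str) -> list[tuple[str, str]]:
    """Return (character, bidi class) pairs using Unicode's bidirectional data."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

for ch, bidi_class in describe("Aس.,!"):
    # L = strong LTR, AL = Arabic letter (strong RTL), CS/ON = weak/neutral
    print(repr(ch), bidi_class)
```

Strictly speaking, the Unicode Bidirectional Algorithm classifies the dot and the comma as weak (`CS`) rather than neutral, but outside of numbers they end up being treated like neutrals, which is why they behave as described here.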

2- Character Block:

Consecutive characters that share the same directionality are called a “character block”. In the next image there are 7 character blocks.

[Image: a text split into 7 character blocks]
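
The grouping can be approximated in code. This is only a rough sketch under a simplifying assumption: it groups consecutive characters by their bidi class alone (strong LTR, strong RTL, or everything else), whereas the real Unicode Bidirectional Algorithm also resolves neutrals from their surroundings, so its counts won’t always match the image. `coarse_direction` and `character_blocks` are hypothetical names.

```python
import unicodedata
from itertools import groupby

def coarse_direction(ch: str) -> str:
    """Collapse a character's bidi class into LTR, RTL, or weak/neutral."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":
        return "LTR"
    if bidi_class in ("R", "AL"):
        return "RTL"
    return "weak/neutral"  # digits, spaces, punctuation, control characters, ...

def character_blocks(text: str) -> list[tuple[str, str]]:
    """Split text into runs of consecutive characters with the same coarse direction."""
    return [(label, "".join(run)) for label, run in groupby(text, key=coarse_direction)]

print(character_blocks("Hello سلام 123!"))
```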

3- Different Types of Non-printing Characters

Three types of non-printing characters matter here: 1- mark characters, 2- embedding characters, and 3- override characters.
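
For reference, here are the seven characters this post covers, grouped into those three types and printed with their official Unicode names and bidi classes. The grouping dictionary is just an illustration; U+202C (PDF) is listed with the embedding characters as in this post, although it also ends an override.

```python
import unicodedata

# The seven directional formatting characters covered in this post, grouped by type.
BIDI_CONTROLS = {
    "mark": ["\u200f", "\u200e"],
    "embedding": ["\u202b", "\u202a", "\u202c"],  # U+202C (PDF) also closes overrides
    "override": ["\u202d", "\u202e"],
}

for kind, chars in BIDI_CONTROLS.items():
    for ch in chars:
        # Code point, official Unicode name, and bidi class of each character.
        print(f"{kind:9} U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.bidirectional(ch)})")
```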

Mark Characters

1- U+200F: Right-to-Left Mark (RLM)

This character has an intrinsic right-to-left directionality but no visual appearance! Placed next to a neutral character, it can change the directionality of the associated character block to right-to-left.

[Image: a simple text in Stack Overflow’s answer section, without U+200F]

[Image: the same text with an extra U+200F at the end]
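
In code, the fix from the screenshots is just appending the mark. A minimal sketch, assuming plain Persian text with trailing punctuation; `anchor_rtl_ending` is a hypothetical helper that avoids adding a second mark if one is already there.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: strong RTL, but invisible

def anchor_rtl_ending(text: str) -> str:
    """Append an RLM so trailing neutrals (e.g. '!' or '.') resolve as RTL."""
    return text if text.endswith(RLM) else text + RLM

comment = "سلام!"
fixed = anchor_rtl_ending(comment)
print(len(comment), len(fixed))  # 5 6
```

The visible characters are unchanged; the invisible mark only gives the trailing `!` a strong RTL neighbour, so it stays attached to the Persian block.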

2- U+200E: Left-to-Right Mark (LRM)

This character has an intrinsic left-to-right directionality and, just like the previous one, no visual appearance. We can use it alongside a neutral character to change the associated character block to left-to-right.

The text below ends with a dot. After adding U+200E, the last character block is set to LTR and the dot moves to the right of the English sentence.

[Image: the text with and without U+200E at the end of the sentence]

[Image: the raw text shown in VSCodium, where the U+200E can be seen]
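
Because these marks are invisible, it helps to reveal them the way the VSCodium screenshots do. The sketch below is a hypothetical helper, not a VSCodium feature: it replaces each directional control covered in this post with a short visible tag.

```python
# Hypothetical helper: tag names for the seven directional controls in this post.
CONTROL_NAMES = {
    "\u200e": "LRM", "\u200f": "RLM",
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
}

def reveal(text: str) -> str:
    """Replace each directional control with a visible <NAME> tag."""
    return "".join(f"<{CONTROL_NAMES[ch]}>" if ch in CONTROL_NAMES else ch for ch in text)

print(reveal("Hello world.\u200e"))  # Hello world.<LRM>
```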

Embedding Characters

1- U+202B: Right-to-Left Embedding (RLE)

This is the character from our first example. Its job is to change the directionality of a paragraph to RTL or, more precisely, to lay out the character blocks within a paragraph from right to left. A really useful scenario for it is writing a mixed list of words: to make the list look as expected, add a U+202B before each LTR character block.

[Image: U+202B added before each number (numbers are LTR)]

[Image: the raw text shown in VSCodium]
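
The list trick can be automated. This sketch assumes, for simplicity, that the LTR character blocks are runs of ASCII letters and digits (Persian digits and non-ASCII Latin letters are not matched); `prefix_ltr_runs` is a hypothetical name.

```python
import re

RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING

def prefix_ltr_runs(text: str) -> str:
    """Insert an RLE before every run of ASCII letters or digits."""
    return re.sub(r"[A-Za-z0-9]+", lambda m: RLE + m.group(0), text)

items = "سیب 3، موز 12، CPU 1"
print(prefix_ltr_runs(items).count(RLE))  # 4
```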

P.S.: A Secondary Usage of U+202B

Personally, I also use it when I want English text to sit on the right side of editors such as Notepad, KWrite and Gedit: I add this character before the paragraph and the paragraph moves to the right. In this niche scenario, U+200F can be used just as well.

[Image: U+202B used before the 3rd and 4th paragraphs]
[Image: U+202B not used before the 3rd and 4th paragraphs]

Another use case is the editors on some government websites, which have no built-in way to detect the directionality of the text and give you output like this:
    مطالب upload شدند.

So we add a 202B in the beginning and we’re good to go:

    ‫مطالب upload شدند.
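
The same fix as a small function: a minimal sketch with a hypothetical `as_rtl_paragraph` helper. An embedding ends automatically at the end of the paragraph, so the closing U+202C (PDF) is optional here.

```python
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def as_rtl_paragraph(paragraph: str, close: bool = False) -> str:
    """Prefix a paragraph with RLE so its character blocks are laid out right-to-left.

    close=True also appends an explicit PDF, which keeps the embedding scoped
    if the text is later joined with something else.
    """
    wrapped = RLE + paragraph
    return wrapped + PDF if close else wrapped

print(repr(as_rtl_paragraph("مطالب upload شدند.")))
```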

2- U+202A: Left-to-Right Embedding (LRE)

In the next image I’ve changed the directionality of each character block to LTR so that I could write a math equation using Persian words:

[Image: U+202A added before each word in the second Persian equation]

[Image: the raw text shown in VSCodium]
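
A sketch of the equation trick, assuming the equation is passed as a list of tokens and that only the words (not the operators) get a U+202A, as in the image. `ltr_equation` and the sample equation are hypothetical.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING

def ltr_equation(tokens: list[str]) -> str:
    """Join equation tokens, putting an LRE before each word so the blocks run left-to-right."""
    return " ".join(LRE + token if token.isalpha() else token for token in tokens)

# Hypothetical equation with Persian words: apple + banana = fruit
equation = ltr_equation(["سیب", "+", "موز", "=", "میوه"])
print(equation.count(LRE))  # 3
```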

3- U+202C: Pop Directional Formatting (PDF)

This character marks the end of an embedded section. If I use U+202A and U+202B several times in a text, I put a U+202C (PDF) wherever I want an embedding to end. I actually used this character in the Persian translation of this post: I wanted a left-to-right character block within a right-to-left paragraph, so I closed the LTR section with a U+202C.

The first sentence in the next image is raw, the second one has the directionality of the whole paragraph changed to LTR, and the third has a single LTR character block while the rest of the paragraph stays RTL, which is what we expect.

[Image: the three sentences described above]

[Image: the raw text shown in VSCodium, including the PDF character]
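
Embedding characters nest, and each U+202C closes the most recent one. The sketch below wraps a single LTR segment between U+202A and U+202C and adds a hypothetical tidiness check, `embeddings_balanced`, that reports whether every opener has a matching PDF. It is not a rendering rule: the Unicode Bidirectional Algorithm ignores a stray PDF and closes any open embeddings at the end of the paragraph.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING: closes the most recent embedding/override
OPENERS = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO

def embed_ltr(segment: str) -> str:
    """Wrap one segment so only it is embedded as LTR."""
    return LRE + segment + PDF

def embeddings_balanced(text: str) -> bool:
    """True if every opener is closed by a PDF and no PDF appears without an opener."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # a PDF with nothing to close
            depth -= 1
    return depth == 0

# "I said Hello, world! and left." with the English part embedded as LTR
sentence = "من گفتم " + embed_ltr("Hello, world!") + " و رفتم."
print(embeddings_balanced(sentence))  # True
```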

Override Characters

1- U+202D & U+202E: Left-to-Right Override (LRO) & Right-to-Left Override (RLO)

They don’t have many use cases, but they are the cutest non-printing characters 😇: they force every character that follows them, up to a U+202C or the end of the paragraph, into one direction! So if you type “Hello” and add a U+202E (Right-to-Left Override) at the beginning, it turns into “olleH”, and if you write “سلام” with a U+202D (Left-to-Right Override) at the beginning, it turns into “مالس”! Apart from this fun use case, the only other one I know of is an evil one 😈, which I’ll tell you about next.

[Image: using override characters for fun]

[Image: the raw text shown in VSCodium]
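
Override characters don’t change the stored text, only how it’s drawn. This sketch wraps a word in an override (hypothetical `override` helper) and includes a toy model, `perceived_reading`, of what a reader sees for a single word forced into the opposite direction; it is only a simple reversal and ignores everything else a real renderer does.

```python
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def override(text: str, control: str) -> str:
    """Force every character in text to the direction of control, up to the PDF."""
    return control + text + PDF

def perceived_reading(word: str) -> str:
    """Toy model: how a single word reads when forced into the opposite direction."""
    return word[::-1]

shown = override("Hello", RLO)
print(shown[1:-1] == "Hello")      # True: the stored characters are untouched
print(perceived_reading("Hello"))  # olleH
```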

2- The Evil Use-case of Override Characters!

Look at the file below. What do you see? A harmless .png photo, right?

[Image: what looks like a harmless .png photo]

But NOOO! It’s a Windows executable!

[Image: behind the scenes of the “harmless” file]

Now let me show you the raw text of the filename!

[Image: the raw text of the filename shown in VSCodium]

This is the evil use case of U+202E: you can make an executable (probably a malicious one) look like an innocent picture! That said, this trick is largely obsolete: even with an innocent-looking name, the file still can’t bypass UAC (User Account Control) prompts or antivirus software, and you can’t give it a fake thumbnail that works on all Windows versions.
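
A defensive sketch: before trusting a filename, check it for directional controls and look at the real extension, which is read from the stored characters, not the rendered ones. The filename and the `has_bidi_controls` helper are hypothetical, and the check only covers the seven characters discussed in this post.

```python
import os

# The seven directional controls covered in this post.
BIDI_CONTROLS = set("\u200e\u200f\u202a\u202b\u202c\u202d\u202e")

def has_bidi_controls(name: str) -> bool:
    """True if the name contains any of the directional controls above."""
    return any(ch in BIDI_CONTROLS for ch in name)

# Hypothetical filename: stored as "photo_" + RLO + "gnp.exe",
# which would typically be rendered as "photo_exe.png".
filename = "photo_\u202egnp.exe"
print(has_bidi_controls(filename))    # True
print(os.path.splitext(filename)[1])  # .exe
```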


How To Use On Linux

Most Persian keyboard layouts on Linux include these characters by default. The following table shows where each one is:

Key combination    Character
Alt + 0            U+200F (RLM)
Alt + 9            U+200E (LRM)
Alt + ]            U+202B (RLE)
Alt + [            U+202A (LRE)
Alt + p            U+202C (PDF)
Alt + i            U+202D (LRO)
Alt + o            U+202E (RLO)

How To Use On Windows

There are two ways to type these characters on Windows: hold down the Alt key and type the character’s decimal code on the numpad, or add the characters to your keyboard by following Microsoft’s tutorial “Insert ASCII or Unicode Latin-based symbols and characters”.

Characters On Windows

Key combination    Character
Alt + 8207         U+200F (RLM)
Alt + 8206         U+200E (LRM)
Alt + 8235         U+202B (RLE)
Alt + 8234         U+202A (LRE)
Alt + 8236         U+202C (PDF)
Alt + 8237         U+202D (LRO)
Alt + 8238         U+202E (RLO)
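
The Alt codes in the table are just the hexadecimal code points converted to decimal, so you can work out any other one the same way. A small sketch with a hypothetical `alt_code` helper:

```python
# Hex code points from the table; the Windows Alt codes are their decimal values.
CODE_POINTS = ["200F", "200E", "202B", "202A", "202C", "202D", "202E"]

def alt_code(hex_code_point: str) -> int:
    """Convert a hexadecimal code point such as '200F' to its decimal Alt code."""
    return int(hex_code_point, 16)

for cp in CODE_POINTS:
    print(f"U+{cp} -> Alt + {alt_code(cp)}")
```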

Outro

Thank you for reading my post! If you spot any errors or have any suggestions about this post or anything else on my blog, I’d be happy if you left a comment or sent me an email.

By elamir

🧠 Logician (INTP) ❤️ A good friend 💻 Software developer 💊 Medical science student 🌐 A global citizen from Iran
