Cases of incorrect text display
Of course, when a program flatly refuses to open a seemingly native format, there is little you can do to fix it. But sometimes a file does open and its contents still cannot be read: the structure of the text is preserved, but instead of words you see unreadable squiggles.
These cases almost always come down to one thing: text encoding. More precisely, the encoding is not wrong, just different from the one the program expects. Historically there was no single encoding standard, and the default still varies by region. So a file created in, say, Asia may well be unreadable when opened in Russia.
This article discusses how to change the encoding in Word. This is useful not only for fixing the problems described above but also, conversely, for deliberately saving a document in a different encoding.
How to Fix Broken Character Encoding (Corrupted Text) in Microsoft Word
What is text character corruption?
People who actively work with Plain Text files that are suffixed with the .TXT extension will sometimes encounter documents that show garbled text instead of the expected text. This phenomenon often occurs when the corrupted text document is written in a foreign language that does not use the Latin alphabet, but can happen for all files if there are inconsistencies in the settings used when saving the file.
Character corruption occurs when a file is saved in an encoding different from the one the reading program assumes. Most modern programs use UTF-8 by default, but many languages also have one or more language-specific encodings. For example, several East Asian encodings store each character in two bytes; when such a file is opened by a program that assumes a different encoding (a single-byte code page, or UTF-8), the bytes are interpreted incorrectly and the text appears as garbled characters.
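To make the mechanism concrete, here is a minimal Python sketch. Python and the sample string are choices made for this illustration and are not part of the Word procedure. It stores the same Japanese text in three encodings and then decodes the Shift JIS bytes with the wrong one.

```python
# The same text stored under three different encodings.
text = "日本語"  # sample text chosen for this illustration

utf8_bytes = text.encode("utf-8")        # variable width: 3 bytes per character here
utf16_bytes = text.encode("utf-16-le")   # 2 bytes per character here
sjis_bytes = text.encode("shift_jis")    # legacy Japanese encoding: 2 bytes per character here

print(len(utf8_bytes), len(utf16_bytes), len(sjis_bytes))

# Decoding with the wrong encoding does not fail; it silently yields garbage.
garbled = sjis_bytes.decode("latin-1")

# Decoding with the right encoding restores the text: nothing was lost.
restored = sjis_bytes.decode("shift_jis")
print(restored == text)
```

The point of the sketch is that the bytes on disk stay intact; only their interpretation changes, which is why the text can be recovered.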
Rest assured that the corrupted text is not lost. There are many ways to fix corrupted character encoding, including special software created for this specific scenario. However, if you only want to fix one or two documents, downloading and installing new software can be a hassle. Here I will show you how to fix these corrupted text files in Microsoft Word, which many Windows users already have installed.
Microsoft Word has a built-in character encoding converter that you can use to save a file in the desired encoding.
This fix will work with Microsoft Word 2003 and later.
Windows opens plain text files (with a .txt extension) using Notepad by default. To open a damaged document in Microsoft Word:
1. Right-click the document
2. Select "Open with"
3. Select "Word"
The Convert File dialog should open automatically when a file with corrupt encoding is detected. Select Encoded Text from the list of options and click OK.
If the dialog box does not appear, you must launch it manually. Go to File -> Options -> Advanced and scroll down until you reach the General section. Under General, select the Confirm file format conversion when opening checkbox. Close Word and reopen the damaged document and a dialog box will appear.
The encoding selection dialog should automatically suggest the correct encoding. If it doesn't, you can manually select an encoding from the list.
Select Auto Select if you are unsure of the source encoding, or select from the list if you know the language the file is in. You will be able to check whether the damaged file is fixed in the preview window.
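The idea behind the preview window, trying an encoding and looking at the result, can be sketched in Python. This is only an illustration of the principle, not how Word implements its dialog; the function name `preview_decodings` and the list of candidate encodings are assumptions made for the example.

```python
from typing import Dict, Iterable, Optional


def preview_decodings(data: bytes, candidates: Iterable[str]) -> Dict[str, Optional[str]]:
    """Decode the same bytes with each candidate encoding.

    Returns a mapping of encoding name to decoded text, or None when the
    bytes are not valid in that encoding.
    """
    previews: Dict[str, Optional[str]] = {}
    for name in candidates:
        try:
            previews[name] = data.decode(name)
        except UnicodeDecodeError:
            previews[name] = None
    return previews


# Sample bytes: Russian text saved in the DOS (OEM 866) code page.
sample = "Привет".encode("cp866")
for name, preview in preview_decodings(sample, ["utf-8", "cp1251", "cp866"]).items():
    print(f"{name:8} {preview!r}")
```

Only one of the candidates produces readable text; a human still has to look at the previews and pick it, just as in the Word dialog.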
The recovered text can now be read in Microsoft Word, but it may still appear corrupted in other plain text software, because many programs are not written to handle language-specific encodings. To prevent this, it is best to save the document in a Unicode encoding such as UTF-8 or UTF-16.
To do this, click the "File" tab in the upper left corner of the document and select "Save As" from the list. Select a save folder and select "Plain Text Document" as the file format. Click "Save".
A new Convert File dialog box will open. From the list, select the encoding for the final document. The preview box will highlight words that won't save correctly in red, so try to select an encoding that matches the document. When in doubt, it is best to use the Unicode format as an encoding, as it is designed to take into account all the world's writing systems.
Finally, click "OK" to save the revised document.
Your document should now display correctly in the plain text program of your choice, such as Notepad.
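For more than a handful of files, the same open-then-save-as-Unicode procedure can be scripted. The following is a minimal Python sketch, not a replacement for the Word steps above; the function name `convert_to_utf8` is hypothetical, and the source encoding must be known or guessed by the caller.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-save a text file as UTF-8.

    source_encoding is the encoding the file was originally saved in;
    this sketch does not try to detect it.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Write to a separate file so the original is never overwritten.
    Path(dst).write_bytes(text.encode("utf-8"))
```

A call such as `convert_to_utf8("old.txt", "new.txt", "cp1251")` (file names invented for the example) would read a Windows Cyrillic file and write a UTF-8 copy next to it.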
Definition
Before discussing how to change the encoding in Word, it is worth defining the concept. We will try to do this in simple terms, so that even someone unfamiliar with the topic can follow.
Let's start from the beginning. A file does not contain text as such, as many people believe, but only a set of numbers. The program converts them into readable characters, and encoding is what makes this possible.
An encoding is a numbering scheme in which each numeric value corresponds to a specific character. An encoding can cover not only digits but also letters and special characters. And because each language uses different characters, different encodings are used in different countries.
Understanding text encoding
The text that appears as text on the screen is actually stored as numeric values in a text file. The computer translates numeric values into visible symbols. An encoding standard is used for this.
An encoding is a numbering scheme in which each text character in a set is assigned a specific numeric value. The encoding may contain letters, numbers and other symbols. Different languages often use different character sets, so many of the existing encodings are designed to represent the character sets of their respective languages.
Different encodings for different alphabets
The encoding information saved with the text file is used by the computer to display text on the screen. For example, in the “Cyrillic (Windows)” encoding, the character “Й” corresponds to the numeric value 201. When you open a file containing this character on a computer that uses the “Cyrillic (Windows)” encoding, the computer reads the number 201 and displays the character “Й”.
However, if the same file is opened on a computer that uses a different encoding by default, the screen will show whatever character corresponds to the number 201 in that encoding. For example, if your computer uses the “Western European (Windows)” encoding, the “Й” character from the original Cyrillic text file will be displayed as “É”, since this is the character that corresponds to the number 201 in that encoding.
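This can be checked directly. The short Python sketch below decodes the single value 201 with two codecs; the assumption is that Python's `cp1251` and `cp1252` codecs correspond to the “Cyrillic (Windows)” and “Western European (Windows)” encodings named in Word.

```python
raw = bytes([201])  # the single stored value

as_cyrillic = raw.decode("cp1251")   # "Cyrillic (Windows)"
as_western = raw.decode("cp1252")    # "Western European (Windows)"

print(as_cyrillic, as_western)       # Й É
```

The stored number is identical in both cases; only the table used to look it up differs.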
Unicode: a single encoding for different alphabets
To avoid problems with encoding and decoding text files, you can save them in Unicode. This encoding includes most characters from all languages that are commonly used on modern computers.
Since Word is based on Unicode, all files in it are automatically saved in this encoding. Unicode files can be opened on any computer with an English operating system, regardless of the language of the text. In addition, on such a computer you can save files in Unicode that contain characters that are not in Western European alphabets (for example, Greek, Cyrillic, Arabic or Japanese).
How to change encoding in Word. Method one
With the concept defined, we can move on to changing the encoding in Word. The first method is applied when opening the file in the program.
If you see a set of incomprehensible characters in the opened file, the program has determined the text encoding incorrectly and therefore cannot decode it. All you need to do to display each character correctly is to specify the appropriate encoding.
To change the encoding in Word when opening a file, do the following:
- Click on the “File” tab (in earlier versions this is the “MS Office” button).
- Go to the “Options” category.
- Click on the “Advanced” item.
- In the window that opens, scroll down to the “General” section.
- Check the box next to “Confirm file format conversion when opening”.
- Click "OK".
So, half the battle is won. From now on, when you open a file in Word, a window will appear in which you can change the encoding of the text being opened.
Follow these steps:
- Double-click the file whose encoding needs to be changed.
- Click on the “Encoded Text” item in the “File Conversion” window.
- In the window that appears, set the switch to “Other”.
- In the drop-down list next to it, select the desired encoding.
- Click OK.
If you chose the correct encoding, the document will open as readable text. While you are selecting an encoding, you can see how the file will look in the “Sample” window. In Word for Mac, the encoding is likewise selected from a drop-down list.
Creating text with the desired encoding
Sometimes you need to create a text file in a different encoding, for example for the PDF graphic editor of the Works-6 program or for other software products. Word can help with this. Type the text as you usually do, observing the structure and requirements that apply to it.
After creating the file, go to FILE in the editor's main menu and select SAVE AS. In the window that opens, in addition to choosing the file name, you will be offered options for the encoding of the saved file.
To prevent information loss, it is recommended to save the file in its regular format first, and only then save it in the required one.
Please note that some programs do not support word hyphenation or line wrapping. For such programs, write the text without hyphenating words across lines.
Another source of readability problems is the difference between Word 2003 and later versions, which introduced a new file format, docx. This difference is not an encoding issue in the sense considered here. Files in this format cannot be viewed in the old version; the editor needs to be updated.
Instructions
- If you do not have the Word program, then download it from the official website of the developers and install it on your computer. If you do not intend to use this program constantly, then you do not need to pay for it; the trial version will suffice.
- Right-click on the desired file and open the “Open with” submenu, select the Word program. If this program is not in the list, then launch Word in the usual way. Open the "File" menu and select the "Open" command, specify the location of the desired document on your hard drive and click "Open". You will be offered several options for opening the file related to its non-standard encoding; select the one you need and click OK.
- Next, you need to change the encoding and save the result; to do this, open the “File” menu and click “Save As”. Specify the directory for the modified document, enter a new name and execute the “Save” command. The Document Attributes window will load, select the desired encoding and press Enter (the most used encoding is “Unicode”).
- Be careful when saving the document; if you try to save the file in the same folder with the same name, the new document will replace the old file. To save two different documents on disk, you need to use different names or folders for them.
- When saving a file, also pay attention to its extension. If the document will later be opened in Word 2003 or older, use the doc format. If it is needed for Word 2007 or newer, the docx format is suitable. Remember that the doc format opens in both old and new versions of the program, but it supports more limited formatting.
- Bear in mind that garbled characters in a text document are not always a sign of an unknown encoding; the editor you are using may simply lack the required font, in which case it is the font, not the encoding, that needs to be changed.
Method two: while saving the document
The essence of the second method is quite simple: open a file with an incorrect encoding and save it in a suitable one. This is done as follows:
- Click File.
- Select "Save As".
- In the drop-down list located in the “File Type” section, select “Plain Text”.
- Click on “Save”.
- In the file conversion window, select your preferred encoding and click OK.
Now you know two ways to change text encoding in Word. We hope that this article helped you in resolving the issue.
Selecting encoding when saving a file
If you do not select an encoding when saving the file, Unicode will be used. In general, Unicode is recommended because it supports most characters in most languages.
If you plan to open the document in a program that does not support Unicode, you can select the encoding you need. For example, on an English operating system, you can create a document in Traditional Chinese using Unicode. However, if such a document will be opened in a program that supports Chinese but does not support Unicode, you can save the file in the "Chinese Traditional (Big5)" encoding. As a result, the text will display correctly when you open the document in a program that supports Traditional Chinese.
Note:
Because Unicode is the most comprehensive standard, some characters may not appear when you save text in other encodings. For example, suppose that a Unicode document contains text in both Hebrew and Cyrillic. If you save the file in the “Cyrillic (Windows)” encoding, the Hebrew text will not be displayed, and if you save it in the “Hebrew (Windows)” encoding, the Cyrillic text will not be displayed.
If you select an encoding standard that doesn't support some characters in the file, Word will mark them in red. You can preview the text in the selected encoding before saving the file.
When you save a file as encoded text, the text for which the Symbol font is selected, as well as the field codes, are removed from the file.
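The Hebrew and Cyrillic example from the note can be reproduced outside Word. The Python sketch below reports which characters a target encoding cannot represent, roughly what Word highlights in red; the helper name `unencodable_chars` is hypothetical, and `cp1251` and `cp1255` are assumed to match the “Cyrillic (Windows)” and “Hebrew (Windows)” encodings.

```python
from typing import List


def unencodable_chars(text: str, encoding: str) -> List[str]:
    """Return the distinct characters of text that encoding cannot represent."""
    missing: List[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


mixed = "Привет שלום"  # Cyrillic and Hebrew in one string (sample text)
print(unencodable_chars(mixed, "cp1251"))  # Hebrew letters are reported
print(unencodable_chars(mixed, "cp1255"))  # Cyrillic letters are reported
print(unencodable_chars(mixed, "utf-8"))   # nothing is reported
```

An empty result means the text can be saved in that encoding without losing characters.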
How to solve encoding problems in Windows and MS Office
With each new version of Windows, the problem of multiple Russian-language encodings has become much less acute.
The transition to Unicode is meant to solve this problem for good, but in Windows it has been under way for several generations of the system and still is not complete. And, as often happens, while solving some problems, Unicode creates many others.
However, for Internet resources and emails this is almost always resolved automatically by browsers and email clients. If a web page or letter still appears garbled (which sometimes happens when developers do not follow the standards), select the “Encoding” item from the main menu and find the right one by trial and error. In most browsers this item is located in the “View” menu (remember that the main menu, which is hidden in modern versions of web browsers, can always be called up with a key).
"Plain text" problems
One of these problems involves files in "plain text" format, though what could be simpler? Take a sequence of text characters and write it to a file. But precisely because of this simplicity, when a problem does arise, it is total. If you try to save Russian text from Word (any version after Office 97, up to and including 2010) as “plain text”, you will get a range of single-byte Russian encodings to choose from. By default (Fig. 1), the standard “Windows Cyrillic” (also known as 1251, or ANSI) is offered.
Try to do the same in the standard Notepad from Windows 7, and you will be offered a choice between the usual ANSI and as many as three Unicode options (Fig. 2). There is a catch here: a Unicode text file is normally preceded by a special header, the BOM (Byte Order Mark), which indicates the byte order (i.e., which byte of a 2-byte character comes first, high or low). The catch is that the BOM is not a mandatory attribute of a Unicode file. On the one hand, it may be absent in texts received from an external source (say, from some Linux programs); on the other hand, it can cause failures in programs that do not understand this header.
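Whether a given file starts with a BOM is easy to check. The Python sketch below looks only at the first bytes and covers only the UTF-8 and the two UTF-16 marks; the function name `detect_bom` is hypothetical, and UTF-32 is deliberately left out.

```python
import codecs
from typing import Optional

# Byte order marks for UTF-8 and the two UTF-16 byte orders.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

A result of None does not prove that the file is not Unicode: as noted above, a BOM is optional, so UTF-8 text without one is perfectly valid.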
There is only one recipe: whenever possible, avoid “Unicode” in plain text files and stick to the familiar ANSI. This will avoid most of the problems associated with Russian text files, although it will limit their portability to English or European versions of the OS.
Another problem with text files that usually baffles inexperienced users is text in a non-standard single-byte encoding (for example, the old DOS encoding, also known as OEM, or 866). Any version of Microsoft Word starting with Office XP can handle it. First, make sure in the settings that the “Confirm file format conversion when opening” option is enabled (by default it is turned off!). In older versions of Word, this setting is located in the Tools/Options menu on the General tab. In Word 2007, click the button with the Office logo and select “Word Options” at the bottom of the window that opens. In Word 2010, the Options section is accessible through the File menu. In “Options”, go to the “Advanced” item in the sidebar, and then find the “General” section there (Figure 3).
With this option enabled, open the “unreadable” text file through the “Open” menu (and not by clicking it in Explorer, which will most likely launch Notepad). Then select “Recover text from any file” from the drop-down list of file types. The file can, of course, be of any format (i.e., not necessarily with the TXT extension), as long as it contains text and not binary data.
By the way, you can read a document in DOS encoding with Notepad or, in general, any program that lets you change fonts: use the font selection menu (in Notepad it is “Format/Font”) to replace the current font with Terminal. Just remember to switch the font back afterwards, otherwise you won’t be able to read normal documents.
Unicode in the Clipboard
However, such conflicts with text files are quite rare. The average user is much more likely to run into problems with Unicode in the Clipboard. A common situation: when text is transferred from old programs that do not support Unicode, or from some PDF documents, something like “Auaia iayedai” appears instead of Russian when pasting. In most well-written applications, simply switching the keyboard layout to Russian helps (in the program you are copying from, not at the destination), but there are particularly stubborn applications and PDF documents from which the information that the text is Russian cannot be extracted by any means.
Older versions of Microsoft Word (Office XP and 2003) can solve this problem, though many people do not know it: they have a “Fix broken text” function (in the “Tools” menu). The newer versions 2007/2010 do not have this function. I could not find anything about this in the help or on the Microsoft website; Microsoft probably decided that no products with such behavior remained, which, unfortunately, is not true.
You may also not want to waste time looking for an official solution. Then use my program ClipWin (Fig. 4), which can be downloaded at: revich.lib.ru/clipwin.zip. Its interface is designed to perform the operation as quickly as possible without losing control over it: if the text is already in the Clipboard, just launch the program and click three times. After the first click, the text (already corrected) is pasted from the Clipboard into the window for checking; after the second, the corrected text replaces the original in the Clipboard; and after the third, the program closes, leaving the corrected text ready to be pasted anywhere.
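The principle behind such a repair can be shown in a few lines of Python. This is a sketch of the general technique, not a description of how ClipWin or Word's “Fix broken text” works internally; the function name `fix_mojibake` is hypothetical, and it assumes the text was Windows Cyrillic (cp1251) misread as Western European (cp1252).

```python
def fix_mojibake(garbled: str, wrong: str = "cp1252", right: str = "cp1251") -> str:
    """Undo a wrong decoding: turn the characters back into the original
    bytes using the wrong encoding, then decode them with the right one."""
    return garbled.encode(wrong).decode(right)


print(fix_mojibake("Ïðèâåò"))  # Привет
```

This works only while the accented characters are intact. Once they have been flattened to plain Latin letters, as in “Auaia iayedai”, the original bytes are gone and this approach can no longer recover the text.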
What is the correct keyboard layout?
In my firm conviction, it is impossible to work comfortably and effectively in Windows without an additional keyboard layout switcher: the need to constantly hit two keys at once makes my fingers ache at the mere thought. More precisely, this was impossible until Vista, where, alongside the traditional key combinations, it finally became possible to switch the input language with the single <Ё> key. But such a solution is unacceptable, at least for guardians of the purity of the Russian language: you cannot be expected to enter the letter Ё every time through a special insert or the character table.
The solution to this problem is well known and lies in the use of the popular Punto Switcher, which allows you to configure basically any key to perform an operation (usually the right or ). The program is supported by Yandex; you can download it from the “Programs” section of the search engine (at soft.yandex.ru) or directly at punto.yandex.ru. Alexander Evdokimov recently spoke about this and other similar programs in the article “Letter Correctors.”
Many programs of this type are capable of performing another popular function: switching text already typed in the wrong layout. Depending on the settings, this can even be done automatically, although I personally prefer to disable this automation.
Another misunderstanding related to the language layout is less common, but can add a few unpleasant moments. After switching to English, you may be puzzled by one feature of entering some characters. This applies to such characters as quotation marks, apostrophes (there are two of them - straight and oblique), tilde “~” and circumflex (“lid”) “^”. After pressing the corresponding key, nothing is entered - you must also press either the spacebar to enter the character in its “pure” form, or some other letter or number, then the character will be entered before it. You can press a quotation mark or an apostrophe twice - a pair of characters will appear at once (they are often used in pairs, the text is then entered between them).
This not always convenient feature of English-language computer input can arise if for some reason the English language is set to the “US-international” layout (or, perhaps, one of the European ones, which we do not use). The misunderstanding is easily resolved by changing the English layout back to the default, simply “USA”. In Windows 7, this setting can be accessed in the Control Panel through “Regional and Language Options / Languages and Keyboards / Change Keyboard / General / Add” (Fig. 5).
Final advice
If you have other problems with encodings that are not described in this article, then most likely they are caused by installing the “wrong” version of Windows, for example, American or European with an additional language pack. To avoid such problems, you should use the Russian version whenever possible. But such is life for Russian-speaking Windows users: the problem with encodings has existed, exists and will exist. Its severity decreases as software is updated, but the problem will likely never go away completely.
Source : Hard'n'Soft
Author : Yuri Revich
Changing the encoding in Notepad++
Many programmers use this application to create websites, applications and much more, so it is important to create and save files in the required encoding. To set the desired encoding:
Step 1. Launch the program and open the “Encoding” menu in the top menu bar.
Step 2. In the drop-down list, select the required encoding and click on it.
Step 3. To check that the change was applied, look at the bottom panel of the program, which displays the newly selected encoding.
Important! Before starting to work in Notepad++, it is first recommended to check the installed encoding. If necessary, it must be changed using the instructions given earlier.