(emacs.info) MS-DOS and MULE

 
 International Support on MS-DOS
 ===============================
 
    Emacs on MS-DOS supports the same international character sets as it
 does on Unix and other platforms (*note International::), including
 coding systems for converting between the different character sets.
 However, due to incompatibilities between MS-DOS/MS-Windows and Unix,
 there are several DOS-specific aspects of this support that users should
 be aware of.  This section describes these aspects.
 
 `M-x dos-codepage-setup'
      Set up Emacs display and coding systems as appropriate for the
      current DOS codepage.
 
 `M-x codepage-setup'
      Create a coding system for a certain DOS codepage.
 
    MS-DOS is designed to support one character set of 256 characters at
 any given time, but gives you a variety of character sets to choose
 from.  The alternative character sets are known as "DOS codepages".
 Each codepage includes all 128 ASCII characters, but the other 128
 characters (codes 128 through 255) vary from one codepage to another.
 Each DOS codepage is identified by a 3-digit number, such as 850, 862,
 etc.
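    The split into a shared ASCII half and a codepage-specific upper
 half can be seen outside Emacs.  The following sketch is Python, not
 Emacs Lisp.  It assumes only the `cp850', `cp862' and `cp866' codecs
 from Python's standard library, and `decode_all' is a hypothetical
 helper.  It decodes the same bytes under three codepages:

```python
# Illustration only (Python, not Emacs): compare how three DOS codepages
# interpret the same bytes, using codecs from Python's standard library.
CODEPAGES = ["cp850", "cp862", "cp866"]  # Western European, Hebrew, Cyrillic


def decode_all(data: bytes) -> dict:
    """Decode DATA once with each codepage in CODEPAGES."""
    return {cp: data.decode(cp) for cp in CODEPAGES}


# Codes 0..127 decode to the same ASCII text in every codepage.
lower = decode_all(bytes(range(128)))
ascii_text = "".join(chr(i) for i in range(128))
assert all(text == ascii_text for text in lower.values())

# A single code from the upper half (here 0x87 = 135) does not.
for cp, text in decode_all(bytes([0x87])).items():
    print(cp, f"U+{ord(text):04X}", text)
```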
 
    In contrast to the X Window System, which lets you use several fonts
 at the same time, MS-DOS doesn't allow use of several codepages in a single
 session.  Instead, MS-DOS loads a single codepage at system startup,
 and you must reboot MS-DOS to change it(1).  Much the same limitation
 applies when you run DOS executables on other systems such as
 MS-Windows.
 
    If you invoke Emacs on MS-DOS with the `--unibyte' option (*note
 Initial Options::), Emacs does not perform any conversion of non-ASCII
 characters.  Instead, it reads and writes any non-ASCII characters
 verbatim, and sends their 8-bit codes to the display verbatim.  Thus,
 unibyte Emacs on MS-DOS supports the current codepage, whatever it may
 be, but cannot represent any characters outside it.
 
    For multibyte operation on MS-DOS, Emacs needs to know which
 characters the chosen DOS codepage can display.  So it queries the
 system shortly after startup to get the chosen codepage number, and
 stores the number in the variable `dos-codepage'.  Some systems return
 the default value 437 for the current codepage, even though the actual
 codepage is different.  (This typically happens when you use the
 codepage built into the display hardware.)  You can specify a different
 codepage for Emacs to use by setting the variable `dos-codepage' in
 your init file.
 
    Multibyte Emacs supports only certain DOS codepages: those which can
 display Far-Eastern scripts, like the Japanese codepage 932, and those
 that encode a single ISO 8859 character set.
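    One practical difference between the two kinds of codepage is the
 number of bytes each character needs.  The following Python sketch uses
 only the standard codecs, and `bytes_per_char' is a hypothetical
 helper.  It shows that the Japanese codepage 932 needs two bytes for a
 kana character, while codepage 850 uses one byte for every character:

```python
# Illustration only: count the bytes each character needs in a codepage.
def bytes_per_char(text: str, codepage: str) -> list:
    """Return the encoded length of each character of TEXT."""
    return [len(ch.encode(codepage)) for ch in text]


print(bytes_per_char("aあ", "cp932"))  # [1, 2]  Japanese: double-byte
print(bytes_per_char("aç", "cp850"))   # [1, 1]  Latin-1 repertoire: single-byte
```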
 
    The Far-Eastern codepages can directly display one of the MULE
 character sets for these countries, so Emacs simply sets up the
 appropriate terminal coding system supported by the codepage.
 The special features described in the rest of this section mostly
 pertain to codepages that encode ISO 8859 character sets.
 
    For the codepages which correspond to one of the ISO character sets,
 Emacs knows the character set name based on the codepage number.  Emacs
 automatically creates a coding system to support reading and writing
 files that use the current codepage, and uses this coding system by
 default.  The name of this coding system is `cpNNN', where NNN is the
 codepage number.(2)
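    The mismatch described in footnote (2) can be checked with Python's
 standard codecs, which implement the usual IBM and ISO tables.  This
 sketch only illustrates why a separate `cpNNN' coding system is needed.
 `code_of' and `dos_to_latin1' are hypothetical helpers, not Emacs
 functions:

```python
# Illustration only: the same letter has different codes in ISO 8859-1
# and in DOS codepage 850, so a DOS file cannot be read correctly with
# the ISO 8859 coding system alone.
def code_of(ch: str, encoding: str) -> int:
    """Return the single-byte code of CH in ENCODING."""
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in {encoding}")
    return encoded[0]


def dos_to_latin1(data: bytes, codepage: str = "cp850") -> bytes:
    """Re-encode bytes written in a DOS codepage as ISO 8859-1.

    Raises UnicodeEncodeError for characters, such as box-drawing
    glyphs, that have no ISO 8859-1 equivalent.
    """
    return data.decode(codepage).encode("latin-1")


print(code_of("ç", "latin-1"))      # 231
print(code_of("ç", "cp850"))        # 135
print(dos_to_latin1(bytes([135])))  # b'\xe7'
```

 Conversion is not always possible.  Bytes that stand for box-drawing
 glyphs in codepage 850 have no ISO 8859-1 counterpart, so
 `dos_to_latin1' raises an error for them.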
 
    All the `cpNNN' coding systems use the letter `D' (for "DOS") as
 their mode-line mnemonic.  Since both the terminal coding system and
 the default coding system for file I/O are set to the proper `cpNNN'
 coding system at startup, it is normal for the mode line on MS-DOS to
 begin with `-DD\-'.  *Note Mode Line::.  Far-Eastern DOS terminals do
 not use the `cpNNN' coding systems, and thus their initial mode line
 looks the same as on Unix.
 
    Since the codepage number also indicates which script you are using,
 Emacs automatically runs `set-language-environment' to select the
 language environment for that script (*note Language Environments::).
 
    If a buffer contains a character belonging to some other ISO 8859
 character set, not the one that the chosen DOS codepage supports, Emacs
 displays it using a sequence of ASCII characters.  For example, if the
 current codepage doesn't have a glyph for the letter `ò' (small `o'
 with a grave accent), it is displayed as `{`o}', where the braces serve
 as a visual indication that this is a single character.  (This may look
 awkward for some non-Latin characters, such as those from Greek or
 Hebrew alphabets, but it is still readable by a person who knows the
 language.)  Even though the character may occupy several columns on the
 screen, it is really still just a single character, and all Emacs
 commands treat it as one.
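    The following Python sketch imitates the `{`o}' convention under
 stated assumptions.  The accent table and the `?' fallback are
 hypothetical and much smaller than whatever Emacs uses internally.  It
 also shows the point made above: the displayed form takes several
 columns, but the text still counts as one character per letter.

```python
import unicodedata

# Hypothetical table of ASCII stand-ins for a few combining accents.
# Emacs uses its own display tables; this only mimics the `{`o}' example.
ACCENT_ASCII = {
    "\u0300": "`",  # grave
    "\u0301": "'",  # acute
    "\u0302": "^",  # circumflex
    "\u0303": "~",  # tilde
    "\u0308": '"',  # diaeresis
}


def dos_display(ch: str, codepage: str) -> str:
    """Return a hypothetical on-screen form of the single character CH."""
    try:
        ch.encode(codepage)  # the codepage has a glyph: show it as is
        return ch
    except UnicodeEncodeError:
        pass
    base, *marks = unicodedata.normalize("NFD", ch)
    if not base.isascii():
        base = "?"  # no ASCII base letter to fall back on
    accents = "".join(ACCENT_ASCII.get(m, "?") for m in marks)
    return "{" + accents + base + "}"  # braces: one character, several columns


text = "pò"
shown = "".join(dos_display(c, "cp866") for c in text)
print(shown, len(shown), len(text))  # p{`o} 5 2
```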
 
    Not all characters in DOS codepages correspond to ISO 8859
 characters--some are used for other purposes, such as box-drawing
 characters and other graphics.  Emacs cannot represent these characters
 internally, so when you read a file that uses these characters, they are
 converted into a particular character code, specified by the variable
 `dos-unsupported-character-glyph'.
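    The following Python sketch is a rough model of this conversion,
 assuming codepage 850 and ISO 8859-1.  Characters with no Latin-1
 equivalent are replaced by a single placeholder.  `UNSUPPORTED' and
 `read_dos_bytes' are hypothetical names.  The real variable holds a
 character code chosen in Emacs, not this Python string.

```python
# Hypothetical stand-in for the glyph named by `dos-unsupported-character-glyph';
# the manual describes the default as an empty triangle, approximated here
# by U+25B3.
UNSUPPORTED = "\N{WHITE UP-POINTING TRIANGLE}"


def read_dos_bytes(data: bytes, codepage: str = "cp850", iso: str = "latin-1") -> str:
    """Decode DATA, replacing characters outside the ISO 8859 set ISO."""
    out = []
    for ch in data.decode(codepage):
        try:
            ch.encode(iso)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(UNSUPPORTED)  # e.g. box-drawing characters
    return "".join(out)


print(read_dos_bytes(b"\xc9\xcd\xbb Gar\x87on"))
```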
 
    Emacs supports many other character sets aside from ISO 8859, but it
 cannot display them on MS-DOS.  So if one of these multibyte characters
 appears in a buffer, Emacs on MS-DOS displays it as specified by the
 `dos-unsupported-character-glyph' variable; by default, this glyph is
 an empty triangle.  Use the `C-u C-x =' command to display the actual
 code and character set of such characters.  *Note Position Info::.
 
    By default, Emacs defines a coding system to support the current
 codepage.  To define a coding system for some other codepage (e.g., to
 visit a file written on a DOS machine in another country), use the `M-x
 codepage-setup' command.  It prompts for the 3-digit code of the
 codepage, with completion, then creates the coding system for the
 specified codepage.  You can then use the new coding system to read and
 write files, but you must specify it explicitly for the file command
 when you want to use it (*note Specify Coding::).
 
    These coding systems are also useful for visiting a file encoded in
 a DOS codepage with Emacs running on some other operating system.
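    The same approach applies outside Emacs too: choose a codepage by
 its 3-digit number and name it explicitly when reading.  The following
 Python sketch reads a file written in codepage 862 on any operating
 system.  The helper `read_dos_file' is hypothetical; the `cpNNN' codec
 names are Python's own.

```python
import codecs
import tempfile
from pathlib import Path


def read_dos_file(path, codepage_number: int) -> str:
    """Read PATH, decoding it explicitly as DOS codepage CODEPAGE_NUMBER."""
    name = f"cp{codepage_number}"  # e.g. 862 -> "cp862"
    codecs.lookup(name)  # raises LookupError if Python has no such codec
    return Path(path).read_text(encoding=name)


# Usage: store Hebrew text in codepage 862, then read it back on any OS.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_bytes("שלום".encode("cp862"))
    print(read_dos_file(sample, 862))
```

 Decoding the same bytes with a different single-byte codepage would
 usually not fail, but it would produce the wrong letters.  That is why
 the codepage has to be specified explicitly.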
 
    ---------- Footnotes ----------
 
    (1) Normally, one particular codepage is burnt into the display
 memory, while other codepages can be installed by modifying system
 configuration files, such as `CONFIG.SYS', and rebooting.
 
    (2) The standard Emacs coding systems for ISO 8859 are not quite
 right for the purpose, because typically the DOS codepage does not
 match the standard ISO character codes.  For example, the letter `ç'
 (`c' with cedilla) has code 231 in the standard Latin-1 character set,
 but the corresponding DOS codepage 850 uses code 135 for this glyph.
 