Microsoft Word/COM support for TECkit, CC, ICU, Perl, and Python

About

This package provides tools through which you can change the encoding, font, and/or script of text in Microsoft Word and other Office documents, OpenOffice and LibreOffice documents, XML documents, and SFM text and lexicon documents. It also installs a system-wide repository to manage your encoding converters and transliterators (TECkit-, CC-, ICU-, Perl-, or Python-based) and supports adding custom transduction engines.

For developers, it provides a simple COM interface to select and use a converter from the repository. It is easy to use from VBA, C++, C#, Perl, Python or any .NET/COM enabled language.

The core EncConverters assembly is fully integrated with FLEx (FieldWorks Language Explorer), Speech Analyzer, Phonology Assistant, Adapt It, and OneStory Editor. It provides the same system-wide registry of installed and available encoding converters for all of these programs. Additionally, the package includes some extra utilities, such as a clipboard converter for manipulating text between cut and paste operations.

The following picture illustrates the suite of tools, utilities, and applications that are available and how they interact:

Figure 1. SIL Converters Suite

Figure 1 shows the three distinct layers of SIL Converters.

  • At the top are various client applications. These user-oriented programs use the EncConverters core assembly to provide encoding conversion and other transduction facilities to their users.
  • The EncConverters core provides an abstraction layer so the client applications can access the various transduction engines without having to implement the interface to each one separately.
  • The transduction engines are the server applications that provide the actual conversion/text processing capability.
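The layering above can be sketched in a few lines of Python. This is only an illustration of the idea, not the EncConverters API: the class ToyRepository, the two toy engines, and the function client_convert are all hypothetical names invented for this sketch.

```python
from typing import Callable, Dict, List

# An "engine" here is just a function from text to text (hypothetical stand-in
# for a real transduction engine such as TECkit, CC, or ICU).
Engine = Callable[[str], str]


def upper_case_engine(text: str) -> str:
    """Toy engine: upper-cases the text."""
    return text.upper()


def reverse_engine(text: str) -> str:
    """Toy engine: reverses the text."""
    return text[::-1]


class ToyRepository:
    """Middle layer: one interface in front of many engines."""

    def __init__(self) -> None:
        self._converters: Dict[str, Engine] = {}

    def add(self, name: str, engine: Engine) -> None:
        self._converters[name] = engine

    def names(self) -> List[str]:
        return sorted(self._converters)

    def convert(self, name: str, text: str) -> str:
        # The client never needs to know which engine sits behind the name.
        return self._converters[name](text)


repository = ToyRepository()
repository.add("UpperCase", upper_case_engine)
repository.add("Reverse", reverse_engine)


def client_convert(converter_name: str, text: str) -> str:
    """Top layer: a client application asks for a converter by name only."""
    return repository.convert(converter_name, text)
```

The point of the design is that adding a new engine changes only the registration step, while every client keeps calling the same convert method.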

If you are an end user, you are probably most interested in how to use EncConverters with client applications—for example:

  • Using the Bulk Word Document Converter to convert the encoding of text in one or more Word documents to Unicode, or
  • Using Bulk SFM Converter to convert SFM documents into Unicode (typically texts and lexicons from Shoebox to Toolbox)

If you are a developer, you may be interested in

  • Using EncConverters to gain access to the different transduction resources available by writing to the single EncConverters interface. See {link:EncConverters40_Usage this webpage} for details and code snippets.

Upgrade! New Features

This version (4.0) fixes various bugs. Most significantly, the core EncConverters assembly has been removed from the Global Assembly Cache. From v4.0 onward, client applications redistribute the core assemblies ECInterfaces.dll and SilEncConverters.dll directly with their own installations. They can still share the same global system repository of activated converters, but the clients depend less on one another in terms of release requirements.

As of v3.1.1, a new transduction engine supports the webpage-based converters in the Scientific and Technical Hindi Google Group. The Files section of that group contains a number of webpage-based encoding converters and transliterators for numerous Indic legacy encodings. The conversion code embedded in these webpages can now be used to convert data with any SIL Converters client application (e.g. the Bulk Word Document Converter) by using the new Technical Hindi Html EncConverter Add-in. To activate this add-in, be sure to check the Maps and Tables, Indic converters feature during installation. Once it is installed, you can read Help for Technical Hindi (Google group) Html Converter Plug-in (in Start, All Programs, SIL Converters, Help) for further instructions.

The Bulk Word Document Converter was also updated to fix a few problems related to converting text that was inserted into a document a single character at a time (through the Insert Symbol command).

SILConverters 4.0 corresponds to the same version of the core EncConverters assembly as FieldWorks 7.1 and overcomes the uninstallation problem previously encountered in Speech Analyzer 3.0.1 and Phonology Assistant 3.0. Previously, uninstalling any application that used the earlier version of the EncConverters core made the core unavailable to the other applications that used that same version until an installation Repair was done.

Bulk Word Document Converter

The Bulk Word Document Converter has been enhanced with a search feature that searches your hard drive for documents containing specific fonts to be converted.
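To give a feel for what such a font search involves, here is a minimal Python sketch. It is not how the Bulk Word Document Converter itself works. It assumes .docx files only (not the older binary .doc format), looks only at fonts named on w:rFonts elements in word/document.xml, and the function names are hypothetical.

```python
import re
import zipfile
from pathlib import Path

# A .docx file is a zip archive; run-level fonts appear on <w:rFonts .../> elements.
RFONTS_TAG = re.compile(r"<w:rFonts\b[^>]*>")
FONT_VALUE = re.compile(r'w:(?:ascii|hAnsi|cs|eastAsia)="([^"]+)"')


def fonts_in_docx(path):
    """Return the set of font names mentioned on w:rFonts elements of one document."""
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    fonts = set()
    for tag in RFONTS_TAG.findall(xml):
        fonts.update(FONT_VALUE.findall(tag))
    return fonts


def find_documents_using(root, font_name):
    """Walk a folder tree and list the .docx files that use the given font."""
    hits = []
    for path in sorted(Path(root).rglob("*.docx")):
        try:
            if font_name in fonts_in_docx(path):
                hits.append(path)
        except (zipfile.BadZipFile, KeyError):
            # Not a valid archive, or no word/document.xml inside: skip it.
            continue
    return hits
```

A call such as find_documents_using("C:/Data", "SILDoulos IPA93") would return the matching paths under that (example) folder. Fonts applied only through styles are not detected by this sketch.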

TECkit Map Unicode Editor

The TECkit Map Unicode Editor has also been enhanced to show character maps for both the left and right-hand side of a conversion so that a point-and-click approach to encoding conversion can be used in developing the map.

Quick Installation Overview

Note: You will need Administrator privilege on the computer to install this software.

The Master Setup program runs a series of installers:

  • Software prerequisites—Necessary system updates and add-ons are installed on your computer.
  • SIL Encoding Converters 4.0 Setup—Conversion applications are installed and conversion Maps and Tables are copied to your hard drive.
  • SIL Converters for Office 2003—Currently this installer only installs an additional operating system update.
  • Converter Option Installer—A utility that allows you to activate the conversion Maps and Tables you want to use.

Full installation instructions can be found here: SIL Converters 4.0 Installation. This document is intended to guide you through the Master Installer installation screens and initial SIL Converters 4.0 Setup. This guide may be sufficient for many users. However, to make full use of SIL Converters, you should download and refer to the Help for SIL Converters documentation (download below).

Downloads

Documentation

SILConverters 4.0 Installation documentation | for all platforms | PDF | 359.6 KB | 29 Aug 2011
SILConverters Documentation | for all platforms | PDF | 1.95 MB | 29 Aug 2011

SIL Converters 4.0

This version of SILConverters uses the same version of the EncConverters core as FieldWorks 6.0.

SIL Converters Package only (no addons) for offline installation (EXE file) 4.0 | for Windows | EXE | 25.42 MB | 29 Aug 2011
SIL Converters Standalone installer (includes addons like .NET) for offline installation (EXE file) 4.0 | for Windows | EXE | 76.51 MB | 29 Aug 2011

The Package only download is recommended for users who have a fairly good internet connection and want an installer that can be run offline. It is also recommended if you intend to install SIL Converters on multiple machines, since this link downloads the total install set (less required components, such as .NET 2.0; see Standalone installer).
Download, extract the files, and run setup.exe. There are no specific installation instructions for this download. However, you should find the installation instructions in the above PDF helpful.

The Standalone installer is recommended for users who have a very good internet connection and want an installer that includes all potential prerequisites and can be run offline. It is also recommended if you intend to install SIL Converters on multiple machines, since this link downloads the total install set (including all required components).
Download, extract the files, and run setup.exe. There are no specific installation instructions for this download. However, you should find the installation instructions in the above PDF helpful.

SIL Converters Maps and Tables

This section describes the encodings, font names, and converters contained in the different Maps and Tables packages available in the SIL Converters installer. You can check below for the fonts/encodings that you are interested in to see which Maps and Tables package to install.

Most end users are interested in only a small number of encodings. Typically, computer support people have created TECkit maps and/or CC tables for the various encodings used in each entity, sparing most end users from having to create their own maps and tables.

Because there are hundreds of possible encoding converters and transliterators that different end-users may be interested in, they are packaged into logically-related groups of converters and are available via a two-step process.

Steps

  1. Use the SIL Converters installer to install the package(s) or converter likely to be useful to you (e.g. based on your entity).
    • During installation, all the converter maps/tables in the selected package(s) will be installed into a fixed location on your computer (i.e. C:\Documents and Settings\All Users\Application Data\SIL\SILConverters40\MapsTables on Windows XP or C:\ProgramData\SIL\SILConverters40\MapsTables on Windows 10).
  2. Use the Converter Options Installer application to add the few converters you want to the EncConverters system repository.
    • They become available to SILConverters client applications.

Note: Installing maps and tables onto your computer with the SILConverters installer (step 1 above) will not make them available to SILConverters client applications unless you explicitly add them to the system repository using the Converter Options Installer or some other mechanism (see Adding Converters to the System Repository in the Help for SILConverters document).
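The difference between "installed on disk" and "added to the repository" can be illustrated with a small Python sketch. It is a simplification under stated assumptions: the real system repository is not a plain list of file names, the extensions listed (.tec and .map for TECkit, .cct for CC tables) are assumed to be the relevant ones, and both function names are hypothetical.

```python
from pathlib import Path

# Assumed file types: compiled TECkit maps, TECkit source maps, and CC tables.
MAP_EXTENSIONS = {".tec", ".map", ".cct"}


def installed_maps(maps_dir):
    """Step 1 result: names of the map/table files the installer copied to disk."""
    return sorted(
        path.name
        for path in Path(maps_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in MAP_EXTENSIONS
    )


def not_yet_activated(maps_dir, repository_files):
    """Files on disk that no repository entry refers to (step 2 not done for them)."""
    active = {name.lower() for name in repository_files}
    return [name for name in installed_maps(maps_dir) if name.lower() not in active]
```

In this picture, a client application only ever sees the names in the repository, which is why a map that exists on disk can still be unavailable to it.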

Select Features — Optional Maps and Tables

The following sections give the details about fonts and encodings for different Maps and Tables packages:

Basic Converters

Converters and Transliterators common to all SIL. This includes the following:

Converter Name | Encoding Name | Font Names
SIL IPA93<>UNICODE | SIL-IPA93-2001 | SILDoulos IPA93, SILManuscript IPA93, SILSophia IPA93
SIL-IPA-1990<>UNICODE | SIL-IPA-1990 | SILDoulosIPA, SILManuscriptIPA, SILSophiaIPA
SIL Galatia<>UNICODE | SIL-GREEK_GALATIA-2001 | SIL Galatia
ISO-8859<>UNICODE | ISO-8859-1 |
AMER PHON>UNICODE | (SIL)-Amer_Phon_SILDoulosL3-(2005) |
SIL PUA 3.2<>UNICODE 4.1
SIL PUA 3.2<>UNICODE 5.0
SIL PUA 3.2<>UNICODE 5.1
Symbol<>cp1252
UTF8<>UTF16
ReverseString | For reversing the bytes of a “narrow” (bytes) string
null | No change to string, but can be used to apply a different font to some text (e.g. in the Data Conversion Macro)
NFC | Convert to normal form composed
NFD | Convert to normal form decomposed
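For readers unfamiliar with the last few entries, the following Python sketch shows the kind of transformation that NFC, NFD, UTF8<>UTF16, and ReverseString refer to, using only the Python standard library. It does not call the SIL converters themselves. The function names are hypothetical, and the UTF-16 form is assumed to be little-endian without a byte order mark.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by a combining acute accent


def nfc(text: str) -> str:
    """Convert to normal form composed."""
    return unicodedata.normalize("NFC", text)


def nfd(text: str) -> str:
    """Convert to normal form decomposed."""
    return unicodedata.normalize("NFD", text)


def utf8_to_utf16(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 (assumed little-endian, no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


def utf16_to_utf8(data: bytes) -> bytes:
    """Reverse direction of the same conversion."""
    return data.decode("utf-16-le").encode("utf-8")


def reverse_bytes(data: bytes) -> bytes:
    """Reverse the bytes of a narrow (bytes) string."""
    return data[::-1]
```

Here nfc(decomposed) yields the one-code-point form and nfd(composed) yields the two-code-point form. The text looks the same on screen in both cases, which is why normalizing before comparing or searching text is useful.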

ICU Transliterators

The configuration information for the following ICU transliterators applies to Unicode encodings only.

These are not the only transliterators available via the ICU Transliterator transduction engine. They are a few of the predefined latinizing (or romanizing) transliterators that can be useful in different client applications for different ranges of Unicode.

  • Devanagari to Latin (aka. Devanagari-Latin)
  • Bengali to Latin (aka. Bengali-Latin)
  • Gujarati to Latin (aka. Gujarati-Latin)
  • Gurmukhi to Latin (aka. Gurmukhi-Latin)
  • Kannada to Latin (aka. Kannada-Latin)
  • Malayalam to Latin (aka. Malayalam-Latin)
  • Oriya to Latin (aka. Oriya-Latin)
  • Tamil to Latin (aka. Tamil-Latin)
  • Telugu to Latin (aka. Telugu-Latin)
  • Arabic to Latin (aka. Arabic-Latin)
  • Cyrillic to Latin (aka. Cyrillic-Latin)
  • Greek to Latin (aka. Greek-Latin)
  • Han to Latin (aka. Han-Latin)
  • Hangul to Latin (aka. Hangul-Latin)
  • Hebrew to Latin (aka. Hebrew-Latin)
  • Hiragana to Latin (aka. Hiragana-Latin)
  • Katakana to Latin (aka. Katakana-Latin)
  • Jamo to Latin (aka. Jamo-Latin)
  • NumericPinyin to Latin (aka. NumericPinyin-Latin)
  • Any to Latin (aka. Any-Latin)

Note: These transliterators can be daisy-chained together to transliterate between non-Latin scripts using a Compound meta-converter. For example, chaining the Devanagari-Latin transliterator (in the Forward direction) with the Arabic-Latin transliterator (in the Reverse direction) gives a ‘Devanagari-Arabic’ transliterator.
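The daisy-chaining idea in the note can be sketched in Python as follows. This is a toy model, not the ICU engine or the Compound meta-converter: the two three-letter mapping tables are invented for illustration (real transliterators handle vowels, conjuncts, and context), and make_converter and chain are hypothetical helper names.

```python
# Toy one-to-one tables, for illustration only.
DEVA_TO_LATIN = {"\u0915": "k", "\u092e": "m", "\u0932": "l"}   # क म ल
ARAB_TO_LATIN = {"\u0643": "k", "\u0645": "m", "\u0644": "l"}   # ك م ل


def make_converter(table):
    """Build a bidirectional converter from a forward mapping table."""
    reverse_table = {latin: native for native, latin in table.items()}

    def convert(text, forward=True):
        mapping = table if forward else reverse_table
        # Characters without a mapping pass through unchanged.
        return "".join(mapping.get(ch, ch) for ch in text)

    return convert


def chain(*steps):
    """Compose (converter, forward) steps into one compound converter."""

    def run(text):
        for converter, forward in steps:
            text = converter(text, forward)
        return text

    return run


deva_latin = make_converter(DEVA_TO_LATIN)
arab_latin = make_converter(ARAB_TO_LATIN)

# Devanagari-Latin (Forward) followed by Arabic-Latin (Reverse).
deva_arab = chain((deva_latin, True), (arab_latin, False))
```

Latin acts as the pivot script: the first step produces Latin text and the second step consumes it, so any two Latin-based transliterators can in principle be paired this way.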

FindPhone to IPA converters

Adds the following converters for dealing with FindPhone encoded data:

  • FindPhone>SIL IPA93
  • FindPhone>UNICODE

SAG Indic

Contains encoding converter map(s) for the following encoding/font:

Converter Name | Encoding Name | Font Names
Annapurna<>UNICODE | SIL-ANNAPURNA_05-2002 | Annapurna
SAG IPA<>UNICODE | SIL-SAG-IPA | SAG-IPA SILDoulos, SAG-IPA SILManuscript, SAG-IPA SILSophia
SAG IPA Super<>UNICODE | SIL-SAG-IPA_Super | SAG-IPA Super SILCharis, SAG-IPA Super SILDoulos, SAG-IPA Super SILManuscript, SAG-IPA Super SILSophia
WinDTS Devanagari<>Unicode | SIL-WinDTS | WinDTS Devanagari
TransRoman<>UNICODE | SIL-SAG_TransRoman21-2002 | TransRoman2 Charis, TransRoman2 Doulos, TransRoman2 Manuscript, TransRoman2 Sophia
AkrutiOriSarala99<>UNICODE | Oriya-AkrutiOriSarala-99 | AkrutiOriSarala-99

Cameroon

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name | Font Names
Cameroon<>UNICODE | Cameroon | Cam Cam SILDoulosL, Cam Cam SILSophiaL, Cam Cam SILManuscriptL, Cam2 Cam2 SILDoulos, Cam2 Cam2 SILSophia, Cam2 Cam2 SILManuscript, Cam Paratext SILDoulos, Cam Paratext SILSophia, Cam Paratext SILManuscript

Central Africa

Contains encoding converter map(s) for the following encoding:

Converter Name | Encoding Name
angb4<>UNICODE | SIL-angb4-2005
MarcelNgbaka<>UNICODE | SIL-MarcelNgbaka-2005

East Africa

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name | Font Names
Times African<>UNICODE | Times African | Times African
Bantu Und<>UNICODE | Bantu Und | Bantu Und

Eastern Congo Group

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name
Mayogo<>UNICODE | Mayogo
Komo<>UNICODE | Komo
KomoASCII to Unicode | KomoASCII
ECG<>UNICODE | ECG-Unicode(Jan.2005)
BuduASCII<>UNICODE | BuduASCII
BUDU<>UNICODE | BUDU
BheleASCII<>UNICODE | BheleASCII

West Africa

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name
SIL-93linb-2005<>UNICODE | SIL-93linb-2005
UBS-Abidjan-2005<>UNICODE | UBS-Abidjan-2005
Bambara SIL Charis<>UNICODE | Bambara SIL Charis
SIL-BF Font Family-2005<>UNICODE | SIL-BF_Font_Family-2005
SIL-BF_Times-2006<>UNICODE | SIL-BF_Times-2006
X-SIL-Fulfulde<>UNICODE | X-SIL-Fulfulde
SIL-Ghana Doulos-2005<>UNICODE | SIL-Ghana_Doulos-2005
SIL-Mali Standard Font Family<>UNICODE | Mali Standard SILDoulos-2005
RCI Standard Doulos/Sophia/Manuscript<>UNICODE | SIL-RCI Standard-1994
X-SIL-Senufo<>UNICODE | X-SIL-Senufo
SIL-Karaboro-2006<>UNICODE | SIL-Karaboro-2006
SIL Samogho Doulos/Sophia/Manuscript<>UNICODE | SIL-Samogho-2006
SIL-Songhai-2006<>UNICODE | SIL-Songhai-2006
Tombouctou-Dutch<>UNICODE | SIL-Tombouctou-Dutch-2006
Burkina Faso Winye-2003<>UNICODE | SIL-Burkina_Winye_Unknown_Font-2005

Hebrew

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name | Font Names
SIL Ezra<>UNICODE | SIL-HEBREW_STANDARD-1997 | SIL Ezra
Hebrew Unicode 4.0<>Hebrew Unicode 5.0 | SIL-HEBREW_Unicode_40-2004 | Modifies Unicode Hebrew from 4.0 to 5.0

Indic Converters

ISCII Encodings

The following ISCII encodings are supported:

Converter Name
ISCII Devanagari<>UNICODE
ISCII Bengali<>UNICODE
ISCII Gurmukhi<>UNICODE
ISCII Gujarati<>UNICODE
ISCII Oriya<>UNICODE
ISCII Tamil<>UNICODE
ISCII Kannada<>UNICODE
ISCII Malayalam<>UNICODE

Himalli

The following Himalli encodings are supported:

Converter Name | Encoding Name | Font Names
HimaliNew Devanagari<>UNICODE | Devanagari-HimaliNew | For use with the Himali New font
Himallill Devanagari (Mac)<>UNICODE | Devanagari-HimallillMac-1999 | For use with files that use the Mac version of Himallill font
Himallill Devanagari (PC 2001)<>UNICODE | Devanagari-HimallillPC-2001 | For use with PC files using the Himallill font named Himallil.ttf, dated 11-Dec-2001
Himalli Devanagari (Mac)<>UNICODE | Devanagari-HimalliMac-1999 | For use with files that use the Mac version of Himalli font
Himalli Devanagari (PC 1998)<>UNICODE | Devanagari-HimalliPC-1998 | For use with PC files using the PC Himalli font named himalli.ttf, dated 12-May-1998
Himalli Devanagari (PC 2002)<>UNICODE | Devanagari-HimalliPC-2002 | For use with PC files using the PC Himalli font named himalli_.ttf (note underscore), dated 18-Dec-2002

Miscellaneous TECkit Converters

This package contains TECkit maps for the following Indic encodings:

Converter Name | Font Names
GujaratiLS<>UNICODE | GujaratiLS
KrutiDev010<>UNICODE | KrutiDev010
KrutiDev011<>UNICODE | KrutiDev011
KrutiDev290<>UNICODE | KrutiDev290
Kantipur Devanagari<>Unicode | Kantipur
Preeti Devanagari<>Unicode | Preeti
Shusha<>Unicode | Shusha
Tibetan Modern A<>Unicode | Tibetan Modern A
UniDevanagri<>UniIPA (phonetic) | Transliteration between Unicode Devanagari and Unicode IPA (phonetic) representation

Papua New Guinea

Contains encoding converter map(s) for the following encoding/fonts:

Converter Name | Encoding Name | Font Names
SIL PNG<>UNICODE | SIL-PNG_Fonts-1998 | PNG SILCharis, PNG SILDoulos, PNG SILManuscript, PNG SILSophia Lit, PNG SILCharis Lit, PNG SILSophia CQLit

NLCI (India)

Contains encoding converter map(s) for the following encoding/font:

Converter Name | Encoding Name | Font Names
SL Oriya<>UNICODE | NLCI-SLOriya |
Winscript/iLeap Devanagari<>UNICODE | CDAC-ISFOC_DEVANAGARI | DEV Panini, DV-TTYogesh
Winscript/iLeap Gujarati<>UNICODE | CDAC-ISFOC_GUJARATI | GUJ Gir
Winscript Malayalam<>UNICODE | NLCI-Malayalam | MAL Vayalar
Winscript Oriya<>UNICODE | NLCI-Oriya | ORI Asika
Winscript Tamil<>UNICODE | NLCI-Tamil | TAM Thiruvalluvar
Winscript Telugu<>UNICODE | NLCI-Telugu | TEL Nirmal

Contact

If you would like to report a problem, you can create an issue in the SIL Converters issue tracker.