How to get TECkit, CC, ICU, Perl and Python for text processing with one interface

This document is primarily for developers who are interested in adding EncConverter support to their application.

There are several functions in the EncConverter(s) interfaces that greatly simplify the use of the EncConverters core in various client applications. Through these functions, the various Transduction Engine plug-ins provide their own user-interface for adding and configuring converters, eliminating the need for client applications to create their own user-interface elements for the various transduction engines.

This article shows the new functions, some snippets of code to invoke them, and examples of the resulting user-interface dialog boxes.

Note:
The functions described here have not changed significantly since version 2.2; however, the program ID string differs between versions. If you are using the EncConverters core version 4.0, use the namespace prefix “SilEncConverters40”, as shown below. If you are using an earlier version (e.g. SILConverters 3.1 or earlier), use “SilEncConverters31” instead in the strings given in the code snippets below.
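
Because only the prefix of the program ID changes between versions, a client can keep that choice in one place. The following is a minimal sketch and not part of the EncConverters API: the helper name EncConvertersProgId and its integer version parameter are hypothetical, and it covers only the two prefixes named in the note above. Treating every version from 4 upward as “SilEncConverters40” is an assumption.

```cpp
#include <string>

// Hypothetical helper (not part of EncConverters): returns the program ID
// of the repository object for a given major version of the core.
// Assumption: version 4 and later use the "SilEncConverters40" prefix and
// anything earlier uses "SilEncConverters31", as described in the note.
std::string EncConvertersProgId(int coreMajorVersion)
{
    const std::string prefix =
        (coreMajorVersion >= 4) ? "SilEncConverters40" : "SilEncConverters31";
    return prefix + ".EncConverters";
}
```

For version 4 the returned string matches the program ID passed to CreateObject and CoCreateInstance in the snippets in this article.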

AutoSelect

The AutoSelect function can be used by client application code to launch the Select Converter dialog box, which allows the user to choose the transducer they want to use:


Select Converter Dialog

In this dialog box, the user can select the desired converter from the list of available converters in the System Repository and also select the Conversion Options, such as Direction, Normalization of the output data, and Debug mode for getting feedback about what data was sent and received by the underlying conversion engine.

Once configured, the user clicks OK, and the IEncConverter object corresponding to the selected transducer is returned to the client application for use.

This figure shows the function prototype:

Function Prototype
IEncConverter AutoSelect(ConvType eConversionTypeFilter)

The ConvType parameter can be used to have the Select Converter dialog filter the list of converters being displayed. Depending on the needs of the client app, you can limit the list to only Legacy to Unicode Encoding converters, for example, by passing in the value:

ConvType.Legacy_to_from_Unicode

If your application only supports Unicode encodings on both sides of a transduction process, then you can use:

ConvType.Unicode_to_from_Unicode

To have the Select Converter dialog box display all of the converters in the repository, use:

ConvType.Unknown

The following code snippets show how to call the AutoSelect function in several programming environments.

Note:

The ECInterfaces.dll file contains definitions of the IEncConverter (transducer) and IEncConverters (collection/repository) COM interfaces. The SilEncConverters40.dll file contains the .Net implementations of those interfaces (as the EncConverters object and the transducer abstract base class, EncConverter).

New as of EncConverters 4.0, each client application installs these assemblies in its own application folder. This avoids interaction between clients (e.g. one client changing something and thereby requiring other clients to be re-released as well). So if you want to use and redistribute the EncConverters core, include the ECInterfaces.dll and SilEncConverters40.dll files along with your application in the install folder.

You also need to redistribute the plug-in files….

If your program is a .Net client, then you should add a reference to both assemblies. If you are accessing the EncConverters core through COM (e.g. C++/ATL), then you only need to include the reference to the ECInterfaces.dll assembly.

// C# .Net snippet
using System.Windows.Forms; // for MessageBox
using ECInterfaces;         // Add Reference to ECInterfaces.dll
using SilEncConverters40;   // Add Reference to SilEncConverters40.dll

...
public void TestAutoSelect()
{
    // get an instance of the repository object
    EncConverters aECs = new EncConverters();

    // call AutoSelect to query the user for the converter to use
    IEncConverter aEC = aECs.AutoSelect(ConvType.Unknown);

    // Always check the return in case the user Cancelled the dialog 
    if (aEC != null)
    {
        // call the 'Convert' function to do a conversion
        string strIn = "bccdèéêfg";
        string strOut = aEC.Convert(strIn);

        MessageBox.Show(String.Format("'{0}' became '{1}'", strIn, strOut));
    }
}

' VB.Net snippet
Imports ECInterfaces           ' Add Reference to ECInterfaces.dll
Imports SilEncConverters40     ' Add Reference to SilEncConverters40.dll
...
Public Sub TestAutoSelect()

    ' get an instance of the repository object
    Dim aECs As New EncConverters

    ' call AutoSelect to query the user for the converter
    Dim aEC As IEncConverter = aECs.AutoSelect(ConvType.Unknown)

    ' Always check the return in case the user Cancelled the dialog
    If (Not aEC Is Nothing) Then

        ' call the 'Convert' function to do a conversion
        Dim strIn As String = "bccdèéêfg"
        Dim strOut As String = aEC.Convert(strIn)
        MessageBox.Show(String.Format("'{0}' becomes '{1}'", strIn, strOut))

    End If

End Sub

' VBA snippet (for Word, Excel, Access, Publisher, etc, macros)
' Be sure to Add a Reference to the ECInterfaces.tlb file
' via Tools, References, and then Browse for the .tlb file

Sub TestAutoSelect()
    Dim aEC As IEncConverter    ' variable for the converter
    Dim aECs As IEncConverters   ' variable for the repository

    ' get an instance of the repository object
    Set aECs = CreateObject("SilEncConverters40.EncConverters")

    ' call AutoSelect to query the user for the converter
    Set aEC = aECs.AutoSelect

    ' Always check the return in case the user Cancelled the dialog
    If (Not aEC Is Nothing) Then

        ' call the 'Convert' function to do a conversion
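        ' declare each type explicitly ("Dim a, b As String" makes a a Variant)
        Dim strIn As String, strOut As String, strRes As String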
        strIn = "bccdèéêfg"
        strOut = aEC.Convert(strIn)
        strRes = "/" & strIn & "/ became /" & strOut & "/"
        MsgBox strRes

    End If

End Sub

// C++ snippet (using ATL's CComPtr and CComBSTR helper classes)

// add ECInterfaces.tlb to the local project folder
#import "mscorlib.tlb"  raw_interfaces_only
#import "ECInterfaces.tlb"  raw_interfaces_only
using namespace ECInterfaces;

void TestAutoSelect()
{
    // get an instance of the repository object
    CComPtr<IEncConverters> aECs;
    aECs.CoCreateInstance(L"SilEncConverters40.EncConverters");

    // if it worked, call AutoSelect to query the user for the converter
    CComPtr<IEncConverter> aEC;
    if( !!aECs && (aECs->AutoSelect(ConvType_Unknown, &aEC) == S_OK) )
    {
        // Always check the return in case the user Cancelled the dialog
        if( !!aEC )
        {
            // call the 'Convert' function to do a conversion
            CComBSTR strIn = L"bccdèéêfg";
            CComBSTR strOut;
            if( aEC->Convert(strIn, &strOut) == S_OK )
            {
                CString strFormat;
                strFormat.Format(_T("'%s' becomes '%s'"), 
                    (LPCTSTR)strIn, (LPCTSTR)strOut);
                MessageBox(0, strFormat, _T("TestAutoSelect"), MB_OK);
            }
        }
    }
}

AutoSelectWithTitle

The AutoSelectWithTitle method is the same as AutoSelect described above, but allows the client application to provide a string to be used as the title of the Choose Converter dialog box. This is useful, for example, if your application queries the user for several different converters and you want the title bar of the dialog box to tell the user which converter to select on each occasion.

This method is available on the IEncConverters interface.

Function Prototype

IEncConverter AutoSelectWithTitle(ConvType eConversionTypeFilter, string strChooseConverterDialogTitle)

As above, the ConvType parameter can be used to have the Select Converter dialog filter the list of converters being displayed. Depending on the needs of the client app, you can limit the list to only Legacy to Unicode Encoding converters, for example, by passing in the value:

ConvType.Legacy_to_from_Unicode

If your application only supports Unicode encodings on both sides of a transduction process, then you can use:

ConvType.Unicode_to_from_Unicode

To have the Select Converter dialog box display all of the converters in the repository, use:

ConvType.Unknown

The strChooseConverterDialogTitle parameter is the string that will be displayed in the title bar of the Choose Converter dialog box.

AutoSelectWithData

The AutoSelectWithData method is the same as AutoSelect described above, but allows the client application to provide a string or byte array of data and a font name which will be used in a preview pane of the Choose Converter dialog box. This is useful, for example, to allow your users to try the different transducers on the data to see which one produces the desired output.


Select Dialog with Preview Pane

This method is available on the IEncConverters interface.

Function Prototype

IEncConverter AutoSelectWithData(byte[] abyPreviewData, string strFontName, ConvType eConversionTypeFilter, string strChooseConverterDialogTitle)

or

IEncConverter AutoSelectWithData(string strPreviewData, string strFontName, ConvType eConversionTypeFilter, string strChooseConverterDialogTitle)

Where:

byte[] abyPreviewData

This parameter passes sample data to the Select Converter dialog box. The data is converted with whichever converter the user selects, and the result is displayed in the preview pane. C++ and other non-.Net clients can pass this parameter as a SafeArray of VT_UI1 elements (unsigned bytes). This is useful if you are dealing with 8-bit legacy data prior to encoding conversion to Unicode.

string strFontName

This parameter can be used to pass the display name of a font typeface to use in the preview window (e.g. “Arial Unicode MS”).

See above for the definition of the ConvType eConversionTypeFilter and string strChooseConverterDialogTitle parameters.

string strPreviewData

This parameter passes sample data to the Select Converter dialog box. The data is converted with whichever converter the user selects, and the result is displayed in the preview pane. C++ and other non-.Net clients can pass this parameter as a COM BSTR.

Configure

While most client applications will probably want to use the AutoSelect function discussed above, there is another interface that can be used when you know exactly which transduction engine implementation you want your users to use.

That is, one of the main advantages of using EncConverters is that your client application doesn’t have to concern itself with whether the underlying conversion engine is a CC Table, TECkit map, Perl Expression, etc. With exactly the same interface (i.e. nominally the Convert call), your client application can get the converted value back regardless of the underlying process.

However, if for whatever reason you want to limit your users to a particular implementation type, you can use the Configure method (of the IEncConverterConfig interface) to acquire a converter of a particular type.

Calling the Configure method will cause a three-tab About / Setup / Test Area dialog box to be displayed for the corresponding converter. For example, here is the dialog box for the Perl Expression EncConverter implementation:


Perl Expression Converter Setup dialog

Each EncConverter implementation has its own version of this dialog box to query for whatever information it needs in order to operate, and the implementations differ in complexity. For example, the TECkit Setup tab has only a browse button for the TECkit map file and an edit control showing the file path.

The dialog box also has an About tab, which explains the converter and how to configure it, and a Test Area tab for checking the converter with sample data.

Once configured, the user clicks OK, and the selected IEncConverter object is returned to the client application for use.

In addition to the Configure method, there is another helper method available (from the IEncConverters interface) for acquiring an unconfigured IEncConverter object of the appropriate implementation type: NewEncConverterByImplementationType.

This figure shows the function prototypes of both methods:

Function Prototypes:

IEncConverter NewEncConverterByImplementationType(string strImplType)

string strImplType

This parameter indicates the Implementation Type of the converter being requested. The values it takes may change as new implementations are added to the EncConverters suite, but as of v2.6, the following implementation types are available:

“SIL.cc” (i.e. CC Table)

“SIL.tec” (i.e. TECkit map)

“SIL.tecForm” (i.e. TECkit for converting between UTF flavors)

“SIL.map” (i.e. TECkit map – auto-compiling)

“cp” (i.e. Code Page Converter)

“SIL.comp” (i.e. Compound (daisy-chained) Converter)

“ICU.trans” (i.e. ICU Transliterator)

“ICU.conv” (i.e. ICU Converter)

“ICU.regex” (i.e. Regular Expression Find and Replace (ICU))

“SIL.PyScript” (i.e. Python Script)

“SIL.PerlExpression” (i.e. Perl Expression)

“SIL.fallback” (i.e. Primary-Fallback Converter)

“SIL.AdaptItKB” (i.e. AdaptIt Knowledge Base Lookup Converter)

“SIL.AdaptItKBGuesser” (i.e. AdaptIt Target Word Guesser)

“SIL.TechHindiWebPage” (i.e. Technical Hindi (Google group) Html Converter)

Each of these strings is available as a public string constant of the EncConverters implementation class (e.g. EncConverters.strTypeSILcc).
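
A client that limits its users to particular implementation types may want to validate an identifier, or show a readable label for it, before requesting a converter. The following is a minimal sketch of such a lookup. The table contents are copied from the list above; the function names ImplementationTypes and DescribeImplType are hypothetical and not part of EncConverters, and the table will go out of date if new implementation types are added.

```cpp
#include <map>
#include <string>

// Implementation type identifiers and their descriptions, copied from the
// list in this article. This is a static snapshot, not queried from the
// EncConverters repository.
const std::map<std::string, std::string>& ImplementationTypes()
{
    static const std::map<std::string, std::string> types = {
        {"SIL.cc", "CC Table"},
        {"SIL.tec", "TECkit map"},
        {"SIL.tecForm", "TECkit for converting between UTF flavors"},
        {"SIL.map", "TECkit map (auto-compiling)"},
        {"cp", "Code Page Converter"},
        {"SIL.comp", "Compound (daisy-chained) Converter"},
        {"ICU.trans", "ICU Transliterator"},
        {"ICU.conv", "ICU Converter"},
        {"ICU.regex", "Regular Expression Find and Replace (ICU)"},
        {"SIL.PyScript", "Python Script"},
        {"SIL.PerlExpression", "Perl Expression"},
        {"SIL.fallback", "Primary-Fallback Converter"},
        {"SIL.AdaptItKB", "AdaptIt Knowledge Base Lookup Converter"},
        {"SIL.AdaptItKBGuesser", "AdaptIt Target Word Guesser"},
        {"SIL.TechHindiWebPage", "Technical Hindi (Google group) Html Converter"},
    };
    return types;
}

// Hypothetical helper: returns the description for an implementation type
// identifier, or an empty string if the identifier is not in the table.
std::string DescribeImplType(const std::string& implType)
{
    const auto& types = ImplementationTypes();
    const auto it = types.find(implType);
    return (it != types.end()) ? it->second : std::string();
}
```

An empty result from DescribeImplType signals an identifier that this snapshot does not know, which a client could treat as a reason to fall back to AutoSelect.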

bool Configure(IEncConverters aECs, string strFriendlyName, ConvType eConversionType, string strLhsEncodingID, string strRhsEncodingID)

The parameters, most of which can be left default, are as follows:

IEncConverters aECs

This is the interface pointer for the repository object, through which the dialog box will add any newly created converters. Required.

string strFriendlyName

This is the friendly name to give the converter (if the client app wants to specify it; otherwise, it can be null to allow the user to choose the name).

ConvType eConversionType

If the client application only deals with Unicode to Unicode conversions, for example, you can provide the appropriate ConvType value, and the radio buttons for specifying the type will then be left out of the dialog box (i.e. to simplify the dialog). Otherwise, use ConvType.Unknown.

string strLhsEncodingID

This allows the client app to specify what the encoding ID of the left-hand side of the conversion is (e.g. UNICODE). Can be null.

string strRhsEncodingID

This allows the client app to specify what the encoding ID of the right-hand side of the conversion is (e.g. UNICODE). Can be null.

The following code snippet shows how to call the Configure function in C#:

// C# snippet
using System.Windows.Forms; // for MessageBox
using ECInterfaces;         // Add Reference to ECInterfaces.dll
using SilEncConverters40;   // Add Reference to SilEncConverters40.dll
...
public void TestConfigure()
{
    // get an instance of the repository
    EncConverters aECs = new EncConverters();

    // get a new, empty IEncConverter object of the correct implementation type
    IEncConverter aEC = aECs.NewEncConverterByImplementationType(
                                    EncConverters.strTypeSILtec);

    // get the configuration interface for this type
    IEncConverterConfig aConfigurator = aEC.Configurator;

    // there may be some implementation types that don't provide their 
    // own user-interface, so always check before using it
    if( aConfigurator != null )
    {
        // call its Configure method to do the UI
        if(aConfigurator.Configure(aECs, null, ConvType.Unknown, null, null))
        {
            // if we reach here, then the converter must have been
            // configured; though, in fact, it might not have been
            // added to the System Repository (i.e. it might just
            // be a temporary converter). You can check
            // aConfigurator.IsInRepository to detect whether it
            // was added or not

            // now, call the 'Convert' function to do a conversion
            string strIn = "bccdèéêfg";
            string strOut = aEC.Convert(strIn);

            MessageBox.Show(String.Format("'{0}' became '{1}'",
                        strIn, strOut));
        }
    }
}

Custom transduction engines

In addition to providing support in client applications for accessing different transduction engines, it is possible to plug any arbitrary transduction engine into the SILConverters paradigm so that it becomes available to any EncConverters client. This can be done by implementing the IEncConverter interface as defined in the ECInterfaces.dll .Net assembly.

If your transduction engine is an executable that processes text via standard input and output, or is in a DLL, there are several subclasses in that assembly that simplify the process of adding it.
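
To make the standard in/out case concrete, here is a minimal sketch of the core of such an executable. It is an illustration under stated assumptions, not the EncConverters contract: this article does not describe the encoding or framing that the wrapper subclasses use when talking to an executable, so the sketch simply assumes that the whole input arrives on standard input and the whole result is written to standard output. The names Transduce and RunFilter are hypothetical, and the upper-casing rule is a toy stand-in for a real transduction.

```cpp
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Toy transduction used only for illustration: upper-cases ASCII letters
// and leaves every other byte (including multi-byte UTF-8 sequences and
// 8-bit legacy characters) unchanged.
std::string Transduce(const std::string& input)
{
    std::string output = input;
    for (char& ch : output)
    {
        const unsigned char uch = static_cast<unsigned char>(ch);
        if (uch < 0x80)
            ch = static_cast<char>(std::toupper(uch));
    }
    return output;
}

// Reads everything available on `in`, transduces it, and writes the
// result to `out`. Assumption: input and output are exchanged as whole
// streams rather than line by line.
void RunFilter(std::istream& in, std::ostream& out)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    out << Transduce(buffer.str());
}
```

An executable built around this sketch would only need a main function that calls RunFilter(std::cin, std::cout). Keeping the transduction in its own function also makes it easy to test without launching a process.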