
ocPortal Tutorial: Localisation and internationalisation

Written by Chris Graham, ocProducts
ocPortal has support for internationalisation, including:
  • time zones
  • translation of text into different languages (.ini or .po files)
  • translation of text into different languages (Comcode pages)
  • translation of text into different languages (text files)
  • translation of images into different languages (e.g. labelled buttons)
  • different character sets (for example, Cyrillic)
  • different locales, for different number formats (for example, the European use of a comma where English uses a decimal point)
  • translation of content into different languages
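
The locale difference above mostly comes down to which characters separate thousands and decimals. The following Python sketch shows only that difference. `format_number` is a hypothetical helper with the separators passed in explicitly. ocPortal itself takes number formatting from the operating-system locale named in the `locale` language string.

```python
def format_number(value: float, decimals: int, decimal_point: str, thousands_sep: str) -> str:
    """Format value with the given separators (hypothetical helper, not ocPortal code)."""
    # Start from Python's English-style output: ',' between thousands, '.' before decimals.
    english = f"{value:,.{decimals}f}"
    # Swap via a placeholder so the two replacements cannot interfere with each other.
    return (
        english.replace(",", "\0")
        .replace(".", decimal_point)
        .replace("\0", thousands_sep)
    )


print(format_number(1234567.891, 2, ".", ","))  # 1,234,567.89 (English style)
print(format_number(1234567.891, 2, ",", "."))  # 1.234.567,89 (common European style)
```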


Time zones

In ocPortal, time zones can be adjusted in two ways:
  • adjusting the site time-zone relative to server time (this is a site configuration option). This is convenient if the server is located in a different time-zone from the site (e.g. a British website using an American hosting company).
  • adjusting member time-zones, relative to the site time-zone (OCF only)
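
The two adjustments stack: site time is server time plus the site's offset, and member time is site time plus the member's offset. Here is a minimal Python sketch. It assumes both settings are plain hour offsets. The helper names and example offsets are hypothetical, and fixed offsets ignore daylight saving.

```python
from datetime import datetime, timedelta


def site_time(server_time: datetime, site_offset_hours: float) -> datetime:
    """Apply the site's offset relative to the server clock."""
    return server_time + timedelta(hours=site_offset_hours)


def member_time(server_time: datetime, site_offset_hours: float, member_offset_hours: float) -> datetime:
    """Apply the member's offset on top of the site time."""
    return site_time(server_time, site_offset_hours) + timedelta(hours=member_offset_hours)


# Hypothetical set-up: American server on UTC-5, British site (UTC+0), member in Paris (UTC+1).
server_now = datetime(2024, 1, 15, 7, 0)
print(site_time(server_now, 5))       # 2024-01-15 12:00:00
print(member_time(server_now, 5, 1))  # 2024-01-15 13:00:00
```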

Language file format (technical overview)

This section describes the format used to store language strings in ocPortal. Strictly speaking, you do not need to know this, as the Admin Zone provides an interface that handles it behind the scenes; however, it is useful to know, especially if you wish to work through the language files in a text editor.

ocPortal language packs are made up of .ini files, containing mappings between special codes (based on the English) and the strings as actually displayed. For example, a common string in the 'global' language file (the one containing common strings used throughout the portal) is coded as:

Code

PROCEED=Proceed
ocPortal is developed in British English, which is technically known as the 'fall-back language' because it always has a complete set of language files and strings.
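
Each line of a language file is simply a code, an '=' sign and the displayed text. The following Python sketch reads that format into a dictionary. It is an illustration only. Skipping blank lines, ';' or '#' comments and '[section]' headers is an assumption about what else a real file may contain.

```python
def parse_language_file(text: str) -> dict:
    """Map string codes to displayed text from CODE=Value lines (simplified sketch)."""
    strings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumed: blank lines, comments and section headers carry no strings.
        if not line or line.startswith((";", "#", "[")):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        code, sep, value = line.partition("=")
        if sep:
            strings[code.strip()] = value.strip()
    return strings


sample = "PROCEED=Proceed\nEXAMPLE_CODE=An example with = in it\n"  # EXAMPLE_CODE is hypothetical
print(parse_language_file(sample)["PROCEED"])  # Proceed
```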

Choosing a language and language file to edit in the language editor


The .ini files for any translation are stored together in a directory that is named with the standard two-letter code to denote that language; for example, English is 'EN'. A list of these codes is in lang/langs.ini.

All bundled language packs are located in ocPortal's 'lang' directory. There is also a 'lang_custom' directory, which contains custom language packs, or language files that 'override' those in the 'lang' directory on a file-by-file basis. Whenever a language file is edited in the Admin Zone, it is automatically saved as an overriding copy in lang_custom, if one does not exist already.

Not all language files need to be translated, and language files do not have to be complete. If a string cannot be found in the language being used, ocPortal looks it up in the English language pack instead, using the fall-back mechanism.
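
The override and fall-back rules above can be sketched in Python as follows. The in-memory `packs` dictionary, keyed by (directory, language, file), stands in for the files on disk. The exact search order ocPortal uses internally is an assumption here. It is built only from the two rules in the text: lang_custom replaces lang one whole file at a time, and missing strings fall back to English.

```python
def load_file(packs: dict, lang: str, file: str) -> dict:
    """A lang_custom copy of a file overrides the bundled one as a whole."""
    for base in ("lang_custom", "lang"):
        if (base, lang, file) in packs:
            return packs[(base, lang, file)]
    return {}


def lookup(packs: dict, code: str, lang: str, file: str, fallback: str = "EN") -> str:
    """Find a string in the chosen language, falling back to English if it is missing."""
    strings = load_file(packs, lang, file)
    if code in strings:
        return strings[code]
    if lang != fallback:
        return lookup(packs, code, fallback, file, fallback)
    raise KeyError(code)


# (base directory, language code, language file) -> {string code: text}; codes are examples.
packs = {
    ("lang", "EN", "global"): {"PROCEED": "Proceed", "EXAMPLE_CODE": "Example"},
    ("lang", "FR", "global"): {"PROCEED": "Continuer", "EXAMPLE_CODE": "Exemple"},
    ("lang_custom", "FR", "global"): {"PROCEED": "Poursuivre"},
}
print(lookup(packs, "PROCEED", "FR", "global"))       # Poursuivre (lang_custom wins)
# lang/FR's "Exemple" is never consulted: the lang_custom file replaces the whole bundled file.
print(lookup(packs, "EXAMPLE_CODE", "FR", "global"))  # Example (falls back to English)
```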

Using the language editor to translate language strings



The language editor (i.e. how to change strings)

The language editor allows you to translate 'strings' so that your website is displayed in a language other than the original British English. Alternatively, you may just wish to change language strings to change the 'style' of the website.

The language editor is very easy to use. All you need to do is go to the translation module, choose your language, choose the language file to translate, and then you are presented with an interface to translate the strings.
For languages that Google can translate, a small level of integration is provided, offering Google's translation as a guide.

You can reach the language editor from the 'Style' section of the Admin Zone, under the 'Language' icon.

We now recommend doing translations via Launchpad. See the "Collaborative translations on Launchpad" section.

Special strings

Language string codes that are in lower case are special strings that should not be translated directly. These strings contain encoded information relating to the language pack.

String codename: Purpose
charset: The character set needed for the language (a standard character set name). Many people change this to 'utf-8' (Unicode, which works with any characters), although regional character sets are also supported.
dir: The direction of text (usually ltr, but rtl for languages such as Arabic). An "rtl" language would likely require some template changes as well as language changes. If someone does this, we would consider integrating the changes back into a future version of ocPortal.
locale: The locale. There are standard locale codes for Unix, based on language codes, but they vary across operating systems, so use what works on your server. The locale code is used to prepare certain operating system date strings, and for number formatting.
en_right: Sometimes templates have to apply CSS property values of 'left' or 'right' according to the text direction. For an rtl language, this becomes 'left' instead of 'right'.
en_left: As above, but the opposite.
language_author: Your name.
date_* / time_* / calendar_*: Date/time formatting, in one of PHP's two time formats. If the string contains no '%' signs it is treated as a PHP "date" format; if it contains '%' signs it is treated as a "strftime" format.
dont_escape_trick: Ignore this one.
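
Two of these rules are mechanical enough to express directly. This Python sketch applies the '%'-sign test for date_*/time_*/calendar_* strings and derives the en_left/en_right values from dir. The helper names are hypothetical. It only illustrates the rules in the table and is not ocPortal's own code.

```python
def date_format_kind(fmt: str) -> str:
    """A '%' anywhere means strftime syntax; otherwise PHP date() syntax."""
    return "strftime" if "%" in fmt else "date"


def direction_strings(direction: str) -> dict:
    """Values that en_left/en_right take for a given dir value."""
    if direction == "rtl":
        # Mirror the layout: what English calls 'left' is on the right.
        return {"en_left": "right", "en_right": "left"}
    return {"en_left": "left", "en_right": "right"}


print(date_format_kind("%d %B %Y"))          # strftime
print(date_format_kind("jS F Y"))            # date
print(direction_strings("rtl")["en_right"])  # left
```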


Character sets

There are three systems in common use that allow diverse characters to be displayed in a document:
  • HTML entities
  • Unicode
  • Character sets

ocPortal supports character sets. In some places HTML entities will work, but there are definitely places where, in the current version of ocPortal, they will not. Unicode is not ideal for PHP systems like ocPortal because PHP strings are 'binary safe'. A PHP string is a sequence of bytes, so functions such as strlen count bytes rather than characters. In practice, however, Unicode does work, because UTF-8 is backwards-compatible with ASCII and ocPortal takes Unicode into account where it matters.
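
The 'binary safe' point is easiest to see by comparing characters with bytes. Python distinguishes the two explicitly, so this small sketch uses Python rather than PHP to illustrate the idea. It shows why byte-oriented code copes with ASCII text but can miscount or split multi-byte UTF-8 characters.

```python
text = "caf\u00e9"  # "café"
encoded = text.encode("utf-8")

print(len(text))     # 4 characters
print(len(encoded))  # 5 bytes, because 'é' takes two bytes in UTF-8

# Pure ASCII text is byte-for-byte identical in UTF-8, which is why most code is unaffected.
print("Proceed".encode("utf-8") == "Proceed".encode("ascii"))  # True

# Cutting by byte count can split a character; the broken byte decodes as U+FFFD.
truncated = encoded[:4].decode("utf-8", errors="replace")
print(truncated == "caf\ufffd")  # True
```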

To understand character sets, you need to understand how strings (or text files) are composed. Each character (a symbol, represented by a 'glyph' on the screen) is essentially represented by a number from 0 to 255. Numbers 0-127 are usually standard, as specified by the 7-bit ASCII code. The 128-255 range is essentially free, and what those numbers map to depends on the 'character set' used. Different languages use different characters (for example, accented characters, a whole different alphabet, or even a pictographic script), so different languages use different character sets.

A file that uses 'high' characters will look different when viewed in editors set to different character sets. To enter and view text in the appropriate character set, your editor must be set to use it. If you are translating into your native language, this is likely to be your editor's default already.
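
This Python sketch makes the point concrete. The same single 'high' byte, 0xE9, decodes to a different character under each of three real single-byte character sets. An ASCII byte such as 0x41 reads the same under all of them.

```python
high_byte = bytes([0xE9])
ascii_byte = bytes([0x41])

for charset in ("latin-1", "cp1251", "iso8859-7"):
    # latin-1: Western European, cp1251: Cyrillic (Windows), iso8859-7: Greek
    print(charset, high_byte.decode(charset), ascii_byte.decode(charset))

# Output:
# latin-1 é A
# cp1251 й A
# iso8859-7 ι A
```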

Comcode (and HTML) pages

In addition to the language files, Comcode and HTML pages may be translated. To translate a Comcode page, either manually copy the Comcode page .txt file from the pages/comcode/EN directory to the appropriate pages/comcode/<lang> directory and change it there, or simply choose the target language and edit the file using ocPortal.

As HTML pages are created outside ocPortal, you must copy them manually, in the same way as described for Comcode pages.
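
The manual copy described above amounts to the following Python sketch. `copy_comcode_page` is a hypothetical helper, and `pages_dir` stands for the pages directory that contains comcode/EN. The sketch deliberately refuses to overwrite an existing file, so an earlier translation is not lost.

```python
import shutil
import tempfile
from pathlib import Path


def copy_comcode_page(pages_dir: Path, page: str, lang: str) -> Path:
    """Copy pages/comcode/EN/<page>.txt into pages/comcode/<lang>/ for translating."""
    source = pages_dir / "comcode" / "EN" / f"{page}.txt"
    target_dir = pages_dir / "comcode" / lang
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting a translation")
    shutil.copy2(source, target)  # copies contents and timestamps
    return target


# Demonstration in a throwaway directory; 'example_page' is a hypothetical page name.
with tempfile.TemporaryDirectory() as tmp:
    pages = Path(tmp) / "pages"
    (pages / "comcode" / "EN").mkdir(parents=True)
    (pages / "comcode" / "EN" / "example_page.txt").write_text("Hello", encoding="utf-8")
    copied = copy_comcode_page(pages, "example_page", "FR")
    print(copied.read_text(encoding="utf-8"))  # Hello
```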

Text files

There are some other text files you might want to translate, in a similar way to Comcode pages (see above):
  • text/EN/quotes.txt
  • text/EN/rules*.txt
These files do not need translating, but could be replaced with equivalents for your language:
  • text/EN/too_common_words.txt (a list of words that should not be considered in search results, for example)
  • text/EN/word_characters.txt (a list of characters that appear in words in your language – most languages have all the English characters, but also accented ones)

None of these files is very important; only translate them if you want to.
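
To show how the two replaceable files affect searching, here is a hypothetical Python sketch. Characters listed in word_characters.txt are treated as part of words, anything else splits words, and entries from too_common_words.txt are dropped. The sketch takes already-loaded sets rather than parsing the files, because their exact layout (and whether ocPortal lower-cases queries) is an assumption here.

```python
def search_terms(query: str, word_characters: set, too_common: set) -> list:
    """Split a query into words and drop the too-common ones."""
    words, current = [], []
    for ch in query.lower():  # lower-casing is an assumption of this sketch
        if ch in word_characters:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return [word for word in words if word not in too_common]


french_chars = set("abcdefghijklmnopqrstuvwxyz") | set("àâçéèêëîïôùûü")  # example set
common = {"le", "la"}  # example entries
print(search_terms("Le café de la gare", french_chars, common))  # ['café', 'de', 'gare']
```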

Images

If you look under the themes/default/images/ directory you will see an EN directory that contains images with English text on them. You can copy this to the ISO code of your language pack (e.g. FR), and then replace the images with translated ones. Make sure you clear your theme image cache (Admin Zone, Tools section, Cleanup tools icon) after doing this. We have the PSD files (which require Adobe Photoshop or compatible software) for many of the images in our downloads database. The font used is a commercial font called 'Kabel', so you may wish to use a widely available font such as 'Arial' instead.

WYSIWYG editor

ocPortal uses a third-party WYSIWYG editor – a modified version of AreaEdit, which itself is a modified version of Xinha.
You need to make sure you have translated versions of all data/areaedit/*/lang/<lang>.js files. There are quite a few translations already in there, but there are also many gaps.
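
A quick way to find the gaps is to look for every data/areaedit/*/lang/ directory that lacks a file for your language. The Python sketch below does that. The function name is hypothetical. The case of the file name (e.g. fr.js versus FR.js) should match what the existing files in those directories use.

```python
import tempfile
from pathlib import Path


def missing_editor_translations(base: Path, lang: str) -> list:
    """Return the data/areaedit/*/lang/<lang>.js paths that do not exist yet."""
    missing = []
    for lang_dir in sorted(base.glob("data/areaedit/*/lang")):
        expected = lang_dir / f"{lang}.js"
        if lang_dir.is_dir() and not expected.exists():
            missing.append(expected)
    return missing


# Demonstration in a throwaway tree; PluginA/PluginB are hypothetical directory names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for plugin in ("PluginA", "PluginB"):
        (root / "data" / "areaedit" / plugin / "lang").mkdir(parents=True)
    (root / "data" / "areaedit" / "PluginA" / "lang" / "fr.js").write_text("// translated")
    print([p.relative_to(root).as_posix() for p in missing_editor_translations(root, "fr")])
    # ['data/areaedit/PluginB/lang/fr.js']
```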

Alternatively, if you do not want to worry about the WYSIWYG editor, remove this code from your HEADER template:

Code

,ocp_lang='{$LANG;}'

MySQL collations

MySQL has 'collations', which determine the character set used and the rules for comparing characters within it. ocPortal does not manage these; it uses whatever is configured.
This generally does not matter a lot, because anything you store will be correctly stored and retrieved regardless of collation. However, it does make a small difference to searches. For example, languages usually have 'equivalent' characters (e.g. lower case and upper case), and the MySQL collation tells MySQL about those. Set your collation appropriately if you do not think your MySQL server's default collation will be correct for your language. You can set this when you create your MySQL database; if your database already exists, you need to set it for each of the tables.
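
For an existing database, the statements involved look like those generated by this Python sketch. `ALTER DATABASE ... CHARACTER SET ... COLLATE ...` changes the default for tables created later. `ALTER TABLE ... CONVERT TO CHARACTER SET ... COLLATE ...` converts an existing table's text columns. The database and table names are placeholders, and 'utf8'/'utf8_general_ci' is just an example pair. Back up before running anything like this.

```python
def collation_statements(database: str, tables: list, charset: str, collation: str) -> list:
    """Build MySQL statements that switch a database and its tables to a new collation."""
    # Default for tables created from now on.
    statements = [f"ALTER DATABASE `{database}` CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        # CONVERT TO also re-encodes the existing text columns, not just the table default.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


for sql in collation_statements("example_db", ["example_table"], "utf8", "utf8_general_ci"):
    print(sql)
# ALTER DATABASE `example_db` CHARACTER SET utf8 COLLATE utf8_general_ci;
# ALTER TABLE `example_table` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```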

You may need to add something like:

Code

$SITE_INFO['database_charset']='utf8';
to your info.php file if you set a MySQL character set different from the server default (replace 'utf8' with the character set you are actually using).
After doing this you may need to clear the cache from inside the upgrader.php script, because ocPortal's existing cache data would otherwise be read back differently from how it was written.

The normal Western European collation (used by English) is 'latin1_swedish_ci'. If anybody wonders why 'Swedish' is used for 'English', it is because English does not use accented characters and hence was considered a subset of Swedish, which does.

GD fonts

If you find that the vertical text shown on permission editing interfaces is incorrect, it may be due to an incompatibility between PHP and the free Bitstream fonts that ocPortal bundles.
This is known to happen with Russian characters. The solution is to replace the data/fonts/Vera.ttf file with Verdana.ttf from your own computer. We would distribute this file with ocPortal, but we do not have a license to do so; however, if you have a copy of Windows or Mac OS you should have your own licensed copy of this file.

Multiple languages on one site


Language configuration

It is possible to configure ocPortal such that members may select which language to use on your site, and pages are then presented in this language. There are a number of ways a user may choose a language:
  • via the language block (which inserts a keep_lang parameter into the URL, to preserve their choice until they close the browser window)
  • via their member profile (OCF supports this better than other forum drivers, although the integration can be improved by editing the lang/map.ini file)
  • via their web browser stated language (disabled by default, as most users unfortunately have it misconfigured)
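
The three routes can be combined into one decision. This Python sketch is hypothetical. It assumes the precedence keep_lang, then member profile, then browser language (only when enabled), then the site default. It reduces browser tags such as 'fr-CA' to the two-letter code used for language packs. ocPortal's actual precedence may differ.

```python
def parse_accept_language(header: str) -> list:
    """Return the tags in an Accept-Language header, most preferred first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        tag, *params = [piece.strip() for piece in item.split(";")]
        quality = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if tag and quality > 0:
            # Sort by descending quality, keeping header order for ties.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def choose_language(installed: set, default: str, keep_lang=None, profile_lang=None,
                    accept_language=None, use_browser=False) -> str:
    """Pick the first installed language from the candidates, in precedence order."""
    candidates = [keep_lang, profile_lang]
    if use_browser and accept_language:
        candidates += [tag.split("-")[0] for tag in parse_accept_language(accept_language)]
    for candidate in candidates:
        if candidate and candidate.upper() in installed:
            return candidate.upper()
    return default


print(parse_accept_language("fr-CA,fr;q=0.9,en;q=0.8"))  # ['fr-CA', 'fr', 'en']
print(choose_language({"EN", "FR"}, "EN", accept_language="fr-CA,en;q=0.5", use_browser=True))  # FR
print(choose_language({"EN", "FR"}, "EN", accept_language="fr-CA", use_browser=False))  # EN
```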


Changing the default site language


Translating content

ocPortal can have its content translated and delivered for each language, without requiring any duplication.

ocPortal's multi-language support automatically becomes available when you have more than one language installed and have the OCF "Enable regionalisation" option enabled.

We need to consider the following cases:
  1. Sending newsletters
  2. Editing theme images
  3. Editing Comcode pages
  4. Using the Zone Editor
  5. Everything else

For '1' (newsletters), you will be asked which language to send the newsletter in when you go to the newsletter module. Subscribers choose their language when they sign up.

For '2'-'4', you will be asked which language to edit in when you go to the respective section of ocPortal. What you save will be saved against that language.

For any of '1'-'4', you will notice that once you choose your language you temporarily see the website in the language you are working in, until you finish. This is convenient, but ocPortal also does it for architectural reasons. Be aware, however, that content is saved in a particular language here because of the language selection you just made, and not necessarily because of the language you are viewing. This is clarified in the next paragraph.

For '5', translation is performed in a special 'Translate content' part of the Admin Zone. It is crucial to understand that translation is not done just by editing content into your own language on the normal edit screens. Content added to ocPortal is saved against the language being used by the submitter (except in cases '1' to '4' above). Therefore, when adding content you must make sure you have the right language selected. A good rule of thumb is to check that the language of ocPortal's interface matches the language you expect to be submitting content in. When editing content, the content is always saved against the language you see it in while editing. If it has already been translated, you are editing the translation; otherwise, it is still in the originally submitted language. Never translate from an edit screen. If something is edited (as long as there were actual changes), all its translations are automatically marked 'broken' and put back into the translation queue.

You will see there is an option in the footer for opening up a 'Translate content' screen just with language strings that were included on the page you are viewing.

In ocPortal almost everything (*) can be translated, but obviously you would not want to translate every forum post for a large community (for example). For this reason, ocPortal stores translatable text with 'priorities', and the highest-priority text is presented for translation first. For example, the names of zones have the highest priority, whilst forum posts have the lowest.

(*) A few things cannot be translated such as forum names. The reasoning is that you do not want such things translated, but rather you should have a different copy of each forum for each language. This is an exceptional situation, and is only designed like this due to the way forums are used. Other kinds of category may be translated as described above.
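
The priority ordering, together with the 'broken' marking described earlier, can be modelled roughly as follows. This Python sketch is hypothetical throughout. The numeric priorities (a lower number is translated sooner), the data structures and the function names are assumptions, used only to show the ordering and re-queuing behaviour.

```python
from dataclasses import dataclass


@dataclass
class Translation:
    text: str
    broken: bool = False


def edit_source(source: dict, translations: dict, text_id: int, new_text: str) -> None:
    """Save an edit; a real change marks the existing translation as broken."""
    if source[text_id] != new_text:
        source[text_id] = new_text
        if text_id in translations:
            translations[text_id].broken = True


def translation_queue(source: dict, priorities: dict, translations: dict) -> list:
    """IDs still needing translation, highest priority (lowest number) first."""
    pending = [i for i in source if i not in translations or translations[i].broken]
    return sorted(pending, key=lambda i: (priorities[i], i))


source = {1: "Zone name", 2: "Forum post", 3: "News title"}
priorities = {1: 1, 2: 4, 3: 2}  # hypothetical: zone names first, forum posts last
translations = {1: Translation("Nom de la zone")}

print(translation_queue(source, priorities, translations))  # [3, 2]
edit_source(source, translations, 1, "Renamed zone")
print(translation_queue(source, priorities, translations))  # [1, 3, 2]
```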

Criticising language packs


Choosing a language to criticise the translation of

A tool for criticising language packs is provided, to identify what has not been translated, amongst other things. This tool is intended for those who translate language files without using the inbuilt editor, or for those who have upgraded the software and need to update their language packs.
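
The kinds of checks such a tool makes can be illustrated by comparing a translated file's codes with the English file's. The following Python sketch is not ocPortal's tool. It reports missing codes, codes no longer present in English, and strings identical to the English text (which may mean they are untranslated). It skips the lower-case special strings, whose values often legitimately match.

```python
def criticise(english: dict, translation: dict) -> dict:
    """Compare a translated language file's strings against the English ones."""
    shared = english.keys() & translation.keys()
    return {
        "missing": sorted(english.keys() - translation.keys()),
        "obsolete": sorted(translation.keys() - english.keys()),
        # Lower-case codes are special strings (charset, dir, ...); their values often match.
        "unchanged": sorted(c for c in shared if not c.islower() and english[c] == translation[c]),
    }


english = {"PROCEED": "Proceed", "dir": "ltr", "EXAMPLE_NEW": "New text"}  # example codes
french = {"PROCEED": "Proceed", "dir": "ltr", "EXAMPLE_OLD": "Ancien texte"}
print(criticise(english, french))
# {'missing': ['EXAMPLE_NEW'], 'obsolete': ['EXAMPLE_OLD'], 'unchanged': ['PROCEED']}
```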

ocProducts policy on languages

We very much want ocPortal to be widely used by people in any language, but we do not get involved in maintaining or developing individual language packs (other than the standard English). We may distribute third-party packs (which you can make from within ocPortal) if there is popular demand.

Collaborative translations on Launchpad

You can use Launchpad to translate ocPortal into your language with the help of others.

Launchpad is great because:
  • You no longer need to feel that you are alone, translating everything yourself
  • It's very easy to work together. People can be translating the same language at the same time
  • Anyone can download the current set of translations at any time

The process is as follows:
  • Go to the Launchpad site.
  • Sign up
  • Log in
  • Set your languages
  • Start translating (the strings are split across about 60 files, often it works well to work with other people, each doing different files)
  • When you're ready, you can download the PO files. There is a link to do this on the page for the version you are translating (it'll archive all files for you and then e-mail you a download link). You will need to extract all the files into the same directory, your lang_custom/<language> directory (e.g. lang_custom/FR). You only need to extract the files relating to your language but it won't matter if you extract all languages as ocPortal can still find the right ones.
  • Convert the PO files to .ini files using Pootle; as of ocPortal 4.1.11 you will be able to use .po files directly. Just copy them to the usual language directory, e.g. lang_custom/XX, where XX is the two-letter code for the language. More details are in the Launchpad FAQ. You may need to create some extra language directories for Comcode pages and template caching if you are not on a SuExec server. Basically, anywhere ocPortal has an EN directory, create a directory named with your language pack's two-letter code too.
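
The last step (creating a directory wherever ocPortal has an EN directory) can be automated. This hypothetical Python sketch walks the installation and creates a sibling directory for the new language next to every EN directory. It copies the EN directory's permission bits across with `shutil.copymode`, on the assumption that the new directory needs the same write access. Check the results against your server's setup.

```python
import shutil
import tempfile
from pathlib import Path


def create_language_dirs(base: Path, lang: str, fallback: str = "EN") -> list:
    """Create <lang> next to every <fallback> directory under base; return the new ones."""
    created = []
    # sorted() materialises the walk first, so creating directories does not disturb it.
    for en_dir in sorted(base.rglob(fallback)):
        target = en_dir.with_name(lang)
        if en_dir.is_dir() and not target.exists():
            target.mkdir()
            shutil.copymode(en_dir, target)  # same permission bits as the EN sibling
            created.append(target)
    return created


# Example in a throwaway tree (a real run would use your ocPortal base directory):
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ("lang/EN", "pages/comcode/EN"):
        (root / sub).mkdir(parents=True)
    for path in create_language_dirs(root, "FR"):
        print(path.relative_to(root).as_posix())
# lang/FR
# pages/comcode/FR
```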

Note that some of the "English" will be written as "English: (English value). Explanation: (Explanation)". This is because Launchpad has no specific way to explain what strings are used for, so when importing we have put the explanation and original English together like this.
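
If you process exported strings yourself, the combined text can be split back apart. This Python sketch assumes the literal layout quoted above, 'English: ... Explanation: ...', without the parentheses. The function name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "English: <English value>. Explanation: <explanation>"
COMBINED = re.compile(r"English: (?P<english>.*?)\. Explanation: (?P<explanation>.*)", re.DOTALL)


def split_launchpad_source(text: str) -> Tuple[str, Optional[str]]:
    """Return (English value, explanation), or (text, None) if there is no explanation."""
    match = COMBINED.fullmatch(text)
    if match:
        return match.group("english"), match.group("explanation")
    return text, None


print(split_launchpad_source("English: Proceed. Explanation: Label for a button"))
# ('Proceed', 'Label for a button')
print(split_launchpad_source("Proceed"))  # ('Proceed', None)
```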

Need some help? Try the translation forum.

Turning on a different language

To change the language used on your site, use the http://yourbaseurl/config_editor.php script (load the URL, replacing yourbaseurl with your real base URL).

Alternatively, members can select their language by editing their member profile. This may be necessary anyway, as their profile might already be locked to English.

For testing, you can also append &keep_lang=FR to the URL (this example is for French). If the URL does not already contain a '?' symbol, append ?keep_lang=FR instead.
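
Adding the parameter correctly, whichever separator is needed, is a small URL manipulation. This Python sketch uses only the standard library's urllib.parse. The function name and example URLs are hypothetical. Re-encoding the query may also normalise how existing parameters are escaped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_keep_lang(url: str, lang: str) -> str:
    """Return url with keep_lang=<lang> added, using '?' or '&' as appropriate."""
    parts = urlsplit(url)
    # Drop any existing keep_lang so the parameter is not duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "keep_lang"]
    query.append(("keep_lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_keep_lang("http://example.com/index.php?page=start", "FR"))
# http://example.com/index.php?page=start&keep_lang=FR
print(with_keep_lang("http://example.com/index.php", "FR"))
# http://example.com/index.php?keep_lang=FR
```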

Concepts

language string
A piece of text, often a phrase, used by ocPortal; identified by a short code WRITTEN_LIKE_THIS
character set
A mapping from the values of a one-byte-per-character representation to actual characters. Because one byte can only distinguish 256 values, different character sets are used so that computers can show many different language scripts.

Is this tutorial insufficient?

If you think this tutorial needs work (maybe we didn't explain things well enough?) please let us know.