Welcome to the OvniConv project.

Presentation

This project develops OpenDocument Format tools that help convert files written in legacy Vietnamese encodings, such as TCVN 5712:1993, VNI and VPS, to TCVN 6909:2001 (Unicode).

Detailed explanation of the problem of displaying non-Unicode Vietnamese fonts

This page is now mainly of historical interest, since the project is hosted and developed here:

https://launchpad.net/ovniconv

There is also an OpenOffice.org extension:

http://extensions.services.openoffice.org/project/ovniconv

Second try: more proof of concept!

  • May 2008:
    • Nguyẽn Đình Trung contributed a way to automate the conversion from DOC to ODT ⇒ see Batch convert Doc to ODT
    • Added a new tool to convert Vietnamese TCVN-encoded fonts to Unicode ⇒ see the result (WARNING: EXPERIMENTAL)
    • Added a new option to ovniconv to keep Vietnamese fonts (in their Unicode flavor) when converting
    • Stéphane Masse contributed a graphical interface for ovniconv ⇒ see GUI for OvniConv

Experimentation

First try: proof of concept!

  • open an old TCVN encoded MS-Office .DOC file using OOo:
ooffice test-tcvn.doc
  • save the file in .ODT format as test.odt, then quit OOo
  • use unzip to extract the .ODT content.xml file
unzip test.odt content.xml
  • recode content.xml from UTF-8 to WINDOWS-1252, saving the result as content-tcvn.xml
iconv --from-code=UTF-8 --to-code=WINDOWS-1252 < content.xml > content-tcvn.xml
  • recode content-tcvn.xml from TCVN-5712 to UTF-8, overwriting content.xml
iconv --from-code=TCVN-5712 --to-code=UTF-8 < content-tcvn.xml > content.xml
  • use zip to put back content.xml in the .ODT file
zip test.odt content.xml
  • open the .ODT file using OOo
ooffice test.odt
  • It's all Unicode encoded! (but fonts are still declared as .vn* ones)
  • Note that there is still an issue with some special characters (such as double quotes), which get wrongly replaced with accented Vietnamese characters. This happens because the conversion is applied globally to the raw text, so it also converts strings that use fonts other than .vn*. The final tool will have to convert only the strings associated with a .vn* font.
  • Test file used: test-tcvn.doc
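
The manual steps above could be wrapped in a small script. The following is a minimal sketch, not the ovniconv tool itself: the function names recode_legacy_stream and recode_odt and the LEGACY_CHARSET variable are hypothetical, and it assumes an iconv that knows the TCVN-5712 charset name plus the zip, unzip and mktemp commands. Like the manual procedure, it recodes all of content.xml, so the special-character issue noted above still applies.

```shell
#!/bin/sh
# Sketch only: all names below are hypothetical, not part of ovniconv.

# Legacy charset name as understood by the local iconv (check `iconv -l`).
LEGACY_CHARSET="${LEGACY_CHARSET:-TCVN-5712}"

# Read UTF-8 text on stdin, write the re-decoded UTF-8 text on stdout.
# Step 1 turns each character back into a single WINDOWS-1252 byte
# (assumption: the .DOC import treated the legacy bytes as WINDOWS-1252,
# as the manual steps imply). Step 2 decodes those bytes as $LEGACY_CHARSET.
recode_legacy_stream() {
    bytes_file=$(mktemp) || return 1
    iconv -f UTF-8 -t WINDOWS-1252 > "$bytes_file" &&
        iconv -f "$LEGACY_CHARSET" -t UTF-8 < "$bytes_file"
    status=$?
    rm -f "$bytes_file"
    return "$status"
}

# Recode content.xml inside an .ODT file, modifying the archive in place.
recode_odt() {
    # Make the archive path absolute, since the work happens in a temp dir.
    case "$1" in
        /*) odt=$1 ;;
        *)  odt=$PWD/$1 ;;
    esac
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" &&
        unzip -q "$odt" content.xml &&
        recode_legacy_stream < content.xml > content-unicode.xml &&
        mv content-unicode.xml content.xml &&
        zip -q "$odt" content.xml
    )
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

Once these functions are loaded into a shell, running recode_odt test.odt would rewrite content.xml inside test.odt. It is safer to run it on a copy, since the archive is modified in place.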

About the HyphenationIssue

 
projects/ovniconv.txt · Last modified: 2008/10/12 21:11 by ict4ngo
 