[NZLUG] Character set translator passing illegal characters

Volker Kuhlmann hidden at paradise.net.nz
Fri Jun 28 10:53:47 NZST 2013


On Wed 26 Jun 2013 21:09:59 NZST +1200, Robin Sheat wrote:

> What are you converting? I often use Encoding::FixLatin in Perl for
> cleaning up Latin1 and making unicode out of it. This does a pretty good
> best-effort job, along with some regexes to clean up smart quotes and
> other oddball things that end up in text (I found VT100 control codes in
> one data set.)

The aim is to shift to utf8, but that creates all sorts of legacy
problems. The bottom line is that I want to be able to go back and
forth, but only for parts of text files while they are being edited,
not for the whole file at once. But let's concentrate not on the why
but on the what: convert to the target encoding only those parts not
already in it, without royally screwing up everything else (which is
what recode and iconv are doing).
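
A minimal sketch of that "convert only what is not already converted"
idea, in Python rather than Perl. It is not how Encoding::FixLatin is
implemented; the names latin1_fallback and mixed_to_utf8 are made up for
illustration. Assumption: any byte that is not part of a valid UTF-8
sequence is a Latin-1 character.

```python
import codecs


def _latin1_fallback(err):
    """Decode bytes that are not valid UTF-8 as Latin-1 instead of failing."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Latin-1 maps every byte 0x00-0xFF to the code point of the same value.
    return bad.decode("latin-1"), err.end


# Hypothetical handler name, registered once at import time.
codecs.register_error("latin1_fallback", _latin1_fallback)


def mixed_to_utf8(data: bytes) -> bytes:
    """Return data as UTF-8; parts that are already valid UTF-8 pass through."""
    return data.decode("utf-8", errors="latin1_fallback").encode("utf-8")
```

For example, mixed_to_utf8(b"caf\xe9 caf\xc3\xa9") leaves the second
word alone and re-encodes only the lone 0xE9 in the first. The known
weak spot is Latin-1 text that happens to form a valid UTF-8 sequence
(such as the two bytes 0xC3 0xA9): it is taken to be UTF-8 already.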

> http://search.cpan.org/dist/Encoding-FixLatin/lib/Encoding/FixLatin.pm

Very nice (once I manage to integrate it into the system), thank you.

Now for the other way round...

Perhaps creating a couple of sed replacement tables (or at least the
utf8->latin1 one) would be the best way after all.
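
For comparison, a rough sketch of the utf8->latin1 direction in Python
instead of a sed table. The function name mixed_to_latin1 is
hypothetical. Assumptions: bytes that are not valid UTF-8 are already
Latin-1 and are copied unchanged, and characters with no Latin-1
equivalent are left as their UTF-8 bytes rather than dropped.

```python
def mixed_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 sequences to Latin-1 where possible, leave the rest."""
    # surrogateescape turns each undecodable byte 0x80-0xFF into a lone
    # surrogate U+DC80-U+DCFF, so the original byte can be recovered below.
    text = data.decode("utf-8", errors="surrogateescape")
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFF:
            # Representable in Latin-1: one byte with the same value.
            out.append(cp)
        elif 0xDC80 <= cp <= 0xDCFF:
            # Byte that was not valid UTF-8: assumed Latin-1, copy it back.
            out.append(cp - 0xDC00)
        else:
            # No Latin-1 equivalent: keep the UTF-8 bytes untouched.
            out += ch.encode("utf-8")
    return bytes(out)
```

With this, mixed_to_latin1(b"caf\xc3\xa9 caf\xe9") turns the first word
into Latin-1 and passes the second through as it is.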

Thanks,

Volker

-- 
Volker Kuhlmann			is list0570 with the domain in header.
http://volker.dnsalias.net/	Please do not CC list postings to me.

