Bug #5255

Wrong codepage of shapefile

Added by Stanislaw Kapustka about 1 year ago. Updated 11 months ago.

Status: Closed
Start date: 03/29/2012
Priority: Normal
Due date:
Assigned to: -
% Done: 0%
Category: -
Target version: -
Platform:
Patch supplied: No
Platform version:
Affected version: 1.7.4
Status info:
Causes crash or corruption: No
Resolution: upstream

Description

When opening shapefiles, it doesn't matter which codepage you choose: in QGIS 1.7.4 it is always UTF-8, so Polish letters are displayed incorrectly (when the shapefile was saved in a codepage other than UTF-8, of course). Other encodings are on the list, but selecting them has no effect. In QGIS 1.7.3 it works perfectly. The same problem exists in the master version.
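A minimal sketch of the symptom, using only the Python standard library. The sample word and the choice of CP1250 as the file's codepage are assumptions for illustration; the report does not say which codepage the affected shapefiles used.

```python
# Attribute text as it would be stored in a DBF written in CP1250
# (assumed codepage, chosen only because it covers Polish letters).
polish = "Łódź"
stored = polish.encode("cp1250")

# Reading the bytes as UTF-8, regardless of the codepage selected,
# cannot recover the Polish letters: invalid sequences become U+FFFD.
shown = stored.decode("utf-8", errors="replace")

# Reading them with the codepage they were written in works.
recovered = stored.decode("cp1250")

print(repr(shown))
print(recovered)
```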

chinese.zip (555 Bytes) rouault -, 06/10/2012 09:06 am


Related issues

related to Quantum GIS Desktop - Bug #5340: QGIS loses non-latin letters in new shapefiles Closed 04/11/2012
related to Quantum GIS Desktop - Bug #5900: QGIS 1.8.0 windows standalone ships with GDAL version tha... Rejected 06/29/2012
related to Quantum GIS Desktop - Bug #5911: Language Driver ID in dbf file of new shapefile Closed 06/30/2012

History

Updated by Alexander Bruy about 1 year ago

This is because 1.7.4 and master are now compiled against GDAL 1.9.0.

Updated by zirneklitis - about 1 year ago

When the *.dbf file is re-saved with OpenOffice Calc, QGIS shows the correct characters with any given code page, but only until edits are saved within QGIS. After that, question marks are stored in place of all non-Latin characters. It is impossible to switch the code page for any shape file created by QGIS.

Updated by Giovanni Manghi about 1 year ago

zirneklitis - wrote:

It's impossible to switch the code page for any shape files created by QGIS.

This is not a QGIS fault, it is a GDAL one. See:

http://ssrebelious.wordpress.com/2012/03/11/qgis-and-gdal1-9-encoding-issue-a-workaround/

This is why 1.7.3 works: it is compiled against an older release of GDAL.

Updated by Alexander Bruy about 1 year ago

The bug in GDAL is already fixed, see http://trac.osgeo.org/gdal/ticket/4650

Updated by Giovanni Manghi about 1 year ago

  • Status changed from New to Closed
  • Resolution set to upstream

Updated by zirneklitis - about 1 year ago

Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

Updated by Alexander Bruy about 1 year ago

  • Status changed from Closed to Reopened

You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.

Updated by Giovanni Manghi 12 months ago

  • Status changed from Reopened to Closed

zirneklitis - wrote:

Recompiled GDAL and QGIS:

QGIS version: 1.8.0-Lisboa, QGIS code revision: a1255fc, Compiled against GDAL/OGR: 2.0dev, Running against GDAL/OGR: 2.0dev.

Nothing has changed. The problem still remains.

OS: Fedora 14 x64.

Still a GDAL issue, not a QGIS one.

Updated by zirneklitis - 12 months ago

I insist that this is a QGIS issue.

GDAL 1.9.0 (and newer) is trying to interpret the encoding setting from the shape file itself. When creating a new shape file “ENCODING” should be passed as an attribute, which, obviously, is not done.
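A minimal sketch of where that per-file setting lives, assuming the standard dBASE header layout in which byte 29 holds the language driver ID (LDID). The mapping table contains only the two IDs mentioned in this thread (77 for CP936, 19 for CP932), the helper names are hypothetical, and none of this is GDAL's own code:

```python
# Only the LDID values named in this thread; every other value is unmapped.
LDID_TO_CODEC = {77: "cp936", 19: "cp932"}

def read_ldid(header):
    """Return the language driver ID from the first 32 bytes of a .dbf file."""
    if len(header) < 32:
        raise ValueError("a .dbf header is at least 32 bytes long")
    return header[29]

def codec_for_header(header):
    """Return a Python codec name, or None when the encoding is unknown."""
    ldid = read_ldid(header)
    if ldid == 0:
        # No encoding byte set, the common case described in this thread.
        return None
    return LDID_TO_CODEC.get(ldid)

def fake_header(ldid):
    """Build a synthetic 32-byte header for demonstration purposes."""
    header = bytearray(32)
    header[0] = 0x03   # dBASE III version byte
    header[29] = ldid  # language driver ID
    return bytes(header)

print(codec_for_header(fake_header(77)))
```

For a real file, the header would come from `open(path, "rb").read(32)`.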

Calling qgis from a terminal makes it possible to track down the warning messages. Saving non-Latin characters in a shape file generates the following warning message: “Warning 1: One or several characters couldn't be converted correctly from UTF-8 to ISO-8859-1.
This warning will not be emitted anymore”.
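The effect behind that warning can be reproduced with a lossy encode in plain Python. This is only an analogy under the assumption that unconvertible characters are substituted on write; it does not show GDAL's actual recoding routine, and the sample word is arbitrary.

```python
text = "Łódź"

# ISO-8859-1 contains "ó" but neither "Ł" nor "ź", so a lossy
# conversion replaces those two characters with question marks.
written = text.encode("iso-8859-1", errors="replace")
read_back = written.decode("iso-8859-1")

print(read_back)
```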

On the other hand, most shape files in use carry no character encoding byte, so QGIS has to rely on the environment variable “SHAPE_ENCODING”. At present the only solution is to use the same character encoding for the whole QGIS session, e.g.:

SHAPE_ENCODING=UTF-8 export SHAPE_ENCODING qgis

The example above allows creating and editing shape files with UTF-8 as the character encoding (the example is for Linux users; Windows users must use “SET SHAPE_ENCODING=UTF-8”).

------------------------------------------------
Excerpt from

http://trac.osgeo.org/gdal/wiki/ConfigOptions

In C/C++ configuration switches can be set programmatically like this:

#include "cpl_conv.h"
...
CPLSetConfigOption( "GDAL_CACHEMAX", "64" );

Normally a configuration option applies to all threads active in a program, but they can be limited to only the current thread this way:

CPLSetThreadLocalConfigOption( "GDAL_CACHEMAX", "64" );

Updated by zirneklitis - 12 months ago

The Linux example above should be as follows:

$ SHAPE_ENCODING=UTF-8
$ export SHAPE_ENCODING
$ qgis
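The same session setup can be sketched in Python, which avoids the shell differences between Linux and Windows. The function names are hypothetical, and the executable name `qgis` is assumed to be on the PATH as in the commands above.

```python
import os
import subprocess

def shape_encoding_env(encoding="UTF-8", base_env=None):
    """Return a copy of the environment with SHAPE_ENCODING set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SHAPE_ENCODING"] = encoding
    return env

def launch_qgis(encoding="UTF-8", executable="qgis"):
    """Start QGIS with SHAPE_ENCODING set for that session only."""
    return subprocess.Popen([executable], env=shape_encoding_env(encoding))

# Building the environment has no side effects; launch_qgis() is not called here.
env = shape_encoding_env("UTF-8", {"PATH": "/usr/bin"})
print(env["SHAPE_ENCODING"])
```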

Updated by Alexander Bruy 12 months ago

zirneklitis - wrote:

I insist that this is a QGIS issue.

This is a GDAL issue. GDAL always reports that the attributes it returns are UTF-8, even when they have a different encoding. The SHAPE_ENCODING environment variable did not work in most cases. This bug was partially fixed (see http://trac.osgeo.org/gdal/ticket/4650), but more fixes are needed.

Updated by Jürgen Fischer 12 months ago

Alexander Bruy wrote:

You can try the custom QGIS build from NextGIS (http://nextgis.ru/en/nextgis-qgis/), where this issue is solved.

how?

Updated by Alexander Bruy 12 months ago

Jürgen Fischer wrote:

how?

This is only a workaround, not a real fix. We simply reverted some parts of 2d0edcd7a2 (related to OLCStringsAsUTF8). With GDAL 2.0, in most cases everything works fine without this workaround, and we are working on a final fix for GDAL.

Updated by rouault - 12 months ago

Note that I've just pushed additional fixes in GDAL (see http://trac.osgeo.org/gdal/ticket/4650) that should make OLCStringsAsUTF8 more reliable.

Updated by Tim Sutton 12 months ago

Hi

Could you please provide a free, minimal test dataset so that we can add a test to our test suite, along with an idea of how we can evaluate the test as passing?

Updated by rouault - 12 months ago

I'm attaching a small shapefile generated by the following OGR Python script (needs latest GDAL trunk, to support recoding of field name from UTF-8 to CP936 - reading should be OK with GDAL 1.9)

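import struct
from osgeo import ogr

# Create a DBF-only datasource with the Shapefile driver.
ds = ogr.GetDriverByName('ESRI Shapefile').CreateDataSource('chinese.dbf')
# LDID/77 marks the DBF as CP936, so UTF-8 input is recoded to CP936 on write.
lyr = ds.CreateLayer('chinese', options = ['ENCODING=LDID/77'])
# Six UTF-8 bytes encoding two Chinese characters (U+540D, U+79F0).
# Written for Python 2, where struct.pack returns a str;
# on Python 3 it returns bytes, so append .decode('utf-8').
chinese_str = struct.pack('B' * 6, 229, 144, 141, 231, 167, 176)
# Use the same string as both the field name and the field value.
lyr.CreateField(ogr.FieldDefn(chinese_str, ogr.OFTString))
feat = ogr.Feature(lyr.GetLayerDefn())
feat.SetField(0, chinese_str)
lyr.CreateFeature(feat)
# Dereference the datasource to flush and close the file.
ds = None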
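As a side check that needs no GDAL, the six bytes used in the script can be decoded and recoded with the Python standard library. This only illustrates what recoding from UTF-8 to CP936 means for this string; it does not exercise the shapefile driver.

```python
# The same byte values as in the script above.
raw = bytes([229, 144, 141, 231, 167, 176])

# Interpreted as UTF-8 they form two Chinese characters.
text = raw.decode("utf-8")

# CP936 stores each of these characters in two bytes instead of three.
cp936_bytes = text.encode("cp936")

print(text, len(raw), len(cp936_bytes))
```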

Updated by zirneklitis - 11 months ago

Who should create the .cpg files: GDAL or QGIS? A shape file with a *.cpg file present works as expected (partly, since QGIS has no idea that this file exists). The attribute values are no longer garbled. More about *.cpg files:

http://support.esri.com/en/knowledgebase/techarticles/detail/21106
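A minimal sketch of writing and reading such a sidecar file, assuming a .cpg file is a small text file next to the .shp that holds just the encoding name. The helper names and the file name `places.shp` are hypothetical, and neither GDAL nor QGIS is involved.

```python
import os
import tempfile

def cpg_path_for(shp_path):
    """Return the .cpg path that sits next to the given .shp path."""
    return os.path.splitext(shp_path)[0] + ".cpg"

def write_cpg(shp_path, encoding):
    """Write the encoding name into the sidecar .cpg file."""
    path = cpg_path_for(shp_path)
    with open(path, "w", encoding="ascii") as f:
        f.write(encoding)
    return path

def read_cpg(shp_path):
    """Return the declared encoding, or None if no .cpg file exists."""
    path = cpg_path_for(shp_path)
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="ascii") as f:
        return f.read().strip()

def demo():
    """Round-trip an encoding name through a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shp = os.path.join(tmp, "places.shp")
        before = read_cpg(shp)
        write_cpg(shp, "UTF-8")
        return before, read_cpg(shp)

print(demo())
```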

Updated by Minoru Akagi 11 months ago

I installed GDAL 1.9.1 by using OSGeo4W.

When I use ogr2ogr to convert a Shapefile dataset whose dbf file has the value "19" (meaning "CP932") in its LDID field to KML format, the following message is shown.

Warning1: Recode from CP932 to UTF-8 not supported, treated as ISO8859-1 to UTF-8

The Japanese characters in the generated KML file are incorrect. This will also result in character corruption in QGIS.

I think that recoding in GDAL with the iconv library is not enabled at present.
For testing, I built GDAL 1.9.1 with the HAVE_ICONV constant defined and linked against the iconv library.
With the ogr2ogr I built, the warning does not appear and a KML file with readable Japanese characters is generated.

As a Japanese user of this great software, I would like QGIS to use a GDAL build linked with the iconv library.

Updated by Minoru Akagi 11 months ago

I've also reported this recoding issue to OSGeo4W Trac.
http://trac.osgeo.org/osgeo4w/ticket/294

Updated by Minoru Akagi 11 months ago

Sorry, I noticed that the problem I had has already been solved in the latest GDAL trunk. There is no problem converting CP932 to UTF-8.
