Bug report #5911

Language Driver ID in dbf file of new shapefile

Added by Minoru Akagi over 2 years ago. Updated over 1 year ago.

Status:Closed Start Date:06/30/2012
Priority:High Due date:
Assigned to:- % Done:0%

Category:Data Provider
Target version:-
Platform: Patch supplied:No
Platform version: Affected version:1.8.0
Status info: Causes crash or corruption:Yes
Resolution: Tag:

Description

A shapefile created with QGIS has the value 0x57 in the LDID field of its .dbf file, regardless of which encoding was selected in the dialog. The LDID/87 (0x57) value means ISO-8859-1, which is the default (see OGR driver: ESRI Shapefile). This causes character corruption in the attribute table.
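To check which LDID a given .dbf file actually carries, it is enough to look at one header byte. The Python sketch below assumes the standard dBASE header layout (as read by shapelib), in which the language driver ID is stored at byte offset 29 of the 32-byte header; the function names are hypothetical.

```python
# Minimal sketch: read the language driver ID (LDID) from a .dbf header.
# Assumption: standard dBASE layout, LDID stored at byte offset 29.
from pathlib import Path

LDID_OFFSET = 29
HEADER_SIZE = 32


def read_ldid(header: bytes) -> int:
    """Return the LDID byte from the first 32 bytes of a .dbf file."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a dBASE header is at least 32 bytes long")
    return header[LDID_OFFSET]


def ldid_of_file(dbf_path: Path) -> int:
    """Read the header of an existing .dbf file and return its LDID."""
    with open(dbf_path, "rb") as f:
        return read_ldid(f.read(HEADER_SIZE))


def describe(ldid: int) -> str:
    """Format the value in the LDID/<n> notation used in this report."""
    return "no LDID (0)" if ldid == 0 else f"LDID/{ldid} (0x{ldid:02X})"
```

For a file written by QGIS 1.8 as described above, `describe(ldid_of_file(Path("layer.dbf")))` would be expected to report `LDID/87 (0x57)`; `layer.dbf` is a placeholder name.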

Specifically, although createEmptyDataSource() receives the encoding as one of its parameters, it is not used when creating the shapefile.
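Because the selected encoding never reaches the file, the header keeps declaring ISO-8859-1 while the attribute bytes may be in a different encoding. The Python snippet below only illustrates the general effect of decoding bytes with the wrong codec; it does not reproduce the actual QGIS or OGR code path, and `misread` is a hypothetical helper.

```python
# Illustration: text written in one encoding but read back as another.
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(written_as).decode(read_as)


if __name__ == "__main__":
    # Czech text stored as cp1250 but interpreted as ISO-8859-1 (LDID/87).
    print(repr(misread("Žluťoučký", "cp1250", "iso-8859-1")))
    # Japanese text stored as Shift_JIS but interpreted as ISO-8859-1.
    print(repr(misread("日本", "shift_jis", "iso-8859-1")))
```

ISO-8859-1 assigns a character to every byte, so the wrong decoding never raises an error; it silently produces garbage, which matches the symptom in the attribute table.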

Although the LDID could be set to the codepage specified by the user, a dataset with zero in the LDID field and an accompanying .cpg file might be easier for users to handle. This point should be discussed.

Best regards.

qgsogrprovider3.patch - a patch for solution number 3 (1 kB) Minoru Akagi, 08/01/2012 11:04 pm

encodingtest.zip (6.2 kB) Marco Lechner, 08/16/2012 07:39 am

japan_poly.zip - test data (Japanese main islands). (30.5 kB) Minoru Akagi, 04/12/2013 04:19 am

shp-encoding-problem-cp1250.zip (237.5 kB) Ivan Mincik, 04/18/2013 02:02 am


Related issues

related to QGIS Application - Bug report #5900: QGIS 1.8.0 windows standalone ships with GDAL version tha... Rejected 06/29/2012
related to QGIS Application - Bug report #5255: Wrong codepage of shapefile Closed 03/29/2012
related to QGIS Application - Bug report #4343: Shapefile, created in Qgis, encoding not recognized by Es... Closed 10/03/2011
related to QGIS Application - Bug report #5622: layer properties, general, provider-specific options, enc... Closed 05/20/2012
related to QGIS Application - Bug report #5508: DBF encoding and cyrillic values Closed 04/26/2012
related to QGIS Application - Bug report #6057: QGIS 1.8 Encoding problem with Bulgarian characters CP1251 Closed 07/17/2012
related to QGIS Application - Bug report #5927: ESRI shapefile encoding problem Closed 07/02/2012
related to QGIS Application - Bug report #5982: vector layer encoding default not saved not configurable Open 07/09/2012
related to QGIS Application - Bug report #5340: QGIS loses non-latin letters in new shapefiles Closed 04/11/2012
duplicated by QGIS Application - Bug report #6500: Language Encoding very broken in 1.8 Lisboa Closed 10/11/2012

Associated revisions

Revision 75dc85b4d652116814873bb7674cab15ce6cde66
Added by Jürgen Fischer about 2 years ago

allow to ignore (OGR's interpretation of ) shape file encoding (might fix #5911)

Revision 7fb46498c9fb3c14a2d0b0fcc8e634dba2f1cade
Added by Jürgen Fischer over 1 year ago

also optionally apply SHAPE_ENCODING to layer creation (fixes #5911)
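GDAL's Shapefile driver documents a `SHAPE_ENCODING` configuration option; setting it to an empty string disables recoding of attribute strings. The revisions above apply that idea inside QGIS. The Python sketch below shows the same idea from a script, scoped to one block of code. It assumes GDAL falls back to process environment variables for configuration options that were not set explicitly, and that the driver consults the option when a layer is opened or created; the `shape_encoding` helper is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def shape_encoding(value: str = ""):
    """Temporarily set SHAPE_ENCODING in the process environment.

    An empty string asks the Shapefile driver not to recode strings.
    The previous value (or its absence) is restored on exit.
    """
    previous = os.environ.get("SHAPE_ENCODING")
    os.environ["SHAPE_ENCODING"] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop("SHAPE_ENCODING", None)
        else:
            os.environ["SHAPE_ENCODING"] = previous
```

With the GDAL Python bindings, `gdal.SetConfigOption("SHAPE_ENCODING", "")` sets the option directly instead of going through the environment.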

History

Updated by Jürgen Fischer over 2 years ago

Mapping between LDID values and codepages (LDID/87 means ISO-8859-1): http://trac.osgeo.org/gdal/browser/branches/1.9/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L170
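A lookup in the spirit of that mapping might look like the Python sketch below. Only the 87 → ISO-8859-1 entry is taken from this report; the other entries are an illustrative subset assumed from the commonly published dBASE language driver table, so the GDAL source linked above remains the authoritative reference.

```python
from typing import Optional

# Illustrative subset. 0x57 (87) -> ISO-8859-1 comes from this report;
# the other entries are assumptions taken from the usual dBASE LDID table.
LDID_TO_ENCODING = {
    0x01: "CP437",       # US MS-DOS
    0x02: "CP850",       # International MS-DOS
    0x03: "CP1252",      # Windows ANSI
    0x57: "ISO-8859-1",  # LDID/87, the value QGIS 1.8 always wrote
    0xC8: "CP1250",      # Eastern European Windows
    0xC9: "CP1251",      # Russian Windows
}


def encoding_for_ldid(ldid: int) -> Optional[str]:
    """Return the encoding implied by an LDID, or None.

    None covers both LDID 0 (no encoding declared) and values that are
    missing from this partial table.
    """
    if ldid == 0:
        return None
    return LDID_TO_ENCODING.get(ldid)
```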

Updated by Minoru Akagi about 2 years ago

I see three possible solutions:
1. Create a mapping that converts the MIBenum to the text of a .cpg file (or an LDID value). QTextCodec::codecForName(encoding_name)->mibEnum() gives the MIBenum.

  • The encodings in the listbox of QgsEncodingFileDialog are those that QTextCodec supports.
  • The mapping between MIBenum and character set name is at http://www.iana.org/assignments/character-sets
  • Some encodings supported by QTextCodec are not supported by Shapefile.
2. Restrict the encodings in the listbox of QgsEncodingFileDialog to those supported by Shapefile.

3. Generate a Shapefile dataset that has zero in the LDID field and no .cpg file, regardless of the selected encoding. QGIS then opens the dataset with the specified encoding.

I think number 3 is the easiest one.
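Solution 1 boils down to a lookup from MIBenum to whatever a Shapefile can declare. The Python sketch below uses IANA MIBenum numbers (the values `QTextCodec::mibEnum()` returns); the .cpg strings on the right-hand side are hypothetical choices, and returning `None` models the third bullet, an encoding that QTextCodec knows but Shapefile cannot declare.

```python
from typing import Optional

# IANA MIBenum -> text for a .cpg file. The MIBenum keys follow the IANA
# character-sets registry; the .cpg strings are illustrative assumptions.
MIB_TO_CPG = {
    4: "ISO-8859-1",    # ISO-8859-1
    106: "UTF-8",       # UTF-8
    2250: "1250",       # windows-1250
    2251: "1251",       # windows-1251
    2252: "1252",       # windows-1252
}


def mib_to_cpg(mib: int) -> Optional[str]:
    """Return .cpg text for a MIBenum, or None if it cannot be declared."""
    return MIB_TO_CPG.get(mib)
```

In the C++ code the key would come from `QTextCodec::codecForName(encoding_name)->mibEnum()`, after checking that `codecForName` did not return a null pointer for an unknown name.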

Updated by Minoru Akagi about 2 years ago

I attach a patch for solution number 3.

Updated by Minoru Akagi about 2 years ago

If the default LDID used by the OGR Shapefile driver on dataset creation were changed to zero, the encoding problem of shapefiles generated via the "Save vector layer as" dialog would be solved as well, as would that of shapefiles generated by some plugins (e.g. fTools).

Updated by Marco Lechner about 2 years ago

I think this should be high priority, because all shapefiles without an LDID or a .cpg file (which are surely most of the shapefiles out there) are forced to Latin-1. The encoding the user selects when loading a layer should always override the default. Otherwise users cannot understand why a shapefile's attribute table is always displayed incorrectly, whether or not they try to set the encoding. This breaks the behavior of QGIS that users are used to.

I have attached some shapefiles and .qgs project files for testing.

By the way, this depends on GDAL version 1.9.x.
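The precedence argued for in this comment can be written down as a small decision function. The Python sketch below is hypothetical: it assumes a user-selected encoding should win over anything the file declares (.cpg or LDID), and that ISO-8859-1 is used only as a last resort.

```python
from typing import Optional


def effective_encoding(user_choice: Optional[str],
                       declared: Optional[str],
                       default: str = "ISO-8859-1") -> str:
    """Pick the encoding used to decode attribute strings.

    Order: an explicit user choice, then what the file declares,
    then the default.
    """
    if user_choice:
        return user_choice
    if declared:
        return declared
    return default
```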

Updated by Jürgen Fischer about 2 years ago

  • Status changed from Open to Closed

Updated by Minoru Akagi about 2 years ago

Jef's fix works well for reading and creating shapefiles, so the issue in this ticket has been fixed. However, I've found that an encoding problem still exists for shapefiles generated via the "Save vector layer as" dialog or fTools: the OGR Shapefile driver converts the character encoding from UTF-8 to ISO-8859-1 and occasionally garbles attribute strings.
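Converting to ISO-8859-1 is lossy for any character outside Latin-1, which is why attribute strings can be garbled. The Python sketch below only demonstrates that loss, using `?` as the replacement character; OGR's own recoding may handle unmappable characters differently, and `through_latin1` is a hypothetical helper.

```python
def through_latin1(text: str) -> str:
    """Round-trip text through ISO-8859-1, replacing unmappable characters."""
    return text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")


if __name__ == "__main__":
    print(through_latin1("café"))       # Latin-1 characters survive
    print(through_latin1("Žluťoučký"))  # Ž, ť and č are lost
    print(through_latin1("日本"))        # every character is lost
```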

See also GDAL #4808

Updated by Minoru Akagi about 2 years ago

Sorry,

Testing it again today, I don't see any character corruption in shapefiles generated via either "Save vector layer as" or fTools. Maybe I forgot to check the option. The fix is very nice!

Updated by Minoru Akagi over 1 year ago

  • Status changed from Closed to Reopened

I've noticed that the garbling occurs when saving a SpatiaLite/PostGIS layer to a shapefile. With the fix above, if the option "Ignore shapefile encoding" is checked, the OGR Shapefile driver's encoding conversion is disabled only when an OGR layer is loaded. Saving after editing a newly created layer works, because on layer creation QGIS generates an empty layer and then loads it. However, in the particular case I encountered, no OGR layer had been loaded, so the output was garbled.

Updated by Jürgen Fischer over 1 year ago

  • Status changed from Reopened to Closed

Updated by Minoru Akagi over 1 year ago

Thank you very much.

Updated by Minoru Akagi over 1 year ago

Updated by Ivan Mincik over 1 year ago

I am attaching a test shapefile in cp1250 and the same data in UTF-8 for comparison. Both were made by QGIS 1.8 compiled with GDAL 1.7 (on Debian Squeeze).
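The same character is stored differently in the two attached files; for example `č` is the single byte 0xE8 in cp1250 and the two bytes 0xC4 0x8D in UTF-8. A hypothetical heuristic for telling such data apart is to try strict UTF-8 first and fall back to cp1250, as in the Python sketch below. It is only a guess: short cp1250 strings can happen to be valid UTF-8, so a .cpg file or an explicit user choice is more reliable.

```python
def decode_attribute(raw: bytes, fallback: str = "cp1250") -> str:
    """Decode a .dbf attribute value, trying strict UTF-8 first.

    Heuristic only; bytes that are not valid UTF-8 are decoded with the
    fallback codepage, replacing any bytes it does not define.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")
```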

Updated by Minoru Akagi over 1 year ago

In master, when a shapefile is created, the LDID is set to zero and a .cpg file is written, except when the "System" encoding is selected. Thank you, Borys!
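The behavior described for master can be sketched as two steps: clear the LDID byte and, unless the encoding is "System", write a .cpg file next to the .dbf. The Python sketch below assumes the LDID sits at byte offset 29 of the dBASE header and that the .cpg file simply contains the encoding name; both helper names are hypothetical and this is not the QGIS C++ code.

```python
from pathlib import Path
from typing import Optional, Tuple

LDID_OFFSET = 29   # assumed position of the LDID byte in the dBASE header
HEADER_SIZE = 32


def encoding_policy(header: bytes, encoding: str) -> Tuple[bytes, Optional[str]]:
    """Return (header with LDID cleared, .cpg text or None for "System")."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a dBASE header is at least 32 bytes long")
    patched = bytearray(header)
    patched[LDID_OFFSET] = 0
    cpg = None if encoding == "System" else encoding
    return bytes(patched), cpg


def apply_to_files(dbf_path: Path, encoding: str) -> None:
    """Patch an existing .dbf in place and write its .cpg sidecar if needed."""
    with open(dbf_path, "r+b") as f:
        new_header, cpg = encoding_policy(f.read(HEADER_SIZE), encoding)
        f.seek(0)
        f.write(new_header)
    if cpg is not None:
        dbf_path.with_suffix(".cpg").write_text(cpg, encoding="ascii")
```

Keeping the header logic in a pure function makes the policy easy to check without touching real files.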
