Bug #5911
Language Driver ID in dbf file of new shapefile
| Status: | Closed | Start Date: | 06/30/2012 |
|---|---|---|---|
| Priority: | High | Due date: | |
| Assigned to: | - | % Done: | 0% |
| Category: | Data Provider | | |
| Target version: | - | | |
| Platform: | | Patch supplied: | No |
| Platform version: | | Affected version: | 1.8.0 |
| Status info: | | Causes crash or corruption: | Yes |
| Resolution: | | | |
Description
A shapefile created with QGIS has the value 0x57 in the LDID field of its dbf file, regardless of which encoding was selected in the dialog. LDID/87 (0x57) means ISO-8859-1, which is the default. See OGR driver: ESRI Shapefile. This issue causes character corruption in the attribute table.
In detail, although createEmptyDataSource() receives the encoding as one of its parameters, it does not use it when creating the shapefile.
Although the LDID could be set to the codepage specified by the user, a generated dataset with zero in the LDID field and an accompanying .cpg file might be easier for users to handle. This point should be discussed.
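To check the LDID value of an existing file, something like the following minimal sketch could be used. It assumes the usual dBASE header layout, in which the LDID is the single byte at offset 29 of the 32-byte file header. The function name `read_ldid` is made up for illustration and is not part of QGIS or OGR.

```python
LDID_OFFSET = 29  # assumed position of the language driver ID in the dBASE header


def read_ldid(dbf_path):
    """Return the LDID byte of a .dbf file as an integer (0 means "not set")."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to contain a dBASE header: %s" % dbf_path)
    return header[LDID_OFFSET]
```

For a shapefile affected by this bug, `hex(read_ldid("layer.dbf"))` would be expected to give `0x57`, whatever encoding was chosen in the dialog (`layer.dbf` is a placeholder path).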
Best regards.
Related issues
| related to Quantum GIS Desktop - Bug #5900: QGIS 1.8.0 windows standalone ships with GDAL version tha... | Rejected | 06/29/2012 | ||
| related to Quantum GIS Desktop - Bug #5255: Wrong codepage of shapefile | Closed | 03/29/2012 | ||
| related to Quantum GIS Desktop - Bug #4343: Shapefile, created in Qgis, encoding not recognized by Es... | Closed | 10/03/2011 | ||
| related to Quantum GIS Desktop - Bug #5622: layer properties, general, provider-specific options, enc... | Closed | 05/20/2012 | ||
| related to Quantum GIS Desktop - Bug #5508: DBF encoding and cyrillic values | Closed | 04/26/2012 | ||
| related to Quantum GIS Desktop - Bug #6057: QGIS 1.8 Encoding problem with Bulgarian characters CP1251 | Closed | 07/17/2012 | ||
| related to Quantum GIS Desktop - Bug #5927: ESRI shapefile encoding problem | Closed | 07/02/2012 | ||
| related to Quantum GIS Desktop - Bug #5982: vector layer encoding default not saved not configurable | New | 07/09/2012 | ||
| related to Quantum GIS Desktop - Bug #5340: QGIS loses non-latin letters in new shapefiles | Closed | 04/11/2012 | ||
| duplicated by Quantum GIS Desktop - Bug #6500: Language Encoding very broken in 1.8 Lisboa | Closed | 10/11/2012 |
Associated revisions
Revision 75dc85b4d652116814873bb7674cab15ce6cde66
allow to ignore (OGR's interpretation of ) shape file encoding (might fix #5911)
Revision 7fb46498c9fb3c14a2d0b0fcc8e634dba2f1cade
also optionally apply SHAPE_ENCODING to layer creation (fixes #5911)
History
Updated by Jürgen Fischer 11 months ago
Mapping between LDID values and codepages (LDID/87 means ISO-8859-1): http://trac.osgeo.org/gdal/browser/branches/1.9/gdal/ogr/ogrsf_frmts/shape/ogrshapelayer.cpp#L170
Updated by Minoru Akagi 10 months ago
I propose three solutions:
1. Create a mapping that converts the MIBenum to the text of the cpg file (or to the LDID value). QTextCodec::codecForName(encoding_name)->mibEnum() gives the MIBenum.
- The encodings in the listbox of QgsEncodingFileDialog are those that QTextCodec supports.
- Mapping between MIBenum and character set name is at http://www.iana.org/assignments/character-sets
- Some encodings supported by QTextCodec are not supported by Shapefile.
- Shapefile supports code page values in http://resources.arcgis.com/fr/content/kbase?fa=articleShow&d=21106
- There are a few plug-ins which use QgsEncodingFileDialog.
- Some other dialogs (such as "Save Vector Layer As" dialog) also have the encoding listbox.
3. Generate a Shapefile dataset that has zero in LDID field and no cpg file regardless of the selected encoding. Then QGIS opens the dataset with the encoding specified.
I guess number 3 is the easiest one.
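The effect of solution 3 on the dbf file can be sketched outside QGIS as follows. This is only an illustration that patches an existing file after the fact, not the change made in the provider. It assumes the LDID is the byte at offset 29 of the dBASE header, and `clear_ldid` is a hypothetical helper name.

```python
LDID_OFFSET = 29  # assumed position of the language driver ID in the dBASE header


def clear_ldid(dbf_path):
    """Set the LDID byte of a .dbf file to zero and return its previous value."""
    with open(dbf_path, "r+b") as f:
        f.seek(LDID_OFFSET)
        old = f.read(1)
        if len(old) != 1:
            raise ValueError("file too short to contain an LDID byte: %s" % dbf_path)
        f.seek(LDID_OFFSET)
        f.write(b"\x00")  # zero means no codepage is declared in the file
    return old[0]
```

With the LDID at zero and no .cpg file present, the file itself declares no encoding, so the application has to apply the encoding the user selected.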
Updated by Minoru Akagi 10 months ago
- File qgsogrprovider3.patch added
I attach a patch for solution number 3.
Updated by Minoru Akagi 10 months ago
If the default LDID used by the OGR Shapefile driver on dataset creation were changed to zero, the encoding problem with shapefiles generated via the "Save vector layer as" dialog would be solved as well, as would that of shapefiles generated by some plug-ins (e.g. fTools).
Updated by Marco Lechner 9 months ago
- File encodingtest.zip added
- Priority changed from Normal to High
I think this should be high priority, because all shapefiles that do not have the LDID set or do not have a .cpg file (which is surely most shapefiles out there) are forced to latin1. The user's choice of encoding when loading a layer should always override the default. Otherwise it is impossible to understand why a shapefile's attribute table is always displayed wrong, whether or not the user tries to define the encoding. This breaks the behavior of QGIS as users know it.
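The symptom can be reproduced in a few lines. This is a small sketch using CP1251 Cyrillic text as an assumed example; any non-latin1 encoding behaves similarly.

```python
# Cyrillic text stored in the .dbf as CP1251 bytes (the writer's real encoding).
stored = "Привет".encode("cp1251")

# What the attribute table shows when the bytes are forced through latin1.
forced_latin1 = stored.decode("iso-8859-1")  # 'Ïðèâåò'

# What it shows when the encoding chosen by the user is honoured.
user_choice = stored.decode("cp1251")  # 'Привет'
```

Decoding as latin1 never raises an error, because every byte value maps to some character, so the corruption is silent.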
I am attaching some shapefiles and .qgs files for testing.
By the way, this depends on GDAL version 1.9.x.
Updated by Jürgen Fischer 9 months ago
- Status changed from New to Closed
Fixed in changeset 75dc85b4d652116814873bb7674cab15ce6cde66.
Updated by Minoru Akagi 9 months ago
Jef's fix works well for reading shapefiles and creating new ones, so the issue in this ticket has been fixed. However, I have found that an encoding problem still exists for shapefiles generated via the "Save vector layer as" dialog or fTools: the OGR Shapefile driver converts the character encoding from UTF-8 to ISO-8859-1 and in rare cases garbles attribute strings.
See also GDAL #4808
Updated by Minoru Akagi 9 months ago
Sorry,
Testing it again today, I no longer see any character corruption in shapefiles generated via either "Save vector layer as" or fTools. Maybe I had forgotten to check the option. The fix is very nice!
Updated by Minoru Akagi 3 months ago
- Status changed from Closed to Reopened
I've noticed that the garbling occurs when saving a Spatialite/PostGIS layer to a shapefile. The above fix means that if the option "Ignore shapefile encoding" is checked, the OGR Shapefile driver's encoding conversion is disabled when an OGR layer is loaded. Saving after editing a newly created layer causes no problem, because on layer creation QGIS generates an empty layer and then loads it. However, in the particular case I encountered, where no OGR layer has been loaded, the output is garbled.
Updated by Jürgen Fischer 3 months ago
- Status changed from Reopened to Closed
Fixed in changeset 7fb46498c9fb3c14a2d0b0fcc8e634dba2f1cade.
Updated by Minoru Akagi 3 months ago
Thank you very much.
Updated by Minoru Akagi about 1 month ago
- File japan_poly.zip added
Updated by Ivan Mincik about 1 month ago
- File shp-encoding-problem-cp1250.zip added
I am attaching a test shapefile in cp1250 and the same one in utf-8 for comparison. Both were made by QGIS 1.8 compiled with GDAL 1.7 (in Debian Squeeze).
Updated by Jürgen Fischer about 1 month ago
Updated by Minoru Akagi 27 days ago
In master, when a shapefile is created the LDID is set to zero and a .cpg file is added, except when the encoding is "System". Thank you Borys!
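The .cpg part of that behavior can be sketched roughly as below. This is an assumption-laden illustration, not the code in master: `write_cpg` is a hypothetical helper, and it writes the encoding name exactly as passed in, while the names that other software accepts in a .cpg file may differ.

```python
import os


def write_cpg(shp_path, encoding_name):
    """Write a .cpg sidecar next to shp_path, unless the encoding is "System".

    Returns the path of the .cpg file, or None if nothing was written.
    """
    if encoding_name == "System":
        return None  # leave the dataset without a declared encoding
    cpg_path = os.path.splitext(shp_path)[0] + ".cpg"
    with open(cpg_path, "w", encoding="ascii") as f:
        f.write(encoding_name)
    return cpg_path
```

For example, `write_cpg("roads.shp", "UTF-8")` would create `roads.cpg` containing the text `UTF-8` (`roads.shp` is a placeholder name).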