Bug report #4091

Create layer from delimited text (csv) does not work properly for quoted strings

Added by springmeyer - over 5 years ago. Updated almost 4 years ago.

Status: Closed
Start Date: 07/18/2011
Priority: Low
Due date:
Assigned to: -
% Done: 50%
Category: C++ Plugins
Target version: Version 2.0.0
Platform:
Pull Request or Patch supplied: No
Platform version:
Affected version: master
Status info:
Causes crash or corruption: No
Resolution:
Tag:

Description

If you import a CSV whose values contain commas, using 'comma' as the delimiter, only unquoted commas should be used to split the columns.

Right now (QGIS 1.7.0) the result of a row like:

1, "John,Doe", "Mary, Jane" 

is a split on the comma between John and Doe, which is not the right behavior.
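
For illustration only, here is a minimal sketch of the expected behavior using Python's standard csv module. This is not the QGIS plugin code; the row is the one from this report with the trailing space omitted, and skipinitialspace is used because each quote follows a comma and a space.

```python
import csv

# The example row from this report (trailing space omitted).
row = '1, "John,Doe", "Mary, Jane"'

# Naive splitting on every comma cuts through the quoted values.
naive = row.split(",")

# A quote-aware reader splits only on unquoted commas.
# skipinitialspace=True lets a quote that follows ", " open a quoted field.
parsed = next(csv.reader([row], skipinitialspace=True))

print(len(naive), naive)    # five fragments, quotes still embedded
print(len(parsed), parsed)  # three columns
```

The quote-aware result has three columns, which is what a user exporting from a spreadsheet would expect to see after import.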

Assigning to ccrook as I see he's done some recent work on the plugin and can hopefully give feedback on this.

The reason I think getting this behavior right is critical is that most csv export software (in my case I'm using LibreOffice) is going to default to quoting strings with commas and using commas as delimiters.

Associated revisions

Revision 230bbfb459f807a645fa3edbbc44b1012177bdfb
Added by Giuseppe Sucameli over 4 years ago

use plain delimiter if one delimiter only was selected (partially fix #4091)

History

Updated by springmeyer - over 5 years ago

I also meant to mention that when "a value" is imported the quotes are not stripped, as they should be. It is my understanding that quoted strings represent string literals, so keeping the quotes after import is wrong.
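
As a rough sketch of the quote handling described here, again using Python's csv module rather than anything from QGIS: the enclosing quotes are dropped on read, and a doubled quote inside a quoted field is read as one literal quote. The sample line is made up for the example.

```python
import csv

# Hypothetical sample line: a quoted value, a plain value, and a quoted
# value that itself contains doubled (escaped) quotes.
line = '"a value",plain,"say ""hi"""'

fields = next(csv.reader([line]))

# The enclosing quotes are gone; the escaped quotes survive as literals.
print(fields)
```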

Updated by Paolo Cavallini about 5 years ago

  • Category set to C++ Plugins
  • Pull Request or Patch supplied set to No

Updated by Giovanni Manghi about 5 years ago

  • Target version set to Version 1.7.4

Updated by Chris Crook about 5 years ago

  • Affected version set to master
  • Causes crash or corruption set to No

Definitely an issue with CSV import! The workaround for the moment is to use the OGR CSV format (with a VRT file), which works just fine. Will have a look at fixing this in the delimited text plugin.
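
For readers unfamiliar with the VRT workaround, the sketch below writes a small OGR VRT file next to a CSV. The file name points.csv and the column names lon and lat are assumptions made up for this example, as is the choice of EPSG:4326; adjust them to the actual data.

```python
from xml.etree import ElementTree as ET

# Assumed inputs: a file "points.csv" with coordinate columns "lon" and "lat".
# The layer name matches the CSV base name so OGR can find the source layer.
VRT_TEXT = """<OGRVRTDataSource>
  <OGRVRTLayer name="points">
    <SrcDataSource>points.csv</SrcDataSource>
    <GeometryType>wkbPoint</GeometryType>
    <LayerSRS>EPSG:4326</LayerSRS>
    <GeometryField encoding="PointFromColumns" x="lon" y="lat"/>
  </OGRVRTLayer>
</OGRVRTDataSource>
"""


def write_vrt(path: str = "points.vrt") -> None:
    """Check that the template is well-formed XML, then write it to disk."""
    ET.fromstring(VRT_TEXT)  # raises ParseError if the XML is malformed
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(VRT_TEXT)
```

Calling write_vrt() and then opening points.vrt as a vector layer should load the CSV through OGR, which handles quoted fields.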

Updated by springmeyer - about 5 years ago

Chris Crook wrote:

Definitely an issue with CSV import! The workaround for the moment is to use the OGR CSV format (with a VRT file), which works just fine. Will have a look at fixing this in the delimited text plugin.

Hey, thanks for commenting. I've used the VRT method and was looking for a one-step approach for novice users. I ended up solving things (for my purposes) in Mapnik by writing my own CSV plugin. So, +1 to improving this feature, but at least my original use case is no longer critical.

Updated by Paolo Cavallini almost 5 years ago

  • Target version changed from Version 1.7.4 to Version 1.8.0

Updated by Paolo Cavallini over 4 years ago

  • Target version changed from Version 1.8.0 to Version 2.0.0

Updated by Giuseppe Sucameli over 4 years ago

  • Status changed from Open to Closed

Updated by Giuseppe Sucameli over 4 years ago

  • Status changed from Closed to Reopened
  • Assigned to deleted (Chris Crook)
  • Priority changed from Normal to Low
  • % Done changed from 0 to 50

When you choose only one delimiter from the "selected delimiters" list it is internally converted to "plain delimiter", so it now also works with quoted strings (see http://hub.qgis.org/issues/6013).

If more than one delimiter is chosen from the "selected delimiters" list it still uses the "regexp delimiter" and it doesn't parse quoted strings.

The newline problem (quoted strings on more lines are not parsed) is still there, whatever delimiter you're using.
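
The two remaining cases can be reproduced outside QGIS. The sketch below uses Python's re and csv modules as stand-ins for the "regexp delimiter" and a quote-aware parser; the sample data is invented for the example and none of this is the plugin's code.

```python
import csv
import io
import re

# Regexp-style splitting on several delimiters ignores quoting entirely.
regexp_fields = re.split(r"[,;]", '1,"Doe, John";x')

# A record whose last quoted field spans two physical lines.
data = 'id,name,note\n1,"Doe, John","first line\nsecond line"\n'

# Splitting into physical lines first breaks that record in two.
physical_lines = data.splitlines()

# A quote-aware reader consumes the stream and keeps the record whole.
records = list(csv.reader(io.StringIO(data, newline="")))

print(regexp_fields)        # the quoted name is cut at its comma
print(len(physical_lines))  # 3 physical lines
print(len(records))         # 2 logical records (header plus one row)
```

The point of the comparison is that quoting has to be tracked by the parser itself; any approach that splits into lines or fields before looking at quotes cannot recover the original values.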

Updated by Chris Crook over 4 years ago

I have an update for the delimited text plugin which fixes the newline and comma issues, but it also requires an update to the plugin dialog which I haven't had time to complete yet. Basically the approach I am considering is to use a couple of alternative parsers - one for regexp, one for plain whitespace, and one for fixed delimiters such as CSV. I'm thinking the dialog could then be a bit simpler (for the user), with an initial selection of parser type (which could include preset types, such as Excel CSV, tab delimited), and then options displayed according to the type of delimiter set.
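
To make the idea of alternative parsers concrete, here is a loose sketch in Python. Every name in it (parse_regexp, parse_whitespace, parse_delimited, PARSERS, split_record) is hypothetical and chosen for this example; it only illustrates selecting a parser by type and says nothing about how the plugin is actually structured.

```python
import csv
import re
from typing import Callable, Dict, List


def parse_regexp(line: str, pattern: str = r"[,;]") -> List[str]:
    """Split on a regular expression; quoting is not interpreted."""
    return re.split(pattern, line)


def parse_whitespace(line: str) -> List[str]:
    """Split on runs of whitespace."""
    return line.split()


def parse_delimited(line: str, delimiter: str = ",") -> List[str]:
    """Split on a fixed delimiter, honoring quoted fields."""
    rows = list(csv.reader([line], delimiter=delimiter))
    return rows[0] if rows else []


# Parser types, including presets, keyed by a name the dialog could offer.
PARSERS: Dict[str, Callable[[str], List[str]]] = {
    "regexp": parse_regexp,
    "whitespace": parse_whitespace,
    "csv": parse_delimited,
    "tab": lambda line: parse_delimited(line, "\t"),
}


def split_record(line: str, parser_type: str) -> List[str]:
    """Dispatch one record to the parser selected by the user."""
    return PARSERS[parser_type](line)
```

With this layout, only the fixed-delimiter parsers need to know about quoting, and a preset is just a named entry with its options filled in.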

One development issue that makes this difficult is that both the data provider plugin and the options dialog need to access the same parsing code, but they are different compilation modules, so I haven't figured out where to put the common code, or whether to just replicate it.

Updated by Chris Crook almost 4 years ago

  • Status changed from Reopened to Closed

Fixed for 2.0 at commit fab2c57478f67be01a9ac91f0ce27a1f739d0501
