regex


Regex to find all quotes not preceded by a comma


I have a CSV file that is not well formatted and I need to look for extra quotes.
This is what it looks like:
"1","title","desc desc dejdg sdjgh djhg"
"2","title2","desc jhgdj "jhsgfjhsgd" jhgd"
                         ^^^^^^^^^^^^
I need to find any " that is not preceded by a ,. This is what I have so far:
(\")(?!\,)
This matches any " that is not immediately followed by a ,, but I don't know how to check for a , before the ".
The regex you are asking for would be
(?<!,)"
Alternatively, if you don't mind matching more than you need, you can use
(^|[^,])"
which matches the preceding non-comma character as well as the double quote. It needs no lookbehind, so it should be supported by more regex engines (though not by findstr).
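To make the difference between the two patterns concrete, here is a minimal Python sketch that runs both on the sample rows from the question. The helper names `quotes_via_lookbehind` and `quotes_via_consuming` are made up for this illustration, and Python's `re` module is only one possible engine.

```python
import re

rows = [
    '"1","title","desc desc dejdg sdjgh djhg"',
    '"2","title2","desc jhgdj "jhsgfjhsgd" jhgd"',
]

# Negative lookbehind: the match is the quote itself.
LOOKBEHIND = re.compile(r'(?<!,)"')
# Fallback without lookbehind: the match also consumes the character
# in front of the quote (or nothing at the start of the line).
CONSUMING = re.compile(r'(^|[^,])"')


def quotes_via_lookbehind(line):
    """Return the index of every quote not preceded by a comma."""
    return [m.start() for m in LOOKBEHIND.finditer(line)]


def quotes_via_consuming(line):
    """Same idea; the quote is the last character of each match."""
    return [m.end() - 1 for m in CONSUMING.finditer(line)]


for row in rows:
    print(quotes_via_lookbehind(row), quotes_via_consuming(row))
```

Both helpers report the same positions on these rows, and both include the ordinary closing quote of every field, because a closing quote is not preceded by a comma either. They differ on doubled quotes such as `a""`: the consuming pattern has already used up the first quote, so it cannot match the second one.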
However, for parsing CSV files you should use a proper parser, as any regex-based solution (at least those I have seen so far for this task) is error-prone, unreadable, and slow.
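As one possible illustration of the parser route, the sketch below uses Python's standard `csv` module in strict mode to flag lines with bad quoting. It assumes one record per line (no fields spanning several lines), and `malformed_line_numbers` is a hypothetical helper name, not something from the answers here.

```python
import csv


def malformed_line_numbers(lines):
    """Return the 1-based numbers of lines that a strict CSV parser rejects."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            # strict=True makes the reader raise csv.Error on bad quoting
            # instead of silently guessing what was meant.
            list(csv.reader([line], strict=True))
        except csv.Error:
            bad.append(number)
    return bad


sample = [
    '"1","title","desc desc dejdg sdjgh djhg"',
    '"2","title2","desc jhgdj "jhsgfjhsgd" jhgd"',
]
print(malformed_line_numbers(sample))
```

This only tells you which lines are broken, not where the stray quotes are, but it avoids encoding the CSV quoting rules in a regex.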
What about
[^,](\")
?
Notice that this doesn't detect quotes at the beginning of the line (which, technically, are quotes not preceded by commas), but for your usage this is fine, since quotes at the beginning of the line aren't an error in a CSV file.
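Taking that observation one step further, a quote at the end of a line or directly before a comma is not an error either. The following Python sketch is an assumption on my part rather than something proposed above: it combines a lookbehind with the lookahead idea from the question so that only quotes with a non-comma character on both sides are reported. `stray_quote_positions` is a hypothetical helper name.

```python
import re

# A quote with a non-comma character on both sides. Quotes at the very start
# or end of the line have no neighbour on one side, so they never match.
STRAY_QUOTE = re.compile(r'(?<=[^,])"(?=[^,])')


def stray_quote_positions(line):
    """Return the index of every quote that is neither opening nor closing a field."""
    return [m.start() for m in STRAY_QUOTE.finditer(line)]


good = '"1","title","desc desc dejdg sdjgh djhg"'
bad = '"2","title2","desc jhgdj "jhsgfjhsgd" jhgd"'
print(stray_quote_positions(good))
print(stray_quote_positions(bad))
```

On the sample rows this reports nothing for the first row and the two quotes around jhsgfjhsgd for the second. It is still a heuristic: correctly escaped doubled quotes inside a field would be flagged, and a stray quote that happens to sit next to a comma would be missed.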
In .NET, you might use the TextFieldParser Class. Add the Microsoft.VisualBasic reference to your project and try this:
using System;
using Microsoft.VisualBasic.FileIO;

class Program
{
    static void Main(string[] args)
    {
        using (var tfp = new TextFieldParser("input.txt"))
        {
            tfp.Delimiters = new string[] { "," };
            // Treat quotes as ordinary characters: each line is split on
            // commas only, and the fields keep their quotes.
            tfp.HasFieldsEnclosedInQuotes = false;

            while (!tfp.EndOfData)
            {
                // ReadFields returns the fields of the next line.
                var fields = tfp.ReadFields();
                foreach (var field in fields)
                {
                    Console.WriteLine(field);
                }
            }
        }
    }
}
