From Matlab to R: Capture named fields with regular expressions to a dataframe


I want to capture named fields from a list of strings using a regular expression. In Matlab I did it this way:
strings = {'sn555 ID_O20-5-684_N52_2_Subt2_01.',...
'sn555 ID_O20-5-984_S52_8_Subt10_11.'};
pattern = ['sn(?<serial_number>.*) ID(_)(?<ID>.*)_(?<Class>[NS])'...
'(?<Sector>.*)_(?<Point>(.*))_[Ss]ubt.*\.'];
ParsedData = regexp(strings,pattern,'names');
The result (converted to a dataset) is:
ParsedData =
serial_number ID Class Sector Point
'555' 'O20-5-684' 'N' '52' '2'
'555' 'O20-5-984' 'S' '52' '8'
Now I want to parse these strings in R and convert the result to a dataframe.
I tried this:
strings <- c("sn555 ID_O20-5-684_N52_2_Subt2_01.",
             "sn555 ID_O20-5-984_S52_8_Subt10_11.")
pattern <- paste0('sn(?<serial_number>.*) ID(_)(?<ID>.*)_(?<Class>[NS])',
                  '(?<Sector>.*)_(?<Point>(.*))_[Ss]ubt.*\\.')
ParsedData <- gregexpr(pattern, strings, perl = TRUE)
ParsedData
Unfortunately, I'm new to regular expressions in R and the output (ParsedData) is confusing to me. How can I convert the strings to a data frame like the Matlab result above?
In the past I wrote a helper function, regcapturedmatches.R, that extracts capture groups from a regular expression match.
Once you have sourced it, you can use it with your data like this:
rr <- regcapturedmatches(strings,ParsedData)
rr
# [[1]]
# serial_number X ID Class Sector Point X.1
# [1,] "555" "_" "O20-5-684" "N" "52" "2" "2"
#
# [[2]]
# serial_number X ID Class Sector Point X.1
# [1,] "555" "_" "O20-5-984" "S" "52" "8" "8"
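You get back a list with one character matrix per input string. The columns are named after the capture groups; the two unnamed groups in the pattern show up as X and X.1. You can combine the matrices into a single data.frame with: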
do.call(rbind.data.frame, rr)
# serial_number X ID Class Sector Point X.1
# 1 555 _ O20-5-684 N 52 2 2
# 2 555 _ O20-5-984 S 52 8 8
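
If you would rather not depend on a helper file, the same result can be sketched in base R alone. This is only a minimal sketch, not part of the helper above: it assumes there is at most one match per string (so it uses regexpr rather than gregexpr), and it drops the two unnamed groups from the pattern so that only the named columns come back. With perl = TRUE, regexpr attaches the attributes capture.start, capture.length and capture.names to its result, and substring can slice the fields out using them. The names m, starts, lens, group_names and parsed are just illustrative.

```r
strings <- c("sn555 ID_O20-5-684_N52_2_Subt2_01.",
             "sn555 ID_O20-5-984_S52_8_Subt10_11.")

# Same pattern as in the question, minus the two unnamed groups
# (an assumption made here so that every column has a name).
pattern <- paste0("sn(?<serial_number>.*) ID_(?<ID>.*)_(?<Class>[NS])",
                  "(?<Sector>.*)_(?<Point>.*)_[Ss]ubt.*\\.")

# One match per string; perl = TRUE is required for named groups.
m <- regexpr(pattern, strings, perl = TRUE)

# Matrices with one row per string and one column per capture group.
starts      <- attr(m, "capture.start")
lens        <- attr(m, "capture.length")
group_names <- attr(m, "capture.names")

# Cut each group out of every string; substring() is vectorised over strings.
cols <- lapply(seq_along(group_names), function(j) {
  substring(strings, starts[, j], starts[, j] + lens[, j] - 1)
})
names(cols) <- group_names

parsed <- as.data.frame(cols, stringsAsFactors = FALSE)
parsed
```

All columns come back as character, so convert with as.integer where you need numbers. A string that does not match the pattern gets an empty string in every column, because regexpr reports -1 for its start and length.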
