Bash sed - find hashtags in string


Based on this post, I tried to come up with a command that finds all hashtag words (words starting with #) in a fairly complicated string:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | sed -e 's/^/ /g' -e 's/ [^#][^ ]*//g' -e 's/^ *//g'
Unfortunately the output is:
#cerveau #Mopses"
instead of:
#cerveau #Einstein #Ordre #Mopses
What would the correct command be?
grep is usually better at extracting substrings. With GNU grep's -o option (output only the matching parts), you can simply run:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| grep -o '#[[:alpha:]]*'
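grep -o prints one match per line, whereas the question asks for all hashtags on a single line. A minimal sketch of one way to join them, assuming a POSIX paste is available; the function name hashtags_on_one_line is made up for illustration, and the pattern requires at least one letter after the # so that a lone # is not reported:

```shell
# Hypothetical helper (name is illustrative): read text on stdin and
# print the hashtags it contains as one space-separated line.
hashtags_on_one_line() {
    # grep -o emits each match on its own line;
    # paste -s joins those lines, -d ' ' uses a space as the separator.
    grep -o '#[[:alpha:]][[:alpha:]]*' | paste -sd ' ' -
}

text="Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"
printf '%s\n' "$text" | hashtags_on_one_line
```

If no hashtag is found, grep exits with a non-zero status and the function prints an empty line at most, so callers that care may want to check for empty output.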
If you really need sed, do a similar thing: replace every word that does not start with a # by a space, then remove everything before the first # and squeeze runs of spaces into one:
sed -e 's/[^[:alpha:]#][[:alpha:]]*/ /g' \
-e 's/^[^#]*//' \
-e 's/  */ /g'
If you want to use sed, you can put every word that starts with a # on its own line by wrapping it in \n, and then print only those lines:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" \
| sed -re 's/(#\w+)/\n\1\n/g' \
| sed -rn '/^(#\w+)$/p'
You need the -r option in sed to use extended regular expressions, which is what lets you write ( ) and + without backslashes.
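To see why the two-stage sed pipeline works, it can help to look at what the first stage produces before the second stage filters it. The following is only a sketch, assuming GNU sed (for \n in the replacement and for \w); the function names split_hashtags and extract_hashtags_sed are made up for illustration:

```shell
# Stage 1 (hypothetical name): wrap every hashtag in newlines so that
# each one ends up alone on its own line. Assumes GNU sed.
split_hashtags() {
    sed -r 's/(#\w+)/\n\1\n/g'
}

# Stages 1 + 2 (hypothetical name): keep only the lines that consist
# of nothing but a hashtag.
extract_hashtags_sed() {
    split_hashtags | sed -rn '/^#\w+$/p'
}

text="Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas"

# Intermediate view: hashtags and the leftover text alternate line by line.
printf '%s\n' "$text" | split_hashtags

# Final result: only the hashtag lines survive.
printf '%s\n' "$text" | extract_hashtags_sed
```

The leftover fragments such as "Le " or " des " never match ^#\w+$ because they do not start with a #, which is why the second sed can drop them without any further cleanup.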
You can do this:
echo "Le #cerveau d’#Einstein n’est « #Ordre des #Mopses\" » pas" | grep -o '#[a-zA-Z0-9_]\+'
You get the expected output:
#cerveau
#Einstein
#Ordre
#Mopses
Explanation: The -o option in grep:
Prints only the matching part of the lines.
So the grep command above matches a # followed by one or more letters, digits, or underscores.
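The choice of character class matters once hashtags contain digits or underscores. As a small illustration, the sketch below compares the letters-only pattern with the letters, digits, and underscores pattern on a made-up input string; the function names and the sample text are assumptions for demonstration, and \+ relies on GNU grep's basic-regex extension:

```shell
# Letters only (hypothetical name): stops at the first digit or underscore.
hashtags_letters_only() {
    grep -o '#[[:alpha:]]*'
}

# Letters, digits and underscores (hypothetical name): needs at least
# one such character after the #.
hashtags_word_chars() {
    grep -o '#[a-zA-Z0-9_]\+'
}

sample='Tags: #web2_0 #C3PO #plain'   # made-up sample input

printf '%s\n' "$sample" | hashtags_letters_only
printf '%s\n' "$sample" | hashtags_word_chars
```

On this sample the first function truncates #web2_0 to #web and #C3PO to #C, while the second returns all three hashtags in full.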
