Twitter regexp muting

Improve your Twitter experience with regular expressions

My favourite Twitter client, Tweetbot, allows muting not just by keyword but by regular expressions. Regular expressions, or regexps, are an incredibly handy way to match patterns in text: they date back to the earliest years of computing and are available in most programming languages.

About a month ago, I posted a regexp to filter out tweets about a certain Australian drug smuggler, and I was asked if I had any other examples. It occurred to me that they could form a quick introductory tutorial to regexps in general.

First, an unavoidable jargon intermission: I'm going to use the programming term "string" to refer to any sequence of characters, because non-jargon terms like "word" or "sentence" are not general enough and would just confuse things, and otherwise we'll all get sick of me repeating the phrase "sequence of characters".

This tutorial isn't a complete description of regular expressions - people have written whole books on that. It should give you a basic understanding of the syntax of regexps, explain how a couple of my filters work and allow you to start building your own.

Regexp basics

A regexp is a string which is used to test other strings to see if they either match or don't match a pattern. In the context of Tweetbot's mute feature, the regexp behaves like a special keyword filter which is used to test the tweets in your timeline: any tweet that matches is muted.

To try to explain what matching means, I'm going to start with an irritatingly opaque definition:

A string matches a regexp if every part of the regexp matches a part of the string in the right order.

We'll flesh this out by building a regexp which will stop me from seeing tweets with hashtags like #auspol.

The simplest kind of "part" is an ordinary character. Letters and numbers are ordinary, as are some punctuation marks. An ordinary character matches itself, so if a regexp is just a sequence of ordinary characters, it will match any tweet containing that entire sequence.

    #auspol
    > You leftards all suck #auspol
    Get bent #nswpol

In the examples, the regexp itself appears on the first line, matched tweets have a > next to them and tweets that don't match have no marker.

Note that although letters and numbers are all ordinary characters, many symbols and punctuation marks are "ordinary" (in the regexp sense) too, like hash "#".

Matching nothing but ordinary characters is exactly the same as a keyword match.

Tweetbot's regexp functionality is tucked away. If you use any special characters when typing in a keyword mute, a switch to turn on regular expressions will appear under the keyword field.

The full stop "." is a wildcard: it will match any character. #...pol matches a "#" followed by any three characters followed by "pol".

    #...pol
    > Down with things! #auspol
    > Feelpinions! #nswpol
    > Bikies! #qldpol
    Independence now! #scotlandpol

The regexp fails to weed out tweets with my (invented) "#scotlandpol" hashtag because there are too many characters between "#" and "pol". To cover hashtags of other lengths, I would have to make a series of mutes like #...pol, each with a different number of full stops.

It's at this point that regexps start getting genuinely powerful. If we follow part of a regexp by "*" it will try to match against any number of repetitions of that part, including none at all.

I should clarify what "part" means a little here: an ordinary character is a part, but so is a full stop, and other special characters or character combinations. The simplest way to put it is that a part is a portion of a regexp which makes sense as a regexp on its own. (I'll be more precise about this when we get to groups.)

Following the wildcard with "*" gives us a regexp which will match all whatever-pol hashtags:

    #.*pol
    > You suck #auspol
    > I hate people #nswpol
    > You are worse than germs #vicpol
    > Gold for Australia! woo! #worldpoledancing

We've hit one of the problems with regexps: it's really easy to match more than we intended to. Our new regexp filters out the hashtag of the World Pole Dancing championship which we are keenly watching on Channel 11, because it contains the string "#worldpol", which matches #.*pol.

How can we build a regexp which will match only those hashtags which end in pol? We need to make sure that the character following "pol" is not another letter or number: that is, we need a part which will match a space.

Regexps have built-in character classes: these are like the wildcard, but only match certain kinds of characters.

    \s matches whitespace characters (space, tabs)
    \w matches "word" characters (a-z, A-Z, 0-9 and _)
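The patterns in this tutorial can also be tried outside Tweetbot. What follows is a minimal sketch using Python's re module, on the assumption that it treats ordinary characters, ".", "*", \s and \w the same way Tweetbot's regexp engine does for these simple cases. The muted helper and the sample list are made up for illustration, and the last pattern is just one possible way of combining the character classes, not a filter taken from this post.

```python
import re

# Sample tweets borrowed from the examples in this tutorial.
TWEETS = [
    "You suck #auspol",
    "I hate people #nswpol",
    "Independence now! #scotlandpol",
    "Gold for Australia! woo! #worldpoledancing",
]


def muted(pattern, tweets=TWEETS):
    """Hypothetical helper: return the tweets the regexp matches anywhere."""
    regexp = re.compile(pattern)
    return [tweet for tweet in tweets if regexp.search(tweet)]


# Ordinary characters only: behaves exactly like a keyword match.
print(muted("#auspol"))

# Each "." matches any one character, so exactly three come before "pol".
print(muted("#...pol"))

# ".*" matches any number of characters, which also catches #worldpoledancing.
print(muted("#.*pol"))

# Assumed combination: word characters, then "pol", then whitespace.
extra = ["You suck #auspol today", "woo! #worldpoledancing tonight"]
print(muted(r"#\w*pol\s", extra))
```

Note that the last pattern needs a whitespace character after "pol", so in this sketch it would not match a hashtag sitting at the very end of a tweet.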