Cockos Incorporated Forums > REAPER Forums > ReaScript, JSFX, REAPER Plug-in Extensions, Developer Forum

11-21-2014, 08:06 PM   #1
Argitoth
Human being with feelings
 
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057
Thinking about building regex-like string functions for EEL from the ground up

I created a cool string function; see details here: http://stackoverflow.com/questions/2...tring-function

and I am thinking of expanding it to have regex-like syntax. Why not exactly regex? Because I think I can do better than regex (in terms of making it more useful in a programming environment). The BIG difference I will be introducing, as detailed in the link above, is a bidirectional string matching system, where you can move forwards and backwards through the string to find matches, and the position is saved for the next series of matches. I thiiiink the "bidirectional" aspect of this removes the need for backreferencing...? So here is my idea for syntax:
Code:
START/END and other extended ASCII characters:
() = start/end of capture constructor                  ()
{} = start/end capture constructor metadata            (?P<name>...)
^  = start/end of string                               ^$
>< = start/end of line                                 ^$
·  = start/end class constructor                       []
¦  = start/end literal constructor                     \
•  = OR (inline)                                       |
¬  = TO (class/inline)                                 -
¬  = start/end do not capture (group/inline)           (?:)
`  = start/end do not capture (outline)                (?:)
ˆ¯ = start/end NOT (inline)                            ^
ªª = literal ª (inline)                                \
ª• = literal • (inline)                                \
ªa = literal a (outline)                               \
   = literal space (inline)       
˜  = 0 or 1           (inline)                         ?
‹  = 0 or more (lazy) (inline)                         *?
«  = 0 or more (grdy) (inline)                         *
›  = 1 or more (lazy) (inline)                         +?
»  = 1 or more (grdy) (inline)                         +   
©  = 1 of any character (inline)                       .
©° = 0 or 1 of any character (inline)                  .?
©‹ = 0 or more of any character (lazy) (inline)        .*?
©› = 1 or more of any character (lazy) (inline)        .+?
©« = 0 or more of any character (grdy) (inline)        .*
©» = 1 or more of any character (grdy) (inline)        .+

N TIMES SPECIFIERS (outline):
?   = 0 or 1                    ?
*   = 0 or more (lazy)          *?
**  = 0 or more (grdy)          *
+   = 1 or more (lazy)          +?
++  = 1 or more (grdy)          +
2*  = 2 or more (lazy)          {2,}?
2+  = 2 or more (grdy)          {2,}
*2  = 0 to 2 matches (lazy)     {0,2}?
1+2 = 1 to 2 matches (grdy)     {1,2}
2   = exactly 2 matches         {2}
.   = 1 of any character        .
~   = 0 or 1 of any character   .?
:   = 0 or more of any character (lazy)  .*?
;   = 1 or more of any character (lazy)  .+?
::  = 0 or more of any character (grdy)  .*
;;  = 1 or more of any character (grdy)  .+

FUZZY MATCH SPECIFIERS (outline):
@ = FULL word                  [a-zA-Z]+
# = FULL number                ([+-]?(?:\d+\.\d+|\d+|\.\d+|\d+\.))
& = FULL midi note name        [A-G]#?(?:-2|-1|[0-8])
$ = FULL sentence              [a-zA-Z,;'" \t]+[?!.]+[\t ]+
% = FULL path                  [a-zA-Z]:\\[^<>:"/|?*]*\\[^<>:"/|?*\\]*\.[^<>:"/|?*\\\s]*
_ = space                      [ ]
- = not space                  [^ ]
t = tab                        \t
T = not tab                    [^\t]                      
h = horizontal space           [\t ]
H = not horiz space            [^\t ]
v = vertical space             [\r\n]
V = not vertical space         [^\r\n]
w = white space                \s
W = not white space            \S
n = numeral                    \d
N = not a numeral              \D
a = alphabet                   [a-zA-Z]
A = not alphabet               [^a-zA-Z]
s = symbol                     [`~!@#$%^&*()\-_=+\\|\]\}\[\{;:'"/?.>,<]
S = not symbol                 [^`~!@#$%^&*()\-_=+\\|\]\}\[\{;:'"/?.>,<]
| = OR                         |
!= = start/end NOT             ^
/ = literal
\ = literal

CLASS CONSTRUCTOR (inline):
·a¬c    = a TO c              [a-c]
·ˆa¬c   = NOT a TO c          [^a-c]
·abc    = a OR b OR c         [abc]
·ˆabc   = NOT a NOT b NOT c   [^abc]
·ˆa¯bc  = NOT a, b OR c       [^ad-zA-Z`~!@#$%^&*()\-_=+\\|\]\}\[\{;:'"/?.>,<]

LOGIC SPECIFIERS (inline):
¦ab•cd        = ab OR cd       ab|cd
¦ˆab¯         = NOT ab

LOGIC SPECIFIERS (outline):
dd|ad        = digit digit OR alphabet digit  \d\d|[a-zA-Z]\d
!dd=         = NOT digit digit

CAPTURE SPECIFIERS:                           
(¦¬abc)       = do not capture                             (?:abc)
(¦¬ab¬cd¬e¬f) = match abcdef, capture only cdf   
(grp}ab)      = named group match ab                       (?P<grp>ab)
{1}           = match by whatever is stored in group 1      (\1)
{grp}         = match by whatever is stored in group "grp"  (\k<grp>)

?FLAGS? = multiple flags        (?FLAGS)
?i      = case insensitive      (?i)
?I      = insensitive off       (?-i) default
?c      = continuous            /g (global) default
?C      = continuous off

OUTPUT SYNTAX:
¦     = start/end group output
¦-    = entire string                    $0
¦-,0  = entire string and all substrings $0$1$2...
¦0,-  = all substrings and entire string $1$2$3...$0
¦0    = all substrings                   $1$2$3...
¦0}-¦ = each substring separated by '-'  $1-$2-$3...
¦12   = group 12                         
¦2,3  = group 2 and 3                    $2$3
¦+2   = group 0 to 2                     $1$2$3
¦2+   = group 2 and up                   $2$3$4...
¦1+3  = group 1 to 3                     $1$2$3
¦grp  = group "grp"                     

SPECIAL FUNCTIONS:
function.sentence_body = "characters"
function.sentence_end  = "characters"
function.word          = "characters"
function.alphanumeric  = "characters"
function.number_signmode  = 0 or 1 or 2 or 3
    0: both + and -
    1: only -
    2: only +
    3: no signs allowed
function.number_matchmode = 0 or 1
    0: consume everything that matches
    1: do not consume ending '.' (because it is a period)
function.number_decimalchar = '.' or ',' is acceptable
function.number_bignumbermode = 0 or 1
    0: do not capture big numbers, e.g. 1,023,420
    1: capture big numbers
function.number_plus = 0, 1 or 2
    0: convert + to null
    1: do not convert
    2: + means not a number
function.path = 0 or 1 or etc.
    0: filesystem_type1
    1: filesystem_type2
    2: etc.

note: spaces are only literal when inline or in group metadata

input:  There are 354 three numbers 50 in this 222 string.
MATCH:  (,}`:`d+)3
LOGIC:  Separate matches by comma in group 0; match 0 or more of any char (lazy, no capture) followed by 1 or more numeral chars (lazy). Loop 3 times.
SYNTAX: ¦0
LOGIC:  Output group 0.
output: 354,50,222
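For comparison, here is a rough reading of the example above in standard regex, using Python's `re` module. It's a sketch, not a translator for the proposed syntax: it assumes `:` behaves like `.*?` and that the `d+` token is meant to capture the whole number. A truly lazy `\d+?` with nothing after it would stop after a single digit (giving 3, 5, 4), so the greedy `\d+` is what reproduces the stated output.

```python
import re

text = "There are 354 three numbers 50 in this 222 string."

# Rough standard-regex reading of the MATCH line (,}`:`d+)3:
#   `:`  -> .*?   (0 or more of any char, lazy, not captured)
#   d+   -> (\d+) (the digits we keep; greedy, to reproduce the stated output)
#   3    -> keep only the first 3 matches
captures = re.findall(r".*?(\d+)", text)[:3]

# OUTPUT line: the captures joined by ',' (the separator used in the example)
output = ",".join(captures)
print(output)  # 354,50,222
```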
__________________
Soundemote - Home of the chaosfly and pretty oscilloscope.
MyReaperPlugin - Easy-to-use cross-platform C++ REAPER extension template

Last edited by Argitoth; 11-25-2014 at 12:09 PM.
11-22-2014, 01:00 AM   #2
IXix
Human being with feelings
 
Join Date: Jan 2007
Location: mcr:uk
Posts: 3,891

My only thought is that since regex is moderately standardised and well documented it would be easier for people to use if you stick to it. If you implement standard regex syntax then I can't see why you couldn't add a method of searching backwards too.
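Along the lines of IXix's suggestion, forward and backward searching with a saved position can be layered on top of standard regex. A minimal sketch in Python, assuming a hypothetical `RegexCursor` class (the names are illustrative, not from any REAPER or EEL API): `find_next` searches forward from the saved position, and `find_prev` tries start positions from right to left, keeping the first one that matches before the saved position.

```python
import re
from typing import Optional


class RegexCursor:
    """Hypothetical helper: standard regex plus a position saved between calls."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0  # saved position, updated by every successful search

    def find_next(self, pattern: str) -> Optional[str]:
        """Search forward from the saved position; move to the end of the match."""
        m = re.compile(pattern).search(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def find_prev(self, pattern: str) -> Optional[str]:
        """Search backward; move to the start of the match that is found.

        Start positions are tried from right to left, and each attempt may only
        use text before the saved position, so the result is the match with the
        rightmost start that ends at or before self.pos.
        """
        compiled = re.compile(pattern)
        for start in range(self.pos, -1, -1):
            m = compiled.match(self.text, start, self.pos)
            if m is not None:
                self.pos = m.start()
                return m.group()
        return None


cursor = RegexCursor("take 1 then 22 then 333")
print(cursor.find_next(r"\d+"))    # 1
print(cursor.find_next(r"\d+"))    # 22
print(cursor.find_prev(r"\b\d+"))  # 22  (cursor now sits at the start of 22)
print(cursor.find_prev(r"\d+"))    # 1
```

Because `find_prev` returns the match with the rightmost start, `\d+` alone will stop on the last digit of a number (searching back from just after `22` gives `2`); a leading `\b` avoids that. The backward scan costs up to one match attempt per character, which is fine for short strings.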
11-22-2014, 04:49 AM   #3
Argitoth
Human being with feelings
 
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057

Noted! It'd be a miracle if I implemented even 1/10th of the functionality; maybe that's due to a lack of confidence. Mostly I'm just writing down my ideas.

Edit: Will release a basic string function soon.

Last edited by Argitoth; 11-23-2014 at 09:53 AM.
11-23-2014, 12:49 PM   #4
Argitoth
Human being with feelings
 
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057

There! I did it! The ultimate syntax for matching strings. A little more complex than regex, but you can do more with fewer characters.

The big difference in this syntax is that there is a switch to go from literal to non-literal mode. Everything outside the "literal" switch is a class, some kind of logic, or group stuff. Everything inside "literal" mode is (of course) literal, and makes use of extended ASCII characters (codes above 127) for additional match functions, so you don't have to switch out of literal mode constantly.
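The literal/non-literal switch could be handled by a first tokenizing pass before any matching happens. A minimal sketch in Python, assuming the broken bar `¦` toggles literal mode (per the "start/end literal constructor" entry) and that a pattern starts outside literal mode; `split_literal_segments` is a hypothetical name, and escapes inside literal mode are not handled.

```python
def split_literal_segments(pattern: str, switch: str = "\u00a6") -> list:
    """Split a pattern into (is_literal, text) segments.

    Hypothetical sketch: assumes the broken bar (U+00A6) toggles literal
    mode on and off, and that a pattern starts outside literal mode.
    Escapes inside literal mode are not handled.
    """
    segments = []
    literal = False
    buf = []
    for ch in pattern:
        if ch == switch:
            if buf:  # flush the text collected in the current mode
                segments.append((literal, "".join(buf)))
                buf = []
            literal = not literal
        else:
            buf.append(ch)
    if buf:
        segments.append((literal, "".join(buf)))
    return segments


# "d+" outside literal mode, then literal "ab•cd", then "w" outside again
# (U+2022 is the bullet the proposal uses for inline OR).
segments = split_literal_segments("d+\u00a6ab\u2022cd\u00a6w")
# [(False, 'd+'), (True, 'ab\u2022cd'), (False, 'w')]
```

Each segment can then be handed to a separate parser for its mode, which keeps the two rule sets from interfering with each other.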

Why not regex, you say? (Speaking to you, IXix.) Because regex simply does not do what I need!

Another big feature I am introducing in this syntax is a better group and capture implementation. You can do a lot with groups, such as: using groups to separate strings by whatever characters you choose, storing any sort of matches in one group, not storing certain matches, and using named and/or numbered groups.
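For reference, most of these group features map fairly directly onto standard regex. A short Python `re` sketch of the `(¦¬ab¬cd¬e¬f)` example ("match abcdef, capture only cdf") using non-capturing `(?:...)` groups and a named group `(?P<name>...)`; the group name `mid` is just for illustration.

```python
import re

# (?:...) matches without capturing; (?P<mid>...) is a named group.
# Same intent as "(¦¬ab¬cd¬e¬f) = match abcdef, capture only cdf".
m = re.search(r"(?:ab)(?P<mid>cd)(?:e)(f)", "xxabcdefyy")
assert m is not None

print(m.group(0))            # abcdef  (entire match)
print(m.group("mid"))        # cd      (by name)
print(m.group(1))            # cd      (named groups are numbered too)
print(m.groups())            # ('cd', 'f')
print("-".join(m.groups()))  # cd-f    (captures separated by '-')
```

The separator part (joining captures with a chosen character) happens outside the pattern in standard regex, which is the piece the proposed output syntax folds into the pattern language itself.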

Last edited by Argitoth; 11-25-2014 at 12:16 PM.
11-23-2014, 01:17 PM   #5
Breeder
Human being with feelings
 
Join Date: Nov 2010
Posts: 2,436

Quote:
Originally Posted by Argitoth
Because regex simply does not do what I need!
Never found a problem I couldn't solve with regex. I guess there's a reason it's so popular.
I agree with IXix; reinventing the wheel here is not really user-friendly. Regex can get complicated, and there are even multiple flavors of it. Forcing the user to learn yet another version is not nice.
11-23-2014, 01:39 PM   #6
Argitoth
Human being with feelings
 
Join Date: Feb 2008
Location: Mesa, AZ
Posts: 2,057

Quote:
Originally Posted by Breeder
Never found a problem I couldn't solve with regex. I guess there's a reason it's so popular
True, true... although I think it's popular because it's the only one. Is there something else?
06-03-2020, 05:26 PM   #7
amagalma
Human being with feelings
 
Join Date: Apr 2011
Posts: 3,458

I know I am a bit late... but... have you done it?
__________________
Most of my scripts can be found in ReaPack.
If you find them useful, a donation would be greatly appreciated! Thank you! :)
All times are GMT -7.