Basic Blocks
Composer
Composer is the main block, where everything happens. You can't add or remove it, and everything you build must be placed inside it, because Orchestra only looks for blocks inside the Composer block. This also means that anything outside the Composer block is ignored.
Comment
Just as a textual programming language lets you write text that has no effect on the behavior of the program, the Comment block provides the same functionality in Orchestra.
Input Blocks
Plain Text
The Plain Text block inserts plain text. It escapes the characters that are RegEx syntax, encodes any character outside the basic English alphabet as a Unicode escape, and, if needed, places the entire text into a sequence group ((?: ) in wildcard format). The entire text is therefore treated safely as one group.
Example 1
The following code compiles to /hello world\! How are you\? \uD83D/
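The escaping described above can be approximated in JavaScript. This is only a sketch: `escapePlainText` and `REGEX_SPECIALS` are hypothetical names, the list of escaped characters is an assumption, and the optional (?: ) grouping is left out. It is not Orchestra's actual compiler code.

```javascript
// Hypothetical helper: a rough approximation of Plain Text escaping,
// not Orchestra's actual compiler code.
const REGEX_SPECIALS = ".*+?^${}()|[]\\/!"; // assumed set of RegEx syntax characters

function escapePlainText(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (/[A-Za-z0-9 ]/.test(ch)) {
      out += ch; // basic characters pass through unchanged
    } else if (REGEX_SPECIALS.includes(ch)) {
      out += "\\" + ch; // RegEx syntax characters get a backslash
    } else {
      // everything else becomes a \uXXXX escape, one per UTF-16 code unit
      const hex = text.charCodeAt(i).toString(16).toUpperCase();
      out += "\\u" + hex.padStart(4, "0");
    }
  }
  return out;
}

const escaped = escapePlainText("hello world! How are you?");
console.log(escaped); // hello world\! How are you\?
console.log(new RegExp(escaped).test("hello world! How are you?")); // true
```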
Unsafe Wildcard
The Unsafe Wildcard block injects raw wildcard grammar into the final Orchestra. Orchestra targets JavaScript's RegEx implementation, which is a fairly standard one; however, other languages have some extra RegEx goodies. To use those goodies inside an Orchestra, you can use this unsafe block and add whatever text you want at any position.
Example
ECMAScript (even in the latest drafts at the time of writing) has no support for lookbehind, so as a matter of strategy Orchestra does not provide a lookbehind block either. However, if you target a RegEx engine that supports lookbehind, you can inject a wildcard like this one, which compiles to /(?<=ABC)/.
Character Sets
Alphabet
Alphabet block lets you define a basic character set. When a RegEx engine reaches a character set it checks if the character is a member of that character set. Alphabet has some predefined subsets that you can use to save some time:
- 0-9: Numerals. Includes: [0, 1, 2, ..., 9]
- a-z: Lowercase English letters from a to z
- A-Z: Uppercase English letters from A to Z
However, most of the time you will want to create a set of custom characters. You can insert those characters into the other box of the Alphabet block.
RegEx Specs
Alphabet compiles into wildcard via the [ ] grammar.
Example 1
Imagine you want to find the sentence-ending marks: [?, !, ., ; ]. This Orchestra can be useful. It compiles to /[\.\!,;]/ in wildcard format.
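As a quick check of how such a compiled set behaves in JavaScript, the sketch below tests every character of a sample string against it. The sample string and variable names are assumptions made for illustration.

```javascript
// The compiled wildcard quoted in the example above.
const sentenceMarks = /[\.\!,;]/;

// Keep only the characters that are members of the set.
const hits = Array.from("Wait; really, yes!").filter((ch) => sentenceMarks.test(ch));
console.log(hits); // [ ';', ',', '!' ]
```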
Example 2
The following option-based Orchestra works exactly the same as Example 1. However, with Alphabet you write less code, and RegEx engines can optimize your Orchestras much better.
Range
The Range block defines a character set from one character to another, so any character between those two is included in the set. The order of the characters is defined by their ASCII / Unicode value.
RegEx Specs
A Range block compiles to a wildcard alphabet of the form [s-e], in which `s` is the start and `e` is the end.
Example
Imagine you want to find all digits from 4 to 8. This Orchestra can find them. The compiled wildcard version of this Orchestra is: /[4-8]/.
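A minimal JavaScript sketch of the compiled range in use; the global flag and the sample string are assumptions added so that every match is listed.

```javascript
// /[4-8]/ with the g flag added to collect all matches.
const fourToEight = /[4-8]/g;
const digits = "1234567890".match(fourToEight);
console.log(digits); // [ '4', '5', '6', '7', '8' ]
```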
Anything But
The Anything But block is the opposite of the Alphabet block. It matches any character except the characters defined in its set. To use it, please read the documentation of the Alphabet block.
RegEx Specs
Anything But compiles to wildcard the same way Alphabet does; however, it starts with [^ instead of [ to indicate exclusion.
Example
This Orchestra finds sequences with no Q / q characters in them. It compiles to wildcard: /[^qQ]/.
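The sketch below shows an excluded set in JavaScript. It assumes the [^ grammar described in the RegEx Specs above, wrapped in a one-or-more quantifier so that whole runs are returned; the sample string is also an assumption.

```javascript
// Assumed compiled form: an exclude set for q and Q, repeated one or more times.
const noQ = /[^qQ]+/g;
const runs = "Quick quiz".match(noQ);
console.log(runs); // [ 'uick ', 'uiz' ]
```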
Sigma
With Alphabet you can define many powerful sets; however, you can't define sets with wildcard escape sequences or custom ranges. Sigma lets you define advanced sets with all of these parts. There are 3 blocks designed to be used inside Sigma and Exclude Set.
- Sigma Custom Range: With this block you can define ranges that are not just 0-9, a-z and A-Z. For more, please read the documentation for the Range block.
- Sigma Characters: This lets you add custom characters to the set. All characters are safe and are encoded like plain text.
- Sigma Wildcard Escapes Unsafe: Use this to include very special characters in the set, namely those that must be inserted as escaped wildcard characters.
Example
Imagine we want to find an integer with all digits but 5 and 6. This Orchestra compiles to /[1-47-9]+/ and finds exactly that.
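To see what the compiled set accepts, here is a small JavaScript sketch; the global flag and the sample string are assumptions. Note that, as written, the set also leaves out 0.

```javascript
// The compiled wildcard from the example, with the g flag added.
const digitsWithout56 = /[1-47-9]+/g;
const pieces = "1234567890".match(digitsWithout56);
// 5 and 6 split the string; 0 is not part of the compiled set either.
console.log(pieces); // [ '1234', '789' ]
```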
Exclude Set
The relationship between Sigma and Exclude Set is exactly the same as the one between Alphabet and Anything But. Exclude Set works the opposite way to Sigma: it matches anything but the characters defined within its set.
Control Structures
One or more
The One or more block matches a sequence of letters that occurs one or more times in a row.
RegEx Specs
One or More is the Kleene plus, which in the wildcard format is written as +
Example
The following Orchestra compiles to wildcard /a+/ and matches one or more of the letter 'a' in a row. So "a", "aa" and "aaaaaa" are all matched. In the string "abaa", however, it will match two substrings: "a" and "aa".
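The "abaa" case can be reproduced in JavaScript with a short sketch; the global flag is an assumption added to list every match.

```javascript
// /a+/ with the g flag so that all runs of 'a' are returned.
const runsOfA = "abaa".match(/a+/g);
console.log(runsOfA); // [ 'a', 'aa' ]
```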
Any number of
The Any number of block matches any number of occurrences of its value. Just like "One or more", it matches one or more occurrences of a given value, but it also matches zero occurrences. This means the sequence can be optional.
RegEx Specs
Any number of is the Kleene star, which in the wildcard format is written as *
Example
The following Orchestra compiles to wildcard /ab*a/ and matches at least "aa" but also "aba", "abbbbba".
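A brief JavaScript sketch of the same pattern; the rejected sample "ab" is an assumption added for contrast.

```javascript
const abStarA = /ab*a/;
console.log(abStarA.test("aa"));      // true: zero b characters
console.log(abStarA.test("abbbbba")); // true: many b characters
console.log(abStarA.test("ab"));      // false: the closing a is missing
```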
Maybe
Maybe makes its value optional, so the value can either appear or not appear.
RegEx Specs
Maybe compiles to wildcard using the ? operator.
Example 1
The following Orchestra compiles to wildcard /ab?c/ and matches either "abc" or "ac".
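The optional b can be checked in JavaScript as follows; "abbc" is an assumed counter-example.

```javascript
const optionalB = /ab?c/;
console.log(optionalB.test("abc"));  // true: b appears once
console.log(optionalB.test("ac"));   // true: b is absent
console.log(optionalB.test("abbc")); // false: b appears twice
```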
Example 2
These two Orchestras are equivalent to each other.
Options
The combination of the One of Options and Option blocks makes it possible to define multiple possibilities in your code. Each time the Orchestra reaches a One of Options block, one of its options must match. Remember that the Option block can only be used inside the One of Options block, and the One of Options block cannot contain anything but Option blocks. (This approach lets you create more sophisticated options with clear borders.)
RegEx Specs
One of Options corresponds to alternation, which in the wildcard format is written with the | operator between the options.
Example
This Orchestra finds different ways to say "Hello World", like "Howdy World" and so on...
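In JavaScript, a set of options like this is usually expressed with alternation. The pattern below is a hypothetical compiled form, with assumed greetings and an assumed non-capturing group; the exact output of the Orchestra compiler may differ.

```javascript
// Hypothetical compiled form: each option is one alternative inside a group.
const greeting = /(?:Hello|Howdy|Hi) World/;
console.log(greeting.test("Hello World")); // true
console.log(greeting.test("Howdy World")); // true
console.log(greeting.test("Bye World"));   // false
```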
Remember Match
Remember Match is used to capture a portion of your Orchestra formula and save it to a given memory.
RegEx Specs
Remember Match compiles to wildcard by placing ( ) around its value.
Example
Imagine you want to find the website name and the top-level domain name from a URL of the form www.websitename.domain. This Orchestra compiles to /www\.((?:[^\.])+)\.((?:[^\.])+)/ and in a JavaScript environment you can use $1 and $2 to get the website name and the domain name.
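A short JavaScript sketch of reading the two remembered groups, both by index and through $1 / $2 in a replacement string. The URL www.example.com is an assumed sample.

```javascript
// The compiled wildcard from the example above.
const urlParts = /www\.((?:[^\.])+)\.((?:[^\.])+)/;

const match = "www.example.com".match(urlParts);
console.log(match[1]); // example
console.log(match[2]); // com

// $1 and $2 refer to the same groups inside a replacement string.
const swapped = "www.example.com".replace(urlParts, "$2 / $1");
console.log(swapped); // com / example
```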
Lookahead
Lookahead accepts a given sequence only if it is followed by its lookahead part. Checking Don't Accept reverses the logic, so that the value is accepted only if it is not followed by its lookahead.
RegEx Specs
Lookahead compiles to wildcard by adding the lookahead grammar after its value: (?=) for the normal logic and (?!) for the reverse logic. See the example for the full picture.
Example
In this lookahead we want to find all the "Regular" strings that are part of "Regular Expression" strings. You can use match, but you can also use this Orchestra. It compiles to this wildcard: /(?:Regular)(?=(?:[ \t])+ Expressions)/
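The two lookahead forms can be tried in JavaScript with simplified patterns. These are assumptions written for illustration (a single literal space instead of the whitespace set), not the exact compiled output quoted above.

```javascript
// Simplified, assumed patterns: positive and negative lookahead.
const followed = /Regular(?= Expression)/;
const notFollowed = /Regular(?! Expression)/;

console.log(followed.test("Regular Expression"));    // true
console.log(followed.test("Regular coffee"));        // false
console.log(notFollowed.test("Regular coffee"));     // true
console.log(notFollowed.test("Regular Expression")); // false

// The lookahead part is checked but is not included in the match itself.
console.log("Regular Expression".match(followed)[0]); // Regular
```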
Repeat
Repeat 'n' times accepts a sequence only if it is repeated n times.
RegEx Specs
Repeat 'n' times compiles to (?:value){n}
Example
This Orchestra compiles to /w{3}/ and matches the string "www".
Repeat at Least
Repeat at least 'n' times is like the One or more block, except that you can set the minimum number of repeats to anything you want and not just one.
RegEx Specs
Repeat at least 'n' times compiles to (?:value){n,}
Example
This Orchestra matches any LOL text with at least 3 repeats of O. So both "LOOOL" and "LOOOOOL" are accepted, but "LOOL" is rejected. This Orchestra compiles to /LO{3,}L/
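The three strings from this example can be checked directly in JavaScript:

```javascript
const lol = /LO{3,}L/;
console.log(lol.test("LOOOL"));   // true: exactly 3 O characters
console.log(lol.test("LOOOOOL")); // true: more than 3
console.log(lol.test("LOOL"));    // false: only 2
```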
Repeat in Range
Repeat in range defines an accepted range of repeats. You set both a min and a max parameter; any number of repeats below min or above max is rejected.
RegEx Specs
Repeat in range 'min' to 'max' times compiles to (?:value){min,max}
Example
To find WOOOW words that contain at least 3 and at most 6 'O' characters in the middle, this Orchestra can work. It compiles to: /WO{3,6}W/
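A small JavaScript sketch that probes both edges of the range. The helper `wowWord` is a hypothetical name introduced only to build the sample words.

```javascript
const wow = /WO{3,6}W/;

// Hypothetical helper: builds "W", then n "O" characters, then "W".
const wowWord = (n) => "W" + "O".repeat(n) + "W";

console.log(wow.test(wowWord(2))); // false: below min
console.log(wow.test(wowWord(3))); // true: equal to min
console.log(wow.test(wowWord(6))); // true: equal to max
console.log(wow.test(wowWord(7))); // false: above max
```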
Special Characters
Any Character
Any Character matches any character but the new line (\n) character.
RegEx Specs
Any Character is the dot sign of wildcard and simply compiles to .
Example
This Orchestra compiles to /.+/ and matches the content of a line.
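Because the dot stops at a new line, each line of a multi-line string becomes its own match. A minimal JavaScript sketch, with an assumed two-line sample and the global flag added:

```javascript
// /.+/ with the g flag; the dot does not cross the \n character.
const lines = "first line\nsecond line".match(/.+/g);
console.log(lines); // [ 'first line', 'second line' ]
```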
Basic Whitespace Set
The Basic Whitespace Set block lets you use the 3 basic whitespace characters (Single Non Breaking Space, Tab Character and New Line Feed Character '\n')
RegEx Specs
If only one of the characters is selected, only that character is exported, so the compilation is something like \n. For a combination of two, the Orchestra compiler generates an alphabet of the characters, such as [\t\n]. If all 3 characters are selected, the compilation is the RegEx special escape sequence for non-breaking space + tab + new line feed: \s
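The selection rule above can be sketched as a small JavaScript function. `compileBasicWhitespace` and the option names "space", "tab" and "newline" are hypothetical, chosen only for illustration; this is not Orchestra's actual compiler.

```javascript
// Hypothetical sketch of the rule: one character, an alphabet of two, or \s for all three.
function compileBasicWhitespace(selected) {
  const order = ["space", "tab", "newline"];
  const escapes = { space: " ", tab: "\\t", newline: "\\n" };
  const chosen = order.filter((name) => selected.includes(name));

  if (chosen.length === 0) return "";                  // nothing selected
  if (chosen.length === 1) return escapes[chosen[0]];  // a single character
  if (chosen.length === order.length) return "\\s";    // all three selected
  return "[" + chosen.map((name) => escapes[name]).join("") + "]"; // an alphabet of two
}

console.log(compileBasicWhitespace(["newline"]));                 // \n
console.log(compileBasicWhitespace(["tab", "newline"]));          // [\t\n]
console.log(compileBasicWhitespace(["space", "tab", "newline"])); // \s
```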
Advanced Whitespace Set
Advanced Whitespace Set is a set of whitespace characters that are rarely used, so they are grouped in a separate block from the Basic Whitespace Set. These characters are:
- Vertical Tab \v
- NUL \0
- Carriage Return \r
- Form-feed \f
RegEx Specs
If one character is selected, just that character is returned, for example: \v. For more than one selected character, an alphabet of the selected characters is returned: [\v\r]
Special Sets
Word
Word is a special set that includes ranges from a-z, A-Z, 0-9 and underscore character.
RegEx Specs
Word in wildcard is \w
Example
A Word block is basically an Alphabet block with all of its ranges set and an underscore in special characters, which compiles to the /[0-9a-zA-Z_]/ wildcard.
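The equivalence between \w and the spelled-out alphabet can be checked in JavaScript. The sample string, the + quantifier and the g flag are assumptions added for the demonstration.

```javascript
const sample = "snake_case 42!";

// Same text, once with \w and once with the explicit alphabet.
const viaWord = sample.match(/\w+/g);
const viaAlphabet = sample.match(/[0-9a-zA-Z_]+/g);

console.log(viaWord);     // [ 'snake_case', '42' ]
console.log(viaAlphabet); // [ 'snake_case', '42' ]
```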
Anything but Word
Anything but Word is the exclude set of Word block which means anything not in ranges of a-z, A-Z, 0-9 and underscore character.
RegEx Specs
Anything but Word in wildcard is \W
Boundaries
Start of the Line
Start of the Line matches the very beginning of a line.
RegEx Specs
Start of the Line in wildcard is ^
Example
A very common use of Start of the Line and End of the Line is when you want your RegExp to match your test sample exactly. For example, imagine your test sample is "aaabbbbaaa" and your RegExp is the one you see on the right; what you want to know is whether your string is made only of a characters.
In JavaScript you would run /a+/.test("aaabbbbaaa") to test your string against the RegExp that we just made. The result, however, is not what you expect: it is true.
That is because RegExp test functions match partially within your string. This string contains two runs of a characters, and therefore the RegExp engine returns true.
To overcome this problem, you can use Start of the Line and End of the Line to indicate that you want the whole string to match your RegExp and not only a part of it.
This Orchestra, which compiles to /^a+$/, ensures that the full string is tested, so its result when testing aaabbbbaaa is false.
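The difference between the partial and the anchored test can be run side by side in JavaScript; the string "aaaa" is an assumed extra sample.

```javascript
const partial = /a+/;
const anchored = /^a+$/;

console.log(partial.test("aaabbbbaaa"));  // true: a run of a's exists somewhere
console.log(anchored.test("aaabbbbaaa")); // false: the whole string is not only a's
console.log(anchored.test("aaaa"));       // true
```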
End of the Line
End of the Line matches the very end of a line.
RegEx Specs
End of the Line in wildcard is $
Word Boundary
Word Boundary matches a word boundary. This is the position where a word character is not followed or preceded by another word character, such as between a letter and a space. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.
RegEx Specs
Word Boundary in wildcard is \b
Example 1
When this Orchestra matches the "hello world" string it remembers 3 groups: h, the space between hello and world and w.
Example 2
This Orchestra does not match the string "foobar" because foo is followed by b, which is counted as a word character.
Example 3
This Orchestra will never match anything because it's impossible for a word boundary to exist between two words.
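A JavaScript sketch of the zero-length behavior of \b. The patterns and sample strings here are assumptions written for illustration and are not the compiled output of the Orchestras above.

```javascript
// foo must stand alone as a word.
console.log(/\bfoo\b/.test("foo bar")); // true
console.log(/\bfoo\b/.test("foobar"));  // false: foo is followed by a word character

// Each boundary is a position, not a character, so nothing is consumed.
const marked = "hello world".replace(/\b/g, "|");
console.log(marked); // |hello| |world|
```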
Anything but Word Boundary
Anything but Word Boundary matches a non-word boundary. This is a position where the previous and next character are of the same type: Either both must be words, or both must be non-words. Such as between two letters or between two spaces. The beginning and end of a string are considered non-words. Same as the matched word boundary, the matched non-word boundary is also not included in the match.
RegEx Specs
Anything but Word Boundary in wildcard is \B
Example
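A small JavaScript sketch of \B; the patterns and sample strings are assumptions chosen for illustration.

```javascript
// bar is accepted only when it continues a word.
console.log(/\Bbar/.test("foobar"));  // true: b is preceded by a word character
console.log(/\Bbar/.test("foo bar")); // false: b starts a new word

// Every position between two word characters is a non-word boundary.
const dashed = "foobar".replace(/\B/g, "-");
console.log(dashed); // f-o-o-b-a-r
```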