MODULES Uize.Str.Split
- Contents
- 1. Introduction
-
2. Static Methods
-
2.1. Uize.Str.Split.split
- 2.1.1. Why Not Use the split Instance Method?
-
2.1.2. Examples
- 2.1.2.1. Splitting Words Delimited by a Semi-colon
- 2.1.2.2. Splitting Words Delimited by One or More Non-Word Characters
- 2.1.2.3. Splitting A Multi-line String Into Separate Lines
- 2.1.2.4. Splitting Using a Regular Expression And Getting Captures
- 2.1.2.5. Splitting Using a Regular Expression And Ignoring Captures
- 2.1.3. Splitter Ommitted From Result
- 2.1.4. Compensates for Poor Implementations
- 2.2. Uize.Str.Split.splitInTwo
-
2.1. Uize.Str.Split.split
- 3. Static Properties
1. Introduction
The Uize.Str.Split
module provides some utility methods for splitting strings.
DEVELOPERS: Chris van Rensburg
1.1. Examples
There are no dedicated showcase example pages for the Uize.Str.Split
module.
SEARCH FOR EXAMPLES
Use the link below to search for example pages on the UIZE Web site that reference the Uize.Str.Split
module...
SEARCH
1.2. Implementation Info
The Uize.Str.Split
module defines the Uize.Str.Split
package under the Uize.Str
namespace.
1.2.1. Features Introduced in This Module
The features listed in this section have been introduced in this module.
STATIC METHODS
Uize.Str.Split.split
| Uize.Str.Split.splitInTwo
STATIC PROPERTIES
1.2.2. Features Overridden in This Module
No features have been overridden in this module.
1.2.3. Features Inherited From Other Modules
This module has no inherited features.
1.2.4. Modules Directly Under This Namespace
There are no modules directly under this namespace.
1.2.5. Unit Tests
The Uize.Str.Split
module is unit tested by the Uize.Test.Uize.Str.Split
test module.
2. Static Methods
2.1. Uize.Str.Split.split
Splits the specified string value into an array of string elements, using the specified splitter string or regular expression.
SYNTAX
splitPartsARRAY = Uize.Str.Split.split (sourceSTR,splitterSTRorREGEXP);
2.1.1. Why Not Use the split Instance Method?
As you may be aware, JavaScript's built-in String
object provides a split
instance method.
Unfortunately, this method has poor implementations in some JavaScript interpreters that may lead to well written code behaving inconcistently and exhibiting buggy behavior in the faulty interpreters. The Uize.Str.Split.split
method compensates for poor implementations by providing an implementation that is in strict accordance with the ECMA-262 specification.
2.1.2. Examples
2.1.2.1. Splitting Words Delimited by a Semi-colon
In this example, a string containing a list of fruit names separated by single semi-colons is being split using a string type splitter that is a single semi-colon.
EXAMPLE
fruits = Uize.Str.Split.split ('apple;orange;pear;peach;strawberry;watermelon',';');
After the above statement has been executed, the value of the fruits
variable will be the array ['apple','orange','pear','peach','strawberry','watermelon']
.
2.1.2.2. Splitting Words Delimited by One or More Non-Word Characters
In this example, a string containing a list of fruit names separated by various different delimiters that are one or more non-word characters is being split using a regular expression splitter that matches on one or more non-word characters.
EXAMPLE
fruits = Uize.Str.Split.split ('apple-|-orange,pear;peach<>strawberry...watermelon',/\W+/);
After the above statement has been executed, the value of the fruits
variable will be the array ['apple','orange','pear','peach','strawberry','watermelon']
.
2.1.2.3. Splitting A Multi-line String Into Separate Lines
In this example, a multi-line string is being split using a regular expression that supports a variety of different EOL (End Of Line) styles.
EXAMPLE
lines = Uize.Str.Split.split ('line 1\rline 2\nline 3\r\nline 4',/\r\n|[\r\n]/);
After the above statement has been executed, the value of the lines
variable will be the array ['line 1','line 2','line 3','line 4']
. The regular expression being used to split the multi-line string supports three different EOL styles: a carriage return (the '\r'
character) followed by a line feed (the '\n'
character), just a single carriage return character, or just a single line feed character.
2.1.2.4. Splitting Using a Regular Expression And Getting Captures
Because the Uize.Str.Split.split
method is a strict implementation of the ECMA-262 specification for the split
instance method of JavaScript's String
object, it supports including the regular expression captures in the returned array.
So, for example, if we were splitting a multi-line string into separate lines and wanted to capture the specific line ending characters used for each of the lines (they may be inconcistent across all the lines of the multi-line string), then we can use the unique behavior of the Uize.Str.Split.split
method as follows...
EXAMPLE
lines = Uize.Str.Split.split ('line 1\rline 2\nline 3\r\nline 4',/(\r\n|[\r\n])/);
After the above statement has been executed, the value of the lines
variable will be the array ['line 1','\r','line 2','\n','line 3','\r\n','line 4']
. Because the entire splitter regular expression is inside a capture (i.e. the parentheses), the entire matched splitter is included in the returned array for each line of the multi-line string. When the Uize.Str.Split.split
method builds up the result array, it follows the array element for each split part with elements for all the captures in the regular expression, in the order in which the captures occur in the regular expression.
2.1.2.5. Splitting Using a Regular Expression And Ignoring Captures
Because the Uize.Str.Split.split
method includes captures from a regular expression splitter in the returned array, an extra step is needed if you wish to use parentheses for grouping in a regular expression but don't wish the captures to be included in the result array.
EXAMPLE
words = Uize.Str.Split.split ('solar<_-_>power<_-_-_>will<_-_-_-_>win',/<(?:-=)+->/);
After the above statement has been executed, the value of the words
variable will be the array ['solar','power','will','win']
. The regular expression is using a group to allow matching of one or more of the substring '-='
. However, we don't want those matched characters to pollute the result array - we only want the words that are split out from the string. To accomplish this, we use a feature of regular expressions that allows a group to not be treated as a capture, simply by prefixing the contents of the group expression (i.e. the stuff inside the group's parentheses) with the special characters ?:
- this tells the regular expression engine to not capture the characters matched by the group.
2.1.3. Splitter Ommitted From Result
The splitter string or regular expression match is not included in the string elements of the returned array.
So, for example, the statement Uize.Str.Split.split ('foo#bar','#')
would return the array value ['foo','bar']
- the splitter, which is a '#'
(pound) character string literal in this case, is stripped from the values of the returned array elements.
With a regular expression splitter, the entire substring matched by the regular expression will be omitted. So, for example, the statement Uize.Str.Split.split ('foo####bar',/#+/)
would return the array value ['foo','bar']
- the splitter, which is a /#+/
(one or more pound characters) regular expression in this case, strips out all the contiguous pound characters from the values of the returned array elements.
The only way to include the substring matched by a splitter is to use a regular expression splitter and to enclose the entire regular expression in parentheses - this invokes the behavior of including regular expression captures in the result array. The matched substrings are still not included as part of the split values, but as separate elements of the result array - between the elements for the split values (see the example Splitting Using a Regular Expression And Getting Captures).
2.1.4. Compensates for Poor Implementations
The Uize.Str.Split.split
method is implemented in strict accordance with the ECMA-262 specification (i.e. the JavaScript language specification).
The Uize.Str.Split.split
method addresses poor implementations of the split
instance method of JavaScript's built-in String
object in some JavaScript interpreters, such as Microsoft's JScript interpreter that is used by Internet Explorer and WSH (Windows Script Host). Specifically, the Uize.Str.Split.split
method addresses two known issues when using a regular expression splitter: incorrect dropping of empty split values and incorrect omission of captures in the result array.
2.1.4.1. Incorrect Dropping of Empty Split Values
Microsoft's JScript interpreter exhibits an issue where empty split values are omitted when a regular expression splitter is used (but not when a string splitter is used).
EXAMPLE
result = 'foo,,bar'.split (/,/);
In the above example, a string is being split using a regular expression splitter that matches a single comma. In compliant JavaScript interpreters, the above statement would produce a result array with the value ['foo','','bar']
- exactly the same result as if you used a simple string splitter (i.e. 'foo,,bar'.split (',')
).
For a reason that is hard to fathom, the JScript interpreter omits the second empty string element to produce, instead, the result ['foo','bar']
. It's hard to justify or defend this implementation choice, as it wreaks havoc with using the split
instance method to parse lists of values that were serialized using the Array
object's join
instance method, and where some of the values were empty strings.
The Uize.Str.Split.split
method fixes this issue, so it can be used in Internet Explorer and WSH (Windows Script Host) to safely split strings using a regular expression splitter.
2.1.4.2. Incorrect Omission of Captures in the Result Array
While the split
instance method of JavaScript's built-in String
object is supposed to include captures from a regular expression splitter in the returned array, this behavior is not supported by some JavaScript interpreters - notably Microsoft's JScript interpreter.
This means that the statement 'line 1\rline 2\nline 3\r\nline 4'.split (/(\r\n|[\r\n])/)
would return the result array ['line 1','line 2','line 3','line 4']
in the JScript interpreter, and not the array ['line 1','\r','line 2','\n','line 3','\r\n','line 4']
as it should. The Uize.Str.Split.split
method fixes this issue, so it can be used in Internet Explorer and WSH (Windows Script Host) to safely split strings using a regular expression splitter.
NOTES
compare to the Uize.Str.Split.splitInTwo static method |
IMPLEMENTATION INFO
this feature was introduced in this module |
2.2. Uize.Str.Split.splitInTwo
Returns an array of exactly two elements, representing the two segments of the specified source string after splitting it using the specified splitter string.
SYNTAX
twoPartsARRAY = Uize.Str.Split.splitInTwo (sourceSTR,splitterSTR);
EXAMPLE
var nameValue = Uize.Str.Split.splitInTwo ('TITLE: The Matrix: Reloaded',': ');
In the above example, the nameValue
variable would be left with the array value ['TITLE','The Matrix: Reloaded']
. In contrast, the built-in split
method of the String
object would produce the three element array ['TITLE','The Matrix','Reloaded']
when splitting the above string using ': '
.
If the splitter string is not found within the source string, then the returned array will contain the entire source string for its first element, and an empty string for its second element.
EXAMPLE
var nameValue = Uize.Str.Split.splitInTwo ('TITLE',': ');
In the above example, the nameValue
variable would be left with the array value ['TITLE','']
.
VARIATION
twoPartsARRAY = Uize.Str.Split.splitInTwo (sourceSTR,splitterREGEXP);
When a splitterREGEXP
parameter is specified, the sourceSTR
value will be split on the regular expression, and the two resulting parts will exclude the substring that was matched by the splitter regular expression.
EXAMPLE
var nameValue = Uize.Str.Split.splitInTwo ('TITLE : The Matrix: Reloaded',new RegExp ('\\s*:\\s*') ;
In the above example, the nameValue
variable would be left with the array value ['TITLE','The Matrix: Reloaded']
. In this case, the regular expression specified for the splitterREGEXP
parameter matches a substring of any number of spaces, followed by a colon, followed by any number of spaces - in other words, a colon with optional padding. The two parts of the result will not include the whitespace padding around the colon, since it was part of the splitter match.
IMPLEMENTATION INFO
this feature was introduced in this module |