javascript - RegEx to split on commas, but excluding those within braces, brackets, and parentheses
I'm trying to parse a comma-separated list, while ignoring commas that fall within inner structures defined by braces, brackets, or parentheses. For example, the string:
'text:firstname,css:{left:x,top:y},values:["a","b"],visible:(true,false),broken:["str", 1, {}, [],()]'
should be split as:
    text:firstname
    css:{left:x,top:y}
    values:["a","b"]
    visible:(true,false)
    broken:["str", 1, {}, [],()]
So far, I've got the following... it's close, but it breaks on nested structures:
[^,\[\]{}]+(({|\[)[^\[\]{}]*(}|\]))?
Any help is appreciated!
Unless you are willing to change your data format, or can find an easy way to turn it into proper JSON after receiving it, your best bet is parsing it manually.
The simplest matcher (assumes "nice" values):
    on ([{   - increment parens
    on )]}   - decrement parens, or emit an error if parens is 0
    on ,     - emit and reset the buffer if parens is 0 (finish a match)
    if not , - push into the output buffer
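The steps above can be sketched roughly as follows. This is only an illustration under the same "nice values" assumption (no quotes, no escapes); the name splitTopLevel is made up for this example and does not come from the answer.

```javascript
// Minimal sketch of the "nice values" matcher: no quote or escape handling.
// splitTopLevel is a hypothetical name used only for this illustration.
function splitTopLevel(input) {
  const out = [];
  let parens = 0;   // current nesting depth of (), [] and {}
  let buffer = "";
  for (const ch of input) {
    if ("([{".includes(ch)) {
      parens++;
    } else if (")]}".includes(ch)) {
      if (parens === 0) throw new SyntaxError("closing paren, no opening");
      parens--;
    }
    if (ch === "," && parens === 0) {
      out.push(buffer);   // top-level comma: finish a match
      buffer = "";
    } else {
      buffer += ch;       // anything else goes into the output buffer
    }
  }
  out.push(buffer);       // the last item has no trailing comma
  return out;
}

const sample = 'text:firstname,css:{left:x,top:y},values:["a","b"],visible:(true,false),broken:["str", 1, {}, [],()]';
console.log(splitTopLevel(sample));
```

On the string from the question this should yield the five items listed there, but it will mis-split as soon as a quoted value contains a bracket or a comma.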
This doesn't work with "ugly" strings (quoted parens, escaped quotes, escaped escapes...). The following parser should parse all valid input correctly, while still being relatively simple:
    on ([{ - increment parens if state is "start". Push into the buffer.
    on )]} - decrement parens if state is "start" and parens is positive. Emit an error if parens is zero. Push into the buffer.
    on ,   - emit and reset the buffer if parens is 0 and state is "start" (finish a match). Otherwise push into the buffer.
    on \   - push into the buffer, and push and read the next symbol as well.
    on '   - if state is "start", change state to "squote", and vice versa. Push into the buffer.
    on "   - if state is "start", change state to "dquote", and vice versa. Push into the buffer.
    on eof - emit an error if parens is not 0 or state is not "start".
Here's a sketch of an implementation in JavaScript:

    function splitLiteralBodyByCommas(input){
        var out = [];
        var iLen = input.length;
        var parens = 0;   // nesting depth of (), [] and {}
        var state = "";   // "" outside a string, otherwise the open quote character
        var buffer = "";  // using a string for simplicity, an array might be faster

        for(var i=0; i<iLen; i++){
            if(input[i] == ',' && !parens && !state){
                // top-level comma: finish the current match
                out.push(buffer);
                buffer = "";
            }else{
                buffer += input[i];
            }
            switch(input[i]){
                case '(':
                case '[':
                case '{':
                    if(!state) parens++;
                    break;
                case ')':
                case ']':
                case '}':
                    if(!state) if(!parens--)
                        throw new SyntaxError("closing paren, no opening");
                    break;
                case '"':
                    if(!state) state = '"';
                    else if(state === '"') state = '';
                    break;
                case "'":
                    if(!state) state = "'";
                    else if(state === "'") state = '';
                    break;
                case '\\':
                    // copy the escaped character verbatim
                    buffer += input[++i];
                    break;
            }//end of switch-input
        }//end of for-input
        if(state || parens)
            throw new SyntaxError("unfinished input");
        out.push(buffer);
        return out;
    }

This parser still has some flaws:
- It allows closing parens with braces et al. To solve this, make parens a stack of symbols; if the opening and closing symbols don't match, raise an exception.
- It allows malformed unicode-escaped strings. \utest is accepted by the parser.
- It allows a top-level comma to be escaped. This is probably not a fault: \,,\, is a valid string, containing two top-level escaped commas separated by an unescaped one.
- A trailing backslash produces unexpected output. Again, this would be fixed by reading the data we're escaping. An easier fix is buffer += input[++i] || '' (append an empty string instead of undefined), but that allows invalid input.
- It allows all sorts of other invalid input: [""'']{'\\'}"a" is just one example. The fix would need a better (more complex) grammar, and an accordingly more complex parser.
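As a rough illustration of the first fix in the list (a stack of symbols instead of a counter), here is one possible sketch. The names splitChecked and CLOSERS are hypothetical, and rejecting a trailing backslash outright is a choice made for this example, not something the answer prescribes. The other flaws (unicode escapes, the looser grammar) are left as they are.

```javascript
// Sketch of the "stack of symbols" idea. splitChecked and CLOSERS are
// hypothetical names introduced for this example only.
const CLOSERS = new Map([["(", ")"], ["[", "]"], ["{", "}"]]);

function splitChecked(input) {
  const out = [];
  const stack = [];   // expected closing symbols, innermost last
  let quote = "";     // "" outside a string, otherwise the open quote character
  let buffer = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "," && stack.length === 0 && !quote) {
      out.push(buffer);   // top-level comma: finish a match
      buffer = "";
      continue;
    }
    buffer += ch;
    if (ch === "\\") {
      // Assumption of this sketch: a trailing backslash is an error.
      if (i + 1 >= input.length) throw new SyntaxError("trailing backslash");
      buffer += input[++i];   // copy the escaped character verbatim
    } else if (quote) {
      if (ch === quote) quote = "";   // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;                     // opening quote
    } else if (CLOSERS.has(ch)) {
      stack.push(CLOSERS.get(ch));    // remember which closer must come next
    } else if (ch === ")" || ch === "]" || ch === "}") {
      if (stack.pop() !== ch) {
        throw new SyntaxError("mismatched '" + ch + "' at index " + i);
      }
    }
  }
  if (quote || stack.length) throw new SyntaxError("unfinished input");
  out.push(buffer);
  return out;
}

console.log(splitChecked('css:{left:x,top:y},label:"a,b)"'));
```

With this change an input such as a:(b] is rejected, whereas a plain counter would accept it.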
Having said that, isn't it better to use JSON for transmitting the data?
Option 1: a real object: {"text":"firstname", "css":{ ...
Option 2 (only if you really wish so): an array of strings: ["text:firstname, css:{ ...
In both cases, JSON.parse(input) is your friend.
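A small sketch of the first option, assuming the sender can emit valid JSON. The payload is invented for illustration (loosely modelled on the question's example) and is not the elided data from the options above; parsePayload is a hypothetical helper name.

```javascript
// Hypothetical payload, made up for this example only.
const input = '{"text":"firstname","css":{"left":"x","top":"y"},"values":["a","b"]}';

// parsePayload is a hypothetical helper: it returns the parsed value,
// or null when the input is not valid JSON.
function parsePayload(text) {
  try {
    return JSON.parse(text);   // throws SyntaxError on malformed input
  } catch (e) {
    if (e instanceof SyntaxError) return null;
    throw e;
  }
}

const data = parsePayload(input);
console.log(Object.keys(data));   // top-level keys, no manual splitting needed
console.log(data.css.left);
```

Nested structures, quoting, and escaping are all handled by the JSON parser, so none of the flaws listed for the hand-written splitter apply.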
javascript regex