javascript - RegEx to split on commas, but excluding those within braces, brackets, and parentheses




I'm trying to parse a comma-separated list, while ignoring commas that fall within inner structures delimited by braces, brackets, or parentheses. For example, the string:

'text:firstname,css:{left:x,top:y},values:["a","b"],visible:(true,false),broken:["str", 1, {}, [],()]'

should be split as:

text:firstname
css:{left:x,top:y}
values:["a","b"]
visible:(true,false)
broken:["str", 1, {}, [],()]

So far, I've got the following. It's close, but it breaks on nested structures:

[^,\[\]{}]+(({|\[)[^\[\]{}]*(}|\]))?

Any help is appreciated!

Unless you're willing to change the data format, or can find an easy way to turn it into proper JSON after receiving it, your best bet is parsing it manually.

The simplest matcher (assumes "nice" values):

on ([{ - increment parens
on )]} - decrement parens, or emit an error if parens is 0
on , - emit and reset the buffer if parens is 0 (finish match)
if not , - push into the output buffer
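The steps above can be sketched in JavaScript roughly as follows. This is only an illustration under the same "nice input" assumption (no quotes, no escapes); the name splitSimple is made up here, and the end-of-input check is an addition that the steps do not spell out.

```javascript
// Minimal sketch of the simple matcher; assumes "nice" input
// (no quoted strings, no escapes). splitSimple is a hypothetical name.
function splitSimple(input) {
  const out = [];
  let depth = 0;    // the "parens" counter from the steps above
  let buffer = "";
  for (const ch of input) {
    if ("([{".includes(ch)) {
      depth++;
    } else if (")]}".includes(ch)) {
      if (depth === 0) throw new SyntaxError("closing paren, but no opening");
      depth--;
    }
    if (ch === "," && depth === 0) {
      out.push(buffer);   // top-level comma: finish the current match
      buffer = "";
    } else {
      buffer += ch;       // everything else goes into the buffer
    }
  }
  // Assumption: unbalanced input is treated as an error at the end.
  if (depth !== 0) throw new SyntaxError("unfinished input");
  out.push(buffer);
  return out;
}

console.log(splitSimple("text:firstname,css:{left:x,top:y},visible:(true,false)"));
```

Because it only counts depth, this version treats a bracket inside a quoted string as real nesting, which is the limitation of the simple matcher.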

This doesn't work for "ugly" strings (quoted parens, escaped quotes, escaped escapes...). The following parser should parse all valid input correctly, while still being relatively simple:

on ([{ - increment parens if state is "start". Push into buffer.
on )]} - decrement parens if state is "start" and parens is positive. Emit an error if parens is zero. Push into buffer.
on , - emit and reset the buffer if parens is 0 and state is "start" (finish match). Otherwise, push into buffer.
on \ - push into buffer, then read the next symbol and push it as well.
on ' - if state is "start", change state to "squote", and vice versa. Push into buffer.
on " - if state is "start", change state to "dquote", and vice versa. Push into buffer.
on eof - emit an error if parens is not 0 or state is not "start".

Here's a sketch of an implementation in JavaScript:

function splitliteralbodybycommas(input){
    var out = [];
    var ilen = input.length;
    var parens = 0;     // current nesting depth of (), [] and {}
    var state = "";     // "" outside a string, otherwise the opening quote character
    var buffer = "";    // using a string for simplicity, an array might be faster
    for(var i=0; i<ilen; i++){
        if(input[i] == ',' && !parens && !state){
            // top-level comma: finish the current match
            out.push(buffer);
            buffer = "";
        }else{
            buffer += input[i];
        }
        switch(input[i]){
            case '(':
            case '[':
            case '{':
                if(!state) parens++;
                break;
            case ')':
            case ']':
            case '}':
                if(!state) if(!parens--)
                    throw new SyntaxError("closing paren, but no opening");
                break;
            case '"':
                if(!state) state = '"';
                else if(state === '"') state = '';
                break;
            case "'":
                if(!state) state = "'";
                else if(state === "'") state = '';
                break;
            case '\\':
                // copy the escaped symbol without interpreting it
                buffer += input[++i];
                break;
        }//end of switch-input
    }//end of for-input
    if(state || parens)
        throw new SyntaxError("unfinished input");
    out.push(buffer);
    return out;
}

This parser still has some flaws:

It allows closing parens with braces et al. To solve this, make parens a stack of symbols; if the opening and closing symbols don't match, raise an exception.

It allows malformed unicode-escaped strings. \utest is accepted by the parser.

It allows a top-level comma to be escaped. This is probably not a fault: \,,\, is a valid string, containing two top-level escaped commas separated by an unescaped one.

A trailing backslash produces unexpected output. Again, this would be fixed by reading the data we're escaping. An easier fix is buffer += input[++i] || '' (append an empty string instead of undefined), but that allows invalid input.

It allows all sorts of other invalid input: [""'']{'\\'}"a" is just one example. To fix this, you would need a better (more complex) grammar, and an accompanyingly more complex parser.

Having said that, isn't it better to just use JSON for transmitting the data?
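As a rough illustration of two of the fixes mentioned above, the sketch below swaps the depth counter for a stack of expected closing symbols and rejects a trailing backslash outright. The names splitStrict and CLOSERS are hypothetical, and it does not validate unicode escapes or implement a fuller grammar.

```javascript
// Hypothetical stricter variant: a stack of expected closers replaces
// the counter, and a trailing backslash is an error.
const CLOSERS = new Map([["(", ")"], ["[", "]"], ["{", "}"]]);

function splitStrict(input) {
  const out = [];
  const stack = [];   // closers we still expect, innermost last
  let quote = "";     // "" outside a string, else the opening quote char
  let buffer = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === "\\") {
      // Copy the backslash and the escaped symbol verbatim.
      if (i + 1 >= input.length) throw new SyntaxError("trailing backslash");
      buffer += ch + input[++i];
      continue;
    }
    if (quote) {
      if (ch === quote) quote = "";          // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;                            // opening quote
    } else if (CLOSERS.has(ch)) {
      stack.push(CLOSERS.get(ch));
    } else if (ch === ")" || ch === "]" || ch === "}") {
      if (stack.pop() !== ch) throw new SyntaxError("mismatched " + ch);
    } else if (ch === "," && stack.length === 0) {
      out.push(buffer);                      // top-level comma
      buffer = "";
      continue;
    }
    buffer += ch;
  }
  if (quote || stack.length) throw new SyntaxError("unfinished input");
  out.push(buffer);
  return out;
}

console.log(splitStrict('a:(1,2),b:"x,y",c:[{}]'));
```

With the stack, an input such as (] is rejected as mismatched, whereas a plain counter would accept it.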

Option 1: a real object: {"text":"firstname", "css":{...
Option 2 (only if you really wish to): an array of strings: ["text:firstname, css:{...

In both cases, JSON.parse(input) is your friend.
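To make the two options concrete, here is a small sketch. The payload strings are invented for illustration, loosely modelled on the example in the question; they are not the full data.

```javascript
// Option 1: the whole payload is one JSON object (illustrative values).
const asObject = '{"text":"firstname","css":{"left":"x","top":"y"},"values":["a","b"]}';
const obj = JSON.parse(asObject);
console.log(obj.css.left); // x

// Option 2: a JSON array of strings, each still in the custom key:value format.
// Inner double quotes must be escaped for the JSON to be valid.
const asArray = '["text:firstname","css:{left:x,top:y}","values:[\\"a\\",\\"b\\"]"]';
const parts = JSON.parse(asArray);
console.log(parts.length); // 3
```

With option 1 the nested values arrive already parsed; with option 2 only the top-level split is done for you, and each string still needs its own handling.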
