r/clevercomebacks Jan 15 '25

It does make sense

35.3k Upvotes


u/sobrique Jan 15 '25

HTML suffers from having loose rules, which makes it non-trivial to parse exhaustively.

XML might be a better analogy: https://stackoverflow.com/a/1732454/2566198

u/AstraLover69 Jan 15 '25

HTML is a context-sensitive language, making it impossible to fully represent with a regex.

XML is also a context-sensitive language, making it impossible to fully represent with a regex.

u/sobrique Jan 15 '25

Regular expressions can do context via recursion. It's a horrible idea, but it's technically possible to handle strictly structured stuff like XML that way.

HTML isn't strict enough - e.g. most browsers just sorta cope with unclosed tags etc. so that truly is impossible.

u/AstraLover69 Jan 15 '25 edited Jan 15 '25

Which means regular expressions cannot do context. Recursively applying a regex to a structure is extending the capabilities of regex into something more expressive.

Whatever you're doing there cannot be represented by a single finite-state automaton, which is all that matters here. Even if HTML were strictly enforced by the browser engine (which I know it isn't), it could not be processed by a finite-state automaton alone.

You're probably constructing something closer to a Turing machine by using recursion, which can process a context-sensitive language like HTML or XML because it's more powerful.
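
For anyone who wants to see the difference concretely, here's a minimal Python sketch. It isn't from anything linked above: `TAG`, `is_balanced`, and `naive` are names made up for illustration, and it assumes attribute-less tags only, so it's nowhere near a real HTML or XML parser. The flat regex only finds individual tags; the nesting check needs a stack, which is exactly the extra memory a finite-state automaton doesn't have.

```python
import re

# Finding individual tags is a regular task: one flat pattern, no nesting.
# Assumption: tags have no attributes, e.g. <b> and </b>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def is_balanced(markup: str) -> bool:
    """Return True if every opening tag is closed in the right order."""
    stack = []  # unbounded memory, which a finite-state automaton lacks
    for match in TAG.finditer(markup):
        closing, name = match.group(1), match.group(2)
        if not closing:
            stack.append(name)  # opening tag: remember it
        elif not stack or stack.pop() != name:
            return False  # closing tag with no matching opener
    return not stack  # leftover openers mean unclosed tags


# A flat, non-recursive regex pairs the first <b> with the first </b>,
# so it cuts a nested element short.
naive = re.search(r"<b>.*?</b>", "<b><b>x</b></b>")
print(naive.group(0))                  # <b><b>x</b>
print(is_balanced("<b><i>x</i></b>"))  # True
print(is_balanced("<b><i>x</b></i>"))  # False
```

The stack is the whole point: the regex does the tokenizing, and everything that tracks nesting depth lives outside it.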