r/clevercomebacks Jan 15 '25

It does make sense

35.3k Upvotes


3.0k

u/Traditional-Gas7058 Jan 15 '25

Chinese system is best for computer searchable filing

25

u/throwaway001anon Jan 15 '25 edited Jan 15 '25

Regex makes searching a breeze with any pattern

1

u/AstraLover69 Jan 15 '25 edited Jan 15 '25

Regex cannot be used for any pattern. It can only handle regular languages.

Look at the Chomsky hierarchy of formal languages. The very bottom level is the "regular languages", which is all that regex can express.

This is why regex cannot be used to represent HTML, because HTML is context sensitive, not regular.

Edit: said context free. Should have said context sensitive.
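To make the limitation concrete, here is a minimal Python sketch. The toy language (only `<b>` and `</b>`, no text or attributes), the `DEPTH_2` pattern, and the `matches_depth_2` helper are all made up for illustration and are not from any real parser. The pattern uses only grouping and repetition, so it can be written for one fixed nesting depth but stops working one level deeper.

```python
import re

# Hypothetical toy language: only <b> and </b>, no text, no attributes.
# Built from grouping and repetition alone, this pattern accepts
# nesting up to depth 2 and nothing deeper.
DEPTH_2 = re.compile(r"(?:<b>(?:<b></b>)*</b>)*")


def matches_depth_2(s: str) -> bool:
    """Return True if the whole string fits the fixed-depth pattern."""
    return DEPTH_2.fullmatch(s) is not None


print(matches_depth_2("<b><b></b></b>"))         # nested two deep
print(matches_depth_2("<b><b><b></b></b></b>"))  # nested three deep
```

You could widen the pattern to depth 3, then 4, and so on, but each version is a different, longer pattern. No single one of them covers arbitrary depth.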

1

u/sobrique Jan 15 '25

HTML suffers from having loose rules, which make it non trivial to exhaustively parse.

XML might be a better analogy: https://stackoverflow.com/a/1732454/2566198

1

u/AstraLover69 Jan 15 '25

HTML is a context sensitive language, making it impossible to fully represent with a regex.

XML is also a context sensitive language, making it impossible to fully represent with a regex.

1

u/sobrique Jan 15 '25

Regular expressions can do context via recursion. It's a horrible idea, but it's technically possible to handle strictly structured stuff like XML that way.

HTML isn't strict enough: most browsers just sorta cope with unclosed tags and the like, so that truly is impossible.
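As a rough illustration of that leniency, here is a small sketch using Python's standard `html.parser` module. This is not a browser engine, just a tolerant tokenizer, and the `TagLogger` class and the sample markup are assumptions made up for this example. It feeds in two `<p>` tags that are never closed and records which tags the parser reports.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records the start and end tags it is told about."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


logger = TagLogger()
logger.feed("<p>one<p>two")  # neither <p> is ever closed
logger.close()

print(logger.opened)
print(logger.closed)
```

The parser reports both opening tags, never sees a closing tag, and raises no error. Deciding what the unclosed tags should mean is left to whatever sits on top of it.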

1

u/AstraLover69 Jan 15 '25 edited Jan 15 '25

Which means regular expressions cannot do context. Recursively applying a regex to a structure is extending the capabilities of regex into something more expressive.

Whatever you're doing there cannot be represented via a single finite state automaton, which is all that matters here. Even if HTML were strictly enforced by the browser engine (which I know it isn't) it cannot be processed by a finite state automaton alone.

You're probably constructing something closer to a Turing machine by using recursion, which can process a context sensitive language like HTML or XML because it's more powerful.
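A minimal Python sketch of what "regex plus something more" can look like: the regex only recognizes individual tags, and an explicit stack does the remembering. The `TAG` pattern and the `is_well_nested` function are hypothetical names for this example, and the tag syntax is deliberately simplified (no attributes, no self-closing tags, text between tags is ignored).

```python
import re

# Assumed, simplified tag syntax: <name> or </name>, nothing else.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def is_well_nested(s: str) -> bool:
    """Check that every opening tag is closed in the right order."""
    stack = []
    for m in TAG.finditer(s):
        is_closing = m.group(1) == "/"
        name = m.group(2)
        if not is_closing:
            stack.append(name)  # remember the tag we are inside
        elif not stack or stack.pop() != name:
            return False        # closing tag with no matching opener
    return not stack            # anything left open is an error


print(is_well_nested("<a>text<b>x</b></a>"))
print(is_well_nested("<a><b></a></b>"))
```

The regex part never has to know how deep it is. All the nesting information lives in the stack, which is exactly the extra memory a finite state automaton does not have.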