Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

tabs indentation is not considered as invalid #80

Closed
stelcheck opened this issue Jun 25, 2013 · 26 comments
Closed

tabs indentation is not considered as invalid #80

stelcheck opened this issue Jun 25, 2013 · 26 comments
Labels

Comments

@stelcheck
Copy link

Tab indentation will basically result as a non-char when parsing YAML.

For instance:

a:
        b:
                c: yo mamma

As tab indented comes out as:

js-yaml test.yaml 

{ a: null, b: null, c: 'yo mamma' }

It would be safer and easier to debug if the file would not be parseable on tab indent.

@dervus
Copy link
Collaborator

dervus commented Jun 25, 2013

It seems like s-l-comments grammar rule is missed, and s-separate is used instead:

skipSeparationSpace(true, -1);

But I am not sure. YAML's grammar is pretty complex, and I could miss something. If that is really a bug, it's also needed to check all code for using right "separation skip" rules. Unfortunally, we need deep review of the grammar and JS-YAML's code in order to slove this.

@ixti
Copy link
Contributor

ixti commented Jun 25, 2013

TL;DR Tabs are allowed as white-space and equals single space:

[31]    s-space ::=  #x20 /* SP */   
[32]    s-tab   ::=  #x9  /* TAB */  
[33]    s-white ::= s-space | s-tab
[63]    s-indent(n) ::= s-space × n
[66]    s-separate-in-line  ::= s-white+ | /* Start of line */
[74]    s-flow-folded(n)    ::= s-separate-in-line? b-l-folded(n,flow-in)
                                s-flow-line-prefix(n)

yamlaintmarkuplanguage yaml version1 2

@stelcheck
Copy link
Author

Tabs are acceptable if they are not used for indenting variable declarations. In the example image you have posted, tabs are used as part of multiline strings, where in which case it would be perfectly fine.

http://www.yaml.org/spec/1.2/spec.html#id2777534

In YAML block styles, structure is determined by indentation. In general, indentation is defined as a zero or more space characters at the start of a line.

To maintain portability, tab characters must not be used in indentation, since different systems treat tabs differently. Note that most modern editors may be configured so that pressing the tab key results in the insertion of an appropriate number of spaces.

@ixti
Copy link
Contributor

ixti commented Jun 26, 2013

Aha. Good catch!

@ronkorving
Copy link

Would love to see a fix for this. The current behavior is really dodgy. If nobody will, I'll try and do a PR, but that could take quite a while. Fingers crossed that someone who is more familiar with the project is willing to help out.

@puzrin
Copy link
Member

puzrin commented Jun 28, 2013

@ronkorving we are busy with another projects. Since bug is not critical, don't expect fix soon.

dervus added a commit that referenced this issue Sep 29, 2013
Flow readers must not consume trailing whitespace because such a
behavior is too error-prone. Even plain scalar reader does not
consume that outside of it's scope (indentation level).

Example of wrong of "every reader consumes trailing whitespaces"
behavior is issue #80
@dervus dervus added the defect label Jun 1, 2014
@whatsdis
Copy link

whatsdis commented Oct 7, 2015

damn really need a fix for this, wondering if anyone would be up for a bounty

@ronkorving
Copy link

Agreed, we really do. I wonder how many (wo)man-hours have been wasted because of this issue.

@whatsdis
Copy link

whatsdis commented Oct 7, 2015

@ronkorving what I found out is that if you replace all the tabs, it renders correcty in js-yaml. Unless there is some way to parse the yaml directly with Codemirror API?

I've been having this issue here : perak/codemirror#19

brb, going to give this a try and see if it works out.

@whatsdis
Copy link

whatsdis commented Oct 7, 2015

holy crap it worked! thanks to @perak from http://meteor-kitchen.com

@ronkorving
Copy link

I'm not sure I follow? Tabs are invalid. I want to see an error when I accidentally use tab-indentation :)

@tokyowizard
Copy link

👍

@ingydotnet
Copy link

Greetings. I am one of the creators of YAML. Tabs are not allowed in indentation and an error should happen when they are detected in a place where indentation might occur.

IMHO libyaml is the most perfect implementation of the YAML 1.1 spec (which is very close to YAML 1.2).

Here is the result of the OP test in Perl's libyaml binding: https://gist.github.com/ac5821875eb30979f7a8
Here is the same program in Python: https://gist.github.com/582ea40dbdcd473249a4

They both throw the same error. I believe it is a wise choice to test arguable behaviors against a libyaml binding. Possibly even moreso than the spec.

FWIW, we spent every day for 3 months (circa 2003) trying to sanely allow tabs into YAML indentation. We came to the conclusion that it was an insane endeavor. (It ends up being much harder than, say, Python indentation based scoping).

@fwextensions
Copy link

A better error message when tabs are encountered would reduce a lot of confusion for people who don't know they're invalid. end of the stream or a document separator is expected doesn't exactly illuminate what the issue is. When encountering that exception, a new user is likely to:

  • Google "yaml to json"
  • Click the first result
  • Paste in a simple tab-indented test file
    foo:
    - bar
    - baz
    tabs:
    - not
    - spaces
  • See the expected JSON output with no error
  • Wonder what's wrong with JS-YAML

I'm not sure what library json2yaml.com is using, but it would be nice if JS-YAML could do something similar or at least provide clearer error messaging.

@stelcheck
Copy link
Author

Where would I get started to fix this bug? About time we try to tackle it, in the end it really should just be about line prefix right?

@puzrin
Copy link
Member

puzrin commented Dec 14, 2017

@stelcheck IMHO it worth to check comments why #224 was rejected. I can't say anything more concrete.

@stelcheck
Copy link
Author

I can understand that you might want something like:

a:
   b: |
        Some tab-indented text

From what I understand, the closed PR you refer to was not allowing that.

At any rate, having tabs being straight out being ignored (as the initial bug report suggests, given the output - we get {a:.., b:...} instead of {a: {b: ...}}) is just flat wrong - regardless of the stance one would take regarding how tabs should be considered, or what the YAML specs say. It's just unintuitive from a user perspective.

 But it is a general statement, and we must follow only grammar rules in the spec.

I already linked to http://www.yaml.org/spec/1.2/spec.html#id2777534 previously in the thread. I believe the YAML specification is rather clear on that subject, but perhaps I am misunderstanding something.

@puzrin
Copy link
Member

puzrin commented Dec 15, 2017

Closed PR tried to do global magic instead of change behaviour exactly in required places. Yes, loader should throw exception on tabs.

@stelcheck
Copy link
Author

Ah, apologies, I think I misinterpreted your last comment - your point was that we shouldn't do global magic, but instead solve the problem through a safe and sound implementation.

That was my goal with re-igniting this thread; I am not entirely sure how I would do that, and was looking for advice.

@puzrin
Copy link
Member

puzrin commented Dec 15, 2017

Giving good advice require to remember everything, and will take the same as fixing by myself :)

@stelcheck
Copy link
Author

Alright, fair enough :)

I'll see if I can dig into this one myself.

@sffc
Copy link

sffc commented Jul 15, 2018

Or, give js-yaml an option to accept TAB as a valid indentation character. Yes, the omission of tabs from the spec is explained on http://yaml.org/faq.html, but I see no harm in giving an option here in the implementation code. I use tab indentation for all other file formats in my project and would like to not be restricted from using it in "YAML" files (they wouldn't technically be YAML anymore).

By default the library should conform to the spec of course, but that doesn't forbid the library from providing useful non-standard options.

@stelcheck
Copy link
Author

I've never seen a parser taking such configuration element though; you'd end up with a file that is not readable by most (all?) other parsers. Even if other parsers would implement such option, you'd now be bound by configuration. This can create inter-operability headaches.

@ingydotnet
Copy link

Using tabs for indentation could not be done in a sane, safe and reliable fashion in YAML. We spent 3 months working on a way to use tabs in YAML during the creation of the language, and ultimately decided it was the best course of action to not allow them.

If you want to track down the emails in the yaml-core mailing list, read them all and propose a solution for tabs to work perfectly in YAML, be my guest. :)

For now it I think that trying to support tab indentation as an option (in any implementation) would be a really bad idea.

@rlidwka
Copy link
Member

rlidwka commented Dec 30, 2020

Fixed in aee620a (currently in dev branch). Now it shows up an error:

YAMLException: tab characters must not be used in indentation (2:1)

 1 | a:
 2 | →b:
-----^
 3 | →→c: yo mamma

What this fix does is: it checks that there were no prior tabs in current line when parsing block sequence or block mapping.

This is more of a hack, since it did not solve conceptual skipSeparationSpace mess (which parses both indent and separation space in yaml grammar without distringuishing them).

But all fixes all examples of wrong tab usage I've been able to come up with (see tests), even when other parsers like pyyaml fails to do so.

So it's probably good enough to close this issue. Please let me know if there are any side effects.

@rlidwka rlidwka closed this as completed Dec 30, 2020
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.