Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

YAML cache "bypasses" permitted_classes in some cases #434

Closed
nirvdrum opened this issue Jan 20, 2023 · 4 comments · Fixed by #435
Closed

YAML cache "bypasses" permitted_classes in some cases #434

nirvdrum opened this issue Jan 20, 2023 · 4 comments · Fixed by #435

Comments

@nirvdrum
Copy link

nirvdrum commented Jan 20, 2023

I made sure the issue is in bootsnap

Steps to reproduce

The YAML cache provided by Bootsnap loads some types that are permitted by MessagePack, but not by YAML by default. YAML.load_file uses YAML.load under the covers which restricts the set of classes that can be loaded dynamically. E.g., YAML.load will load Symbol, but not Date or Time. MessagePack will load Date and Time, but not DateTime. While I experimented with a few different serialized objects, I'm not sure that the list of potentially affected classes is exhaustive.

To illustrate the issue, I wrote a small script that attempts to a YAML document both before and after Bootsnap has been loaded. Since loading Bootsnap with the YAML cache will monkeypatch YAML, each example needs to be run in a new process to ensure everything starts in a clean state. I naively handled that in the script by making the user supply the class name (with minimal validation) or it'll default to Date.

You'd run it with one of the following:

$ ruby yaml_cache.rb
$ ruby yaml_cache.rb date
$ ruby yaml_cache.rb time
$ ruby yaml_cache.rb datetime
$ ruby yaml_cache.rb symbol

The script:

require 'bootsnap'
require 'fileutils'
require 'tempfile'
require 'yaml'

DOCUMENTS = {
  Date: "--- 2023-01-20\n",
  Time: "--- 2023-01-20 13:18:31.083375000 -05:00\n",
  Datetime: "--- !ruby/object:DateTime 2023-01-20 13:22:08.204345000 -05:00\n",
  Symbol: "--- :abc\n"
}

class_type = ARGV.empty? ? :Date : ARGV[0].capitalize.to_sym
yaml_doc = DOCUMENTS[class_type]

raise "Invalid class type '#{class_type}'" if yaml_doc.nil?

Tempfile.open("#{class_type}.yml") do |file|
  file.write yaml_doc
  file.close

  begin
    YAML.load_file file.path
  rescue Psych::DisallowedClass
    puts "Caught bad load (#{class_type}): #{$!}"
  else
    puts "#{class_type} loaded"
  end

  Bootsnap.setup(cache_dir: 'tmp/cache', compile_cache_yaml: true)

  begin
    YAML.load_file file.path
  rescue Psych::DisallowedClass
    puts "Caught bad load (#{class_type}): #{$!}"
  else
    puts "#{class_type} loaded"
  end
end

FileUtils.rm_rf 'tmp'

Expected behavior

I'd expect YAML.load_file to be functionally the same before and after loading Bootsnap's YAML cache. I.e., I'd expect the cache is an optimization with no visible effects on behavior.

Actual behavior

Loading the Bootsnap YAML cache on supported Ruby interpreters (where "supported" is defined by Bootsnap internals) will allow the loading of some classes that wouldn't be allowed if the YAML cache were not used:

> ruby -v yaml_cache.rb date
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin21.5.0]
Caught bad load (Date): Tried to load unspecified class: Date
Date loaded

> ruby -v yaml_cache.rb time
ruby 3.2.0 (2022-12-25 revision a528908271) [arm64-darwin21.5.0]
Caught bad load (Time): Tried to load unspecified class: Time
Time loaded

In contrast, on a Ruby interpreter where the YAML compile cache is not supported, such as TruffleRuby, we see:

> ruby -v yaml_cache.rb date
truffleruby 23.0.0-dev-71f07786, like ruby 3.1.3, GraalVM CE Native [aarch64-darwin]
Caught bad load (Date): Tried to load unspecified class: Date
Caught bad load (Date): Tried to load unspecified class: Date

> ruby -v yaml_cache.rb time
truffleruby 23.0.0-dev-71f07786, like ruby 3.1.3, GraalVM CE Native [aarch64-darwin]
Caught bad load (Time): Tried to load unspecified class: Time
Caught bad load (Time): Tried to load unspecified class: Time

To load the code properly on an interpreter not supported by Bootsnap's YAML cache, we would have to supply the permitted_classes option to YAML.load_file (e.g., YAML.load_file("my.yml", permitted_classes: [Date, Time]). While that will allow the code example to work the same on all Ruby interpreters, it bypasses the YAML cache entirely because the permitted_classes kwarg is not a supported option for the cache.

System configuration

Bootsnap version: 1.15.0

Ruby version: ruby 3.2.0 (2022-12-25 revision a528908271) and TruffleRuby 23.0.0-dev

@byroot
Copy link
Contributor

byroot commented Jan 21, 2023

Thanks for the report, I introduced this in #392 and the intent was:

When loading from cache, we know if load_file or unsafe_load_file is used. unsafe_load_file can use cache generated in safe mode, but load_file won't use caches generated via unsafe_load_file.

Here you seem to always use the unsafe_load_file, and it only succeed on the second call, so I assume the first one raises but somehow still generate the cache? That's weird, but should be too hard to fix.

@nirvdrum
Copy link
Author

If you comment out the first YAML.load_file call in the script above, you'll see that the load succeeds with an empty cache as well. I can dig further into Bootsnap, but my guess is cache faults follows the same logic as cache hits. If the fix is to call the original YAML.load_file when a cache entry doesn't exist, I think that could preserve the non-caching original behavior while still retaining the benefits of the cache on a hit.

@casperisfine
Copy link
Contributor

Ok, I just dug a bit more, and I think the issue is that our strict_load method is based on the assumption that just rejecting tags is enough to not support any extra type, but that's an incorrect assumption because types such as dates don't need tags.

casperisfine pushed a commit that referenced this issue Jan 22, 2023
…ad_file

Fix: #434

Our strictness was based on the incorrect assumption that all extra types
would use a tag, which is incorrect as `Time` and `Date` objects can be
expressed in regular YAML syntax without the use of `!ruby/object`.
@casperisfine
Copy link
Contributor

#435 should fix this.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants