Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Reduce memory usage by 2/3 by using cut down mime type implementation #829

Closed
wants to merge 5 commits into from

Conversation

jeremyevans
Copy link
Contributor

Mail uses the mime/types gem, but uses only a tiny portion of it.
Unfortunately, the mime/types gem has a huge memory footprint. This
replaces the use of mime/types with an internal library that implements
only the parts of mime/types that mail uses, which are:

  • Determining whether a content type is binary or not, to allow
    guess_encoding to determine 7bit or binary.
  • Guessnig the content type based on a filename, if a content type is
    not explicitly specified.

This adds a copy_mime rake task, which will use the mime/types gem and
pull out only the information needed for the parts that mail uses, and
stores that cache in lib/mail/mime_type_cache.rb. This way if
mime/types adds types, it's easy to rebuild the cache.

To limit the amount of memory used, the mime types are stored as one of
the following two classes: Mail::MimeType (if 7bit) or
Mail::MimeType::Binary (if binary). Each just stores the content type
as a string. Separate classes are used so that each instance doesn't
have to store whether it is binary or not, which saves memory.

This is currently just a proof of concept to get feedback, more
documentation should be added before merging, at the least.

The 2/3 reduction in memory was taken from these numbers:

ruby RSS in KB:
without require mail: 6204
with require mail, before: 28860
with require mail, after: 14452

Mail uses the mime/types gem, but uses only a tiny portion of it.
Unfortunately, the mime/types gem has a huge memory footprint.  This
replaces the use of mime/types with an internal library that implements
only the parts of mime/types that mail uses, which are:

* Determining whether a content type is binary or not, to allow
  guess_encoding to determine 7bit or binary.
* Guessnig the content type based on a filename, if a content type is
  not explicitly specified.

This adds a copy_mime rake task, which will use the mime/types gem and
pull out only the information needed for the parts that mail uses, and
stores that cache in lib/mail/mime_type_cache.rb.  This way if
mime/types adds types, it's easy to rebuild the cache.

To limit the amount of memory used, the mime types are stored as one of
the following two classes: Mail::MimeType (if 7bit) or
Mail::MimeType::Binary (if binary).  Each just stores the content type
as a string.  Separate classes are used so that each instance doesn't
have to store whether it is binary or not, which saves memory.

This is currently just a proof of concept to get feedback, more
documentation should be added before merging, at the least.

The 2/3 reduction in memory was taken from these numbers:

ruby RSS in KB:
without require mail:       6204
with require mail, before: 28860
with require mail, after:  14452
@jeremyevans
Copy link
Contributor Author

The only failure in Travis CI was due to it being unable to connect to GitHub (https://travis-ci.org/mikel/mail/jobs/41050770). All other suites required to pass passed.

As mail no longer uses mime/types, stop testing many different
versions on travis.  Also, remove the gemfiles directory, as it
is no longer needed.
@@ -0,0 +1,29 @@
desc "Copy data from mime_types gem"
task :copy_mime do
require 'mime/types'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we add 'mime-types' to the gemfile under something like local_development and exclude that bundler group on CI? Or should we just assume anyone running this task will gem install mime-types and move on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assuming that anyone running the task has it installed would be fine. mime/types would be like the dependency on ragel, where you only need it if you want to regenerate the generated code, not for general development.

This adds some documentation and fixes the file and variable naming.

It also has a few bug fixes and improvements:

* The previous implementation was not choosing the preferred mime
  type, only the first one in the array of mime types.  This sorts
  the array by priority before picking the first one, similar to
  how the mime-types gem works.  I'm not sure why mime-types doesn't
  store the data in a sorted order.

* It sorts the hashes when writing the mime_types/cache file, so
  running the copy_mime rake task multiple times is now idempotent.
@jeremyevans
Copy link
Contributor Author

OK, I've added some documentation, fixed the variable naming, and fixed a couple of bugs in the code. I think it should be good to go now. Let me know if you'd like additional changes.

# Return false, since all instances of MimeType have a text/7bit content type.
def binary?
false
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the way this is written, you could create an attr_reader :binary and in the superclass, in initialize set @binary = false and in the subclass, call super, then set @binary = true. I'm curious what you think of that. I'm reacting to creating a class solely for over-writing a method in its superclass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do that, but I think it takes more memory, since you are setting an additional instance variable in each object. As the whole point of this pull request is to reduce the amount of memory used, I thought this the most efficient approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@bf4
Copy link
Contributor

bf4 commented Nov 17, 2014

@jeremy @mikel Any thoughts on this? We should add this as a step in the release process (when we document that.. :)

Specify the mime-types version in the code, and include that
information in the generated file.

More variable name tweaking to make things more clear.

Fix some minor typos.
@jeremyevans
Copy link
Contributor Author

I've pushed a new commit that I think takes care of all of the minor issues pointed out in the copy_mime_data rake task. Please let me know if you'd like additional changes.

@bf4
Copy link
Contributor

bf4 commented Nov 17, 2014

@jeremyevans Looks awesome to me. Since it's a big change, I'd like another committer to 👍 it. @jeremy @mikel @dball @arunagw @ConradIrwin @pzb

- gemfiles/mime_types_2.2.gemfile
- gemfiles/mime_types_2.3.gemfile
- gemfiles/mime_types_2.latest.gemfile
- gemfiles/mime_types_edge.gemfile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice bonus!

@jeremy
Copy link
Collaborator

jeremy commented Nov 17, 2014

This is great. Only concern is for folks who update their MIME types and expect Mail to recognize them, not realizing we have a separate frozen snapshot. That should only affect apps that monkey-patch mime-types, so it's not a broad concern, but we may consider a compatibility breaking change (and hence major version bump). Not sure about that.

@bf4
Copy link
Contributor

bf4 commented Nov 17, 2014

@jeremy That's a good argument for adding a post_install_message in the gemspec something like Using cached mime-types from version 2.4.3.

Do you think we should let people opt out of this? Or opt in?

@bf4
Copy link
Contributor

bf4 commented Nov 17, 2014

We should also remove https://github.com/mikel/mail/blob/master/Appraisals and remove appraisals from the Gemfile. It wouldn't be entirely out of scope for this PR.

@jeremyevans
Copy link
Contributor Author

mime-types supports adding new types at runtime, so you don't even need to monkey patch it to add types. This implementation does not currently support similar behavior, but it would be easy to add if it is desired.

I think removing Appraisals is a good idea, but I think it should probably be a separate PR. However, if you want me to add it to this PR, let me know and I'll do that.

I'm fine with adding a post-install message if you want to. We'll have to set a constant in the mime_type/cache.rb file containing the mime-types version, instead of just including it as a comment.

If we want to make this conditional, it's possible to just check if ::MIME::Type is defined at run time and use that if it is instead of the internal MimeType implementation. But then you pay the memory penalty for both the internal implementation and the mime-types implemenation. An alterative approach is checking for ::MIME::Type at mail.rb load time and not loading mail/mime_type.rb in that case. That makes it so you don't pay the memory penalty for both, but it makes code order dependent.

I'm happy to modify this implementation if you want to be opt-in or opt-out, just let me know how you want to proceed.

@bf4
Copy link
Contributor

bf4 commented Nov 19, 2014

Another consideration, is that, e.g. in Rails, there's an initializer

# Mime::Type.register "text/richtext", :rtf

(like @jeremyevans said above "mime-types supports adding new types at runtime")

I imagine one would expect mime types added here to be handled by Mail, though, in practice, I don't know how often that would matter.

I'm not comfortable merging this, as great as it is, until we've ironed out any possibly breaking changes.

@jeremyevans
Copy link
Contributor Author

It's impossible for mail to offer the same API that mime-types uses for adding types at runtime, since it is class method on Mime::Type. @bf4, I couldn't tell from your message whether being backwards incompatible in this way would be acceptable with a major version bump, or whether it would just not be acceptable at all. If backwards compatibility cannot be broken in such a manner, it is pointless to pursue this pull request further.

I spoke with @halostatue at RubyConf about this, and we discussed possible ways to reduce memory usage inside mime-types. One of the ideas discussed was reusing objects, which I think could reduce the amount of memory the lib/mail/mime_type/cache.rb file uses by 1/2 to 2/3. Another was possibly storing the mime-type information in a file-backed database. I opened up an issue in mime-types to discuss possibly ways to decrease memory usage.

I don't have strong feelings about this pull request, I just want the end result of decreasing memory usage by mail, and this approach seemed to be the simplest way. However we get there is fine with me.

@bf4
Copy link
Contributor

bf4 commented Nov 19, 2014

@jeremyevans re: hesitancy to merge: basically I'm new to the repo and am being especially conservative since I'm maybe more 'active', but other collabs have more domain knowledge etc.

Nice talk at rubyconf from what I could tell :)

@halostatue
Copy link
Contributor

I’ll take a look at this when I have an opportunity to do so, but as I told @jeremyevans, it may be a little while before I can get this figured out (I have competing imperatives—some people want richer data; mail needs less data). @SamSaffron has a branch of mime-types 1.x that @charliesome contributed to that added a GDBM cache that may be able to help with this and I can look at adding that to MIME types as something that a user can take advantage of. The biggest problem with any sort of cache mechanism (unless you’re doing it as I understand this change to be doing) is that you need to get the user to opt into this.

So, it may be that you need to have documentation that says “reduce mail usage by setting this configuration” which points to the user’s GDBM cache.

@SamSaffron
Copy link
Contributor

+1 for working on this, I do wonder if loading stuff on demand may be a
good stopgap ... you load a core / minimal amount of information and when
people ask for atypical mime type stuff you load the rest.

a db is best though cause there is so much data.

On Thu, Nov 20, 2014 at 10:44 AM, Austin Ziegler notifications@github.com
wrote:

I’ll take a look at this when I have an opportunity to do so, but as I
told @jeremyevans https://github.com/jeremyevans, it may be a little
while before I can get this figured out (I have competing imperatives—some
people want richer data; mail needs less data). @SamSaffron
https://github.com/SamSaffron has a branch of mime-types 1.x that
@charliesome https://github.com/charliesome contributed to that added a
GDBM cache that may be able to help with this and I can look at adding that
to MIME types as something that a user can take advantage of. The biggest
problem with any sort of cache mechanism (unless you’re doing it as I
understand this change to be doing) is that you need to get the user to opt
into this.

So, it may be that you need to have documentation that says “reduce
mail usage by setting this configuration” which points to the user’s GDBM
cache.


Reply to this email directly or view it on GitHub
#829 (comment).

@halostatue
Copy link
Contributor

@SamSaffron: that’s more or less the direction I want to head, but don’t see making this happen before mime-types 3.0 (theoretically this year, but more practically early next year) because it will mean a fairly large change to the data structure (and I’m trying to figure out how to make that usable by other systems more easily).

@jeremyevans
Copy link
Contributor Author

The large memory usage issue is addressed by mime-types 2.6 with the columnar store. I'll send a pull request to use that.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants