-
Notifications
You must be signed in to change notification settings - Fork 939
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Reduce memory usage by 2/3 by using cut down mime type implementation #829
Conversation
Mail uses the mime/types gem, but uses only a tiny portion of it. Unfortunately, the mime/types gem has a huge memory footprint. This replaces the use of mime/types with an internal library that implements only the parts of mime/types that mail uses, which are: * Determining whether a content type is binary or not, to allow guess_encoding to determine 7bit or binary. * Guessnig the content type based on a filename, if a content type is not explicitly specified. This adds a copy_mime rake task, which will use the mime/types gem and pull out only the information needed for the parts that mail uses, and stores that cache in lib/mail/mime_type_cache.rb. This way if mime/types adds types, it's easy to rebuild the cache. To limit the amount of memory used, the mime types are stored as one of the following two classes: Mail::MimeType (if 7bit) or Mail::MimeType::Binary (if binary). Each just stores the content type as a string. Separate classes are used so that each instance doesn't have to store whether it is binary or not, which saves memory. This is currently just a proof of concept to get feedback, more documentation should be added before merging, at the least. The 2/3 reduction in memory was taken from these numbers: ruby RSS in KB: without require mail: 6204 with require mail, before: 28860 with require mail, after: 14452
The only failure in Travis CI was due to it being unable to connect to GitHub (https://travis-ci.org/mikel/mail/jobs/41050770). All other suites required to pass passed. |
As mail no longer uses mime/types, stop testing many different versions on travis. Also, remove the gemfiles directory, as it is no longer needed.
@@ -0,0 +1,29 @@ | |||
desc "Copy data from mime_types gem" | |||
task :copy_mime do | |||
require 'mime/types' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we add 'mime-types' to the gemfile under something like local_development
and exclude that bundler group on CI? Or should we just assume anyone running this task will gem install mime-types and move on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think assuming that anyone running the task has it installed would be fine. mime/types
would be like the dependency on ragel, where you only need it if you want to regenerate the generated code, not for general development.
This adds some documentation and fixes the file and variable naming. It also has a few bug fixes and improvements: * The previous implementation was not choosing the preferred mime type, only the first one in the array of mime types. This sorts the array by priority before picking the first one, similar to how the mime-types gem works. I'm not sure why mime-types doesn't store the data in a sorted order. * It sorts the hashes when writing the mime_types/cache file, so running the copy_mime rake task multiple times is now idempotent.
OK, I've added some documentation, fixed the variable naming, and fixed a couple of bugs in the code. I think it should be good to go now. Let me know if you'd like additional changes. |
# Return false, since all instances of MimeType have a text/7bit content type. | ||
def binary? | ||
false | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the way this is written, you could create an attr_reader :binary
and in the superclass, in initialize
set @binary = false
and in the subclass, call super
, then set @binary = true
. I'm curious what you think of that. I'm reacting to creating a class solely for over-writing a method in its superclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do that, but I think it takes more memory, since you are setting an additional instance variable in each object. As the whole point of this pull request is to reduce the amount of memory used, I thought this the most efficient approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
Specify the mime-types version in the code, and include that information in the generated file. More variable name tweaking to make things more clear. Fix some minor typos.
I've pushed a new commit that I think takes care of all of the minor issues pointed out in the copy_mime_data rake task. Please let me know if you'd like additional changes. |
@jeremyevans Looks awesome to me. Since it's a big change, I'd like another committer to 👍 it. @jeremy @mikel @dball @arunagw @ConradIrwin @pzb |
- gemfiles/mime_types_2.2.gemfile | ||
- gemfiles/mime_types_2.3.gemfile | ||
- gemfiles/mime_types_2.latest.gemfile | ||
- gemfiles/mime_types_edge.gemfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice bonus!
This is great. Only concern is for folks who update their MIME types and expect Mail to recognize them, not realizing we have a separate frozen snapshot. That should only affect apps that monkey-patch mime-types, so it's not a broad concern, but we may consider a compatibility breaking change (and hence major version bump). Not sure about that. |
@jeremy That's a good argument for adding a post_install_message in the gemspec something like Do you think we should let people opt out of this? Or opt in? |
We should also remove |
I think removing Appraisals is a good idea, but I think it should probably be a separate PR. However, if you want me to add it to this PR, let me know and I'll do that. I'm fine with adding a post-install message if you want to. We'll have to set a constant in the If we want to make this conditional, it's possible to just check if I'm happy to modify this implementation if you want to be opt-in or opt-out, just let me know how you want to proceed. |
Another consideration, is that, e.g. in Rails, there's an initializer # Mime::Type.register "text/richtext", :rtf (like @jeremyevans said above "mime-types supports adding new types at runtime") I imagine one would expect mime types added here to be handled by Mail, though, in practice, I don't know how often that would matter. I'm not comfortable merging this, as great as it is, until we've ironed out any possibly breaking changes. |
It's impossible for mail to offer the same API that mime-types uses for adding types at runtime, since it is class method on I spoke with @halostatue at RubyConf about this, and we discussed possible ways to reduce memory usage inside mime-types. One of the ideas discussed was reusing objects, which I think could reduce the amount of memory the I don't have strong feelings about this pull request, I just want the end result of decreasing memory usage by mail, and this approach seemed to be the simplest way. However we get there is fine with me. |
@jeremyevans re: hesitancy to merge: basically I'm new to the repo and am being especially conservative since I'm maybe more 'active', but other collabs have more domain knowledge etc. Nice talk at rubyconf from what I could tell :) |
I’ll take a look at this when I have an opportunity to do so, but as I told @jeremyevans, it may be a little while before I can get this figured out (I have competing imperatives—some people want richer data; mail needs less data). @SamSaffron has a branch of mime-types 1.x that @charliesome contributed to that added a GDBM cache that may be able to help with this and I can look at adding that to MIME types as something that a user can take advantage of. The biggest problem with any sort of cache mechanism (unless you’re doing it as I understand this change to be doing) is that you need to get the user to opt into this. So, it may be that you need to have documentation that says “reduce mail usage by setting this configuration” which points to the user’s GDBM cache. |
+1 for working on this, I do wonder if loading stuff on demand may be a a db is best though cause there is so much data. On Thu, Nov 20, 2014 at 10:44 AM, Austin Ziegler notifications@github.com
|
@SamSaffron: that’s more or less the direction I want to head, but don’t see making this happen before mime-types 3.0 (theoretically this year, but more practically early next year) because it will mean a fairly large change to the data structure (and I’m trying to figure out how to make that usable by other systems more easily). |
The large memory usage issue is addressed by mime-types 2.6 with the columnar store. I'll send a pull request to use that. |
Mail uses the mime/types gem, but uses only a tiny portion of it.
Unfortunately, the mime/types gem has a huge memory footprint. This
replaces the use of mime/types with an internal library that implements
only the parts of mime/types that mail uses, which are:
guess_encoding to determine 7bit or binary.
not explicitly specified.
This adds a copy_mime rake task, which will use the mime/types gem and
pull out only the information needed for the parts that mail uses, and
stores that cache in lib/mail/mime_type_cache.rb. This way if
mime/types adds types, it's easy to rebuild the cache.
To limit the amount of memory used, the mime types are stored as one of
the following two classes: Mail::MimeType (if 7bit) or
Mail::MimeType::Binary (if binary). Each just stores the content type
as a string. Separate classes are used so that each instance doesn't
have to store whether it is binary or not, which saves memory.
This is currently just a proof of concept to get feedback, more
documentation should be added before merging, at the least.
The 2/3 reduction in memory was taken from these numbers:
ruby RSS in KB:
without require mail: 6204
with require mail, before: 28860
with require mail, after: 14452