
Parser combinator framework consumes an unnecessary amount of memory #319


Open
scabug opened this issue Oct 12, 2012 · 2 comments

Comments


scabug commented Oct 12, 2012

Since all Readers provided by the parser combinator framework use a PagedSeq internally, using these parsers on large files seems impossible, because the PagedSeq never releases already-parsed elements.
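
To make the retention concrete, here is a rough sketch (illustrative, not from the report) of what RegexParsers.parseAll does with a java.io.Reader: consuming input only yields new readers with a larger offset into the same PagedSeq, so every page read so far stays strongly reachable.

import scala.collection.immutable.PagedSeq
import scala.util.parsing.input.PagedSeqReader

// roughly how parseAll wraps a java.io.Reader
val seq   = PagedSeq.fromReader(new java.io.StringReader("imagine a huge file here"))
val start = new PagedSeqReader(seq)

// "consuming" input just produces a reader with a larger offset into the
// *same* PagedSeq, so the pages behind the offset can never be collected
// while any derived reader is still alive
val later = start.drop(10)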

For example, consider parsing a 1 GB file from which you need only a portion of the information (you may want to skip headers, comments, etc.). The PagedSeq will hold on to the whole 1 GB until parsing finishes and the GC can step in.

Example code:

import collection.immutable.PagedSeq
import util.parsing.combinator._
import util.parsing.input._

// virtual file reader (simulates a ~400 MB file without touching disk)
def in = new java.io.Reader {
  var buffersRead = 0
  def read(cbuf: Array[Char], offset: Int, len: Int) = {
    if (buffersRead < 100000) {
      // fill the requested slice with dummy data, per the java.io.Reader contract
      java.util.Arrays.fill(cbuf, offset, offset + len, 't')
      buffersRead += 1
      len
    } else -1 // signal end of stream after 100000 buffers
  }
  def close() {}
}

def parser = new RegexParsers {
  var gcCountdown = 0
  // a trivial parser: returns the current character, then skips ahead 1024
  // characters; forces a GC every 10000 invocations so that the observed
  // heap usage reflects live data rather than uncollected garbage
  def tt = new Parser[Char] {
    def apply(in: Input) = {
      gcCountdown += 1
      if (gcCountdown > 10000) {
        System.gc()
        gcCountdown = 0
      }
      if (in.atEnd)
        Failure("end of input", in)
      else
        Success(in.first, in.drop(1024))
    }
  }
  // parse the whole simulated file and return the number of parsed elements
  def go(in: java.io.Reader) = parseAll(tt.*, in).get.size
}
println(parser.go(in))
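
(Assuming it is run as a script, which the report does not state: save the snippet as, say, pagedseq-memory.scala, start it with an enlarged heap, e.g. scala -J-Xmx2g pagedseq-memory.scala, and attach jvisualvm to the running process.)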

If you look at the memory usage with something like jvisualvm, you will notice that this process consumes about 800 MB of RAM just to produce roughly 400 KB worth of parsed characters, because the entire ~400M-character input is kept in memory for the whole parse.
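
A possible workaround, sketched below (the StreamingCharReader name and its page size are made up for illustration, and this is not part of the original report): a Reader[Char] that loads one page at a time from the java.io.Reader and keeps no reference to pages it has already moved past, so consumed input becomes eligible for collection.

import scala.util.parsing.input.{CharSequenceReader, Position, Reader}

// Hypothetical sketch: a character Reader that streams pages from a
// java.io.Reader and forgets pages it has moved past, so the GC can reclaim
// already-consumed input. Positions are reduced to a flat character offset.
object StreamingCharReader {
  private val PageSize = 8192

  def apply(source: java.io.Reader): StreamingCharReader = {
    val buf = new Array[Char](PageSize)
    val n = source.read(buf)
    new StreamingCharReader(source, buf, math.max(n, 0), 0, 0)
  }
}

class StreamingCharReader private (source: java.io.Reader,
                                   page: Array[Char],
                                   pageLen: Int,
                                   idx: Int,
                                   charOffset: Int) extends Reader[Char] {

  // the last read returned no characters: treat as end of input
  // (assumes the underlying java.io.Reader never returns 0 before EOF)
  def atEnd: Boolean = pageLen == 0

  def first: Char = if (atEnd) CharSequenceReader.EofCh else page(idx)

  def rest: Reader[Char] =
    if (atEnd) this
    else if (idx + 1 < pageLen)
      // stay on the current page
      new StreamingCharReader(source, page, pageLen, idx + 1, charOffset + 1)
    else {
      // current page is fully consumed: load the next one and drop the old array
      val buf = new Array[Char](StreamingCharReader.PageSize)
      val n = source.read(buf)
      new StreamingCharReader(source, buf, math.max(n, 0), 0, charOffset + 1)
    }

  // minimal position info: just the character offset
  def pos: Position = new Position {
    def line = 1
    def column = charOffset + 1
    protected def lineContents = ""
  }
}

Two caveats with this approach: once a page has been dropped, backtracking to a reader positioned on it no longer works, and RegexParsers' regex and literal parsers cannot use such a reader, since they read the input through source/offset; it only helps with hand-written parsers like tt above.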


scabug commented Oct 12, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6520?orig=1
Reporter: Platon Pronko (rogach)
Affected Versions: 2.9.2


scabug commented Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

scabug closed this as completed Jul 17, 2015
SethTisue transferred this issue from scala/bug Nov 19, 2020
scala deleted a comment from scabug Nov 19, 2020
SethTisue reopened this Nov 19, 2020