Fix memory bugs in loading code #651
Conversation
This change hardens the C++ code that loads the GGML file format. Some people download weights off the Internet to run inference on a trained model. Since weights don't contain code such as graph definitions, it's reasonable to expect that they can be loaded securely. This change therefore addresses many of the weaknesses in how we were doing things before, which allowed untrustworthy weights to trigger undefined behavior through memory errors. I haven't investigated whether any of these weaknesses are actually exploitable, but once this gets merged it will certainly be more difficult for that to happen, which will let our users share weights more freely, happily, and safely.
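For context, the general pattern the diff introduces is to route every read through small checked helpers (read_impl, read_int32, and read_float appear in the hunks below). A minimal sketch of that pattern, simplified and not the exact code from the diff:

#include <cstdint>
#include <cstdio>
#include <istream>

// Simplified sketch: report failure instead of silently using
// uninitialized or partially-read values from an untrusted file.
static bool read_impl(std::istream &fin, char *buf, std::streamsize len, const char *thing) {
    fin.read(buf, len);
    if (fin.gcount() != len) {
        fprintf(stderr, "failed to read %s\n", thing);
        return false;
    }
    return true;
}

static bool read_int32(std::istream &fin, int32_t *out) {
    return read_impl(fin, (char *) out, sizeof(*out), "int32");
}

static bool read_float(std::istream &fin, float *out) {
    return read_impl(fin, (char *) out, sizeof(*out), "float");
}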
        break;
    }
    fin.read((char *)&n_dims, 4);
    if (fin.eof()) break;
For anyone being stumped by this like I was: read_int32 can't be used here the same way as for the following two fields, because we want to handle EOF gracefully, and the EOF bit is not set until you try to read past the end, so you can't check it before reading.
I guess you could do it like this for better consistency:

if (!read_int32(fin, &n_dims)) {
    if (fin.eof()) break; else return false;
}

Nvm, that causes an error to be printed.
I thought about an optional argument bool allow_eof = false for the check functions, but that just ended up complicating things.
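(For illustration only, a rough sketch of what such an allow_eof variant might have looked like; this is hypothetical and not part of the PR:)

static bool read_int32(std::istream &fin, int32_t *out, bool allow_eof = false) {
    fin.read((char *) out, sizeof(*out));
    if (fin.gcount() != (std::streamsize) sizeof(*out)) {
        // stay quiet on a clean EOF so the caller can just break out of the loop
        if (!(allow_eof && fin.eof() && fin.gcount() == 0)) {
            fprintf(stderr, "failed to read int32\n");
        }
        return false;
    }
    return true;
}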
Overall, I suggest removing all the checks around reads and instead validating that the incoming numbers are in a valid range. This will also make the PR smaller.
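A rough sketch of the kind of range validation being suggested (hypothetical names and bounds, just to illustrate the idea):

// Hypothetical: reject implausible shape values right after reading them,
// before they are used to size buffers or drive loops.
static bool check_tensor_shape(int32_t n_dims, const int32_t ne[2]) {
    if (n_dims < 1 || n_dims > 2) return false;
    for (int i = 0; i < n_dims; i++) {
        if (ne[i] < 1 || ne[i] > (1 << 24)) return false; // arbitrary sanity bound
    }
    return true;
}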
static bool check_n_dims(int32_t n_dims) {
    if (n_dims == 1) return true;
    if (n_dims == 2) return true;
    fprintf(stderr,
ggml supports up to 4 dimensions, although llama only uses 1 or 2.
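If the loader wanted to accept everything ggml itself supports, the check might look more like this (sketch only):

static bool check_n_dims(int32_t n_dims) {
    if (n_dims >= 1 && n_dims <= 4) return true; // ggml tensors can have up to 4 dimensions
    fprintf(stderr, "invalid number of dimensions: %d\n", n_dims);
    return false;
}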
if (hFile == INVALID_HANDLE_VALUE) return 0;
if (hFile == INVALID_HANDLE_VALUE) {
    LogWindowsError("CreateFileA");
    return 0;
Return NULL here, since the return type is void*, i.e. a pointer.
    return false;
}

static bool read_impl(std::istream &fin, char *buf, std::streamsize len, const char *thing) {
I don't understand this. If you already validate the size of the file, how could reading an int/float/buffer fail? Fail because of what?
    word.assign(tmp.data(), len);
} else {
    word.clear();
}

float score;
fin.read((char *) &score, sizeof(score));
if (!read_float(fin, &score)) return false;
Check for NaN here. I don't think the read itself will fail.
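A sketch of the NaN check being suggested (assumes <cmath> is included; not part of the PR):

float score;
if (!read_float(fin, &score)) return false;
if (std::isnan(score)) {
    fprintf(stderr, "invalid (NaN) vocab score\n");
    return false;
}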
@@ -385,13 +457,13 @@ static bool llama_model_load(
    fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());

    fin.seekg(0, fin.end);
    const size_t file_size = fin.tellg();
    const int64_t file_size = fin.tellg();
Check the file_size. We should have a very precise file size for the different models.
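A hypothetical sketch of that kind of check; the expected size would have to be computed from the model's hyperparameters, which this snippet does not do:

static bool check_file_size(int64_t file_size, int64_t expected_size) {
    if (file_size == expected_size) return true;
    fprintf(stderr, "unexpected file size %lld (expected %lld)\n",
            (long long) file_size, (long long) expected_size);
    return false;
}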
@@ -546,15 +615,15 @@ static bool llama_model_load(
    const size_t scale = memory_type == GGML_TYPE_F32 ? 2 : 1;

    // this is the total memory required to run the inference
    const size_t mem_required =
    const int64_t mem_required =
Why? size_t is correct here, as we are calculating a memory size.
        break;
    }
    fin.read((char *)&n_dims, 4);
    if (fin.eof()) break;
As I mentioned earlier, please check the size initially. Checking EOF on every I/O is not a good idea.
    int32_t nelements = 1;
    int32_t ne[2] = { 1, 1 };
    if (!check_n_dims(n_dims)) return false;
this check is good.
    {
        cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
    } break;
    cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
Not related. Better done in a separate PR.
I consider these kinds of changes very low priority and almost unnecessary.
The reason is that this is only really needed when making production-ready software, where you need to care about whether your files got corrupted, whether someone could be trying to exploit the software, etc.
This project is not that, since we are not expecting it to be used in serious / commercial products. llama.cpp is mostly a playground for exploring inference techniques and applications of LLMs. So we want to keep things simple, even at the cost of some good programming practices and proper input sanitisation.
At a later stage, when and if we start building a proper ggml-based engine, we can start considering these kinds of safety issues more seriously. But for now, it's not really worth it.
Since you've already put the work into making this, I am approving it.
But unless you feel super strongly about adding these changes, I recommend closing the PR.
{
    fprintf(stderr, "%s: unsupported quantization type %d\n", __func__, type);
    return false;
}
The switch format using {} has to remain as it is.
This works around a Win32 issue when piping output from a PyInstaller context, such as when doing so in a perl script or to an output file. Print statements from a Python context don't properly get output unless flushed. This strategically flushes the print statements so no information is lost, though it may be better to flush all print statements in a Python context via a subroutine wrapper.
See also:
https://mail.python.org/pipermail/python-bugs-list/2004-August/024923.html
https://stackoverflow.com/a/466849
https://stackoverflow.com/q/62693079