Fix memory bugs in loading code #651

Closed. jart wants to merge 1 commit.

Conversation

@jart (Contributor) commented Mar 31, 2023

This change hardens the C++ code that loads the GGML file format. Some people download weights off the Internet to run inference on a trained model. Since weights don't contain code like graph definitions, it is reasonable to expect that they can be loaded securely. This change therefore addresses many of the weaknesses in how we were going about things earlier, which allowed untrustworthy weights to trigger undefined behavior with memory. I haven't investigated whether any of these weaknesses are exploitable, but once this gets merged it will certainly be more difficult for that to happen, which will let our users share weights more freely, happily, and safely.

break;
}
fin.read((char *)&n_dims, 4);
if (fin.eof()) break;

@sw (Contributor), Apr 1, 2023:

For anyone stumped by this like I was: read_int32 can't be used here the same way as for the following two fields, because we want to handle EOF gracefully, and the EOF bit is not set until you try to read past the end, so you can't check it before reading.

I guess you could do it like this for better consistency (never mind, that causes an error to be printed):

if (!read_int32(fin, &n_dims)) {
  if (fin.eof()) break; else return false;
}

I thought about an optional argument bool allow_eof = false for the check functions, but that just ended up complicating things.
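
For illustration, a minimal sketch of the kind of EOF-tolerant read helper being discussed. The read_int32 name comes from the PR; the allow_eof flag, the read_int32_allow_eof name, and the error message are assumptions for this sketch, not the PR's actual code.

#include <cstdint>
#include <cstdio>
#include <istream>

// Hypothetical EOF-tolerant variant of the PR's read_int32 helper. Returns
// true on success; if allow_eof is set and the stream hits a clean end of
// file before any bytes of the value are read, it also returns true and sets
// *eof so the caller can break out of its read loop.
static bool read_int32_allow_eof(std::istream &fin, int32_t *out,
                                 bool allow_eof, bool *eof) {
    *eof = false;
    fin.read(reinterpret_cast<char *>(out), sizeof(*out));
    if (allow_eof && fin.eof() && fin.gcount() == 0) {
        *eof = true;  // clean EOF: no bytes of the next record were present
        return true;
    }
    if (fin.gcount() != static_cast<std::streamsize>(sizeof(*out))) {
        fprintf(stderr, "failed to read int32 from model file\n");
        return false;
    }
    return true;
}

The tensor-reading loop could then pass allow_eof = true for n_dims only, which is roughly what the optional-argument idea would have amounted to.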

@howard0su (Collaborator) left a comment:

Overall, I suggest removing all the checks around the reads and instead validating that the incoming numbers are in a valid range. This will also make the PR smaller.

static bool check_n_dims(int32_t n_dims) {
if (n_dims == 1) return true;
if (n_dims == 2) return true;
fprintf(stderr,

Collaborator:

ggml supports 4 dimensions although llama only uses 1 or 2.
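
For reference, a hypothetical variant of the check that accepts everything ggml itself supports rather than only what llama uses; the check_ggml_n_dims name and the error message are made up for this sketch.

#include <cstdint>
#include <cstdio>

// Hypothetical variant of check_n_dims that accepts the full range of tensor
// dimensionalities ggml supports (1 through 4) instead of only the 1-D and
// 2-D tensors that llama actually uses.
static bool check_ggml_n_dims(int32_t n_dims) {
    if (n_dims >= 1 && n_dims <= 4) return true;
    fprintf(stderr, "invalid number of tensor dimensions: %d\n", n_dims);
    return false;
}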

if (hFile == INVALID_HANDLE_VALUE) return 0;
if (hFile == INVALID_HANDLE_VALUE) {
LogWindowsError("CreateFileA");
return 0;

Collaborator:

Return NULL here, since the function returns a void* pointer.

return false;
}

static bool read_impl(std::istream &fin, char *buf, std::streamsize len, const char *thing) {

Collaborator:

I don't understand this part. If you already validate the size of the file, how could reading an int/float/buffer fail? Fail because of what?
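
For context, reads can still fail when the file is truncated or its length fields are corrupted, which is exactly what the hardening targets. A sketch of what a helper with this signature might do follows; the body is an assumption, not the PR's actual implementation.

#include <cstdio>
#include <istream>

// Illustrative body for a helper of this shape: read exactly len bytes and,
// on a short or failed read, report which field ("thing") could not be read.
static bool read_impl(std::istream &fin, char *buf, std::streamsize len,
                      const char *thing) {
    fin.read(buf, len);
    if (fin.gcount() != len) {
        fprintf(stderr, "failed to read %s (wanted %lld bytes, got %lld)\n",
                thing, (long long) len, (long long) fin.gcount());
        return false;
    }
    return true;
}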

word.assign(tmp.data(), len);
} else {
word.clear();
}

float score;
fin.read((char *) &score, sizeof(score));
if (!read_float(fin, &score)) return false;

Collaborator:

Check for NaN here. I don't think the read will fail.
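
A minimal sketch of the kind of sanity check being suggested; the check_score helper and its message are hypothetical, and std::isfinite also rejects infinities, which is slightly broader than a pure NaN check.

#include <cmath>
#include <cstdio>

// Hypothetical helper: reject non-finite vocabulary scores, since a corrupted
// file could feed NaN/Inf into later computations without any read error.
static bool check_score(float score) {
    if (!std::isfinite(score)) {
        fprintf(stderr, "invalid (non-finite) vocabulary score in model file\n");
        return false;
    }
    return true;
}

It could be called right after the read_float above.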

@@ -385,13 +457,13 @@ static bool llama_model_load(
fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());

fin.seekg(0, fin.end);
const size_t file_size = fin.tellg();
const int64_t file_size = fin.tellg();

Collaborator:

Check the file_size. We should know a very precise file size for the different models.
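
A sketch of the kind of up-front size check being suggested; it also shows why a signed type matters, since tellg() returns -1 on failure and that value would wrap to a huge number if stored in an unsigned size_t. The check_file_size helper and the min_expected parameter are hypothetical.

#include <cstdint>
#include <cstdio>
#include <fstream>

// Hypothetical up-front validation: determine the file size once and reject
// obviously truncated files, instead of checking EOF on every read.
static bool check_file_size(std::ifstream &fin, int64_t min_expected) {
    fin.seekg(0, fin.end);
    const int64_t file_size = fin.tellg();  // -1 on failure, hence signed
    fin.seekg(0, fin.beg);
    if (file_size < 0) {
        fprintf(stderr, "failed to determine model file size\n");
        return false;
    }
    if (file_size < min_expected) {
        fprintf(stderr, "model file too small: %lld bytes\n",
                (long long) file_size);
        return false;
    }
    return true;
}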

@@ -546,15 +615,15 @@ static bool llama_model_load(
const size_t scale = memory_type == GGML_TYPE_F32 ? 2 : 1;

// this is the total memory required to run the inference
const size_t mem_required =
const int64_t mem_required =

Collaborator:

Why? size_t is correct here, since we are calculating a memory size.

break;
}
fin.read((char *)&n_dims, 4);
if (fin.eof()) break;

Collaborator:

As I mentioned earlier, please check the size initially. Checking EOF on every I/O is not a good idea.


int32_t nelements = 1;
int32_t ne[2] = { 1, 1 };
if (!check_n_dims(n_dims)) return false;

Collaborator:

This check is good.

{
cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
} break;
cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());

Collaborator:

Not related; better done in a separate PR.

@ggerganov (Member) left a comment:

I consider this kind of change to be very low priority and almost unnecessary.

The reason is that this is only really needed when making production-ready software, where you need to care about whether your files got corrupted or whether someone could be trying to exploit the software, etc.

This project is not that, since we are not expecting it to be used in serious / commercial products. llama.cpp is mostly a playground for exploring inference techniques and applications of LLMs. So we want to keep things simple, even at the cost of some good programming practices and proper input sanitisation.

At a later stage, when and if we start building a proper ggml-based engine, we can start considering these kinds of safety issues more seriously. But for now, it's not really worth it.

Since you've already put the work into making this, I am approving it.
But unless you feel super strongly about adding these changes, I recommend closing the PR.

{
fprintf(stderr, "%s: unsupported quantization type %d\n", __func__, type);
return false;
}

Member:

The switch format using {} has to remain as it is.

@ggerganov closed this Apr 13, 2023
AAbushady pushed a commit to AAbushady/llama.cpp that referenced this pull request Jan 31, 2024
This works around a Win32 issue when piping output from a PyInstaller
context, such as when doing so in a perl script or to an output file.
Print statements from a Python context don't properly get output unless
flushed.

This strategically flushes the print statements so no information is
lost, though it may be better to flush all print statements in a Python
context via a subroutine wrapper.

See also:

    https://mail.python.org/pipermail/python-bugs-list/2004-August/024923.html
    https://stackoverflow.com/a/466849
    https://stackoverflow.com/q/62693079