Fix memory bugs in loading code #651
Conversation
This change hardens the C++ code that loads the GGML file format. Some people download weights off the Internet to run inference on a trained model. Since weights don't contain code such as graph definitions, it's reasonable to expect that they can be loaded securely. This change therefore addresses many of the weaknesses in how we were doing things before, which allowed untrustworthy weights to trigger undefined behavior through memory errors. I haven't investigated whether any of these weaknesses are actually exploitable, but once this gets merged it will certainly be more difficult for that to happen, which will let our users share weights more freely, happily, and safely.
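For context, the general pattern the diff introduces is to route every read through small checked helpers (read_impl, read_int32, and read_float appear in the hunks below). A minimal sketch of that pattern, simplified and not the exact code from the diff:

#include <cstdint>
#include <cstdio>
#include <istream>

// Simplified sketch: report failure instead of silently using
// uninitialized or partially-read values from an untrusted file.
static bool read_impl(std::istream &fin, char *buf, std::streamsize len, const char *thing) {
    fin.read(buf, len);
    if (fin.gcount() != len) {
        fprintf(stderr, "failed to read %s\n", thing);
        return false;
    }
    return true;
}

static bool read_int32(std::istream &fin, int32_t *out) {
    return read_impl(fin, (char *) out, sizeof(*out), "int32");
}

static bool read_float(std::istream &fin, float *out) {
    return read_impl(fin, (char *) out, sizeof(*out), "float");
}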
        break;
    }
    fin.read((char *)&n_dims, 4);
    if (fin.eof()) break;
For anyone being stumped by this like I was: read_int32 can't be used here the same way as for the following two fields, because we want to handle EOF gracefully, and the EOF bit is not set until you try to read past the end, so you can't check it before reading.
I guess you could do it like this for better consistency:

if (!read_int32(fin, &n_dims)) {
    if (fin.eof()) break; else return false;
}

Nvm, that causes an error to be printed.
I thought about an optional argument bool allow_eof = false for the check functions, but that just ended up complicating things.
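(For illustration only, a rough sketch of what such an allow_eof variant might have looked like; this is hypothetical and not part of the PR:)

static bool read_int32(std::istream &fin, int32_t *out, bool allow_eof = false) {
    fin.read((char *) out, sizeof(*out));
    if (fin.gcount() != (std::streamsize) sizeof(*out)) {
        // stay quiet on a clean EOF so the caller can just break out of the loop
        if (!(allow_eof && fin.eof() && fin.gcount() == 0)) {
            fprintf(stderr, "failed to read int32\n");
        }
        return false;
    }
    return true;
}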
Overall, I suggest removing all the checks around reads and instead validating that the incoming numbers are in a valid range. This will also make the PR smaller.
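A rough sketch of the kind of range validation being suggested (hypothetical names and bounds, just to illustrate the idea):

// Hypothetical: reject implausible shape values right after reading them,
// before they are used to size buffers or drive loops.
static bool check_tensor_shape(int32_t n_dims, const int32_t ne[2]) {
    if (n_dims < 1 || n_dims > 2) return false;
    for (int i = 0; i < n_dims; i++) {
        if (ne[i] < 1 || ne[i] > (1 << 24)) return false; // arbitrary sanity bound
    }
    return true;
}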
static bool check_n_dims(int32_t n_dims) {
    if (n_dims == 1) return true;
    if (n_dims == 2) return true;
    fprintf(stderr,
ggml supports up to 4 dimensions, although llama only uses 1 or 2.
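If the loader wanted to accept everything ggml itself supports, the check might look more like this (sketch only):

static bool check_n_dims(int32_t n_dims) {
    if (n_dims >= 1 && n_dims <= 4) return true; // ggml tensors can have up to 4 dimensions
    fprintf(stderr, "invalid number of dimensions: %d\n", n_dims);
    return false;
}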
if (hFile == INVALID_HANDLE_VALUE) return 0;
if (hFile == INVALID_HANDLE_VALUE) {
    LogWindowsError("CreateFileA");
    return 0;
Return NULL here, since the return type is void*, i.e. a pointer.
    return false;
}

static bool read_impl(std::istream &fin, char *buf, std::streamsize len, const char *thing) {
I don't understand this. If you already validate the size of the file, how could reading an int/float/buffer fail? Fail because of what?
    word.assign(tmp.data(), len);
} else {
    word.clear();
}

float score;
fin.read((char *) &score, sizeof(score));
if (!read_float(fin, &score)) return false;
Check for NaN here. I don't think the read itself will fail.
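A sketch of the NaN check being suggested (assumes <cmath> is included; not part of the PR):

float score;
if (!read_float(fin, &score)) return false;
if (std::isnan(score)) {
    fprintf(stderr, "invalid (NaN) vocab score\n");
    return false;
}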
@@ -385,13 +457,13 @@ static bool llama_model_load(
    fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());

    fin.seekg(0, fin.end);
    const size_t file_size = fin.tellg();
    const int64_t file_size = fin.tellg();
Check the file_size. We should have a very precise file size for the different models.
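A hypothetical sketch of that kind of check; the expected size would have to be computed from the model's hyperparameters, which this snippet does not do:

static bool check_file_size(int64_t file_size, int64_t expected_size) {
    if (file_size == expected_size) return true;
    fprintf(stderr, "unexpected file size %lld (expected %lld)\n",
            (long long) file_size, (long long) expected_size);
    return false;
}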
@@ -546,15 +615,15 @@ static bool llama_model_load(
    const size_t scale = memory_type == GGML_TYPE_F32 ? 2 : 1;

    // this is the total memory required to run the inference
    const size_t mem_required =
    const int64_t mem_required =
Why? size_t is correct here, as we are calculating a memory size.
        break;
    }
    fin.read((char *)&n_dims, 4);
    if (fin.eof()) break;
As I mentioned earlier, please check the size initially. Checking EOF on every I/O is not a good idea.
    int32_t nelements = 1;
    int32_t ne[2] = { 1, 1 };
    if (!check_n_dims(n_dims)) return false;
this check is good.
    {
        cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
    } break;
    cur_size = ggml_quantize_q4_0(data_f32.data(), work.data(), nelements, ne[0], hist_cur.data());
Not related. Better done in a separate PR.
I consider these kinds of changes very low priority and almost unnecessary.
The reason is that this is only really needed when making production-ready software, where you need to care about whether your files got corrupted, whether someone could be trying to exploit the software, etc.
This project is not that, since we are not expecting it to be used in serious / commercial products. llama.cpp is mostly a playground for exploring inference techniques and applications of LLMs. So we want to keep things simple, even at the cost of some good programming practices and proper input sanitisation.
At a later stage, when and if we start building a proper ggml-based engine, we can start considering these kinds of safety issues more seriously. But for now, it's not really worth it.
Since you've already put the work into making this, I am approving it.
But unless you feel super strongly about adding these changes, I recommend closing the PR.
{
    fprintf(stderr, "%s: unsupported quantization type %d\n", __func__, type);
    return false;
}
The switch format using {} has to remain as it is.
This works around a Win32 issue when piping output from a PyInstaller context, such as when doing so in a perl script or to an output file. Print statements from a Python context don't properly get output unless flushed. This strategically flushes the print statements so no information is lost, though it may be better to flush all print statements in a Python context via a subroutine wrapper.
See also:
https://mail.python.org/pipermail/python-bugs-list/2004-August/024923.html
https://stackoverflow.com/a/466849
https://stackoverflow.com/q/62693079