Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Missing some IDs on insert. #192

Closed
omeid opened this issue May 28, 2015 · 13 comments
Closed

Missing some IDs on insert. #192

omeid opened this issue May 28, 2015 · 13 comments

Comments

@omeid
Copy link
Contributor

omeid commented May 28, 2015

Running the following code:

package main

import (
    "fmt"
    "log"

    rt "github.com/dancannon/gorethink"
)

type T struct {
    Id string
    A  bool
    B  int
}

func main() {

    db, err := rt.Connect(rt.ConnectOpts{
        Address: rethinkServerAddr,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    db.SetMaxOpenConns(20)
    db.SetMaxIdleConns(20)


    go func() {
        stream, err := rt.Db("test").Table("ts").Changes().Field("new_val").Run(db)
        if err != nil {
            log.Println(err)
            return
        }
        v := T{}
        for stream.Next(&v) {
            //To avoid race condition/overwrite by next call.
            func(x T) {
                fmt.Printf("%#v\n", x)
            }(v)
        }
        err = stream.Err()
        if err != nil {
            log.Println(err)
        }
        log.Println("END OF stream.")
    }()

    for i := 0; i < 30; i++ {
        err := rt.Db("test").Table("ts").Insert(T{A: i%2 == 0, B: i}).Exec(db)
        if err != nil {
            log.Fatal(err)
        }
    }

}

Will give you, some Ids missing:

main.T{Id:"ff5c1e65-9ab1-49b8-befc-b5efca5af414", A:false, B:1}
main.T{Id:"6ca2c8de-9b6e-43d5-910b-fcf849890652", A:true, B:2}
main.T{Id:"9ebb807d-2604-4017-8ad3-70c50a75cac2", A:false, B:3}
main.T{Id:"ce006af3-aba3-4997-8837-5e839b1d9f9d", A:true, B:4}
main.T{Id:"802178f4-29bb-4eea-a54a-75a14ebdee00", A:false, B:5}
main.T{Id:"2ebddde6-5d49-4730-be96-232c64292a79", A:true, B:6}
main.T{Id:"9b7b14da-4030-4cd6-baea-6926c3639faf", A:false, B:7}
main.T{Id:"a0717c2d-dd35-444b-b0f2-7fd8a9aac279", A:true, B:8}
main.T{Id:"d69d5f68-f78d-46ea-853f-59239005fbf3", A:false, B:9}
main.T{Id:"946de5c6-c107-44a6-8e21-f85e54ace4b8", A:true, B:10}
main.T{Id:"bef36a90-e96c-4584-a983-82b626d127a8", A:false, B:11}
main.T{Id:"", A:true, B:12}
main.T{Id:"ad108124-908b-4b2a-92bd-186a8f1878ad", A:false, B:13}
main.T{Id:"4f665222-6edf-4eec-a870-4d2eedd84703", A:true, B:14}
main.T{Id:"", A:false, B:15}
main.T{Id:"d0bf350d-d3f1-4fb7-bc58-2f4ceec927cd", A:true, B:16}
main.T{Id:"", A:false, B:17}
main.T{Id:"fa075833-d07c-454d-bca6-dff1524a3254", A:true, B:18}
main.T{Id:"", A:false, B:19}
main.T{Id:"b1fdda33-aea9-4dae-959c-c8719fc158d6", A:true, B:20}
main.T{Id:"8431c425-424f-4247-a668-6ea15ec88f99", A:false, B:21}
main.T{Id:"", A:true, B:22}
main.T{Id:"d088c93c-879c-4ccd-a10d-d09b4daaa9f4", A:false, B:23}
main.T{Id:"", A:true, B:24}
main.T{Id:"9696a198-7ce9-4dc0-a1f4-02d10ce38412", A:false, B:25}
main.T{Id:"627a2974-1590-4f59-8182-6a81beec84e2", A:true, B:26}
main.T{Id:"a086c9c6-c146-405d-bcf3-2af03b042b6d", A:false, B:27}
main.T{Id:"90f3e7d3-38b0-43cc-b778-9d889023cf5c", A:true, B:28}
main.T{Id:"47c21238-8eb5-419c-9cfb-b2d1df61b02d", A:false, B:29}

While running r.db('test').table('ts').changes()("new_val") in Data Explorer shows a correct id per document.

@omeid omeid changed the title Missing IDs on insert. Missing some IDs on insert. May 28, 2015
@dancannon
Copy link
Collaborator

After running some tests I have tracked this issue down to the code that decodes the database result. Basically the raw result from the database looks something like this:

map[string]interface {}{"A":false, "B":41, "Id":"", "id":"f0d4bc93-048a-4383-8c99-15ef0218a067"}

Because there are two "Id" fields I think that because of how Go randomly iterates through maps the driver randomly uses either "Id" or "id" when decoding into the T struct. I will look into making this more reliable (if possible) but until then you can fix this issue by adding a struct tag, for example:

type T struct {
    Id string `gorethink:"id"`
    A  bool
    B  int
}

@omeid
Copy link
Contributor Author

omeid commented May 29, 2015

Well, here is another bug, adding gorethink:"id" without ,omitempty will send the document with T.id = "" when inserting a T with unset id, this is fine as empty and unset strings in go are the same: "".

However, the driver ignores the "first_error": "Duplicate primary key id returned by the server after the first document with ""/empty id is inserted.

Running the same query, as in inserting multiple documents with "id": "" does indeed give an error about Duplicate key id.

Temporary fix for this issue is to use gorethink:"id,omitempty" but the driver shouldn't ignore "Duplicate key" errors.

@dancannon
Copy link
Collaborator

The issue with the duplicate keys occurs because the DB doesnt return any kind of error response but instead puts the error in the response body.

I think the only way of fixing this is to update the RunWrite method to check the Errors field. Unfortunately this wouldn't help if you were running your insert queries using Exec.

@omeid
Copy link
Contributor Author

omeid commented May 30, 2015

To avoid the issue of Id vs id, perhaps you can walk the Struct instead of the map? I am not sure what the performance penalty would be for going down the reflection heavy path.

@dancannon
Copy link
Collaborator

So I wasnt quite right with my first explanation, this issue is actually due to the Id field being overwritten due to http://golang.org/pkg/reflect/#Value.MapKeys returning the keys in an unspecified order. I will update the driver to sort the result of MapKeys which should at least make the behaviour more predictable.

@dancannon
Copy link
Collaborator

Merged a fix to master. This fix sorts the result of MapKeys ensuring that the result should always be the same.

@omeid
Copy link
Contributor Author

omeid commented Jun 13, 2015

How does this effect performance?

On 8 Jun 2015, at 5:29 am, Daniel Cannon notifications@github.com wrote:

Closed #192.


Reply to this email directly or view it on GitHub.

@dancannon
Copy link
Collaborator

I would hope that it shouldnt affect performance too much, however since I am relying on reflection it could be potentially quite expensive.

Have you noticed any issues since the change?

@omeid
Copy link
Contributor Author

omeid commented Jun 15, 2015

I have not updated yet and was kinda hoping I could wait for v1.0 but happy to try this fix.

My concern is that sorting makes this O(n) where N is the size of struct, this might not be much but it adds up specially when fast writes are necessary where batch insert is not possible.

On 13 Jun 2015, at 8:48 pm, Daniel Cannon notifications@github.com wrote:

I would hope that it shouldnt affect performance too much, however since I am relying on reflection it could be potentially quite expensive.

Have you noticed any issues since the change?


Reply to this email directly or view it on GitHub.

@kofalt
Copy link

kofalt commented Jun 15, 2015

FWIW, I would support requiring annotated structs to handle this.
The set of keys is unbounded, and the advantage of sorting is unclear.

The alternative is always sorting all fields on the off chance that a user has used case-sensitive overlapping keys. Since using JSON structured like that sounds like a royal pain (Id != id), I would guess that most usage of RDB does not do that unless they have to for legacy reasons.

Maybe the path forward is to document how golang struct fields may lose information, and that cautious users should annotate their structs?

@omeid
Copy link
Contributor Author

omeid commented Jun 16, 2015

👍 for documentation over compromising on efficiency.

@dancannon
Copy link
Collaborator

Agreed, Ill revert the change and add some more documentation.

@dancannon
Copy link
Collaborator

Just released a new version with the change + more documentation. Thanks for the help @omeid + @kofalt.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

3 participants