Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implement ToPyObject for bytes::Bytes #1992

Open
Eric-Arellano opened this issue Nov 15, 2021 · 13 comments
Open

Implement ToPyObject for bytes::Bytes #1992

Eric-Arellano opened this issue Nov 15, 2021 · 13 comments

Comments

@Eric-Arellano
Copy link
Contributor

The bytes crate from Tokio is really useful for operating on bytes in a performant way: https://docs.rs/crate/bytes/1.1.0.

We use it in Pants. I'd love to convert a #[pyfunction] we have to be able to stop acquiring the GIL inside of a future so that we can better use py.allow_threads(||). But lifetimes mean that we only have access to &[u8] in the future, so we would need to convert into Vec<u8>, which is more copying than we can tolerate, especially because iirc that Vec<u8> would then get copied again into the PyBytes?. https://github.com/pantsbuild/pants/blob/d62a3cd87c114e379748edbfa279f6fa11da6fd5/src/rust/engine/src/externs/interface.rs#L1399

Instead, I think I'm hoping to have that future convert the &[u8] into Bytes (which does zero copying). And then PyO3 will convert Bytes into PyBytes automatically outside of my #[pyfunction] - allowing me to use py.allow_threads() the whole time. Does that make sense?

I imagine this would be a new Cargo feature.

I'm very happy to contribute this!

@mejrs
Copy link
Member

mejrs commented Nov 15, 2021

convert the &[u8] into Bytes (which does zero copying)

Only if the slice has a static lifetime. Otherwise the slice has to be copied regardless.

@davidhewitt
Copy link
Member

This would be very welcome IMO! I've wondered about adding an optional bytes dependency myself in the past but never got around to it. There's a bunch of optional deps like num-bigint with conversions added in src/conversions - you could add a new bytes.rs file following those examples. 👍

@Eric-Arellano
Copy link
Contributor Author

Only if the slice has a static lifetime. Otherwise the slice has to be copied regardless.

Oh, hm, yeah you're right: https://docs.rs/bytes/1.1.0/bytes/struct.Bytes.html#method.copy_from_slice Good point. We can possibly change that internal API to return Bytes though rather than &[u8], though, so this feature request would still be really helpful to Pants.

you could add a new bytes.rs file following those examples. 👍

Sweet! Thanks.

@mejrs
Copy link
Member

mejrs commented Nov 15, 2021

I don't think we need anything new for this. Converting Bytes into PyBytes works today (specifically, Bytes autodereferences to &[u8]):

use pyo3::prelude::*;
use pyo3::types::PyBytes;
use bytes::Bytes;

fn main() {
    let bytes = Bytes::from_static(&[1,2,3,4]);
    
    Python::with_gil(|py|{
        let pb = PyBytes::new(py, &bytes);
        dbg!(pb);
    })
}

Looking at the capi and Bytes' implementation it doesn't look like we can do something clever here.

@davidhewitt
Copy link
Member

Bytes may not work as a return value from #[pyfunction] without an IntoPy<PyObject> trait implementation? (And similarly as a #[pyfunction] argument.)

@mejrs
Copy link
Member

mejrs commented Nov 15, 2021

That is true, and it would be more convenient. But it has some footgunny behavior - I can totally see someone starting with a Vec<u8>, converting it to a Bytes (clone one), which then gets converted (second clone) into a PyBytes. While it would in fact be more performant to just use PyBytes::new(py, &my_vec) (cloning only once).

I don't think it would be a disaster or anything, but I think it would be a mistake to offer this api if we can't implement it in a clever way.

Perhaps this is best offered in a container-agnostic way, like impl IntoPy<Py<PyBytes>> for AsRef<[u8]> (would that work? I haven't tried), which also requires an explicit conversion.

@davidhewitt
Copy link
Member

I believe Bytes::from(vec) just takes ownership of the underlying buffer rather than having to copy the memory (but I could be wrong).

impl IntoPy<Py<PyBytes>> for AsRef<[u8]>

Implementations like this are possible, though I kinda don't like them because I think each Rust type should map to only one Python type. In the future I would like to remove the generic parameter from IntoPy and instead have an associated type, but last time I looked at that it needs min_specialization to be able to deal with some edge cases like () mapping to Python tuple except in return values where it should map to None.

@mejrs
Copy link
Member

mejrs commented Nov 15, 2021

I believe Bytes::from(vec) just takes ownership of the underlying buffer rather than having to copy the memory (but I could be wrong).

It goes through a boxed slice, which can reallocate if there is excess capacity.

@davidhewitt
Copy link
Member

davidhewitt commented Nov 15, 2021

It goes through a boxed slice, which can reallocate if there is excess capacity.

Ah interesting, I wasn't aware of that. I guess that's one of those impossible API questions where different users want different things 🤷

EDIT: actually thinking about it, not reallocating and continuing to hold the extra capacity is a perf optimization that would confuse a lot of users seeing excess memory consumption, so std probably makes the right choice.

@mejrs
Copy link
Member

mejrs commented Nov 18, 2021

I'm thinking the best way to do this is to create an object that wraps a Bytes and implements the buffer protocol?

@davidhewitt
Copy link
Member

That would be very efficient!

Because of the complexity of #1444 I shy away from utility pyclasses in libraries at the moment. It's definitely something we need to figure out, however at the moment each extension module would get its own copy of such a type and they wouldn't compare equal etc.

@davidhewitt
Copy link
Member

(However we could create an example for users to adapt in their own code? Also in #1884 I hope to offer a safe way to implement buffer protocol using #[pymethods] magic methods soon...)

@Eric-Arellano
Copy link
Contributor Author

An example sounds totally reasonable to me!

I definitely would not want adding this convenience to impede future foundational improvements to PyO3.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

3 participants