Short-term plan for this binding #2
Comments
In addition to the to-do list above, I also want to gradually bring this project to a relatively stable release rhythm, for example by regularly synchronizing upstream updates and cutting releases. This keeps the project alive. My personal goal for this project is to make it an official part of DataFusion (see apache/datafusion#13815), just like the Python binding, so that people can easily use DataFusion for computation in more scenarios. The project also already has several users (datafusion-wasm-playground, parquet-viewer, and the ongoing apache/datafusion#13818), so a regular release rhythm will also make it easier for existing projects to depend on it. |
Hi @waynexia, this is Tommy. I’m a master's student at Georgia Institute of Technology. My interests span DBMS, OS, compilers, and related areas, and DataFusion really caught my eye. I would like to work on Robust WASM Support for GSoC 2025, and I’ve expressed my interest under [EPIC] A collection of tickets for improved WASM support in DataFusion. Currently, I’m resolving issues in DataFusion to familiarize myself with the codebase, and I plan to look more closely at datafusion-wasm-bindings (and other WASM-related items) once my current set of PRs is completed. I think this issue may be one of the things I need to tackle for GSoC 2025. I’ve put together a rough draft of my proposal. Just a heads-up: GitHub shows the state of the repo at the time of commenting, so it might not reflect my latest updates. I’d really appreciate any feedback or suggestions when you get the chance! (It’s still an early version, so the goals and timeline might shift a bit as I get a better handle on the project.) I also reached out to @alamb. He was super kind, but mentioned he’s not too familiar with this area and might not be able to give detailed feedback. Thanks a lot for your time, and I’d love to hear any thoughts you have! |
Maybe @XiangpengHao has some idea of what would be useful |
Hi @qstommyshu, your plan looks good to me! I'd suggest making it more concrete and more focused, though. Here are my two cents:
Making this demo is already quite a lot of work and fun; making it work smoothly with different storage backends, file formats, and browsers can be especially challenging.
|
Thank you @XiangpengHao for your reply,
Got it. I was thinking that since there's already a DuckDB version to reference, it probably wouldn't take too long to build. But I definitely underestimated how much work goes into making a WASM shell playground, especially with all the compatibility issues across different browsers and storage options. I think I’ll need to take another look at how much time this part might take. The reason I want to start with the live playground is that I feel it can be built somewhat independently of the current WASM bindings. My idea is to get a basic shell working with what we have now, and once the bindings are updated, we can plug those changes into the shell pretty easily. Is my assumption about the relationship between the WASM playground and the WASM binding correct?
Can you please point me to a working GitHub repo so that I can play around with it and see how I can tweak it? Having something to look at would be very helpful!
Definitely! I also agree that good documentation is key to helping newcomers learn about the project. Documentation will be an important part of my GSoC proposal. |
Yes I think the wasm playground is a good start: https://github.com/datafusion-contrib/datafusion-wasm-playground |
Hi @XiangpengHao, @alamb, and @waynexia, I've been tinkering with @waynexia's code and DataFusion. I now have a pretty good grasp of how things work and how to get started on the live playground. I got something working locally. I realize my previous proposal might have skipped a few details. I'll update it tomorrow and submit a first draft (the GSoC site says I can resubmit any number of times before the deadline), and I'll keep refining it as I dive deeper into each topic. |
Hi @qstommyshu, sorry for the late reply. As @XiangpengHao mentioned, the DataFusion playground can be a great way to get people involved in the DataFusion project. The two major parts covered in your proposal (the WASM binding and a frontend playground) look good (thanks for your proposal ❤️). I suggest further subdividing them as follows:
I want to clarify that not all the tasks listed above need to be included in this GSoC project. I'm just posting my thoughts and we'll pick some to implement. Maybe @XiangpengHao @alamb or others also have some points. |
From my perspective, both have similar priorities. Running a WASM UDF requires a WASM runtime, and |
Hi @waynexia, thanks for your thoughtful reply on my proposal ❤️. I submitted an updated version to GSoC before the deadline yesterday. I think the updated proposal covers everything you mentioned.
Of course it would be hard to provide full support for all these items in just a summer. My core goal is to:
I gained some frontend development experience through my previous internships, so the coding part should not be hard for me.
There are also some stretch goals in my proposal; I will worry about them after these goals are achieved. |
Task list for the first milestone.
The first milestone should:
Tasks:
SessionContext
)RecordBatch
1368bc5