Google Summer of Code 2018 Ideas

Ideas for Google Summer of Code 2018.

Contact

Feel free to reach us by joining #sciruby on chat.freenode.net or via our mailing list.

IMPORTANT NOTICE: SciRuby encourages diversity. Scientific progress in general benefits from diversity and software development for science is no exception. We are really happy that the number of people from Asia, Africa and South America applying for GSoC projects is increasing. Our org admin this year is from India, our previous org admin was from Brazil. We have had students from Japan, India, Sri Lanka, Russia, etc. We have women software developers in our programme. We are happy to hear from you all!

Instructions for students

We strongly recommend that you pick one of the ideas listed below. We value contributions in advance of GSoC, even if they're just little ones. Go pick out something in one of our trackers and work on it, talk to folks on the listserv, and get an idea for what features are needed.

You don't need to know a lot about Ruby to work on a project: depending on how much you already know, it'll be pretty easy to learn enough to be able to contribute. However, you may need some familiarity with scientific computation. If you don't have any, take a look at "Numerical Recipes in C", which you'll probably find in your university's library.

In any case, if you feel your skills aren't enough for some project, please ask us on our IRC channel (see contact section above) or our Google Group (see sciruby.com) and we can help you.

Read this before you commit your first patches

SciRuby's main landing page on GitHub mostly holds the stable versions of the SciRuby gems, but developers and contributors should work on the very latest (bleeding-edge) repositories so that changes can be committed without conflicts.

Try reading ~~Finding The SciRuby Development Repositories on Github~~ if you would like a brief introduction on finding the latest development gems to work on from Github. Also go through the coding guidelines before sending your first patch.

How to submit a patch ("pull request")

Here's a great tutorial: http://www.thinkful.com/learn/github-pull-request-tutorial/

Have a look and feel free to ask if you have any questions.

Instructions for mentors

Guidelines for mentors to submit projects:

Specify the name of your project as a heading.
Write a paragraph or two with further details.
Write a small 'Skills' section detailing the skills that the student must possess to complete the project.
Write down your own GitHub handle and contact details in a 'Mentor Details' section, so that the student can contact you.
If anyone else wants to co-mentor a project, please specify your details along with the mentor's details.

Project Ideas

Visualization projects

Ruby Matplotlib

There exist several plotting libraries for Ruby, but none of them is as readily usable as either Matlab's plotting system or the Matlab-inspired Python Matplotlib. This lack of a Matlab/Matplotlib-compatible Ruby plotting library is Ruby's single biggest obstacle as a scientific language.

This project is longer term than just one GSOC, but the Ruby Science Foundation is prepared to allocate funds to provide an ongoing grant for development of such a library.

Several approaches have been discussed:

Provide a Ruby API for the C code in Python matplotlib. This approach has been considered in the past, but is almost certainly infeasible, as matplotlib is extremely tightly coupled with Python.
Provide a Ruby-to-Python bridge to expose Python matplotlib in Ruby. The largest problem with this strategy is the challenge of debugging across three languages. Suppose a plot call doesn't work; is the problem with the underlying C code, the Python code which accesses the C code, or the Ruby–Python interface?
Rewrite matplotlib in a language-independent form and expose it to Ruby. This is an enormous task, but a key advantage is the language independence; other languages' communities could provide APIs as well, and thus provide some of the locomotive force for development of such a language-independent tool. This would likely need to be attempted in C++ rather than C (due to ready availability of data structures).
Write a Ruby matplotlib from scratch. This approach should be significantly easier than the language-independent matplotlib, but may not have as broad an appeal. It allows you to use Ruby's native data structures (or possibly NMatrix) for storage, thus relying on underlying C (and/or Java) code so you have less rewriting to do.
Other? Perhaps you can think of something we've missed. Remember that GSOC with SciRuby is about researching and presenting the best solution, following through on it, and collaborating with others.

Mentors: John Woods (@mohawkjohn) for approach 3, 4, or 5.
Recommended skills: C/C++
This project may be able to accommodate multiple students with proven teamwork skills.

Numerical projects

Native CUDA kernels with Rubex and RbCUDA

Similar to native CUDA kernel support in Julia, we should have support in Ruby.

Since it is tough to augment the Ruby VM to support this, it can be done in an easier way using Rubex and RbCUDA. For example, a sample Rubex method that would compile to a native CUDA kernel can be defined with cudadef and written like this:

require 'rbcuda_native'
require 'rbcuda'

include RbCUDA::Driver

cudadef kernel_vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return
end

# generate some data
len = 512
a = Array.new(len) { rand(len) }
b = Array.new(len) { rand(len) }

# allocate & upload to the GPU
d_a = GPU_Array.new(a)
d_b = GPU_Array.new(b)
d_c = GPU_Array.new(d_a)

# execute and fetch results.
kernel_vadd(d_a, d_b, d_c) with cuda(1,len)
c = d_c.to_cpu_array
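
While developing a kernel like this, a plain-Ruby CPU reference can be useful for checking the GPU result. The following is only a sketch for illustration: the helper name `vadd_reference` is hypothetical and is not part of Rubex or RbCUDA. It computes what `kernel_vadd` would be expected to produce when each thread handles one element.

```ruby
# CPU reference for the element-wise addition performed by kernel_vadd.
# vadd_reference is a hypothetical helper, not part of Rubex or RbCUDA.
def vadd_reference(a, b)
  raise ArgumentError, 'arrays must have the same length' unless a.length == b.length

  # Index i plays the role of one GPU thread computing c[i] = a[i] + b[i].
  a.each_index.map { |i| a[i] + b[i] }
end

# Generate the same kind of input data as the GPU example.
len = 512
a = Array.new(len) { rand(100) }
b = Array.new(len) { rand(100) }
expected = vadd_reference(a, b)
```

The array fetched back from the GPU could then be compared against `expected` element by element.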

Skills: C/C++, CUDA, Ruby, Ruby C API, parallel programming, familiarity with design of compilers.
Mentor: Sameer Deshmukh @v0dro, Prasun Anand @prasunanand
Difficulty: Moderate.

NMatrix projects

NMatrix is SciRuby's numerical matrix core, implementing dense matrices as well as two types of sparse matrices (linked-list-based and Yale/CSR). NMatrix is a fairly well-established project which has received Summer-of-Code-like grants from both Brighter Planet and the Ruby Association (in other words, from Matz, who created Ruby). Those who contribute to NMatrix will likely eventually become authors of a jointly-published peer-reviewed science article on the library. Additionally, NMatrix is a good place to gain practical C and C++ experience, while also working to improve Ruby.

NMatrix currently relies on ATLAS/CBLAS/CLAPACK and standard LAPACK for several of its linear algebra operations. In some cases, native versions of the functions are implemented, so that the libraries are not required. There are quite a number of areas for growth in terms of the capabilities of NMatrix here.

Speed up element-wise operations in NMatrix

Mentors: John Woods (@mohawkjohn), Prasun Anand (@prasunanand)
Per this discussion, constraints of the Ruby language currently slow down element-wise addition and subtraction for NMatrix objects. There are possibly some work-arounds, described in that email thread. A successful proposal would involve some preliminary research and design work on how to speed up element-wise operations.
Recommended skills: Some C/C++ would be beneficial, as you'll need to be working under the hood on NMatrix.

Daru and general Ruby projects

Mentors: Victor Shepelev (@zverok);
Co-mentors: Athitya Kumar (@athityakumar), Shekhar Prasad Rajak (@Shekharrajak);
Recommended skills: some experience (even a little) with the Ruby/Rails ecosystem, and an understanding of, or readiness to understand, what other (non-scientific) Rubyists love and want

Business Intelligence with daru

Come up with your own ideas for business intelligence applications with daru. It can be especially suitable for the following types of software:

Reporting and querying software. Think of a library with daru inside and a business-ready DSL outside, like "fetch something from DB and prepare several reports and export this to spreadsheet".
Digital dashboards (represent some data in many ways: tables, visualisations and so on)
Data cleansing (see below as a separate point)

Data cleaning library

Data pre-/post-processor for daru, akin to R's janitor toolset: finding and dropping problematic rows and columns, getting rid of outliers, recoding wrong column types and so on and so forth.
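
To make two of these steps concrete, here is a minimal plain-Ruby sketch. It deliberately does not use daru's API: rows are plain hashes, and the helper names `drop_incomplete_rows` and `drop_outliers` are hypothetical. The outlier rule shown (interquartile fences with a simple index-based quartile estimate) is one common choice, assumed here for illustration only.

```ruby
# Plain-Ruby sketch of two cleaning steps; rows are hashes keyed by column name.
# drop_incomplete_rows and drop_outliers are hypothetical helper names.

# Remove every row that has a nil in any column.
def drop_incomplete_rows(rows)
  rows.reject { |row| row.values.any?(&:nil?) }
end

# Remove rows whose value in `column` lies outside the interquartile fences
# [Q1 - k*IQR, Q3 + k*IQR]. Quartiles use a simple index-based estimate.
def drop_outliers(rows, column, k: 1.5)
  values = rows.map { |row| row[column] }.sort
  return rows if values.empty?

  q1 = values[(values.length * 0.25).floor]
  q3 = values[(values.length * 0.75).floor]
  iqr = q3 - q1
  lower = q1 - k * iqr
  upper = q3 + k * iqr
  rows.select { |row| row[column].between?(lower, upper) }
end

rows = [
  { name: 'a', height: 170 },
  { name: 'b', height: 165 },
  { name: 'c', height: nil },
  { name: 'd', height: 172 },
  { name: 'e', height: 168 },
  { name: 'f', height: 1700 } # likely a data-entry error
]

cleaned = drop_outliers(drop_incomplete_rows(rows), :height)
```

A real library would presumably expose such steps as operations on daru data frames and vectors rather than on arrays of hashes.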

Ruby/Rails data analysis tools

In order to be closer to "general" Ruby developers, we could work on daru-based load-analyse-process-visualise data tools for such kinds of information as:

Rails logs;
ruby-prof and other measuring tools output;
...

The project in this area should go as a library, which:

uses daru, daru-io, daru-view;
can load data from specified format (say, Rails logs);
has a set of easy-to-use, already set up visualisation and grouping/analysing routines, producing useful and understandable results;
includes a set of demos (stand-alone scripts, demos that integrate into web-framework dashboards, and IRuby notebooks) showcasing the usage and utility of the library.
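
As a rough illustration of the load-and-group steps, the sketch below parses Rails log text and summarises request durations per controller action. It is plain Ruby and makes several assumptions: the log uses the default "Processing by ..." and "Completed ... in Nms" line shapes, requests are not interleaved, and the helper names `parse_requests` and `summarise` are hypothetical. The resulting rows are the kind of tabular data that could then be loaded into daru for further analysis.

```ruby
# Sketch: summarise request durations per controller action from Rails log text.
# Assumes default Rails log line shapes and that requests are not interleaved.
def parse_requests(log_text)
  processing = /Processing by (?<action>\S+) as/
  completed  = /Completed (?<status>\d{3}) .*?in (?<ms>\d+)ms/

  requests = []
  current_action = nil
  log_text.each_line do |line|
    if (m = processing.match(line))
      current_action = m[:action]
    elsif (m = completed.match(line)) && current_action
      requests << { action: current_action, status: m[:status].to_i, ms: m[:ms].to_i }
      current_action = nil
    end
  end
  requests
end

# Group by action and compute the request count and mean duration.
def summarise(requests)
  requests.group_by { |r| r[:action] }.map do |action, rs|
    { action: action, count: rs.length, mean_ms: rs.sum { |r| r[:ms] }.to_f / rs.length }
  end
end

log = <<~LOG
  Started GET "/users" for 127.0.0.1 at 2018-01-15 10:00:00 +0000
  Processing by UsersController#index as HTML
  Completed 200 OK in 30ms (Views: 20.1ms | ActiveRecord: 3.2ms)
  Started GET "/users" for 127.0.0.1 at 2018-01-15 10:00:05 +0000
  Processing by UsersController#index as HTML
  Completed 200 OK in 50ms (Views: 41.0ms | ActiveRecord: 2.0ms)
  Started GET "/posts/1" for 127.0.0.1 at 2018-01-15 10:00:07 +0000
  Processing by PostsController#show as HTML
  Completed 404 Not Found in 12ms
LOG

summary = summarise(parse_requests(log))
```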

NetworkX.rb

Mentors: Sameer Deshmukh (@v0dro), Athitya Kumar (@athityakumar);
Recommended skills: Some experience with Python and/or Ruby, a basic understanding of graphs, and familiarity with the networkx library.

A network analysis and graph library for Ruby, based on Python's NetworkX library. It is intended to handle various use cases of the graph data structure. The classes to be implemented are:

Graph (or, undirected graph)
DiGraph (or, directed graph)
MultiGraph
MultiDiGraph
Weights and other parameters of an edge are supposed to be specified as keyword arguments.

Each of these graph classes has

an enumeration facility to do something like graph.each_node { |n| puts n }
a set of manipulation functions such as add_edge, add_node, etc.
a set of algorithms like BFS, DFS, etc.
a set of analysis functions such as cardinality, diameter, etc.
a set of IO methods (establish a bridge between NetworkX graphs and Daru DataFrames, and then use daru-io?)
a set of plotting methods and View Helpers for usage in web applications (similar to daru-view)
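
A minimal sketch of how a few of these pieces might fit together for the undirected case is shown below. The class layout and the method names `neighbours`, `edge` and `bfs` are assumptions made for illustration, not a settled design; only `add_node`, `add_edge` and `each_node` come from the list above.

```ruby
require 'set'

# Minimal sketch of an undirected Graph backed by a nested Hash.
# The layout and most method names are assumptions, not a settled design.
class Graph
  def initialize
    @adj = {} # node => { neighbour => edge attributes }
  end

  def add_node(node)
    @adj[node] ||= {}
    self
  end

  # Edge attributes such as weight are passed as keyword arguments.
  def add_edge(u, v, **attrs)
    add_node(u)
    add_node(v)
    @adj[u][v] = attrs
    @adj[v][u] = attrs # undirected: store the edge in both directions
    self
  end

  # Yields each node; returns an Enumerator when no block is given.
  def each_node(&block)
    @adj.each_key(&block)
  end

  def neighbours(node)
    @adj.fetch(node).keys
  end

  def edge(u, v)
    @adj.dig(u, v)
  end

  # Breadth-first search: returns nodes in the order they are first reached.
  def bfs(start)
    visited = Set[start]
    queue = [start]
    order = []
    until queue.empty?
      node = queue.shift
      order << node
      neighbours(node).each do |n|
        next if visited.include?(n)

        visited << n
        queue << n
      end
    end
    order
  end
end

g = Graph.new
g.add_edge(:a, :b, weight: 5)
g.add_edge(:a, :c, weight: 2)
g.add_edge(:b, :d)
g.add_edge(:c, :d)
```

A DiGraph could follow the same shape by storing each edge in one direction only.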

The approach currently being considered for all graph classes is a nested Hash data structure. However, if you feel there's a better way to handle the data, feel free to suggest it here and/or in this mailing thread. The nested Hash used internally looks like this:

graph_hash
#=> {node_1: {node_2: {weight: 5}}, node_2: {node_1: {weight: 3}}}

graph_hash[:node_1]
#=> {node_2: {weight: 5}}

graph_hash[:node_1][:node_2]
#=> {weight: 5}
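
One consequence of this layout is that the graph can be flattened into tabular rows with very little code, which may help with the IO bridge mentioned earlier. The sketch below is an illustration under assumptions: `edge_rows` and `degrees` are hypothetical helper names, and the row format (one hash per stored edge) is just one possible choice.

```ruby
# The nested Hash from the example above.
graph_hash = {
  node_1: { node_2: { weight: 5 } },
  node_2: { node_1: { weight: 3 } }
}

# Flatten the nested Hash into one row per stored edge.
# edge_rows is a hypothetical helper name.
def edge_rows(graph_hash)
  graph_hash.flat_map do |source, neighbours|
    neighbours.map do |target, attrs|
      { source: source, target: target }.merge(attrs)
    end
  end
end

# Number of stored neighbours per node: the size of each inner Hash.
def degrees(graph_hash)
  graph_hash.transform_values(&:length)
end

rows = edge_rows(graph_hash)
```

Each row carries the edge attributes as extra columns, so an array of such rows maps naturally onto a data-frame-like structure.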
