Proposal for GSOC 2015: Data Wrangler extension for WSO2 Machine Learner


Data Wrangler extension for WSO2 Machine Learner

Overview

What is WSO2 Machine Learner?
Machine learning [1] is about building models from sample data so that these models can be used to analyse big data. It is a key component of big data analysis. WSO2 needs to integrate machine learning into its platform, where it can be used by various WSO2 products.

What is Wrangler?
Wrangler [2] can be used to transform and clean input test data. Based on these transformations, scripts can be generated in either Python or JavaScript. These scripts can then be used to analyse big data.

What is Jaggery?
Web applications need various web technologies. The front end needs HTML, CSS and JavaScript. The back end deals with POST, JSON and XML. Server-side logic is written in Java or PHP. With all these technologies, mismatches occur. Jaggery [3] is a completely JavaScript-based way to eliminate these mismatches in web development.

The idea behind the project
The project is a Data Wrangler extension for WSO2 Machine Learner. The idea is to build Wrangler into Machine Learner, using a Jaggery UI instead of the Wrangler UI. The Jaggery UI is connected to Wrangler. When an end user opens the Jaggery UI and transforms and cleans the data, Wrangler generates the corresponding scripts. Those scripts are converted into Java and sent to the back-end, where they can be used to transform big data efficiently. Wrangler, the Jaggery UI and the back-end all belong to Machine Learner; Wrangler itself is integrated into Machine Learner as an extension.


Design and Implementation Plan
Technologies used in this project
  • Java/HTML/JavaScript
  • Apache Spark
  • Jaggery

Machine Learner Architecture
architecture-diagram.png
All the parts implemented in this project should be added to the above architecture. Where each part fits in that structure is explained below.

  1. Implementing Jaggery UI

  • Select a subset of Wrangler operations and design a customized UI for them.
  • Implement the user interface in Jaggery according to that design.
  • This UI should be added at the top of the above structure.


  2. Connect Jaggery UI to Wrangler
Wrangler has its own user interface and a back-end that generates scripts in either Python or JavaScript. The Jaggery UI should be usable as the Wrangler UI for the selected subset of operations. To achieve this, the Jaggery UI should be connected to the Wrangler libraries, which perform the data cleaning and transformation tasks. The next part is to obtain the generated JavaScript for the operations the user selects in the Jaggery UI. For this, the Jaggery UI should be connected to the Wrangler back-end: the operations selected by the end user are passed as input to the Wrangler back-end, which returns the generated scripts in JavaScript. The Wrangler API can be used for this part; otherwise, it will be necessary to study how Wrangler works and create an API for these connections. Wrangler should sit between the Machine Learner UI and the Machine Learner REST API.
  3. Conversion
This is the core part of the project. The scripts generated by Wrangler have to be retrieved, which requires an API; this API will be developed as part of the project. The JavaScript functions in the generated scripts are then mapped to Java code using the Apache Spark transformation libraries. The generated scripts have a particular format, so this step involves understanding that format and developing a tool that compiles the generated scripts into Java.
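
The mapping step can be illustrated with a minimal sketch. It assumes each step of a generated script has already been parsed into an operation name, a target column index and one argument; this representation, the class names `WranglerStep` and `StepConverter`, and the three operation names are hypothetical and do not reproduce the actual Wrangler script format. Plain JDK functions stand in for the Spark transformations; in a Spark job, a function like this could be applied through a map transformation such as `JavaRDD.map`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical parsed form of one step in a generated script:
// an operation name, the column index it targets, and one argument.
final class WranglerStep {
    final String operation;
    final int column;
    final String argument;

    WranglerStep(String operation, int column, String argument) {
        this.operation = operation;
        this.column = column;
        this.argument = argument;
    }
}

// Maps each (assumed) operation name to a Java function over one row.
final class StepConverter {
    private static final Map<String, Function<WranglerStep, UnaryOperator<List<String>>>> FACTORIES =
            new HashMap<>();

    static {
        // "split": replace the target column with the parts obtained by
        // splitting its value on a literal delimiter.
        FACTORIES.put("split", step -> row -> {
            List<String> out = new ArrayList<>(row.subList(0, step.column));
            out.addAll(Arrays.asList(row.get(step.column).split(Pattern.quote(step.argument), -1)));
            out.addAll(row.subList(step.column + 1, row.size()));
            return out;
        });
        // "fill": replace an empty cell in the target column with a default value.
        FACTORIES.put("fill", step -> row -> {
            List<String> out = new ArrayList<>(row);
            if (out.get(step.column).isEmpty()) {
                out.set(step.column, step.argument);
            }
            return out;
        });
        // "drop": remove the target column from the row.
        FACTORIES.put("drop", step -> row -> {
            List<String> out = new ArrayList<>(row);
            out.remove(step.column); // int argument, so this removes by index
            return out;
        });
    }

    // Converts one step into a row transformation, or fails on unknown operations.
    static UnaryOperator<List<String>> convert(WranglerStep step) {
        Function<WranglerStep, UnaryOperator<List<String>>> factory = FACTORIES.get(step.operation);
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported operation: " + step.operation);
        }
        return factory.apply(step);
    }

    // Chains the converted steps so that they run in script order.
    static UnaryOperator<List<String>> convertAll(List<WranglerStep> steps) {
        UnaryOperator<List<String>> pipeline = row -> row;
        for (WranglerStep step : steps) {
            UnaryOperator<List<String>> previous = pipeline;
            UnaryOperator<List<String>> next = convert(step);
            pipeline = row -> next.apply(previous.apply(row));
        }
        return pipeline;
    }
}
```

Keeping the operation-to-function mapping in a registry means that supporting a further Wrangler operation would only require one more entry, while unsupported operations fail early with a clear error.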

  4. Developing the Back-End
The back-end belongs in the Machine Learner core shown in the above diagram. It will be developed inside the core and will include the converted scripts.
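
As a rough sketch of how the back-end could hold and apply converted scripts, the class below stores a list of row transformations and runs them over a dataset. The class name `TransformationBackend` and its methods are hypothetical, and an in-memory list of rows with Java streams stands in for a distributed Spark dataset; this is an illustration under those assumptions, not the Machine Learner core API.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical back-end component that keeps the converted steps
// and applies them to the rows of a dataset.
final class TransformationBackend {
    private final List<UnaryOperator<List<String>>> steps;

    TransformationBackend(List<UnaryOperator<List<String>>> steps) {
        this.steps = steps;
    }

    // Applies every converted step, in order, to a single row.
    List<String> transformRow(List<String> row) {
        List<String> current = row;
        for (UnaryOperator<List<String>> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    // Applies the steps to each row independently. Because rows do not
    // depend on each other, the same per-row function could be handed
    // to a distributed map operation instead of a local stream.
    List<List<String>> transform(List<List<String>> rows) {
        return rows.stream().map(this::transformRow).collect(Collectors.toList());
    }
}
```

Treating every step as a function from one row to one row keeps the back-end independent of how the steps were produced, so it only needs the output of the conversion tool.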

The final structure, with all the parts included, is shown below.

Deliverables

  1. The implementation of the extension.
  2. Documentation and tests.
  3. Final documentation.

        

Timeline

Pre-work
  • Become familiar with the Data Wrangler API.
  • Literature survey of Jaggery and sample code.
  • Literature survey of Apache Spark transformations.
  • Literature survey of WSO2 Machine Learner.

May 11 - May 24
  • Implement the user interface using HTML/Jaggery.

May 25 - June 14
  • Connect the Jaggery UI to Wrangler.

June 15 - June 25
  • Develop the API for getting the generated scripts.

June 26 - July 3
  • Mid-term evaluation.

July 4 - July 26
  • Implement a tool for converting the generated scripts into Java.

July 27 - August 2
  • Develop the back-end that includes the converted scripts.

