Janki Bhimani


Email: bhimani@ece.neu.edu

LinkedIn: www.linkedin.com/in/jankibhimani

SUMMARY

Janki Bhimani (bhimani@ece.neu.edu) is a PhD candidate in the Department of Electrical and Computer Engineering at Northeastern University, Boston. She is the recipient of best paper awards at IEEE CLOUD 2018 and IPCCC 2017. She currently has nine publications in highly selective conferences along with ten other conference publications, and five journal publications, two of them in IEEE Transactions. She is also the main inventor on four patents. She has served Northeastern University as an instructor, teaching a 4-credit course on fundamentals of engineering algorithms to undergraduate students, and received excellent feedback from her class, with an instructor-effectiveness mean of 4.4/5. She has worked closely with Samsung research labs during her PhD.


 

RESEARCH INTERESTS

System Performance Engineering; Flash-Based Storage Enhancement; Big Data Processing; Virtualization; Docker Container Scheduling; Datacenter Endurance and Reliability; Parallel Full-Stack Processing; High Performance Computing; Performance Modeling and Prediction; Capacity Planning; Resource Management; I/O Workload Characterization.


 

EDUCATION

  • Ph.D. in Computer Engineering, Northeastern University, Expected 2019.
  • M.S. in Computer Engineering, Northeastern University, 2016.
  • B.S., GITAM University, 2013.

 

AWARDS, HONORS, AND FELLOWSHIPS

  • 2018 The Best Paper Award at IEEE International Conference on Cloud Computing (IEEE CLOUD).
  • 2017 The Best Paper Award at 36th IEEE International Performance Computing and Communications Conference (IPCCC).
  • 2014 Double Husky Scholarship, Northeastern University.
  • 2012 The Best Budget Robot Award at the 3rd Lunabotics International Mining Competition, NASA, FL, for developing the most innovative and cost-efficient lunar rover to operate on simulated regolith at NASA Kennedy Space Center.
  • 2011 The Outstanding Debate Performance Award by the Institute of Engineers India (IEI).
  • 2010 The Impromptu Speaker Award by the International Society for Technology in Education (ISTE).
  • 2010-2013 University Merit Scholarship, GITAM University.

 

CURRICULUM VITAE

[PDF] (Updated Oct 2018)

Research Statement - [PDF]

Teaching Statement - [PDF]


 

RESEARCH PROJECTS

KV-Kmeans 

  • Develop an HPC key-value API that translates file-based machine-learning applications into key-value-based applications.
  • Simplify application data management by removing the filesystem and offloading data storage from block-based SSDs to key-value-based SSDs.
  • Integrate OpenMP pragmas with KV-kdd protocols to implement a hybrid key-value-based multi-threaded unsupervised clustering application with a KV SSD plug-in as the primary I/O path.
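As an illustrative sketch of the idea (the names `KVStore` and `kv_kmeans` are our own, not the actual KV-kdd API), k-means can read its points directly from a key-value interface instead of files:

```python
class KVStore:
    """Stand-in for a key-value SSD API (put/get/scan). A real KV SSD
    plug-in would replace this dict-backed placeholder."""
    def __init__(self):
        self._d = {}

    def put(self, key, value):
        self._d[key] = value

    def scan(self):
        return self._d.items()


def kv_kmeans(kv, k, iters=10):
    """Lloyd's k-means reading points straight from the KV store,
    bypassing any filesystem layer. Initializes centroids from the
    first k scanned values for simplicity."""
    centroids = [v for _, v in list(kv.scan())[:k]]
    for _ in range(iters):
        sums = [[0.0] * len(centroids[0]) for _ in range(k)]
        counts = [0] * k
        for _, p in kv.scan():
            # Assign each point to the nearest centroid (squared distance).
            c = min(range(k), key=lambda i: sum((a - b) ** 2
                                                for a, b in zip(p, centroids[i])))
            counts[c] += 1
            sums[c] = [s + a for s, a in zip(sums[c], p)]
        centroids = [[s / max(c, 1) for s in row]
                     for row, c in zip(sums, counts)]
    return centroids
```

In the real system the scan and distance computations would additionally be parallelized with OpenMP threads.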

Efficient System for Identifying Data Temperature for Stream Identification in Multi-Stream SSD (Ongoing)

Innovate a new data structure based on bloom filters for efficient data-temperature categorization, used to identify stream IDs while writing data into multi-stream SSDs. The technique is memory-efficient, designed around the limited resources available within an SSD device.
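A minimal sketch of the flavor of this idea (the class name, sizing, and two-stream policy below are our own illustration, not the paper's exact design): a bloom filter records recently written addresses, and a write whose address is already present is treated as hot.

```python
import hashlib

class BloomTemp:
    """Bloom-filter hotness tracker. m bits and k hash functions are
    assumed parameters; a production design would also age or rotate
    the filter to bound false positives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _hashes(self, key):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def stream_id(self, lba):
        """Return 1 (hot stream) if this address was seen before,
        else 0 (cold stream), then record the write."""
        idx = list(self._hashes(lba))
        hot = all(self.bits[i] for i in idx)
        for i in idx:
            self.bits[i] = 1
        return 1 if hot else 0
```

The memory cost is only m bits regardless of the address space, which is why a bloom-filter-style structure suits the limited DRAM inside an SSD.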


PatIO: Pattern I/O Generator 

PatIO is an orthogonal approach to advancing a naive synthetic I/O engine and producing I/Os that represent real-world workloads. Our methodology is a three-step process: dissect, construct, and integrate. We first study the I/O activities of real application workloads from a storage point of view, dissecting the overall I/O activity of each workload into distinct I/O patterns. We then construct a pattern warehouse as the collection of all patterns; each pattern is framed by a unique combination of I/O jobs that can be generated by an I/O engine (e.g., FIO, a popular I/O engine) with different input features. Finally, different combinations of these synthetically generated I/O patterns can reproduce the characteristics of various real workloads. We emphasize that our method is lightweight: it neither demands a large amount of storage to hold traces or chunk-characteristic information, nor requires the tedious, time-consuming installation, configuration, and load phases of a database before running. Furthermore, PatIO is scalable to generate I/O workloads over different storage sizes.
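The integrate step can be sketched as follows (the warehouse contents and pattern names are invented for illustration; the job fields mirror common FIO options such as `rw`, `bs`, and `iodepth`):

```python
# Hypothetical pattern warehouse: each named pattern maps to the
# FIO-style job parameters that reproduce it.
WAREHOUSE = {
    "seq_read_large":   {"rw": "read",      "bs": "128k", "iodepth": 32},
    "rand_write_small": {"rw": "randwrite", "bs": "4k",   "iodepth": 1},
    "rand_read_small":  {"rw": "randread",  "bs": "4k",   "iodepth": 16},
}

def integrate(mix):
    """Combine warehouse patterns with relative weights into one
    synthetic workload description (a list of FIO-like job dicts)."""
    total = sum(mix.values())
    return [dict(WAREHOUSE[name], weight=round(w / total, 2))
            for name, w in mix.items()]
```

For example, `integrate({"seq_read_large": 1, "rand_write_small": 3})` describes a workload that is one quarter large sequential reads and three quarters small random writes.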


Comprehensive Design Guidelines and Scheduler for Mapping Workloads to Modern Storage Platform 

Design and develop a Docker workload controller that decides the optimal initialization and operation of containerized workloads running on multiple NVMe SSDs. The controller selects optimal batches of simultaneously operating containers to minimize total execution time and maximize resource utilization, while also striving to balance throughput among all simultaneously running applications. We develop this controller by formulating and solving an optimization problem with five different optimization solvers.
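As a toy stand-in for the batching decision (the real controller solves a formal optimization problem; this greedy heuristic and its inputs are only illustrative), containers can be grouped so that each batch's aggregate I/O demand stays within a device budget:

```python
def plan_batches(containers, budget):
    """Greedy batching sketch: sort containers by I/O demand and pack
    them into batches whose total demand does not exceed `budget`.
    `containers` maps container name -> estimated I/O demand."""
    batches, current, load = [], [], 0
    for name, demand in sorted(containers.items(), key=lambda kv: -kv[1]):
        if load + demand > budget and current:
            # Close the current batch and start a new one.
            batches.append(current)
            current, load = [], 0
        current.append(name)
        load += demand
    if current:
        batches.append(current)
    return batches
```

An optimization solver can improve on this by jointly minimizing makespan and throughput imbalance rather than packing greedily.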



FIOS: Feature based I/O Stream-ID assignment for Multi-Stream SSDs 

Leverage multi-stream SSD firmware by inventing a smart stream-ID assignment algorithm for multi-stream SSDs to improve the endurance and lifetime of flash devices. Develop an algorithm that reduces the write amplification factor (WAF) and adapts easily to any application, as well as to multiple applications running simultaneously.


I/O Intensive Containerized Applications on Flash

Performance characterization to enable the best performance and fairness for I/O-intensive dockerized applications running on NVMe SSDs, by implementing and exploring homogeneous and heterogeneous database-workload container setups.


 


Performance Modeling of PCIe bus between Flash Cache devices, Accelerators and ALU

Explore the design space in terms of bandwidth, lanes, duplexing, overlap, traffic overhead, TLP overhead, initialization, and symbol encoding to model data transfer over the PCIe bus.

Phase 1: Analyze actual platform

Phase 2: Design space exploration

Phase 3: Build simulation model

Phase 4: Performance analysis
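A back-of-the-envelope version of such a model might look like the following (the defaults assume a PCIe Gen3 link: 8 GT/s per lane, 128b/130b symbol encoding, and roughly 24 bytes of TLP/DLLP framing per 256-byte payload; real links also lose throughput to flow-control credits and replay, which this ignores):

```python
def pcie_effective_gbps(lanes, gt_per_s=8.0, enc=128 / 130,
                        payload=256, tlp_header=24):
    """Estimate effective PCIe throughput in Gb/s.

    raw line rate after encoding = lanes * GT/s * encoding efficiency;
    protocol efficiency = payload / (payload + per-packet overhead).
    """
    raw = lanes * gt_per_s * enc
    return raw * payload / (payload + tlp_header)
```

For an x4 Gen3 link this yields roughly 28.8 Gb/s, noticeably below the 32 Gb/s raw line rate, which is exactly the kind of gap the simulation model quantifies.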


Multi-Tier Heterogeneous Storage Compatibility with Multi-Node Parallel Homogeneous Network

Workload characterization with I/O modeling and compute-latency evaluation for a system consisting of a storage data center and parallel computing nodes, to study processing delays in a deep hierarchical structure.

Phase 1: Bottleneck test-case development

Phase 2: Mimic compute nodes, master node, SSD and HDD

Phase 3: Parallel processing analysis

Phase 4: Optimal resource management and capacity planning


Simulation Queueing Model of Multi-CPU, Multi-GPU Heterogeneous Platform

Model and calibrate a platform consisting of multiple CPUs connected to one another over high-speed Ethernet, with each CPU connected to multiple GPU accelerators over PCIe data links of different capacities.


Accelerating Performance of Application by Parallel Implementation 

Improved application performance and reduced total execution time by experimenting with innovative algorithm designs on different parallel platforms using multi-threading (OpenMP), multi-processing (MPI), and accelerators such as GPUs (CUDA).

  • Coding: Implemented applications on multi-core, multiprocessor and heterogeneous clusters.

  • Algorithm design: Profiled and improved the critical regions consuming the most time through innovative algorithm redesign.

  • Inspect: Increasing parallel resources does not guarantee improved performance, so explored application performance under increasing parallel resources and different workloads to achieve optimal operation.

  • Evaluate: Studied speed-up and overheads of various phases of application on different platforms. 

Parallel Calculation Performance Prediction 

Designed an analytical Markov model to predict calculation time and optimal performance settings, saving the time, resources, and power spent deploying an application on a distributed computing platform.

  • Markov model design: Modeled the non-deterministic time to completion under changing numbers of parallel resources using stochastic techniques.

  • Machine-learning model design: Emulated hardware characteristics to estimate the hardware parameters, enabling hardware-independent operation of the stochastic model; this makes our technique unique.

  • Evaluate: Computed prediction error, performed a classic rank study using different workloads, and consolidated computing clients by preference to obtain maximum correlation between actual and predicted results.
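The core quantity such a model predicts, expected time to completion, can be illustrated on a toy absorbing Markov chain (the transition matrix below is invented; the actual model's states and probabilities are calibrated from hardware measurements):

```python
def expected_steps(P, absorbing):
    """Expected number of steps to absorption for each state of a
    discrete-time Markov chain with transition matrix P, computed by
    value iteration on E[i] = 1 + sum_j P[i][j] * E[j]."""
    n = len(P)
    E = [0.0] * n
    for _ in range(10000):
        E = [0.0 if i in absorbing else
             1.0 + sum(P[i][j] * E[j] for j in range(n))
             for i in range(n)]
    return E
```

For a state that finishes with probability 0.5 per step, the model recovers the familiar geometric mean of 2 steps to completion.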

Communication Network Latency Prediction 

Simulated a communication network by developing an appropriate queueing model to predict latency for different communication patterns such as scatter, broadcast, and gather.

  • Data analysis: Researched different communication patterns in the uplink and downlink channels, using appropriate micro-benchmarks to identify possible delays and lags.

  • Model formulation: Designed a queueing model that simulates a crowded wireless-network environment with a single master and multiple compute clients.

  • Calibrate: Computed the model constants using training datasets and performed a chi-square goodness-of-fit test to evaluate the accuracy of the calibrated values.

  • Evaluate: Tested the model's latency predictions using real applications such as K-means clustering, PageRank, and computed-tomography image reconstruction.
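As the simplest possible stand-in for such a latency model (the actual model for a single master with multiple clients is richer), the M/M/1 queue gives mean sojourn time 1/(mu − lambda):

```python
def mm1_latency(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 queue (waiting plus
    service), in the same time units as the rates. Valid only while
    the queue is stable (arrival rate < service rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)
```

Calibration then amounts to fitting the arrival and service rates per communication pattern (scatter, broadcast, gather) from the micro-benchmark data.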

Modified MOESI Protocol with Improved Cache Performance, Simulated Using Multi2Sim

Redesigned the MOESI protocol to achieve better cache coherency and decrease off-chip bandwidth usage.

  • Instrument: Diagnosed the existing MOESI protocol using a trace generated by the Pin tool from the PARSEC benchmark suite.

  • Protocol design: Remodeled the ‘Shared’ state so that a data-fetch read request is served from a same-level cache’s copy at a new common memory location, whereas the traditional method forwards the request to the lower cache level.

  • Deployment: Implemented the new protocol as a pluggable module in the Multi2Sim cache simulator and evaluated it using three benchmarks: Blackscholes, Pthreads, and Matrix.

  • Collaborative work: Worked with Multi2Sim simulator design team to modify existing cache simulation module.
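The Shared-state change can be summarized with a small read-path comparison (our own simplification for illustration, not the exact Multi2Sim implementation):

```python
def serve_read(state, peer_has_copy, modified_protocol):
    """Decide which component satisfies a read request for a line.

    In the modified protocol, a line in the Shared ('S') state with a
    same-level peer copy is served from that peer's shared location,
    saving a lower-level (and potentially off-chip) access; the
    traditional protocol always forwards to the lower cache level.
    """
    if state == "S" and peer_has_copy and modified_protocol:
        return "same-level cache"
    return "lower-level cache"
```

Fewer lower-level accesses on Shared reads is precisely where the off-chip bandwidth savings come from.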