NOMAD – leading the development of a scientific data sharing platform (2018 – now)

I joined the NOMAD project in 2018. At that point, its many project partners had already created isolated services based on incompatible technology stacks. In my role as architect and lead developer, I had to remove a lot of technical debt, introduce rigorous engineering practices, and consolidate everything into a uniform, sustainable software solution based on modern web technologies. As the only experienced software developer among many scientists, I learned to communicate complex technical issues effectively across domain language barriers, and I tutored non-computer scientists to become better developers.

React, Material-UI, D3

The NOMAD UI is the portal to the world's largest materials science database. Interactive visualizations and a powerful autocomplete search bar allow users to efficiently filter millions of entries based on complex facets of heterogeneous metadata.

NOMAD's metainfo system describes a complex data structure comprising thousands of different properties. It allows the frontend to visualize and consistently represent data acquired from 40 different databases, data formats, and materials science simulation tools.

JavaScript, React, Recoil, Material-UI, D3

REST API for accessing >500 billion research data points

In NOMAD, we combined a parallel filesystem with MongoDB and Elasticsearch to provide instant access to billions of different data points processed from 50 TB of raw data from over 10 million materials simulations. Processing runs in distributed task queues, parses over 40 different formats, and runs comprehensive classification and normalization steps. The backend's REST API allows researchers to automate their work and drives the complex SPA frontend. All NOMAD services use a customized Keycloak-based SSO user management.

Python, REST, Flask, FastAPI, Elasticsearch, MongoDB, Keycloak, RabbitMQ/Celery, GPFS

Running in a cluster

With high and unpredictable workloads imposed by large amounts of data and complex classification and normalization of scientific data, we have to run NOMAD in a cluster with tools that allow us to deploy and scale all components with ease. Deployment scripts based on docker-compose or Helm allow us to easily set up and maintain NOMAD installations at Max Planck's HPC facility in Garching and at external sites alike.

NOMAD improves constantly with contributions from many developers. Test-driven development, CI/CD, and Git are the only way to maintain its constantly growing codebase.

Docker, Docker-compose, Kubernetes, Helm, GitLab CI, Pytest, TDD

Research in Model Driven Development (2011 – 2017)

Large Software Models

Increasingly complex software systems require increasingly complex software models. Large models are traditionally managed with SQL database persistence layers. However, in certain application contexts SQL databases do not scale well enough or are otherwise inadequate due to the graph-like nature of software models.

EMF-Fragments is a NoSQL persistence layer for EMF models. With EMF-Fragments, I explored the scalability properties of a fragmentation-based persistence that exploits the nature of document databases like MongoDB.
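
The idea of fragmentation-based persistence can be illustrated with a minimal sketch. It is not the EMF-Fragments API: the `Node` class, the `fragment_root` flag, and the `save` and `load_all` helpers are hypothetical names, and a plain dictionary stands in for a document database. The model tree is cut at marked elements, and each resulting fragment is stored as one document that refers to its child fragments by key.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """A model element; fragment_root marks where a new fragment begins."""
    name: str
    children: list["Node"] = field(default_factory=list)
    fragment_root: bool = False


def save(node: Node, store: dict, key: str = "root") -> None:
    """Write node as one document under key, splitting at fragment roots."""
    counter = [0]

    def encode(element: Node) -> dict:
        encoded_children = []
        for child in element.children:
            if child.fragment_root:
                # The child starts its own fragment: store it as a separate
                # document and keep only a reference in the parent document.
                counter[0] += 1
                child_key = f"{key}/{counter[0]}"
                save(child, store, child_key)
                encoded_children.append({"ref": child_key})
            else:
                encoded_children.append(encode(child))
        return {"name": element.name, "children": encoded_children}

    store[key] = json.dumps(encode(node))


def load_all(store: dict, key: str = "root", is_fragment_root: bool = False) -> Node:
    """Rebuild the whole model by following fragment references."""

    def decode(data: dict, fragment_root: bool) -> Node:
        children = []
        for child in data["children"]:
            if "ref" in child:
                children.append(load_all(store, child["ref"], True))
            else:
                children.append(decode(child, False))
        return Node(data["name"], children, fragment_root)

    return decode(json.loads(store[key]), is_fragment_root)


model = Node("repo", [
    Node("pkg_a", [Node("ClassA")], fragment_root=True),
    Node("pkg_b", [Node("ClassB"), Node("ClassC")], fragment_root=True),
    Node("readme"),
])
store = {}
save(model, store)
print(sorted(store))
```

Where the fragment boundaries are placed decides how many documents must be read for a given access pattern; a real persistence layer would load referenced fragments lazily rather than all at once as `load_all` does here.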

Eclipse, EMF, Java, MongoDB, NoSQL

Mining Source Code Repositories

One application for these large software models is model-based mining of source code repositories. SrcRepo uses reverse engineering to create large abstract syntax tree models of whole source code repositories, comprising all branches and revisions of the code.

Reverse Engineering, Java, Metrics, Data Mining, Statistics, Git

Interactive Visual Analysis

To explore big data more intuitively, visual analytics uses interactive visualizations that allow users to understand the relationships within complex data sets based on visual connections between different data representations.

I built a web-based data visualization framework called d3ng on top of Angular2 and d3.js. With d3ng, clients can create complex visualizations from ordinary charts that show relationships via brushing and linking. Selections in one chart influence the representation of the selected data points in other charts, and connections are visualized via colors.
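
Brushing and linking can be sketched independently of any chart library. The following is a minimal illustration, not d3ng code: `SharedSelection` and `Chart` are hypothetical names, and "rendering" is reduced to a color per data point. All charts share one selection object; brushing a value range in one chart updates the selection, and every linked chart recolors the same data points.

```python
from typing import Callable, Iterable


class SharedSelection:
    """Holds the selected data-point ids and notifies all linked charts."""

    def __init__(self) -> None:
        self.selected: set[int] = set()
        self._listeners: list[Callable[[set[int]], None]] = []

    def subscribe(self, listener: Callable[[set[int]], None]) -> None:
        self._listeners.append(listener)

    def select(self, ids: Iterable[int]) -> None:
        self.selected = set(ids)
        for listener in self._listeners:
            listener(self.selected)


class Chart:
    """One view on the data: maps each data-point id to a value and a color."""

    def __init__(self, values: dict[int, float], selection: SharedSelection) -> None:
        self.values = values
        self.selection = selection
        self.colors = {point_id: "grey" for point_id in values}
        selection.subscribe(self._recolor)

    def brush(self, low: float, high: float) -> None:
        """Select every point whose value in this chart lies in [low, high]."""
        self.selection.select(
            point_id for point_id, value in self.values.items() if low <= value <= high
        )

    def _recolor(self, selected: set[int]) -> None:
        # Linking: highlight the selected points, whichever chart brushed them.
        for point_id in self.colors:
            self.colors[point_id] = "red" if point_id in selected else "grey"


selection = SharedSelection()
sizes = Chart({1: 120.0, 2: 950.0, 3: 400.0}, selection)
ages = Chart({1: 3.0, 2: 7.5, 3: 1.0}, selection)

sizes.brush(300.0, 1000.0)  # brush in the first chart
print(ages.colors)          # the second chart highlights the same points
```

Keeping the selection in a shared object rather than in the charts means any number of views can be linked without knowing about each other.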

Angular2, Node.js, D3.js, TypeScript

DSLs

Domain-specific languages (DSLs) are computer languages that adhere to the conventions and vocabulary of a specific domain and therefore allow for more precise, more readable, and more accessible expression. I created the Textual Editing Framework (TEF) for dynamic languages that can extend and change their syntax in a running IDE.

Eclipse, EMF, Xtext, GMF, Domain Modeling

Sensor Networks

To research wireless sensor network applications for smart cities, we built the 300-sensor network HWL, and I created the model-based sensor network experimentation framework Clickwatch to experiment with it.

Eclipse, EMF, XML, Statistics, MongoDB, NoSQL, Wi-Fi, Networks, Click

Publications

Technical reports

Review/chair activities

  • 17th International Conference on System Design Languages: SDL Forum 2015, Chair
  • BigMDE Workshop at STAF, 2013-2015, PC
  • MoDeVVa Workshop at MODELS, 2012-2015, PC
  • EXE Workshop at MODELS, 2015, PC
  • DSML Workshop at Modellierung, 2008, Chair
  • International Journal on Software and Systems Modeling, Springer, Reviewer

Awards

  • DFG scholarship as part of the Graduiertenkolleg METRIK at the Humboldt-Universität zu Berlin, April 2006
  • Scholarship granted by the city of Berlin under the Nachwuchsförderungsgesetz (NaFöG) in 2005 and 2006
  • Institutspreis of the Institut für Informatik (HU Berlin), awarded for the best master's thesis, July 2004

Theses

Toy Projects

XRAW

XRAW is a programming framework for using any existing REST API within a JVM environment in a type-safe manner. It uses active annotations to alter the semantics of Java classes and fields so that they represent remote resources as if they were local data.

XRAW comes with small excerpts of the APIs of the larger social networks Twitter, Facebook, YouTube, and Tumblr. It can easily be extended by defining types for the requests and resources of other existing APIs, or used to create type-safe client libraries for APIs built from scratch.

XRAW is compatible with GWT and can be used for web app clients, Android apps, or regular desktop apps. It is open source and distributed as a Maven module.

Java, Xtend, JSON, REST, Twitter, Facebook, YouTube, Tumblr, Maven

Apps & Games

Smartphone apps and app stores are probably among the most disruptive developments of the last decade, and something any programmer has to try building for. I experimented with a variety of platforms and technologies, and I developed and published apps and games for both iOS and Android.

LibGDX, Unity, Java, C#, Android, iOS

twiamo

Twiamo was a friendship management app for Twitter. It used metrics and keywords that allowed users to quickly assess the value of other Twitter users.

Twiamo was built as a progressive web app: a mobile-first application that feels and behaves like a native mobile app but does not require installation or updates.

Java, GWT, App Engine, Cloud Storage, JavaScript, Web Components, Polymer, Twitter API, REST

Experience

Skills

Brief CV

  • since 2017 engineer in research at Fritz-Haber-Institut der Max-Planck-Gesellschaft
  • 2011 – 2017 post-doc at Humboldt-Universität zu Berlin
  • 2009 – 2010 consultant for adesso AG
  • Spring 2007 visiting researcher at Agder University, Norway
  • 2005 – 2009 PhD student at the Humboldt-Universität zu Berlin
  • Fall 2003 research internship at the NEC Research Laboratories, New Jersey
  • November 2002 – June 2003 software developer at DResearch GmbH, development of telecommunication protocol handlers for Siemens AG
  • August 1999 – November 2001 software developer for Infopark AG
  • since fall 1998 student of computer science at the Humboldt-Universität zu Berlin

I worked for