Developer Journal - May 2018

This developer-journal entry was generated with a local AI model from my public GitHub activity, then reviewed before publication.

Digest

May 2018 focused on two areas: Docker-based setup for building databases from NCEP datasets, and expanding the Spark ML utilities with XGBoost integration and refactored feature classes.

Main Java codebase

Updated pom.xml to use a generic jblas version, and added Docker scripts that automate database creation for the 20th-century NCEP datasets. Also fixed a missing database reference in the Docker setup.

Notable sources

https://github.com/dafrenchyman/mrsharky/commit/95875017e0a7e5eb64601c762c344d5b6cadedc7
https://github.com/dafrenchyman/mrsharky/commit/f4caacc4d348d13ba5658852c4b902cf4395f481

Helpful Java Spark stuff

Added pipeline utilities along with unit tests for the data-processing functions. Implemented an XGBoostEstimator for use from Java Spark, refactored TopCategories to take Map parameters, and introduced a WeightOfEvidence feature. Also updated the documentation and tests.
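The TopCategories refactor passes its per-column settings as a Map. The Spark code itself is not reproduced here, so what follows is only a minimal plain-Java sketch of a top-categories transform with that shape, assuming the Map goes from column name to the number of most frequent values to keep and that every other value collapses into a placeholder label. The class name `TopCategoriesSketch`, its `fit`/`transform` methods, and the `"OTHER"` label are hypothetical rather than the repository's API, and rows are plain `Map`s instead of a Spark `Dataset`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical plain-Java sketch of a "top categories" transform (not the repo's Spark API).
 * Per-column limits arrive as a Map from column name to the number of categories to keep.
 */
class TopCategoriesSketch {

    // Assumed placeholder for values that fall outside a column's top N.
    static final String OTHER = "OTHER";

    /** Learns the N most frequent values for each column listed in topN. */
    static Map<String, Set<String>> fit(List<Map<String, String>> rows, Map<String, Integer> topN) {
        Map<String, Set<String>> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : topN.entrySet()) {
            String column = e.getKey();
            // Count occurrences of each non-null value in this column.
            Map<String, Long> counts = rows.stream()
                    .map(row -> row.get(column))
                    .filter(Objects::nonNull)
                    .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
            // Highest count first; ties broken alphabetically so results are deterministic.
            Set<String> top = counts.entrySet().stream()
                    .sorted((a, b) -> {
                        int byCount = Long.compare(b.getValue(), a.getValue());
                        return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                    })
                    .limit(e.getValue())
                    .map(Map.Entry::getKey)
                    .collect(Collectors.toCollection(LinkedHashSet::new));
            kept.put(column, top);
        }
        return kept;
    }

    /** Replaces values outside the learned sets with OTHER; other columns pass through. */
    static List<Map<String, String>> transform(List<Map<String, String>> rows,
                                               Map<String, Set<String>> kept) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            Map<String, String> copy = new HashMap<>(row);
            for (Map.Entry<String, Set<String>> e : kept.entrySet()) {
                String value = copy.get(e.getKey());
                if (value != null && !e.getValue().contains(value)) {
                    copy.put(e.getKey(), OTHER);
                }
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("color", "red", "size", "S"),
                Map.of("color", "red", "size", "M"),
                Map.of("color", "blue", "size", "M"),
                Map.of("color", "green", "size", "L"));
        // Keep only the single most frequent color; "size" is not in the Map, so it is untouched.
        Map<String, Set<String>> kept = fit(rows, Map.of("color", 1));
        System.out.println(kept);                                       // {color=[red]}
        System.out.println(transform(rows, kept).get(2).get("color")); // OTHER
    }
}
```

Keying the limits by column name lets a single call handle several columns with different limits, while columns absent from the Map pass through unchanged.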

Notable sources

https://github.com/dafrenchyman/spark/commit/b997ce80d0b34d604807a87e03a88739b4be23c6
https://github.com/dafrenchyman/spark/commit/0584a45747b1bc8a774d1ea820f3a99f969355ff
https://github.com/dafrenchyman/spark/commit/6bd9a9f17854d44266377f2ab1c47cec43267fcb
https://github.com/dafrenchyman/spark/commit/0b9dbfe8be1f0a55ba521a3acab1274791036917
https://github.com/dafrenchyman/spark/commit/5fcd26ff929eddbc4508a5d7301d7bfe2cd345f5
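For the WeightOfEvidence feature mentioned above, the standard weight-of-evidence encoding replaces each category with the log of the share of all positive rows that fall in it divided by the share of all negative rows that fall in it. What follows is a minimal plain-Java sketch of that formula, not the repository's Spark implementation; the class name `WeightOfEvidenceSketch`, the `fit` signature, and the `alpha` smoothing parameter are hypothetical. Sign conventions vary between sources (some put the negatives, or "goods", in the numerator), and this sketch assumes positives over negatives.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical plain-Java sketch of weight-of-evidence encoding for one categorical
 * column against a binary label (not the repo's Spark API).
 * Assumed convention: WoE(category) = ln(share of positives / share of negatives).
 */
class WeightOfEvidenceSketch {

    /**
     * @param categories category value for each row
     * @param labels     binary label for each row (true = positive / event)
     * @param alpha      additive smoothing per cell; 0 gives the raw formula, in which a
     *                   category seen with only one label maps to +/-Infinity
     */
    static Map<String, Double> fit(List<String> categories, List<Boolean> labels, double alpha) {
        if (categories.size() != labels.size()) {
            throw new IllegalArgumentException("categories and labels must have the same length");
        }
        // category -> {positive count, negative count}; TreeMap keeps output sorted by category.
        Map<String, long[]> counts = new TreeMap<>();
        long totalPos = 0;
        long totalNeg = 0;
        for (int i = 0; i < categories.size(); i++) {
            long[] c = counts.computeIfAbsent(categories.get(i), key -> new long[2]);
            if (labels.get(i)) {
                c[0]++;
                totalPos++;
            } else {
                c[1]++;
                totalNeg++;
            }
        }
        int numCategories = counts.size();
        Map<String, Double> woe = new TreeMap<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            // Smoothed fraction of all positives (and of all negatives) that land in this category.
            double posShare = (e.getValue()[0] + alpha) / (totalPos + alpha * numCategories);
            double negShare = (e.getValue()[1] + alpha) / (totalNeg + alpha * numCategories);
            woe.put(e.getKey(), Math.log(posShare / negShare));
        }
        return woe;
    }

    public static void main(String[] args) {
        List<String> city = List.of("A", "A", "A", "B", "B", "B");
        List<Boolean> churned = List.of(true, true, false, true, false, false);
        // A holds 2 of 3 positives and 1 of 3 negatives, so WoE(A) = ln 2 ≈ 0.693 and WoE(B) = -ln 2.
        System.out.println(fit(city, churned, 0.0));
    }
}
```

With `alpha = 0` the sketch computes the raw formula, so a category observed with only one label gets an infinite value; a small positive `alpha` keeps every encoding finite at the cost of pulling values slightly toward zero.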

Sources

dafrenchyman/mrsharky
dafrenchyman/spark
9587501
f4caacc
b997ce8
25aeecc
0584a45
6bd9a9f
0b9dbfe
5fcd26f
a361a01
6505915
ade0b4a
25ecb70
c99446e
4ce861c
630af24