Developer Journal - May 2018
A reviewed monthly digest generated from public GitHub activity.
This developer-journal entry was generated with a local AI model from my public GitHub activity, then reviewed before publication.
Digest
In May 2018, work focused on setting up infrastructure for database pipelines and on expanding the Spark ML utilities, including XGBoost integration and a refactoring of the feature classes.
main Java codebase
Updated pom.xml to use a generic jblas version and added Docker scripts to automate database creation for 20th century NCEP datasets. Fixed a missing database reference in the Docker setup.
Notable sources
- https://github.com/dafrenchyman/mrsharky/commit/95875017e0a7e5eb64601c762c344d5b6cadedc7
- https://github.com/dafrenchyman/mrsharky/commit/f4caacc4d348d13ba5658852c4b902cf4395f481
Helpful Java Spark stuff
Added pipeline utilities, along with unit tests for the data-processing functions. Implemented an XGBoostEstimator to integrate XGBoost with Java Spark, refactored TopCategories to take Map parameters, and introduced a WeightOfEvidence feature. Also updated the documentation and extended test coverage.
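The repository's TopCategories code is not reproduced here. The following is a minimal plain-Java sketch that assumes TopCategories keeps the N most frequent values of a categorical column and maps everything else to a catch-all label, with the per-column N supplied through a Map parameter (column name to N), which is one reading of the refactor. The class `TopCategoriesSketch`, its `fit`/`transform` methods, and the `"__other__"` label are hypothetical, and Spark is left out so the idea runs on its own.

```java
import java.util.*;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not the repository's code): keep the top-N most frequent
 * categories per column, where N is supplied per column through a Map parameter.
 */
class TopCategoriesSketch {
    static final String OTHER = "__other__"; // hypothetical catch-all label

    /** Learns, for each column listed in topN, which categories to keep. */
    static Map<String, Set<String>> fit(Map<String, List<String>> columns,
                                        Map<String, Integer> topN) {
        Map<String, Set<String>> kept = new HashMap<>();
        for (Map.Entry<String, Integer> e : topN.entrySet()) {
            // A column missing from the data yields an empty keep-set.
            List<String> values = columns.getOrDefault(e.getKey(), Collections.emptyList());
            // Count occurrences of each category in this column.
            Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
            // Sort by descending count, breaking ties alphabetically for determinism.
            Set<String> top = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(e.getValue())
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(LinkedHashSet::new));
            kept.put(e.getKey(), top);
        }
        return kept;
    }

    /** Replaces every category that was not kept with the catch-all label. */
    static List<String> transform(List<String> values, Set<String> kept) {
        return values.stream()
            .map(v -> kept.contains(v) ? v : OTHER)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, List<String>> columns = new HashMap<>();
        columns.put("color", Arrays.asList("red", "blue", "red", "green", "blue", "red", "teal"));
        Map<String, Integer> topN = new HashMap<>();
        topN.put("color", 2); // keep the two most frequent colors

        Map<String, Set<String>> kept = fit(columns, topN);
        System.out.println(kept.get("color"));
        System.out.println(transform(columns.get("color"), kept.get("color")));
    }
}
```

One plausible benefit of a Map parameter is that each column carries its own cutoff in a single configuration rather than needing a separate instance per column; whether that was the motivation for the refactor is an assumption.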
Notable sources
- https://github.com/dafrenchyman/spark/commit/b997ce80d0b34d604807a87e03a88739b4be23c6
- https://github.com/dafrenchyman/spark/commit/0584a45747b1bc8a774d1ea820f3a99f969355ff
- https://github.com/dafrenchyman/spark/commit/6bd9a9f17854d44266377f2ab1c47cec43267fcb
- https://github.com/dafrenchyman/spark/commit/0b9dbfe8be1f0a55ba521a3acab1274791036917
- https://github.com/dafrenchyman/spark/commit/5fcd26ff929eddbc4508a5d7301d7bfe2cd345f5
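Weight of evidence (WoE), the idea behind the WeightOfEvidence feature in this section, encodes each category by comparing how its rows split between the two classes of a binary target. A common definition, assumed here, is WoE = ln(share of all non-events that fall in the category / share of all events that fall in the category); some texts swap numerator and denominator, and the repository's feature may use a different convention or smoothing. This is a plain-Java sketch rather than a Spark transformer, and the class `WoeSketch`, its `compute` method, and the additive smoothing constant `alpha` are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch (not the repository's code) of weight-of-evidence encoding
 * for one categorical column against a binary label (1 = event, anything else = non-event).
 */
class WoeSketch {

    /**
     * Returns WoE per category: ln(P(category | non-event) / P(category | event)).
     * alpha is an additive smoothing constant; alpha = 0 gives the unsmoothed formula.
     */
    static Map<String, Double> compute(List<String> categories, List<Integer> labels, double alpha) {
        if (categories.size() != labels.size()) {
            throw new IllegalArgumentException("categories and labels must have equal length");
        }
        Map<String, long[]> counts = new HashMap<>(); // per category: [non-event count, event count]
        long totalNonEvents = 0;
        long totalEvents = 0;
        for (int i = 0; i < categories.size(); i++) {
            long[] c = counts.computeIfAbsent(categories.get(i), key -> new long[2]);
            if (labels.get(i) == 1) {
                c[1]++;
                totalEvents++;
            } else {
                c[0]++;
                totalNonEvents++;
            }
        }
        int k = counts.size(); // number of distinct categories, used to normalize the smoothing
        Map<String, Double> woe = new HashMap<>();
        for (Map.Entry<String, long[]> e : counts.entrySet()) {
            double pNonEvent = (e.getValue()[0] + alpha) / (totalNonEvents + alpha * k);
            double pEvent = (e.getValue()[1] + alpha) / (totalEvents + alpha * k);
            woe.put(e.getKey(), Math.log(pNonEvent / pEvent));
        }
        return woe;
    }

    public static void main(String[] args) {
        List<String> category = Arrays.asList("a", "a", "a", "b", "b", "b");
        List<Integer> label = Arrays.asList(0, 0, 1, 0, 1, 1);
        System.out.println(compute(category, label, 0.0));
    }
}
```

With `alpha` set to 0 the sketch reproduces the textbook formula, but a category that contains only one class then yields an infinite value; a small positive `alpha` such as 0.5 is a common workaround.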