Minipost: Exploiting Spark as a ThreadPool

Apache Spark is a great tool for massively distributed data analysis and manipulation. It also features a machine learning library, but in the Python variant it is unfortunately nowhere near as good as the awesome scikit-learn library, especially in combination with NumPy and SciPy, and the linear algebra tools in PySpark make it seem as if the developers didn't even bother. On top of that, my models are small enough for now to be trained with the aforementioned Python libraries. In my current setting, I run a single training procedure with one and the same parameter 100 times.

Project: Strava Club Challenges

Strava is a social network for sporty people (similar to Fitocracy), but focused mainly on endurance sports such as running, cycling or swimming. It offers a great many options to track and analyze your training and your progress (and even more in the Premium version), and it lets you actually be social in so-called clubs or compete in public challenges. Although I don't take my mobile with me on my runs, I own a Polar V800, which synchronizes my runs rather nicely to Strava. However, the range of challenges is very limited, as there are on the one hand only