Why I blog

I often ask myself why I write articles on this blog. Is it useful? Does anybody actually read this stuff? Or is it just a waste of time?

Perhaps the last question is easy to answer: I would not do it if it were useless. But why do I do it?

This is much harder to answer. According to the stats, some people are reading my blog (you included 😉), at the moment approx. 70 readers per week. So thanks for that. But back to the topic…

Basically, the main reason is that I use this blog as a kind of notepad for things I have done. It is quite cool to browse back through the history and see what I have blogged about. It started as a blog about my holidays in Australia and then evolved into something completely different: a blog about coding and other tech-related topics. Kind of cool for me.

So what will this blog look like in the future? I don't know, but I will certainly continue to blog. Quite likely I will shift a bit more towards IT management and organisation topics, which fits my current role as Head of Software Engineering. But there will still be some tech topics, too.

So stay tuned, and feel free to share and comment on my articles.


TDBF – The bottom line of TDD?

I think the question of what the best approach to developing software is, is nearly as old as software development itself. And it cannot be answered in general, because in my opinion it depends entirely on the environment: whether you are working in an agency on projects or in an enterprise company on products, which language and tools you are using, how mature you and your team are, …

For the last few years, test-driven development (TDD) has seemed to be the rising star of the IT industry, especially when you aim for good quality. But to me this is curious, because when you look at the software on the market, there seem to be two kinds: software that is well tested, and software that is successful from a business point of view. Okay, that might be really black-and-white thinking, but to be honest I have never seen a successful enterprise software product with 100% automated test coverage. Of course, everyone in the industry with a technical background strives for TDD. In reality, most of the software I have seen is tested mainly manually; in my opinion the reason is that most of it is legacy software that has been on the market for three years or more.

If you start something new, a TDD approach gives you plenty of advantages; for example, you can implement software against a specification, which comes in especially handy for ETL or REST services. But I do not think there is a really good KPI for what good software quality means; metrics like code coverage in particular are the wrong approach. Additionally, testing is expensive: if you really go test-driven and write tests on all levels of the test pyramid, it is easy to spend 60% of the total effort on testing. Yes, this pays back. But it takes time, and it might not pay back if, for example, you run an experiment and then decide to remove the feature because it is not successful. So I think a good team knows best what to test and what not, whether TDD makes sense for them, and especially when to apply it: in the proof of concept or later.

But for me the main question is how to reach a certain level of testing and how to convince people of the advantages of tests. And if, following my approach, you ship an experiment with not-so-well-tested code and later decide that the feature stays live, how do you improve the quality and the number of tests?

I see only one way to approach that, and it might also be a good way to improve legacy software: TDBF, test-driven bug fixing. That means that as soon as there is a bug, the first task is not to fix it but to reproduce it in an automated way, ideally on the developer's machine. I think the advantages are clear: the developer can verify early in the development process whether the fix works, root cause analysis becomes much easier, and the bug will not appear a second time if that test runs automatically, e.g. after each merge. It sounds easy, but often different environments behave completely differently, and writing good tests is a hard task. Still, for me this is the thing to do to keep software maintainable and to improve the skills of your engineers. And of course it is much more pragmatic than just declaring "Now we do TDD". Let's hope that useful and pragmatic TDD evolves out of TDBF.
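To make the TDBF workflow concrete, here is a minimal sketch in Python. The bug, the function name and the module are all made up for illustration: a reported crash is first captured as an automated test, and only then fixed, so the test guards against the bug reappearing.

```python
import unittest

# Hypothetical bug report: average() crashes on an empty list.
# TDBF step 1: write a test that reproduces the bug (it fails against the old code).
# TDBF step 2: fix the code until the test passes; from then on the test runs
#              automatically, e.g. after each merge.

def average(values):
    # old, buggy version was: return sum(values) / len(values)
    if not values:  # the fix: define the result for an empty list
        return 0.0
    return sum(values) / len(values)

class TestAverageBugReproduction(unittest.TestCase):
    def test_empty_list_no_longer_crashes(self):
        self.assertEqual(average([]), 0.0)

    def test_regular_average_still_works(self):
        self.assertEqual(average([2, 4]), 3.0)

# run with: python -m unittest <this_file>
```

The point is the order of work: the reproducing test exists before the fix, so "does my fix work?" becomes a question the machine answers.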

Please feel free to share your thoughts and have a nice week!


Big Data vs. Data Privacy

Currently I'm working on a big data / data warehouse project. Basically, we want to collect data from different sources (databases, tracking, …) in a central place, so that we can extract certain data and run analyses on it.

Basically the architecture looks like the following:

The data is exported as a snapshot from the source system and saved to a snapshot file, so that historical data is available, e.g. for data scientists.
From there it is put into a database that reflects the current state of the data.
Finally, different applications access this current state of the data: exporting it, doing aggregations, displaying charts, …
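The three steps above can be sketched in a few lines of Python. All names (the snapshot file name, the table, the sample rows) are illustrative; the real pipeline would of course use proper export tooling rather than JSON strings and an in-memory SQLite database.

```python
import json
import sqlite3

# Illustrative source data (in reality: exports from databases, tracking, ...)
source_rows = [{"user_id": 1, "email": "a@example.com"},
               {"user_id": 2, "email": "b@example.com"}]

# Step 1: export a snapshot of the source system, kept for historical analysis
snapshot_2019_01_01 = json.dumps(source_rows)

# Step 2: import the snapshot into the current-state database
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")
for row in json.loads(snapshot_2019_01_01):
    db.execute("INSERT OR REPLACE INTO users VALUES (?, ?)",
               (row["user_id"], row["email"]))

# Step 3: an application accessing the current state, here a simple aggregation
user_count = db.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The important property is that the snapshots accumulate history while the current-state database only ever holds the latest version of each record.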

Of course, in such cases we are also talking about user-specific data, so data privacy matters a lot.

We discussed for a long time the use case of user data having to be deleted from the system, and how to do it.
Our finding was that the hardest part is the snapshots: since they are essentially exports of the source database, a single user's data may be spread over many files collected over a year or longer.
For the current view of the data and the analytics applications, deleting and/or overwriting is quite easy, because you normally store e.g. a user's email only once in such a system. In the snapshots, however, you might have multiple instances of it, so in a deletion scenario you would have to go through a lot of files, check for a specific user's entries, and delete or overwrite them.

So we came up with the idea of encrypting the snapshots: each user's data in each snapshot is encrypted with that user's individual key. The keys are stored in a central place; each user has exactly one key, so there is e.g. a database table that maps user IDs to keys.
During the creation of a snapshot the data is encrypted and saved; before it is imported into the current-state database it is decrypted, and the decrypted version is deleted right after the process.

In case user data has to be deleted, the deletion is performed in the current-state database and the statistics applications. Additionally, the encryption/decryption key of that specific user is deleted, so his data still sits in the snapshots, but it is no longer (easily) possible to access it.
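A minimal sketch of the key-deletion idea (often called crypto-shredding) follows. Everything here is an assumption for illustration: the key store is a plain dict instead of a database table, and the cipher is a toy XOR keystream built from SHA-256, which is NOT production cryptography; a real implementation would use a vetted scheme such as AES-GCM. The point is only the key handling: delete the key, and the snapshot entry becomes unreadable.

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Toy pseudo-random stream derived from the key (illustration only,
    # not production crypto) -- real code would use e.g. AES-GCM instead.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; encryption and decryption are the same operation
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

# Central key store: exactly one key per user (here a dict, in reality a table)
key_store = {42: hashlib.sha256(b"per-user random key material").digest()}

# Snapshot creation: the user's record is encrypted with their individual key
record = b"user 42 <john@example.com>"
snapshot_entry = crypt(key_store[42], record)

# Normal import into the current-state database: decrypt with the stored key
assert crypt(key_store[42], snapshot_entry) == record

# Deletion request: remove only the key -- the snapshot entry stays on disk,
# but without the key it can no longer be decrypted
del key_store[42]
```

Note that this is "deletion by key destruction": the ciphertext remains in every historical snapshot file, so no file ever has to be rewritten, yet the personal data inside is inaccessible once the key is gone.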

At the moment this is more of an idea and has not been implemented yet, but we think it might be a good approach. I will keep you updated on a proof of this concept.

Feedback and questions are appreciated as always! 😉
