Spotlight: Meet Juraj, our CTO

In our latest Spotlight interview, we sat down with Juraj to find out all about his role, his passion for technology and the challenges of being the CTO at datasapiens.

You are the CTO at datasapiens. Tell us, what does a CTO do?

I do not know what a generic CTO does, but I am aiming to be useful somehow 😀.

In our tech team, I fill the roles of data engineer and DevOps engineer. I also investigate and test new tech stacks, and together with my teammates we shape the tech roadmap for our team.

Each member of our team has a domain of expertise in the software engineering realm and handles that domain when a problem or feature of interest must be addressed. We aim to teach ourselves to be as independent and self-managed as possible. In this way, we try to work as a ‘distributed mind’ where each person addresses a different aspect of the topic at hand. Minimizing team and company hierarchy is effective if we want to scale up the volume of features we deliver.

How do you keep up with the fast-changing world of technology?

Together with other colleagues, we follow several blogs and news channels covering modern technologies, software stacks, architectural design ideas, etc.

We closely watch news from our cloud provider, since a new feature can sometimes fix an existing problem we have, reduce our infrastructure costs, or help us replace another tool or tech stack while also bringing more features.

Since we have adopted the Hadoop ecosystem (HDFS, Hive, Spark, Alluxio, Trino, Pinot and others) as our core stack, we stay connected with the tech blogs of the well-established companies using this ecosystem (Google, Amazon, Netflix, Uber, LinkedIn, Lyft). They operate, of course, at a different data scale, but the problems we face now they have already solved and properly documented. And most projects in the Hadoop ecosystem come with an active community that you can reach out to for help if needed.

Apart from the infrastructure and whole software stacks, we also keep up with various frameworks and libraries for the programming languages we use to develop our internal tools.

What do you like to do in your free time?

Well, in my free time I do most of my keeping up with technology news 😊. But apart from that, I socialize with my friends and do some minimum viable physical activity, like running and gym training. I also read books on topics like economics, machine learning, global trends, and self-development.

What do you find most challenging in your role?

The biggest challenge currently, for me and for our whole team, is prioritization. There are three key ‘item buckets’ that require prioritization: the new technologies bucket, the features bucket, and the issues bucket.

The new technologies bucket is, unsurprisingly, the most interesting set of items. Here we always want to try out new and popular software tools. However, this bucket always carries some risk that a newly tested tool will not bear the expected fruit, meaning that after initial testing we find it does not add value to our current tech scope. But that is expected in any trial-and-error process. Here, we regularly set aside a small part of a normal sprint for innovation and investigation.

The features bucket forms the key part of our tech roadmap. The features are properly defined and their solutions thoroughly described. We often end up deprioritizing this bucket due to more pressing items from the issues bucket.

The issues bucket is the most frequently addressed one, since there is a constant inflow of new issues. It might be the least interesting bucket to prioritize, but it is the most inspirational one for implementing new features and probing modern technologies.

So the question then is what to prioritize: solve a specific issue directly and quickly, or invest time and bear the risk of hitting a roadblock with a new technology? Develop a high-value but customer-specific feature, or fix a low-value but generic bug?

We always struggle to find some balance between these buckets. We don’t always succeed, but we have made progress on this. Still, there is a lot to improve 😊. 

What are you passionate about in the world of technology?

I would highlight two things here: the variability of technologies and the open-source communities.

The variability of technologies often offers several workable solutions for a given problem. The task then is to choose the technology most suitable for the problem at hand. This is the most interesting, but sometimes also the most difficult, part of the selection process: each technology comes with several trade-offs, advantages and shortcomings.

Quite recently, we were deciding on an OLAP stack to which we would gradually migrate our data retrieval workloads. Among the candidates were Apache Druid, ClickHouse, Apache Pinot, and Apache Kylin. After some testing and a careful review of each candidate's documentation, roadmap, and developer community activity, we concluded that Apache Pinot was the most suitable. Reaching such a conclusion takes time, since one must consider the implications over medium- and long-term horizons.
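To give a flavor of what querying Pinot looks like in practice, here is a minimal Python sketch using the pinotdb DB-API client. It assumes a Pinot broker running locally on port 8099; the table and column names are hypothetical, purely for illustration.

# Minimal sketch: querying Apache Pinot from Python via the pinotdb client.
# Assumes a local broker on port 8099; table and columns are hypothetical.
from pinotdb import connect

conn = connect(host="localhost", port=8099, path="/query/sql", scheme="http")
cursor = conn.cursor()

# A typical low-latency OLAP aggregation: top stores by revenue.
cursor.execute("""
    SELECT storeId, COUNT(*) AS txns, SUM(basketValue) AS revenue
    FROM transactions
    GROUP BY storeId
    ORDER BY revenue DESC
    LIMIT 10
""")
for row in cursor:
    print(row)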

I like how vibrant open-source communities can be and how critical their role in software development is. From our experience, an open-source software project is one-third the software code and two-thirds the community behind that code; the community is thus far more important than the current state of the given software project.

To give an example: some time ago, we were investigating a promising technology for our use cases. It had quite an intriguing set of features, and if implemented, it would have accelerated our data retrieval processes tremendously. However, it had a very isolated community, poor documentation, and little integration with other stacks. We then found a direct competitor to this technology, one with a very vibrant community that is eager to collaborate, has well-written documentation, and aims to be as open and accessible as possible. Both technologies contain many innovative ideas, but it is clear that the second one is poised for long-term success and growth.

Another aspect of open-source communities is the importance of taking part in them. One can easily start by reporting bugs and issues or proposing features and improvements. In an active community, when you post a bug, one of the project's committers (the people actively writing code for that project) will pick it up and investigate within a few days, and will try to incorporate the fix into the nearest version or patch release as soon as possible.

With our participation in open-source communities, we started small: we reported a few specific low-level issues in software projects such as Alluxio and Trino.

For Alluxio, we pinpointed a simple use case that can reduce cloud infrastructure costs under conditions that frequently appear in cloud data lakes.

In the case of Trino, we want to fully integrate Pinot with it. However, this is quite a challenging task due to significant differences between the two systems. Several people within the community are aiming to build this integration, and we are in close contact with them.
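For illustration, querying a Pinot-backed catalog through Trino from Python might look like the sketch below. It assumes the trino client package and a Trino coordinator on localhost:8080 with a catalog named pinot configured; the user and table names are hypothetical.

# Minimal sketch: querying a Pinot-backed catalog through Trino from Python.
# Assumes the `trino` package and a coordinator on localhost:8080 with a
# catalog named `pinot` configured; user and table names are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="localhost",
    port=8080,
    user="analyst",
    catalog="pinot",    # catalog backed by Trino's Pinot connector
    schema="default",
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM transactions")
print(cursor.fetchone())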

Where do you stand on cybersecurity? Coding? Programming languages?

Good cybersecurity is necessary irrespective of the size or stage of the company you work for. We are currently undergoing a small transformation within the company to become compliant with ISO 27001, one of the most common information security certifications. It has many aspects, and obtaining it is not trivial: among other things, you need to undergo penetration testing and a review by an external security auditor. But obtaining it is a sign of a company becoming more mature and well established.

Regarding coding and programming languages: in our company we use many well-known languages, namely JavaScript, Golang, Python, Scala, and Java. For our internally developed stack we use JavaScript and Golang; for working with the Hadoop stack we use Python, Scala, and Java. And since Python has become the lingua franca of many ML frameworks, we also use it in our internal ML initiative projects.

I code often, almost every day, mostly in Python and Scala since I often work with Apache Spark. I also write a lot of configuration files (I am also a DevOps guy 😊). At my previous company, I coded in Golang as a backend developer.
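For a flavor of that day-to-day Spark work, a minimal PySpark sketch of a typical aggregation job might look like this; the data lake path and column names are hypothetical.

# Minimal PySpark sketch of a typical daily aggregation job.
# The data lake path and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregation").getOrCreate()

df = spark.read.parquet("s3a://data-lake/transactions/")  # hypothetical path
daily = (
    df.groupBy("storeId", F.to_date("ts").alias("day"))
      .agg(F.count("*").alias("txns"), F.sum("basketValue").alias("revenue"))
)
daily.write.mode("overwrite").partitionBy("day").parquet("s3a://data-lake/agg/daily/")

spark.stop()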