Ep. 41: Ivo Sokolov - Data Engineering


Jan 08 2020 · 13 mins

FULL EPISODE TRANSCRIPT
Adam: (00:00)

Welcome back for episode 41 of Count Me In, IMA's podcast about all things affecting the accounting and finance world. Our featured guest for this episode is Ivo Sokolov, a data office division co-leader at one of the most profitable and best capitalized banks in Austria, where he specializes in data engineering, regulatory tech, and project governance. Ivo joined Mitch to talk about how data engineering fits into today's accounting and finance function and to emphasize the importance of proper controls over any data project. So what exactly is data engineering, and why should accountants care? Let's jump into the conversation and have Ivo explain.

Ivo: (00:48)

Data engineering is a discipline that consists of parts of database technology, meaning databases, then data architecture, data structures, extract-transform-load processes and tooling on the one hand, programming skills on the other hand, software architecture, and a bit of new-generation technical infrastructure knowledge, for example cloud infrastructure, containerization, DevOps. And all of that combined is, sort of, data engineering.

Mitch: (01:25)

And how exactly does data engineering fit into accounting and finance?

Ivo: (01:32)

The purpose of data engineering is to enable the organization to prepare data pipelines for whatever analytical usage may be required in the different business functions, and finance is one of those functions. Providing the proper underlying data architecture enables the finance function to quickly make sense of the data within the organization, to quickly build and maintain data pipelines for analysis and ETL processes and data science, including more, let's say, standard tasks like data cleaning and normalization, setting up batches, feeding the data into proper business intelligence tools, and preparing dashboards.
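As an illustration of the kind of pipeline step Ivo describes, here is a minimal extract-transform-load sketch in Python with pandas; the file name and column names are hypothetical, not from the episode.

```python
# Minimal ETL sketch: extract raw transactions, clean and normalize them,
# and load a monthly summary that a BI dashboard could consume.
# The file "raw_transactions.csv" and its columns are illustrative.
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Extract: read the raw export from an operational system."""
    return pd.read_csv(path, parse_dates=["booking_date"])

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transform: clean and normalize, then aggregate per month."""
    cleaned = raw.dropna(subset=["amount"]).copy()
    cleaned["amount"] = cleaned["amount"].astype(float)
    cleaned["month"] = cleaned["booking_date"].dt.to_period("M")
    return cleaned.groupby("month", as_index=False)["amount"].sum()

def load(summary: pd.DataFrame, path: str) -> None:
    """Load: write the result where a BI tool can pick it up."""
    summary.to_csv(path, index=False)

if __name__ == "__main__":
    load(transform(extract("raw_transactions.csv")), "monthly_summary.csv")
```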

Mitch: (02:22)

So now, with all of this, how does the flow of data through an organization, particularly for the finance function, impact something like forecasting or analyzing financials?

Ivo: (02:33)

Using newer technologies and newer platforms, for example Python and R, which are the tools usually employed nowadays, one can do predictions and forecasts on financial figures, for example the income statement, balance sheet, cash flow statement. And that can be done within the finance function, within the business division, without requiring specialized technical teams, given that, you know, data engineering has prepared the data properly. The proper data pipelines ensure that finance has, depending on the needs of the company, access to near-time data or micro-batch data, meaning finance does not work with data from the previous month or from the previous quarter. Having the underlying data flow properly in the organization enables the forecasting, and the analysis of the financials, to be much more timely.
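A minimal sketch of the kind of forecast Ivo mentions, assuming a hypothetical monthly revenue series and a simple linear trend; the figures and the choice of model are illustrative only.

```python
# Minimal forecasting sketch: fit a linear trend to monthly revenue and
# project the next quarter. All numbers are illustrative, not from the episode.
import numpy as np

revenue = np.array([10.2, 10.8, 11.1, 11.9, 12.4, 12.8])  # last six months, in millions
months = np.arange(len(revenue))

# Least-squares linear trend; polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(months, revenue, deg=1)

next_quarter = np.arange(len(revenue), len(revenue) + 3)
forecast = slope * next_quarter + intercept
print([f"{v:.1f}" for v in forecast])  # projected revenue for the next three months
```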

Mitch: (03:30)

So I know you mentioned a couple of the tools, but which tools and particular skills really would be most useful for finance professionals to kind of borrow from data engineering and assist with these analyses?

Ivo: (03:43)

The tools are essentially around using modern scripting languages such as Python or R, which also include a lot of libraries with functions that are useful within finance for forecasting and analysis. So a popular set of open-source tools would be the Jupyter environment, JupyterHub, Jupyter notebooks. Finance professionals can simply log on through a browser from a thin client and build their analysis with Jupyter and Python or R, similar to the manner in which a software developer would use the same tools to write software for other purposes. Other skills that are useful to borrow from data engineering are having whatever code one writes for their analysis or their forecasts or their models be put into version control, so that it can be shared and used within the department. That way, people structure and go about solving a finance task the way that, in software development, one would build libraries with certain functions, for example getting the proper customer segments or getting certain fields that are used throughout. You know, one doesn't have to write the same code three times, and we definitely see more and more of that in business departments and in finance.
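A sketch of the kind of shared, version-controlled helper Ivo describes, so the same logic is not written three times; the module name, segment labels, and thresholds are hypothetical.

```python
# Sketch of a shared, version-controlled helper module (e.g. finance_lib.py):
# one agreed definition of customer segments that every analysis imports,
# instead of each analyst re-writing the same filter. Thresholds are illustrative.
import pandas as pd

def customer_segment(annual_revenue: float) -> str:
    """Map a customer's annual revenue to the agreed segment label."""
    if annual_revenue >= 1_000_000:
        return "corporate"
    if annual_revenue >= 50_000:
        return "sme"
    return "retail"

def add_segments(customers: pd.DataFrame) -> pd.DataFrame:
    """Attach the shared segment definition to a customer table."""
    out = customers.copy()
    out["segment"] = out["annual_revenue"].map(customer_segment)
    return out
```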

Mitch: (05:23)

As these tools and skills are starting to become shared across an organization, essentially, how is the data that is the end result ultimately viewed differently across these functions? For example, finance versus IT, or even something like marketing.

Ivo: (05:40)

Now, the problem was with siloed data and every function having their own data warehouse or their own data to do their analysis. Looking at, I'd say, customer data, finance looks at the customer and account from a different perspective than marketing does, but let's say 50 to 60 percent of the underlying selection of the data would be the same. Now, if you move into a proper data architecture, you'd expect certain basic fields or basic definitions to be shared and to be put into version control, and that is different from the case before. Using the BI tools would imply publishing some of these dashboards on a server so that they can be shared throughout the organization, so that they're not my Excel file sitting on a drive in my division but could also be shared with marketing, or shared with IT if they have to add something or do something with it. So this goes into having a central, aligned data architecture.
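A minimal sketch of shared field definitions under version control, in the spirit of the overlap Ivo describes between finance's and marketing's views of customer data; all field names are hypothetical.

```python
# Sketch of shared, version-controlled field definitions: the canonical
# customer fields that both finance and marketing select, plus each
# function's own extras. Field names are illustrative.
SHARED_CUSTOMER_FIELDS = ["customer_id", "name", "country", "segment", "open_date"]

FINANCE_FIELDS = SHARED_CUSTOMER_FIELDS + ["account_balance", "credit_limit"]
MARKETING_FIELDS = SHARED_CUSTOMER_FIELDS + ["email_opt_in", "campaign_code"]

def select_fields(rows: list, fields: list) -> list:
    """Project a list of customer dicts onto an agreed field list."""
    return [{f: row.get(f) for f in fields} for row in rows]
```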

Mitch: (06:49)

So with this data architecture and the version control that you referenced, I know data is pretty free-flowing across organizations now, so whose responsibility is it then to make sure there's proper governance in place, and governance projects that are set up with internal controls to monitor how this data is used and seen?

Ivo: (07:13)

This really depends on the type of organization. For certain organizations such as banks, there's a regulatory mandate to do proper data governance and have data aggregation capabilities across risk and finance. And data governance would imply that every individual owner of data within the organization is defined, and then they would know when that data structure changes and continue to maintain it, such that overall, if you have a figure on your balance sheet and you want to understand how that figure comes about, there's a very clear data lineage, and you know which data transformation steps or data engineering steps took place in order for the figures to be as they are. Who is responsible for implementing that would differ, but we definitely see a lot of master data management or data governance initiatives. And sometimes, depending on the state of the legacy systems, on how new or how old the underlying data architectures are, the organization might need to rethink and initiate a strategic project in order to create the necessary data architecture for combining data from usually tens or even hundreds of operational systems where that source data originally resides. And usually ...
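A minimal sketch of the data lineage idea Ivo describes, recording which sources and transformation steps stand behind a reported figure; the owners, systems, and steps named here are hypothetical.

```python
# Minimal data-lineage sketch: record, for each derived figure, which source
# systems and transformation steps produced it, so a balance-sheet number
# can be traced back. Owners, systems, and steps are illustrative.
from dataclasses import dataclass, field

@dataclass
class LineageRecord:
    figure: str                                   # e.g. a balance-sheet line item
    owner: str                                    # accountable data owner
    sources: list = field(default_factory=list)   # operational source systems
    steps: list = field(default_factory=list)     # transformation steps applied

record = LineageRecord(figure="loans_to_customers", owner="finance_data_office")
record.sources.append("core_banking_system")
record.steps.append("filter: performing loans only")
record.steps.append("aggregate: sum by reporting date")

print(record)  # one auditable trail from source data to reported figure
```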