Sink or Swim: Making Data lakes a Central Pillar of Your C/ETRM

Discover the importance of modern data management, data interoperability, and security with leading energy and commodity trading experts.

September 22nd, 2023 | 44:11

Summary KeywordsEnergy trading, commodity trading, data management, data interoperability, ETRM systems, ETRM/CTRM, data security, C/ETRM, AI, machine learning, energy transition, risk technology, risk systems

Transcript

Ben HillaryManaging Director, Commodities People

Paul KaisharisSVP of Engineering, Molecule

Alex WhittakerGeneral Manager, Bonroy Petchem

Tim KramerFounder and CEO, CNIC Funds

Ryan RogersPrincipal, ENITE

Kari FosterVP of Marketing, Molecule

0:00

Ben Hillary Well, hello, everyone, and welcome to today's webinar, Sink or Swim: Making Data Lakes a Central Pillar of Your C/ETRM.

My name is Ben Hillary, Managing Director of Commodities People and would really just like to say a huge thank you to everyone for being here with us. Really delighted to see how much this webinar has attracted the interest of the industry, with over 400 registrants from all corners of the globe and all parts of the commodities and data ecosystem.

In the next 60 minutes, we'll be deep-diving into the latest best practices in data management. We'll be exploring how data lakes can work in partnership with your C/ETRM, allowing real-time answers to really complex data queries with the goal of providing absolute trading advantage.

Recent years have seen really, really incredible advances in advanced analytics, AI, machine learning (ML), and many other forms of interpreting and actioning data. However, all this is virtually useless without a strong and effective data management strategy in place - firmly integrated and understood throughout the organization. This is what we aim to shine light and provide best practices on today.

We've got a truly expert speaker panel lined up to whom I'm very, very grateful for their time and input. Some of the subjects we'll be covering today include the importance of data interoperability in a shifting energy market, approaches to managing and analyzing unstructured versus structured data, best practices for optimizing data accessibility across your trading organization, and the key role your E/CTRM provider plays within your data ecosystem.

The webinar will take the format of a panel discussion, followed by Q+A. So, on

that note, throughout the webinar, please be posting your questions in the Q+A box and upvoting others of interest. Also, do make full use of the chat channel for any comments you want to share with the panel and the audience or even just to say hello and introduce yourself.

I am now delighted to pass over to Kari Foster, VP of Marketing for Molecule. Kari, the

floor is yours.

2:20

Kari Foster Thank you so much, Ben, and it's great to be here. As Ben mentioned, I'm Kari Foster. I'm the VP of Marketing at Molecule, which is the modern ETRM/CTRM platform. An ETRM/CTRM, if you're not familiar, is an energy trading or commodity trading risk management platform.

I'm really excited to be here today, introducing just a stellar panel of experts who are representing trading, risk management, and technology. And, this is really such an important topic for anyone within the trading organization who depends on data to do their jobs basically. Don't we all? And, a data management strategy is so much more than just the technology you have in place. And certainly, that's going to be covered today - but, it's how the data is structured, how it's accessed and consumed, the ways that it can be analyzed. And, that all starts with a strategy that has the end in mind.

So, without further ado, I'd love to get this really important discussion going by introducing today's panelists. First, Paul Kaisharis is my colleague, the Senior VP of Engineering here at Molecule. Tim Kramer is the Founder and CEO of CNIC Funds, and he actually used Molecule to model the prototype for their U.S. Carbon Neutral Power Futures Index ETF. Ryan Rogers is Principal at ENITE, which is a management consulting firm delivering strategic solutions to energy utilities and manufacturing. And, Alex Whittaker is General Manager at global energy trading and supply company, Bonroy Petchem.

And, I'll pass things back over to you, Ben.

4:26

Ben Hillary Excellent. Thank you, Kari. Right, drumroll. We will get kicked off right away to begin with a poll for the audience, so I am launching the first poll now. And, hopefully everyone can see this.

So, the question is - and it's a single choice - what is the biggest challenge with getting better insights from your trading data? Is it data quality, data management tools, data analytics tools, overall data strategy, internal skills or knowledge, unsure, or not applicable? So, if everyone can ponder that one, and I'll end the poll in about 10 seconds.

Excellent, okay. I am ending the poll now. Okay, so interesting results. Alex, Tim, how do these results line up with your own experiences? And, from your perspectives, how do they align with the main challenges you face?

5:44

Alex Whittaker Yeah, I mean I think I can relate to those answers, like the data strategy and data quality being the main problems, but we'll say the situation is a bit of everything.

So, my experience with this is just how fragmented all the different data sources are, like the volume of data. All the different sources, all the different licenses that I need. How do I get all of my prices together, even for what is a relatively simple trading book?

So, yeah. That's what I take from this, is just that say, my own struggles at the start of getting everything set up for Bonroy, I think, come through in that - data quality and data strategy. And yeah, just thinking, how do I get just basic prices in? I mean, let alone more complicated prices. That's what concerns me at the moment - overall data quality.

6:02

Tim Kramer So, for what we're seeing, the overall quality and management haven't really been a problem because we're using exchange-traded prices. And so those things, have like an auto scrape and that just hasn't been an issue.

And, the skills and knowledge - the people that we're seeing right now that are working for us, plus the people that we interface with, just amazing. You know, the younger generation and their math skill stat skills or, kind of, what they know and how computers iterate. It's just stunning.

But, the part that we see that's a little bit challenging still is the analytics on this. There's, you know, different math techniques and different things that people want to look at. You know, different people have, like, a different version of how they want to do a sharp ratio, things like that.

And then, when people try to use the data to get some more insights out of it and actually make something useful with it to try to get an edge, that's where there's just so much more information that you can tease out of the data, like cross-correlation and co-integration and things like that.

So, trying to isolate the individual components once you have the data, that's kind of been what we see as the biggest challenge.

7:42

Ben Hillary Excellent, thank you.

Well, next question. The trading landscape itself has changed immensely in recent years. We've got factors like the energy transition, the rise of carbon markets, general shifts in technology. What has been the impact on business needs from a data perspective? Ryan, if we could start with you on that question, then we'll go to Tim, Paul, and Alex.

8:08

Ryan Rogers Certainly. So, the traditional highly structured data remains critical. You know, things like risk market, risk management, credit risk management, compliance.

Where I think we've seen new needs and new analytical capabilities needed is in some of the renewables markets. There's a lot of much larger networks for the Internet of Things (IoT). A lot more smart grids and sensors bringing in semi-structured data. Things that are in JSON or XML.

So, there's more of a struggle to deal with that semi-structured data in the traditional data warehouses. And then, I think the emerging markets - carbon credits, environmental credits. Some of those markets, where they are on exchange-traded platforms and have structured prices, it's great. But, some of them are auctioned.

There's, you know, infrequent data points to pull in. So, I think some of those emerging markets and the need for just monitoring news feeds, market sentiment for some of those emerging markets would benefit from natural language processing (NLP), machine learning (ML), and some of those emerging capabilities.

9:20

Ben Hillary And, Tim?

9:22

Tim Kramer So, what we've kind of seen for how the landscape has changed has been sourcing and documentation of the data. So, I mean the data comes in. It's good, but everyone says, "Where'd you get that? Where did that come from? Well, what's the web link to that? Well, how can I verify that?"

So, there's been a big push - and, again, because we're registered with the SEC for what we do - there's a big push on the actual data sourcing, the documentation. So, for like, for auditing, for like, SOC, et cetera. And, as Ryan said, that when you're taking a look at the carbon, people want to be able to kind of verify things all the way back to the source to say, "Okay, does

this qualify for SFDR, article six, seven, eight, nine..." whatever it would be.

So, that would be the thing that we're seeing is the documentation and the actual. It's all the way down, so someone can look at the actual links and verify them.

10:08

Ben Hillary Excellent. Paul, your thoughts.

10:12

Paul Kaisharis Yeah, so I guess what I would add, we've kind of touched on this already, about the volume and the variety of sources of data, and just the volume of that data has increased tremendously, you know, over time. And, it's really a big challenge to, you know, manage this volume and make sense of it, and add value to the business.

So, there's so much data coming from all different sources, and Tim touched on it - just having that connection of where that data has come from and the relevance of it. You know, I would say the technology has matured quite a bit. And, there is, for businesses, on how to use that technology and how to manage that volume of data and make appropriate use of that data.

I think that's an impact of business - to decide how to best, you know, utilize and leverage that. And then, of course, there's the larger question of, you know, AI - "the evil AI," and what do we do with machine learning (ML) and large learning models? You know, that's a real question businesses need to ask themselves. And, how, you know, if their competitors are going to be using that technology, is that going to leave them behind?

So, I think there's some big questions around that for businesses to answer.

11:21

Ben Hillary Alex, your thoughts.

11:24

Alex Whittaker Yeah. To echo what Paul just said, that really sort of impact we've seen. You know, more data, more data vendors, more delivery methods, more choice, more complications, more costs, more service problems. And yeah, just the sheer growth in data, different sources that it comes from.

And, I think it ties in quite well with this, you know, data lakes and your CTRM. What's the solution to that? It's trying to get that fragmentation to come together in one place with people who know what they're doing and how to do it and try to streamline it that way is something I've actually learned during the process of doing this panel. Right, talking to these guys. So, it's something I'm looking at for Bonroy right now, in fact.

12:03

Ben Hillary Next question. We'll go to Ryan and Paul with this one. What is the difference between unstructured, semi-structured, and structured data?

12:17

Ryan Rogers So, structured data is the one we're all familiar with. All of the pricing volume, transactional data contracts that the sources are well known, how fresh, it's time-stamped. It's verified. The trade controls person is monitoring that data daily

The semi-structured is probably the next most useful bucket of data that I've seen in my clients. So, this is time series data that has some structure but is not necessarily time-stamped, cleaned, verified. Things like meter data, SCADA data, PI data. Things that are necessary and useful for all the ancillary operations but difficult to get into a highly structured format. Or, the data source is enormous, so it's time-consuming and complicated to integrate into your highly structured data warehouse. So, that's probably the next most useful category.

Unstructured data is things like text, images, video, things that would benefit from natural language processing (NLP), machine learning (ML), especially in some of these emerging markets like we've been talking about.

13:30

Paul Kaisharis I mean, Ryan hit on it but maybe a little bit... at a not-too-technical, a little lower level. I mean, with the structured data, it's traditional relational database data. You

know, stuff that's stored in tables. You know, trading data, market data, curves, trade valuations. You know, that's considered the structured data, with rows and columns of information.

On the semi-structured side to elaborate down on that one a little bit. It's, kind of, JSON and XML type data, where you get a little bit more descriptive information about, what is that data about? For example, for Molecule on a semi-structured side, obviously Molecule has a lot of structured data with what I just mentioned, in terms of trades, market data, et cetera.

But, on semi-structured, we provide Value at Risk (VaR) calculations through a JSON structure, which is, again, a more complex structure that you can get more descriptive, more information on. Also, a lot of modern ETRM systems, of course like Molecule, have APIs that return data in that JSON structure. So, what do you do with semi-structured data like that?

And then, Ryan mentioned the unstructured, which is the documents, video, and images, and all that. So, that's how I would... I mean that's a typical categorization of those data structures.

14:40

Ben Hillary Following on from that, what are your approaches and your best practices

in managing unstructured, semi-structured, and structured data? And, Paul, if you want to continue.

14:55

Paul Kaisharis Sure, yeah. So, I would always say start with security, and start with security first. And, you've always got to consider security, and security of the data. Always consider the principle of least privilege that basically means only provide access to what a person or entity needs to do their job. So, start with security first.

You know, Alex mentioned about all these different sources and different locations of the data. Bring all the data together into a centrally managed, you know, store location. You know, bring that information together. And, you know, I mentioned about technology maturity. I mean, you know, take advantage of relatively low-cost cloud storage infrastructure.

There's always ways to store and bring this data together. You know, bringing that data together and take advantage of that you know lower cost cloud storage I think is something to consider there.