How Successful Hedge Funds Deal with Bad Data

By Hardeep

July 8, 2020 · 10 min read

Introduction

The search for alpha has pushed fund management into a domain where data reigns supreme, making the need for accurate, valid, consistent, relevant, and reliable data (“good data”) more important than ever. Most successful hedge funds realize that high-quality data plays an integral part in fund management, especially when it comes to minimizing operational risk and maximizing alpha.

Data informs nearly every aspect of day-to-day fund operations, from the communication with current and potential investors to your investment strategies, post-trade reconciliations, analytics, and reporting.  Therefore, using bad data can come at a high cost, and not just for the hedge fund industry. According to a study from the Harvard Business Review, “Bad Data Costs the U.S. $3 Trillion Per Year”. Who can forget the infamous “London Whale” trade in 2013? Leaving the significant financial cost, and the increased regulatory scrutiny aside, the loss of reputation among peers and investors can be almost impossible to repair in the short term. As a result, bad data can radically impede a hedge fund’s ability to maintain its place in an already competitive market.

On the other hand, good data leads to better-informed investment decisions and increased investor confidence, resulting in higher returns, increased client satisfaction, and growth in assets. 

Microsoft Excel is a powerful tool for aggregating and analyzing data.  As a result, it has become entrenched in the business processes of both emerging and established hedge funds. Microsoft Excel is used to store historical data and for performing data analysis, including financial modeling, what-if scenarios, reporting, and visual analytics, among other daily front, middle, and back-office tasks.  However, due to the need for a low learning curve, Microsoft Excel, by design, has limitations that are obvious to most seasoned fund professionals. These limitations increase the likelihood of storing and generating bad data within Microsoft Excel.

What is Covered in this Article?

Based on our experience working with a multitude of hedge funds, we have identified some of the most common data quality issues. Our goal is to provide a better understanding of the types of data related frustration that operations and investment teams might be facing, and the steps you could take to address these issues, as part of the long-term data strategy for your fund. 

What exactly is ‘Bad’ Data Within Your Fund Operations?

Bad data, or dirty data as it is often referred, is erroneous data that negates the integrity of the whole data set, in some cases making it unusable. The most common four types that we have come across are discussed below.

Missing or Incomplete Data 

Missing or incomplete data is one of the most common types of bad data. This occurs when important cells within a master (source) workbook are left blank (null) due to unforeseeable reasons.  A common example of this is missing corporate action data for equity when calculating its market value. 

This could happen because the CSV file that you received from the counterparty does not yet have this information available at the time of download from their FTP server. Another example could be a missing currency conversion rate for a specific trading day due to the market being closed when working with multi-currency portfolios.

Inconsistent Data 

Inconsistent data is one of the biggest problems that exist for data-driven businesses and can affect not only your bottom line but your reputation in a competitive market. If data is entered into a Microsoft Excel workbook in inconsistent formats, analyzing or storing the information becomes difficult and requires back-office analysts to clean the data to allow for accurate interpretation. For example, different users have entered the pricing information into multiple workbooks and have used a different format type in multiple worksheets for the same information.  

Another example is the formatting of a date field is not consistent among multiple workbooks. In one workbook it’s entered as MM-DD-YYYY, whereas in another it’s entered as DD-MM-YYYY. The information entered into these fields may be accurate but if not in the pre-determined format, it will lead to issues when performing analysis on this data.

While most hedge funds have predetermined requirements for the format of each column, it is easy for these guidelines to not be followed, especially, when working under pressure.

Invalid Data 

Invalid data can occur when the data entered is not within the predetermined range. For example, a subscription was entered with a negative sign and a redemption with a positive.

Another example of invalid data could be gross margin values that are not between 0 and .9 in certain instances. Again, this can result in the hedge fund operations or investment team spending a significant amount of time and resources cleaning the data. 

This can be a potentially costly business interruption, as analysts may be spending the majority of their time focused on data quality, rather than providing useful analysis and reporting.

Duplicate Data 

Duplicate data refers to the same data being entered into a file more than once. Duplicate data is typically due to human error and can occur when one or more people with access to the same file enter the same data several times. 

Or, it can be an automated system process that has gone wrong. For example, an asset is listed multiple times in the position data you received from a counterparty. This error usually occurs due to human oversight, and there is no way to identify this in Microsoft Excel, without manually reviewing the file for these types of issues. In most instances, this is caught by the investment manager or the person looking at the analysis or report, which is typically too late in the process to be of benefit to the team or prevent wasted resource time.

Bad Raw Data vs Bad Calculated Data – What’s the Difference?

Raw data or source data is primarily unprocessed data used to extract meaningful information for analyzing and processing, drawing conclusions, and making projections, etc. The end product of this processed data is a visual analytics and various reports.

Raw data is of little use to your company before being processed and turned into information. Therefore, if you begin with dirty source data you will end up with unreliable, inaccurate information, hindering your ability to make sound decisions for your investors.

Calculated data is a numeric or date field that derives its data from the calculation of other fields using calculated columns in Microsoft Excel tables. Calculated columns in Microsoft Excel is an incredible tool, as it allows you to enter a single formula in one cell, and that formula will automatically expand to the rest of the columns. Although this can create great efficiencies for the operations team, it can also create significant inefficiencies. For example, if you have multiple rows of data and make an error in a formula, that change will expand to the rest of the calculated columns, corrupting all of the data throughout the Microsoft Excel workbook.

Bad Data is Wasting your Operations Team’s Valuable Time 

Garbage in, garbage out — your hedge fund’s operational cost, risk, and even more importantly your reputation, depends on the accuracy of your data.  Poor data quality can significantly reduce productivity and the business impact of dirty data is staggering. 

It is not uncommon for operations teams to spend as much as 2-3 hours daily reconciling issues created by bad data. Modern techniques and technology can minimize this impact, allowing your operations team to be more agile while reducing inefficiencies and operational costs.

Bad data results in poor investment decisions, lead to reputational consequences and loss of revenue, greater inefficiencies as valuable time is spent cleaning data, wasted resources due to wrong decisions, and unprofitable communications. Experian reports that on average, companies across the globe feel that 26% of their data is dirty, costing the average business 15% to 25% of revenue, and the US economy over $3 trillion annually.

What Are Some Promising Solutions? 

Most hedge funds use considerable amounts of diverse data. Once you have the data, cleaning and aggregating the data into a usable format presents a significant challenge within most hedge funds.

Often, the most instinctive solution is a short-term fix, which includes adding more people to the operations or investment team to focus on cleaning the data. At Cognivo, we are taking two approaches to help hedge funds address this challenge for the long-term. 

Approach #1 – Building on Top of Your Current Microsoft Excel Reliant Set up 

Microsoft Excel is not going anywhere, and hedge funds will continue to use Microsoft Excel as one of their primary tools, due to the familiarity and comfort level that users have with it.

Hedge funds on a growth trajectory, realize that they need to introduce more advanced and creative solutions in addition to Microsoft Excel, to minimize operational cost and risk. It is imperative to employ solutions that ensure your raw data is accurate and reliable. This elicits confidence in the knowledge gleaned from the data, ensuring it’s an asset and not a long-term liability.

In this approach, we utilize proprietary methods, software, and tools that we have developed over the years. These tools are built on top of your current Microsoft Excel datasets, allowing us to extend Microsoft Excel, and provide more advanced visual analytics, dashboards, and reporting, while utilizing your current Microsoft Excel workbooks, CSV files, emails, and other data sources.

Realizing the substantial value inherent in integrating highly customized business intelligence solutions, with your current Microsoft Excel set up, to support your stores of data, will eliminate inefficient processes and minimize the frustration your operation team experiences. In turn, this approach makes it much easier to manipulate, analyze, and summarize data, for reporting to the investment management team, compliance reporting, as well as investor reporting.

Approach #2 – Cleaning, Aggregation & Centralization of Your Internal Fund Data

In this approach, we develop a comprehensive data-driven strategy for your specific fund, based on your exact needs. 

We start by discussing and understanding your current data strategy. Then, we take an inventory of the current data assets, sources (both structured and unstructured data), what systems, processes, and reports that use the available data. Once that is done, we clean, normalize, and aggregate your fund data into a highly robust relational data warehouse.

Relational databases prevent data integrity violations through the adoption of database security policies that restrict data modifications to the data used for reporting. Database auditing features allow the ability to check if any unauthorized modifications have been performed on the data being analyzed. Using a database will ensure data validity, consistency, and prevent the unintentional loss of data.

A Robust Data-Driven Strategy is Vital to Your Hedge Fund’s Success 

While data is abundant and inexpensive to store, it will be a burden and put you at a disadvantage if it is not clean, correct, and readily available in a usable format. There is no getting around the critical process of ensuring you have clean data. Therefore, modernizing your processes is a necessary investment if you want your data to be an asset and not a liability.

While investment firms understand the importance of analyzing data and data quality, data aggregation is often overlooked when developing a data strategy. This, in turn, wastes time and resources, while hindering their ability to mitigate risk. 

Implementing a data aggregation solution can help enhance your data analysis and reporting capabilities, by adding features like real-time P/L, drill-downs, interactive charts, and one-click export to multiple formats for sharing. Over and over, we see that automating critical manual processes improves the fund’s ROI, as it frees up time to focus on the analytics process.

As the investment management industry becomes more data-driven, it is crucial for hedge funds to have strong data plans in place that guarantee data retrieved is clean and ensures analysts can turn it into reliable insights in an efficient manner.

We hope this article helps you formulate your thoughts as you think through your hedge fund’s data strategy and challenges in the coming year.