Announcing Tiingo Composite Price Feeds
You can access the data here: Tiingo API – EOD Daily Data
This year our company hit a major turning point, with revenue rising rapidly. The first thing on our list? Get better data. So today we are announcing our new End-of-Day (EOD) Price Data Engine, which powers Tiingo.com and its API.
Because of each and every one of you, we were able to expand our data budget literally 15-fold in the past couple of months. And today, I am proud to announce our new Data Engine initiative. As of June 28th, 2017, we have converted 98% of equities and 60% of mutual funds over to the new engine. The rest are being migrated this week.
But what is the new methodology? Glad you asked. We went back to the drawing board and realized: if ISPs and web hosts have redundancy, why don't we as a data firm? We started there and expanded on it.
So we broke our process up into 4 phases, as you can read below. In summary:
Each ticker must go through 4 phases before prices are made available:
Phase 1: Each ticker is covered by at least 2 different data providers. This ensures redundancy and also gives us a way to cross-check updates.
Phase 2: Each data provider's data must then pass our statistical error checks. If there are any errors, our system attempts to auto-correct them. For example, one thing our statistical engine detects is duplicate records.
Phase 3: Human intervention. Companies do weird things, and markets haven't always been automated. That makes it very hard for computers to handle events like re-listings, sparse data on less liquid companies, or companies that predate the computer era. Our systems alert us when the statistical engine can't auto-correct. Each of our human steps is documented, so we can explain what decisions were made and why.
Phase 4: AI. Once we have enough data from Phases 2 & 3, our systems can start auto-correcting certain errors. Note: readers of this blog know we are skeptical of full automation, and of most AI methods out there when it comes to financial data. The AI will always be conservative, but it is an important step in error-checking.
Only after the above 4 phases do we release price data for a ticker. Now imagine that times 40,000.
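To make the gating concrete, here is a minimal sketch of the "all phases must pass before prices publish" logic. This is purely illustrative; the function and argument names are our own invention, not Tiingo's actual internals.

```python
def release_ready(feeds, unresolved_errors, open_reviews):
    """Return True only when a ticker has cleared every phase.

    feeds             -- list of provider names covering the ticker (Phase 1)
    unresolved_errors -- statistical errors not yet auto-corrected (Phase 2)
    open_reviews      -- discrepancies awaiting human sign-off (Phase 3)
    (Phase 4's AI operates inside error resolution, so it is not a
    separate gate here.)
    """
    has_redundancy = len(feeds) >= 2          # at least 2 providers
    stats_clean = len(unresolved_errors) == 0 # statistical checks pass
    reviews_done = len(open_reviews) == 0     # no pending human work
    return has_redundancy and stats_clean and reviews_done


# A ticker with two clean feeds publishes; one with a single feed does not.
print(release_ready(["vendor_a", "vendor_b"], [], []))  # True
print(release_ready(["vendor_a"], [], []))              # False
```

Run this gate once per ticker, and "now imagine that times 40,000" is simply 40,000 evaluations of the same check.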
Just a quick note: EOD data is very hard to get right, especially with companies doing weird things with listings, delistings, and restructurings. So if you identify an issue, please let us know. We are actively working hard on transparency and on creating better data for all, but it will require a joint effort. We look forward to working on this solution!
Here are some cool graphics below:
What It Takes For Prices To Be Published On Just 1 Ticker
(Now multiply this by 40,000)
Phase 1: Source data from multiple providers for both redundancy and error-checking
We've gone to a variety of different data vendors, each with different methods of access, to ensure that the data feeds remain as independent as possible. Our goal is to have a minimum of 2 data providers per ticker. We use AAPL in the examples below.
4 Different Data Providers for AAPL
We then extend and compare the historical EOD data from each provider. We even use some datasets that are no longer around but offer historical coverage that far surpasses the others.
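A simple way to picture the cross-provider comparison: for each trading day, line up every provider's closing price and flag any provider that strays too far from the consensus. This sketch is an assumption about how such a check could look, not Tiingo's actual method; the tolerance value is arbitrary.

```python
from statistics import median

def cross_check_close(closes, tolerance=0.005):
    """Flag providers whose close deviates from the cross-provider median.

    closes    -- dict mapping provider name -> closing price for one day
    tolerance -- max allowed fractional deviation from the median (0.5% here)
    Returns the list of provider names that fail the check.
    """
    consensus = median(closes.values())
    return [
        provider
        for provider, close in closes.items()
        if abs(close - consensus) / consensus > tolerance
    ]


# Two providers agree on AAPL's close; a third is off by a few dollars.
closes = {"ProviderA": 150.00, "ProviderB": 150.01, "ProviderC": 147.50}
print(cross_check_close(closes))  # ['ProviderC']
```

With at least two providers per ticker, a lone outlier like this stands out immediately, which is exactly the redundancy argument above.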
Phase 2: Run Statistical Error Checking on Each Data Source
We then use a proprietary suite of statistical tools to clean each data feed and detect issues or errors within it. This helps us score and track each feed, and also automate fixes for common errors we find, e.g. duplicate values.
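Duplicate rows are the concrete example named above, so here is a minimal sketch of detecting and auto-correcting them in a single feed. The function names and the keep-first-row policy are our own illustrative choices, not a description of Tiingo's proprietary tools.

```python
from collections import Counter

def find_duplicate_dates(rows):
    """rows: list of (date, close) tuples from one feed.
    Returns the dates that appear more than once."""
    counts = Counter(date for date, _ in rows)
    return sorted(date for date, n in counts.items() if n > 1)

def dedupe(rows):
    """Auto-correct by keeping the first row seen for each date."""
    seen = {}
    for date, close in rows:
        seen.setdefault(date, close)
    return sorted(seen.items())


feed = [
    ("2017-06-26", 145.82),
    ("2017-06-27", 143.73),
    ("2017-06-27", 143.73),  # duplicate row from the provider
]
print(find_duplicate_dates(feed))  # ['2017-06-27']
print(dedupe(feed))
```

Errors the engine can fix mechanically, like this one, stay in Phase 2; anything ambiguous gets escalated to Phase 3.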
Phase 3: Good-ole Human Intervention
Computers are smart, but they don't understand qualitative history well enough yet. When our statistical engine catches a discrepancy it can't auto-fix, we go on a mission to dig into what happened. This involves anything from scanning historical press releases and financial statements to making phone calls, or whatever else we need to do to get to the bottom of it.
When you alert us of an error, we go out of our way to fix it. We built this entire engine because users identified an error and we realized one data source wasn’t going to cut it. We take these reports that seriously.
Phase 4: AI
In order for the robots to take over, they need to learn. We track and audit every override we've made in Phases 2 and 3, so once we have enough data we can implement AI that learns how to auto-correct better and when to alert us to issues. Those of you who frequent the blog, or know our team, know we are very wary of AI's ability to fix data errors. When this is implemented, it will be incredibly conservative, as we will always prefer Phase 3.
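The training data for that future AI is just the audit trail itself: every override, with its reason and its source phase, recorded in one place. A hypothetical sketch of such a record (all field names are our own, purely illustrative):

```python
from dataclasses import dataclass

@dataclass
class Override:
    """One documented correction from Phase 2 (statistical) or Phase 3 (human)."""
    ticker: str
    date: str
    field_name: str   # e.g. "close"
    old_value: float
    new_value: float
    reason: str       # e.g. "duplicate row", "relisting adjustment"
    source: str       # "statistical" or "human"

audit_log = []

def record_override(override):
    """Every correction is appended, never edited, so the history stays auditable."""
    audit_log.append(override)


record_override(Override(
    ticker="AAPL", date="2017-06-27", field_name="close",
    old_value=0.0, new_value=143.73,
    reason="provider sent zero close", source="human",
))
print(len(audit_log))  # 1
```

An append-only log like this serves both goals in the text: it lets a human explain any past decision, and it accumulates the labeled examples a conservative auto-corrector would eventually learn from.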