
Tiingo’s New End-of-Day (EOD) Price Engine

Announcing Tiingo Composite Price Feeds

You can access  the data here: Tiingo API – EOD Daily Data

 

This year our company hit a major turning point, with revenue rising rapidly. The first thing on our list? Get better data. Announcing our new End-of-Day (EOD) Price Data Engine powering Tiingo.com and its API.

Because of each and every one of you, we were able to expand our data budget literally 15-fold in the past couple of months. And today, I am proud to announce our new Data Engine initiative. As of June 28th, 2017, we have converted 98% of equities over and 60% of mutual funds. The rest is being migrated this week.

But what is the new methodology? Glad you asked. We went back to the drawing board and realized, “If ISPs and web hosts have redundancy, why don’t we as a data firm?” We started there and expanded on it.

So we broke up our process into 4 phases as you can read below. In summary:

Each ticker must go through 4 phases before prices are made available:

Phase 1: Each ticker is covered by at least 2 different data providers. This ensures redundancy and also gives us a way to cross-check updates.

Phase 2: Each data provider’s data must then pass our statistical error checks. If there are any errors, our system looks to autocorrect them. For example, one thing our statistical engine does is detect duplicates.

Phase 3: Human intervention. Companies do weird things and markets haven’t always been automated. This makes it very hard for computers to detect things like re-listings, sparse data on less liquid companies, or companies that predate the computer era. Our systems alert us when the statistical engine can’t auto-correct. Each of our human steps is documented, so we can explain what decisions were made and why.

Phase 4: AI. Once we have enough data from Phases 2 & 3, our systems can start auto-correcting certain errors. Note: readers of the blog know we are skeptical of full automation, or even most of the AI methods out there, when it comes to financial data. The AI will always be conservative, but it is an important step when error-checking.

Only after the above 4 phases do we release price data for a ticker. Now imagine that times 40,000.

Just a quick note: EOD is very hard to get right, especially with companies doing weird things with listings/delistings/restructurings. So if you identify an issue, please let us know. We are actively working hard on transparency and on creating better data for all, but it will require a joint effort. We look forward to working on this solution!

Here are cool graphics below:

What It Takes For Prices To Be Published On Just 1 Ticker

(Now multiply this by 40,000)

Phase 1: Source data from multiple providers for both redundancy and error-checking

We’ve gone to a variety of different data vendors, each with different methods of access, to ensure that the data feeds remain as unique as possible. Our goal is to have a minimum of 2 data providers per ticker. We are using AAPL in the examples below.

4 Different Data Providers for AAPL

We then extend and compare the historical EOD data from each provider. We even use some datasets that are no longer around, but offer historical data going back far further than the others.
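To make the idea concrete, here is a rough sketch (in plain Javascript, and definitely not our production engine) of what a cross-provider check can look like: given EOD rows from two providers, flag any date where the closes disagree beyond a small tolerance, or where one feed is missing a date entirely.

//A rough, hypothetical sketch of a cross-provider check: compare closes from two
//providers and flag any date where they disagree beyond a relative tolerance.
function crossCheckCloses(feedA, feedB, tolerance) {
     //Index feed B by date for quick lookup
     var closesB = {};
     feedB.forEach(function (row) { closesB[row.date] = row.close; });

     var discrepancies = [];
     feedA.forEach(function (row) {
          var otherClose = closesB[row.date];
          if (otherClose === undefined) {
               //One provider is missing the date entirely - also worth flagging
               discrepancies.push({ date: row.date, reason: 'missing in feed B' });
          } else if (Math.abs(row.close - otherClose) / otherClose > tolerance) {
               discrepancies.push({ date: row.date, closeA: row.close, closeB: otherClose, reason: 'price mismatch' });
          }
     });
     return discrepancies;
}

//Example usage with dummy data and a 0.1% tolerance
var providerA = [{ date: '2017-06-27', close: 143.73 }, { date: '2017-06-28', close: 145.83 }];
var providerB = [{ date: '2017-06-27', close: 143.73 }, { date: '2017-06-28', close: 145.99 }];
console.log(crossCheckCloses(providerA, providerB, 0.001));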

Phase 2: Run Statistical Error Checking on Each Data Source

We then use a proprietary suite of statistical tools to clean each data feed and detect issues or errors within each feed. This helps us score and keep track of each feed, and also automate fixes for common errors we may find, e.g. duplicate values.
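As a simplified illustration (again, a sketch rather than our actual engine), the duplicate check can be as basic as counting how many times each date appears in a feed:

//A simplified sketch of one statistical check: find dates that appear more than once in a feed.
function findDuplicateDates(feed) {
     var counts = {};
     feed.forEach(function (row) {
          counts[row.date] = (counts[row.date] || 0) + 1;
     });
     return Object.keys(counts).filter(function (date) { return counts[date] > 1; });
}

//Example: the repeated 2017-06-27 row would be flagged for auto-correction or review
var feed = [
     { date: '2017-06-26', close: 145.82 },
     { date: '2017-06-27', close: 143.73 },
     { date: '2017-06-27', close: 143.73 }
];
console.log(findDuplicateDates(feed)); //prints ["2017-06-27"]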

Phase 3: Good-ole Human Intervention

Computers are smart, but they don’t understand qualitative history quite well enough yet. When our statistical engine catches a discrepancy that it can’t auto-fix, we go on a mission to dig into what happened. This involves anything from scanning through historical press releases and financial statements to making phone calls, or whatever we need to do in order to get to the bottom of it.

When you alert us of an error, we go out of our way to fix it. We built this entire engine because users identified an error and we realized one data source wasn’t going to cut it. We take these reports that seriously.

(We spent all of our money on data, not clothes)

 

Phase 4: AI

In order for robots to take over, they need to learn. We keep track of and audit every override we’ve made in Phases 2 and 3, so when we have enough data, we will implement an AI that learns how to auto-correct better and when to alert us to issues. Those of you who frequent the blog, or know our team, know we are very wary of AI’s ability to fix data errors. When this is implemented, it will be incredibly conservative, as we will always prefer Phase 3.

 

Completion

With all the data now cleaned, derived, and made into composite indices, we release it to you all in a single EOD data source.

Making the World’s Best Screener for Our Users Pt. 2

If you haven’t seen part one – read it here: Making the World’s Best Screener for Our Users

As we’ve improved our screener, we also couldn’t stand idly by without updating our custom metrics creator. Tiingo was the first major fintech company to allow any user to create their own stock screening metric.

And as time passed – we realized we were going to make it so much better for you.

Announcing: The Sexy, Newly Revamped Custom Metrics Creator:

The New “IDE”

Programmers use “IDE”s to code – and we wanted to make that simple for everybody to use. We created our own version that makes it so simple that if you know Excel, you know how to make a custom metric.

And the best part? Each line will give you the number calculated so you get feedback immediately.
IDE Example

We even added autocomplete:

IDE Autocomplete

 

The Shifting Distribution

One of the most impressive features we’ve ever coded on Tiingo is taking your custom metric and then calculating analytics on it immediately. The horsepower this took was incredible and it pushed our coding abilities. Coding the shifting distributions, especially with custom metrics, took 80% of our time.

As you enter your metric, you will see the distribution of your metric across the entire Tiingo Universe:

altman-z-score

And now when you screen – just like the screener, the distribution of your metric will shift

custom-shifting-distribution-pe-market-cap-altman

The Docs

This blog post couldn’t cover all of the metrics and functions (like calculating the mean EPS over the past two years), so we created comprehensive documentation that lets you harness the full power of the new screener.

The Docs

 

We know you’re going to love our new Custom Metrics: Tiingo Custom Metrics

Making the World’s Best Screener for Our Users Pt. 1

It’s been over a year since Tiingo launched its first screener. We were attempting to move forward the power of screeners, and we had grandiose ideas of how to do it. We were the first to:

  1. Allow users to create their own metrics
  2. Create a new UI that challenged existing assumptions of screeners

We’re never happy with the status quo, so we decided to challenge ourselves further. We were going to make the custom metrics more intuitive, the screener more informative, and the user experience so intuitive that you would have no idea you had just screened through ten million datapoints, because it took 200ms.

Announcing: The Sexy, Newly Revamped Screener:

Tiingo Screener

The New Notebook

We’ve consolidated the screener overview page into a notebook format. This allows for easy switching among screens and reduces clutter while saving you clicks. We strive for beautiful minimalism here at Tiingo:

Tiingo Screener Notebook format

Searchable Filters

While the old drag and drop was nice, we wanted to come up with a new way to add/remove filters. We’ve created a beautiful searchable table, organized by the type of metric.

Metric Selection Table

Shifting Distributions

We believe data visualization should be done with a level of minimalism. We don’t want charts for the sake of charts. And research has shown time and time again that less is more when conducting analysis with numbers.

So we started off with the concept that when somebody screens, they should have context.

Is a filter for a P/E Ratio between 10-25 too common?

PE Between 10 and 25 with Distribution

But that wasn’t enough.

If we’re looking at a P/E Ratio of 10-25, what kind of companies are we choosing? Are they small-cap or large-cap? The Market Cap distribution has shifted and we want to know that.

Announcing:

The Shifting Distribution

pe-10-25-market-cap

We can see that a P/E of 10-25 leans slightly toward larger companies.

How about a P/E ratio of -20 to 0?

PE -20 to 10 Market Cap

The distribution has switched largely to smaller Market Caps.

Suddenly, you now have context for all of your screening metrics. And the best part? It’s all done in a fraction of a second. If you don’t believe us – check it out here (no registration required): Tiingo Screener
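For the curious, here is a toy sketch (plain Javascript, nothing like the actual engine behind the screener) of what a “shifting distribution” boils down to: apply the filter, then recompute the histogram of another metric over the tickers that survive.

//Toy sketch of a "shifting distribution": filter the universe by one metric,
//then recompute the distribution of another metric over the survivors.
function shiftedDistribution(universe, filterFn, metric, bucketEdges) {
     var survivors = universe.filter(filterFn);
     var counts = new Array(bucketEdges.length - 1).fill(0);
     survivors.forEach(function (stock) {
          for (var i = 0; i < bucketEdges.length - 1; i++) {
               if (stock[metric] >= bucketEdges[i] && stock[metric] < bucketEdges[i + 1]) {
                    counts[i]++;
                    break;
               }
          }
     });
     return counts;
}

//Example: market-cap distribution (in $bn buckets) of stocks with a P/E between 10 and 25
var universe = [
     { ticker: 'AAA', pe: 12, marketCap: 0.5 },
     { ticker: 'BBB', pe: 22, marketCap: 40 },
     { ticker: 'CCC', pe: -5, marketCap: 2 }
];
var bucketEdges = [0, 1, 10, 100, 1000];
console.log(shiftedDistribution(universe, function (s) { return s.pe >= 10 && s.pe <= 25; }, 'marketCap', bucketEdges));
//prints [1, 0, 1, 0]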

The Results

We wanted our users to be able to see the latest data – quickly. Now the results show you metrics seamlessly and beautifully:

Screen Results

And you can simply click to see more about a company:

Screen Results Expanded

We know you’re going to love this new screener: Tiingo Screener

Presenting the Tiingo API

It’s here, it’s finally here.

The Official Tiingo API has launched after months and months of people requesting this, followed by months and months of dev time. The reason it took so long? We didn’t just do standard-API stuff – we built infrastructure in exchange data centers to help significantly reduce costs for everyone, from everyday users to FinTech and institutional players.

For example, EOD data is included in the Tiingo price, and for FinTech users, real-time data is $500/month instead of $4,200/month.

In summary: the entire API was built with the idea, “how much can we give and get away with it?” instead of, “how much can we charge and get away with it?”

And with that here’s the lowdown:

Link: https://api.tiingo.com/
Docs: https://api.tiingo.com/docs/general/overview
Our technological approach: https://api.tiingo.com/about/difference
The origin story: https://api.tiingo.com/about/origin

There exist a few limitations:

  1. Every user is entitled to 40GB of bandwidth a month. Yes we realize that’s insane – it’s why we did it.
  2. Every user gets 10k requests an hour and 100k requests a day. We plan to increase these as more datasets come online and as we phase out of beta. You can monitor your usage at: https://api.tiingo.com/account/usage
  3. In order for us to track these limitations, we will need you to create an account (hope that’s ok!)

 

We’ve worked hard to make the documentation super simple to use. You can view them here: https://api.tiingo.com/docs/general/overview
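As a quick taste, an EOD request looks roughly like the sketch below. This is illustrative only – the exact endpoint paths, parameters, and response fields are spelled out in the docs linked above, and you’ll need your own API token.

//Illustrative sketch of requesting AAPL EOD prices with the browser fetch API.
//Check https://api.tiingo.com/docs/general/overview for the authoritative endpoints and fields.
var token = 'YOUR_API_TOKEN'; //replace with the token from your Tiingo account
var url = 'https://api.tiingo.com/tiingo/daily/aapl/prices?startDate=2017-01-01&token=' + token;

fetch(url, { headers: { 'Content-Type': 'application/json' } })
     .then(function (response) { return response.json(); })
     .then(function (prices) {
          //Each element is one trading day of EOD data (open/high/low/close, volume, etc.)
          console.log(prices[0]);
     })
     .catch(function (err) { console.error('Request failed:', err); });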

 

Here are our datasets:

Included in Tiingo.com

  • EOD Data 
    • End of Day price data for over 37,000 tickers including ADRs and Chinese stocks
  • Mutual Fund Data
    • Getting ready for launch
  • Technicals
    • Getting ready for launch

Additional (for FinTech)

  • IEX Price Data
    • Tiingo is the first FinTech company to bring IEX Real-time data to the mainstream public
    • IEX Real-time data for $500/month versus $4200+/month for other services
    • Websockets and REST implementation

 

 

How to set up Hosted Web Apps with Windows Live Tiles

For those of  you who have been keeping up with this blog, the Javascript container process is something I’ve been following closely for the past decade. Earlier in the year, Peter Kruger from Microsoft reached out asking if I could test their latest implementation, which we presented at //Build. It was an honor, and since then I’ve been advocating Microsoft and OpenFin’s implementation as my favorites.

In a nutshell: the Javascript container process lets you take a JS website and make it feel native to the operating system – whether it’s iOS, Android, or Windows. We’re going to cover the Windows Live Tile implementation here.

For those of you with Windows machines, tablets, or phones (okay, Surface users and PCs) – you may have seen what Windows calls “Live Tiles.” Windows 8 may have overdone it 😉 but Windows 10 nailed it. Tiles allow you to get a snapshot of what the app is doing without having to open it. I always found the Android take on this clumsy from a UI standpoint, and the feature is mostly non-existent on iOS – with the exception of Apple apps like Weather and the Clock. I use an iPhone, for the record.

But – Microsoft nailed it IMO with the perfect amount of structure and dynamic content. Whereas Android has widgets which are all sorts of shapes, Microsoft forces structure and lets you “snap” together tiles.

E.g.

Windows Live Tiles

Notice the Tiingo one? Yeah, we like it too 🙂

We’re going to cover how we got these going in our pure-Javascript implementation. It didn’t require any native coding, which was nice. It turns out that if you’re using Hosted Web Apps, which let you convert your Javascript web app into a Windows app, Microsoft injects a Windows library that you can use to interact with Windows.

This GitHub page gives a good overview, but we’re going to go a little more in-depth. Still a good read-through: https://microsoftedge.github.io/WebAppsDocs/en-US/win10/HWAfeatures.htm

 

Step 1 – Download the source code/generate the manifest

You need to generate source code, or a manifest file, for this to work. If you don’t know what that is (like me initially), you can use App Studio, which has a wizard that takes care of this for you. Visit here: http://appstudio.windows.com/en-us, make an account, and then create a “Hosted Web App” via this URL: http://appstudio.windows.com/projects/create

When you’re done with the wizard, click “Generate” and download the Source code.

Download Source Code

Once you have the source code, you can open it up via Visual Studio. You can download the Community edition for $0 here: https://www.visualstudio.com/en-us/products/visual-studio-community-vs.aspx

Step 2 – Choose a Template

Microsoft has pre-generated templates that you can “fill in.” In reality, these are XML templates where you can change the content and then update it. So we’re going to choose a template, populate it with data, and then send the notification update to the Windows Notification library.

Find a template that you like. We’re going to change the content in them to present the data that we want. You can see the catalog here:  https://msdn.microsoft.com/en-us/library/hh761491.aspx

For Tiingo, we went with tileWide310x150Text05. Keep track of this “identifier” code as we will need it in our javascript code.

I like the idea of clean text, and for financial data, images are not as necessary. Maybe later we will include them for news stories, but first I wanted to stick with text.

Once you choose the template, you can scroll down and see the XML. For tileWide310x150Text05 it looked like this (taken from MSFT’s website):

<tile>
  <visual>
    <binding template="TileWideText05">
      <text id="1">Text Field 1</text>
      <text id="2">Text Field 2</text>
      <text id="3">Text Field 3</text>
      <text id="4">Text Field 4</text>
      <text id="5">Text Field 5</text>
    </binding>  
  </visual>
</tile>

<tile>
  <visual version="2">
    <binding template="TileWide310x150Text05" fallback="TileWideText05">
      <text id="1">Text Field 1</text>
      <text id="2">Text Field 2</text>
      <text id="3">Text Field 3</text>
      <text id="4">Text Field 4</text>
      <text id="5">Text Field 5</text>
    </binding>  
  </visual>
</tile>

Step 3 – Update the tile in your JS code

Next, we have to tell Windows when to update the data and what to do.
We used this snippet; check the comments to see what each line means:

//See if the Windows namespace is available (injected by Windows for HWAs)
if (typeof Windows !== 'undefined' && typeof Windows.UI !== 'undefined' &&
typeof Windows.UI.Notifications !== 'undefined') { 
     
     //setting dummy market data
     var marketData = {spy : {returns : .05}, newsLinks: [{title: "Headline 1"}, {title: "Headline 2"} ]};
     //Get the Windows UI Notifications
     var windowsNotifications = Windows.UI.Notifications;
 
     //Load in the template, which will contain the XML we can modify
     var tileTemplate = windowsNotifications.TileTemplateType.tileWide310x150Text05;
     var tileXML = windowsNotifications.TileUpdateManager.getTemplateContent(tileTemplate);
 
     //We now get all the text elements and append text nodes
     var tileText = tileXML.getElementsByTagName('text');
     //First line will be a header
     tileText[0].appendChild(tileXML.createTextNode("Market Snapshot"));
 
     //Next we get the returns and append a "+" sign if the return is >0. For negative numbers, JS defaults to appending a "-"
     if(marketData.spy.returns > 0) 
          tileText[1].appendChild(tileXML.createTextNode("S&P 500 +" + (marketData.spy.returns * 100).toFixed(2).toString() + "%"));
     else
          tileText[1].appendChild(tileXML.createTextNode("S&P 500 " + (marketData.spy.returns * 100).toFixed(2).toString() + "%"));
 
     //Next we add the news headlines
     tileText[2].appendChild(tileXML.createTextNode(marketData.newsLinks[0].title));
     tileText[3].appendChild(tileXML.createTextNode(marketData.newsLinks[1].title));
 
     //Create the TileNotification, passing our modified XML template and then send the update command
     var tileNotification = new windowsNotifications.TileNotification(tileXML);
     var tileUpdater = windowsNotifications.TileUpdateManager.createTileUpdaterForApplication().update(tileNotification);
 }

Since we are using Angular, we wrapped the initial call in a $timeout() and then set an $interval to get the marketData JSON object from our back-end every 30 seconds.
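For reference, the wiring looked roughly like the sketch below (Angular 1.x). The module name and the /api/marketData endpoint are made up for illustration, and updateTile() stands in for the tile-update snippet from Step 3.

//Simplified sketch of the Angular (1.x) wiring. 'tiingoApp' and '/api/marketData' are
//hypothetical names; updateTile() stands in for the Windows.UI.Notifications code above.
var app = angular.module('tiingoApp', []);

app.run(['$timeout', '$interval', '$http', function ($timeout, $interval, $http) {
     function fetchMarketData() {
          return $http.get('/api/marketData').then(function (res) { return res.data; });
     }

     function updateTile(marketData) {
          //...the tile-update snippet from Step 3 goes here, using marketData...
     }

     //Initial update wrapped in $timeout so it fires once the app has bootstrapped
     $timeout(function () { fetchMarketData().then(updateTile); }, 0);

     //Then refresh the Live Tile every 30 seconds
     $interval(function () { fetchMarketData().then(updateTile); }, 30000);
}]);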

 

Step 4 – Test the app by running it in Visual Studio, pin the app to your start menu, and voila!

Beautiful!

Our Example Tile
AWS vs Packet.net
Why we left AWS

Benchmarking AWS’s Network, Disk, and CPU Performance

If this sounds like a glowing review of Packet.net – it is. I found myself re-reading this post over and over, trying to make it sound less shrill – but I can’t. It’s just a ridiculously good product and value – EC2 instances just don’t make sense for us anymore.

A friend once told me, “Rishi – sometimes if you don’t advocate a product aggressively – you can be doing society a disservice in your attempt to be neutral. If the value is so good, you must tell everybody about it.”

This is one of those times.

EDIT: Feeling really grateful the HackerNews community decided to link to Tiingo a second time. In the first HackerNews posting many of you asked for an API, which is what led to me finding the AWS bottleneck. The API launched [quietly] this week at: https://api.tiingo.com where Tiingo is now the first company to bring IEX (anti-HFT exchange/darkpool) data to mainstream FinTech. Kind of went full-circle as this post wouldn’t have existed without the original HN coverage.

TL;DR:

The performance of AWS on network speed, disk speed, and CPU is quantitatively just “not good” for what we needed. When we introduced real-time market data, we went in search of our bottleneck and realized it was AWS. We made the decision to switch to Packet.net, and the below reflects on our decision and explains why. The benchmarks continue to reaffirm our decision. Having said all of this, certain features of AWS remain incredibly convenient, like S3, Cloudfront, and Route53 – but we can’t justify using EC2.

In Networking: Packet is significantly faster, more stable, and 15%-44% cheaper

In Disk Usage: Packet is more performant and 92% cheaper

In CPU: Packet is 30-40% more performant and 15% cheaper

In machines: Packet’s systems are all bare-metal/dedicated, whereas AWS charges extra for dedicated machines

 

If you’ve noticed Tiingo being particularly snappy these days, it’s because I couldn’t stand it anymore. I had tried everything – buying more expensive instances on AWS, allocating more space, scaling horizontally, but it wasn’t matching up to my local dev machines. And so I started searching for the bottleneck – only to realize it was AWS.

When I started researching AWS, I found I wasn’t alone. Many people had experienced what I had, but I kept putting off the switch. Changing cloud service providers is frustrating: scripts break, performance temporarily suffers, you experience downtime, and you know there will be unknown-unknowns.

We recently added real-time market data, and this exacerbated the issues. Our websockets were being overwhelmed in queues, and throwing more nodes at the problem was becoming expensive. We were trying to put a bandaid over a burst pipe. I finally decided on Packet.net and I want to share the reasons why. I’ve included benchmarking results to help emphasize the point.

Our search was motivated by two major reasons:

  1. The costs were getting out-of-hand
  2. After reading the below Reddit post on AWS’s [lack of] network stability, we started asking around and realized the experts were right… AWS’s network is slow. If we are going to give our users real-time data directly from the exchanges, that’s a heck-of-a-lot of data and we need it to be as fast as possible.

The Reddit/Blog Post was from an engineer at Stack Overflow.

http://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/.

More specifically, this Reddit comment on AWS’s network stability seemed echoed by many:

Reddit Comment Inspiring Inquisition
https://www.reddit.com/comments/468p2m

We explored options like DigitalOcean, but Tiingo, like all financial data/analytics companies, is very data heavy and their plans didn’t allow for flexible data storage (EBS on Amazon for example). We looked into Rackspace and Azure, but the cost differentials didn’t make it seem worth the transition. Admittedly, having used Rackspace in the past – I’ve always loved their customer support and was personally disappointed I couldn’t justify the cost.

Eventually I came across Packet and spoke to their engineers since I hadn’t heard of them before.

I took a chance. It paid off.

I told them my concerns and what I was trying to solve (market data connectivity and high data transfer rates). One of the co-founders, who was a networking engineer, personally oversaw my connectivity project to the data exchanges. I’m pretty sure this was Paul Graham 101 on start-ups and customer service.

Ultimately though – I’m a data nut, so I decided to benchmark AWS vs Packet, and I was really curious about the Reddit comments on AWS’s network stability. The benchmarks closed the deal for us. It was a no-brainer. A major part of the reason is that Packet.net is bare metal (dedicated physical machines) whereas AWS tends to be focused on virtual machines. The hardware/price point is actually even cheaper on Packet. We are paying 1/3rd of what it would cost to get a similar, less performant, system on AWS.

So here you have it!

The tests below compare AWS vs Packet for disk, network, and CPU benchmarking – and also cost.

I’ve outlined and commented the results below so you can reproduce the tests.

Hardware

Since we are testing Packet vs AWS, we started off with the Packet hardware and found the AWS price equivalent. We started with the Type 1 and worked backwards to find the equivalent in performance/price on AWS.

Note: For the network test, we also test a smaller machine. The reason for the lighter hardware is for load balancing (HAProxy in this sense). If all of the back-end servers can have high network throughput, but we need to send it to the end-user, the load-balancer’s networking performance will be the determining factor. This is especially important in cases like real-time data.

Packet:

Type 1 (Server): 4-core, 8-thread 3.4GHz Intel Xeon E3-1240 v3, 32GB RAM – $0.40/hr ($0.37/hr if reserved for 1 month), or $292.80/month

Type 0 (Load Balancer): 4-core 2.4GHz Intel Atom C2550, 8GB RAM – $0.05/hr ($0.0459/hr if reserved for 1 month), or $36.60/month – what somebody may choose as their load balancer

*Note: We assume 732 hours in a month; but if you reserve a Packet instance for a month, they will only charge you for 672 hours. However, to make apples-to-apples comparisons, all calcs in Price/Month assume you choose hourly pricing (732 hours for 1 month) to keep things normalized.

AWS:

m4.2xlarge (Server): 8 vCPU (2.4GHz Intel Xeon E5-2676), 32GB RAM – $0.479/hr, or $350.63/month – the m4.2xlarge was chosen for its optimized network performance

t2.medium (Load Balancer): 2 vCPU (Xeon processors burstable to 3.3GHz), 4GB RAM – $0.052/hr, or $38.07/month – what somebody may choose as their load balancer

 

OS:

Ubuntu 14.04 server

 

The Benchmarks

Network:

For this test, we used iperf3 as per the AWS documentation

(https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/)

We wanted to simulate a very real-world network configuration for ourselves – basically what our site looks like behind a load balancer. Load balancers tend to require very low processing power, and serve as a network bottleneck to the user.

We are testing:

  • Internet -> Load-balancer (Haproxy)
  • Load-balancer (HAProxy) -> Server
  • Server -> Server

The “Internet” machine used was an Azure machine. Not perfect, but we figured it was a good 3rd party control.

You can view the detailed methodology in the Appendix below.

Results:

Performance:

AWS came out incredibly inconsistent – with a high standard deviation and low mean transfer rates. What AWS considered a “High” performance network tier was matched by the least expensive tier on Packet. Why didn’t we use AWS’s Elastic Load Balancer (ELB)? For our use case with websockets, we found ELB to be lacking what we needed. This will be a blog post for a later day.

 

Comparing transfer speeds across machines on Packet vs AWS

What was particularly interesting was the inconsistency of the lower tier machines. We ran our benchmarks over an hour, and here is what the rates looked like when making requests to-and-from the lower tier (t2.medium) EC2 Instance. This seems consistent with their “burstable” instance – which is great and all…except Packet’s lowest tier outperforms it:

AWS’s speeds decay significantly over time for the t2.medium instance, making it a poor choice for a load balancer

 

Pricing:

The above AWS configuration is $.081/hour more expensive than Packet and also less performant.

Another consideration is bandwidth costs. AWS charges $0.09/GB (for the first 10TB) out to the internet. Packet.net charges $0.05/GB out to the internet. Within the same data centers (availability zones in AWS), both Packet and AWS are free. However, when transferring to a different availability zone, AWS charges $0.02/GB and Packet.net charges $0.05/GB.

Conclusion:

Packet is the clear winner here, in both absolute speed and stability. In terms of price, Packet is cheaper by $0.081/hour in the above configuration, or 15% – and the majority of our bandwidth goes out to the internet, where Packet is 44% cheaper on outbound traffic.

Disk:

Packet offers two storage types: Basic (500 IOPS) and Performance (15,000 IOPS).

We created a block storage volume on both Packet and AWS (EBS) with provisioned IOPS of 500 and then 15,000. Then we used sysbench to run an I/O test (see Appendix below for methodology).

Results:

Performance:

At 15k IOPS, we saw a more significant performance differential favoring Packet. At Tiingo we use the performance tier, given the amount of data we store and calculate.

Disk Transfer Rates

Price:

Provisioning 15,000 IOPS on AWS @ $0.065/IOPS = $975/month. But wait, that’s not all! They also charge $0.125 per GB per month. So a 15k IOPS 500GB volume on AWS would be $1,037.50/month.

On Packet it would be 500GB * $0.15 = $75.

Doing a bit of algebra, 15k IOPS on AWS only becomes cost effective once you have more than 39TB of storage. That’s right – Packet is cheaper until you hit 39TB of storage….
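For anyone who wants the algebra behind that 39TB figure: AWS costs $975 + ($0.125 × GB) per month while Packet costs $0.15 × GB per month, so AWS only catches up when $0.15 × GB > $975 + $0.125 × GB, i.e. when $0.025 × GB > $975 – which works out to more than 39,000GB, or 39TB.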

Conclusion:

Packet is literally 92.3% cheaper than AWS for 15k IOPS performance, and Packet is even more performant. It’s the victor in disk performance as well.

CPU:

CPUs cannot be benchmarked purely on processor clock speed alone. For this reason, we also ran a sysbench test with different thread counts.

Results:

Performance:

The results are damning for AWS. On an 8-vCPU machine, the benchmark ran slower with 8 threads than with 4. I ran this multiple times and double-checked to make sure this was an m4.2xlarge. Then I spun up another m4.2xlarge and the results were more in line with what I expected (still slower than Packet).
However, I am going to keep the original instance’s benchmark below to highlight the point of noisy neighbors. With AWS, you can get a shared machine with other neighbors who are processor intensive and reduce your performance. That is the nature of virtualization. With Packet, you get a dedicated system. What most likely happened was that our original machine had a noisy neighbor.

Here are the results – you can see at 8 threads Packet performs 4x faster than AWS.

Packet is 4x Faster on This Noisy Neighbor AWS Machine

OK OK – I will show the second instance’s performance – even when there are no noisy neighbors.

Packet is 30-40% faster even with a better AWS instance

Even with a non-noisy neighbor machine, Packet is 30-40% faster in processor benchmarks.

EDIT: A user asked me to run the benchmark using a compute-optimized EC2 instance. I decided on c4.2xlarge, which has 8 threads but half as much memory (16gb). It costs $0.419/hour ($0.019/hr more expensive than a Type 1 Packet server). Here are the results (Packet wins again, but by a less drastic margin):

Even using AWS Compute-optimized, Packet Type 1 outperforms it

Price:

On the above setup, Packet is $0.079/hour cheaper.

Conclusion:

There really is no way around it – the above benchmarks show the issues with virtualization. Even setting those issues aside, AWS is slower and more expensive. Packet wins this one again.

Conclusion

Even giving AWS the benefit of the doubt, there is no way around it – Packet is faster and SIGNIFICANTLY cheaper.

Let’s take a very real-world example of our server set-up:

Packet:

Type 1 (Server): 4-core, 8-thread 3.4GHz Intel Xeon E3-1240 v3, 32GB RAM – $0.40/hr ($0.37/hr if reserved for 1 month*), or $292.80/month

Type 0 (Load Balancer): 4-core 2.4GHz Intel Atom C2550, 8GB RAM – $0.05/hr ($0.0459/hr if reserved for 1 month*), or $36.60/month

15k IOPS 1TB HDD: $0.15/GB – $150/month

2TB Outbound Bandwidth: $0.05/GB – $100/month

Total: $579.40/month

*Note: We assume 732 hours in a month; but if you reserve a Packet instance for a month, they will only charge you for 672 hours. However, to make apples-to-apples comparisons, all calcs in Price/Month assume you choose hourly pricing (732 hours for 1 month) to keep things normalized.

AWS:

Instance Name CPU Memory/RAM Price Price/Month
m4.2xlarge (Server) 8 VCPU  (2.4ghz Intel Xeon E5-2676) 32gb $0.479/hr $350.63/month
t2.medium (Load Balancer) 2 VCPU  (Xeon processors burstable to 3.3ghz) 4gb $0.052/hr $38.07/month
15k IOPS 1TB HDD $0.125/GB + $0.065/provisioned IO $1,100/month
2TB Outbound Bandwidth $.09/GB $184.23/month
Total $1,838.84/month

 

Packet is literally less than 1/3rd the price and is more performant than AWS.

It’s allowed us to deploy resources we didn’t think would be affordable before.

Thank you to everyone @ Packet for making this product possible.

Further Steps:

If anybody wants to continue this study, I would love to hear your results. AWS does offer dedicated machines for extra money, but we didn’t bother testing them since Packet is already cheaper than AWS’s virtual machines.

Appendix:

Methodology:

Networking:

Setting up AWS:

We want to make sure we give AWS the best chance. First, we have to make sure enhanced networking is enabled. Running the command:

modinfo ixgbevf

will give us the output. Look for “version” – in our instance we have version 2.11.3-k, and Amazon recommends we upgrade. For the Ubuntu users out there, follow this gist and run the commands:

ixgbevf 2.16.1 upgrade for AWS EC2 SR-IOV “Enhanced Networking” on Ubuntu 14.04 (Trusty) LTS

After rebooting run:

modinfo ixgbevf

Again to make sure the version now reads: 2.16.1

Let’s also check via command line to make sure enhanced networking is supported (Ubuntu 14.04):

sudo apt-get install python-pip
sudo pip install --upgrade pip
sudo pip install awscli
#Note: Create an IAM user and attach the policy: AmazonEC2ReadOnlyAccess
#Use the security credentials in the configure policy
aws configure
#after configuring run (replacing instance_id with your instance_id):
aws ec2 describe-instance-attribute --instance-id instance_id --attribute sriovNetSupport

If you see the following output, you’re good:

"SriovNetSupport": 
{ 
"Value": "simple" 
},

Next, we used iperf3 to run the diagnostic scripts and scrapy bench. iperf3 is a common network benchmarking tool and scrapy is the framework that powers Tiingo’s scraper farm. We figured Scrapy would be another real-time test to see how things flow.

the iperf3 command was:

iperf3 -B internal_ip_of_current_machine -c internal_ip_of_iperf_server -i 1 -t 3600 -V -p 80 -P 10 --logfile test.txt

Meaning we ran the tests for one hour (3600 seconds) with 10 parallel streams. Also note to set the -B option on Packet machines, as it takes advantage of the full bonding algo and increases throughput.

Note: make sure to use the internal IP addresses to give the best benefit of doubt 🙂

Disk:

First install/update sysbench on your Ubuntu machine using the code:

echo "deb http://repo.percona.com/apt trusty main" >> /etc/apt/sources.list.d/percona.list
echo "deb-src http://repo.percona.com/apt trusty main" >> /etc/apt/sources.list.d/percona.list
apt-key adv --keyserver keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
apt-get update
apt-get install sysbench

Then we used the command:

sysbench --test=fileio --file-total-size=150G --file-test-mode=rndwr --max-time=720 --max-requests=0 --num-threads=8 --file-num=64 --file-io-mode=async --file-extra-flags=direct --file-fsync-freq=0 run

The file size must be greater than the RAM size for this test to properly work.

CPU:

See the above “Disk” section to set up sysbench.

We then ran the command below, replacing “num-threads” with 1, 4, and 8 respectively

sysbench --test=cpu --cpu-max-prime=20000 run --num-threads=1

 

The Protagonists Fixing the Problem that Apps Created (Part 2)

This is part 2 of the blog post: Apps Have Recreated the Problem the Web Was Trying to Fix

 

In this post we’re going to discuss the protagonists who are creating tools and frameworks to unify the “App” experience across desktop and mobile. If successful, this will mean we are getting closer to mobile and desktop cross-platform and cross-browser compatibility. Please read part 1 if you are curious as to what this problem has meant for firms and developers.

Google

All UX engineers will tell you that the mobile interface is fundamentally different from a desktop application. After all, we all know what the “three lines” mean, right?

The three-lines we came to know as the “Hamburger Menu”

It is universal code for, “There are more features that will show themselves if you click us. Do it. Click us.”

What Google is therefore doing is creating a design specification that sets a unified standard across both desktop and mobile applications. For a very comprehensive description of this, check out their website available here: Material Design Introduction. It’s a wonderful read on their philosophy and great information for those of you learning UX like myself.

One example of Material Design, for those of us familiar with Google’s Hangouts App, is this menu:

 

Here we can see Google is attempting to unify the experience of the “Hamburger Menu,” by creating both a mobile and desktop interface for it.

But Google isn’t the first to attempt this.

Note: The hamburger menu has its critics, but that is beyond the scope of this blog post.

Twitter (Bootstrap)

Twitter created a framework known as Bootstrap that has become ubiquitous and set a new standard for a unified desktop/mobile experience, otherwise known as “responsive.” It laid the foundation for many of the design frameworks you see today, and almost all responsive web applications still rely on it.

It popularized the “grid layout” and always had the philosophy of “Mobile First.” It even helped set the mobile and web icons  you see today. For a full list of all the features please visit: http://getbootstrap.com/components/

If there was a museum of “web development,” I would argue Bootstrap would have its own exhibit. The impact it’s had is absolutely awe-inspiring, and all of the criticisms people have of it come with an implicit asterisk:

*We are not insulting Bootstrap. It’s amazing. The whole reason we can criticize it is because it set a new standard that got people thinking differently.

Please visit: http://getbootstrap.com/components/ as no matter what images I post, they will not do it justice.

Microsoft

Microsoft has been the platform I have been the most excited about. Close friends of mine have heard my rants on unified web experiences, so it felt like kismet when a senior product individual reached out to me asking to test out their Web App Studio.

I was impressed with the premise: they are allowing individuals to create their own apps while also creating a container process that takes HTML5 web apps and makes them feel like native experiences. While they are not the first (as we will discuss below), they are the major web company actively supporting this process, given the deprecated “Mobile Chrome Apps.”

The premise of the App Studio is twofold (Fed dual mandate, anyone? …sorry):

  1. Allow users to create their own apps in a point-and-click manner
  2. Allow your HTML5 web application to feel like a native app

While this post won’t get into 1, it does help many small businesses who want an app alongside their product.

With respect to 2, I found the app submission process relatively easy, with the majority of my time spent typing out app descriptions, ratings, etc. The actual wrapping of Tiingo took all of about 15 minutes.

Here is a screenshot of Tiingo running as a native desktop application in Windows 10:

Tiingo Running in their Web App Studio Container

For those of you who’ve never published an App before in the Windows Store, use the videos in the middle of the page: Web App Studio. I find it difficult to sit still and watch videos, so I will be posting a graphical walk-through of how to do this.

Having been around web development and seen multiple container processes come and go, this has been the easiest experience to date. So far I have not found the same memory leaks that have plagued the forked Chrome projects with a similar premise in mind.

Also – a thank you to Microsoft for their Edge browser. Seriously – the company that brought you IE6 has launched a new browser that is challenging other browsers in benchmarks (including Google’s benchmarking tests), and recently they open-sourced their javascript engine: https://github.com/Microsoft/ChakraCore. While it has a ways to go, especially with extensions and feature compatibility, initial results are more than promising – they’re exciting. And thankfully, this performant javascript engine is powering their Web App container.

Apache Cordova

The 500 lb gorilla in the room: Apache Cordova

I love what this platform is doing, but I detest that it’s had to exist because the major tech giants couldn’t get together to hammer out a standard (looking at you Apple….from my iPhone).

The goal of this platform is to take an HTML5 web application and wrap it so it can be pushed to the App Stores of Google, Apple, and Microsoft. This has benefits as this means a native feel and interaction with a phone’s hardware and interfaces such as cameras, GPS, and notifications.

The downside, similar to the Java Virtual Machine, is that these programs run in Javascript and the performance is noticeably slower, since native code will always be faster than Javascript (although the gap doesn’t have to be this wide – something Java has closed decently well).

Compatibility Features with Native Applications

 

Conclusion

The open source and web-dev communities are doing wonderful things to address the problem of cross-platform/browser compatibility, but ultimately it is the platforms with app stores that should be pushing forward with a solution. If Apple continues down this road, it will only be a matter of time before development becomes more inconvenient, and if market share shifts, iOS will become the second platform we develop for instead of the first. Even more so, arguably the Safari browser is becoming more difficult to work with. As Microsoft can tell you, that’s a hard reputation to brush off.

Ultimately, projects like Apache Cordova are wonderful, but I hope they go the direction of jQuery, where they are no longer necessary or become components of higher-level frameworks like Angular. The work jQuery did set a new standard, and I hope Cordova goes the same way.

I applaud both Google and Microsoft for tackling this problem head-on with different solutions: support for Cordova, making a unified UX, and explicitly supporting Web App Containers to save developers time.

Well done –

 

Apps Have Recreated the Problem the Web Was Trying to Fix (Part 1)

It’s no question the word “App” has become ubiquitous, and for good reason. For users, it’s allowed us to install entire applications within a click. Just 10 years ago, we had to run into a store like Best Buy, look for an oversized software box, and load the game or software over 4 CDs. For developers, it’s meant we can take our product and unleash it into a massive pipeline where people are already on their phones constantly.

I still have fond memories of replacing CD after CD 6 times to install this game

The problem is we’ve distracted ourselves from a problem the web has been working to solve: cross-platform compatibility. In fact, we’ve made it even worse.

Let me explain.

Before Tiingo and trading, I used to work at the National Institutes of Health (NIH) 10 years ago, where I co-founded an open-source computational chemistry tool. We called this project CHARMMing (Chemistry at HARvard Molecular Mechanics INterface and Graphics – yes, I also had to take a breath to finish the acronym).

I was a junior software engineer at the time but I remember the discussion vividly. We knew we wanted to make a pretty interface to this complex chemistry package. We wanted this UI to be both educational and friendly, but we also wanted to disrupt a commercial vendor that was charging insane rates (and now you know where the ethos of Tiingo started).

The problem was that if students were part of our audience, how could we make this chemistry package accessible when it was only usable via scripts, command lines, and on UNIX based systems? If we wanted non-technical chemists to conduct research, asking them to run scripts and switch to RedHat would be unreasonable.

So how could we make our program accessible on Windows, Macs, and Linux-based systems? We call this concept cross-platform compatibility.

For those of you new to programming, this has always been an issue. You can see it today where some apps are only available for Android and/or iPhone. The reason a solution to cross-platform compatibility has been so coveted in the programming world is that, if solved, it would result in hours of development time saved. It would mean millions of dollars saved, and keep developers from pulling their hair out. It would mean developers would not have to make one app for iOS and another for Android. There would be just one app.

This is why you see certain applications more feature rich in one operating system (or phone) over another. It’s also why you see the wide-spread use of Java – a programming language meant to solve this problem.

The obvious solution to the cross-compatibility of our chemistry software at the time, which many people supported, was indeed Java. At the time though, Java was still riddled with performance issues that really made our process difficult. But the deciding factor was we didn’t enjoy coding in Java and we were just getting into Python. So we could’ve used Java, but something inside of us hesitated. We struggled with this until a chemistry researcher in our lab proposed something different:

“What if we made a web application? I think this will be the future of software development.”

Looking back that foresight was visionary.

Keep in mind this was 2005. We had witnessed the end of the tech bubble, the iPhone hadn’t been invented yet, YouTube had just been founded, and Will Smith was still writing songs.

 

So we  developed the web app, making our interface cross-platform. But there was another problem – we had to make it cross-browser compatible. While cross-platform dealt with different operating systems, cross-browser meant the web application had to work with Internet Explorer, Firefox, Chrome, Safari, Opera, and so on.

For newer web developers: Firefox, Chrome, Safari, and Edge all tend to render pages similarly now, but this wasn’t the case in 2005. For those of you unfamiliar, the web has always been almost the wild west of software libraries. You can see this with the modern day website.

When making a new website, you need to know several different languages. You have HTML to lay out the page, CSS to further advance the layout and style the page, Javascript to write code in the browser, and [usually] a different programming language on the backend to deal with server logic.

Not only that, but originally CSS and Javascript were not even implemented consistently across browsers! So developers had to program special conditions for Internet Explorer, Chrome, Firefox, and Opera. While Chrome and Firefox were far better at following the standards, eventually Internet Explorer (now Edge) caught up.

Even with all of this complexity behind the web and forming standards, it didn’t deter leaders in the web community from coming together and hammering out a standard that was actually followed. It took a while, but now all browsers [mostly] comply.

However…

The “App Revolution” is undoing the work the web community put in to make the internet both cross-platform and cross-browser compatible.

When Apple launched their App Store, they required a programming language known as Objective-C, and later introduced a “friendlier” version known as Swift. Meanwhile, Google with the Android phone promoted Java – yep, the same one mentioned above. At least Google was trying…

So while web standards were hammered out to create unified experiences, Apps started going in the exact opposite direction.

What does this mean for the developers behind your favorite websites?

Let’s say you are a developer and you make a website/webapp (like Tiingo) that is accessible from everywhere around the world, and because of the work of many experts across the web, you know your users are getting the same experience regardless of their browser or operating system when using their desktop or laptop.

But now your users are asking you to make apps for their phones and tablets. What do you do? After all, apps are very friendly and native ways to interact with users on the go.

Now let’s make this even worse 🙂

Some people don’t want to download your app (we’re all guilty of it), and want to google your site on their phones. Now you have to make a mobile version of your site so it looks decent on mobile phones, even if somebody isn’t using an app.

The modern day web company therefore has to do the following to be accessible on the majority of computers, phones, and tablets:

  1. Make a desktop version of the website
  2. Make a mobile version of the website
  3. Make an iPhone app
  4. Make an Android App

In order to make these apps, you have to code an entirely new front-end to communicate with iPhones, then do it again for Android (there are some tools that help with this, but they are not perfect). So now you need to know HTML, CSS, Javascript, a back-end programming language, Objective-C/Swift, AND Android Java for a modern web app. This is why you see newer companies choosing either an app or a web app for their product.

But is there really a reason we have to re-create 3 mobile versions of our products? After all, shouldn’t the mobile version of the site work on all phones the same way, given how much work was done to standardize browsers and webpage rendering?

Exactly.

This abandoning of cross-compatibility on the web has led to multiple programming teams just for mobile, which means higher costs, more overhead, and slower development. It has been a huge cost for companies and developers, but luckily there are protagonists to this story.

In Part 2, we will be discussing the steps Microsoft, Google, and a few other smaller companies have been doing to unify the “App” experience.

 

Matisse and Memory – What Matisse Means for Traders

I was going through old e-mails when I came across a correspondence I wrote to some friends a while ago. Those of you reading this may be wondering, “what does Matisse have to do with trading?”

Let me explain.

Investors and traders are creatives. Experienced market players will acknowledge this, but to those who are just breaking in you may not realize it. After all, finance is fraught with imagery of old men in suits, but if you were to stop any long-time trader in a hedge fund, you would find varied interests and a more holistic view of life.

Why is that?

A trader or investor’s job is to find ideas the rest of the world hasn’t. By that very nature, they have to be creative and you see prominent trading psychologists, like Brett Steenbarger, who help traders with idea generation.

A second important fact of traders is the concept that we must focus on process vs. outcome. This is something I learned from Brett, and you can find a great PDF read on the topic here: Process vs. Outcome in Sports, Business, and Economics

The top traders in the world may only be right 55-60% of the time. If we were to try to keep track of that in our heads, it would probably feel no different than a coin flip. Actually, it would feel far worse, because our psychological biases make us assign a higher probability to events that are stronger in memory. You can see this in the number of people who buy insurance after an outlier event, like a hurricane, that may not happen frequently. Even if the odds of another hurricane of that magnitude are exactly the same, people will assign a higher likelihood of it happening because it is stronger in memory.

A great book, which we will get to in a moment with Matisse, that goes into our psychological biases with data analysis was made public domain by the CIA. The founder of the former fund I used to trade at called it one of the best books he ever read on markets: Psychology of Intelligence Analysis

And the final thing touched upon in the e-mail that relates to markets is: how much information is too much information? Some of you may have heard of Nate Silver, who writes the 538 blog. Another person who has been in this realm for a long time is Philip Tetlock, who specializes in forecasting. His most recent book is out (I haven’t read it yet, but his previous one was fantastic) and is getting rave reviews: Superforecasting: The Art and Science of Prediction

Basically, he has found that experts suck at prediction, and sometimes too much information actually leads to worse predictions. This is something Psych. of Intel. Analysis touches upon as well.

So with that preface, I will copy and paste the E-mail I wrote below. Enjoy 🙂

 

I thought back to our conversation over the weekend about Matisse. I wanted to crystallize what it was about him that I most enjoyed. Here are some thoughts from my notebook:

Matisse was a man who followed the process and didn’t necessarily seem to view the process as a struggle but as a means to achieving the product he wanted. He would attempt to reduce items to their most simple form and reduce complexity and detail to evoke the same emotional arousal. He worked iteratively and hired a photographer to document his process once the technology was there. It becomes apparent that his work started with many details and much complexity, and the later revisions would come back to capture the essence. His first successful attempt at this was in Young Sailor I and II.

Young Sailor I and II

Young Sailor I and II have to be my favorite pieces I have seen of Matisse to date. I think part of it is how he captured the very essence and process of memory storage:
Sensory Image Storage -> Short-term memory -> Long-term memory (recap https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/PsychofIntelNew.pdf on pages 18-19). I haven’t found commentary yet that makes note of this. What is so Matisse about this is that he captures the process of memory storage while trying to actually evoke memory and feeling! He started documenting the process before he knew he was documenting a process about recreating an emotion. It’s so meta and hipster – he belongs in Williamsburg.

Young Sailor I contains all the fine details of an actual image of a young sailor who looks a lil’ bit angsty. In the second iteration he comes back to his mental image of the young sailor… but this time he is a little boyish figure with a big smile and child-like features (Young Sailor II). The difference in brush strokes subconsciously makes clear that Young Sailor II is a mental image, lighter and softer in its features. A common experience is how large the 6th graders looked when we were in 3rd grade; but looking back, I realize how little they were themselves. To any parent, even the meanest 6th grader was an angel.

It was the process of Sailor I and II and the idea of capturing an essence rather than all the fine details that sticks with me about this painting. These ideas also travel over into statistics and psychological biases, where it’s been shown that added information can only help predictive edge to a certain degree before it actually hurts. Sometimes knowing less about a topic lets you make a more correct decision (reading Tetlock right now about this… another source is Psychology of Intelligence Analysis).

And to add on top of all of this… Matisse was embarrassed by Young Sailor II and originally told everyone that someone else did it to avoid embarrassment. Young Matisse was human. Even he didn’t think it would be received well.

Best,

Rishi

 

Launching the Tiingo Open Data Initiative

Since day one, Tiingo has been committed to providing you with top quality data that is more accurate than companies charging $30k+ a year.

How in the world is a company with a “set your own price” model going to pull this off?

Because we’re going to do something unheard of.

Presenting the Tiingo Open Data Initiative

Sound sexy?

No? Good, because clean data should be boring. Except we’re still going to try and make it sexy (we like doing the impossible).

aapl div

 

We’ve all been there, looking at a number on a financial tool and wondering, “is that number actually right?” Then we might go to another website or source and double check. Even if we see two equal numbers, we think, “hmm… ok.”

And our skepticism is well-warranted. Often there will be 1 or 2 main vendors for the same source of data. If one vendor is wrong, then many financial sites are wrong.

So what if a company could show you where, when, and how they got their numbers? This is what our Open Data Initiative is about: transparency.

Now within less than a second, you can verify Tiingo’s numbers straight from the official source: press releases. Either hover your mouse over the orange binoculars or click the “Source” link directly to see where, when, and how we got our data.

Try it here:
AAPL Dividend History

Cool huh?

Dividends are just the start.

BUT WAIT THERE’S MORE!
Since the ethos of Tiingo is to “Actively Do Good,” when we are ready we will open all of this data to the world via an API. Right now when we catch mistakes, we are notifying our data vendor so they can fix the data for all their users. We don’t believe in holding good data hostage.

A quick aside: back-populating dividend data and sourcing it is a data intensive process but we are working around the clock to load in this data historically. However, future dividends are being monitored and added in real-time.

From Tiingo with Love

-Rishi

 
