20131203

DV footprints on Disk and in Memory, Part 1

More than 2 years ago I estimated the footprints of a sample dataset (428999 rows and 135 columns) when it was encapsulated in a text file, in compressed ZIP format, in Excel 2010, PowerPivot 2010, Qlikview 10, Spotfire 3.3 and Tableau 6. Since then everything has been upgraded to the "latest versions" and everything is 64-bit now, including Tableau 8.1, Spotfire 5.5 (and 6), Qlikview 11.2, Excel 2013 and PowerPivot 2013.


I decided to use a new dataset with exactly 1000000 rows (1 million rows) and 15 columns, with the following diversity of values (distinct counts for every column below):


[googleapps domain="docs" dir="spreadsheet/pub" query="hl=en&hl=en&key=0AuP4OpeAlZ3PdGFyUUl6VmdSWWVubk5sbjZ3Z256Znc&single=true&gid=1&output=html&widget=true" width="240" height="360" /]
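For readers who want to profile their own data the same way, here is a minimal sketch of how distinct counts per column could be computed. It assumes the dataset is available as a CSV file with a header row; the function name `distinct_counts` and the tiny sample are made up for illustration and are not taken from the dataset used in this post.

```python
import csv
import io


def distinct_counts(csv_file):
    """Return {column name: number of distinct values} for a CSV with a header row."""
    reader = csv.DictReader(csv_file)
    # One set of seen values per column; set size is the distinct count.
    seen = {name: set() for name in (reader.fieldnames or [])}
    for row in reader:
        for name in seen:
            seen[name].add(row[name])
    return {name: len(values) for name, values in seen.items()}


# Tiny made-up sample, only to show the shape of the result.
sample = io.StringIO(
    "region,year,amount\n"
    "East,2012,10\n"
    "West,2012,20\n"
    "East,2013,10\n"
)
counts = distinct_counts(sample)
print(counts)
```

For a real file, an open file handle (for example `open("data.csv", newline="")`, a hypothetical path) can be passed in place of the in-memory sample. Keeping one set per column holds every distinct value in memory, which is fine for 15 columns but worth keeping in mind for very wide or high-cardinality data.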

Then I put this dataset into every application and format mentioned above, both on disk and in memory. All results are presented below for review by DV blog visitors:


[googleapps domain="docs" dir="spreadsheet/pub" query="hl=en&hl=en&key=0AuP4OpeAlZ3PdGFyUUl6VmdSWWVubk5sbjZ3Z256Znc&single=true&gid=0&output=html&widget=true" width="420" height="260" /]
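The on-disk footprint is simply the size of the application file. For the formats that this post describes as ZIP archives (XLSX, DXP, TWBX), it can also be interesting to look inside and compare compressed and uncompressed sizes. The sketch below is one possible way to do that with Python's standard `zipfile` module; the helper name `zip_footprint` and the member names in the stand-in archive are hypothetical and do not describe the real layout of any of these file formats.

```python
import io
import zipfile


def zip_footprint(zip_source):
    """Return (compressed_bytes, uncompressed_bytes, member_names) for a ZIP archive."""
    with zipfile.ZipFile(zip_source) as archive:
        infos = archive.infolist()
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
        names = [info.filename for info in infos]
    return compressed, uncompressed, names


# Build a small stand-in archive in memory; member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("workbook.twb", "<workbook/>" * 1000)
    archive.writestr("Data/extract.tde", b"\x00" * 50000)
buffer.seek(0)

compressed, uncompressed, names = zip_footprint(buffer)
print(names, compressed, uncompressed)
```

The same function accepts a path to a file on disk instead of the in-memory buffer. Note that the sum of compressed member sizes is slightly smaller than the file size reported by the operating system, because the archive also stores headers and a central directory.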

Some comments about application specifics:




  • Excel and PowerPivot XLSX files are ZIP-compressed archives of a bunch of XML files
  • Spotfire DXP is a ZIP archive of the proprietary Spotfire text format
  • QVW is Qlikview's proprietary Datastore-RAM-optimized format
  • TWBX is a Tableau-specific ZIP archive containing its TDE (Tableau Data Extract) and a TWB (XML format) data-less workbook
  • I calculated the footprint in memory as the RAM difference between the freshly loaded application (without data) and the same application after it has loaded the appropriate application file (XLSX, DXP, QVW or TWBX)
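The in-memory calculation described in the last bullet is a simple subtraction of two readings. A minimal sketch follows, assuming both readings are taken in bytes for the same process; the function name `memory_footprint_mb` and the numbers are hypothetical and are not measurements from this post.

```python
def memory_footprint_mb(baseline_bytes, loaded_bytes):
    """RAM attributed to the data: loaded reading minus empty-application reading."""
    if loaded_bytes < baseline_bytes:
        raise ValueError("loaded reading is smaller than the baseline reading")
    return (loaded_bytes - baseline_bytes) / (1024 * 1024)


# Hypothetical readings, not measurements from the post.
baseline = 150 * 1024 * 1024   # application freshly started, no data loaded
loaded = 650 * 1024 * 1024     # same application after opening the file
footprint = memory_footprint_mb(baseline, loaded)
print(f"{footprint:.1f} MB")
```

How the two readings are obtained is left open here: they could be read manually from the operating system's process monitor or collected by a script. Whichever tool is used, both readings should come from the same counter (for example, both working set or both private bytes), otherwise the difference is not meaningful.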



7 comments:

  1. Thanks, interesting stuff. Really surprised about the difference in footprint between Spotfire and Tableau. So does this essentially mean you can load 3-4x as much data into memory with Tableau as you could with Spotfire, given a machine with the same amount of RAM?

  2. Nice comparison. But what is the final conclusion from this comparison? It seems the Tableau footprint is much smaller. So is Tableau better than the other 2 DV giants?

  3. Hello Andrei, can you possibly share your new sample dataset? It would be great if it were available, since it would make your comparative test even more valuable. With the data, people can compare additional products with your results as a benchmark.

  4. Andrei Pandre, 4/12/13 03:19

    Hi Steve: the comparison above does not suggest that Tableau can work with more data in memory. Many other factors are involved, e.g. the use of disk space as virtual memory. Both Spotfire and Tableau use virtual memory when RAM is not available, and as the size of the dataset grows it will certainly affect the RAM footprint.
    Qlikview did not use virtual memory; until v.11.2 Qlikview required all data to be loaded into RAM, but v.11.2 introduced Direct Discovery, which allows connecting to disk-located data. Using Direct Discovery can actually slow down Qlikview.
    In any case, the modern proliferation of SSDs can improve the speed of virtual memory and the speed of other DV interactions with disks.

  5. Andrei Pandre, 4/12/13 03:25

    Hello Justin: I used a proprietary dataset to save time, and I cannot share proprietary data. Sorry!

  6. Hey Justin, that's a good one, I will try that over the weekend.

    Hi Andrei, can you include SiSense, using your data? I know this is a random request, but it would be really helpful to see what results we get with SiSense, as they have been ranked #1 for analytics on a laptop.

    See the link below

    http://finance.yahoo.com/news/sisense-announces-worlds-smallest-big-130000596.html

  7. […] previous blogpost, comparing footprints of DV Leaders (Tableau 8.1, Qlikview 11.2, Spotfire 6) on disk (in terms of size of application file with […]
