That's exactly where I first heard it. But they stole it from the Three Stooges. Steal from the best, I always say. And no matter what, don't drive like my brother.
That's Data's tag line.
To be fair to Terry McIntosh, Ares Digital did integrate all that data into one system. He's talking about the source spreadsheets, exported from the crowdfunding platforms. Now that he's gone, all the updated and integrated data went with him, apparently, meaning Axanar's now starting from scratch.
So if he destroyed a couple of days' work rejoining the 8 or so source lists, that's no cause for saying that starting over is such a big deal. Just some manual labor lost.
![]()
I will now give you a crash course in the data geekery that is database joins. I worked in data for a good 14 years, for insurance and financial services firms. Millions of rows of data, no lie.
Imagine an insurance database just for auto accidents (there were other kinds of claims, but I'm trying to simplify here). Let us assume there are only 3 people we care about: Andy, Beth, and (heh) Carlos.
Andy and Carlos are drivers but only Andy is insured by the company. Beth is Andy's passenger. Collision ensues. Beth is injured. Both cars are damaged.
There are tables of insureds and of claimants. It's basic information: names and addresses, dates of birth. The kind of stuff you would find in your address book. Another table is just for policies. There are two rows for Andy: one is his collision coverage; the other is personal liability. In some states that's called PIP (personal injury protection). What the minimum is depends on what state he's in. The guy could also have homeowner's or umbrella coverage but I am going simple here. Then there are the claims, three rows of them:
- One data row is for Andy's property damage claim on his vehicle.
- One data row is for Beth's personal injury claim.
- One data row is for Carlos's property damage claim.
Join the insured table to the policies table and you get Andy's complete info (that's the join shown in the middle, above). The row would look like this:
But if you want to see if Beth and/or Carlos are insureds, you do an uneven (outer) join. That is the one in the lower right corner. It will show you which rows, between the two tables, do not intersect. Hence the rows may look like:
- Andy, whatever his address is, and then his policy # in the policy field.
- Beth, whatever her address is, and then nothing in the policy field.
- Carlos, whatever his address is, and then nothing in the policy field.
Yeah, they're that easy. It might take a junior data person a little while to get up to snuff but they have their vaunted 12 - 14k donors. Why not ask one of them for help?
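For anyone who wants to see the Andy/Beth/Carlos example actually run, here is a minimal sketch using Python's built-in `sqlite3`. The table names, column names, addresses, and policy numbers are all made up for illustration, and I've lumped insureds and claimants into one `people` table to keep it short.

```python
import sqlite3

# Hypothetical schema and data, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (name TEXT PRIMARY KEY, address TEXT);
    CREATE TABLE policies (policy_no TEXT PRIMARY KEY, holder TEXT, coverage TEXT);
    INSERT INTO people VALUES
        ('Andy', '1 Elm St'), ('Beth', '2 Oak Ave'), ('Carlos', '3 Pine Rd');
    INSERT INTO policies VALUES
        ('P-100', 'Andy', 'collision'),
        ('P-101', 'Andy', 'liability');
""")

# Inner join: only people who have a matching policy row (just Andy).
inner = conn.execute("""
    SELECT p.name, pol.policy_no
    FROM people AS p
    JOIN policies AS pol ON pol.holder = p.name
    ORDER BY p.name, pol.policy_no
""").fetchall()

# Left outer join: everyone, with None in the policy field when nothing matches.
outer = conn.execute("""
    SELECT p.name, pol.policy_no
    FROM people AS p
    LEFT JOIN policies AS pol ON pol.holder = p.name
    ORDER BY p.name, pol.policy_no
""").fetchall()

for row in outer:
    print(row)
```

Andy comes back once per policy; Beth and Carlos come back with nothing in the policy field, which is the whole point of the outer join.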
Or, to put it in Axanar terms: once you transcribe the messy free-text emails containing miscellaneous info into a spreadsheet, and you have a bunch of spreadsheets altogether with pledge/address data from various sources, you load them into database tables (trivial), and the "JOIN" geekery is "match up the entries in each table that are for the same person". That's some really deep stuff you're smokin' there: it's the "AND" example above, the simplest one. You don't even need each table to have exactly the same address or other data, just similarly parsed key fields you want to match on.
Matching on lastname, firstname, and postal code across data from the differing platforms gives you maybe 80% of the data rejoined, pulled out per person on the first pass. Match those by transaction IDs (purchase order IDs, donation IDs, whatever) to the transaction data records on each system (there could be more than one KS donation for a person, for example). Then look at partial matches where perhaps the first name is different but all the other key fields match, make manual judgments, and maybe mark tentative joins for manual review; that clears out another 10%. Use a few hypotheses, make some more proposed joined data, and end with a residue. Browse the joined data to see if there are any join artifacts (weird mismatching) and pull them aside for special processing.
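A rough sketch of that first pass and the partial-match pass, in plain Python. The field names (`first`, `last`, `postal`, `txn_id`), the platform labels, and the sample records are assumptions for illustration, not anything from the real exports.

```python
from collections import defaultdict

def match_key(record):
    """Normalise the fields we match on: last name, first name, postal code."""
    return (
        record["last"].strip().lower(),
        record["first"].strip().lower(),
        record["postal"].replace(" ", "").upper(),
    )

def join_sources(sources):
    """Group (platform, transaction ID) pairs by normalised match key."""
    joined = defaultdict(list)
    for platform, records in sources.items():
        for rec in records:
            joined[match_key(rec)].append((platform, rec["txn_id"]))
    return joined

def tentative_matches(joined):
    """Keys that agree on last name and postal code but differ on first name.

    These are only candidates; a human should review them.
    """
    by_partial = defaultdict(list)
    for key in joined:
        by_partial[(key[0], key[2])].append(key)
    return [sorted(keys) for keys in by_partial.values() if len(keys) > 1]

# Made-up sample data standing in for two platform exports.
sources = {
    "kickstarter": [
        {"first": "Andy", "last": "Smith", "postal": "02134", "txn_id": "KS-1"},
        {"first": "Beth", "last": "Jones", "postal": "K1A 0B1", "txn_id": "KS-2"},
    ],
    "indiegogo": [
        {"first": "andy ", "last": "SMITH", "postal": "02134", "txn_id": "IGG-7"},
        {"first": "Elizabeth", "last": "Jones", "postal": "k1a0b1", "txn_id": "IGG-8"},
    ],
}

joined = join_sources(sources)
review = tentative_matches(joined)
```

Here Andy's two donations land under one key on the first pass, while Beth/Elizabeth Jones at the same postal code gets flagged for a manual look rather than being merged automatically.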
Email the definite ones with their combined data, ask "we'd like to verify this". If you don't want to email the details, set up a survey in ks/igg that the email links back to, question 'this your whole donation history across platforms?' answer 'write any comments or click yes'. Kickstarter/igg emailings can be made to the ks/igg email addresses, very likely to be correct. ACT on all confirmed results. FIX and ACT on all corrections sent in.
Make successive passes pattern matching and cleansing special cases.
The magic of AND JOINs, data geekery for 'find the same people in each spreadsheet' !!!
And he is in the wrong. The data was not his. When he left the project, he had a moral, ethical, and probably legal responsibility to leave the database behind.
His potential solution was simple. Export his joined data in a simple format and give it to Axanar. Then destroy his local copies. He didn't have to face a dilemma of giving them his software, if he felt that the data was somehow bound to his system. Just pull the data in a way that doesn't reveal the super secret details of his design for what a million people have done before, and give Axanar the data.
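To show how little that takes, here is a minimal sketch of dumping joined rows to plain CSV with Python's standard `csv` module. The column names and the `internal_id` field are hypothetical; I have no idea what his actual system looked like.

```python
import csv
import io

def export_joined(rows, out):
    """Write joined donor rows as plain CSV, keeping only the listed columns."""
    fields = ["last", "first", "postal", "platform", "txn_id"]
    # extrasaction="ignore" drops any keys not in `fields`, so internal
    # bookkeeping columns never reach the export.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Made-up row; `internal_id` stands in for anything system-specific.
rows = [
    {"last": "Smith", "first": "Andy", "postal": "02134",
     "platform": "kickstarter", "txn_id": "KS-1", "internal_id": 42},
]

buf = io.StringIO()
export_joined(rows, buf)
csv_text = buf.getvalue()
```

The data goes out; nothing about the schema or the software goes with it.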
As for legal obligations, I should think there very well may be some, if any transactions were recorded in his system (financial, data update (address etc.)), or otherwise, and he didn't give that data to Axanar, still keyed with purchase order numbers and whatever else kept the data intact in the system that performed the transaction. Wasn't he collecting donations through this software? I don't recall for sure.