No answers, only opinions

Oh the Audacity

Open source software is developed within a community. Not everyone in the community needs to know how to write software to contribute. Some of the most important contributions come in the form of user feedback.

Truth be told, most software developers have no idea how much pain they've caused with their software until someone tells them. One way to find out is to add telemetry to the software. In closed-source applications, an end user has zero insight into what that telemetry process looks like.

Audacity is open-source audio editing software. There's currently a pull request drafted to add telemetry from Google and Yandex. It has sparked a lot of discussion, and at times the conversation has become heated.

I see quite a few of my friends chiming in on the thread in defense of user privacy, so I'll capture my thoughts here.


I know my blog posts have taken a dark turn lately, where I invoke the topic of genocide. Let me tell you, I don't take that lightly, and I'm not trying to be alarmist or edgy. It's a tough conversation I feel we need to have.

I live in America. Things are relatively stable for me here; I don't live in fear for my safety or that of my loved ones. During last year's election cycle, tech bro peers would talk about which country they would move to if Trump were elected to a second term.

I think a lot about the people who can't afford to just up and move when the political landscape is unsatisfying.

Right now, my thoughts go out to the people of Myanmar/Burma. The military is disappearing young men to crush the uprising. For context, the military seized power from the democratically elected government, so the uprising is made up of people who want their democracy back.

This is the ultimate worst-case scenario I think of whenever I'm writing software. I ask myself, “If a hostile force gains access to and control of this, what is the worst thing they can do with it?”

The worst thing, to me, is using that data to figure out who to disappear. The easiest way to do that is through telemetry, which can also provide an accurate when and where for making a disappearance happen.

In Audacity's case, the telemetry being added reports when the software is in use and what the user is doing within it. This is the sales pitch on the side of the Google Analytics box. In conjunction with other data Google has access to, it's possible to deduce the who and the where.

To quote many in the discussion thread, UUIDs are pseudonymous at best. The truth is, we don't know which military forces Google collaborates with. We do know IBM was willing to work with Hitler because the money was legal tender.

They are a business too, right?


What is a UUID?

A Universally Unique Identifier. This is an example of one: 5e4f2045-520a-44c1-835d-1ddb827c31be

Using them is an extremely common practice in software development. I can create one and know with statistical certainty that there is not another one like it.
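For a sense of how trivial this is, here's a minimal sketch in Node.js using the built-in `crypto.randomUUID()` (available since Node.js 14.17). It produces a random, version 4 UUID like the one above. The regular expression is a simplified shape check written for this example, not a full validator.

```javascript
const { randomUUID } = require('crypto');

// Generate a random (version 4) UUID, shaped like '5e4f2045-520a-44c1-835d-1ddb827c31be'
const id = randomUUID();

// Simplified shape check: 8-4-4-4-12 lowercase hex digits, the version
// digit fixed to '4', and the variant digit one of 8, 9, a, b
const uuidV4Pattern =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/;

console.log(id, uuidV4Pattern.test(id));

// Two fresh UUIDs compared: in practice this is never true
console.log(randomUUID() === randomUUID());
```

The example UUID in this post passes the same check, so it is plausibly a randomly generated version 4 UUID too.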

Only after generating 1 billion UUIDs every second for approximately 100 years would the probability of creating a single duplicate reach 50%.
Wikipedia
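That figure is an instance of the birthday problem. Here's a back-of-the-envelope check, assuming random (version 4) UUIDs, which have 122 random bits, and the common approximation n ≈ √(2 · ln 2 · N) for how many draws from N equally likely values it takes to reach a 50% chance of at least one duplicate:

```javascript
// Version 4 UUIDs fix 6 of their 128 bits (version and variant),
// leaving 122 random bits.
const possibleValues = 2 ** 122;

// Birthday-problem approximation: number of random draws after which
// the chance of at least one duplicate reaches 50%.
const drawsForHalfChance = Math.sqrt(2 * Math.LN2 * possibleValues);

// At one billion UUIDs per second, how long does that take?
const perSecond = 1e9;
const secondsPerYear = 365.25 * 24 * 60 * 60;
const years = drawsForHalfChance / perSecond / secondsPerYear;

console.log(`${drawsForHalfChance.toExponential(2)} UUIDs, about ${Math.round(years)} years`);
```

This works out to roughly 86 years, in the same ballpark as the quote; the exact number shifts a little depending on the approximation and the length of year you plug in.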

Since generating UUIDs is so trivial, literally everything in the surveillance (or advertising) world is tagged with one.

  1. Refrigerator opened: UUID.
  2. Face unlock successful: UUID.
  3. GPS coordinates entered: UUID.
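To make that list a bit more concrete, here's a sketch of what such an event log might look like. The event shape (`id`, `deviceId`, `type`, `at`), the `recordEvent` helper, and the coordinates are all hypothetical, made up for illustration; real telemetry schemas vary. Each event gets its own UUID, but every event also carries the same device UUID, and that shared identifier is what lets the events be tied together later.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical: a pseudonymous ID generated once per device and reused for every event
const deviceId = randomUUID();
const eventLog = [];

// Hypothetical helper: tag each discrete event with its own UUID and a timestamp
function recordEvent(type, details = {}) {
  const event = { id: randomUUID(), deviceId, type, at: new Date().toISOString(), ...details };
  eventLog.push(event);
  return event;
}

recordEvent('refrigerator.opened');
recordEvent('face_unlock.succeeded');
recordEvent('gps.coordinates_entered', { lat: 47.6, lng: -122.3 }); // made-up coordinates

console.log(eventLog);
```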

No single discrete event is particularly valuable; the value comes from having a lot of data. In the example above, coincidence analysis can be used to infer where someone is now, who they are, where they are going, and when they might arrive.
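Here's a toy sketch of that kind of analysis, with made-up events and short placeholder device IDs standing in for UUIDs (all field names are hypothetical). Grouping by the shared identifier and sorting by time turns isolated events into a timeline, and the most recent event with coordinates answers where the device was last seen.

```javascript
// Made-up events from two devices. Short IDs like 'device-1' stand in for
// UUIDs to keep the example readable; field names are hypothetical.
const events = [
  { deviceId: 'device-1', type: 'gps.coordinates_entered', at: '2021-05-08T08:05:00Z', lat: 47.6, lng: -122.3 },
  { deviceId: 'device-2', type: 'refrigerator.opened', at: '2021-05-08T07:10:00Z' },
  { deviceId: 'device-1', type: 'refrigerator.opened', at: '2021-05-08T07:00:00Z' },
  { deviceId: 'device-1', type: 'face_unlock.succeeded', at: '2021-05-08T08:00:00Z' },
];

// Group events by the shared identifier, then sort each group chronologically
function timelinesByDevice(list) {
  const groups = new Map();
  for (const event of list) {
    if (!groups.has(event.deviceId)) groups.set(event.deviceId, []);
    groups.get(event.deviceId).push(event);
  }
  for (const timeline of groups.values()) {
    timeline.sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
  }
  return groups;
}

const timelines = timelinesByDevice(events);
const device1 = timelines.get('device-1');

// One device's morning, in order
console.log(device1.map((event) => `${event.at} ${event.type}`));

// Last known position: the most recent event that carries coordinates
const lastFix = [...device1].reverse().find((event) => 'lat' in event);
console.log(lastFix.lat, lastFix.lng, lastFix.at);
```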

And that's just three casual data points. Because UUIDs are effectively unique, they can also serve as a permanent reference to a record.

  1. Imagine the UUID we generated above represents this blog post.
  2. Let's take everything that's ever been tagged by a UUID and put each one on an index card.
  3. Let's put all those index cards into a really big room.
  4. Let's make a robot that can find a specific index card in the room based on a given UUID.
  5. Let's ask the robot to find: 5e4f2045-520a-44c1-835d-1ddb827c31be

In pseudo-code, one approach looks like:

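// A tiny, made-up stand-in for the really big room of index cards
const allIndexCards = [
  { uuid: '5e4f2045-520a-44c1-835d-1ddb827c31be', published: 'May 8th, 2021', author: '9f4747e1-fdb2-4bfa-9d67-af9c641d6757', postBody: '...' },
];

const blogPostId = '5e4f2045-520a-44c1-835d-1ddb827c31be';

const everything = [...allIndexCards];

// The robot: check each card in turn until one has a matching UUID
const blogPost = everything.find(({ uuid }) => uuid === blogPostId);

console.log(blogPost);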

blogPost: {
  uuid: "5e4f2045-520a-44c1-835d-1ddb827c31be",
  published: "May 8th, 2021",
  author: "9f4747e1-fdb2-4bfa-9d67-af9c641d6757",
  postBody: "..."
}

The author field holds another UUID, which can then be used to search the room of index cards again.

author: {
  name: "Bob Dobalina",
  address: "1200 12th Avenue South, Suite 1200",
  city: "Seattle",
  state: "WA",
  zip: "98144",
  country: "USA",
  email: "bobby@example.com",
  phone: "+1 (657) 867-5309",
  birth: "1970-01-01",
  politicalAffiliation: "d186694e-77bf-4d4c-98f2-c9c846ee94c4",
  religiousAffiliation: "1fe14a50-298f-4a9a-8ce4-b51b954afa57",
  employer: "cf74f426-7b62-42fe-afbb-c46521a42059",
  interests: [
    "9fe9d552-07ce-4317-a1f1-5f45960b4302",
    "69a6083b-6e9f-4f24-8b1a-cbab2c16fdcc",
    "d1cd3da1-5f7e-4274-b5b6-5b7ca7cc071a"
  ]
}

Each of those UUIDs can then be looked up in turn. This type of data structure is a graph: every item can be associated with another item by a UUID.
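Following those references one hop at a time is a graph traversal. As a rough sketch, assuming the room of index cards is indexed in a `Map` keyed by UUID (an assumption for this example; the cards below are a trimmed-down, made-up version of the ones above), a breadth-first search collects everything reachable from a single starting UUID:

```javascript
// Trimmed-down, made-up index cards keyed by UUID. Any string field whose
// value is another card's UUID is treated as an edge in the graph.
const cards = new Map([
  ['5e4f2045-520a-44c1-835d-1ddb827c31be', { kind: 'blogPost', author: '9f4747e1-fdb2-4bfa-9d67-af9c641d6757' }],
  ['9f4747e1-fdb2-4bfa-9d67-af9c641d6757', {
    kind: 'person',
    name: 'Bob Dobalina',
    employer: 'cf74f426-7b62-42fe-afbb-c46521a42059',
    interests: ['9fe9d552-07ce-4317-a1f1-5f45960b4302'],
  }],
  ['cf74f426-7b62-42fe-afbb-c46521a42059', { kind: 'employer' }],
  ['9fe9d552-07ce-4317-a1f1-5f45960b4302', { kind: 'interest' }],
]);

// UUIDs a card points to: string fields (or strings inside arrays) that are card keys
function references(card) {
  return Object.values(card)
    .flat()
    .filter((value) => typeof value === 'string' && cards.has(value));
}

// Breadth-first search: visit every card reachable from startId, once each
function reachableFrom(startId) {
  const seen = new Set([startId]);
  const queue = [startId];
  while (queue.length > 0) {
    const id = queue.shift();
    for (const next of references(cards.get(id) ?? {})) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  // Report what kind of card each reachable UUID points to, in visit order
  return [...seen].map((id) => cards.get(id)?.kind ?? 'unknown');
}

console.log(reachableFrom('5e4f2045-520a-44c1-835d-1ddb827c31be'));
```

Starting from nothing but the blog post's UUID, the search reaches the person, their employer, and their interests. Each extra card in the room only adds more places the traversal can go.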

In theory, the only limit on building a mechanism like this is the cost of running it. In practice, social structures are all we've got to protect people from unnecessary telemetry.

In the case of Audacity, that's the comment section on that pull request on GitHub.

For the rest of the world, we've only got each other. Let's not build spyware.