Blog: May 2012


There are 6 blog entries from May 2012

Simple Podcast Manager v1.1

May 30th, 2012

The Simple Podcast Manager is a podcast manager designed to work specifically with iTunes. It has no bells and whistles, but it allows one to quickly and easily put a new podcast online.

I actually built this project as a favor for Bill Burr a few years back. He had just started his now-famous Monday Morning Podcast and needed a way to manage his episodes. His audience eventually grew beyond what I could keep up with, so I suggested switching to a commercial solution. When we eventually stopped using the Podcast Manager in favor of something else, I decided I'd go ahead and release it for public consumption. It comes with no guarantees or warranties, but it works well.

I've released it under the GPL license, so please do what you will with it.

Download from Freecode at:
https://freecode.com/projects/simple-podcast-manager

Download directly from:

Tar/GZ: SimplePodcast.v1.1.tar.gz (32,530 bytes)

Bayesian Flooding and Facebook Manipulation

May 22nd, 2012

For the past few months I've been conducting an open online experiment. It's really an extension of a seven-year project I've been working on, but this newer portion is exclusively focused on Facebook. The purpose of the experiment is to explore how much control a typical Facebook user has over his or her personal information online: not what is externally visible, but what is internally being analyzed. In other words, is it possible to manipulate the flow of information being collected for advertising and marketing purposes?

The idea rests on three basic realities:

The advancement of online technology and communication is not going to slow down, much less stop. Social networking plays a vital role in this.
In order to participate in society, one cannot simply hide from technology.
There is a lucrative market for acquiring and selling private, personal information (demographics, lifestyle choices, spending habits, interests, etc.).

Over the past few years I've had dozens of people ask me if I thought they could remove themselves from the watchful eyes of Facebook. My answer is always no. The problem is that once the information has been collected, it will always be stored and associated with you. I have therefore devised a slightly different method for dealing with this problem. Rather than trying to hide information from Facebook, it may be possible simply to overwhelm it with too much information. While this may sound counter-intuitive, there is a well-known mathematical theorem that may in fact validate the idea.

Imagine if you were asked to look inside of a friend's pantry or cupboard for 30 seconds, and then to make a guess about their general diet. For most people this would be a pretty simple exercise and the results would probably be very accurate. But now imagine that when you peered inside, their pantry was magically the size of an entire grocery store and contained just as many products. Other than the fact that they weren't starving, what could you really report about this friend's diet? The amount of variation available would make the analysis very difficult. This is the basic idea.
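The pantry analogy can be made a little more concrete with Shannon entropy, a standard measure of how spread out a distribution is. The sketch below is only an illustration: the function name `entropy_bits` and both sample pantries are invented here, and nothing suggests Facebook measures profiles this way. It simply shows that a small, lopsided pantry carries far less uncertainty than one stocked like a grocery store.

```python
import math
from collections import Counter


def entropy_bits(items):
    """Shannon entropy (in bits) of the category distribution of `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical data: a typical pantry versus one holding one of everything.
small_pantry = ["pasta"] * 8 + ["rice"] * 2
grocery_store = [f"product-{i}" for i in range(500)]

print(entropy_bits(small_pantry))   # low: the diet is easy to guess
print(entropy_bits(grocery_store))  # high: almost nothing can be inferred
```

The higher the entropy, the less any single observation says about what the owner actually eats, which is the effect the flooding idea is after.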

Target and Teenage Pregnancy

Back in February of this year, Forbes published an article detailing how Target determined that a teenage girl was pregnant before her father did. The girl had received coupons in the mail for a host of maternity items. This prompted her father to call Target's customer service department and complain that it was inappropriate to send a teenage girl such material. Target was quick to apologize, but a few days later the father wound up apologizing to Target, noting that his daughter was indeed pregnant. The question is, how could Target possibly have known this?

Like most big-box stores, Target tracks everything that their customers purchase. According to the article, customers are assigned a "guest ID number, tied to their credit card, name, or email address that becomes a bucket that stores a history of everything they've bought and any demographic information Target has collected from them or bought from other sources". So not only does Target keep a history of what customers buy, but they purchase demographic information about these customers from other companies as well. Perhaps companies like Facebook.

They are then able to have programmers and statisticians analyze giant chunks of data and assemble patterns of consumer spending habits. In the case of Target, the statisticians noticed that women tend to stock up on health supplements in their first trimester, and unscented lotions in their second trimester. As it turns out, they claim to have about 25 products that they use to calculate a pregnancy score - the likelihood that a customer is pregnant. The article explains that when just four of those products are purchased by the same customer, the likelihood of them being pregnant is a whopping 87%.
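As a rough illustration of what a score like this could look like, here is a minimal sketch. Everything in it is an assumption: the weights, the baseline, the placeholder items, and the choice of a logistic function are invented for the example, and it is in no way Target's actual model. Only the first two product names come from the description above.

```python
import math

# Hypothetical indicator products and weights, invented for illustration only.
INDICATOR_WEIGHTS = {
    "health supplements": 0.9,
    "unscented lotion": 1.2,
    "indicator item C": 0.6,
    "indicator item D": 0.5,
}
BASELINE = -3.0  # assumed log-odds before any evidence is seen


def pregnancy_score(purchases):
    """Return a 0-1 score from the indicator products found in a purchase history."""
    seen = {item for item in purchases if item in INDICATOR_WEIGHTS}
    log_odds = BASELINE + sum(INDICATOR_WEIGHTS[item] for item in seen)
    return 1.0 / (1.0 + math.exp(-log_odds))


history = ["bread", "unscented lotion", "health supplements"]
print(pregnancy_score(history))
```

Each indicator product nudges the score upward, while unrelated purchases such as "bread" are ignored. A real model would fit its weights to purchase data rather than hard-code them.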

It might sound a bit like witchcraft, but it's the type of powerful statistical analysis that can be done with such large data pools. Imagine the selling power a company like Facebook has when it comes to providing supplemental demographic information to a company like Target. Whether or not they do sell such information is a matter of their internal business practices, but the data they are regularly collecting is priceless. This provides them with a continued incentive to collect and store personal data about individuals.

Email Extrapolation

In 2004 I was working for an email marketing company. Naturally we were eager to analyze potential customers any way that we could, but often had little more to work with than their email address (incidentally these were typically acquired via some pseudo-scandalous online promotion). I began working independently on an interpretive email address analyzer and, after leaving the company, had mixed success in marketing the product to other businesses. The relevance of this experience is how much can be ascertained just from a single email address, never mind the hoards of voluntary information associated with a Facebook profile.

The basic process would run an email address through a series of filters I had devised. Each filter would output how the email address scored against that particular filter. For example, say we wanted to analyze the gender of a user. If the email address reads susan@example.com, it's pretty likely it belongs to a female for the obvious reason that "Susan" is a woman's name. The program would cross-reference a database of names and assign a probability score for the user being female. But consider a less obvious address like steelerj67@example.com. A different gender filter would extract the word "steeler", cross-reference a list of sports teams, and associate it with the Pittsburgh Steelers. It would then score the user as a male, albeit with much less certainty than the previous example. The logic being that females tend not to associate their email addresses with sports teams. There could be dozens of filters applied just to determine the likely gender of the person. And of course other filters would perform completely different tasks. In this case one would also extract the "67" and interpret it as a birth year.

Without performing this kind of analysis we just have a customer email address. But with this analysis we potentially have a 45-year-old male living in Pennsylvania who likes football. That's a lot of data to extract from a single address, even if it is speculative.

This type of analysis is certainly not always accurate, but it is statistically very relevant, and can greatly affect the usefulness of large datasets. More importantly, when we combine the output of these filters with other collected demographic information, the results tend to improve very significantly. When these methodologies are applied to tens of millions of customer records, the aggregate increase in revenue can be tremendous. Advertising to a 13-year-old female versus a 62-year-old man requires a pretty substantial change in advertising content. Simply knowing this is invaluable to a company.
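The filter pipeline described above can be sketched in a few lines. This is not the original analyzer. The filter names, the tiny lookup tables, and every confidence value are made up for illustration, and the year 2012 is assumed as "now" so the age works out as in the example.

```python
import re

# Stand-in lookup tables (assumed for this sketch); a real system would use large databases.
FEMALE_NAMES = {"susan"}
TEAM_KEYWORDS = {"steeler": "Pennsylvania"}  # team keyword -> home state


def name_filter(local):
    """Guess gender from a known first name at the start of the address."""
    if any(local.startswith(name) for name in FEMALE_NAMES):
        return {"gender": ("female", 0.9)}
    return {}


def team_filter(local):
    """Guess gender, interest, and location from a sports-team keyword."""
    for keyword, state in TEAM_KEYWORDS.items():
        if keyword in local:
            return {"gender": ("male", 0.6), "interest": ("football", 0.7), "state": (state, 0.5)}
    return {}


def birth_year_filter(local, current_year=2012):
    """Read two trailing digits as a 19xx birth year and convert to an age."""
    match = re.search(r"(\d{2})$", local)
    if match is None:
        return {}
    return {"age": (current_year - (1900 + int(match.group(1))), 0.4)}


FILTERS = [name_filter, team_filter, birth_year_filter]


def analyze(email):
    """Run every filter and keep the highest-confidence guess per attribute."""
    local = email.split("@")[0].lower()
    profile = {}
    for run_filter in FILTERS:
        for attribute, (value, confidence) in run_filter(local).items():
            if attribute not in profile or confidence > profile[attribute][1]:
                profile[attribute] = (value, confidence)
    return profile


print(analyze("steelerj67@example.com"))
print(analyze("susan@example.com"))
```

Each filter returns guesses paired with a confidence, so a weak signal (a team name) never overrides a strong one (a first name). With dozens of filters the merge step stays the same; only the list grows.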

If I was able to successfully extrapolate so much information just from an email address, think of how accurate Facebook can be with all of the information users provide them.

Bayesian Flooding

A look at Facebook's Advertising system.

Facebook no doubt uses methods similar to the two described above, but they also enjoy the luxury of collecting large amounts of personal data directly from users - this is exclusively what their service does. In turn, they're able to use the information collected to advertise to their users. If you've ever used their advertising system, you probably know just how powerful it is. The level of granularity and depth one is able to target for advertising purposes is far beyond the scope of Google. And yet this powerful advertising model may still suffer from a rather obvious Achilles' heel. For the time being at least, it appears to depend very heavily upon the honesty of its users.

This is where the experiment I have been working on comes directly into play.

Over the past several months I have added a myriad of life-events to my Facebook profile using their new Timeline feature. Some of those life-events are true, and some of them are not. In my fictitious life I've explored a dozen different religions, had countless injuries and broken bones, suffered twice through cancer, been married, divorced, fathered children all around the world, and have even fought for numerous foreign militaries.

This is what I refer to as Bayesian Flooding, and to be perfectly honest, it's turned out to be a great deal of fun. My intent was to coin the term within the same sphere as Bayesian Filtering, a common method of filtering junk email by word analysis. Of course both terms pay homage to Thomas Bayes, a mathematician best known for Bayes' Theorem.

The basic formula describing Bayes' Theorem. It depicts the conditional probability of event A given event B has occurred.

Bayes' Theorem is a commonly applied mathematical formula used for calculating the conditional probability of some event given that some additional event has occurred, or that some additional knowledge has been gained. For example, if someone told you they had a nice conversation on a train, the probability it was a woman they spoke with is 50%. If they told you the person they spoke to was going to attend a quilt exhibition, the probability that it was a woman is far greater than 50%^[1].

The probability of correctly assuming the gender increases because you have gained more information about the original problem. Mathematically speaking, you are now considering the probability that the person is a woman given that you know the person was attending a quilt exhibition; knitting, crocheting, and quilting are more typically regarded as female hobbies. An interesting quirk of Bayes' Theorem is that it relies heavily upon sexism, racism, ageism, and every other type of generalization imaginable in order to draw assumptions. This is not because it is somehow prejudiced, but rather because such categorizations can be shown to be statistically accurate. The key is to have accurate statistics about the topic being generalized. To borrow from the esteemed Sherlock Holmes, "You can never foretell what any one man will do, but you can say with precision what an average number will be up to." This is essentially the backbone of Bayes' Theorem.
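In symbols, the theorem reads P(A|B) = P(B|A) x P(A) / P(B). The sketch below works the train example through that formula. The 50% prior comes from the example above, but the two attendance rates are assumptions picked purely to show the mechanics; they are not real statistics about quilt exhibitions.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    # P(B) is expanded over the two cases "A" and "not A".
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# A = "the person is a woman", B = "the person is going to a quilt exhibition".
p_woman = 0.5             # prior before hearing about the exhibition
p_quilt_if_woman = 0.10   # assumed rate, for illustration only
p_quilt_if_man = 0.01     # assumed rate, for illustration only

print(posterior(p_woman, p_quilt_if_woman, p_quilt_if_man))
```

With these made-up rates the estimate moves from 50% to roughly 91%. The direction of the shift depends only on quilting being more common among women than men; the size of the shift depends entirely on how accurate the underlying rates are.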

When Facebook analyzes my profile and notices that I have participated in a dozen different religions over the past 30 years, their engine should make the assumption that I am interested in theology and various disciplines of spirituality. As a result, they're more likely to serve me ads and recommendations within this realm, perhaps for spiritual books, personal retreats, or the like. For a sizable majority of people, these assumptions and recommendations will be accurate and should result in a better click-through rate and ultimately more revenue. But the truth is that I'm an agnostic atheist and certainly couldn't care less about religious topics.

It might seem like childish anarchy, but there is a legitimate rationale behind wanting to fool the engine. As data analysis becomes more and more detailed (mainly due to our world being digitally cataloged), companies are inventing coercive psychological tricks that manipulate consumers into spending more, plain and simple. The products aren't necessarily getting better; rather, the science of selling the products is. Advertisers argue there are benefits to more efficiently targeting customers, but I believe these arguments fail to acknowledge the downside consumers face. It's simply a matter of knowing far too much about a person while having the singular goal of acquiring their money. If consumer manipulation is harmless, I would have to strongly question why we condemn psychics for applying similar tricks, while at the same time congratulating the business world.

Beyond psychological manipulation, there are also legitimate privacy concerns that need to be taken into consideration, much like the case of the pregnant teen. If advertisements became completely personalized, it would be possible to learn virtually anything about someone just by observing what they were suggested to buy, never mind what they actually bought. Whether or not a teenager should be made to disclose her pregnancy to her father is a matter for a different debate. But I believe it's a pretty unanimous position that Target should not be involved at any level of the discussion.

There are dozens of very large players in this game at the moment, but Facebook and Google are most likely the two best-known. Even people who may not grasp the complexity of personal data collection probably still suspect they're being cataloged - and they're right. But an interesting difference with Facebook versus, say, Google, is that I believe their data pool can be distorted without inhibiting one from using their site. If Facebook became the go-to source for private, personal information, and that information was flawed, it would potentially affect all other analysis of the individual as well.

The theoretical advantage a company like Google has is that it would be extremely difficult to apply an idea like Bayesian Flooding to their model with any level of practicality. Google's paid advertising is primarily based upon the user's active search query. If you enter a false query, you'll get answers to questions unimportant to you; it would be a futile exercise. Conversely, when you actually need to search for something, the ad engine would still be just as effective since it runs in real-time. This is especially true of services like Gmail. In order to flood a Gmail account, not only would you have to send nonsensical emails to contacts with some regularity (often referred to in this sense as Bayesian Poisoning), but recipients would have to reply in a similar manner. Of course the only reason one sends email in the first place is to exchange communication and thus the purpose of the tool would again be lost, at least in practical terms.

By contrast, applying Bayesian Flooding to a Facebook profile is quite trivial and in no way inhibits one from still enjoying the many facets of their service. The method only disturbs the advertising and recommendation model, not the actual tool. It is still possible to share photos, exchange stories and ideas, and comment on posts regardless of any superfluous details that happen to be associated with one's profile. What's the real harm if someone on Facebook thinks I spent two years in the Pakistani National Army so long as I can still share photos with them from my recent trip to Canada? With the release of their Timeline system, anybody is free to add such details, regardless of how accurate they are.

Now that Facebook has decided to become a publicly traded company, it seems to me this is a pretty significant detail shareholders are likely to begin questioning. It may even be one of the reasons why they have recently become so anxious to get people using their email services; such services are much more complicated to fool (as described via Google above). The more people that begin to use Facebook for day-to-day emailing and chatting, the more accurate and valuable each individual dossier becomes.

Results of the Experiment

Some of my current recommendations from the Facebook robot.

Thus far, my experiment seems to be producing exactly the results I had hypothesized it would. Whatever algorithm(s) Facebook uses to recommend pages is evidently picking up on my colorful assortment of life-changing events. This is a promising start for those interested in reducing what they're worth to Facebook as a human commodity.

Companies are willing to pay for advertisements because ads produce quantitative, measurable results. If one receives an advertisement that is irrelevant to them, the cost of that ad space has been wasted. If the cost of advertising outweighs the return on investment, companies tend to stop advertising. This is an oversimplification of the whole cycle, but illustrates the basic premise.
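The arithmetic behind that premise can be sketched as follows. Every number and parameter name here is hypothetical, and the model ignores most of what a real campaign involves; it only shows how the share of impressions reaching relevant people drives the return.

```python
def campaign_roi(impressions, cost_per_impression, relevant_fraction,
                 conversion_rate, profit_per_sale):
    """Return on investment: (revenue - cost) / cost, counting only relevant impressions."""
    cost = impressions * cost_per_impression
    revenue = impressions * relevant_fraction * conversion_rate * profit_per_sale
    return (revenue - cost) / cost


# Hypothetical campaign: identical except for how many viewers the ad is relevant to.
well_targeted = campaign_roi(100_000, 0.002, 0.50, 0.001, 20.0)
flooded = campaign_roi(100_000, 0.002, 0.05, 0.001, 20.0)

print(well_targeted, flooded)
```

In this toy setup the well-targeted campaign earns more than it costs while the poorly targeted one loses money, which is the point at which the paragraph above suggests advertisers walk away.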

Of course there are numerous methods Facebook uses to provide recommendations and advertisements to people. Some are based on the 'likes' of Facebook friends, some are based on cookies from other sites, and I would imagine that some are even based on internal browsing history. So while it may not be possible to manipulate all of the personal information being collected about you, the Timeline feature can at least be used to manipulate it to some degree - at least per my own experimentation.

Incidentally I've also discovered numerous bugs in their system, most of them related to dates on the Timeline. Although Facebook does not allow you to add life-events prior to your birth, they do permit you to include other people in your life-events that occurred before they were born. I'm sure Facebook will eventually fix this, but for the time being I suspect it makes their Bayesian analysis that much more inaccurate.

Algorithmic Corrections

It might be fair to ask if a company like Facebook would be able to adapt to something like Bayesian Flooding. The short answer is that yes, they most definitely would be able to. Data-mining companies can analyze all sorts of patterns illustrating how "normal people" tend to enter personal data. Once normal behavior patterns are established, it is not particularly difficult to flag outliers. Of course what Facebook might do with such outliers requires a bit of speculation.
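Flagging outliers of this kind does not take much. The sketch below assumes, purely for illustration, that the only signal is the number of life-events on each profile; it uses a modified z-score built on the median absolute deviation with the conventional 3.5 cutoff. The function name and sample data are invented, and nothing here reflects how Facebook actually does it.

```python
from statistics import median


def flag_outliers(event_counts, threshold=3.5):
    """Flag users whose life-event count sits far from the median (modified z-score)."""
    values = list(event_counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        # No spread among typical users: anything off the median stands out.
        return [user for user, v in event_counts.items() if v != med]
    return [user for user, v in event_counts.items()
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical life-event counts per profile.
counts = {"a": 3, "b": 4, "c": 5, "d": 4, "e": 6, "flooder": 60}
print(flag_outliers(counts))
```

Using the median rather than the mean keeps a handful of heavy flooders from dragging the notion of "normal" toward themselves.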

At worst they could be banned for intentionally misusing the system. However, it seems this would be a little short-sighted as the user still carries a marketable value, perhaps just less of one. Instead I suspect they would be flagged to not receive certain kinds of advertisements, thus not wasting advertisers' money. After all, the company hosting the ad wants the product to sell as much as the company selling the product does. Why else would the advertising dollars continue to pour in?

If Facebook chose to implement filters to detect this type of Bayesian Flooding, people like myself would simply concoct new ways to further interfere with those filters, perhaps by adding events at a slower rate, or adding events more central to my actual life. Facebook would then try to correct those methods, and so on. This is how technological cat and mouse games get started, similar to fighting spam or attempting to prevent piracy. As a technological rule of thumb, every measure invented to curb a certain practice eventually has a counter-measure to circumvent it.

But having written that, there is an upside for those who take to Bayesian Flooding. I very sincerely doubt enough people will be interested in the idea to create a blip on Facebook's radar, much less change anything. That probably means those wishing to partake would find success in the idea.

Conclusion

While I'll definitely keep experimenting with their system, it does seem for the time being that people can directly affect what advertisements they receive simply by flooding their profile. The new Facebook Timeline feature makes this simple and even fun. And though the idea may appear petty to some people, those of us wishing to protect our privacy and avoid being cataloged by corporate America may find it beneficial to our cause.

If you have any comments or thoughts on this process, please feel free to contact me as always.

Benches Built and Engine Almost Ready

May 18th, 2012

It's been challenging building out the interior and fixing the engine at the same time, but both seem to be going rather well. I managed to finish the interior bench seats the other day and started playing around with the vinyl, just to see what it would look like. Once I have the seat backs properly designed, Caroline and I will start spraying in the foam seating and then stretch the vinyl over it. Unfortunately there is an order of operations to all of this so that the stapled parts are not shown in the final product.

I really can't thank my neighbor, Danny, enough for all of the time he's spent with me working on the engine. He and I finally got the new radiator installed and it looks beautiful. We've also put in a new coolant reservoir, changed all of the belts, changed the radiator hoses, and stripped out a bunch of unnecessary parts. We'll likely have to change out the engine gaskets as well since they appear to be leaking a bit, but not before we get it started back up.

The interior of the bus with both bench seats properly mounted in place.

There were two holes on the roof of the coach. I filled both of them with "Great Stuff" insulating foam sealant (courtesy of Mike Crockett) and cut the excess off. I put the nozzle well into the hole and likely filled a good part of our ceiling with it.

A look at the new radiator in place. I still have to re-install the radiator cooling fan. The metal screen over the radiator is just part of a screen door. We put it in place since we removed the front condenser, thus exposing the radiator to more bugs.

Driver's Side Bench Seating Installed

May 13th, 2012

It's taken a ton of planning to get the interior to this point, but it's finally taking full shape. I completed the bench seating frame of the driver's side today and was able to affix the 1/2" ply before the day ended. I wanted to make sure the job was done right so I used proper 2x4 joists for the crossbar support beams. They're probably a little overkill, but at $0.74 each, I figured I could spare it.

I don't have any pictures of it, but Caroline and I purchased some sample foam and black vinyl just to see what it will look like. Needless to say it wraps the seats very well. I'm looking forward to getting there in the next week or two.

You can also see the polyethylene foam (the pink noodle) in one of the photos below. We'll be putting safety foam over all of the edges before wrapping them in vinyl.

A closeup of one of the support joists.

The 2x4 support beams installed in the joists.

The completed driver's side bench frame from the front of the bus.

The completed driver's side bench frame from the back of the bus. You can see the polyethylene foam temporarily wrapped around the door frame.

Removing the Internal A/C Compressor

May 6th, 2012

After spending a few days in Kansas City, it was nice to get home on a weekend and put a little time into the bus. I started working on building out the interior when my neighbor, Danny, stopped by to help me out. He had taken out the radiator for me while I was out of town and wanted to know if we could work on the engine some more. Since I'd spent about 3 hours trying to figure out how to trace the contours of the roof onto wood (quite the challenge), I thought it would be nice to have a little change of pace.

Since I had already removed the rear air conditioner from the vehicle, we went ahead and removed the front-cabin a/c compressor just to make more room under the hood. This in turn allowed us to cut a handful of hoses out of the equation as well. Removing the compressor was a little tricky. I first had to remove a few steel plates that protect it. Unfortunately in order to do that, I had to first remove the radiator fan. Once all of that was out, there were four bolts on the underbelly of the compressor. One of them was particularly difficult to reach, but I eventually managed.

We're going to remove the front air conditioning condenser next, just to entirely clear up the front of the van.

A look at the engine after removing the air conditioner compressor.

The old radiator and passenger air conditioner compressor.

Once we finished up and I got back to working on the internals of the bus, I went ahead and installed the first wall and the first row of seats (without padding of course). It's starting to take shape!

The first wall and the first bench installed.

Billings Oklahoma Tornado Enforcement

May 2nd, 2012

Over the past few days I have been helping my good friend Dave G. move his life from Houston to Kansas City. After loading his rental truck in Houston, we took I-45 north to Dallas and jumped onto I-35 for the remainder of the trip. It's not the fastest route, but we figured it would be easier to stay on the interstates with a U-Haul (technically a Penske).

An image of the massive cell that developed in about 2 hours' time. The blue line on the right of the cell is I-35. We were in the immediate path of this system.

We stopped in Norman, Oklahoma around 7pm to grab some dinner and started noticing a rather large cell forming towards Wichita, KS. We figured it would pass before we got that far north. Continuing north on I-35, we started noticing a storm forming in the distance; computer radar confirmed the storm was rapidly expanding to the south. After we saw an actual "StormChaser" vehicle fly past us, we figured there was a pretty legitimate system up ahead. We would later be proven very correct.

As it turns out, an enormous system formed about an hour west of I-35 just south of the Oklahoma and Kansas border. The lightning started getting to the point where there were no breaks in it and the wind was picking up. We turned on a local weather station and evidently EF-2 tornadoes were touching down 20-30 miles northwest of us in the town of Medford. It seemed like a prudent time to pull over. We stopped at a Conoco Station on the SE corner of I-35 and Acre Road, mostly thinking the storm would move due east. Unfortunately the storms soon turned and started heading directly for us. We fueled up and left the truck at the pump to wait out the storm. Dave even adjusted the vehicle so that it was more likely to take the wind head on.

A view from the back of the storm shelter where about 30 of us were packed into a small hallway.

About 30 minutes into this, the place had really started to fill up with people seeking shelter. When the storms finally landed, the attendants started frantically yelling for everyone to get inside and to the storm shelter (really just bathrooms, showers, and storage space). Eventually the first wave of weather ripped over us and produced some of the strongest rain and winds I've ever seen. It was at this point that a Billings police officer walked into the store.

The officer was an older gentleman, probably in his early 60s and a little heavy. Although the rain was coming in sideways and there were reports of tornadoes all around us, he did not seem particularly concerned with anyone's safety. In fact, he started telling everyone in the store that this was a private business and that cars could not be left at the pumps. He said he had no way to gas up his cruiser (despite the outer pumps being vacant). He even went so far as to tell the crowd that if the cars weren't moved, he was going to start calling in license plates. I think most of the people in the store at this point were just dumbfounded. Best we could tell, nobody from the store had made this request; the gas station attendants had been trying to corral people into safety.

I turned to Dave and jokingly said, "Dave, I'd love to say something to this guy. I'd love to put my hand on his shoulder, look him right in the eyes, smile at him and ask, 'Hey, you know how people sometimes think cops are dickheads? Well this is why.' And then just walk away".

When the cop finally went outside, we assumed he was taking off. But in reality, he started slowly driving across the parking lot stopping in front of parked vehicles. I assume he was actually taking down plate numbers, but I have no idea what he could have possibly been doing with them. Dave was concerned about the truck and decided to move it. We spent the remainder of the storm sitting in a giant moving vehicle in front of the store. Fortunately no tornadoes came our way.

Special thanks to the officer in Billings, Oklahoma for reinforcing why I should not trust law enforcement, even in the most disastrous of conditions.