Local computer programmers predict which films will fly, flop
January 24, 2007
For the first time in the festival’s history, the Sundance Film Festival movie catalogue does not carry the word "riveting," and that might very well have to do with the Park City-based computer statisticians behind the Web site deconstructingsundance.com.
As a lark, employees of local Internet anti-spam company Unspam Technologies, Inc., wrote a program based on a Bayesian algorithm that used the catalogue’s words to predict the winners of the top four 2006 Sundance Film Festival awards. As it turned out, the program picked, with 100-percent accuracy, the Grand Jury Prize: Dramatic ("Quinceanera"), the Grand Jury Prize: Documentary ("God Grew Tired of Us"), the Audience Award: Dramatic ("Quinceanera") and the Audience Award: Documentary ("God Grew Tired of Us"), all without seeing a single film.
"Last year we did it, it was almost tongue-in-cheek," says Unspam CEO Matthew Prince. "There was real technology and real math behind it, but I don’t think any of us expected it to work, and then when we got the results."
After the Wall Street Journal ran a story about the company’s prophetic calculations, the phone began to ring from around the world: can the company predict earthquakes? The stock market? Can it predict whether this screenplay will be a success?
For now, the company is sticking to Sundance, forecasting this year’s independent film prize-winners will likely be: "The Good Life" or "Grace is Gone" for the dramatic competition, and "For the Bible Tells Me So" or "War Dance" in the documentary category.
The Unspam crystal ball algorithm is based on the last 11 years of Sundance film catalogue data. Unspam uses a 250-point system that includes where the film premieres, which reviewer writes the film synopsis in the catalogue, what nouns and adjectives are used to describe the film, the photograph used to market the film, the number of producers on the film, and whether the film was shot with a digital camera or on 35 mm film.
The word "sexy," like the word "riveting," detracts from a film’s chances of winning a top award when it is used in a synopsis, according to the program. The same goes for the words "black," "Africa," "truth" and "world."
The word "sex," on the other hand, is a "word that makes you golden." Changing how the term is used appears to improve a film’s chances of going home with a prize.
"The reason is, one is an adjective and one is a noun. ‘Sex’ is good because it’s a noun. If a film is described as ‘sexy,’ then that’s a big predictor that it’s not going to be successful," Prince says.
"It kind of makes sense. Put yourself in the position of being a Sundance reviewer. If the story is really good, then you can go and just write about the story. If the film isn’t as good, then you might puff it up with adjectives in order to say ‘oh, this film was sexy.’"
Prince stresses that the words flagged as duds are not Unspam’s value judgments; they are chosen simply because they show up most frequently in descriptions of films that ultimately failed to win a prize or that had very little commercial play beyond the festival.
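For readers curious how word frequencies can turn into a prediction, here is a minimal naive Bayes sketch. It is not Unspam’s program, which has not been published; the function names, the smoothing choice and the four toy synopses are all invented for illustration, and the sketch only looks at which words are present in a synopsis.

```python
import math
from collections import Counter

def train(synopses):
    """Tally, per outcome, how many synopses contain each word.

    synopses: list of (text, won) pairs. Hypothetical format, not Unspam's.
    """
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, won in synopses:
        class_counts[won] += 1
        # Count each word once per synopsis (presence, not repetition).
        word_counts[won].update(set(text.lower().split()))
    return word_counts, class_counts

def win_log_odds(text, word_counts, class_counts):
    """Naive Bayes log-odds that a synopsis describes a winner."""
    # Start from the prior: how often past films won versus lost.
    score = math.log(class_counts[True] / class_counts[False])
    for word in set(text.lower().split()):
        # Add-one smoothing keeps unseen words from producing zero odds.
        p_win = (word_counts[True][word] + 1) / (class_counts[True] + 2)
        p_lose = (word_counts[False][word] + 1) / (class_counts[False] + 2)
        score += math.log(p_win / p_lose)
    return score

# Invented toy data, not real catalogue text or real results.
history = [
    ("a story about sex and family", True),
    ("two sisters face loss", True),
    ("a riveting and sexy thriller", False),
    ("a riveting journey", False),
]
counts, totals = train(history)
plain = win_log_odds("a story about family", counts, totals)
puffed = win_log_odds("a riveting sexy story", counts, totals)
```

With this toy history, the plainly worded synopsis scores above zero and the adjective-heavy one scores below it, which mirrors the pattern Prince describes. The score reflects nothing more than past co-occurrence.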
"We’re not saying those films don’t deserve to be screened. All we’re saying is that the films that are about those topics have a tendency to be less successful at the box office," he says.
The inspiration, Prince says, came from a theory he had about the catalogue, based on his understanding of real estate listings.
"One of the things they talk about in real estate listings is that you want to be careful of places described as ‘cozy’ or ‘homey,’ or that are described as ‘spacious,’ that have a lot of adjectives that are used to describe them, as opposed to real estate listings that talk about ‘hardwood floors’ or ‘marble countertops.’ Homes tend to sell more if they advertise real, tangible things," he explained.
Many of the hundreds of points are superfluous, but a few have been shown to be good predictors.
According to the program, if a film is a documentary and has too few producers, it will likely be unsuccessful. If a film is a dramatic picture and has too many producers, it will likely be a flop.
If a film is shot on 35 mm film as opposed to Sony HD, it is more likely to become a hit. The Web site gives Sony HD films a 19 percent chance of making it big.
If a film’s catalogue synopsis is written by Sundance Film Festival Director Geoffrey Gilmore or senior programmer Shari Frilot, the film has a very good chance of being a prize-winner, or so says the computer.
"Again, it’s quite the opposite of the value judgment," assures Prince. "We’re the most unbiased critics of all, because we haven’t even seen the movie. We’re using statistics of the past to predict what’s going on in the future."
So how about equal opportunity for film?
"The Sundance folks want very much to say every single film that is at the festival has a place here, and I’m not disputing that, I think they do," Prince says. "They don’t want to play favorites at all, but the very act of writing about a film that is going to be commercial is going to inherently have a different description than writing about a film that is not going to be as commercially successful."
Though Prince has yet to receive any feedback from Sundance staffers directly, he says he knows that they’ve heard of deconstructingsundance.com. The absence of "riveting" from the catalogue this year, after Unspam blacklisted it as a term used to describe below-average films, was a good indication. All of the other catalogues he collected over the last decade contained the word.
In the future, catalogue makers may avoid certain terms in an attempt to level the playing field for films, but the festival and the independent filmmaker are no match for the Unspam film algorithm. Some good predictors, such as the name of a reviewer like Gilmore, can’t change, and even if festival programmers sidestep "sexy," or filmmakers forgo HD, Prince says the computer program will account for the change.
And if "The Good Life" or "For the Bible Tells Me So" goes home empty-handed, there’s always next year.
"What’s neat about the system is that it actually learns over time," he explains. "Even if we get it wrong this year, it will learn from this year’s mistakes and get better next year."
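One simple way a system like this can "learn over time" is to keep running counts for each feature and fold in every new year’s results. The sketch below assumes that approach; the class name, the feature label "shot_on_hd" and the three results fed in are hypothetical, and Unspam’s actual update method is not described in any detail.

```python
from collections import Counter

class FeatureTally:
    """Running win and appearance counts per feature (hypothetical structure)."""

    def __init__(self):
        self.wins = Counter()
        self.seen = Counter()

    def update(self, features, won):
        """Fold one film's outcome into the counts for each of its features."""
        for feature in features:
            self.seen[feature] += 1
            if won:
                self.wins[feature] += 1

    def win_rate(self, feature):
        # Smoothed estimate: 0.5 with no data, shifting as results arrive.
        return (self.wins[feature] + 1) / (self.seen[feature] + 2)

tally = FeatureTally()
before = tally.win_rate("shot_on_hd")  # 0.5, no evidence yet

# Invented results for one new festival year.
tally.update({"shot_on_hd"}, won=False)
tally.update({"shot_on_hd"}, won=False)
tally.update({"shot_on_hd"}, won=True)

after = tally.win_rate("shot_on_hd")  # (1 + 1) / (3 + 2) = 0.4
```

Because the estimate is recomputed from the counts, a word or format that stops predicting failure would gradually lose its penalty as new years are added.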