Thursday, January 12, 2012

My Music Tastes are 13.3% Terrible, Says Algorithm

Adku engineers devote 20% of their time to projects of their own choosing. These projects are usually a little further removed from our daily work and are a great outlet for chasing down some of our crazier ideas. Earlier, Leah wrote about Mildred, a “20% time” project for visualizing your friends’ Facebook likes. Visualizing data is a great way to intuitively spot patterns and draw some quick conclusions, but the real fun comes in automating that process with machine learning. Today, I’m going to explain how Latent Dirichlet Allocation works and how we used it to draw conclusions about people’s music tastes from their Facebook likes.

Latent Dirichlet Allocation (LDA) was invented in 2003 primarily for automatically inferring the topics in a text document (the original paper also mentions applications for collaborative filtering). For example, LDA could be used to determine that a bioinformatics scientific paper is about evolutionary biology and computers, or perhaps that a particular court case is concerned with torts and constitutional law. It works by applying statistical analysis to the words in the document. For the bioinformatics paper, LDA would recognize that the document contained words related to evolutionary biology like “genome” and “selection” as well as computer words like “big O” and “algorithm.”

The basic idea behind LDA is that every document is generated by a simple probabilistic model. Suppose we wanted to randomly generate a document using this probabilistic model. To begin, we select the topic(s) of the document by drawing a green die out of a bag. There are many dice in the bag and each side of a die represents a topic. If we decide that there are 6 topics in all, then each green die will have 6 sides. Drawing a die that is heavily weighted towards the “computer” and “evolutionary biology” topics means that the document will be about computers and evolutionary biology. There are also 6 red dice--one die for each topic. Each side of a red die represents a word. The red die for the computer topic, for example, will be heavily weighted toward words like “RAM” and “processor.”

Here’s how each word in the document is generated. First, the green die is rolled to determine the topic of the word. The red die corresponding to that topic is then rolled to determine the word itself.

An example roll could have the green die giving us the topic “computers.” We then find the “computers” red die and roll it, giving us the word “RAM.” “RAM” is appended to the document. Note that the order of the words generated is not taken into account at all; the document is just a “bag of words.”
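The dice-rolling story above can be sketched in a few lines of Python. This is only a toy illustration: the two topics, the word weights on each red die, and the `alpha` value used for the bag are made up for the example, and the helper names (`roll`, `draw_green_die`, `generate_document`) are hypothetical.

```python
import random

# Hypothetical toy model: two topics and a tiny vocabulary, chosen for illustration.
TOPICS = ["computers", "evolutionary biology"]

# One "red die" per topic: each side is a word, weighted by how likely it is.
RED_DICE = {
    "computers": {"RAM": 0.4, "processor": 0.3, "algorithm": 0.2, "genome": 0.1},
    "evolutionary biology": {"genome": 0.5, "selection": 0.4, "algorithm": 0.1},
}


def roll(die, rng):
    """Roll a weighted die given as {side: weight} and return one side."""
    sides = list(die)
    weights = [die[side] for side in sides]
    return rng.choices(sides, weights=weights, k=1)[0]


def draw_green_die(alpha, rng):
    """Draw a green die from the bag: a Dirichlet sample over the topics.

    A Dirichlet sample can be built by normalizing independent gamma draws.
    """
    draws = {topic: rng.gammavariate(alpha, 1.0) for topic in TOPICS}
    total = sum(draws.values())
    return {topic: value / total for topic, value in draws.items()}


def generate_document(green_die, num_words, rng):
    """Generate a bag of words: roll green for a topic, then that topic's red die."""
    words = []
    for _ in range(num_words):
        topic = roll(green_die, rng)
        words.append(roll(RED_DICE[topic], rng))
    return words


rng = random.Random(0)
green_die = draw_green_die(alpha=0.5, rng=rng)  # alpha is an assumed value
document = generate_document(green_die, num_words=10, rng=rng)
print(green_die)
print(document)
```

Because word order carries no meaning in this model, shuffling `document` would give an equally valid result.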

All the variables involved in generating documents--the green and red dice, the assortment of dice in the bag--are so-called latent variables. The dice represent multinomials; the bag of dice represents a Dirichlet. In the computer example, we assumed that the latent variables were known to us (i.e. we had dice and bags of dice to draw from). In real life, however, the latent variables are hidden from us. The only information we have is the words that are in each document. The goal of LDA is to figure out the values of the latent variables by working backwards (using Bayesian inference) from a collection of documents, and allocate each word in each document to a particular topic.
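To make "working backwards" a little more concrete, here is a minimal sketch of one common inference technique, collapsed Gibbs sampling. This is an assumption for illustration only: this post does not say which inference method we used, and the original LDA paper used variational inference instead. The function name `gibbs_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical.

```python
import random


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]              # words per topic, per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # count of each word per topic
    topic_total = [0] * num_topics                            # total words per topic
    assign = []                                               # topic of every word occurrence

    # Start from a random allocation of words to topics.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this word's current allocation from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight each topic by how much the document uses it
                # and how much the topic uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights, k=1)[0]
                # Record the newly sampled allocation.
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return assign, doc_topic, topic_word


# Toy corpus: word ids 0-2 tend to co-occur, as do word ids 3-5.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
assign, doc_topic, topic_word = gibbs_lda(docs, num_topics=2, vocab_size=6)
print(doc_topic)
```

The returned `assign` is the "allocation" in the name: one topic per word occurrence. The count tables `doc_topic` and `topic_word` can then be normalized to estimate the green and red dice.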

Now that we know a little bit about how LDA works, applying it to Facebook likes is straightforward. Each person represents a document, and that person's likes are analogous to the words in a document. The Mildred visualization indicates that people tend to have more music likes, so we'll just look at music likes for now. This makes it easier to get an intuitive grasp of how good the LDA results are. If the topics recovered by LDA end up matching recognizable music genres, then it's probably doing something right.
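As a small sketch of that mapping, each person's likes can be turned into a "document" of integer ids, which is the input shape most LDA implementations expect. The people and likes below are made-up placeholders, not real data from our analysis.

```python
# Hypothetical input: each person maps to the list of music pages they like.
likes_by_person = {
    "person_a": ["Radiohead", "Wilco", "The Beatles"],
    "person_b": ["Lady Gaga", "Katy Perry", "Radiohead"],
}

vocab = {}  # like name -> integer id, assigned in order of first appearance
docs = []   # one "document" (list of like ids) per person

for person, likes in likes_by_person.items():
    # setdefault returns the existing id, or stores and returns the next free one.
    docs.append([vocab.setdefault(like, len(vocab)) for like in likes])

print(vocab)
print(docs)
```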

The five most common likes for each topic as computed by LDA are shown in the tables below. The number of topics was arbitrarily set at 20. Some of the topics are labelled with a very heavy hand :-).

| 1. Indie music I (n=869) | 2. Contemporary Pop (n=709) | 3. Oldies Pop (n=740) | 4. Rap and Hip Hop (n=736) | 5. 90’s (n=643) |
|---|---|---|---|---|
| Radiohead | Rihanna | Pink Floyd | Jay-Z | Red Hot Chili Peppers |
| Sufjan Stevens | Lady Gaga | The Beatles | Kanye West | Incubus |
| Bon Iver | Beyoncé | Radiohead | Lil Wayne | Nirvana |
| Belle & Sebastian | Kanye West | Queen | Kid Cudi | Sublime |
| Wilco | Katy Perry | Metallica | Eminem | Pearl Jam |

| 6. Pop (n=576) | 7. Electronic/Trance/House (n=442) | 8. 00’s (n=759) | 9. 90’s/00’s I (n=400) | 10. (n=375) |
|---|---|---|---|---|
| Lady Gaga | Tiësto | Death Cab for Cutie | U2 | Lady Gaga |
| Taylor Swift | deadmau5 | Coldplay | Muse | Coldplay |
| Michael Jackson | Lady Gaga | The Killers | Coldplay | Jay Chou |
| Beyoncé | Daft Punk | Muse | Weezer | THE FU |
| Katy Perry | Armin van Buuren | Weezer | Oasis | Eminem |

| 11. Emotional Male Singers I (n=404) | 12. Classics (n=700) | 13. (n=520) | 14. Rock (n=376) | 15. (n=276) |
|---|---|---|---|---|
| John Mayer | The Beatles | Bob Marley | Red Hot Chili Peppers | Lady Gaga |
| Dave Matthews Band | Bob Dylan | Jack Johnson | AC/DC | Carousel |
| U2 | Pink Floyd | Johnny Cash | Aerosmith | The Beatles |
| Jason Mraz | Queen | Ben Harper | Queen | Dream Theater |
| Jack Johnson | The Doors | O.A.R. | Van Halen | Justin Diamond |

| 16. Emotional Male Singers II (n=620) | 17. 90’s/00’s II (n=872) | 18. Indie Music II (n=864) | 19. (n=497) | 20. Classical (n=459) |
|---|---|---|---|---|
| Ben Folds | Linkin Park | Radiohead | Regina Spektor | The Beatles |
| Coldplay | Coldplay | Arcade Fire | The Beatles | Beethoven |
| Jack Johnson | Green Day | Daft Punk | Frank Sinatra | Mozart |
| Dave Matthews Band | Nickelback | Beck | Coldplay | Chopin |
| John Mayer | The Fray | MGMT | Michael Bublé | Bach |

Table 1. The twenty topics recovered by running LDA on Facebook music likes. Only the five most common likes for each topic are shown. The number of likes classified within each topic is shown in parentheses. 1517 people with a collective total of 11837 music likes were analyzed.
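For the curious, a table like this can be derived from LDA's output with a simple tally. The sketch below assumes the output is available as a list of (like, topic) pairs, one per like that was allocated to a topic; the sample pairs and the `summarize_topics` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical LDA output: one (like, topic number) pair per allocated like.
assignments = [
    ("Radiohead", 1), ("Radiohead", 1), ("Wilco", 1),
    ("Radiohead", 3), ("The Beatles", 3), ("The Beatles", 3),
]


def summarize_topics(assignments, top_n=5):
    """Return, per topic, the number of likes (n) and the most common likes."""
    counts = defaultdict(Counter)
    for like, topic in assignments:
        counts[topic][like] += 1
    return {
        topic: {
            "n": sum(counter.values()),
            "top": [like for like, _ in counter.most_common(top_n)],
        }
        for topic, counter in counts.items()
    }


summary = summarize_topics(assignments)
for topic in sorted(summary):
    print(topic, summary[topic])
```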

Just for fun, let’s take a look at how LDA classifies my music tastes. I have 15 music likes:

Vampire Weekend, Arctic Monkeys, Death Cab for Cutie, Eric Clapton, Jack Johnson, The Killers, Oasis, Pink Floyd, Postal Service, Radiohead, Third Eye Blind, Weezer, Wilco, Muse and The Beatles. The majority of my likes (9/15) are classified under topic 8: 00’s music. Three of my likes are classified as topic 3, Oldies Pop. I also have two likes in topic 16 and one like in topic 19.
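Turning those counts into percentages is simple arithmetic. The per-topic counts below come straight from the breakdown above; the variable names are just for illustration.

```python
# Likes per topic for my 15 music likes, as reported above.
likes_per_topic = {8: 9, 3: 3, 16: 2, 19: 1}

total = sum(likes_per_topic.values())
shares = {
    topic: round(100 * count / total, 1)
    for topic, count in likes_per_topic.items()
}

for topic, share in shares.items():
    print(f"topic {topic}: {share}%")
```

Two likes out of 15 is the only share here that works out to 13.3%, which is presumably where the figure in the title comes from.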

We can also look at Carlos’ music preferences. He likes The Smashing Pumpkins, The Killers, Coldplay, Pearl Jam, The Verve, Weezer, Quartus, Michael Jackson and Bush. Carlos’ likes are evenly split between topics 3, 8 and 1: Oldies Pop, 00’s and Indie Music.

You could imagine several ways of building a simple recommendation engine from these results. A simple one is to use the generative process, just like how we generated a “computer” document. The music recommendations for Carlos would come from a document that is generated from the Oldies Pop, 00’s and Indie Music topics. These recommendations are explainable. We can tell Carlos that Metallica was recommended because he likes Oldies Pop.
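Here is a rough sketch of that idea, under some loud assumptions: each topic's red die is approximated by its five most common likes from Table 1, weighted equally, and artists the person already likes are filtered out (a small departure from the pure generative process). The `recommend` function is hypothetical, not something we have built.

```python
import random

# Stand-in for each topic's red die: the top five likes from Table 1,
# treated as equally likely (an assumption; the real weights are not shown).
TOP_LIKES = {
    "Indie Music": ["Radiohead", "Sufjan Stevens", "Bon Iver", "Belle & Sebastian", "Wilco"],
    "Oldies Pop": ["Pink Floyd", "The Beatles", "Radiohead", "Queen", "Metallica"],
    "00's": ["Death Cab for Cutie", "Coldplay", "The Killers", "Muse", "Weezer"],
}


def recommend(topic_mix, already_liked, num_recs, rng):
    """Roll the person's green die for a topic, then pick a new artist from it.

    Returns (artist, topic) pairs so every recommendation carries its explanation.
    """
    seen = set(already_liked)
    recs = []
    while len(recs) < num_recs:
        # Only keep topics that still have an artist the person has not seen.
        pools = {t: [a for a in TOP_LIKES[t] if a not in seen] for t in topic_mix}
        pools = {t: artists for t, artists in pools.items() if artists}
        if not pools:
            break  # nothing left to recommend
        topics = list(pools)
        weights = [topic_mix[t] for t in topics]
        topic = rng.choices(topics, weights=weights, k=1)[0]
        artist = rng.choice(pools[topic])
        recs.append((artist, topic))
        seen.add(artist)
    return recs


carlos_likes = [
    "The Smashing Pumpkins", "The Killers", "Coldplay", "Pearl Jam", "The Verve",
    "Weezer", "Quartus", "Michael Jackson", "Bush",
]
# Carlos' likes are evenly split between the three topics.
carlos_mix = {"Oldies Pop": 1 / 3, "00's": 1 / 3, "Indie Music": 1 / 3}

recs = recommend(carlos_mix, carlos_likes, num_recs=3, rng=random.Random(0))
for artist, topic in recs:
    print(f"{artist} (because you like {topic})")
```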

A fully fleshed out recommendation engine sounds like another 20% project, though. Until then, try not to snicker at my music tastes too much!


  67. Thanks for sharing it is important for me. Finally, I found exactly what i want. If need information regarding Printer then you can visit our site Hp printer ondersteuningsnummer for help.