I’m into lists, whether it is bullet points, lists of movies, lists of tasks, lists of goals, lists of restaurants, lists of people, lists of songs, or whatever. The issues I am raising here are threefold:
- How to share lists
- How to maintain lists
- How to get details for items on lists without manually adding that detail
Sure, I can enter my lists in Excel or a text file (or Medium!), but then the list is essentially dead, cast in concrete, with only the detail that I have taken the effort to manually add myself.
I have thought about the Semantic Web and Linked Data, and maybe the solution does lie there, but for now I simply want to capture my requirements and somebody else can ponder the ultimate solution.
For purposes of discussion I’ll focus on one of my lists, my list of songs, not because songs are my primary goal, but simply because most people can relate to it and it illustrates most of the issues I am trying to raise here.
Ultimately, my conclusion is that I see a need for a List Manager, a software product or service that facilitates the creation, management, and sharing of lists of all sorts: not a specific app for each type of list, but a generic piece of software that can intelligently support any type of list.
I started this list of songs a few years ago when I realized that I was having difficulty remembering the names of songs and bands or performers when I would hear a song in a store or wherever.
I started with an empty Excel spreadsheet and tried to enter as many of my favorite songs as I could remember. As an aside, I defined a favorite as any song that I could probably listen to several times a day and still not get tired of. There are literally thousands of very popular hit songs that didn’t make my list. It really is supposed to be my best of the best.
Initially I estimated that it would be no more than maybe fifty songs, tops. Talk about a fading memory: my current list has over 800 songs.
Besides my faded memory, my first stumbling block was what exactly I should place in the spreadsheet. Obviously the name of the song is the starting point. For many songs the name alone is sufficient to identify even the performer, although technically a song could be covered by a number of performers. In database parlance we call this a one-to-many relationship. So, technically, my simple list of song names should be sufficient to identify the song even if the performer might be ambiguous.
Since compensating for my fading memory was my main goal, I decided to include the primary, initial, or main performer of the song as a second column. In most cases this is simply the name of a band.
Actually, I decided to put the band name in the first column and the song name in the second column, both sorted.
I used Google, YouTube, and Wikipedia to help supplement my faded memory to produce accurate detail for both the song and band/performer names. YouTube recommendations helped me greatly, leading me from my few initial direct memories to related songs which were also floating unidentified in my head. I also found a web site that had Top 100 lists by year — it was tedious, but helped me recall memories that I had completely forgotten about.
One immediate problem was the specific detail of names, such as which bands had “The” as an explicit part of their names. The Beatles. The Rolling Stones. The Byrds. The Moody Blues. The Procol Harum. Oops! No, it’s “Procol Harum”, without the “The.” The next issue there was how to alphabetize names when they started with “The.” I opted to include “The” in the name text but to ignore it when alphabetizing.
I also decided to alphabetize by full name, starting with first name. I suppose last name might be a better choice, but my memory of most performers is of their full first and last names.
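As a minimal sketch of that alphabetizing rule, assuming Python and a hypothetical helper named `sort_key` (neither is part of my actual spreadsheet), the name keeps its “The” but the sort ignores it:

```python
def sort_key(name: str) -> str:
    """Case-insensitive sort key that ignores a leading 'The '.

    Hypothetical helper for illustration; the stored name is left untouched.
    """
    lowered = name.strip().casefold()
    if lowered.startswith("the "):
        lowered = lowered[4:]
    return lowered


bands = [
    "The Rolling Stones",
    "The Moody Blues",
    "The Beatles",
    "Elvis Presley",  # performers sort by full name, first name first
    "The Byrds",
]

sorted_bands = sorted(bands, key=sort_key)
print(sorted_bands)
```

Keeping the display text and the sort key separate means the same list could later be sorted by last name instead, simply by swapping in a different key function.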
I spent a lot of time tracking down band names even when I knew the approximate song name (or at least thought I did). In a lot of cases it wasn’t so much the amount of time as the surprise factor. I never knew who performed the hit song “Walk Away Renée”; it’s The Left Banke. And I spent decades, literally, believing that “Come on Eileen” was performed by “Dixie’s Runners” (that’s what I thought they said on the radio!), but it is in fact “Dexys Midnight Runners.” The amusing thing is that so many YouTube users are just as bad as me, and YouTube manages to redirect us to the right song.
The next immediate problem was two difficulties with song names. The first is that songs sometimes are well-known by their hook lines rather than their actual names. Most people know “Teenage Wasteland”, but not “Baba O’Riley.” I decided to stick with the formal, proper song name, but I also have extra columns in which key phrases from lyrics can be entered.
The second problem with song names is that sometimes part of the name is parenthesized, such as “Brandy (You’re a Fine Girl)” by Looking Glass. Wikipedia to the rescue. Just about every popular song, at least on my list, has its own Wikipedia page that gives the full, proper, canonical name of the song, regardless of its popular name. I opted to use that proper name from Wikipedia. Another example is “(Last Night) I Didn’t Get to Sleep at All” by The 5th Dimension, with “The” and “5th” rather than “Fifth.” Now, whether Wikipedia is actually correct and definitive is a matter beyond my own efforts, but I had to stop somewhere.
Another great example is “In the Year 2525” by Zager and Evans. Actually, as per Wikipedia, the full, proper song name is “In the Year 2525 (Exordium and Terminus)”, but how many people would recognize “Exordium and Terminus”?
Sometimes some of us can get terribly confused. I always thought that “Kentucky Rain” and “Walking in Memphis” were both songs by Elvis Presley. Okay, I was half right. “Kentucky Rain” was an Elvis hit, but “Walking in Memphis” was actually written and performed by some dude named Marc Cohn who I had never heard of before searching for the song name in YouTube, Google, and Wikipedia. In fact, I spent a fair amount of time under the assumption that he must have been performing a cover of an Elvis song, but Wikipedia set me straight. YouTube will get you to the song even if you search for “Elvis Presley Walking in Memphis.” There are even YouTube videos for the song labeled as being Elvis, but they are all Marc Cohn. So, I wasn’t the only confused person there.
Another song that had me a little confused was “Under Pressure” by David Bowie, or so I thought. A quick search in YouTube finds the song, but it also finds the song as performed by Queen. I figured Queen had covered Bowie’s song, but… a quick trip to Wikipedia showed how wrong I was. The original recording was by Queen, but “featured” David Bowie on vocals along with Freddie Mercury of Queen. Both Queen and Bowie have performed the song separately since that original recording. They have also performed the song together live. On my list I have the song listed twice, under both David Bowie and Queen.
I was tempted to include the URL for each song from YouTube, but that seemed like an unsolvable problem from my perspective. Just about every popular song has a number of YouTube renditions ranging from multiple copies of the mass-produced record to multiple live performances. Some have actual video while others simply show the album cover or have a slideshow, all for the same audio track. Some videos have lyrics as well, while others do not. Personally, I prefer the studio masters, but sometimes the live performance really is more memorable. In quite a few cases I discovered that even I didn’t have a single best preference for the performance of a given song. Tough problem. So I punted on that. In fact, that’s one of the motivations for this essay — to illuminate unsolved problems and seek to enlist others to address them. Ultimately, my list might want to have two separate YouTube columns, one for studio and one for live. Or, who knows how many columns. Lyrics alone are an issue to address.
After deciding to focus on just those two main columns of band and song name I briefly considered other details, like year, running time, country, name of vocalist, name of songwriter, etc. Even lyrics (or at least a URL to a lyric web page.) But it all started to make my head spin, so I punted.
So, here I am, with an Excel spreadsheet of 833 songs, sitting on my computer, with no convenient, easy way to share that list, nor an easy way to get all the detail of a given song other than copy-pasting the song name into Google and finding the Wikipedia page, or pasting the song name into YouTube and scanning through a long list of variations to find the optimal recording of the song.
Of course there is always iTunes, but even iTunes doesn’t have all songs and their videos and lyrics and live performances. Besides, my goal is easily shared lists that are not held captive by any vendor.
Besides songs, the obvious lists that pop into mind, for me, are:
- Travel destinations
- To Do
- Christmas shopping
- Short lists (of whatever — extracts from larger, more formal lists)
- Bookmarks, lists of web site URLs but preferably English phrases that can be intelligently looked up
- Glossaries of terms of interest
For lists of people, I was personally thinking of wanting to replicate LinkedIn using lists that individual users would maintain rather than being dependent on the walled garden that LinkedIn itself uses today.
There are plenty of other people lists:
- Celebrities — even separate lists by categories such as sports, politics, science, and movies
- Friends — such as for personal events
- Contacts — such as affinity groups
- Mailing lists — but of course there are plenty of sophisticated software packages and services for mailing lists
Again, these are only some illustrative examples of possible lists and the goal is not that each list type needs its own special software implementation, but that some common software technology can be designed and implemented to support lists in general, and then templates can be designed for each specific type of list.
The first requirement for sharing is that there has to be a place and format for sharing.
The second requirement is agreement on the specific data fields required.
In my case, I settled on exact band name and exact song name, but that won’t work with everybody.
The goal is to support a variety of fields and to do intelligent matching, rather than having some formal exact string for a database ID field.
For example, one user might have “The Beatles” while another just has “Beatles.” And one user may have the full, formal song name while another has just a phrase from the song name, or the hook line or some key phrase from the lyrics. For example, the lyric phrase “How can people be so heartless” should be able to select the song “Easy to Be Hard” by Three Dog Night.
The user should be able to maintain their lists in their own terms, and then the underlying list manager can intelligently fill in the blanks.
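To make that kind of intelligent matching a little more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the tiny `CATALOG`, the `Song` record, and the `normalize`, `same_band`, and `find_songs` helpers are hypothetical, and a real list manager would need a far larger authoritative source and much smarter matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Song:
    title: str
    band: str
    phrases: list[str] = field(default_factory=list)  # hook lines, lyric snippets


def normalize(text: str) -> str:
    """Lower-case, drop punctuation, and ignore a leading 'the'."""
    words = re.sub(r"[^a-z0-9 ]+", " ", text.casefold()).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


# Hypothetical stand-in for an authoritative catalog.
CATALOG = [
    Song("Easy to Be Hard", "Three Dog Night", ["How can people be so heartless"]),
    Song("Baba O'Riley", "The Who", ["Teenage wasteland"]),
    Song("One Tin Soldier", "The Original Caste"),
]


def same_band(a: str, b: str) -> bool:
    """Treat 'The Beatles' and 'Beatles' as the same band."""
    return normalize(a) == normalize(b)


def find_songs(query: str, catalog=CATALOG) -> list[Song]:
    """Match a query against formal titles and known lyric phrases."""
    q = normalize(query)
    if not q:
        return []
    return [
        song
        for song in catalog
        if any(q in normalize(text) for text in [song.title, *song.phrases])
    ]


print(same_band("The Beatles", "Beatles"))
print([s.title for s in find_songs("How can people be so heartless")])
```

The point of the sketch is only that the user's own wording (a hook line, a partial title, a name without “The”) is resolved to an entry rather than compared as an exact string.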
In some cases the year might be a more reasonable way to narrow down a song name than the band who happened to cover it in a given year.
Or maybe the combination of the band name and a year will indicate the specific hit song.
And there are certainly plenty of one-trick pony songs where the band name alone is a solid indicator of the desired song. The group “The Original Caste” had exactly one hit, and there was exactly one group who scored a hit with “One Tin Soldier.”
Confusion about names is a really big deal. YouTube does a reasonable job, so any list manager needs comparable capabilities.
Each user should be able to sort lists by whatever criteria they desire, independently from how the list was originally authored. My list is sorted by band, but year recorded is an equally reasonable sort criterion.
Besides raw sharing of lists, the first thing I want to do is be able to quickly compare two lists and quickly answer these questions:
- What percentage is in common and what percentage is different?
- Which songs are in common?
- Which songs are on my list and not some other list?
- Which songs are on some other list but not my list?
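The four questions above map fairly directly onto set operations. The following is a minimal sketch in Python, assuming each entry has already been resolved to an agreed (band, song) pair and assuming the percentages are measured against the combined list; the `compare` function and its result keys are hypothetical names, not an existing tool.

```python
def compare(mine: set, theirs: set) -> dict:
    """Compare two lists whose entries are (band, song) tuples."""
    common = mine & theirs
    combined = mine | theirs
    # Assumption: "percentage in common" is relative to the combined list.
    pct_common = 100 * len(common) / len(combined) if combined else 0.0
    return {
        "percent_common": pct_common,
        "percent_different": 100 - pct_common if combined else 0.0,
        "in_common": common,
        "only_mine": mine - theirs,
        "only_theirs": theirs - mine,
    }


mine = {
    ("Queen", "Under Pressure"),
    ("Marc Cohn", "Walking in Memphis"),
    ("Looking Glass", "Brandy (You're a Fine Girl)"),
}
theirs = {
    ("Queen", "Under Pressure"),
    ("Marc Cohn", "Walking in Memphis"),
    ("Elvis Presley", "Kentucky Rain"),
}

result = compare(mine, theirs)
print(result["percent_common"])
print(result["only_mine"])
```

Measuring against either list alone instead of the combined list would be just as defensible; the sketch simply picks one definition so the numbers are unambiguous.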
I’d also like to be able to specify a list of users and treat their separate lists as one combined list. But how to combine those lists is an open question. Give me merge options such as:
- The union of all songs on all lists
- Only songs that are on at least two (or three) of the lists
- Only songs that are on at least half the lists
- Only songs that are on all but two or three of the lists
- Only songs that are on at least 25% of the lists
- Only songs that are on at least 75% of the lists
- List subgroups of the users based on how much commonality they have
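Most of the merge options above reduce to one rule: keep a song if it appears on at least some minimum number of lists. Here is a hedged sketch in Python under that assumption; `merge_lists` and `count_for_percent` are hypothetical helper names, and grouping users by commonality is left out.

```python
from collections import Counter


def merge_lists(lists, min_count=1):
    """Keep every song that appears on at least min_count of the lists."""
    # set(lst) ensures a song repeated within one list is counted once.
    counts = Counter(song for lst in lists for song in set(lst))
    return {song for song, n in counts.items() if n >= min_count}


def count_for_percent(n_lists, percent):
    """Smallest number of lists that is at least `percent` percent of n_lists."""
    return max(1, (n_lists * percent + 99) // 100)  # integer ceiling


lists = [
    ["Kentucky Rain", "Under Pressure", "One Tin Soldier"],
    ["Under Pressure", "One Tin Soldier"],
    ["One Tin Soldier", "Walking in Memphis"],
    ["One Tin Soldier"],
]
n = len(lists)

union = merge_lists(lists)                                   # on any list
at_least_two = merge_lists(lists, 2)                         # on two or more
at_least_half = merge_lists(lists, count_for_percent(n, 50))
at_least_75 = merge_lists(lists, count_for_percent(n, 75))
all_but_two = merge_lists(lists, max(1, n - 2))

print(sorted(at_least_two))
```

With a single threshold parameter, an affinity group only has to agree on one number (or one percentage) to pick its merge rule.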
Beyond simply referring to the target lists alone, I should be able to refer to the combination of my list with all of the target lists, or with groups of them.
Users should be able to join affinity groups and have their personal lists merged by jointly agreed rules, chosen from the above.
A user should be able to re-base their own list on some target list. This would effectively eliminate the common entries from the user’s list so that their own list would simply record the differences from the target list: entries to add because they are on the user’s list but not the target list, and entries to delete because they are on the target list but not the user’s. The user could then periodically check their list, see what entries the target may have added that they are missing out on, and then add them as their own, either selectively or in bulk. They could even add in bulk and then selectively remove them.
There would be a distinction here between the visible list for a user and the actual underlying list so that the underlying entries could always be retrieved even if the target list went away or was radically changed in some intolerable manner.
In essence, the lists should be a set or sequence of rules and operations that have the effect of producing a final visible list but are capable of dynamically changing as the underlying target lists change.
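One way to picture re-basing is to store a list as its differences from the target plus a private snapshot. The sketch below is only an illustration under assumptions of my own: the `RebasedList` class, its field names, and the choice to let the visible list follow the target automatically are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RebasedList:
    added: set = field(default_factory=set)     # mine, but not on the target
    removed: set = field(default_factory=set)   # on the target, but not mine
    snapshot: set = field(default_factory=set)  # full copy kept as a fallback

    @classmethod
    def rebase(cls, mine: set, target: set) -> "RebasedList":
        """Record my list as differences from the target list."""
        return cls(added=mine - target, removed=target - mine, snapshot=set(mine))

    def visible(self, target=None) -> set:
        """Apply the stored rules to the current target list.

        If the target has gone away (None), fall back to the snapshot.
        """
        if target is None:
            return set(self.snapshot)
        return (target - self.removed) | self.added

    def new_in_target(self, target: set) -> set:
        """Entries the target has gained since I re-based."""
        return target - self.removed - self.snapshot


mine = {"Kentucky Rain", "Under Pressure", "One Tin Soldier"}
target = {"Under Pressure", "One Tin Soldier", "Walking in Memphis"}

my_list = RebasedList.rebase(mine, target)
grown = target | {"Come on Eileen"}

print(my_list.visible(target) == mine)
print(my_list.new_in_target(grown))
```

A stricter variant could leave new target entries out of the visible list until the user accepts them, which is closer to the "check periodically, then add selectively or in bulk" workflow.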
And of course there are privacy and security requirements as with any data, but I am not adding to such normal requirements here.
It would be nice to register an interest in somebody’s list and then get an email or app alert when that list changes. And when alerts are set on multiple lists of the same type, the alert should merge the changes into a single alert, but detail which list each change came from.
Social Media Integration
Beyond just the raw lists, there are obvious opportunities for building social media communities, whether through simply making comments on lists and entries in lists, voting on favorites, or any of the other popular techniques used by the popular social media platforms of today.
And I would hope that platforms such as YouTube would want to access and exploit these lists as well.
Managing and Viewing my Lists
The initial requirement is to specify the key fields for each list entry, such as band and song name, or maybe year.
A key requirement is data validation. Immediately upon entering a value in a field the software should be able to validate the input. This should include spelling correction and auto-suggest so that the user can instantly select the desired data. For example, keying “beat” in the band column will highlight “The Beatles” as a top choice. Ditto for song names.
Heuristics will be needed for matching, both for simple partial literal matches and fuzzy and phonetic matches, such as “Dixie’s” and “Dexys.”
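As a rough sketch of how those three kinds of matching might be layered, the Python below tries partial literal matches first, then fuzzy matches using the standard library's `difflib`, then a simplified Soundex code for phonetic matches. The `KNOWN_BANDS` list, the `suggest` function, and the ranking order are assumptions for illustration, and Soundex is only one of several phonetic schemes that could be swapped in.

```python
import difflib
import re

# Hypothetical stand-in for an authoritative source of band names.
KNOWN_BANDS = ["The Beatles", "The Byrds", "Dexys Midnight Runners",
               "The Left Banke", "Three Dog Night"]

_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Simplified Soundex: first letter plus up to three consonant codes."""
    letters = [c for c in word.casefold() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":          # h and w do not separate repeated codes
            continue
        code = _CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code            # vowels reset prev, so a repeat counts again
    return (result + "000")[:4]


def normalize(name: str) -> str:
    words = re.sub(r"[^a-z0-9 ]", "", name.casefold()).split()
    return " ".join(words[1:] if words[:1] == ["the"] else words)


def suggest(text: str, known=KNOWN_BANDS, limit=3) -> list:
    """Rank candidates: partial literal, then fuzzy, then phonetic matches."""
    query = normalize(text)
    if not query:
        return []
    by_norm = {normalize(name): name for name in known}
    partial = [orig for norm, orig in by_norm.items() if query in norm]
    fuzzy = [by_norm[m] for m in difflib.get_close_matches(query, list(by_norm), n=limit)]
    sounds = {soundex(w) for w in query.split()}
    phonetic = [orig for norm, orig in by_norm.items()
                if sounds & {soundex(w) for w in norm.split()}]
    ranked = []
    for name in partial + fuzzy + phonetic:
        if name not in ranked:
            ranked.append(name)
    return ranked[:limit]


print(suggest("beat"))
print(suggest("Dixie's Runners"))
```

In this sketch “Dixie’s” and “Dexys” share the same Soundex code, which is what lets the misheard name still reach the right band.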
Raw validation will be required as well, such as when an existing list is imported from a spreadsheet or a CSV file. An interactive popup can then prompt the user as to exactly which entry is desired.
All of this means that each column of data must be semantically linked to an authoritative data source. In my case, band names and song names. To the best of my knowledge, despite the breadth and depth of detail in Wikipedia and YouTube, there are no publicly available sources for band names and song names.
Once the user’s key columns have been populated and validated, it should be an easier problem to “join” the user’s list with any number of data sources to produce displays at any level of detail. For example, the user could select that they want “genre” for each song. The genre should be available either directly in the Wikipedia entry or in some more technical raw data source that uses song name as a key.
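A small sketch of that kind of join, in Python, might look like the following. The `DETAILS` table is a hypothetical stand-in for an external data source, the `join` and `key` helpers are made-up names, and release year is used as the example field instead of genre only because it is easy to verify.

```python
def key(band: str, title: str) -> tuple:
    """Assumed join key: case-insensitive (band, song) pair."""
    return (band.casefold(), title.casefold())


# Hypothetical external data source, keyed the same way as the user's list.
DETAILS = {
    key("Elvis Presley", "Kentucky Rain"): {"year": 1970},
    key("Queen", "Under Pressure"): {"year": 1981},
    key("Marc Cohn", "Walking in Memphis"): {"year": 1991},
}


def join(rows, details, fields):
    """Attach the requested detail fields to each (band, song) row.

    Fields that the data source does not know are left as None.
    """
    joined = []
    for band, title in rows:
        extra = details.get(key(band, title), {})
        row = {"band": band, "song": title}
        row.update({f: extra.get(f) for f in fields})
        joined.append(row)
    return joined


my_rows = [("Marc Cohn", "Walking in Memphis"), ("The Original Caste", "One Tin Soldier")]
detailed = join(my_rows, DETAILS, ["year"])
print(detailed)
```

The user's list stays just two columns; the extra detail is looked up at display time rather than typed in.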
This raises the question of what the canonical key should be for any given entity. I would still go by song name, but in some or many cases it might be necessary to map the external value (name) to some internal, ID-like key value. I mean, each of the many data columns for entities like songs should not need the full canonical song name. Or, maybe they should. The Wikipedia URL for a song page might be a reasonable compromise for the resource ID for a song.
I’m still torn over whether my song list should be simply the song or performances of the song. Ultimately, I can see that individuals might want to do either, and compare and merge operations should be able to accommodate both. For example, one person could create a list of songs by name alone and then wish to pick up live performances from the lists of others.
Guess What Kind of List This Is
A naive solution would require that the user declare what kind of list they are creating — its category. But I’d prefer an intelligent solution that lets me just start typing in names and phrases and statistically guesses the semantic type of my entries. This semantic guessing should occur for both the list overall and each column or field of the list.
Sure, there may be semantic ambiguity, but that can be readily resolved by just letting the user select from a list of the alternatives or to simply wait for more data that resolves the ambiguity. The user could of course manually resolve the ambiguity at any moment of their choosing.
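The statistical guessing could start out as simply as counting how many cells of a column are recognized as each semantic type. The Python sketch below assumes tiny hand-made vocabularies (`VOCAB`), a made-up threshold, and hypothetical function names; a real implementation would draw on large data sources and smarter statistics.

```python
# Hypothetical vocabularies; note that some names belong to more than one type.
VOCAB = {
    "performer": {"elvis presley", "david bowie", "queen",
                  "the beatles", "three dog night"},
    "person": {"elvis presley", "david bowie", "marc cohn", "freddie mercury"},
    "song": {"under pressure", "kentucky rain", "one tin soldier"},
}


def type_scores(values, vocab=VOCAB) -> dict:
    """Share of non-empty cells recognized as each semantic type."""
    cells = [v.strip().casefold() for v in values if v.strip()]
    if not cells:
        return {}
    return {t: sum(c in names for c in cells) / len(cells)
            for t, names in vocab.items()}


def guess_type(values, vocab=VOCAB, threshold=0.6):
    """Return the one confident type, or None while the guess is ambiguous."""
    scores = type_scores(values, vocab)
    confident = [t for t, share in scores.items() if share >= threshold]
    return confident[0] if len(confident) == 1 else None


column = ["Elvis Presley", "David Bowie"]
print(guess_type(column))   # still ambiguous: performer or person?

column += ["Queen", "The Beatles", "Three Dog Night"]
print(guess_type(column))   # more data resolves the ambiguity
```

When `guess_type` returns `None`, the candidates from `type_scores` are exactly the short list of alternatives the user could be asked to choose from.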
The main thing I want to avoid is turning this into a stifling knowledge engineering exercise. I want the experience to be more like Excel and less like SQL.
Why not just use Excel?
Sure, Excel is an easy-to-use tool and a great stopgap measure, but it is lacking in the core capabilities of a decent list manager:
- No semantic knowledge of the type of list or types of columns
- No collaborative features
- No social media features
- Nothing in the way of intelligence
- Nothing to help you create an initial list or expand your horizons
In short, Excel is just too dumb. It’s a great tool for what it does — managing dumb lists and dumb formulas, but dumb just doesn’t cut it for a decent list manager.
Entity Extraction in a Box
I didn’t want to get too technical here, but did want to point out that a lot of the intelligence for the list manager comes from being able to do what the technical experts call entity extraction on the simple strings of text that a user enters for the value of a column or field of an entry in a list. This primarily means recognizing names of people, places, and things, as well as concepts in general. Ultimately, the value in each cell of the list is a reference to some entity. The literal value entered by the user may be a proper, full name of an entity or a nickname, abbreviation, or characterization that happens to narrow the possible entities to a reasonably small list.
Ideally, entity extraction would be done on a single cell (column of a row) of the list, but a fair amount of the impressive intelligence of the list manager will come from correlating between multiple cells of each row of the list, my song name and band name being an example.
Searchable text and queryable databases
A key goal of the list manager proposed by this paper is that it is a seamless hybrid of text and databases.
It should be just as easy to browse and search a list using simple text keywords as it is to construct formal SQL queries.
As with a basic spreadsheet application or table in a web page, the user should be trivially able to sort a list by any column or field.
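To illustrate the text side of that hybrid, here is a minimal Python sketch of keyword search across every field of a list, with optional sorting by any column. The `ROWS` data and the `search` function are hypothetical, and the matching is deliberately naive substring matching.

```python
ROWS = [
    {"band": "Queen", "song": "Under Pressure", "year": 1981},
    {"band": "Marc Cohn", "song": "Walking in Memphis", "year": 1991},
    {"band": "Elvis Presley", "song": "Kentucky Rain", "year": 1970},
]


def search(rows, query, sort_by=None):
    """Return rows where every keyword appears somewhere in some field."""
    words = query.casefold().split()
    hits = [
        row for row in rows
        if all(any(w in str(value).casefold() for value in row.values())
               for w in words)
    ]
    if sort_by is not None:
        hits = sorted(hits, key=lambda row: row[sort_by])
    return hits


print(search(ROWS, "elvis rain"))
print([row["song"] for row in search(ROWS, "19", sort_by="year")])
```

The user never names a column or writes a query; the keywords are simply tried against everything in the row.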
A list could of course be easily exported to a spreadsheet or inserted into a database table. However, a key design goal is that the vast majority of users should never have to do so, since many of the features of a spreadsheet and database should already be more easily available in the list manager.
A taxonomy is a fairly sophisticated form of list, especially with its sense of hierarchy, but is a list nonetheless.
The proposal of this paper would be a natural fit for taxonomies as well.
An ontology is not so clearly an obvious fit for the list manager proposed in this paper, but there may well be a relatively natural extension from basic lists to the world of ontologies, especially once the hierarchical needs of taxonomies are incorporated.
It was not my intent to completely specify all detail of requirements for a decent list manager, but simply to provide an overall outline and enough hints to inspire somebody else (or a bunch of somebody elses) to flesh out the detail and proceed to implementations that I and everybody else can actually use.
And it was not my goal to delve into exactly what types of technology could or should be used for implementation. Maybe the Semantic Web and RDF would be appropriate… or maybe not.