NoseSQL and SenseDB: New paradigms for crowdsourced databases

11 Nov 2011

Introductory note: Given the high risk of being scooped, I've decided to unveil my vision for the future of human computer interaction and crowdsourced databases.  In light of the impending explosion of research in this area, I have deviated from my plans to submit to a proper database conference and have instead chosen to publicly lay claim to these ideas in this post.  Because you'll wonder, I am serious about these ideas and believe there are interesting problems in this space.  I'd even entertain proposals for collaboration.

(Edit: To be clear, I'm kidding about getting scooped.  This isn't my day-to-day research, but I do like the idea.)

Current crowdsourced databases are incapable of answering three major classes of queries.  Database systems such as CrowdDB and Qurk leverage human-powered computation to answer queries that computers cannot typically answer, such as performing complex image classification or processing uncertain or underspecified queries.  These systems are generic but have thus far focused on processing known information about entities in the outside world.  However, to the best of my knowledge, crowdsourced databases have overlooked a large part of the human experience: our senses.  In the remainder of this post, I will outline crowdsourcing extensions that represent an improvement over existing databases: the ability to query over scents, tastes, and tactile sensations.

Olfaction, taste, and touch/texture sensors are immature and are relatively specialized. Computers cannot reliably answer a wide range of pressing questions about raw sensory input and our interactions with the physical world.  Electronic sensors can detect specialized inputs, such as chemical presence (e.g., explosives) and some flavors (e.g., selected features of wine) but, to the best of my knowledge, are not generally applicable or widely available.

Online databases can answer questions about particular sensing domains such as beer and food tasting and musical preferences.  These databases contain knowledge of high-level, narrowly-constrained semantic interpretations of the raw sensory data.  A beer rating is a condensation of multiple factors, many of which are reflections on the beer's taste, nose, and mouthfeel--but the raw taste, scent, and mouthfeel data is not available.

Operating on raw data allows greater query expressivity and insight than operating on a set of features describing the data. We can view preferences regarding senses as functions over the set of raw stimuli in the world.  Without sensory data, it is difficult to infer connections between sensations, such as why we like the taste of peanut butter, banana, and bacon in sandwiches, the smell of cucumber and Good & Plentys, and the seemingly culturally universal combination of heat and steam in a sweat lodge.  We cannot easily make connections between even somewhat similar domains.  For example, answering questions about beer and recipe pairings requires either additional cross-domain knowledge (a database of explicit beer and recipe pairings) or lower-level sensory data (what flavors are in each beer?) paired with filters on this data.  These solutions appear similar, but the latter scales to more domains without requiring additional external expert input.
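To make the second option concrete, here is a minimal sketch of pairing two domains through shared low-level flavor data rather than an explicit pairing table.  Everything in it is hypothetical: the flavor annotations are made up for illustration (in a real system they would come from the crowd), and "shares at least one flavor" is just one possible filter, not a claim about how pairings should actually be computed.

```python
# Hypothetical flavor annotations; in practice these would be crowd-supplied.
BEER_FLAVORS = {
    "stout": {"roast", "chocolate", "bitter"},
    "wheat beer": {"citrus", "banana", "clove"},
}
RECIPE_FLAVORS = {
    "chocolate cake": {"chocolate", "sweet"},
    "banana bread": {"banana", "sweet", "clove"},
}


def pairings(items_a, items_b, min_shared=1):
    """Return (a, b, shared_flavors) for every pair of items that
    shares at least `min_shared` flavors."""
    result = []
    for a, flavors_a in sorted(items_a.items()):
        for b, flavors_b in sorted(items_b.items()):
            shared = flavors_a & flavors_b
            if len(shared) >= min_shared:
                result.append((a, b, shared))
    return result


if __name__ == "__main__":
    for beer, recipe, shared in pairings(BEER_FLAVORS, RECIPE_FLAVORS):
        print(beer, "+", recipe, "share", sorted(shared))
```

Note that adding a third domain (say, cheese) only requires another flavor table; the same function pairs it with both beers and recipes, with no new expert-written pairing list.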

While computers are deficient at answering sense-based queries, thankfully (and by definition), most humans come complete with detectors for all five of our senses.  Employing humans to power general-purpose sensory databases is a natural extension of crowdsourcing technology.  Compared to a specialized mechanical solution such as a chemical-specific detector, a human crowd is more general and likely less expensive when answering a wide range of queries.  Similarly, humans can perform both lower-level sensory analysis and broader semantic-level comparisons than narrowly scoped online information aggregation sites support.  Accordingly, I propose the development of a crowdsourced sense-oriented database, SenseDB.  This database no doubt needs a query language for user-defined functions, which will consist of embedded DSLs for scent, taste, and touch queries: NoseSQL, FlavorSQL, and FeelSQL, respectively.
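As a rough illustration of what a NoseSQL-style user-defined function might look like when embedded in a host language, here is a small sketch.  None of this is a real API: `crowd_udf`, `select`, the simulated workers, and the "ground truth" table are all invented for the example, and majority voting is simply an assumed way to combine worker answers.

```python
from collections import Counter

# Hypothetical "true" answers, used only to simulate workers in this sketch.
SMELLS_FRESH = {"cucumber": True, "licorice candy": False, "bacon": False}


def careful_worker(question, item):
    """Simulated worker who reports the (hypothetical) true answer."""
    return SMELLS_FRESH[item]


def sloppy_worker(question, item):
    """Simulated worker who answers yes to everything."""
    return True


def crowd_udf(question, workers):
    """Build a predicate that asks every worker and returns the majority answer."""
    def udf(item):
        votes = Counter(worker(question, item) for worker in workers)
        return votes.most_common(1)[0][0]
    return udf


def select(items, where):
    """Tiny stand-in for SELECT ... WHERE over a list of items."""
    return [item for item in items if where(item)]


if __name__ == "__main__":
    # Use an odd number of workers so a yes/no vote cannot tie.
    smells_fresh = crowd_udf(
        "Does this smell fresh?",
        [careful_worker, careful_worker, sloppy_worker],
    )
    print(select(["cucumber", "licorice candy", "bacon"], where=smells_fresh))
```

In a real deployment the worker functions would be replaced by calls out to an actual crowd, which is where the transmission and encoding problems discussed in this post come in.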

Harnessing the power of human-powered sense-based query processing leads to several research questions:

Raw versus semantically-rich data. To what extent does encoding raw sensory data aid in query processing?  Does querying a (logical) database of taste, scent, and touch details provide higher accuracy, speed, or throughput than simply presenting the question to a crowd from a high semantic perspective?  Can we better re-use raw data between queries? Do semantically rich queries impact the bias of the results?

Encoding. How do we encode the sensory details required to answer a query? Which aspects of the sensory experience are required to answer a query? The degree of specificity in formulating a particular query limits the applicability of the results for future queries and analysis. There are many published ISO standards governing sensory analysis (including which tasting glasses to use with olive oil), but applying these standards to a general-purpose crowdsourced query processing system remains an open problem.

Transmission. Sights and sounds can be easily recorded and transmitted for processing, but we lack mechanisms for reliably communicating touch, taste, and smell stimuli.  One option is to use a crowd that is physically co-located with the set of objects to be queried, but this does not scale in the size of the set of objects or in the number of queries.

Non-human processing. Are humans the most efficient computation engine for sensory queries?  Can we humanely use canines or other macrobiotic organisms to process these queries instead?  How does the throughput of a non-human compute engine compare to a human compute engine?  What about queries per second, queries per dollar, or total cost of ownership? Both rats and pigs have been successfully employed in demining scenarios, but the generality of these mechanisms is unclear.

These challenges are only a subset of the problems inherent in developing a sense-oriented query database.  However, given the apparent advantages of SenseDB, I believe the database community will rise to the occasion and take crowdsourcing to the next level, providing valuable insights into the human condition along the way.

I would like to thank Joe Hellerstein, Mike Franklin, the BOOM team, and those explicitly not mentioned here for their feedback on these ideas.