I have recently joined INSA Lyon to work on the European EEXCESS project. More specifically, I’ll be working on ensuring that the service gives the best privacy guarantees to its users (but that will be the topic of another post).
Finding content on the web
Before describing EEXCESS, let’s draw a picture of searching on the current web. The web is made of tonnes of content that we can search through using our favorite search engine. These engines do their best at putting forward the most relevant content based on only the couple of words we give them, while we expect the most from them. How do they do that? By trying to distinguish useful content from less useful content. How do you evaluate utility? Well, that’s where Google made a breakthrough in web searching more than a decade ago. For the science behind it you can read [acp short id=”Brin1998″ author=”Sergey Brin and Lawrence Page” media=”paper” title=”The Anatomy of a Large-Scale Hypertextual Web Search Engine” year=”1998″ /]. Basically, Google uses the PageRank algorithm to evaluate the “popularity” of a web page by, roughly speaking, “counting” the links between sites and pages: a page that many other pages link to, especially well-ranked ones, is considered more important.
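To make the “counting” idea a bit more concrete, here is a minimal sketch of a PageRank-style computation on a tiny, made-up web of four pages. It is only an illustration of the principle described in the paper, not what a real search engine runs: the graph `toy_web`, the function name `pagerank`, the damping value of 0.85, the number of iterations and the way pages without outgoing links are handled are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing each page's score along its links.

    `links` maps a page to the list of pages it links to (hypothetical data).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score, whatever its incoming links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without outgoing links spreads its score
                # evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# A made-up web: "c" receives links from three pages, "d" from none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy example the page that collects the most links (`c`) comes out on top, and the page nobody links to (`d`) ends up last, which is exactly the “popularity” intuition.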
So isn’t the problem solved, then? If everybody had the same information needs, if everybody was interested in popular documents, and if everybody understood how to find documents which may not have been put forward but could still be interesting, then it could be so. Now that’s a lot of ifs, and here we are only considering “document” searching, not “object” or “information” searching… for which I’ll have to write another post. As you have understood, the answer is far from a strong yes. First of all (even if it’s less and less true), what’s good for me isn’t necessarily good for you, and search engines tend to evaluate usefulness globally. That’s using absolute metrics where we should see things relatively. Also, popularity is interesting as long as you’re looking for popular stuff… When you’re looking for a precise and detailed answer, chances are that the most popular documents are not the ones you need. There are even theories out there trying to characterize what’s going on. Some may have heard of the long-tail problem. In recommendation (and search can be seen as a kind of recommendation where the user context is limited to a search query and links are used to rank items), items (documents, objects, whatever) tend to be distributed following a power law: a few items with very high popularity and lots of items without much feedback (which does not mean that they are not potentially very interesting for particular users). In some sense, we are faced with a system that works well for the masses but not for the specific.
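To give a feel for what a power-law distribution implies, here is a small sketch that builds an artificial catalogue whose popularity follows a Zipf-like law and measures how much of the attention the top 1% of items capture. The catalogue size, the exponent and the helper name `zipf_popularity` are assumptions made up for the illustration; they are not measurements from a real search engine or recommender.

```python
def zipf_popularity(n_items, exponent=1.0):
    """Return normalized popularity shares where the item at a given rank
    gets a weight proportional to 1 / rank**exponent (a Zipf-like power law)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical catalogue of 10,000 items, sorted from most to least popular.
popularity = zipf_popularity(10_000)

head_share = sum(popularity[:100])   # the 100 most popular items (top 1%)
tail_share = sum(popularity[100:])   # the remaining 9,900 items

print(f"top 1% of items: {head_share:.0%} of the attention")
print(f"other 99% of items: {tail_share:.0%} of the attention")
```

Under these assumptions, a hundred items take a bit more than half of the attention, while the remaining 9,900 items share the rest. Each item in the tail gets very little feedback, yet taken together the tail is far from negligible.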
So, what is EEXCESS and what are its goals?
That’s where we come to EEXCESS and its goals. Among the partners of EEXCESS are European content providers with high-quality content. Examples are museum object descriptions, precise economic data, scientific papers, etc. To this day, much of this content is not as visible as it could be to the users who could be interested in it. The objective of EEXCESS is to push this content to users when appropriate. To do so, the EEXCESS platform will provide application extensions that collect information about the user’s activity in order to better understand the user’s interests and information needs. The user’s profile (who the user is, what his interests are, etc.) and context (what document he is working on, what page he is browsing) will make it possible to provide documents of very high interest to him and relevant to his current need. This will be done through recommendation techniques rather than explicit search by the user: the system will take the initiative of suggesting content rather than having the user trigger a search for it.
User profiling? Isn’t that dangerous?
As is very often the case, technology is neither good nor bad… it all depends on what you do with it. The same goes for user profiling. There is a risk of having detailed information about a user in one place being accessed by untrustworthy people or organizations. But on the other hand, a better understanding of a user also means less strain in finding data appropriate for what each of us is doing. However, it is completely understandable that some people, in certain conditions, simply don’t want to or cannot take this risk. This is why privacy issues are of great concern. And that happens to be the job I’ll be working on within the EEXCESS project: allowing for different privacy policies and ensuring that the EEXCESS system is built in such a way that these policies are respected. To go even further, we are studying ways to make recommendations as good as if a complete user profile were used, but in such a way that only small, secured parts of the overall system have access to sensitive data. I won’t say it’s a tough job, but our goal will be to provide the best that is currently available, and maybe even to find new and better ways to do so 🙂
[acp display title=”References” /]