Decentralized search engine for the decentralized web

Rationale

We propose an implementation of a decentralized search engine for the Massa decentralized web.

How would it work

When uploading a website, users could be asked to provide a title, a description, and a limited number of keywords as metadata (see https://forum.massa.community/t/community-feedback-proposal-for-improving-website-storage-on-deweb).

The search engine would work through a submission process: the creator of a website calls the search engine smart contract to declare their website address. This could also be offered as a bundled option when uploading a site.

When scanning a site, the search engine does the following:

  • read the title and keywords from the website metadata
  • if the website is already indexed, clear the indexes related to it
  • clean up the keywords:
    • canonicalize (lowercase, trim and other normalizations)
    • remove invisible/control characters, whitespace, etc.
    • remove keywords that are too short or too long
    • ignore common keywords (e.g. “the”)
  • insert each keyword in an index with the following datastore entry: [KEYWORD_TAG][keyword][address] -> (no value)
  • also insert them into a local store: [STORE_TAG][address][keyword] -> (no value). This keeps a snapshot of what was indexed, so that the old index entries can still be found and updated if the keywords later change on the original site (for example when a mutable website is updated by its creator)
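
The scan steps above could be sketched roughly as follows. This is only an illustration in Python, with a plain dict standing in for the smart contract datastore. The tag values, the length limits, the stop-word list and the `\x00` separator between keyword and address are all assumptions made for the example (the separator keeps a keyword from running into the address that follows it); none of them are part of the proposal.

```python
import unicodedata

# Illustrative assumptions: tags, separator, limits and stop words are made up.
KEYWORD_TAG = b"K"
STORE_TAG = b"S"
SEP = b"\x00"  # cannot appear in a keyword, since control characters are stripped
MIN_LEN, MAX_LEN, MAX_KEYWORDS = 2, 32, 10
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}


def clean_keywords(raw_keywords):
    """Canonicalize, filter and deduplicate keywords, preserving their order."""
    cleaned = []
    for raw in raw_keywords:
        kw = unicodedata.normalize("NFKC", raw).lower()
        # Drop whitespace and control/format characters (Unicode categories "C*").
        kw = "".join(
            ch for ch in kw
            if not ch.isspace() and unicodedata.category(ch)[0] != "C"
        )
        if not (MIN_LEN <= len(kw) <= MAX_LEN):
            continue
        if kw in STOP_WORDS or kw in cleaned:
            continue
        cleaned.append(kw)
    return cleaned[:MAX_KEYWORDS]


def index_key(keyword, address):
    # [KEYWORD_TAG][keyword][address], with an assumed separator in between
    return KEYWORD_TAG + keyword.encode() + SEP + address.encode()


def store_key(address, keyword):
    # [STORE_TAG][address][keyword], the per-site snapshot
    return STORE_TAG + address.encode() + SEP + keyword.encode()


def scan_site(datastore, address, raw_keywords):
    """(Re)index one site; `datastore` is a dict standing in for the contract storage."""
    snapshot_prefix = STORE_TAG + address.encode() + SEP
    # Use the snapshot to find and clear the entries left by a previous scan.
    for key in [k for k in datastore if k.startswith(snapshot_prefix)]:
        old_keyword = key[len(snapshot_prefix):].decode()
        del datastore[index_key(old_keyword, address)]
        del datastore[key]
    for kw in clean_keywords(raw_keywords):
        datastore[index_key(kw, address)] = b""
        datastore[store_key(address, kw)] = b""
```

Calling `scan_site` a second time for the same address replaces the earlier entries, which is what the snapshot store is for: without it, the contract would have no cheap way to know which index keys belong to the site.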

When a user searches a string, do the following in the front-end of the search engine:

  • tokenize and clean up the words in the searched string
  • query the blockchain to find all keywords matching those words (note: prefix matching is also counted but with a lower matching score)
  • score, sort and display the results by matching rate
  • when displaying, put the title of the website, its description, any matching MNS, and a link to the website
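
As a rough sketch of the search steps above: the example below assumes the index keys are available as a sorted list of byte strings (mimicking an ordered key scan), laid out as `K` + keyword + `\x00` + address. That layout, the score weights and the function names are assumptions for illustration only. An exact keyword match scores higher than a prefix match, and scores are summed over the query words.

```python
from bisect import bisect_left

# Illustrative assumptions: key layout and score weights are made up.
KEYWORD_TAG = b"K"
SEP = b"\x00"
EXACT_SCORE, PREFIX_SCORE = 1.0, 0.5


def prefix_scan(sorted_keys, prefix):
    """Yield every key starting with `prefix`, as an ordered store range scan would."""
    i = bisect_left(sorted_keys, prefix)
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        yield sorted_keys[i]
        i += 1


def search(sorted_keys, query):
    """Return (address, score) pairs, best score first."""
    scores = {}
    words = {w for w in query.lower().split() if len(w) >= 2}
    for word in words:
        best = {}  # best score per address for this word
        for key in prefix_scan(sorted_keys, KEYWORD_TAG + word.encode()):
            keyword, _, address = key[len(KEYWORD_TAG):].partition(SEP)
            score = EXACT_SCORE if keyword == word.encode() else PREFIX_SCORE
            addr = address.decode()
            best[addr] = max(best.get(addr, 0.0), score)
        for addr, score in best.items():
            scores[addr] = scores.get(addr, 0.0) + score
    # Sort by descending score, then by address for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

The front-end would then fetch the title, description and any matching MNS for each returned address before displaying the results.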

Anyone can submit a website address to request a scan or rescan of that website, but nobody can cheat by supplying keywords directly, since they are read from the website’s metadata.

I agree that we do need a search engine, and I would suggest making it a collaboration with other crypto projects for added visibility.

I would say that we should also leverage AI, as it is the future of search engines.

In my opinion, it will also be difficult to find experts in this domain who can build a reliable and performant search engine.

For those reasons, I think the search engine should be a collaboration: we could ask existing, proven blockchain search engines to include Massa.

I love it. Maybe include something more sophisticated than prefix matching, like DTW.

To our knowledge, nobody has written a truly decentralized search engine. It’s quite a different beast from traditional ones. But if you have any contacts, feel free to invite them to this discussion.

“Maybe include something more sophisticated than prefix matching, like DTW.”

DTW can be implemented in the front-end to refine matching scores a posteriori. However, it is tricky to do in the query phase: the datastore is an ordered key-value store (RocksDB) that supports exact and prefix lookups but not fuzzy ones, unless the database stores something like all permutations of each keyword. That would make things quite heavy and complicated.

Any ideas?
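
For reference, the front-end refinement mentioned above could look something like the sketch below. It assumes DTW means dynamic time warping applied to the characters of two strings with a 0/1 mismatch cost, and that the candidates are (keyword, address) pairs already fetched by the prefix query. Both the cost function and the helper names are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two strings (0/1 character cost)."""
    inf = float("inf")
    # prev[j] holds the DTW cost of aligning the processed part of `a` with b[:j].
    prev = [0.0] + [inf] * len(b)
    for ca in a:
        cur = [inf]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else 1.0
            # Extend the cheapest of: step in a, step in b, step in both.
            cur.append(cost + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[len(b)]


def refine(query_word, candidates):
    """Re-rank (keyword, address) candidates by ascending DTW distance to the query."""
    return sorted(candidates, key=lambda c: (dtw_distance(query_word, c[0]), c))
```

One property worth keeping in mind: because warping lets a character align with several repeats of itself, "masa" and "massa" are at distance 0 under this cost, so doubled-letter typos are forgiven entirely. Whether that is desirable, or whether an edit distance would fit better, is an open design choice.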