Machine Learning (Theory)


Rexa is live

Rexa is now publicly available. Anyone can create an account and log in.

Rexa is similar to CiteSeer and Google Scholar in functionality, with more emphasis on the use of machine learning for intelligent information extraction. For example, Rexa can automatically display a picture on an author’s homepage when the author is searched for.

14 Comments to “Rexa is live”
  1. Andrew McCallum says:

    Here is more information about the present, future and positioning of Rexa.
    –Andrew McCallum (Rexa Project PI)

    Rexa is a digital library covering the computer science research literature and the people who create it. Rexa is a sibling to CiteSeer, Google Scholar, and the ACM Portal. Its chief enhancement is that Rexa knows about more first-class, de-duplicated, cross-referenced object types: not only papers and their citation links, but also people, grants, and topics, and in the future universities, conferences, journals, research communities, and more.

    Rexa currently provides:
    * Keyword search on over 7 million papers (mostly in computer science)
    * Cross-linked pages for papers, authors, topics and NSF grants
    * Browsing by citations, authors, co-authors, cited authors, citing authors;
    (find who cites you most by clicking “Citing authors” on your home page)
    * Web-2.0-style “tagging” to bookmark papers
    * Automatically gathered contact info and photos of authors’ faces
    * Analysis of research topics, their impact, and how they relate.

    Coming soon:
    * Much improved coverage of recent CS papers (it’s a little weak now)
    * Ability to make corrections to extracted data

    Coming later:
    * Improved extraction and co-reference accuracy
    * Much more data mining
    * Broader coverage of more research fields

    Rather than seeing our siblings as competitors, we believe that such services are like “newspapers for the research community”, and, just as it is tremendously important that there is not just one national newspaper, we think there should be many such services. This is especially true since increasingly they will do more than simply supply raw information, but also provide subjective analysis, pattern discovery, and predictions.

  2. jl says:

    I find the strong emphasis on information extraction useful independent of competitors-or-not.

    1. It’s valuable to push that technology further.
    2. It’s also inevitable that important decisions are made partially based upon the information extracted. Given that, the emphasis on robust information extraction could be broadly useful to all of computer science.
  3. Data Mining says:


    Rexa, which shares a pedigree with Cora from Just Systems, and which provides a similar product to CiteSeer and Google Scholar, is now live, according to John Langford’s Machine Learning blog. Andrew McCallum, whom I worked with back at WhizBang, […]

  4. Shane says:

    Without the ability for corrections it is pretty useless ATM.

  5. AM says:

    Rexa seems quite buggy and incomplete. Fetches wrong papers/pictures with typos in author names.

  6. Andrej Bauer says:

    It conflates me with several other “A. Bauer’s”, and on some papers it even RENAMED me to “Andreas Bauer”. I surely hope nobody makes important decisions based on this. It should say somewhere in big letters “STILL BETA TESTING–DO NOT TRUST”.

    People who open up such systems to the public are responsible for putting suitable and VISIBLE disclaimers on their pages.

  7. Charles says:

    Shane: A limited correction ability (to the BibTeX entry only) is available now. The ability to submit more complex corrections (for example, if your paper is misattributed) is something that we’re working very hard on right now. We hope that the current system is still useful to many.

    Andrej: I apologize that some of your papers have been misattributed. The automatic extraction and author merging performed by Rexa has accuracy in the 90% range, but inevitably there are errors. Our group has done a lot of research in author coreference (which is the problem of deciding which of the many “A. Bauers” cited across the literature refer to the same person), and some of this has been incorporated into Rexa, but there is much more work to be done. Author coreference is still an interesting and difficult research problem.

  8. Aaron says:

    Why does one need an account to use it? It seems like this will keep a lot of people from trying it out.

  9. Charles says:

    Aaron: First, some features don’t make sense without requiring a login. Right now, only tagging really requires it, but it will be useful or necessary for other features that we’re working on now (like corrections). Of course, we could allow search-only use without a login, and if I understand correctly, we are planning to do this. The second reason to require logins for now is that we’re trying to ramp up the number of users gradually, as our scalability and coverage improve.

  10. hal says:

    It would be awesome if there were an API to Rexa, especially if one could grok text versions of the papers from it!

  11. M.Shaw says:

    Can u pls send me downloads or rexa sites for evaluation, so that I become better informed??

  12. […] Here’s a blog post from the PI on this project, Andrew McCallum, who details the announcement, and a little more here, from Matthew Hurst’s Data Mining blog. […]

  13. lokesh says:

    Rexa is a good concept to search for research material. We are developing a tool for networking research communities around their research field. The tool complore favours sharing of research work and free access to people.

    You can register and access freely at

  14. akber says:

    yea i try it its going good and fast
