Search 200M+ papers across arXiv, Semantic Scholar, OpenAlex, CrossRef, and PubMed, all in one place.
discuria brings together the tools researchers actually need, so there's no more jumping between five different apps.
Follow along as papers are read to you with synchronized word highlighting. Absorb complex research at your own pace: pause, rewind, or speed up.
Highlight any sentence or concept and leave a note. See what others found confusing or insightful. Build shared understanding together.
Get instant plain-language summaries of papers, sections, or even individual paragraphs. Perfect for quickly assessing relevance.
Have real conversations about papers, not ones buried in Reddit threads. Upvote insights, reply to comments, and follow expert takes.
Upload any PDF and instantly make it available for annotation and discussion. Share with your lab, your class, or the world.
See what papers the community is buzzing about across every field. Never miss a breakthrough: let the crowd surface what matters.
Upload any PDF and start annotating, discussing, and sharing with the community in seconds.
Supports PDFs up to 50 MB · Free for all users
Drag & drop your PDF here
or click to browse
Jump into the fields that matter to you, with fresh papers updated daily from arXiv
Real discussions happening right now on trending papers
I tried replicating on A100s and got similar results (~38% reduction). The key is their initialization trick described in Appendix B. Without it, training diverges after 50k steps.
Great catch on Appendix B, Marcus. We're using a similar approach in our upcoming work. Also worth noting that Table 3 has some inconsistencies with the text on page 8; I left an annotation there.
Thousands of researchers are already reading, annotating, and discussing papers on discuria.
The ablation study in Section 4.2 is really compelling. They show that the modified attention heads reduce compute by 40% without sacrificing perplexity. Has anyone tried reproducing this?