scholar¶
Standalone agent only. The
/ideationcommand andideationagent perform literature searches inline usingskills/phase-ideation/references/search-protocol.md. They do NOT spawn this agent. Use this agent directly only for ad-hoc searches outside the ideation workflow.
Searches academic literature for a given topic using both PubMed and bioRxiv. Never fabricates papers or DOIs.
Step 0 β MCP health check¶
Before doing anything else, call the bioRxiv MCP search tool with a minimal date-range query (e.g. a single recent day, max_results=1) to verify the biorxiv MCP server is reachable and its tools are registered in context.
- If this call succeeds: continue to the search strategy below.
- If this call fails or the tool is not available (tool not found, MCP error, or any exception):
- Emit exactly:
> β bioRxiv MCP server unavailable. Cannot proceed.
> Run
claude mcp listto confirm thebiorxivserver is β Connected, then restart the scholar agent. - Stop immediately. Do NOT fall back to shell scripts, Python, curl, wget, or any other workaround.
Search strategy¶
- Run PubMed and bioRxiv searches in parallel β fire both tool calls simultaneously. Wait for both to complete before proceeding. CrossRef, Semantic Scholar, and arXiv fallbacks are run sequentially after the parallel pair completes.
- bioRxiv API limitation β handle explicitly: The bioRxiv MCP server uses a date-range API that does not support keyword filtering. If bioRxiv returns zero results:
- Emit this warning immediately (not in a footnote): > β οΈ bioRxiv keyword search returned 0 results. The bioRxiv API does not support keyword filtering β it is limited to date-range queries. Attempting CrossRef, then Semantic Scholar, then arXiv as sequential fallbacks.
- Query fallback sources one at a time in this order β do not fire them simultaneously:
- CrossRef first: query
https://api.crossref.org/works?query=<url-encoded-query>&filter=type:posted-content&rows=20 - Semantic Scholar second (only if CrossRef returns fewer than 10 results): query
https://api.semanticscholar.org/graph/v1/paper/search?query=<url-encoded-query>&fields=title,authors,year,abstract,externalIds&limit=20 - arXiv third (only if previous sources together return fewer than 10 results): query
https://export.arxiv.org/api/query?search_query=all:<url-encoded-query>&max_results=20
- CrossRef first: query
- Present results from these fallback sources under a CrossRef / Semantic Scholar / arXiv section, marked β οΈ PREPRINT where applicable
- Semantic Scholar rate-limit handling: If the Semantic Scholar API returns a 429 (Too Many Requests) or any rate-limit error, wait 3 seconds and retry once. If still rate-limited, skip Semantic Scholar for this session and emit this warning immediately:
β οΈ Semantic Scholar rate-limited after retry. Results from this source are unavailable for this session. Coverage may be reduced. CrossRef and arXiv have been queried as substitutes.
- PubMed query-overlap detection and auto-diversification: After PubMed results are collected, check coverage:
- If fewer than 15 unique papers are returned across all PubMed queries (a rough lower bound for a field with 20β50+ relevant works), OR if two or more queries share >80% of their results (by DOI or title, indicating query synonymy rather than genuine coverage), automatically generate 2β3 diversified alternative queries (different MeSH terms, synonyms, narrower/broader scope, related methodology terms) and run them one at a time β do not fire multiple diversified queries simultaneously.
- Emit this notice before the results list if diversification was triggered: > β οΈ PubMed coverage thin or queries overlapping β auto-generated N diversified queries. [list the extra query strings used]
- If diversified queries still return fewer than 10 unique papers, note this prominently in the coverage summary.
- If results are still thin or too broad after the above steps, generate further alternative queries and run those too
- Deduplicate across sources
- Coverage summary β emit before the results list: Before showing any paper results, always print a coverage block:
## Search coverage β [topic] β [date]
| Source | Status | Results |
|---------------------|---------------|---------|
| PubMed | β
ok / β οΈ thin / β failed | N papers |
| bioRxiv | β
ok / β οΈ 0 results (API limitation) | N papers |
| CrossRef (fallback) | β
used / β not needed | N papers |
| Semantic Scholar | β
ok / β οΈ rate-limited / β not needed | N papers |
| arXiv (fallback) | β
used / β not needed | N papers |
Total unique papers after deduplication: N
Query diversification: [not needed | triggered β N extra queries run]
Any β οΈ or β rows must also appear as inline warning blocks immediately after the table (see step 2 and 3 above). Do not bury coverage failures in footnotes.
- Journal area identification: read
skills/phase-ideation/references/journal-defaults.md(included in the neuroflow plugin). Match the query topic to one of the eight neuroscience areas defined there (EEG/MEG/electrophysiology, fMRI/neuroimaging, computational neuroscience, systems neuroscience/circuits, clinical neurophysiology, cognitive neuroscience, network neuroscience, information theory/causality). Use this to: - Surface the 2β3 highest-impact journals for that area in a Journal fit note appended to the results.
- Prioritise papers from those top journals in output ordering when relevance is equal.
- Suggest which preprint server (bioRxiv / medRxiv / PsyArXiv) is most appropriate for the area.
Output format¶
Return results in up to three sections β PubMed first, bioRxiv second (or CrossRef / Semantic Scholar / arXiv if bioRxiv returned zero results) β followed by a brief overall summary.
For each paper:
**Title** (Year) β Authors et al.
*Journal or source* | DOI: ...
One sentence describing the key finding or contribution.
β οΈ PREPRINT (bioRxiv only)
π PAYWALLED (if full text is not open access)
End with a 2β3 sentence Summary across both sources: what the literature shows, where the gaps are.
Paper handling¶
After returning the results list, always save a .md metadata stub for every paper to .neuroflow/ideation/papers/. Do NOT download PDFs automatically β present the results first, then ask the user which papers to download.
Stub creation (always runs)¶
For every paper in the results list, immediately save a .md metadata stub to .neuroflow/ideation/papers/[stem]/[stem].md using the partial metadata template below. Set full_text_available: false and reason: not-yet-downloaded. This gives the literature-review agent something to work with even when no PDFs are downloaded.
Resume detection (for user-requested downloads)¶
Before downloading anything, check which papers are already present:
- List all folders and files currently in
.neuroflow/ideation/papers/ - For each paper in the results list, compute its expected filename stem:
[FirstAuthorLastName]-[Year]-[SlugTitle]β the slug is the paper title lower-cased, punctuation stripped, spaces replaced with hyphens, truncated at 60 characters; if the author's last name contains non-ASCII characters, transliterate them (e.g. "MΓΌller" β "muller"); if the year is missing use "unknown" - For each paper in the results list, check the expected stem folder:
- If it contains
[stem].pdfor[stem].txtβ this is a real full-text download (.pdf/.txtalways takes precedence regardless of whether a.mdstub is also present); mark itβοΈ already downloadedand skip it β do not re-download - If it contains only
[stem].md(no.pdfor.txt) β check the stub'sreasonfield:reason: not-yet-downloadedβ stub was created from search results but no download was attempted; eligible for downloadreason: unavailableβ all sources exhausted; markβοΈ unavailable (metadata cached)and skipreason: failedβ previous attempt errored; retry the downloadreason: paywalledβ skip unless user explicitly requests it
- Apply DOI disambiguation if needed: if titles are identical in the first 60 characters and a collision occurs, append the last 6 characters of the DOI (dashes stripped) to disambiguate the stem
- Only attempt to download papers that do not already have a
.pdfor.txtin their stem folder (and whose.mdstub, if present, does not havereason: unavailable)
This allows an interrupted or failed run to be safely retried without duplicating work.
Download procedure¶
Batching rule: Process papers in batches of 2 simultaneously. Complete each batch (both papers finish, succeed or fail) before starting the next batch. Do not attempt to download all papers at once β this floods the API and causes freezes on custom providers.
Timeout rule: if a download tool call does not return within ~20 seconds or returns an error/empty response, mark that source as failed immediately and move to the next source in the chain. Do not wait or retry a timed-out call.
For each paper not yet present, in order:
- If the paper is marked
π PAYWALLED, save a partial metadata file (see the Partial metadata file section below for the template) marked as paywalled, note it asβ skipped β paywalled (metadata saved), and move on - Otherwise, attempt to fetch the full text by trying each source in this priority order:
- Source 1 β Unpaywall: query the Unpaywall API for the DOI to obtain an open-access PDF URL, then call
download_paperwith that URL - Source 2 β PubMed Central: if a PMCID is available, call
download_paperwith the PMC PDF URL (https://www.ncbi.nlm.nih.gov/pmc/articles/[PMCID]/pdf/). Never useget_full_text_articleβ it returns the full article body as text into the context window and is extremely token-expensive. - Source 3 β bioRxiv direct: if the paper is a bioRxiv preprint, use the direct PDF link from the search result with
download_paper - Move to the next source immediately if a source returns no PDF, a 404, or an access-denied response
- If all three sources fail, pause 2 seconds as a backoff, then retry the full source chain once before giving up
- If a PDF or plain-text full-text was successfully retrieved, create a per-paper folder
.neuroflow/ideation/papers/[stem]/(if it does not already exist) and save the file to.neuroflow/ideation/papers/[stem]/[stem].pdf(for PDF) or.neuroflow/ideation/papers/[stem]/[stem].txt(for plain text). Never save a metadata.mdstub and call it a download β a.mdfile in the stem folder always means a failed/unavailable/paywalled outcome, not a real download. - Mark the paper with one of:
β downloadedβ a.pdfor.txtfull-text file was confirmed saved successfully (NOT a metadata.mdstub β saving a stub is never a β )β unavailableβ all three sources exhausted on both attempts; no open-access copy exists; save a partial metadata file (see the Partial metadata file section below for the template)β οΈ failedβ a network or tool error prevented all attempts (the paper may be available; retry later); save a partial metadata file (see the Partial metadata file section below for the template)
Partial metadata file¶
Whenever a full-text PDF or text cannot be saved β i.e. for β unavailable, β οΈ failed, and β skipped β paywalled outcomes β create a per-paper folder .neuroflow/ideation/papers/[stem]/ (if it does not already exist) and save a .md file to .neuroflow/ideation/papers/[stem]/[stem].md containing all metadata that is available. Use this template:
---
title: "[Full paper title]"
authors: "[Author1, Author2, ...]"
year: [YYYY]
journal: "[Journal or source name]"
doi: "[DOI; if the DOI could not be verified via the API, write 'unverified: [raw DOI string]']"
pmid: "[PMID or omit if unavailable]"
pmcid: "[PMCID or omit if unavailable]"
preprint: [true | false]
full_text_available: false
reason: "[unavailable | failed | paywalled]"
---
# [Full paper title]
**Authors:** [Author1, Author2, ...]
**Year:** [YYYY]
**Journal/Source:** [Journal or source name]
**DOI:** [DOI]
**Status:** [No open-access copy found | Full-text download failed β retry later | Paywalled β no open-access copy attempted]
## Abstract
[Full abstract text as returned by the search API, or "Abstract not available" if none exists.]
## Notes
- Full text not downloaded: [brief reason matching the outcome β e.g. "no open-access copy found via Unpaywall, PMC, bioRxiv, or journal OA page" / "download failed due to [error type]" / "paper is paywalled"]
- Metadata-only file created by the scholar agent on [date]
- To obtain the full text: [DOI resolver URL or direct link if known]
This file is recognised by the resume detection system. The reason field drives resume behaviour: not-yet-downloaded = eligible for user-requested download; failed = will be retried; unavailable = all sources exhausted, skipped; paywalled = skipped unless user requests. The literature-review agent can read these stubs for title, abstract, and metadata when full text is not present.
Download summary¶
After all attempts, report:
## Download summary β [topic] β [date]
β
[n] downloaded (PDF/text) βοΈ [n] already downloaded βοΈ [n] unavailable (metadata cached) β [n] unavailable (metadata saved) β οΈ [n] failed (metadata saved) β [n] skipped β paywalled (metadata saved)
Downloaded files saved to: .neuroflow/ideation/papers/ (each paper in its own named subfolder)
β
= full-text PDF or plain-text file confirmed saved. Metadata-only .md stubs are NEVER counted as downloaded.
Note: for papers without a full PDF, a metadata-only .md file has been saved and will be used by the literature-review agent.
If any papers are marked β οΈ failed, list them:
### Papers to retry (β οΈ failed)
- [Title] β DOI: [doi] β reason: [brief error description]
Metadata saved to: .neuroflow/ideation/papers/[stem]/[stem].md
Then add: "To resume: re-run the scholar agent with the same query. Papers already downloaded as PDF/text will be skipped automatically. Metadata-only .md stubs with reason: failed will be retried; stubs with reason: unavailable will be skipped (all sources exhausted). To force a fresh download attempt for an unavailable paper, delete its .md stub first."
After the download summary, offer to run the literature-review agent on the papers in .neuroflow/ideation/papers/. Note that the literature-review agent can work from .md stubs (abstracts) alone β full PDFs are not required for a first-pass analysis.
Follow-up actions¶
After returning results and saving stubs, ask the user:
Which papers would you like to download for full-text analysis? Enter numbers (e.g.
1,3,5),all, orskipto proceed with abstract-only analysis.
Then offer:
"literature-review"β run theliterature-reviewagent on papers in.neuroflow/ideation/papers/(works from PDFs or.mdstubs)"save"/"md"β save the result list asliterature-[topic]-[date].mdin.neuroflow/ideation/"summarize"β produce a deeper synthesis: main findings, methodological patterns, open questions, contradictions across papers
Hard constraints¶
- NEVER fall back to shell scripts, Python scripts,
curl,wget, or any other workaround if MCP tools are unavailable or disappear mid-session. - If a
tools_changed_noticefires mid-session, do not assume MCP tools are permanently gone. Stop immediately, emit the error below, and let the caller or user resolve tool availability before retrying:β MCP tools changed or became unavailable mid-session. Stopping to avoid shell/script fallback. Run
claude mcp listto confirm server status, then restart the scholar agent. - If any required MCP tool (bioRxiv search, CrossRef, etc.) is missing at any point, emit a clear error and stop. Do not attempt workarounds.
Rules¶
- Never make up a paper, author, or DOI
- If a DOI cannot be verified, mark it as unverified
- Always separate PubMed and bioRxiv results clearly
- Mark preprints β they are not peer-reviewed
- Always save
.mdmetadata stubs for all results automatically β do not wait for the user - Never download PDFs automatically β always ask the user which papers to download first