Abstract
This paper studies research allocation in economics: how to identify questions that are neglected enough to be open, supported enough to be credible, and concrete enough to become a paper. I build a claim graph from a field-weighted citation impact selected published-journal corpus of 242,595 papers spanning 1976-2026. Nodes are extracted concepts, while links summarize claim-like relations already observed in papers. Missing links become candidate next papers. I rank them with a graph-based score that combines path support, underexploration gaps, motif support, and hub penalties, and I evaluate the ranking prospectively using leakage-controlled vintage backtests over 3-, 5-, 10-, and 15-year horizons. The main benchmark is preferential attachment, a cumulative-advantage rule that favors links between already central concepts. At the strict shortlist benchmark, preferential attachment remains difficult to beat. But that tight-rank headline is too narrow to settle the research-allocation question. Once the reading budget widens, the graph-based rule becomes more competitive; weighting future links by later reuse changes the scale of the benchmark but does not overturn it; and the favorable cases are concentrated in adjacent journals, design-based causal slices, and path-rich parts of the literature. A complementary path-evolution audit suggests that the literature often adds mediating structure around existing direct claims more often than it closes a missing direct link already implied by local paths. Alongside the empirical exercise, the paper introduces frontiergraph.com as a public interface for browsing candidate questions and inspecting the graph structure behind them.
Keywords. research allocation; economics of science; knowledge graphs; preferential attachment; AI-assisted discovery
1. Introduction
Choosing what to work on is one of the least formalized decisions in economics. We have disciplined frameworks for identification, estimation, and inference, but much less for the upstream choice of which question deserves scarce attention in the first place. Bloom et al. (2020) argue that ideas are getting harder to find, while Jones (2009) emphasizes the growing knowledge burden faced by new researchers. Those arguments point in the same direction: the frontier becomes harder to navigate even as the stock of published work keeps growing.
That problem becomes sharper, not weaker, when AI lowers the cost of adjacent research tasks. Systems such as Project APE are explicitly testing whether parts of the paper-production cycle can be automated. Tools such as Refine, venue-level reviewer guidance such as the ICLR 2026 Reviewer Guide, and agentic review prototypes such as Stanford Agentic Reviewer all point toward the same institutional change: drafting, review assistance, and iterative revision are becoming cheaper. If production and review become less scarce, the bottleneck shifts toward a different margin. The question is no longer only how to write or review a paper more cheaply. It is how to decide where the next paper should go.
This paper studies that margin. The core idea is to treat candidate next papers as missing links in a graph of claim-like relations already present in the literature. Suppose the literature already contains links such as public debt -> public investment and public investment -> CO2 emissions, but the direct relation public debt -> CO2 emissions has not yet appeared. That missing direct link is a concrete candidate question. More generally, the object is not generic semantic similarity and not a free-form brainstorming system. It is a structured search for missing relations that look plausible given nearby literature and underworked given the current graph. In this paper, should is therefore used in a narrower operational sense than a welfare theorem: a question should be worked on next when it is neglected enough to remain open, supported enough to be credible, concrete enough to become a paper, and better read at realistic fixed attention budgets than at a winner-take-all top rank. This paper studies that margin and offers a tool to inspect it in practice at frontiergraph.com.
That graphical view is useful because it preserves more of the local logic of scientific development than keyword overlap or raw citation counts alone. It lets us see whether a putative question is supported by short paths, repeated motifs, and nearby mediating concepts, and whether the direct link itself still looks thin relative to its neighborhood. The framework is intentionally modest. It is a discovery aid, not proof of importance; a prospective ranking exercise, not a welfare theorem; and a graph of extracted claim relations, not a full adjudication of causal truth from complete papers.
The empirical design starts from a field-weighted citation impact selected corpus of top core and adjacent journals. The selected sample contains 242,595 papers from 1976-2026, of which 230,929 contain at least one extracted edge and 230,479 survive into the normalized graph used in evaluation. I build that graph from the paper-level extraction framework in Garg and Fetzer (2025), then distinguish between directed causal links and undirected contextual support inside a single graph object. Missing directed links are ranked by a graph-based score built from path support, underexploration gaps, motif support, and hub penalties. I then freeze the graph at year t-1, rank candidates, and test whether those links first appear over 3-, 5-, 10-, and 15-year horizons.
The headline result is mixed and therefore informative. Preferential attachment remains a serious benchmark and still wins in the pooled rolling benchmark at very tight shortlists. In concrete terms, a 100-paper shortlist built from preferential attachment retrieves roughly 2.6, 3.3, 7.0, and 10.0 more realized directed links than the graph score at h=3,5,10,15. But that is not the end of the story. Once the reading budget is allowed to expand beyond the strict top 100, the graph-based score becomes more competitive. The newer heterogeneity results also suggest that pooled averages hide meaningful variation across journals, methods, and parts of the literature. A separate path-evolution exercise points to a second pattern: research often builds mediating structure around existing direct claims more often than it closes a direct link already implied by local paths.
2. Related Literature and Positioning
This paper sits at the intersection of four literatures.
First, it belongs to the economics of ideas and discovery. Bloom et al. (2020) document rising research effort alongside falling research productivity across several domains, while Jones (2009) studies how accumulating knowledge changes the organization of innovative activity. The present paper shifts attention to a narrower but operationally central problem: given a large existing literature, how should one screen candidate next questions?
Second, the paper draws on the science-of-science literature that uses large-scale scientific data to study novelty, impact, and frontier formation. Fortunato et al. (2018) provide a broad synthesis. Wang and Barabasi (2021) show how scientific frontiers can be studied quantitatively, while Uzzi et al. (2013) show how novelty often combines conventional structure with a limited number of atypical combinations. That literature is highly relevant in spirit, but my object differs. I do not measure novelty from citations or reference-pair combinations. I define candidate next papers as missing links in a claim graph and evaluate them prospectively.
Third, the benchmark logic comes from network growth and cumulative advantage. Price (1976) and Barabasi and Albert (1999) show why already connected nodes tend to attract more links. For this project, that is the main null. If the future of the literature is mostly a popularity process, then a rich-get-richer rule should perform well when the target is future edge appearance.
Fourth, the paper enters a fast-moving literature and policy discussion around AI-assisted scientific work. Some recent systems aim at hypothesis generation, literature synthesis, or general scientific assistance. Others are already productized around manuscript feedback, reviewer assistance, or automated evaluation workflows. My contribution is not a new general-purpose AI assistant and not a new claim-extraction model. The paper uses AI-extracted paper-level structure as an enabling layer, then asks an economics question: can we convert that structure into an inspectable, prospectively testable research-allocation object?
3. Corpus, Paper-Local Extraction, and Node Normalization
The paper starts from a published-journal corpus rather than a broad scrape of all economics-adjacent writing. The selected journal corpus contains 242,595 papers drawn from the top 150 core economics journals and the top 150 adjacent journals under the field-weighted citation impact selection rule. The sample spans 1976-2026. Of those papers, 230,929 contain at least one extracted edge, yielding 1,443,407 raw extracted edges. After normalization and graph construction, the evaluation graph retains 230,479 papers, 6,752 concept codes, and 1,271,014 normalized links.
3.1 Corpus definition
The paper uses the published-journal corpus because the goal is to study realized scientific structure in an economics-facing literature, not to optimize coverage of working papers, drafts, or preprints. This choice is conservative. It sacrifices some freshness in exchange for clearer journal control, more stable metadata, and a graph that is easier to interpret as realized economics research rather than a noisy mix of partially filtered text. OpenAlex provides the work-level metadata, journal metadata, and journal assignments used to define that bibliographic layer.
flowchart LR
subgraph P1["Within papers"]
PA["Paper A"]
A1["Public debt"] --> A2["Public investment"]
PB["Paper B"]
B1["Public investment"] --> B2["CO2 emissions"]
end
P1 --> P2
subgraph P2["Across papers"]
C1["Public debt"] --> C2["Public investment"]
C2 --> C3["CO2 emissions"]
C4["Repeated labels become<br/>shared concepts"]
end
P2 --> P3
subgraph P3["Candidate question"]
D1["Public debt"] --> D2["Public investment"]
D2 --> D3["CO2 emissions"]
D1 -. missing direct link .-> D3
end
Notes. Figure 1 shows the core measurement pipeline as a worked example. The unit of observation at extraction is the individual paper title and abstract. Each paper first produces a paper-local graph, repeated concept labels are then matched into shared concept identities across papers, and the resulting concept-level structure can surface a missing direct relation as a candidate question.
3.2 Paper-local research graphs
The extraction layer builds on Garg and Fetzer (2025). Each title and abstract is converted into a paper-local graph in which nodes correspond to extracted concepts and edges summarize the relations the paper itself states, studies, or reports. The present paper inherits that idea but extends it in three ways that matter downstream. First, the schema is broader than explicit causal claims, because the evaluation also needs undirected contextual support. Second, the schema separates the paper’s causal presentation from the evidence method used to support a claim. Third, the local graph stores contextual qualifiers in dedicated fields rather than forcing them into the node label. Exact prompts, the full schema, and the design logic are reported in Appendix A. The code and prompt files used in this paper will be released at github.com/[repository-to-be-inserted-at-release].
3.3 Concept identity and node normalization
The normalization problem is central in this paper because candidate generation, path counts, gap measures, and missingness all depend on node identity. Paper-local concept strings vary in wording, scope, and granularity. The paper therefore builds a native concept ontology and combines deterministic lexical matching with an embedding-based retrieval step for harder cases, while preserving mapping provenance and quality bands. In practice, the embedding layer is used only after exact and signature-based passes, so it helps rank plausible matches rather than silently forcing all strings together. Appendix B gives the full algorithmic detail.
| Symbol | Definition |
|---|---|
G_(t-1)=(V,E_(t-1)) | Claim graph assembled from papers observed through year t-1. |
u,v,w | Normalized concept nodes in the ontology-backed graph. |
u -> v | Directed causal link or directed causal candidate. |
{u,v} | Undirected noncausal pair. |
h | Evaluation horizon in years. |
K | Shortlist size in the fixed-budget retrieval problem. |
4. Candidate Questions and Evaluation Design
The framework is simple. At year t-1, let G_(t-1)=(V,E_(t-1)) denote the claim graph assembled from papers observed through that date. A candidate next paper is represented as a missing link in that graph. For a directed causal candidate, u -> v is eligible when that ordered directed link has not yet appeared in the historical graph. For an undirected noncausal candidate, {u,v} is eligible when the pair has not yet appeared as undirected support. The headline object in this paper is the directed causal candidate.
4.1 Missing links as candidate questions
One way to read the novelty and frontier literatures is that many advances come from combinations or connections that were not yet explicit in the recorded structure of a field. The representation used here takes that intuition in a narrow form. The point is not that every paper can be reduced to one edge. The point is that many research moves can be approximated as the appearance of a relation that was locally plausible before it was explicit. That lets us define a narrow, prospectively testable object.
4.2 Gap and boundary questions
Gap questions already have rich nearby support but remain directly underworked. Boundary questions connect areas that still have little direct traffic between them.
flowchart LR
subgraph G1["Gap-like question"]
PD["Public debt"] --> PI["Public investment"]
PI --> CO2["CO2 emissions"]
PD --> FD["Financial development"]
FD --> CO2
PD -. missing direct link .-> CO2
end
subgraph G2["Boundary-like question"]
TL["Trade liberalisation"] --> EG["Economic growth"]
TL --> FDI["FDI"]
EG --> EC["Energy consumption"]
TL -. thinner missing bridge .-> EC
end
Notes. Figure 2 contrasts two candidate objects. The left panel is a local gap: nearby support is already dense, but the direct relation remains missing. The right panel is a thinner boundary question: the two end concepts are connected only by sparse bridges. The node labels are lightly cleaned versions of surfaced economics-facing examples.
flowchart LR
subgraph L["path -> direct"]
L1["Public debt"] --> L2["Public investment"]
L2 --> L3["CO2 emissions"]
L1 -. missing direct link .-> L3
LX["Local path first;<br/>missing direct link second."]
end
subgraph R["direct -> path"]
R1["Monetary policy"] --> R3["Energy consumption"]
R1 --> R2["Output"]
R2 --> R3
RX["Direct link first;<br/>mediator path later."]
end
Notes. Figure 3 fixes the core candidate object. A local path such as u -> w -> v can nominate a missing direct link u -> v as a candidate next paper. But later work can also add mediator paths around an existing direct relation rather than closing a missing direct link. The main backtest focuses on the left-hand object; the later path-evolution section returns to the right-hand pattern.
4.3 How the score reads the graph
The ranking rule combines four ingredients: path support, underexploration gap, motif support, and hub penalty. Path support asks whether the two endpoint concepts are already connected by short routes through nearby mediators. The gap term asks whether those routes exist even though the direct relation itself is still absent or thin. Motif support asks whether the same endpoint pair keeps reappearing inside nearby structural patterns rather than in only one fragile corner of the graph. The hub penalty moves in the opposite direction: it reduces scores that are high only because both endpoints are extremely generic, heavily connected concepts.
flowchart TB
subgraph A["Path support"]
A1["Debt"] --> A2["Invest."]
A2 --> A3["CO2"]
AT["Short routes already<br/>connect the endpoints."]
end
subgraph B["Underexploration gap"]
B1["Debt"] --> B2["Invest."]
B2 --> B3["CO2"]
B1 -. missing .-> B3
BT["The direct link is still<br/>missing despite support."]
end
subgraph C["Motif support"]
C1["Debt"] --> C2["Invest."]
C2 --> C4["CO2"]
C1 --> C3["Finance"]
C3 --> C4
CT["More than one local<br/>pattern points the same way."]
end
subgraph D["Hub penalty"]
D1["Output"] -. discounted .-> D2["Growth"]
DT["Very generic hubs should not<br/>dominate every shortlist."]
end
Notes. Figure 4 breaks the score into four local graph features. Path support asks whether short routes already connect the endpoints. The underexploration gap asks whether the direct relation is still missing despite that support. Motif support asks whether more than one nearby pattern points toward the same endpoint pair. The hub penalty discounts pairs that would rank highly only because both concepts are very generic and heavily connected.
In the implementation used for this paper,
s(u,v)=alpha * P_tilde(u,v) + beta * G(u,v) + gamma * M_tilde(u,v) - delta * H_tilde(u,v).
4.4 Prospective evaluation
The prospective design freezes the graph at year t-1, ranks candidates using only information available at that date, and then checks whether those links first appear over the evaluation horizon. In other words, the score at t-1 is not allowed to borrow future edges, future degrees, or later realized papers. A cutoff is eligible for horizon h only if t+h <= 2026.
Preferential attachment as benchmark. Preferential attachment scores a candidate ordered pair by source out-degree times target in-degree: PA(u,v)=d_out(u) * d_in(v).
Fixed-budget retrieval. The evaluation problem is a scarce-attention problem. That is why the paper emphasizes Recall@100, other fixed-budget shortlist measures, and frontier-style comparisons over larger K.
Horizon choice. The main horizons are 3, 5, 10, and 15 years because they correspond to distinct practical windows. Twenty years remains an appendix extension.
5. What the Benchmark Shows
5.1 Popularity at the strict shortlist
Preferential attachment remains stronger than the graph-based score at the strict top-100 margin across all four main horizons. The easiest way to read the magnitude is in future links captured inside a 100-paper shortlist: preferential attachment places about 8.3, 12.0, 23.3, and 36.3 future directed links inside the top 100 at h=3,5,10,15, while the graph-based score places about 5.7, 8.7, 16.3, and 26.3. So the popularity benchmark buys roughly 2.6, 3.3, 7.0, and 10.0 extra realized directed links inside the same 100-paper reading list. Put differently, preferential attachment retrieves roughly 40 percent more realized directed links than the graph score, depending on the horizon. The normalized Recall@100 and MRR statistics tell the same story.

Notes. The left panel reports Recall@100 and the right panel reports mean reciprocal rank. Each bar is the mean across eligible rolling cutoffs for a given horizon, with bootstrap confidence intervals. Higher values mean better prospective ranking performance. For readers who prefer a more concrete scale, the corresponding mean hits inside the top-100 shortlist are about 5.7, 8.7, 16.3, and 26.3 for the graph score and 8.3, 12.0, 23.3, and 36.3 for preferential attachment across h=3,5,10,15.
The small values are real, but they are not trivial. On average, the future contains about 2,955 realized directed links at h=3, 4,994 at h=5, 13,221 at h=10, and 29,809 at h=15. So the top-100 shortlist is being asked to recover a small fraction of a very large future stock.
5.2 The attention-allocation frontier
The first way to move from prediction toward allocation is to relax the top-100 bottleneck. Economists rarely consume one candidate suggestion and stop. They browse shortlists. The attention-allocation outputs therefore ask what happens when the screening budget expands from K=50 to K=1000. I summarize that margin using future links per 100 suggestions, which is just the shortlist precision rescaled into a more readable unit.
The result is again mixed but informative. At h=3, preferential attachment places about 11.0 future links per 100 suggestions at K=100, compared with 5.75 for the graph score. By K=1000, the two rules are essentially tied in practical terms: preferential attachment yields about 4.25 future links per 100 while the graph score yields about 4.70. The same pattern appears at h=5: the gap is 14.75 versus 8.25 at K=100, but 6.83 versus 7.10 by K=1000. At h=10, the tight-budget gap remains larger, yet even there the frontier narrows substantially, from 27.5 versus 17.75 at K=100 to 13.6 versus 13.5 by K=1000.

Notes. Each panel reports mean future links per 100 surfaced suggestions as the shortlist expands from K=50 to K=1000. The quantity is a rescaled precision measure computed over common rolling cutoff-year cells. Preferential attachment remains stronger at very small K, but the gap shrinks sharply as the reading budget widens.
That makes the current paper’s answer to the title more precise. If what should economics work on next? is interpreted as what is the single most likely next direct link?, preferential attachment wins. If it is interpreted as what should a researcher read, scope, or test next under a realistic shortlist budget?, the graph-based object becomes more relevant. It remains weaker at the very top rank, but it moves materially closer once the screening problem looks more like actual research browsing.
5.3 What changes when future links are value-weighted
Future appearance is not the only margin that matters. A later realized link can also be weighted by downstream reuse, so that some realized links count more than others. The impact-weighted rerun therefore asks whether the graph score looks relatively better once the future is weighted by later reuse rather than treated as binary appearance alone.
The answer is again disciplined rather than triumphant. Weighted MRR still favors preferential attachment at each of the main horizons: about 0.001523 versus 0.001383 at h=3, 0.001154 versus 0.000943 at h=5, and 0.000809 versus 0.000568 at h=10. So the strict-headline result is not only about low-value fills. Central concepts still capture more of the heavily reused future links. But the broader weighted frontier is less one-sided than the weighted MRR line alone suggests. At K=1000, weighted recall is nearly tied at h=3 and h=10: preferential attachment reaches about 0.01762 and 0.01495, while the graph score reaches about 0.01729 and 0.01488. The gap is still larger at h=5, but even there it is far smaller than the tight-rank headline would suggest.

Notes. The left panel reports weighted MRR by horizon, where future realized links are weighted by later reuse. The right panels report weighted recall frontiers over shortlist size K. Preferential attachment still dominates the tighter top ranks, but the weighted frontier narrows materially at broad lists.
That result matters for how the title should be read. It shows that the paper is not merely reclassifying trivial future links as success. Weighting by downstream reuse leaves the top-rank popularity story intact. The more favorable reading for the graph score enters instead through broader attention frontiers and through the kinds of literatures in which local structure does more screening work.
This is also where the paper’s credibility story enters. The graph is not built from raw co-occurrence. It already carries stability, causal-presentation, evidence-type, and edge-role metadata from the paper-local extraction layer. Appendix E shows that directed causal rows have mean stability around 0.93, compared with about 0.87 for undirected contextual rows, and that over 90 percent of directed causal rows fall into the high-stability band. So the method-family heterogeneity results below are not just subfield color. They are part of the paper’s broader claim that some local graph neighborhoods are more credible terrain for screening than others, even though the current main score does not yet fully weight those signals.
5.4 Where structure helps more
The pooled top-100 comparison hides meaningful variation. The most useful way to read the atlas is not as a search for one subgroup in which the graph score cleanly “wins.” It is a map of where cumulative advantage is more dominant and where local graph structure adds more screening value. Once the frontier is evaluated over broader fixed-K and percentile-K shortlists, the graph score becomes substantially more competitive than the strict top-100 headline suggests.

Notes. The pooled frontier figure reports the graph score’s relative recall advantage over preferential attachment. Positive values favor the graph score. The lighter horizon lines correspond to shorter horizons and the darker lines to longer horizons.
Journal tier matters. Adjacent journals are more favorable terrain for the structural score than the core. Method family matters as well. Design-based causal slices are much more favorable than panel- or time-series-heavy slices.

Notes. This figure reports the main method-family heterogeneity estimates from the atlas. The practical reading is that design-based causal slices are materially more favorable to the graph score than panel- or time-series-heavy slices.
Funding adds nuance rather than a single clean pattern. In the coarse funded-versus-unfunded split, the funded literature is less favorable to the graph score than the unfunded literature. The appendix therefore treats funding as suggestive rather than central, and the funding-by-journal interaction view is useful mainly because it shows that the funded pattern is not uniform.

Notes. The main topic heatmap prioritizes the most populous economics-facing topic groups rather than all broad adjacent categories. Cell color reports the pooled percentile-frontier advantage of the graph score over preferential attachment, while the annotations report the top-100 hit delta in basis points.
The robust main-text message is therefore restrained but substantive. Broader frontier shortlists soften the pooled headline. Adjacent journals look better than the core. Design-based slices look better than panel or time series. Several concrete economics topics look better than the pooled average. Funding seems to matter, but mostly as a secondary institutional layer on top of the more basic popularity-versus-structure comparison. If the title is read as a question about where a structural screen is most useful, this subsection gives the clearest answer: not everywhere equally, but especially in adjacent, design-based, and several concrete economics-facing topic clusters.
5.5 Path evolution beyond direct-link closure
5.3.1 Aggregate transition patterns
The direct-link framing is not the only way research can evolve. I distinguish two simple transition types on length-2 structure. path_to_direct means a supporting path exists first and the missing direct edge later appears. direct_to_path means the direct edge exists first and a supporting mediator path appears only later.

Notes. The figure compares path_to_direct and direct_to_path transitions. The key interpretation is that mechanism-deepening around existing direct claims is often more common than direct-link closure.
At h=10, for example, the direct-to-path share rises from roughly 0.049 in the 1980s to 0.089 in the 1990s, 0.178 in the 2000s, and 0.355 in the 2010s. The corresponding path-to-direct shares are much smaller: about 0.023, 0.014, 0.015, and 0.020.
5.3.2 Where path closure is more common
The journal split is especially revealing. At h=3,5,10,15, the share of realized path-related transitions that take the path-to-direct form is about 0.571, 0.579, 0.529, and 0.471 in adjacent journals, but only about 0.442, 0.443, 0.400, and 0.360 in the core.

Notes. The figure reports the share of realized path-related transitions that take the path-to-direct form by journal tier and horizon. Adjacent journals are consistently more path-to-direct heavy than core journals.
The broad subfield split points in the same direction. Economics and Econometrics is relatively balanced at short horizons, while Finance is more direct-to-path heavy throughout.

Notes. This figure reports the share of realized path-related transitions that take the path-to-direct form by broad subfield and horizon. Economics and Econometrics is more balanced than Finance at short horizons, but both become more direct-to-path heavy at longer horizons.
5.3.3 Current path-rich examples
The recommendation layer already hints at what path-rich questions look like. Investment -> carbon emissions is supported by 38 observed paths through concepts such as economic growth, technological innovation, and economic development. Public debt -> CO2 emissions has 23 supporting paths through growth, financial development, and renewable energy consumption. Monetary policy -> energy consumption has 23 supporting paths through income, output, and income inequality. These examples are not historical validation evidence, but they do show why the path-based object is concrete enough to inspect in the public interface rather than treat as an abstract graph statistic.
| Candidate question | Supporting paths | Example mediators |
|---|---|---|
| Investment -> carbon emissions | 38 | economic growth; technological innovation; economic development |
| Public debt -> CO2 emissions | 23 | economic growth; financial development; renewable energy consumption |
| Monetary policy -> energy consumption | 23 | income; output; income inequality |
| Trade liberalisation -> energy consumption | 5 | economic growth; foreign direct investment; trade liberalization |
| Urbanization -> output growth | 17 | CO2 emissions; energy consumption; energy use |
Taken together, Sections 5.1 to 5.5 imply a cumulative reading of the evidence. The strict top-100 benchmark is harsh and popularity-dominated. Broader attention frontiers soften that headline. Value-weighting changes the scale of the comparison without reversing it. Heterogeneity shows where structural screening is actually more useful. The path audit then explains why even that richer reading still does not exhaust the graph’s value: a good share of scientific development takes the form of mechanism-deepening around existing direct claims, not only direct-link closure itself. In that sense, what economics should work on next is often better understood as a path-rich research program than as a single isolated missing edge.
6. FrontierGraph as Research Tool
FrontierGraph is the public interface built on top of this empirical framework. The site is not the evidence for the paper’s main claims. Its role is different: it makes the ranked objects inspectable. A user can browse candidate questions, inspect supporting paths and mediators, move from a surfaced pair to nearby literature, and decide whether the object is a gap question, a boundary question, or a path-rich mechanism question that still needs substantive vetting.
7. Discussion and Conclusion
Several limits matter for interpretation. A future realized link is not the same thing as truth, importance, or policy value. The benchmark is about future appearance in the literature, not about a complete normative theory of what economists should work on. If cumulative advantage dominates the future, preferential attachment can outperform even when the graph score is surfacing more genuinely underexplored questions.
Direction in the graph records ordered claim relations rather than final causal adjudication. The main score still treats the existence of an edge more seriously than the strength or credibility of the underlying evidence. The published-journal corpus is a further deliberate restriction rather than a universal map of all economics research.
These limits do not make the exercise empty. They define its scope. The paper’s answer to its own title is narrower than a welfare theorem but still substantive: economics should not read what comes next only through cumulative advantage. The most useful surfaced objects are neglected enough to remain open, supported enough to be credible, concrete enough to become papers, and best read at realistic attention frontiers rather than at winner-take-all top ranks. Empirically, that means the strict top-100 shortlist still favors preferential attachment, but broader attention frontiers, value-weighted outcomes, heterogeneity, and path development all make more room for structural screening than the pooled headline alone suggests. Practically, the paper pairs that screening object with a public interface that makes surfaced questions inspectable rather than opaque. A next iteration should add stronger credibility weighting and richer path-based objects, and could also compare explanation or reranking layers across LLMs as a bounded appendix-style extension without changing the current paper’s observational core.
Appendix A. Paper-local graph extraction
This appendix documents the paper-local extraction layer used in the paper’s evaluation. Garg and Fetzer (2025) show that economics papers can be converted into paper-level claim graphs by prompting a language model to recover nodes, directional relations, and claim metadata from title-and-abstract text. The present paper inherits that paper-local view of extraction but extends it in service of a different downstream object.
Prompts
System prompt
You extract a paper-local research graph from a paper title and abstract.
Return only structured output that matches the supplied JSON schema.
Task:
- Read the paper title and abstract.
- Build a paper-local graph with `nodes` and `edges`.
- Reuse the same node when the same concept genuinely recurs within the same paper.
- Do not use outside knowledge.
- Do not infer relationships that are not supported by the title or abstract.
Purpose:
- The output will later be turned into a larger deterministic research graph.
- Downstream systems depend on consistent paper-local node reuse.
- If the abstract contains a chain like `A -> B`, `B -> C`, and `X -> B`, the shared concept `B` should be represented by the same paper-local node if it is genuinely the same concept.
- However, do not merge distinct concepts just because they seem related.
Critical rules:
- Do not create transitive closure.
- If the abstract states `A -> B` and `B -> C`, do not create `A -> C` unless the paper explicitly states `A -> C`.
- Do not create both `A -> B` and `B -> A` for one undirected claim.
- If the title or abstract says two variables are associated or correlated without directional language, encode one edge with `directionality = undirected`.
- For undirected edges, use the first-mentioned concept as `source_node_id` and the second-mentioned concept as `target_node_id` only as a storage convention.
How to represent nodes:
- Use concise noun phrases grounded in the paper text.
- Keep node labels concept-level when possible.
- Do not bake country or year into the node label unless it is essential to the concept itself.
- Put local scope information into `study_context` or `condition_or_scope_text`.
- Use `surface_forms` for distinct mentions that refer to the same paper-local concept.
- Use `study_context` only for context explicitly stated in the title or abstract.
- If no context is stated, use:
- `unit_of_analysis: []`
- `start_year: []`
- `end_year: []`
- `countries: []`
- `context_note: "NA"`
How to represent edges:
- Extract only relations that the title or abstract states, studies, or reports.
- Keep background or prior-literature claims only if they are explicitly stated in the title or abstract, and mark them with `edge_role = background`.
- Use `claim_text` as a short normalized relation string.
- Use `evidence_text` as a short supporting excerpt or close paraphrase from the title/abstract only.
Directionality:
- Use `directionality = directed` when the paper frames one concept as affecting, predicting, changing, increasing, decreasing, explaining, or determining another.
- Use `directionality = undirected` when the paper frames the relation as association, correlation, co-movement, similarity, or linkage without directional commitment.
- Prediction is directional, even if it is not causal.
Causal presentation:
- `explicit_causal`: the paper explicitly uses causal language such as affects, causes, leads to, increases, reduces, impact of, effect of.
- `implicit_causal`: the paper strongly frames the relation as an effect or treatment relation without fully explicit causal wording.
- `noncausal`: the paper frames the relation as association, correlation, prediction, linkage, or descriptive relation.
- `unclear`: the wording is too ambiguous to classify confidently.
- This field is about how the paper presents the relation, not whether the method truly justifies causality.
Relationship type:
- `effect`: one concept is presented as affecting another.
- `association`: correlation, co-movement, linkage, or association.
- `prediction`: one concept predicts or forecasts another.
- `difference`: one concept differs across groups, places, times, or conditions.
- `other`: only if none of the above fit.
Edge role:
- `main_effect`: central edge or main result in the abstract.
- `mechanism`: pathway or channel relation.
- `heterogeneity`: subgroup or conditional variation in a relation.
- `descriptive_pattern`: stylized fact or descriptive empirical pattern.
- `background`: motivating or prior-literature relation stated in the abstract.
- `robustness`: supporting or validating relation rather than the main contribution.
- `other`: only if needed.
Claim status:
- `effect_present`: the abstract reports that the relation is present.
- `no_effect`: the abstract reports no effect or no relation.
- `mixed_or_ambiguous`: the abstract reports mixed, inconsistent, or ambiguous results.
- `conditional_effect`: the relation holds only for some subgroup, time period, or condition.
- `question_only`: the abstract raises or studies the relation but does not report a result.
- `other`: only if needed.
Explicitness:
- `result_only`: the relation is presented as a result.
- `question_only`: the relation is posed as a question or objective only.
- `question_and_result`: the abstract both frames the question and reports a result on the same relation.
- `background_claim`: the relation appears as background motivation or prior literature.
- `implied`: the relation is clearly implied by the abstract wording but not directly phrased as a standalone claim.
Condition or scope:
- Use `condition_or_scope_text` for subgroup, timing, geographic, or sample qualifiers on the edge.
- Use `NA` if not needed.
Sign:
- `increase`, `decrease`, `no_effect`, `ambiguous`, `NA`
Statistical significance:
- `significant`, `not_significant`, `mixed_or_ambiguous`, `not_reported`, `NA`
Evidence method:
- Choose the best supported option from the schema.
- `experiment`, `DiD`, `IV`, `RDD`, `event_study`, `panel_FE_or_TWFE`, `time_series_econometrics`, `structural_model`, `simulation`, `theory_or_model`, `qualitative_or_case_study`, `descriptive_observational`, `prediction_or_forecasting`, `other`, `do_not_know`
Nature of evidence:
- Choose the broad evidence type used for that edge.
Uses data:
- `true` if the edge is supported by data use described in the title/abstract.
Sources of exogenous variation:
- Record only if explicitly stated in the title or abstract.
- Otherwise use `NA`.
Tentativeness:
- `certain`, `tentative`, `mixed_or_qualified`, `unclear`
What not to do:
- Do not label edges as collider, confounder, mediator, instrument, or any other downstream graph-structural role.
- Do not globally canonicalize concepts across papers.
- Do not create edges from general world knowledge.
- Do not invent countries, years, samples, or methods.
If the title/abstract contains no extractable graph:
- return `nodes: []` and `edges: []`
User prompt template
Extract a paper-local research graph from the following title and abstract.
Use only the information in the title and abstract.
Return only the structured output that matches the supplied JSON schema.
Title:
{{paper_title}}
Abstract:
{{paper_abstract}}
Schema
Node fields
| Field | Allowed values / type | Meaning | Why it exists downstream |
|---|---|---|---|
node_id | string | Paper-local identifier such as n1 | Stable local concept key |
label | short string | Concise concept label | Base string passed to normalization |
surface_forms | array of strings | Distinct surface mentions | Preserves within-paper synonymy |
study_context.unit_of_analysis | array | Explicit unit of analysis | Keeps sample off the node label |
study_context.start_year | array of integers | Explicit start years | Preserves local scope |
study_context.end_year | array of integers | Explicit end years | Preserves local scope |
study_context.countries | array of strings | Explicit countries | Preserves setting without splitting concept identity |
study_context.context_note | string | Residual local scope text | Keeps nuance for audit/display |
Edge fields
| Field | Allowed values / type | Meaning | Why it exists downstream |
|---|---|---|---|
edge_id | string | Paper-local edge identifier | Stable local relation key |
source_node_id, target_node_id | strings | Local node references | Connect edge to local nodes |
directionality | directed / undirected | Whether the relation is directional | Determines storage as ordered or undirected |
relationship_type | effect / association / prediction / difference / other | Coarse semantic type | Later filtering and audit |
causal_presentation | explicit_causal / implicit_causal / noncausal / unclear | How the paper speaks about the relation | Separates language from design |
edge_role | main_effect / mechanism / heterogeneity / descriptive_pattern / background / robustness / other | Role in the paper | Separates central claims from channels and background |
claim_status | effect_present / no_effect / mixed_or_ambiguous / conditional_effect / question_only / other | What result the paper reports | Distinguishes results from question-only edges |
explicitness | result_only / question_only / question_and_result / background_claim / implied | How explicitly the edge is framed | Distinguishes stated from implied claims |
condition_or_scope_text | string | Edge-level qualifier | Keeps scope on the relation |
claim_text | string | Short normalized relation text | Audit-friendly summary |
evidence_text | string | Short supporting excerpt/paraphrase | Makes extraction inspectable |
sign | increase / decrease / no_effect / ambiguous / NA | Reported sign | Later descriptive splits |
effect_size | string | Reported magnitude | Reserved for future extensions |
statistical_significance | enum | Reported significance status | Keeps evidence strength distinct |
evidence_method | enumerated method family | Main method family | Determines directed vs contextual role downstream |
evidence_method_other_description | string | Text when method is other | Preserves method specificity |
nature_of_evidence | enum | Broad evidence type | Helps separate theory and empirical slices |
uses_data | boolean | Whether the edge uses data | Simple theory/empirical flag |
sources_of_exogenous_variation | string | Explicit exogenous source if named | Future credibility-weighting hook |
tentativeness | certain / tentative / mixed_or_qualified / unclear | Strength of the language | Separates cautious from assertive claims |
Design choices
Paper-local node reuse. Downstream graph construction depends on whether a path inside one paper reuses a local concept consistently.
No transitive closure. Missing direct links are the object of interest. If extraction created transitive closure, the benchmark would erase many of the candidates it later wants to rank.
Directed versus undirected storage. Undirected relations are stored once by convention rather than duplicated as two directed edges.
Keeping scope off the node label. Country, year, subgroup, and sample qualifiers are stored in dedicated context fields whenever possible.
Separating causal_presentation from evidence_method. A paper can talk causally without using a strong design, and vice versa.
Separating claim_status, explicitness, tentativeness, and edge_role. These fields overlap in plain language but do different jobs downstream.
Appendix B. Node normalization and ontology construction
Node normalization is a major measurement problem in its own right. Paper-local concept strings vary in wording, scope, and granularity. Candidate generation, path counts, and missingness all depend on node identity.
Why a native ontology is needed here
Garg and Fetzer (2025) use the extracted paper-level objects for a different downstream task. The present paper asks a more node-sensitive question. Here the downstream object is a reusable concept graph in which a candidate next paper is a missing link between specific concepts. In that setting, broad field labels are too coarse.
Implemented pipeline
| Stage | What it does | Why it is needed |
|---|---|---|
| Head pool | Selects high-support labels by coverage and support thresholds | Defines a stable candidate set |
| Accepted heads | Clusters compatible head labels into native concept IDs | Creates the ontology inventory |
| Hard mapping | Uses exact, lexical-signature, and reviewed embedding rules | Resolves easy cases conservatively |
| Soft mapping | Uses shortlist/global embedding matching and lexical shortlists | Maps the middle tail |
| Pending labels | Stores unresolved labels and why they failed to map | Makes the tail auditable |
| Force-mapped tail recovery | Assigns unresolved labels to existing head concepts with stored score, margin, and quality band | Expands benchmark coverage while preserving provenance |
The fuller force-mapped corpus is now the canonical benchmark because the alternative was to let mapping quality mechanically determine which literatures count as candidate-generating.
Appendix C. Benchmark construction and significance
| Quantity | Count |
|---|---|
| Source-selected papers | 242,595 |
| Papers with extracted edges | 230,929 |
| Raw extracted edges | 1,443,407 |
| Normalized benchmark papers | 230,479 |
| Unique concepts in benchmark graph | 6,752 |
| Directed causal rows | 89,737 |
| Undirected contextual rows | 1,181,277 |
| Total normalized links | 1,271,014 |
| Metric | h=3 | h=5 | h=10 | h=15 |
|---|---|---|---|---|
| Recall@100, graph-based main model | 0.003239 | 0.002518 | 0.001956 | 0.001494 |
| Recall@100, preferential attachment | 0.003826 | 0.003105 | 0.002784 | 0.002138 |
| MRR, graph-based main model | 0.000811 | 0.000524 | 0.000334 | 0.000227 |
| MRR, preferential attachment | 0.000901 | 0.000637 | 0.000420 | 0.000281 |
| Quantity | h=3 | h=5 | h=10 | h=15 |
|---|---|---|---|---|
| Delta Recall@100 | -0.000587 | -0.000588 | -0.000828 | -0.000644 |
| p-value for Delta Recall@100 | 0.740 | 0.064 | 0.000 | 0.000 |
| Delta MRR | -0.000090 | -0.000113 | -0.000086 | -0.000054 |
| p-value for Delta MRR | 0.000 | 0.000 | 0.000 | 0.000 |
Appendix D. Heterogeneity atlas extensions

Notes. Rows correspond to cutoff-period bins and columns to horizons. Cell color reports the pooled percentile-frontier advantage of the graph score over preferential attachment.

Notes. This interaction view combines funding status and journal tier. It is useful but secondary because funding coverage is uneven and institution-level interpretation mixes composition with behavior.

Notes. The main topic heatmap prioritizes the most populous economics-facing topic groups rather than all broad adjacent categories.

Notes. Only stable, high-support funders are shown. The current stable set keeps ESRC, NSFC China, DFG, and NSF.
Appendix E. Credibility audit summaries
The main score does not yet fully weight evidence quality, but the benchmark object is not blind to it either. The extraction layer already records stability, causal presentation, evidence type, and related claim metadata. The tables below should therefore be read as a quality audit of the empirical object rather than as a replacement ranking model.
Edge kind
| Edge kind | Rows | Papers | Mean stability | Explicit-causal share |
|---|---|---|---|---|
| Directed causal | 89,737 | 23,213 | 0.930 | 69.0% |
| Undirected contextual | 1,181,277 | 221,192 | 0.868 | 45.3% |
Directed causal evidence types
| Evidence type | Rows | Mean stability | Explicit-causal share |
|---|---|---|---|
| Panel FE / TWFE | 44,106 | 0.938 | 68.8% |
| Difference-in-differences | 16,658 | 0.933 | 75.1% |
| Experiment | 15,616 | 0.900 | 56.7% |
| Event study | 6,247 | 0.940 | 73.5% |
| Instrumental variables | 5,601 | 0.933 | 78.3% |
| Regression discontinuity | 1,509 | 0.922 | 80.3% |
Stability bands
| Edge kind | High stability | Mid stability | Low stability |
|---|---|---|---|
| Directed causal | 91.8% | 5.6% | 2.6% |
| Undirected contextual | 85.4% | 5.1% | 9.5% |
Appendix F. Path-evolution extensions

Notes. Economics and Econometrics is more balanced than Finance at short horizons, but both become more direct-to-path heavy at longer horizons.
| Candidate question | Supporting paths | Example mediators |
|---|---|---|
| Investment -> carbon emissions | 38 | economic growth; technological innovation; economic development |
| Public debt -> CO2 emissions | 23 | economic growth; financial development; renewable energy consumption |
| Monetary policy -> energy consumption | 23 | income; output; income inequality |
| Trade liberalisation -> energy consumption | 5 | economic growth; foreign direct investment; trade liberalization |
| Urbanization -> output growth | 17 | CO2 emissions; energy consumption; energy use |
Appendix G. Tool and interpretation notes
The website layer is designed to expose the ranked objects rather than to replace the benchmark. In particular, the public interface helps with three tasks that the evaluation tables alone cannot solve: inspecting why a candidate surfaced, seeing which mediators or neighboring concepts support it, and distinguishing a local gap question from a thinner boundary question.
Two final diagnostic facts are useful when reading the paper. First, grant metadata are present but incomplete, with coverage much stronger in recent years than in older decades. Second, the graph already contains method and stability metadata that can support richer credibility-weighted benchmarks later.
References
- Barabasi, Albert-Laszlo, and Reka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286(5439): 509-512.
- Bloom, Nicholas, Charles I. Jones, John Van Reenen, and Michael Webb. 2020. “Are Ideas Getting Harder to Find?” American Economic Review 110(4): 1104-1144.
- Carnehl, Christoph, Marco Ottaviani, and Justus Preusser. 2024. Designing Scientific Grants. NBER Working Paper 32668.
- Fortunato, Santo, et al. 2018. “Science of Science.” Science 359(6379): eaao0185.
- Garg, Prashant, and Thiemo Fetzer. 2025. “Causal Claims in Economics.” arXiv:2501.06873.
- Jones, Benjamin F. 2009. “The Burden of Knowledge and the `Death of the Renaissance Man’: Is Innovation Getting Harder?” Review of Economic Studies 76(1): 283-317.
- Price, Derek J. de Solla. 1976. “A General Theory of Bibliometric and Other Cumulative Advantage Processes.” Journal of the American Society for Information Science 27(5): 292-306.
- Uzzi, Brian, Satyam Mukherjee, Michael Stringer, and Ben Jones. 2013. “Atypical Combinations and Scientific Impact.” Science 342(6157): 468-472.
- Wang, Dashun, and Albert-Laszlo Barabasi. 2021. The Science of Science. Cambridge University Press.