The Lightning Rod SDK provides a simple Python API for generating custom forecasting datasets to train your LLMs. Transform news articles, documents, and other real-world data into high-quality training samples automatically.
Based on our research paper, "Future-as-Label: Scalable Supervision from Real-World Outcomes".
pip install lightningrod-ai
Sign up at dashboard.lightningrod.ai to get your API key and $50 of free credits.
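Rather than pasting the key into source files, one option is to read it from an environment variable. The following is a minimal sketch under stated assumptions: the variable name `LIGHTNINGROD_API_KEY` and the helper `load_api_key` are hypothetical conventions chosen for illustration and are not part of the SDK.

```python
import os


def load_api_key(env_var: str = "LIGHTNINGROD_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention for this sketch;
    the SDK itself is not known to read it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to the API key from your dashboard.")
    return key
```

The returned string can then be passed as `api_key` when constructing the `LightningRod` client.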
Generate 1000+ forecasting questions in minutes - from raw sources to labeled dataset, automatically. ⚡
from datetime import datetime, timedelta

# AnswerTypeEnum is used below; it is assumed to be exported from the package top level.
from lightningrod import (
    AnswerType,
    AnswerTypeEnum,
    ForwardLookingQuestionGenerator,
    LightningRod,
    NewsSeedGenerator,
    QuestionPipeline,
    WebSearchLabeler,
)

lr = LightningRod(api_key="your-api-key")
binary_answer = AnswerType(answer_type=AnswerTypeEnum.BINARY)

pipeline = QuestionPipeline(
    # Seed the pipeline with news from the last 90 days matching the query.
    seed_generator=NewsSeedGenerator(
        start_date=datetime.now() - timedelta(days=90),
        end_date=datetime.now(),
        search_query=["Trump"],
    ),
    question_generator=ForwardLookingQuestionGenerator(
        instructions="Generate binary forecasting questions about Trump's actions and decisions.",
        examples=[
            "Will Trump impose 25% tariffs on all goods from Canada by February 1, 2025?",
            "Will Pete Hegseth be confirmed as Secretary of Defense by February 15, 2025?",
        ],
    ),
    # Label each question with a binary answer found via web search.
    labeler=WebSearchLabeler(answer_type=binary_answer),
)

dataset = lr.transforms.run(pipeline, max_questions=3000)
dataset.flattened()  # Ready-to-use data for your training pipelines
We use this to generate the Future-as-Label training dataset for our research paper.
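This README does not show what `dataset.flattened()` returns. As a hedged sketch, assuming it yields a list of plain dictionaries, the rows could be written to a JSONL file for a training pipeline as shown here. The helper `write_jsonl`, the sample rows, and their field names (`question`, `label`) are invented stand-ins for illustration and are not part of the SDK.

```python
import json
from pathlib import Path
from typing import Iterable, Mapping


def write_jsonl(rows: Iterable[Mapping[str, object]], path: str) -> int:
    """Write one JSON object per line and return the number of rows written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as handle:
        for row in rows:
            # default=str keeps non-JSON values such as datetimes serializable.
            handle.write(json.dumps(dict(row), ensure_ascii=False, default=str) + "\n")
            count += 1
    return count


# Stand-in rows; in practice these would come from dataset.flattened(),
# assuming it returns dictionaries. The field names are invented.
sample_rows = [
    {"question": "Will X happen by 2025-02-01?", "label": 1},
    {"question": "Will Y be confirmed by 2025-02-15?", "label": 0},
]
```

Calling `write_jsonl(sample_rows, "forecasting_samples.jsonl")` would produce a two-line file, one JSON object per question.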
We have some example notebooks to help you get started! If you have trouble using the SDK, please submit an issue on GitHub.
For complete API reference documentation, see API.md. It includes an overview of the core system concepts, methods, and types.
MIT License - see LICENSE file for details
