Merge saas branch: add SaaS infrastructure#143
Conversation
merge main to saas
Merge Main to SaaS
- Add referral columns to Profiles model (code, referrer, count) - Create ReferralService for managing referrals - Add API endpoints for applying codes and viewing stats - Centralize profile creation logic with ensure_profile_exists - Generate migration script (unapplied) with data backfill support
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review - Fix linting and type errors - Add unit tests for referral system
- Updated `alert_admin` tool to include the user's email in the success log message. - Removed the Telegram message ID from the log message as requested. - Verified that `user_profile` is correctly retrieved and used. - Passed all existing tests for `alert_admin`.
…754171 Update admin alert log message
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review - Fix linting and type errors - Add unit tests for referral system
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review - Fix linting and type errors
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review - Fix linting and type errors - Add unit tests for referral system
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Remove manual collision check in ReferralService - Handle IntegrityError instead - Update get_or_create_referral_code logic
- Wrap database updates in db_transaction to ensure atomicity - Prevents partial state if validation logic fails after commit (though logically validated before) - Aligns with codebase patterns
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Make referral_code nullable and remove default generation - Implement lazy generation in ReferralService and API - Use db_transaction in ensure_profile_exists - Update migration script to reflect nullable columns - Handle feedback from PR review - Fix linting and type errors
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
- Implement atomic increment for referral count to prevent race conditions - Clean up comments in ReferralService - Verify CI passes
…2664991858159 Referral System Implementation
Introduces a new GET endpoint to retrieve daily rate limit usage, providing visibility into current consumption, remaining quota, and reset times. - Adds `AgentLimitResponse` Pydantic model for standardized responses. - Implements `get_agent_limits` route using `ensure_daily_limit`. - Adds E2E tests for authenticated and unauthenticated access.
Introduces a new GET endpoint to retrieve daily rate limit usage, providing visibility into current consumption, remaining quota, and reset times. - Adds `AgentLimitResponse` Pydantic model for standardized responses. - Implements `get_agent_limits` route using `ensure_daily_limit` with `enforce=False`. - Adds E2E tests for authenticated and unauthenticated access. - Validated via `make ci` and `make test`.
…25189082607 Add agent limits status endpoint
- Fixed a critical security vulnerability in `workos_auth.py` where authentication could be bypassed in production if the script name contained "test". The check is now gated by `DEV_ENV != "prod"`. - Moved hardcoded CORS origins from `src/server.py` to `global_config.yaml` to support configurable environments. - Enabled access logs in `uvicorn` for better production observability. - Updated `common/config_models.py` to include `ServerConfig`.
…ability-11261091240424952918 🔒 Fix critical auth vulnerability and improve observability
…eaming) Intelligently merges the saas branch into main, keeping main's maintained infrastructure while bringing in SaaS-specific features including database models, alembic migrations, Stripe payments, WorkOS auth, agent chat with streaming, Telegram integration, and Railway deployment support. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| """ | ||
| Return a deterministic SHA-256 hash for an API key. | ||
| """ | ||
| return hashlib.sha256(api_key.encode("utf-8")).hexdigest() |
Check failure
Code scanning / CodeQL
Use of a broken or weak cryptographic hashing algorithm on sensitive data High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 13 hours ago
General approach: Replace the direct use of hashlib.sha256 with a password‑hashing or key‑derivation function that is computationally expensive and parameterizable. Because we only need deterministic, equality‑checkable hashes in the database (not reversible encryption), PBKDF2 from the standard library (hashlib.pbkdf2_hmac) is a good choice that avoids adding non‑stdlib dependencies. We will introduce a module‑level, fixed salt and iteration count for API key hashing, and use PBKDF2‑HMAC‑SHA256 to derive a hash string. This keeps behavior (store hash, compare hash) intact while strengthening resistance to offline brute‑force attacks.
Concrete changes in this file:
- Add constants for an API‑key hashing salt, iteration count, and output length near the other module‑level constants.
- Rewrite
hash_api_keyto:- Use
hashlib.pbkdf2_hmac("sha256", ...)with the defined salt and iteration count. - Hex‑encode the resulting bytes (
.hex()), preserving the string nature ofkey_hash.
- Use
- Leave the surrounding logic in
create_api_keyandvalidate_api_keyunchanged, since they already rely solely on the deterministic hash string.
This change is fully local to src/api/auth/api_key_auth.py and keeps the DB schema (key_hash as a string) and external behavior intact. The only imports needed come from hashlib, which is already imported.
| @@ -20,16 +20,28 @@ | ||
| API_KEY_PREFIX = "sk_" | ||
| KEY_PREFIX_LENGTH = 8 | ||
|
|
||
| # Parameters for API key hashing. PBKDF2 adds computational cost to resist brute-force attacks. | ||
| API_KEY_HASH_SALT = b"api-key-hash-salt-v1" | ||
| API_KEY_HASH_ITERATIONS = 100_000 | ||
| API_KEY_HASH_LENGTH = 32 # 32 bytes = 256-bit derived key | ||
|
|
||
|
|
||
| def _utcnow() -> datetime: | ||
| return datetime.now(timezone.utc) | ||
|
|
||
|
|
||
| def hash_api_key(api_key: str) -> str: | ||
| """ | ||
| Return a deterministic SHA-256 hash for an API key. | ||
| Return a deterministic, computationally expensive hash for an API key. | ||
| """ | ||
| return hashlib.sha256(api_key.encode("utf-8")).hexdigest() | ||
| dk = hashlib.pbkdf2_hmac( | ||
| "sha256", | ||
| api_key.encode("utf-8"), | ||
| API_KEY_HASH_SALT, | ||
| API_KEY_HASH_ITERATIONS, | ||
| dklen=API_KEY_HASH_LENGTH, | ||
| ) | ||
| return dk.hex() | ||
|
|
||
|
|
||
| def generate_api_key_value() -> str: |
Greptile SummaryThis PR merges the
Confidence Score: 2/5
Important Files Changed
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 96-101
Comment:
**`datetime.fromtimestamp(None)` will crash**
`invoice.get("period_start")` and `invoice.get("period_end")` can return `None` if the keys are missing from the Stripe invoice payload. Passing `None` to `datetime.fromtimestamp()` raises a `TypeError` at runtime. The same pattern repeats at lines 193-199 and 212-218 with `subscription_data.get("current_period_start")` / `subscription_data.get("current_period_end")`.
Consider using bracket access (`invoice["period_start"]`) to get a clear `KeyError` pointing at the root cause, or guard with an explicit `None` check before converting.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 247-265
Comment:
**Immediate downgrade on first payment failure**
On any `invoice.payment_failed` event, the user is immediately downgraded to the free tier. Stripe typically retries failed payments multiple times (configurable in Stripe dashboard), and each retry emits another `invoice.payment_failed` event. Downgrading on the first failure means users lose access before Stripe has finished its retry cycle.
Consider checking the invoice's `attempt_count` or `next_payment_attempt` field and only downgrading after all retries are exhausted (i.e., `next_payment_attempt` is `None`).
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/api/auth/workos_auth.py
Line: 137-149
Comment:
**Test mode auth bypass relies on multiple heuristics**
The test mode detection combines three checks (`is_pytest`, `is_dev_env_test`, `is_script_test`) where any one being `True` disables JWT signature verification entirely. While the `sys.argv` check is gated behind `DEV_ENV != "prod"`, the `"pytest" in sys.modules` check has no such guard — if `pytest` is accidentally in `sys.modules` in a production deployment (e.g., a test dependency pulled transitively), authentication is bypassed.
Consider making the bypass strictly depend on a single explicit signal (e.g., only `DEV_ENV == "test"`) rather than ORing multiple heuristics together.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 191
Comment:
**String literal instead of `SubscriptionTier` enum**
This uses the string literal `"plus_tier"` while `checkout.py` uses `SubscriptionTier.PLUS.value`. Both currently resolve to the same string, but if the enum value changes, this file will silently diverge. The same issue occurs at lines 210, 241, and 265.
```suggestion
subscription.subscription_tier = SubscriptionTier.PLUS.value
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 22-55
Comment:
**Fallback to secondary webhook secret weakens secret rotation**
When verification with the primary secret fails, the code falls back to the secondary secret. This means that after a secret rotation, an attacker who obtained the old secret can still send valid webhook events until the old secret is fully removed from config. The comment says this "helps when env vars are swapped," but this could be addressed with proper deployment configuration rather than accepting two secrets at once.
If the dual-secret approach is intentional for zero-downtime rotations, consider at minimum logging a warning when the secondary secret is used so operators know the primary needs fixing.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 275-277
Comment:
**Exception detail exposed in webhook error response**
`str(e)` is passed directly as the HTTP response detail, which can leak internal implementation details (stack traces, DB connection strings, etc.) to whoever sends the webhook request.
```suggestion
logger.error(f"Error processing subscription webhook: {str(e)}")
raise HTTPException(status_code=500, detail="Internal server error")
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: fbc977a |
| subscription.billing_period_start = datetime.fromtimestamp( | ||
| invoice.get("period_start"), tz=timezone.utc | ||
| ) | ||
| subscription.billing_period_end = datetime.fromtimestamp( | ||
| invoice.get("period_end"), tz=timezone.utc | ||
| ) |
There was a problem hiding this comment.
datetime.fromtimestamp(None) will crash
invoice.get("period_start") and invoice.get("period_end") can return None if the keys are missing from the Stripe invoice payload. Passing None to datetime.fromtimestamp() raises a TypeError at runtime. The same pattern repeats at lines 193-199 and 212-218 with subscription_data.get("current_period_start") / subscription_data.get("current_period_end").
Consider using bracket access (invoice["period_start"]) to get a clear KeyError pointing at the root cause, or guard with an explicit None check before converting.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 96-101
Comment:
**`datetime.fromtimestamp(None)` will crash**
`invoice.get("period_start")` and `invoice.get("period_end")` can return `None` if the keys are missing from the Stripe invoice payload. Passing `None` to `datetime.fromtimestamp()` raises a `TypeError` at runtime. The same pattern repeats at lines 193-199 and 212-218 with `subscription_data.get("current_period_start")` / `subscription_data.get("current_period_end")`.
Consider using bracket access (`invoice["period_start"]`) to get a clear `KeyError` pointing at the root cause, or guard with an explicit `None` check before converting.
How can I resolve this? If you propose a fix, please make it concise.| elif event_type == "invoice.payment_failed": | ||
| # Handle payment failure -> auto-downgrade | ||
| invoice_obj = event["data"]["object"] | ||
| invoice_subscription_id = invoice_obj.get("subscription") | ||
|
|
||
| if invoice_subscription_id: | ||
| subscription = ( | ||
| db.query(UserSubscriptions) | ||
| .filter( | ||
| UserSubscriptions.stripe_subscription_id | ||
| == invoice_subscription_id | ||
| ) | ||
| .first() | ||
| ) | ||
|
|
||
| if subscription: | ||
| with db_transaction(db): | ||
| subscription.is_active = False | ||
| subscription.subscription_tier = "free" |
There was a problem hiding this comment.
Immediate downgrade on first payment failure
On any invoice.payment_failed event, the user is immediately downgraded to the free tier. Stripe typically retries failed payments multiple times (configurable in Stripe dashboard), and each retry emits another invoice.payment_failed event. Downgrading on the first failure means users lose access before Stripe has finished its retry cycle.
Consider checking the invoice's attempt_count or next_payment_attempt field and only downgrading after all retries are exhausted (i.e., next_payment_attempt is None).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 247-265
Comment:
**Immediate downgrade on first payment failure**
On any `invoice.payment_failed` event, the user is immediately downgraded to the free tier. Stripe typically retries failed payments multiple times (configurable in Stripe dashboard), and each retry emits another `invoice.payment_failed` event. Downgrading on the first failure means users lose access before Stripe has finished its retry cycle.
Consider checking the invoice's `attempt_count` or `next_payment_attempt` field and only downgrading after all retries are exhausted (i.e., `next_payment_attempt` is `None`).
How can I resolve this? If you propose a fix, please make it concise.| # Check if we're in test mode (skip signature verification for tests) | ||
| # Detect test mode by checking if pytest is running or if DEV_ENV is explicitly set to "test" | ||
| # We also check for 'test' in sys.argv[0] ONLY if we are NOT in production, to avoid security risks | ||
| # where a script named "test_something.py" could bypass auth in prod. | ||
| is_pytest = "pytest" in sys.modules | ||
| is_dev_env_test = global_config.DEV_ENV.lower() == "test" | ||
|
|
||
| # Only check sys.argv if we are definitely not in prod | ||
| is_script_test = False | ||
| if global_config.DEV_ENV.lower() != "prod": | ||
| is_script_test = "test" in sys.argv[0].lower() | ||
|
|
||
| is_test_mode = is_pytest or is_dev_env_test or is_script_test |
There was a problem hiding this comment.
Test mode auth bypass relies on multiple heuristics
The test mode detection combines three checks (is_pytest, is_dev_env_test, is_script_test) where any one being True disables JWT signature verification entirely. While the sys.argv check is gated behind DEV_ENV != "prod", the "pytest" in sys.modules check has no such guard — if pytest is accidentally in sys.modules in a production deployment (e.g., a test dependency pulled transitively), authentication is bypassed.
Consider making the bypass strictly depend on a single explicit signal (e.g., only DEV_ENV == "test") rather than ORing multiple heuristics together.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/auth/workos_auth.py
Line: 137-149
Comment:
**Test mode auth bypass relies on multiple heuristics**
The test mode detection combines three checks (`is_pytest`, `is_dev_env_test`, `is_script_test`) where any one being `True` disables JWT signature verification entirely. While the `sys.argv` check is gated behind `DEV_ENV != "prod"`, the `"pytest" in sys.modules` check has no such guard — if `pytest` is accidentally in `sys.modules` in a production deployment (e.g., a test dependency pulled transitively), authentication is bypassed.
Consider making the bypass strictly depend on a single explicit signal (e.g., only `DEV_ENV == "test"`) rather than ORing multiple heuristics together.
How can I resolve this? If you propose a fix, please make it concise.| subscription.stripe_subscription_id = subscription_id | ||
| subscription.stripe_subscription_item_id = subscription_item_id | ||
| subscription.is_active = True | ||
| subscription.subscription_tier = "plus_tier" |
There was a problem hiding this comment.
String literal instead of SubscriptionTier enum
This uses the string literal "plus_tier" while checkout.py uses SubscriptionTier.PLUS.value. Both currently resolve to the same string, but if the enum value changes, this file will silently diverge. The same issue occurs at lines 210, 241, and 265.
| subscription.subscription_tier = "plus_tier" | |
| subscription.subscription_tier = SubscriptionTier.PLUS.value |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 191
Comment:
**String literal instead of `SubscriptionTier` enum**
This uses the string literal `"plus_tier"` while `checkout.py` uses `SubscriptionTier.PLUS.value`. Both currently resolve to the same string, but if the enum value changes, this file will silently diverge. The same issue occurs at lines 210, 241, and 265.
```suggestion
subscription.subscription_tier = SubscriptionTier.PLUS.value
```
How can I resolve this? If you propose a fix, please make it concise.| def _try_construct_event(payload: bytes, sig_header: str | None) -> dict: | ||
| """ | ||
| Verify and construct the Stripe event using available secrets. | ||
|
|
||
| Uses the environment-appropriate secret first, then falls back to the | ||
| alternate secret if verification fails (helps when env vars are swapped). | ||
| """ | ||
|
|
||
| def _secrets() -> Iterable[str]: | ||
| primary = ( | ||
| global_config.STRIPE_WEBHOOK_SECRET | ||
| if global_config.DEV_ENV == "prod" | ||
| else global_config.STRIPE_TEST_WEBHOOK_SECRET | ||
| ) | ||
| secondary = ( | ||
| global_config.STRIPE_TEST_WEBHOOK_SECRET | ||
| if global_config.DEV_ENV == "prod" | ||
| else global_config.STRIPE_WEBHOOK_SECRET | ||
| ) | ||
| if primary: | ||
| yield primary | ||
| if secondary and secondary != primary: | ||
| yield secondary | ||
|
|
||
| if not sig_header: | ||
| raise HTTPException(status_code=400, detail="Missing stripe-signature header") | ||
|
|
||
| last_error: Exception | None = None | ||
| for secret in _secrets(): | ||
| try: | ||
| return stripe.Webhook.construct_event(payload, sig_header, secret) | ||
| except Exception as exc: # noqa: B902 | ||
| last_error = exc | ||
| continue |
There was a problem hiding this comment.
Fallback to secondary webhook secret weakens secret rotation
When verification with the primary secret fails, the code falls back to the secondary secret. This means that after a secret rotation, an attacker who obtained the old secret can still send valid webhook events until the old secret is fully removed from config. The comment says this "helps when env vars are swapped," but this could be addressed with proper deployment configuration rather than accepting two secrets at once.
If the dual-secret approach is intentional for zero-downtime rotations, consider at minimum logging a warning when the secondary secret is used so operators know the primary needs fixing.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 22-55
Comment:
**Fallback to secondary webhook secret weakens secret rotation**
When verification with the primary secret fails, the code falls back to the secondary secret. This means that after a secret rotation, an attacker who obtained the old secret can still send valid webhook events until the old secret is fully removed from config. The comment says this "helps when env vars are swapped," but this could be addressed with proper deployment configuration rather than accepting two secrets at once.
If the dual-secret approach is intentional for zero-downtime rotations, consider at minimum logging a warning when the secondary secret is used so operators know the primary needs fixing.
How can I resolve this? If you propose a fix, please make it concise.| except Exception as e: | ||
| logger.error(f"Error processing subscription webhook: {str(e)}") | ||
| raise HTTPException(status_code=500, detail=str(e)) |
There was a problem hiding this comment.
Exception detail exposed in webhook error response
str(e) is passed directly as the HTTP response detail, which can leak internal implementation details (stack traces, DB connection strings, etc.) to whoever sends the webhook request.
| except Exception as e: | |
| logger.error(f"Error processing subscription webhook: {str(e)}") | |
| raise HTTPException(status_code=500, detail=str(e)) | |
| logger.error(f"Error processing subscription webhook: {str(e)}") | |
| raise HTTPException(status_code=500, detail="Internal server error") |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/api/routes/payments/webhooks.py
Line: 275-277
Comment:
**Exception detail exposed in webhook error response**
`str(e)` is passed directly as the HTTP response detail, which can leak internal implementation details (stack traces, DB connection strings, etc.) to whoever sends the webhook request.
```suggestion
logger.error(f"Error processing subscription webhook: {str(e)}")
raise HTTPException(status_code=500, detail="Internal server error")
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
saasbranch intomain, preserving main's maintained infrastructure while bringing in SaaS-specific featuresMerge Strategy
alembic/,src/db/,src/api/,src/stripe/), new SaaS-specific filesconfig_models.py— combined both sets of Pydantic models (main'sFeaturesConfig/RedactionConfig+ saas's SaaS models)global_config.py— main's split YAML loading + relative imports + saas'sapi_base()/DB URI resolution/SaaS config fields (all optional)global_config.yaml— main's model names + redaction + saas's timeout/agent_chat/subscription/stripe/telegram/serverMakefile— main'simport_lint/citargets + saas's DB sectionpyproject.toml— main's newer deps + saas's SaaS deps + merged vulture excludesdspy_inference.py— main's fallback/retry/feature flags + saas's lazy init/streaming/timeoutdspy_langfuse.py— main's langfuse v3 API + saas's explicit trace context/call_id matchingTest plan
uv locksucceeds after pyproject.toml changesmake test_fastto ensure existing tests pass🤖 Generated with Claude Code