PowderCoatingLogix/scripts/Prismatic Data Scraper/README.md
spouliot 843d1c3c51 Add token-authenticated catalog import API endpoint
POST /PowderCatalog/ImportApi accepts the JSON scrape format in the request
body, authenticated by a shared secret in the X-Import-Token header (matched
constant-time against CatalogImport:Token), with the vendor in X-Vendor-Name.
Runs through the same ImportJsonAsync -> shared upsert as the manual upload, so
the offline PrismaticSync tool can push unattended.

ImportJsonAsync refactored to take a Stream (the form upload now passes
file.OpenReadStream()). Endpoint is AllowAnonymous + IgnoreAntiforgeryToken
(it's token-gated, not cookie-auth) and returns 401 until a token is configured,
so it's inert by default. README updated with the route + token wiring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-18 11:35:30 -04:00

# PrismaticSync
A standalone .NET console tool that scrapes the Prismatic Powders catalog and pushes it into the
Powder Coating Logix catalog import endpoint. It exists because Prismatic, unlike Columbia
Coatings, has **no API**, so the data has to be scraped via browser automation.
> **Runs on a workstation you control — never on the deployed app server.** Scraping from the cloud
> app's IP would get blocked and isn't appropriate. This tool is deliberately *not* part of
> `PowderCoating.sln`; build and run it independently.
## First-time setup (per machine)
```powershell
cd "scripts/Prismatic Data Scraper"
dotnet build
pwsh bin/Debug/net8.0/playwright.ps1 install chromium # one-time browser download
```
## Commands
```powershell
dotnet run -- run # default: discover-new + scrape (new + stale >30d) + push
dotnet run -- discover-new # cheap: find newly-added colors (newest-first, stops at known)
dotnet run -- discover-full # heavy: crawl all color filters (reconcile whole set / removals)
dotnet run -- scrape # scrape product pages from product-urls.txt (resumable)
dotnet run -- scrape --refresh-older-than=30 # also re-scrape products older than 30 days (price changes)
dotnet run -- push # push prismatic_powders.json to the import endpoint
```
Options: `--max-products=N`, `--retry-errors`, `--headed` (show the browser for debugging).
Everything streams to the console live (warnings/errors in color) **and** to `prismatic-sync.log`.
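The "newest-first, stops at known" behavior of `discover-new` can be pictured with a minimal sketch. This is an illustration in Python, not the tool's .NET code; the function name and the URLs are made up, and it assumes the listing really is ordered newest-first.

```python
from typing import Iterable, List, Set


def discover_new(newest_first: Iterable[str], known: Set[str]) -> List[str]:
    """Collect URLs from a newest-first listing until a known one appears."""
    fresh: List[str] = []
    for url in newest_first:
        if url in known:
            # Assumption: everything after the first known URL is older
            # and was already discovered on a previous run.
            break
        fresh.append(url)
    return fresh


# Hypothetical product URLs, newest first.
listing = ["/p/new-2", "/p/new-1", "/p/old-9", "/p/old-8"]
print(discover_new(listing, known={"/p/old-9", "/p/old-8"}))
# prints ['/p/new-2', '/p/new-1']
```

The early exit is what makes this mode cheap, and it is also why it cannot notice removed colors; that is the job of `discover-full`.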
## Operating model (suggested cadence)
| Run | Command | Cadence | Why |
|-----|---------|---------|-----|
| Find new colors | `run` (does discover-new + scrape-new) | Weekly | Cheap; Prismatic adds colors often |
| Price refresh | `scrape --refresh-older-than=30` then `push` | Monthly | Re-scrapes stale products to catch price changes (slow, ~hours) |
| Full reconcile | `discover-full` then `scrape` | Quarterly | Catches removed/discontinued colors |

A full scrape of ~5,000 products takes hours because of the polite delays. It saves after every
product and is fully resumable, so you can stop and restart at any time.
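As a rough picture of "saves after every product" combined with `--refresh-older-than`, here is a minimal Python sketch. The store layout (`scraped_at`, `data`), the file name, and the function names are assumptions for illustration; the real tool's on-disk format is not documented here.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def needs_scrape(record: Optional[dict], now: datetime, refresh_days: Optional[int]) -> bool:
    """New products always qualify; known ones only when older than refresh_days."""
    if record is None:
        return True
    if refresh_days is None:
        return False
    scraped_at = datetime.fromisoformat(record["scraped_at"])
    return now - scraped_at > timedelta(days=refresh_days)


def scrape_all(urls: Iterable[str], store_path: Path, scrape_one: Callable[[str], dict],
               now: datetime, refresh_days: Optional[int] = None) -> List[str]:
    """Scrape whatever is missing or stale, writing the store after each product."""
    store = json.loads(store_path.read_text()) if store_path.exists() else {}
    done: List[str] = []
    for url in urls:
        if not needs_scrape(store.get(url), now, refresh_days):
            continue  # already scraped and fresh enough: this is what makes a restart cheap
        store[url] = {"scraped_at": now.isoformat(), "data": scrape_one(url)}
        store_path.write_text(json.dumps(store, indent=2))  # save after every product
        done.append(url)
    return done


# Demo with a fake scraper and a throwaway store file.
with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "store.json"  # hypothetical file name
    now = datetime(2026, 6, 18, tzinfo=timezone.utc)
    fake = lambda u: {"url": u}
    first = scrape_all(["/p/a", "/p/b"], store_file, fake, now)
    second = scrape_all(["/p/a", "/p/b", "/p/c"], store_file, fake, now)
    later = scrape_all(["/p/a"], store_file, fake, now + timedelta(days=31), refresh_days=30)
print(first, second, later)
```

Writing the whole store after each product is simple but does more I/O as the store grows; it trades speed for never losing more than one product's work on interruption.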
## Politeness / anti-block
Configurable in `appsettings.json`: randomized 6–14s base delay, an escalating **cooldown + retry on
403** (so a temporary block doesn't get you hard-banned mid-run), and a periodic long rest. Leave
these conservative: getting blocked is worse than being slow, and Prismatic is a partner.
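The delay and cooldown ideas above can be sketched as follows. This is a hedged Python illustration, not the tool's implementation: the function names, the `(status, body)` return shape, and every numeric value are placeholders, and the real settings live in `appsettings.json`.

```python
import random
import time
from typing import Callable, Sequence, Tuple


def polite_delay(rng: random.Random, low: float, high: float) -> float:
    """Pick a randomized pause (seconds) between two product requests."""
    return rng.uniform(low, high)


def fetch_with_cooldown(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    cooldowns: Sequence[float] = (60.0, 300.0, 900.0),  # placeholder values
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry on HTTP 403, waiting longer before each successive attempt."""
    status, body = fetch(url)
    for cooldown in cooldowns:
        if status != 403:
            break
        sleep(cooldown)  # back off; each cooldown is longer than the last
        status, body = fetch(url)
    return status, body


# Demo with a fake fetcher that is blocked twice, then succeeds.
responses = iter([(403, ""), (403, ""), (200, "ok")])
slept = []
result = fetch_with_cooldown(lambda u: next(responses), "/p/a", sleep=slept.append)
pause = polite_delay(random.Random(0), 1.0, 2.0)  # placeholder range
print(result, slept)
```

Injecting `sleep` keeps the sketch testable without real waiting; if every cooldown is exhausted, the last 403 is returned so the caller can decide whether to stop the run.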
## Pushing into the app
Set in `appsettings.json`:
- `Sync.Import.EndpointUrl` → `https://<your-app>/PowderCatalog/ImportApi`
- `Sync.Import.Token` → the same secret as the app's `CatalogImport:Token` config

The tool POSTs the JSON with an `X-Import-Token` header (and `X-Vendor-Name: Prismatic Powders`) to
that endpoint, which authenticates the token and runs the records through the same upsert as the
Columbia sync. If the endpoint/token isn't configured here, `push` is skipped and you upload
`prismatic_powders.json` manually via the Powder Catalog admin page instead.
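For anyone testing the endpoint by hand, the request described above looks roughly like this. It is a Python sketch that only builds the request; the `Content-Type` header, the placeholder host, and the empty payload are assumptions, while the route and the two `X-` headers come from this README.

```python
import urllib.request
from typing import Optional


def build_push_request(endpoint_url: str, token: str, payload: bytes,
                       vendor: str = "Prismatic Powders") -> Optional[urllib.request.Request]:
    """Build the import POST, or return None when endpoint/token is not configured."""
    if not endpoint_url or not token:
        return None  # mirrors "push is skipped" when configuration is missing
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",  # assumption: not stated in the README
            "X-Import-Token": token,
            "X-Vendor-Name": vendor,
        },
    )


# Placeholder host and payload; nothing is sent over the network here.
req = build_push_request("https://example.invalid/PowderCatalog/ImportApi", "s3cret", b"[]")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Note that `urllib` normalizes header names internally (for example `X-import-token`); HTTP header names are case-insensitive, so the server sees the same header.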
> **App side:** set `CatalogImport:Token` in the web app's config (Azure App Setting in prod). The
> endpoint returns 401 until a token is set, so it's inert by default.
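The "inert by default" behavior amounts to a small decision rule, sketched here in Python rather than the app's actual .NET code. The function name and the 200 success status are assumptions; the constant-time comparison reflects the commit description of the endpoint, and `hmac.compare_digest` is the Python standard-library way to do that.

```python
import hmac
from typing import Optional


def import_status(configured_token: Optional[str], presented_token: Optional[str]) -> int:
    """Return the HTTP status a token-gated import endpoint might answer with."""
    if not configured_token:
        return 401  # no token configured on the app side: endpoint stays inert
    if presented_token is None:
        return 401  # request carried no X-Import-Token header
    # Constant-time comparison avoids leaking how many leading characters matched.
    matches = hmac.compare_digest(
        configured_token.encode("utf-8"), presented_token.encode("utf-8")
    )
    return 200 if matches else 401  # 200 on success is an assumption


print(import_status(None, "anything"), import_status("s3cret", "s3cret"))
```

Rejecting every request while no token is configured means a fresh deployment cannot be written to by accident.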
## Scheduling (Windows Task Scheduler)
Point a scheduled task at the published exe (or `dotnet run`). Example weekly task command:
```
Program/script: C:\Tools\PrismaticSync\PrismaticSync.exe
Arguments: run
Start in: C:\Tools\PrismaticSync
```
Publish a framework-dependent build (`--self-contained false`, so the workstation needs the .NET 8 runtime installed) to drop on the workstation:
```powershell
dotnet publish -c Release -r win-x64 --self-contained false -o C:\Tools\PrismaticSync
pwsh C:\Tools\PrismaticSync\playwright.ps1 install chromium
```
## The long game
This is the interim path. The durable endgame is a real Prismatic **API** (the partnership), at which
point this tool is replaced by a clean in-app sync like Columbia's — reusing the same upsert,
propagation, and discontinued handling.