StockVisionz

How StockVisionz Works

An interactive guide to the algorithmic trading research platform — from searching a ticker to ML-powered predictions, explained simply.

9 ML Models
6 Leakage Checks
252 Training Window (days)

The Big Picture

From Ticker Search to Predictions

StockVisionz follows a clear pipeline. Each step feeds the next — nothing is skipped, nothing is guessed.

🔍

Search

User types a ticker

📥

Ingest

Pull price & news data

📊

Compute

Calculate indicators

🧠

Train

Run ML models

🧪

Validate

Walk-forward testing

📈

Visualize

Dashboard results

Interactive Demo

Watch a Ticker Get Analyzed

Here is what happens when you search "AAPL", step by step.

🔎 Simulating: AAPL

1. User searches "AAPL" on the dashboard
   The frontend sends a request to the backend API.

2. Check if AAPL exists in the database
   If new → full backfill from 2016. If existing → fetch only missing days.

3. Pull price data from Yahoo Finance
   Open, High, Low, Close, and Volume, stored in a time-series database.

4. Compute technical indicators
   RSI, MACD, Bollinger Bands, ATR, and moving averages, all saved.

5. Fetch news & run sentiment analysis
   Headlines come from the Alpaca News API; FinBERT scores each one as positive or negative.

6. Build ML feature matrix
   Combine price + indicators + sentiment into one table for the ML models.

7. User triggers ML training
   Job queued → Worker picks it up → Walk-forward validation runs.

8. Results appear on dashboard
   Accuracy, Sharpe ratio, and equity curve: compare all models side by side.

Under the Hood

How Each Stage Works

Let's unpack what's happening at every step of the pipeline.

📥

Data Ingestion

The Foundation of Everything

When you search a ticker, the system pulls years of daily price data from Yahoo Finance, going back to January 2016 for a new ticker, and stores it in our database.

  • New tickers: backfill from January 2016
  • Existing tickers: only fetch missing days
  • Time-series database optimized for fast queries
  • News articles fetched and scored for sentiment
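
The backfill rule above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual code: the function name `plan_fetch`, the constant `BACKFILL_START`, and the use of calendar dates rather than a trading calendar are all assumptions made for the example.

```python
from datetime import date, timedelta

# Assumed constant: the page says new tickers are backfilled from January 2016.
BACKFILL_START = date(2016, 1, 1)


def plan_fetch(last_stored, today):
    """Return the (start, end) date range to download, or None if up to date.

    last_stored is the most recent date already in the database for this
    ticker, or None when the ticker has never been ingested.
    """
    if last_stored is None:
        # New ticker: full backfill.
        return (BACKFILL_START, today)
    next_day = last_stored + timedelta(days=1)
    if next_day > today:
        # Existing ticker with nothing missing.
        return None
    # Existing ticker: fetch only the missing days.
    return (next_day, today)
```

For example, `plan_fetch(None, date(2024, 3, 1))` requests everything from 2016-01-01, while `plan_fetch(date(2024, 2, 27), date(2024, 3, 1))` requests only 2024-02-28 onward.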
📊

Technical Indicators

Turning Prices Into Signals

Raw prices aren't enough. We compute mathematical indicators that traders use to spot momentum, trends, and volatility patterns.

  • RSI (14-day) — Overbought or oversold?
  • MACD — Is momentum shifting?
  • Bollinger Bands — Price volatility
  • ATR — Average daily price range
  • Volume Ratio — Unusual activity?
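
To make one of these indicators concrete, here is a rough sketch of a 14-day RSI in plain Python. It uses Wilder's smoothing, which is the textbook definition; the page does not say which variant StockVisionz uses, so treat the smoothing choice and the function name `rsi` as assumptions.

```python
def rsi(closes, period=14):
    """Relative Strength Index of the last close, using Wilder's smoothing.

    Returns a value between 0 and 100. Needs at least period + 1 closes.
    """
    if len(closes) <= period:
        raise ValueError("need at least period + 1 closing prices")

    # Day-over-day changes, split into gains and losses.
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]

    # Seed with a simple average over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period

    # Wilder's smoothing over the remaining changes.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period

    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A common rule of thumb reads values above 70 as overbought and below 30 as oversold, which is the question the RSI bullet above is asking.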
🧠

ML Feature Matrix

What the Models Actually See

All indicators and price data combine into a single table. Each row = one trading day. Each column = a signal the model learns from.

  • Lagged returns: 1, 2, 5, and 10 days ago
  • All features shifted by 1 day — no future peeking
  • Target: will the stock go UP or DOWN tomorrow?
  • This is the critical "no cheating" step
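
The "no future peeking" rule can be illustrated with a small sketch. It assumes that "lagged return k days ago" means the one-day return observed k days before the row's date, and that the target compares tomorrow's close with today's; both readings, and the names `LAGS` and `build_feature_rows`, are assumptions for illustration only.

```python
LAGS = (1, 2, 5, 10)  # lags listed on the page


def build_feature_rows(closes):
    """Build one row per trading day with lagged returns and an up/down target.

    Row t only uses returns from days t-1 and earlier, so the model never
    sees day t's own data. The target is 1 if close[t+1] > close[t], else 0.
    """
    # returns[i] is the one-day return realised on day i (undefined for day 0).
    returns = [None] + [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]

    rows = []
    first_t = max(LAGS) + 1       # earliest day with every lag available
    last_t = len(closes) - 2      # last day that still has a "tomorrow"
    for t in range(first_t, last_t + 1):
        row = {"t": t}
        for k in LAGS:
            row[f"ret_lag_{k}"] = returns[t - k]
        row["target_up"] = 1 if closes[t + 1] > closes[t] else 0
        rows.append(row)
    return rows
```

Note that the final day in the series never becomes a training row, because its target (tomorrow's move) is not known yet.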
⚙️

Job Queue System

How Training Gets Managed

Click "Train" and a job enters the queue. A background worker picks it up and trains the model. You see progress in real time.

  • Queued → Picked Up → Training → Completed
  • Multiple workers run safely (no conflicts)
  • Live progress via Server-Sent Events
  • Failed jobs automatically retried
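
The job lifecycle above can be sketched as a tiny in-memory queue. This is only an illustration of the state transitions and the retry rule: the class and method names, the status strings, and the `max_attempts` limit are assumptions, and a lock stands in for whatever the real system uses to keep workers from claiming the same job (on Postgres, one common approach is `SELECT ... FOR UPDATE SKIP LOCKED`).

```python
import threading
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    status: str = "queued"
    attempts: int = 0


class JobQueue:
    def __init__(self, max_attempts=3):
        self._lock = threading.Lock()
        self._jobs = []
        self.max_attempts = max_attempts

    def enqueue(self, job_id):
        job = Job(job_id)
        with self._lock:
            self._jobs.append(job)
        return job

    def claim(self):
        """Atomically hand the oldest queued job to exactly one worker."""
        with self._lock:
            for job in self._jobs:
                if job.status == "queued":
                    job.status = "picked_up"
                    job.attempts += 1
                    return job
        return None

    def mark_training(self, job):
        with self._lock:
            job.status = "training"

    def finish(self, job, ok):
        """Complete the job, or requeue it until max_attempts is reached."""
        with self._lock:
            if ok:
                job.status = "completed"
            elif job.attempts < self.max_attempts:
                job.status = "queued"
            else:
                job.status = "failed"


def run_once(queue, train_fn):
    """One worker iteration: claim a job, train, record the outcome."""
    job = queue.claim()
    if job is None:
        return None
    queue.mark_training(job)
    try:
        train_fn(job)
        queue.finish(job, ok=True)
    except Exception:
        queue.finish(job, ok=False)
    return job
```

Because `claim` changes the status while holding the lock, a second worker calling it at the same moment sees the job as already picked up and moves on.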
🧪

Walk-Forward Validation

Testing Like the Real World

We test exactly how a model would perform in real life: train on the past, predict the future. The window slides through time.

  • Train on 252 days, test on 21 days
  • Slide forward 21 days, repeat
  • 1-day purge gap prevents data leakage
  • Scaler only learns from training data
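
The sliding window can be sketched as a fold generator using the numbers above (252 training days, 1 purge day, 21 test days, step of 21). The function name `walk_forward_folds` and the use of plain integer day indices are assumptions for illustration.

```python
def walk_forward_folds(n_days, train=252, test=21, purge=1, step=21):
    """Return a list of (train_indices, test_indices) ranges over n_days.

    Each fold trains on `train` consecutive days, skips `purge` days,
    then tests on the next `test` days. The window then slides by `step`.
    """
    folds = []
    start = 0
    while start + train + purge + test <= n_days:
        train_idx = range(start, start + train)
        test_start = start + train + purge
        test_idx = range(test_start, test_start + test)
        folds.append((train_idx, test_idx))
        start += step
    return folds
```

With 316 days of data this yields three folds: the first trains on days 0 to 251 and tests on days 253 to 273, and each later fold starts 21 days after the previous one. Because the step equals the test length, consecutive test windows sit back to back without overlapping.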
📈

Dashboard & Results

Making Sense of It All

Interactive charts, ML experiments, backtesting, news sentiment, and drift monitoring — all unified in one dashboard.

  • Compare all 9 model types side by side
  • Equity curves show strategy performance
  • Drift monitor catches market changes
  • AI Insight Bot answers your questions
System Architecture

How Data Flows Through the System

Trace a request from your search to live results, one layer at a time.

👤 You Search "AAPL" → Next.js Dashboard → API Routes

1 Backend Processing
GET /api/ohlcv → Ingestion Pipeline → Yahoo Finance API → OHLCV + Indicators → TimescaleDB → Feature Matrix View

2 ML Training (on demand)
You Click "Train" → POST /api/jobs → Job Queue → Worker Polls → Model Pipeline → Walk-Forward + Leak Check → Metrics Computed → Results Saved

Dashboard Updates Live ✨

The Brain

9 ML Models, One Question

"Will this stock go up or down tomorrow?" — each model takes a different approach to answering it.

📏

Ridge Regression

Simple linear baseline

Baseline
📐

Logistic Regression

Classic up/down classifier

Baseline
🎯

SVM

Finds the best boundary

Traditional
🌲

Random Forest

Votes from many trees

Traditional
🚀

XGBoost

GPU-powered boosting

GPU Required
🔮

LSTM

Deep learning sequences

Deep Learning
🕹️

DQN

RL agent learns to trade

Reinforcement
🎮

PPO

Policy gradient agent

Reinforcement
🤖

A2C

Actor-Critic agent

Reinforcement

🏆 Fair Comparison Guaranteed

All 9 models train on the exact same data splits — same dates, same windows. They share a comparison group ID so you can view them side by side.

Accuracy
Sharpe Ratio
CAGR
Max Drawdown
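
For readers who want to see what these four metrics measure, here is a rough sketch computed from a list of daily strategy returns. It assumes 252 trading days per year and a zero risk-free rate for the Sharpe ratio; the page does not state how StockVisionz computes them, so the formulas below are the standard textbook versions and the function names are illustrative.

```python
import math
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualisation factor


def accuracy(predicted, actual):
    """Share of days where the predicted direction matched the real one."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


def sharpe_ratio(daily_returns):
    """Annualised mean return divided by volatility (risk-free rate = 0)."""
    if len(daily_returns) < 2:
        return 0.0
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(TRADING_DAYS)


def cagr(daily_returns):
    """Compound annual growth rate implied by the daily returns."""
    if not daily_returns:
        return 0.0
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    years = len(daily_returns) / TRADING_DAYS
    return growth ** (1.0 / years) - 1.0


def max_drawdown(daily_returns):
    """Largest peak-to-trough fall of the equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst
```

As a quick check, returns of +10%, -50%, +20% take the equity curve from 1.10 down to 0.55, so `max_drawdown` reports 0.5.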
Testing Integrity

Walk-Forward Validation Visualized

Train on the past, test on the future, slide forward. This is how the model would actually be used in real life.

Training (252 days ≈ 1 year)
Purge Gap (1 day)
Testing (21 days ≈ 1 month)

Each row is one "fold." The window slides forward by 21 days each time, so the model is always tested on data it has never seen — exactly like real trading.

Safety First

6 Guardrails Against Cheating

The #1 risk in stock ML is "data leakage" — accidentally letting the model see the future. These 6 automated checks run on every single training job.

🛡️ 1

Window Integrity

Training always ends before testing starts, with a purge gap in between.

2

Feature Lag Verification

All features shifted by 1 day — the model never sees "today's" data when predicting tomorrow.

🎯 3

Target Alignment

The prediction target is confirmed to be the next day's actual result — no future information leaks in.

⚖️ 4

Scaler Isolation

Data normalization fit only on training data. Test data is never used to compute the scaler.

🚫 5

No Future Data in Training

Database queries validated — no row from the future sneaks into the training set.

🔒 6

Cross-Window Contamination

Test sets across different folds never overlap — no data point is tested twice.
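
Three of these guardrails (window integrity, scaler isolation, and cross-window contamination) are simple enough to sketch directly. The code below is an illustration under assumed names and data shapes (folds as pairs of integer index ranges, a mean/standard-deviation scaler), not the checks StockVisionz actually runs.

```python
from statistics import mean, pstdev


def check_window_integrity(train_idx, test_idx, purge=1):
    """Check 1: training ends at least `purge` days before testing starts."""
    return max(train_idx) + purge < min(test_idx)


def check_no_test_overlap(folds):
    """Check 6: no day appears in the test set of more than one fold."""
    seen = set()
    for _, test_idx in folds:
        if seen.intersection(test_idx):
            return False
        seen.update(test_idx)
    return True


def fit_scaler(train_values):
    """Check 4: learn mean and spread from training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # avoid dividing by zero
    return mu, sigma


def apply_scaler(values, scaler):
    """Scale any values (train or test) with the training-only statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]
```

The key point of scaler isolation is the call order: `fit_scaler` sees only the training slice, and the test slice is passed to `apply_scaler` afterwards, so test-period statistics never influence the normalization.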

Built With

Technology Stack

The tools and frameworks powering StockVisionz.

Frontend

Next.js: Framework
React + Recharts: UI & Charts
Tailwind CSS: Styling
Framer Motion: Animations
Clerk: Authentication

Backend & ML

Python: Core Language
XGBoost + CUDA: GPU Training
PyTorch: LSTM Models
Stable-Baselines3: RL Agents
FinBERT: Sentiment AI

Infrastructure

Neon Postgres: Cloud Database
TimescaleDB: Time-Series
Sentry: Error Tracking
Gemini 2.5: AI Insights Bot
NVIDIA CUDA: GPU Compute