@FrankRuns
Created January 9, 2025 12:28
Prompt for a Python script that experiments with jittering the input training data for an ML model.
I want a Python script demonstrating a simple approach for “shaking up” that historical data. Specifically, show me how to:
Load the Boston Housing dataset (or a similar publicly available dataset).
Split the data into training and test sets.
Add a small amount of random noise (jitter) to the training set features.
Train one linear regression model on the unmodified data and another on the jittered data.
Compare the MSE (Mean Squared Error) of each model on the same test set.
For the jitter, just use a normal distribution with a small standard deviation, something like 0.01. Then show me how the MSE differs between the original and jittered data. If the jittered version yields a lower MSE, let me know in the script output. If it’s worse, let me know that, too.
Nothing too fancy, just enough that I can make a point about how “bad data” might become surprisingly helpful when we own the uncertainty and inject it. And please include some print statements that display the MSEs. That’s it.
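
One possible response to this prompt is sketched below; it is not the script the author actually received. It assumes scikit-learn and NumPy are installed, and it swaps in `load_diabetes` because the Boston Housing loader was removed from scikit-learn in version 1.2. The function name `compare_jitter`, the 80/20 split, and the seed are illustrative choices, not part of the prompt.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_jitter(noise_std=0.01, seed=42):
    """Return (mse_original, mse_jittered) measured on the same test set."""
    # Assumption: the bundled diabetes dataset stands in for Boston Housing.
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    # Jitter only the training features; the test set stays untouched.
    rng = np.random.default_rng(seed)
    X_train_jittered = X_train + rng.normal(0.0, noise_std, size=X_train.shape)

    # Same model class, same targets; only the training features differ.
    model_original = LinearRegression().fit(X_train, y_train)
    model_jittered = LinearRegression().fit(X_train_jittered, y_train)

    mse_original = mean_squared_error(y_test, model_original.predict(X_test))
    mse_jittered = mean_squared_error(y_test, model_jittered.predict(X_test))
    return mse_original, mse_jittered


if __name__ == "__main__":
    mse_original, mse_jittered = compare_jitter()
    print(f"MSE (original training data): {mse_original:.4f}")
    print(f"MSE (jittered training data): {mse_jittered:.4f}")
    if mse_jittered < mse_original:
        print("The jittered model has the lower MSE on this split.")
    elif mse_jittered > mse_original:
        print("The jittered model has the higher MSE on this split.")
    else:
        print("Both models have the same MSE on this split.")
```

Whether a standard deviation of 0.01 counts as "small" depends on the feature scale. The diabetes features are already centered and scaled to small magnitudes, so 0.01 is a noticeable perturbation there, while on features in raw units it would likely be negligible. The result also reflects a single split and a single noise draw, so repeating the comparison over several seeds gives a fairer picture than one run.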