Data Science & Simulation — Analysis, Modelling, Visualisation & Real-World Systems - Leejohnston - 11-13-2025

Data Science combines mathematics, computing, and statistics to extract meaningful insights from data. Simulation models recreate real-world systems so we can study behaviour, make predictions, and test scenarios.

This thread introduces the fundamentals in a practical, beginner-friendly way.

-----------------------------------------------------------------------

1. What Is Data Science?

Data Science involves:
• collecting data
• cleaning and organising data
• analysing trends
• building models
• visualising results
• making predictions

Example applications:
• predicting weather
• analysing stock markets
• detecting fraud
• modelling disease spread
• evaluating scientific experiments

-----------------------------------------------------------------------

2. Types of Data

Numerical data: quantities measured as numbers
• continuous (temperature)
• discrete (counts)

Categorical data: colours, labels, categories

Time series data: data measured over time

Text data: natural language, documents

Knowing the data type tells you which analysis methods apply.

-----------------------------------------------------------------------

3. Essential Tools of Data Science

Popular tools:
• Python
• NumPy (numerical computing)
• pandas (data tables)
• Matplotlib / Seaborn (visualisation)
• SciPy (scientific computing)
• scikit-learn (machine learning)
• Jupyter notebooks

Python is one of the most widely used languages for data science.

-----------------------------------------------------------------------

4. The Data Science Workflow

1. Collect data
   • sensors, simulations, databases, surveys
2. Clean data
   • remove errors
   • fill missing values
   • normalise formats
3. Analyse
   • summary statistics
   • correlations
   • distributions
4. Model
   • regression
   • classification
   • clustering
   • forecasting
5. Visualise
   • graphs
   • heatmaps
   • scatter plots
   • dashboards
6. Interpret
   • conclusions
   • insights
   • recommendations

-----------------------------------------------------------------------

5. Basic Statistical Concepts

• mean, median, mode
• range, variance, standard deviation
• correlation
• probability distributions
• confidence intervals

Understanding these supports every analysis.

-----------------------------------------------------------------------

6. Regression — Predicting Continuous Values

Regression finds the best-fit relationship between variables.

Example: Predicting house prices from:
• size
• number of rooms
• location

The model learns the coefficients a and b and the intercept c in:

Code:
price = a(size) + b(rooms) + c

Location is usually categorical, so it would need to be encoded numerically before it could be added as a further term.

-----------------------------------------------------------------------

7. Classification — Predicting Categories

Used for:
• spam vs non-spam
• disease vs healthy
• yes/no decisions
• image recognition

Common models:
• logistic regression
• decision trees
• random forests
• support vector machines

-----------------------------------------------------------------------

8. Data Visualisation

Visuals help reveal patterns.

Useful graphs:
• line charts (trends over time)
• bar charts (comparisons)
• histograms (distribution)
• scatter plots (correlation)
• box plots (spread & outliers)

Good visualisation = clear structure + correct axes + readable labels.

-----------------------------------------------------------------------

9. Simulation — Modelling Real Systems

Simulations recreate complex behaviour so we can test ideas safely.

Examples:
• orbital mechanics
• population dynamics
• chemical reactions
• traffic flow models
• weather and climate systems
• disease spread (SIR models)
• physics engines in games

Simulations help explore “what if?” scenarios.

-----------------------------------------------------------------------

10. Types of Simulation Models

Deterministic: same input → same output every time (e.g., classical physics)

Stochastic: includes randomness, so repeated runs can give different outputs (e.g., Monte Carlo simulations)

Agent-based models: individual agents follow rules (e.g., modelling animals, people, robots)

Differential equation models: describe continuous systems (e.g., population growth, disease spread)

-----------------------------------------------------------------------

11. Monte Carlo Simulation

Monte Carlo methods use repeated random sampling to estimate numerical results.

Used in:
• finance
• physics
• nuclear engineering
• AI
• risk assessment

Example: Estimating π by scattering random points over a square and counting the fraction that land inside the inscribed circle. That fraction approaches π/4 as the number of points grows.

-----------------------------------------------------------------------

12. Common Mistakes in Data Science

❌ Not cleaning the data
✔ poor data = poor model

❌ Confusing correlation with causation
✔ correlation ≠ cause

❌ Overfitting the model
✔ model memorises the training data instead of generalising

❌ Using incorrect visualisations
✔ wrong graph = misleading

❌ No train/test split
✔ evaluating on the training data leads to inflated accuracy

-----------------------------------------------------------------------

13. Practice Questions

1. Name two differences between regression and classification.
2. What is a Monte Carlo simulation used for?
3. Why is data cleaning important?
4. Explain one advantage of using Python for data science.
5. Give an example of deterministic vs stochastic simulation.

-----------------------------------------------------------------------

Summary

This post covered:
• data types
• tools
• workflow
• regression
• classification
• visualisation
• simulation theory
• Monte Carlo
• common mistakes
• practice questions

Data Science & Simulation form the bridge between raw information and deep understanding, and both are essential for research, engineering, AI, and scientific discovery.
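
Worked example for section 11 (Monte Carlo): the π estimate mentioned there can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation. The function name estimate_pi and its parameters num_points and seed are made up for this sketch. It samples points in the unit square and counts those inside the quarter circle of radius 1, which gives the same π/4 ratio as a full circle inside its bounding square.

```python
import random
from typing import Optional


def estimate_pi(num_points: int, seed: Optional[int] = None) -> float:
    """Estimate pi by uniform random sampling in the unit square.

    A point (x, y) with 0 <= x, y < 1 lies inside the quarter circle
    of radius 1 when x^2 + y^2 <= 1. The fraction of such points
    approximates pi / 4.
    """
    rng = random.Random(seed)  # private generator so runs are repeatable
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_points


if __name__ == "__main__":
    # More points usually means a closer estimate, but never an exact one.
    for n in (1_000, 100_000):
        print(n, estimate_pi(n, seed=42))
```

Passing a fixed seed makes the run repeatable, which is handy for checking answers; leaving it as None gives a different estimate each time, which shows the stochastic behaviour described in section 10.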