The Lumin Archive
Data Science & Simulation — Analysis, Modelling, Visualisation & Real-World Systems - Printable Version

Data Science & Simulation — Analysis, Modelling, Visualisation & Real-World Systems - Leejohnston - 11-13-2025

Data Science combines mathematics, computing, and statistics to extract meaningful insights from data. 
Simulation models recreate real-world systems so we can study behaviour, make predictions, and test scenarios.

This thread introduces the fundamentals in a practical, beginner-friendly way.

-----------------------------------------------------------------------

1. What Is Data Science?

Data Science involves:
• collecting data 
• cleaning and organising data 
• analysing trends 
• building models 
• visualising results 
• making predictions 

Example applications:
• predicting weather 
• analysing stock markets 
• detecting fraud 
• modelling disease spread 
• evaluating scientific experiments 

-----------------------------------------------------------------------

2. Types of Data

Numerical data: numbers 
• continuous (temperature) 
• discrete (counts)

Categorical data: colours, labels, categories 

Time series data: data measured over time 

Text data: natural language, documents 

Knowing the data type tells you which analysis method to choose.

-----------------------------------------------------------------------

3. Essential Tools of Data Science

Popular tools:
• Python 
• NumPy (numerical computing) 
• pandas (data tables) 
• Matplotlib / Seaborn (visualisation) 
• SciPy (scientific computing) 
• scikit-learn (machine learning) 
• Jupyter notebooks 

Python is one of the most widely used languages for data science, largely because of the libraries listed above.

-----------------------------------------------------------------------

4. The Data Science Workflow

1. Collect data 
• sensors, simulations, databases, surveys 

2. Clean data 
• remove errors 
• fill missing values 
• normalise formats 

3. Analyse 
• summary statistics 
• correlations 
• distributions 

4. Model 
• regression 
• classification 
• clustering 
• forecasting 

5. Visualise 
• graphs 
• heatmaps 
• scatter plots 
• dashboards 

6. Interpret 
• conclusions 
• insights 
• recommendations 
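
The clean and analyse steps above can be sketched in a few lines of plain Python. This is only an illustration: the records, the field names (city, temp_c) and the choice to fill gaps with the median are all assumptions made for the example, not a recommended recipe.

```python
import statistics

# Hypothetical raw records: inconsistent labels and one missing value.
raw = [
    {"city": " London ", "temp_c": 14.0},
    {"city": "london", "temp_c": None},
    {"city": "LEEDS", "temp_c": 11.0},
    {"city": "Leeds ", "temp_c": 12.0},
]

def clean(records):
    """Normalise city names and fill missing temperatures with the median."""
    known = [r["temp_c"] for r in records if r["temp_c"] is not None]
    fill = statistics.median(known)
    return [
        {
            "city": r["city"].strip().lower(),
            "temp_c": fill if r["temp_c"] is None else r["temp_c"],
        }
        for r in records
    ]

def mean_by_city(records):
    """Analyse: average temperature per city."""
    groups = {}
    for r in records:
        groups.setdefault(r["city"], []).append(r["temp_c"])
    return {city: statistics.mean(vals) for city, vals in groups.items()}

cleaned = clean(raw)
summary = mean_by_city(cleaned)
print(summary)
```

In real projects pandas would usually handle both steps, but the logic is the same: fix formats first, then summarise.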

-----------------------------------------------------------------------

5. Basic Statistical Concepts

• mean, median, mode 
• range, variance, standard deviation 
• correlation 
• probability distributions 
• confidence intervals 

Understanding these supports every analysis.
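
As a rough sketch, most of these quantities can be computed with Python's built-in statistics module. The two lists are made-up sample data, the correlation is a hand-written Pearson coefficient, and the confidence interval uses the normal approximation (multiplier 1.96), which is only a crude guide for a sample this small.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample
hours = [1, 2, 2, 3, 3, 4, 5, 6]    # made-up paired variable

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # population variance
std_dev = statistics.pstdev(scores)       # population standard deviation

def pearson(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson(hours, scores)

# Approximate 95% confidence interval for the mean (normal approximation).
half_width = 1.96 * statistics.stdev(scores) / math.sqrt(len(scores))
ci_low, ci_high = mean - half_width, mean + half_width

print(mean, median, mode, value_range, variance, std_dev)
print(round(r, 3), (round(ci_low, 2), round(ci_high, 2)))
```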

-----------------------------------------------------------------------

6. Regression — Predicting Continuous Values

Regression finds the best-fit relationship between variables.

Example:
Predicting house prices from:
• size 
• number of rooms 
• location 

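The model learns a weight for each numeric input plus an intercept c (location is categorical, so it must be encoded as numbers before it can get a weight of its own):
Code:
price = a*size + b*rooms + c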
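
To show how such coefficients can be found, here is a minimal sketch of ordinary least squares with a single predictor (size only, so the rooms term is dropped). The data is invented and noise-free, generated from price = 2000*size + 50000, so the fit should recover those numbers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + c with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                 # slope
    c = mean_y - a * mean_x       # intercept
    return a, c

# Invented data: price = 2000*size + 50000, no noise.
sizes = [50, 70, 90, 110, 130]
prices = [150000, 190000, 230000, 270000, 310000]

a, c = fit_line(sizes, prices)
predicted = a * 100 + c           # predicted price for size 100
print(a, c, predicted)
```

With several predictors the same idea applies, but the coefficients are normally solved with a library such as NumPy or scikit-learn rather than by hand.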

-----------------------------------------------------------------------

7. Classification — Predicting Categories

Used for:
• spam vs non-spam 
• disease vs healthy 
• yes/no decisions 
• image recognition 

Common models:
• logistic regression 
• decision trees 
• random forests 
• support vector machines 
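
As an illustration of the first model in the list, the sketch below trains a one-feature logistic regression by plain gradient descent. The data (hours studied vs pass/fail), the learning rate and the step count are all assumptions chosen for the example. The feature is centred by subtracting its mean, which is a common preprocessing step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, steps=1000):
    """Fit p(y=1|x) = sigmoid(w*(x - mean_x) + b) by gradient descent."""
    n = len(xs)
    mean_x = sum(xs) / n
    centred = [x - mean_x for x in xs]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Prediction error for each example under the current w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(centred, ys)]
        grad_w = sum(e * x for e, x in zip(errors, centred)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, mean_x

def predict(model, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    w, b, mean_x = model
    return 1 if sigmoid(w * (x - mean_x) + b) >= 0.5 else 0

hours = [1, 2, 3, 4, 5, 6, 7, 8]      # made-up feature
passed = [0, 0, 0, 0, 1, 1, 1, 1]     # made-up labels

model = train_logistic(hours, passed)
print(predict(model, 2), predict(model, 7))
```

In practice scikit-learn's LogisticRegression would be the usual choice; the hand-written loop is only there to show what "learning" means here.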

-----------------------------------------------------------------------

8. Data Visualisation

Visuals help reveal patterns.

Useful graphs:
• line charts (trends over time) 
• bar charts (comparisons) 
• histograms (distribution) 
• scatter plots (correlation) 
• box plots (spread & outliers) 

Good visualisation = clear structure + correct axes + readable labels.
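
To make the histogram idea concrete, here is a small sketch of the binning step that sits behind one, drawn as text so it needs no plotting library. The ages are invented and the bin width of 10 is an arbitrary choice. A library such as Matplotlib would normally do the binning and drawing for you.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

ages = [21, 23, 25, 31, 34, 35, 38, 42, 47, 51]   # made-up data
bins = histogram(ages, 10)

# One row per bin, one '#' per value in that bin.
for start, count in bins.items():
    print(f"{start}-{start + 9} | {'#' * count}")
```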

-----------------------------------------------------------------------

9. Simulation — Modelling Real Systems

Simulations recreate complex behaviour so we can test ideas safely.

Examples:
• orbital mechanics 
• population dynamics 
• chemical reactions 
• traffic flow models 
• weather and climate systems 
• disease spread (SIR models) 
• physics engines in games 

Simulations help explore “what if?” scenarios.

-----------------------------------------------------------------------

10. Types of Simulation Models

Deterministic: same input → same output every time 
(e.g., classical physics)

Stochastic: includes randomness 
(e.g., Monte Carlo simulations)

Agent-based models: individual agents follow rules 
(e.g., modelling animals, people, robots)

Differential equation models: describe continuous systems 
(e.g., population growth, disease spread)
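
The sketch below shows a differential equation model that is also deterministic: the classic SIR disease model stepped forward with the simple Euler method. The population size, beta (infection rate), gamma (recovery rate) and the step size dt are illustrative assumptions, and Euler is the simplest integrator rather than the most accurate one.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Euler steps for dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I, where N = S + I + R stays constant."""
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history

run_a = simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=100)
run_b = simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=100)

peak_infected = max(i for _, _, i, _ in run_a)
print(run_a == run_b, round(peak_infected))
```

Running it twice with the same inputs gives identical histories, which is exactly what "deterministic" means. Adding random draws to the infection step would turn it into a stochastic model.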

-----------------------------------------------------------------------

11. Monte Carlo Simulation

Monte Carlo methods use random sampling to estimate results.

Used in:
• finance 
• physics 
• nuclear engineering 
• AI 
• risk assessment 

Example:
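Estimating π by scattering random points over a square and counting the fraction that land inside the inscribed circle (that fraction approaches π/4).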
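
A minimal sketch of that π estimate, assuming a square from -1 to 1 on both axes with a unit circle inside it. The sample count and the fixed seed are arbitrary choices made so the run is repeatable.

```python
import math
import random

def estimate_pi(samples, seed=0):
    """Estimate pi from the share of random points that fall inside the unit circle."""
    rng = random.Random(seed)          # seeded so results are repeatable
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of circle / area of square = pi / 4.
    return 4.0 * inside / samples

estimate = estimate_pi(100_000)
print(estimate, abs(estimate - math.pi))
```

The error shrinks roughly with the square root of the number of samples, so 100 times more points buys about one extra decimal digit.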

-----------------------------------------------------------------------

12. Common Mistakes in Data Science

❌ Not cleaning the data 
✔ poor data = poor model

❌ Confusing correlation with causation 
✔ correlation ≠ cause

❌ Overfitting the model 
✔ model memorises instead of generalising

❌ Using incorrect visualisations 
✔ wrong graph = misleading

❌ No train/test split 
✔ accuracy measured on the training data is inflated
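
The last two mistakes can be seen together in a small sketch. It uses a 1-nearest-neighbour classifier, which simply memorises its training points, on invented data where 20% of the labels are deliberately flipped. The data generator, the 150/50 split and the seed are all assumptions for illustration.

```python
import random

def make_data(n, rng):
    """Label is 1 when x > 0.5, but about 20% of labels are flipped (noise)."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, data):
    hits = sum(predict_1nn(train, x) == label for x, label in data)
    return hits / len(data)

rng = random.Random(42)
data = make_data(200, rng)
rng.shuffle(data)
train, test = data[:150], data[150:]   # hold out 50 points for testing

train_acc = accuracy(train, train)     # scored on data the model has seen
test_acc = accuracy(train, test)       # scored on unseen data
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The training score is perfect because every point is its own nearest neighbour, noise included. The held-out score is the honest one, and with noisy labels like these it will typically be clearly lower.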

-----------------------------------------------------------------------

13. Practice Questions

1. Name two differences between regression and classification. 
2. What is a Monte Carlo simulation used for? 
3. Why is data cleaning important? 
4. Explain one advantage of using Python for data science. 
5. Give an example of deterministic vs stochastic simulation. 

-----------------------------------------------------------------------

Summary

This post covered:
• data types 
• tools 
• workflow 
• basic statistics 
• regression 
• classification 
• visualisation 
• simulation theory 
• Monte Carlo 
• common mistakes 
• practice questions 

Data Science & Simulation form the bridge between raw information and deep understanding — essential for research, engineering, AI, and scientific discovery.