COM6018 Data Science with Python

Datasets vs Databases

Data scientists often talk about datasets; a dataset should not be confused with a database.

A dataset is a structured collection of data, typically stored in a single file.
A database is a broader concept: typically a collection of datasets stored in, and managed by, a database management system (DBMS).

The two are used for different purposes:

A database will typically be designed to allow efficient querying, i.e. recalling information, cross-referencing items, etc.
Data scientists are more often interested in learning something from the dataset as a whole.

e.g. compare 'what was the weather yesterday?' (a query: look up a stored record) with 'what will the weather be tomorrow?' (a prediction: learn a pattern from all the records)
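
The weather comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the temperature readings are made up, the table and column names are hypothetical, and the straight-line fit stands in for whatever model a data scientist might actually choose. The first half uses the data the way a database is used (look up one stored record); the second half uses the whole dataset to estimate a value that is not stored anywhere.

```python
import sqlite3

# Hypothetical toy data: (day number, temperature in degrees C). Values are made up.
readings = [(1, 11.0), (2, 12.0), (3, 13.0), (4, 14.0), (5, 15.0)]
yesterday_day = 5  # the most recent recorded day
tomorrow_day = 7   # today would be day 6, so tomorrow is day 7

# Database-style use: store the records, then recall one of them with a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (day INTEGER PRIMARY KEY, temp_c REAL)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", readings)
row = conn.execute(
    "SELECT temp_c FROM weather WHERE day = ?", (yesterday_day,)
).fetchone()
yesterday_temp = row[0]
conn.close()

# Data-science-style use: fit a least-squares line to ALL the readings,
# then use it to predict a day that has no record.
n = len(readings)
mean_day = sum(d for d, _ in readings) / n
mean_temp = sum(t for _, t in readings) / n
slope = (
    sum((d - mean_day) * (t - mean_temp) for d, t in readings)
    / sum((d - mean_day) ** 2 for d, _ in readings)
)
intercept = mean_temp - slope * mean_day
tomorrow_temp = intercept + slope * tomorrow_day

print(f"Yesterday (queried):   {yesterday_temp:.1f} C")
print(f"Tomorrow  (predicted): {tomorrow_temp:.1f} C")
```

The query touches a single row and returns exactly what was stored, whereas the prediction depends on every row and returns an estimate.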