516 Big data and web architecture
Understand how the growth of digital data is shaping how web systems are designed, managed, and scaled.
Overview
In this topic, we explore how the rise of big data is transforming web architecture. From streaming platforms to recommendation engines, modern web applications increasingly depend on processing large volumes of data. Students examine how this trend affects the structure and design of web systems, with a focus on metadata, data mining, and the challenges of managing continuous data streams.
Targets
In this topic, students learn to:
Define big data and identify its sources in web applications
Describe the role of metadata in filtering, sorting, and categorising online content
Explain how data mining is used in personalisation and analytics
Understand how streaming services deliver large volumes of data in real time
Recognise how big data influences scalability and system design
What is big data?
Big data refers to datasets that are too large, too fast-growing, or too varied to be stored and processed with traditional tools such as a single database server. Web platforms such as YouTube, Spotify, and Amazon handle enormous volumes of data that must be stored, retrieved, and analysed efficiently. These systems are designed to scale horizontally (adding more servers rather than upgrading a single machine) so that they can provide rapid, real-time access to very large datasets.
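One common way to scale storage horizontally is to split (shard) a dataset across several servers. The sketch below is a minimal illustration, assuming a hypothetical cluster of four nodes named `node-0` to `node-3`; it assigns each record to a node by hashing its key. It is not how any particular platform does it.

```python
import hashlib

# Hypothetical storage nodes; names are illustrative only.
NODES = ["node-0", "node-1", "node-2", "node-3"]


def node_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick a node for a record key using a stable hash (simple modulo sharding)."""
    # hashlib gives the same digest on every run, unlike Python's built-in hash()
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def partition(keys: list[str], nodes: list[str] = NODES) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    shards: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        shards[node_for(key, nodes)].append(key)
    return shards


if __name__ == "__main__":
    video_ids = [f"video-{i}" for i in range(10)]
    for node, ids in partition(video_ids).items():
        print(node, ids)
```

Because every key maps to a fixed node, any server can work out where a record lives without a central lookup. The drawback of plain modulo hashing is that changing the number of nodes moves most keys to a different node, which is why larger systems often use consistent hashing instead.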
The role of metadata
Metadata is data about data. In web applications, it is used to:
Describe content (e.g. title, genre, creator, tags)
Improve search and filtering
Power recommendation engines
For example, video platforms use metadata to sort videos by category, popularity, or publication date.
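To make the video example concrete, here is a minimal sketch of filtering and sorting by metadata. The `Video` fields and the sample catalogue are hypothetical; a real platform would run equivalent queries against a database or search index rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Video:
    """Metadata describing one video (fields are illustrative)."""
    title: str
    category: str
    views: int
    published: date


CATALOGUE = [
    Video("Intro to HTTP", "education", 12_000, date(2024, 3, 1)),
    Video("Cat compilation", "comedy", 950_000, date(2023, 11, 20)),
    Video("CSS Grid basics", "education", 48_000, date(2024, 5, 12)),
    Video("Stand-up special", "comedy", 300_000, date(2024, 1, 8)),
]


def by_category(videos, category):
    """Filter: keep only videos whose category metadata matches."""
    return [v for v in videos if v.category == category]


def most_popular(videos):
    """Sort by view count, highest first."""
    return sorted(videos, key=lambda v: v.views, reverse=True)


def newest_first(videos):
    """Sort by publication date, most recent first."""
    return sorted(videos, key=lambda v: v.published, reverse=True)


if __name__ == "__main__":
    education = by_category(CATALOGUE, "education")
    print([v.title for v in newest_first(education)])
    # ['CSS Grid basics', 'Intro to HTTP']
    print([v.title for v in most_popular(CATALOGUE)][:2])
    # ['Cat compilation', 'Stand-up special']
```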

Metadata helps web applications categorise and retrieve content efficiently, supporting personalised search and user experiences.
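Search is usually made fast by indexing metadata ahead of time. The sketch below builds a simple inverted index from tags to video ids, so a tag query becomes a set lookup instead of a scan of every video. The tag data and function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical tag metadata: video id -> set of tags
VIDEO_TAGS = {
    "v1": {"web", "http", "beginner"},
    "v2": {"web", "css"},
    "v3": {"cats", "funny"},
    "v4": {"web", "css", "beginner"},
}


def build_tag_index(video_tags):
    """Map each tag to the set of video ids carrying it (an inverted index)."""
    index = defaultdict(set)
    for video_id, tags in video_tags.items():
        for tag in tags:
            index[tag].add(video_id)
    return index


def search(index, *tags):
    """Return ids of videos that carry every requested tag, sorted for stable output."""
    if not tags:
        return []
    # .get avoids inserting empty entries into the defaultdict for unknown tags
    matches = set.intersection(*(index.get(tag, set()) for tag in tags))
    return sorted(matches)


if __name__ == "__main__":
    index = build_tag_index(VIDEO_TAGS)
    print(search(index, "web", "css"))  # ['v2', 'v4']
```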

Data mining in web systems
Data mining is the process of finding patterns in large datasets. In web applications, it is used to:
Recommend products, songs, or videos based on user behaviour
Identify trends and predict demand
Detect anomalies or potential security issues
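A very simple form of recommendation counts which items tend to appear together in users' histories ("people who watched X also watched Y"). The sketch below assumes hypothetical viewing histories and scores each unseen item by how often it co-occurs with items the user has already watched. Production recommenders use far more sophisticated models; this only shows the underlying idea of mining behaviour for patterns.

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing histories: user -> set of watched item ids
HISTORIES = {
    "ana": {"A", "B", "C"},
    "ben": {"A", "B"},
    "cho": {"B", "C", "D"},
    "dev": {"A", "C"},
}


def co_occurrence(histories):
    """Count how often each ordered pair of items appears in the same user's history."""
    counts = Counter()
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts


def recommend(user, histories, k=2):
    """Suggest up to k unseen items, scored by co-occurrence with what the user watched."""
    seen = histories[user]
    scores = Counter()
    for (x, y), n in co_occurrence(histories).items():
        if x in seen and y not in seen:
            scores[y] += n
    # Highest score first; ties broken alphabetically so output is deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    print(recommend("ben", HISTORIES))  # ['C', 'D']
```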
Websites often analyse usage patterns to improve performance or tailor content to individual users.
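As an example of analysing usage patterns for performance, the sketch below counts requests per path in a made-up access log and returns the most requested paths, which a site might choose to cache. The log format (`timestamp user path`) and the function name are assumptions for illustration.

```python
from collections import Counter

# Hypothetical access-log lines in the form "<timestamp> <user> <path>"
LOG_LINES = [
    "2024-05-01T10:00:01 u1 /home",
    "2024-05-01T10:00:03 u2 /video/42",
    "2024-05-01T10:00:04 u3 /video/42",
    "2024-05-01T10:00:09 u1 /video/7",
    "2024-05-01T10:00:12 u4 /video/42",
    "2024-05-01T10:00:15 u2 /home",
]


def hot_paths(lines, top_n=2):
    """Return the top_n most requested paths, e.g. as candidates for caching."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3:  # skip malformed lines
            counts[parts[2]] += 1
    return [path for path, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    print(hot_paths(LOG_LINES))  # ['/video/42', '/home']
```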
Streaming service management
Streaming platforms must deliver data continuously with low latency. This shapes web architecture by requiring:
Data buffering to absorb network interruptions and fluctuations in connection speed
Content Delivery Networks (CDNs), which serve content from servers close to the user to reduce load times
Load balancing to distribute requests evenly across servers
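Buffering can be illustrated with a small simulation. The sketch below assumes the network delivers a varying number of video chunks per tick (0 represents an interruption) and the player consumes one chunk per tick once a startup threshold is reached; it counts the ticks on which playback stalls. The model is deliberately simplified and the function name is hypothetical.

```python
from collections import deque


def simulate_playback(arrivals, startup_chunks=3):
    """
    Simulate a simplified playback buffer.

    arrivals[t] is how many chunks the network delivers at tick t
    (0 models a network interruption). Playback starts once
    startup_chunks are buffered, then consumes one chunk per tick.
    Returns the number of ticks on which playback stalled.
    """
    buffer = deque()
    playing = False
    stalls = 0
    next_chunk = 0
    for delivered in arrivals:
        for _ in range(delivered):
            buffer.append(next_chunk)  # store chunk ids in arrival order
            next_chunk += 1
        if not playing and len(buffer) >= startup_chunks:
            playing = True
        if playing:
            if buffer:
                buffer.popleft()  # play one chunk this tick
            else:
                stalls += 1  # buffer ran dry, so the viewer sees a stall


    return stalls


if __name__ == "__main__":
    bursty = [1, 1, 1, 0, 0, 0, 1, 1]
    print(simulate_playback(bursty, startup_chunks=1))  # 3
    print(simulate_playback(bursty, startup_chunks=3))  # 1
```

Waiting for more chunks before starting (a larger `startup_chunks`) reduces stalls at the cost of a longer initial delay, which is the basic trade-off a streaming player has to balance.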
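Transmitting data in real time creates challenges for latency (the delay before content plays), bandwidth (how much data must be sent per second), and storage (space for large media libraries).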
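A quick calculation shows why bandwidth and storage become problems at scale. The sketch below assumes a stream bitrate of 5 Mbps (an illustrative figure, not a platform specification) and uses decimal units (1 GB = 10^9 bytes); the function names are hypothetical.

```python
def stream_bandwidth_gbps(bitrate_mbps, concurrent_viewers):
    """Total outbound bandwidth in gigabits per second for many viewers of one stream."""
    return bitrate_mbps * concurrent_viewers / 1000


def storage_gb(bitrate_mbps, hours):
    """Storage needed for a recorded stream, in gigabytes (1 GB = 10**9 bytes)."""
    bits = bitrate_mbps * 1_000_000 * hours * 3600
    return bits / 8 / 1_000_000_000  # 8 bits per byte


if __name__ == "__main__":
    print(stream_bandwidth_gbps(5, 10_000))  # 50.0
    print(storage_gb(5, 1))  # 2.25
```

Under these assumptions, 10,000 simultaneous viewers need about 50 Gbps of outbound bandwidth, and each hour of stored video takes about 2.25 GB. Spreading that traffic across CDN edge servers and load-balanced origin servers is one reason streaming architectures are distributed.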
Summary
Big data influences how web systems are designed and scaled. Metadata makes content searchable and customisable, data mining helps systems learn from behaviour, and streaming services require fast, distributed infrastructure. Understanding these forces is critical for building scalable, data-driven web applications.