NPTEL Business Intelligence & Analytics Week 1 and 2 Assignment Answers 2025

1. What does KDD stand for in the context of data mining?
a) Key Data Development
✅ b) Knowledge Discovery from Data
c) Knowledge Data Design
d) Key Data Distribution

Explanation:
KDD stands for Knowledge Discovery from Data: the process of identifying valid, novel, and potentially useful patterns in data.

2. Which of the following does NOT belong to the data preparation phase?
a) Data cleaning
b) Data integration
✅ c) Data visualization
d) Data selection

Explanation:
Data visualization is part of the analysis/presentation phase, not data preparation.

3. Which type of data is specifically classified as unstructured?
a) A table of employee records with defined columns
✅ b) Free-text comments collected from customer surveys
c) Data logged in sequential order with timestamps
d) Numerical data organized in a consistent matrix

Explanation:
Free-text data lacks a pre-defined data model, making it unstructured.

4. In real world applications, data can often be a mixture of:
a) Only structured data
b) Only unstructured data
✅ c) Structured, semistructured, and unstructured data
d) Structured and unorganized data

Explanation:
Real-world data sources are varied, often combining multiple data types.

5. A car company has collected sales data, customer feedback, and production reports over the past five years in various formats. To analyze total sales by region and model across this period, the data must undergo ______________ to consolidate, aggregate, and restructure it for meaningful analysis.
a) Data integration
b) Data normalization
✅ c) Data transformation
d) Data cleaning

Explanation:
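Data transformation consolidates, aggregates, and restructures data into a form suited to analysis, for example summing individual sales records into totals per region and model.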
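
The aggregation step can be sketched in a few lines of Python. This is only an illustration under assumed data: the records, field names (region, model, units), and the function name are hypothetical and do not come from the assignment.

```python
from collections import defaultdict

# Hypothetical sales records; field names and values are made up for illustration.
sales = [
    {"region": "North", "model": "Sedan", "units": 120},
    {"region": "North", "model": "SUV", "units": 80},
    {"region": "South", "model": "Sedan", "units": 95},
    {"region": "North", "model": "Sedan", "units": 30},
]

def total_by_region_and_model(records):
    """Aggregate unit sales per (region, model) pair."""
    totals = defaultdict(int)
    for row in records:
        totals[(row["region"], row["model"])] += row["units"]
    return dict(totals)

totals = total_by_region_and_model(sales)
```

Here the two North/Sedan records collapse into a single total, which is the kind of restructuring the question describes.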

6. Which of the following is an example of how online platforms like Amazon enhance the long tail phenomenon discussed in the lecture?
a) Offering only best-selling items in stock
✅ b) Providing recommendations for less popular books based on user preferences
c) Limiting the variety of products available for faster shipping
d) Only stocking products that have been successful in physical stores

Explanation:
By recommending less popular titles that match a user's preferences, Amazon helps niche products find buyers, which is the core of the long tail effect.

7. As discussed in the lecture, in the context of analytics, which of the following terms best represents “DSS”?
a) Data Storage Systems
b) Digital Software Solutions
✅ c) Decision Support Systems
d) Data Security Services

Explanation:
A Decision Support System (DSS) uses data and analysis to help people make decisions.

8. What does ETL stand for in the context of data management?
✅ a) Extract, Transform, Load
b) Evaluate, Transfer, Load
c) Extract, Test, Load
d) Evaluate, Transform, Load

Explanation:
ETL refers to the process of extracting data, transforming it for analysis, and loading it into a storage system.
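
A minimal sketch of the three ETL stages, assuming a small CSV source and an in-memory SQLite database as the target. The sample data, table name, and function names are hypothetical; a real pipeline would read from operational systems and load into a warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file or an operational system.
RAW_CSV = "order_id,amount,country\n1, 19.99 ,in\n2,5.00,IN\n3,,us\n"

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace, normalize country codes, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((int(row["order_id"]), float(amount), row["country"].strip().upper()))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT order_id, amount, country FROM orders ORDER BY order_id").fetchall()
```

The row with a missing amount is dropped in the transform stage, so only two cleaned rows reach the target table.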

9. As discussed in the lecture, which of the following terms is NOT usually associated with the vocabulary of analytics?
a) Data Warehousing
b) Data Governance
c) Online Analytical Processing (OLAP)
✅ d) Knowledge Discovery in Databases (KDD)

Explanation:
KDD is a data mining term for the overall knowledge discovery process. According to the lecture, it is not part of the usual analytics vocabulary, unlike data warehousing, data governance, and OLAP.

10. What is an outlier in a data set?
a) A data object that conforms to the general behavior of the data
✅ b) A data object that does not comply with the general behavior or model of the data
c) A common occurrence within the data set
d) A data point that represents average behavior

Explanation:
Outliers deviate significantly from other observations.
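
One common way to flag such points is the 1.5 × IQR rule. The assignment does not prescribe a detection method, so the rule, the threshold, and the sample values below are assumptions chosen for illustration.

```python
import statistics

def find_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily order counts; one day is far from the rest.
daily_orders = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(daily_orders)
```

With this sample, only the value 95 falls outside the computed bounds.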

11. Which of the following is an example of supervised learning?
a) Grouping data points into clusters based on inherent similarities
✅ b) Diagnosing diseases based on labeled medical data
c) Discovering patterns in data without predefined categories
d) Organizing items into groups based on unknown attributes

Explanation:
Supervised learning uses labeled data for prediction or classification.
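
A toy sketch of the idea, using a 1-nearest-neighbor classifier as one possible supervised method. The features, labels, and values are hypothetical and far too small for real diagnosis; the point is only that every training example carries a known label.

```python
# Hypothetical labeled examples: (body temperature in °C, cough severity 0-10) -> label.
training = [
    ((36.8, 1), "healthy"),
    ((37.0, 2), "healthy"),
    ((38.9, 7), "flu"),
    ((39.2, 8), "flu"),
]

def predict(features, labeled_examples):
    """Return the label of the closest training example (1-nearest neighbor)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(labeled_examples, key=lambda ex: squared_distance(ex[0], features))
    return label

new_patient = predict((39.0, 6), training)
```

The prediction for a new case is borrowed from the most similar labeled case, which is what distinguishes this from the unsupervised options in the question.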

12. What is the primary function of a data warehouse?
a) Increases the security of data
✅ b) Integrates data from multiple sources
c) Optimizes search engine results
d) Analyzes data in real-time

Explanation:
Data warehouses consolidate information for reporting and analysis.

13. In the context of business intelligence, what is the role of clustering?
a) To create data warehouses
✅ b) To group customers based on their similarities
c) To enhance data visualization
d) To manage real-time data

Explanation:
Clustering is an unsupervised learning technique to group similar data points.
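
As a rough illustration, the sketch below runs a tiny one-dimensional k-means on customer spend. k-means is just one clustering method, and the spend figures, starting centroids, and function name are assumptions made for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data with caller-supplied starting centroids."""
    centroids = list(centroids)
    groups = []
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest centroid.
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Update step: move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups

# Hypothetical monthly spend per customer.
spend = [20, 25, 22, 310, 290, 305]
low_spenders, high_spenders = kmeans_1d(spend, centroids=[0, 100])
```

No labels are supplied: the two customer groups emerge only from how close the spend values are to each other.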

14. How does data mining contribute to business intelligence?
a) By transforming raw data into meaningful information
✅ b) By providing insights from historical and current data
c) By ensuring the security of sensitive data
d) By automating data storage processes

Explanation:
Data mining uncovers patterns and trends for better decisions.

15. In the context of data mining, which of the following best describes the nature of relational databases?
✅ a) They store highly structured data with predefined attributes and semantic meaning
b) They store unstructured data such as images, audio, and text
c) They dynamically adjust to various data types and structures without predefined constraints
d) They are specifically designed for organizing large amounts of multimedia content

Explanation:
Relational databases use structured schemas and tables for organizing data.

NPTEL Business Intelligence & Analytics Week 2 Assignment Answers

1. Data warehouses provide __________ tools for interactive analysis of multidimensional data of varied granularities.
a) Data mining
✅ b) Online Analytical Processing (OLAP)
c) Transaction processing
d) Data visualization

Explanation:
OLAP tools support complex queries and multidimensional analysis in data warehouses.
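
The notion of "varied granularities" can be sketched as a roll-up over chosen dimensions. This is a simplified stand-in for what an OLAP tool does, with hypothetical fact rows and dimension names.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, sales amount).
facts = [
    (2023, "Q1", "North", 100),
    (2023, "Q2", "North", 150),
    (2023, "Q1", "South", 80),
    (2024, "Q1", "North", 120),
]
DIMENSIONS = {"year": 0, "quarter": 1, "region": 2}

def roll_up(rows, keep):
    """Sum sales over every dimension not listed in `keep`."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in keep)
        totals[key] += row[3]
    return dict(totals)

by_year_region = roll_up(facts, ["year", "region"])  # finer granularity
by_year = roll_up(facts, ["year"])                   # coarser granularity
```

The same facts answer both questions; only the set of dimensions kept in the grouping key changes.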

2. What was the primary business problem faced by AT&T Long Distance?
a) Inefficient telemarketing campaigns
✅ b) Difficulty in acquiring new customers
c) Lack of technology for data analysis
d) Insufficient funding for marketing

Explanation:
AT&T struggled with acquiring new customers efficiently using traditional methods.

3. What is the purpose of data cleaning and integration techniques in the construction of a data warehouse?
a) To enhance the speed of data retrieval
✅ b) To ensure consistency in naming conventions, encoding structures, and attribute measures
c) To permanently delete irrelevant data
d) To store data in multiple formats for redundancy

Explanation:
Cleaning and integration help maintain uniformity across diverse data sources.
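
A small sketch of what this consistency work can look like, assuming two source systems that name and encode the same attributes differently. The systems, field names, and code mappings are all hypothetical.

```python
# Hypothetical records from two source systems that describe customers differently.
crm_rows = [{"cust_id": 1, "gender": "M", "height_cm": 180}]
billing_rows = [{"customer": 2, "sex": "female", "height_in": 65}]

# One agreed encoding for the gender attribute.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def from_crm(row):
    """Map a CRM record onto the warehouse naming and encoding conventions."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": float(row["height_cm"]),
    }

def from_billing(row):
    """Map a billing record: rename fields, recode gender, convert inches to centimeters."""
    return {
        "customer_id": row["customer"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }

unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

After mapping, every record uses the same attribute names, the same gender codes, and the same unit of measure.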

4. Which of the following best describes the “nonvolatile” nature of data in a data warehouse?
a) Data is constantly changing and requires frequent updates
✅ b) Data remains stable and is not subject to regular deletions or modifications
c) Data is always accessed for real-time transactions
d) Data is stored temporarily and can be easily altered

Explanation:
Once data is loaded into a warehouse, it is not updated or deleted in place; it is kept stable and read for analysis.

5. An OLTP system usually adopts an _________ data model.
a) Hierarchical
✅ b) Entity-Relationship
c) Object-Relational
d) Network

Explanation:
OLTP systems typically use an entity-relationship (ER) model with an application-oriented design, whereas OLAP systems usually use a star or snowflake model.

6. What is the primary characteristic of OLAP system access patterns?
a) Frequent updates and live alterations to the data
b) Real-time transaction processing
✅ c) Primarily read-only operations
d) Atomic transactions

Explanation:
OLAP focuses on data analysis, which involves mostly read operations.

7. Why is it not recommended to process complex OLAP queries on operational databases?
a) It can increase the risk of security breaches
b) It can lead to data inconsistencies
✅ c) It can significantly degrade the performance of transactional operations
d) Incompatible with operational data structures

Explanation:
Running analytical queries can slow down the performance of transactional systems.

8. Which of the following is a key function of back-end tools and utilities in a data warehouse system?
a) Query optimization
✅ b) Data extraction and transformation
c) User interface design
d) Data visualization

Explanation:
Back-end tools manage data ingestion and transformation processes.

9. What is a major challenge associated with data loading in large-scale data warehouses?
a) Ensuring data consistency across multiple sources
✅ b) Managing distributed loading and performance optimization
c) Preventing data loss during the loading process
d) Ensuring data privacy and security during the loading process

Explanation:
Coordinating high-volume data loads without performance degradation is complex.

10. What is the primary difference between an enterprise data warehouse and a data mart?
a) Enterprise data warehouses are more complex to implement and need more technical expertise than data marts
b) Data marts are primarily used for strategic decision-making, while enterprise data warehouses are for tactical decisions
✅ c) Enterprise data warehouses provide a comprehensive view of the entire organization’s data, while data marts focus on specific business areas
d) Data marts are more expensive to maintain than enterprise data warehouses

Explanation:
Enterprise warehouses are organization-wide; data marts are department-specific.

11. What is a potential drawback of using a virtual warehouse?
✅ a) Increased load on operational databases
b) High implementation costs
c) Limited scalability and performance
d) Difficulty in integrating data from multiple sources

Explanation:
A virtual warehouse is a set of views over operational databases, so analytical queries run against the live operational systems and add to their load.

12. Consider a large-scale healthcare organization with multiple hospitals and clinics. What are the primary benefits of implementing a centralized database management system (DBMS) to manage patient records, medical history, and billing information?
a) Improved data consistency and accuracy
b) Enhanced data security
c) Efficient data retrieval and analysis
✅ d) All of the above

Explanation:
A centralized DBMS ensures uniformity, security, and better data-driven decisions.

13. A person transferred ₹2000 to his friend via a UPI application to contribute to a weekend trip. Which ACID property ensures that the transaction is either fully completed or completely rolled back?
✅ a) Atomicity
b) Consistency
c) Isolation
d) Durability

Explanation:
Atomicity ensures transactions are all-or-nothing.
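
The all-or-nothing behavior can be demonstrated with Python's built-in sqlite3 module. This is a sketch, not how a UPI application is implemented: the table, account names, and balances are hypothetical, and a CHECK constraint is used to force the debit to fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("sender", 1500), ("friend", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The sender holds only 1500, so the debit violates the CHECK constraint.
ok = transfer(conn, "sender", "friend", 2000)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The credit to the friend is executed first and then undone when the debit fails, so neither balance changes.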

14. What is the purpose of normalization in a snowflake schema?
a) To improve query performance
✅ b) To reduce data redundancy
c) To increase data security
d) To simplify data loading and transformation

Explanation:
Normalization minimizes duplication by organizing data into related tables.
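
To make the redundancy concrete, the sketch below splits a repeated category attribute out of a product dimension into its own table. The dimension contents and the function name are hypothetical.

```python
# Hypothetical denormalized product dimension: the category name repeats on every row.
product_dim = [
    {"product_id": 1, "product": "Laptop", "category": "Electronics"},
    {"product_id": 2, "product": "Phone", "category": "Electronics"},
    {"product_id": 3, "product": "Desk", "category": "Furniture"},
]

def snowflake(rows):
    """Split the category attribute into its own table, referenced by a key."""
    category_ids = {}
    products = []
    for row in rows:
        # Reuse the existing key for a known category, otherwise assign the next one.
        cat_id = category_ids.setdefault(row["category"], len(category_ids) + 1)
        products.append(
            {"product_id": row["product_id"], "product": row["product"], "category_id": cat_id}
        )
    categories = [{"category_id": cid, "category": name} for name, cid in category_ids.items()]
    return products, categories

products, categories = snowflake(product_dim)
```

Each category name is now stored once, at the cost of an extra table that queries must join to.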

15. Which data warehouse schema is typically more efficient for querying due to its simplified structure and fewer joins?
✅ a) Star schema
b) Snowflake schema
c) Both A & B are equally efficient
d) It depends on the specific query workload

Explanation:
Star schemas are flatter and involve fewer joins, making queries faster.
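
A minimal star-schema query, sketched with an in-memory SQLite database. The table and column names are hypothetical; the point is that the category sits directly on the dimension table, so one join is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, units INTEGER);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics'), (3, 'Desk', 'Furniture');
INSERT INTO fact_sales VALUES (1, 5), (2, 7), (3, 2), (1, 1);
""")

# One join reaches the category because it lives on the dimension table itself.
# In a snowflake schema the category would sit in a separate table and need a second join.
units_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
"""))
```

The trade-off is that the category name is repeated on every product row in the dimension table.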