NPTEL Deep Learning – IIT Ropar Assignment Answers

1. The table below shows the temperature and humidity data for two cities. Is the data linearly separable?
[Table a1q1: temperature and humidity data for two cities]

  • Yes
  • No
  • Cannot be determined from the given information
Answer :- a. Yes

2. What is the perceptron algorithm used for?

  • Clustering data points
  • Finding the shortest path in a graph
  • Classifying data
  • Solving optimization problems
Answer :- c. Classifying data. The perceptron algorithm is a supervised learning algorithm used for binary classification tasks. It takes an input vector and assigns it to one of two classes based on a linear combination of the input features and their associated weights. When the data is linearly separable, the perceptron algorithm is guaranteed to find a hyperplane that separates the two classes.

3. What is the most common activation function used in perceptrons?

  • Sigmoid
  • ReLU
  • Tanh
  • Step
Answer :- d. Step. The step function is an activation function that returns 1 if its input is greater than or equal to a threshold value, and 0 otherwise. It is one of the simplest activation functions and was used in early versions of perceptrons.

4. Which of the following Boolean functions cannot be implemented by a perceptron?

  • AND
  • OR
  • XOR
  • NOT
Answer :- XOR
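
A brute-force search gives a quick sanity check of this answer. The sketch below is illustrative only: the helper name `find_perceptron` and the small integer weight grid are assumptions, and a failed search on a finite grid is merely suggestive. The general result follows from XOR not being linearly separable.

```python
from itertools import product

def find_perceptron(truth_table, grid=range(-3, 4)):
    """Return (w1, w2, b) from a small integer grid that reproduces
    truth_table with a step activation, or None if no triple works."""
    for w1, w2, b in product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b >= 0) == bool(y)
               for (x1, x2), y in truth_table.items()):
            return (w1, w2, b)
    return None

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

for name, table in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    # AND and OR yield some weight triple; XOR yields None.
    print(name, find_perceptron(table))
```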

5. We are given 4 points in R², say x1=(0,1), x2=(−1,−1), x3=(2,3), x4=(4,−5). The labels of x1, x2, x3, x4 are −1, 1, −1, 1 respectively. We start the perceptron algorithm with an initial weight w0=(0,0) on this data. What will be the value of the weight vector after the algorithm converges? (Take the points in sequential order from x1 to x4; an update happens when the value of the weight changes.)

  • (0,0)
  • (−2,−2)
  • (−2,−3)
  • (1,1)
Answer :- (−2,−3)

6. We are given the following data:
[Table a1q6]
Can you classify every label correctly by training a perceptron algorithm? (assume bias to be 0 while training)

  • Yes
  • No
Answer :- b. No

7. Suppose we have a Boolean function that takes 5 inputs x1, x2, x3, x4, x5. We have an MP neuron with parameter θ=1. For how many inputs will this MP neuron give output y=1?

  • 21
  • 31
  • 30
  • 32
Answer :- c. 31
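
The count can be checked by enumeration. This is a minimal sketch assuming the usual MP neuron rule, where the neuron outputs 1 when the sum of its binary inputs is at least θ; the variable names are illustrative.

```python
from itertools import product

theta = 1  # threshold from the question

# Enumerate all 2^5 binary input vectors and keep those that reach the threshold.
firing = [x for x in product([0, 1], repeat=5) if sum(x) >= theta]
print(len(firing))  # 31
```

Only the all-zero input fails to reach the threshold, so 32 − 1 = 31 inputs give y=1.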

8. Which of the following best represents the meaning of term “Artificial Intelligence”?

  • The ability of a machine to perform tasks that normally require human intelligence
  • The ability of a machine to perform simple, repetitive tasks
  • The ability of a machine to follow a set of pre-defined rules
  • The ability of a machine to communicate with other machines
Answer :- a. The ability of a machine to perform tasks that normally require human intelligence. Artificial Intelligence (AI) refers to the capability of machines or computer systems to perform tasks that typically require human intelligence, such as problem-solving, learning, reasoning, understanding natural language, and adapting to new situations. AI aims to create machines that can simulate human-like intelligence and behavior, enabling them to perform complex tasks and make decisions without direct human intervention.

9. Which of the following statements is true about error surfaces in deep learning?

  • They are always convex functions.
  • They can have multiple local minima.
  • They are never continuous.
  • They are always linear functions.
Answer :- They can have multiple local minima. Error surfaces in deep learning, also known as loss surfaces or cost functions, represent the relationship between the model's parameters (weights and biases) and the error (or loss) of the model on the training data. These surfaces are typically non-convex, meaning they can have multiple local minima, maxima, and saddle points. Local minima are points where the error is relatively low compared to the neighboring points, but they may not be the global minimum, which represents the best set of parameters for the model.

10. What is the output of the following MP neuron for the AND Boolean function?

y = 1 if x1+x2+x3 ≥ 1, and y = 0 otherwise

  • y=1 for (x1,x2,x3)=(0,1,1)
  • y=0 for (x1,x2,x3)=(0,0,1)
  • y=1 for (x1,x2,x3)=(1,1,1)
  • y=0 for (x1,x2,x3)=(1,0,0)
Answer :- a. y=1 for (x1,x2,x3)=(0,1,1); c. y=1 for (x1,x2,x3)=(1,1,1)
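
Each option can be checked against the neuron directly. The sketch below assumes the threshold of 1 stated in the question; `mp_neuron` and the `options` dictionary are hypothetical names introduced for illustration.

```python
def mp_neuron(x, theta=1):
    """MP neuron: output 1 when the sum of binary inputs reaches theta."""
    return 1 if sum(x) >= theta else 0

# Each option pairs an input with the output it claims.
options = {
    "a": ((0, 1, 1), 1),
    "b": ((0, 0, 1), 0),
    "c": ((1, 1, 1), 1),
    "d": ((1, 0, 0), 0),
}

correct = [key for key, (x, y) in options.items() if mp_neuron(x) == y]
print(correct)  # ['a', 'c']
```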

11. What is the range of the sigmoid function σ(x) = 1/(1 + e^(−x))?

  • (−1,1)
  • (0,1)
  • (−∞,∞)
  • (0,∞)
Answer :- (0, 1). The sigmoid function σ(x) = 1 / (1 + e^(-x)) outputs values strictly between 0 and 1. As x approaches positive infinity, σ(x) approaches 1, and as x approaches negative infinity, σ(x) approaches 0. The range is therefore the open interval (0, 1): the function never actually reaches 0 or 1.

12. What happens to the output of the sigmoid function as |x| becomes very small?

  • The output approaches 0.5
  • The output approaches 1.
  • The output oscillates between 0 and 1.
  • The output becomes undefined.
Answer :- The output approaches 0.5. As the absolute value of x becomes very small (close to 0), the exponential term e^(-x) in the sigmoid function becomes very close to 1, so the denominator (1 + e^(-x)) becomes approximately 2. The output of the sigmoid function therefore approaches 1/2 = 0.5.
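
The behaviour near zero and at the tails can be seen numerically. This is a small sketch using only the standard library; the sample inputs are arbitrary choices.

```python
import math

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Inputs near 0 give outputs near 0.5; the tails approach 0 and 1.
for x in (-20.0, -1e-3, 0.0, 1e-3, 20.0):
    print(f"x = {x:>8}: sigmoid(x) = {sigmoid(x):.10f}")
```

The inputs are kept moderate on purpose: in floating-point arithmetic σ(x) rounds to exactly 1.0 for large positive x, and `math.exp` overflows for very negative x, even though the mathematical function stays strictly inside (0, 1).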

13. Which of the following theorems states that a neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function?

  • Bayes’ theorem
  • Central limit theorem
  • Fourier’s theorem
  • Universal approximation theorem
Answer :- Universal approximation theorem. The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer containing a finite number of neurons (units) can approximate any continuous function on a bounded input region to arbitrary accuracy, given a sufficiently large number of neurons in that hidden layer. This theorem highlights the approximation power of neural networks.

14. We have a function that we want to approximate using 150 rectangles (towers). How many neurons are required to construct the required network?

  • 301
  • 451
  • 150
  • 500
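Answer :- 301 (each tower is built from two sigmoid neurons, and one output neuron sums the towers: 2 × 150 + 1 = 301)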

15. A neural network has an input layer with 2 neurons, two hidden layers with 5 neurons each, and an output layer with 3 neurons. How many weights are there in total? (Do not assume any bias terms in the network.)

Answer :- 50
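
The count comes from multiplying the sizes of each pair of consecutive layers. A minimal sketch, assuming a fully connected network with no bias terms as the question states (`layer_sizes` is an illustrative name):

```python
# Layer sizes in order: input, hidden 1, hidden 2, output.
layer_sizes = [2, 5, 5, 3]

# A fully connected layer pair (a, b) contributes a * b weights.
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)  # 2*5 + 5*5 + 5*3 = 50
```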

16. What is the derivative of the ReLU activation function with respect to its input at 0?

  • 0
  • 1
  • −1
  • Not differentiable
Answer :- Not differentiable
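
The one-sided slopes at 0 disagree, which is why the derivative does not exist there. The sketch below estimates both slopes with finite differences; the step size `h` is an arbitrary choice.

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

h = 1e-6
# Slope approaching 0 from the left and from the right.
left_slope = (relu(0.0 - h) - relu(0.0)) / (-h)
right_slope = (relu(0.0 + h) - relu(0.0)) / h
print(left_slope, right_slope)
```

The left slope is 0 and the right slope is 1. In practice, implementations usually just pick a fixed value (such as 0) for the gradient at exactly 0.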

17. Consider the function f(x) = x^3 − 3x^2 + 2. What is the updated value of x after the 3rd iteration of the gradient descent update, if the learning rate is 0.1 and the initial value of x is 4?

Answer :- Any value in the range 1.85 to 1.95 (x ≈ 1.90)
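
The three updates can be reproduced in a few lines. This sketch assumes the plain update rule x ← x − η·f′(x), with f′(x) = 3x² − 6x obtained by differentiating the function in the question.

```python
def grad(x):
    """Derivative of f(x) = x^3 - 3x^2 + 2."""
    return 3 * x**2 - 6 * x

x = 4.0   # initial value
lr = 0.1  # learning rate

for step in range(1, 4):
    x = x - lr * grad(x)
    print(step, round(x, 4))
```

The iterates are approximately 1.6, 1.792 and 1.9038, so the value after the 3rd iteration falls inside the accepted range.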

18. Which of the following statements is true about the representation power of a multilayer network of sigmoid neurons?

  • A multilayer network of sigmoid neurons can represent any Boolean function.
  • A multilayer network of sigmoid neurons can represent any continuous function.
  • A multilayer network of sigmoid neurons can represent any function.
  • A multilayer network of sigmoid neurons can represent any linear function.
Answer :- A multilayer network of sigmoid neurons can represent any continuous function. This statement reflects the universal approximation theorem, which states that a feedforward neural network with a single hidden layer containing a finite number of sigmoid (or similar activation function) neurons can approximate any continuous function to arbitrary accuracy, given a sufficiently large number of neurons in the hidden layer.

19. How many boolean functions can be designed for 3 inputs?

  • 65,536
  • 82
  • 256
  • 64
Answer :- 256
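
A Boolean function of n inputs assigns 0 or 1 to each of the 2^n input rows, giving 2^(2^n) functions. A tiny sketch (the function name is illustrative):

```python
def num_boolean_functions(n):
    """Number of distinct Boolean functions of n binary inputs."""
    rows = 2 ** n     # rows in the truth table
    return 2 ** rows  # each row can independently map to 0 or 1

print(num_boolean_functions(3))  # 256
```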

20. How many neurons do you need in the hidden layer of a perceptron to learn any boolean function with 6 inputs? (Only one hidden layer is allowed)

  • 16
  • 64
  • 16
  • 32
Answer :- 64

21. Which of the following statements about backpropagation is true?

  • It is used to optimize the weights in a neural network.
  • It is used to compute the output of a neural network.
  • It is used to initialize the weights in a neural network.
  • It is used to regularize the weights in a neural network.
Answer:- a

22. Let y be the true class label and p be the predicted probability of the true class label in a binary classification problem. Which of the following is the correct formula for binary cross entropy?

Answer:- b. −(y log p + (1−y) log(1−p))
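
The formula −(y log p + (1−y) log(1−p)) can be evaluated directly. The sketch below makes two assumptions that the question does not state: the natural logarithm is used, and p is read as the predicted probability of class 1, strictly between 0 and 1.

```python
import math

def binary_cross_entropy(y, p):
    """Binary cross entropy for a label y in {0, 1} and prediction 0 < p < 1."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confident correct prediction costs little; a confident wrong one costs a lot.
print(binary_cross_entropy(1, 0.9))
print(binary_cross_entropy(0, 0.9))
```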

23. Let y_i be the true class label of the i-th instance and p_i be the predicted probability of the true class label in a multi-class classification problem. Write down the formula for multi-class cross entropy loss.

Answer:- c. −∑_{c=1}^{M} y_{o,c} log(p_{o,c})
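
For a single observation with a one-hot label, the sum over the M classes reduces to the negative log of the probability assigned to the true class. A minimal sketch, assuming natural logarithms and a hypothetical helper name:

```python
import math

def multiclass_cross_entropy(y_onehot, p):
    """-sum_c y_c * log(p_c) for one observation; zero-label terms are skipped."""
    return -sum(y_c * math.log(p_c)
                for y_c, p_c in zip(y_onehot, p) if y_c > 0)

# Three classes, true class is the second one.
loss = multiclass_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
print(loss)  # equals -log(0.7)
```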

24. Can cross-entropy loss be negative between two probability distributions?

  • Yes
  • No
Answer:- b

25. Let p and q be two probability distributions. Under what conditions will the cross entropy between p and q be minimized?

  • p=q
  • All the values in p are lower than corresponding values in q
  • All the values in p are lower than corresponding values in q
  • p = 0 [0 is a vector]
Answer:- a
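
For a fixed p, the cross entropy H(p, q) is smallest when q equals p, where it reduces to the entropy of p. The sketch below compares a few candidate q distributions; the example distributions and names are made up for illustration.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log2(q_i), skipping terms with p_i = 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
candidates = {
    "q = p": p,
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.8, 0.1, 0.1],
}

for name, q in candidates.items():
    print(name, round(cross_entropy(p, q), 4))
```

Among these candidates, q = p gives the lowest value.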

26. Which of the following is false about cross-entropy loss between two probability distributions?
  • It is always in range (0,1)
  • It can be negative.
  • It is always positive.
  • It can be 1.

Answer:- a, b

27. The probability of all the events x1, x2, …, xn in a system is equal (n > 1). What can you say about the entropy H(X) of that system? (The base of the log is 2.)

  • H(X)≤1
  • H(X)=1
  • H(X)≥1
  • We can’t say anything conclusive with the provided information.
Answer:- c
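
With n equally likely events, each has probability 1/n and the entropy is log2(n), which is at least 1 whenever n ≥ 2. A small sketch using the entropy definition (the function name is illustrative):

```python
import math

def uniform_entropy(n):
    """Entropy in bits of n equally likely events."""
    p = 1.0 / n
    return -sum(p * math.log2(p) for _ in range(n))

for n in (2, 4, 8):
    print(n, uniform_entropy(n))
```

The smallest case, n = 2, gives exactly 1 bit, and larger n gives more.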

28. Suppose we have a problem where data x and label y are related by y = x^4 + 1. Which of the following is not a good choice for the activation function in the hidden layer if the activation function at the output layer is linear?

  • Linear
  • ReLU
  • Sigmoid
  • tan⁻¹(x)
Answer:- a

29. We are given that the probability of Event A happening is 0.95 and the probability of Event B happening is 0.05. Which of the following statements is True?

  • Event A has a high information content
  • Event B has a low information content
  • Event A has a low information content
  • Event B has a high information content
Answer:- c, d
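
The information content of an event with probability p is −log2(p) bits, so rarer events carry more information. A short sketch for the two events in the question:

```python
import math

def information_content(p):
    """Self-information in bits of an event with probability p."""
    return -math.log2(p)

print(round(information_content(0.95), 3))  # 0.074 (Event A, low)
print(round(information_content(0.05), 3))  # 4.322 (Event B, high)
```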

30. Which of the following activation functions can only give positive outputs greater than 0?

  • Sigmoid
  • ReLU
  • Tanh
  • Linear
Answer:- a
