
Responses

1)

A) Problem Statement:

Diabetes is a highly prevalent disease worldwide, with over 30.3 million diabetic individuals in the United States alone. Diabetic retinopathy, a complication of diabetes, although uncommon, has a number of serious consequences, including impaired color vision, blurry vision, and total loss of vision. The disease is characterized by the damage it causes to the back of the eye (also known as the fundus). By taking images of the fundus, scientists are able to determine whether or not an individual has diabetic retinopathy. By creating a convolutional neural network and feeding it training and testing images, scientists can teach the network to accurately predict whether an individual has diabetic retinopathy from photographs of the fundus. Although this disease is rare, it still has a number of debilitating consequences. By training this convolutional neural network and using other machine learning methods, scientists have the ability to accurately diagnose patients with diabetic retinopathy and treat them before it is too late. Not only will the images allow scientists to distinguish whether an individual has the disease, but also how severe it is. Determining the severity of this disease is a common struggle for scientists because of the amount of time and energy it takes to analyze each image by hand. A trained convolutional neural network can take over some of the intensive analysis that humans typically do.

C) Future planning:

I will create a poster with a section introducing the disease, a section introducing convolutional neural networks (and what they are used for in this study), a section on the data used, a conclusion (how accurate the network was), and a section showing some of the code I used and how the network was built and performed.

D:

1)

The optimizer used for this exercise was the RMSprop optimizer, paired with a learning rate of lr = 0.001 to help the model perform well. RMSprop works by doing two things: it keeps a moving average of the squared gradients, and then it divides the current gradient by the square root of that average. RMSprop is similar to Adagrad, another adaptive learning rate algorithm. Adagrad works slightly differently from RMSprop, in that it keeps a running sum of the squared gradients; the gradient is then divided by the square root of that running sum before the learning rate is applied.
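The two update rules above can be sketched in plain NumPy. This is a simplified illustration, not the actual Keras internals; the function names and the `rho`/`eps` constants are my own choices:

```python
import numpy as np

def rmsprop_step(w, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update: keep a moving average of the squared
    gradients, then divide the gradient by its square root."""
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

def adagrad_step(w, grad, sum_sq, lr=0.001, eps=1e-7):
    """One Adagrad update: keep a running *sum* of the squared
    gradients, so the effective step size only ever shrinks."""
    sum_sq = sum_sq + grad ** 2
    w = w - lr * grad / (np.sqrt(sum_sq) + eps)
    return w, sum_sq
```

The practical difference is that Adagrad's running sum grows without bound, so its steps keep shrinking, while RMSprop's moving average forgets old gradients and keeps adapting.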

2)

The loss function used for the cats and dogs dataset was binary cross-entropy, the most common loss function for binary classification. It works by returning high loss values for bad predictions and low loss values for good predictions. As training proceeds, the loss usually goes down because the predictions get better and better. For a binary dataset, the loss is the negative log of the predicted probability of the true class: as that probability approaches one, the loss approaches zero, and taking the negative log guarantees that the loss is a positive value.
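As a sketch of the formula (not the Keras implementation itself), binary cross-entropy can be written in a few lines of NumPy; the clipping constant `eps` is there only to avoid log(0):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -log(p_true): near zero when the predicted probability
    of the true class is near 1, and large when it is near 0."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))
```

A confident correct prediction (e.g. 0.99 for a true label of 1) gives a loss near zero, while a confident wrong prediction (e.g. 0.01) gives a large loss.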

3)

There is a metrics= argument in our model.compile() function which is very important for judging the performance of a model. The accuracy metric calculates how often predictions equal labels. Internally it maintains two variables, total and count: total tracks how many times y_pred matches y_true (binary accuracy), count tracks how many predictions were made, and the metric reports total divided by count.
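A minimal sketch of the total/count idea (the real Keras metric also handles sample weights and streaming updates across batches; the threshold value of 0.5 is the usual default):

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Threshold the predicted probabilities, count how often the
    result matches the label (total), and divide by the number of
    predictions (count)."""
    matches = (y_pred > threshold).astype(float) == y_true
    total = matches.sum()
    count = len(y_true)
    return total / count
```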

4)

[Screenshot: training and validation accuracy per epoch]

[Screenshot: training and validation loss per epoch]

The first graph shows that both the training and validation accuracy trended upward for the most part, meaning the model's accuracy improved with each epoch. In the second graph (loss), the validation line (blue) is going up, which is a clear sign of overfitting. The training loss, however, performed pretty well and went down for the most part.

5)

[Cat image "CATJPEG" and the model's prediction screenshot]

[Cat image "CAT2!" and the model's prediction screenshot]

[Cat image "CAT3" and the model's prediction screenshot]

[Dog image "dogimage1" and the model's prediction screenshot]

[Dog image "DOGIMAGE2" and the model's prediction screenshot]

[Dog image "FUNNYDOG" and the model's prediction screenshot — this one was just for fun]

[Dog image "dogyay" and the model's prediction screenshot]

It seems that the model got most of the images right, but it had a hard time distinguishing cats: it got 2 of the 3 cats right. I think this model understands dogs for the most part but has trouble picking up all of the features of cats. This is plausible, since the validation curves did show overfitting. It is important to reduce this overfitting so that the model performs better. One way to do so would be to increase the number of training images fed into the model, as well as to use image augmentation. That way, the model can get a better understanding of what cats look like and perform better when put into practice.
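In Keras this is usually done with ImageDataGenerator; as a rough, framework-free sketch of what one augmentation step does, the code below applies a random horizontal flip and a small brightness shift (the 0.5 flip probability and ±0.1 brightness range are arbitrary choices for illustration):

```python
import numpy as np

def augment(image, rng):
    """Produce a randomly perturbed copy of an image (values in [0, 1],
    shape H x W x C), so the model sees a new variant each epoch."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # random horizontal flip
    # small random brightness shift, clipped back into [0, 1]
    image = np.clip(image + rng.uniform(-0.1, 0.1), 0.0, 1.0)
    return image
```

Each training pass then sees a slightly different version of every cat photo, which helps the model learn cat features that generalize instead of memorizing the exact training images.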