One-hot encoding is the format commonly used for training labels in a multi-class classification problem. Each label is a vector whose length equals the number of classes being classified. Every entry is 0 except for a single 1, and the position of that 1 identifies the class of the training example. So, for example, if the task is to recognize the ten digits, and positions are counted from 1 starting at the left, then the training label for the digit 7 would be 0000001000, while the training label for the digit 3 would be 0010000000. (If positions are instead counted from 0, so that the digit 0 takes the first slot, each 1 shifts one place to the right.)
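To make the encoding concrete, here is a minimal Python sketch. The helper names `one_hot` and `as_string` are hypothetical, and the class order in `DIGIT_CLASSES` (digits 1 through 9 first, then 0 in the tenth slot) is an assumption chosen only so that the output matches the two vectors in the example above. With the order 0 through 9 instead, each 1 would land one place further right.

```python
def one_hot(label, classes):
    """Return a one-hot list for `label`, given an ordered list of classes."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    vector = [0] * len(classes)          # one slot per class, all zeros
    vector[classes.index(label)] = 1     # a single 1 marks the label's class
    return vector


def as_string(vector):
    """Render a one-hot list compactly, e.g. [0, 1, 0] -> '010'."""
    return "".join(str(bit) for bit in vector)


# Assumed class order (not stated in the answer): 1..9, then 0 last.
DIGIT_CLASSES = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

print(as_string(one_hot(7, DIGIT_CLASSES)))  # 0000001000
print(as_string(one_hot(3, DIGIT_CLASSES)))  # 0010000000
```

Whatever order is chosen, it has to stay the same for every label and for decoding the model's output, since the position of the 1 is the only thing that carries the class.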