Waving Robot

I really wanted this arm to be sticking out the side of my window:

But I first tried to use the feature classifier to tell whether people were walking towards or away, which did not work at all. It could barely even tell whether people were there or not. I spent a lot of time collecting a large dataset before really testing whether the approach would work, so I wasted a lot of time on that.

So then I switched to a KNN classifier combined with PoseNet, to try to classify the poses of people walking in. I think that might've worked with a bigger dataset, but I tried with around 150 examples for each category and it just wasn't reliable, so I scaled down again to just recognizing me in my room. Hopefully I can extend it to outside, because I think it would be cool to have a robot that waves at people. I think I'll just hack it together rather than using a KNN classifier, but for this project I had to train a model.
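To make the KNN-on-poses idea concrete, here is a minimal sketch in plain JavaScript. It assumes each pose has already been flattened into an array of keypoint coordinates. The `knnClassify` helper, the labels, and the sample numbers are all hypothetical, and this is not the ml5.js code from the project.

```javascript
// Euclidean distance between two equal-length feature vectors.
function distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// Classify `query` by majority vote among its k nearest training examples.
// `examples` is an array of { features: number[], label: string }.
function knnClassify(examples, query, k = 3) {
  const nearest = examples
    .map((ex) => ({ label: ex.label, dist: distance(ex.features, query) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k);

  // Count one vote per neighbor.
  const votes = {};
  for (const n of nearest) {
    votes[n.label] = (votes[n.label] || 0) + 1;
  }

  // Return the label with the most votes.
  return Object.keys(votes).reduce((best, label) =>
    votes[label] > votes[best] ? label : best
  );
}

// Hypothetical flattened poses: [wristX, wristY, shoulderX, shoulderY].
const poseExamples = [
  { features: [0.5, 0.1, 0.5, 0.5], label: "waving" },
  { features: [0.55, 0.15, 0.5, 0.5], label: "waving" },
  { features: [0.5, 0.9, 0.5, 0.5], label: "not waving" },
  { features: [0.45, 0.85, 0.5, 0.5], label: "not waving" },
];
```

With only a handful of examples per label the vote is easily swayed by one odd neighbor, which is one plausible reason a small dataset feels unreliable.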

I actually used an example that didn't use p5.js. That example was not registering the poses properly when I used it outside, but I had already written my program around it, so I just kept using it inside.

At first I was going to use a Raspberry Pi or an Arduino in conjunction with the browser somehow. I tried a lot of different ways of doing that and got kind of close, but it was just really complicated, so I ended up using one of the worst hacks I've ever done. I got the browser to draw either a black or a white square depending on whether someone was waving, and then I used a light sensor on the Arduino to detect it and wave if the square was white. But if it works, it works, I guess.
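The black-or-white-square hack boils down to a one-bit optical link. A minimal sketch of both ends, written in JavaScript for illustration: the function names are hypothetical, and the 0 to 1023 reading range and the threshold of 512 are assumptions (a 10-bit analog reading), not values from the actual build.

```javascript
// Browser side: pick the grayscale fill for the signal square.
// 255 = white (someone is waving), 0 = black (nobody is waving).
function signalBrightness(isWaving) {
  return isWaving ? 255 : 0;
}

// Arduino side, modeled here in JavaScript: the light sensor gives an
// analog reading (assumed 0..1023), and the arm waves when the reading
// is brighter than the threshold.
function shouldWave(lightReading, threshold = 512) {
  return lightReading > threshold;
}
```

In practice the threshold would need tuning against the room lighting, since the sensor sees ambient light as well as the screen.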

PS. Sorry for the delay on this post.


I was really interested in training the computer on specific images and then using it in a different application to see what it would recognize. For the application, I decided to use clouds as a field in which the computer could recognize any of the images it was trained on. I was inspired by a work Golan showed earlier in the semester called Cloud Face. This got me thinking about all the things humans recognize when looking up at the sky versus what a computer could or couldn't recognize through machine learning training.


I worked with @meh on this project.

We decided to do something simple, and came up with the idea to use ml5.js' feature extraction and send its prediction data through to Space Invaders, so that a person could play the game with just their hand. We used a 2-dimensional image regressor, tracking both the orientation of the hand (to move the spaceship) and the thumb's position (to trigger the blaster). We also ported a version of Space Invaders written in JavaScript that we found online into a p5.js environment (the code was provided from here and here).
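As a rough sketch of how two regression outputs could drive the game, the function below turns a hand-tilt value and a thumb value into a move direction and a fire flag. It assumes both values arrive normalized to the range 0 to 1. The `toControls` name, the dead zone, and the fire threshold are hypothetical and not taken from our sketch.

```javascript
// Map two regressor outputs (each assumed to be in 0..1) to game controls.
// tilt:  0 = hand tilted fully left, 1 = fully right, 0.5 = level.
// thumb: higher values mean the thumb is further toward the "fire" position.
function toControls(tilt, thumb, deadZone = 0.1, fireThreshold = 0.5) {
  const centered = tilt - 0.5;

  // Ignore small tilts so the ship does not jitter around the center.
  let move = 0;
  if (centered > deadZone) move = 1;
  else if (centered < -deadZone) move = -1;

  return { move, fire: thumb > fireThreshold };
}
```

A dead zone like this is one common way to keep noisy predictions from moving the ship when the hand is roughly level.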

Unfortunately, we weren't able to figure out how to save and load the model, so all of our code had to live in the same sketch, with the game mode triggered by a specific key press once training was over. This caused the game mode to lag severely, but it is functional and still playable. If we had more time on this project, we would have tried to port our trained model into a game of our own making, but we are still somewhat satisfied with the overall result.


MoMar- SituatedEye

Is the glass half-full or half-empty?

Well, it depends on the weather! If it's raining in Pittsburgh, it's half-empty. If there's nice weather, it's half-full. Basically, depending on the weather, the reader will have either a pessimistic or an optimistic outlook on the world.

The machine learning algorithm looks for three states: Full, Half, and Empty. Depending on the state, it asks the OpenWeather API for the current weather. If the weather is bad, the system says that the glass is half-empty; otherwise it says that it is half-full.
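The decision itself can be sketched as a small pure function. This version assumes the weather arrives as a short condition string such as "Rain" or "Clear". Which conditions count as "bad", and what gets said for the Full and Empty states, are assumptions made for illustration, and `describeGlass` is a hypothetical name.

```javascript
// Weather conditions treated as "bad". This set is an assumption.
const BAD_WEATHER = new Set(["Thunderstorm", "Drizzle", "Rain", "Snow"]);

// glassState:  one of "Full", "Half", "Empty" (the classifier's labels).
// weatherMain: a short condition string such as "Rain" or "Clear".
function describeGlass(glassState, weatherMain) {
  // Assumed handling: Full and Empty are reported as-is.
  if (glassState === "Full") return "The glass is full.";
  if (glassState === "Empty") return "The glass is empty.";

  // Only the "Half" state depends on the weather.
  return BAD_WEATHER.has(weatherMain)
    ? "The glass is half empty."
    : "The glass is half full.";
}
```

Keeping the weather lookup outside this function means the pessimist/optimist rule can be checked without a network call.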

I suggest that the viewer train the model using a dark liquid, because it makes it easier for the computer to differentiate between the bottle and the background.

Originally, I wanted to do face tracking, but then I realized that having to retrain ml5 over and over again would prove too repetitive. Instead, I decided to make something that would tell me how full a bottle of water is.

When I was working on the accuracy of my project, I modified Professor Levin's variant of the ml5 p5.js classifier so it would tell me which items it thought it was seeing. For example, if I trained the model with three different labels, it would list all of the labels in order of confidence. The most confident label would appear first, followed by the next most confident, and so on.
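A minimal sketch of that ranking step, assuming the classifier's results are available as an array of `{ label, confidence }` objects. The `rankLabels` helper and the sample numbers are hypothetical.

```javascript
// Return a copy of the results sorted from most to least confident.
// The input array is not modified.
function rankLabels(results) {
  return [...results].sort((a, b) => b.confidence - a.confidence);
}

// Hypothetical classifier output for one frame.
const sampleResults = [
  { label: "Half", confidence: 0.2 },
  { label: "Full", confidence: 0.7 },
  { label: "Empty", confidence: 0.1 },
];

// Labels only, in ranked order, ready to draw on screen.
const rankedLabels = rankLabels(sampleResults).map((r) => r.label);
```

Seeing the runner-up labels alongside the winner makes it easier to spot when two states are being confused with each other.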

At one point, I was trying to have a trained model load at runtime because I didn't want to constantly retrain the classifier. Unfortunately, I couldn't get it to work in time.




ilovit-SituatedEye

(You have to train it yourself when you start.)

I created a game where the player is protecting themselves from some unknown entities that are trying to gain passage through the window. The game state is communicated to the player entirely through audio, while they interact with the game solely by opening and closing the window.

The player will periodically hear a whistling noise, indicating an oncoming assailant. If they close the window in time, the assailant will be stopped with a crash. If they fail to close it in time, the room will become a bit more "stuffy" - indicated by a static noise in the background (this is effectively player health). If the window remains closed, the room also becomes slowly stuffier, while an open window will let the air out. There is currently no lose condition. The static noise just becomes loud and annoying.
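The stuffiness rules above can be sketched as two small functions. All of the rates and the penalty are made-up values for illustration, and the function names are hypothetical. The real game's tuning is different (and, as noted below, not quite right yet).

```javascript
// Hypothetical tuning values.
const CLOSED_RATE = 1; // stuffiness gained per second while the window is closed
const OPEN_RATE = 2; // stuffiness lost per second while the window is open
const HIT_PENALTY = 10; // stuffiness added when an assailant gets through

// Advance the stuffiness level by `seconds` of elapsed time.
// Stuffiness never drops below zero.
function tick(stuffiness, windowOpen, seconds) {
  const delta = windowOpen ? -OPEN_RATE * seconds : CLOSED_RATE * seconds;
  return Math.max(0, stuffiness + delta);
}

// Resolve an assailant reaching the window: it only gets in if the window is open.
function assailantArrives(stuffiness, windowOpen) {
  return windowOpen ? stuffiness + HIT_PENALTY : stuffiness;
}
```

The volume of the static noise could then be driven directly by the returned stuffiness value.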

I think the basic interaction is fun, but several variables aren't tuned quite right. My original intent was to have the player sitting down doing something else, and then have to get up to open and close the window every once in a while. However, I couldn't figure out a good way of encouraging the player to sit down (one thought is that this game is meant to be played while trying to do other things), and the sound that indicates an imminent assailant is too short for someone to get up and close the window in time. I should also add some kind of instructions explaining to the player how the whole thing works.


The Situated Psychic Eye

I am fascinated by fortune tellers and the idea of a "psychic eye". I don't buy the "psychic" part one bit, but the incredibly detailed observation and the building from the telee's cues are interesting enough on their own. Using computer vision to create an accurate reader of bodily and verbal cues was ever so slightly out of scope for this piece, but I wanted to continue with the idea of a psychic gaze. So I created a tarot card reading setup, which incorporates the fun, slightly ridiculous, and somewhat mysterious air of telling the future.

My initial and continued intent is to train the computer to recognize all 76 cards. However, I have yet to find a way to successfully load pre-stored images and upload the images taken, so the cards have to be re-scanned every time the program is restarted. To save my sanity, I scaled down slightly.

I think the setup in a physical space is critical to creating the air of mystique, and I pulled a dirty trick when projecting onto a surface by zooming the unnecessary elements out of frame. (This would not be ideal for a final system setup.)

The program itself works beautifully to recognize the different cards. Now I just need to figure out how to save and upload the training so that I can implement the entire deck.

Program Link Here




It's a bird! It's a plane! It's a drawing canvas that attempts to identify whether the thing you're drawing is a bird or a plane.

I am not too happy with the result of this project because it's basically a much worse version of Google's Quick, Draw!. Initially, I wanted to use ml5 to make a model that would attempt to categorize images according to this meme:

I would host this as a website where users who stumbled across my web page would upload images to complete the chart and submit it for the model to learn from. However, I ran into quite a few technical difficulties, including but not limited to:

  • submitting images to a database.
  • training the model on newly submitted images.
  • making the model persistent. The inability to save/load the model was the biggest roadblock to this idea.
  • cultivating a good data set in this way.

My biggest priority in choosing a project idea was finding a concept where the accuracy of the model wouldn't obstruct the effectiveness of its delivery, so something as subjective as this meme was a good choice. However, I had to pivot to a much simpler idea that could work in the p5.js web editor due to all the problems that came with the webpage-on-Glitch approach. I wanted to continue with the meme format, but again, issues with loading and submitting images made me pivot to using drawings instead. With the drawing approach, the meme format no longer made sense, hence the change of the labels to bird/plane.

I don't have much to say about my Quick, Draw! knockoff, except that I'm mildly surprised by how well it works even with the many flaws of my data set.

Some images from the dataset:

An image of using the canvas:

A video:

zapra – situated eye

Eye tracker drawing

For this project, I wanted to use machine learning to create a device that explored subtle changes in the eyes. I used a two-dimensional image regressor, and most of my development involved exploring the nuances of how I interacted with the tracker rather than the code itself. While my original intention was a tracker that would allow the user to draw with their eyes, I spent a lot of time experimenting with how to collect data points, my proximity to the camera, and the range and speed of my eye movements.

View code

Process / Early Iterations

Knowing I wanted to make a program that detects eye movements, I experimented with pupil dilation, lying, and smile lines by recording myself as I performed different tasks. While I was interested in the concept of detecting nuanced expressions in people's eyes, I felt the observable changes were too subtle to detect within the scope of this project.

Before adding specific points of reference for the training set, I added samples by clicking in the vicinity of where I was moving. This helped me while I was starting out but did not produce the most accurate results.

An arc I drew with my eyes


For greater precision, and to give other users a way to reproduce the tracker, I created a series of fixed points to use during training, with indicators showing when each point had at least 30 samples.
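The bookkeeping behind those indicators might look something like the sketch below. The 30-sample target comes from the description above. The number of points (nine, one per keypad digit) and all of the function names are assumptions for illustration.

```javascript
const SAMPLES_NEEDED = 30; // target from the write-up

// Create `count` training points, each starting with zero samples.
function createTrainingPoints(count) {
  return Array.from({ length: count }, (_, i) => ({ id: i + 1, samples: 0 }));
}

// Record one sample for the point with the given id (e.g. a keypad digit).
// Returns the updated point, or undefined if the id is unknown.
function addSample(points, id) {
  const point = points.find((p) => p.id === id);
  if (point) point.samples += 1;
  return point;
}

// A point's indicator lights up once it has enough samples.
function isReady(point) {
  return point.samples >= SAMPLES_NEEDED;
}

// Training is complete when every point is ready.
function allReady(points) {
  return points.every(isReady);
}
```

A per-point counter like this also makes it obvious when one region of the screen is under-sampled compared with the rest.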


My final setup involved a large monitor, a precise webcam, and a numeric keypad to train the program. The larger screen gave the program a greater range of eye movement to track, and the keypad let me train the set without having to glance at my laptop and disrupt my eye movement.

sovid – Situated Eye

Sketch can be found here.

To use:
Toggle the training information by tapping 'z'.
Toggle the hand instructions by tapping 'm'.

For this project, I was interested in creating a virtual theremin where, much like an actual theremin, the position of the user's right hand controls the note on the scale and a slight wiggle controls the vibrato. I used the adapted Image Classifier to train my program on the hand positions, and looked at a point tracker by Kyle McDonald to track the hand for the vibrato. My main issue was finding good lighting and backgrounds to make the image classifier work reliably. I made a lot of strange sets and stands to make it work, so it's a very location-based project.
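For readers curious how a classified hand position could become a pitch, here is a minimal sketch. It assumes each hand position maps to a step of a C major scale starting at middle C, and that vibrato is a small sinusoidal wobble in frequency. The scale choice, the vibrato depth and rate, and all of the function names are assumptions, not taken from my sketch.

```javascript
// Semitone offsets of the major scale within one octave.
const MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11];

// Convert a scale step (0, 1, 2, ...) to a MIDI note number.
// Steps past the end of the scale wrap into the next octave.
function scaleStepToMidi(step, rootMidi = 60) {
  const octave = Math.floor(step / MAJOR_SCALE.length);
  return rootMidi + 12 * octave + MAJOR_SCALE[step % MAJOR_SCALE.length];
}

// Standard equal-temperament conversion: MIDI note 69 is A4 = 440 Hz.
function midiToFrequency(midi) {
  return 440 * Math.pow(2, (midi - 69) / 12);
}

// Apply vibrato: wobble the frequency by up to `depth` (as a fraction)
// at `rateHz` cycles per second, evaluated at time `t` in seconds.
function withVibrato(frequency, t, depth = 0.01, rateHz = 6) {
  return frequency * (1 + depth * Math.sin(2 * Math.PI * rateHz * t));
}
```

In a setup like this, the amount of hand wiggle reported by the point tracker could be scaled into the `depth` argument.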


Last year, a visiting guest in the studio mentioned that they consider many of our interactions with smart assistants quite rude, and that these devices reinforce a habit of barking commands without giving thanks. I think back to this conversation every so often and ponder to what extent society anthropomorphizes technology. In this project I decided to flip the usual power dynamic between human and computer. An artificial intelligence generally serves our commands and does nothing (other than ping home and record data) when not addressed. Simon Says felt like a fun way to explore this relationship by having the computer give the human commands and chide us when we are wrong. I also deliberately made the gap between commands short as a way to consider how promptly we expect a response from technology.

I would say this project is fun to play. My housemate giggled as the computer told him he was doing the wrong motions. However, players may not consider the conceptual meaning of the dynamic during the game. Another issue I ran into during development is that when trained on more than three items, the network rapidly declined in accuracy. In the end, I switched to training KNN on PoseNet data, which worked significantly better. There are still a few tiny glitches, but the basic app works.

New debug with fewer poses
Old debug with way too many params