-
Kevin L. Ferguson posted a new page, on the site Writing @ Queens 4 months, 3 weeks ago
I Write Digitally; Therefore I Am [2021]
Are you a Zoombie? Have you left your apartment today? Do you miss the social roulette of wondering which coworker you will randomly bump into on your way to the coffee […] -
Kevin L. Ferguson posted a new page, on the site College Writing: Reading Film 5 months, 3 weeks ago
Conference groups, with updated calendar:
M 9/14 A
W 9/16 B
M 9/21 C
W 9/23 D
T 9/29 A
W 9/30 B
M 10/5 C
W 10/7 D
W 10/14 A
M 10/19 B
W 10/21 C
M 10/26 D
W 10/28 A
M 11/2 B
W 11/4 C
M 11/9 D
W […] -
Kevin L. Ferguson wrote a new post, Final, on the site Digital Literary Methods 1 year, 9 months ago
Here are some terms I’ll ask you to define on the final:
corpus
cluster analysis
MFW
OCR
CSV
PCA
n-gram
edges
nodes
dendrogram
topic modelling (bag of words)
stop words
collocation
directed vs. […] -
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 9 months ago
Why do you think you got slightly different results with these two methods? I’m somewhat surprised there are completely separate groups, but I like that you connected this to your other oppose visualization, using the two to reflect against each other.
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 9 months ago
Lol, nodes are the points, edges are the connections. In our case, “undirected” (being equal in both directions).
You don’t have to size the nodes, but it might help you see the relationships. It’s sort of like in a dendrogram, where you can measure how far apart items are . . . a larger-sized node means it has higher…
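To make the vocabulary concrete, here is a minimal sketch of an undirected network: nodes are the points, edges are the connections, and each edge is stored in both directions. The author names, the helper functions, and the choice to size nodes by degree (number of connections) are all assumptions made for illustration; the comment above does not say which measure the sizing should use.

```python
from collections import defaultdict

def build_undirected(edges):
    """Return an adjacency map; each edge is recorded in both directions."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)  # undirected: the connection is equal both ways
    return adjacency

def degrees(adjacency):
    """Count the connections of every node (one possible basis for node size)."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}

# Hypothetical edge list: each pair is one connection between two nodes.
edges = [
    ("Austen", "Dickens"),
    ("Dickens", "Twain"),
    ("Austen", "Twain"),
    ("Twain", "Melville"),
]

graph = build_undirected(edges)
node_sizes = degrees(graph)

for node, size in sorted(node_sizes.items()):
    print(node, size)
```

In a directed network the second `add` call would be dropped, so a connection from A to B would not imply one from B to A.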
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 9 months ago
I think this is an interesting data set. I’d be curious to see what effect the different speechwriters have, since the SOTUs are heavily “scripted” (as an aside, looking at transcripts of presidents’ unrehearsed comments would also be illuminating).
I think our assumption would be that presidents should cluster together based on…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 9 months ago
I’m curious about your expectations for comparing Twain and Dickens. Were you looking for national differences? And what would testing Jane Austen help you see?
Why do you think Austen is more like Dickens than Twain? And why is Twain more like Dickens than Austen? And do those results contradict each other? It’s unclear to me how you’re…
-
Kevin L. Ferguson wrote a new post, Cluster Analysis, 100MFW, various syllabi groups, on the site Digital Literary Methods 1 year, 10 months ago
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I think you’re absolutely right to reflect on how making decisions about dicing your corpus sort of determines what exactly you are hoping to test. It also is the moment where our preconceived ideas about literary style can sneak in . . . such as the relative importance of nationality, gender, etc. Looking at protagonist gender is interesting, and…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I wish I could see your preferred/avoided words better–I’m wondering if they were surprising to you or if anything stood out from that list?
You should also run oppose() again and select “markers” in order to get the plots pointed in those two overlapping shapes, with plusses for your test set.
One challenge for interpreting your results is…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
It’s interesting to see some of your assumptions revealed around the “consistency,” so to speak, of a male authorial voice over time.
You’re correct about interpreting the PCA space: the closer to zero, the more similar in usage of the preferred word list. One thing that would make it easier to interpret would be to remove the leading “f_” so… -
Kevin L. Ferguson wrote a new post, Some notes on Zeta / oppose(), on the site Digital Literary Methods 1 year, 10 months ago
This method identifies words that distinguish two corpora: which words one corpus prefers and which it avoids.
This method will always find some difference, so we have to tread lightly.
Can compare author to author, group of […] -
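A rough sketch of the idea behind a Zeta-style score may help here. It assumes each corpus has already been cut into equal-sized segments, and it scores a word by the share of primary segments that contain it minus the share of secondary segments that contain it. This is a simplified illustration, not the code that oppose() runs; the function names and the toy segments are hypothetical.

```python
def segment_share(word, segments):
    """Fraction of segments in which the word appears at least once."""
    if not segments:
        return 0.0
    return sum(1 for seg in segments if word in seg) / len(segments)

def zeta_scores(primary, secondary):
    """Score every word: positive = preferred by primary, negative = avoided."""
    vocabulary = set().union(*primary, *secondary)
    return {
        word: segment_share(word, primary) - segment_share(word, secondary)
        for word in vocabulary
    }

# Hypothetical toy corpora, already split into segments of tokens.
primary = [["whale", "sea", "ship"], ["sea", "captain"]]
secondary = [["parlour", "sea"], ["parlour", "ball"]]

scores = zeta_scores(primary, secondary)

# List words from most preferred to most avoided by the primary corpus.
for word, score in sorted(scores.items(), key=lambda item: (-item[1], item[0])):
    print(f"{word:10s} {score:+.2f}")
```

Because the score is a difference, some words will land at the extremes for any two corpora, which is one way to see why the method always finds difference.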
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
Don’t forget that you can also add the stop-words back in to see if that would make a difference in terms of the results you see. Along those lines, I bet “footnote” is something you could remove to try to get better results, since it is clearly skewing your y-axis.
It seems like you’re seeing nations spread across the x-axis. Is there any…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
You need to add citations for any material you are quoting or paraphrasing.
I wanted to see you better explain your topics, and if you think you had enough or too many to gain insights into your corpus.
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I think you may have only done this with stopwords turned on? Don’t forget that this approach, while fine, is also excluding the actual most frequent words.
I’m guessing the word “gutenberg” is so prominent since it’s part of the boilerplate language for most of your corpus? “TM” and work/works are also likely in the same group, and should be…
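As a small illustration of how much the exclusion list shapes a most-frequent-words result, the sketch below counts the same tokens three ways: with nothing excluded, with stop words excluded, and with boilerplate terms excluded as well. The token list, the tiny stop-word set, and the boilerplate set are all made up for the example; a real run would use your own corpus and lists.

```python
from collections import Counter

def most_frequent_words(tokens, n, exclude=frozenset()):
    """Return the n most frequent lowercase tokens, skipping excluded words."""
    counts = Counter(
        token.lower() for token in tokens if token.lower() not in exclude
    )
    return [word for word, _ in counts.most_common(n)]

# Hypothetical tokens: a stop word, a boilerplate word, and two content words.
tokens = ["the"] * 4 + ["Gutenberg"] * 3 + ["whale"] * 2 + ["sea"]

stop_words = {"the"}          # assumed stop-word list
boilerplate = {"gutenberg"}   # assumed boilerplate terms

all_words = most_frequent_words(tokens, 2)
no_stop = most_frequent_words(tokens, 2, exclude=stop_words)
no_stop_or_boilerplate = most_frequent_words(
    tokens, 2, exclude=stop_words | boilerplate
)

print(all_words)
print(no_stop)
print(no_stop_or_boilerplate)
```

Comparing the three lists side by side shows which words only surface once the true most frequent words, or the boilerplate, are out of the way.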
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I’m not sure I understand the difference between the first and second visualization. What did you do differently there?
Don’t forget that you should run these results both with and without stop words, to see if your argument holds at both levels.
I was hoping you could do better in explaining what you see along the two principal component…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I hope you find PCA in R easier! Not as many options, but much less fiddly with final results.
The results you got here, though, look very compelling and ordered. The pink and green clusters are rather distinct, as is the more dominant upper left group. Looking at MFW words, clearly the use of “the” is a good predictor of how to separate the…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
Did you try adding stop words in? I wonder if you would notice the same results at both levels, particularly the question of why King Arthur stands out (and what distinguishes fairy tales from history).
I was hoping you could do a better job explaining what you see with the two principal components. Is principal component 1 capturing the…
-
Kevin L. Ferguson commented on the page, on the site Digital Literary Methods 1 year, 10 months ago
I’m not sure I understand what the two principal components are. In the third visualization, there is a strong effect on the y-axis, but only a little variance on the x-axis. How do you explain that? I wonder why “bones” is so important in accounting for difference? You note how it and “captain” are unique outliers, but I want to know…
-