By Jonas Daugirdas.
I presume that most of our readers are quite familiar with the Lithuanian-language Catholic newspaper Draugas, founded in 1909 as a weekly publication by a consortium of priests in Wilkes-Barre, PA, and moved to Chicago a few years later. The Marian Fathers of the Immaculate Conception took over administration of the newspaper soon thereafter, and in 1916 Draugas transitioned from a weekly to a daily, published six times per week. A careful recounting of the history of Draugas was published in the journal Lituanus in 2009 by Danguolė Kviklys. The article was reprinted with permission on the Draugas website, and readers interested in the paper’s history can access the information there.
Initial impetus and efforts to digitize Draugas
Two hundred fifty thousand pages of text, published over a span of 109 years: that is what the searchable Draugas digital archive now contains. We first decided to explore the idea of creating such a digital archive about 10 years ago, after examining the Web-archived pages of a newspaper called The Scotsman, which has been publishing since 1817. Each issue of The Scotsman is searchable, opening up a treasure trove of data from the past. “If The Scotsman can do it,” we thought, “why can’t we?” Furthermore, a nice feature of Draugas is that an almost complete set of the issues, going back to the initial publication date of July 12, 1909, is maintained at the Draugas publishing offices, with a similar set archived at the nearby Lithuanian Research and Studies Center. This greatly simplifies the task of creating an archive with few missing components.
Initial price quotes to disbind these heavy volumes, photograph them, and then apply optical character recognition to the resulting images to allow word search were in the range of $275,000–$350,000. Where would we find such a large amount of money? In the first decade of the new millennium, there was some interest in digitizing local newspapers in Chicago, and under the previous editor, Dalia Cidzikaitė, a small grant was obtained to microfilm a number of years of Draugas that were not otherwise available in the microfilm archives of U.S. libraries. However, with the financial crisis of 2008, these funds dried up and the project had to be abandoned. Furthermore, between 2008 and 2014, the newspaper was facing an existential crisis: personnel expenses were high and the number of subscribers was decreasing (as it was for most other print publications). The capital in the Draugas Foundation, an endowment fund set up to help defray operating losses, had dwindled to an alarmingly low level. As a result, there was no money available for an ambitious digitization project.
Epaveldas.lt at the National Library of Lithuania and the Chronicling America project
There were two other players engaged in the archiving of Draugas. The first was the Martynas Mažvydas National Library of Lithuania. With support from European Union structural funds, the library was able to create a “Virtual Electronic Heritage System” with a web portal, www.epaveldas.lt. The goal was to photograph and display online images of many different types of historical Lithuanian publications, including periodicals. Draugas signed an agreement with the library giving it full rights to display the contents of the newspaper, and the library was able to acquire images of many issues of Draugas published over a range of years, working from microfilms obtained from ALKA (Amerikos Lietuvių Kultūros Archyvas), based in Putnam, CT, as well as from original copies of the newspaper archived in various libraries across Lithuania. The links to the epaveldas.lt archive of issues of Draugas and its cultural supplement Mokslas, Menas, ir Literatūra (MML) can be found on the Draugas home page. Although the epaveldas collection of Draugas issues is about 80% complete, it has its shortcomings. Many issues and some entire years are missing from the critical period around WWII. The images, of variable quality, are presented as relatively low-resolution .JPG files with no recognized text behind them, so one cannot search them for words, phrases, or names. Also, in the epaveldas database, each page of the newspaper is a separate image file. So, while the epaveldas Draugas archive is invaluable, one needs to know exactly in what issue and on what page a story resides to be able to find it.
The second player was the U.S. Federal Government, and specifically the National Endowment for the Humanities (NEH), working with the U.S. Library of Congress. Together they created the “Chronicling America” project with the goal of transferring to the Web a searchable, digital archive of historical local newspapers published in the United States. This project includes some papers published in languages other than English. The NEH issues grants on a yearly basis, partnering with nonprofit institutions, usually universities, in various states to allow for a good regional mix of papers to be digitized. After some lobbying on our part, the NEH’s partner in Illinois, the University of Illinois at Urbana-Champaign, decided to include digitization of several years of Draugas in their most recent grant application, which was funded. The main problem with the Chronicling America project with respect to Draugas was its narrow scope. The project covers only newspapers published prior to 1923 (for copyright reasons), and the regional partnership approach and budgetary constraints restricted the digitization done by the University of Illinois to five years of publication, 1917–1922.
How a retired information technology specialist provided a much needed boost
While the project was on the back burner, we came across a news story on the Web about a retired information technology specialist, Tom Tryniski, from Fulton, NY, who had a strong interest in history and who loved to read old newspapers. Tryniski transformed a gazebo on the deck of his home into a computer server center, equipped with two state-of-the-art microfilm scanners and multiple computers. He used this largely self-funded equipment center to digitize an important newspaper in his local area, the Fulton Patriot. Tom began to digitize newspapers in 1999 and, at last count, has digitized more than 50 million pages, posting the results on his website, www.fultonhistory.com. You can read more about this unique individual in a news story by Alexandria Neason published in February 2018 in the Columbia Journalism Review, www.cjr.org/the_profile/tom-tryniski-fultonhistory.php.
Draugas on microfilm
What piqued our interest in seeing if we could digitize Draugas economically was the realization that many of the issues were already stored on microfilm at various libraries. The three principal libraries that had substantial numbers of microfilms were the U.S. Library of Congress (1940–1946 and 1958–2008), the Chicago-based Center for Research Libraries (1920–1938 and some from the 1940s and 50s), and Kent State University Library in Ohio (mostly 1911–1919). The latter two graciously allowed us to borrow their microfilms for free, while the U.S. Library of Congress charged a duplication fee of $85.00 per roll.
Starting small with the WWII issues
We decided to focus first on the WWII issues of Draugas, from 1939–1946. Most of these issues were missing from the epaveldas site, and the Library of Congress (LOC) seemed to have many microfilms from this era. In 2016, with the help of some seed money from the Lithuanian Foundation, we were able to duplicate the reels that the LOC had available, and then Tom Tryniski graciously offered to scan and digitize them for a very nominal fee. Tryniski’s advice and expertise in scanning and digitizing microfilms were invaluable. Although optical character recognition (OCR) programs are available that can easily deal with the special characters (ą č ę ė į š ų ū ž) in Lithuanian text, many microfilms have problems related to pages being overexposed and bleached out, or too dark, especially around the edges, so that the background blends in with the text. In such cases, the OCR can become confused, resulting in largely unreadable text. In addition to digitizing the films for a pittance, Tom drew on his experience to help us “save” many pages that otherwise would have resulted in very poor word recognition. Unfortunately, there were years where many issues were missing, and for two eventful years in particular, 1940 and 1941, when deportations and carnage were occurring in Lithuania, many issues from key months were just not available.
Extending the collection
In 2017, with the help of larger grants from the Lithuanian Foundation as well as the Oak Tree Philanthropic Foundation, we were able to duplicate a much larger set of microfilms from the Library of Congress and, with Tom’s help, digitize them. We then outsourced the onerous task of assembling the individual page images into issues, so that the resulting .PDF files would be in a format of one issue per file. The data also had to be compressed so that each issue remained easily readable while the file stayed small enough to be easily manipulated, downloaded, and emailed. With the help of James Simon, Vice-President for Collections and Services at the Center for Research Libraries, and Elizabeth Richardson, librarian at Kent State, we obtained microfilms and some originals that we digitized and added to the collection.
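The step of assembling individual page images into one file per issue can be pictured with a short Python sketch. It is only an illustration: the file naming scheme (`draugas_YYYY-MM-DD_pNN.jpg`) and the function name `group_pages_by_issue` are assumptions made for this example, not the naming or tooling actually used in the project, and the sketch stops at grouping and ordering the pages rather than producing a PDF.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme, assumed for this sketch only:
#   <title>_<YYYY-MM-DD>_p<page number>.<extension>
PAGE_NAME = re.compile(
    r"^(?P<title>[a-z]+)_(?P<date>\d{4}-\d{2}-\d{2})_p(?P<page>\d+)\.\w+$"
)

def group_pages_by_issue(filenames):
    """Return {issue date: [page files in page order]}, skipping unrelated files."""
    issues = defaultdict(list)
    for name in filenames:
        match = PAGE_NAME.match(name)
        if match is None:
            continue  # not a page scan under the assumed naming scheme
        issues[match["date"]].append((int(match["page"]), name))
    # Sort issues by date and pages by page number within each issue.
    return {
        date: [name for _, name in sorted(pages)]
        for date, pages in sorted(issues.items())
    }

scans = [
    "draugas_1941-06-15_p02.jpg",
    "draugas_1941-06-15_p01.jpg",
    "draugas_1941-06-16_p01.jpg",
    "notes.txt",
]
issues = group_pages_by_issue(scans)
for date, pages in issues.items():
    print(date, len(pages), "page(s)")
```

Each list of ordered pages would then be handed to whatever tool builds and compresses the one-issue PDF.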
This year we began dealing with the problem of missing issues. There were 40,000 pages that either were not available on microfilm or were on microfilms of exceedingly poor quality, including many key issues from the WWII era. This time we relied on our own archive and sent our bound, archived copies to be microfilmed. This stage of the project became more expensive, but fortunately, additional financial support was provided by our previous funders, including the Lithuanian Foundation, the Oak Tree Philanthropic Foundation, Lithuanian Catholic Religious Aid, and the Lithuanian National Foundation, as well as by new donors, including the Lithuanian Catholic Academy of Science, the Ateitis Foundation, and Dr. Jonas Prunskis. After the batch of 40,000 pages was added, we identified an additional 2,000 pages that needed to be photographed, and these are now being added. With each cycle, the number of pages that need to be added decreases, and by next year we hope to have an archive that is more than 99% complete.
Leveraging Google site search
Our digitized files are word-searchable .PDF files. In these files, a separate text layer containing the recognized words is added over the image layer. Such files are detected by Google and other web crawlers, and are automatically indexed after being posted on a website. Google also offers nonprofit foundations a very low-priced option to create “custom search engines,” which basically are Google search tools that are limited to one or more specified directories. This option is extremely useful because, with a denominator of 250,000 pages, almost any search over the entire archive returns an unmanageably large number of results.
We used this option to set up approximately 140 custom search engines, one for each year of publication, and some that limited searches to the content in the cultural supplement.
Author/title searchable database for Mokslas, Menas, ir Literatūra
There are approximately 3,600 issues of the Draugas weekly cultural supplement, Mokslas, Menas, ir Literatūra (MML), which was begun in 1949 and continues to this day. Each supplement contains about 8 articles, so there are 29,000 or so articles written by leading Lithuanian American intellectuals of the day. From 1974 onward, in the last MML issue of each year, a topic-organized index for that year was included. No such indexes existed for the MML issues published in 1949–1974. We parsed each issue from this time period and extracted authors, titles and page numbers, and assigned a topic to each article. We then combined this indexing information with the end-of-year indexes of MML from 1974–2018. All of this data is being transferred to a single large MySQL database, which will allow readers to quickly pull up all articles written by a given author or on a given topic.
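To make the idea of an author and topic lookup concrete, here is a minimal Python sketch. It uses the standard-library `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table layout, the column names, and the three sample rows are invented placeholders for illustration, not the actual schema or data of the MML database.

```python
import sqlite3

def build_index(rows):
    """Create an in-memory article index (SQLite standing in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one row per article, as extracted from an issue.
    conn.execute(
        """CREATE TABLE articles (
               id INTEGER PRIMARY KEY,
               author TEXT NOT NULL,
               title TEXT NOT NULL,
               topic TEXT NOT NULL,
               issue_date TEXT NOT NULL,  -- ISO date of the MML issue
               page INTEGER NOT NULL
           )"""
    )
    # Indexes on the two columns readers are expected to search by.
    conn.execute("CREATE INDEX idx_articles_author ON articles(author)")
    conn.execute("CREATE INDEX idx_articles_topic ON articles(topic)")
    conn.executemany(
        "INSERT INTO articles (author, title, topic, issue_date, page) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn

def articles_by(conn, column, value):
    """Return (issue_date, page, title) for every article matching author or topic."""
    if column not in ("author", "topic"):
        raise ValueError("column must be 'author' or 'topic'")
    cursor = conn.execute(
        f"SELECT issue_date, page, title FROM articles "
        f"WHERE {column} = ? ORDER BY issue_date, page",
        (value,),
    )
    return cursor.fetchall()

# Placeholder rows, invented for this example only.
sample_rows = [
    ("Author A", "First placeholder title", "literature", "1950-01-07", 1),
    ("Author B", "Second placeholder title", "music", "1950-01-07", 3),
    ("Author A", "Third placeholder title", "music", "1951-03-10", 2),
]
conn = build_index(sample_rows)
by_author = articles_by(conn, "author", "Author A")
by_topic = articles_by(conn, "topic", "music")
print(by_author)
print(by_topic)
```

The column name is checked against a fixed list before being placed in the query, while the search value is always passed as a bound parameter.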
Some wrinkles and surprises
We found that the volume numbers assigned to the Draugas issues over the years were not in order. The editors could not decide for a long time if volume 1 was in 1909, when the weekly started, or in 1916, when the daily began. In some years a volume number was not incremented for the following year. In other cases, there were sudden jumps.
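The two kinds of irregularity described above, a volume number that fails to advance and a sudden jump, can be flagged mechanically once the year and volume printed on the masthead have been recorded. The Python sketch below is one way to do it. The function name and the sample year-to-volume values are invented for illustration and are not the actual Draugas volume numbers.

```python
def find_volume_anomalies(volume_by_year):
    """Flag years whose volume number does not advance by one per elapsed year.

    Returns a list of (year, previous_volume, volume, kind) tuples, where
    kind is "not incremented" or "jump".
    """
    anomalies = []
    years = sorted(volume_by_year)
    for prev_year, year in zip(years, years[1:]):
        prev_volume = volume_by_year[prev_year]
        volume = volume_by_year[year]
        expected_step = year - prev_year  # tolerates gaps in the recorded years
        actual_step = volume - prev_volume
        if actual_step == expected_step:
            continue
        kind = "not incremented" if actual_step == 0 else "jump"
        anomalies.append((year, prev_volume, volume, kind))
    return anomalies

# Invented sample values, for illustration only.
sample = {1916: 1, 1917: 2, 1918: 2, 1919: 3, 1920: 11}
anomalies = find_volume_anomalies(sample)
for year, prev_volume, volume, kind in anomalies:
    print(f"{year}: volume {prev_volume} -> {volume} ({kind})")
```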
We were confronted by an irate reader who complained that we were not posting all of the pages. She had looked up an obituary for her relative in a 1920 issue that she found archived at the Balzekas museum. When she looked up the issue of that same 1920 date on our website, the obituary was not there. We checked, and all of the pages (4 at the time) seemed to be there. What was the explanation? Apparently for the latter part of 1920 and the first half of 1921, Draugas published both a City Edition and a Country Edition. The online files that we posted were from the Center for Research Libraries microfilms, and all were labeled Country Edition. But in the bound archive stored at the Draugas publishing offices, all of the issues were labeled City Edition. And they were markedly different! Sure enough, the missing obituary was present in the City Edition but not in the Country Edition. So, we sent our bound 1920 and 1921 issues to be microfilmed, and for those years, there are often two different issues of the paper on the same date.
In November of 2013, we launched Draugas News, targeting readers who no longer could easily read Lithuanian. It turns out that our group of directors was not the first to have this bright idea. Way back in 1946–1947, Draugas first published an English-language supplement. As far as we can tell, this effort began on September 6, 1946 and continued until June 5, 1947. The 4-page supplements were published on Fridays. Draugas tried an English-language supplement for the second time 50 years later. This second effort began on July 6, 1996 and continued until January 25, 1997. The 6-page supplements were published on Saturdays. A separate subscription fee ($60/year) was planned, but apparently the response was suboptimal. In total, 12 such issues were published.
Helping the archive come alive
Publishing such an enormous archive is one thing. Making it relevant, useful, and interesting to a broad readership is another. On the Draugas website (www.draugas.org) we routinely publish selected articles from the current issue of the newspaper. As of this summer, we are republishing articles from the archive, often from an issue that was published many years ago but in the same calendar month. These republished archival stories can be reached via www.draugas.org/category/archyvas/. Included are reports of how the Lithuanian American Opera and the Dainava ensemble were founded in displaced persons camps in Germany, an article listing the Lithuanian miners who perished in a mine explosion in 1913 in Courtney, PA, a discussion of why many Lithuanians in Chicago escaped death during the capsizing of the SS Eastland in 1915, and more. On reading these archived pages, one is struck by the grit, inventiveness, and fortitude of Lithuanian Americans from the generations of our parents and grandparents, and by the cultural richness and talent apparent in our forebears. We hope that this digital archive of Draugas will help our readers better understand those Lithuanian Americans who came before us. By the way, access to the archive from 1909 through 2007 is free of charge, and we signed a sharing agreement to allow the Lithuanian National Library to post the searchable PDF files on their epaveldas website as well. Access to the searchable PDF files for issues of Draugas from 2008 onwards requires a paid subscription to the newspaper.
There are a number of other Lithuanian American newspapers whose archives are still available, scattered among various libraries, including those of Dirva, Darbininkas, Keleivis, Naujienos, and others. The stored print archives are in many cases in poor condition, and we believe that similar digitization projects, ideally in collaboration with the Lithuanian National Library and/or the Lithuanian Research and Studies Center, would make an invaluable contribution to preserving our Lithuanian American heritage for future generations to understand and enjoy.
Note added in proof: In the article above, it was mentioned that the Lithuanian Research and Studies Center (LRSC), soon to be relocating to Lemont, IL, also maintains an extensive collection of bound copies of Draugas. Their help in creating the digital archive was invaluable, in that they had two years’ worth of issues from the 1910–1917 period that were not available in the bound collection at the Draugas publishing offices. One of these years was available on microfilm, but the film was of poor quality. For the other year, the LRSC originals were the only version available. The LRSC generously lent us their material, which we microfilmed.
Index to the entire archive.
(Note: the MML – Kultūra supplement links, as well as links to the English 1946 and 1996 supplements, are commingled with links to the regular issues, and should be accessed using the “by year” pages. Beginning in 2008, we posted the regular issues and the MML-Kultūra issues in separate directories.)
The 2008-current issues are password-protected and restricted to current Draugas subscribers.
Draugas Search link, 1909–2007 (better to search by each year on the individual year pages)
Draugas search link, 2008–2018 (the resulting issues are password-protected for subscribers only)
MML Kultūra author – article title – topic database (currently 2008-2017 articles only)
Other archives of Draugas (epaveldas, Chronicling America, UIUC):
Searchable archives for Draugas News and Lithuanian Heritage
(open to subscribers only):
Here’s another photo of Tom Tryniski, whose help was invaluable in establishing the Draugas digital archive.