philip@ogren.info
(303)786-1540
Boulder, CO 80303
| CLEAR |
| The Center for Computational Language and Education Research is a research group within the Department of Computer Science that focuses on human language technologies (i.e., natural language processing or computational linguistics). The lab has had great success both academically and commercially in areas such as speech recognition, named entity recognition, and semantic role labeling. I am one of the primary developers of ClearTK, a toolkit for developing natural language processing (NLP) components on top of UIMA. |
| IT.com |
| IT.com is a small start-up company that is building a search application for email collections obtained for legal discovery. My responsibility is to create software that parses and analyzes email messages to recover their structure. I applied Conditional Random Fields (CRFs) to identify various sections of interest that occur in many email messages, such as embedded message headers, signatures, and legal disclaimers. The implementation was inspired by and derived from Carvalho and Cohen. The features consist of regular expressions and matches against various lexicons (e.g., U.S. cities and common first names). The training data was created using Knowtator (see below). I also implemented a modified version of the method described by Yeh and Harnly to reconstruct email threads. Other tasks have included name normalization, identification of duplicate messages, date normalization, and other miscellaneous cleanup of the data. Additionally, I have built the software infrastructure around these algorithms that takes the messages from their raw form and produces relational data. This was accomplished using UIMA, Lucene, and MySQL. |
| Mayo Clinic College of Medicine (2006) |
I worked on the text analysis team within the Division of Biomedical Informatics, an NIH-funded research lab. My responsibilities included project management, software development, and research. The following is a list of my major responsibilities:
|
| University of Colorado Health Sciences Center |
|
| Mayo Clinic College of Medicine (1998-2002) |
Research and development in the lab centered on the application of
controlled health vocabularies and computational linguistics techniques to the
problem of clinical data encoding and retrieval.
|
Master of Science, Computer Science, University of Colorado at Boulder
|
| Bachelor of Science, Mathematics, Harding University, 1997
Magna Cum Laude, GPA = 3.82 |
| High School - National Merit Scholar |
I will gladly share names and contact information of past and present supervisors and co-workers upon request.