
SNE Master Research Projects 2014 - 2015 - LeftOvers

http://uva.nl/
# title
summary
supervisor contact

students
R

P
1
/
2
1

Automated migration testing.

Unattended content management systems are a serious risk factor for internet security and for end users, as they allow trustworthy information sources on the web to be easily infected with malware and turn evil.
  • How can we use well-known software testing methodologies (e.g. continuous integration) to automatically test whether available updates that fix security weaknesses in software running on a website can be safely implemented with as little involvement of the end user as possible?
  • How would such a migration work in a real world scenario?
In this project you will look at the technical requirements for automated migration testing and, if possible, design a working prototype.
Michiel Leenaars <michiel=>nlnet.nl>


R
P

2

Virtualization vs. Security Boundaries.

Traditionally, security defenses are built upon a classification of the sensitivity and criticality of data and services. This leads to a logical layering into zones, with an emphasis on command and control at the point of inter-zone traffic. The classical "defense in depth" approach applies a series of defensive measures to network traffic as it traverses the various layers.

Virtualization erodes the natural edges, and this affects guarding system and network boundaries. In turn, additional technology is developed to add instruments to virtual infrastructure. The question that arises is the validity of this approach in terms of fitness for purpose, maintainability, scalability and practical viability.
Jeroen Scheerder <Jeroen.Scheerder=>on2it.net>


R
P

3

Efficient delivery of tiled streaming content.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up in numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
The technique known as Tiled Streaming builds on this concept by splitting up content not only temporally but also spatially, allowing specific areas of a video to be independently encoded and requested. This method allows navigation in ultra-high resolution content without requiring the entire video to be transmitted.
An open question is how these numerous spatial tiles can be distributed and delivered most efficiently over a network, reducing both unnecessary overhead and latency.
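
As a rough illustration of the spatial splitting described above, the following Python sketch computes which tiles a client would have to request for a given viewport. The grid size, the function name and the pixel coordinates are assumptions made for this example; they are not taken from any particular tiled streaming implementation.

```python
def tiles_for_viewport(frame_w, frame_h, cols, rows, x, y, w, h):
    """Return the (col, row) indices of every tile that overlaps the viewport.

    All values are integer pixels; the frame is cut into a cols x rows grid.
    """
    # Ceiling division so that the grid always covers the whole frame.
    tile_w = -(-frame_w // cols)
    tile_h = -(-frame_h // rows)

    # Clamp the viewport to the frame.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return []  # the viewport lies completely outside the frame

    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# Hypothetical example: an 8K frame in an 8 x 8 grid with a 1080p viewport.
needed = tiles_for_viewport(7680, 4320, 8, 8, x=900, y=500, w=1920, h=1080)
print(len(needed), "of", 8 * 8, "tiles requested")
```

In this made-up case only a small part of the grid has to be transferred, which is the saving that tiled streaming aims for. How those tile requests are best batched or cached in the network is exactly the open question of the project.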

Ray van Brandenburg <ray.vanbrandenburg=>tno.nl>
R

P

4

Portable RFID/NFC “Bumping” Device.

In physical social engineering there are two main ways of gaining entry to targeted premises: tailgating, that is, following a valid employee or visitor through a door they have just opened, or lock picking, which implies the use of specialised tools to pick physical locks.
As more organisations are replacing traditional entry systems with RFID card controlled entry points and even turnstiles, tailgating and lock picking are becoming increasingly difficult, especially when coupled with increased security awareness of employees and security staff. The ability to read a target’s RFID access card and use that information to replicate it onto a different card of similar make, thus effectively cloning it, has long been discussed. Although this is often possible, it is a multiple-step process which requires both time and materials.

There are primarily two different types of cards: cards which support security keys and cards which don’t. Many HID cards or MIFARE Ultralight cards (such as the ones used in disposable OV-chipkaart tickets) do not support a security handshake or encryption, unlike Anonymous/Personal OV-chipkaart tickets that use the MIFARE Classic 4K chips which use security keys. It is worth noting that MIFARE Classic chips have also been cracked (http://www.ru.nl/ds/research/rfid/) but these more elaborate systems require offline analysis. Most organisations with recent RFID implementations on their premises also use MIFARE Classic chips.

Systems have been designed since the mid-2000s (http://www.wired.com/wired/archive/14.05/rfid.html) for “bump” cloning basic/non-encrypted RFID cards, but no serious research has been done into designing a portable solution that can on the spot 1) clone multiple technologies and 2) clone RFID cards that support security keys. Such a platform could potentially also be programmed to read and clone other NFC protocols, such as the ones used in mobile phones and debit/credit cards, and could warrant further research.
Henri Hambartsumyan <HHambartsumyan=>deloitte.nl>


R

P
2
5

Automated vulnerability scanning and exploitation (part 2).

Automated vulnerability scanning is often used in professional development environments to find critical security issues. But what if those techniques are applied to scripts available on the internet? Many scripts are shared on sites like Sourceforge and GitHub, but security might not have been a priority during their development.

Last RP period Thijs Houtenbos and Dennis Pellikaan researched this topic[1]. They developed a completely automated approach in which a large number of these scripts were downloaded, analyzed for vulnerabilities, tested, and finally websites using these scripts were identified. Combining the information gathered during all these steps in this approach, a list could be generated of web servers running vulnerable code and the parameters needed to exploit these.

Their paper suggests a few subjects on which future research is needed to improve the methodology that was developed. This research project is intended to extend or improve on the previous work.

[1] http://rp.delaat.net/2012-2013/p91/report.pdf

Bart Roos <bart.roos=>ncsc.nl>
Jop van der Lelie <jop.vanderlelie=>ncsc.nl>

R

P

6

Electromagnetic fault injection characterization.

Fault injection attacks are active and either non-invasive (voltage, clock) or semi-invasive attacks (laser) based on the malicious injection of faults. These attacks have proven to be practical and are relevant for all embedded systems that need to operate securely in a potentially hostile environment. Electromagnetic fault injection is a new fault injection technique that can be used during security evaluation projects. The student working on this project will be using the tooling provided by Riscure.

A previously conducted RP project, by Sebastian Carlier, focused on the feasibility of EMFI (see: http://staff.science.uva.nl/~delaat/rp/2011-2012/p19/report.pdf). Another previously conducted RP project, by Albert Spruyt, focused on understanding faults injected in the powerline (see: http://staff.science.uva.nl/~delaat/rp/2011-2012/p61/report.pdf). This project will focus on extending the work performed by Sebastian and Albert.

The goal of this project is:
  • Create an EMFI fault injection setup (Sebastian's work)
  • Extend the fault injection framework (Albert's work)
  • Correlate results with Albert's results
Research question: Are faults introduced using EMFI comparable to faults injected in the powerline?

The following deliverables are requested from the student:
  • A clear description of the performed tests and their results
  • Recommendations for future testing
Topics: Fault injection, EMFI, low level programming, assembly, microcontroller, electronics.
Note: This project can be combined with "Optical fault injection characterization".
Niek Timmers <Timmers=>riscure.com>



10

Qualitative analysis of Internet measurement methods and bias.

In the past year NLnet Labs and other organisations have run a number of measurements on DNSSEC deployment and validation. We used the RIPE Atlas infrastructure for measurements, while others used Google ads in which Flash code runs the measurements. The results differ as the measurement points (or observation points) differ: RIPE Atlas measurement points are mainly located in Europe, while the Google ads Flash measurements run globally (or with a somewhat stronger representation of East Asia).

The question is whether we can quantify the bias in the Atlas measurements, or qualitatively compare the measurements, so that we can correlate the results of both measurement platforms. This would greatly help in interpreting our results and the results from others based on the Atlas infrastructure. The results are highly relevant, as many operational discussions on DNS and DNSSEC deployment are supported or falsified by these kinds of measurements.
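
One possible way to put a number on such a bias is to reweight per-region results by where users actually are instead of where probes happen to be. The Python sketch below shows the arithmetic only. The region names, rates and shares are invented for illustration and are not measurement results, and this reweighting is just one candidate method, not the approach prescribed by the project.

```python
def weighted_rate(rate_by_region, share_by_region):
    """Average the per-region rates, weighted by each region's share."""
    total = sum(share_by_region[r] for r in rate_by_region)
    return sum(rate_by_region[r] * share_by_region[r]
               for r in rate_by_region) / total


# All numbers below are made up for illustration only.
validation_rate = {"Europe": 0.30, "East Asia": 0.10, "Other": 0.20}
probe_share = {"Europe": 0.60, "East Asia": 0.10, "Other": 0.30}  # where probes are
user_share = {"Europe": 0.20, "East Asia": 0.50, "Other": 0.30}   # where users are

raw = weighted_rate(validation_rate, probe_share)      # what the platform reports
adjusted = weighted_rate(validation_rate, user_share)  # reweighted to the user base
print(f"raw={raw:.3f} adjusted={adjusted:.3f} bias={raw - adjusted:+.3f}")
```

The difference between the raw and the adjusted figure is an estimate of the bias caused by probe placement, under the assumption that probes within one region are representative of that region.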
Willem Toorop <willem=>nlnetlabs.nl>


R

P

13

YouTube-scanner.

Goal:
More and more videos are being published on YouTube with content that you want to find soon after upload. The metadata associated with videos is often limited. Therefore, the selection has to be based on the visual content of the video.

Approach:
Develop a demonstrator that automatically downloads and analyses the latest YouTube videos. The demonstrator should operate in a two stage process: first, make a content-based selection of the most relevant video material using the screenshot that YouTube provides for every new video. In case the video is considered relevant, download the entire video for full analysis. Use available open source tools such as OpenCV.

Result:
Demonstrator for the YouTube-scanner.
Mark van Staalduinen <mark.vanstaalduinen=>tno.nl>



15

Cross-linking of objects and people in social media pictures.

Goal:
Automatically cross-link persons and objects found in one social media picture to the same persons and objects in other pictures.

Approach:
Develop a concept and make a quickscan of suitable technologies. Validate the concept by developing a demonstrator using TNO/commercial/open-source software. Investigate which elements influence the cross-linking results.

Result:
Presentation of the concept and demonstrator.
John Schavemaker <john.schavemaker=>tno.nl>



18

Binary analysis of mobile applications with the goal to identify vulnerable versions of used libraries in those applications, based on a known list of vulnerable library versions.

Identifying vulnerable software libraries using binary fingerprinting.

Vulnerabilities in software libraries are often identified, and patched subsequently. However, developers of applications might not update their applications to include the newest versions of the libraries they use. This research aims to identify which version of a library is used in a mobile (iOS) application. This information can be matched with a list of known vulnerable libraries to see if the application is insecure. Only the binary version of the tested applications will be available, so some form of binary fingerprinting will be needed.
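
A very simple form of binary fingerprinting is to look for version banners that a library embeds in the compiled binary. The Python sketch below assumes the library leaves such a banner (OpenSSL builds commonly contain a string like "OpenSSL 1.0.1f 6 Jan 2014"). The pattern table and the list of vulnerable versions are illustrative placeholders, and stripped or obfuscated binaries would need stronger fingerprints than this.

```python
import re

# Illustrative list only; a real study would derive this from advisories.
VULNERABLE = {"openssl": {"1.0.1e", "1.0.1f"}}

# One banner pattern per library (assumption: the banner survives in the binary).
PATTERNS = {"openssl": re.compile(rb"OpenSSL (\d+\.\d+\.\d+[a-z]*)")}


def find_library_versions(binary: bytes):
    """Return {library: set of version strings} found in the raw binary."""
    found = {}
    for lib, pattern in PATTERNS.items():
        versions = {m.group(1).decode("ascii") for m in pattern.finditer(binary)}
        if versions:
            found[lib] = versions
    return found


def vulnerable_hits(binary: bytes):
    """Return only the detected versions that appear on the vulnerable list."""
    hits = {}
    for lib, versions in find_library_versions(binary).items():
        bad = versions & VULNERABLE.get(lib, set())
        if bad:
            hits[lib] = bad
    return hits


# Hypothetical app binary containing an embedded banner.
blob = b"\x00\x01junk OpenSSL 1.0.1f 6 Jan 2014\x00more bytes"
print(vulnerable_hits(blob))
```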
Han Sahin <han.sahin=>securify.nl>

R
P
2
21

Mobile app fraud detection framework.

How can fraud in mobile banking applications be prevented? Applications for smartphones are commodity goods used for retail (and other) banking purposes. Leveraging this type of technology for money transfer attracts criminal organisations trying to commit fraud. One of many security controls can be the detection of fraudulent transactions or other types of activity. Detection can be implemented at many levels within the payment chain. One level at which to implement detection could be the application level itself. This assignment will entail research into the information that would be required to detect fraud from within mobile banking applications, and into turning fraud around by building a client-side fraud detection framework within mobile banking applications.
Steven Raspe <steven.raspe=>nl.abnamro.com>
R
P
2
22

Malware analysis of NFC-enabled smartphones with payment capability.

The risk of mobile malware is rising rapidly. Combined with the development of new techniques, this provides a lot of new attack scenarios. One of these techniques is the use of mobile phones for payments.
In this research project you will take a look at how resistant these systems are against malware on the mobile. We would like to look at the theoretical threats, but also perform hands-on testing.
NOTE: timing on this project might be a challenge since the testing environment is only available during the pilot from August 1st to November 1st.
Steven Raspe <steven.raspe=>nl.abnamro.com>



23

Designing an open source DMARC aggregation tool.

Email is one of the oldest internet technologies in place, and in bad need of some updates. DMARC is a new approach where a domain owner can make policies about his or her domain visible to recipients. This allows a domain owner to advertise that mail can safely be discarded under certain conditions (such as when DKIM and SPF are not in place). Given that the majority of spam and phishing involves sender address spoofing, this approach can have a very real impact on both spam and security.

DMARC also defines a feedback mechanism from recipients back to domain owners. That means you get an actual copy of the mail sent by the attacker, with a detailed machine processable report that will allow you to investigate what happened. The owner of a domain may get reports from many different sources, depending on the various domains emails are sent to. Of course this moves part of the work load of handling spoofed mail from the original recipient (who no longer sees the mail) to the faked sender that gets alerted.

Each mail triggers a separate report, and given that the volume may be at typical spam levels, it is hard to get an adequate overview from a large number of spoofing incidents. Currently, there is a limited set of commercial tools that offer some insight, but there is not yet an established standard, nor are there good open source tools. This makes users depend on commercial providers (often in another jurisdiction) to parse significant volumes of DMARC data for them. Since this involves sharing data, and valid email might also end up there erroneously (because of a configuration error), this is not ideal from a security and confidentiality point of view. In this project you will investigate how best to handle the flows of DMARC data, and design an open source prototype aggregation tool that can freely be used by domain name owners to protect themselves.
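
To give an idea of what aggregation could look like, the Python sketch below sums message counts per source address and disposition over several reports. It assumes the reports arrive in the XML aggregate layout of the DMARC specification (RFC 7489) and without XML namespaces. The function name and the sample reports are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarise(report_documents):
    """Sum message counts per (source IP, disposition) over many reports."""
    totals = Counter()
    for document in report_documents:
        root = ET.fromstring(document)
        for record in root.iter("record"):
            row = record.find("row")
            if row is None:
                continue  # skip malformed records instead of failing
            source_ip = row.findtext("source_ip", default="unknown")
            count = int(row.findtext("count", default="0"))
            disposition = row.findtext("policy_evaluated/disposition",
                                       default="unknown")
            totals[(source_ip, disposition)] += count
    return totals


# Two made-up reports from different receivers about the same sender address.
REPORT = """<feedback><record><row>
  <source_ip>192.0.2.1</source_ip><count>{count}</count>
  <policy_evaluated><disposition>reject</disposition>
  <dkim>fail</dkim><spf>fail</spf></policy_evaluated>
</row></record></feedback>"""

totals = summarise([REPORT.format(count=3), REPORT.format(count=2)])
for (ip, disposition), count in totals.most_common():
    print(ip, disposition, count)
```

A real tool would also have to deal with compressed attachments, namespaces used by some reporters and the per-message failure reports; those are left out here.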
Michiel Leenaars <michiel=>nlnet.nl>

R
P

25

Transencrypting streaming video.

Common encryption (CE) and digital rights management (DRM) are solutions used by the content industry to control the delivery of digital content, in particular streaming video. Whereas DRM focusses on securely getting a CE key to a trusted piece of user equipment, trans-encryption has been suggested as a technical alternative. Transencryption transforms the encryption of content without decrypting it. So encrypted content that can be decrypted with private key A is transformed into encrypted content that can be decrypted with private key B. This solution enables a content provider to outsource the transencryption of a piece of content to an untrusted third party in order to get the content cryptographically targeted to a single specific piece of user equipment.
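
The text above does not name a concrete scheme. As one possible illustration, the Python sketch below uses the ElGamal-style proxy re-encryption construction of Blaze, Bleumer and Strauss with deliberately tiny toy parameters. The group parameters, key values and function names are assumptions for this example, the code needs Python 3.8 or newer for modular inverses via pow(), and it is in no way secure or suited to video-rate data as written.

```python
# Toy group: P = 2*Q + 1 with Q prime; G generates the subgroup of order Q.
P, Q, G = 2039, 1019, 4


def public_key(secret):
    return pow(G, secret, P)


def encrypt(pub, message, r):
    """Return (message * G^r, pub^r); r must be a fresh random value in practice."""
    return (message * pow(G, r, P)) % P, pow(pub, r, P)


def rekey(secret_a, secret_b):
    """Re-encryption key b/a mod Q (needs both secrets in this simple scheme)."""
    return (secret_b * pow(secret_a, -1, Q)) % Q


def reencrypt(rk, ciphertext):
    """Turn a ciphertext for A into one for B without ever decrypting it."""
    c1, c2 = ciphertext
    return c1, pow(c2, rk, P)


def decrypt(secret, ciphertext):
    c1, c2 = ciphertext
    g_r = pow(c2, pow(secret, -1, Q), P)  # recover G^r
    return (c1 * pow(g_r, -1, P)) % P


# Hypothetical keys for party A and party B, and a one-number "message".
key_a, key_b = 123, 456
ct_a = encrypt(public_key(key_a), message=42, r=77)
ct_b = reencrypt(rekey(key_a, key_b), ct_a)
print(decrypt(key_a, ct_a), decrypt(key_b, ct_b))
```

The party that runs reencrypt() only sees ciphertexts and the re-encryption key, never the content, which matches the outsourcing idea described above.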
 
In this project, you will investigate the technical viability of transencrypting streaming video by building an implementation. Your implementation should answer the following questions for at least the implemented configuration.
  • Is it possible to implement transencryption on commercial off-the-shelf computer equipment?
  • Can the implementation handle transencryption of streaming video of 2 Mb/s?


Oskar van Deventer <oskar.vandeventer=>tno.nl>


26

Research MS Enhanced Mitigation Experience Toolkit (EMET).

Every month new security vulnerabilities are identified and reported. Many of these vulnerabilities rely on memory corruption to compromise the system. For most vulnerabilities a patch is released after the fact to remediate the vulnerability. Nowadays there are also new preventive security measures that can prevent vulnerabilities from becoming exploitable without availability of a patch for the specific issue. One of these technologies is Microsoft’s Enhanced Mitigation Experience Toolkit (EMET), which adds additional protection to Windows, preventing many vulnerabilities from becoming exploitable. We would like to research whether this technology is efficient in practice and can indeed prevent exploitation of a number of vulnerabilities without applying the specific patch. We would also like to research whether running EMET has other impact on the system, for example a noticeable performance drop or common software which does not function properly once EMET is installed. If time permits, it is also interesting to see whether existing exploits can be modified to work in an environment protected by EMET.
Henri Hambartsumyan <HHambartsumyan=>deloitte.nl>


27

Triage software.

In previous research a remote acquisition and storage solution was designed and built that allowed sparse acquisition of disks over a VPN using iSCSI. This system allows sparse reading of remote disks. The triage software should decide which parts of the disk must be read. The initial goal is to use metadata to retrieve the blocks that are assumed to be most relevant first. This is in contrast to techniques that perform triage by running remotely while performing a full disk scan (e.g. running bulk_extractor, a keyword scan or a hash-based file scan remotely).

The student is asked to:
  1. Define criteria that can be used for deciding which (parts of) files to acquire
  2. Define a configuration document/language that can be used to order based on these criteria
  3. Implement a prototype for this acquisition
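
The three steps above could be prototyped along the following lines. This Python sketch scores files from metadata only and orders them for acquisition. The criteria, weights, field names and paths are hypothetical and only show the shape of such a configuration; they are not recommendations.

```python
import os
from dataclasses import dataclass


@dataclass
class FileMeta:
    path: str
    size: int      # bytes
    mtime: float   # seconds since the epoch


# Hypothetical configuration document expressed as a Python dict.
RULES = {
    "extension_weight": {".pst": 8, ".docx": 5, ".jpg": 3},
    "recent_after": 1_400_000_000,   # files modified after this get a bonus
    "recent_bonus": 4,
    "max_size": 50_000_000,          # larger files are postponed
    "oversize_penalty": 6,
}


def score(meta, rules):
    """Higher score means: acquire earlier."""
    extension = os.path.splitext(meta.path)[1].lower()
    value = rules["extension_weight"].get(extension, 0)
    if meta.mtime >= rules["recent_after"]:
        value += rules["recent_bonus"]
    if meta.size > rules["max_size"]:
        value -= rules["oversize_penalty"]
    return value


def acquisition_order(files, rules):
    return sorted(files, key=lambda meta: score(meta, rules), reverse=True)


files = [
    FileMeta("/Users/x/mail.pst", 2_000_000_000, 1_410_000_000),
    FileMeta("/Users/x/report.docx", 40_000, 1_410_000_000),
    FileMeta("/Windows/System32/kernel32.dll", 1_000_000, 1_300_000_000),
]
for meta in acquisition_order(files, RULES):
    print(score(meta, RULES), meta.path)
```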
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>



28

Parsing CentralTable.accdb from Office file cache and restoring cached office documents.

The Microsoft Office suite uses a file cache for several reasons; one of them is delayed uploading and caching of documents from a SharePoint server.
Partial or complete Office documents that have been opened on a computer might be available in these cache files. The master database in the file cache folder also contains document metadata from SharePoint sites. In this project you are asked to research the use of the Office file cache and deliver a PoC for extraction and parsing of metadata from the database file, and for decoding or parsing document contents from the cache files (.FSD).
Kevin Jonkers <jonkers=>fox-it.com>



29

UsnJrnl parsing for file system history.

In modern Windows versions, the NTFS filesystem keeps a log (the UsnJrnl file) of all operations that take place on files and folders. This journal is not often included in forensic investigations, and even if it is, parsing and interpreting can be tedious and labour-intensive work. In this project, you are asked to research the type of information that is stored in the UsnJrnl that can be of value for forensic investigations, and create a (PoC) tool that parses out this information in a useable format. Examples of activity that you could identify in the UsnJrnl are filename changes (what were previous filenames of a file?), timestamp modifications (compare to MFT entries and find anomalies), read/write operations (research is still required for a better understanding of relevant traces), etc.
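
As a starting point for such a tool, the Python sketch below decodes a single journal record. It assumes the record follows the USN_RECORD_V2 layout documented by Microsoft (a 60-byte fixed header followed by a UTF-16 file name) and maps only a handful of reason flags. The function names and the dictionary layout of the result are choices made for this example.

```python
import struct
from datetime import datetime, timedelta, timezone

# Fixed part of USN_RECORD_V2: 60 bytes, little-endian, no padding.
USN_V2_HEADER = struct.Struct("<IHHQQqQIIIIHH")

# A small subset of the documented reason flags.
REASONS = {
    0x00000100: "FILE_CREATE",
    0x00000200: "FILE_DELETE",
    0x00001000: "RENAME_OLD_NAME",
    0x00002000: "RENAME_NEW_NAME",
    0x80000000: "CLOSE",
}


def filetime_to_datetime(filetime):
    """Convert 100 ns ticks since 1601-01-01 UTC into a datetime."""
    return (datetime(1601, 1, 1, tzinfo=timezone.utc)
            + timedelta(microseconds=filetime // 10))


def parse_usn_v2(buffer, offset=0):
    """Decode one USN_RECORD_V2 that starts at the given offset."""
    (length, major, _minor, file_ref, parent_ref, usn, timestamp, reason,
     _source, _security, attributes, name_len, name_off) = \
        USN_V2_HEADER.unpack_from(buffer, offset)
    if major != 2:
        raise ValueError(f"unsupported USN record version {major}")
    start = offset + name_off
    name = buffer[start:start + name_len].decode("utf-16-le")
    return {
        "length": length,  # the next record starts at offset + length
        "file_ref": file_ref,
        "parent_ref": parent_ref,
        "usn": usn,
        "time": filetime_to_datetime(timestamp),
        "reasons": [text for flag, text in REASONS.items() if reason & flag],
        "attributes": attributes,
        "name": name,
    }
```

Walking a whole journal then means calling parse_usn_v2() repeatedly and advancing the offset by the returned length, skipping the zero-filled regions that sparse journals contain.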
Kevin Jonkers <jonkers=>fox-it.com>



30

UsnJrnl parsing for Microsoft Office activity.

In modern Windows versions, the NTFS filesystem keeps a log (the UsnJrnl file) of all operations that take place on files and folders. This can include interesting information about read- and write-operations on files. Microsoft Office programs perform a lot of file-operations in the background while a user is working on a file (think of autosave, back-up copies, copy-paste operations, etc.). While a lot of this activity leaves short-term traces on the file system, they can often only be found in the UsnJrnl after a while. Only little research has been done on the forensic implications of these traces. In this project, you are requested to research which traces are left in the UsnJrnl when using Office applications like Word and Excel and how these traces can be combined into a hypothesis about what activity was performed on a document.
Kevin Jonkers <jonkers=>fox-it.com>



31

Advanced EVTX file parsing.

In modern Windows versions, Windows stores event logs in evtx format. The evtx file stores the specific content of an event. The (generic) event text of the event log messages is stored in the resources of program DLLs (as defined in the registry). Create a (PoC) tool that parses the names of the event log message DLLs from the registry, parses the DLLs themselves, parses the event logs, and combines this information in a scalable, easily searchable data store (RDBMS?).
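
The combining step can be pictured as filling the generic message text with the values stored in the event. The Python sketch below assumes the templates have already been extracted from the DLLs and use numbered placeholders such as %1 and %2. The function name is an assumption for this example, and other escape forms that message tables support are deliberately left untouched.

```python
import re

PLACEHOLDER = re.compile(r"%(\d+)")


def render_message(template, values):
    """Replace %1, %2, ... with the event's values; leave unknown ones as-is."""
    def substitute(match):
        index = int(match.group(1)) - 1  # placeholders are 1-based
        if 0 <= index < len(values):
            return values[index]
        return match.group(0)
    return PLACEHOLDER.sub(substitute, template)


# Template as used for service state changes; values as found in the evtx record.
template = "The %1 service entered the %2 state."
print(render_message(template, ["Windows Update", "running"]))
```

Storing the rendered text next to the raw values in a database would make events searchable both by their human-readable message and by individual fields.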
Kevin Jonkers <jonkers=>fox-it.com>



32

The Serval Project.

Here are a few projects from the Serval Project. Not all of them are equally appropriate for the SNE master, but they may give ideas for RPs.

1. Porting Serval Project to iOS

The Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is looking to port to iOS.  There are a variety of activities to be explored in this space, including how to provide interoperability with Android and explore user interface issues.

2. Adding unusual data transports to the Serval Project.

(http://servalproject.org, http://developer.servalproject.org/wiki)

Recent discussions on the guardian-dev mailing list have revealed the possibility of using Bluetooth device names and Wi-Fi Direct directory lists as low-bandwidth ad-hoc communications channels between nearby smartphones. The key advantage of these channels is that they require no user intervention, such as peering or association. Adding such transports to the Serval Project will provide further options for people in disasters or facing oppression to communicate effectively and securely.

3. C65GS FPGA-Based Retro-Computer

The C65GS (http://c65gs.blogspot.nl, http://github.com/gardners/c65gs) is a reimplementation of the Commodore 65 computer in FPGA, plus various enhancements.  The objective is to create a fun 8-bit computer for the 21st century, complete with 1920x1200 display, ethernet, accelerometer and other features -- and then adapt it to make a secure 8-bit smart-phone.  There are various aspects of this project that can be worked on.

4. FPGA Based Mobile Phone

One of the long-term objectives of the Serval Project (http://servalproject.org, http://developer.servalproject.org/wiki) is to create a fully-open mobile phone.  We believe that the most effective path to this is to use a modern FPGA, like a Zynq, that contains an ARM processor and sufficient FPGA resources to directly drive cellular communications, without using a proprietary baseband radio.  In this way it should be possible to make a mobile phone that has no binary blobs, and is built using only free and open-source software.  There are considerable challenges to this project, not the least of which is implementing 2G/3G handset communications in an FPGA.  However, if successful, it raises the possibility of making a mobile phone that has long-range UHF mobile mesh communications as a first-class feature, which would be an extremely disruptive innovation.
Paul Gardner-Stephen <paul.gardner-stephen=>flinders.edu.au>

34

SCTP links for RADIUS and SNMP.

Some protocols are designed for use on an internal network, and land in trouble when they need to cross-over to other operational domains, usually over the public Internet. Two such applications are RADIUS (for AAA) and SNMP (for monitoring). The common way out is to use TCP with TLS; but this has problems too. Instead, we want to try this with SCTP.

Problem Description

Protocols such as RADIUS and SNMP are very useful in relaying information for accounting and management purposes, respectively. Both are run over UDP with good reason: RADIUS wants to avoid the sequential processing of independent requests; SNMP wants to survive in situations where the network deteriorates.

These protocols are sometimes called for between operational domains, but they are not sufficiently reliable, or secure, to cross the public Internet. So in these cases, TCP and TLS are used as wrappers. This is always at the expense of the original reasons to run the protocols over UDP.

SCTP [RFC 4960] is very suitable to resolve these issues for cross-realm communication. Not only can it be protected through DTLS or GSS-API, but it can specify separate streams, and each of these can specify whether they implement ordered delivery (which is useful for RADIUS) or partially reliable [RFC 3758] due to dropouts (which is useful for SNMP). Even more interestingly, SCTP can support sending information from more than one source address, so failure of one network interface could be resolved by sending through another.

What we would like to see is an experiment that verifies whether this works, and how well it works. This involves the creation of a simple program that translates local RADIUS and SNMP streams back and forth to SCTP, on both ends. In addition, it involves validation that the reasons why these protocols originally preferred UDP still hold.

Protocol Design.

The full protocol to implement this involves DTLS or GSS-API over one of its streams, as well as some configuration dynamicity; we will not be bothered with those aspects in this exercise. Instead, the following fixed setup will suffice:
  • Stream 0 is silent (reserved for DTLS or GSS-API)
  • Stream 1 is silent (reserved for configuration)
  • Stream 2 is for RADIUS Authentication + Authorisation
  • Stream 3 is for RADIUS Change of Authorisation
  • Stream 4 is for RADIUS Accounting
  • Stream 5 is for SNMP Get and Walk operations
  • Stream 6 is for SNMP Trap operations
The distinction between streams serves a number of purposes. They may be routed separately (Authentication versus Accounting) and they may each have different delivery modes (reliability, ordering).

Questions.

  • Is the proposed stream split optimal, or are there reasons to improve upon it? What are the pros and cons of combining RADIUS with SNMP (and possibly more) protocols?
  • What are recommended modes of operation for each? Consider various circumstances, including local versus Internet routing and a normal versus a flooded, slow or packet-dropping network.
  • Conduct tests of your recommendations in simulated network settings.
  • Investigate the reasons why RADIUS uses UDP instead of TCP. Does SCTP offer a suitable alternative?
  • Investigate the reasons why SNMP uses UDP instead of TCP. Does SCTP offer a suitable alternative?
See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>



35

Running FastCGI over SCTP.

Many applications on the web are run in self-contained processes that are approached from a web server over a protocol such as FastCGI or WSGI. Many such protocols run over a single TCP-connection which mixes the various input/output streams of the application. We believe that it is more efficient to use SCTP instead.

Problem Description


Back-end processes are compiled against a FastCGI library, which enables them to pick up requests from a webserver, handle them and recycle the process that did the work for another run. The result is superior speed relative to classical cgi-bin scripts. Furthermore, the longevity of processes enables stateful processes, which can be much faster than for instance php processing, where each script starts from scratch and needs to load both scripts and data for what may end up being a simple result.

When used remotely, a FastCGI daemon process is approached over TCP. The protocol is a fairly simple multiplex of a number of information streams. The problem of such a multiplex may be that packet loss over one of the multiplexed streams will hold up the delivery of the other streams as well. For example, error reporting could slow down content delivery or, perhaps worse, security alerts could get overruled by massive data traffic (e.g. DDoS).

The SCTP protocol can take a different approach, where streams are independently communicated. Moreover, it is possible to allocate enough streams to uphold a connection and start a new request in another set of streams.
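
The difference between the two approaches can be made concrete with a small model. The Python sketch below is not a network program; it only computes when each message can be handed to the application if ordering is enforced over the whole connection (as with a TCP multiplex) or per stream (as SCTP allows). The stream names and arrival times are invented for the example.

```python
def delivery_times(messages, per_stream_ordering):
    """messages: list of (stream, arrival_time) in the order they were sent.

    A message can only be delivered once every earlier message in the same
    ordering domain has been delivered.
    """
    latest = {}
    delivered = []
    for stream, arrival in messages:
        domain = stream if per_stream_ordering else "connection"
        time = max(arrival, latest.get(domain, 0.0))
        latest[domain] = time
        delivered.append(time)
    return delivered


# The first message (on stderr) is lost and only arrives after a retransmit.
sent = [("stderr", 5.0), ("stdout", 1.1), ("stdout", 1.2)]

print("single ordered connection:", delivery_times(sent, per_stream_ordering=False))
print("independent streams:      ", delivery_times(sent, per_stream_ordering=True))
```

With one ordering domain the late error message delays the content behind it; with independent streams only the affected stream waits. Measuring how large this effect is for real FastCGI traffic is part of what the project should demonstrate.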

We are asking students to modify the existing FastCGI implementation in C, and make it use SCTP instead of TCP. This will take a modification of both client and server code. It may have implications for the point where streams are allocated. The FastCGI API is agnostic to the transport protocol, so SCTP can be a drop-in replacement for TCP. This means that an application can simply be recompiled against the modified FastCGI library to use SCTP instead of TCP.

For connections across realms, protective measures may be required, such as DTLS or GSS-API. This aspect is not necessarily part of this assignment; we will resort to leaving stream 0 silent and reserved for this purpose. Any dynamic configuration will be communicated over stream 1. Streams 2 and beyond are intended for the FastCGI content streams.

Projected Results

  • First and foremost, we want to see a working demonstration of an Nginx webserver with a FastCGI backend communicating over SCTP. This means that programming skills are required from the students.
The trick of a good SCTP implementation is to find a suitable division of work over streams, and a suitable setup of the various channels. As an example, a syslog interface could be made available to supply feedback from the plugin to the webserver; it could be set up per request or once (with lines prefixed by the SERVER_NAME perhaps). Make your choices well, and document them. Where proper, demonstrate that the choices work (better than the alternatives).

See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>



36

Media handover with SCTP.

The SCTP protocol is a relatively new protocol that is an alternative to UDP and TCP. It has a couple of really interesting capabilities:
  • support for multiple streams that run in parallel
  • support for acknowledgement and replay quality ranging from UDP’s just-drop-it to TCP’s block-until-it-gets-through
  • support for multi-homed endpoints, even supporting a mixture of IPv4 and IPv6 addresses
Now imagine using SCTP for the next connection to your favourite media website. You’d reserve a TCP-quality stream to exchange HTTP, and another over which you’d receive media, perhaps in an RTP flow, with UDP style (if it needs to be interactive) or a maximum resend attempt timer (if it isn’t live and can suffer the delay of buffering). And imagine that the web frontend relays your request over SCTP to a media server in its backend. Then, if the backend can find what you are looking for, the web portal connects you directly to the backend and you can enjoy the media without the hiccups that would occur if the media was served over HTTP with full-blown TCP connection quality.

What we would like to see verified is if the following is possible, and if so, a bit of C code to demonstrate this fact:
  • Set up an SCTP session from party A to B
  • Set up an SCTP session from party B to C
  • Let B add the address of C to the session between A and B
  • Let B add the address of A to the session between B and C
  • Figure out what it takes to get one of the streams between A and C to communicate directly
  • B should ask C to make this switch for one stream only; it may need to sync stream sequence numbers, perhaps through the Partial Reliability extension
  • The code on A and C that runs over that one stream may explicitly mention the target socket in sendmsg() and/or recvmsg()
  • Look for race conditions where B sends sequence numbers to C (or A) while C (or A) is just sending a new message
  • Demonstrate that other streams between A and B continue as before, as well as between B and C
  • Demonstrate that this is compatible with the SCTP RFCs
  • If this is not possible, then explain the conflict based on the RFCs — and be sure to ask us if we see a way out
Please understand that this assignment cannot be done without some experience in network stack programming in C.  The socket interface should be addressed directly to get the best out of it. Another requirement for students undertaking this assignment is the ability to understand RFCs.

See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>



37

Running LDAP over SCTP.

The LDAP protocol is often used to exchange system administrative information, as well as phonebook data. Both applications may call for high responsiveness, even over long-distance connections. The SCTP protocol has the best cards for this game.

SCTP [RFC 4960] is a relatively new standard transport protocol that holds a sort of middle ground between TCP and UDP. It communicates frames of data which may be larger than the MTU, and these are reliably delivered by default. Applications choose between in-order delivery and unordered delivery for each frame sent. SCTP connections have parallel streams, each of which is independently ordered.

LDAP [RFC 4511] is generally run over TCP, and has no mapping to UDP. The reason is that the protocol is not designed to deal with loss of packets. The protocol itself, however, uses LDAPMessage frames that hold a message identifier (messageID), so that the server can respond concurrently to concurrent queries.

Protocol Construction

Read the two RFCs and get creative: find methods to map LDAP onto SCTP. The question is not whether it is possible, but how many variations can be constructed. For instance, LDAP’s inherent concurrency may be exploited through parallel streams. But care should be taken to retain protocol correctness while mapping the LDAPMessage frames onto SCTP frames over (multiple?) streams.
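
One of the many possible mappings can be sketched as follows. It is an illustrative assumption, not a mapping prescribed by either RFC: each LDAPMessage is pinned to an SCTP stream derived from its messageID, so that all messages of one operation stay ordered while unrelated operations do not block each other. The names stream_for and deliver are hypothetical, and delivery is simulated in plain Python rather than sent over real SCTP sockets.

```python
def stream_for(message_id, num_streams):
    """Hypothetical mapping: pin every messageID to one SCTP stream."""
    return message_id % num_streams


def deliver(frames, num_streams, lost):
    """Simulate per-stream in-order delivery.

    frames: list of (message_id, payload) in send order.
    lost:   set of frame indexes whose first transmission is lost.
    Returns the payloads that can be delivered before any
    retransmission arrives.
    """
    blocked = set()      # streams waiting for a retransmission
    delivered = []
    for index, (message_id, payload) in enumerate(frames):
        stream = stream_for(message_id, num_streams)
        if index in lost:
            blocked.add(stream)
        elif stream not in blocked:
            delivered.append(payload)
    return delivered


frames = [(1, "search-1 entry"), (2, "search-2 entry"),
          (1, "search-1 done"), (2, "search-2 done")]
single = deliver(frames, num_streams=1, lost={0})   # everything waits
multi = deliver(frames, num_streams=4, lost={0})    # only operation 1 waits
```

With one stream, the lost first frame holds back every later frame; with four streams, the second search completes unaffected. Other mappings (one stream per connection, unordered delivery for single-frame responses) would need their own correctness argument.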

During this phase, we will guide you in constructing your mappings, and finding possible alternatives. You should be able to read the quoted RFCs, but these two are expected to be sufficiently self-contained so that you need not chase reference upon reference of RFC literature.

Testing Performance

Build a simple tunnel program to test the most interesting mappings of LDAP onto SCTP. You can use Python if you like; simply throttle the links to match its reduced speed.

Compare the performance of your mappings under simulated stormy networking conditions:
  • Perfect networking conditions
  • Bandwidth nearly fully exploited by other traffic
  • Various packet dropping rates
  • Constant packet re-ordering due to flipping routes
Your report will argue which LDAP-over-SCTP mapping is advised under each of these conditions, in terms of extra network load and in terms of user-experienced responsiveness to queries, that is, packet delay times.

Please conduct your experiments with some care; measure in the middle of a run and over a reasonable period to get a low variability in your results; you can then make your statements with a high level of confidence.
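
As a minimal sketch of measuring "in the middle of a run", the function below discards the first and last part of a series of delay samples and reports the mean with an approximate 95% confidence interval. The trim fraction and the use of the normal approximation (factor 1.96, reasonable only for larger sample counts) are assumptions of this example, not requirements of the assignment.

```python
import statistics


def steady_state_summary(samples, trim_fraction=0.1):
    """Drop the first and last trim_fraction of a run, then summarise.

    Returns (mean, (low, high)) where (low, high) is an approximate
    95% confidence interval for the mean of the remaining samples.
    """
    cut = int(len(samples) * trim_fraction)
    middle = samples[cut:len(samples) - cut] if cut else list(samples)
    mean = statistics.mean(middle)
    stdev = statistics.stdev(middle)
    half_width = 1.96 * stdev / len(middle) ** 0.5
    return mean, (mean - half_width, mean + half_width)


# Delays in milliseconds: noisy start-up and shutdown around a steady middle.
run = [50.0] * 10 + [10.0, 12.0] * 40 + [50.0] * 10
mean, (low, high) = steady_state_summary(run)
```

A narrow interval relative to the difference between two mappings is what allows a statement such as "mapping X is faster under condition Y" to be made with confidence.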

See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>



39

Realm Crossover with DANE and DNSSEC.

DNSSEC is protecting ever more of the Internet, and DANE expands it with facilities for impromptu certificate validation. This combination opens new windows of opportunity for initial trust across operational realms.

The role of DNSSEC and DANE

Thanks to DNSSEC, it is possible to trace the validity of DNS data all the way to the DNS root, where a "widely known" key is used to sign the root DNS records that define the Internet. Anyone who uses a validating resolver can be certain that the DNS content for DNSSEC-signed sites is unmodified by rogue intermediates between them and the authoritative DNS server.

DANE exploits the newly found reliability of DNS data by storing certificates, or hashes of certificates, in it. Where a certificate is a long-lasting statement of trust in an identity by a supposedly trustworthy third party, the addition of DANE is a short-lived assurance of that same information, in a way controlled by the domain owner.
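
To make the "hashes of certificates" concrete, here is a minimal sketch that builds the data part of a TLSA record (the record type DANE defines in RFC 6698) from a DER-encoded certificate. It handles only selector 0 (the full certificate); extracting the public key for selector 1 would need an X.509 parser and is left out. The function names are illustrative.

```python
import hashlib

# TLSA matching types: 1 = SHA-256, 2 = SHA-512 (0 = exact match, no hash).
MATCHING = {1: hashlib.sha256, 2: hashlib.sha512}


def tlsa_rdata(cert_der, usage=3, selector=0, matching_type=1):
    """Build TLSA presentation-format RDATA for a DER-encoded certificate."""
    if selector != 0:
        raise ValueError("only selector 0 (full certificate) is handled here")
    if matching_type == 0:
        data = cert_der                       # exact match, no hashing
    else:
        data = MATCHING[matching_type](cert_der).digest()
    return f"{usage} {selector} {matching_type} {data.hex()}"


def matches(cert_der, rdata):
    """Check a presented certificate against RDATA in the form built above."""
    usage, selector, matching_type, _ = rdata.split()
    expected = tlsa_rdata(cert_der, int(usage), int(selector),
                          int(matching_type))
    return expected == rdata


record = tlsa_rdata(b"placeholder bytes, not a real certificate")
```

The resulting string is what would be published, DNSSEC-signed, in the zone; a client that validates the DNS answer can then run matches() on the certificate it was actually presented with.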

DANE for Cross-Realm Kerberos

Kerberos is a single sign-on protocol that is widely used for internal authentication and encryption of network connections. It is old, but far from expired -- it is actively developed and widely deployed thanks to various infrastructural products centered around it -- Windows for Workgroups and Active Directory, Samba4, and FreeIPA are a few well-known examples.

The point where Kerberos is barely usable is Realm Crossover, which is only possible with manual configuration by administrators. That is a sad fact, because once it is configured, it affects the entire internal network. Ideally, trust (in identities) could be established automatically.

This is where the DANE idea enters the game. The Kerberos protocol can already use X.509 certificates of a particular format to pre-authenticate to a Key Distribution Center; the KDC uses another X.509 certificate to establish its own server rights. There is no reason why one KDC could not contact another KDC, using a server certificate on both ends, to pre-authenticate and then obtain a fresh, temporary exchange key for the use of the second realm's services by the first realm's clients. To this end, proof of certificate validity could be confirmed through DNSSEC and DANE.

Assignment

The assignment handed out here has the following parts:
  • Check out the most recent stable MIT krb5, and modify the KDC code to support this scheme, both as a requesting and offering server.
  • Set up two KDCs, including DNSSEC/DANE, and demonstrate that the clients in one can access the services in the other (by fetching a krbtgt ticket-granting ticket). Demonstrate variations with unidirectional and bidirectional trust.
  • Report which security precautions are required to make this a safe success; this may be investigated in interaction with an OpenFortress cryptographer and network expert.

See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>

R

P
2
40

Applications with Auto-Monitoring.

It is very common for networking devices such as routers and (smart) switches to support monitoring through built-in SNMP support. This habit has not been established for higher-level applications, such as server daemons, although it would be very useful there too.

Why monitoring is useful

In any professional environment, monitoring is a vital part of business. It is a Good Thing™ to resolve problems before customers detect them. Not only does it reflect well on Service Level Agreements, but it also helps to keep pressure off the helpdesk.

The standard for monitoring, as established by the IETF, is SNMP. We discussed it in a Good, Bad and Ugly article. We believe that the value of SNMP incorporated into devices could also do wonders for the daemons that we tend to run as servers.

Imagine being able to auto-discover new DNS zones, have their DNSSEC state incorporated and see a report when RRSIGs on the zone start expiring. Or if you like web protocols, imagine hearing about the retraction of a virtual host or backend server before anybody notices it. Or imagine a replicated architecture, where you get to see servers that are down even if your customers cannot see it. These things are an extremely useful part of running a solid online organisation.

Auto-monitoring

The prefix auto means self or by itself. We have forgotten it now, but an automobile was something that moved by itself... without a horse to pull it, that is.

Similarly, auto-monitoring is a term that we coined for systems that present the information with which they can be monitored -- some part of their internal state that signals trouble, either in summary or on a listed object (such as a zone name) or perhaps statistical information (such as traffic flowing through a virtual host).

Not all environments use SNMP -- yet ;-) -- but an application may respond to the ability to load the Net-SNMP library and register itself as an agent by actually exporting information over SNMP. This means that adding SNMP is instantaneous when the Net-SNMP library is installed on a system.

Two Programming Projects

We want you to engage in two concrete projects, and compose in a HOWTO format what needs to be done in general to support SNMP in a piece of software.

  • The first project is a "push" mode assignment. It provides a variable that may be watched, and optionally sends a trap to a monitoring station if it knows one. The daemon to work on is the SMARTmontools daemon, which runs in the background to check on disks.
  • The second project is a "pull" mode assignment, and it is somewhat complex because it reports state in one table, composed from two programs; this sort of thing is possible under the AgentX specification. Your target for this is OpenDNSSEC; we want to see a table indexed by zone names, and reporting on the keys (how many there are, when the next rollover takes place and such) from the Enforcer, as well as the next re-signing moment from the Signer (and, if it is easy to add, the first RRSIG to expire might be a nice addition). These daemons each register their objects into the table, and leave it to the SNMPd master agent to combine the results. Use the functionality of Net-SNMP to register callbacks that inquire for the information, which you can dig up from the common storage structures -- probably the database for the Enforcer, and the queue for the Signer.
    • As part of your work, construct templates for Zabbix (just one of the many monitoring products that can track SNMP) and use its discovery facilities to demonstrate how newly added zones automatically pop up in the output; also demonstrate how the retraction of a zone leads to an alarm that must be handled, or otherwise silenced by an operator.
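
As a small illustration of what "a table indexed by zone names" means on the wire, the sketch below builds the object identifier of one table cell. It uses the standard SMIv2 rule for a variable-length string index (the length first, then one sub-identifier per byte). The column OID is a hypothetical placeholder under a made-up enterprise number, not one assigned to OpenDNSSEC, and the helper names are illustrative.

```python
def string_index(value):
    """Encode a variable-length string index as OID sub-identifiers:
    the length first, then one sub-identifier per byte."""
    raw = value.encode("ascii")
    return [len(raw)] + list(raw)


def instance_oid(column_oid, zone):
    """OID of the cell in the given column for the row of this zone."""
    suffix = ".".join(str(n) for n in string_index(zone))
    return column_oid + "." + suffix


# Hypothetical column "next re-signing moment"; 99999 is a placeholder.
NEXT_RESIGN_COLUMN = "1.3.6.1.4.1.99999.1.1.2"
oid = instance_oid(NEXT_RESIGN_COLUMN, "nl")
```

Because the Enforcer and the Signer would use the same index encoding for the same zone, their columns line up as one row when the master agent combines the two subagents' registrations.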
See also: http://research.arpa2.org/

The ARPA2 project is the development branch of InternetWide but it also engages in software and protocol research. This is a logical consequence of our quest for modern infrastructure for the Internet.
Rick van Rein <rick=>openfortress.nl>



43

badBIOS.

Last year, publications in the media (http://arstechnica.com/security/2013/10/meet-badbios-the-mysterious-mac-and-pc-malware-that-jumps-airgaps/) claimed to have found malware that can jump to airgapped machines, although there is also some skepticism about these claims (http://arstechnica.com/security/2013/11/researcher-skepticism-grows-over-badbios-malware-claims/). Students can research possibilities for malware to jump to airgapped machines via e.g. Bluetooth connections (http://scholar.google.nl/scholar?q=bluetooth+spread+malware&btnG=&hl=en&as_sdt=0%2C5), but also with the use of the microphone and the speaker (as is suggested for the badBIOS malware).
Martijn Sprengers <sprengers.martijn=>kpmg.nl>



44

RFID hacking.

Recently, the Radboud University Nijmegen has developed a practical attack on HID RFID cards (Garcia et al.). Some classes (e.g. the HID iClass) are vulnerable to attacks and as such should not be used anymore. In addition, researchers of the University of Bochum (David Oswald and Christof Paar) have performed a Differential Power Analysis attack on the MIFARE DESFire. In this research, students focus on practical attacks on other (i)Classes or the (in)security of specific RFID applications and chipsets.
Martijn Sprengers <sprengers.martijn=>kpmg.nl>



46

Increase effectiveness of malware.

In recent years, attacks on end users have become widespread and are increasingly adopted as an entry point for many compromises. Furthermore, techniques like ‘polymorphism’, ‘fast flux DNS’ and ‘asynchronous callbacks’ make it even harder for traditional anti-virus techniques to prevent infections. In this research, students use open source tooling or self-developed proofs of concept to increase the effectiveness of malware, e.g. by creating proofs of concept that easily defeat the purpose of traditional AV. The goal is to verify whether these newly created PoCs can be countered by detective/heuristic defenses.
Martijn Sprengers <sprengers.martijn=>kpmg.nl>



47

Malware simulation.

In this research, students focus on how malware can be adequately simulated (i.e. as close to real-world scenarios as possible), for example to train and increase the effectiveness of security operations centers. The goal is to create a PoC that is ‘modular’ and ‘manageable’, for example a framework that can quickly deploy all kinds of Advanced Persistent Threat simulations in a real-world or virtual environment. The research also includes a (literature) study on how real-world APTs and malware are able to use specific techniques, and whether these techniques can be simulated.

If required, specific tooling to simulate/create malware and APTs can be provided by KPMG.
Martijn Sprengers <sprengers.martijn=>kpmg.nl>

R

P
2
51

System Security Monitoring using Hadoop.

This project involves data mining of system and network logs using Hadoop, with a focus on system security. The research will investigate a real-time ‘streaming’ approach to monitoring system security: streaming data through Hadoop (e.g. via Spark Streaming) and then identifying and storing possible incidents. As an example of visualization, you could think of a real-time map of the world displaying both failed and successful login attempts. In any case, an important first part of the project is investigating what others have done in this field and which systems and techniques they used, in order to get an overview of all the possibilities. Finally, implementing a small proof of concept based on ‘best-practice’ or cutting-edge tools/APIs would be a great final result.
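
The streaming idea can be sketched without any cluster. The example below is a plain-Python stand-in for a streaming job, not Hadoop or Spark code: it consumes sshd-style log lines one at a time and flags a source address once it accounts for several failed logins within a sliding window of recent login events. The window size, the threshold and the function name are arbitrary assumptions.

```python
import re
from collections import Counter, deque

LOGIN = re.compile(
    r"(Failed|Accepted) password for (?:invalid user )?(\S+) from (\S+)")


def detect_bruteforce(lines, window=5, threshold=3):
    """Yield each source address the first time it has at least
    `threshold` failures among the last `window` login events."""
    recent = deque(maxlen=window)
    flagged = set()
    for line in lines:
        match = LOGIN.search(line)
        if not match:
            continue                      # not a login event
        outcome, _user, source = match.groups()
        recent.append((outcome, source))
        failures = Counter(src for out, src in recent if out == "Failed")
        if failures[source] >= threshold and source not in flagged:
            flagged.add(source)
            yield source


log = [
    "sshd[1]: Failed password for root from 203.0.113.5 port 4000 ssh2",
    "sshd[2]: Accepted password for alice from 198.51.100.7 port 4001 ssh2",
    "sshd[3]: Failed password for invalid user admin from 203.0.113.5 port 4002 ssh2",
    "sshd[4]: Failed password for root from 203.0.113.5 port 4003 ssh2",
]
incidents = list(detect_bruteforce(log))
```

In a real deployment the generator would be replaced by a windowed operation in the chosen streaming framework, and the flagged addresses would feed the storage and visualization steps.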
Mathijs Kattenberg

R

P
2
53

Greening the Cloud.

In the project 'Greening the Cloud', led by the HvA, we are interested in the performance of different hypervisors with respect to greenness. Hypervisors segment physical machines into multiple virtual machines, and when their performance is considered, greenness is mostly not taken into account. For equal benchmarks/use cases we will compare three hypervisors: two open source hypervisors, KVM and Xen, and a hypervisor from VMware. Performance issues with respect to green aspects must be defined for hypervisors and be suitable for incorporation in a checklist. This checklist will be part of a larger framework, to be developed by the collaboration, aimed at green labelling of clouds. The comparison should be fair, i.e. it should also take network functionality and storage functionality into account. Studies on non-green performance aspects of hypervisors are already available, and part of the work will be a literature study.

This work will be conducted in close collaboration with two of the project participants, both cloud providers: Schuberg Philis and Greenhost.
Arie Taal <A.Taal=>uva.nl>, Paola Grosso <p.grosso=>uva.nl>


54

Irregular algorithms on the Xeon Phi.

Irregular algorithms, such as graph processing algorithms, consist of many parallel tasks of irregular size and with irregular memory access patterns. As such the two main bottlenecks for irregular algorithms are:
  1. Available parallelism
  2. Random memory access speed
Modern CPUs are limited both in the amount of available parallelism and their random memory access speed. This has led many people to investigate implementations of irregular algorithms on the GPU. GPUs have large amounts of available parallelism and high memory access speeds.

Unfortunately, optimally exploiting parallelism on the GPU requires the parallel tasks to be identical, which is not the case for irregular algorithms. Additionally, most of the high memory speeds on the GPU rely on coalescing very regular memory access patterns.

Recently, Intel released their Xeon Phi accelerator. The Xeon Phi falls in the middle between CPU and GPU in many ways. It has more cores than a CPU, but fewer than a GPU. Its memory is faster than a CPU's, but not as fast as a GPU's. The interesting property is that Xeon Phi cores are more independent than those of a GPU, so optimally exploiting parallelism relies less on regular workloads. Similarly, its memory model relies less on regular access patterns than that of a GPU.

In this project the student is expected to investigate, benchmark and compare the random memory access speeds of CPU, GPU and Xeon Phi and determine whether the Xeon Phi is a worthwhile platform for further investigation for graph processing. The following deliverables are expected of the students:
  • A clear description of the performed benchmarks and results and their applicability.
  • One or more prototype implementations of simple graph processing algorithms, such as breadth-first search, betweenness centrality or PageRank.
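
The shape of a random-access micro-benchmark can be sketched as follows. This Python version only illustrates the method (identical work, sequential versus shuffled index order, checksum to confirm both orders touch the same data); interpreter overhead hides most of the hardware effect, so real measurements would use C, CUDA or OpenCL on the target devices. All names and the default array size are assumptions.

```python
import random
import time
from array import array


def access_time(data, order):
    """Sum data[i] for every i in order; return (checksum, seconds)."""
    start = time.perf_counter()
    total = 0
    for i in order:
        total += data[i]
    return total, time.perf_counter() - start


def benchmark(n=1 << 16, seed=0):
    """Time one sequential and one shuffled pass over the same array."""
    data = array("q", range(n))
    sequential = array("q", range(n))
    shuffled = list(range(n))
    random.Random(seed).shuffle(shuffled)     # fixed seed: repeatable order
    scattered = array("q", shuffled)
    seq_sum, seq_seconds = access_time(data, sequential)
    rnd_sum, rnd_seconds = access_time(data, scattered)
    assert seq_sum == rnd_sum                 # same elements, different order
    return seq_seconds, rnd_seconds


seq_seconds, rnd_seconds = benchmark()
```

The array should be made much larger than the last-level cache of the device under test, and each configuration should be repeated several times before comparing platforms.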
Merijn Verstraaten <M.E.Verstraaten=>uva.nl>



55

Partitioning of big graphs.

Distributed graph processing and GPU processing of graphs that are bigger than GPU memory both require that a graph be partitioned into sections that are small enough to fit in a single machine/GPU. Having fair partitions is crucial to obtaining good workload balance. However, most current partitioning algorithms either require the entire graph to fit in memory or repeatedly process the same nodes, making partitioning a very computationally intensive process.

Since a good partitioning scheme depends on both the number of machines used (i.e., the number of partitions) and the graph itself, precomputing a partitioning is unhelpful: it would make incrementally updating the graph impossible. We therefore need to do partitioning on-the-fly, preferably in a distributed fashion. This project involves investigating one or more possible partitioning schemes and developing prototypes. Possible starting points:
  1. Partitioning that minimises cross-partition communication
  2. Fine-grained partitioning that allows easy recombining of partitions to scale to the appropriate number of machines.
  3. Distributed edge-count based partitioning that minimises communication.
Expected deliverables:
  •   One or more partitioning prototypes
  •   Write-up of the partitioning scheme and its benefits
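
As one possible starting point for on-the-fly partitioning, the sketch below assigns vertices in arrival order with a simple greedy rule: prefer the partition that already holds most of the vertex's neighbours, discounted by how full that partition is. This is a generic streaming heuristic chosen for illustration, not a scheme prescribed by the project; the function names are hypothetical, and the capacity must be large enough to hold all vertices.

```python
def stream_partition(adjacency, k, capacity):
    """Assign each vertex, in arrival order, to one of k partitions.

    adjacency: dict mapping vertex -> list of neighbours.
    A partition scores higher the more already-placed neighbours it
    holds, weighted by its remaining capacity; ties go to the emptier
    partition.
    """
    assignment = {}
    sizes = [0] * k
    for vertex, neighbours in adjacency.items():
        def score(p):
            placed = sum(1 for n in neighbours if assignment.get(n) == p)
            return (placed * (1 - sizes[p] / capacity), -sizes[p])
        candidates = [p for p in range(k) if sizes[p] < capacity]
        best = max(candidates, key=score)
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


def cut_edges(adjacency, assignment):
    """Count undirected edges whose endpoints are in different partitions."""
    return sum(1 for v, ns in adjacency.items() for n in ns
               if v < n and assignment[v] != assignment[n])


# Two triangles joined by the single edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts = stream_partition(graph, k=2, capacity=3)
```

Each vertex is looked at once and only the assignments made so far are consulted, which is what makes the approach usable when the graph does not fit in memory or keeps growing.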
Merijn Verstraaten <M.E.Verstraaten=>uva.nl>



56

Analysing ELF binaries to find compiler switches that were used.

The Binary Analysis Tool is an open source tool that can automate analysis of binary files by fingerprinting them. For ELF files this is done by extracting string constants, function names and variable names from the various ELF sections. Sometimes compiler optimisations move the string constants to different ELF sections and extraction will fail in the current implementation.

Your task is to find out whether it is possible, by looking only at the binary, to determine if optimisation flags that cause constants to be moved to other ELF sections were passed to the compiler, and to report those flags. The scope of this project is limited to Linux.
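
A first step is simply to see which sections a binary has and how they are flagged. The sketch below is a self-contained reader for little-endian ELF64 files that lists sections carrying both the SHF_MERGE and SHF_STRINGS flags. Whether the presence or absence of such sections reliably indicates particular compiler flags is exactly the open question of this project; the code makes no claim about that, and it is not part of the Binary Analysis Tool.

```python
import struct

SHF_MERGE = 0x10      # section data may be merged to remove duplicates
SHF_STRINGS = 0x20    # section holds NUL-terminated strings
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")   # ELF64 file header, 64 bytes
SHDR = struct.Struct("<IIQQQQIIQQ")         # ELF64 section header, 64 bytes


def elf_sections(data):
    """Return (name, flags) for every section of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only little-endian ELF64 is handled in this sketch")
    header = EHDR.unpack_from(data, 0)
    shoff, shentsize, shnum, shstrndx = (header[6], header[11],
                                         header[12], header[13])
    headers = [SHDR.unpack_from(data, shoff + i * shentsize)
               for i in range(shnum)]
    # The section-name string table is itself a section.
    str_off, str_size = headers[shstrndx][4], headers[shstrndx][5]
    strtab = data[str_off:str_off + str_size]
    sections = []
    for sh in headers:
        end = strtab.index(b"\x00", sh[0])
        sections.append((strtab[sh[0]:end].decode("ascii", "replace"), sh[2]))
    return sections


def mergeable_string_sections(data):
    """Names of sections flagged as both mergeable and string-holding."""
    wanted = SHF_MERGE | SHF_STRINGS
    return [name for name, flags in elf_sections(data)
            if flags & wanted == wanted]
```

It could be tried on any 64-bit Linux binary with mergeable_string_sections(open(path, "rb").read()), where path is a file of your choosing. Files with an extended section count (more than about 65,000 sections) are not handled.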

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>


58

Designing structured metadata for CVE reports.

Vulnerability reports such as MITRE's CVE are currently free format text, without much structure in them. This makes it hard to machine process reports and automatically extract useful information and combine it with other information sources. With tens of thousands of such reports published each year, it is increasingly hard to keep a holistic overview and see patterns. With our open source Binary Analysis Tool we aim to correlate data with firmware databases.

Your task is to analyse how we can use the information from these reports, what metadata is relevant and propose a useful metadata format for CVE reports. In your research you make an inventory of tools that can be used to convert existing CVE reports with minimal effort.
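
To give an idea of what "structured metadata" could look like, here is a purely hypothetical record layout serialised as JSON. The field names, the class names and the placeholder values are all assumptions made for this illustration; choosing the actual fields and format is the research task.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AffectedProduct:
    vendor: str
    product: str
    versions: list


@dataclass
class VulnerabilityRecord:
    cve_id: str
    summary: str
    weakness: str = ""                     # e.g. a weakness class identifier
    affected: list = field(default_factory=list)
    references: list = field(default_factory=list)

    def to_json(self):
        """Serialise the record, including nested products, as JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; this is not a real CVE entry.
record = VulnerabilityRecord(
    cve_id="CVE-0000-0000",
    summary="Buffer overflow in the web interface",
    affected=[AffectedProduct("ExampleVendor", "example-router-firmware",
                              ["1.0", "1.1"])],
)
document = record.to_json()
```

Once vendor, product and version are separate fields rather than free text, matching a report against the contents of a firmware database becomes a lookup instead of a text-mining problem.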

Armijn Hemel - Tjaldur Software Governance Solutions
Armijn Hemel <armijn=>tjaldur.nl>

59

Forensic correctness of Live CD boot.

As described in http://www.forensicswiki.org/wiki/Forensic_Live_CD_issues, many live CDs can modify the underlying filesystems that need to be investigated. The boot system Casper (http://packages.ubuntu.com/trusty/casper) is singled out as one of the major problems. You are requested to identify the exact problems within this boot system, determine whether it is still an active problem, and implement a solution (patch) that is likely to be accepted (i.e. easy to incorporate) by current systems that use Casper.
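
Whether a boot modified a disk can be checked by hashing a test image before and after. The sketch below shows that check; the boot argument is a hypothetical callable standing in for whatever starts the live CD against the image (for example in a virtual machine), which is not shown here.

```python
import hashlib


def image_digest(path, chunk_size=1 << 20):
    """SHA-256 of a disk image, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as image:
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def was_modified(path, boot):
    """Run boot() and report whether the image content changed."""
    before = image_digest(path)
    boot()
    return image_digest(path) != before
```

A changed digest only shows that something was written; locating the modified blocks (for instance by hashing per block) is the next step towards identifying which part of the boot process is responsible.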
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>

R

P

61

Performance measurement and tuning of remote acquisition.

In previous research a remote acquisition and storage solution was designed and built that allowed sparse acquisition of disks over a VPN using iSCSI. The performance of this solution (and any solution that does random I/O) depends on the tuning of the I/O. The student is asked to come up with strategies that find a reasonable optimum between sequential I/O (full copy) and random I/O (sparse, possibly incomplete, logical copy) and give advice on when to choose which method.
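
The trade-off can be made tangible with a deliberately crude cost model. It assumes that a full copy is limited only by link bandwidth and that a sparse copy issues one request at a time, each paying a full round trip; real iSCSI sessions can keep several requests in flight, so this is a starting point for discussion, not a result. All names and numbers are illustrative.

```python
def full_copy_seconds(disk_bytes, bandwidth):
    """Sequential copy of the whole disk, limited by bandwidth (bytes/s)."""
    return disk_bytes / bandwidth


def sparse_copy_seconds(requests, request_bytes, bandwidth, round_trip):
    """Random reads issued one at a time, each paying one round trip."""
    return requests * (round_trip + request_bytes / bandwidth)


def choose_method(disk_bytes, requests, request_bytes, bandwidth, round_trip):
    """Return the cheaper method under this model and its estimated time."""
    sparse = sparse_copy_seconds(requests, request_bytes, bandwidth,
                                 round_trip)
    full = full_copy_seconds(disk_bytes, bandwidth)
    return ("sparse", sparse) if sparse < full else ("full", full)


# 500 GB disk, 10 MB/s link, 50 ms round trip, 4 KiB reads.
few_reads = choose_method(500e9, 100_000, 4096, 10e6, 0.05)
many_reads = choose_method(500e9, 2_000_000, 4096, 10e6, 0.05)
```

Under these assumptions the break-even point lies near one million requests; measuring how request size, queue depth and read-ahead shift that point is where the tuning work starts.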
"Ruud Schramp (DT)" <schramp=>holmes.nl>
"Zeno Geradts (DT)" <zeno=>holmes.nl>
"Erwin van Eijk (DT)" <eijk=>holmes.nl>



R

P
2
62

Honeypot 2 valuable information.

Honeypots are often used to detect new kinds of attacks (zero-days) and are therefore a source of data. The data from a honeypot can be translated into valuable information that can be used to (pro)actively improve security. Translating data into valuable information is a hard problem. You will investigate different ways in which the data from honeypots can be translated into valuable information. This research is carried out in collaboration with the Nederlandse Spoorwegen (Dutch Railways). Subsequently, you can look at how the information can be used to actively prevent security attacks.



64

Empirical evaluation of parallel vs. distributed graph processing algorithms.

There are many graph algorithms that are tuned and modified to work on modern architectures. At the same time, a lot of effort is put into implementing large-scale systems for graph processing over clusters and clouds.

In this project, we aim to compare the differences between the algorithms and their performance when running on single-node architectures and when running on distributed systems. Specifically, by selecting different types of graphs, we want to analyze the cases where single-node platforms outperform multiple-node ones (i.e., clusters). The basic implementations for different systems will be provided.

The following deliverables are requested from the student:
  1. a selection of 1-3 algorithms chosen for performance analysis.
  2. a comparative description of the algorithms and their implementation details for different platforms.
  3. a description of the selected datasets (at least 10) and their features.
  4. a detailed performance report covering all the platforms and graphs, with a focus on comparative analysis.
Ana Varbanescu <a.l.varbanescu=>uva.nl>




65

Automatic comparison of photo response non-uniformity (PRNU) on YouTube.

Goal :
  • This project aims to compare the PRNU patterns of the different video files available on YouTube in a fast way.
Approach :
  • The software for PRNU extraction and comparison is available at the NFI; the question, however, is how we can process large numbers of video files from YouTube with this method while limiting the amount of data transferred.
Result :
  • Report and demonstrator for this approach
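
For orientation, a common way to compare two noise patterns is normalized correlation: values near 1 suggest the same source, values near 0 suggest unrelated patterns. The sketch below shows that measure on plain lists of numbers. It is a generic illustration and not the NFI software; the extraction of the noise residual from video frames is not shown, and the function name is an assumption.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Pearson-style correlation between two equally long noise patterns."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denominator = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        return 0.0                      # a flat pattern carries no signal
    return sum(x * y for x, y in zip(da, db)) / denominator


pattern = [0.1, -0.2, 0.3, -0.4]
same = normalized_correlation(pattern, pattern)
opposite = normalized_correlation(pattern, [-x for x in pattern])
```

Since only the extracted patterns need to be compared, one way to limit transferred data would be to extract a pattern from a small number of frames per video and discard the video itself.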
Zeno Geradts (DT) <zeno=>holmes.nl>


68

Modelling IT security infrastructures to structure risk assessment during IT audits.

As part of the annual accounts, IT audits are executed to gain assurance on the integrity of the information that forms the annual statement of accounts. This information is accessible from an application layer, but also from a database layer. An audit focusses on different parts of the infrastructure to get sufficient assurance on the integrity of information. Different parts of the infrastructure depend on each other, and because of this, correlation is possible between the different layers.

This research project focusses on the correlation between different infrastructure layers and the automation of performing an IT audit. By making use of reporting tools like QlikView, we would like to create a PoC to verify if specific audit approaches can successfully be automated.
Coen Steenbeek <CSteenbeek=>deloitte.nl>
Derk Wieringa <DWieringa=>deloitte.nl>
Martijn Knuiman <MKnuiman=>deloitte.nl>



69

Efficient networking for clouds-on-a-chip.

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their applications as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970s for single-processor systems, namely the use of shared memory to communicate data between processes running on the same processor.
As multi-core chips become prevalent, we can do better and use more modern techniques. In particular, the direct connections between cores on the chip can be used to implement a faster network than one using the off-chip shared memory. This is what this project is about: demonstrating that direct use of on-chip networks yields better networking between VMs on the same chip than using shared memory.
The challenge in this project is that the on-chip network is programmatically different than "regular" network adapters like Ethernet, so we cannot use existing network stacks as-is.
The project candidate will thus need to explore the adaptation and simplification of an existing network stack to use on-chip networking.
The research should be carried out either on a current multi-core product or simulations of future many-core accelerators. Simulation technology will be provided as needed.

Raphael 'kena' Poss <r.poss=>uva.nl>
R

P
2
70

Secure on-chip protocols for clouds-on-a-chip.

The “Cloud” is a way to organize business where the owners of physical servers rent their resources to software companies to run their applications as virtual machines. With the growing availability of multiple cores on a chip, it becomes interesting to rent different parts of a chip to different companies. In the near future, multiple virtual machines will co-exist and run simultaneously on larger and larger multi-core chips.
Meanwhile, the technology used to implement virtual machines on a chip is based on very old principles that were designed in the 1970s for single-processor systems, namely the virtualization of shared memory using virtual address translation within the core.
The problem with this old technique is that it assumes that the connection between cores is "secure". The physical memory accesses are communicated over the chip without any protection: if a VM running on core A exchanges data with off-chip memory, a VM running on core B that runs malicious code can exploit hardware errors or hardware design bugs to snoop and tamper with the traffic of core A.
To make Clouds-on-a-chip viable from a security perspective, further research is needed to harden the on-chip protocols, in particular the protocols for accessing memory, virtual address translation and the routing of I/O data and interrupts.
The candidate for this project should perform a thorough analysis of the various on-chip protocols required to implement VMs on individual cores, then design protocol modifications that provide resistance against snooping and tampering by other cores on the same chip, together with an analysis of the corresponding overheads in hardware complexity and operating costs (extra network latencies and/or energy usage).
The research will be carried out in a simulation environment so that inspection of on-chip network traffic becomes possible. Simulation tools will be provided prior to the start of the project.
Raphael 'kena' Poss <r.poss=>uva.nl>

71

Multicast delivery of HTTP Adaptive Streaming.

HTTP Adaptive Streaming (e.g. MPEG DASH, Apple HLS, Microsoft Smooth Streaming) is responsible for an ever-increasing share of streaming video, replacing traditional streaming methods such as RTP and RTMP. The main characteristic of HTTP Adaptive Streaming is that it is based on the concept of splitting content up into numerous small chunks that are independently decodable. By sequentially requesting and receiving chunks, a client can recreate the content. An advantage of this mechanism is that it allows a client to seamlessly switch between different encodings (e.g. qualities) of the same content.
There is a growing interest from content parties as well as operators and CDNs in not only delivering these chunks over unicast via HTTP, but also distributing them using multicast. The question is how current multicast technologies could be used, or adapted, to achieve this goal.
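
One sub-problem is that a chunk is far larger than a single multicast datagram. The sketch below shows a hypothetical framing, invented for this illustration and not taken from any existing multicast standard: each datagram carries a small header (chunk number, fragment index, fragment count), so a receiver can reassemble chunks regardless of arrival order and can tell when a chunk is incomplete. Loss recovery (retransmission or forward error correction) is not covered.

```python
import struct

# Hypothetical header: chunk number, fragment index, fragment count.
HEADER = struct.Struct("!IHH")


def fragment(chunk_number, chunk, payload_size=1200):
    """Split one chunk into datagrams that fit a typical MTU."""
    parts = [chunk[i:i + payload_size]
             for i in range(0, len(chunk), payload_size)] or [b""]
    return [HEADER.pack(chunk_number, index, len(parts)) + part
            for index, part in enumerate(parts)]


def reassemble(datagrams):
    """Return {chunk_number: bytes} for chunks whose fragments all arrived."""
    pending = {}
    for datagram in datagrams:
        number, index, count = HEADER.unpack_from(datagram)
        entry = pending.setdefault(number, (count, {}))
        entry[1][index] = datagram[HEADER.size:]
    return {number: b"".join(parts[i] for i in range(count))
            for number, (count, parts) in pending.items()
            if len(parts) == count}


chunk = bytes(range(256)) * 12                  # 3072 bytes of dummy media
datagrams = fragment(7, chunk)
complete = reassemble(reversed(datagrams))      # arrival order does not matter
incomplete = reassemble(datagrams[1:])          # first fragment lost
```

A client that detects an incomplete chunk could fall back to fetching that chunk over unicast HTTP, which is one of the design options the research question leaves open.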
Ray van Brandenburg <ray.vanbrandenburg=>tno.nl>



72

Building an open-source, flexible, large-scale static code analyzer.

Background information.

Data drives business, and maybe even the world. Businesses that make it their business to gather data are often aggregators of client-side generated data. Client-side generated data, however, is inherently untrustworthy. Malicious users can construct their data to exploit careless, or naive, programming and use this malicious, untrusted data to steal information or even take over systems.
It is no surprise that large companies such as Google, Facebook and Yahoo spend considerable resources in securing their own systems against would-be attackers. Generally, many methods have been developed to make untrusted data cross the trust boundary to trusted data, and effectively make malicious data harmless. However, securing your systems against malicious data often requires expertise beyond what even skilled programmers might reasonably possess.

Problem description.

Ideally, tools that analyze code for vulnerabilities would be used to detect common security issues. Such tools, or static code analyzers, exist, but are either outdated (http://rips-scanner.sourceforge.net/) or part of very expensive commercial packages (https://www.checkmarx.com/ and http://armorize.com/). Next to the need for an open-source alternative to the previously mentioned tools, we also need to look at increasing our scope. Rather than focusing on a single codebase, the tool would ideally be able to scan many remote, large-scale repositories and report the findings back in an easily accessible way.
An interesting target for this research would be very popular, open-source (at this stage) Content Management Systems (CMSs), and specifically plug-ins created for these CMSs. CMS cores are held to a very high coding standard and are often relatively secure. Plug-ins, however, are necessarily less so, but are generally as popular as the CMSs they’re created for. This is problematic, because an insecure plug-in is as dangerous as an insecure CMS. Experienced programmers and security experts generally audit the most popular plug-ins, but this is: a) very time-intensive, b) prone to errors and c) of limited scope, i.e. not every plug-in can be audited. For example, if it were feasible to audit all aspects of a CMS repository (CMS core and plug-ins), the DigiNotar debacle could have easily been avoided.

Research proposal.

Your research would consist of extending our proof-of-concept static code analyzer written in Python and using it to scan code repositories, possibly of some major CMSs and their plug-ins, for security issues and finding innovative ways of reporting on the massive amount of possible issues you are sure to find. Help others keep our data that little bit more safe.
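
The proof-of-concept analyzer itself is not shown here. As a toy illustration of what "scanning code for security issues" can mean, the sketch below walks the syntax tree of a piece of Python source and reports calls to a small list of risky functions. The sink list `DANGEROUS_CALLS` and the function name `find_dangerous_calls` are assumptions made for this example. A real analyzer for CMS plug-ins would need a parser for the plug-in language and data-flow tracking, neither of which is attempted here.

```python
import ast

# Illustrative sink list (an assumption for this sketch, not taken from any tool).
DANGEROUS_CALLS = {"eval", "exec"}


def find_dangerous_calls(source, filename="<string>"):
    """Return sorted (filename, line, function) tuples for risky direct calls."""
    tree = ast.parse(source, filename=filename)
    findings = []
    for node in ast.walk(tree):
        # Only plain-name calls such as eval(x) are matched; attribute calls
        # like module.eval(x) are deliberately ignored in this sketch.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in DANGEROUS_CALLS:
                findings.append((filename, node.lineno, node.func.id))
    return sorted(findings)


if __name__ == "__main__":
    sample = "x = input()\nresult = eval(x)\nprint(result)\n"
    for finding in find_dangerous_calls(sample, "sample.py"):
        print(finding)
```

Running the same function over every file in a checked-out repository and aggregating the tuples would be one simple starting point for the reporting side of the project.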
Patrick Jagusiak <patrick.jagusiak=>dongit.nl>


79

Generating test images for forensic file system parsers.

Traditionally, forensic file system parsers (such as The Sleuthkit and the ones contained in Encase/FTK etc.) have been focused on extracting as much information as possible. The state of software in general is lamentable — new security vulnerabilities are found every day — and forensic software is not necessarily an exception. However, software bugs that affect the results used for convictions or acquittals in criminal court are especially damning. As evidence is increasingly being processed in large automated bulk analysis systems without intervention by forensic researchers, investigators unversed in the intricacies of forensic analysis of digital materials are presented with multifaceted results that may be incomplete, incorrect, imprecise, or any combination of these.

There are multiple stages in an automated forensic analysis. The file system parser is usually one of the earlier analysis phases, and errors (in the form of faulty or missing results) produced here will influence the results of the later stages of the investigation, and not always in a predictable or detectable manner. It is relatively easy (modulo programmer quality) to create strict parsers that bomb-out on any unexpected input. But real-world data is often not well-formed, and a parser may need to be able to resync with input data and resume on a best-effort basis after having reached some unexpected input in the format. While file system images are being (semi-) hand-generated to test parsers, when doing so, testers are severely limited by their imagination in coming up with edge cases and corner cases. We need a file system chaos monkey.

The assignment consists of one of the following (each may also be spun off into a separate RP):
  1. Test image generator for NTFS. Think of it as some sort of fuzzer for forensic NTFS parsers. NTFS is a complex filesystem which offers interesting possibilities to trip a parser or trick it into yielding incorrect results. For this project, familiarity with C/C++ and the use of the Windows API is required (but only as much as is necessary to create function wrappers). The goal is to automatically produce "valid" — in the sense of "the bytes went by way of ntfs.sys" — but hopefully quite bizarre NTFS images.
  2. Another interesting research avenue lies in the production of /subtly illegal/ images. For instance, in FAT, it should be possible, in the data format, to double-book clusters (akin to a hard link). It may also be possible to create circular structures in some file systems. It will be interesting to see if and how forensic filesystem parsers deal with such errors.
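
To make the two anomalies in item 2 concrete, the sketch below models a FAT as a simple mapping from cluster number to next cluster and walks each file's chain. It reports clusters claimed by more than one file (double-booking) and chains that loop back on themselves. The `EOC` marker, the toy table and the function name `check_fat` are assumptions for illustration only; an on-disk FAT uses reserved bit patterns and has further fields that are ignored here.

```python
EOC = -1  # hypothetical end-of-chain marker for this toy table


def check_fat(fat, start_clusters):
    """Walk each file's cluster chain and report cross-links and cycles.

    fat: mapping (dict or list) where fat[c] is the cluster following c, or EOC.
    start_clusters: dict mapping file name to its first cluster.
    Returns (cross_linked, cyclic), where cross_linked holds
    (cluster, first_owner, second_owner) tuples and cyclic holds file names.
    """
    owner = {}         # cluster -> first file seen using it
    cross_linked = []
    cyclic = []
    for name, cluster in start_clusters.items():
        seen = set()   # clusters already visited in this file's chain
        while cluster != EOC:
            if cluster in seen:
                cyclic.append(name)
                break
            seen.add(cluster)
            if cluster in owner and owner[cluster] != name:
                cross_linked.append((cluster, owner[cluster], name))
                break  # the rest of the chain is shared as well
            owner[cluster] = name
            cluster = fat[cluster]
    return cross_linked, cyclic


if __name__ == "__main__":
    # a.txt: 2 -> 3 -> 4 -> end; b.txt: 5 -> 3 (double-booked); c.txt: 6 -> 7 -> 6 (loop)
    toy_fat = {2: 3, 3: 4, 4: EOC, 5: 3, 6: 7, 7: 6}
    print(check_fat(toy_fat, {"a.txt": 2, "b.txt": 5, "c.txt": 6}))
```

A test image generator would do the reverse: deliberately write such chains into an image and then observe whether a parser loops forever, silently duplicates data, or flags the inconsistency.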
"Wicher Minnaard (DT)" <wicher=>holmes.nl>
Zeno Geradts <zeno=>holmes.nl>




86

SMB Monitoring.

The SMB protocol is part of the core of any Windows environment: everything that Windows systems do when talking to each other uses some version of SMB (2/3). This network traffic is particularly interesting, as well as challenging, to monitor. Events such as failed Windows network logins (lateral movement), mass file modifications (Cryptolocker and other encryptors) and information leakage (malicious internal employees) mean that monitoring this traffic is becoming increasingly important.

Research questions:
How can SMB monitoring be used to catch a wide variety of (up and coming) threats?
Renato Fontana <renato.fontana=>fox-it.com>

87

Automated malware analysis.

Suspicious files can be uploaded to well-known websites (e.g. VirusTotal) so that they can be analyzed and flagged if malicious. Just as normal users can analyze their files, malware authors can use such repositories to check whether their malicious programs have been flagged as suspicious. A local environment capable of analyzing file samples in bulk would be a safe way to keep the investigation of new malware undetectable. A well-implemented analysis environment must be able to mimic normal user activity and internet connectivity. Existing virtualization and analysis tools must also be taken into account, as most are detectable by malware.

Research questions:
  • How to reliably automate malware analysis in an isolated environment?
  • How to capture behavior and hijack communication requests to the "outside"?
Renato Fontana <renato.fontana=>fox-it.com>

89

Malware obfuscation with package compression.

Signature matching and heuristics are well-known methods used by antivirus products to detect malicious programs. Even with such approaches, it is still not trivial to detect threats, as malware authors make use of obfuscation methods. Encryption and compression are most often used to change the implementation details of existing malware so that it no longer matches a known signature. Packers and wrappers are legitimate compression software, but they can also be used to obfuscate malware. As a result, most recently released packers are flagged as malicious by default by most AV companies.

Research questions:
  • How to detect the usage of new packers?
  • How to reduce false positives in the detection when using packers (differentiate malicious from clean)?
Renato Fontana <renato.fontana=>fox-it.com>

90

"Smart" TTL IP blacklist.

Every blacklist entry has a TTL, much like a filter rule has a TTL. Why TTL? Without one, IPs are added to the blacklist and then go nowhere: they stay there for an indefinite amount of time, never to be purged. This project considers extending an entry's TTL once there is a match (wherever that might be). Parties that publish blacklists usually take the position that the lists are not maintainable. That approach does not scale well, so smart filtering (Bloom filters) and/or incremental listing could turn out to be a good way to maintain IP blacklists.
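
The core idea, entries that expire unless they keep matching, can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions, not a proposed design: the class name `TTLBlacklist`, its methods and the injectable `clock` parameter are all hypothetical, and Bloom filters or incremental listing are not covered.

```python
import time


class TTLBlacklist:
    """IP blacklist whose entries expire unless they keep matching traffic."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so behaviour can be tested without sleeping
        self._expires = {}      # ip -> absolute expiry time

    def add(self, ip):
        """Insert (or refresh) an entry with a full TTL."""
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip):
        """Return True on a live match and extend that entry's TTL."""
        now = self.clock()
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if expiry <= now:
            del self._expires[ip]   # lazily purge an expired entry
            return False
        self._expires[ip] = now + self.ttl
        return True

    def purge(self):
        """Drop all expired entries and return how many were removed."""
        now = self.clock()
        stale = [ip for ip, exp in self._expires.items() if exp <= now]
        for ip in stale:
            del self._expires[ip]
        return len(stale)


if __name__ == "__main__":
    blacklist = TTLBlacklist(ttl_seconds=3600)
    blacklist.add("192.0.2.1")
    print(blacklist.is_blocked("192.0.2.1"), blacklist.is_blocked("198.51.100.7"))
```

With this shape, an address that is still active keeps renewing its own entry, while an address that has gone quiet drops off the list after one TTL period.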

Research questions:
  • Is it possible to come up with a smart checker for blacklist additions, so that those additions, and only those, are added to the TTL blacklist?
  • How can a "TTL" blacklist be constructed, tested and put into production?
Renato Fontana <renato.fontana=>fox-it.com>